WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

xen-devel

Re: [Xen-devel] Re: [Xen-users] old issue after 1024 live migrations seems to still exist.

To: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Subject: Re: [Xen-devel] Re: [Xen-users] old issue after 1024 live migrations seems to still exist.
From: Florian Heigl <florian.heigl@xxxxxxxxx>
Date: Fri, 23 Jul 2010 13:32:33 +0200
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Fri, 23 Jul 2010 04:33:32 -0700
In-reply-to: <1279791153.5872.2225.camel@xxxxxxxxxxxxxxxxxxxxxx>
References: <AANLkTikWKjU2MS7Daip5z2WTSZwNYFDd9_eqZRKGdi7k@xxxxxxxxxxxxxx> <20100721162450.GJ17817@xxxxxxxxxxx> <1279791153.5872.2225.camel@xxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Hi Ian & list,

Sure, I'll provide the specifics of my config.


2010/7/22 Ian Campbell <Ian.Campbell@xxxxxxxxxx>:
> (dropping xen-users to avoid cross-posting)

> Do you have a reference to this old issue?

I googled for the old mailing-list post, but had no luck with the traffic
on the Xen lists.
First of all, I'm glad if it's a different bug that doesn't affect
most people :)

> To be honest I think it is unlikely that you are seeing the actual same
> issue as a bug that old, even if your symptoms are very similar.
>
> Can you give details of your precise system configuration for both host
> and guest, hypervisor changeset (I don't know what Oracle VM 2.0 has in
> it), kernel changeset for both dom0 and domU etc.

dom0 (both hosts identical):
xen_major              : 3
xen_minor              : 4
xen_extra              : .0
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xff400000
xen_changeset          : unavailable


[root@waxh0004 ~]# uname -a
Linux waxh0004 2.6.18-128.2.1.4.9.el5xen #1 SMP Fri Oct 9 14:57:31 EDT 2009 i686 i686 i386 GNU/Linux

domU:
debian:~# uname -a
Linux debian 2.6.26-2-xen-686 #1 SMP Wed Nov 4 23:23:33 UTC 2009 i686 GNU/Linux

(Debian Lenny image from stacklet.com; the kernel was dated Nov 9, 2009)

> I am currently doing some live migration testing with guests under load
> (forkbomb) and am regularly doing 4-5000 successful migrations before I
> hit a very subtle deadlock in a PVops domU kernel. I have most likely in
> the past 4-5 years personally done tens of thousands of iterations of
> live migration in various scenarios and we know other people are
> regularly doing automated and manual test of these things so the problem
> you are seeing is almost certainly not a generic failure but must be
> specific to the version of one or more components in your system.

good!

> Are you seeing failure after precisely 1024 migrations in every case or
> is that just a rough figure? It might be worth

No, it was more like "just above 1000"; I also had a counter
problem in the script.
Note that a few times before that, a migration ended with the domU
down, so your hint below about a leak might be just the thing.

> using /usr/lib/xen/bin/lsevtchn to check what is happening to both the
> dom0 and domU event channels after each migration iteration. Once upon a

Okay, I will log that.

> time I was seeing an evtchn leak in domU (now fixed) but that wouldn't
> fail after precisely 1024 iterations since there is always a number of
> non-leaking event channels also in use.
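Ian's argument that a leak would not fail after precisely 1024 iterations is simple arithmetic, sketched below in Python. The helper names are hypothetical, and two inputs are assumptions rather than figures from this thread: a limit of 1024 event channels (the 2-level event-channel ABI limit for a 32-bit guest such as this i686 domU) and a fixed number of channels leaked per migration. The second helper estimates the leak rate from one channel count per iteration, for example counts derived from the lsevtchn logs; how lsevtchn output maps to a count is also an assumption.

```python
# Hypothetical helpers for reasoning about an event-channel leak.
# ASSUMPTION: a 1024-slot limit (2-level event-channel ABI, 32-bit guest);
# this number is not taken from the thread.
EVTCHN_LIMIT_32BIT = 1024


def migrations_until_exhaustion(limit: int, baseline: int, leak_per_migration: int) -> int:
    """Return how many migrations it takes to fill every event-channel slot.

    baseline: channels legitimately in use (e.g. xenstore, console, vif, vbd).
    leak_per_migration: channels left allocated and never freed per migration.
    """
    if leak_per_migration <= 0:
        raise ValueError("without a leak the table never fills up")
    if not 0 <= baseline <= limit:
        raise ValueError("baseline must lie between 0 and limit")
    # Only the slots not used by the baseline can be consumed by the leak.
    return (limit - baseline) // leak_per_migration


def leak_per_iteration(counts: list[int]) -> float:
    """Average growth in channel count between consecutive samples.

    counts: one channel count per migration iteration (assumed to come
    from the lsevtchn logs taken after each migration).
    """
    if len(counts) < 2:
        return 0.0
    return (counts[-1] - counts[0]) / (len(counts) - 1)


if __name__ == "__main__":
    # With 10 non-leaking channels and one leaked channel per migration,
    # the table fills after 1014 migrations, not after exactly 1024.
    print(migrations_until_exhaustion(EVTCHN_LIMIT_32BIT, 10, 1))
```

Only a zero baseline would make a one-channel-per-migration leak fill the table at exactly 1024, which is why a precise 1024 would point away from a simple leak.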
>
> Are you able to test with an up to date xen-3.4-testing or even better
> the xen-4.0-testing tree?

Retesting with Xen 4 would be a bit tricky. Oracle has an SDK domU
that contains all the dom0 sources, but it would still take a day of
work, I'm afraid.

I'd hope some other people can do the testing on other versions; that's
what I asked for, and why I didn't send it to xen-devel in the first place.

I fixed LAN management access to one of the hosts (for serial
console/reboot/reset...), so on that one I could try re-testing with
xen-3.4-testing.

If the issue doesn't show up in your tests, then I agree it's probably
specific to this version; in that case I can just inform Oracle
and they can look into it on their own.

>> > is it just the gratuitous ARP?
>
> The Grat. ARP doesn't get sent by current PVops kernels (I don't know if
> you are using this since you haven't provided any details about your
> system configuration). A fix is pending in the network subsystem

I know I didn't, because I only asked for someone else to run the
script and retest ;p

> maintainers tree which I hope will be backported to 2.6.32.x when it
> goes into mainline during the next merge window.
> See 06c4648d46d1b757d6b9591a86810be79818b60c and
> 592970675c9522bde588b945388c7995c8b51328 in net-next-2.6.git. You will
> also need to configure sysctl to enable the arp_notify option for the
> devices; setting net.ipv4.conf.all.arp_notify = 1 is likely sufficient.
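To check from inside a guest whether the arp_notify setting Ian mentions is active, one can read the procfs view of the sysctl. This is a minimal Python sketch, assuming the standard mapping of sysctl names to files under /proc/sys (dots become path separators); the function names are hypothetical. It returns False when the file does not exist, for example on a kernel that lacks the arp_notify option.

```python
from pathlib import Path


def sysctl_flag_set(raw: str) -> bool:
    """Interpret a boolean sysctl value read from procfs (e.g. '1')."""
    return int(raw.strip()) != 0


def arp_notify_enabled(device: str = "all", root: str = "/proc/sys") -> bool:
    """Report whether net.ipv4.conf.<device>.arp_notify is non-zero.

    Returns False when the file is missing, e.g. on a kernel without the
    arp_notify option or for a device name that does not exist.
    """
    path = Path(root) / "net" / "ipv4" / "conf" / device / "arp_notify"
    try:
        return sysctl_flag_set(path.read_text())
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    print("arp_notify (all):", arp_notify_enabled())
```

To make the setting persistent, the line net.ipv4.conf.all.arp_notify = 1 would go into /etc/sysctl.conf and can be applied immediately with sysctl -p.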

I'm using a classic domU kernel.

I'll try to get a newer dom0 kernel working, but I'll be on vacation
for a week now.
Considering that you successfully migrate a few thousand times, I'd
suggest you forget about the issue until then.


Greetings,
Flo


-- 
'You don't need to worry about your future'

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
