WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

RE: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug

To: "Tian, Kevin" <kevin.tian@xxxxxxxxx>, "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug
From: "Graham, Simon" <Simon.Graham@xxxxxxxxxxx>
Date: Thu, 1 Feb 2007 09:31:58 -0500
Kevin,

> 
> Hi, Simon,
>       Your case should be different from what I saw, which may be
> fixed by the original patch I posted, which however doesn't apply to
> the latest tree. In the 2.6.16 version, it is do_timer that calls
> softlockup_tick, rather than run_local_timers. So the check on
> "stolen > 5s" comes a bit too late, and the warning can still fire
> even though it is adjusted later. Could you try the attached patch to
> see whether it fixes your live migration case?
> 

So, I tried this last night. I don't see any problems following live
migration, but I am still seeing soft lockups, all of which are related
to cases where there is a large stolen value. I haven't looked at all
the logs yet, but I did see a few things:

1. There were a ton of occasions when the test for stolen > 5s fired
   but the value of stolen was actually negative. Is a negative stolen
   value expected? If so, I think the patch needs to be modified to
   define stolen_threshold as s64 instead of u64.

2. Following save/restore, I see absolutely massive positive values of
   stolen, of the order of the time the domain was saved (which seems
   reasonable), but then I immediately see a soft lockup even though we
   touched the watchdog. Shouldn't this patch also fix soft lockup
   after save/restore?

3. I actually saw a bunch of cases where there was a huge stolen value
   during apparently normal operation (in the ones I've looked at, the
   system as a whole was not particularly stressed). I need to work out
   exactly why the domain is not being scheduled, but in the meantime,
   shouldn't this patch stop the incorrect soft lockup in DomU when the
   hypervisor fails to schedule the domain for a long period? (Not
   exactly related to VCPU hotplug, I know.)
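
To illustrate point 1, here is a minimal standalone sketch of what I
suspect is happening, assuming stolen is a signed 64-bit nanosecond
count and the threshold is unsigned. This is not the patch code:
exceeds_unsigned(), exceeds_signed() and the 5 second constant are
hypothetical names made up for this example; only stolen_threshold is
taken from the patch. When the comparison is done in u64, a negative
stolen value wraps to a very large number and the test fires.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical 5 second threshold in nanoseconds (assumed units). */
static const int64_t stolen_threshold = 5 * NSEC_PER_SEC;

/*
 * Comparison done in unsigned 64-bit arithmetic, as happens when the
 * threshold is declared u64: a negative stolen value is converted to
 * a huge positive number, so the test fires.
 */
static bool exceeds_unsigned(int64_t stolen)
{
    return (uint64_t)stolen > (uint64_t)stolen_threshold;
}

/*
 * Comparison done in signed 64-bit arithmetic, as with an s64
 * threshold: a negative stolen value can never exceed the threshold.
 */
static bool exceeds_signed(int64_t stolen)
{
    return stolen > stolen_threshold;
}
```

With these helpers, a stolen value of -1 passes the unsigned test but
not the signed one, while a genuine 6 second stolen value passes both.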

Simon
