
xen-devel

RE: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug

To: "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx>, "Tian, Kevin" <kevin.tian@xxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug
From: "Graham, Simon" <Simon.Graham@xxxxxxxxxxx>
Date: Thu, 1 Feb 2007 17:40:49 -0500
> 
> No, the patch that Kevin provided cannot work because it touches the
> watchdog before jiffies has been updated. Since both jiffy update and
> watchdog check happens inside do_timer(), this is a hard problem to fix
> for Linux 2.6.16. You could push the watchdog touch inside the following
> loop that calls do_timer(): I think that would work!
> 

OK, I've spent a little time today trying to really understand this
(hopefully!) and I think I now know why none of the patches to date (for
2.6.16 anyway) work -- they touch the watchdog only once, BUT
timer_interrupt in time-xen.c has a loop that repeatedly calls do_timer
to advance jiffies and check for a timeout until the entire time delta
since the previous call is accounted for... any single one of those
do_timer calls might result in a watchdog timer expiration.

It's also not really correct to touch the watchdog only if the stolen
time is > 5s -- you might already be sitting at 8s since the watchdog
was last updated, then get called after 2s of stolen time, and that will
cause a timeout.
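
To make the arithmetic concrete, here is a minimal sketch in C of the
gated-touch case. It is only a model, not kernel code: HZ=100, the 10s
limit written as SOFTLOCKUP_TICKS, the 5s gate written as
TOUCH_THRESHOLD_TICKS, the function name gated_touch_times_out, and the
use of >= at the boundary are all assumptions made for illustration.

```c
#include <stdbool.h>

#define HZ                    100        /* assumed tick rate */
#define SOFTLOCKUP_TICKS      (10 * HZ)  /* assumed 10s watchdog limit */
#define TOUCH_THRESHOLD_TICKS (5 * HZ)   /* the "> 5s of stolen time" gate */

/*
 * Hypothetical model: the watchdog is touched once, before jiffies is
 * caught up, and only if the stolen time exceeds the gate. Returns true
 * if the watchdog limit is reached by the time all stolen ticks have
 * been accounted for. The >= comparison is a simplification; the real
 * check may differ by one tick at the boundary.
 */
bool gated_touch_times_out(unsigned long ticks_since_touch,
                           unsigned long stolen_ticks)
{
    if (stolen_ticks > TOUCH_THRESHOLD_TICKS)
        ticks_since_touch = 0;   /* single touch, up front */

    return ticks_since_touch + stolen_ticks >= SOFTLOCKUP_TICKS;
}
```

In this model, gated_touch_times_out(8 * HZ, 2 * HZ) reports a timeout
because the 2s of stolen time never triggers a touch, and
gated_touch_times_out(0, 25 * HZ) also reports a timeout because one
touch up front cannot cover 25s of catch-up.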

What's more, if you get called with more than 20s of stolen time (e.g.
after save/restore or pause/unpause), you really need to tickle the
watchdog timer multiple times (at least once for every 10s worth of
jiffies in the total stolen time).

So -- my proposal (patch attached for 2.6.16) is to touch the watchdog
inside the loop that calls do_timer(), right after each call, IF the
remaining amount of stolen time is greater than NS_PER_TICK. Since each
call to do_timer advances jiffies by one, this could only go wrong if
there was only a single jiffy left until the watchdog timer expires on
entry, and I think that's OK!
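
For anyone who wants to see the idea without opening the attachment,
here is a rough user-space sketch in C of the catch-up loop. It is not
the patch itself: struct sim, sim_do_timer, sim_touch_watchdog and
catch_up are hypothetical names, HZ=100 and the 10s limit are assumed,
and the timeout check is a simplified stand-in for what do_timer() does
in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define HZ               100          /* assumed tick rate */
#define NS_PER_TICK      10000000LL   /* 1s / HZ, in nanoseconds */
#define SOFTLOCKUP_TICKS (10 * HZ)    /* assumed 10s watchdog limit */

/* Hypothetical stand-in for the per-cpu state involved. */
struct sim {
    unsigned long jiffies;   /* simulated tick counter */
    unsigned long wd_stamp;  /* jiffies value at the last watchdog touch */
    bool fired;              /* set once the simulated watchdog expires */
};

/* Models do_timer(): advance jiffies by one, then check the watchdog. */
static void sim_do_timer(struct sim *s)
{
    s->jiffies++;
    if (s->jiffies - s->wd_stamp >= SOFTLOCKUP_TICKS)
        s->fired = true;
}

/* Models touching the watchdog: copy jiffies into the timestamp. */
static void sim_touch_watchdog(struct sim *s)
{
    s->wd_stamp = s->jiffies;
}

/*
 * Models the catch-up loop: one do_timer call per NS_PER_TICK of delta.
 * With touch_in_loop set, the watchdog is touched right after each call
 * while more than NS_PER_TICK of stolen time remains.
 * Returns true if the simulated watchdog fired.
 */
bool catch_up(struct sim *s, int64_t delta_ns, bool touch_in_loop)
{
    while (delta_ns >= NS_PER_TICK) {
        delta_ns -= NS_PER_TICK;
        sim_do_timer(s);
        if (touch_in_loop && delta_ns > NS_PER_TICK)
            sim_touch_watchdog(s);
    }
    return s->fired;
}
```

Starting from a zeroed struct sim and catching up 25s of delta, the
model fires without the in-loop touch and does not fire with it. The
last tick or two are deliberately left untouched so that a genuine
lockup right after the catch-up is still measured from (almost) the
current jiffies value.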

I also considered only touching the watchdog timer every 5s or so, but I
think the code to do that would have more overhead than simply touching
it for every do_timer() call (since it's just a call that copies jiffies
to the per-cpu watchdog timer value).

Take a look and let me know what you think (the printk could be removed
-- I just put it in so I could tell the code was running).

Simon

Attachment: softlockup.patch
