WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

Re: [Xen-devel] On netfront accelerator add/remove watches

To: BVK Chaitanya <bayapuneni_chaitanya@xxxxxxxxxxxx>
Subject: Re: [Xen-devel] On netfront accelerator add/remove watches
From: Kieran Mansley <kmansley@xxxxxxxxxxxxxx>
Date: Wed, 30 Jul 2008 16:17:29 +0100
Cc: Xen-devel@xxxxxxxxxxxxxxxxxxx, Neil Turton <nturton@xxxxxxxxxxxxxx>
In-reply-to: <488FF50A.5000404@xxxxxxxxxxxx>
References: <488E8F39.4020406@xxxxxxxxxxxx> <488EFA50.1070708@xxxxxxxxxxxxxx> <488FF50A.5000404@xxxxxxxxxxxx>
On Wed, 2008-07-30 at 10:28 +0530, BVK Chaitanya wrote:
> Neil Turton wrote:
> > Is that the BUG_ON in netfront_accelerator_add_watch?  One possible
> > explanation is that suspend_cancel is called and then otherend_changed
> > is called.  Can you add a printk to netfront_suspend_cancel to see if it
> > gets called just before the BUG_ON gets triggered?
> > 
> 
> Yes, the BUG_ON was from the netfront_accelerator_add_watch function.  I
> think I found the problem: xen_suspend, which calls suspend_cancel, is not
> serialized properly.
> 
> Under heavy load and very fine suspend-resume cycles, multiple 
> suspend_cancel instances can be running simultaneously.

I'd be very surprised if that were the case; a lot more would go wrong if
suspend_cancel were running more than once simultaneously for the same
domain.

We think the bug is due to the suspend being called before the frontend
has reached XenbusStateConnected, then suspend_cancel restoring the
watch that wasn't there before, and then the frontend moving to
XenbusStateConnected and trying to set the watch again.
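
The sequence described above can be modelled in a few lines of plain C.
This is only an illustrative sketch, not driver code: every model_* name
is hypothetical, the double add is counted rather than hitting BUG_ON,
and it assumes that suspend leaves no watch registered and that reaching
the connected state always tries to add one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the frontend state; not the real structures. */
enum model_state { MODEL_INITIALISING, MODEL_CONNECTED };

struct model_vif {
        enum model_state state;
        bool watch_set;   /* is the accelerator watch registered? */
        int double_adds;  /* adds attempted while a watch already exists */
};

/* The real add_watch would BUG_ON a second add; the model just counts it. */
static void model_add_watch(struct model_vif *v)
{
        if (v->watch_set) {
                v->double_adds++;
                return;
        }
        v->watch_set = true;
}

/* Assumption: after suspend there is no watch registered. */
static void model_suspend(struct model_vif *v)
{
        v->watch_set = false;
}

/* guarded == false: always restore the watch; true: only if connected. */
static void model_suspend_cancel(struct model_vif *v, bool guarded)
{
        if (!guarded || v->state == MODEL_CONNECTED)
                model_add_watch(v);
}

/* Assumption: moving to the connected state sets the watch. */
static void model_connected(struct model_vif *v)
{
        v->state = MODEL_CONNECTED;
        model_add_watch(v);
}

/* Replay: suspend before connected, cancel it, then connect. */
static int model_replay(bool guarded)
{
        struct model_vif v = { MODEL_INITIALISING, false, 0 };

        model_suspend(&v);
        model_suspend_cancel(&v, guarded);
        model_connected(&v);
        assert(v.watch_set);  /* either way the watch ends up registered */
        return v.double_adds;
}
```

Calling model_replay(false) reports one double add, which corresponds to
the point where the BUG_ON would fire; model_replay(true) reports none,
because the cancel path skips the watch until the frontend is connected.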

Here's a patch that should fix that problem.  Could you test it and see
whether it solves the problem you're seeing?  I've not been able to check
it myself as, for one reason or another, I can't get a recent
xen-unstable.hg to build today.

Keir: I don't know if you're tagging a linux-2.6.18-xen.hg tree for the
3.3.0 and 3.2.2 releases, but this fix should probably go into both if
you are.

Thanks

Kieran


diff -r 1d647ef26f3f drivers/xen/netfront/accel.c
--- a/drivers/xen/netfront/accel.c
+++ b/drivers/xen/netfront/accel.c
@@ -709,8 +709,9 @@ int netfront_accelerator_suspend_cancel(
         * accelerator, so no need to call accelerator_probe_new_vif()
         * directly here
         */
-       netfront_accelerator_add_watch(np);
-       return 0;
+       if (dev->state == XenbusStateConnected)
+               netfront_accelerator_add_watch(np);
+       return 0;
 }
  
  

Attachment: accel_watch_suspend_cancel
Description: Text document
