Re: [Xen-devel] Domain 0 stop response on frequently reboot VMS

To:	MaoXiaoyun <tinnycloud@xxxxxxxxxxx>, xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject:	Re: [Xen-devel] Domain 0 stop response on frequently reboot VMS
From:	Keir Fraser <keir@xxxxxxx>
Date:	Sat, 16 Oct 2010 08:16:51 +0100

Send a patch to the list, Cc Jeremy Fitzhardinge and also a blktap
maintainer, which you should be able to derive from changeset histories and
signed-off-by lines. Flag it clearly in the subject line as a proposed
bugfix for pv_ops.

 -- Keir

On 16/10/2010 06:39, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:

> Well, thanks Keir.
> Fortunately we caught the bug; it turned out to be a tapdisk problem.
> Here is a brief explanation for others who might run into this issue.
>
> Clearing BLKTAP_DEFERRED on line 19 allows concurrent access to
> tap->deferred_queue between lines 24 and 37: once the bit is cleared,
> blktap_defer() can re-add the tap to deferred_work_queue (line 37) while
> it is still linked on the local queue that line 24 later unlinks it from.
> This eventually corrupts the tap->deferred_queue pointers and causes an
> infinite loop in the while clause on line 22.
> Taking the lock around line 24 would be a simple fix.
>  
> /linux-2.6-pvops.git/drivers/xen/blktap/wait_queue.c
>   9 void
>  10 blktap_run_deferred(void)
>  11 {
>  12     LIST_HEAD(queue);
>  13     struct blktap *tap;
>  14     unsigned long flags;
>  15     
>  16     spin_lock_irqsave(&deferred_work_lock, flags);
>  17     list_splice_init(&deferred_work_queue, &queue);
>  18     list_for_each_entry(tap, &queue, deferred_queue)
>  19         clear_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
>  20     spin_unlock_irqrestore(&deferred_work_lock, flags);
>  21     
>  22     while (!list_empty(&queue)) {
>  23         tap = list_entry(queue.next, struct blktap, deferred_queue);
>  24         list_del_init(&tap->deferred_queue);
>  25         blktap_device_restart(tap);
>  26     }   
>  27 }   
>  28 
>  29 void
>  30 blktap_defer(struct blktap *tap)
>  31 {
>  32     unsigned long flags;
>  33     
>  34     spin_lock_irqsave(&deferred_work_lock, flags);
>  35     if (!test_bit(BLKTAP_DEFERRED, &tap->dev_inuse)) {
>  36         set_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
>  37         list_add_tail(&tap->deferred_queue, &deferred_work_queue);
>  38     }   
>  39     spin_unlock_irqrestore(&deferred_work_lock, flags);
>  40 } 
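
To make the interleaving described above concrete, here is a minimal userspace sketch that replays it on a single thread. It is not the kernel code and not a tested patch: the list helpers are simplified stand-ins for <linux/list.h>, a pthread mutex stands in for the spinlock, a bool stands in for the BLKTAP_DEFERRED bit, and struct tap, defer(), setup(), replay_buggy_order() and replay_fixed_order() are hypothetical names used only for illustration. The "fixed" variant also clears the flag only after the unlink, inside the same critical section; that goes slightly beyond "lock line 24" and is an assumption, made because the flag is what stops blktap_defer() from re-adding a tap that is still linked on the local queue.

```c
#include <pthread.h>
#include <stdbool.h>

/* Minimal userspace stand-ins for the kernel's <linux/list.h> helpers. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    INIT_LIST_HEAD(n);
}

/* Simplified: moves all of 'from' onto 'to', which must be empty. */
static void list_splice_init(struct list_head *from, struct list_head *to)
{
    if (list_empty(from))
        return;
    to->next = from->next;
    to->prev = from->prev;
    to->next->prev = to;
    to->prev->next = to;
    INIT_LIST_HEAD(from);
}

/* Hypothetical stand-in for struct blktap; 'deferred' models BLKTAP_DEFERRED. */
struct tap { bool deferred; struct list_head deferred_queue; };

static struct list_head deferred_work_queue;
static pthread_mutex_t deferred_work_lock = PTHREAD_MUTEX_INITIALIZER;

/* Same logic as blktap_defer() in the listing. */
static void defer(struct tap *tap)
{
    pthread_mutex_lock(&deferred_work_lock);
    if (!tap->deferred) {
        tap->deferred = true;
        list_add_tail(&tap->deferred_queue, &deferred_work_queue);
    }
    pthread_mutex_unlock(&deferred_work_lock);
}

/* Queue one tap, then do lines 16-20: splice, optionally clear the flag. */
static void setup(struct tap *tap, struct list_head *queue, bool clear_early)
{
    INIT_LIST_HEAD(queue);
    INIT_LIST_HEAD(&deferred_work_queue);
    INIT_LIST_HEAD(&tap->deferred_queue);
    tap->deferred = false;
    defer(tap);
    pthread_mutex_lock(&deferred_work_lock);
    list_splice_init(&deferred_work_queue, queue);
    if (clear_early)
        tap->deferred = false;           /* line 19 */
    pthread_mutex_unlock(&deferred_work_lock);
}

/* Original order. Returns true if 'queue' still looks non-empty after its
 * only entry was removed, i.e. the while loop on line 22 would never exit. */
bool replay_buggy_order(void)
{
    struct list_head queue;
    struct tap tap;

    setup(&tap, &queue, true);
    defer(&tap);                         /* "other CPU": re-adds a linked node */
    list_del_init(&tap.deferred_queue);  /* line 24: unlinks from the wrong list */
    return !list_empty(&queue);
}

/* Assumed fix: unlink and clear the flag together, under the lock. */
bool replay_fixed_order(void)
{
    struct list_head queue;
    struct tap tap;
    bool ok;

    setup(&tap, &queue, false);
    defer(&tap);                         /* no-op: flag is still set */
    pthread_mutex_lock(&deferred_work_lock);
    list_del_init(&tap.deferred_queue);
    tap.deferred = false;
    pthread_mutex_unlock(&deferred_work_lock);
    defer(&tap);                         /* safe: node is unlinked now */
    ok = list_empty(&queue) && !list_empty(&deferred_work_queue);
    INIT_LIST_HEAD(&deferred_work_queue); /* drop the pointer to the local tap */
    return ok;
}
```

In this sketch replay_buggy_order() reports that the local queue head still points at the tap after list_del_init(), which matches the endless loop on line 22, while replay_fixed_order() reports that the local queue drains and the later defer request lands on deferred_work_queue as intended.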
> 
>  
>> Date: Fri, 15 Oct 2010 13:57:09 +0100
>> Subject: Re: [Xen-devel] Domain 0 stop response on frequently reboot VMS
>> From: keir@xxxxxxx
>> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
>> 
>> You'll probably want to see if you can get SysRq output from dom0 via serial
>> line. It's likely you can if it is alive enough to respond to ping. This
>> might tell you things like what all processes are getting blocked on, and
>> thus indicate what is stopping dom0 from making progress.
>> 
>> -- Keir
>> 
>> On 15/10/2010 13:43, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>> 
>>> 
>>> Hi Keir:
>>> 
>>> First, I'd like to express my appreciation for the help you offered
>>> before.
>>> Recently we have run into a rather nasty problem where domain 0 stops
>>> responding.
>>> 
>>> We are still running a test in which 12 HVMs reboot almost continuously
>>> and concurrently on one physical server.
>>> After a few hours the server appears dead. We can only ping
>>> the server and get a correct response;
>>> Xen itself works fine, since we can get debug info from the serial port.
>>> Attached is the full debug output.
>>> After decoding the domain 0 CPU stack, I find the CPU is still doing work
>>> for domain 0, since the stack
>>> info changed every time I dumped it.
>>> 
>>> Could you take a look at the attachment to see whether there are
>>> any hints for debugging this
>>> problem? Thanks in advance.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>> http://lists.xensource.com/xen-devel
>> 
>> 



