xen-devel
RE: [Xen-devel] Domain 0 stop response on frequently reboot VMS
Well, thanks Keir.
Fortunately we caught the bug; it turned out to be a tapdisk problem.
A brief explanation for others who might run into this issue:
Clearing BLKTAP_DEFERRED on line 19 allows tap->deferred_queue to be accessed
concurrently by line 24 and line 37. That race eventually corrupts the
tap->deferred_queue list pointers, and the while loop at line 22 then spins
forever. Taking deferred_work_lock around line 24 is a simple fix.
/linux-2.6-pvops.git/drivers/xen/blktap/wait_queue.c
 9 void
10 blktap_run_deferred(void)
11 {
12         LIST_HEAD(queue);
13         struct blktap *tap;
14         unsigned long flags;
15
16         spin_lock_irqsave(&deferred_work_lock, flags);
17         list_splice_init(&deferred_work_queue, &queue);
18         list_for_each_entry(tap, &queue, deferred_queue)
19                 clear_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
20         spin_unlock_irqrestore(&deferred_work_lock, flags);
21
22         while (!list_empty(&queue)) {
23                 tap = list_entry(queue.next, struct blktap, deferred_queue);
24                 list_del_init(&tap->deferred_queue);
25                 blktap_device_restart(tap);
26         }
27 }
28
29 void
30 blktap_defer(struct blktap *tap)
31 {
32         unsigned long flags;
33
34         spin_lock_irqsave(&deferred_work_lock, flags);
35         if (!test_bit(BLKTAP_DEFERRED, &tap->dev_inuse)) {
36                 set_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
37                 list_add_tail(&tap->deferred_queue, &deferred_work_queue);
38         }
39         spin_unlock_irqrestore(&deferred_work_lock, flags);
40 }
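For illustration only, here is a rough, untested sketch of what that fix could
look like: the dequeue loop in blktap_run_deferred() takes deferred_work_lock
around the list manipulation, so the list_del_init() at line 24 can no longer
race with the list_add_tail() at line 37 in blktap_defer(). This is just a
sketch of the idea, not a tested patch.

/* Untested sketch of the fix described above: serialize the dequeue
 * against blktap_defer() by holding deferred_work_lock around it. */
void
blktap_run_deferred(void)
{
        LIST_HEAD(queue);
        struct blktap *tap;
        unsigned long flags;

        spin_lock_irqsave(&deferred_work_lock, flags);
        list_splice_init(&deferred_work_queue, &queue);
        list_for_each_entry(tap, &queue, deferred_queue)
                clear_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
        spin_unlock_irqrestore(&deferred_work_lock, flags);

        for (;;) {
                spin_lock_irqsave(&deferred_work_lock, flags);
                if (list_empty(&queue)) {
                        spin_unlock_irqrestore(&deferred_work_lock, flags);
                        break;
                }
                tap = list_entry(queue.next, struct blktap, deferred_queue);
                /* the former line 24, now performed under the lock */
                list_del_init(&tap->deferred_queue);
                spin_unlock_irqrestore(&deferred_work_lock, flags);

                blktap_device_restart(tap);
        }
}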
> Date: Fri, 15 Oct 2010 13:57:09 +0100
> Subject: Re: [Xen-devel] Domain 0 stop response on frequently reboot VMS
> From: keir@xxxxxxx
> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
>
> You'll probably want to see if you can get SysRq output from dom0 via serial
> line. It's likely you can if it is alive enough to respond to ping. This
> might tell you things like what all processes are getting blocked on, and
> thus indicate what is stopping dom0 from making progress.
>
> -- Keir
>
> On 15/10/2010 13:43, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>
> > Hi Keir:
> >
> > First, I'd like to express my appreciation for the help you offered
> > before.
> > Well, recently we confront a rather nasty domain 0 no response problem.
> >
> > We still have 12 HVMs almost continuously and concurrently reboot test
> > on a physical server.
> > A few hours later, the server looks dead. We can only ping the server
> > and get the right response; Xen itself works fine, since we can get
> > debug info from the serial port. Attached is the full debug output.
> > After decoding the domain 0 CPU stack, I find the CPU still works for
> > domain 0, since the stack info changed every time I dumped it.
> >
> > Could you help take a look at the attachment to see whether there are
> > some hints for debugging this problem. Thanks in advance.
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > http://lists.xensource.com/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel