Send a patch to the list, Cc Jeremy Fitzhardinge and also a blktap
maintainer, which you should be able to derive from changeset histories and
signed-off-by lines. Flag it clearly in the subject line as a proposed
bugfix for pv_ops.
-- Keir
On 16/10/2010 06:39, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> Thanks, Keir.
> Fortunately we caught the bug; it turned out to be a tapdisk problem.
> Here is a brief explanation for others who might run into this issue.
>
> Clearing BLKTAP_DEFERRED on line 19 allows concurrent access to
> tap->deferred_queue between lines 24 and 37, which eventually corrupts the
> tap->deferred_queue pointers and causes an infinite loop in the while clause
> on line 22. Taking the lock around line 24 would be a simple fix.
>
> /linux-2.6-pvops.git/drivers/xen/blktap/wait_queue.c
> 9 void
> 10 blktap_run_deferred(void)
> 11 {
> 12     LIST_HEAD(queue);
> 13     struct blktap *tap;
> 14     unsigned long flags;
> 15
> 16     spin_lock_irqsave(&deferred_work_lock, flags);
> 17     list_splice_init(&deferred_work_queue, &queue);
> 18     list_for_each_entry(tap, &queue, deferred_queue)
> 19         clear_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
> 20     spin_unlock_irqrestore(&deferred_work_lock, flags);
> 21
> 22     while (!list_empty(&queue)) {
> 23         tap = list_entry(queue.next, struct blktap, deferred_queue);
> 24         list_del_init(&tap->deferred_queue);
> 25         blktap_device_restart(tap);
> 26     }
> 27 }
> 28
> 29 void
> 30 blktap_defer(struct blktap *tap)
> 31 {
> 32     unsigned long flags;
> 33
> 34     spin_lock_irqsave(&deferred_work_lock, flags);
> 35     if (!test_bit(BLKTAP_DEFERRED, &tap->dev_inuse)) {
> 36         set_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
> 37         list_add_tail(&tap->deferred_queue, &deferred_work_queue);
> 38     }
> 39     spin_unlock_irqrestore(&deferred_work_lock, flags);
> 40 }
>
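For anyone who wants to experiment with the race outside the kernel, here is a
rough userspace sketch of one way to close the window. It is an illustration
under assumptions, not the patch itself: struct tap_model, model_defer(),
model_run_deferred() and the small list helpers are hypothetical stand-ins for
struct blktap, blktap_defer(), blktap_run_deferred() and the kernel list API,
and a pthread mutex stands in for deferred_work_lock. The idea is that the
deferred flag is cleared in the same critical section that unlinks the entry,
so the flag never disagrees with list membership.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head (hypothetical). */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static bool list_is_empty(const struct list_node *h) { return h->next == h; }

static void list_push_tail(struct list_node *n, struct list_node *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_unlink(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Move every entry from src to an empty dst, like list_splice_init(). */
static void list_move_all(struct list_node *src, struct list_node *dst)
{
    assert(list_is_empty(dst));
    if (list_is_empty(src))
        return;
    dst->next = src->next;
    dst->prev = src->prev;
    dst->next->prev = dst;
    dst->prev->next = dst;
    list_init(src);
}

/* Hypothetical model of struct blktap. */
struct tap_model {
    struct list_node deferred_queue;
    bool deferred;  /* models the BLKTAP_DEFERRED bit */
    int restarts;   /* counts modelled blktap_device_restart() calls */
};

static pthread_mutex_t deferred_lock = PTHREAD_MUTEX_INITIALIZER;
static struct list_node deferred_work = { &deferred_work, &deferred_work };

void tap_model_init(struct tap_model *tap)
{
    list_init(&tap->deferred_queue);
    tap->deferred = false;
    tap->restarts = 0;
}

void model_defer(struct tap_model *tap)
{
    pthread_mutex_lock(&deferred_lock);
    if (!tap->deferred) {
        tap->deferred = true;
        list_push_tail(&tap->deferred_queue, &deferred_work);
    }
    pthread_mutex_unlock(&deferred_lock);
}

void model_run_deferred(void)
{
    struct list_node queue;
    struct tap_model *tap;

    list_init(&queue);
    pthread_mutex_lock(&deferred_lock);
    list_move_all(&deferred_work, &queue);
    while (!list_is_empty(&queue)) {
        tap = (struct tap_model *)((char *)queue.next -
                                   offsetof(struct tap_model, deferred_queue));
        /* Unlink and clear the flag together, still under the lock. */
        list_unlink(&tap->deferred_queue);
        tap->deferred = false;
        pthread_mutex_unlock(&deferred_lock);

        tap->restarts++;  /* stands in for blktap_device_restart(tap) */

        pthread_mutex_lock(&deferred_lock);
    }
    pthread_mutex_unlock(&deferred_lock);
}
```

Entries still waiting on the local queue keep their flag set, so a concurrent
model_defer() on them is a no-op instead of relinking a node that is already
on another list.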
>
>> Date: Fri, 15 Oct 2010 13:57:09 +0100
>> Subject: Re: [Xen-devel] Domain 0 stop response on frequently reboot VMS
>> From: keir@xxxxxxx
>> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
>>
>> You'll probably want to see if you can get SysRq output from dom0 via serial
>> line. It's likely you can if it is alive enough to respond to ping. This
>> might tell you things like what all processes are getting blocked on, and
>> thus indicate what is stopping dom0 from making progress.
>>
>> -- Keir
>>
>> On 15/10/2010 13:43, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>>
>>>
>>> Hi Keir:
>>>
>>> First, I'd like to express my appreciation for the help you offered
>>> before.
>>> Recently we have run into a rather nasty problem where domain 0 stops
>>> responding.
>>>
>>> We are still running a test in which 12 HVMs reboot almost continuously
>>> and concurrently on one physical server.
>>> After a few hours, the server appears to be dead. We can still ping the
>>> server and get a correct response, and Xen itself works fine, since we
>>> can get debug info from the serial port. Attached is the full debug
>>> output.
>>> After decoding the domain 0 CPU stack, I found that the CPU is still
>>> doing work for domain 0, since the stack info changed every time I
>>> dumped it.
>>>
>>> Could you take a look at the attachment to see whether there are any
>>> hints for debugging this problem? Thanks in advance.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>> http://lists.xensource.com/xen-devel
>>
>>