On 09/21/2010 12:53 AM, Keir Fraser wrote:
> On 21/09/2010 06:02, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>
>> Taking a look at domain 0's event channels on ports 105 and 106, I find
>> that on port 105 the pending bit is 1 (in [1/0], the first flag is
>> pending, which is 1; the second is mask, which is 0).
>>
>> (XEN) 105 [1/0]: s=3 n=2 d=10 p=1 x=0
>> (XEN) 106 [0/0]: s=3 n=2 d=10 p=2 x=0
>>
>> In summary, we have a domain U vCPU blocked on _VPF_blocked_in_xen, and the
>> notification must have set the pending bit.
>> Given that pending is still 1, it looks like the irq was never handled, am
>> I right? If it had been handled, the pending bit would have been cleared
>> (line 361).
> Yes, it looks like dom0 is not handling the event for some reason. Qemu
> looks like it still works and is waiting for a notification via select().
> But that won't happen until the dom0 kernel handles the event as an IRQ and
> calls the relevant irq handler (drivers/xen/evtchn.c:evtchn_interrupt()).
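For reference, here is a minimal userspace sketch of what qemu-dm is
effectively waiting on. qemu actually goes through the libxc xc_evtchn_*
wrappers; the raw /dev/xen/evtchn protocol shown here -- read() returning
pending port numbers, write() of those ports unmasking them again, binding
done beforehand via the evtchn ioctls -- is my understanding and may differ
in detail across versions. The point is that nothing becomes readable, and
select() keeps timing out, until evtchn_interrupt() in the dom0 kernel has
queued the port.

------------------------------------------- evtchn_wait.c (sketch) ---
/* Toy consumer of /dev/xen/evtchn, not qemu code.  Binding of the port
 * (IOCTL_EVTCHN_BIND_INTERDOMAIN etc.) is omitted for brevity. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/xen/evtchn", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    for (;;) {
        fd_set rfds;
        struct timeval tv = { 0, 10000 };   /* same 10ms poll as in the strace below */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        /* Nothing is readable until the dom0 kernel has handled the event
         * as an IRQ and evtchn_interrupt() has queued the port. */
        if (select(fd + 1, &rfds, NULL, NULL, &tv) <= 0)
            continue;                       /* timeout: the hang we observe */

        uint32_t port;
        if (read(fd, &port, sizeof(port)) == (ssize_t)sizeof(port)) {
            printf("event on port %u\n", (unsigned)port);
            if (write(fd, &port, sizeof(port)) < 0)  /* unmask so it can fire again */
                perror("unmask");
        }
    }
}
-----------------------------------------------------------------------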
>
> I think you're on the right track in your debugging. I don't know much about
> the pv_ops irq handling path, except to say that this aspect is different
> from non-pv_ops kernels, which special-case the handling of events bound to
> user space rather more. So at the moment my best guess would be that the bug
> is in the pv_ops kernel's irq handling for this type of user-space-bound
> event.
We no longer use handle_level_irq because there's a race which loses
events when interrupt migration is enabled. Current xen/stable-2.6.32.x
has a proper fix for this, but the quick workaround is to disable
irqbalanced.
J
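
To make that concrete, here is a toy, single-process model of one interleaving
of the kind Jeremy describes (illustrative C only -- the names are made up, it
is not kernel code, and it does not claim to reproduce the exact bit pattern in
the 'e' debug output above): with a level-type flow handler the event-channel
pending bit is acked before the action runs, so if the action is then skipped,
e.g. because the irq is seen as IRQ_INPROGRESS while being migrated, the
notification is simply gone and evtchn_interrupt() never queues the port for
qemu.

----------------------------------------- level_race.c (toy model) ---
/* Illustration of losing an event-channel notification under a
 * handle_level_irq()-style flow.  All names here are invented. */
#include <stdbool.h>
#include <stdio.h>

struct evtchn { bool pending, masked; };

static bool irq_in_progress;   /* stands in for IRQ_INPROGRESS           */
static bool action_ran;        /* did the evtchn action get to run?      */

static void raise_event(struct evtchn *e) { e->pending = true; }
static void mask_ack(struct evtchn *e)    { e->masked = true; e->pending = false; }
static void unmask(struct evtchn *e)      { e->masked = false; }

/* Level-style flow: mask+ack first, then possibly bail out without
 * running the action -- by which point the pending bit is already gone. */
static void level_flow(struct evtchn *e)
{
    mask_ack(e);
    if (irq_in_progress)       /* e.g. the irq is being moved to another cpu */
        return;                /* "goto out_unlock": action never runs       */
    action_ran = true;         /* evtchn_interrupt() would run here          */
    unmask(e);
}

int main(void)
{
    struct evtchn port = { false, false };

    raise_event(&port);        /* qemu/Xen notifies dom0                     */
    irq_in_progress = true;    /* simulate the migration window              */
    level_flow(&port);

    printf("pending=%d action_ran=%d -> event %s\n",
           port.pending, action_ran, action_ran ? "delivered" : "lost");
    return 0;
}
-----------------------------------------------------------------------

My reading is that the proper fix Jeremy mentions moves event channels off the
level flow handler, so the ack no longer happens before the handler is
guaranteed to run; see xen/stable-2.6.32.x for the actual change.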
> -- Keir
>
>> ------------------------------/linux-2.6-pvops.git/kernel/irq/chip.c---
>> 354 void
>> 355 handle_level_irq(unsigned int irq, struct irq_desc *desc)
>> 356 {
>> 357         struct irqaction *action;
>> 358         irqreturn_t action_ret;
>> 359
>> 360         spin_lock(&desc->lock);
>> 361         mask_ack_irq(desc, irq);
>> 362
>> 363         if (unlikely(desc->status & IRQ_INPROGRESS))
>> 364                 goto out_unlock;
>> 365         desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
>> 366         kstat_incr_irqs_this_cpu(irq, desc);
>> 367
>>
>> BTW, qemu still works fine while the VM is hung. Below is its strace output.
>> There is not much difference from other, healthy qemu instances, other than
>> every select() returning Timeout.
>> -------------------
>> select(14, [3 7 11 12 13], [], [], {0, 10000}) = 0 (Timeout)
>> clock_gettime(CLOCK_MONOTONIC, {673470, 59535265}) = 0
>> clock_gettime(CLOCK_MONOTONIC, {673470, 59629728}) = 0
>> clock_gettime(CLOCK_MONOTONIC, {673470, 59717700}) = 0
>> clock_gettime(CLOCK_MONOTONIC, {673470, 59806552}) = 0
>> select(14, [3 7 11 12 13], [], [], {0, 10000}) = 0 (Timeout)
>> clock_gettime(CLOCK_MONOTONIC, {673470, 70234406}) = 0
>> clock_gettime(CLOCK_MONOTONIC, {673470, 70332116}) = 0
>> clock_gettime(CLOCK_MONOTONIC, {673470, 70419835}) = 0
>>
>>> Date: Mon, 20 Sep 2010 10:35:46 +0100
>>> Subject: Re: VM hung after running sometime
>>> From: keir.fraser@xxxxxxxxxxxxx
>>> To: tinnycloud@xxxxxxxxxxx
>>> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; jbeulich@xxxxxxxxxx
>>>
>>> On 20/09/2010 10:15, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>>>
>>>> Thanks Keir.
>>>>
>>>> You're right; after looking more deeply into wait_on_xen_event_channel, I
>>>> see it is smart enough to avoid the race I assumed.
>>>>
>>>> How about prepare_wait_on_xen_event_channel?
>>>> Currently I still don't know when it is invoked.
>>>> Could you enlighten me?
>>> As you can see it is called from hvm_send_assist_req(), hence it is called
>>> whenever an ioreq is sent to qemu-dm. Note that it is called *before*
>>> qemu-dm is notified -- hence it cannot race the wakeup from qemu, as we will
>>> not get woken until qemu-dm has done the work, and it cannot start the work
>>> until it is notified, and it is not notified until after
>>> prepare_wait_on_xen_event_channel has been executed.
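A toy two-thread model of that ordering, in case it helps (invented names,
not Xen source): because the vCPU marks itself as a waiter before qemu is
ever notified, the wakeup that follows qemu's work always happens after the
flag is set and therefore cannot be lost.

--------------------------------- prepare_then_notify.c (toy model) ---
/* Toy model of the ordering in hvm_send_assist_req(); build with
 * gcc -pthread.  The variable names are stand-ins, not Xen's. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int ioreq_posted;    /* "event channel" towards qemu-dm      */
static atomic_int blocked_in_xen;  /* stands in for _VPF_blocked_in_xen    */

static void *qemu_dm(void *arg)
{
    (void)arg;
    while (!atomic_load(&ioreq_posted))  /* qemu cannot start the work ...  */
        usleep(100);                     /* ... until it has been notified  */
    /* emulate the device access, then wake the vCPU */
    atomic_store(&blocked_in_xen, 0);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, qemu_dm, NULL);

    atomic_store(&blocked_in_xen, 1);    /* 1. prepare_wait...: mark waiter */
    atomic_store(&ioreq_posted, 1);      /* 2. only now notify qemu-dm      */
    while (atomic_load(&blocked_in_xen)) /* 3. block until qemu clears it   */
        usleep(100);

    puts("vCPU woken: the clear cannot precede step 1, so it cannot be missed");
    pthread_join(t, NULL);
    return 0;
}
-------------------------------------------------------------------------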
>>>
>>> -- Keir
>>>
>>>>> Date: Mon, 20 Sep 2010 08:45:21 +0100
>>>>> Subject: Re: VM hung after running sometime
>>>>> From: keir.fraser@xxxxxxxxxxxxx
>>>>> To: tinnycloud@xxxxxxxxxxx
>>>>> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; jbeulich@xxxxxxxxxx
>>>>>
>>>>> On 20/09/2010 07:00, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>>>>>
>>>>>> When IO is not ready, domain U, in VMEXIT->hvm_do_resume, might invoke
>>>>>> wait_on_xen_event_channel
>>>>>> (where it is blocked on _VPF_blocked_in_xen).
>>>>>>
>>>>>> Here is my assumption of how an event could be missed.
>>>>>>
>>>>>> Step 1: hvm_do_resume executes line 260, and suppose p->state is
>>>>>> STATE_IOREQ_READY or STATE_IOREQ_INPROCESS.
>>>>>> Step 2: cpu_handle_ioreq, at line 547, then executes line 548 so quickly
>>>>>> that it finishes before hvm_do_resume executes line 270.
>>>>>> Well, the event is missed.
>>>>>> In other words, _VPF_blocked_in_xen is cleared before it is actually set,
>>>>>> and domain U, which is blocked, might never get unblocked. Is this
>>>>>> possible?
>>>>> Firstly, that code is very paranoid and it should never actually be the
>>>>> case
>>>>> that we see STATE_IOREQ_READY or STATE_IOREQ_INPROCESS in hvm_do_resume().
>>>>> Secondly, even if you do, take a look at the implementation of
>>>>> wait_on_xen_event_channel() -- it is smart enough to avoid the race you
>>>>> mention.
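And for completeness, the other half of the defence: wait_on_xen_event_channel()
looks at the ioreq state again *after* setting the pause flag, so even in the
interleaving described above the vCPU would notice that qemu has already
finished and simply not block. A rough single-threaded paraphrase of that shape
(invented names and placeholder state values, not the real macro):

------------------------------ recheck_before_block.c (paraphrase) ---
/* Shape of the check-after-prepare idiom in wait_on_xen_event_channel().
 * Names and numeric state values are placeholders; the real definitions
 * live in Xen's ioreq interface headers. */
#include <stdbool.h>
#include <stdio.h>

#define STATE_IORESP_READY 1   /* placeholder value */

static int  ioreq_state = STATE_IORESP_READY;  /* qemu already finished */
static bool blocked_in_xen;

static void wait_for_ioreq(void)
{
    blocked_in_xen = true;                    /* 1. mark ourselves blocked  */
    if (ioreq_state == STATE_IORESP_READY) {  /* 2. re-check the condition  */
        blocked_in_xen = false;               /* 3. already done: undo the  */
        return;                               /*    flag and don't block    */
    }
    /* only here would the vCPU really be descheduled */
}

int main(void)
{
    wait_for_ioreq();
    printf("blocked_in_xen=%d -- the vCPU is not stranded\n", blocked_in_xen);
    return 0;
}
------------------------------------------------------------------------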
>>>>>
>>>>> -- Keir
>>>>>
>>>>>
>>>
>>
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel