[Xen-devel] RE: VM hung after running sometime

Hi Keir:

        I spent more time on how the event channel works, and now I know that an event is bound to
an irq by the call to request_irq. When an event is sent, the other side of the channel runs
asm_do_IRQ->generic_handle_irq->generic_handle_irq_desc->handle_level_irq
(it actually invokes desc->handle_irq, which for evtchn is handle_level_irq).
I noticed that in handle_level_irq the event's mask and pending bits are cleared.

I have one more analysis to discuss.

Attached is the evtchn dump taken when a VM hung on a physical server. Domain 10 is the hung one.
We can see domain 10's CPU info at the bottom of the log; VCPU0 has flags=4, which means
_VPF_blocked_in_xen.

(XEN) VCPU information and callbacks for domain 10:
(XEN)     VCPU0: CPU11 [has=F] flags=4 poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={4-15}
(XEN)     paging assistance: shadowed 2-on-3
(XEN)     No periodic timer
(XEN)     Notifying guest (virq 1, port 0, stat 0/-1/0)
(XEN)     VCPU1: CPU9 [has=T] flags=0 poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={9} cpu_affinity={4-15}
(XEN)     paging assistance: shadowed 2-on-3
(XEN)     No periodic timer
(XEN)     Notifying guest (virq 1, port 0, stat 0/-1/0)
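
To make the flags decoding above easier to check, here is a minimal sketch in C. Only bit 2 (_VPF_blocked_in_xen, i.e. flags=4) comes from the dump above; the other three bit names and positions are my assumption from xen/include/xen/sched.h of that era and should be verified against the tree in use. The helper names vpf_is_blocked_in_xen and vpf_describe are hypothetical, not Xen functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Bit 2 is taken from the post (flags=4 == 1 << 2). */
#define VPF_BLOCKED_IN_XEN_BIT 2

/* Names indexed by bit position. Bits 0, 1 and 3 are ASSUMED. */
static const char *const vpf_names[] = {
    "blocked",         /* bit 0 (assumed) */
    "down",            /* bit 1 (assumed) */
    "blocked_in_xen",  /* bit 2 (stated in the post) */
    "migrating",       /* bit 3 (assumed) */
};

/* Returns 1 if the pause flags have the blocked-in-Xen bit set. */
int vpf_is_blocked_in_xen(unsigned long flags)
{
    return (int)((flags >> VPF_BLOCKED_IN_XEN_BIT) & 1UL);
}

/* Writes the names of all set bits into buf, separated by '|'. */
char *vpf_describe(unsigned long flags, char *buf, size_t len)
{
    assert(buf != NULL);
    size_t used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';

    for (size_t bit = 0; bit < sizeof(vpf_names) / sizeof(vpf_names[0]); bit++) {
        if (!(flags & (1UL << bit)))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", vpf_names[bit]);
        if (n < 0 || (size_t)n >= len - used)
            break; /* output truncated; buf is still NUL-terminated */
        used += (size_t)n;
    }
    return buf;
}
```

With the value from the dump, vpf_describe(4, ...) yields "blocked_in_xen" for VCPU0, while VCPU1 (flags=0) yields an empty string.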

And its event channel info is:
(XEN) Domain 10 polling vCPUs: {No periodic timer}
(XEN) Event channel information for domain 10:
(XEN)     port [p/m]
(XEN)        1 [0/1]: s=3 n=0 d=0 p=105 x=1
(XEN)        2 [0/1]: s=3 n=1 d=0 p=106 x=1
(XEN)        3 [0/0]: s=3 n=0 d=0 p=104 x=0
(XEN)        4 [0/1]: s=2 n=0 d=0 x=0
(XEN)        5 [0/0]: s=6 n=0 x=0
(XEN)        6 [0/0]: s=2 n=0 d=0 x=0
(XEN)        7 [0/0]: s=3 n=0 d=0 p=107 x=0
(XEN)        8 [0/0]: s=3 n=0 d=0 p=108 x=0
(XEN)        9 [0/0]: s=3 n=0 d=0 p=109 x=0
(XEN)       10 [0/0]: s=3 n=0 d=0 p=110 x=0

In our situation, we are only interested in the event channels whose consumer_is_xen is 1,
shown here as "x=1"; those are ports 1 and 2. According to the log, the other side of each
channel is domain 0, ports 105 and 106.

Looking at domain 0's event channels on ports 105 and 106, I find that port 105 has pending set
to 1 (in [1/0], the first bit is pending, which is 1, and the second bit is mask, which is 0).

(XEN)      105 [1/0]: s=3 n=2 d=10 p=1 x=0
(XEN)      106 [0/0]: s=3 n=2 d=10 p=2 x=0
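
For going through a long dump, the [p/m] reading above can be sketched as a small C parser. This is only an illustration under assumptions: the struct, parse_interdomain_line and pending_and_unmasked are hypothetical names, the meaning of x (consumer_is_xen), d and p (remote domain and port) is taken from the analysis above, and reading s as the channel state and n as the notify vCPU is my assumption. It only accepts lines that carry both d= and p=, as the s=3 entries do.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One interdomain line of the event channel dump. */
struct evtchn_line {
    int port;
    int pending;       /* first value inside [p/m] */
    int masked;        /* second value inside [p/m] */
    int state;         /* s= (assumed: channel state) */
    int vcpu;          /* n= (assumed: notify vCPU) */
    int remote_dom;    /* d= */
    int remote_port;   /* p= */
    int xen_consumer;  /* x= (consumer_is_xen) */
};

/* Returns 1 if all eight fields were parsed, 0 otherwise
 * (for example, lines without a p= field). */
int parse_interdomain_line(const char *line, struct evtchn_line *out)
{
    assert(line != NULL && out != NULL);

    /* Skip the console prefix if present. */
    if (strncmp(line, "(XEN)", 5) == 0)
        line += 5;

    int n = sscanf(line, " %d [%d/%d]: s=%d n=%d d=%d p=%d x=%d",
                   &out->port, &out->pending, &out->masked, &out->state,
                   &out->vcpu, &out->remote_dom, &out->remote_port,
                   &out->xen_consumer);
    return n == 8;
}

/* True when an event is pending and delivery is not masked. */
int pending_and_unmasked(const struct evtchn_line *e)
{
    return e->pending && !e->masked;
}
```

Fed the port 105 line, this reports pending=1, masked=0, remote domain 10 and remote port 1, which is the entry that looks suspicious here.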

In all, we have a domain U vCPU blocked on _VPF_blocked_in_xen, and it must have set the pending bit.
Since pending is still 1, it looks like the irq was not triggered, am I right?
If it had been triggered, the pending bit should have been cleared (line 361).

------------------------------/linux-2.6-pvops.git/kernel/irq/chip.c---
354 void
355 handle_level_irq(unsigned int irq, struct irq_desc *desc)
356 {
357         struct irqaction *action;
358         irqreturn_t action_ret;
359
360         spin_lock(&desc->lock);
361         mask_ack_irq(desc, irq);
362
363         if (unlikely(desc->status & IRQ_INPROGRESS))
364                 goto out_unlock;
365         desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
366         kstat_incr_irqs_this_cpu(irq, desc);
367

BTW, qemu still works fine when the VM is hung. Below is its strace output.
There is not much difference from other, healthy qemu instances, other than every select timing out.
-------------------
select(14, [3 7 11 12 13], [], [], {0, 10000}) = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {673470, 59535265}) = 0
clock_gettime(CLOCK_MONOTONIC, {673470, 59629728}) = 0
clock_gettime(CLOCK_MONOTONIC, {673470, 59717700}) = 0
clock_gettime(CLOCK_MONOTONIC, {673470, 59806552}) = 0
select(14, [3 7 11 12 13], [], [], {0, 10000}) = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {673470, 70234406}) = 0
clock_gettime(CLOCK_MONOTONIC, {673470, 70332116}) = 0
clock_gettime(CLOCK_MONOTONIC, {673470, 70419835}) = 0

> Date: Mon, 20 Sep 2010 10:35:46 +0100
> Subject: Re: VM hung after running sometime
> From: keir.fraser@xxxxxxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx
> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; jbeulich@xxxxxxxxxx
>
> On 20/09/2010 10:15, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>
> > Thanks Keir.
> >
> > You're right, after I deeply looked into the wait_on_xen_event_channel, it is
> > smart enough
> > to avoid the race I assumed.
> >
> > How about prepare_wait_on_xen_event_channel?
> > Currently I still don't know when it will be invoked.
> > Could you enlighten me?
>
> As you can see it is called from hvm_send_assist_req(), hence it is called
> whenever an ioreq is sent to qemu-dm. Note that it is called *before*
> qemu-dm is notified -- hence it cannot race the wakeup from qemu, as we will
> not get woken until qemu-dm has done the work, and it cannot start the work
> until it is notified, and it is not notified until after
> prepare_wait_on_xen_event_channel has been executed.
>
> -- Keir
>
> >
> >> Date: Mon, 20 Sep 2010 08:45:21 +0100
> >> Subject: Re: VM hung after running sometime
> >> From: keir.fraser@xxxxxxxxxxxxx
> >> To: tinnycloud@xxxxxxxxxxx
> >> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; jbeulich@xxxxxxxxxx
> >>
> >> On 20/09/2010 07:00, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> >>
> >>> When IO is not ready, domain U in VMEXIT->hvm_do_resume might invoke
> >>> wait_on_xen_event_channel
> >>> (where it is blocked in _VPF_blocked_in_xen).
> >>>
> >>> Here is my assumption of how the event is missed.
> >>>
> >>> step 1: hvm_do_resume executes line 260, and suppose p->state is STATE_IOREQ_READY
> >>> or STATE_IOREQ_INPROCESS
> >>> step 2: then cpu_handle_ioreq, which is at line 547, executes line 548 so
> >>> quickly that it finishes before hvm_do_resume executes line 270.
> >>> Well, the event is missed.
> >>> In other words, _VPF_blocked_in_xen is cleared before it is actually
> >>> set, and the Domain U that is blocked
> >>> might never get unblocked. Is this possible?
> >>
> >> Firstly, that code is very paranoid and it should never actually be the case
> >> that we see STATE_IOREQ_READY or STATE_IOREQ_INPROCESS in hvm_do_resume().
> >> Secondly, even if you do, take a look at the implementation of
> >> wait_on_xen_event_channel() -- it is smart enough to avoid the race you
> >> mention.
> >>
> >> -- Keir
> >>
> >>
> >
>
>

hang.txt
Description: Text document

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
