RE: [Xen-devel] Re: VM hung after running sometime

Thanks for the details.

Currently the guest VM hangs in our heavy IO stress test. (In detail, we have created more than 12 HVM guests on our 16-core
physical server, and inside each HVM guest, iometer and ab run periodically as the heavy IO load.) The guest hang shows up after
1 or 2 days. So the IO is very heavy, and so are the interrupts, I think.

According to the hang log, the domain is blocked in _VPF_blocked_in_xen, which is indicated by "x=1" in the log below, and that is
on ports 1 and 2. All our HVM guests have PV drivers installed. One thing I am not clear about right now is which IO events go
through these two ports. Do they only include "mouse, vga" events, or do they also include hard disk events? (If they include hard
disk events, the interrupt load would be very heavy, right? And right now we have 4 physical CPUs allocated to domain 0; is that
appropriate?)

Anyway, I think I can disable irqbalance for a quick test.
Meanwhile, I will spend some time on the patch merge.
Many thanks.

And the event channel info for the domain is:
(XEN) Domain 10 polling vCPUs: {No periodic timer}
(XEN) Event channel information for domain 10:
(XEN)     port [p/m]
(XEN)        1 [0/1]: s=3 n=0 d=0 p=105 x=1
(XEN)        2 [0/1]: s=3 n=1 d=0 p=106 x=1
(XEN)        3 [0/0]: s=3 n=0 d=0 p=104 x=0
(XEN)        4 [0/1]: s=2 n=0 d=0 x=0
(XEN)        5 [0/0]: s=6 n=0 x=0
(XEN)        6 [0/0]: s=2 n=0 d=0 x=0
(XEN)        7 [0/0]: s=3 n=0 d=0 p=107 x=0
(XEN)        8 [0/0]: s=3 n=0 d=0 p=108 x=0
(XEN)        9 [0/0]: s=3 n=0 d=0 p=109 x=0
(XEN)       10 [0/0]: s=3 n=0 d=0 p=110 x=0
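
To make it easier to compare this dump with the domain 0 dump quoted further down, here is a minimal Python sketch that parses the "(XEN)" event channel lines and pairs each Xen-consumed port of domain 10 with its remote end in domain 0. The field meanings used here (s = channel state, with 3 taken to mean interdomain; d = remote domain; p = remote port; x = channel consumed by Xen) are an assumption about the dump format and are not confirmed in this thread. The helper names parse_evtchn_dump and stuck_ports are hypothetical, and the sketch also assumes that both dumps describe the same hang.

```python
import re

# One event channel line, e.g. "(XEN)  1 [0/1]: s=3 n=0 d=0 p=105 x=1".
# Header lines ("port [p/m]", "Event channel information ...") do not match.
LINE_RE = re.compile(r"\(XEN\)\s+(\d+)\s+\[(\d)/(\d)\]:\s+(.*)")


def parse_evtchn_dump(text):
    """Return {port: fields} for every event channel line in a dump."""
    channels = {}
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        port, pending, masked, rest = m.groups()
        fields = {"pending": int(pending), "masked": int(masked)}
        # Remaining "key=value" pairs: s, n, d, p, x (meanings are assumed).
        for key, value in re.findall(r"(\w+)=(\d+)", rest):
            fields[key] = int(value)
        channels[int(port)] = fields
    return channels


def stuck_ports(guest, dom0, guest_id):
    """List (guest_port, dom0_port) pairs where the guest port is consumed
    by Xen (x=1) and the dom0 end is pending but not masked, i.e. an event
    that dom0 apparently has not handled yet."""
    stuck = []
    for port, ch in sorted(guest.items()):
        if ch.get("x") != 1 or ch.get("s") != 3:
            continue
        remote = dom0.get(ch.get("p"))
        if remote is None:
            continue
        # Only trust the pairing if dom0's entry points back at this port.
        if remote.get("d") != guest_id or remote.get("p") != port:
            continue
        if remote["pending"] and not remote["masked"]:
            stuck.append((port, ch["p"]))
    return stuck


# Sample lines copied from the dumps in this thread.
GUEST_DUMP = """\
(XEN) Event channel information for domain 10:
(XEN)     port [p/m]
(XEN)        1 [0/1]: s=3 n=0 d=0 p=105 x=1
(XEN)        2 [0/1]: s=3 n=1 d=0 p=106 x=1
(XEN)        3 [0/0]: s=3 n=0 d=0 p=104 x=0
(XEN)        4 [0/1]: s=2 n=0 d=0 x=0
(XEN)        5 [0/0]: s=6 n=0 x=0
"""

DOM0_DUMP = """\
(XEN) 105 [1/0]: s=3 n=2 d=10 p=1 x=0
(XEN) 106 [0/0]: s=3 n=2 d=10 p=2 x=0
"""

guest = parse_evtchn_dump(GUEST_DUMP)
dom0 = parse_evtchn_dump(DOM0_DUMP)
print(stuck_ports(guest, dom0, guest_id=10))
```

With the lines above, the only pair reported is guest port 1 with domain 0 port 105, which is the port whose pending bit is still set in the domain 0 dump.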

> Date: Tue, 21 Sep 2010 17:17:12 -0700
> From: jeremy@xxxxxxxx
> To: tinnycloud@xxxxxxxxxxx
> CC: keir.fraser@xxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Re: VM hung after running sometime
>
> On 09/21/2010 05:02 PM, MaoXiaoyun wrote:
> > Thanks Jeremy.
> >
> > Regarding the fix you mentioned, did you mean the patch I found and
> > pasted below? If so, is this all I need?
>
> No, you need more than that. There are quite a few changes from multiple
> branches, so I'd recommend just using a current kernel.
>
> > As for disabling irqbalance, I am afraid it might have a negative
> > performance impact, right?
>
> I doubt it. Unless you have so many interrupts that they can't all be
> handled on one cpu, it shouldn't make much difference. After all, the
> interrupts have to be handled *somewhere*, but it doesn't matter much
> where - who cares if cpu0 is mostly handling interrupts if it leaves the
> other cpus free for other work?
>
> irqbalanced is primarily concerned with migrating interrupts according
> to the CPU topology to save power and (maybe) handle interrupts closer
> to the interrupting device. But that's meaningless in a domain where the
> vcpus can be mapped to different pcpus from moment to moment.
>
> J
>
>
> >
> > -------------------------------------------------------
> > diff --git a/drivers/xen/events.c b/drivers/xen/events.c
> > index 32f4a2c..06fc991 100644
> > --- a/drivers/xen/events.c
> > +++ b/drivers/xen/events.c
> > @@ -368,7 +368,7 @@ int bind_evtchn_to_irq(unsigned int evtchn)
> > irq = find_unbound_irq();
> > set_irq_chip_and_handler_name(irq, &xen_dynamic_chip,
> > - handle_level_irq, "event");
> > + handle_edge_irq, "event");
> > evtchn_to_irq[evtchn] = irq;
> > irq_info[irq] = mk_evtchn_info(evtchn);
> >
> > > Date: Tue, 21 Sep 2010 10:28:34 -0700
> > > From: jeremy@xxxxxxxx
> > > To: keir.fraser@xxxxxxxxxxxxx
> > > CC: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> > > Subject: Re: [Xen-devel] Re: VM hung after running sometime
> > >
> > > On 09/21/2010 12:53 AM, Keir Fraser wrote:
> > > > On 21/09/2010 06:02, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> > > >
> > > >> Looking at the domain 0 event channels for ports 105 and 106, I find that
> > > >> on port 105 the pending bit is 1 (in [1/0], the first bit is pending, which
> > > >> is 1, and the second bit is mask, which is 0).
> > > >>
> > > >> (XEN) 105 [1/0]: s=3 n=2 d=10 p=1 x=0
> > > >> (XEN) 106 [0/0]: s=3 n=2 d=10 p=2 x=0
> > > >>
> > > >> In all, we have the domain U cpu blocked on _VPF_blocked_in_xen, and it must
> > > >> have set the pending bit.
> > > >> Given that pending is 1, it looks like the irq is not triggered, am I right?
> > > >> If it were triggered, it should have cleared the pending bit (line 361).
> > > > Yes it looks like dom0 is not handling the event for some reason. Qemu looks
> > > > like it still works and is waiting for a notification via select(). But that
> > > > won't happen until dom0 kernel handles the event as an IRQ and calls the
> > > > relevant irq handler (drivers/xen/evtchn.c:evtchn_interrupt()).
> > > >
> > > > I think you're on the right track in your debugging. I don't know much about
> > > > the pv_ops irq handling path, except to say that this aspect is different
> > > > than non-pv_ops kernels which special-case handling of events bound to
> > > > user-space rather more. So at the moment my best guess would be that the bug
> > > > is in the pv_ops kernel irq handling for this type of user-space-bound
> > > > event.
> > >
> > > We no longer use handle_level_irq because there's a race which loses
> > > events when interrupt migration is enabled. Current xen/stable-2.6.32.x
> > > has a proper fix for this, but the quick workaround is to disable
> > > irqbalanced.
> > >
> > > J
> > >
> > > > -- Keir
> > > >
> > > >> ------------------------------/linux-2.6-pvops.git/kernel/irq/chip.c---
> > > >> 354 void
> > > >> 355 handle_level_irq(unsigned int irq, struct irq_desc *desc)
> > > >> 356 {
> > > >> 357 struct irqaction *action;
> > > >> 358 irqreturn_t action_ret;
> > > >> 359
> > > >> 360 spin_lock(&desc->lock);
> > > >> 361 mask_ack_irq(desc, irq);
> > > >> 362
> > > >> 363 if (unlikely(desc->status & IRQ_INPROGRESS))
> > > >> 364 goto out_unlock;
> > > >> 365 desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
> > > >> 366 kstat_incr_irqs_this_cpu(irq, desc);
> > > >> 367
> > > >>
> > > >> BTW, qemu still works fine when the VM is hung. Below is its strace output.
> > > >> There is not much difference from other working qemu instances, other than
> > > >> every select() timing out.
> > > >> -------------------
> > > >> select(14, [3 7 11 12 13], [], [], {0, 10000}) = 0 (Timeout)
> > > >> clock_gettime(CLOCK_MONOTONIC, {673470, 59535265}) = 0
> > > >> clock_gettime(CLOCK_MONOTONIC, {673470, 59629728}) = 0
> > > >> clock_gettime(CLOCK_MONOTONIC, {673470, 59717700}) = 0
> > > >> clock_gettime(CLOCK_MONOTONIC, {673470, 59806552}) = 0
> > > >> select(14, [3 7 11 12 13], [], [], {0, 10000}) = 0 (Timeout)
> > > >> clock_gettime(CLOCK_MONOTONIC, {673470, 70234406}) = 0
> > > >> clock_gettime(CLOCK_MONOTONIC, {673470, 70332116}) = 0
> > > >> clock_gettime(CLOCK_MONOTONIC, {673470, 70419835}) = 0
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>> Date: Mon, 20 Sep 2010 10:35:46 +0100
> > > >>> Subject: Re: VM hung after running sometime
> > > >>> From: keir.fraser@xxxxxxxxxxxxx
> > > >>> To: tinnycloud@xxxxxxxxxxx
> > > >>> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; jbeulich@xxxxxxxxxx
> > > >>>
> > > >>> On 20/09/2010 10:15, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> > > >>>
> > > >>>> Thanks Keir.
> > > >>>>
> > > >>>> You're right. After looking deeply into wait_on_xen_event_channel, I see
> > > >>>> it is smart enough to avoid the race I assumed.
> > > >>>>
> > > >>>> How about prepare_wait_on_xen_event_channel?
> > > >>>> Currently I still don't know when it will be invoked.
> > > >>>> Could you enlighten me?
> > > >>> As you can see it is called from hvm_send_assist_req(), hence it is called
> > > >>> whenever an ioreq is sent to qemu-dm. Note that it is called *before*
> > > >>> qemu-dm is notified -- hence it cannot race the wakeup from qemu, as we will
> > > >>> not get woken until qemu-dm has done the work, and it cannot start the work
> > > >>> until it is notified, and it is not notified until after
> > > >>> prepare_wait_on_xen_event_channel has been executed.
> > > >>>
> > > >>> -- Keir
> > > >>>
> > > >>>>> Date: Mon, 20 Sep 2010 08:45:21 +0100
> > > >>>>> Subject: Re: VM hung after running sometime
> > > >>>>> From: keir.fraser@xxxxxxxxxxxxx
> > > >>>>> To: tinnycloud@xxxxxxxxxxx
> > > >>>>> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; jbeulich@xxxxxxxxxx
> > > >>>>>
> > > >>>>> On 20/09/2010 07:00, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> > > >>>>>
> > > >>>>>> When IO is not ready, domain U in VMEXIT->hvm_do_resume might invoke
> > > >>>>>> wait_on_xen_event_channel
> > > >>>>>> (where it is blocked in _VPF_blocked_in_xen).
> > > >>>>>>
> > > >>>>>> Here is how I assume the event is missed.
> > > >>>>>>
> > > >>>>>> step 1: hvm_do_resume executes line 260, and suppose p->state is
> > > >>>>>> STATE_IOREQ_READY or STATE_IOREQ_INPROCESS.
> > > >>>>>> step 2: cpu_handle_ioreq is then at line 547, and it executes line 548
> > > >>>>>> so quickly that it finishes before hvm_do_resume executes line 270.
> > > >>>>>> Well, the event is missed.
> > > >>>>>> In other words, _VPF_blocked_in_xen is cleared before it is actually
> > > >>>>>> set, and the blocked domain U might never get unblocked. Is this
> > > >>>>>> possible?
> > > >>>>> Firstly, that code is very paranoid and it should never actually be the
> > > >>>>> case that we see STATE_IOREQ_READY or STATE_IOREQ_INPROCESS in
> > > >>>>> hvm_do_resume().
> > > >>>>> Secondly, even if you do, take a look at the implementation of
> > > >>>>> wait_on_xen_event_channel() -- it is smart enough to avoid the race you
> > > >>>>> mention.
> > > >>>>>
> > > >>>>> -- Keir
> > > >>>>>
> > > >>>>>
> > > >>>
> > > >>
> > > >
> > > >
> > > > _______________________________________________
> > > > Xen-devel mailing list
> > > > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > > > http://lists.xensource.com/xen-devel
> > > >
> > >
>
