On Fri, 30 Sep 2011, Stefan Bader wrote:
> On 22.09.2011 19:44, Stefano Stabellini wrote:
> > On Thu, 22 Sep 2011, Stefan Bader wrote:
> >> On 22.09.2011 13:58, Stefan Bader wrote:
> >>> On 22.09.2011 12:30, Stefano Stabellini wrote:
> >>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
> >>>>> On 21.09.2011 15:31, Stefano Stabellini wrote:
> >>>>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
> >>>>>>> This is on a 3.0.4-based dom0 and domU with a 4.1.1 hypervisor. I
> >>>>>>> tried using the default 8139cp and ne2k_pci emulated NICs. The 8139cp
> >>>>>>> at least comes up and gets configured via DHCP, and initial pings are
> >>>>>>> also routed and handled correctly. But slightly higher traffic (like
> >>>>>>> checking for updates) hangs, and after a while there are messages
> >>>>>>> about TX timeouts.
> >>>>>>> The ne2k_pci NIC has those issues almost immediately and never comes
> >>>>>>> up correctly.
> >>>>>>>
> >>>>>>> I am attaching the dmesg of the guest with apic=debug enabled. I am
> >>>>>>> not sure how this should be, but both NICs get configured with
> >>>>>>> level,low IRQs. Disk emulation seems to be OK, but it seems to use
> >>>>>>> IO-APIC-edge, and the other IRQs at least do not seem to be level.
> >>>>>>
> >>>>>
> >>>>>> Does the e1000 emulated card work correctly?
> >>>>>
> >>>>> Yes, that one seems to work ok.
> >>>>>
> >>>>>> What happens if you disable interrupt remapping (see patch below)?
> >>>>>
> >>>>> 8139cp seems to work correctly now (much higher IRQ stats as well) and
> >>>>> e1000 still works. Both then use IO-APIC-fasteoi.
> >>>>>
> >>>>
> >>>> That means there must be another subtle bug in Xen's interrupt
> >>>> remapping that only affects 8139cp emulation.
> >>>>
> >>> Right, or to be complete:
> >>> - e1000: ok
> >>> - 8139cp: unstable (setup is possible)
> >>> - ne2k_pci: not working (tx problems from the beginning)
> >>>
> >>> The behaviour feels a bit like interrupts may get lost if they occur at
> >>> a higher rate. Why this affects various drivers differently is a bit
> >>> weird.
> >>>>
> >>
> >> This is mainly speculation... Quite a while back there was this patch to
> >> events:
> >>
> >> commit dffe2e1e1a1ddb566a76266136c312801c66dcf7
> >> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> >> Date: Fri Aug 20 19:10:01 2010 -0700
> >>
> >> xen: handle events as edge-triggered
> >>
> >> The commit message stated that Xen events are logically edge-triggered,
> >> so PV events were changed to be handled as edge interrupts. Would that
> >> not mean that, since xen-pirq-apic is using events, the same applies and
> >> those should be apic-edge instead of level?
> >
> > That commit is referring to the internal way Linux handles these events,
> > which look like normal interrupts to the Linux irq subsystem. It is not
> > related to the way actual events are delivered from Xen to Linux, so it
> > shouldn't matter here.
> >
> > I would add lots of printks in:
> >
> > xen/arch/x86/hvm/irq.c:__hvm_pci_intx_assert
> > xen/arch/x86/hvm/irq.c:assert_irq
> > xen/arch/x86/hvm/irq.c:assert_gsi
> >
> > to find out why xen is not injecting those interrupts
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > http://lists.xensource.com/xen-devel
>
> It took quite a bit of time, but at least I now have some hopefully useful
> information. In general, whenever an interrupt is asserted, the hypervisor
> runs through this:
>
> __hvm_pci_intx_assert:
>   when the assert count was 0 before incrementing:
>     call assert_gsi
>       call send_guest_pirq (when the HVM guest uses pirqs)
>
> In the send_guest_pirq chain there is a call to evtchn_set_pending which, as
> one of its first actions, tests whether evtchn_pending in the shared_info is
> set. If it is, the call immediately returns 1.
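To make that path concrete, here is a minimal C model of it, offered only as an illustration. All names (`model_domain`, `model_*`) are hypothetical simplifications, not Xen's real types or functions: a single counter stands in for `gsi_assert_count[gsi]`, a single flag for the `evtchn_pending` bit in shared_info, and `assert_gsi`/`send_guest_pirq`/`evtchn_set_pending` are collapsed into one call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of one emulated level-triggered
 * GSI that an HVM guest receives as a pirq over an event channel.
 * These names are illustrative only; they are not Xen's types. */
struct model_domain {
    unsigned int gsi_assert_count; /* plays gsi_assert_count[gsi] */
    bool evtchn_pending;           /* plays the evtchn_pending bit */
    unsigned int upcalls;          /* notifications actually raised */
};

/* evtchn_set_pending: bail out with 1 if the bit is already set. */
static int model_evtchn_set_pending(struct model_domain *d)
{
    if (d->evtchn_pending)
        return 1;
    d->evtchn_pending = true;
    d->upcalls++;
    return 0;
}

/* assert_gsi -> send_guest_pirq, collapsed into a single call. */
static int model_assert_gsi(struct model_domain *d)
{
    return model_evtchn_set_pending(d);
}

/* __hvm_pci_intx_assert: only the 0 -> 1 transition of the assert count
 * reaches assert_gsi. Returns the send result, or -1 if nothing was sent. */
static int model_intx_assert(struct model_domain *d)
{
    if (d->gsi_assert_count++ == 0)
        return model_assert_gsi(d);
    return -1;
}

/* __hvm_pci_intx_deassert: drop the count; no event is involved. */
static void model_intx_deassert(struct model_domain *d)
{
    assert(d->gsi_assert_count > 0);
    d->gsi_assert_count--;
}
```

In this model, an assert / deassert / assert sequence that happens before the guest has cleared the pending bit returns 1 on the second assert and raises no new upcall, which matches the return value observed in the printks.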
>
> Adding printks to call_assert_gsi, I noticed that
> - When things stop working, the last call to send_guest_pirq returned 1.
> - But a return code of 1 does not always lead to a stall.
> - e1000 also has cases where send_guest_pirq returns 1, but they happen much
>   less often than with the 8139cp.
>
> Usually every intx_assert is followed by an intx_deassert call. When the
> stall occurs, this does not happen. Here I have some trouble understanding
> where this intx_deassert is actually triggered. With an added WARN_ON, the
> stack traces look odd, like this:
>
> (XEN) [<ffff82c4801abd9c>] __hvm_pci_intx_deassert+0x6c/0x130
> (XEN) [<ffff82c4801ac43e>] hvm_pci_intx_deassert+0x3e/0x60
> (XEN) [<ffff82c4801a8148>] do_hvm_op+0x3b8/0x1e60
> (XEN) [<ffff82c480168ea1>] do_update_descriptor+0x171/0x220
> (XEN) [<ffff82c48017dba6>] copy_from_user+0x26/0x90
> (XEN) [<ffff82c4801f9446>] do_iret+0xb6/0x1a0
> (XEN) [<ffff82c4801f4f28>] syscall_enter+0x88/0x8d
>
> I am not really sure how one gets from do_update_descriptor to do_hvm_op,
> and the only thing in there that does the deassert is some IRQ level
> setting.
>
> Actually, the guest does not really do much to EOI (I had been assuming it
> did). Since domain_pirq_to_irq maps to 0 for emuirqs, the call to
> PHYSDEVOP_irq_status_query hits the following and does not set the flag for
> needing EOI:
>
> irq_status_query.flags = 0;
> if ( is_hvm_domain(v->domain) &&
>      domain_pirq_to_irq(v->domain, irq) <= 0 )
> {
>     ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
>     break;
> }
>
> So all the guest does is clear evtchn_pending in the pirq EOI function. I
> do not understand what is actually making the hvm_pci_intx_deassert calls,
> but given the way the fasteoi code in the guest seems to work, there appears
> to be some gap between calling the handler and the EOI function... So from
> what I see, I would assume the following:
>
> dom0                                   domU
> - intx_assert (count 0->1)
> - send_guest_pirq = 0
>   (evtchn_pending = 1)
>                                        - upcall starts fasteoi handler
> - something does intx_deassert
>   (count 1->0)
> - intx_assert (count 0->1)
> - send_guest_pirq = 1
>   (evtchn_pending still set)
>                                        - handler->eoi sets evtchn to 0 but
>                                          otherwise does nothing
> - there is no intx_deassert, so even
>   when another intx_assert would happen
>   (which does not seem to be the case)
>   no further send_guest_pirq would be
>   called.
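The interleaving above can be replayed with a small, self-contained C model to check that it really ends with a lost interrupt. This is a sketch under the thread's description only; `race_state` and the `race_*` functions are hypothetical, and the guest EOI is modelled as "clear the pending bit, no hypercall", as described for emuirqs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the interleaving above; not Xen code. */
struct race_state {
    unsigned int gsi_assert_count;
    bool evtchn_pending;
    unsigned int upcalls;
};

/* send_guest_pirq: returns 1 and raises nothing if already pending. */
static int race_send_guest_pirq(struct race_state *s)
{
    if (s->evtchn_pending)
        return 1;
    s->evtchn_pending = true;
    s->upcalls++;
    return 0;
}

/* intx_assert: only the 0 -> 1 transition sends; -1 means nothing sent. */
static int race_intx_assert(struct race_state *s)
{
    if (s->gsi_assert_count++ == 0)
        return race_send_guest_pirq(s);
    return -1;
}

static void race_intx_deassert(struct race_state *s)
{
    assert(s->gsi_assert_count > 0);
    s->gsi_assert_count--;
}

/* Guest pirq EOI without the hypercall: only the pending bit is cleared. */
static void race_guest_eoi(struct race_state *s)
{
    s->evtchn_pending = false;
}

/* Replays the timeline and returns the final state. */
static struct race_state race_replay(void)
{
    struct race_state s = {0, false, 0};
    race_intx_assert(&s);    /* count 0->1, send_guest_pirq = 0 */
    race_intx_deassert(&s);  /* count 1->0 */
    race_intx_assert(&s);    /* count 0->1, send_guest_pirq = 1 */
    race_guest_eoi(&s);      /* evtchn_pending = 0, nothing re-sent */
    return s;
}
```

In the final state the line is still asserted (count 1) while no event is pending and only one upcall was ever raised, so nothing in this model will notify the guest again: the same stall described above.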
>
> Unfortunately I am missing some details on the inner workings here. In
> general, I wonder whether not setting the needsEOI flag for those pirqs is
> simply the problem. But it could also be intentional...
Thanks for the very detailed analysis.
It seems to me that the problem is this: for a level-triggered interrupt,
when the guest issues an EOI we should reinject the interrupt if it has
been asserted again in the meantime. However, this doesn't happen if the
interrupt has been remapped onto an event channel; in that case the guest
does not even issue an EOI.
So I wrote a patch to force the guest to issue EOIs even on remapped
irqs; in the EOI hypercall handler we check whether the interrupt needs
to be reinjected and, if so, set the corresponding event channel
pending.
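A minimal model of the idea behind the patch, offered as an illustration rather than the patch itself: once the guest issues the EOI hypercall, the handler re-sends the event if the GSI is a level (non-ISA) line whose assert count is still non-zero. The `fix_*` names are hypothetical, and `MODEL_NR_ISAIRQS` is assumed to mirror `NR_ISAIRQS` (16 ISA IRQs treated as edge-triggered, per the comment in the patch).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror NR_ISAIRQS from the patch: GSIs 0-15 are ISA IRQs,
 * which the patch treats as edge triggered. */
#define MODEL_NR_ISAIRQS 16

/* Hypothetical model state for one GSI delivered as a pirq. */
struct fix_state {
    unsigned int gsi;
    unsigned int gsi_assert_count;
    bool evtchn_pending;
    unsigned int upcalls;
};

static void fix_send_guest_pirq(struct fix_state *s)
{
    if (s->evtchn_pending)
        return;                    /* already pending: nothing new raised */
    s->evtchn_pending = true;
    s->upcalls++;
}

static void fix_intx_assert(struct fix_state *s)
{
    if (s->gsi_assert_count++ == 0)
        fix_send_guest_pirq(s);
}

static void fix_intx_deassert(struct fix_state *s)
{
    assert(s->gsi_assert_count > 0);
    s->gsi_assert_count--;
}

/* Guest EOI once the guest is told it needs one: the pending bit is
 * cleared, then the EOI hypercall re-sends the event if the line is a
 * level (non-ISA) GSI that is still asserted, as in the patch. */
static void fix_guest_eoi(struct fix_state *s)
{
    s->evtchn_pending = false;
    if (s->gsi >= MODEL_NR_ISAIRQS && s->gsi_assert_count > 0)
        fix_send_guest_pirq(s);
}

/* Replays the same interleaving as the timeline in the analysis. */
static struct fix_state fix_replay(unsigned int gsi)
{
    struct fix_state s = {gsi, 0, false, 0};
    fix_intx_assert(&s);    /* count 0->1, first upcall */
    fix_intx_deassert(&s);  /* count 1->0 while the handler runs */
    fix_intx_assert(&s);    /* count 0->1, event already pending */
    fix_guest_eoi(&s);      /* reinjects only for level GSIs */
    return s;
}
```

With a level GSI (for example 20) the replay ends with the event pending again and a second upcall raised; with an ISA GSI (for example 5) the model behaves as before, since those are treated as edge-triggered.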
Could you please try the patch I appended? I haven't been able to reproduce
your problem, so I am not really sure whether it works.
diff -r e042fb60e0ee xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c Thu Sep 29 11:23:01 2011 +0000
+++ b/xen/arch/x86/physdev.c Fri Sep 30 14:01:46 2011 +0000
@@ -11,6 +11,7 @@
#include <asm/current.h>
#include <asm/io_apic.h>
#include <asm/msi.h>
+#include <asm/hvm/irq.h>
#include <asm/hypercall.h>
#include <public/xen.h>
#include <public/physdev.h>
@@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
if ( !is_hvm_domain(v->domain) ||
domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
pirq_guest_eoi(pirq);
+ if ( is_hvm_domain(v->domain) &&
+ domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
+ {
+ struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq;
+ int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);
+
+ /* if this is a level irq and count > 0, send another
+ * notification */
+ if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */
+ && hvm_irq->gsi_assert_count[gsi] )
+ send_guest_pirq(v->domain, pirq);
+ }
spin_unlock(&v->domain->event_lock);
ret = 0;
break;
@@ -327,12 +340,6 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
if ( (irq < 0) || (irq >= v->domain->nr_pirqs) )
break;
irq_status_query.flags = 0;
- if ( is_hvm_domain(v->domain) &&
- domain_pirq_to_irq(v->domain, irq) <= 0 )
- {
- ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
- break;
- }
/*
* Even edge-triggered or message-based IRQs can need masking from