I was finally able to track this down. Basically, on the source machine, if an
event for the guest arrives at the right moment during live migration, the line
is asserted via the pci_intx.i bit in:
__hvm_pci_intx_assert():
    if ( __test_and_set_bit(device*4 + intx, &hvm_irq->pci_intx.i) ) <-----
        return;
When the guest is moved to the target, this bit gets carried over, and the GSI
is asserted again:
irq_load_pci():
    if ( test_bit(dev*4 + intx, &hvm_irq->pci_intx.i) )
    {
        /* Direct GSI assert */
        gsi = hvm_pci_intx_gsi(dev, intx);
        hvm_irq->gsi_assert_count[gsi]++; <---
        /* PCI-ISA bridge assert */
        link = hvm_pci_intx_link(dev, intx);
        hvm_irq->pci_link_assert_count[link]++;
    }
As soon as the guest gets a xen_platform_pci event, the nonzero assert count
causes the interrupt to be delivered in a loop, hence the guest hang.
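To make the sequence concrete, here is a toy model of it in plain C. This is not Xen code: struct model_irq, the model_* functions and the GSI mapping are all made up for illustration, and the "redelivery on EOI while the count is nonzero" rule is a simplification of the level-triggered behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_NR_DEVS 32   /* hypothetical sizes for this model only */
#define MODEL_NR_GSI  48

/* Toy stand-in for the interrupt state discussed above; not the Xen struct. */
struct model_irq {
    bool intx_asserted[MODEL_NR_DEVS * 4];        /* plays the role of pci_intx.i */
    unsigned int gsi_assert_count[MODEL_NR_GSI];
};

/* Placeholder (dev, intx) -> GSI mapping; the real mapping may differ. */
static unsigned int model_intx_gsi(unsigned int dev, unsigned int intx)
{
    return 16 + ((dev * 4 + intx) % 32);
}

/* Assert path: a second assert of an already-set bit is a no-op. */
static void model_intx_assert(struct model_irq *irq,
                              unsigned int dev, unsigned int intx)
{
    assert(dev < MODEL_NR_DEVS && intx < 4);
    if (irq->intx_asserted[dev * 4 + intx])
        return;
    irq->intx_asserted[dev * 4 + intx] = true;
    irq->gsi_assert_count[model_intx_gsi(dev, intx)]++;
}

/* Restore path: copy the saved bits, then rebuild the counts from them. */
static void model_restore(struct model_irq *dst, const struct model_irq *src)
{
    unsigned int dev, intx;

    memset(dst, 0, sizeof(*dst));
    memcpy(dst->intx_asserted, src->intx_asserted, sizeof(dst->intx_asserted));
    for (dev = 0; dev < MODEL_NR_DEVS; dev++)
        for (intx = 0; intx < 4; intx++)
            if (dst->intx_asserted[dev * 4 + intx])
                dst->gsi_assert_count[model_intx_gsi(dev, intx)]++;
}

/*
 * Simplified level-triggered rule: every EOI redelivers the interrupt
 * for as long as the assert count stays nonzero.
 */
static unsigned int model_count_redeliveries(const struct model_irq *irq,
                                             unsigned int gsi,
                                             unsigned int eois)
{
    unsigned int i, delivered = 0;

    for (i = 0; i < eois; i++)
        if (irq->gsi_assert_count[gsi] > 0)
            delivered++;
    return delivered;
}
```

In this model, a bit set on the source survives model_restore(), the count on the target comes back nonzero, and with nothing to deassert the line every EOI results in another delivery.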
My simple fix is to just check whether the GSI is masked in the vioapic:
vioapic_masked():
    .....
+    gsi = hvm_pci_intx_gsi(device, intx);
+    if (vioapic_masked(d, gsi))
+        return;
+
vioapic.c:
+int vioapic_masked(struct domain *d, unsigned int irq)
+{
+    struct hvm_hw_vioapic *vioapic = domain_vioapic(d);
+    union vioapic_redir_entry *ent;
+
+    ent = &vioapic->redirtbl[irq];
+    if ( ent->fields.mask )
+        return 1;
+
+    return 0;
+}
+
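The effect of the mask check can be sketched in isolation like this. Again this is a standalone model and not the patch itself: struct sketch_line_state and the sketch_* functions are hypothetical names, and the bounds assertion is an addition of the sketch that the posted vioapic_masked() does not have.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_PINS 48   /* hypothetical pin count for this model */

/* Toy per-pin state: a mask flag plus an assert count. */
struct sketch_line_state {
    bool masked[SKETCH_NR_PINS];
    unsigned int assert_count[SKETCH_NR_PINS];
};

/* Counterpart of the mask test: true if the pin is currently masked. */
static bool sketch_pin_masked(const struct sketch_line_state *s,
                              unsigned int pin)
{
    assert(pin < SKETCH_NR_PINS);   /* bounds check added by this sketch */
    return s->masked[pin];
}

/*
 * Assert a pin, but bail out early if it is masked, so a masked pin
 * never accumulates an assert count. Returns true if the assert was
 * counted.
 */
static bool sketch_assert_pin(struct sketch_line_state *s, unsigned int pin)
{
    if (sketch_pin_masked(s, pin))
        return false;
    s->assert_count[pin]++;
    return true;
}
```

One thing the model makes visible: an assert that arrives while the pin is masked is dropped rather than deferred, which may or may not be acceptable in the real code.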
This seems to work, but I'm not sure it's the best fix. I'm currently waiting
for feedback from Intel and from others here.
Thanks
mukesh
Sheng Liang wrote:
> Mukesh,
>
> Did you ever get a response to this? Were you able to track it down?
>
> Sheng
>
> On Tue, Aug 26, 2008 at 8:57 PM, Mukesh Rathor <mukesh.rathor@xxxxxxxxxx> wrote:
>
> I'm debugging a hang of 64bit HVM guest with PV drivers. The problem
> happens during migrate. So far I've discovered that the guest is
> stuck in loop receiving interrupt 0xa9/169. In the hypervisor I see
> that upon vmx exit, it sends 0xa9 right away...
>
> (XEN) [<ffff828c80152680>] vlapic_test_and_set_irr+0x0/0x40 :0xa9
> (XEN) [<ffff828c80151d35>] ioapic_inj_irq+0x95/0x150
> (XEN) [<ffff828c801521d0>] vioapic_deliver+0x3e0/0x440
> (XEN) [<ffff828c801522df>] vioapic_update_EOI+0xaf/0xc0
> (XEN) [<ffff828c8015394b>] vlapic_write+0x2eb/0x7e0
> (XEN) [<ffff828c8014a630>] hvm_mmio_intercept+0xa0/0x360
> (XEN) [<ffff828c8014d03f>] send_mmio_req+0x14f/0x1b0
> (XEN) [<ffff828c8014e568>] mmio_operands+0xa8/0x160
> (XEN) [<ffff828c8014eb96>] handle_mmio+0x576/0x880
> (XEN) [<ffff828c801632b2>] vmx_vmexit_handler+0x1832/0x1900
>
>
> I'm now trying to figure out the IP that causes the VM exit so I can
> figure out where in the guest/guest driver it's writing to the APIC.
> On the guest side, I see that evtchn_pending_sel is not set in
> evtchn_interrupt().
>
> Any ideas/suggestions would be great as it is a critical bug.
>
> Thanks
> Mukesh
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx <mailto:Xen-devel@xxxxxxxxxxxxxxxxxxx>
> http://lists.xensource.com/xen-devel
>
>