[I changed the subject line to reflect the current topic of
conversation. Maybe someone else is seeing this as well?]
I'm using the 64-bit SMP hypervisor, running on dual-CPU VT machines
(Dell 380, Dell SC430). Then I boot a dual-VCPU HVM guest (with VCPUs
bound to CPUs 0 and 1, respectively), running Red Hat Enterprise Linux
4 U2 (64-bit, SMP kernel). Red Hat calls this a 2.6.9 kernel, but it
includes a bunch of cherrypicked patches from later versions (through
roughly 2.6.12, as I remember). I'm enabling both APIC and ACPI in
the hvm domain builder, and using the 2-processor BIOS.
If I tell the Linux kernel "noapic" so that it avoids using the
IOAPIC, I boot and run just fine. Without "noapic", I'm getting into
userspace and able to access the (QEMU-emulated) hd. But typically
while running my rc3.d scripts, I get: "hda: dma_timer_expiry
dma_status == 0x64", which stops any further progress. I've tried
disabling DMA for hda in the guest ("ide=nodma"), and it still hangs,
this time with no "dma_timer_expiry" message (and sometimes a "hda:
lost interrupt" message, though I don't see that right now).
I tried the patch you just sent, but that doesn't seem to help (even
when combined with my vioapic locking).
FWIW, I've attached my vioapic locking patch. I haven't been able to
verify this code yet, nor have I even given it a good look-over since
I first wrote it ... (This is *not* intended to be checked in yet.)
Dave
On 5/18/06, Jiang, Yunhong <yunhong.jiang@xxxxxxxxx> wrote:
>As I mentioned, I have a very similar patch to make the IOAPIC code
>SMP safe. But since (even with these changes) I still see a huge
>number of lost hda interrupts when using the IOAPIC on SMP guests, I
>haven't been able to test it yet. I assume others see the same
>problems with the IOAPIC?? (I'll be diving into this soon --
>probably tonight or tomorrow. At this point I have no clue what's
>going wrong.)
In which situations does the IOAPIC lose a lot of hd interrupts?
Which guest kernel version are you using? I remember that some older
kernel versions have problems.
Also, there is a bug in the round-robin code. The current code will
always deliver the interrupt to vcpu 0.
The fix follows. But this fix causes problems for the timer
interrupt. I'm not sure of the cause, but I suspect it is because timer
interrupts are injected in a flood.
The fix below is based on another of my APIC patches, so I'm not sure
if you can apply it directly, but I think you can figure out the changes
easily.
Thanks
Yunhong Jiang
diff -r 86d8246c6aff xen/arch/x86/hvm/vlapic.c
--- a/xen/arch/x86/hvm/vlapic.c Wed May 17 23:15:36 2006 +0100
+++ b/xen/arch/x86/hvm/vlapic.c Thu May 18 22:30:06 2006 +0800
@@ -308,8 +308,15 @@ struct vlapic* apic_round_robin(struct d
old = next = d->arch.hvm_domain.round_info[vector];
- do {
- /* the vcpu array is arranged according to vcpu_id */
+ /* the vcpu array is arranged according to vcpu_id */
+ do
+ {
+ next ++;
+ if ( !d->vcpu[next] ||
+ !test_bit(_VCPUF_initialised, &d->vcpu[next]->vcpu_flags) ||
+ next == MAX_VIRT_CPUS )
+ next = 0;
+
if ( test_bit(next, &bitmap) )
{
target = d->vcpu[next]->arch.hvm_vcpu.vlapic;
@@ -321,12 +328,6 @@ struct vlapic* apic_round_robin(struct d
}
break;
}
-
- next ++;
- if ( !d->vcpu[next] ||
- !test_bit(_VCPUF_initialised, &d->vcpu[next]->vcpu_flags) ||
- next == MAX_VIRT_CPUS )
- next = 0;
} while ( next != old );
d->arch.hvm_domain.round_info[vector] = next;
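For anyone reading along, the ordering change in the patch can be modelled outside Xen. The following is only a minimal stand-alone sketch, not the hypervisor code: MAX_VCPUS, vcpu_online, next_vcpu and both pick_* functions are hypothetical names invented for illustration, and the vlapic-enabled check that sits between the two hunks is left out. Unlike the diff, the helper checks the index bound before touching the array.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VCPUS 4   /* hypothetical stand-in for MAX_VIRT_CPUS */

/* Simplified model of a dual-VCPU guest: only VCPUs 0 and 1 exist. */
static bool vcpu_online[MAX_VCPUS] = { true, true, false, false };

/* Step to the next online VCPU, wrapping to 0. The bound is checked
 * before the array is indexed. */
static int next_vcpu(int cur)
{
    assert(cur >= 0 && cur < MAX_VCPUS);
    int next = cur + 1;
    if (next >= MAX_VCPUS || !vcpu_online[next])
        next = 0;
    return next;
}

/* Pre-patch ordering: test the remembered slot first, advance afterwards.
 * If the remembered VCPU is eligible, it is picked again every time. */
static int pick_test_then_advance(int *last, unsigned long bitmap)
{
    int old = *last, next = *last, target = -1;
    do {
        if (bitmap & (1UL << next)) {
            target = next;
            break;
        }
        next = next_vcpu(next);
    } while (next != old);
    *last = next;
    return target;
}

/* Patched ordering: advance first, then test, so the search starts at
 * the VCPU after the one picked last time. */
static int pick_advance_then_test(int *last, unsigned long bitmap)
{
    int old = *last, next = *last, target = -1;
    do {
        next = next_vcpu(next);
        if (bitmap & (1UL << next)) {
            target = next;
            break;
        }
    } while (next != old);
    *last = next;
    return target;
}
```

In this model, starting from last = 0 with both VCPUs eligible (bitmap 0x3), pick_test_then_advance returns 0 on every call, while pick_advance_then_test alternates 1, 0, 1, and so on.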
>
>Dave
>
>_______________________________________________
>Xen-devel mailing list
>Xen-devel@xxxxxxxxxxxxxxxxxxx
>http://lists.xensource.com/xen-devel
>
vioapic-smp-safety.patch
Description: Text Data