Hi,
First of all, I should say that I am neither a kernel nor a Xen
developer. Nevertheless, while trying to use kernel 2.6.31.6 from
git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git as a Dom0
kernel, I ran into an issue and, after searching the Internet for a long
time, I believe I have also found the cause. However, I won't be able to
fix it myself :-), so I am sharing what I found with this list in the
hope that the issue gets fixed at some point :-)...
I will try to give you all the information that seems relevant to me;
however, if it turns out that I have left out details about my system
(configuration), log files or anything else, I will be glad to provide
them. I would also be happy to help test potential patches if that is
required. I am posting to this list because this was suggested at
http://wiki.xensource.com/xenwiki/XenParavirtOps (bottom of page). If
this is the wrong place, please give me a short hint so I won't bother
you any longer...
Now, let's get into it...
About my system:
I am running Gentoo (10.0, server profile) on an Asus P2B-D motherboard
(PIIX4 chipset) with two PIII 500 MHz CPUs and 1G of RAM. The system
also has three PCI network interfaces with the Realtek RTL8139 chip
(8139too kernel driver). The network interface in use is eth0 (I already
tried making a different card eth0 to see whether that changes
anything, without success :-( ).
The issue I have:
While the Xen pv_ops kernel 2.6.31.6 runs perfectly on bare metal, it
fails to get network connectivity when run on top of Xen 3.4.1 (Gentoo
default installation). Although the system seems to come up correctly at
first sight and the network interface is available (I can ping it
locally), network access fails (I cannot ping other systems on the
network, nor can they ping this one).
What I discovered so far:
Looking at the boot messages in "dmesg", I discovered that installing
the ACPI SCI interrupt handler fails when the kernel runs on top of Xen,
while this error does not happen on bare metal.
With XEN:
*********
bio: create slab <bio-0> at 0
ACPI: SCI (IRQ20) allocation failed
ACPI Exception: AE_NOT_ACQUIRED, Unable to install System Control Interrupt handler 20090521 evevent-161
ACPI: Unable to start the ACPI Interpreter
------------[ cut here ]------------
WARNING: at lib/kobject.c:595 kobject_put+0x27/0x3c()
Hardware name: System Name
kobject: '<NULL>' (cf805ea0): is not initialized, yet kobject_put() is being called.
Modules linked in:
Pid: 1, comm: swapper Tainted: G W 2.6.31.6 #14
Call Trace:
[<c043a2db>] warn_slowpath_common+0x60/0x90
[<c043a33f>] warn_slowpath_fmt+0x24/0x27
[<c05588cb>] kobject_put+0x27/0x3c
[<c049e502>] kmem_cache_destroy+0x105/0x11b
[<c058adc8>] acpi_os_delete_cache+0x8/0xc
[<c05a6fe6>] acpi_ut_delete_caches+0xd/0x6b
[<c05a77f7>] acpi_ut_subsystem_shutdown+0x87/0x90
[<c0904837>] ? acpi_init+0x0/0x263
[<c05a8067>] acpi_terminate+0x8/0x14
[<c09049cb>] acpi_init+0x194/0x263
[<c05f0e66>] ? __class_create+0x44/0x5e
[<c09021c5>] ? fbmem_init+0x0/0x78
[<c0904837>] ? acpi_init+0x0/0x263
[<c0403051>] do_one_initcall+0x4c/0x13a
[<c08e030d>] kernel_init+0x12c/0x17d
[<c08e01e1>] ? kernel_init+0x0/0x17d
[<c040ad17>] kernel_thread_helper+0x7/0x10
---[ end trace 4eaa2a86a8e2da23 ]---
------------[ cut here ]------------
WARNING: at lib/kobject.c:595 kobject_put+0x27/0x3c()
Hardware name: System Name
kobject: '<NULL>' (cf805f60): is not initialized, yet kobject_put() is being called.
Modules linked in:
Pid: 1, comm: swapper Tainted: G W 2.6.31.6 #14
Call Trace:
[<c043a2db>] warn_slowpath_common+0x60/0x90
[<c043a33f>] warn_slowpath_fmt+0x24/0x27
[<c05588cb>] kobject_put+0x27/0x3c
[<c049e502>] kmem_cache_destroy+0x105/0x11b
[<c058adc8>] acpi_os_delete_cache+0x8/0xc
[<c05a700e>] acpi_ut_delete_caches+0x35/0x6b
[<c05a77f7>] acpi_ut_subsystem_shutdown+0x87/0x90
[<c0904837>] ? acpi_init+0x0/0x263
[<c05a8067>] acpi_terminate+0x8/0x14
[<c09049cb>] acpi_init+0x194/0x263
[<c05f0e66>] ? __class_create+0x44/0x5e
[<c09021c5>] ? fbmem_init+0x0/0x78
[<c0904837>] ? acpi_init+0x0/0x263
[<c0403051>] do_one_initcall+0x4c/0x13a
[<c08e030d>] kernel_init+0x12c/0x17d
[<c08e01e1>] ? kernel_init+0x0/0x17d
[<c040ad17>] kernel_thread_helper+0x7/0x10
---[ end trace 4eaa2a86a8e2da24 ]---
sync cpu 0 get result ffffffff max_id 0
Failed to sync pcpu 0
xenbus_probe_backend_init bus registered ok
Without Xen:
************
bio: create slab <bio-0> at 0
ACPI: EC: Look up EC in DSDT
ACPI: Interpreter enabled
ACPI: (supports S0 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
ACPI: PCI Root Bridge [PCI0] (0000:00)
pci 0000:00:00.0: reg 10 32bit mmio: [0xf8000000-0xfbffffff]
pci 0000:00:04.1: reg 20 io port: [0xb800-0xb80f]
pci 0000:00:04.2: reg 20 io port: [0xb400-0xb41f]
* Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
* this clock source is slow. Consider trying other clock sources
pci 0000:00:04.3: quirk: region e400-e43f claimed by PIIX4 ACPI
pci 0000:00:04.3: quirk: region e800-e80f claimed by PIIX4 SMB
pci 0000:00:04.3: PIIX4 devres B PIO at 0290-0297
pci 0000:00:09.0: reg 10 io port: [0xb000-0xb0ff]
pci 0000:00:09.0: reg 14 32bit mmio: [0xde800000-0xde8000ff]
pci 0000:00:09.0: reg 30 32bit mmio: [0x000000-0x00ffff]
pci 0000:00:0a.0: reg 10 io port: [0xa800-0xa8ff]
pci 0000:00:0a.0: reg 14 32bit mmio: [0xde000000-0xde0000ff]
pci 0000:00:0a.0: supports D1 D2
pci 0000:00:0a.0: PME# supported from D1 D2 D3hot
pci 0000:00:0a.0: PME# disabled
pci 0000:00:0b.0: reg 10 io port: [0xa400-0xa4ff]
pci 0000:00:0b.0: reg 14 32bit mmio: [0xdd800000-0xdd8000ff]
pci 0000:00:0b.0: supports D1 D2
pci 0000:00:0b.0: PME# supported from D1 D2 D3hot
pci 0000:00:0b.0: PME# disabled
pci 0000:01:00.0: reg 10 32bit mmio: [0xe0000000-0xe3ffffff]
pci 0000:01:00.0: reg 14 32bit mmio: [0xdf800000-0xdf87ffff]
pci 0000:01:00.0: reg 18 io port: [0xd800-0xd8ff]
pci 0000:01:00.0: reg 30 32bit mmio: [0xdf7e0000-0xdf7fffff]
pci 0000:01:00.0: supports D1 D2
pci 0000:00:01.0: bridge io port: [0xd000-0xdfff]
pci 0000:00:01.0: bridge 32bit mmio: [0xf4000000-0xf40fffff]
pci 0000:00:01.0: bridge 32bit mmio pref: [0xdf700000-0xe3ffffff]
pci_bus 0000:00: on NUMA node 0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 10 11 *12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 *4 5 6 7 9 10 11 12 14 15)
xenbus_probe_backend_init bus registered ok
Corresponding to this error, the /proc/interrupts tables also differ:
With XEN:
*********
            CPU0       CPU1
   1:        426          0  xen-pirq-ioapic-edge  i8042
   3:          0          0  xen-pirq-ioapic-edge  uhci_hcd:usb1
   4:          2          0  xen-pirq-ioapic-edge  serial
   8:          2          0  xen-pirq-ioapic-edge  rtc0
  12:          0          0  xen-pirq-ioapic-edge  eth0
  14:       4319          0  xen-pirq-ioapic-edge  ide0
  15:         42          0  xen-pirq-ioapic-edge  ide1
 411:          0          0  xen-dyn-event         xenbus
 412:          0        703  xen-dyn-ipi           callfuncsingle1
 413:          0          0  xen-dyn-virq          debug1
 414:          0          0  xen-dyn-ipi           callfunc1
 415:          0      45622  xen-dyn-ipi           resched1
 416:          0        311  xen-dyn-ipi           spinlock1
 417:          0     153289  xen-dyn-virq          timer1
 418:        550          0  xen-dyn-ipi           callfuncsingle0
 419:          0          0  xen-dyn-virq          debug0
 420:          0          0  xen-dyn-ipi           callfunc0
 421:      18071          0  xen-dyn-ipi           resched0
 422:        661          0  xen-dyn-ipi           spinlock0
 423:     277476          0  xen-dyn-virq          timer0
 NMI:          0          0  Non-maskable interrupts
 LOC:          0          0  Local timer interrupts
 SPU:          0          0  Spurious interrupts
 CNT:          0          0  Performance counter interrupts
 PND:          0          0  Performance pending work
 RES:      18071      45622  Rescheduling interrupts
 CAL:        550        703  Function call interrupts
 TLB:          0          0  TLB shootdowns
 TRM:          0          0  Thermal event interrupts
 THR:          0          0  Threshold APIC interrupts
 MCE:          0          0  Machine check exceptions
 MCP:        132        132  Machine check polls
 ERR:          0
 MIS:          0
Without XEN:
************
            CPU0       CPU1
   0:         46          0  IO-APIC-edge     timer
   1:       2567       4239  IO-APIC-edge     i8042
   6:          3          0  IO-APIC-edge     floppy
   8:          1          1  IO-APIC-edge     rtc0
  14:      28604      27089  IO-APIC-edge     ide0
  15:          0          0  IO-APIC-edge     ide1
  18:       1942       1978  IO-APIC-fasteoi  eth0
  20:          0          0  IO-APIC-fasteoi  acpi
 NMI:          0          0  Non-maskable interrupts
 LOC:    1097380    1052641  Local timer interrupts
 SPU:          0          0  Spurious interrupts
 CNT:          0          0  Performance counter interrupts
 PND:          0          0  Performance pending work
 RES:     105211     107135  Rescheduling interrupts
 CAL:         16         20  Function call interrupts
 TLB:       4542       4509  TLB shootdowns
 TRM:          0          0  Thermal event interrupts
 THR:          0          0  Threshold APIC interrupts
 MCE:          0          0  Machine check exceptions
 MCP:        289        289  Machine check polls
 ERR:          0
 MIS:          0
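For anyone who wants to compare the two tables without reading them by
eye, here is a minimal sketch in userspace C (not kernel code) that
parses one numeric row of /proc/interrupts. The type irq_row and the
functions parse_irq_row() and is_edge() are names made up for this
example; the number of CPU columns is passed in by the caller instead of
being read from the header, and only the first word of the device column
is kept.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed row of /proc/interrupts (hypothetical type for this sketch). */
struct irq_row {
    int irq;              /* numeric IRQ from the first column */
    unsigned long total;  /* sum of the per-CPU counters */
    char chip[48];        /* interrupt chip/type, e.g. "IO-APIC-fasteoi" */
    char name[48];        /* first word of the device column, e.g. "eth0" */
};

/*
 * Parse a numeric row such as " 18: 1942 1978 IO-APIC-fasteoi eth0" for a
 * machine with ncpus CPU columns. Returns 0 on success and -1 for rows
 * that do not start with a number (NMI:, LOC:, ...) or are malformed.
 */
int parse_irq_row(const char *line, int ncpus, struct irq_row *out)
{
    int consumed = 0;

    assert(line != NULL && out != NULL);

    /* %n is only reached if both the number and the ':' matched. */
    if (sscanf(line, " %d:%n", &out->irq, &consumed) != 1 || consumed == 0)
        return -1;

    const char *p = line + consumed;
    out->total = 0;
    for (int i = 0; i < ncpus; i++) {
        unsigned long count;
        int n = 0;

        if (sscanf(p, " %lu%n", &count, &n) != 1)
            return -1;
        out->total += count;
        p += n;
    }

    /* Width limits keep sscanf inside the 48-byte buffers. */
    if (sscanf(p, " %47s %47s", out->chip, out->name) != 2)
        return -1;
    return 0;
}

/* Returns 1 if the chip/type column ends in "-edge", otherwise 0. */
int is_edge(const struct irq_row *r)
{
    size_t n = strlen(r->chip);

    return n >= 5 && strcmp(r->chip + n - 5, "-edge") == 0;
}
```

Fed the two eth0 rows from the tables above with ncpus = 2, this gives
IRQ 12, edge-triggered, 0 interrupts in total under Xen, and IRQ 18, not
edge-triggered (fasteoi), 3920 interrupts in total on bare metal.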
Searching the Internet, I ran across several messages (e.g.
http://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg26601.html)
mentioning that on motherboards with the PIIX4 chipset the SCI interrupt
is hardwired to IRQ 9. However, on my system it is assigned IRQ 20 on
bare metal, and the attempt to set it to IRQ 20 fails on top of Xen (see
the dmesg extract above from the run on top of Xen: "ACPI: SCI (IRQ20)
allocation failed").
As I started wondering whether it would work with IRQ 9, and having no
knowledge of ACPI and interrupt handling in the kernel, I crudely
patched <Kernel-DIR>/drivers/acpi/osl.c in the following manner:
osl.c:391
*********
acpi_status
acpi_os_install_interrupt_handler(u32 gsi, acpi_osd_handler handler,
                                  void *context)
{
        unsigned int irq;

        acpi_irq_stats_init();

        /*
         * Ignore the GSI from the core, and use the value in our copy of the
         * FADT. It may not be the same if an interrupt source override exists
         * for the SCI.
         */
        gsi = acpi_gbl_FADT.sci_interrupt;
        if (acpi_gsi_to_irq(gsi, &irq) < 0) {
                printk(KERN_ERR PREFIX "SCI (ACPI GSI %d) not registered\n",
                       gsi);
                return AE_OK;
        }
+       irq = 9;        /* my hack: ignore the mapped IRQ, force IRQ 9 */
        acpi_irq_handler = handler;
        acpi_irq_context = context;
        if (request_irq(irq, acpi_irq, IRQF_SHARED, "acpi", acpi_irq)) {
                printk(KERN_ERR PREFIX "SCI (IRQ%d) allocation failed\n", irq);
                return AE_NOT_ACQUIRED;
        }
        acpi_irq_irq = irq;

        return AE_OK;
}
As you can see, I just overwrote the IRQ number that the system had
determined with IRQ 9, recompiled the kernel and discovered(!) that
networking was now working, even under Xen (btw: it was still working
on bare metal).
Now, I don't know why it works with the SCI mapped to IRQ 20 on bare
metal when the SCI is supposed to be hardwired to IRQ 9, but the fact
that it works in both cases with IRQ 9 suggests to me that there is
something "wrong", or at least different, when the pv_ops kernel
2.6.31.6 is run on top of Xen. So perhaps someone could have a look at
it at some point, because that's where my knowledge stops...
Thanks & regards,
Marcial