Jeremy Fitzhardinge wrote:
> On 02/26/2010 08:17 AM, Ian Campbell wrote:
>> On Fri, 2010-02-26 at 12:05 +0000, Ian Campbell wrote:
>>
>>> Which looks mighty suspicious to me... However simply removing that
>>> causes acpi_probe_gsi to return 16 (instead of 24) and I run out of
>>> interrupts for use by real hardware (specifically my disk
>>> controller). If I hack acpi_probe_gsi to return at least 24
>>> everything works OK so it seems the error is only at the detection
>>> stage.
>>>
>> So this seems to all relate to the removal of the xen_io_apic_(read|
>> write) stuff.
>>
>
> Yep.
>
>> I can see that the GSI routing stuff is effectively replaced by
>> PHYSDEVOP_setup_gsi but I don't see what replaces the IO APIC
>> enumeration. We still map a dummy page for FIX_IO_ACPI_* and
>> io_apic_(read|write) now go at that direct (and therefore get 0s
>> back). If the intention is not to enumerate the IO APICs in this way
>> then what seems to be missing is the part which discovers the number
>> of GSIs in the system and I'm not sure what is supposed to replace
>> that.
>>
>
> Nothing, as yet. The "+= 256" is definitely a hack, and we need to
> come up with a sound way to resolve it. There seem to be three
> possibilities:
>
> * Let the kernel see the IO APICs for the purposes of enumeration,
>   but nothing else (which seems to defeat the point of the
>   exercise)
> * Make up a fake Xen IO APIC mapping which just
>   contains static state for the config registers. (I don't think
>   this will work, because the IO APIC registers aren't simply
>   memory-mapped)
> * Add an interface to Xen so it can return the
>   results of its own IO APIC enumeration, and use that in dom0.
>   I think this is probably most consistent with the idea that
>   "Xen owns all the APICs", but I'm not sure how to wire it into
>   the Linux side.
>
> Ideally we should also be able to get rid of the fake IO APIC mappings
> because nothing in Linux will even attempt to access them, but I
> suspect in practice it will be easier to let some probe code poke at
> them and find they're not there rather than try and disable the probe.
Currently, the IO APIC is accessed only at kernel boot time, to probe some
IO APIC information (e.g. the IO APIC version and its number of RTEs); there
are no accesses to the IO APIC at runtime. This is why we still need the dummy
page there. To remove the hack, we can use your third method with the existing
PHYSDEVOP_apic_read interface to read the number of redirection entries of the
IO APIC. The patch is attached. What's your opinion? :)
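To illustrate what the hypercall result contains, here is a minimal standalone
sketch (not part of the patch) of how the value of IO APIC register 0x01
decodes. The helper names decode_ioapic_reg01() and entries_or_default() are
made up for illustration. The bit layout (version in bits 0-7, index of the
last redirection entry in bits 16-23) is the standard IO APIC version register
layout, and the fallback of 255 mirrors what the patch does when the hypercall
returns an error.

```c
#include <stdint.h>

/* Fields of IO APIC register 0x01 (the version register). */
struct ioapic_ver_info {
	unsigned int version;   /* bits 0-7: IO APIC version */
	unsigned int max_redir; /* bits 16-23: index of the last redirection entry */
};

/* Hypothetical helper: split a raw register 0x01 value into its fields. */
static struct ioapic_ver_info decode_ioapic_reg01(uint32_t raw)
{
	struct ioapic_ver_info info;

	info.version = raw & 0xff;
	info.max_redir = (raw >> 16) & 0xff;
	return info;
}

/*
 * Hypothetical helper mirroring the fallback in the patch: if the read
 * failed (non-zero return code), assume the maximum value of 255,
 * otherwise use the field reported by the hardware.
 */
static int entries_or_default(int hypercall_ret, uint32_t raw)
{
	if (hypercall_ret)
		return 255;
	return (int)decode_ioapic_reg01(raw).max_redir;
}
```

For a raw value of 0x00170011 this yields version 0x11 and a last redirection
entry index of 23, i.e. an IO APIC with 24 pins.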
From e5a75b3f2f40e56de714818b51932e6f36491f56 Mon Sep 17 00:00:00 2001
From: Xiantao Zhang <xiantao.zhang@xxxxxxxxx>
Date: Mon, 1 Mar 2010 19:06:43 -0500
Subject: [PATCH] x86: ioapic: Remove the hack for calculating nr_irq_gsi for Xen.
Read the number of redirection entries through the PHYSDEVOP_apic_read
hypercall; if the hypercall fails (e.g. because it does not exist), fall
back to the default value of 255.
Signed-off-by: Xiantao Zhang <xiantao.zhang@xxxxxxxxx>
---
arch/x86/include/asm/io_apic.h | 1 +
arch/x86/kernel/acpi/boot.c | 3 ---
arch/x86/kernel/apic/io_apic.c | 5 +++++
arch/x86/xen/pci.c | 20 ++++++++++++++++++++
4 files changed, 26 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 2fc09d3..c58a838 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -172,6 +172,7 @@ extern int restore_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries);
extern void probe_nr_irqs_gsi(void);
extern int get_nr_irqs_gsi(void);
+extern void set_nr_irqs_gsi(int nr_gsi);
extern int setup_ioapic_entry(int apic, int irq,
struct IO_APIC_route_entry *entry,
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 21fc029..7ba650f 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -869,9 +869,6 @@ int __init acpi_probe_gsi(void)
max_gsi = gsi;
}
- if (xen_initial_domain())
- max_gsi += 255; /* Plus maximum entries of an ioapic. */
-
return max_gsi + 1;
}
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 68acd64..e116f7f 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3831,6 +3831,11 @@ int get_nr_irqs_gsi(void)
return nr_irqs_gsi;
}
+void set_nr_irqs_gsi(int nr_gsi)
+{
+ nr_irqs_gsi = nr_gsi;
+}
+
#ifdef CONFIG_SPARSE_IRQ
int __init arch_probe_nr_irqs(void)
{
diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
index f999ad8..1839d5f 100644
--- a/arch/x86/xen/pci.c
+++ b/arch/x86/xen/pci.c
@@ -78,6 +78,26 @@ void __init xen_setup_pirqs(void)
for (irq = 0; irq < NR_IRQS_LEGACY; irq++)
xen_allocate_pirq(irq, 0, "xt-pic");
return;
+ } else {
+ struct physdev_apic apic_op;
+ int ret;
+ union IO_APIC_reg_01 reg_01;
+ int nr_gsi = get_nr_irqs_gsi();
+
+ apic_op.apic_physbase = mp_ioapics[nr_ioapics - 1].apicaddr;
+ apic_op.reg = 1;
+ ret = HYPERVISOR_physdev_op(PHYSDEVOP_apic_read, &apic_op);
+ if (ret) {
+ nr_gsi += 255;
+ printk("PHYSDEVOP_apic_read error, "
+ "set to max value(255) for entry number!\n");
+ } else {
+ reg_01.raw = apic_op.value;
+ nr_gsi += reg_01.bits.entries;
+ }
+ if (nr_ioapics == 1)
+ nr_gsi -= NR_IRQS_LEGACY;
+ set_nr_irqs_gsi(nr_gsi);
}
/* Pre-allocate legacy irqs */
--
1.6.0.rc1