Argh, I should also have backported c/s 15527. I'll do that now, and it will
definitely fix this crash for you.
Thanks for the extra info!
-- Keir
On 15/1/08 11:06, "Ralph Passgang" <xen@xxxxxxxxxxxxx> wrote:
> On Tuesday, 15 January 2008 09:17:46, you wrote:
>
> Hi Keir,
>
>> What type of CPU are you running?
>
> 2x Intel Xeon 5120 (Dual-Core), running the 64bit hypervisor.
>
>> Is it valid retail silicon?
>
> What do you mean by "valid"? Xen 3.0.x and 3.1.x both ran on it
> without any problems. It seems that a change between 15590 and
> 15598 must have broken something.
>
> It's not a dell, hp or ibm branded system, but it's quite a normal
> intel-based system.
>
> intel board (S5000PSL)
> intel chipset
> intel e1000 lan
> lsi megaraid_sas based sata-controller
> 8GB RAM
>
>> Can you add a line before the line that is crashing:
>> printk("vmx_basic_msr == %08x:%08x\n", vmx_msr_high, vmx_msr_low);
>> BUG_ON(((vmx_msr_high >> 18) & 15) == 6);
>>
>> ...and then tell me what that line prints out immediately before the
>> crash?
>
> yep, it says:
>
> vmx_basic_msr == 001a0400:00000007
>
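For anyone following along, the printed value can be decoded by hand. The sketch below splits the two 32-bit halves into a few IA32_VMX_BASIC fields, assuming the bit layout documented in the Intel SDM. The struct and function names are hypothetical and are not taken from the Xen source.

```c
#include <stdint.h>

/* A few fields of the IA32_VMX_BASIC MSR (layout assumed from the Intel SDM). */
struct vmx_basic_info {
    uint32_t revision_id;  /* bits 30:0  - VMCS revision identifier */
    uint32_t vmcs_size;    /* bits 44:32 - VMCS region size in bytes */
    unsigned addr_32bit;   /* bit 48     - VMCS addresses limited to 32 bits */
    unsigned memory_type;  /* bits 53:50 - memory type for VMCS accesses */
};

/* Hypothetical helper: takes the halves in the order the printk prints them. */
struct vmx_basic_info decode_vmx_basic(uint32_t msr_high, uint32_t msr_low)
{
    struct vmx_basic_info info;

    info.revision_id = msr_low & 0x7fffffffu;
    info.vmcs_size   = msr_high & 0x1fffu;        /* MSR bits 44:32 */
    info.addr_32bit  = (msr_high >> 16) & 1u;     /* MSR bit 48 */
    info.memory_type = (msr_high >> 18) & 15u;    /* MSR bits 53:50 */
    return info;
}
```

Called as decode_vmx_basic(0x001a0400, 0x00000007), this yields revision ID 7, a 1024-byte VMCS region, no 32-bit address restriction, and memory type 6. The memory-type extraction is the same `(vmx_msr_high >> 18) & 15` expression that appears in the quoted BUG_ON line.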
>>
>> 'cat /proc/cpuinfo' from booting Linux on that system might also be
>> interesting,
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 15
> model name : Intel(R) Xeon(R) CPU 5120 @ 1.86GHz
> stepping : 6
> cpu MHz : 1861.973
> cache size : 4096 KB
> physical id : 0
> siblings : 1
> core id : 0
> cpu cores : 1
> fpu : yes
> fpu_exception : yes
> cpuid level : 10
> wp : yes
> flags : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36
> clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni
> monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
> bogomips : 3727.86
> clflush size : 64
> cache_alignment : 64
> address sizes : 36 bits physical, 48 bits virtual
> power management:
>
> of course, all the same for processor 0-3 (besides the core id).
>
>> also the CPU vendor/version string printed by Linux in dmesg
>> as it boots.
>
> xm dmesg (from the last known running version):
>
> Xen version 3.1.3-1 (Debian 3.1.3-0-tha10) (ralph@xxxxxxxxxxxxx) (gcc
> version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) Sat Jan 12 02:22:25
> UTC 2008
> Latest ChangeSet: changeset:15590:f479c2af0825
>
> (XEN) Command line:
> (XEN) Video information:
> (XEN) VGA is text mode 80x25, font 8x16
> (XEN) VBE/DDC methods: none; EDID transfer time: 2 seconds
> (XEN) EDID info not retrieved because no DDC retrieval method detected
> (XEN) Disc information:
> (XEN) Found 1 MBR signatures
> (XEN) Found 1 EDD information structures
> (XEN) Xen-e820 RAM map:
> (XEN) 0000000000000000 - 000000000009f000 (usable)
> (XEN) 000000000009f000 - 0000000000100000 (reserved)
> (XEN) 0000000000100000 - 00000000de2b4000 (usable)
> (XEN) 00000000de2b4000 - 00000000de375000 (ACPI NVS)
> (XEN) 00000000de375000 - 00000000dfa42000 (usable)
> (XEN) 00000000dfa42000 - 00000000dfa9a000 (reserved)
> (XEN) 00000000dfa9a000 - 00000000dfad1000 (usable)
> (XEN) 00000000dfad1000 - 00000000dfb1a000 (ACPI NVS)
> (XEN) 00000000dfb1a000 - 00000000dfb2a000 (usable)
> (XEN) 00000000dfb2a000 - 00000000dfb3a000 (ACPI data)
> (XEN) 00000000dfb3a000 - 00000000dfc00000 (usable)
> (XEN) 00000000dfc00000 - 00000000f0000000 (reserved)
> (XEN) 00000000ffe00000 - 00000000ffe0c000 (reserved)
> (XEN) 0000000100000000 - 0000000220000000 (usable)
> (XEN) System RAM: 8186MB (8382644kB)
> (XEN) Xen heap: 14MB (14944kB)
> (XEN) Domain heap initialised: DMA width 32 bits
> (XEN) Processor #0 6:15 APIC version 20
> (XEN) Processor #6 6:15 APIC version 20
> (XEN) Processor #1 6:15 APIC version 20
> (XEN) Processor #7 6:15 APIC version 20
> (XEN) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
> (XEN) IOAPIC[1]: apic_id 9, version 32, address 0xfec80000, GSI 24-47
> (XEN) Enabling APIC mode: Flat. Using 2 I/O APICs
> (XEN) Using scheduler: SMP Credit Scheduler (credit)
> (XEN) Detected 1861.973 MHz processor.
> (XEN) HVM: VMX enabled
> (XEN) VMX: MSR intercept bitmap enabled
> (XEN) CPU0: Intel(R) Xeon(R) CPU 5120 @ 1.86GHz stepping 06
> (XEN) Mapping cpu 0 to node 255
> (XEN) Booting processor 1/6 eip 90000
> (XEN) Mapping cpu 1 to node 255
> (XEN) CPU1: Intel(R) Xeon(R) CPU 5120 @ 1.86GHz stepping 06
> (XEN) Booting processor 2/1 eip 90000
> (XEN) Mapping cpu 2 to node 255
> (XEN) CPU2: Intel(R) Xeon(R) CPU 5120 @ 1.86GHz stepping 06
> (XEN) Booting processor 3/7 eip 90000
> (XEN) Mapping cpu 3 to node 255
> (XEN) CPU3: Intel(R) Xeon(R) CPU 5120 @ 1.86GHz stepping 06
> (XEN) Total of 4 processors activated.
> (XEN) ENABLING IO-APIC IRQs
> (XEN) -> Using new ACK method
> (XEN) Platform timer overflows in 14998 jiffies.
> (XEN) Platform timer is 14.318MHz HPET
> (XEN) Brought up 4 CPUs
> (XEN) acm_init: Loading default policy (NULL).
> (XEN) *** LOADING DOMAIN 0 ***
> (XEN) Xen kernel: 64-bit, lsb, compat32
> (XEN) Dom0 kernel: 64-bit, lsb, paddr 0x200000 -> 0x5bc9b0
> (XEN) PHYSICAL MEMORY ARRANGEMENT:
> (XEN) Dom0 alloc.: 0000000214000000->0000000218000000 (2013303 pages to
> be allocated)
> (XEN) VIRTUAL MEMORY ARRANGEMENT:
> (XEN) Loaded kernel: ffffffff80200000->ffffffff805bc9b0
> (XEN) Init. ramdisk: ffffffff805bd000->ffffffff814cba00
> (XEN) Phys-Mach map: ffffffff814cc000->ffffffff824483b8
> (XEN) Start info: ffffffff82449000->ffffffff8244949c
> (XEN) Page tables: ffffffff8244a000->ffffffff82461000
> (XEN) Boot stack: ffffffff82461000->ffffffff82462000
> (XEN) TOTAL: ffffffff80000000->ffffffff82800000
> (XEN) ENTRY ADDRESS: ffffffff80200000
> (XEN) Dom0 has maximum 4 VCPUs
> (XEN) Initrd len 0xf0ea00, start at 0xffffffff805bd000
> (XEN) Scrubbing Free RAM: .done.
> (XEN) Xen trace buffers: disabled
> (XEN) Std. Loglevel: Errors and warnings
> (XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
> (XEN) Xen is relinquishing VGA console.
> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to
> Xen).
> (XEN) Freed 100kB init memory.
>
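As a cross-check on the boot log, the "System RAM" figure can be reproduced by summing the usable ranges of the Xen-e820 RAM map. This is only a sketch: the table is copied from the log above, while the struct and function names are hypothetical and do not come from the Xen source.

```c
#include <stddef.h>
#include <stdint.h>

struct e820_range {
    uint64_t start;
    uint64_t end;    /* exclusive */
    int usable;      /* 1 for "(usable)", 0 for reserved/ACPI ranges */
};

/* Ranges copied from the Xen-e820 RAM map printed in the log. */
const struct e820_range ram_map[] = {
    { 0x0000000000000000ULL, 0x000000000009f000ULL, 1 },
    { 0x000000000009f000ULL, 0x0000000000100000ULL, 0 },
    { 0x0000000000100000ULL, 0x00000000de2b4000ULL, 1 },
    { 0x00000000de2b4000ULL, 0x00000000de375000ULL, 0 },
    { 0x00000000de375000ULL, 0x00000000dfa42000ULL, 1 },
    { 0x00000000dfa42000ULL, 0x00000000dfa9a000ULL, 0 },
    { 0x00000000dfa9a000ULL, 0x00000000dfad1000ULL, 1 },
    { 0x00000000dfad1000ULL, 0x00000000dfb1a000ULL, 0 },
    { 0x00000000dfb1a000ULL, 0x00000000dfb2a000ULL, 1 },
    { 0x00000000dfb2a000ULL, 0x00000000dfb3a000ULL, 0 },
    { 0x00000000dfb3a000ULL, 0x00000000dfc00000ULL, 1 },
    { 0x00000000dfc00000ULL, 0x00000000f0000000ULL, 0 },
    { 0x00000000ffe00000ULL, 0x00000000ffe0c000ULL, 0 },
    { 0x0000000100000000ULL, 0x0000000220000000ULL, 1 },
};

/* Total size of all usable ranges, in kB (1 kB = 1024 bytes). */
uint64_t usable_kb(const struct e820_range *map, size_t n)
{
    uint64_t bytes = 0;

    for (size_t i = 0; i < n; i++)
        if (map[i].usable)
            bytes += map[i].end - map[i].start;
    return bytes >> 10;
}
```

With the table above, usable_kb() returns 8382644, which matches the "8186MB (8382644kB)" line once divided by 1024 and rounded down.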
> Because the system is partly in production use, I cannot boot a plain Linux
> kernel (without the Xen hypervisor running), but my dom0 kernel says:
>
> Linux version 2.6.18-tha2-xen-amd64 (root@s1vX) (gcc version 4.1.2 20061115
> (prerelease) (Debian 4.1.1-21)) #1 SMP Tue Jan 15 04:39:44 UTC 2008
> BIOS-provided physical RAM map:
> Xen: 0000000000000000 - 00000001f0077000 (usable)
> DMI 2.5 present.
> On node 0 totalpages: 2031735
> DMA zone: 2031735 pages, LIFO batch:31
> ACPI: RSDP (v002 INTEL ) @
> 0x00000000000f03c0
> ACPI: XSDT (v001 INTEL S5000PSL 0x00000000 INTL 0x01000013) @
> 0x00000000dfb39120
> ACPI: FADT (v003 INTEL S5000PSL 0x00000000 INTL 0x01000013) @
> 0x00000000dfb37000
> ACPI: MADT (v001 INTEL S5000PSL 0x00000000 INTL 0x01000013) @
> 0x00000000dfb36000
> ACPI: SPCR (v001 INTEL S5000PSL 0x00000000 INTL 0x01000013) @
> 0x00000000dfb2f000
> ACPI: HPET (v001 INTEL S5000PSL 0x00000001 INTL 0x01000013) @
> 0x00000000dfb2e000
> ACPI: MCFG (v001 INTEL S5000PSL 0x00000001 INTL 0x01000013) @
> 0x00000000dfb2d000
> ACPI: OEM1 (v001 INTEL S5000PSL 0x00000001 INTL 0x01000013) @
> 0x00000000dfb2c000
> ACPI: SSDT (v002 INTEL EIST 0x00004000 INTL 0x01000013) @
> 0x00000000dfb2b000
> ACPI: SSDT (v002 INTEL IPMI 0x00004000 INTL 0x01000013) @
> 0x00000000dfb2a000
> ACPI: DSDT (v002 INTEL S5000PSL 0x00000001 INTL 0x01000013) @
> 0x0000000000000000
> ACPI: Local APIC address 0xfee00000
> ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> ACPI: LAPIC (acpi_id[0x01] lapic_id[0x06] enabled)
> ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
> ACPI: LAPIC (acpi_id[0x03] lapic_id[0x07] enabled)
> ACPI: LAPIC (acpi_id[0x04] lapic_id[0x84] disabled)
> ACPI: LAPIC (acpi_id[0x05] lapic_id[0x85] disabled)
> ACPI: LAPIC (acpi_id[0x06] lapic_id[0x86] disabled)
> ACPI: LAPIC (acpi_id[0x07] lapic_id[0x87] disabled)
> ACPI: LAPIC_NMI (acpi_id[0x00] high level lint[0x1])
> ACPI: LAPIC_NMI (acpi_id[0x01] high level lint[0x1])
> ACPI: LAPIC_NMI (acpi_id[0x02] high level lint[0x1])
> ACPI: LAPIC_NMI (acpi_id[0x03] high level lint[0x1])
> ACPI: LAPIC_NMI (acpi_id[0x04] high level lint[0x1])
> ACPI: LAPIC_NMI (acpi_id[0x05] high level lint[0x1])
> ACPI: LAPIC_NMI (acpi_id[0x06] high level lint[0x1])
> ACPI: LAPIC_NMI (acpi_id[0x07] high level lint[0x1])
> ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
> IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
> ACPI: IOAPIC (id[0x09] address[0xfec80000] gsi_base[24])
> IOAPIC[1]: apic_id 9, version 32, address 0xfec80000, GSI 24-47
> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> ACPI: IRQ0 used by override.
> ACPI: IRQ2 used by override.
> ACPI: IRQ9 used by override.
> Setting APIC routing to xen
> Using ACPI (MADT) for SMP configuration information
> Allocating PCI resources starting at f1000000 (gap: f0000000:fe00000)
> Built 1 zonelists. Total pages: 2031735
> Kernel command line: root=/dev/sda1 ro console=tty0
> Initializing CPU#0
> PID hash table entries: 4096 (order: 12, 32768 bytes)
> Xen reported: 1861.973 MHz processor.
> Console: colour VGA+ 80x25
> Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
> Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
> Software IO TLB enabled:
> Aperture: 64 megabytes
> Kernel range: ffff88000b059000 - ffff88000f059000
> Address size: 27 bits
> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> Memory: 7876732k/8126940k available (1999k kernel code, 241380k reserved,
> 890k data, 148k init)
> Calibrating delay using timer specific routine.. 3727.86 BogoMIPS
> (lpj=7455721)
> Security Framework v1.0.0 initialized
> SELinux: Disabled at boot.
> Capability LSM initialized
> Mount-cache hash table entries: 256
> CPU: L1 I cache: 32K, L1 D cache: 32K
> CPU: L2 cache: 4096K
> CPU: Physical Processor ID: 0
> CPU: Processor Core ID: 0
> SMP alternatives: switching to UP code
> ACPI: Core revision 20060707
> SMP alternatives: switching to SMP code
> Initializing CPU#1
> Initializing CPU#2
> Brought up 4 CPUs
> Initializing CPU#3
> migration_cost=11114
> checking if image is initramfs... it is
> Freeing initrd memory: 15418k freed
> NET: Registered protocol family 16
> ACPI: bus type pci registered
> PCI: Using MMCONFIG at e0000000
> ACPI: Interpreter enabled
> ACPI: Using IOAPIC for interrupt routing
> ACPI: PCI Root Bridge [PCI0] (0000:00)
> PCI: Probing PCI hardware (bus 00)
> ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
> PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
> Boot video device is 0000:0e:0c.0
> PCI: Transparent bridge - 0000:00:1e.0
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PC32._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX0._PRT]
> ACPI: PCI Interrupt Link [LNKA] (IRQs 5 7 *10 11)
> ACPI: PCI Interrupt Link [LNKB] (IRQs 5 7 10 *11)
> ACPI: PCI Interrupt Link [LNKC] (IRQs *5 7 10 11)
> ACPI: PCI Interrupt Link [LNKD] (IRQs 5 7 10 *11)
> ACPI: PCI Interrupt Link [LNKE] (IRQs 5 7 *10 11)
> ACPI: PCI Interrupt Link [LNKF] (IRQs 5 7 10 *11)
> ACPI: PCI Interrupt Link [LNKG] (IRQs *5 7 10 11)
> ACPI: PCI Interrupt Link [LNKH] (IRQs 5 7 10 *11)
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE.PCIX._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE.PCIW._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE.PCIW.PCIO._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE.PCIW.PCIQ._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIF._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIG._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIH._PRT]
> Linux Plug and Play Support v0.97 (c) Adam Belay
> pnp: PnP ACPI init
> pnp: ACPI device : hid PNP0A03
> pnp: ACPI device : hid PNP0C02
> pnp: ACPI device : hid PNP0200
> pnp: ACPI device : hid PNP0B00
> pnp: ACPI device : hid PNP0C04
> pnp: ACPI device : hid PNP0800
> pnp: ACPI device : hid PNP0C02
> pnp: ACPI device : hid PNP0F03
> pnp: ACPI device : hid PNP0303
> pnp: ACPI device : hid PNP0501
> pnp: ACPI device : hid PNP0103
> pnp: ACPI device : hid PNP0003
> pnp: ACPI device : hid IPI0001
> pnp: PnP ACPI: found 13 devices
> xen_mem: Initialising balloon driver.
> usbcore: registered new driver usbfs
> usbcore: registered new driver hub
> [...]
>
> --Ralph
>
>>
>> -- Keir
>>
>> On 15/1/08 03:05, "Ralph Passgang" <xen@xxxxxxxxxxxxx> wrote:
>>> Hi Keir,
>>>
>>> thanks, but it seems that I have just bad news for you.
>>>
>>> now xen 3.1 (cs 15598) compiles on i386 and amd64 but at least on amd64
>>> the hypervisor hangs at boot. I haven't tested i386 (besides that it
>>> compiles) yet. The last known working changeset for the amd64
>>> hypervisor was 15590.
>>>
>>> The most important part of the error message from the hypervisor:
>>>
>>> Xen call trace:
>>> [<ffff828c8015f00c>] vmx_init_vmcs_config+0x1bc/0x1f0
>>> [<ffff828c80160430>] start_vmx+0x70/0x260
>>> [<ffff828c80146509>] identify_cpu+0xa8/0x200
>>> [<ffff828c801bbce6>] __start_xen+0x1ff6/0x24f0
>>> [<ffff828c801000b5>] __high_start+0xa1/0xa3
>>>
>>> Panic on CPU 0:
>>> Xen Bug at vmcs.c:159
>>>
>>> If more information is needed, let me know.
>>>
>>> Ralph
>>>
>>> On Monday, 14 January 2008 18:19:24, Keir Fraser wrote:
>>>> It was stuck in the staging tree, which I've now pushed manually.
>>>>
>>>> K.
>>>>
>>>> On 14/1/08 17:14, "Ralph Passgang" <xen@xxxxxxxxxxxxx> wrote:
>>>>> Hi Keir,
>>>>>
>>>>> xen-3.1-testing.hg still doesn't compile on i386 for the same reason
>>>>> as in my original report. I know that 3.2 is more important, but it
>>>>> would be nice if the 3.1 branch could get fixed.
>>>>>
>>>>> thx,
>>>>> Ralph
>>>>>
>>>>> On Friday, 11 January 2008 01:54:16, you wrote:
>>>>>> We're missing xen-unstable:15526 from 3.1-testing. I'll add it
>>>>>> tomorrow.
>>>>>>
>>>>>> Thanks,
>>>>>> Keir
>>>>>>
>>>>>> On 11/1/08 00:40, "Ralph Passgang" <xen@xxxxxxxxxxxxx> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I just found that xen 3.1-testing changeset 15577 fails to build on
>>>>>>> i386 on debian sid/lenny/etch. It seems to compile just fine on
>>>>>>> amd64/etch.
>>>>>>>
>>>>>>> The following happens compiling the 32bit version:
>>>>>>> [...]
>>>>>>> gcc -O2 -fomit-frame-pointer -m32 -march=i686 -DNDEBUG
>>>>>>> -fno-strict-aliasing -std=gnu99 -Wall -Wstrict-prototypes
>>>>>>> -Wno-unused-value
>>>>>>> -Wdeclaration-after-statement -nostdinc -fno-builtin -fno-common
>>>>>>> -iwithprefix include -Werror -Wno-pointer-arith -pipe
>>>>>>> -I/tmp/buildd/xen-3.1-3.1.3-0/debian/build/build-hypervisor_i386_i386/xen/include
>>>>>>> -I/tmp/buildd/xen-3.1-3.1.3-0/debian/build/build-hypervisor_i386_i386/xen/include/asm-x86/mach-generic
>>>>>>> -I/tmp/buildd/xen-3.1-3.1.3-0/debian/build/build-hypervisor_i386_i386/xen/include/asm-x86/mach-default -msoft-float
>>>>>>> -fno-stack-protector -g -D__XEN__ -DACM_SECURITY -c
>>>>>>> vmx.c -o vmx.o
>>>>>>> cc1: warnings being treated as errors
>>>>>>> vmx.c: In function 'vmx_install_vlapic_mapping':
>>>>>>> vmx.c:2694: warning: right shift count >= width of type
>>>>>>> vmx.c:2695: warning: right shift count >= width of type
>>>>>>> make[8]: *** [vmx.o] Error 1
>>>>>>> make[8]: Leaving directory
>>>>>>> `/tmp/buildd/xen-3.1-3.1.3-0/debian/build/build-hypervisor_i386_i386/xen/arch/x86/hvm/vmx'
>>>>>>> make[7]: *** [vmx/built_in.o] Error 2
>>>>>>> [...]
>>>>>>>
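For context on the warning itself: gcc reports "right shift count >= width of type" when a value is shifted by at least as many bits as its type holds, which happens on i386 if a 32-bit `unsigned long` is shifted right by 32. The sketch below only illustrates that class of problem and one common way to avoid it. It assumes nothing about what vmx.c lines 2694 and 2695 actually contain, it is not the fix from any changeset, and the function names are hypothetical.

```c
#include <stdint.h>

/*
 * Upper 32 bits of an address held in an unsigned long.
 * Writing "addr >> 32" directly triggers the warning where unsigned long
 * is 32 bits wide; widening to 64 bits first keeps the shift count below
 * the operand width on both 32-bit and 64-bit builds.
 */
uint32_t addr_high32(unsigned long addr)
{
    return (uint32_t)((uint64_t)addr >> 32);
}

/* Lower 32 bits of the same address. */
uint32_t addr_low32(unsigned long addr)
{
    return (uint32_t)addr;
}
```

On a 32-bit build addr_high32() always returns 0, which is the well-defined result one would expect there. On a 64-bit build it returns the real upper half.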
>>>>>>> The last changeset that I tried and that compiled on i386 was
>>>>>>> 15564. So the last 13 changesets could be the cause. The following
>>>>>>> 4 changesets changed vmx.c, so most likely one of them causes
>>>>>>> this (but I haven't looked any further):
>>>>>>>
>>>>>>> 15565, 15567, 15571, 15575
>>>>>>>
>>>>>>> Would be great if someone could take a look...
>>>>>>>
>>>>>>> Thx,
>>>>>>> Ralph
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Xen-devel mailing list
>>>>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>>>>>> http://lists.xensource.com/xen-devel