Re: [Xen-users] xen 3.2.1 / 2.6.18.8-xen dom0 with pci_bus_probe_wrapper
 
Scott Garron wrote:
 
Zoltan HERPAI wrote:
> I'm running Ubuntu 8.04.1 on an Asus M2N-E mainboard, latest BIOS,
> 64-bit userland

I've also wrestled with this issue for about 36 hours. I'm running
Debian testing (lenny/sid) on a Supermicro X7DBE+ motherboard (Intel
5000P chipset). It currently has a single quad-core Xeon E5345 CPU
(2.33GHz) and 4GB of RAM.

The 64-bit userland consists of gcc-4.3.1-2_amd64 (x86_64-linux-gnu
target, posix thread model) and libc6-2.7-10_amd64.

In my case, the machine gets partway through the init process, and
while it is starting some of the more involved network services, such
as bind9 or apache2, the kernel panics and the machine halts.

While trying to figure out why it was doing that, I tried reverting
to the previous version I had been running. Just running ./install.sh
from dist in that tree was enough to get the machine to boot with a
Xen-enabled kernel, but because I had done an aptitude dist-upgrade,
none of the Xen utilities worked (xend start, xm list, etc.). I
cloned the older build tree and recompiled it against the latest
versions of the python and libc development libraries. That gave the
same result as the Xen 3.2.1 build: during boot, the kernel complained
about the PCI probe, and then it crashed in the middle of the init
process.

The only way I got the machine back into working order was to install
the kernel (2.6.18-xen) and Xen (3.0, changeset 15521) builds that I
had compiled with the earlier gcc and libraries (back in July 2007),
and then manually cherry-pick files from the
dist/install/usr/lib64/python/xen directory of a freshly compiled copy
of that same build tree. It's running again, but my net result was
just a dist-upgrade. I'm not running a newer kernel or Xen, which is
what I had set out to do in the first place.

Anyway, my point is this: a fresh compile of my old build tree, a tree
that previously worked, now produces the same crash, so the problem
seems to be related somehow to the version of gcc or the development
libraries I compile it with.

The two "Oops" messages I get are:
BUG: warning at /usr/src/linux-2.6.18-xen.hg/drivers/xen/core/pci.c:28/pci_bus_probe_wrapper()
 
[...]
 
--- and:
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff88214114>] :ipv6:udp_v6_get_port+0x81/0x200
PGD 19a2d067 PUD 19a2e067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: video button ac battery ppp_deflate zlib_deflate
bsd_comp ppp_async crc_ccitt ppp_generic slhc ipt_REDIRECT xt_tcpudp 
xt_multiport iptable_nat ip_nat ip_conntrack nfnetlink iptable_filter 
ip_tables x_tables ipv6 reiserfs nls_iso8859_1 nls_cp437 vfat fat 
serio_raw i2c_i801 intel_rng pcspkr i2c_core tsdev ext3 jbd dm_mirror 
dm_snapshot dm_mod sd_mod usb_storage sg sr_mod cdrom usbhid 3w_9xxx 
3c59x e1000 mii floppy ehci_hcd ata_piix libata scsi_mod uhci_hcd 
usbcore thermal processor fan
Pid: 2964, comm: named Not tainted 2.6.18.8-xen #1
RIP: e030:[<ffffffff88214114>]  [<ffffffff88214114>] :ipv6:udp_v6_get_port+0x81/0x200
RSP: e02b:ffff880019a85e38  EFLAGS: 00010297
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000008000
RDX: 0000000000000000 RSI: 0000000000008000 RDI: 0000000000008000
RBP: 000000000000001c R08: 000000000000ee48 R09: 000000000000807f
R10: 0000000000000008 R11: 0000000000000246 R12: ffff88001b71c3c0
R13: ffff880019a85ec8 R14: 000000000000001c R15: 0000000000000000
FS:  00002b17d2a5f6e0(0063) GS:ffffffff804d9000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process named (pid: 2964, threadinfo ffff880019a84000, task ffff88001f4c1100)
Stack:  0000000000000000 000000000000001c ffff88001b71c3c0 ffffffff88201a64
 0000000000000004 ffffffff80397979 ffff88001b71c3c0 ffff880019a85ed0
 0000000000000000 ffff88001b71c698 0000000019a85f54 ffff880019341400
Call Trace:
 [<ffffffff88201a64>] :ipv6:inet6_bind+0x1e6/0x2a6
 [<ffffffff80397979>] sock_getsockopt+0x2d8/0x2fa
 [<ffffffff8039554b>] sys_bind+0x76/0xa6
 [<ffffffff88211256>] :ipv6:ipv6_setsockopt+0x3a/0x84
 [<ffffffff80394ad7>] sys_setsockopt+0xa5/0xb7
 [<ffffffff8020a644>] system_call+0x68/0x6d
 [<ffffffff8020a5dc>] system_call+0x0/0x6d
Code: 48 8b 12 0f 18 0a ff c0 3d fe 7f 00 00 7e f1 48 ff c7 44 39
RIP  [<ffffffff88214114>] :ipv6:udp_v6_get_port+0x81/0x200
 RSP <ffff880019a85e38>
CR2: 0000000000000000
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

Thanks for the detailed info. So it seems we've run into a reproducible
bug, although I'm luckier in that at least my dom0 works: I was able to
get guests running, both paravirtualized and HVM, and stress-tested
them a bit; they ran fine. During your session, were you experimenting
with BIOS versions, or did you also see this on another similar box, if
you have one?
What would be the solution if I want to stay with 3.2.1? Moving forward
to 3.2.2 doesn't seem to be a viable option.
Regards,
Zoltan HERPAI
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
 