xen-users

[Xen-users] Debugging a XenU that goes to Zombie state

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] Debugging a XenU that goes to Zombie state
From: Russell McOrmond <russell@xxxxxxxx>
Date: Mon, 24 Jul 2006 11:24:01 -0400

Once last night and once the night before, a XenU went into a Zombie state, requiring a reboot. I'm not quite sure what is happening, and am looking for advice on how to diagnose the problem.

.. While writing this email, it crashed again. This time I had an 'xm console calcutta' capturing the output.

I read a suggestion in these archives that I could just restart xend to get things working again, but trying that gives me:

Going to boot Fedora Core (2.6.17-1.2145_FC5xenU)
  kernel: /boot/vmlinuz-2.6.17-1.2145_FC5xenU
  initrd: /boot/initrd-2.6.17-1.2145_FC5xenU.img
Error: Device 0 (vif) could not be connected. Hotplug scripts not working.

So I had to reboot everything.


Here is what I captured from the 'xm console':


BUG: unable to handle kernel NULL pointer dereference at virtual address 0000009a
 printing eip:
e10fd1ad
*pde = ma 08f98067 pa 17077067
*pte = ma 00000000 pa fffff000
Oops: 0002 [#1]
SMP
Modules linked in: ipv6 xennet ipt_REJECT xt_tcpudp iptable_filter ipt_MASQUERADE iptable_nat ip_nat ip_conntrack nfnetlink ip_tables x_tables dm_mirror dm_mod
CPU:    0
EIP:    0061:[<e10fd1ad>]    Not tainted VLI
EFLAGS: 00010046   (2.6.17-1.2157_FC5xenU #1)
EIP is at network_tx_buf_gc+0xc4/0x1b7 [xennet]
eax: 00000011   ebx: 0000000c   ecx: d9fc8cfc   edx: 00000000
esi: 00000001   edi: d9fc8400   ebp: 0000000a   esp: c0651d90
ds: 007b   es: 007b   ss: 0069
Process swapper (pid: 0, threadinfo=c0650000 task=c05f1800)
Stack: <0>d9fc8cfc 00000000 00000000 00000004 d9fc8000 0000f002 0000f003 0000effc
       00000000 d9fc8488 d9fc8400 d9fc8000 e10fe150 dba603e0 00000000 00000000
       00000108 c043a57d 00000108 d9fc8000 c0651e3c c0651e3c 00000108 c0643800
Call Trace:
 <e10fe150> netif_int+0x24/0x66 [xennet] <c043a57d> handle_IRQ_event+0x42/0x85
 <c043a64d> __do_IRQ+0x8d/0xdc  <c040665a> do_IRQ+0x1a/0x25
 <c0519efd> evtchn_do_upcall+0x66/0x9f <c0404d79> hypervisor_callback+0x3d/0x48
 <e10fd9ca> network_alloc_rx_buffers+0x2c3/0x30b [xennet] <e10fe9ac> netif_poll+0x639/0x784 [xennet]
 <c055a3c5> net_rx_action+0xcd/0x1fe  <c041d5bb> __do_softirq+0x70/0xef
 <c041d67a> do_softirq+0x40/0x67  <c040665f> do_IRQ+0x1f/0x25
 <c0519efd> evtchn_do_upcall+0x66/0x9f <c0404d79> hypervisor_callback+0x3d/0x48
 <c0407a6a> safe_halt+0x84/0xa7  <c0402bde> xen_idle+0x46/0x4e
 <c0402cfd> cpu_idle+0x94/0xad  <c0655772> start_kernel+0x346/0x34c
Code: b4 9f 00 09 00 00 50 e8 9d d5 41 df c7 84 9f 00 09 00 00 00 00 00 00 8b 87 f4 00 00 00 89 84 9f f4 00 00 00 89 9f f4 00 00 00 90 <ff> 8d 90 00 00 00 0f 94 c0 83 c4 10 84 c0 74 62 bb 00 e0 ff ff
EIP: [<e10fd1ad>] network_tx_buf_gc+0xc4/0x1b7 [xennet] SS:ESP 0069:c0651d90
 <0>Kernel panic - not syncing: Fatal exception in interrupt
[root@westbengal ~]#
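
For anyone comparing several such captures, a small script can pull out the faulting function and the call chain. The following is only a rough sketch, not part of the original report: the function names clean_console_capture and summarize_oops are made up for illustration, and the regular expressions assume the 2.6.17-era i386 oops layout shown above, where a raw capture may still carry carriage-return residue (real "\r" characters or a literal "^M").

```python
import re

# Hypothetical helper names; the patterns assume the oops layout shown above.
FRAME_RE = re.compile(
    r"<([0-9a-f]{8})>[ \t]+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:[ \t]+\[(\w+)\])?"
)
EIP_RE = re.compile(
    r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:[ \t]+\[(\w+)\])?"
)


def clean_console_capture(text):
    """Drop carriage-return residue: real '\\r' characters and literal '^M'."""
    return text.replace("\r", "").replace("^M", "")


def summarize_oops(text):
    """Return the faulting function, its module, and the call-trace frames."""
    text = clean_console_capture(text)
    eip = EIP_RE.search(text)
    # Each frame is (function name, module name or None for core kernel code).
    frames = [(m.group(2), m.group(3)) for m in FRAME_RE.finditer(text)]
    return {
        "faulting_function": eip.group(1) if eip else None,
        "module": eip.group(2) if eip else None,
        "frames": frames,
    }


# A few lines copied from the capture above, with the '^M' residue left in.
SAMPLE = (
    "^MEIP is at network_tx_buf_gc+0xc4/0x1b7 [xennet]\n"
    "^MCall Trace:\n"
    "^M <e10fe150> netif_int+0x24/0x66 [xennet] <c043a57d> handle_IRQ_event+0x42/0x85\n"
    "^M <c043a64d> __do_IRQ+0x8d/0xdc  <c040665a> do_IRQ+0x1a/0x25\n"
)

if __name__ == "__main__":
    summary = summarize_oops(SAMPLE)
    print(summary["faulting_function"], summary["module"])
    for function, module in summary["frames"]:
        print("  ", function, module or "(kernel)")
```

Run against the full capture, this kind of summary makes it easy to see which frames belong to a loadable module and which to the core kernel.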



---cut--- Notes from before most recent crash to give machine context.


For various reasons this is the only XenU currently running on the machine, so I don't know whether other XenUs would have died if they had been on the same machine.

I'm running Fedora Core 5 on both the Xen0 and the XenU, on a dual-core Athlon box with 2G RAM.

powernow-k8: Found 2 AMD Athlon 64 / Opteron processors (version 1.60.2)



I created this XenU a few weeks ago by tar'ing up a server that wasn't running Xen and extracting it onto some LVM partitions. It ran fine for a while, and I suspect that something I upgraded (via yum) went bad, but I wanted to ask whether anyone else has seen anything unusual before I try backing out all recent changes to find out what happened.


Very boring config for XenU:

# FLORA.org server
name = "calcutta"
memory = "512"

# was 'phy:hdb,hda,w',
disk = [ 'phy:mapper/XenImages-CalcuttaSlash,hda1,w',
         'phy:mapper/XenImages-CalcuttaHome,hda2,w',
         'phy:mapper/XenSwap-CalcuttaSWP,hda3,w' ]
vif = [ 'mac=00:16:3e:5c:76:da' ]
bootloader="/usr/bin/pygrub"

on_reboot   = 'restart'
on_crash    = 'restart'



I don't know everything Red Hat has patched in 2.6.17, but the following are the relevant Red Hat versions: 2.6.17-1.2145_FC5xenU and 2.6.17-1.2157_FC5xenU. I don't think the problem is with the kernel.

July 11 is when I switched to Xen, and it worked until yesterday morning, when I had to force a reboot. I upgraded various packages (to the latest versions via yum) on July 11, 17, and 18, and again after the first crash on July 23. (Yum helpfully logs which packages it updates or installs.)

The following are the last 3 restarts (read from /var/log/messages*, which agrees with my memory of things).

Jul 17 16:55:10 calcutta kernel: Linux version 2.6.17-1.2145_FC5xenU (brewbuilder@xxxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)) #1 SMP Sat Jul 1 13:54:07 EDT 2006

Note: The yum updates of July 17 and 18 came after this reboot, which is why I'm fairly sure the problem isn't with the kernel.

Then a reboot after a Zombie:

Jul 23 09:07:55 calcutta kernel: Linux version 2.6.17-1.2157_FC5xenU (brewbuilder@xxxxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)) #1 SMP Wed Jul 12 00:46:43 EDT 2006

I then did a yum update 'just in case' something had been fixed.

And another reboot this morning:

Jul 24 09:28:07 calcutta kernel: Linux version 2.6.17-1.2157_FC5xenU (brewbuilder@xxxxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)) #1 SMP Wed Jul 12 00:46:43 EDT 2006


My top suspects, being packages that touch the system in ways that might crash the kernel:

Jul 17 18:05:17 calcutta yum: Updated: glibc-common.i386 2.4-8
Jul 17 18:05:25 calcutta yum: Updated: glibc.i386 2.4-8
Jul 17 18:05:26 calcutta yum: Updated: glibc-headers.i386 2.4-8
Jul 17 18:05:26 calcutta yum: Updated: glibc-devel.i386 2.4-8
Jul 17 18:05:27 calcutta yum: Updated: glibc-utils.i386 2.4-8
Jul 17 18:14:04 calcutta yum: Updated: procps.i386 3.2.6-3.5
Jul 17 18:14:05 calcutta yum: Updated: psmisc.i386 22.2-1.1

Jul 18 11:26:24 calcutta yum: Updated: libsepol.i386 1.12.17-1.fc5
Jul 18 11:26:24 calcutta yum: Updated: libselinux.i386 1.30.3-4.fc5
Jul 18 11:26:24 calcutta yum: Updated: libselinux-python.i386 1.30.3-4.fc5
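
The same correlation of reboots and package updates can be done mechanically. This is just a sketch, not something that was run for this report: the names build_timeline and updates_by_boot are made up for illustration, and it assumes syslog lines shaped like the excerpts above (the "Installed:" tag is an assumption based on how yum usually logs installs). It groups each yum entry under the kernel that was booted most recently before it.

```python
import re

# Patterns assume syslog lines shaped like the /var/log/messages excerpts above.
STAMP = r"(\w{3}\s+\d+ [\d:]{8})"
BOOT_RE = re.compile(r"^" + STAMP + r" \S+ kernel: Linux version (\S+)")
YUM_RE = re.compile(r"^" + STAMP + r" \S+ yum: (?:Updated|Installed): (\S+) (\S+)")


def build_timeline(lines):
    """Return (timestamp, kind, detail) tuples for boots and yum changes."""
    events = []
    for line in lines:
        boot = BOOT_RE.match(line)
        if boot:
            events.append((boot.group(1), "boot", boot.group(2)))
            continue
        yum = YUM_RE.match(line)
        if yum:
            events.append((yum.group(1), "yum", yum.group(2) + " " + yum.group(3)))
    return events


def updates_by_boot(events):
    """Group yum changes under the kernel version that was running at the time."""
    groups = []
    for _stamp, kind, detail in events:
        if kind == "boot":
            groups.append((detail, []))
        elif groups:  # ignore yum entries logged before the first boot line seen
            groups[-1][1].append(detail)
    return groups


# Lines abbreviated from the log excerpts above (kernel build details trimmed).
SAMPLE = [
    "Jul 17 16:55:10 calcutta kernel: Linux version 2.6.17-1.2145_FC5xenU #1 SMP",
    "Jul 17 18:05:25 calcutta yum: Updated: glibc.i386 2.4-8",
    "Jul 18 11:26:24 calcutta yum: Updated: libselinux.i386 1.30.3-4.fc5",
    "Jul 23 09:07:55 calcutta kernel: Linux version 2.6.17-1.2157_FC5xenU #1 SMP",
]

if __name__ == "__main__":
    for kernel, packages in updates_by_boot(build_timeline(SAMPLE)):
        print(kernel)
        for package in packages:
            print("   ", package)
```

Feeding it the concatenated /var/log/messages* files in chronological order would give one group per boot, which is the view used above to decide which updates landed between a good boot and the first crash.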


--
 Russell McOrmond, Internet Consultant: <http://www.flora.ca/>
 Please help us tell the Canadian Parliament to protect our property
 rights as owners of Information Technology. Sign the petition!
 http://www.digital-copyright.ca/petition/ict/
