WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-users

Re: [Xen-users] kernel crash in xen domain

To: Csillag Kristof <csillag.kristof@xxxxxxxxx>
Subject: Re: [Xen-users] kernel crash in xen domain
From: Bruce Edge <bruce.edge@xxxxxxxxx>
Date: Wed, 22 Sep 2010 10:37:10 -0700
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
On 2010/7/6, Csillag Kristof <csillag.kristof@xxxxxxxxx> wrote:
Hi all,

I just had a kernel crash on a Xen DomU, on a server that had been running
for 4 days.

(My current uptime record is 248 days, so this was not expected; then again,
that was before I upgraded from Xen 3.2 / kernel 2.6.26 to Xen 4.0 / kernel 2.6.32.)

  * * *

I am running Xen hypervisor version 4.0.0 (Debian 4.0.0-2) and
Linux kernel 2.6.32-5-xen-amd64 (Debian: 2.6.32-15) on both the Dom0 and
the (PV) DomU.

(The DomU is running a Xen kernel because I have a PCI NIC passed through to it,
and the current Debian 2.6.32 pv_ops kernel does not contain the required
pcifront driver.)

Here is what the DomU kernel printed, copied from the output of
"xm console":

------------------

[403163.914167] ------------[ cut here ]------------
[403163.914186] kernel BUG at /build/buildd-linux-2.6_2.6.32-15-i386-fb7Hfg/linux-2.6-2.6.32/debian/build/source_i386_xen/mm/slub.c:2969!
[403163.914205] invalid opcode: 0000 [#1] SMP
[403163.914222] last sysfs file: /sys/devices/virtual/net/ppp0/uevent
[403163.914236] Modules linked in: tun xt_limit nf_nat_irc nf_nat_ftp
ipt_LOG ipt_MASQUERADE xt_DSCP ipt_REJECT nf_conntrack_irc
nf_conntrack_ftp xt_state xt_TCPMSS xt_tcpmss xt_tcpudp pppoe pppox
ppp_generic slhc sundance iptable_nat nf_nat nf_conntrack_ipv4
nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_filter ip_tables
x_tables dm_snapshot dm_mirror dm_region_hash dm_log dm_mod loop evdev
snd_pcsp snd_pcm snd_timer snd xen_netfront soundcore snd_page_alloc
ext3 jbd mbcache thermal_sys xen_blkfront mii [last unloaded: sundance]
[403163.914455]
[403163.914465] Pid: 0, comm: swapper Not tainted (2.6.32-5-xen-686 #1)
[403163.914478] EIP: 0061:[<c10b73ec>] EFLAGS: 00010246 CPU: 0
[403163.914492] EIP is at kfree+0x69/0xde
[403163.914502] EAX: 40000000 EBX: c1c56a80 ECX: c145942c EDX: c1575c40
[403163.914514] ESI: c2262000 EDI: c11f09eb EBP: c138f9c8 ESP: c1381ed4
[403163.914527]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[403163.914539] Process swapper (pid: 0, ti=c1380000 task=c13c0ba0 task.ti=c1380000)
[403163.914552] Stack:
[403163.914560]  c1575c40 c8829f1d c13bf520 c13bf520 c1c56a80 c163a654 00000000 c138f9c8
[403163.914597] <0> c11f09eb 00000000 c11f5825 c13bf520 c1380000 00000002 00000008 c138f9c8
[403163.914636] <0> c103d004 c1457408 00000001 0000000a 00000000 00000100 c1380000 00000000
[403163.914680] Call Trace:
[403163.914700]  [<c8829f1d>] ? xennet_interrupt+0x4d/0x57 [xen_netfront]
[403163.914717]  [<c11f09eb>] ? __kfree_skb+0xf/0x6e
[403163.914732]  [<c11f5825>] ? net_tx_action+0x58/0xf9
[403163.914748]  [<c103d004>] ? __do_softirq+0xaa/0x151
[403163.914762]  [<c103d0dc>] ? do_softirq+0x31/0x3c
[403163.914776]  [<c103d1b2>] ? irq_exit+0x26/0x58
[403163.914791]  [<c1199636>] ? xen_evtchn_do_upcall+0x22/0x2c
[403163.914816]  [<c1009b7f>] ? xen_do_upcall+0x7/0xc
[403163.914832]  [<c10023a7>] ? hypercall_page+0x3a7/0x1001
[403163.914847]  [<c1006169>] ? xen_safe_halt+0xf/0x1b
[403163.914861]  [<c10042bf>] ? xen_idle+0x23/0x30
[403163.914875]  [<c1008168>] ? cpu_idle+0x89/0xa5
[403163.914890]  [<c13f980d>] ? start_kernel+0x318/0x31d
[403163.914905]  [<c13fb3c3>] ? xen_start_kernel+0x615/0x61c
[403163.914920]  [<c1409045>] ? efi_init+0xb4/0x580
[403163.914930] Code: 86 00 00 00 40 c1 e8 0c c1 e0 05 01 d0 89 04 24 66
83 38 00 79 06 8b 40 0c 89 04 24 8b 14 24 8b 02 84 c0 78 19 66 a9 00 c0
75 04 <0f> 0b eb fe 8b 04 24 83 c4 10 5b 5e 5f 5d e9 c7 e9 fd ff 8b 04
[403163.915175] EIP: [<c10b73ec>] kfree+0x69/0xde SS:ESP 0069:c1381ed4
[403163.915201] ---[ end trace c5944bb691c7520c ]---
[403163.915212] Kernel panic - not syncing: Fatal exception in interrupt
[403163.915225] Pid: 0, comm: swapper Tainted: G      D    2.6.32-5-xen-686 #1
[403163.915237] Call Trace:
[403163.915247]  [<c128c4e1>] ? panic+0x38/0xe4
[403163.915261]  [<c100bf56>] ? oops_end+0x91/0x9d
[403163.915275]  [<c100a0d3>] ? do_invalid_op+0x0/0x75
[403163.915288]  [<c100a13f>] ? do_invalid_op+0x6c/0x75
[403163.915301]  [<c10b73ec>] ? kfree+0x69/0xde
[403163.915315]  [<c12075a5>] ? sch_direct_xmit+0x69/0x10c
[403163.915329]  [<c11f8095>] ? dev_queue_xmit+0x260/0x38e
[403163.915343]  [<c103d1fa>] ? _local_bh_enable_ip+0x16/0x6e
[403163.915357]  [<c11f8191>] ? dev_queue_xmit+0x35c/0x38e
[403163.915371]  [<c1021d2e>] ? pvclock_clocksource_read+0x48/0xa7
[403163.915387]  [<c10060e8>] ? xen_force_evtchn_callback+0xc/0x10
[403163.915401]  [<c10060e8>] ? xen_force_evtchn_callback+0xc/0x10
[403163.915416]  [<c128e1d3>] ? error_code+0x73/0x78
[403163.915429]  [<c11f09eb>] ? __kfree_skb+0xf/0x6e
[403163.915442]  [<c10b73ec>] ? kfree+0x69/0xde
[403163.915461]  [<c8829f1d>] ? xennet_interrupt+0x4d/0x57 [xen_netfront]
[403163.915476]  [<c11f09eb>] ? __kfree_skb+0xf/0x6e
[403163.915489]  [<c11f5825>] ? net_tx_action+0x58/0xf9
[403163.915503]  [<c103d004>] ? __do_softirq+0xaa/0x151
[403163.915517]  [<c103d0dc>] ? do_softirq+0x31/0x3c
[403163.915530]  [<c103d1b2>] ? irq_exit+0x26/0x58
[403163.915543]  [<c1199636>] ? xen_evtchn_do_upcall+0x22/0x2c
[403163.915556]  [<c1009b7f>] ? xen_do_upcall+0x7/0xc
[403163.915570]  [<c10023a7>] ? hypercall_page+0x3a7/0x1001
[403163.915584]  [<c1006169>] ? xen_safe_halt+0xf/0x1b
[403163.915597]  [<c10042bf>] ? xen_idle+0x23/0x30
[403163.915609]  [<c1008168>] ? cpu_idle+0x89/0xa5
[403163.915623]  [<c13f980d>] ? start_kernel+0x318/0x31d
[403163.915637]  [<c13fb3c3>] ? xen_start_kernel+0x615/0x61c
[403163.915650]  [<c1409045>] ? efi_init+0xb4/0x580

------------------

Meanwhile, the Dom0 kernel printed this:

-----------------
[407187.550176] irq 17: nobody cared (try booting with the "irqpoll" option)
[407187.550217] Pid: 1940, comm: xend Tainted: G        W  2.6.32-5-xen-amd64 #1
[407187.550253] Call Trace:
[407187.550279]  <IRQ>  [<ffffffff810972dd>] ? __report_bad_irq+0x30/0x7d
[407187.550324]  [<ffffffff8109742f>] ? note_interrupt+0x105/0x16e
[407187.550359]  [<ffffffff81097b36>] ? handle_level_irq+0x80/0xc3
[407187.550394]  [<ffffffff811f1a58>] ? __xen_evtchn_do_upcall+0xe1/0x167
[407187.550430]  [<ffffffff811f22e5>] ? xen_evtchn_do_upcall+0x2e/0x42
[407187.550430]  [<ffffffff81012cfe>] ? xen_do_hypervisor_callback+0x1e/0x30
[407187.550430]  <EOI>
[407187.550430] handlers:
[407187.550430] [<ffffffffa00c484e>] (piix_interrupt+0x0/0x192 [ata_piix])
[407187.550430] [<ffffffffa00c484e>] (piix_interrupt+0x0/0x192 [ata_piix])
[407187.550430] Disabling IRQ #17

-----------------

IRQ #17 belongs to the passed-through PCI NIC.
(I am not using an IOMMU, since my motherboard does not support it.)
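The "nobody cared" / "Disabling IRQ #17" pair in the Dom0 log comes from the kernel's spurious-interrupt detector. As a rough illustration, here is a minimal Python model of that heuristic, assuming the thresholds as we read them in note_interrupt() in kernel/irq/spurious.c of 2.6.32-era kernels: a window of 100,000 interrupts, a line judged "stuck" if more than 99,900 of them went unhandled, and the unhandled count restarting whenever two unhandled interrupts are more than HZ/10 (about 100 ms) apart. The names IrqDesc and note_interrupt here are illustrative only, not a kernel API, and the sketch omits the irqpoll/irqfixup paths and the re-polling timer.

```python
"""Simplified model of the Linux 2.6.32 spurious-IRQ heuristic.

Assumption: this mirrors our reading of note_interrupt() in
kernel/irq/spurious.c; it is an illustration, not the kernel code.
"""
from dataclasses import dataclass

IRQ_SAMPLE_SIZE = 100_000  # interrupts per evaluation window
UNHANDLED_LIMIT = 99_900   # more unhandled than this => line looks stuck
RESET_GAP = 0.1            # seconds (HZ/10); isolated spurious IRQs restart the count


@dataclass
class IrqDesc:
    irq: int
    irq_count: int = 0
    irqs_unhandled: int = 0
    last_unhandled: float = float("-inf")
    disabled: bool = False


def note_interrupt(desc: IrqDesc, handled: bool, now: float) -> None:
    """Account for one interrupt on `desc`; disable the line if it looks stuck."""
    if desc.disabled:
        return  # a disabled line delivers no more interrupts in this model
    if not handled:
        # An unhandled IRQ long after the previous one is treated as noise.
        if now - desc.last_unhandled > RESET_GAP:
            desc.irqs_unhandled = 1
        else:
            desc.irqs_unhandled += 1
        desc.last_unhandled = now

    desc.irq_count += 1
    if desc.irq_count < IRQ_SAMPLE_SIZE:
        return

    # End of a 100,000-interrupt window: judge the line, then start over.
    desc.irq_count = 0
    if desc.irqs_unhandled > UNHANDLED_LIMIT:
        print(f'irq {desc.irq}: nobody cared (try booting with the "irqpoll" option)')
        print(f"Disabling IRQ #{desc.irq}")
        desc.disabled = True
    desc.irqs_unhandled = 0
```

Under this model, a line is disabled only when nearly every interrupt in a window is left unclaimed by all registered handlers (on IRQ 17 above, the two ata_piix handlers); a high rate of interrupts that some driver does claim would not trip it on its own.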

I rebooted the DomU (using xm reset), but the passed-through NIC
never worked again,
so eventually I had to reboot the whole physical machine.

  * * *

Any idea what could cause this?

Thank you for your help,

   Kristof Csillag


At the risk of just saying "me too", I would like to second this report; we are seeing many errors like this one:

irq 124: nobody cared (try booting with the "irqpoll" option)

It appears that any card that generates a high rate of MSI interrupts causes this message. In our case it's a Tachyon Fibre Channel (FC) card.

It seems to be specific to pv_ops kernels, as we did not have this problem with the HVM kernel.
 
-Bruce


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
