Re: [Xen-users] kernel crash in xen domain

To:	Csillag Kristof <csillag.kristof@xxxxxxxxx>
Subject:	Re: [Xen-users] kernel crash in xen domain
From:	Bruce Edge <bruce.edge@xxxxxxxxx>
Date:	Wed, 22 Sep 2010 10:37:10 -0700
Cc:	xen-users@xxxxxxxxxxxxxxxxxxx
Delivery-date:	Wed, 22 Sep 2010 10:38:41 -0700
In-reply-to:	<4C32F5E4.80508@xxxxxxxxx>
References:	<4C32F5E4.80508@xxxxxxxxx>
Sender:	xen-users-bounces@xxxxxxxxxxxxxxxxxxx

2010/7/6 Csillag Kristof <csillag.kristof@xxxxxxxxx>

Hi all,

I just had a kernel crash on a Xen DomU, on a server that had been
running for 4 days.

(Current uprecord is 248 days, so this is not expected, but then again,
that was before I upgraded from (XEN 3.2 / kernel 2.6.26) to
(XEN 4.0 / kernel 2.6.32).)

* * *

I run Xen hypervisor version 4.0.0 (Debian 4.0.0-2), and
linux kernel 2.6.32-5-xen-amd64 (debian: 2.6.32-15) on both the Dom0 and
the (PV) DomU.

(The DomU is running a XEN kernel because I have a PCI NIC passed to it,
and the current Debian 2.6.32 pv_ops kernel does not contain the required
pcifront driver.)

Here is what the DomU kernel has said, copied from the output of "xm
console":

------------------

[403163.914167] ------------[ cut here ]------------
[403163.914186] kernel BUG at /build/buildd-linux-2.6_2.6.32-15-i386-fb7Hfg/linux-2.6-2.6.32/debian/build/source_i386_xen/mm/slub.c:2969!
[403163.914205] invalid opcode: 0000 [#1] SMP
[403163.914222] last sysfs file: /sys/devices/virtual/net/ppp0/uevent
[403163.914236] Modules linked in: tun xt_limit nf_nat_irc nf_nat_ftp
ipt_LOG ipt_MASQUERADE xt_DSCP ipt_REJECT nf_conntrack_irc
nf_conntrack_ftp xt_state xt_TCPMSS xt_tcpmss xt_tcpudp pppoe pppox
ppp_generic slhc sundance iptable_nat nf_nat nf_conntrack_ipv4
nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_filter ip_tables
x_tables dm_snapshot dm_mirror dm_region_hash dm_log dm_mod loop evdev
snd_pcsp snd_pcm snd_timer snd xen_netfront soundcore snd_page_alloc
ext3 jbd mbcache thermal_sys xen_blkfront mii [last unloaded: sundance]
[403163.914455]
[403163.914465] Pid: 0, comm: swapper Not tainted (2.6.32-5-xen-686 #1)
[403163.914478] EIP: 0061:[<c10b73ec>] EFLAGS: 00010246 CPU: 0
[403163.914492] EIP is at kfree+0x69/0xde
[403163.914502] EAX: 40000000 EBX: c1c56a80 ECX: c145942c EDX: c1575c40
[403163.914514] ESI: c2262000 EDI: c11f09eb EBP: c138f9c8 ESP: c1381ed4
[403163.914527] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[403163.914539] Process swapper (pid: 0, ti=c1380000 task=c13c0ba0 task.ti=c1380000)
[403163.914552] Stack:
[403163.914560] c1575c40 c8829f1d c13bf520 c13bf520 c1c56a80 c163a654
00000000 c138f9c8
[403163.914597] <0> c11f09eb 00000000 c11f5825 c13bf520 c1380000
00000002 00000008 c138f9c8
[403163.914636] <0> c103d004 c1457408 00000001 0000000a 00000000
00000100 c1380000 00000000
[403163.914680] Call Trace:
[403163.914700] [<c8829f1d>] ? xennet_interrupt+0x4d/0x57 [xen_netfront]
[403163.914717] [<c11f09eb>] ? __kfree_skb+0xf/0x6e
[403163.914732] [<c11f5825>] ? net_tx_action+0x58/0xf9
[403163.914748] [<c103d004>] ? __do_softirq+0xaa/0x151
[403163.914762] [<c103d0dc>] ? do_softirq+0x31/0x3c
[403163.914776] [<c103d1b2>] ? irq_exit+0x26/0x58
[403163.914791] [<c1199636>] ? xen_evtchn_do_upcall+0x22/0x2c
[403163.914816] [<c1009b7f>] ? xen_do_upcall+0x7/0xc
[403163.914832] [<c10023a7>] ? hypercall_page+0x3a7/0x1001
[403163.914847] [<c1006169>] ? xen_safe_halt+0xf/0x1b
[403163.914861] [<c10042bf>] ? xen_idle+0x23/0x30
[403163.914875] [<c1008168>] ? cpu_idle+0x89/0xa5
[403163.914890] [<c13f980d>] ? start_kernel+0x318/0x31d
[403163.914905] [<c13fb3c3>] ? xen_start_kernel+0x615/0x61c
[403163.914920] [<c1409045>] ? efi_init+0xb4/0x580
[403163.914930] Code: 86 00 00 00 40 c1 e8 0c c1 e0 05 01 d0 89 04 24 66
83 38 00 79 06 8b 40 0c 89 04 24 8b 14 24 8b 02 84 c0 78 19 66 a9 00 c0
75 04 <0f> 0b eb fe 8b 04 24 83 c4 10 5b 5e 5f 5d e9 c7 e9 fd ff 8b 04
[403163.915175] EIP: [<c10b73ec>] kfree+0x69/0xde SS:ESP 0069:c1381ed4
[403163.915201] ---[ end trace c5944bb691c7520c ]---
[403163.915212] Kernel panic - not syncing: Fatal exception in interrupt
[403163.915225] Pid: 0, comm: swapper Tainted: G D 2.6.32-5-xen-686 #1
[403163.915237] Call Trace:
[403163.915247] [<c128c4e1>] ? panic+0x38/0xe4
[403163.915261] [<c100bf56>] ? oops_end+0x91/0x9d
[403163.915275] [<c100a0d3>] ? do_invalid_op+0x0/0x75
[403163.915288] [<c100a13f>] ? do_invalid_op+0x6c/0x75
[403163.915301] [<c10b73ec>] ? kfree+0x69/0xde
[403163.915315] [<c12075a5>] ? sch_direct_xmit+0x69/0x10c
[403163.915329] [<c11f8095>] ? dev_queue_xmit+0x260/0x38e
[403163.915343] [<c103d1fa>] ? _local_bh_enable_ip+0x16/0x6e
[403163.915357] [<c11f8191>] ? dev_queue_xmit+0x35c/0x38e
[403163.915371] [<c1021d2e>] ? pvclock_clocksource_read+0x48/0xa7
[403163.915387] [<c10060e8>] ? xen_force_evtchn_callback+0xc/0x10
[403163.915401] [<c10060e8>] ? xen_force_evtchn_callback+0xc/0x10
[403163.915416] [<c128e1d3>] ? error_code+0x73/0x78
[403163.915429] [<c11f09eb>] ? __kfree_skb+0xf/0x6e
[403163.915442] [<c10b73ec>] ? kfree+0x69/0xde
[403163.915461] [<c8829f1d>] ? xennet_interrupt+0x4d/0x57 [xen_netfront]
[403163.915476] [<c11f09eb>] ? __kfree_skb+0xf/0x6e
[403163.915489] [<c11f5825>] ? net_tx_action+0x58/0xf9
[403163.915503] [<c103d004>] ? __do_softirq+0xaa/0x151
[403163.915517] [<c103d0dc>] ? do_softirq+0x31/0x3c
[403163.915530] [<c103d1b2>] ? irq_exit+0x26/0x58
[403163.915543] [<c1199636>] ? xen_evtchn_do_upcall+0x22/0x2c
[403163.915556] [<c1009b7f>] ? xen_do_upcall+0x7/0xc
[403163.915570] [<c10023a7>] ? hypercall_page+0x3a7/0x1001
[403163.915584] [<c1006169>] ? xen_safe_halt+0xf/0x1b
[403163.915597] [<c10042bf>] ? xen_idle+0x23/0x30
[403163.915609] [<c1008168>] ? cpu_idle+0x89/0xa5
[403163.915623] [<c13f980d>] ? start_kernel+0x318/0x31d
[403163.915637] [<c13fb3c3>] ? xen_start_kernel+0x615/0x61c
[403163.915650] [<c1409045>] ? efi_init+0xb4/0x580

------------------

Meanwhile, the Dom0 kernel has said this:

-----------------
[407187.550176] irq 17: nobody cared (try booting with the "irqpoll" option)
[407187.550217] Pid: 1940, comm: xend Tainted: G W 2.6.32-5-xen-amd64 #1
[407187.550253] Call Trace:
[407187.550279] <IRQ> [<ffffffff810972dd>] ? __report_bad_irq+0x30/0x7d
[407187.550324] [<ffffffff8109742f>] ? note_interrupt+0x105/0x16e
[407187.550359] [<ffffffff81097b36>] ? handle_level_irq+0x80/0xc3
[407187.550394] [<ffffffff811f1a58>] ? __xen_evtchn_do_upcall+0xe1/0x167
[407187.550430] [<ffffffff811f22e5>] ? xen_evtchn_do_upcall+0x2e/0x42
[407187.550430] [<ffffffff81012cfe>] ? xen_do_hypervisor_callback+0x1e/0x30
[407187.550430] <EOI>
[407187.550430] handlers:
[407187.550430] [<ffffffffa00c484e>] (piix_interrupt+0x0/0x192 [ata_piix])
[407187.550430] [<ffffffffa00c484e>] (piix_interrupt+0x0/0x192 [ata_piix])
[407187.550430] Disabling IRQ #17

-----------------

The mentioned IRQ #17 belongs to the passed-through PCI NIC.
(I am not using IOMMU, since my MB does not support it.)
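
One way to double-check which drivers share a given IRQ line is to look up its row in /proc/interrupts. The following is only a minimal sketch: the function name find_irq_line and the sample text are made up for illustration (they are not output from this machine), and the exact column layout of /proc/interrupts differs between kernels.

```python
def find_irq_line(interrupts_text, irq):
    """Return the /proc/interrupts-style row for `irq`, or None if absent."""
    prefix = f"{irq}:"
    for line in interrupts_text.splitlines():
        stripped = line.strip()
        # Rows start with "<irq>:"; the trailing colon keeps 17 from matching 170.
        if stripped.startswith(prefix):
            return stripped
    return None


# Hypothetical sample text, NOT taken from the machine in this report.
sample = """\
           CPU0
 16:        120   Phys-irq  uhci_hcd:usb1
 17:      98231   Phys-irq  ata_piix, pciback
"""

print(find_irq_line(sample, 17))
```

On a live system the text would come from reading /proc/interrupts in the Dom0 instead of the sample string.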

I rebooted the Dom0 (using xm reset), but the passed-through NIC
never worked again, so eventually I had to reboot the whole physical
machine.

* * *

Any idea what could cause this?

Thank you for your help:

Kristof Csillag

At the risk of just claiming "me too", I would like to second this report of many such errors:

irq 124: nobody cared (try booting with the "irqpoll" option)

It appears that any card that generates a high rate of MSI interrupts causes this message. In our case it's a Tachyon FC card.

It seems specific to pv-ops kernels, as we did not have this problem with the hvm kernel.
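
To get a feel for how often each IRQ is affected, the "nobody cared" lines can be counted per IRQ number in a saved kernel log. This is a rough sketch only: the function name count_unhandled_irqs is made up for illustration, and the sample text just reuses the two messages quoted in this thread.

```python
import re
from collections import Counter

# Matches the kernel message format quoted above, e.g. "irq 17: nobody cared".
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")


def count_unhandled_irqs(log_text):
    """Return a Counter mapping IRQ number -> count of 'nobody cared' reports."""
    counts = Counter()
    for line in log_text.splitlines():
        match = NOBODY_CARED.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Sample built from the messages quoted in this thread.
sample = """\
[407187.550176] irq 17: nobody cared (try booting with the "irqpoll" option)
[407187.550430] Disabling IRQ #17
irq 124: nobody cared (try booting with the "irqpoll" option)
"""

for irq, count in sorted(count_unhandled_irqs(sample).items()):
    print(f"irq {irq}: {count} report(s)")
```

Feeding it the saved output of dmesg instead of the sample would show whether the reports cluster on the MSI vectors of one card.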

-Bruce

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
