Reporting a new bug that appeared during stress tests. The scenario is
the same as reported below, with patches applied:
On 04/14/2011 10:15 AM, Konrad Rzeszutek Wilk wrote:
> On Wed, Apr 13, 2011 at 06:02:13PM -0300, Gerd Jakobovitsch wrote:
>> I'm trying to run several VMs (linux hvm, with tapdisk:aio disks at
>> a storage over nfs) on a CentOS system, using the up-to-date version
>> of xen 4.0 / kernel pvops 2.6.32.x stable. With a configuration
>> without (most of) debug activated, I can start several instances -
>> I'm running 7 of them - but shortly afterwards the system stops
>> responding. I can't find any information on this.
> First time I see it.
>> Activating several debug configuration items, among them
>> DEBUG_PAGEALLOC, I get an exception as soon as I try to start up a
>> VM. The system reboots.
With the debug options still enabled, I'm running 42 VMs - a mix of Linux
(several distros) and Windows, most of them running CPU and disk
benchmarks. After roughly 15 hours, a BUG message appeared in dmesg.
It affected xm commands, apparently in connection with one specific VM,
but xl commands still work. The VMs are still running.
# xm list
Error: (5, 'Input/output error, while reading /local/domain/33/console/vnc-port')
Usage: xm list [options] [Domain, ...]
After killing the VM that reported the error, xm commands work again.
The BUG message from dmesg:
[66007.135552] BUG: unable to handle kernel paging request at ffff8800004ca458
[66007.135567] IP: [<ffffffff8100d4ae>] xen_set_pte+0x3e/0x4b
[66007.135580] PGD 1002067 PUD 1006067 PMD 2d78067 PTE 100000004ca025
[66007.135675] Oops: 0003 [#1] SMP DEBUG_PAGEALLOC
[66007.135686] last sysfs file: /sys/class/net/virtbr/bridge/topology_change_detected
[66007.135693] CPU 4
[66007.135698] Modules linked in: arptable_filter arp_tables bridge stp bonding bnx2i libiscsi scsi_transport_iscsi cnic uio bnx2 megaraid_sas
[66007.135729] Pid: 683, comm: pageattr-test Not tainted 2.6.32.36 #7 PowerEdge M610
[66007.135735] RIP: e030:[<ffffffff8100d4ae>] [<ffffffff8100d4ae>] xen_set_pte+0x3e/0x4b
[66007.135746] RSP: e02b:ffff88007c8edbb0 EFLAGS: 00010202
[66007.135751] RAX: 0000000000e32cb6 RBX: 0000000000e32cb6 RCX: 0000000000000001
[66007.135757] RDX: 0000000000000000 RSI: 8010000800569267 RDI: ffff8800004ca458
[66007.135764] RBP: ffff88007c8edbd0 R08: 0000000000000001 R09: 0000000000000000
[66007.135770] R10: ffffffff818385f8 R11: ffffffff818385e0 R12: 8010000800569267
[66007.135776] R13: ffff8800004ca458 R14: 8010000416569067 R15: 8010000800569267
[66007.135786] FS: 00007f0eeede66e0(0000) GS:ffff88002813f000(0000) knlGS:0000000000000000
[66007.135792] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[66007.135797] CR2: ffff8800004ca458 CR3: 000000007b663000 CR4: 0000000000002660
[66007.135804] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[66007.135810] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[66007.135816] Process pageattr-test (pid: 683, threadinfo ffff88007c8ec000, task ffff88007e4ce480)
[66007.135822] Stack:
[66007.135825] 0000000000000000 8010000004569067 0000000000004569 ffff88007c8edd20
[66007.135835] <0> ffff88007c8edbe0 ffffffff81034740 ffff88007c8edbf0 ffffffff8103474d
[66007.135848] <0> ffff88007c8edcf0 ffffffff81034e77 000000017c8edc40 ffffffff818385e0
[66007.135860] Call Trace:
[66007.135868] [<ffffffff81034740>] set_pte+0x17/0x1b
[66007.135875] [<ffffffff8103474d>] set_pte_atomic+0x9/0xb
[66007.135882] [<ffffffff81034e77>] __change_page_attr_set_clr+0x186/0x82d
[66007.135936] [<ffffffff8124f4a0>] ? _raw_spin_unlock+0xab/0xb1
[66007.135951] [<ffffffff8157641f>] ? _spin_unlock+0x26/0x2a
[66007.135961] [<ffffffff810e587d>] ? vm_unmap_aliases+0x151/0x160
[66007.135969] [<ffffffff81035695>] change_page_attr_set_clr+0x177/0x360
[66007.135976] [<ffffffff8103597a>] change_page_attr_set+0x27/0x29
[66007.135983] [<ffffffff810348e2>] ? pte_flags+0x9/0x18
[66007.135990] [<ffffffff81035c01>] do_pageattr_test+0x285/0x4b1
[66007.135998] [<ffffffff8103597c>] ? do_pageattr_test+0x0/0x4b1
[66007.136097] [<ffffffff8106a9c3>] kthread+0x69/0x71
[66007.136105] [<ffffffff81013daa>] child_rip+0xa/0x20
[66007.136112] [<ffffffff81012ee6>] ? int_ret_from_sys_call+0x7/0x1b
[66007.136119] [<ffffffff81013726>] ? retint_restore_args+0x5/0x6
[66007.136127] [<ffffffff81013da0>] ? child_rip+0x0/0x20
[66007.136131] Code: e8 3c ff ff ff ff 05 b6 5c 94 00 e8 31 ff ff ff 8b 1d b3 5c 94 00 e8 a2 23 02 00 ff c8 0f 94 c0 0f b6 c0 01 d8 89 05 9e 5c 94 00 <4d> 89 65 00 41 59 5b 41 5c 41 5d c9 c3 55 48 89 e5 53 89 fb 48
[66007.136273] RIP [<ffffffff8100d4ae>] xen_set_pte+0x3e/0x4b
[66007.136281] RSP <ffff88007c8edbb0>
[66007.136285] CR2: ffff8800004ca458
[66007.136574] ---[ end trace 4e200a271895cc90 ]---
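As a reading aid for the trace above, here is a minimal Python sketch that decodes the page-fault error code from the Oops line and the low flag bits of the printed PTE value. The bit layouts are the standard x86-64 ones; the helper names (decode_error_code, decode_pte_flags) are made up for this illustration and are not part of any Xen or kernel tool.

```python
# Illustrative helpers only: names are hypothetical, bit layouts are x86-64.

# Flag bits of a 4 KiB x86-64 page table entry.
PTE_FLAGS = [
    (1 << 0, "PRESENT"),
    (1 << 1, "RW"),
    (1 << 2, "USER"),
    (1 << 3, "PWT"),
    (1 << 4, "PCD"),
    (1 << 5, "ACCESSED"),
    (1 << 6, "DIRTY"),
    (1 << 7, "PAT"),
    (1 << 8, "GLOBAL"),
    (1 << 63, "NX"),
]


def decode_error_code(code):
    """Describe the x86 page-fault error code (the number after 'Oops:')."""
    out = []
    out.append("protection violation (page present)" if code & 0x1
               else "page not present")
    out.append("write access" if code & 0x2 else "read access")
    out.append("user mode" if code & 0x4 else "kernel mode")
    if code & 0x8:
        out.append("reserved bit set")
    if code & 0x10:
        out.append("instruction fetch")
    return out


def decode_pte_flags(pte):
    """Return the names of the flag bits set in a PTE value."""
    return [name for bit, name in PTE_FLAGS if pte & bit]


if __name__ == "__main__":
    # Values copied from the dmesg output in this report.
    print(decode_error_code(0x0003))
    print(decode_pte_flags(0x100000004ca025))
```

For the values in this report, the sketch reports a kernel-mode write to a page that is present, and a PTE without the RW bit. That would be consistent with a write to a read-only mapping, but it is only a reading of the printed values, not a diagnosis.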
The errors logged in xm dmesg and xend.log are attached:
bug_paging_xend-log.txt
bug_paging_xm_dmesg.txt
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel