xen-devel
[Xen-devel] I/O related (?) domU crash on xen 4.0 + pv_ops
Hey,
I'm running into an issue with a pv_ops kernel (2.6.31.13) and Xen 4.0:
my domUs are continually locking up under heavy I/O load. My issue
seems similar to what these people are reporting:
https://bugzilla.redhat.com/show_bug.cgi?id=551552
https://bugzilla.redhat.com/show_bug.cgi?id=526627
https://bugzilla.redhat.com/show_bug.cgi?id=550724
Any solutions or ideas that haven't been
covered in those reports? I've turned off the tickless kernel and have
also set the guest I/O scheduler to "noop", but the machines are still
crashing. I'm using LVM-backed block devices on the dom0. For a while, I
didn't have the kernel set to panic on a hung task, and from digging
around while the kernel was still up, I was able to determine that the
device that seems to be causing the issues is one that sees a lot of I/O
(it's receiving all of the logs on a heavily used web server).
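
For anyone trying to reproduce this setup, here is a minimal sketch of
the two runtime settings mentioned above (the guest I/O scheduler and
panic-on-hung-task). It is only an illustration: the device name "xvda"
and the helper function names are assumptions, not taken from the
affected machines, and the writes need root inside the guest. Disabling
the tickless kernel is not covered here.

```python
from pathlib import Path


def set_io_scheduler(device, scheduler="noop", sysfs_root="/sys"):
    """Select the I/O scheduler for one block device via sysfs."""
    # e.g. /sys/block/xvda/queue/scheduler ("xvda" is a hypothetical name)
    path = Path(sysfs_root) / "block" / device / "queue" / "scheduler"
    path.write_text(scheduler + "\n")
    return path


def set_hung_task_panic(enabled, proc_root="/proc"):
    """Make the hung-task detector panic the kernel (1) or only warn (0)."""
    path = Path(proc_root) / "sys" / "kernel" / "hung_task_panic"
    path.write_text("1\n" if enabled else "0\n")
    return path


if __name__ == "__main__":
    set_io_scheduler("xvda")    # assumed device name
    set_hung_task_panic(True)   # panic instead of just logging the hang
```

The root paths are parameters only so the helpers can be pointed at a
scratch directory when trying them out.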
Here are a couple of my tracebacks:
1)
INFO: task pdflush:36 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ffff8801e963f9c0 0000000000000282 00000000e4f10f56 ffff8801e963f9d0
ffff8801eb7a31b0 ffff8801eb67c240 ffff8801eb7a3590 0000000103902b94
00000000e4f10f56 ffff8801e963fa70 ffff8801e963f9b0 ffffffff811f14ec
Call Trace:
[<ffffffff811f14ec>] ? blk_unplug+0x56/0x72
[<ffffffff813f1ee0>] io_schedule+0x37/0x59
[<ffffffff8112b1a8>] ? block_sync_page+0x5b/0x71
[<ffffffff810c2e77>] sync_page+0x5a/0x72
[<ffffffff813f2237>] __wait_on_bit_lock+0x55/0xb3
[<ffffffff810c2e1d>] ? sync_page+0x0/0x72
[<ffffffff810c2b0d>] ? find_get_pages_tag+0xf7/0x144
[<ffffffff810c2dce>] __lock_page+0x71/0x8c
[<ffffffff8107569f>] ? wake_bit_function+0x0/0x51
[<ffffffff810cafe4>] write_cache_pages+0x201/0x3bf
[<ffffffff810cac23>] ? __writepage+0x0/0x5a
[<ffffffff8100ef6c>] ? xen_force_evtchn_callback+0x20/0x36
[<ffffffff8100fa6f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff810cb1d7>] generic_writepages+0x35/0x4f
[<ffffffff810cb230>] do_writepages+0x3f/0x5e
[<ffffffff811261e5>] writeback_single_inode+0x161/0x2d7
[<ffffffff811267ab>] generic_sync_sb_inodes+0x1ef/0x355
[<ffffffff810cc726>] ? pdflush+0x0/0x286
[<ffffffff8112692d>] sync_sb_inodes+0x1c/0x32
[<ffffffff811269bc>] writeback_inodes+0x79/0xdf
[<ffffffff81107819>] ? sync_supers+0xb3/0xce
[<ffffffff810cc1f6>] wb_kupdate+0xb9/0x13a
[<ffffffff810cc84c>] ? pdflush+0x126/0x286
[<ffffffff810cc889>] pdflush+0x163/0x286
[<ffffffff810cc13d>] ? wb_kupdate+0x0/0x13a
[<ffffffff810cc726>] ? pdflush+0x0/0x286
[<ffffffff810754ce>] kthread+0x9e/0xa8
[<ffffffff8101606a>] child_rip+0xa/0x20
[<ffffffff810151ac>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff810159e6>] ? retint_restore_args+0x5/0x6
[<ffffffff81016060>] ? child_rip+0x0/0x20
1 lock held by pdflush/36:
#0: (&type->s_umount_key#23){......}, at: [<ffffffff811269a6>] writeback_inodes+0x63/0xdf
Kernel panic - not syncing: hung_task: blocked tasks
Pid: 34, comm: khungtaskd Not tainted 2.6.31.13-xen-4.0.0 #4
Call Trace:
[<ffffffff8105c8f3>] panic+0xb2/0x168
[<ffffffff81085881>] ? print_lock+0x96/0xb1
[<ffffffff810861d5>] ? lockdep_print_held_locks+0xa5/0xc9
[<ffffffff8101907a>] ? show_stack+0x2a/0x40
[<ffffffff8102f025>] ? touch_nmi_watchdog+0x6c/0x87
[<ffffffff810862c7>] ? __debug_show_held_locks+0x33/0x49
[<ffffffff810b146c>] watchdog+0x209/0x258
[<ffffffff810b12d8>] ? watchdog+0x75/0x258
[<ffffffff8104a45f>] ? complete+0x52/0x71
[<ffffffff810b1263>] ? watchdog+0x0/0x258
[<ffffffff810754ce>] kthread+0x9e/0xa8
[<ffffffff8101606a>] child_rip+0xa/0x20
[<ffffffff810151ac>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff810159e6>] ? retint_restore_args+0x5/0x6
[<ffffffff81016060>] ? child_rip+0x0/0x20
---------------------------------------------
2)
INFO: task kjournald:951 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ffff8801c8993bf0 0000000000000282 00000000cf63a654 ffff8801c8993c00
ffff8801ca899090 ffff8801ccae31b0 ffff8801ca899470 0000000000000001
0000000000000001 0000000000000200 0000000000000001 000000000160015f
Call Trace:
[<ffffffff813f1ee0>] io_schedule+0x37/0x59
[<ffffffff811f1528>] ? blk_backing_dev_unplug+0x20/0x36
[<ffffffff8112cb73>] sync_buffer+0x51/0x69
[<ffffffff813f2387>] __wait_on_bit+0x54/0x9c
[<ffffffff8112cb22>] ? sync_buffer+0x0/0x69
[<ffffffff8112cb22>] ? sync_buffer+0x0/0x69
[<ffffffff813f244c>] out_of_line_wait_on_bit+0x7d/0x9e
[<ffffffff8107569f>] ? wake_bit_function+0x0/0x51
[<ffffffff8112ca8f>] __wait_on_buffer+0x32/0x48
[<ffffffffa005cf62>] journal_commit_transaction+0x684/0x12f2 [jbd]
[<ffffffff8100fa82>] ? check_events+0x12/0x20
[<ffffffff8100fa6f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff813f4ad7>] ? _spin_unlock_irqrestore+0x44/0x5f
[<ffffffff81067de4>] ? try_to_del_timer_sync+0x65/0x84
[<ffffffff81067e03>] ? del_timer_sync+0x0/0xa0
[<ffffffffa0061dd4>] kjournald+0x161/0x3ae [jbd]
[<ffffffff81075641>] ? autoremove_wake_function+0x0/0x5e
[<ffffffffa0061c73>] ? kjournald+0x0/0x3ae [jbd]
[<ffffffff810754ce>] kthread+0x9e/0xa8
[<ffffffff8101606a>] child_rip+0xa/0x20
[<ffffffff810151ac>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff810159e6>] ? retint_restore_args+0x5/0x6
[<ffffffff81016060>] ? child_rip+0x0/0x20
no locks held by kjournald/951.
Kernel panic - not syncing: hung_task: blocked tasks
Pid: 34, comm: khungtaskd Not tainted 2.6.31.13-xen-4.0.0 #18
Call Trace:
[<ffffffff8105c8f3>] panic+0xb2/0x168
[<ffffffff81086176>] ? lockdep_print_held_locks+0x46/0xc9
[<ffffffff8101907a>] ? show_stack+0x2a/0x40
[<ffffffff8102f025>] ? touch_nmi_watchdog+0x6c/0x87
[<ffffffff810862c7>] ? __debug_show_held_locks+0x33/0x49
[<ffffffff810b146c>] watchdog+0x209/0x258
[<ffffffff810b12d8>] ? watchdog+0x75/0x258
[<ffffffff8104a45f>] ? complete+0x52/0x71
[<ffffffff810b1263>] ? watchdog+0x0/0x258
[<ffffffff810754ce>] kthread+0x9e/0xa8
[<ffffffff8101606a>] child_rip+0xa/0x20
[<ffffffff810151ac>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff810159e6>] ? retint_restore_args+0x5/0x6
[<ffffffff81016060>] ? child_rip+0x0/0x20
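
To see which tasks hang most often across many console logs, the
"blocked for more than" lines can be pulled out with a short script. A
rough sketch, assuming the log text has already been read into a string
(the function names are made up for illustration):

```python
import re
from collections import Counter

# Matches lines such as:
#   INFO: task pdflush:36 blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(log_text):
    """Return (comm, pid, seconds) for every hung-task report found."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]


def count_by_task(log_text):
    """Count hung-task reports per task name."""
    return Counter(comm for comm, _pid, _secs in hung_tasks(log_text))
```

Run over the two tracebacks above, this would report one hang each for
pdflush and kjournald, both of which are blocked waiting on I/O.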
-- Rick Boone
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel