xen-users
[Xen-users] Process blocked errors
Hi,
I am currently running Debian Squeeze with the stock kernel and the
Xen version from apt. Before I go into detail: I have also tried
Jeremy's kernel with Xen 4.1.1 from xen.org, and the errors are
still present or even worse, to the point of taking down the whole
dom0 (it reboots randomly when using the source version of Xen and
Jeremy's kernel). I should also add that I am now using
clocksource=pit, which has fixed the other issues I was having;
however, the following remains true.
From what I can deduce, these errors only appear on some, but not
all, of the domUs (guests). There are no errors on the dom0 (host)
itself. I think I may be on the right track in suspecting either
network or heavy disk I/O, as the main machines which have these
errors are the VPN server and the BackupPC machine, which can
cause quite a bit of heavy disk I/O. Most if not all of the other
domUs don't have any errors at all. Most guests are running Debian
Squeeze, but these errors also appear on CentOS domUs.
I have two separate servers in two separate data centers; both are
Supermicro machines using Linux RAID. The first machine is a dual
Intel(R) Xeon(R) CPU 5140 @ 2.33GHz with 12GB RAM and 4x WD RE3
512GB HDDs in two separate RAID 1 arrays; the other is a quad
Intel(R) Xeon(R) CPU E5410 @ 2.33GHz with 16GB RAM and 2x WD RE3
1TB drives in RAID 1. The hardware is similar, but the second
machine is much newer technology. I only mention the specs in case
these issues are related to Supermicro machines.
Below you will find the latest logs of these errors. Note that
the process it complains about seems random.
[1597440.088347] INFO: task BackupPC:18491 blocked for more than
120 seconds.
[1597440.088354] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1597440.088363] BackupPC D ffff880002dc46a0 0 18491
1 0x00000000
[1597440.088373] ffff880002dc46a0 0000000000000286
ffff880011ccfd48 ffff880000009680
[1597440.088387] ffff880011ccfad8 0000000000000000
000000000000f9e0 ffff880011ccffd8
[1597440.088401] 0000000000015780 0000000000015780
ffff8800029d7100 ffff8800029d73f8
[1597440.088415] Call Trace:
[1597440.088422] [<ffffffff8102ddcc>] ?
pvclock_clocksource_read+0x3a/0x8b
[1597440.088430] [<ffffffff8110f19a>] ?
sync_buffer+0x0/0x40
[1597440.088438] [<ffffffff8110f19a>] ?
sync_buffer+0x0/0x40
[1597440.088447] [<ffffffff8130c16a>] ?
io_schedule+0x73/0xb7
[1597440.088455] [<ffffffff8110f1d5>] ?
sync_buffer+0x3b/0x40
[1597440.088463] [<ffffffff8130d42a>] ?
_spin_unlock_irqrestore+0xd/0xe
[1597440.088472] [<ffffffff8130c57a>] ?
__wait_on_bit_lock+0x3f/0x84
[1597440.088480] [<ffffffff8110f19a>] ?
sync_buffer+0x0/0x40
[1597440.088487] [<ffffffff8130c62a>] ?
out_of_line_wait_on_bit_lock+0x6b/0x77
[1597440.088497] [<ffffffff81065f34>] ?
wake_bit_function+0x0/0x23
[1597440.088507] [<ffffffff8110f5c7>] ?
sync_dirty_buffer+0x29/0x93
[1597440.088516] [<ffffffffa0034e04>] ?
journal_dirty_data+0xd1/0x1b0 [jbd]
[1597440.088528] [<ffffffffa004bf1f>] ?
ext3_journal_dirty_data+0xf/0x34 [ext3]
[1597440.088538] [<ffffffffa004a3f9>] ?
walk_page_buffers+0x65/0x8b [ext3]
[1597440.088549] [<ffffffffa004bf44>] ?
journal_dirty_data_fn+0x0/0x13 [ext3]
[1597440.088559] [<ffffffffa004da66>] ?
ext3_ordered_write_end+0x73/0x10f [ext3]
[1597440.088570] [<ffffffff810b5ea1>] ?
generic_file_buffered_write+0x18d/0x278
[1597440.088580] [<ffffffff810b633d>] ?
__generic_file_aio_write+0x25f/0x293
[1597440.088589] [<ffffffff8118f534>] ?
cpumask_any_but+0x28/0x34
[1597440.088598] [<ffffffff8100eccf>] ?
xen_restore_fl_direct_end+0x0/0x1
[1597440.088607] [<ffffffff8100c2f1>] ?
__raw_callee_save_xen_pte_val+0x11/0x1e
[1597440.088616] [<ffffffff810b63ca>] ?
generic_file_aio_write+0x59/0x9f
[1597440.088626] [<ffffffff810efa56>] ?
do_sync_write+0xce/0x113
[1597440.088635] [<ffffffff81065f06>] ?
autoremove_wake_function+0x0/0x2e
[1597440.088644] [<ffffffff810cdfc7>] ?
handle_mm_fault+0x7aa/0x80f
[1597440.088654] [<ffffffff810f03a8>] ?
vfs_write+0xa9/0x102
[1597440.088662] [<ffffffff810f04bd>] ? sys_write+0x45/0x6e
[1597440.088670] [<ffffffff81011b42>] ?
system_call_fastpath+0x16/0x1b
[1597440.088686] INFO: task flush-202:3:3497 blocked for more than
120 seconds.
[1597440.088694] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1597440.088702] flush-202:3 D 0000000000000002 0 3497
2 0x00000000
[1597440.088713] ffff88001fd89c40 0000000000000246
0000000000000000 ffff880002ea57f8
[1597440.088727] 0000000000000001 0000000000000001
000000000000f9e0 ffff880011d57fd8
[1597440.088740] 0000000000015780 0000000000015780
ffff880002dc0e20 ffff880002dc1118
[1597440.088755] Call Trace:
[1597440.088762] [<ffffffff8102ddcc>] ?
pvclock_clocksource_read+0x3a/0x8b
[1597440.088770] [<ffffffff8110f19a>] ?
sync_buffer+0x0/0x40
[1597440.088779] [<ffffffff8110f19a>] ?
sync_buffer+0x0/0x40
[1597440.088786] [<ffffffff8130c16a>] ?
io_schedule+0x73/0xb7
[1597440.088794] [<ffffffff8110f1d5>] ?
sync_buffer+0x3b/0x40
[1597440.088803] [<ffffffff8130d42a>] ?
_spin_unlock_irqrestore+0xd/0xe
[1597440.088811] [<ffffffff8130c57a>] ?
__wait_on_bit_lock+0x3f/0x84
[1597440.088820] [<ffffffff8110f19a>] ?
sync_buffer+0x0/0x40
[1597440.088828] [<ffffffff8130c62a>] ?
out_of_line_wait_on_bit_lock+0x6b/0x77
[1597440.088837] [<ffffffff81065f34>] ?
wake_bit_function+0x0/0x23
[1597440.088847] [<ffffffff81110567>] ?
__block_write_full_page+0x159/0x2ac
[1597440.088856] [<ffffffff8110f364>] ?
end_buffer_async_write+0x0/0x13b
[1597440.088865] [<ffffffff810bb6b6>] ?
__writepage+0xa/0x25
[1597440.088873] [<ffffffff810bbd3d>] ?
write_cache_pages+0x20b/0x327
[1597440.088881] [<ffffffff810bb6ac>] ?
__writepage+0x0/0x25
[1597440.088889] [<ffffffff8100b3c5>] ?
xen_end_context_switch+0x9/0x12
[1597440.088899] [<ffffffff81108f1e>] ?
writeback_single_inode+0xe7/0x2da
[1597440.088907] [<ffffffff81109c24>] ?
writeback_inodes_wb+0x424/0x4ff
[1597440.088916] [<ffffffff81109e2b>] ?
wb_writeback+0x12c/0x1ab
[1597440.088926] [<ffffffff8105b8c8>] ?
try_to_del_timer_sync+0x63/0x6c
[1597440.088935] [<ffffffff8110a0a1>] ?
wb_do_writeback+0x14f/0x165
[1597440.088944] [<ffffffff8110a0e8>] ?
bdi_writeback_task+0x31/0xaa
[1597440.088953] [<ffffffff810ca00e>] ?
bdi_start_fn+0x0/0xd2
[1597440.088960] [<ffffffff810ca07e>] ?
bdi_start_fn+0x70/0xd2
[1597440.088968] [<ffffffff810ca00e>] ?
bdi_start_fn+0x0/0xd2
[1597440.088975] [<ffffffff81065c39>] ? kthread+0x79/0x81
[1597440.088983] [<ffffffff81012baa>] ? child_rip+0xa/0x20
[1597440.088990] [<ffffffff81011d61>] ?
int_ret_from_sys_call+0x7/0x1b
[1597440.088998] [<ffffffff8101251d>] ?
retint_restore_args+0x5/0x6
[1597440.089007] [<ffffffff81012ba0>] ? child_rip+0x0/0x20
[1597440.089013] INFO: task BackupPC_dump:3498 blocked for more
than 120 seconds.
[1597440.089021] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1597440.089028] BackupPC_dump D 0000000000000000 0 3498
18491 0x00000000
[1597440.089038] ffff88001fd10e20 0000000000000286
0000000000000000 ffff880000009680
[1597440.089050] 0000000000000008 0000000000000000
000000000000f9e0 ffff880017917fd8
[1597440.089062] 0000000000015780 0000000000015780
ffff880002dc2a60 ffff880002dc2d58
[1597440.089075] Call Trace:
[1597440.089081] [<ffffffff8102ddcc>] ?
pvclock_clocksource_read+0x3a/0x8b
[1597440.089089] [<ffffffff8110f19a>] ?
sync_buffer+0x0/0x40
[1597440.089096] [<ffffffff8130c16a>] ?
io_schedule+0x73/0xb7
[1597440.089103] [<ffffffff8110f1d5>] ?
sync_buffer+0x3b/0x40
[1597440.089111] [<ffffffff8130d42a>] ?
_spin_unlock_irqrestore+0xd/0xe
[1597440.089118] [<ffffffff8130c57a>] ?
__wait_on_bit_lock+0x3f/0x84
[1597440.089126] [<ffffffff8110f19a>] ?
sync_buffer+0x0/0x40
[1597440.089133] [<ffffffff8130c62a>] ?
out_of_line_wait_on_bit_lock+0x6b/0x77
[1597440.089141] [<ffffffff81065f34>] ?
wake_bit_function+0x0/0x23
[1597440.089149] [<ffffffff8110f5c7>] ?
sync_dirty_buffer+0x29/0x93
[1597440.089158] [<ffffffffa0034e04>] ?
journal_dirty_data+0xd1/0x1b0 [jbd]
[1597440.092016] [<ffffffffa004bf1f>] ?
ext3_journal_dirty_data+0xf/0x34 [ext3]
[1597440.092016] [<ffffffffa004a3f9>] ?
walk_page_buffers+0x65/0x8b [ext3]
[1597440.092016] [<ffffffffa004bf44>] ?
journal_dirty_data_fn+0x0/0x13 [ext3]
[1597440.092016] [<ffffffffa004da66>] ?
ext3_ordered_write_end+0x73/0x10f [ext3]
[1597440.092016] [<ffffffff810b5ea1>] ?
generic_file_buffered_write+0x18d/0x278
[1597440.092016] [<ffffffff810b633d>] ?
__generic_file_aio_write+0x25f/0x293
[1597440.092016] [<ffffffff8118f534>] ?
cpumask_any_but+0x28/0x34
[1597440.092016] [<ffffffff8100eccf>] ?
xen_restore_fl_direct_end+0x0/0x1
[1597440.092016] [<ffffffff8100c2f1>] ?
__raw_callee_save_xen_pte_val+0x11/0x1e
[1597440.092016] [<ffffffff810b63ca>] ?
generic_file_aio_write+0x59/0x9f
[1597440.092016] [<ffffffff810efa56>] ?
do_sync_write+0xce/0x113
[1597440.092016] [<ffffffff81065f06>] ?
autoremove_wake_function+0x0/0x2e
[1597440.092016] [<ffffffff810cdfc7>] ?
handle_mm_fault+0x7aa/0x80f
[1597440.092016] [<ffffffff810f03a8>] ?
vfs_write+0xa9/0x102
[1597440.092016] [<ffffffff810f04bd>] ? sys_write+0x45/0x6e
[1597440.092016] [<ffffffff81011b42>] ?
system_call_fastpath+0x16/0x1b
[1597800.096045] INFO: task kswapd0:30 blocked for more than 120
seconds.
[1597800.096060] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1597800.096069] kswapd0 D 0000000000000000 0 30
2 0x00000000
[1597800.096079] ffffffff814791f0 0000000000000246
0000000000000000 ffff88001d8736b0
[1597800.096093] ffff88001d873790 ffffffff8117fd56
000000000000f9e0 ffff88001d873fd8
[1597800.096106] 0000000000015780 0000000000015780
ffff88001fd8cdb0 ffff88001fd8d0a8
[1597800.096120] Call Trace:
[1597800.096134] [<ffffffff8117fd56>] ?
blk_peek_request+0x18b/0x19f
[1597800.096145] [<ffffffff8102ddcc>] ?
pvclock_clocksource_read+0x3a/0x8b
[1597800.096156] [<ffffffff8130c16a>] ?
io_schedule+0x73/0xb7
[1597800.096165] [<ffffffff81180b77>] ?
get_request_wait+0xf0/0x188
[1597800.096175] [<ffffffff81065f06>] ?
autoremove_wake_function+0x0/0x2e
[1597800.096184] [<ffffffff81180f06>] ?
__make_request+0x2f7/0x428
[1597800.096193] [<ffffffff8117f6e3>] ?
generic_make_request+0x299/0x2f9
[1597800.096204] [<ffffffff81193109>] ?
radix_tree_delete+0xbf/0x1ba
[1597800.096214] [<ffffffff8100ece2>] ?
check_events+0x12/0x20
[1597800.096223] [<ffffffff8117f819>] ?
submit_bio+0xd6/0xf2
[1597800.096232] [<ffffffff8110e069>] ?
submit_bh+0x103/0x123
[1597800.096242] [<ffffffff811105e4>] ?
__block_write_full_page+0x1d6/0x2ac
[1597800.096250] [<ffffffff8110f364>] ?
end_buffer_async_write+0x0/0x13b
[1597800.096260] [<ffffffff81112670>] ?
blkdev_get_block+0x0/0x57
[1597800.096272] [<ffffffff810bf3c1>] ?
shrink_page_list+0x375/0x623
[1597800.096281] [<ffffffff810bfda4>] ?
shrink_list+0x45c/0x767
[1597800.096290] [<ffffffff810bbfd0>] ?
determine_dirtyable_memory+0xd/0x1d
[1597800.096299] [<ffffffff810bc048>] ?
get_dirty_limits+0x1d/0x259
[1597800.096308] [<ffffffff8100eccf>] ?
xen_restore_fl_direct_end+0x0/0x1
[1597800.096319] [<ffffffff81099108>] ?
__call_rcu+0x110/0x118
[1597800.096329] [<ffffffff810fe2ab>] ? d_kill+0x58/0x61
[1597800.096338] [<ffffffff810c032f>] ?
shrink_zone+0x280/0x342
[1597800.096351] [<ffffffffa002c226>] ?
mb_cache_shrink_fn+0x26/0x129 [mbcache]
[1597800.096361] [<ffffffff810c0d54>] ? kswapd+0x4b9/0x686
[1597800.096369] [<ffffffff810be3eb>] ?
isolate_pages_global+0x0/0x20f
[1597800.096379] [<ffffffff81065f06>] ?
autoremove_wake_function+0x0/0x2e
[1597800.096388] [<ffffffff8100ece2>] ?
check_events+0x12/0x20
[1597800.096396] [<ffffffff810c089b>] ? kswapd+0x0/0x686
[1597800.096405] [<ffffffff81065c39>] ? kthread+0x79/0x81
[1597800.096414] [<ffffffff81012baa>] ? child_rip+0xa/0x20
[1597800.096422] [<ffffffff81011d61>] ?
int_ret_from_sys_call+0x7/0x1b
[1597800.096431] [<ffffffff8101251d>] ?
retint_restore_args+0x5/0x6
[1597800.096439] [<ffffffff81012ba0>] ? child_rip+0x0/0x20
[1597800.096449] INFO: task kjournald:386 blocked for more than
120 seconds.
[1597800.096456] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1597800.096464] kjournald D 0000000000000000 0 386
2 0x00000000
[1597800.096475] ffffffff814791f0 0000000000000246
0000000000000000 0000000000000200
[1597800.096489] 0000000000000000 0000000000000001
000000000000f9e0 ffff88001e599fd8
[1597800.096504] 0000000000015780 0000000000015780
ffff880002fa5bd0 ffff880002fa5ec8
[1597800.096518] Call Trace:
[1597800.096525] [<ffffffff8102ddcc>] ?
pvclock_clocksource_read+0x3a/0x8b
[1597800.096534] [<ffffffff8110f19a>] ?
sync_buffer+0x0/0x40
[1597800.096542] [<ffffffff8130c16a>] ?
io_schedule+0x73/0xb7
[1597800.096551] [<ffffffff8110f1d5>] ?
sync_buffer+0x3b/0x40
[1597800.096559] [<ffffffff8130d42a>] ?
_spin_unlock_irqrestore+0xd/0xe
[1597800.096568] [<ffffffff8130c677>] ?
__wait_on_bit+0x41/0x70
[1597800.096577] [<ffffffff8110f19a>] ?
sync_buffer+0x0/0x40
[1597800.096585] [<ffffffff8130c711>] ?
out_of_line_wait_on_bit+0x6b/0x77
[1597800.096594] [<ffffffff81065f34>] ?
wake_bit_function+0x0/0x23
[1597800.096605] [<ffffffffa00361d1>] ?
journal_commit_transaction+0x508/0xe2b [jbd]
[1597800.096616] [<ffffffff8100e629>] ?
xen_force_evtchn_callback+0x9/0xa
[1597800.096625] [<ffffffff8100ece2>] ?
check_events+0x12/0x20
[1597800.096633] [<ffffffff8130d42a>] ?
_spin_unlock_irqrestore+0xd/0xe
[1597800.096643] [<ffffffff8130d42a>] ?
_spin_unlock_irqrestore+0xd/0xe
[1597800.096652] [<ffffffff8100eccf>] ?
xen_restore_fl_direct_end+0x0/0x1
[1597800.096661] [<ffffffff8130d42a>] ?
_spin_unlock_irqrestore+0xd/0xe
[1597800.096671] [<ffffffffa0039423>] ?
kjournald+0xdf/0x226 [jbd]
[1597800.096680] [<ffffffff81065f06>] ?
autoremove_wake_function+0x0/0x2e
[1597800.096690] [<ffffffffa0039344>] ? kjournald+0x0/0x226
[jbd]
[1597800.096699] [<ffffffff81065c39>] ? kthread+0x79/0x81
[1597800.096707] [<ffffffff81012baa>] ? child_rip+0xa/0x20
[1597800.096715] [<ffffffff81011d61>] ?
int_ret_from_sys_call+0x7/0x1b
[1597800.096723] [<ffffffff8101251d>] ?
retint_restore_args+0x5/0x6
[1597800.096732] [<ffffffff81012ba0>] ? child_rip+0x0/0x20
[1597800.096751] INFO: task flush-202:3:3497 blocked for more than
120 seconds.
[1597800.096759] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1597800.096767] flush-202:3 D ffff88001ff969f0 0 3497
2 0x00000000
[1597800.096778] ffff88001ff969f0 0000000000000246
ffff880011d578a0 ffff880011d5789c
[1597800.096792] ffff880011d57920 ffffffff8117fd56
000000000000f9e0 ffff880011d57fd8
[1597800.096807] 0000000000015780 0000000000015780
ffff880002dc0e20 ffff880002dc1118
[1597800.096821] Call Trace:
[1597800.096829] [<ffffffff8117fd56>] ?
blk_peek_request+0x18b/0x19f
[1597800.096838] [<ffffffff8102ddcc>] ?
pvclock_clocksource_read+0x3a/0x8b
[1597800.096846] [<ffffffff8130c16a>] ?
io_schedule+0x73/0xb7
[1597800.096856] [<ffffffff81180b77>] ?
get_request_wait+0xf0/0x188
[1597800.096864] [<ffffffff81065f06>] ?
autoremove_wake_function+0x0/0x2e
[1597800.096872] [<ffffffff81180f06>] ?
__make_request+0x2f7/0x428
[1597800.096880] [<ffffffff8117f6e3>] ?
generic_make_request+0x299/0x2f9
[1597800.096890] [<ffffffffa000a43b>] ?
do_blkif_request+0x0/0x374 [xen_blkfront]
[1597800.096899] [<ffffffff8100ece2>] ?
check_events+0x12/0x20
[1597800.096907] [<ffffffff8117f819>] ?
submit_bio+0xd6/0xf2
[1597800.096914] [<ffffffff8110e069>] ?
submit_bh+0x103/0x123
[1597800.096922] [<ffffffff811105e4>] ?
__block_write_full_page+0x1d6/0x2ac
[1597800.096930] [<ffffffff8100ece2>] ?
check_events+0x12/0x20
[1597800.096938] [<ffffffff8110f364>] ?
end_buffer_async_write+0x0/0x13b
[1597800.096947] [<ffffffff81112670>] ?
blkdev_get_block+0x0/0x57
[1597800.096955] [<ffffffff810bb6b6>] ?
__writepage+0xa/0x25
[1597800.096962] [<ffffffff810bbd3d>] ?
write_cache_pages+0x20b/0x327
[1597800.096970] [<ffffffff810bb6ac>] ?
__writepage+0x0/0x25
[1597800.096979] [<ffffffff81108f1e>] ?
writeback_single_inode+0xe7/0x2da
[1597800.096987] [<ffffffff81109c24>] ?
writeback_inodes_wb+0x424/0x4ff
[1597800.096995] [<ffffffff81109e2b>] ?
wb_writeback+0x12c/0x1ab
[1597800.097006] [<ffffffff8105b8c8>] ?
try_to_del_timer_sync+0x63/0x6c
[1597800.097014] [<ffffffff8110a0a1>] ?
wb_do_writeback+0x14f/0x165
[1597800.097022] [<ffffffff8110a0e8>] ?
bdi_writeback_task+0x31/0xaa
[1597800.097031] [<ffffffff810ca00e>] ?
bdi_start_fn+0x0/0xd2
[1597800.097038] [<ffffffff810ca07e>] ?
bdi_start_fn+0x70/0xd2
[1597800.097045] [<ffffffff810ca00e>] ?
bdi_start_fn+0x0/0xd2
[1597800.097052] [<ffffffff81065c39>] ? kthread+0x79/0x81
[1597800.097060] [<ffffffff81012baa>] ? child_rip+0xa/0x20
[1597800.097067] [<ffffffff81011d61>] ?
int_ret_from_sys_call+0x7/0x1b
[1597800.097074] [<ffffffff8101251d>] ?
retint_restore_args+0x5/0x6
[1597800.100015] [<ffffffff81012ba0>] ? child_rip+0x0/0x20
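To check whether the reported process really is random, the "INFO:
task ... blocked" lines above could be tallied with a short Python
sketch like the one below. The function name tally_hung_tasks and the
idea of feeding it a saved copy of the log are assumptions made for
illustration; this is not part of any Xen or kernel tooling. The
pattern stops at "blocked for more" because the mail client has
wrapped some of these lines before "than".

```python
import re
from collections import Counter

# Matches the hung-task header line, for example:
#   "INFO: task flush-202:3:3497 blocked for more than"
# The greedy name group keeps any colons that belong to the task name;
# the PID is the last run of digits before " blocked".
HUNG_TASK = re.compile(r"INFO: task (.+):(\d+) blocked for more")


def tally_hung_tasks(log_text):
    """Count hung-task reports per (task name, PID) in a kernel log dump."""
    counts = Counter()
    for line in log_text.splitlines():
        match = HUNG_TASK.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Run over the log above, this groups the reports by task, so it becomes
easier to see whether the same few I/O-related tasks (BackupPC, flush,
kjournald, kswapd0) keep recurring or whether unrelated processes show
up as well.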
Can anyone give me any clues as to what the problem is and/or how
to fix it?
Thanks in advance
--
May the ping be with you ..
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
- [Xen-users] Process blocked errors,
Steve Allison <=