xen-users
[Xen-users] Task Blocking / Domu Lockups
Hi All,
We've been seeing DomUs lock up for some time now under moderate IO load
(I think) on two different Xen hosts. Everything is Debian: Dom0 is Squeeze
and the DomUs are a mixture of Lenny and Squeeze, both of which crash in the
same way.
The DomUs and the Dom0 are running the latest Squeeze kernel
(2.6.32-5-xen-amd64) and Xen is 4.0.1-2.
The block device (or the kernel's handling of it) is probably closer to the
cause of the problem than a bug in the individual tasks: with enough console
output you see multiple tasks lock up at the same time, and different tasks
are affected in separate incidents. A couple of excerpts from the console are
below:
[581606.222303] INFO: task syslogd:1142 blocked for more than 120 seconds.
[581606.222321] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[581606.222329] syslogd D ffff8800f9eafc78 0 1142 1
[581606.222338] ffff8800f9eafda8 0000000000000286 ffff880003d74fe8 ffff8800060f18f0
[581606.222349] ffff8800f9e30440 ffff8800d8d8c940 ffff8800f9e306c0 0000000000000000
[581606.222360] ffff880000000005 0000000000138512 ffff8800f7c41cc0 ffff88000000000f
[581606.222368] Call Trace:
[581606.222382] [<ffffffff8022383e>] __wake_up+0x38/0x4f
[581606.222395] [<ffffffffa0032067>] :jbd:log_wait_commit+0xb6/0x11f
[581606.222403] [<ffffffff8023f64d>] autoremove_wake_function+0x0/0x2e
[581606.222413] [<ffffffffa002d552>] :jbd:journal_stop+0x198/0x1f3
[581606.222421] [<ffffffff802a7eec>] __writeback_single_inode+0x1bc/0x2da
[581606.222429] [<ffffffff8028a992>] do_readv_writev+0x176/0x18b
[581606.222436] [<ffffffff802a898d>] sync_inode+0x24/0x53
[581606.222453] [<ffffffffa003e48a>] :ext3:ext3_sync_file+0x9e/0xb0
[581606.222460] [<ffffffff802aafc6>] do_fsync+0x52/0xa4
[581606.222467] [<ffffffff802ab03b>] __do_fsync+0x23/0x36
[581606.222473] [<ffffffff8020b528>] system_call+0x68/0x6d
[581606.222479] [<ffffffff8020b4c0>] system_call+0x0/0x6d
[581606.222484]
[581376.493333] INFO: task apache2:14097 blocked for more than 120 seconds.
[581376.493348] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[581376.493356] apache2 D ffffffff8044af00 0 14097 26200
[581376.493365] ffff8800d0091de0 0000000000000286 0000000000000000 ffff8800f759aec0
[581376.493375] ffff8800d8f17440 ffffffff804ff460 ffff8800d8f176c0 00000000d0091e68
[581376.493385] 00000000ffffffff 0000000000000000 ffff880073859000 ffff8800f74a76c4
[581376.493394] Call Trace:
[581376.493408] [<ffffffff8029443f>] path_walk+0x7e/0x8b
[581376.493415] [<ffffffff80294733>] do_path_lookup+0x158/0x1ce
[581376.493423] [<ffffffff804356ad>] __mutex_lock_slowpath+0x79/0xc7
[581376.493430] [<ffffffff80435482>] mutex_lock+0xa/0xb
[581376.493435] [<ffffffff8029542a>] do_filp_open+0x11a/0x7c4
[581376.493445] [<ffffffff80288b3b>] get_unused_fd_flags+0x74/0x13f
[581376.493452] [<ffffffff80288c4c>] do_sys_open+0x46/0xc3
[581376.493458] [<ffffffff8020b528>] system_call+0x68/0x6d
[581376.493464] [<ffffffff8020b4c0>] system_call+0x0/0x6d
[581376.493471]
[1426201.768058] INFO: task sshd:772 blocked for more than 120 seconds.
[1426201.768058] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1426201.768058] sshd D 0000000000000000 0 772 1 0x00000000
[1426201.768058] ffffffff814791f0 0000000000000282 0000000000000000 ffff88000edc35b0
[1426201.768058] ffff88000edc3690 ffffffff8117fd56 000000000000f9e0 ffff88000edc3fd8
[1426201.768058] 0000000000015780 0000000000015780 ffff88000284f100 ffff88000284f3f8
[1426201.768058] Call Trace:
[1426201.768058] [<ffffffff8117fd56>] ? blk_peek_request+0x18b/0x19f
[1426201.768058] [<ffffffff8102ddcc>] ? pvclock_clocksource_read+0x3a/0x8b
[1426201.768058] [<ffffffff8130c16a>] ? io_schedule+0x73/0xb7
[1426201.768058] [<ffffffff81180b77>] ? get_request_wait+0xf0/0x188
[1426201.768058] [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
[1426201.768058] [<ffffffff81180f06>] ? __make_request+0x2f7/0x428
[1426201.768058] [<ffffffff81192e43>] ? radix_tree_tag_clear+0x93/0xf1
[1426201.768058] [<ffffffff8117f6e3>] ? generic_make_request+0x299/0x2f9
[1426201.768058] [<ffffffff8100e629>] ? xen_force_evtchn_callback+0x9/0xa
[1426201.768058] [<ffffffff8100ece2>] ? check_events+0x12/0x20
[1426201.768058] [<ffffffff810bc7ce>] ? __set_page_dirty_nobuffers+0x0/0xfa
[1426201.768058] [<ffffffff8117f819>] ? submit_bio+0xd6/0xf2
[1426201.768058] [<ffffffff810bb841>] ? test_set_page_writeback+0xe0/0xef
[1426201.768058] [<ffffffff810d9a70>] ? swap_writepage+0x9b/0xa5
[1426201.768058] [<ffffffff810bf3c1>] ? shrink_page_list+0x375/0x623
[1426201.768058] [<ffffffff8100e629>] ? xen_force_evtchn_callback+0x9/0xa
[1426201.768058] [<ffffffff8100ece2>] ? check_events+0x12/0x20
[1426201.768058] [<ffffffff810bfda4>] ? shrink_list+0x45c/0x767
[1426201.768058] [<ffffffff81042abe>] ? pick_next_task_fair+0xca/0xd6
[1426201.768058] [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
[1426201.768058] [<ffffffff8130d42a>] ? _spin_unlock_irqrestore+0xd/0xe
[1426201.768058] [<ffffffff8105b8c8>] ? try_to_del_timer_sync+0x63/0x6c
[1426201.768058] [<ffffffff810c032f>] ? shrink_zone+0x280/0x342
[1426201.768058] [<ffffffff8130d42a>] ? _spin_unlock_irqrestore+0xd/0xe
[1426201.768058] [<ffffffff810c94f8>] ? congestion_wait+0x74/0x80
[1426201.768058] [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
[1426201.768058] [<ffffffff810c13f6>] ? try_to_free_pages+0x232/0x38e
[1426201.768058] [<ffffffff810be3eb>] ? isolate_pages_global+0x0/0x20f
[1426201.768058] [<ffffffff810fdb83>] ? pollwake+0x0/0x59
[1426201.768058] [<ffffffff810bb484>] ? __alloc_pages_nodemask+0x3cd/0x5f5
[1426201.768058] [<ffffffff810ba60f>] ? __get_free_pages+0x9/0x46
[1426201.768058] [<ffffffff8104d4f6>] ? copy_process+0xd7/0x115f
[1426201.768058] [<ffffffff811542f6>] ? cap_d_instantiate+0x0/0x1
[1426201.768058] [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
[1426201.768058] [<ffffffff8100e629>] ? xen_force_evtchn_callback+0x9/0xa
[1426201.768058] [<ffffffff8100ece2>] ? check_events+0x12/0x20
[1426201.768058] [<ffffffff811542f6>] ? cap_d_instantiate+0x0/0x1
[1426201.768058] [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
[1426201.768058] [<ffffffff8104e6d5>] ? do_fork+0x157/0x31e
[1426201.768058] [<ffffffff81118548>] ? inotify_d_instantiate+0x12/0x39
[1426201.768058] [<ffffffff812510d3>] ? sock_attach_fd+0x91/0xbf
[1426201.768058] [<ffffffff810ee05f>] ? fd_install+0x2e/0x5a
[1426201.768058] [<ffffffff81011e63>] ? stub_clone+0x13/0x20
[1426201.768058] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
[1426201.768058] INFO: task master:845 blocked for more than 120 seconds.
[1426201.768058] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1426201.768058] master D 0000000000000000 0 845 1 0x00000000
[1426201.768058] ffffffff814791f0 0000000000000286 0000000000000000 ffff88000ebcd588
[1426201.768058] ffff88000ebcd668 ffffffff8117fd56 000000000000f9e0 ffff88000ebcdfd8
[1426201.768058] 0000000000015780 0000000000015780 ffff88000fd1f810 ffff88000fd1fb08
[1426201.768058] Call Trace:
[1426201.768058] [<ffffffff8117fd56>] ? blk_peek_request+0x18b/0x19f
[1426201.768058] [<ffffffff8102ddcc>] ? pvclock_clocksource_read+0x3a/0x8b
[1426201.768058] [<ffffffff8130c16a>] ? io_schedule+0x73/0xb7
[1426201.768058] [<ffffffff81180b77>] ? get_request_wait+0xf0/0x188
[1426201.768058] [<ffffffff810bee23>] ? move_active_pages_to_lru+0xf3/0x126
[1426201.768058] [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
[1426201.768058] [<ffffffff81180f06>] ? __make_request+0x2f7/0x428
[1426201.768058] [<ffffffff8100ece2>] ? check_events+0x12/0x20
[1426201.768058] [<ffffffff81192e43>] ? radix_tree_tag_clear+0x93/0xf1
[1426201.768058] [<ffffffff8117f6e3>] ? generic_make_request+0x299/0x2f9
[1426201.768058] [<ffffffff8100e629>] ? xen_force_evtchn_callback+0x9/0xa
[1426201.768058] [<ffffffff8100ece2>] ? check_events+0x12/0x20
[1426201.768058] [<ffffffff8118f534>] ? cpumask_any_but+0x28/0x34
[1426201.768058] [<ffffffff8117f819>] ? submit_bio+0xd6/0xf2
[1426201.768058] [<ffffffff810bb841>] ? test_set_page_writeback+0xe0/0xef
[1426201.768058] [<ffffffff810d9a70>] ? swap_writepage+0x9b/0xa5
[1426201.768058] [<ffffffff810bf3c1>] ? shrink_page_list+0x375/0x623
[1426201.768058] [<ffffffff810bfda4>] ? shrink_list+0x45c/0x767
[1426201.768058] [<ffffffff810bbfd0>] ? determine_dirtyable_memory+0xd/0x1d
[1426201.768058] [<ffffffff810bc048>] ? get_dirty_limits+0x1d/0x259
[1426201.768058] [<ffffffffa00380ba>] ? journal_cancel_revoke+0xc3/0xec [jbd]
[1426201.768058] [<ffffffff810c032f>] ? shrink_zone+0x280/0x342
[1426201.768058] [<ffffffffa002c226>] ? mb_cache_shrink_fn+0x26/0x129 [mbcache]
[1426201.768058] [<ffffffff810c0532>] ? shrink_slab+0x141/0x153
[1426201.768058] [<ffffffff810c13f6>] ? try_to_free_pages+0x232/0x38e
[1426201.768058] [<ffffffff810be3eb>] ? isolate_pages_global+0x0/0x20f
[1426201.768058] [<ffffffff810bb484>] ? __alloc_pages_nodemask+0x3cd/0x5f5
[1426201.768058] [<ffffffff810cc224>] ? do_wp_page+0x386/0x707
[1426201.768058] [<ffffffff810efa56>] ? do_sync_write+0xce/0x113
[1426201.768058] [<ffffffff8100c3a5>] ? __raw_callee_save_xen_pud_val+0x11/0x1e
[1426201.768058] [<ffffffff8100c369>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[1426201.768058] [<ffffffff810cdfc7>] ? handle_mm_fault+0x7aa/0x80f
[1426201.768058] [<ffffffff8115421a>] ? cap_cred_commit+0x0/0x1
[1426201.768058] [<ffffffff8130f906>] ? do_page_fault+0x2e0/0x2fc
[1426201.768058] [<ffffffff8130d7a5>] ? page_fault+0x25/0x30
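To check whether several tasks really do block at the same moment, the hung-task reports in a saved console log can be grouped by timestamp. The following is only a minimal sketch: it assumes the console output has been captured as text, and the function name summarize_hung_tasks is made up for illustration rather than taken from any existing tool.

```python
import re
from collections import defaultdict

# Matches kernel hung-task reports such as:
# [581606.222303] INFO: task syslogd:1142 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<name>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(log_text):
    """Group hung-task reports by the whole second in which they were logged.

    Returns a dict mapping the integer timestamp (seconds since boot) to a
    list of (task name, pid) tuples in the order they appear in the log.
    """
    incidents = defaultdict(list)
    for line in log_text.splitlines():
        match = HUNG_TASK.search(line)
        if match:
            second = int(float(match.group("ts")))
            incidents[second].append(
                (match.group("name"), int(match.group("pid")))
            )
    return dict(incidents)
```

Any timestamp that maps to more than one task is an incident where several processes were stuck together, which would point at something shared (such as the block device) rather than at one misbehaving program.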
Has anybody seen this before? Is there a fix / workaround or should we be
trying / building different kernels for the DomUs?
Thanks in advance!
Regards,
Richard Maynard
Wessex Networks
Linchmere Place
Ifield
Crawley
West Sussex
RH11 0EX
www.wessexnetworks.com rjm@xxxxxxxxxxxxxxxxxx
T: 01293 542080 F: 01293 553849
Twitter: @wessexnetworks
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users