
xen-devel

Re: [Xen-devel] I/O related (?) domU crash on xen 4.0 + pv_ops

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: Re: [Xen-devel] I/O related (?) domU crash on xen 4.0 + pv_ops
From: Rick Boone <rick@xxxxxxxxxxxxxx>
Date: Thu, 10 Jun 2010 11:42:03 -0700
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Thu, 10 Jun 2010 11:44:08 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4C0ECF72.3050509@xxxxxxxx>
References: <4C0EA505.8050500@xxxxxxxxxxxxxx> <4C0ECF72.3050509@xxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.1.7) Gecko/20100111 Thunderbird/3.0.1
I booted into 2.6.32.12 yesterday, but I'm still getting a hung task crash. (Initially, after switching to 2.6.32.12, I got a soft lockup crash on one of the CPUs, but it seems I was able to resolve that by switching the kernel to tickless.)

I built the kernel myself (for both dom0 and domU), from the xen/stable-2.6.31.x and xen/stable-2.6.32.x branches.

I tried "clocksource=jiffies" earlier, but didn't give it a fair enough shake. The time drift ended up becoming pretty bad, even with ntp, so I had to switch away from it. I'll give it another try today; I can probably figure out the drift issue.
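
For anyone retrying this, it may help to confirm which clocksource the guest actually ended up with after boot. The following is a minimal sketch, assuming the standard Linux sysfs files under /sys/devices/system/clocksource/clocksource0; the function names and the default `wanted="jiffies"` are illustrative choices, not anything from the thread.

```python
from pathlib import Path

# Standard sysfs directory for the clocksource controls on Linux.
CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"


def clocksource_status(current_text, available_text, wanted="jiffies"):
    """Summarize the raw contents of the two sysfs clocksource files."""
    current = current_text.strip()
    available = available_text.split()
    return {
        "current": current,
        "available": available,
        # True only if the clocksource requested at boot is the active one.
        "wanted_active": current == wanted,
    }


def read_clocksource_status(base=CLOCKSOURCE_DIR, wanted="jiffies"):
    """Read current_clocksource and available_clocksource from sysfs."""
    base = Path(base)
    return clocksource_status(
        (base / "current_clocksource").read_text(),
        (base / "available_clocksource").read_text(),
        wanted,
    )
```

Calling `read_clocksource_status()` inside the domU returns a small dict; if `wanted_active` is false, the boot parameter did not take effect.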


-- Rick

On 06/08/2010 04:17 PM, Jeremy Fitzhardinge wrote:
On 06/08/2010 01:16 PM, Rick Boone wrote:
  
Hey,

I'm running into an issue with the pv_ops kernel (2.6.31.13) and xen 4.0 -
my domUs are continually locking up under heavy IO load. My issue
seems similar to what these guys are reporting:

https://bugzilla.redhat.com/show_bug.cgi?id=551552
https://bugzilla.redhat.com/show_bug.cgi?id=526627
https://bugzilla.redhat.com/show_bug.cgi?id=550724

Any solutions/ideas that haven't been covered in those reports? I've
turned off tickless kernel, and have also set the guest I/O scheduler
to "noop", but machines are still crashing. I'm using LVM-backed block
devices on the dom0. For a while, I didn't have the kernel set to panic
on a hung task, and from digging around while the kernel was still up,
I was able to determine that the device that seems to be causing
issues is one that sees a lot of IO (it's receiving all of the logs on
a heavily used web server).
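
The settings described above can be checked from inside the guest. This is a rough sketch, assuming the usual /sys/block/<device>/queue/scheduler file (which marks the active scheduler in square brackets) and the kernel.hung_task_timeout_secs and kernel.hung_task_panic sysctls under /proc/sys/kernel. The device name "xvda" and the function names are placeholders, not taken from the report.

```python
import re
from pathlib import Path


def active_scheduler(scheduler_text):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", scheduler_text)
    return match.group(1) if match else None


def read_io_settings(device="xvda", sys_root="/sys", proc_root="/proc"):
    """Collect the I/O scheduler and hung-task settings for one block device.

    "xvda" is only a placeholder device name; substitute the guest's own.
    """
    scheduler_file = Path(sys_root) / "block" / device / "queue" / "scheduler"
    kernel_dir = Path(proc_root) / "sys" / "kernel"
    return {
        "scheduler": active_scheduler(scheduler_file.read_text()),
        "hung_task_timeout_secs": int(
            (kernel_dir / "hung_task_timeout_secs").read_text()
        ),
        "hung_task_panic": int((kernel_dir / "hung_task_panic").read_text()),
    }
```

If `scheduler` is not "noop" for the busy log device, the scheduler change was not applied to that device.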
    
Does booting the domU with "clocksource=jiffies" help at all?

Have you tried a 2.6.32-based domU?  I used to see these kinds of
lockups with older kernels on my linode domU, but have seen none since
moving it to Fedora 12 with a 2.6.32 kernel.

Are you using a distro kernel, or one you've built yourself?

    J

  
Here's a couple of my tracebacks:
1)

INFO: task pdflush:36 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 ffff8801e963f9c0 0000000000000282 00000000e4f10f56 ffff8801e963f9d0
 ffff8801eb7a31b0 ffff8801eb67c240 ffff8801eb7a3590 0000000103902b94
 00000000e4f10f56 ffff8801e963fa70 ffff8801e963f9b0 ffffffff811f14ec
Call Trace:
 [<ffffffff811f14ec>] ? blk_unplug+0x56/0x72
 [<ffffffff813f1ee0>] io_schedule+0x37/0x59
 [<ffffffff8112b1a8>] ? block_sync_page+0x5b/0x71
 [<ffffffff810c2e77>] sync_page+0x5a/0x72
 [<ffffffff813f2237>] __wait_on_bit_lock+0x55/0xb3
 [<ffffffff810c2e1d>] ? sync_page+0x0/0x72
 [<ffffffff810c2b0d>] ? find_get_pages_tag+0xf7/0x144
 [<ffffffff810c2dce>] __lock_page+0x71/0x8c
 [<ffffffff8107569f>] ? wake_bit_function+0x0/0x51
 [<ffffffff810cafe4>] write_cache_pages+0x201/0x3bf
 [<ffffffff810cac23>] ? __writepage+0x0/0x5a
 [<ffffffff8100ef6c>] ? xen_force_evtchn_callback+0x20/0x36
 [<ffffffff8100fa6f>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff810cb1d7>] generic_writepages+0x35/0x4f
 [<ffffffff810cb230>] do_writepages+0x3f/0x5e
 [<ffffffff811261e5>] writeback_single_inode+0x161/0x2d7
 [<ffffffff811267ab>] generic_sync_sb_inodes+0x1ef/0x355
 [<ffffffff810cc726>] ? pdflush+0x0/0x286
 [<ffffffff8112692d>] sync_sb_inodes+0x1c/0x32
 [<ffffffff811269bc>] writeback_inodes+0x79/0xdf
 [<ffffffff81107819>] ? sync_supers+0xb3/0xce
 [<ffffffff810cc1f6>] wb_kupdate+0xb9/0x13a
 [<ffffffff810cc84c>] ? pdflush+0x126/0x286
 [<ffffffff810cc889>] pdflush+0x163/0x286
 [<ffffffff810cc13d>] ? wb_kupdate+0x0/0x13a
 [<ffffffff810cc726>] ? pdflush+0x0/0x286
 [<ffffffff810754ce>] kthread+0x9e/0xa8
 [<ffffffff8101606a>] child_rip+0xa/0x20
 [<ffffffff810151ac>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff810159e6>] ? retint_restore_args+0x5/0x6
 [<ffffffff81016060>] ? child_rip+0x0/0x20
1 lock held by pdflush/36:
 #0:  (&type->s_umount_key#23){......}, at: [<ffffffff811269a6>] writeback_inodes+0x63/0xdf
Kernel panic - not syncing: hung_task: blocked tasks
Pid: 34, comm: khungtaskd Not tainted 2.6.31.13-xen-4.0.0 #4
Call Trace:
 [<ffffffff8105c8f3>] panic+0xb2/0x168
 [<ffffffff81085881>] ? print_lock+0x96/0xb1
 [<ffffffff810861d5>] ? lockdep_print_held_locks+0xa5/0xc9
 [<ffffffff8101907a>] ? show_stack+0x2a/0x40
 [<ffffffff8102f025>] ? touch_nmi_watchdog+0x6c/0x87
 [<ffffffff810862c7>] ? __debug_show_held_locks+0x33/0x49
 [<ffffffff810b146c>] watchdog+0x209/0x258
 [<ffffffff810b12d8>] ? watchdog+0x75/0x258
 [<ffffffff8104a45f>] ? complete+0x52/0x71
 [<ffffffff810b1263>] ? watchdog+0x0/0x258
 [<ffffffff810754ce>] kthread+0x9e/0xa8
 [<ffffffff8101606a>] child_rip+0xa/0x20
 [<ffffffff810151ac>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff810159e6>] ? retint_restore_args+0x5/0x6
 [<ffffffff81016060>] ? child_rip+0x0/0x20

---------------------------------------------

2)
INFO: task kjournald:951 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 ffff8801c8993bf0 0000000000000282 00000000cf63a654 ffff8801c8993c00
 ffff8801ca899090 ffff8801ccae31b0 ffff8801ca899470 0000000000000001
 0000000000000001 0000000000000200 0000000000000001 000000000160015f
Call Trace:
 [<ffffffff813f1ee0>] io_schedule+0x37/0x59
 [<ffffffff811f1528>] ? blk_backing_dev_unplug+0x20/0x36
 [<ffffffff8112cb73>] sync_buffer+0x51/0x69
 [<ffffffff813f2387>] __wait_on_bit+0x54/0x9c
 [<ffffffff8112cb22>] ? sync_buffer+0x0/0x69
 [<ffffffff8112cb22>] ? sync_buffer+0x0/0x69
 [<ffffffff813f244c>] out_of_line_wait_on_bit+0x7d/0x9e
 [<ffffffff8107569f>] ? wake_bit_function+0x0/0x51
 [<ffffffff8112ca8f>] __wait_on_buffer+0x32/0x48
 [<ffffffffa005cf62>] journal_commit_transaction+0x684/0x12f2 [jbd]
 [<ffffffff8100fa82>] ? check_events+0x12/0x20
 [<ffffffff8100fa6f>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff813f4ad7>] ? _spin_unlock_irqrestore+0x44/0x5f
 [<ffffffff81067de4>] ? try_to_del_timer_sync+0x65/0x84
 [<ffffffff81067e03>] ? del_timer_sync+0x0/0xa0
 [<ffffffffa0061dd4>] kjournald+0x161/0x3ae [jbd]
 [<ffffffff81075641>] ? autoremove_wake_function+0x0/0x5e
 [<ffffffffa0061c73>] ? kjournald+0x0/0x3ae [jbd]
 [<ffffffff810754ce>] kthread+0x9e/0xa8
 [<ffffffff8101606a>] child_rip+0xa/0x20
 [<ffffffff810151ac>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff810159e6>] ? retint_restore_args+0x5/0x6
 [<ffffffff81016060>] ? child_rip+0x0/0x20
no locks held by kjournald/951.
Kernel panic - not syncing: hung_task: blocked tasks
Pid: 34, comm: khungtaskd Not tainted 2.6.31.13-xen-4.0.0 #18
Call Trace:
 [<ffffffff8105c8f3>] panic+0xb2/0x168
 [<ffffffff81086176>] ? lockdep_print_held_locks+0x46/0xc9
 [<ffffffff8101907a>] ? show_stack+0x2a/0x40
 [<ffffffff8102f025>] ? touch_nmi_watchdog+0x6c/0x87
 [<ffffffff810862c7>] ? __debug_show_held_locks+0x33/0x49
 [<ffffffff810b146c>] watchdog+0x209/0x258
 [<ffffffff810b12d8>] ? watchdog+0x75/0x258
 [<ffffffff8104a45f>] ? complete+0x52/0x71
 [<ffffffff810b1263>] ? watchdog+0x0/0x258
 [<ffffffff810754ce>] kthread+0x9e/0xa8
 [<ffffffff8101606a>] child_rip+0xa/0x20
 [<ffffffff810151ac>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff810159e6>] ? retint_restore_args+0x5/0x6
 [<ffffffff81016060>] ? child_rip+0x0/0x20
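
When comparing many such reports, it can help to reduce each one to the blocked task and its reliable stack frames. The sketch below is one way to do that, assuming log text in the same format as the two traces above; the function name and the returned dict layout are illustrative. Frames marked with "?" are skipped because the kernel prints those as unreliable.

```python
import re

# Header line printed by the hung-task detector (khungtaskd).
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)
# One stack frame, e.g. " [<ffffffff813f1ee0>] io_schedule+0x37/0x59".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\? )?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_tasks(log_text):
    """Return one dict per hung-task report with its reliable frames."""
    reports = []
    current = None
    in_trace = False
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header["comm"],
                "pid": int(header["pid"]),
                "secs": int(header["secs"]),
                "frames": [],
            }
            reports.append(current)
            in_trace = False
            continue
        if current is None:
            continue  # not inside a report (e.g. the panic's own trace)
        if line.strip() == "Call Trace:":
            in_trace = True
            continue
        if not in_trace:
            continue  # register dump lines before the trace
        frame = FRAME_RE.search(line)
        if frame is None:
            current = None  # first non-frame line ends this report
            in_trace = False
        elif not frame["unreliable"]:
            current["frames"].append(frame["sym"])
    return reports
```

Run over the two reports above, this should show both tasks sleeping in io_schedule, which fits the picture of guest I/O that never completes.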

-- Rick Boone


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
  
    
  