   
 
 

xen-bugs

[Xen-bugs] [Bug 1659] New: Dom0 'looses' BIOs

To: xen-bugs@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-bugs] [Bug 1659] New: Dom0 'looses' BIOs
From: bugzilla-daemon@xxxxxxxxxxxxxxxxxxx
Date: Tue, 31 Aug 2010 06:47:42 -0700
Delivery-date: Tue, 31 Aug 2010 06:47:50 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-bugs-request@lists.xensource.com?subject=help>
List-id: Xen Bugzilla <xen-bugs.lists.xensource.com>
List-post: <mailto:xen-bugs@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-bugs>, <mailto:xen-bugs-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-bugs>, <mailto:xen-bugs-request@lists.xensource.com?subject=unsubscribe>
Reply-to: bugs@xxxxxxxxxxxxxxxxxx
Sender: xen-bugs-bounces@xxxxxxxxxxxxxxxxxxx
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1659

           Summary: Dom0 'looses' BIOs
           Product: Xen
           Version: unspecified
          Platform: x86-64
        OS/Version: Linux-2.6
            Status: NEW
          Severity: major
          Priority: P2
         Component: Unspecified
        AssignedTo: xen-bugs@xxxxxxxxxxxxxxxxxxx
        ReportedBy: xenbugsx6vp3m@xxxxxxxxx


DOM0 Setup: blkback -> DRBD -> LVM2 -> MD RAID1 -> SATA

Symptom: the md raid1 device hangs during a RAID resync. The resync itself
stalls and all accesses to the md raid1 device hang, while accesses to the
underlying SATA devices still work. There is a deadlock in the *_barrier
functions of raid1.c. The resync process is waiting for a pending request to
finish, but that request either never finishes or at least 'forgets' to
decrease the pending count used by the resync barrier handling in raid1.c.
While the resync process waits for pending regular I/O to complete, it has
already raised the resync barrier, so all further normal I/O waits for the
resync to lower its barrier (see call trace below).
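
The interaction described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the raid1.c code: the class BarrierModel, its methods and the simulate() helper are hypothetical names chosen to echo the functions in the call trace, and the model reports whether a caller would block instead of actually sleeping.

```python
from dataclasses import dataclass


@dataclass
class BarrierModel:
    """Toy model of a resync barrier guarded by a pending-I/O counter."""

    nr_pending: int = 0  # regular I/O submitted but not yet completed
    barrier: int = 0     # non-zero while the resync holds the barrier up

    def can_start_io(self) -> bool:
        # Regular I/O may only start while the barrier is down.
        return self.barrier == 0

    def start_io(self) -> None:
        if not self.can_start_io():
            raise RuntimeError("regular I/O would block waiting for the barrier")
        self.nr_pending += 1

    def complete_io(self) -> None:
        # Every completion has to drop the pending count again.
        self.nr_pending -= 1

    def raise_barrier(self) -> None:
        # The resync raises the barrier first, then waits for pending I/O.
        self.barrier += 1

    def lower_barrier(self) -> None:
        self.barrier -= 1

    def resync_can_proceed(self) -> bool:
        # The resync only continues once all pending regular I/O has drained.
        return self.barrier > 0 and self.nr_pending == 0


def simulate(lose_completion: bool) -> BarrierModel:
    """Submit two requests, complete one or both, then raise the barrier."""
    model = BarrierModel()
    model.start_io()
    model.start_io()
    model.complete_io()
    if not lose_completion:
        model.complete_io()
    model.raise_barrier()
    return model


if __name__ == "__main__":
    for lost in (False, True):
        m = simulate(lose_completion=lost)
        print(
            f"lost completion={lost}: "
            f"resync can proceed={m.resync_can_proceed()}, "
            f"new I/O can start={m.can_start_io()}"
        )
```

In the model, a single completion that never decrements the counter leaves the resync unable to proceed and new I/O unable to start, which matches the mutual wait described in the symptom.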

The bug has been tested and verified on two entirely different x86-64
platforms (AMD Opteron 1214HE + MCP55 chipset, Intel Core2Duo notebook with
ICH9M chipset), so it is unlikely to be a hardware issue.

It has been verified using openSUSE 11.2 (2.6.31.12-0.2-xen) and 11.3 (2.6.34)
dom0 kernels, running on Xen hypervisor 3.4.1, 3.4.2 and 3.4.3.

I could not reproduce the bug with kernels without any xen dom0 patches.

The situation seems to occur most often when the hardware node crashes and the
VMs therefore start a file system journal replay (ext3). That may also be the
reason why I could not reproduce the bug with regular kernels (i.e. without
Xen patches), so I have no definitive clue whether this is a Xen-specific
problem.





[  603.229215] INFO: task md1_resync:1441 blocked for more than 120 seconds.
[  603.229294] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  603.229413] md1_resync    D 0000000000000000     0  1441      2 0x00000000
[  603.229505]  ffff88003d967bb0 0000000000000246 ffff88003d967b10
ffff88003d967b30
[  603.229627]  0000000000000000 ffff88003d967b78 000000000000a380
ffff88003dba8be8
[  603.229753]  000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[  603.229880] Call Trace:
[  603.229964]  [<ffffffffa002c63e>] raise_barrier+0xde/0x2e0 [raid1]
[  603.230037]  [<ffffffffa002d5cb>] sync_request+0x12b/0x680 [raid1]
[  603.230112]  [<ffffffff80399de9>] md_do_sync+0x669/0xc40
[  603.230180]  [<ffffffff8039ac54>] md_thread+0x54/0x150
[  603.230249]  [<ffffffff8006fac6>] kthread+0xb6/0xc0
[  603.230318]  [<ffffffff8000d38a>] child_rip+0xa/0x20
[  603.230401] INFO: task python:5365 blocked for more than 120 seconds.
[  603.230467] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  603.230584] python        D 00000000c7c9e55f     0  5365   5348 0x00000000
[  603.230657]  ffff8800382595b8 0000000000000282 ffff880038259518
ffff880038259538
[  603.230783]  ffff8800382594e8 ffff880038259580 000000000000a380
ffff8800381e88e8
[  603.230909]  000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[  603.231035] Call Trace:
[  603.231098]  [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[  603.231169]  [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[  603.231239]  [<ffffffff80399498>] md_make_request+0xc8/0x140
[  603.231309]  [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[  603.231380]  [<ffffffff8022287d>] submit_bio+0x7d/0x110
[  603.231448]  [<ffffffff80153535>] mpage_bio_submit+0x35/0x50
[  603.231517]  [<ffffffff80153aa3>] do_mpage_readpage+0x383/0x710
[  603.231595]  [<ffffffff80153fb3>] mpage_readpages+0xf3/0x150
[  603.231664]  [<ffffffff801b8ccb>] ext2_readpages+0x2b/0x50
[  603.231733]  [<ffffffff800e0353>] read_pages+0x43/0x110
[  603.231801]  [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[  603.231871]  [<ffffffff800e05ff>] ra_submit+0x2f/0x50
[  603.231936]  [<ffffffff800e087d>] ondemand_readahead+0x11d/0x260
[  603.232005]  [<ffffffff800e0a60>] page_cache_async_readahead+0xa0/0xc0
[  603.232082]  [<ffffffff800d7311>] T.731+0x1f1/0x440
[  603.232149]  [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[  603.232218]  [<ffffffff80118da2>] do_sync_read+0x102/0x160
[  603.232286]  [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[  603.232352]  [<ffffffff801199fb>] sys_read+0x5b/0xa0
[  603.233212]  [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[  603.233279]  [<00007fab60821a90>] 0x7fab60821a90
[  603.233340] INFO: task blkid:5393 blocked for more than 120 seconds.
[  603.233402] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  603.233508] blkid         D 0000000000000001     0  5393      1 0x00000000
[  603.233573]  ffff8800406d96e8 0000000000000282 ffff8800406d9648
ffff8800406d9668
[  603.233685]  ffff8800406d9618 ffff8800406d96b0 000000000000a380
ffff8800382fe4a8
[  603.233798]  000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[  603.233914] Call Trace:
[  603.233969]  [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[  603.234037]  [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[  603.234106]  [<ffffffff80399498>] md_make_request+0xc8/0x140
[  603.234176]  [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[  603.234244]  [<ffffffff8022287d>] submit_bio+0x7d/0x110
[  603.234250]  [<ffffffff80147c12>] submit_bh+0x102/0x150
[  603.234257]  [<ffffffff8014adac>] block_read_full_page+0x23c/0x3b0
[  603.234262]  [<ffffffff801509b6>] blkdev_readpage+0x26/0x50
[  603.234268]  [<ffffffff800e03f6>] read_pages+0xe6/0x110
[  603.234273]  [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[  603.234278]  [<ffffffff800e0839>] ondemand_readahead+0xd9/0x260
[  603.234284]  [<ffffffff800e0aad>] page_cache_sync_readahead+0x2d/0x50
[  603.234288]  [<ffffffff800d73d6>] T.731+0x2b6/0x440
[  603.234293]  [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[  603.234300]  [<ffffffff80118da2>] do_sync_read+0x102/0x160
[  603.234305]  [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[  603.234310]  [<ffffffff801199fb>] sys_read+0x5b/0xa0
[  603.234315]  [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[  603.234320]  [<00007fbbf8f1ea90>] 0x7fbbf8f1ea90
[  603.234323] INFO: task blkid:5397 blocked for more than 120 seconds.
[  603.234324] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  603.234326] blkid         D 0000000098d5b5f5     0  5397      1 0x00000000
[  603.234330]  ffff8800381656c8 0000000000000286 ffff880038165628
ffff880038165648
[  603.234333]  0000000000000000 ffff880038165690 000000000000a380
ffff8800382f8be8
[  603.234336]  000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[  603.234339] Call Trace:
[  603.234345]  [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[  603.234351]  [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[  603.234358]  [<ffffffff80399498>] md_make_request+0xc8/0x140
[  603.234363]  [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[  603.234369]  [<ffffffff8022287d>] submit_bio+0x7d/0x110
[  603.234373]  [<ffffffff80147c12>] submit_bh+0x102/0x150
[  603.234379]  [<ffffffff8014adac>] block_read_full_page+0x23c/0x3b0
[  603.234383]  [<ffffffff801509b6>] blkdev_readpage+0x26/0x50
[  603.234388]  [<ffffffff800e03f6>] read_pages+0xe6/0x110
[  603.234393]  [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[  603.234398]  [<ffffffff800e05ff>] ra_submit+0x2f/0x50
[  603.234403]  [<ffffffff800e087d>] ondemand_readahead+0x11d/0x260
[  603.234408]  [<ffffffff800e0aad>] page_cache_sync_readahead+0x2d/0x50
[  603.234412]  [<ffffffff800d73d6>] T.731+0x2b6/0x440
[  603.234417]  [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[  603.234422]  [<ffffffff80118da2>] do_sync_read+0x102/0x160
[  603.234427]  [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[  603.234432]  [<ffffffff801199fb>] sys_read+0x5b/0xa0
[  603.234437]  [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[  603.234442]  [<00007fb60ac89a90>] 0x7fb60ac89a90
[  723.225805] INFO: task md1_resync:1441 blocked for more than 120 seconds.
[  723.225892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  723.226010] md1_resync    D 0000000000000000     0  1441      2 0x00000000
[  723.226108]  ffff88003d967bb0 0000000000000246 ffff88003d967b10
ffff88003d967b30
[  723.226239]  0000000000000000 ffff88003d967b78 000000000000a380
ffff88003dba8be8
[  723.226372]  000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[  723.226504] Call Trace:
[  723.226600]  [<ffffffffa002c63e>] raise_barrier+0xde/0x2e0 [raid1]
[  723.226684]  [<ffffffffa002d5cb>] sync_request+0x12b/0x680 [raid1]
[  723.226766]  [<ffffffff80399de9>] md_do_sync+0x669/0xc40
[  723.226841]  [<ffffffff8039ac54>] md_thread+0x54/0x150
[  723.226913]  [<ffffffff8006fac6>] kthread+0xb6/0xc0
[  723.226987]  [<ffffffff8000d38a>] child_rip+0xa/0x20
[  723.227078] INFO: task python:5365 blocked for more than 120 seconds.
[  723.227147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  723.227265] python        D 00000000c7c9e55f     0  5365   5348 0x00000000
[  723.227346]  ffff8800382595b8 0000000000000282 ffff880038259518
ffff880038259538
[  723.227482]  ffff8800382594e8 ffff880038259580 000000000000a380
ffff8800381e88e8
[  723.227616]  000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[  723.227750] Call Trace:
[  723.227820]  [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[  723.227904]  [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[  723.227983]  [<ffffffff80399498>] md_make_request+0xc8/0x140
[  723.228059]  [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[  723.228135]  [<ffffffff8022287d>] submit_bio+0x7d/0x110
[  723.228209]  [<ffffffff80153535>] mpage_bio_submit+0x35/0x50
[  723.228284]  [<ffffffff80153aa3>] do_mpage_readpage+0x383/0x710
[  723.228362]  [<ffffffff80153fb3>] mpage_readpages+0xf3/0x150
[  723.228437]  [<ffffffff801b8ccb>] ext2_readpages+0x2b/0x50
[  723.228512]  [<ffffffff800e0353>] read_pages+0x43/0x110
[  723.228586]  [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[  723.228667]  [<ffffffff800e05ff>] ra_submit+0x2f/0x50
[  723.228746]  [<ffffffff800e087d>] ondemand_readahead+0x11d/0x260
[  723.228828]  [<ffffffff800e0a60>] page_cache_async_readahead+0xa0/0xc0
[  723.228906]  [<ffffffff800d7311>] T.731+0x1f1/0x440
[  723.228978]  [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[  723.229059]  [<ffffffff80118da2>] do_sync_read+0x102/0x160
[  723.229133]  [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[  723.229209]  [<ffffffff801199fb>] sys_read+0x5b/0xa0
[  723.229285]  [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[  723.229362]  [<00007fab60821a90>] 0x7fab60821a90
[  723.229431] INFO: task blkid:5393 blocked for more than 120 seconds.
[  723.229500] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  723.229619] blkid         D 0000000000000001     0  5393      1 0x00000000
[  723.229700]  ffff8800406d96e8 0000000000000282 ffff8800406d9648
ffff8800406d9668
[  723.229857]  ffff8800406d9618 ffff8800406d96b0 000000000000a380
ffff8800382fe4a8
[  723.229998]  000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[  723.230083] Call Trace:
[  723.230097]  [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[  723.230109]  [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[  723.230126]  [<ffffffff80399498>] md_make_request+0xc8/0x140
[  723.230142]  [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[  723.230152]  [<ffffffff8022287d>] submit_bio+0x7d/0x110
[  723.230162]  [<ffffffff80147c12>] submit_bh+0x102/0x150
[  723.230173]  [<ffffffff8014adac>] block_read_full_page+0x23c/0x3b0
[  723.230184]  [<ffffffff801509b6>] blkdev_readpage+0x26/0x50
[  723.230193]  [<ffffffff800e03f6>] read_pages+0xe6/0x110
[  723.230203]  [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[  723.230213]  [<ffffffff800e0839>] ondemand_readahead+0xd9/0x260
[  723.230223]  [<ffffffff800e0aad>] page_cache_sync_readahead+0x2d/0x50
[  723.230235]  [<ffffffff800d73d6>] T.731+0x2b6/0x440
[  723.230244]  [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[  723.230254]  [<ffffffff80118da2>] do_sync_read+0x102/0x160
[  723.230263]  [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[  723.230278]  [<ffffffff801199fb>] sys_read+0x5b/0xa0
[  723.230288]  [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[  723.230298]  [<00007fbbf8f1ea90>] 0x7fbbf8f1ea90
[  723.230303] INFO: task blkid:5397 blocked for more than 120 seconds.
[  723.230306] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  723.230310] blkid         D 0000000098d5b5f5     0  5397      1 0x00000000
[  723.230316]  ffff8800381656c8 0000000000000286 ffff880038165628
ffff880038165648
[  723.230322]  0000000000000000 ffff880038165690 000000000000a380
ffff8800382f8be8
[  723.230327]  000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[  723.230333] Call Trace:
[  723.230344]  [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[  723.230355]  [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[  723.230381]  [<ffffffff80399498>] md_make_request+0xc8/0x140
[  723.230391]  [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[  723.230401]  [<ffffffff8022287d>] submit_bio+0x7d/0x110
[  723.230410]  [<ffffffff80147c12>] submit_bh+0x102/0x150
[  723.230423]  [<ffffffff8014adac>] block_read_full_page+0x23c/0x3b0
[  723.230434]  [<ffffffff801509b6>] blkdev_readpage+0x26/0x50
[  723.230443]  [<ffffffff800e03f6>] read_pages+0xe6/0x110
[  723.230453]  [<ffffffff800e05ac>] __do_page_cache_readahead+0x18c/0x1b0
[  723.230463]  [<ffffffff800e05ff>] ra_submit+0x2f/0x50
[  723.230475]  [<ffffffff800e087d>] ondemand_readahead+0x11d/0x260
[  723.230487]  [<ffffffff800e0aad>] page_cache_sync_readahead+0x2d/0x50
[  723.230495]  [<ffffffff800d73d6>] T.731+0x2b6/0x440
[  723.230504]  [<ffffffff800d7626>] generic_file_aio_read+0xc6/0x1f0
[  723.230514]  [<ffffffff80118da2>] do_sync_read+0x102/0x160
[  723.230523]  [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[  723.230535]  [<ffffffff801199fb>] sys_read+0x5b/0xa0
[  723.230547]  [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[  723.230556]  [<00007fb60ac89a90>] 0x7fb60ac89a90
[  723.230563] INFO: task lvscan:5451 blocked for more than 120 seconds.
[  723.230566] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  723.230570] lvscan        D 000000003a1ec879     0  5451   5449 0x00000000
[  723.230576]  ffff880038207898 0000000000000282 ffff8800382077f8
ffff880038207818
[  723.230581]  ffff8800382077e8 ffff880038207860 000000000000a380
ffff8800407047e8
[  723.230587]  000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[  723.230592] Call Trace:
[  723.230606]  [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[  723.230618]  [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
[  723.230631]  [<ffffffff80399498>] md_make_request+0xc8/0x140
[  723.230641]  [<ffffffff802224db>] generic_make_request+0x19b/0x4c0
[  723.230650]  [<ffffffff8022287d>] submit_bio+0x7d/0x110
[  723.230659]  [<ffffffff8015210b>] dio_bio_submit+0x6b/0xc0
[  723.230668]  [<ffffffff80152d28>] direct_io_worker+0x258/0x3c0
[  723.230678]  [<ffffffff801530be>] __blockdev_direct_IO+0x22e/0x4d0
[  723.230687]  [<ffffffff80150838>] blkdev_direct_IO+0x58/0x80
[  723.230695]  [<ffffffff800d7737>] generic_file_aio_read+0x1d7/0x1f0
[  723.230708]  [<ffffffff80118da2>] do_sync_read+0x102/0x160
[  723.230718]  [<ffffffff801192d5>] vfs_read+0xd5/0x1c0
[  723.230727]  [<ffffffff801199fb>] sys_read+0x5b/0xa0
[  723.230736]  [<ffffffff8000c868>] system_call_fastpath+0x16/0x1b
[  723.230745]  [<00007f78d4b85a80>] 0x7f78d4b85a80
[  843.222203] INFO: task md1_resync:1441 blocked for more than 120 seconds.
[  843.222290] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  843.222406] md1_resync    D 0000000000000000     0  1441      2 0x00000000
[  843.222508]  ffff88003d967bb0 0000000000000246 ffff88003d967b10
ffff88003d967b30
[  843.222641]  0000000000000000 ffff88003d967b78 000000000000a380
ffff88003dba8be8
[  843.222777]  000000000000a380 000000000000a380 000000000000a380
0000000000007d00
[  843.222912] Call Trace:
[  843.223009]  [<ffffffffa002c63e>] raise_barrier+0xde/0x2e0 [raid1]
[  843.223091]  [<ffffffffa002d5cb>] sync_request+0x12b/0x680 [raid1]
[  843.223175]  [<ffffffff80399de9>] md_do_sync+0x669/0xc40
[  843.223250]  [<ffffffff8039ac54>] md_thread+0x54/0x150
[  843.223325]  [<ffffffff8006fac6>] kthread+0xb6/0xc0
[  843.223400]  [<ffffffff8000d38a>] child_rip+0xa/0x20
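
To make the pattern in the traces above easier to see, the hung-task reports can be reduced to one line per task with the innermost kernel function it is blocked in. The following is a minimal sketch, assuming the log format shown in this report; the function name summarize_hung_tasks and the regular expressions are illustrative and not part of any existing tool.

```python
import re
from collections import Counter

# "INFO: task <name>:<pid> blocked for more than <n> seconds."
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# "[<address>] symbol+0xoffset/0xsize", as printed in a kernel call trace
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_hung_tasks(log_text):
    """Return (task, pid, innermost_frame) for each hung-task report."""
    reports = []
    current = None
    for line in log_text.splitlines():
        hung = HUNG_RE.search(line)
        if hung:
            current = [hung.group(1), int(hung.group(2)), None]
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Only the first frame after a report header is the innermost one.
        if frame and current is not None and current[2] is None:
            current[2] = frame.group(1)
    return [tuple(report) for report in reports]


# Excerpt copied from the trace in this report.
sample = """\
[  603.229215] INFO: task md1_resync:1441 blocked for more than 120 seconds.
[  603.229880] Call Trace:
[  603.229964]  [<ffffffffa002c63e>] raise_barrier+0xde/0x2e0 [raid1]
[  603.230037]  [<ffffffffa002d5cb>] sync_request+0x12b/0x680 [raid1]
[  603.230401] INFO: task python:5365 blocked for more than 120 seconds.
[  603.231035] Call Trace:
[  603.231098]  [<ffffffffa002c99d>] wait_barrier+0x15d/0x1f0 [raid1]
[  603.231169]  [<ffffffffa002fef8>] make_request+0x58/0x690 [raid1]
"""

if __name__ == "__main__":
    summary = summarize_hung_tasks(sample)
    for task, pid, frame in summary:
        print(f"{task} (pid {pid}) blocked in {frame}")
    print(Counter(frame for _, _, frame in summary))
```

Run against the full log, such a summary would be expected to show md1_resync blocked in raise_barrier and every other task blocked in wait_barrier.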


