Re: [Xen-devel] remus trouble

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] remus trouble
From: Nomen Nescio <info@xxxxxxxxxxxx>
Date: Wed, 7 Jul 2010 14:12:09 +0200
Delivery-date: Wed, 07 Jul 2010 05:13:09 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20100706180212.GE13388@xxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <20100706113654.GN9918@xxxxxxxxx> <20100706180212.GE13388@xxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.13 (2006-08-11)

Hey Brendan & all,

> > I ran into some problems trying remus on xen4.0.1rc4 with the 2.6.31.13
> > dom0 (checkout from yesterday):
> 
> What's your domU kernel? pvops support was recently added to dom0, but
> still doesn't work for domU.

Ah, that explains a few things; however, similar behaviour occurs with
HVM. Remus starts, spits out the following output:

qemu logdirty mode: enable
 1: sent 267046, skipped 218, delta 8962ms, dom0 68%, target 0%, sent 976Mb/s, dirtied 1Mb/s 290 pages
 2: sent 290, skipped 0, delta 12ms, dom0 66%, target 0%, sent 791Mb/s, dirtied 43Mb/s 16 pages
 3: sent 16, skipped 0, Start last iteration
PROF: suspending at 1278503125.101352
issuing HVM suspend hypercall
suspend hypercall returned 0
pausing QEMU
SUSPEND shinfo 000fffff
delta 11ms, dom0 18%, target 0%, sent 47Mb/s, dirtied 47Mb/s 16 pages
 4: sent 16, skipped 0, delta 5ms, dom0 20%, target 0%, sent 104Mb/s, dirtied 104Mb/s 16 pages
Total pages sent= 267368 (0.25x)
(of which 0 were fixups)
All memory is saved
PROF: resumed at 1278503125.111614
resuming QEMU
Sending 6017 bytes of QEMU state
PROF: flushed memory at 1278503125.112014


and then seems to become inactive. ps shows it like this:

root      4756  0.4  0.1  82740 11040 pts/0    SLl+ 13:45   0:03 /usr/bin/python /usr/bin/remus --no-net remus1 backup


according to strace, it's stuck reading FD6, which is a FIFO file:
/var/run/tap/remus_nas1_9000.msg
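
(For reference, this is roughly how I checked that -- the PID and fd
number are just the ones from the run above:)

strace -p 4756           # attaches and then just sits in a read(6, ...) that never returns
ls -l /proc/4756/fd/6    # the symlink points at the FIFO above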


the domU comes up in a blocked state on the backup machine and seems to
run fine there. however, xm list on the primary shows no state whatsoever:

Domain-0                                     0 10208    12     r-----   468.6
remus1                                       1  1024     1     ------    41.8
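
(If it helps, I can also attach the full domain record and the xend log
from around that time, e.g.:)

xm list --long remus1
tail -n 200 /var/log/xen/xend.log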


and after a ctrl-c, remus segfaults:
remus[4756]: segfault at 0 ip 00007f3f49cc7376 sp 00007fffec999fd8 error 4 in libc-2.11.1.so[7f3f49ba1000+178000]
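
(If a native backtrace of that segfault would help, I can try to catch
one next time, roughly like this -- assuming core dumps are enabled and
end up in the working directory:)

ulimit -c unlimited                      # before starting remus
/usr/bin/remus --no-net remus1 backup    # reproduce the hang, then ctrl-c
gdb /usr/bin/python core                 # remus runs under /usr/bin/python, so the core is against python
(gdb) bt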


> Are these in dom0 or the primary domU? Looks a bit like dom0, but I
> haven't seen these before.

those were in dom0. this time dmesg shows output after destroying
the domU on the primary:

[ 1920.059226] INFO: task xenwatch:55 blocked for more than 120 seconds.
[ 1920.059262] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1920.059315] xenwatch      D 0000000000000000     0    55      2 0x00000000
[ 1920.059363]  ffff8802e2e656c0 0000000000000246 0000000000011200 0000000000000000
[ 1920.059439]  ffff8802e2e65720 0000000000000000 ffff8802d55d20c0 00000001001586b3
[ 1920.059520]  ffff8802e2e683b0 000000000000f668 00000000000153c0 ffff8802e2e683b0
[ 1920.059592] Call Trace:
[ 1920.059626]  [<ffffffff8157553d>] io_schedule+0x2d/0x40
[ 1920.059661]  [<ffffffff812afbc9>] get_request_wait+0xe9/0x1c0
[ 1920.059695]  [<ffffffff810af240>] ? autoremove_wake_function+0x0/0x40
[ 1920.059732]  [<ffffffff812a3e87>] ? elv_merge+0x37/0x200
[ 1920.059765]  [<ffffffff812afd41>] __make_request+0xa1/0x470
[ 1920.059800]  [<ffffffff810389ff>] ? xen_restore_fl_direct_end+0x0/0x1
[ 1920.059837]  [<ffffffff8103ed5d>] ? retint_restore_args+0x5/0x6
[ 1920.059874]  [<ffffffff812ae5dc>] generic_make_request+0x17c/0x4a0
[ 1920.059909]  [<ffffffff8111bdf6>] ? mempool_alloc+0x56/0x140
[ 1920.059946]  [<ffffffff8103819d>] ? xen_force_evtchn_callback+0xd/0x10
[ 1920.059979]  [<ffffffff812ae978>] submit_bio+0x78/0xf0
[ 1920.060013]  [<ffffffff81180489>] submit_bh+0xf9/0x140
[ 1920.060046]  [<ffffffff81182600>] __block_write_full_page+0x1e0/0x3a0
[ 1920.060080]  [<ffffffff811819c0>] ? end_buffer_async_write+0x0/0x1f0
[ 1920.060116]  [<ffffffff81186980>] ? blkdev_get_block+0x0/0x70
[ 1920.060151]  [<ffffffff81186980>] ? blkdev_get_block+0x0/0x70
[ 1920.060186]  [<ffffffff811819c0>] ? end_buffer_async_write+0x0/0x1f0
[ 1920.060222]  [<ffffffff81182ec1>] block_write_full_page_endio+0xe1/0x120
[ 1920.060259]  [<ffffffff81038a12>] ? check_events+0x12/0x20
[ 1920.060294]  [<ffffffff81182f15>] block_write_full_page+0x15/0x20
[ 1920.060330]  [<ffffffff81187928>] blkdev_writepage+0x18/0x20
[ 1920.060365]  [<ffffffff81120937>] __writepage+0x17/0x40
[ 1920.060399]  [<ffffffff81121897>] write_cache_pages+0x227/0x4d0
[ 1920.060434]  [<ffffffff81120920>] ? __writepage+0x0/0x40
[ 1920.060469]  [<ffffffff810389ff>] ? xen_restore_fl_direct_end+0x0/0x1
[ 1920.060504]  [<ffffffff81121b64>] generic_writepages+0x24/0x30
[ 1920.060539]  [<ffffffff81121b9d>] do_writepages+0x2d/0x50
[ 1920.060576]  [<ffffffff81119beb>] __filemap_fdatawrite_range+0x5b/0x60
[ 1920.060613]  [<ffffffff8111a1ff>] filemap_fdatawrite+0x1f/0x30
[ 1920.060646]  [<ffffffff8111a245>] filemap_write_and_wait+0x35/0x50
[ 1920.060681]  [<ffffffff81187ba4>] __sync_blockdev+0x24/0x50
[ 1920.060716]  [<ffffffff81187be3>] sync_blockdev+0x13/0x20
[ 1920.060748]  [<ffffffff81187cc8>] __blkdev_put+0xa8/0x1a0
[ 1920.060784]  [<ffffffff81187dd0>] blkdev_put+0x10/0x20
[ 1920.060819]  [<ffffffff81344fea>] vbd_free+0x2a/0x40
[ 1920.060851]  [<ffffffff81344499>] blkback_remove+0x59/0x90
[ 1920.060885]  [<ffffffff8133e890>] xenbus_dev_remove+0x50/0x70
[ 1920.060921]  [<ffffffff8138b9d8>] __device_release_driver+0x58/0xb0
[ 1920.060956]  [<ffffffff8138bb4d>] device_release_driver+0x2d/0x40
[ 1920.060991]  [<ffffffff8138ac0a>] bus_remove_device+0x9a/0xc0
[ 1920.061027]  [<ffffffff81388da7>] device_del+0x127/0x1d0
[ 1920.061061]  [<ffffffff81388e66>] device_unregister+0x16/0x30
[ 1920.061095]  [<ffffffff813441a0>] frontend_changed+0x90/0x2a0
[ 1920.061131]  [<ffffffff8133eb82>] xenbus_otherend_changed+0xb2/0xc0
[ 1920.061167]  [<ffffffff81577aa7>] ? _spin_unlock_irqrestore+0x37/0x60
[ 1920.061209]  [<ffffffff8133f150>] frontend_changed+0x10/0x20
[ 1920.061243]  [<ffffffff8133c794>] xenwatch_thread+0xb4/0x190
[ 1920.061281]  [<ffffffff810af240>] ? autoremove_wake_function+0x0/0x40
[ 1920.061314]  [<ffffffff8133c6e0>] ? xenwatch_thread+0x0/0x190
[ 1920.061349]  [<ffffffff810aecb6>] kthread+0xa6/0xb0
[ 1920.061383]  [<ffffffff8103f3ea>] child_rip+0xa/0x20
[ 1920.061415]  [<ffffffff8103e5d7>] ? int_ret_from_sys_call+0x7/0x1b
[ 1920.061451]  [<ffffffff8103ed5d>] ? retint_restore_args+0x5/0x6
[ 1920.061485]  [<ffffffff8103f3e0>] ? child_rip+0x0/0x20


Any idea what's going wrong? Thanks!

Cheers,

NN

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
