xen-devel

Re: [Xen-devel] Remus : VM on backup not in pause state

To: Dulloor <dulloor@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] Remus : VM on backup not in pause state
From: Dulloor <dulloor@xxxxxxxxx>
Date: Mon, 26 Jul 2010 23:17:52 -0700
Thanks for the pointers. I haven't had time to work on this. I will
collect more data and get back as soon as I can.

-dulloor

On Mon, Jul 26, 2010 at 3:05 PM, Brendan Cully <brendan@xxxxxxxxx> wrote:
> On Thursday, 22 July 2010 at 16:40, Dulloor wrote:
>> On Thu, Jul 22, 2010 at 2:49 PM, Brendan Cully <brendan@xxxxxxxxx> wrote:
>> > On Thursday, 22 July 2010 at 13:45, Dulloor wrote:
>> >> My setup is as follows :
>> >> - xen : unstable (rev:21743)
>> >> - Dom0 : pvops (branch : stable-2.6.32.x,
>> >> rev:01d9fbca207ec232c758d991d66466fc6e38349e)
>> >> - Guest Configuration :
>> >> ------------------------------------------------------------------------------------------
>> >> kernel = "/usr/lib/xen/boot/hvmloader"
>> >> builder='hvm'
>> >> name = "linux-hvm"
>> >> vcpus = 4
>> >> memory = 2048
>> >> vif = [ 'type=ioemu, bridge=eth0, mac=00:1c:3e:17:22:13' ]
>> >> disk = [ 'phy:/dev/XenVolG/hvm-linux-snap-1.img,hda,w' ]
>> >> device_model = '/usr/lib/xen/bin/qemu-dm'
>> >> boot="cd"
>> >> sdl=0
>> >> vnc=1
>> >> vnclisten="0.0.0.0"
>> >> vncconsole=0
>> >> vncpasswd=''
>> >> stdvga=0
>> >> superpages=1
>> >> serial='pty'
>> >> ------------------------------------------------------------------------------------------
>> >>
>> >> - Remus command :
>> >> # remus --no-net linux-hvm <dst-ip>
>> >>
>> >> - On primary :
>> >> # xm list
>> >> Name                                        ID   Mem VCPUs      State   Time(s)
>> >> linux-hvm                                    9  2048     4     -b-s--     10.8
>> >>
>> >> - On secondary :
>> >> # xm list
>> >> Name                                        ID   Mem VCPUs      State   Time(s)
>> >> linux-hvm                                   11  2048     4     -b----      1.9
>> >>
>> >>
>> >> I have to issue "xm pause/unpause" explicitly for the backup VM.
>> >> Any recent changes ?
>> >
>> > This probably means there was a timeout on the replication channel,
>> > interpreted by the backup as a failure of the primary, which caused it
>> > to activate itself. You should see evidence of that in the remus
>> > console logs and xend.log and daemon.log (for the disk side).
>> >
>> > Once you've figured out where the timeout happened it'll be easier to
>> > figure out why.
>> >
>> Please find the logs attached. I didn't find anything interesting in
>> daemon.log.
>> What does remus log there ? I am not using disk replication, since I
>> have issues with that .. but that's for another email :)
>
> daemon.log is just for disk replication, so if you're not using it you
> won't see anything.
>
>> The only visible error is in xend-secondary.log around xc_restore :
>> [2010-07-22 16:15:37 2056] DEBUG (balloon:207) Balloon: setting dom0 target to 5765 MiB.
>> [2010-07-22 16:15:37 2056] DEBUG (XendDomainInfo:1467) Setting memory target of domain Domain-0 (0) to 5765 MiB.
>> [2010-07-22 16:15:37 2056] DEBUG (XendCheckpoint:290) [xc_restore]: /usr/lib/xen/bin/xc_restore 5 1 5 6 1 1 1 0
>> [2010-07-22 16:18:42 2056] INFO (XendCheckpoint:408) xc: error: Error when reading pages (11 = Resource temporarily unavailabl): Internal error
>> [2010-07-22 16:18:42 2056] INFO (XendCheckpoint:408) xc: error: error when buffering batch, finishing (11 = Resource temporarily unavailabl): Internal error
>>
>> If you haven't seen this before, please let me know and I will try
>> debugging more.
>
> I haven't seen that. It looks like read_exact_timed has failed with
> EAGAIN, which is surprising since it explicitly looks for EAGAIN and
> loops on it. Can you log len and errno after line 77 in
> read_exact_timed in tools/libxc/xc_domain_restore.c? That is, change
>
>   if ( len <= 0 )
>       return -1;
>
> to something like
>
>   if ( len <= 0 ) {
>       fprintf(stderr, "read_exact_timed failed (read rc: %d, errno: %d)\n",
>               len, errno);
>       return -1;
>   }
>
> Another possibility is read is returning 0 here (and EAGAIN is just a
> leftover errno from a previous read), which would indicate that the
> _sender_ hung up the connection. It's hard to tell exactly what's
> going on because you seem to have an enormous amount of clock skew
> between your primary and secondary dom0s and I can't tell whether the
> logs match up.
>
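
For reference, the two cases Brendan describes (a real EAGAIN versus read
returning 0 with a stale errno) can be told apart by clearing errno before
the call and checking the return value first. The following is only a
minimal sketch under that assumption: classify_read and the read_outcome
enum are hypothetical names for illustration and are not part of libxc or
of read_exact_timed.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not libxc code): perform one read() and report
 * what happened, so that end-of-file is never confused with an errno
 * value left over from an earlier call. */
enum read_outcome { READ_DATA, READ_RETRY, READ_EOF, READ_ERROR };

static enum read_outcome classify_read(int fd, void *buf, size_t size,
                                       ssize_t *len_out)
{
    ssize_t len;

    errno = 0;                  /* drop any stale value before the call */
    len = read(fd, buf, size);
    *len_out = len;

    if ( len > 0 )
        return READ_DATA;       /* got some bytes */
    if ( len == 0 )
        return READ_EOF;        /* peer closed; errno is not meaningful */
    if ( errno == EINTR || errno == EAGAIN )
        return READ_RETRY;      /* transient; the caller may loop */
    return READ_ERROR;          /* genuine failure; errno says why */
}
```

A caller would loop on READ_RETRY, accumulate on READ_DATA, and log
READ_EOF separately from READ_ERROR, which would show whether the sender
hung up or the read itself failed.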
