WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-api

RE: [Xen-API] How snapshot work on LVMoISCS SR

On Tue, 2010-01-26 at 14:18 -0800, Daniel Stodden wrote:
> On Tue, 2010-01-26 at 17:07 -0500, Anthony Xu wrote:
> > It is clear now, thanks.
> > 
> > The other thing I'd like to know is how XCP handles the disk cache
> > inside the VM when creating a snapshot. I saw in XenCenter that the VM
> > seems to be stopped temporarily when creating a snapshot.
> > 
> > Does the VM flush its dirty disk cache when creating a snapshot?
> 
> Depends what you mean by disk caches. All I/O performed by the backend
> is non-buffered, so there's presently no need to flush. As soon as a
> guest I/O request is processed, it essentially goes directly to the disk.
> 
> The snapshot is created while the VBD is paused, i.e. guest accesses
> which haven't been issued to the disk are suspended. Next, requests which
> have been sent to the disk are waited for, up to completion. Then blktap
> closes the handle to the physical disk node.
> 
> Before resuming guest access, we then reopen the newly created snapshot
> node, as the new leaf node.
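
The sequence described above (pause the VBD, wait for in-flight requests, close the handle, reopen on the new leaf, resume) can be sketched as a toy model. This is only an illustration under stated assumptions: the `Node` and `Vbd` classes and the `snapshot` function are hypothetical names, not blktap or XAPI code, and the model tracks only the VM's new leaf.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One VHD node in a chain (hypothetical model, not blktap code)."""
    name: str
    parent: Optional["Node"] = None
    writable: bool = True


@dataclass
class Vbd:
    """Toy stand-in for a virtual block device served by the backend."""
    leaf: Node
    paused: bool = False
    handle_open: bool = True
    queued: List[str] = field(default_factory=list)     # held back while paused
    in_flight: List[str] = field(default_factory=list)  # already sent to disk
    completed: List[str] = field(default_factory=list)

    def submit(self, req: str) -> None:
        # While paused, guest requests are suspended rather than issued.
        (self.queued if self.paused else self.in_flight).append(req)

    def drain(self) -> None:
        # Wait for everything already sent to the disk to complete.
        self.completed.extend(self.in_flight)
        self.in_flight.clear()


def snapshot(vbd: Vbd, new_leaf_name: str) -> Node:
    vbd.paused = True                 # 1. pause the VBD
    vbd.drain()                       # 2. wait for in-flight requests
    vbd.handle_open = False           # 3. close the handle to the old node
    old_leaf = vbd.leaf
    old_leaf.writable = False         # assumption: old node is now read-only
    vbd.leaf = Node(new_leaf_name, parent=old_leaf)
    vbd.handle_open = True            # 4. reopen on the new leaf node
    vbd.paused = False                # 5. resume guest access
    pending, vbd.queued = vbd.queued, []
    for req in pending:               # replay requests held during the pause
        vbd.submit(req)
    return old_leaf
```

Note that the model only orders requests at the block layer. Anything the guest still holds in its own page cache never reaches `submit`, so the snapshot it produces is crash-consistent at best.
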



That means if the guest Linux is executing "yum install kernel" when
the snapshot is created, a VM created from this snapshot might not be
bootable.


- Anthony




> 
> Daniel
> 
> > How does XCP make sure this snapshot is usable, i.e. that the virtual
> > disk metadata is consistent?
> > 
> > Thanks
> > - Anthony
> > 
> > 
> > On Tue, 2010-01-26 at 13:56 -0800, Ian Pratt wrote:
> > > > I still have below questions.
> > > > 
> > > > 1. If a non-leaf node is coalesce-able, will it be coalesced later on
> > > > regardless of how big the physical size of this node is?
> > > 
> > > Yes: it's always good to coalesce the chain to improve access performance.
> > >  
> > > > 2. There is one leaf node for a snapshot, and it may actually be
> > > > empty. Does it exist only because it can prevent coalescing?
> > > 
> > > Not quite sure what you're referring to here. The current code has a 
> > > limitation whereby it is unable to coalesce a leaf into its parent, so 
> > > after you've created one snapshot you'll always have a chain length of 2 
> > > even if you delete the snapshot (if you create a second snapshot it can 
> > > be coalesced). 
> > > 
> > > Coalescing a leaf into its parent is on the todo list: it's a little bit 
> > > different from the other cases because it requires synchronization if the 
> > > leaf is in active use. It's not a big deal from a performance point of 
> > > view to have the slightly longer chain length, but it will be good to get 
> > > this fixed for cleanliness.  
> > > 
> > > > 3. a clone will introduce a writable snapshot, it will prevent coalesce
> > > 
> > > A clone will produce a new writeable leaf linked to the parent.  It will 
> > > prevent the linked snapshot from being coalesced, but any other snapshots 
> > > above or below on the chain can still be coalesced by the garbage 
> > > collector if the snapshots are deleted. 
> > > 
> > > The XCP storage management stuff is pretty cool IMO...
> > > 
> > > Ian
> > > 
> > > > 
> > > > - Anthony
> > > > 
> > > > 
> > > > 
> > > > On Tue, 2010-01-26 at 02:34 -0800, Julian Chesterfield wrote:
> > > > > Hi Anthony,
> > > > >
> > > > > > Anthony Xu wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Basically snapshot on LVMoISCSI SR work well, it provides thin
> > > > > > > provisioning, so it is fast and disk space efficient.
> > > > > > >
> > > > > > > But I still have below concern.
> > > > > > >
> > > > > > > There is one more vhd chain when creating snapshot, if I creates 16
> > > > > > > snapshots, there are 16 vhd chains, that means when one VM accesses a
> > > > > > > disk block, it may need to access 16 vhd lvm one by one, then get the
> > > > > > > right block, it makes VM access disk slow. However, it is
> > > > > > > understandable, it is part of snapshot IMO.
> > > > > > >
> > > > > > The depth and speed of access will depend on the write pattern to the
> > > > > > disk. In XCP we add an optimisation called a BATmap which stores one
> > > > > > bit per BAT entry. This is a fast lookup table that is cached in
> > > > > > memory while the VHD is open, and tells the block device handler
> > > > > > whether a block has been fully allocated. Once the block is fully
> > > > > > allocated (all logical 2MB written) the block handler knows that it
> > > > > > doesn't need to read or write the Bitmap that corresponds to the data
> > > > > > block, it can go directly to the disk offset. Scanning through the
> > > > > > VHD chain can therefore be very quick, i.e. the block handler reads
> > > > > > down the chain of BAT tables for each node until it detects a node
> > > > > > that is allocated with hopefully the BATmap value set. The worst case
> > > > > > is a random disk write workload which causes the disk to be
> > > > > > fragmented and partially allocated. Every read or write will
> > > > > > therefore potentially incur a bitmap check at every level of the
> > > > > > chain.
> > > > > > > But after I delete all these 16 snapshots, there is still 16 vhd
> > > > > > > chains, the disk access is still slow, which is not understandable
> > > > > > > and reasonable, even though there may be only several KB difference
> > > > > > > between each snapshot,
> > > > > > >
> > > > > > There is a mechanism in XCP called the GC coalesce thread which gets
> > > > > > kicked asynchronously following a VDI deletion event. It queries the
> > > > > > VHD tree, and determines whether there is any coalescable work to do.
> > > > > > Coalesceable work is defined as:
> > > > >
> > > > > 'a hidden child node that has no siblings'
> > > > >
> > > > > > Hidden nodes are non-leaf nodes that reside within a chain. When the
> > > > > > snapshot leaf node is deleted therefore, it will leave redundant links
> > > > > > in the chain that can be safely coalesced. You can kick off a coalesce
> > > > > > by issuing an SR scan, although it should kick off automatically
> > > > > > within 30 seconds of deleting the snapshot node, handled by XAPI. If
> > > > > > you look in the /var/log/SMlog file you'll see a lot of debug
> > > > > > information including tree dependencies which will tell you a) whether
> > > > > > the GC thread is running, and b) whether there is coalescable work to
> > > > > > do. Note that deleting snapshot nodes does not always mean that there
> > > > > > is coalescable work to do since there may be other siblings, e.g. VDI
> > > > > > clones.
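
The rule quoted above ('a hidden child node that has no siblings') can be expressed as a small tree check. The following is a minimal sketch, assuming a simple in-memory tree. `VhdNode`, `is_coalescable` and `find_coalescable` are hypothetical names for illustration and are not taken from the XCP storage manager code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VhdNode:
    """A node in a VHD tree (hypothetical model for illustration)."""
    name: str
    hidden: bool = False  # hidden = non-leaf node inside a chain
    parent: Optional["VhdNode"] = field(default=None, repr=False, compare=False)
    children: List["VhdNode"] = field(default_factory=list)

    def add_child(self, child: "VhdNode") -> "VhdNode":
        child.parent = self
        self.children.append(child)
        return child


def is_coalescable(node: VhdNode) -> bool:
    """True if node is a hidden child with no siblings."""
    if not node.hidden or node.parent is None:
        return False
    return len(node.parent.children) == 1


def find_coalescable(root: VhdNode) -> List[str]:
    """Walk the tree and collect the names of coalescable nodes."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if is_coalescable(node):
            found.append(node.name)
        stack.extend(node.children)
    return found


# A chain left behind after its snapshot leaves were deleted:
#   base (hidden) -> a (hidden) -> vm-leaf
base = VhdNode("base", hidden=True)
a = base.add_child(VhdNode("a", hidden=True))
a.add_child(VhdNode("vm-leaf"))
print(find_coalescable(base))   # 'a' is an only child and hidden

# A clone hanging off 'base' gives 'a' a sibling, so nothing qualifies.
base.add_child(VhdNode("clone-leaf"))
print(find_coalescable(base))
```

In this model the second call finds no work, which matches the note that deleting snapshot nodes does not always leave something to coalesce when siblings such as VDI clones exist.
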
> > > > > > is there any way we can reduce depth of vhd chain after deleting
> > > > > > snapshots? get VM back to normal disk performance.
> > > > > >
> > > > > The coalesce thread handles this, see above.
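
To make the cost of chain depth concrete, here is a rough sketch of the lookup described earlier in this reply: walk from the leaf towards the root, skip the per-block bitmap when the BATmap bit says the block is fully allocated, and otherwise pay one bitmap check per level. The data layout, the names `VhdNode` and `locate`, and the 512-byte sector size used to derive 4096 sectors per 2MB block are assumptions for illustration, not the actual blktap implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

SECTORS_PER_BLOCK = 4096  # 2 MB block / 512-byte sectors (assumed sector size)


@dataclass
class VhdNode:
    """One VHD in a chain (toy model, not the on-disk format)."""
    name: str
    parent: Optional["VhdNode"] = None
    # block index -> set of sectors written in this node (stands in for
    # a BAT entry plus the block's sector bitmap)
    bitmaps: Dict[int, Set[int]] = field(default_factory=dict)
    # block indices that are fully allocated (one bit per BAT entry)
    batmap: Set[int] = field(default_factory=set)

    def write(self, block: int, sector: int) -> None:
        written = self.bitmaps.setdefault(block, set())
        written.add(sector)
        if len(written) == SECTORS_PER_BLOCK:
            self.batmap.add(block)  # every sector written: set the BATmap bit


def locate(leaf: VhdNode, block: int, sector: int) -> Tuple[Optional[str], int]:
    """Return (name of the node holding the sector, bitmap checks performed)."""
    bitmap_reads = 0
    node = leaf
    while node is not None:
        if block in node.bitmaps:          # BAT entry is allocated here
            if block in node.batmap:       # fully allocated: skip the bitmap
                return node.name, bitmap_reads
            bitmap_reads += 1              # partially allocated: check bitmap
            if sector in node.bitmaps[block]:
                return node.name, bitmap_reads
        node = node.parent                 # not here, continue down the chain
    return None, bitmap_reads              # never written anywhere in the chain
```

With a fully written block in the base node and a single sector rewritten in a middle node, a read of an untouched sector costs one bitmap check at the middle node and none at the base. With many partially written levels, the count grows with the chain depth, which is the worst case described for random write workloads.
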
> > > > > > > And, I notice there are useless vhd volumes left after deleting
> > > > > > > snapshots, can we delete them automatically?
> > > > > >
> > > > > > No. I do not recommend deleting VHDs manually since they are almost
> > > > > > certainly referenced by something else in the chain. If you delete
> > > > > > them manually you will break the chain, it will become unreadable, and
> > > > > > you potentially lose critical data. VHD chains must be correctly
> > > > > > coalesced in order to maintain data integrity.
> > > > >
> > > > > Thanks,
> > > > > Julian
> > > > > >
> > > > > > - Anthony
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > xen-api mailing list
> > > > > > xen-api@xxxxxxxxxxxxxxxxxxx
> > > > > > http://lists.xensource.com/mailman/listinfo/xen-api
> > > > > >
> > > > >
> > > > 
> > > > 
> > 
> > 
> 
> 

