xen-api
Re: [Xen-API] How snapshot works on LVMoISCSI SR
 
Ian Pratt wrote:
 
> That means if guest Linux is executing "yum install kernel" when
> creating a snapshot, the VM created from this snapshot might not be
> bootable.
 
Because Xen issues write completions to the guest only when the IO has
actually completed, the snapshot will at least be crash consistent from a
filesystem point of view (just like a physical system losing power).

Linux doesn't have a generic mechanism for doing higher-level 'freeze'
operations (see Windows VSS), so there's no way to notify yum that we'd like
to take a snapshot. Some Linux filesystems do support a freeze operation, but
it's not clear this buys a great deal.
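For illustration, a minimal sketch of driving that freeze operation from
inside a Linux guest, assuming util-linux's fsfreeze is available and the
snapshot itself is taken from the host while the filesystem is frozen; the
mount point is a placeholder.

    import subprocess
    import time

    MOUNTPOINT = "/data"  # placeholder: filesystem to quiesce before the snapshot

    def freeze(path):
        # Flushes dirty data and blocks new writes until unfrozen.
        subprocess.check_call(["fsfreeze", "--freeze", path])

    def unfreeze(path):
        subprocess.check_call(["fsfreeze", "--unfreeze", path])

    freeze(MOUNTPOINT)
    try:
        # Window during which the host-side snapshot should be taken.
        time.sleep(10)
    finally:
        unfreeze(MOUNTPOINT)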
  
 Ack. Without application signalling (as provided by VSS) it's unclear 
whether there's any real benefit since the application data may still be 
internally inconsistent.
FYI, for Windows VMs XCP includes a VSS quiesced snapshot option
(VM.snapshot_with_quiesce) which utilises the agent running in the guest
as a VSS requestor to quiesce the apps, flush the local cache to disk
and then trigger a snapshot of all the VM's disks.
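A minimal sketch of calling that option through the XenAPI Python bindings;
the host URL, credentials and VM name label below are placeholders, and the
call only succeeds when the guest agent/VSS provider is installed in the
Windows VM.

    import XenAPI

    session = XenAPI.Session("https://xcp-host")  # placeholder master address
    session.xenapi.login_with_password("root", "password")
    try:
        vm = session.xenapi.VM.get_by_name_label("my-windows-vm")[0]
        # Quiesce applications via VSS in the guest, then snapshot all disks.
        snap = session.xenapi.VM.snapshot_with_quiesce(vm, "quiesced-snapshot")
        print("created: " + session.xenapi.VM.get_name_label(snap))
    finally:
        session.xenapi.session.logout()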
- Julian
 
99 times out of 100 you'll get away with just taking a snapshot of a VM. If 
you're wanting to use the snapshot as a template for creating other clones 
you'd be best advised to shut the guest down and get a clean filesystem though. 
Any snapshot should be fine for general file backup purposes.
Ian
PS: I'd be surprised if "yum install kernel" didn't actually go to some lengths 
to be reasonably atomic as regards switching grub over to using the new kernel, otherwise 
you'd have the same problem on a physical machine crashing or losing power.
   
- Anthony
     
Daniel
       
How does XCP make sure this snapshot is usable, say, that the virtual disk
metadata is consistent?
Thanks
- Anthony
On Tue, 2010-01-26 at 13:56 -0800, Ian Pratt wrote:
         
> I still have a few questions.
>
> 1. If a non-leaf node is coalesce-able, will it be coalesced later on,
> regardless of how big the physical size of this node is?

Yes: it's always good to coalesce the chain to improve access performance.
     
> 2. There is one leaf node for a snapshot; actually it may be empty.
> Does it exist only because it can prevent coalesce?

Not quite sure what you're referring to here. The current code has a
limitation whereby it is unable to coalesce a leaf into its parent, so
after you've created one snapshot you'll always have a chain length of 2
even if you delete the snapshot (if you create a second snapshot it can be
coalesced).

Coalescing a leaf into its parent is on the todo list: it's a little bit
different from the other cases because it requires synchronization if the
leaf is in active use. It's not a big deal from a performance point of view
to have the slightly longer chain length, but it will be good to get this
fixed for cleanliness.
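An illustrative way to see that chain relationship from the XenAPI side is to
look at a VDI's sm_config; on VHD-based SRs such as LVMoISCSI this usually
carries a vhd-parent entry pointing at the hidden parent node (treat the key
name as an assumption to verify on your own install). Assumes `session` is an
authenticated XenAPI.Session as in the earlier sketch.

    vdi = session.xenapi.VDI.get_by_uuid("11111111-2222-3333-4444-555555555555")  # placeholder UUID
    sm_config = session.xenapi.VDI.get_sm_config(vdi)
    # Snapshot/clone leaves usually record their hidden parent here; a missing
    # key suggests a base (unchained) node.
    print("sm_config: " + str(sm_config))
    print("vhd-parent: " + sm_config.get("vhd-parent", "<none>"))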
     
> 3. A clone will introduce a writable snapshot; will it prevent coalesce?

A clone will produce a new writeable leaf linked to the parent. It will
prevent the linked snapshot from being coalesced, but any other snapshots
above or below on the chain can still be coalesced by the garbage collector
if the snapshots are deleted.

The XCP storage management stuff is pretty cool IMO...

Ian
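Both operations are exposed through the XenAPI; a minimal sketch, again
assuming an authenticated `session` as in the earlier examples and using
placeholder names.

    vm = session.xenapi.VM.get_by_name_label("my-vm")[0]

    # Snapshot: the snapshot keeps the read-only node; the running VM carries
    # on writing into a fresh child leaf.
    snap = session.xenapi.VM.snapshot(vm, "my-vm-snap")

    # Clone: a new writeable VM whose disks are fresh leaves linked to the
    # same parent nodes, which is why those parents stay pinned until the
    # clone is deleted. (VM.clone generally requires the source VM to be
    # halted or a template.)
    clone = session.xenapi.VM.clone(vm, "my-vm-clone")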
           
- Anthony
On Tue, 2010-01-26 at 02:34 -0800, Julian Chesterfield wrote:
             
Hi Anthony,

Anthony Xu wrote:

> Hi all,
>
> Basically snapshot on LVMoISCSI SR works well; it provides thin
> provisioning, so it is fast and disk space efficient.
>
> But I still have the concern below.
>
> There is one more vhd chain when creating a snapshot. If I create 16
> snapshots, there are 16 vhd chains, which means that when a VM accesses a
> disk block it may need to access 16 vhd LVMs one by one to get the right
> block, and that makes VM disk access slow. However, that is
> understandable; it is part of snapshot IMO.

The depth and speed of access will depend on the write pattern to the disk.
In XCP we add an optimisation called a BATmap which stores one bit per BAT
entry. This is a fast lookup table that is cached in memory while the VHD is
open, and tells the block device handler whether a block has been fully
allocated. Once the block is fully allocated (all logical 2MB written) the
block handler knows that it doesn't need to read or write the bitmap that
corresponds to the data block; it can go directly to the disk offset.
Scanning through the VHD chain can therefore be very quick, i.e. the block
handler reads down the chain of BAT tables for each node until it detects a
node that is allocated, hopefully with the BATmap value set. The worst case
is a random disk write workload which causes the disk to be fragmented and
partially allocated. Every read or write will therefore potentially incur a
bitmap check at every level of the chain.
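As a purely illustrative model (this is not the XCP/blktap code), the lookup
described above might be sketched like this, assuming the 2MB data blocks
mentioned earlier:

    SECTOR = 512
    BLOCK_SECTORS = (2 * 1024 * 1024) // SECTOR  # sectors per 2MB VHD data block

    class VhdNode(object):
        def __init__(self, parent=None):
            self.parent = parent  # next node up the chain, None for the base image
            self.bat = {}         # block index -> physical offset of the data block
            self.batmap = set()   # block indices known to be fully allocated
            self.bitmap = {}      # block index -> set of sector indices written here

        def resolve(self, sector):
            """Return (node, byte offset) that holds this virtual sector, or None."""
            block, in_block = divmod(sector, BLOCK_SECTORS)
            node = self
            while node is not None:
                if block in node.bat:
                    if block in node.batmap:
                        # BATmap hit: fully allocated, skip the per-block bitmap.
                        return node, node.bat[block] + in_block * SECTOR
                    if in_block in node.bitmap.get(block, set()):
                        return node, node.bat[block] + in_block * SECTOR
                node = node.parent  # fall through to the parent snapshot
            return None  # unallocated anywhere in the chain: reads back as zeros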
               
 
 
 
 
 
> But after I delete all these 16 snapshots there are still 16 vhd chains,
> and the disk access is still slow, which is not understandable or
> reasonable, even though there may be only several KB of difference between
> each snapshot.

There is a mechanism in XCP called the GC coalesce thread which gets kicked
asynchronously following a VDI deletion event. It queries the VHD tree and
determines whether there is any coalescable work to do. Coalescable work is
defined as:

'a hidden child node that has no siblings'

Hidden nodes are non-leaf nodes that reside within a chain. When the snapshot
leaf node is deleted, therefore, it will leave redundant links in the chain
that can be safely coalesced. You can kick off a coalesce by issuing an SR
scan, although it should kick off automatically within 30 seconds of deleting
the snapshot node, handled by XAPI. If you look in the /var/log/SMlog file
you'll see a lot of debug information, including tree dependencies, which
will tell you a) whether the GC thread is running, and b) whether there is
coalescable work to do. Note that deleting snapshot nodes does not always
mean that there is coalescable work to do, since there may be other siblings,
e.g. VDI clones.
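For completeness, kicking a scan over the XenAPI Python bindings looks like
the sketch below (host, credentials and SR label are placeholders); progress
of any resulting coalesce can then be followed in /var/log/SMlog.

    import XenAPI

    session = XenAPI.Session("https://xcp-host")  # placeholder master address
    session.xenapi.login_with_password("root", "password")
    try:
        sr = session.xenapi.SR.get_by_name_label("my LVMoISCSI SR")[0]
        session.xenapi.SR.scan(sr)  # re-scan the SR; the GC/coalesce thread follows up
    finally:
        session.xenapi.session.logout()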
               
> Is there any way we can reduce the depth of the vhd chain after deleting
> snapshots, and get the VM back to normal disk performance?

The coalesce thread handles this, see above.
               
> And I notice there are useless vhd volumes left over after deleting
> snapshots; can we delete them automatically?

No. I do not recommend deleting VHDs manually, since they are almost
certainly referenced by something else in the chain. If you delete them
manually you will break the chain, it will become unreadable, and you
potentially lose critical data. VHD chains must be correctly coalesced in
order to maintain data integrity.

Thanks,
Julian
               
- Anthony
_______________________________________________
xen-api mailing list
xen-api@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/mailman/listinfo/xen-api
                 
 
 