

Re: [Xen-devel] LVM Snapshot Troubles

To: xen-devel@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] LVM Snapshot Troubles
From: Michael Vrable <mvrable@xxxxxxxxxxx>
Date: Tue, 28 Sep 2004 11:09:05 -0700
Delivery-date: Tue, 28 Sep 2004 19:19:00 +0100
Envelope-to: steven.hand@xxxxxxxxxxxx
In-reply-to: <E1CCK8T-0006tk-00@xxxxxxxxxxxxxxxxx>; from Ian.Pratt@xxxxxxxxxxxx on Tue, Sep 28, 2004 at 04:43:25PM +0100
List-archive: <http://sourceforge.net/mailarchive/forum.php?forum=xen-devel>
List-help: <mailto:xen-devel-request@lists.sourceforge.net?subject=help>
List-id: List for Xen developers <xen-devel.lists.sourceforge.net>
List-post: <mailto:xen-devel@lists.sourceforge.net>
List-subscribe: <https://lists.sourceforge.net/lists/listinfo/xen-devel>, <mailto:xen-devel-request@lists.sourceforge.net?subject=subscribe>
List-unsubscribe: <https://lists.sourceforge.net/lists/listinfo/xen-devel>, <mailto:xen-devel-request@lists.sourceforge.net?subject=unsubscribe>
Mail-followup-to: xen-devel@xxxxxxxxxxxxxxxxxxxxx
References: <41597B4B.6020009@xxxxxxxxxxxxxx> <E1CCK8T-0006tk-00@xxxxxxxxxxxxxxxxx>
Sender: xen-devel-admin@xxxxxxxxxxxxxxxxxxxxx
User-agent: Mutt/
On Tue, Sep 28, 2004 at 04:43:25PM +0100, Ian Pratt wrote:
> There's nothing in slabinfo that looks crazy. I wonder where all
> your memory has gone? BTW: how big is your dom0?
> It's possible that dm-io or kcopyd is chewing up pages (which
> won't show up in the slab allocator). I'm surprised they're not
> just transient, though.

When I've run into memory trouble with snapshots, I've always seen a
stack backtrace that points me at kcopyd_client_create.  Following the
code: when a snapshot is created, a new kcopyd client is created with 256
pages (SNAPSHOT_PAGES in dm-snap.c; 1 MB with 4 KB pages) dedicated to that
snapshot.
I think I managed to dig up the logs from one of the failures I've seen;
I've attached them to this message.
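
As a rough check on that figure, the sketch below works out how much memory
the per-snapshot kcopyd reservation adds up to. It assumes 4 KB pages (the
page size is not stated in this message), and the function name is made up
for illustration.

```python
SNAPSHOT_PAGES = 256   # per-snapshot kcopyd reservation, as cited from dm-snap.c
PAGE_SIZE = 4096       # assumption: 4 KB pages


def snapshot_reserved_bytes(num_snapshots, pages=SNAPSHOT_PAGES, page_size=PAGE_SIZE):
    """Bytes reserved by kcopyd clients for the given number of snapshots."""
    return num_snapshots * pages * page_size


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(n, "snapshot(s):", snapshot_reserved_bytes(n) // 1024, "KB")
```

The per-snapshot cost is small, but it is requested all at once, which is
where a small dom0 is likely to get into trouble.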

The problem seems to be made worse by the fact that all 256 pages are
allocated in a fairly short span of time, and (at least this is my
guess) the allocation fails even if it would be possible for the kernel
to free up the necessary memory with a bit more work.

(I've been able to create many more snapshots before running into
trouble if I try to make sure the kernel has a bit of extra free memory
before each lvcreate call--using dd to create a several megabyte file,
then deleting it to free up that space in the page cache.)
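
A minimal sketch of that workaround, written in Python rather than with dd.
The helper names, the 8 MB default and the volume names in the usage comment
are placeholders, not values from my setup, and the lvcreate command is only
built here, not run.

```python
import os
import tempfile


def nudge_free_memory(size_mb=8, directory=None):
    """Write a scratch file of size_mb megabytes, then delete it.

    The idea is that the kernel has to make room in the page cache for the
    file, and deleting it should leave those pages free again.
    Returns the (now removed) scratch path.
    """
    chunk = b"\0" * (1024 * 1024)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mb):
                f.write(chunk)
    finally:
        os.remove(path)
    return path


def snapshot_command(origin, name, size):
    """Build (but do not run) an lvcreate command for one snapshot."""
    return ["lvcreate", "--snapshot", "--size", size, "--name", name, origin]


# Hypothetical usage, as root:
#   nudge_free_memory(8)
#   subprocess.run(snapshot_command("/dev/vg0/base", "snap1", "64M"), check=True)
```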

As has been noted, LVM doesn't have a very graceful failure mode when
this memory allocation problem is hit--I lose access to all the
snapshots when that happens.

I have also found that I can use dmsetup to create the COW devices
myself, which did at least (if I'm remembering correctly--this was a
while ago) have the benefit that if one snapshot failed, the others
were still available.  Basically, I used the same setup that LVM
normally would, except that I didn't create a snapshot-origin device
layered over the original device (this is what intercepts writes to the
source device and propagates a copy of the original data to each
snapshot, if needed).  Doing this manually isn't ideal, however.
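
For reference, a sketch of the kind of table line involved, assuming the
usual device-mapper snapshot target syntax (origin, COW device, P or N for
persistent or not, chunk size in sectors). The device paths and the chunk
size below are placeholders, not the values I actually used.

```python
SECTOR_SIZE = 512  # device-mapper tables are expressed in 512-byte sectors


def snapshot_table(origin_dev, cow_dev, size_bytes, persistent=False, chunk_sectors=16):
    """Build a one-line device-mapper table for a single snapshot target.

    No snapshot-origin device is set up, so this only makes sense if nothing
    writes to origin_dev while the snapshot exists.
    """
    length = size_bytes // SECTOR_SIZE
    flag = "P" if persistent else "N"
    return f"0 {length} snapshot {origin_dev} {cow_dev} {flag} {chunk_sectors}"


# Hypothetical usage, as root; dmsetup reads the table from stdin when no
# table file is given:
#   table = snapshot_table("/dev/vg0/base", "/dev/vg0/cow1", 1024**3)
#   subprocess.run(["dmsetup", "create", "snap1"], input=table, text=True, check=True)
```

Each snapshot gets its own dmsetup create call, so a failure in one call
should not touch the mappings that already exist.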

Improvements that I think could be made:
  - Change the dm-snapshot driver in the kernel to (optionally?)
    allocate less memory for each snapshot, and fail more gracefully if
    unable to allocate the memory.
  - Adjust the LVM userspace tool to fail more gracefully if the device
    mapper driver gives an out-of-memory error.
  - Add an option to LVM for snapshots with a read-only origin (as I was
    doing manually with dmsetup).

--Michael Vrable

Attachment: lvm_error_log
Description: Text document