This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/



To: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
From: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Date: Tue, 27 Jul 2010 16:58:10 +0100
Organization: Citrix Systems, Inc.
Currently the configuration syntax available in a domain configuration
has several ways of specifying devices, some of which have slightly
unexpected semantics wrt whether or not an emulated device is created,
what the major number in xenstore is etc. Some also expose details of
the guest OS's choice of major number (or rather expose Linux's choice
to all guests AFAICT).

In an attempt to clean this up, or at least make the strange behaviour
more explicit, I'd like to propose some extensions to the dXpY syntax
supported by libxl such that the other existing ways of specifying
devices become syntactic sugar for specific well defined configurations
in the new syntax, whilst preserving backwards compatibility.

I hope that the following will also form the basis for a future document
(gasp!) describing the available syntax, which combinations are valid
etc (unless someone can point me to an existing document I can update).

Virtual Disk Configuration

A virtual disk is defined in the guest configuration file as d<X>p<Y>
where <X> is the disk number and <Y> is the partition number. In
addition a number of options can be specified.

p0 indicates the entire disk.

Device number encoding in xenstore

Given a disk specified as dXpY the device encoding used in xenstore has
two potential formats, legacy and extended. Both of these are already
defined and implemented in guest frontend drivers.

The extended encoding is generally preferred but for backwards
compatibility the legacy format must still be supported.

The legacy encoding is (major and minor 8 bits each):
        (major << 8) | minor

The extended encoding is (disk == 20 bits, partition == 8 bits):
        (1 << 28) | (disk << 8) | partition

Note that the extended encoding for d0p0..d0p255 overlaps in the minor
number space with the legacy encodings of d0p0..d15p15 and therefore
these must not be used simultaneously.
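
As a rough illustration of the two formats above, the following Python
sketch encodes a device number both ways and shows where the minor
numbers collide. The function names are made up for this example and
are not part of libxl. The xvd layout (major 202, minor = (disk << 4) |
partition) is the Linux convention and is an assumption here.

```python
XVD_MAJOR = 202  # Linux major number for xvd* devices (assumed, Linux specific)


def legacy_encode(major: int, minor: int) -> int:
    """Legacy format: 8-bit major in the high byte, 8-bit minor in the low byte."""
    if not (0 <= major < 256 and 0 <= minor < 256):
        raise ValueError("major and minor must each fit in 8 bits")
    return (major << 8) | minor


def extended_encode(disk: int, partition: int) -> int:
    """Extended format: bit 28 set, disk number above an 8-bit partition field."""
    if not 0 <= partition < (1 << 8):
        raise ValueError("partition must fit below the disk field")
    if disk < 0 or (disk << 8) >= (1 << 28):
        raise ValueError("disk must fit below the bit 28 marker")
    return (1 << 28) | (disk << 8) | partition


def xvd_legacy_encode(disk: int, partition: int) -> int:
    """Legacy xvd layout: 16 disks of 16 minors each under major 202."""
    if not (0 <= disk < 16 and 0 <= partition < 16):
        raise ValueError("legacy xvd encoding only covers d0p0..d15p15")
    return legacy_encode(XVD_MAJOR, (disk << 4) | partition)


# Low 8 bits (the minor number space) used by each scheme.
extended_d0_minors = {extended_encode(0, p) & 0xFF for p in range(256)}
legacy_xvd_minors = {xvd_legacy_encode(d, p) & 0xFF
                     for d in range(16) for p in range(16)}

print(hex(xvd_legacy_encode(1, 0)))              # 0xca10
print(hex(extended_encode(1, 0)))                # 0x10000100
print(extended_d0_minors == legacy_xvd_minors)   # True
```

The last line prints True because both schemes use every value 0..255
in the low byte, which is the overlap described above.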

Configuration Options

Each disk dXpY can optionally be followed by one or more of the
following key value pairs (precise syntax TBD, but comma separated is
common in similar situations).

Option keys and values with a _ prefix are for internal use only and are
used only to provide legacy semantics for syntactic sugar and must not
otherwise be used.
        pv = true | false
                Should a PV backend/frontend pair be created in xenstore
                to correspond to this device.
                Default: true for HVM guests, ignored for PV guests
                (treated as true)
        extended = true | false
                Request use of extended device encoding in xenstore.
                extended = false is only valid for d0..d15 (as d16+
                cannot be represented in the legacy encoding)
                When extended = false and in the absence of a specific
                _vdevice configuration option (see below) the encoding
                will use major==202 and minor=="(disk << 4) | partition".
                Default: false for d0p0..d0p255, false if _vdevice
                option present (see below), otherwise true.
        emul = none | ide[01].[01] | _ide[01].[01] | ...
                none = No emulated device to be created.
                ide[01].[01] = Emulate IDE device. First [01] =>
                primary, secondary. Second [01] => master, slave
                _ide[01].[01] = As per ide[01].[01] however emulation is
                enabled iff no other disk is explicitly configured with
                an emulated device.
                In the future sata<X>.<Y> or similar might be added.
                Default: none for HVM guests, ignored for PV guests
                (treated as none)
        _vdevice = <N>:<M> | <Q>
                Enforce use of legacy device encoding in xenstore with
                the given major:minor or explicit value.
                Default: unset, encoding determined by "extended" option
                (see above)
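
To make the option list above more concrete, here is a minimal Python
sketch of a parser for one disk specification. It assumes the comma
separated key=value form (the precise syntax is still TBD), accepts only
the emul values listed above, and applies the pv and emul defaults for
HVM and PV guests. DiskSpec and parse_disk are hypothetical names used
only for this illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiskSpec:
    disk: int
    partition: int
    pv: bool
    emul: str
    extended: Optional[bool]  # None means "not specified, use the default"
    vdevice: Optional[str]    # raw _vdevice value, None if unset


_NAME = re.compile(r"^d(\d+)p(\d+)$")
_EMUL = re.compile(r"^(none|_?ide[01]\.[01])$")
_KNOWN_KEYS = {"pv", "extended", "emul", "_vdevice"}


def _parse_bool(key: str, value: str) -> bool:
    if value not in ("true", "false"):
        raise ValueError(f"{key} must be true or false, got {value!r}")
    return value == "true"


def parse_disk(spec: str, hvm: bool) -> DiskSpec:
    """Parse 'dXpY[,key=value...]' (assumed syntax) into a DiskSpec."""
    name, *opts = [field.strip() for field in spec.split(",")]
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"expected dXpY, got {name!r}")

    options = {}
    for opt in opts:
        key, sep, value = opt.partition("=")
        if not sep or key.strip() not in _KNOWN_KEYS:
            raise ValueError(f"bad option {opt!r}")
        options[key.strip()] = value.strip()

    if hvm:
        pv = _parse_bool("pv", options.get("pv", "true"))
        emul = options.get("emul", "none")
        if _EMUL.match(emul) is None:
            raise ValueError(f"unsupported emul value {emul!r}")
    else:
        # Both options are ignored for PV guests.
        pv, emul = True, "none"

    extended = None
    if "extended" in options:
        extended = _parse_bool("extended", options["extended"])

    return DiskSpec(int(match.group(1)), int(match.group(2)),
                    pv, emul, extended, options.get("_vdevice"))


print(parse_disk("d0p0,pv=true,emul=ide0.0,_vdevice=3:0", hvm=True))
```

The extended field is deliberately left as None when the option is
absent, so that the default can be chosen later once the disk number and
_vdevice are known.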

Backward compatible disk configuration

Given the above configuration options several short hands are defined
for backwards compatibility with existing configuration files and

These will be implemented by a straight textual substitution before
parsing the configuration.

        hda => d0p0,pv=true,emul=ide0.0,_vdevice=3:0
        hdb => d1p0,pv=true,emul=ide0.1,_vdevice=3:64
        hdc => d2p0,pv=true,emul=ide1.0,_vdevice=22:0
        hdd => d3p0,pv=true,emul=ide1.1,_vdevice=22:64

        xvda => d0p0,pv=true,emul=_ide0.0,_vdevice=202:0
        xvdb => d1p0,pv=true,emul=_ide0.1,_vdevice=202:16
        xvdc => d2p0,pv=true,emul=_ide1.0,_vdevice=202:32
        xvdd => d3p0,pv=true,emul=_ide1.1,_vdevice=202:48
        xvde => d4p0,pv=true,emul=none,_vdevice=202:64
        xvdo => d14p0,pv=true,emul=none,_vdevice=202:224
        xvdp => d15p0,pv=true,emul=none,_vdevice=202:240
        xvdq => d16p0,pv=true,emul=none
        xvdz => d25p0,pv=true,emul=none
        xvda[1..15] =>
        xvdb[1..15] => etc

Note that all the above are Linux (guest) specific.
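
The "straight textual substitution" could look roughly like the Python
sketch below. It is only an illustration: the table holds a subset of
the shorthands listed above, expand_shorthand is a hypothetical helper,
and how options that follow a shorthand interact with the expanded ones
is left open here.

```python
# Subset of the shorthand table, copied from the list above.
SHORTHANDS = {
    "hda": "d0p0,pv=true,emul=ide0.0,_vdevice=3:0",
    "hdb": "d1p0,pv=true,emul=ide0.1,_vdevice=3:64",
    "hdc": "d2p0,pv=true,emul=ide1.0,_vdevice=22:0",
    "hdd": "d3p0,pv=true,emul=ide1.1,_vdevice=22:64",
    "xvda": "d0p0,pv=true,emul=_ide0.0,_vdevice=202:0",
    "xvdb": "d1p0,pv=true,emul=_ide0.1,_vdevice=202:16",
    "xvdc": "d2p0,pv=true,emul=_ide1.0,_vdevice=202:32",
}


def expand_shorthand(spec: str) -> str:
    """Replace a leading shorthand name with its dXpY expansion.

    Anything after the first comma is passed through untouched, and
    specs that do not start with a known shorthand are returned as is.
    """
    name, sep, rest = spec.partition(",")
    expansion = SHORTHANDS.get(name.strip())
    if expansion is None:
        return spec
    return expansion + sep + rest


print(expand_shorthand("hdb"))    # d1p0,pv=true,emul=ide0.1,_vdevice=3:64
print(expand_shorthand("d5p0"))   # d5p0
```

Because the substitution happens before parsing, the parser itself only
ever needs to understand the dXpY form.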

The sd* syntax is not covered. It's unclear if this is used in the wild
or what the existing semantics of emul= are for SCSI devices. If someone
cares to investigate the existing behaviour then it can be added.

Otherwise it is expected that additions will not be made to this set of
shorthands and that new functionality (e.g. emulation types) will be
available only via the explicit syntax.

(is there any non-Linux specific syntax used by other guest OSes which
needs to be supported?)

Implementation notes

The behaviour specified by the emul=_ide[01].[01] syntax is currently
implemented by qemu (effectively as a workaround for users forgetting to
specify any emulated disks). I propose that as part of implementing this
new syntax we push responsibility for these semantics up into libxl.

libxl currently uses the legacy encoding for devices specified as xvd or
dXpY iff the particular configuration can be represented using the
legacy format (e.g. for d0p0..d15p15 or xvda..xvdp) in order to (1)
avoid the clash between the extended representation of d0p0 and the
legacy representations of d1..d15 and (2) to provide compatibility with
guests which do not support the extended device encoding.

The proposal above suggests instead that d1+ should be encoded using the
extended format unless overridden using the extended=false option or one
of the shorthands which uses the _vdevice option. Only d0 would default
to legacy encoding.

This (1) avoids the clash in minor numbers since d0 is the only disk
which can clash with legacy encodings and (2) provides compatibility
with old guests through their use of the xvd* syntax.
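
The proposed defaults can be summarised in a short Python sketch. This
is one reading of the rules above rather than libxl code: device_number
is a hypothetical helper, an explicit _vdevice is assumed to take
precedence over everything else, the legacy xvd minor is assumed to be
(disk << 4) | partition, and rejecting partitions above 15 in the legacy
format is a choice made for this example.

```python
from typing import Optional

XVD_MAJOR = 202  # Linux major number for xvd* devices (assumed)


def device_number(disk: int, partition: int,
                  extended: Optional[bool] = None,
                  vdevice: Optional[str] = None) -> int:
    """Pick the xenstore device number for dXpY under the proposed defaults."""
    if vdevice is not None:
        # _vdevice = <N>:<M> | <Q>, always in the legacy encoding.
        if ":" in vdevice:
            major, minor = (int(part) for part in vdevice.split(":", 1))
            return (major << 8) | minor
        return int(vdevice)

    if extended is None:
        # Proposed default: only d0 stays on the legacy encoding.
        extended = disk != 0

    if extended:
        return (1 << 28) | (disk << 8) | partition

    if not (0 <= disk <= 15 and 0 <= partition <= 15):
        # Assumption of this sketch: anything outside d0p0..d15p15
        # cannot be represented in the legacy xvd layout.
        raise ValueError("not representable in the legacy encoding")
    return (XVD_MAJOR << 8) | (disk << 4) | partition


print(hex(device_number(0, 0)))                  # 0xca00, d0 defaults to legacy
print(hex(device_number(1, 0)))                  # 0x10000100, d1+ defaults to extended
print(hex(device_number(1, 0, extended=False)))  # 0xca10, explicit override
print(hex(device_number(1, 0, vdevice="3:64")))  # 0x340, as used by the hdb shorthand
```

With these defaults no disk ends up in the extended d0 range unless the
user explicitly asks for it, which is the point of (1) above.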
