WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

xen-devel

Re: [Xen-devel] Greater than 16 xvd devices for blkfront

To: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Greater than 16 xvd devices for blkfront
From: Chris Wright <chrisw@xxxxxxxxxxxx>
Date: Thu, 8 May 2008 08:33:50 -0700
Cc: Chris Wright <chrisw@xxxxxxxxxxxx>, Chris Lalancette <clalance@xxxxxxxxxx>, "Daniel P. Berrange" <berrange@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <18466.51240.492762.405781@xxxxxxxxxxxxxxxxxxxxxxxx>
References: <48209705.4030005@xxxxxxxxxx> <20080507015502.GA2121@xxxxxxxxxx> <20080507034726.GC2121@xxxxxxxxxx> <20080507164031.GB18143@xxxxxxxxxxxxxxxxxxxx> <18466.51240.492762.405781@xxxxxxxxxxxxxxxxxxxxxxxx>
* Ian Jackson (Ian.Jackson@xxxxxxxxxxxxx) wrote:
> Chris Wright writes ("Re: [Xen-devel] Greater than 16 xvd devices for 
> blkfront"):
> > * Daniel P. Berrange (berrange@xxxxxxxxxx) wrote:
> > > + default:
> > > +         if (major > 202) {
> > > +                 minor += (16 * 16 * (major - 202));
> > > +                 major = 202;
> > > +         }
> > > + }
> 
> The root cause of the problem is the incorporation of the Linux device
> numbering scheme into the xenstore protocol, which is wrong I think.
> What Daniel's excellent if rather unpleasant suggestion is doing is to
> regard the xenstore number not as a `Linux device number' but rather
> as a crazy encoding of the disk number.
> 
> I think this is fine but it would be good if we could think about what
> the new crazy encoding is, and document it.  I infer that in Daniel's
> suggestion it's:
> 
>   xenstore number = (202 << 8) + (actual disk number << 4)
>                         | partition number
> 
> where the actual disk number starts at 0 for xvda and partition
> numbers are 0 for whole disk or 1..15.
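A minimal C sketch of the encoding inferred above, paired with the quoted `default:` branch, may make the overflow easier to see. It assumes the guest splits the xenstore number as major = n >> 8 and minor = n & 0xff; `xvd_encode` and `xvd_decode` are hypothetical names, not blkfront functions.

```c
#include <assert.h>

#define XVD_MAJOR 202

/* Hypothetical helper: xenstore number = (202 << 8) + (disk << 4) | partition.
 * Disks >= 16 spill into majors 203, 204, ... */
static unsigned int xvd_encode(unsigned int disk, unsigned int partition)
{
    assert(partition <= 15); /* only 4 bits are available for the partition */
    return ((XVD_MAJOR << 8) + (disk << 4)) | partition;
}

/* Hypothetical guest side: split into major/minor, then apply the quoted
 * default: branch to fold spilled-over majors back into the minor. */
static void xvd_decode(unsigned int n, unsigned int *disk, unsigned int *partition)
{
    unsigned int major = n >> 8;
    unsigned int minor = n & 0xff;

    if (major > XVD_MAJOR) {
        minor += 16 * 16 * (major - XVD_MAJOR);
        major = XVD_MAJOR; /* mirrors the patch; major is now always 202 */
    }
    *disk = minor >> 4;        /* 16 minors per disk */
    *partition = minor & 0xf;  /* 0 = whole disk, 1..15 = partitions */
}
```

With these helpers, disk 17 partition 1 encodes to 0xCB11 (major 203, minor 17), and the folding step recovers disk 17, partition 1. The partition field is still only 4 bits wide.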
> 
> Daniel's solution still doesn't work for partitions >15.  Perhaps,

I think that's OK; it's effectively a hard limit in the LANANA device list:

202 block       Xen Virtual Block Device
                  0 = /dev/xvda       First Xen VBD whole disk
                  16 = /dev/xvdb      Second Xen VBD whole disk
                  32 = /dev/xvdc      Third Xen VBD whole disk
                    ...
                  240 = /dev/xvdp     Sixteenth Xen VBD whole disk

                Partitions are handled in the same way as for IDE
                disks (see major number 3) except that the limit on
                partitions is 15.
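The LANANA layout above packs 16 minors per disk into major 202. As a rough illustration (the helper is hypothetical), this maps a major-202 minor to its node name and fails once the 256 minors run out:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: node name for a major-202 minor under the LANANA
 * layout (16 minors per disk; 0 = whole disk, 1..15 = partitions).
 * Returns the snprintf() length, or -1 if the minor has no slot. */
static int xvd_name(unsigned int minor, char *buf, size_t len)
{
    unsigned int disk = minor / 16;       /* 0 = xvda ... 15 = xvdp */
    unsigned int partition = minor % 16;  /* low 4 bits */
    char letter;

    assert(buf != NULL && len > 0);
    if (minor > 255)
        return -1;  /* major 202 has only minors 0..255 */
    letter = (char)('a' + disk);
    if (partition == 0)
        return snprintf(buf, len, "/dev/xvd%c", letter);
    return snprintf(buf, len, "/dev/xvd%c%u", letter, partition);
}
```

For example, minor 18 gives /dev/xvdb2 and minor 240 gives /dev/xvdp; there is no minor left for a seventeenth disk or a sixteenth partition.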


> given that old guests are going to break anyway, we should consider a
> different scheme ?  Since disks and partitions not supported by the
> old encoding won't work on old guests anyway, we can use a completely
> new encoding for that case provided only that it doesn't use numbers
> of the form  (202 << 8) | something

Well, we don't actually need 202, or any minor numbers at all.  The major
is only needed for the case where xvd masquerades as IDE or SCSI.
We ripped this wart out for upstream Linux.  And the guest can happily
allocate minor numbers dynamically on its own behalf.  A disk discovery
event can be completely dynamic; the admin just wouldn't be able to
guarantee which minor slot gets allocated for a particular disk in
a guest.  We do have mount by label or UUID.
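To illustrate the dynamic case, here is a hypothetical guest-side allocator (the names and table size are assumptions) that hands each newly discovered disk the lowest free block of 16 minors, independent of the xenstore number:

```c
#include <assert.h>
#include <stdbool.h>

#define MINORS_PER_DISK 16  /* assumption: keep 16 minors per disk */
#define MAX_DISKS 64        /* hypothetical table size for the sketch */

/* Slot i owns minors [16*i, 16*i + 15]; zero-initialised = all free. */
static bool slot_used[MAX_DISKS];

/* Called on a disk discovery event: returns the first minor of the
 * allocated range, or -1 if the table is full. */
static int alloc_disk_minors(void)
{
    for (int i = 0; i < MAX_DISKS; i++) {
        if (!slot_used[i]) {
            slot_used[i] = true;
            return i * MINORS_PER_DISK;
        }
    }
    return -1;
}

/* Called when a disk goes away; the slot is reused by the next disk. */
static void free_disk_minors(int first_minor)
{
    assert(first_minor >= 0 && first_minor % MINORS_PER_DISK == 0);
    assert(first_minor / MINORS_PER_DISK < MAX_DISKS);
    slot_used[first_minor / MINORS_PER_DISK] = false;
}
```

Because a removed disk's slot goes to whatever is discovered next, the minor a disk ends up with depends on discovery order, which is why mounting by label or UUID is the stable handle.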

> Presumably we can safely use at least 31 bits.  If we reserve one to
> indicate that this is the new encoding, that leaves us with 30, which
> should be enough for a reasonable number of disks with many
> partitions each.
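One possible shape for the new encoding, as a sketch only: the thread does not fix the field widths, so the 22-bit disk / 8-bit partition split, the use of bit 30 as the flag, and the function names below are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bit 30 flags the new encoding, leaving bits 0..29
 * (30 bits) for the payload; bit 31 is left unused. */
#define NEW_FLAG   (UINT32_C(1) << 30)
#define PART_BITS  8   /* partitions 0..255 */
#define DISK_BITS  22  /* disks 0..4194303 */
#define PART_MASK  ((UINT32_C(1) << PART_BITS) - 1)
#define DISK_MASK  ((UINT32_C(1) << DISK_BITS) - 1)

static uint32_t new_encode(uint32_t disk, uint32_t partition)
{
    assert(disk <= DISK_MASK && partition <= PART_MASK);
    return NEW_FLAG | (disk << PART_BITS) | partition;
}

/* Returns 1 and fills disk/partition for a new-style number, 0 otherwise
 * (e.g. a legacy (202 << 8) | minor value, which never has bit 30 set). */
static int new_decode(uint32_t n, uint32_t *disk, uint32_t *partition)
{
    if (!(n & NEW_FLAG) || (n >> 31))
        return 0;
    *disk = (n >> PART_BITS) & DISK_MASK;
    *partition = n & PART_MASK;
    return 1;
}
```

Since a legacy (202 << 8) | minor value stays below 2^16, it can never have the flag bit set, so the two encodings cannot collide.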
> 
> > I didn't think of handling overflow (since the majors for scsi/ide/etc.
> > were involved, I expected that to fail).  But, aside from crashing an
> > older guest with > 16 disks (not ideal, but I think that's possible
> > already with the 0x format), it seems good.
> 
> If a guest takes the xenstore number to be the concatenation of its
> own major and minor numbers, then obviously it is leaving itself open
> to breaking in the future.  dom0 admins will just have to Not Do That
> Then.  (It's a shame, if true, that the guests don't have actual error
> checking.)

Agreed.

thanks,
-chris

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel