Re: [Xen-devel] [Patch 0/7] pvSCSI driver

Looking through the SCSI spec, I don't think we're going to be able to
get away with passing requests through from the frontend all the way
to the physical disk without sanity checking the actual CDB in the
backend.  There are a couple of commands which look scary:

-- CHANGE ALIAS/REPORT ALIAS -- the alias list is shared across
   everything in the I_T nexus.  That will lead to interesting issues
   if you ever have multiple guests modifying it at the same time.

-- EXTENDED COPY -- allows you to copy arbitrary data between logical
   units, sometimes even ones not in the same target device.  That's
   obviously going to need to be controlled in a VM setting.

-- Some mode pages, as modified by MODE SELECT, can apply across
   multiple LUs.  Even more exciting, the level of sharing can in
   principle vary between devices, even for the same page.

-- WRITE BUFFER commands can be used to change the microcode on a
   device.  I've no idea what the implications of letting an untrusted
   user push microcode into a device would be, but I doubt it's a good
   idea.

-- I'm not sure whether we want to allow untrusted guests to issue SET
   PRIORITY commands.

-- We've already been over REPORT LUNS :)

Plus whatever weird things the various device manufacturers decide to
introduce.

What this means is that the REPORT LUNS issue fundamentally isn't
restricted to just the REPORT LUNS command, but instead affects an
unknown and potentially large set of other commands.  The only way I
can see to deal with this is to white-list commands individually once
they've been confirmed to be safe, and have the backend block any
commands which haven't been checked yet.  That's going to be a fair
amount of work, and it'll screw up the whole ``transparent pass
through'' thing, but I can't see any other way of solving this problem
safely.
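
Concretely, I'd expect the whitelist check to be little more than a
switch on the CDB opcode, something like the sketch below (the opcode
set and all names here are invented for illustration, not taken from
the actual patches):

    #include <stdint.h>
    #include <stdbool.h>

    /* Opcodes which have been audited and confirmed safe to pass
     * straight through to the physical device.  Everything else is
     * either synthesised or failed in the backend until someone has
     * checked it against the spec. */
    static bool cdb_opcode_allowed(uint8_t op)
    {
        switch (op) {
        case 0x00: /* TEST UNIT READY */
        case 0x08: /* READ(6) */
        case 0x0a: /* WRITE(6) */
        case 0x12: /* INQUIRY */
        case 0x28: /* READ(10) */
        case 0x2a: /* WRITE(10) */
            return true;
        default:   /* WRITE BUFFER, EXTENDED COPY, ... */
            return false;
        }
    }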

(And even that assumes that the hardware people got everything right.
Most devices will be designed on the assumption that only trusted
system components can submit CDBs, so it wouldn't surprise me if some
of them can be made to do bad things if a malicious CDB comes in.
There's not really a great deal we can do about this, though.)



Backtracking a little, the fundamental goal here is to make some
logical units which are accessible to dom0 appear inside the guest.
Guest operating systems are unlikely to be very happy about having
logical units floating around not attached to scsi hosts, and so we
need (somehow) to come up with a scsi host which has the right set of
logical units attached to it.  There are lots of valid use cases in
which there don't exist physical hosts with the right set of LUs, and
so somebody needs to invent one, and then emulate it.  That somebody
will necessarily be either the frontend or the backend.

Doing the emulation also gives you the option of filtering out things
like TCQ support in INQUIRY commands, which might be supported by the
physical device but certainly isn't supported by the pvSCSI protocol.
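
(For the TCQ case that filtering is nearly a one-liner: the CmdQue bit
is bit 1 of byte 7 in the standard INQUIRY data, so the emulation just
clears it before completing the request.  Sketch only, with all the
buffer handling elided:)

    #include <stdint.h>
    #include <stddef.h>

    /* Clear the CmdQue bit (byte 7, bit 1) in standard INQUIRY data
     * so the guest never attempts tagged queuing over the ring, even
     * if the physical device advertises support for it. */
    static void filter_inquiry_tcq(uint8_t *inq, size_t len)
    {
        if (len > 7)
            inq[7] &= ~0x02;
    }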

If you emulate the HBA in the backend, you get a design like this:

-- There is usually only one xenbus scsi device attached to any given
   VM, and that device represents the emulated HBA.

-- scsifront creates a struct scsi_host (or equivalent) for each
   xenbus device, and those provide your interface to the rest of the
   guest operating system.

-- When the guest OS submits a request to the frontend driver, it gets
   packaged up and shipped over the ring to the backend pretty much
   completely unchanged.

-- The backend figures out what the request is doing, and either:

   a) Routes it to a physical device, or
   b) Synthesises an answer (for things like REPORT LUNS), or
   c) Fails the request (for things like WRITE BUFFER),

   as appropriate.
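
In other words, the core of the backend ends up as a three-way
dispatch, roughly shaped like this (all the types and helper functions
here are hypothetical, just to show the structure):

    /* Hypothetical backend dispatch.  cdb_opcode_allowed() is the
     * whitelist check sketched earlier; the helpers are invented. */
    static void scsiback_dispatch(struct pending_req *req)
    {
        uint8_t op = req->cdb[0];

        if (op == 0xa0 /* REPORT LUNS */) {
            synthesise_report_luns(req);      /* case (b) */
        } else if (!cdb_opcode_allowed(op)) {
            fail_request(req);                /* case (c) */
        } else {
            struct scsi_device *sdev = route_request(req);
            if (sdev)
                submit_to_device(sdev, req);  /* case (a) */
            else
                fail_request(req);
        }
    }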

If you emulate the HBA in the frontend, you get a design which looks
like this:

-- Each logical unit exposed to the guest has its own xenbus scsi
   device.

-- scsifront creates a single struct scsi_host, representing the
   emulated HBA.

-- When the guest OS submits a request to the frontend driver, it
   either:

   a) Routes it to a Xen scsifront and passes it off to the backend, or
   b) Synthesises an answer, or
   c) Fails the request,

   as appropriate.

-- When a request reaches the backend, it does a basic check to make
   sure that it's dealing with one of the whitelisted requests, and
   then sends it directly to the relevant physical device.  The
   routing problem is trivial here, because there is only ever one
   physical device (struct scsi_device in Linux-speak) associated with
   any xenbus device, and the request is just dropped directly into
   the relevant request queue.
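
So in this second design the interesting routing logic moves into the
frontend's queuecommand path, along these lines (again, invented names,
purely to show the shape):

    /* Hypothetical frontend dispatch for the second design: one ring
     * per logical unit, so routing is just a per-device lookup. */
    static int scsifront_queuecommand(struct scsi_cmnd *cmd)
    {
        struct vscsi_ring *ring = ring_for_device(cmd->device);

        if (!ring)
            return emulate_locally(cmd);  /* synthesise or fail */
        return send_on_ring(ring, cmd);   /* pass to the backend */
    }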

The first approach gives you a simple frontend at the expense of a
complicated backend, while the second one gives you a simple backend
at the expense of a complicated frontend.  It seems likely that there
will be more frontend implementations than backend, which suggests
that putting the HBA emulation in the backend is a better choice.

The main difference from a performance point of view is that the
second approach will use a ring for each device, whereas the first has
a single ring shared across all devices, so you'll get more requests
in flight with the second scheme.  I'd expect that just making the
rings larger would have more effect, though, and that's easier when
there's just one of them.

Steven.


On Wed, Mar 12, 2008 at 03:23:00PM +0900, Jun Kamada wrote:
> Date: Wed, 12 Mar 2008 15:23:00 +0900
> From: Jun Kamada <kama@xxxxxxxxxxxxxx>
> To: Steven Smith <steven.smith@xxxxxxxxxxxxx>
> Subject: Re: [Xen-devel] [Patch 0/7] pvSCSI driver
> Cc: kama@xxxxxxxxxxxxxx, James Harper <james.harper@xxxxxxxxxxxxxxxx>,
>       xen-devel@xxxxxxxxxxxxxxxxxxx
> 
> Hi Steven-san and James-san,
> 
> Thank you for your comments.
> 
> We have had an internal discussion based on your comments and reached
> the following thoughts. I believe they provide both flexibility and
> ease of implementation.
> 
> We would like to start modifying the pvSCSI driver along these lines.
> What do you think? Are the thoughts reasonable? If you have any
> comments, could you please share them?
> 
> 
> -----
> 1.) Allow specifying an arbitrary mapping between Dom0's SCSI tree and
>     the guest's SCSI tree. This includes "lun".
>             ( Dom0's IDs [host1:channel1:id1:lun1] --->
>                         Guest's IDs [host2:channel2:id2:lun2] )
> 2.) The guest is responsible for holding the mapping and transforming
>     between Dom0's IDs and the guest's IDs. Which level of
>     mapping/transformation is supported (e.g. only "host", all four
>     tuples, or no transformation) depends on the guest OS's
>     implementation.
>     If the guest supports LUN transformation and lun1 != lun2, the
>     guest's frontend driver must maintain the LUN value in the CDB
>     data structure.
> 3.) As for the REPORT LUNS command, Dom0 performs the munging.
> 4.) Dom0 accepts only the LOGICAL UNIT RESET command.
> 5.) Of course, the backend driver performs a sanity check on the IDs
>     that the guest has already transformed.
> 
> 
> And I would like to implement the pvSCSI frontend driver for Linux
> with the following mapping/transformation policy. (Please note that
> another guest OS such as Windows can take a different policy, of
> course.)
> 
> - The guest sees a tree identical to the one Dom0 sees, except for
>   "host". (This is because an arbitrary "host" mapping is difficult
>   with the current Linux implementation.)
> - Of course, the guest's tree is sparse if some LUNs are not attached
>   to the guest. The Linux kernel allows lun=0 to be absent, so a
>   sparse tree is not a problem.
> 
> 
> Best regards,
> 
> 
> On Mon, 10 Mar 2008 12:00:59 +0000
> Steven Smith <steven.smith@xxxxxxxxxxxxx> wrote:
> 
> > > The problems discussed in this context (what portion of the whole
> > > SCSI tree should be exposed to the guest, and how the guest's tree
> > > should be numbered) are very fundamental and difficult, I think.
> > > 
> > > In my current thinking, the following two options are reasonable
> > > solutions. What do you think about them? Could you please comment?
> > > 
> > > Option 1 (LUN assignment)
> > > - Specify the assignment like below:
> > >   "host1:channel1:id1:lun1"(Dom0) -> "host2:channel2:id2:lun2"(guest)
> > >   lun1 must be the same as lun2.
> > > - Munging :-) the REPORT LUNS command on Dom0 according to the
> > >   number of LUNs actually attached to the guest.
> > I think this is the most flexible approach.
> > 
> > One thing to watch out for here is that some old systems get quite
> > confused if lun0 is missing but some of the higher luns are present.
> > That's easy to handle if you allow an arbitrary mapping between dom0
> > and guest luns, but is hard if you require them to be identical.  This
> > might not be an issue in the cases which we care about, though.
> > 
> > > Option 2 (Target Assignment)
> > > - Specify the assignment like below:
> > >   "host1:channel1:id1"(Dom0) -> "host2:channel2:id2"(guest)
> > >   All LUNs under id1 are assigned to one guest.
> > > - No munging for LUNs is needed.
> > > 
> > > For each option, how should the host/bus/device reset commands be
> > > handled?
> > It's possible that we'll be able to get away with just supporting
> > LOGICAL UNIT RESET commands, and completely ignoring lower granularity
> > resets.  I'm not sure how widely supported they are on actual
> > hardware, but it might be good enough for a first implementation.  You
> > might even be able to get away with not supporting any kind of reset
> > at all, and just accepting that error recovery is going to suck.
> > 
> > Steven.
> > 
> > > On Wed, 5 Mar 2008 13:34:48 +1100
> > > "James Harper" <james.harper@xxxxxxxxxxxxxxxx> wrote:
> > > 
> > > > > This kind of suggests that we should be plumbing things through to the
> > > > > guest with a granularity of whole targets, rather than individual
> > > > > logical units.  The alternative is a much more complicated scsi
> > > > > emulation which can fix up the LUN-sensitive commands.
> > > > 
> > > > I think we should probably have the option of doing either.
> > > > 
> > > > > Allowing this kind of mapping sounds reasonable to me.  It would also
> > > > > make it possible (hopefully) to add support for some of the weirder
> > > > > SCSI logical unit addressing modes without changing the frontends
> > > > > (e.g. hierarchical addressing with 64 bit LUNs).  That might involve a
> > > > > certain amount of munging of REPORT LUNS commands in the backend,
> > > > > though.
> > > > 
> > > > Not sure how much it matters, but any 'munging' of scsi commands would
> > > > be a real drag for Windows drivers. The Windows SCSI layer is very
> > > > strict on lots of things, and is a real pain if you are not talking to a
> > > > physical PCI scsi device.
> > > > 
> > > > James
> > > > 
> > > 
> > > Jun Kamada
> > > 
> > > 
> 
> 
> -----
> Jun Kamada
> 
> 

