Re: [Xen-devel] [Patch 3/7] pvSCSI driver

> diff -r f76e90b4f7ad -r a8a56c423e78 include/xen/interface/io/vscsiif.h
> --- /dev/null Thu Jan 01 00:00:00 1970 +0000
> +++ b/include/xen/interface/io/vscsiif.h      Thu Feb 14 11:14:46 2008 +0900
...
> +
> +#define VSCSIIF_CMND_SCSI                    1       /* scsi */
> +#define VSCSIIF_CMND_SCSI_RESET                      2       /* scsi */
Do you ever actually use the reset command?

> +
> +/* ----------------------------------------------------------------------
> +     Definition of Ring Structures
> +   ---------------------------------------------------------------------- */
> +
> +#define VSCSIIF_DEFAULT_CAN_QUEUE    256
What does this mean?

> +#define VSCSIIF_MAX_COMMAND_SIZE     16
Hmm.  SBC-3 includes a couple of 32 byte commands, which won't fit
into this request structure.  Admittedly, none of them look terribly
useful, but it would be unfortunate to discover that we have to do
another PV scsi protocol in a year's time because some critical
request doesn't fit.

> +#define VSCSIIF_SG_TABLESIZE         27
Where did 27 come from?

> +struct vscsiif_request {
> +     uint16_t rqid;
> +     uint8_t cmd;
> +     /* SCSI */
> +     uint8_t cmnd[VSCSIIF_MAX_COMMAND_SIZE];
> +     uint8_t cmd_len;
So cmd_len applies to cmnd rather than cmd, yes?

> +     uint16_t id, lun, channel;
Okay, so each of your PV devices maps to the moral equivalent of an
HBA, yes?  You can have several targets attached to a single PV
device?

> +     uint8_t sc_data_direction;
What does this mean?

> +     uint8_t use_sg;
From looking at your front and back implementations, I think this is
actually a count of the number of segments you have in your SG table,
yes?  If so, it could probably cope with being renamed to something a
bit more appropriate.

Also, I think you have two bytes of implicit padding here, don't you?
We usually try to avoid that in shared structures, because (a) it's
confusing, (b) it's wasteful of ring space, and (c) it can cause
portability problems if guests are built with different compilers.

> +     uint32_t request_bufflen;
Is this not redundant with the scatter-gather list, below?  Or is
there supposed to be some padding on the SG list which isn't included
in the buffer length?

> +     /*int32_t timeout_per_command;*/
> +     struct scsiif_request_segment {
> +             grant_ref_t gref;
> +             uint16_t offset;
> +             uint16_t length;
> +     } seg[VSCSIIF_SG_TABLESIZE];
> +};
That gives you a 248 byte request structure, doesn't it?  That means
you can only have 16 requests outstanding on a single page ring, which
may not be enough to get good performance.

> +#define VSCSIIF_SENSE_BUFFERSIZE     96
Where did 96 come from?  SPC-4 revision 10 section 6.28 seems to imply
that sense data can be up to 252 bytes.

> +
> +struct vscsiif_response {
> +     uint16_t rqid;
Another two bytes of implicit padding.

> +     int32_t  rslt;
> +     uint8_t sense_len;
> +     uint8_t sense_buffer[VSCSIIF_SENSE_BUFFERSIZE];
> +};
For this sort of ring, responses have to be padded up to match request
sizes.  This means that you've effectively got another 143 bytes of
padding after the end of this structure, which could be used to
increase the sense buffer size to 239 bytes (240 if you rearrange the
response fields to avoid the padding).  That's still not really big
enough, but it's at least closer.

> +
> +DEFINE_RING_TYPES(vscsiif, struct vscsiif_request, struct vscsiif_response);
> +
> +#endif

It might be worth considering abandoning the use of fixed-sized
request and response structures, and instead using a more
byte-oriented protocol in which requests can be almost any size.  This
avoids the need to size things like the CDB area to match the largest
possible request, because each request will be precisely as large as
it needs to be.

The main downsides are:

-- It's much harder to access structures directly while they're on the
   ring.  On the other hand, you almost certainly need to copy the
   request into backend-local storage before processing it anyway, to
   avoid TOCTOU races and the associated security problems, and making
   that copy handle the ring is fairly easy.

-- If you allow requests and responses to be different sizes, you need
   to put them on separate rings, so that you don't find that e.g.
   you've had a series of responses which were bigger than their
   associated requests, and the response area has now overflowed on
   top of some requests.  Again, this isn't terribly difficult to deal
   with.

-- The ring.h macros won't work.  You'll need to invent your own
   scheme.

The main alternative here is request chaining, like the netchannel
protocol.  This is generally somewhat tricky to get right, and I'd
expect that just going for variable-sized requests would be easier.
If you do decide to go for this then you may want to pad requests to a
multiple of 8 bytes so that the memcpy()s are suitable aligned.

I think the current netchannel2 plan also calls for variable-sized
messages with split front->back and back->front rings.  It might be
possible to share some code there (although at present there doesn't
exist any code to share).

I'd also strongly recommend supporting multi-page rings.  That will
allow you to have more requests in flight at any one time, which should
lead to better performance.

Steven.

signature.asc
Description: Digital signature

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

WARNING - OLD ARCHIVES

xen-devel

Re: [Xen-devel] [Patch 3/7] pvSCSI driver