
To: "Vincent, Pradeep" <pradeepv@xxxxxxxxxx>
Subject: [Xen-devel] Re: [PATCH] xen/blkback: Don't let in-flight requests defer pending ones.
From: Daniel Stodden <daniel.stodden@xxxxxxxxxx>
Date: Wed, 01 Jun 2011 10:49:46 -0700
Cc: Xen <xen-devel@xxxxxxxxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxxxx>, "konrad.wilk@xxxxxxxxxx" <konrad.wilk@xxxxxxxxxx>
Delivery-date: Wed, 01 Jun 2011 10:51:32 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <CA0B31CB.18CDA%pradeepv@xxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Organization: Citrix VMD
References: <CA0B31CB.18CDA%pradeepv@xxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
On Wed, 2011-06-01 at 04:02 -0400, Vincent, Pradeep wrote:
> Re: Opportunistic check in make_response
> 
> Looking through blkfront logic, req_prod is only updated in batches - so
> you are right that given the current blkfront design the opportunistic
> check in make_response doesn't reduce interrupt rate and the benefit will
> be limited to the reduction of I/O latency by an amount less than the
> interrupt latency. 

Yup.

> Having said that, I think batching the req_prod update delays the processing
> of the first I/O by blkback until all the I/Os in the batch have been
> added to the I/O ring - potentially increasing I/O latencies. While batched
> interrupt generation makes perfect sense, I don't see a solid reason for
> req_prod not being updated when I/Os are processed by blkfront. Netfront
> does increment req_prod as soon as it processes each skb (but it might
> send an interrupt for each skb as well, which isn't great).
> 
> I think it would make sense to enhance the blkfront design to increment
> req_prod as soon as it processes an I/O while batching irq generation.
> When blkfront and blkback are busy processing a continuous stream of I/O
> requests, it would be great if the blkfront-blkback pipeline were able to
> process them without generating unnecessary interrupts while improving I/O
> latency performance.


> Thoughts ? Any historical context on why this might
> be bad ?

No historical context that I'm aware of. It's a good idea, and your
assumptions are imho correct.

The extreme case is a PUSH_REQUESTS_AND_CHECK_NOTIFY after each
request. Even in this case, the majority of interrupts should be held
off by looking at req_event. There are certainly variants, but I can
show you some results for what you're asking, because I happened to try
that just last week.
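
On the frontend side, that extreme variant would look roughly like the
sketch below. Just a sketch, not a tested patch: it assumes blkfront's
current info->ring and info->irq fields and the standard macros from
xen/interface/io/ring.h.

	/* Sketch: publish each request as soon as it is queued.
	 * RING_PUSH_REQUESTS_AND_CHECK_NOTIFY only sets notify if the
	 * backend's req_event falls into the newly published range, so
	 * a backend which is still dispatching keeps polling and no
	 * interrupt is raised. */
	static void blkif_push_one(struct blkfront_info *info)
	{
		int notify;

		RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&info->ring, notify);
		if (notify)
			notify_remote_via_irq(info->irq);
	}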

Next, consider adding a couple of event counters in the backend. And you
need the patch above, of course.

req_event: Frontend event received
rsp_event: Frontend notification sent
req_again: FINAL_CHECK indicated more_to_do.
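
The counters are nothing fancy; roughly the following, with the st_*
fields being made-up names for illustration rather than existing
blkback statistics:

	/* Sketch: hypothetical counters in struct xen_blkif, bumped at
	 * the three spots of interest. */
	blkif->st_req_event++;	/* interrupt handler: frontend
				 * notification received */
	blkif->st_rsp_event++;	/* make_response: notify came back set
				 * and we raised an irq */
	blkif->st_req_again++;	/* do_block_io_op: FINAL_CHECK still
				 * reported more_to_do */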

boost-1, order=0:

(This is the unmodified version)

dd if=/dev/zero of=/dev/xvdc bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 17.0105 s, 63.1 MB/s
1073741824 bytes (1.1 GB) copied, 17.7566 s, 60.5 MB/s
1073741824 bytes (1.1 GB) copied, 17.163 s, 62.6 MB/s

rsp_event 6759
req_event 6753
req_again 16

boost-2, order=0:

(This was the aggressive one, one PUSH_NOTIFY per ring request).

dd if=/dev/zero of=/dev/xvdc bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 17.3208 s, 62.0 MB/s
1073741824 bytes (1.1 GB) copied, 17.4851 s, 61.4 MB/s
1073741824 bytes (1.1 GB) copied, 17.7333 s, 60.5 MB/s

rsp_event 7122
req_event 7141
req_again 5497


So the result is that even in the most aggressive case, the event load
typically increases only moderately. Instead, the restored outer loop
in the dispatcher just starts to play out.

I'm not proposing this as the final solution; we might choose to be more
careful and limit event emission to some stride instead.
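
By a stride I mean something like the sketch below. Again made up for
illustration: NOTIFY_STRIDE, info->unpushed and info->pending_notify
don't exist in blkfront today.

	#define NOTIFY_STRIDE 4

	/* Sketch: publish and check every request, but emit at most
	 * one event per stride (or at the end of a batch). A set
	 * notify is latched in pending_notify, so an event the backend
	 * asked for is delayed by at most a stride, never lost. */
	static void blkif_push_strided(struct blkfront_info *info, bool last)
	{
		int notify;

		RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&info->ring, notify);
		info->pending_notify |= notify;

		if (!last && ++info->unpushed < NOTIFY_STRIDE)
			return;

		info->unpushed = 0;
		if (info->pending_notify) {
			info->pending_notify = 0;
			notify_remote_via_irq(info->irq);
		}
	}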

Don't be confused by the throughput values not going up; the problem I
had with the array (iSCSI in this case) just turned out to be elsewhere.
I'm pretty convinced there are workload/storage configurations which
benefit from that.

In the case at hand, increasing the ring size was way more productive,
at which point the queue depth multiplies as well. And I currently
expect that the longer the ring gets, the more urgent the issue you
describe will become.

But I also think it needs some experiments and wants to be backed by
numbers.

Daniel

> If blkfront does increment req_prod on every I/O submission, I think the
> opportunistic check in make_response would make sense, since such a check
> could trigger blkback to start processing requests well ahead of the
> 'batch' completion on blkfront side (just the check for
> RING_HAS_UNCONSUMED_REQUESTS should suffice - other parts of what you
> removed should go).
> 
> 
> Re: Locking
> 
> I was reflecting Jan Beulich's comment earlier in this thread. Like I said
> before (in this thread), the locking logic in blkback isn't obvious from
> the code and the failure modes seem benign. If someone has good context on
> blkback locking strategy, I would love to learn. Also it would be very
> useful to add some comments around lock usage to the blkback code.
> 
> Jan ??
> 
> - Pradeep Vincent
>  
> 
> On 5/29/11 4:34 AM, "Daniel Stodden" <daniel.stodden@xxxxxxxxxx> wrote:
> 
> >On Sun, 2011-05-29 at 04:09 -0400, Vincent, Pradeep wrote:
> >> Opportunistically avoiding interrupts by checking for I/Os in flight
> >> doesn't sound like a bad idea. I think the RING_HAS_UNCONSUMED_REQUESTS
> >> call and what follows should be retained in 'make_response'.
> >
> >There's not much room for opportunism left here. Once FINAL_CHECK
> >returns with !_work_to_do, you're going to receive an interrupt.
> >Holding that notification off would kill performance.
> >
> >From there on, still leaving a duplicate check around end_io has only an
> >infinitesimal chance to preempt (none to prevent) the event reception.
> >Even if it ever happens, the chance of making a difference in time to
> >actual thread wake is probably even smaller.
> >
> >I think it's just overhead. If you disagree, this stuff is easy to prove
> >or confute with event counting. Good luck :)
> >
> >> Also, should RING_FINAL_CHECK_FOR_REQUESTS be protected by
> >>blk_ring_lock ?
> >
> >Nope. The ring lock is only needed to sync rsp production. Specifically,
> >make_response upon request completion colliding with make_response
> >called from the backend thread (the error case in do_block_io_op).
> >
> >Should rather be named rsp_lock or so; it doesn't lock anything except
> >rsp_prod.
> >
> >Daniel
> >
> >> 
> >> - Pradeep Vincent
> >> 
> >> 
> >> On 5/28/11 1:21 PM, "Daniel Stodden" <daniel.stodden@xxxxxxxxxx> wrote:
> >> 
> >> >Running RING_FINAL_CHECK_FOR_REQUESTS from make_response is a bad
> >> >idea. It means that in-flight I/O is essentially blocking continued
> >> >batches. This essentially kills throughput on frontends which unplug
> >> >(or even just notify) early and rightfully assume additional requests
> >> >will be picked up on time, not synchronously.
> >> >
> >> >Signed-off-by: Daniel Stodden <daniel.stodden@xxxxxxxxxx>
> >> >---
> >> > drivers/block/xen-blkback/blkback.c |   36 ++++++++++++++++++----------------
> >> > 1 files changed, 19 insertions(+), 17 deletions(-)
> >> >
> >> >diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> >> >index 9dee545..48ad7fa 100644
> >> >--- a/drivers/block/xen-blkback/blkback.c
> >> >+++ b/drivers/block/xen-blkback/blkback.c
> >> >@@ -451,7 +451,8 @@ static void end_block_io_op(struct bio *bio, int error)
> >> >  * (which has the sectors we want, number of them, grant references, etc),
> >> >  * and transmute  it to the block API to hand it over to the proper block disk.
> >> >  */
> >> >-static int do_block_io_op(struct xen_blkif *blkif)
> >> >+static int
> >> >+__do_block_io_op(struct xen_blkif *blkif)
> >> > {
> >> >     union blkif_back_rings *blk_rings = &blkif->blk_rings;
> >> >     struct blkif_request req;
> >> >@@ -508,6 +509,23 @@ static int do_block_io_op(struct xen_blkif *blkif)
> >> >     return more_to_do;
> >> > }
> >> > 
> >> >+static int
> >> >+do_block_io_op(struct xen_blkif *blkif)
> >> >+{
> >> >+    union blkif_back_rings *blk_rings = &blkif->blk_rings;
> >> >+    int more_to_do;
> >> >+
> >> >+    do {
> >> >+        more_to_do = __do_block_io_op(blkif);
> >> >+        if (more_to_do)
> >> >+            break;
> >> >+
> >> >+        RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
> >> >+    } while (more_to_do);
> >> >+
> >> >+    return more_to_do;
> >> >+}
> >> >+
> >> > /*
> >> >  * Transmutation of the 'struct blkif_request' to a proper 'struct bio'
> >> >  * and call the 'submit_bio' to pass it to the underlying storage.
> >> >@@ -698,7 +716,6 @@ static void make_response(struct xen_blkif *blkif, u64 id,
> >> >     struct blkif_response  resp;
> >> >     unsigned long     flags;
> >> >     union blkif_back_rings *blk_rings = &blkif->blk_rings;
> >> >-    int more_to_do = 0;
> >> >     int notify;
> >> > 
> >> >     resp.id        = id;
> >> >@@ -725,22 +742,7 @@ static void make_response(struct xen_blkif *blkif, u64 id,
> >> >     }
> >> >     blk_rings->common.rsp_prod_pvt++;
> >> >     RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&blk_rings->common, notify);
> >> >-    if (blk_rings->common.rsp_prod_pvt == blk_rings->common.req_cons) {
> >> >-        /*
> >> >-         * Tail check for pending requests. Allows frontend to avoid
> >> >-         * notifications if requests are already in flight (lower
> >> >-         * overheads and promotes batching).
> >> >-         */
> >> >-        RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
> >> >-
> >> >-    } else if (RING_HAS_UNCONSUMED_REQUESTS(&blk_rings->common)) {
> >> >-        more_to_do = 1;
> >> >-    }
> >> >-
> >> >     spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
> >> >-
> >> >-    if (more_to_do)
> >> >-        blkif_notify_work(blkif);
> >> >     if (notify)
> >> >         notify_remote_via_irq(blkif->irq);
> >> > }
> >> >-- 
> >> >1.7.4.1
> >> >
> >> 
> >
> >
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel