RE: [Xen-devel] [PATCH] blkback: Fix block I/O latency issue
To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
From: "Vincent, Pradeep" <pradeepv@xxxxxxxxxx>
Date: Tue, 24 May 2011 15:40:46 -0700
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxxxx>, "Daniel@xxxxxxxxxxxxxxxxxxxx" <Daniel@xxxxxxxxxxxxxxxxxxxx>, Daniel Stodden <daniel.stodden@xxxxxxxxxx>
In-reply-to: <20110524160249.GC29481@xxxxxxxxxxxx>
References: <20110516152224.GA7195@xxxxxxxxxxxx> <C9FAE626.161E7%pradeepv@xxxxxxxxxx> <20110524160249.GC29481@xxxxxxxxxxxx>
Response inline..
-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@xxxxxxxxxx]
Sent: Tuesday, May 24, 2011 9:03 AM
To: Vincent, Pradeep
Cc: Daniel@xxxxxxxxxxxxxxxxxxxx; Jeremy Fitzhardinge;
xen-devel@xxxxxxxxxxxxxxxxxxx; Jan Beulich; Daniel Stodden
Subject: Re: [Xen-devel] [PATCH] blkback: Fix block I/O latency issue
On Thu, May 19, 2011 at 11:12:25PM -0700, Vincent, Pradeep wrote:
> Hey Konrad,
>
> Thanks for running the tests. Very useful data.
>
> Re: Experiment to show latency improvement
>
> I never ran anything on ramdisk.
>
> You should be able to see the latency benefit with 'orion' tool but I am
Link?
PV: http://www.oracle.com/technetwork/topics/index-089595.html
> sure other tools can be used as well. For a volume backed by a single disk
> drive, keep the number of small random I/Os outstanding at 2 (I think
> "num_small" parameter in orion should do the job) with a 50-50 mix of
> write and read. Measure the latency reported by the guest and Dom-0 &
> compare them. For LVM volumes that present multiple drives as a single LUN
> (inside the guest), the latency improvement will be the highest when the
> number of I/Os outstanding is 2X the number of spindles. This is the
> 'moderate I/O' scenario I was describing and you should see significant
> improvement in latencies.
Ok.
>
>
> If you allow the page cache to perform sequential I/O using dd or another
> sequential non-direct I/O generation tool, you should find that the
> interrupt rate doesn't go up for high I/O load. Thinking about this, I
> think burstiness of I/O submission as seen by the driver is also a key
> player particularly in the absence of I/O coalescing waits introduced by
> I/O scheduler. Page cache draining is notoriously bursty.
Sure, .. though most of the tests I've been doing have been bypassing
the page cache.
>
> >>queue depth of 256.
>
> What 'queue depth' is this? If I am not wrong, blkfront-blkback is
The 'request_queue' one. This is the block API one.
PV: Got it.
> restricted to ~32 max pending I/Os due to the limit of one page being used
> for mailbox entries - no?
This is the frontend's block API queue I was thinking about. In regards
to the ring buffer .. um, not exactly sure of the right number (would have to
compute it), but it is much bigger I believe.
The ring buffer entries are for 'requests', wherein each request can contain
up to 11 pages of data (nr segments).
PV: I just did a back-of-the-envelope calculation for the size of blkif_request
that gave me ~78 bytes, primarily dominated by 6 bytes per segment for 11
segments per request. This would result in a max pending I/O count of 32. This
matches my recollection from a long time back, but I am not sure if I missed something.
Of course, like you said, each I/O req can carry 44K of data, but small-sized
random I/O can't take advantage of it. (If I am not wrong, netback takes a
slightly different approach where each slot is essentially a 4K page and
multiple slots are used for larger sized packets.)
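For reference, the arithmetic above works out as follows (a sketch in plain C
using the blkif ABI sizes as I recall them; the 6-byte segment descriptor pads
to 8 bytes inside the seg[11] array, so a request is 112 bytes rather than ~78,
but the rounded-down result of 32 slots holds either way):

#include <stdio.h>
#include <stddef.h>

int main(void)
{
    size_t seg_sz   = 8;                /* gref(4) + first/last_sect(2), padded to 8 */
    size_t req_sz   = 24 + 11 * seg_sz; /* fixed fields + 11 segments = 112 bytes */
    size_t ring_hdr = 64;               /* shared-ring producer/consumer bookkeeping */
    size_t slots    = (4096 - ring_hdr) / req_sz;   /* 36 raw slots per page */

    /* the ring macros round the entry count down to a power of two */
    size_t ring_size = 1;
    while (ring_size * 2 <= slots)
        ring_size *= 2;

    printf("raw slots %zu -> ring size %zu, max %d KB data per request\n",
           slots, ring_size, 11 * 4);   /* prints: 36 -> 32, 44 KB */
    return 0;
}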
>
> >>But to my surprise the case where the I/O latency is high, the interrupt
> >>generation was quite small
>
> If this patch results in an extra interrupt, it will very likely result in
> reduction of latency for the next I/O. If the interrupt generation
> increase is not high, then the number of I/Os whose latencies this patch
> has improved is low. Looks like your workload belonged to this category.
> Perhaps that's why you didn't see much of an improvement in overall
> performance? I think this is close to the high I/O workload scenario I
> described.
Ok
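For context, the mechanism being discussed is blkback re-arming the frontend's
notification before going idle. The loop below is a sketch of the standard
ring.h final-check pattern, not the literal patch text:

int more_to_do;

do {
        /* drain whatever requests are currently on the shared ring */
        more_to_do = do_block_io_op(blkif);
        if (more_to_do)
                continue;

        /*
         * Set sring->req_event = req_cons + 1 and re-check for requests
         * that raced in while we were finishing up.  Without this final
         * check, a request posted just as blkback goes idle sits on the
         * ring until an unrelated later event arrives, which is the
         * latency bubble being discussed.
         */
        RING_FINAL_CHECK_FOR_REQUESTS(&blkif->blk_rings.common, more_to_do);
} while (more_to_do);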
>
> >>But where the I/O latency was very very small (4 microseconds) the
> >>interrupt generation was on average about 20K/s.
>
> This is not a scenario I tested but the results aren't surprising. This
> isn't the high I/O load I was describing though (I didn't test ramdisk).
> SSD is probably the closest real-world workload.
> An increase of 20K/sec means this patch very likely improved latency of
> 20K I/Os per sec although the absolute value of latency improvements would
> be smaller in this case. A 20K/sec interrupt rate (50 usec between
> interrupts) is something I would be comfortable with if it directly
> translates to latency improvement for the users. The graphs seem to
> indicate a 5% increase in throughput for this case - Am I reading the
I came up with 1%. But those are a bit unrealistic - and I ordered
an SSD to do some proper testing.
PV: Terrific.
> graphs right?
>
> Overall, very useful tests indeed and I haven't seen anything too
> concerning or unexpected except that I don't think you have seen the 50+%
> latency benefit that the patch got me in my moderate I/O benchmark :-)
Let me redo the tests.
PV: Thanks much. Let me know if you need more info on test setup.
> Feel free to ping me offline if you aren't able to see the latency impact
> using the 'moderate I/O' methodology described above.
>
> About IRQ coalescing: Stepping back a bit, there are a few different use
> cases that an irq coalescing mechanism would be useful for:
>
> 1. Latency sensitive workload: Wait time of 10s of usecs. Particularly
> useful for SSDs.
> 2. Interrupt rate conscious workload/environment: Wait time of 200+ usecs
> which will essentially cap the theoretical interrupt rate to 5K.
> 3. Excessive CPU consumption mitigation: This is similar to (2) but
> includes the case of malicious guests. Perhaps not a big concern unless
> you have lots of drives attached to each guest.
>
> I suspect the implementation for (1) and (2) would be different (spin vs
> sleep perhaps). (3) can't be implemented by manipulation of 'req_event'
> since a guest has the ability to abuse the irq channel independent of what
> 'blkback' tries to tell 'blkfront' via 'req_event' manipulation.
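To make (1) and (2) concrete, here is a hedged sketch of where a request-side
coalescing wait could slot into blkback's dispatch loop. 'coalesce_usecs' is a
hypothetical per-blkif tunable, not an existing field; a bounded spin would
replace the sleep for the latency-sensitive case (1):

do {
        more_to_do = do_block_io_op(blkif);
        if (!more_to_do) {
                /*
                 * Hypothetical knob: before re-arming req_event, give
                 * blkfront a short window to post more requests so they
                 * are picked up as a batch without further events.
                 * usleep_range() suits the 200+ usec case (2); the
                 * 10s-of-usecs case (1) would want to spin instead.
                 */
                if (blkif->coalesce_usecs)
                        usleep_range(blkif->coalesce_usecs,
                                     2 * blkif->coalesce_usecs);
                RING_FINAL_CHECK_FOR_REQUESTS(&blkif->blk_rings.common,
                                              more_to_do);
        }
} while (more_to_do);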
>
> (3) could be implemented in the hypervisor as a generic irq throttler that
> could be leveraged for all irqs heading to Dom-0 from DomUs including
> blkback/netback. Such a mechanism could potentially solve (1) and/or (2)
> as well. Thoughts?
The hypervisor does have some irq storm avoidance mechanism, though the
number is 100K/sec and it only applies to physical IRQs.
PV: I will take a closer look to see what hypervisor already does here.
>
> One crude way to address (3) for the 'many disk drive' scenario is to pin
> all/most blkback interrupts for an instance to the same CPU core in Dom-0
> and throttle down the thread wake-up (wake_up(&blkif->wq) in
> blkif_notify_work) that usually results in IPIs. Not an elegant solution,
> but it might be a good crutch.
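For concreteness, a hedged sketch of that crutch: blkif_notify_work() and
blkif->wq are the real blkback names, while last_wake and min_wake_jiffies are
hypothetical fields, and a real version would need the blkback thread to poll
periodically so a swallowed wakeup is never lost:

static void blkif_notify_work(blkif_t *blkif)
{
        blkif->waiting_reqs = 1;        /* the thread re-checks this flag */

        /* swallow wakeups (and the IPIs they cause) that arrive too soon */
        if (time_before(jiffies,
                        blkif->last_wake + blkif->min_wake_jiffies))
                return;

        blkif->last_wake = jiffies;
        wake_up(&blkif->wq);
}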
>
> Another angle to (1) and (2) is whether these irq coalesce settings should
> be controllable by the guest, perhaps within limits set by the
> administrator.
>
> Thoughts? Suggestions?
>
> Konrad, love to help out if you are already working on something around
> irq coalescing. Or when I have irq coalescing functionality that can be
Not yet. Hence hinting for you to do it :-)
> consumed by the community I will certainly submit it.
>
> Meanwhile, I wouldn't want to deny Xen users the advantage of this patch
> just because there is no irq coalescing functionality. Particularly since
> the downside is very minimal on the blkfront-blkback stack. My 2 cents..
>
> Thanks much Konrad,
>
> - Pradeep Vincent
>
>
>
>
> On 5/16/11 8:22 AM, "Konrad Rzeszutek Wilk" <konrad.wilk@xxxxxxxxxx> wrote:
>
> >On Thu, May 12, 2011 at 10:51:32PM -0400, Konrad Rzeszutek Wilk wrote:
> >> > >>what were the numbers when it came to high bandwidth numbers
> >> >
> >> > Under high I/O workload, where the blkfront would fill up the queue as
> >> > blkback works the queue, the I/O latency problem in question doesn't
> >> > manifest itself and as a result this patch doesn't make much of a
> >> > difference in terms of interrupt rate. My benchmarks didn't show any
> >> > significant effect.
> >>
> >> I have to rerun my benchmarks. Under high load (so 64Kb, four threads
> >> writing as much as they can to an iSCSI disk), the IRQ rate for each
> >> blkif went from 2-3/sec to ~5K/sec. But I did not do a good
> >> job on capturing the submission latency to see if the I/Os get the
> >> response back as fast (or the same) as without your patch.
> >>
> >> And the iSCSI disk on the target side was a RAMdisk, so latency
> >> was quite small which is not fair to your problem.
> >>
> >> Do you have a program to measure the latency for the workload you
> >> had encountered? I would like to run those numbers myself.
> >
> >Ran some more benchmarks over this week. This time I tried to run it on:
> >
> > - iSCSI target (1GB, and on the "other side" it wakes up every 1msec, so
> >   the latency is set to 1msec).
> > - scsi_debug delay=0 (no delay and as fast as possible. Comes out to
> >   about 4 microseconds completion with a queue depth of one with 32K I/Os).
> > - local SATA I 80GB ST3808110AS. Still running as it is quite slow.
> >
> >With only one PV guest doing a round (three times) of two threads randomly
> >writing I/Os with a queue depth of 256. Then a different round of four
> >threads writing/reading (80/20) 512 bytes up to 64K randomly over the
> >disk.
> >
> >I used the attached patch against #master
> >(git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git)
> >to gauge how well we are doing (and what the interrupt generation rate
> >is).
> >
> >These workloads I think would be considered 'high I/O' and I was expecting
> >your patch to not have any influence on the numbers.
> >
> >But to my surprise, in the case where the I/O latency is high, the
> >interrupt generation was quite small. But where the I/O latency was very
> >very small (4 microseconds) the interrupt generation was on average about
> >20K/s. And this is with a queue depth of 256 with four threads. I was
> >expecting the opposite. Hence quite curious to see your use case.
> >
> >What do you consider middle I/O and low I/O cases? Do you use 'fio' for
> >your testing?
> >
> >With the high I/O load, the numbers came out to give us about 1% benefit
> >with your patch. However, I am worried (maybe unnecessarily?) about the
> >20K interrupt generation when the iometer tests kicked in (this was only
> >when using the unrealistic 'scsi_debug' drive).
> >
> >The picture of this using iSCSI target:
> >http://darnok.org/xen/amazon/iscsi_target/iometer-bw.png
> >
> >And when done on top of local RAMdisk:
> >http://darnok.org/xen/amazon/scsi_debug/iometer-bw.png
> >
>
>