Xen project Mailing List

Re: [Xen-devel] [PATCH] blkback: Fix block I/O latency issue

To: "Vincent, Pradeep" <pradeepv@xxxxxxxxxx>

From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

Date: Tue, 24 May 2011 12:02:49 -0400

Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Daniel@xxxxxxxxxxxxxxxxxxxx, Jan Beulich <JBeulich@xxxxxxxxxx>, Stodden <daniel.stodden@xxxxxxxxxx>

Delivery-date: Tue, 24 May 2011 09:17:05 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On Thu, May 19, 2011 at 11:12:25PM -0700, Vincent, Pradeep wrote:
> Hey Konrad,
>
> Thanks for running the tests. Very useful data.
>
> Re: Experiment to show latency improvement
>
> I never ran anything on ramdisk.
>
> You should be able to see the latency benefit with 'orion' tool but I am

Link?

> sure other tools can be used as well. For a volume backed by a single disk
> drive, keep the number of small random I/O outstanding to 2 (I think
> "num_small" parameter in orion should do the job) with a 50-50 mix of
> write and read. Measure the latency reported by the guest and Dom-0 &
> compare them. For LVM volumes that present multiple drives as a single LUN
> (inside the guest), the latency improvement will be the highest when the
> number of I/O outstanding is 2X the number of spindles. This is the
> 'moderate I/O' scenario I was describing and you should see significant
> improvement in latencies.

Ok.

>
>
> If you allow page cache to perform sequential I/O using dd or other
> sequential non-direct I/O generation tool, you should find that the
> interrupt rate doesn't go up for high I/O load. Thinking about this, I
> think burstiness of I/O submission as seen by the driver is also a key
> player particularly in the absence of I/O coalescing waits introduced by
> I/O scheduler. Page cache draining is notoriously bursty.

Sure, .. though most of the tests I've been doing have been bypassing the
page cache.

>
> >>queue depth of 256.
>
> What 'queue depth' is this ? If I am not wrong, blkfront-blkback is

The 'request_queue' one. This is the block API one.

> restricted to ~32 max pending I/Os due to the limit of one page being used
> for mailbox entries - no ?

This is the frontend's block API queue I was thinking about. In regards to
the ring buffer .. um, not exactly sure of the right number (would have to
compute it), but it is much bigger I believe. The ring buffer entries are
for 'requests', wherein each request can contain up to 11 pages of data
(nr segments).
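The ring size is left uncomputed above. A rough calculation follows as a Python sketch, assuming x86_64 headers of this period: a 4 KiB page, a 64-byte shared-ring header (four 32-bit indices plus 48 bytes of padding), a 112-byte struct blkif_request, and BLKIF_MAX_SEGMENTS_PER_REQUEST = 11. Like the __RING_SIZE macro in Xen's io/ring.h, it rounds the slot count down to a power of two. The constants are assumptions to check against the headers actually in use.

```python
# Rough capacity of a single-page blkif shared ring (sketch).
# Assumed constants for x86_64 headers of this era; verify against
# xen/interface/io/ring.h and blkif.h before trusting the output.
PAGE_SIZE = 4096
SRING_HEADER = 4 * 4 + 48   # req_prod, req_event, rsp_prod, rsp_event + 48-byte pad
RING_ENTRY_SIZE = 112       # union of blkif_request (112 B) and blkif_response
MAX_SEGMENTS = 11           # BLKIF_MAX_SEGMENTS_PER_REQUEST


def round_down_pow2(n: int) -> int:
    """Largest power of two <= n (what __RD32 does in ring.h)."""
    return 1 << (n.bit_length() - 1)


def ring_entries(page_size: int = PAGE_SIZE,
                 header: int = SRING_HEADER,
                 entry: int = RING_ENTRY_SIZE) -> int:
    """Number of request slots that fit in one shared page."""
    return round_down_pow2((page_size - header) // entry)


def max_inflight_bytes(entries: int,
                       segments: int = MAX_SEGMENTS,
                       page_size: int = PAGE_SIZE) -> int:
    """Upper bound on data in flight if every request uses all segments."""
    return entries * segments * page_size


if __name__ == "__main__":
    slots = ring_entries()
    print(slots)                               # 32
    print(max_inflight_bytes(slots) // 1024)   # 1408 (KiB)
```

Under these assumptions the ring holds 32 requests, in line with the ~32 pending I/Os estimated above, while each request can still carry up to 11 pages. Requests beyond the 32 slots would presumably wait in the guest's 256-deep request_queue.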
>
> >>But to my surprise the case where the I/O latency is high, the interrupt
> >>generation was quite small
>
> If this patch results in an extra interrupt, it will very likely result in
> reduction of latency for the next I/O. If the interrupt generation
> increase is not high, then the number of I/Os whose latencies this patch
> has improved is low. Looks like your workload belonged to this category.
> Perhaps that's why you didn't see much of an improvement in overall
> performance ? I think this is close to the high I/O workload scenario I
> described.

Ok

>
> >>But where the I/O latency was very very small (4 microseconds) the
> >>interrupt generation was on average about 20K/s.
>
> This is not a scenario I tested but the results aren't surprising. This
> isn't the high I/O load I was describing though (I didn't test ramdisk).
> SSD is probably the closest real world workload.
> An increase of 20K/sec means this patch very likely improved latency of
> 20K I/Os per sec although the absolute value of latency improvements would
> be smaller in this case. 20K/sec interrupt rate (50usec delay between
> interrupts) is something I would be comfortable with if they directly
> translate to latency improvement for the users. The graphs seem to
> indicate a 5% increase in throughput for this case - Am I reading the

I came up with 1%. But those are a bit unrealistic - and I ordered an SSD
to do some proper testing.

> graphs right ?
>
> Overall, Very useful tests indeed and I haven't seen anything too
> concerning or unexpected except that I don't think you have seen the 50+%
> latency benefit that the patch got me in my moderate I/O benchmark :-)

Let me redo the tests again.

> Feel free to ping me offline if you aren't able to see the latency impact
> using the 'moderate I/O' methodology described above.
>
> About IRQ coalescing: Stepping back a bit, there are few different use
> cases that irq coalescing mechanism would be useful for
>
> 1. Latency sensitive workload: Wait time of 10s of usecs. Particularly
> useful for SSDs.
> 2. Interrupt rate conscious workload/environment: Wait time of 200+ usecs
> which will essentially cap the theoretical interrupt rate to 5K.
> 3. Excessive CPU consumption Mitigation: This is similar to (2) but
> includes the case of malicious guests. Perhaps not a big concern unless
> you have lots of drives attached to each guest.
>
> I suspect the implementation for (1) and (2) would be different (spin vs
> sleep perhaps). (3) can't be implemented by manipulation of 'req_event'
> since a guest has the ability to abuse irq channel independent of what
> 'blkback' tries to tell 'blkfront' via 'req_event' manipulation.
>
> (3) could be implemented in the hypervisor as a generic irq throttler that
> could be leveraged for all irqs heading to Dom-0 from DomUs including
> blkback/netback. Such a mechanism could potentially solve (1) and/or (2)
> as well. Thoughts ?

The hypervisor does have some irq storm avoidance mechanism. Though the
number is 100K/sec and it only applies to physical IRQs.

>
> One crude way to address (3) for 'many disk drive' scenario is to pin
> all/most blkback interrupts for an instance to the same CPU core in Dom-0
> and throttle down the thread wake up (wake_up(&blkif->wq) in
> blkif_notify_work) that usually results in IPIs. Not an elegant solution
> but might be a good crutch.
>
> Another angle to (1) and (2) is whether these irq coalesce settings should
> be controllable by the guest, perhaps within limits set by the
> administrator.
>
> Thoughts ? Suggestions ?
>
> Konrad, Love to help out if you are already working on something around
> irq coalescing. Or when I have irq coalescing functionality that can be

Not yet. Hence hinting for you to do it :-)

> consumed by community I will certainly submit them.
>
> Meanwhile, I wouldn't want to deny Xen users the advantage of this patch
> just because there is no irq coalescing functionality.
Particularly since > the downside is very minimal on blkfront-blkback stack. My 2 cents.. > > Thanks much Konrad, > > - Pradeep Vincent > > > > > On 5/16/11 8:22 AM, "Konrad Rzeszutek Wilk" <konrad.wilk@xxxxxxxxxx> wrote: > > >On Thu, May 12, 2011 at 10:51:32PM -0400, Konrad Rzeszutek Wilk wrote: > >> > >>what were the numbers when it came to high bandwidth numbers > >> > > >> > Under high I/O workload, where the blkfront would fill up the queue as > >> > blkback works the queue, the I/O latency problem in question doesn't > >> > manifest itself and as a result this patch doesn't make much of a > >> > difference in terms of interrupt rate. My benchmarks didn't show any > >> > significant effect. > >> > >> I have to rerun my benchmarks. Under high load (so 64Kb, four threads > >> writting as much as they can to a iSCSI disk), the IRQ rate for each > >> blkif went from 2-3/sec to ~5K/sec. But I did not do a good > >> job on capturing the submission latency to see if the I/Os get the > >> response back as fast (or the same) as without your patch. > >> > >> And the iSCSI disk on the target side was an RAMdisk, so latency > >> was quite small which is not fair to your problem. > >> > >> Do you have a program to measure the latency for the workload you > >> had encountered? I would like to run those numbers myself. > > > >Ran some more benchmarks over this week. This time I tried to run it on: > > > > - iSCSI target (1GB, and on the "other side" it wakes up every 1msec, so > >the > > latency is set to 1msec). > > - scsi_debug delay=0 (no delay and as fast possible. Comes out to be > >about > > 4 microseconds completion with queue depth of one with 32K I/Os). > > - local SATAI 80GB ST3808110AS. Still running as it is quite slow. > > > >With only one PV guest doing a round (three times) of two threads randomly > >writting I/Os with a queue depth of 256. Then a different round of four > >threads writting/reading (80/20) 512bytes up to 64K randomly over the > >disk. 
> > > >I used the attached patch against #master > >(git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git) > >to gauge how well we are doing (and what the interrupt generation rate > >is). > > > >These workloads I think would be considered 'high I/O' and I was expecting > >your patch to not have any influence on the numbers. > > > >But to my surprise the case where the I/O latency is high, the interrupt > >generation > >was quite small. But where the I/O latency was very very small (4 > >microseconds) > >the interrupt generation was on average about 20K/s. And this is with a > >queue depth > >of 256 with four threads. I was expecting the opposite. Hence quite > >curious > >to see your use case. > > > >What do you consider a middle I/O and low I/O cases? Do you use 'fio' for > >your > >testing? > > > >With the high I/O load, the numbers came out to give us about 1% benefit > >with your > >patch. However, I am worried (maybe unneccassarily?) about the 20K > >interrupt generation > >when the iometer tests kicked in (this was only when using the > >unrealistic 'scsi_debug' > >drive). > > > >The picture of this using iSCSI target: > >http://darnok.org/xen/amazon/iscsi_target/iometer-bw.png > > > >And when done on top of local RAMdisk: > >http://darnok.org/xen/amazon/scsi_debug/iometer-bw.png > > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@xxxxxxxxxxxxxxxxxxx > http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
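Several points in this thread (the extra interrupts the patch can cause, and why case (3) cannot be enforced through 'req_event') hinge on how the shared ring decides when an event is sent. Below is a minimal Python model of that check, assuming the arithmetic of the RING_PUSH_REQUESTS_AND_CHECK_NOTIFY and RING_FINAL_CHECK_FOR_REQUESTS macros in Xen's io/ring.h. Memory barriers and concurrency are left out, and the function names are hypothetical. It shows one way latency can appear: if the backend goes idle without re-arming req_event, the frontend's check concludes that no notification is needed, so newly queued requests can wait until something else wakes blkback. This illustrates the mechanism only and is not the patch itself.

```python
# Model of the shared-ring event check (sketch; barriers and concurrency omitted).
# The comparison mirrors RING_PUSH_REQUESTS_AND_CHECK_NOTIFY in Xen's
# io/ring.h; RING_IDX is an unsigned 32-bit counter, emulated by masking.
MASK = 0xFFFFFFFF


def frontend_must_notify(old_prod: int, new_prod: int, req_event: int) -> bool:
    """True if publishing requests [old_prod, new_prod) crosses req_event.

    Same unsigned test as ring.h: (new - req_event) < (new - old).
    """
    return ((new_prod - req_event) & MASK) < ((new_prod - old_prod) & MASK)


def backend_arm_event(req_cons: int) -> int:
    """req_event a backend publishes when it runs out of requests.

    Mirrors RING_FINAL_CHECK_FOR_REQUESTS: ask to be notified by the very
    next request (the real macro then re-checks req_prod after a barrier).
    """
    return (req_cons + 1) & MASK


if __name__ == "__main__":
    # Backend consumed everything up to index 10 and re-armed req_event.
    print(frontend_must_notify(10, 12, backend_arm_event(10)))  # True
    # Backend went idle with req_event left at an old value: no event sent.
    print(frontend_must_notify(10, 12, 5))                      # False
    # Index wrap-around is handled by the unsigned arithmetic.
    print(frontend_must_notify(0xFFFFFFFE, 1, 0xFFFFFFFF))      # True
```

Because the frontend runs this check itself, a guest remains free to notify on every request regardless of req_event.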

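Use cases (1) and (2) above differ mainly in the wait window, and the rates quoted in the thread follow from rate = 1 / window (50 usec gives 20K/sec, 200 usec gives 5K/sec). The Python sketch below simulates one possible time-window policy. It is hypothetical, since no coalescing exists in blkback yet: the first completion after an idle period arms a timer, and a single interrupt reports every completion that arrives before the timer expires. Successive interrupts are then more than one window apart, which caps the rate at 1/window at the cost of up to one window of extra completion latency per I/O.

```python
# Hypothetical time-window interrupt coalescing policy (not an existing
# blkback feature). Times are in microseconds.


def max_irq_rate(window_us: float) -> float:
    """Upper bound on interrupts/sec if at most one fires per window."""
    return 1_000_000 / window_us


def coalesce(completion_times_us, window_us: float) -> list:
    """Return the times at which interrupts would be raised.

    The first completion after an idle period arms a timer for window_us;
    when it fires, one interrupt reports everything that completed meanwhile.
    """
    irqs = []
    deadline = None
    for t in sorted(completion_times_us):
        if deadline is not None and t <= deadline:
            continue  # folded into the pending interrupt
        deadline = t + window_us
        irqs.append(deadline)
    return irqs


if __name__ == "__main__":
    print(max_irq_rate(50), max_irq_rate(200))     # 20000.0 5000.0
    done = range(0, 1000, 10)                      # 100 completions, 10 usec apart
    print(len(coalesce(done, 200)))                # 5 interrupts instead of 100
```

A spinning variant for case (1) would use the same accounting with a much shorter window.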