This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] Re: dm-ioband + bio-cgroup benchmarks

Hi, Andrea,

> >> Ok, I will give more details of the thought process.
> >>
> >> I was thinking of maintaining an rb-tree per request queue and not an
> >> rb-tree per cgroup. This tree can contain all the bios submitted to that
> >> request queue through __make_request(). Every node in the tree will
> >> represent one cgroup and will contain a list of bios issued from the
> >> tasks from that cgroup.
> >>
> >> Every bio entering the request queue through __make_request() function
> >> first will be queued in one of the nodes in this rb-tree, depending on
> >> which cgroup that bio belongs to.
> >>
> >> Once the bios are buffered in the rb-tree, we release them to the
> >> underlying elevator depending on the proportionate weight of the
> >> nodes/cgroups.
> >>
> >> Some more details which I was trying to implement yesterday.
> >>
> >> There will be one bio_cgroup object per cgroup. This object will contain
> >> many bio_group objects. Each bio_group object will be created for each
> >> request queue where a bio from bio_cgroup is queued. Essentially the idea
> >> is that bios belonging to a cgroup can be on various request queues in the
> >> system. So a single object cannot serve the purpose as it cannot be on
> >> many rb-trees at the same time. Hence create one sub-object which will
> >> keep track of bios belonging to one cgroup on a particular request queue.
> >>
> >> Each bio_group will contain a list of bios and this bio_group object will
> >> be a node in the rb-tree of the request queue. For example, let's say
> >> there are two request queues in the system, q1 and q2 (let's say they
> >> belong to /dev/sda and /dev/sdb). Let's say a task t1 in /cgroup/io/test1
> >> is issuing io both for /dev/sda and /dev/sdb.
> >>
> >> bio_cgroup belonging to /cgroup/io/test1 will have two sub bio_group
> >> objects, say bio_group1 and bio_group2. bio_group1 will be in q1's rb-tree
> >> and bio_group2 will be in q2's rb-tree. bio_group1 will contain a list of
> >> bios issued by task t1 for /dev/sda and bio_group2 will contain a list of
> >> bios issued by task t1 for /dev/sdb. I thought the same can be extended
> >> for stacked devices also.
> >>   
> >> I am still trying to implement it and hopefully this is a doable idea.
> >> I think at the end of the day it will be something very close to dm-ioband
> >> algorithm just that there will be no lvm driver and no notion of separate
> >> dm-ioband device. 
> > 
> > Vivek, thanks for the detailed explanation. Only a comment. I guess, if
> > we don't also change the per-process optimizations/improvements made by
> > some IO schedulers, we can have undesirable behaviours.
> > 
> > For example: CFQ uses the per-process iocontext to improve fairness
> > between *all* the processes in a system. But it doesn't have the concept
> > that there's a cgroup context on-top-of the processes.
> > 
> > So, some optimizations made to guarantee fairness among processes could
> > conflict with algorithms implemented at the cgroup layer. And
> > potentially lead to undesirable behaviours.
> > 
> > For example, an issue I'm experiencing with my cgroup-io-throttle
> > patchset is that a cgroup can consistently increase its IO rate (always
> > respecting the max limits), simply by increasing the number of IO worker
> > tasks with respect to another cgroup with a lower number of IO workers.
> > This is probably due to the fact that CFQ tries to give the same amount
> > of "IO time" to all the tasks, without considering that they're organized
> > in cgroups.
> BTW this is why I proposed to use a single shared iocontext for all the
> processes running in the same cgroup. Anyway, this is not the best
> solution, because in this way all the IO requests coming from a cgroup
> will be queued to the same cfq queue. If I'm not wrong in this way we
> would implement noop (FIFO) between tasks belonging to the same cgroup
> and CFQ between cgroups. But, at least for this particular case, we
> would be able to provide fairness among cgroups.
> -Andrea
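
To make the quoted design concrete, here is a rough userspace model of it
in Python. It is only a sketch under assumptions, not kernel code and not
Vivek's actual patch: the class and method names are hypothetical, a plain
dict stands in for the per-queue rb-tree, and "proportionate weight" is
approximated by releasing up to `weight` bios per cgroup in each round.

```python
from collections import deque


class BioCgroup:
    """One object per cgroup; tracks its bio_group on each request queue."""

    def __init__(self, name, weight):
        self.name = name
        self.weight = weight
        self.groups = {}  # request queue name -> BioGroup


class BioGroup:
    """Bios of one cgroup that are pending on one particular request queue."""

    def __init__(self, cgroup):
        self.cgroup = cgroup
        self.bios = deque()


class RequestQueue:
    """Per-device queue; the dict below stands in for the per-queue rb-tree."""

    def __init__(self, name):
        self.name = name
        self.groups = {}  # cgroup name -> BioGroup

    def submit(self, cgroup, bio):
        # Buffer the bio in the node that belongs to the issuing cgroup,
        # creating that node the first time the cgroup uses this queue.
        group = self.groups.get(cgroup.name)
        if group is None:
            group = BioGroup(cgroup)
            self.groups[cgroup.name] = group
            cgroup.groups[self.name] = group
        group.bios.append(bio)

    def dispatch_round(self):
        # Release up to `weight` bios per cgroup to the (imaginary) elevator.
        released = []
        for group in self.groups.values():
            for _ in range(group.cgroup.weight):
                if not group.bios:
                    break
                released.append(group.bios.popleft())
        return released


# Example from the thread: test1 issues io to both sda and sdb.
q1, q2 = RequestQueue("sda"), RequestQueue("sdb")
test1 = BioCgroup("test1", weight=2)
test2 = BioCgroup("test2", weight=1)  # second cgroup added for illustration
for i in range(4):
    q1.submit(test1, f"t1-sda-{i}")
    q1.submit(test2, f"t2-sda-{i}")
q2.submit(test1, "t1-sdb-0")

first_round = q1.dispatch_round()
```

In this model test1 ends up with two bio_group objects, one per queue, and
each round on sda releases two bios of test1 for every one of test2.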

I once thought the same thing, but this approach breaks compatibility.
I think we should make ionice effective only among the processes in the
same cgroup.

A system gives some amount of bandwidth to each of its cgroups, and
the processes in each cgroup fairly share the bandwidth given to it.
I think this is the straightforward approach. What do you think?
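
As a rough illustration of this two-level sharing, here is a small Python
sketch. It is only a model under assumptions: the function name, the
weights and the bandwidth figure are made up, and processes inside a
cgroup are treated as equal (ionice priorities are not modelled).

```python
def split_bandwidth(total, cgroups):
    """Split `total` bandwidth between cgroups by weight, then equally
    between the processes inside each cgroup.

    cgroups maps a cgroup name to (weight, list of process names).
    Cgroups without processes are treated as idle and get nothing.
    """
    active = {name: (w, procs) for name, (w, procs) in cgroups.items() if procs}
    total_weight = sum(w for w, _ in active.values())
    shares = {}
    for name, (weight, procs) in active.items():
        group_bw = total * weight / total_weight  # first level: cgroups
        shares[name] = {p: group_bw / len(procs) for p in procs}  # second level
    return shares


# Cgroup B runs three IO workers, cgroup A only one.
shares = split_bandwidth(90, {
    "A": (2, ["a1"]),
    "B": (1, ["b1", "b2", "b3"]),
})
```

Here A gets 60 and B gets 30 regardless of how many workers B starts, so
adding IO workers only divides B's own share further.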

I think the CFQ-cgroup that the NEC guys are working on, the OpenVZ team's
CFQ scheduler, and dm-ioband with bio-cgroup all work like this.

Thank you,
Hirokazu Takahashi.
