Re: [Xen-devel] Scheduling of I/O domains
On Thu, 2004-07-22 at 01:53, Keir Fraser wrote:
> > It seems to me that this problem doesn't have anything to do with the
> > choice of scheduling policy or parameters; It is about when the
> > scheduler is called. It appears as though the xen cpu scheduler
> > currently only runs when the hardware timer ticks. It does not run when
> > an external interrupt happens. So there is a large latency introduced to
> > I/O interrupts, and this limits I/O performance. Changing the scheduler
> > algorithm won't help this.
> > The only way to avoid this is to immediately dispatch the I/O domain
> > responsible for a given I/O interrupt as soon as that interrupt occurs.
> > This means giving I/O domains with pending interrupts scheduling
> > priority over any "regular" domains. Just as in a "normal" operating
> > system, interrupt service routines must complete before any user
> > processes are executed. Otherwise, latencies are introduced that kill
> > I/O performance.
> When an event is queued for a domain we call a generic wakeup
> function. A good deal more of that function ought to be
> scheduler-specific, and should do something smarter than our current
> default (which is to force a reschedule only if the CPU is idling).
> However, fixing this shouldn't be that hard -- we should have saner
> scheduling in the next few weeks.
[sorry for the delayed reply]
As Keir pointed out, the problem is in the wakeup function, and in
particular with a BVT hack.
BVT has the notion of a context switch allowance, i.e., the minimum time
a task is allowed to run before it gets preempted, to avoid context
switch thrashing (ctx_allow=5ms in sched_bvt.c). After this time a new
run through the scheduler is performed.
In our BVT implementation we extend this slightly: if there is only one
runnable task we expand the context switch allowance to 10 times the
normal amount in order to avoid too many runs through the scheduler.
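
To make the allowance rule concrete, here is a minimal sketch of it in C.
It is not the code from sched_bvt.c: the function name, the constant
names and the use of microseconds are made up for illustration; only the
5ms value and the factor of 10 come from the description above.

```c
#include <stdint.h>

/* Illustrative names only; these are not the identifiers in sched_bvt.c. */
#define CTX_ALLOW_US     5000u  /* ctx_allow = 5ms */
#define LONE_TASK_FACTOR 10u    /* expansion when only one task is runnable */

/* Time the chosen task may run before the scheduler is entered again. */
uint32_t next_slice_us(unsigned int runnable_tasks)
{
    if (runnable_tasks <= 1)
        return CTX_ALLOW_US * LONE_TASK_FACTOR; /* expanded: 50ms */
    return CTX_ALLOW_US;                        /* normal: 5ms */
}
```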
The old (i.e., 1.2) BVT implementation would check, on waking up another
domain, whether the current task had already used up its ctx_allow and,
if it had, would force an immediate run through the scheduler (therefore
ignoring the expanded context switch allowance).
In the -unstable implementation this is no longer the case: the BVT
scheduling function reports back a time value for the next run through
the scheduler, and no hook is provided into specific scheduler
implementations when a domain unblocks. Therefore, if your CPU hog is
the only runnable task when it is scheduled, it will run for
10*ctx_allow (50ms) irrespective of other tasks becoming runnable during
that time (i.e., your IO tasks). In the worst case the IO tasks then
have to wait for 50ms rather than 5ms before they get scheduled.
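
The 50ms versus 5ms figures can be checked with a small model. This is a
sketch only: wake_wait_us and its parameters are hypothetical, and it
assumes that with the 1.2-style check the waiting task gets the CPU as
soon as the running task has used up ctx_allow.

```c
#include <stdbool.h>
#include <stdint.h>

#define CTX_ALLOW_US 5000u  /* ctx_allow = 5ms */

/*
 * How long a task waits if it wakes woke_at_us into the current slice.
 *   slice_us   - time the scheduler granted to the running task
 *   wake_check - true models the 1.2 behaviour (preempt once ctx_allow
 *                has been used up); false models -unstable (no hook)
 */
uint32_t wake_wait_us(uint32_t slice_us, uint32_t woke_at_us, bool wake_check)
{
    uint32_t limit = slice_us;

    if (wake_check && CTX_ALLOW_US < slice_us)
        limit = CTX_ALLOW_US;

    return woke_at_us >= limit ? 0 : limit - woke_at_us;
}
```

With a lone CPU hog (slice of 50000us) and an IO task waking right at the
start of the slice, the model gives 50000us without the check and 5000us
with it.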
As a quick fix, could you comment out lines 366-371 (which extend the
context switch allowance if there is only one task running) in
sched_bvt.c and try your experiment again?
The proper fix should be a call into the scheduler if a task unblocks,
which shouldn't be too hard to add.
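
Roughly, such a hook could look like the sketch below. None of these
names exist in the Xen tree; struct sched_ops, generic_wake and bvt_wake
are hypothetical and only show the shape of the idea: the generic wakeup
path keeps its current default (reschedule only if the CPU is idling) and
otherwise lets the scheduler decide.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CTX_ALLOW_US 5000u  /* ctx_allow = 5ms */

/* Hypothetical per-task state: time run since it was last scheduled. */
struct task {
    uint32_t ran_us;
};

/* Hypothetical per-scheduler hooks; wake may be NULL. */
struct sched_ops {
    /* Return true to request an immediate run through the scheduler. */
    bool (*wake)(const struct task *current, const struct task *woken);
};

/* BVT policy as in 1.2: preempt once the current task used up ctx_allow. */
bool bvt_wake(const struct task *current, const struct task *woken)
{
    (void)woken; /* unused in this simple policy */
    return current->ran_us >= CTX_ALLOW_US;
}

/* Generic wakeup path: idle CPU always reschedules, else ask the hook. */
bool generic_wake(const struct sched_ops *ops, const struct task *current,
                  const struct task *woken, bool cpu_idle)
{
    if (cpu_idle)
        return true;
    if (ops->wake != NULL)
        return ops->wake(current, woken);
    return false; /* no hook: today's default behaviour */
}
```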