> > I've also added a (very stupid) adaptation scheme which tries
> > to adjust the max_count_frags_no_event parameter to avoid
> > hitting the deadlock too often in the first place. It seems
> > to do broadly the right thing for both UDP floods and TCP
> > stream tests, but it probably wouldn't be very hard to come
> > up with some workload for which it falls over.
> OK, I will test how this works on 10 gig NICs when I have some
> time. I am currently doing some tests on Intel 10gig ixgbe NICs
> and I am seeing some behaviour that I cannot explain (without this
> adaptation patch). Netperf is not able to saturate the link, and
> at the same time neither the guest nor dom0 can saturate the
> CPU either (I made sure the client is not the bottleneck
> either). So some other factor is limiting throughput. (I disabled
> the netchannel2 rate limiter, but this did not seem to have any
> effect either.) I will spend some time looking into that.
Is it possible that we're seeing some kind of semi-synchronous
bouncing between the domU and dom0? Something like this:
-- DomU sends some messages to dom0, wakes it up, and then goes to
sleep.
-- Dom0 wakes up, processes the messages, sends the responses, wakes
the domU, and then goes to sleep.
-- Repeat.
So that both domains are spending significant time just waiting for
the other one to do something, and neither can saturate their CPU.
That should be fairly obvious in a xentrace trace if you run it while
you're observing the bad behaviour.
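To make the suspected pattern a bit more concrete, here's a toy
userspace model of it (plain pthreads, nothing to do with the real
netchannel2 code; the iteration count is arbitrary).  Each "domain"
only runs while the other is blocked, so neither thread gets anywhere
near a full CPU even though both are nominally busy the whole time
(build with gcc -pthread):

/* Toy model of the suspected lock-step behaviour.  Not netchannel2
 * code; just two threads taking strict turns, the way domU and dom0
 * would if every flight of messages put the sender to sleep until
 * the receiver had replied. */
#include <pthread.h>
#include <stdio.h>

#define ITERATIONS 100000          /* arbitrary run length */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int turn;                   /* 0: "domU" may run, 1: "dom0" may run */

static void *run_side(void *arg)
{
    int me = (int)(long)arg;

    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&lock);
        while (turn != me)         /* "go to sleep" until the other side kicks us */
            pthread_cond_wait(&cond, &lock);
        /* "Process a flight of messages" would go here.  The point is
         * that only one side ever does work at a time, so neither side
         * can saturate a CPU even though both are busy end to end. */
        turn = !me;                /* "send responses and kick the other side" */
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, run_side, (void *)0L);
    pthread_create(&b, NULL, run_side, (void *)1L);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("done\n");
    return 0;
}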
If that is the problem, there are a couple of easy-ish things we could
do which might help a bit (a rough sketch of both follows the list):
-- Re-arrange the tasklet a bit so that it sends outgoing messages
before checking for incoming ones. The risk is that processing an
incoming message is likely to generate further outgoing ones, so we
risk splitting the messages into two flights.
-- Arrange to kick after N messages, even if we still have more
messages to send, so that the domain which is receiving the
messages runs in parallel with the sending one.
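Something along these lines, purely as a sketch: the helper names,
struct ring, and KICK_BATCH are all made up for illustration, not the
real netchannel2 functions, and the real tasklet obviously has to deal
with ring indexes, the event channel machinery, and so on:

#include <stdbool.h>

struct ring;                        /* stand-in for the real ring state */

/* Stand-ins for the real helpers -- hypothetical, illustration only. */
bool have_pending_outgoing(struct ring *r);
bool have_unconsumed_incoming(struct ring *r);
void send_one_outgoing_message(struct ring *r);
void process_one_incoming_message(struct ring *r);
void notify_remote(struct ring *r);     /* event channel kick */

#define KICK_BATCH 32                   /* "N"; would want tuning/adaptation */

void nc2_tasklet_body(struct ring *r)   /* hypothetical name */
{
    unsigned int sent_since_kick = 0;

    /* 1. Send outgoing messages before checking for incoming ones, so
     * the other end can start consuming while we keep working. */
    while (have_pending_outgoing(r)) {
        send_one_outgoing_message(r);
        /* 2. Kick after KICK_BATCH messages even though more remain,
         * so the receiving domain runs in parallel with the sender. */
        if (++sent_since_kick == KICK_BATCH) {
            notify_remote(r);
            sent_since_kick = 0;
        }
    }
    if (sent_since_kick)
        notify_remote(r);

    /* Only now look at incoming messages.  The downside: handling
     * these will typically queue further outgoing messages, which now
     * go out as a second flight (an extra notification, an extra trip
     * through the scheduler). */
    while (have_unconsumed_incoming(r))
        process_one_incoming_message(r);
}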
Both approaches risk sending more batches of messages, and hence more
event channel notifications, more trips through the Xen scheduler, and
so on, so they can only ever increase the number of cycles per packet;
but if they stop the CPUs going idle then they might still increase
the actual throughput.
Ideally, we'd only do this kind of thing if the receiving domain is
idle, but figuring that out from the transmitting domain in an
efficient way sounds tricky. You could imagine some kind of
scoreboard showing which domains are running, maintained by Xen and
readable by all domains, but I really don't think we want to go down
that route.
Steven.