http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1115
Summary: Event channel port scanning unfair
Product: Xen
Version: unstable
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Unspecified
AssignedTo: xen-bugs@xxxxxxxxxxxxxxxxxxx
ReportedBy: diego.ongaro@xxxxxxxx
While studying the behavior of Xen's scheduler, we identified particularly
unfair I/O scheduling in certain cases. In summary, here's what happens:
In linux/drivers/xen/core/evtchn.c, port allocation always picks the
smallest port number not in use. Thus, because of the order in
which the computer boots, any physical devices will be assigned relatively low
port numbers. Any virtual devices to support guest operations, such as loopback
filesystems or virtual network interfaces, will be assigned higher port
numbers. Similarly, domains will be assigned port numbers in the order that
they are brought up, and, in our configuration, inbound network traffic is
assigned a lower port number than outbound for each guest.
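
To make the allocation rule concrete, here is a toy model of lowest-free-port allocation. It is only a sketch: allocate_port() and the device names are hypothetical and do not come from evtchn.c, and the bring-up order is assumed for illustration.

```python
def allocate_port(in_use):
    """Return the smallest port number not in `in_use` and mark it as used."""
    port = 0
    while port in in_use:
        port += 1
    in_use.add(port)
    return port


in_use = set()

# Assumed bring-up order: the physical NIC first, then one virtual
# interface per guest as each guest is started.
devices = ["physical NIC", "guest 1 vif", "guest 2 vif", "guest 3 vif"]
ports = {name: allocate_port(in_use) for name in devices}

for name, port in ports.items():
    print(f"{name}: port {port}")
```

Whatever is brought up first ends up with the lowest port number, which is what gives physical devices their head start in the scan described next.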
The function evtchn_do_upcall() scans port numbers to process pending ports.
Effectively, it finds the lowest port number with pending data, processes it,
and repeats until there are no more pending ports. The pending state is
actually kept in a two-level bit vector, but that can be thought of as an
optimization.
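
A rough sketch of a lowest-first scan over a two-level bit vector follows. It is not the kernel code: the 32-bit word size, the function names, and the selector/pending layout are assumptions chosen to illustrate the idea that a first-level selector word records which second-level words have any pending bit.

```python
BITS_PER_WORD = 32  # assumed word size, for illustration only


def lowest_set_bit(word):
    """Index of the least significant set bit; `word` must be non-zero."""
    return (word & -word).bit_length() - 1


def set_pending(selector, pending, port):
    """Mark `port` pending and return the updated selector word."""
    word_idx, bit_idx = divmod(port, BITS_PER_WORD)
    pending[word_idx] |= 1 << bit_idx
    return selector | (1 << word_idx)


def take_lowest_pending(selector, pending):
    """Clear the lowest pending port and return (port, updated selector).

    Returns (None, selector) when nothing is pending.
    """
    if selector == 0:
        return None, selector
    word_idx = lowest_set_bit(selector)
    bit_idx = lowest_set_bit(pending[word_idx])
    pending[word_idx] &= ~(1 << bit_idx)
    if pending[word_idx] == 0:
        # No ports left in this word, so clear its selector bit too.
        selector &= ~(1 << word_idx)
    return word_idx * BITS_PER_WORD + bit_idx, selector
```

Because both levels are searched from the least significant bit, the result is always the lowest pending port, regardless of the order in which ports became pending.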
Here's an example of the problem:
Suppose that Guest 4 is receiving a continuous stream of data. The driver
domain receives a virtual NIC interrupt. It finds the packet's destination is
Guest 4, copies the packet to Guest 4, and finally sends an event channel
notification to Guest 4. Xen wakes Guest 4 and, in applying tickling, preempts
Domain 0 in favor of Guest 4. Guest 4 maps an acknowledgment packet back to the
driver domain and sends an event channel notification. When the driver domain
gets its turn to run, it has an event channel notification from Guest 4
pending. Also, the NIC has likely received more inbound data destined for Guest
4, so Domain 0 has the corresponding port pending for the NIC's IRQ.
Since Domain 0 always seeks the lowest port with pending data, it will always
process lower port numbers before higher port numbers. Moreover, because of the
way port numbers are allocated, inbound packets destined for Guest 4 will
always be processed before outbound packets originating from Guest 4. If Xen
repeatedly preempts the driver domain to inform Guest 4 of new inbound data,
the system enters a cycle where it cannot process any outbound acknowledgments.
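
The cycle can be reproduced with a very small simulation. This is a simplified model, not a measurement: it assumes the driver domain handles exactly one event per turn before being preempted, that both ports are pending again by its next turn, and that the NIC's port (1) is lower than the port for Guest 4's outbound traffic (9). The port numbers are made up.

```python
def run_lowest_first(turns, nic_port=1, ack_port=9):
    """Count how often each port is serviced under lowest-first scanning."""
    served = {nic_port: 0, ack_port: 0}
    pending = set()
    for _ in range(turns):
        # Assumption: by the time the driver domain runs again, the NIC has
        # more inbound data and the guest has an acknowledgment waiting.
        pending.update((nic_port, ack_port))
        port = min(pending)  # always pick the lowest pending port
        pending.discard(port)
        served[port] += 1
        # The driver domain is preempted here in favor of the guest.
    return served


print(run_lowest_first(100))
```

Under these assumptions the acknowledgment port is never serviced, no matter how many turns the driver domain gets.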
This simple example uses only one guest, but it's easy to see how this could
become even more of a problem with multiple, competing guests. This problem and
related topics are discussed in more detail in a paper to appear in VEE
(Virtual Execution Environments) 2008. Scott Rixner also presented some of the
results at the recent Xen Summit. I'll quickly relate one example from the
paper here: In a test with 7 guests receiving TCP streaming data, under vanilla
Xen, the bandwidth for each guest ranged from 23.9 Mbps to 192.3 Mbps. With the
attached patch applied, the bandwidth for each guest ranged from 83.8 Mbps to
140.26 Mbps.
The attached patch changes the way evtchn_do_upcall() scans port numbers.
Instead of always looking for the lowest port number with pending data, it
looks for the lowest pending port number above the one it last processed,
wrapping around to the lowest pending port when there is none, in a
round-robin manner. I won't get into the implementation details here as they
will be evident to those familiar with the relevant code. There are, however,
three items worth mentioning:
1. We did not apply this patch to domU, but it should also be relevant to domU
kernels.
2. We did not test on a multi-processor machine. As for whether the patch is
thread-safe, the static variables we have introduced could, in the worst case,
cause another fairness issue but not a crash.
3. The patch is against xen-unstable changeset 15080 (from May 2007). I know
there have been some modifications in that file since then, so the line numbers
might be off.
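
For readers who do not have the patch at hand, the round-robin scan can be sketched as follows. This is a toy model and not the patch itself: the function names and port numbers are hypothetical, and it assumes the driver domain handles one event per turn while two ports (1 for the NIC, 9 for a guest's outbound traffic) become pending again before every turn.

```python
def next_pending_round_robin(pending, last_port):
    """Lowest pending port above `last_port`, wrapping to the lowest overall."""
    if not pending:
        return None
    higher = [p for p in pending if p > last_port]
    return min(higher) if higher else min(pending)


def run_round_robin(turns, nic_port=1, ack_port=9):
    """Count how often each port is serviced under round-robin scanning."""
    served = {nic_port: 0, ack_port: 0}
    pending = set()
    last_port = -1  # nothing processed yet
    for _ in range(turns):
        # Assumption: both ports are pending again before each turn.
        pending.update((nic_port, ack_port))
        port = next_pending_round_robin(pending, last_port)
        pending.discard(port)
        served[port] += 1
        last_port = port  # remembered across turns, like a static variable
    return served


print(run_round_robin(100))
```

In this model the two ports are serviced alternately, so neither can starve the other, whereas always choosing min(pending) would service only port 1.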
---
Diego Ongaro, Alan Cox, Scott Rixner
Rice University