Changeset 8460 re-introduced sharing of I/O resources with domains other
than dom0 to allow for the creation of driver domains. However, Xen
currently lacks accounting of those I/O resources, which allows a
misconfiguration to give the same resource to multiple domains. When
resources are given to driver domains, they must be manually revoked
from dom0 in order to ensure that dom0 is not using that resource
simultaneously. However, there is currently no way to tell whether dom0
is using that resource. These problems may be better illustrated with a
scenario:

The system has a physical device A which uses a page of I/O memory M and
I/O port P. By default, dom0 has access to all pages of I/O memory and
all I/O ports. Say dom0 contains a driver for device A which is loaded
automatically at boot. dom0 begins to communicate with A by mapping in
page M and talking on port P. By making special hypercalls to Xen, a
domU can be started and also given access to page M and port P. By
default, however, dom0 continues to have access to those resources. The
device drivers in the domU and in dom0 can then access the device
simultaneously (which is bad). Now, some manual steps can be taken to avoid
this. The device driver in dom0 could be blocked from loading (this is
how the PCI Backend works) or manually unloaded. Or dom0's access to
page M and I/O port P could be revoked. Revocation works well for I/O
ports, where access is checked on every use, but there is no way to
revoke access to page M once dom0 has mapped it, because access is
only checked when the page is mapped in. So while revocation appears
to succeed (dom0 can no longer map in page M in the future), dom0 can
still communicate with device A
via page M.
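
The difference between the two kinds of check can be sketched in a few
lines of C. This is only a toy model of the scenario above, not code
from Xen or from the patches; every name in it (toy_domain,
toy_revoke, and so on) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not Xen code: one I/O port and one I/O memory page. */
struct toy_domain {
    bool port_allowed;   /* consulted on every port access */
    bool page_allowed;   /* consulted only when the page is mapped */
    bool page_mapped;    /* an existing mapping survives revocation */
};

/* Port access: permission is checked at every use. */
static bool toy_port_access(const struct toy_domain *d)
{
    return d->port_allowed;
}

/* Mapping: permission is checked here, and only here. */
static bool toy_map_page(struct toy_domain *d)
{
    if (!d->page_allowed)
        return false;
    d->page_mapped = true;
    return true;
}

/* Access through an existing mapping: no permission check at all. */
static bool toy_page_access(const struct toy_domain *d)
{
    return d->page_mapped;
}

/* Revocation clears the permissions but does not tear down mappings. */
static void toy_revoke(struct toy_domain *d)
{
    d->port_allowed = false;
    d->page_allowed = false;
}
```

In this model, calling toy_revoke() after toy_map_page() stops port
access and future mappings, but toy_page_access() still succeeds, which
is the hole described above.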

Another problem with the lack of accounting for I/O resources is that
Xen does not have exclusive access to the hardware. Dom0 can give itself
or another domain access to I/O resources that Xen thinks that it owns.
If dom0 knew where the local APIC or the I/O APICs were located in
memory, dom0 could grant itself permission to access those pages of
memory (via the same hypercalls used to grant resources to driver
domains) and then map them in (thus allowing dom0 and Xen simultaneous
access to these devices). The same thing could happen with the serial
port: even when Xen thinks it owns the serial port, dom0 could grant
itself (or a domU driver domain) permission to access the serial port
resources.

This lack of accounting by Xen could potentially threaten the stability
of dom0, the driver domain, and even Xen (if more than one component
tries to use the same device simultaneously yet remains unaware of the
other). For better safety and isolation of device drivers and for the
protection of the hypervisor, this issue should be dealt with.

The patches to follow seek to resolve this problem by properly
accounting for I/O resources and ensuring that two domains cannot
simultaneously access the same resources unless explicitly specified.
Unfortunately, some resource sharing is needed for interrupts because
the number of interrupt lines in today's hardware is limited, but such
sharing should be explicitly controlled and tracked.
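
One way to picture "exclusive unless explicitly shared" for an
interrupt line is the sketch below. It is an illustration only and is
not taken from the patches; the structure, the fixed owner limit, and
all toy_* names are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_OWNERS 4

/* Hypothetical record for one interrupt line; not a Xen structure. */
struct toy_irq {
    int  owners[TOY_MAX_OWNERS]; /* domain IDs currently bound */
    int  nr_owners;
    bool shareable;              /* set only by explicit configuration */
};

static void toy_irq_init(struct toy_irq *irq, bool shareable)
{
    irq->nr_owners = 0;
    irq->shareable = shareable;
}

/* A domain may bind if the line is free, or if sharing was explicitly
 * allowed and there is room to record one more owner. */
static bool toy_irq_bind(struct toy_irq *irq, int domid)
{
    for (int i = 0; i < irq->nr_owners; i++)
        if (irq->owners[i] == domid)
            return true;         /* already bound */
    if (irq->nr_owners > 0 && !irq->shareable)
        return false;            /* exclusive line already taken */
    if (irq->nr_owners == TOY_MAX_OWNERS)
        return false;            /* no room to track another owner */
    irq->owners[irq->nr_owners++] = domid;
    return true;
}

/* Remove a domain from the owner list, if present. */
static void toy_irq_unbind(struct toy_irq *irq, int domid)
{
    for (int i = 0; i < irq->nr_owners; i++) {
        if (irq->owners[i] == domid) {
            irq->owners[i] = irq->owners[--irq->nr_owners];
            return;
        }
    }
}
```

Because every owner is recorded, a shared line is still fully tracked:
the accountant always knows which domains are bound to it.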

Accomplishing this was complicated by the fact that Xen doesn't do any
kind of reference counting on pages of I/O memory that are beyond the
highest physical address of RAM. For example, the region of memory where
PCI devices typically live is not accounted for by Xen. Access controls
exist, but there is no way to know if a given page of memory is still
in-use by a domain.
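
A minimal sketch of what per-page reference counting for I/O memory
could look like follows. It assumes a simple linked list with one
heap-allocated record per tracked page and uses malloc() as a stand-in
for a hypervisor allocator; the record layout and the toy_* names are
hypothetical and are not the data structures used by the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical tracking record for one I/O page frame. */
struct toy_iopage {
    unsigned long mfn;        /* machine frame number of the I/O page */
    int owner;                /* domain ID holding the mappings */
    unsigned int refcount;    /* number of live mappings */
    struct toy_iopage *next;
};

static struct toy_iopage *toy_find(struct toy_iopage *head, unsigned long mfn)
{
    for (; head != NULL; head = head->next)
        if (head->mfn == mfn)
            return head;
    return NULL;
}

/* Called when a domain maps the page. Fails if another domain holds it. */
static bool toy_iopage_get(struct toy_iopage **head, unsigned long mfn,
                           int domid)
{
    struct toy_iopage *p = toy_find(*head, mfn);

    if (p != NULL) {
        if (p->owner != domid)
            return false;
        p->refcount++;
        return true;
    }
    p = malloc(sizeof(*p));
    if (p == NULL)
        return false;
    p->mfn = mfn;
    p->owner = domid;
    p->refcount = 1;
    p->next = *head;
    *head = p;
    return true;
}

/* Called when a mapping is removed. Frees the record on the last put. */
static void toy_iopage_put(struct toy_iopage **head, unsigned long mfn,
                           int domid)
{
    for (struct toy_iopage **pp = head; *pp != NULL; pp = &(*pp)->next) {
        struct toy_iopage *p = *pp;

        if (p->mfn == mfn && p->owner == domid) {
            if (--p->refcount == 0) {
                *pp = p->next;
                free(p);
            }
            return;
        }
    }
}

/* True while any mapping of the page is still live. */
static bool toy_iopage_in_use(struct toy_iopage *head, unsigned long mfn)
{
    return toy_find(head, mfn) != NULL;
}
```

With a count like this, "is the page still in use by a domain?" has a
definite answer, so a page can be handed to another domain only after
the last mapping is gone.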

These patches add the concept of a default I/O domain. Dom0 has always
had the role of the default I/O domain and it still retains this role.
However, the default I/O domain can now only access a resource when it
is not in-use by another domain or by the hypervisor. In an ideal setup,
it would be nice to *not* have a default I/O domain and instead
explicitly give all domains the I/O resources that they need and nothing
more (a default-deny setup rather than the default-permit stance that
the default I/O domain has), perhaps by specifying command-line options
to say what the initial domain should have access to. But for now, the
default I/O domain should allow for running all of your device drivers
out of dom0 without having to explicitly specify the resources which
dom0 has access to. This patch shouldn't change the way Xen works for
anyone unless they were explicitly doing one of the dangerous things
described above.

I'd be interested in hearing people's thoughts on this problem and on my
implementation of a solution.
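
The default I/O domain rule described above (access only when the
resource is not in use by another domain or by the hypervisor) can be
summarized as a small decision function. This is a sketch under the
assumption that each resource is in exactly one of three states; the
enum, the structure, and toy_may_access() are hypothetical names, not
part of the patches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states for one I/O resource (port, page, or IRQ). */
enum toy_state { TOY_FREE, TOY_RESERVED_BY_XEN, TOY_OWNED };

struct toy_resource {
    enum toy_state state;
    int owner;            /* valid only when state == TOY_OWNED */
};

/* Decide whether domid may use the resource. */
static bool toy_may_access(const struct toy_resource *r, int domid,
                           int default_io_domid)
{
    switch (r->state) {
    case TOY_RESERVED_BY_XEN:
        return false;                      /* hypervisor keeps it */
    case TOY_OWNED:
        return r->owner == domid;          /* explicit grant only */
    case TOY_FREE:
        return domid == default_io_domid;  /* default permit */
    }
    return false;
}
```

Only the TOY_FREE case is default-permit; removing it would give the
default-deny setup mentioned above.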

1) I know very little about IA-64, so this patch will likely break IA-64
since I've not made an attempt to support it. I'd be interested in
working with the IA-64 people to make it work for them as well. I
imagine that my code for tracking I/O memory will probably just work on
any architecture that needs it, but hooks would need to be made into
IA-64 code to do the access checks and reference counting at the right
places.
2) The I/O memory reference counting will probably be inefficient if
lots of I/O memory pages need to be tracked because of its use of
xmalloc. Every page will incur the cost of the tracking header that gets
placed in front of the buffer I request with xmalloc. This is a place
where something like the Linux slab allocator would be useful. I suspect
that in practice, this would only be a problem if someone has limited
memory and tries to map in a large framebuffer from a video card.
3) No reference counting is done on pages of I/O memory used by Xen. It
would probably be a good idea to do this, but for now, it's sufficient
for Xen to mark the I/O pages that it doesn't want anyone else to use as
"reserved" with a call to iocap_reserve in xen/common/iocap.c.
4) The ability to disable certain I/O ports in dom0 from the
command-line has been removed for now. This was only because it did not
fit easily within the idea of a default I/O domain having only the
resources that were not in-use elsewhere. That said, if this is a
feature that is used/needed, a workaround could probably be found.
5) I've not quantified the performance of this code, but any performance
hit (if significant at all) would only be noticeable at bind time for
interrupts and I/O memory (mapping in or unmapping a page from the page
tables); this code should not affect the normal use of these resources
after binding to them. For I/O ports, the cost is incurred at every
access to the I/O port because the access list is checked every time
(although I don't think the additional cost of having the resource
accountant will be significant).
Xen-devel mailing list