On Mon, 2011-05-09 at 22:01 +0100, Konrad Rzeszutek Wilk wrote:
> On Mon, May 09, 2011 at 10:00:30AM +0100, Ian Campbell wrote:
> > On Wed, 2011-05-04 at 15:17 +0100, Konrad Rzeszutek Wilk wrote:
> > > Hello,
> > >
> > > This set of v3 patches allows a PV domain to see the machine's
> > > E820 and figure out where the "PCI I/O" gap is and match it with the
> > > reality.
> > >
> > > Changelog since v2 posting:
> > > - Moved 'libxl__e820_alloc' to be called from do_domain_create and if
> > > machine_e820 == true.
> > > - Made no_machine_e820 be set to true, if the guest has no PCI devices
> > > (and is PV)
> > > - Used Keir's re-worked code for E820 creation.
> > > Changelog since v1 posting:
> > > - Squashed the "x86: make the pv-only e820 array be dynamic" and
> > > "x86: adjust the size of the e820 for pv guest to be dynamic" together.
> > > - Made xc_domain_set_memmap_limit use the 'xc_domain_set_memory_map'
> > > - Moved 'libxl_e820_alloc' and 'libxl_e820_sanitize' to be an internal
> > > operation and called from 'libxl_device_pci_parse_bdf'.
> > > - Expanded 'libxl_device_pci_parse_bdf' API call to have an extra
> > > argument (optional).
> > >
> > > The short end is that with these patches a PV domain can:
> > >
> > > - Use the correct PCI I/O gap. Before these patches, Linux guest would
> > > boot up and would tell:
> > > [ 0.000000] Allocating PCI resources starting at 40000000 (gap: 40000000:c0000000)
> > > while in actuality the PCI I/O gap should have been:
> > > [ 0.000000] Allocating PCI resources starting at b0000000 (gap: b0000000:4c000000)
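
For anyone following along, here is a rough sketch of how a gap like the two above falls out of an E820 map. It is not the kernel's or libxl's actual code: the struct layout, the names (e820_entry, gap, find_pci_gap) and the assumption that the map is sorted and non-overlapping are all mine, for illustration only. It simply picks the largest hole below 4GiB that no entry covers.

```c
#include <stddef.h>
#include <stdint.h>

#define E820_RAM      1
#define E820_RESERVED 2

/* Hypothetical, simplified E820 entry (not the real Xen/Linux struct). */
struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

struct gap {
    uint64_t start;
    uint64_t size;
};

/*
 * Return the largest hole below 4GiB that is not covered by any entry.
 * Assumes the map is sorted by address and has no overlapping entries.
 */
static struct gap find_pci_gap(const struct e820_entry *map, size_t n)
{
    const uint64_t limit = 0x100000000ULL; /* 4GiB */
    struct gap best = { 0, 0 };
    uint64_t prev_end = 0;
    size_t i;

    for (i = 0; i < n && prev_end < limit; i++) {
        uint64_t start = map[i].addr;
        uint64_t end = map[i].addr + map[i].size;

        if (start > limit)
            start = limit;
        /* Hole between the previous entry and this one. */
        if (start > prev_end && start - prev_end > best.size) {
            best.start = prev_end;
            best.size = start - prev_end;
        }
        if (end > prev_end)
            prev_end = end;
    }

    /* Hole between the last entry and the 4GiB boundary. */
    if (prev_end < limit && limit - prev_end > best.size) {
        best.start = prev_end;
        best.size = limit - prev_end;
    }
    return best;
}
```

With a flat map holding a single 1GB RAM entry at 0, this yields 40000000:c0000000, i.e. the "before" line. With a made-up host-like map (RAM up to b0000000, a reserved region from fc000000 up to 4GiB), it yields b0000000:4c000000, i.e. the "after" line.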
> > The reason it needs to be a particular gap is that we can't (easily? at
> > all?) rewrite the device BARs to match the guest's idea of the hole, is
> > that right? So it needs to be consistent with the underlying host hole.
> > I wonder if it is time to enable IOMMU for PV guests by default.
> Would be nice. I thought if the IOMMU was present it would automatically do
I must admit I wasn't sure, but I thought not. However, 21770:510c797ee115
"iommu: Remove pointless iommu=pv boot option" removed the option, so I
think you are right: passing iommu=1 is sufficient to enable IOMMU for
both PV and HVM guests.
> > Presumably in that case we can manufacture any hole we like in the e820,
> > which is useful e.g. when migrating to not-completely-homogeneous hosts.
> Hmm. I want to say yes, but not entirely sure what are all the pieces that
> this would entail.
I think it's a decision which will be internal to libxl (in particular
the important thing is that the option isn't exposed in the guest cfg in
a non-forward compatible way) so we can implement it as and when we get
round to it and not block this series on anything along these lines.
> > > This has been tested with 2.6.18 (RHEL5), 2.6.27 (SLES11), 2.6.36, 2.6.37,
> > > 2.6.38, and 2.6.39 kernels. Also tested with PV NetBSD 5.1.
Did they all see the full amount of RAM? Since the domain builder does not
obey the e820, I'd have thought they would end up with RAM in their I/O
holes, which needs to be swizzled around, and at least some of those
guests won't do that...
> > >
> > > Tested this with the PCI devices (NIC, MSI), and with 2GB, 4GB, and 6GB
> > > guests with success.
> > >
> > > libxc/xc_domain.c | 77 +++++++++++-----
> > > libxc/xc_e820.h | 3
> > > libxc/xenctrl.h | 11 ++
> > > libxl/libxl.idl | 1
> > > libxl/libxl_create.c | 8 +
> > > libxl/libxl_internal.h | 1
> > > libxl/libxl_pci.c | 230 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > libxl/xl_cmdimpl.c | 3
> > > 8 files changed, 309 insertions(+), 25 deletions(-)
> > >
> > >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > http://lists.xensource.com/xen-devel