This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/



To: Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Re: [PATCH 0 of 3] Patches for PCI passthrough with modified E820 (v3) - resent.
From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date: Tue, 17 May 2011 12:34:10 -0400
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Stefano Stabellini <Stefano.Stabellini@xxxxxxxxxxxxx>
Delivery-date: Tue, 17 May 2011 09:37:15 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1305648479.20907.77.camel@xxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <1305040198.26692.286.camel@xxxxxxxxxxxxxxxxxxxxxx> <20110510152751.GA13469@xxxxxxxxxxxx> <1305041579.26692.291.camel@xxxxxxxxxxxxxxxxxxxxxx> <20110510155147.GA17563@xxxxxxxxxxxx> <1305100191.26692.306.camel@xxxxxxxxxxxxxxxxxxxxxx> <20110512174133.GE11649@xxxxxxxxxxxx> <1305276440.31488.60.camel@xxxxxxxxxxxxxxxxxxxxxx> <20110513135708.GC6042@xxxxxxxxxxxx> <20110517160208.GD3657@xxxxxxxxxxxx> <1305648479.20907.77.camel@xxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, May 17, 2011 at 05:07:59PM +0100, Ian Campbell wrote:
> On Tue, 2011-05-17 at 17:02 +0100, Konrad Rzeszutek Wilk wrote:
> > On Fri, May 13, 2011 at 09:57:08AM -0400, Konrad Rzeszutek Wilk wrote:
> > > > > memhog 4G worked great, but then I noticed it started slowing down and
> > > > > it was using the swap disk?
> > > > 
> > > > I guess the I/O holes shadowed the RAM and hence it is basically wasted.
> > > 
> > > <nods>
> > > > > Anyhow, it seems that if you are using RHEL5 or SLES11, you need to be
> > > > > careful to use 'memory' and 'maxmem'.
> > > > 
> > > > Hrm, changing behaviour for existing guests isn't so nice, at least not
> > > > without a way to turn the behaviour off, perhaps we do need an explicit
> > > > cfg file variable to control this after all?
> > > 
> > > We could do that, and then once your idea below is completely working
> > > we can rip out the parameter?
> > 
> > How does this patch look to your eyes:
> Looks ok to me.
> We've been using the _override suffix for the cfg visible symbol, not
> just the internal variables, so if we think this is something the user
> typically should not touch then we should call it e820_host_override in
> the cfg file too. Although see my earlier comment about this option also
> enabling hotplug -- perhaps this is an option users will want to care
> about in the long run?

In which case we should decide on a good name since it will stay with us
forever. Perhaps just e820_host and drop the override? And do something like:

# HG changeset patch
# Parent c6fa04014d6e99ca4e62d04132180338403c0478
libxl: Add 'e820_host' option to config file.

.. which will allow PV guests to see the host's E820. Previously
this was triggered by the config having an entry in the 'pci' option.
But during testing of the patches that provide a host E820 to a PV guest,
certain inconsistencies were found with guests. When launching a RHEL5 or
SLES11 PV guest with 4GB and a PCI device, the kernel would report 4GB
but have 1.5GB "used". What happened was that the P2M entries falling within
the E820 I/O holes were never used and were simply wasted. The way to
work around this is to shrink the size of the guest before launch
(say memory=2048, maxmem=4096) and then balloon back up to 4096M after
start. PVOPS kernels would detect the E820 I/O holes and deflate by the
correct amount, but would not inflate back to 4GB. Inflating manually
makes it work.

The future fix, for guests whose memory extends past the start of the
PCI hole, is to launch the guest with a reduced amount, right up to the cusp
of where the E820 PCI hole starts, increase 'maxmem' by the delta, and then,
once the guest has launched, balloon up by the delta.
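
A minimal C sketch of the sizing arithmetic in the paragraph above, under
stated assumptions: the `e820_entry` and `launch_plan` types and the
`pci_hole_start_mib`/`plan_launch` helpers are hypothetical (not libxl API),
the hole is taken to start at the end of the last RAM region below 4GB, and
"increase 'maxmem' by the delta" is read literally as maxmem + delta, with
the post-launch balloon target being the originally requested amount.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified E820 entry (real maps carry more region types,
 * e.g. ACPI and NVS). Addresses and sizes are in bytes. */
#define E820_RAM 1

struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Assumed start of the PCI/I/O hole: the highest end address of any RAM
 * region that starts below 4GB, capped at 4GB. Returned in MiB. */
static uint64_t pci_hole_start_mib(const struct e820_entry *map, int n)
{
    const uint64_t four_gb = 1ULL << 32;
    uint64_t hole = 0;

    for (int i = 0; i < n; i++) {
        if (map[i].type != E820_RAM || map[i].addr >= four_gb)
            continue;
        uint64_t end = map[i].addr + map[i].size;
        if (end > four_gb)
            end = four_gb;
        if (end > hole)
            hole = end;
    }
    return hole >> 20;
}

/* Hypothetical launch plan; all values in MiB. */
struct launch_plan {
    uint64_t memory;   /* 'memory' to create the guest with */
    uint64_t maxmem;   /* 'maxmem' to create the guest with */
    uint64_t target;   /* balloon target once the guest is up */
};

static struct launch_plan plan_launch(uint64_t memory, uint64_t maxmem,
                                      uint64_t hole_mib)
{
    /* Memory that would otherwise land in the hole and be wasted. */
    uint64_t delta = memory > hole_mib ? memory - hole_mib : 0;
    struct launch_plan p = {
        .memory = memory - delta,   /* stop right at the hole's cusp */
        .maxmem = maxmem + delta,   /* raise maxmem by the delta */
        .target = memory,           /* launch amount plus the delta */
    };
    assert(p.memory + delta == p.target);
    return p;
}
```

For example, with a hole starting at 3072MiB, plan_launch(4096, 4096, 3072)
gives a launch size of 3072MiB, a maxmem of 5120MiB and a balloon target of
4096MiB, while a 2048MiB guest sits entirely below the hole and is unchanged.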

This will require some careful surgery, so for now this parameter
will guard against unsuspecting users seeing their PV guests' memory "vanish."

In the future, this option will remain (so that PCI hotplugging
can be done) and will turn itself on when there is a 'pci' option.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

diff -r c6fa04014d6e tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c  Tue May 17 10:33:27 2011 -0400
+++ b/tools/libxl/xl_cmdimpl.c  Tue May 17 12:30:38 2011 -0400
@@ -979,6 +979,16 @@ skip_vfb:
     if (!xlu_cfg_get_long (config, "pci_power_mgmt", &l))
         pci_power_mgmt = l;
+    /* To be reworked (automatically enabled) once the auto ballooning
+     * after guest starts is done (with PCI devices passed in). */
+    if (!xlu_cfg_get_long (config, "e820_host", &l)) {
+        if (c_info->hvm)
+            fprintf(stderr, "Can't do e820_host in HVM mode!\n");
+        else {
+            if (l)
+                b_info->u.pv.machine_e820 = true;
+        }
+    }
     if (!xlu_cfg_get_list (config, "pci", &pcis, 0, 0)) {
         int i;
         d_config->num_pcidevs = 0;
@@ -995,8 +1005,6 @@ skip_vfb:
             if (!libxl_device_pci_parse_bdf(ctx, pcidev, buf))
-        if (d_config->num_pcidevs && !c_info->hvm)
-          b_info->u.pv.machine_e820 = true;
     switch (xlu_cfg_get_list(config, "cpuid", &cpuids, 0, 1)) {
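
The option handling in this patch, together with the planned automatic
enable when a 'pci' option is present, can be sketched as a small decision
function. This is a hypothetical illustration, not libxl code: the names
`want_machine_e820`, `e820_host_opt` and `auto_from_pci` are invented here.
Passing `auto_from_pci = false` mirrors this patch (no implicit enable);
`true` models the future behaviour described in the changelog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical tri-state for 'e820_host': absent, zero or non-zero. */
enum e820_host_opt {
    E820_HOST_UNSET = -1,
    E820_HOST_OFF = 0,
    E820_HOST_ON = 1,
};

/* Decide whether a guest should get the host E820 (machine_e820). */
static bool want_machine_e820(bool hvm, enum e820_host_opt opt,
                              int num_pcidevs, bool auto_from_pci)
{
    assert(num_pcidevs >= 0);

    if (hvm) {
        /* As in the patch: any explicit e820_host setting is rejected
         * for HVM guests. */
        if (opt != E820_HOST_UNSET)
            fprintf(stderr, "Can't do e820_host in HVM mode!\n");
        return false;
    }
    if (opt != E820_HOST_UNSET)
        return opt == E820_HOST_ON;   /* an explicit setting always wins */

    /* Option absent: this patch leaves it off; the planned behaviour
     * turns it on whenever PCI devices are passed through. */
    return auto_from_pci && num_pcidevs > 0;
}
```

Keeping an explicit value authoritative is one way to let users opt out with
e820_host=0 even after automatic enabling lands, which would address the
concern about changing behaviour for existing guests.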

Xen-devel mailing list