WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] [PATCH][VT] Patch to allow VMX domains to be destroyed or shut down cleanly
From: Khoa Huynh <khoa@xxxxxxxxxx>
Date: Tue, 13 Sep 2005 11:10:42 -0500
Delivery-date: Tue, 13 Sep 2005 16:08:37 +0000
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx

The problem:
A VMX domain cannot be destroyed or shut down completely.  After an
attempt to destroy or shut down a VMX domain, the domain's data structures
still exist, and the domain still appears in 'xm list' even though its
amount of memory is shown as 0.

The cause:
VMX domains use shadow mode (with the refcount, translate, and external
flags set), which uses external shadow tables for address translation and
manipulates page reference counts differently from the usual page
table-based, non-VMX mode.

When we tear down a VMX domain, we disable shadow mode.  However, when we
disable shadow mode, we do not fix up the shadow page reference counts.
This was thought to be OK because the VMX domain is dying anyway.  In fact,
there is a flag (an unsigned int) called shadow_tainted_refcnts that
indicates that the shadow page reference counts are "tainted" when the
domain is dying.  This flag, which is set in shadow_mode_disable(), allows
us to ignore the (tainted) page reference counts while handling pages.
(If anyone has more insight into this flag, I'd appreciate it.)

As a result, after we release the memory pages belonging to a VMX domain,
some pages still have "tainted" ref counts and cannot be released
immediately.  This leaves the VMX domain's own ref count incorrect, which
prevents the domain's other resources (e.g. hash tables, event channels,
grant tables) from being released.  This is why VMX domains cannot
currently be destroyed or shut down.

I have looked at scenarios where simple operations are done in Windows XP
running in a VMX domain.  In these scenarios, anywhere from 2 to 100 pages
are still not released when we try to relinquish all memory from the VMX
domain (during a destroy or shutdown operation).  These pages have tainted
shadow reference counts (possibly external references?).  Since xend
reports the amount of memory in MB, it reports the VMX domain's memory as
0 MB after we try to destroy or shut down the domain, but in reality
roughly 8 KB to 400 KB are still left, assuming 4 KB pages (at least in
the scenarios that I have examined).
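
To make the arithmetic above concrete, here is a small sketch.  It assumes
4 KB pages and assumes that an MB-granular report truncates with integer
division; the helper names are hypothetical and are not taken from xend or
Xen.

```c
#include <stdio.h>

/* Assumption: 4 KB pages, the usual x86 page size without large pages. */
#define PAGE_SIZE_KB 4UL

/* Hypothetical helper: memory still held by leftover pages, in KB. */
static unsigned long leftover_kb(unsigned long pages)
{
    return pages * PAGE_SIZE_KB;
}

/* Hypothetical helper: the same amount truncated to whole MB, which is
 * what a report with MB granularity would show under this assumption. */
static unsigned long leftover_mb(unsigned long pages)
{
    return leftover_kb(pages) / 1024UL;
}

/* Print both views for a given number of leftover pages. */
static void report(unsigned long pages)
{
    printf("%lu pages: %lu KB, shown as %lu MB\n",
           pages, leftover_kb(pages), leftover_mb(pages));
}
```

Calling report(2) and report(100) covers both ends of the range I observed;
in both cases the MB figure is 0 even though pages are still held.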

Proposed fix:
In the patch below, I have modified the routine relinquish_memory() to
return an indication as to whether or not all pages on the page list have
been released successfully.  If not, I then check to see if we indeed have
tainted shadow ref counts and if the domain is dying.  If all of these
conditions are met, the domain's reference count is adjusted to what it
should be.

Note that the patch does NOT automatically set any ref count to 0 and force
the destruction of the domain.  Instead, it adjusts the domain's ref count
to the value it should have if we do not have tainted ref counts.  The rest
of the domain's resources should automatically disappear when they are
supposed to.
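
The control flow of the fix can be sketched outside of Xen as a toy model.
Everything prefixed with toy_ is hypothetical and only stands in for the
real structures; the one piece taken directly from the patch is the
empty-list test (list == list->next), which holds for a circular list
whose head points back to itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list head (hypothetical stand-in). */
struct toy_list {
    struct toy_list *next, *prev;
};

/* An empty circular list has a head that points to itself. */
static void toy_list_init(struct toy_list *head)
{
    head->next = head->prev = head;
}

/* Insert item directly after head. */
static void toy_list_add(struct toy_list *item, struct toy_list *head)
{
    item->next = head->next;
    item->prev = head;
    head->next->prev = item;
    head->next = item;
}

/* Toy stand-in for a domain; all fields are assumptions for this sketch. */
struct toy_domain {
    int refcnt;                /* references keeping the domain alive */
    struct toy_list page_list; /* pages not yet released */
    bool tainted_refcnts;      /* shadow ref counts are not trustworthy */
    bool dying;                /* domain is being torn down */
};

/* Same test the patch returns from relinquish_memory(): nonzero if empty. */
static int toy_list_is_empty(const struct toy_list *list)
{
    return list == list->next;
}

/* Drop one domain reference only if pages are stuck on the list AND the
 * ref counts are tainted AND the domain is dying. */
static void toy_fixup(struct toy_domain *d)
{
    if (!toy_list_is_empty(&d->page_list) && d->tainted_refcnts && d->dying) {
        assert(d->refcnt > 0);
        d->refcnt--;
    }
}
```

In this model an empty list, an untainted domain, or a domain that is not
dying leaves refcnt untouched, which is the "only when absolutely
necessary" behaviour I am aiming for.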

I believe that the patch below is _probably_ the simplest, least
intrusive, and safest way to fix this problem.  It is simplest because it
is a relatively small patch - certainly much smaller than the amount of
code that would be required to fix up the shadow reference counts properly
(I am not even sure that is realistic - if anyone has ideas on how to do
this, I'd appreciate it).  It is least intrusive because all changes are
limited to two routines in a single file (xen/arch/x86/domain.c), both of
which deal with relinquishing a domain's memory pages.  It is safest
because it only tweaks the domain's reference count when absolutely
necessary, and does not tinker with anything else.

I have tested this to some extent with both VMX (Windows XP) and non-VMX
domains.  So far so good.

I have opened Bug 225 for this problem:
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=225

Any suggestions, comments, etc. are welcome.  Thanks in advance.

--- ./xen/arch/x86/domain.c.org 2005-08-23 21:01:09.000000000 -0500
+++ ./xen/arch/x86/domain.c     2005-09-12 22:22:05.000000000 -0500
@@ -975,7 +975,7 @@
 #define vmx_relinquish_resources(_v) ((void)0)
 #endif

-static void relinquish_memory(struct domain *d, struct list_head *list)
+static int relinquish_memory(struct domain *d, struct list_head *list)
 {
     struct list_head *ent;
     struct pfn_info  *page;
@@ -1029,15 +1029,17 @@
         ent = ent->next;
         put_page(page);
     }
-
+
     spin_unlock_recursive(&d->page_alloc_lock);
+
+    return (list == list->next);
 }

 void domain_relinquish_resources(struct domain *d)
 {
     struct vcpu *v;
     unsigned long pfn;
-
+
     BUG_ON(!cpus_empty(d->cpumask));

     physdev_destroy_state(d);
@@ -1080,12 +1082,19 @@
     for_each_vcpu(d, v)
         destroy_gdt(v);

-    /* Relinquish every page of memory. */
-    relinquish_memory(d, &d->xenpage_list);
-    relinquish_memory(d, &d->page_list);
+    /* Relinquish every page of memory.
+     * If the domain is dying, and if we have tainted shadow reference counts,
+       adjust the domain's reference count.
+     */
+    if (!relinquish_memory(d, &d->xenpage_list))
+        if (shadow_tainted_refcnts(d) && test_bit(_DOMF_dying, &d->domain_flags))
+            put_domain(d);
+
+    if (!relinquish_memory(d, &d->page_list))
+        if (shadow_tainted_refcnts(d) && test_bit(_DOMF_dying, &d->domain_flags))
+            put_domain(d);
 }

-
 /*
  * Local variables:
  * mode: C


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel