To: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>, "Khoa Huynh" <khoa@xxxxxxxxxx>, "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx>
Subject: RE: [Xen-devel] [PATCH][VT] Patch to allow VMX domains to be destroyed or shut down cleanly
From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Date: Mon, 19 Sep 2005 13:54:13 +0800
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Delivery-date: Mon, 19 Sep 2005 05:52:06 +0000
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcW818epixiHHe+KQva9+fU9VzRVIQAAfdZgAADyMlA=
Thread-topic: [Xen-devel] [PATCH][VT] Patch to allow VMX domains to be destroyed or shut down cleanly
Jiang, Yunhong wrote:
> I think Xin Xiaohui's patch resolves most of the problems with the
> leftover-page issue.
> 
> Another small issue we found is in vmx_set_cr0(). On Windows, the
> guest will enter and leave protected mode many times, and the
> following code causes a problem:
>     if ((value & X86_CR0_PE) && (value & X86_CR0_PG) &&
>         !paging_enabled) {
>         /*
>          * The guest CR3 must be pointing to the guest physical.
>          */
>         if ( !VALID_MFN(mfn = get_mfn_from_pfn(
>                  d->arch.arch_vmx.cpu_cr3 >> PAGE_SHIFT)) ||
>              !get_page(pfn_to_page(mfn), d->domain) )
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>         {
>             printk("Invalid CR3 value = %lx",
>                    d->arch.arch_vmx.cpu_cr3);
>             domain_crash_synchronous(); /* need to take a clean path */
>         }
> 
> We should instead do the get_page at the point where the guest sets
> CR3 while paging is not yet enabled, i.e.:
>
>     case 3:
>     {
>         unsigned long old_base_mfn, mfn;
>
>         /*
>          * If paging is not enabled yet, simply copy the value to CR3.
>          */
>         if (!vmx_paging_enabled(d)) {
>             ..... get page here....
> 

More precisely, it should be:
        if (!vmx_paging_enabled(d)) {
                        get_page for new mfn
                        put_page for old mfn
        }
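
For concreteness, a rough sketch of what that could look like in the CR3
case of the mov-to-CR handler, based only on the code quoted above. The
exact surrounding code and error handling would have to follow the real
tree, so treat this as an illustration of the idea (get_page on the new
frame, put_page on the old one), not as a tested patch:

    case 3:
    {
        unsigned long old_base_mfn, mfn;

        if ( !vmx_paging_enabled(d) )
        {
            /*
             * Take the reference on the new CR3 frame here, instead of
             * deferring the get_page() to the PE/PG transition in
             * vmx_set_cr0().
             */
            if ( !VALID_MFN(mfn = get_mfn_from_pfn(value >> PAGE_SHIFT)) ||
                 !get_page(pfn_to_page(mfn), d->domain) )
                domain_crash_synchronous(); /* need to take a clean path */

            /*
             * Drop the reference taken on the previous CR3 frame by an
             * earlier pass through this path, if there was one.
             */
            if ( d->arch.arch_vmx.cpu_cr3 != 0 )
            {
                old_base_mfn = get_mfn_from_pfn(
                    d->arch.arch_vmx.cpu_cr3 >> PAGE_SHIFT);
                if ( VALID_MFN(old_base_mfn) )
                    put_page(pfn_to_page(old_base_mfn));
            }

            /* Not paging yet: just remember the guest-physical CR3 value. */
            d->arch.arch_vmx.cpu_cr3 = value;
            break;
        }
        /* ... existing handling for the paging-enabled case follows ... */
    }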

Thanks
Yunhong Jiang

> After the above two changes, Windows was destroyed successfully after a
> create/login/open IE/destroy cycle.
> 
> Thanks
> Yunhong Jiang
> 
> Khoa Huynh wrote:
>> Keir Fraser wrote:
>>> I mean forcibly decrement them to zero and free them right there and
>>> then. Of course, as you point out, the problem is that some of the
>>> pages are mapped in domain0. I'm not sure how we can distinguish
>>> tainted refcnts from genuine external references. Perhaps there's a
>>> proper way we should be destructing the full shadow pagetables such
>>> that the refcnts end up at zero.
>> 
>> Thanks for your comment.  I have done extensive tracing through
>> the domain destruction code in the hypervisor in the last few days.
>> 
>> The bottom line: after the domain destruction code in the hypervisor
>> is done, all shadow pages were indeed freed up - even though
>> the shadow_tainted_refcnts flag was set.  I now believe the
>> remaining pages are genuinely externally referenced (possibly
>> by the qemu device model still running in domain0).
>> 
>> Here are more details on what I have found:
>> 
>> Ideally, when we destroy or shut down a VMX domain, the general
>> page reference counts should end up at 0 in shadow mode, so that the
>> pages can be released properly from the domain.
>> 
>> I have traced quite a bit of code for different scenarios
>> involving Windows XP running in a VMX domain.  I only
>> did simple operations in Windows XP, but I tried to destroy
>> the VMX domain at different times (e.g. during Windows XP boot,
>> during simple operations, after Windows XP has been shut down, etc.).
>> 
>> For non-VMX (Linux) domains, after we relinquish memory in
>> domain_relinquish_resources(), all pages in the domain's page
>> list indeed had a reference count of 0 and were properly freed from
>> the xen heap - just like we expected.
>> 
>> For VMX (e.g., Windows XP) domains, after we relinquish memory in
>> domain_relinquish_resources(), depending on how many activities
>> were done in Windows XP, there were anywhere from 2 to 100 pages
>> remaining just before the domain's structures were freed up
>> by the hypervisor.  Most of these pages still had a page
>> reference count of 1 and therefore could not be freed
>> from the heap by the hypervisor.  This prevented the rest
>> of the domain's resources from being released, which is why
>> 'xm list' still showed the VMX domains after they were destroyed.
>> 
>> In shadow mode, the following things could be reflected
>> in the page (general) reference counts:
>> 
>> (a) General stuff:
>>     - page is allocated (PGC_allocated)
>>     - page is pinned
>>     - page is pointed to by a CR3
>> (b) Shadow page tables (l1, l2, hl2, etc.)
>> (c) Out-of-sync entries
>> (d) Grant table mappings
>> (e) External references (not through grant table)
>> (f) Monitor page table references (external shadow mode)
>> (g) Writable PTE predictions
>> (h) GDTs/LDTs
>> 
>> So I put in a lot of instrumentation and tracing code,
>> and made sure that the above things were taken into
>> account and removed from the page reference counts
>> during the domain destruction code sequence in the
>> hypervisor.  During this code sequence, we disable
>> shadow mode (shadow_mode_disable()) and set the
>> shadow_tainted_refcnts flag.  However,
>> much to my surprise, the page reference counts
>> were properly taken care of in shadow mode, and
>> all shadow pages (including those in the l1, l2, and hl2
>> tables and in the snapshots) were freed up.
>> 
>> In particular, here's where each of the things
>> in the above list was taken into account during
>> the domain destruction code sequence in the
>> hypervisor:
>> 
>> (a) General stuff:
>>     - None of the remaining pages have the PGC_allocated
>>       flag set
>>     - None of the remaining pages are still pinned
>>     - The monitor shadow ref was 0, and all
>>       pages pointed to by CR3's were taken care
>>       of in free_shadow_pages()
>> (b) All shadow pages (including those pages in
>>     l1, l2, hl2, snapshots) were freed properly.
>>     I implemented counters to track all shadow
>>     page promotions/allocations and demotions/
>>     deallocations throughout the hypervisor code,
>>     and at the end after we relinquished all domain
>>     memory pages, these counters did indeed
>>     return to 0 - as we expected.
>> (c) out-of-sync entries -> in free_out_of_sync_state()
>>     called by free_shadow_pages().
>> (d) grant table mappings -> the count of active
>>     grant table mappings is 0 after the domain
>>     destruction sequence in the hypervisor is
>>     executed.
>> (e) external references not mapped via grant table
>>     -> I believe that these include the qemu-dm
>>     pages which still remain after we relinquish
>>     all domain memory pages - as the qemu-dm may
>>     still be active after a VMX domain has been
>>     destroyed.
>> (f) external monitor page references -> all references
>>     from the monitor page table are dropped in
>>     vmx_relinquish_resources(), and the monitor table
>>     itself is freed in domain_destruct().  In fact,
>>     in my code traces, the monitor shadow reference
>>     count was 0 after the domain destruction code in
>>     the hypervisor.
>> (g) writable PTE predictions -> I didn't see any pages in
>>     this category in my code traces, but if there
>>     are, they would be freed up in free_shadow_pages().
>> (h) GDTs/LDTs -> these were destroyed in destroy_gdt() and
>>     invalidate_shadow_ldt() called from
>>     domain_relinquish_resources().
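
[Side note on (b) above: the counter instrumentation can be as simple as a
pair of counters bumped at the shadow page allocation and free sites. The
names below are invented for illustration and are not from the actual
tracing patch:]

    /*
     * Illustrative only -- invented names, not the actual tracing patch.
     * Two global counters, bumped at the shadow page allocation and free
     * sites; if every shadow page is released, the two values match after
     * domain_relinquish_resources().
     */
    static atomic_t shadow_page_allocs = ATOMIC_INIT(0);
    static atomic_t shadow_page_frees  = ATOMIC_INIT(0);

    /* at each shadow page allocation site: */
    atomic_inc(&shadow_page_allocs);

    /* at each shadow page free site: */
    atomic_inc(&shadow_page_frees);

    /* after relinquishing the domain's memory: */
    printk("shadow pages: allocs=%d frees=%d\n",
           atomic_read(&shadow_page_allocs),
           atomic_read(&shadow_page_frees));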
>> 
>> Based on the code instrumentation and tracing above, I am
>> pretty confident that the shadow page reference counts
>> were handled properly during the domain destruction code
>> sequence in the hypervisor.  There is a problem in keeping
>> track of shadow page counts (domain->arch.shadow_page_count),
>> and I will submit a patch to fix this shortly.  However, this
>> does not really impact how shadow pages are handled.
>> 
>> Consequently, the pages that still remain after the domain
>> destruction code sequence in the hypervisor are externally
>> referenced and may belong to the qemu device model running
>> in domain0.  The fact that qemu-dm is still active for some
>> time after a VMX domain has been torn down in the hypervisor
>> is evident from examining the tools code (Python).  In fact,
>> when I forcibly freed these remaining pages from the Xen heap,
>> the system/dom0 crashed.
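
[Side note: the kind of check described here -- walking the domain's
remaining pages and reporting their reference counts rather than forcibly
freeing them -- could look roughly like the sketch below. Field and helper
names follow the 2005-era x86 headers (struct pfn_info, count_info, PGC_*);
treat the sketch as illustrative only:]

    /*
     * Illustrative only: walk whatever is left on the domain's page list
     * just before the domain structure is freed, and report each page's
     * general reference count instead of forcibly freeing it.
     */
    static void dump_remaining_pages(struct domain *d)
    {
        struct pfn_info *page;

        list_for_each_entry ( page, &d->page_list, list )
        {
            printk("dom%d: leftover pfn %lx: count=%lu%s\n",
                   d->domain_id,
                   page_to_pfn(page),
                   page->count_info & PGC_count_mask,
                   (page->count_info & PGC_allocated) ? " (PGC_allocated)" : "");
        }
    }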
>> 
>> Am I missing anything?  Your comments, suggestions, etc.,
>> are welcome!  Thanks for reading this rather long email :-)
>> 
>> Khoa H.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel