I think Xin Xiaohui's patch resolves most of the problems behind the
leftover-page issue.
We also found another small issue, in vmx_set_cr0(). A Windows guest
enters and leaves protected mode many times, and the following code
causes a problem: each time the guest turns paging on we take a fresh
reference on the CR3 page via get_page(), and that reference is never
dropped, so the count keeps growing.
if ((value & X86_CR0_PE) && (value & X86_CR0_PG) && !paging_enabled)
{
    /*
     * The guest CR3 must be pointing to the guest physical.
     */
    if ( !VALID_MFN(mfn = get_mfn_from_pfn(
              d->arch.arch_vmx.cpu_cr3 >> PAGE_SHIFT)) ||
         !get_page(pfn_to_page(mfn), d->domain) )
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    {
        printk("Invalid CR3 value = %lx", d->arch.arch_vmx.cpu_cr3);
        domain_crash_synchronous(); /* need to take a clean path */
    }
We should instead do the get_page when the guest sets CR3 while paging
is not enabled, i.e.:
case 3:
{
    unsigned long old_base_mfn, mfn;
    /*
     * If paging is not enabled yet, simply copy the value to CR3.
     */
    if (!vmx_paging_enabled(d)) {
        ..... get page here ....
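
A minimal sketch of what that could look like (the matching put_page()
on the previously loaded CR3 page is my assumption about how the
release side would pair up, not the final patch):

case 3:
{
    unsigned long old_base_mfn, mfn;
    /*
     * If paging is not enabled yet, just record the value in CR3.
     * Take the reference here, once per CR3 load, instead of on
     * every real-mode/protected-mode transition in vmx_set_cr0().
     */
    if (!vmx_paging_enabled(d)) {
        if ( !VALID_MFN(mfn = get_mfn_from_pfn(value >> PAGE_SHIFT)) ||
             !get_page(pfn_to_page(mfn), d->domain) )
        {
            printk("Invalid CR3 value = %lx", value);
            domain_crash_synchronous(); /* need to take a clean path */
        }

        /* Drop the reference taken for the previous CR3, if any. */
        if ( d->arch.arch_vmx.cpu_cr3 &&
             VALID_MFN(old_base_mfn = get_mfn_from_pfn(
                           d->arch.arch_vmx.cpu_cr3 >> PAGE_SHIFT)) )
            put_page(pfn_to_page(old_base_mfn));

        d->arch.arch_vmx.cpu_cr3 = value;
        break;
    }
    ...
}

vmx_set_cr0() would then only need to check that cpu_cr3 still maps to
a valid mfn, without calling get_page() again.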
After the above two changes, Windows was destroyed successfully after a
create/login/open IE/destroy cycle.
Thanks
Yunhong Jiang
Khoa Huynh wrote:
> Keir Fraser wrote:
>> I mean forcibly decrement them to zero and free them right there and
>> then. Of course, as you point out, the problem is that some of the
>> pages are mapped in domain0. I'm not sure how we can distinguish
>> tainted refcnts from genuine external references. Perhaps there's a
>> proper way we should be destructing the full shadow pagetables such
>> that the refcnts end up at zero.
>
> Thanks for your comment. I have done extensive tracing through
> the domain destruction code in the hypervisor in the last few
> days.
>
> The bottom line: after the domain destruction code in the hypervisor
> has run, all shadow pages were indeed freed up - even though
> the shadow_tainted_refcnts flag was set. I now believe the
> remaining pages are genuinely externally referenced (possibly
> by the qemu device model still running in domain0).
>
> Here are more details on what I have found:
>
> Ideally, when we destroy or shut down a VMX domain, the general
> page reference counts should end up at 0 in shadow mode, so that the
> pages can be released properly from the domain.
>
> I have traced quite a bit of code for different scenarios
> involving Windows XP running in a VMX domain. I only
> did simple operations in Windows XP, but I tried to destroy
> the VMX domain at different times (e.g. during Windows XP boot,
> during simple operations, after Windows XP had been shut down, etc.)
>
> For non-VMX (Linux) domains, after we relinquish memory in
> domain_relinquish_resources(), all pages in the domain's page
> list indeed had a reference count of 0 and were properly freed from
> the xen heap - just as we expected.
>
> For VMX (e.g., Windows XP) domains, after we relinquish memory in
> domain_relinquish_resources(), depending on how much activity
> there was in Windows XP, there were anywhere from 2 to 100 pages
> remaining just before the domain's structures were freed up
> by the hypervisor. Most of these pages still had page
> reference counts of 1, and therefore could not be freed
> from the heap by the hypervisor. This prevents the rest
> of the domain's resources from being released, which is why
> 'xm list' still shows VMX domains after they have been destroyed.
>
> In shadow mode, the following things can be reflected
> in the page (general) reference counts:
>
> (a) General stuff:
>     - page is allocated (PGC_allocated)
>     - page is pinned
>     - page is pointed to by a CR3
> (b) Shadow page tables (l1, l2, hl2, etc.)
> (c) Out-of-sync entries
> (d) Grant table mappings
> (e) External references (not through grant table)
> (f) Monitor page table references (external shadow mode)
> (g) Writable PTE predictions
> (h) GDTs/LDTs
>
> So I put in a lot of instrumentation and tracing code,
> and made sure that the above things were taken into
> account and removed from the page reference counts
> during the domain destruction code sequence in the
> hypervisor. During this code sequence, we disable
> shadow mode (shadow_mode_disable()) and the
> shadow_tainted_refcnts flag is set. However,
> much to my surprise, the page reference counts
> were properly taken care of in shadow mode, and
> all shadow pages (including those in l1, l2, hl2
> tables and snapshots) were freed up.
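>
> To give a flavour of the instrumentation, the heart of it is an
> audit loop along these lines (a sketch only - the function name is
> mine, and it assumes the caller holds d->page_alloc_lock):
>
> static void audit_leftover_pages(struct domain *d)
> {
>     struct list_head *ent;
>     struct pfn_info  *page;
>     int leftover = 0;
>
>     /* Walk the domain's page list and report any page whose
>      * general reference count has not dropped to zero. */
>     list_for_each ( ent, &d->page_list )
>     {
>         page = list_entry(ent, struct pfn_info, list);
>         if ( (page->count_info & PGC_count_mask) == 0 )
>             continue;
>         leftover++;
>         printk("pfn=%lx refcnt=%x%s%s\n",
>                page_to_pfn(page),
>                page->count_info & PGC_count_mask,
>                (page->count_info & PGC_allocated) ? " alloc" : "",
>                (page->u.inuse.type_info & PGT_pinned) ? " pinned" : "");
>     }
>     printk("%d page(s) still referenced\n", leftover);
> }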
>
> In particular, here's where each of the things
> in the above list was taken into account during
> the domain destruction code sequence in the
> hypervisor:
>
> (a) General stuff:
>     - None of the remaining pages had the PGC_allocated
>       flag set
>     - None of the remaining pages were still pinned
>     - The monitor shadow ref was 0, and all
>       pages pointed to by CR3s were taken care
>       of in free_shadow_pages()
> (b) All shadow pages (including those pages in
>     l1, l2, hl2, snapshots) were freed properly.
>     I implemented counters to track all shadow
>     page promotions/allocations and demotions/
>     deallocations throughout the hypervisor code
>     (a sketch of these counters follows this list),
>     and at the end, after we relinquished all domain
>     memory pages, these counters did indeed
>     return to 0 - as we expected.
> (c) out-of-sync entries -> in free_out_of_sync_state()
> called by free_shadow_pages().
> (d) grant table mappings -> the count of active
> grant table mappings is 0 after the domain
> destruction sequence in the hypervisor is
> executed.
> (e) external references not mapped via grant table
> -> I believe that these include the qemu-dm
> pages which still remain after we relinquish
> all domain memory pages - as the qemu-dm may
> still be active after a VMX domain has been
> destroyed.
> (f) external monitor page references -> all references
> from monitor page table are dropped in
> vmx_relinquish_resources(), and monitor table
> itself is freed in domain_destruct(). In fact,
> in my code traces, the monitor shadow reference
> count was 0 after the domain destruction code in
> the hypervisor.
> (g) writable PTE predictions -> I didn't see any pages in
> this category in my code traces, but if there
> are, they would be freed up in free_shadow_pages().
> (h) GDTs/LDTs -> these were destroyed in destroy_gdt() and
>     invalidate_shadow_ldt(), both called from
>     domain_relinquish_resources().
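>
> The promotion/allocation counters mentioned in (b) were along these
> lines (again a sketch - the names are mine, and the real increments
> were hooked into the shadow promote/demote and alloc/free paths):
>
> static atomic_t shadow_promotions  = ATOMIC_INIT(0);
> static atomic_t shadow_demotions   = ATOMIC_INIT(0);
> static atomic_t shadow_allocations = ATOMIC_INIT(0);
> static atomic_t shadow_frees       = ATOMIC_INIT(0);
>
> /* One increment at every promotion/allocation site, one at every
>  * demotion/deallocation site. */
> #define TRACK_PROMOTE() atomic_inc(&shadow_promotions)
> #define TRACK_DEMOTE()  atomic_inc(&shadow_demotions)
> #define TRACK_ALLOC()   atomic_inc(&shadow_allocations)
> #define TRACK_FREE()    atomic_inc(&shadow_frees)
>
> /* After relinquishing all domain memory, both differences should
>  * read 0 if the shadow bookkeeping is balanced. */
> static void check_shadow_counters(void)
> {
>     printk("shadow: promote-demote=%d alloc-free=%d\n",
>            atomic_read(&shadow_promotions) -
>                atomic_read(&shadow_demotions),
>            atomic_read(&shadow_allocations) -
>                atomic_read(&shadow_frees));
> }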
>
> Based on the code instrumentation and tracing above, I am
> pretty confident that the shadow page reference counts
> were handled properly during the domain destruction code
> sequence in the hypervisor. There is a problem in keeping
> track of shadow page counts (domain->arch.shadow_page_count),
> and I will submit a patch to fix this shortly. However, this
> does not really impact how shadow pages are handled.
>
> Consequently, the pages that still remain after the domain
> destruction code sequence in the hypervisor are externally
> referenced and may belong to the qemu device model running
> in domain0. The fact that qemu-dm is still active for some
> time after a VMX domain has been torn down in the hypervisor
> is evident from examining the tools code (python). In fact,
> when I forcibly freed these remaining pages from the xen heap,
> the system/dom0 crashed.
>
> Am I missing anything? Your comments, suggestions, etc.,
> are welcome! Thanks for reading this rather long email :-)
>
> Khoa H.
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel