Jiang, Yunhong wrote:
> I think Xin Xiaohui's patch resolved most of the problems with the
> leftover-page issue.
>
> Another small issue we found is in vmx_set_cr0(). On Windows, the
> guest enters and leaves protected mode many times, and the following
> code causes a problem:
>     if ((value & X86_CR0_PE) && (value & X86_CR0_PG) &&
>         !paging_enabled) {
>         /*
>          * The guest CR3 must be pointing to the guest physical.
>          */
>         if ( !VALID_MFN(mfn = get_mfn_from_pfn(
>                  d->arch.arch_vmx.cpu_cr3 >> PAGE_SHIFT)) ||
>              !get_page(pfn_to_page(mfn), d->domain) )
>               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>         {
>             printk("Invalid CR3 value = %lx",
>                    d->arch.arch_vmx.cpu_cr3);
>             domain_crash_synchronous(); /* need to take a clean path */
>         }
>
> We should move the get_page to where the guest sets CR3 while paging
> is not yet enabled, i.e.:
>
>     case 3:
>     {
>         unsigned long old_base_mfn, mfn;
>
>         /*
>          * If paging is not enabled yet, simply copy the value to CR3.
>          */
>         if (!vmx_paging_enabled(d)) {
>             ..... get page here ....
>
More precisely, it should be:

    if (!vmx_paging_enabled(d)) {
        get_page for the new mfn
        put_page for the old mfn
    }
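
A minimal sketch of that change, reusing the helper names from the
quoted snippet above ('value' here is the CR3 value being loaded by the
guest and 'd' is the same structure as in the snippet; the surrounding
control flow is an assumption, not a tested patch):

    case 3:
    {
        unsigned long old_base_mfn, mfn;

        /*
         * If paging is not enabled yet, simply record the value in CR3
         * and take the page reference here rather than in vmx_set_cr0().
         */
        if ( !vmx_paging_enabled(d) )
        {
            /* Take a reference on the new guest CR3 base frame. */
            if ( !VALID_MFN(mfn = get_mfn_from_pfn(value >> PAGE_SHIFT)) ||
                 !get_page(pfn_to_page(mfn), d->domain) )
            {
                printk("Invalid CR3 value = %lx", value);
                domain_crash_synchronous(); /* need to take a clean path */
            }

            /* Drop the reference held for the previous CR3 base, if any. */
            if ( (d->arch.arch_vmx.cpu_cr3 != 0) &&
                 VALID_MFN(old_base_mfn = get_mfn_from_pfn(
                     d->arch.arch_vmx.cpu_cr3 >> PAGE_SHIFT)) )
                put_page(pfn_to_page(old_base_mfn));

            d->arch.arch_vmx.cpu_cr3 = value;
            break;
        }
        /* ... paging-enabled path unchanged ... */
    }

With that, vmx_set_cr0() no longer needs to take a fresh reference on
every real-mode/protected-mode transition.
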
Thanks
Yunhong Jiang
> After the above two changes, Windows was destroyed successfully after
> a create/login/open IE/destroy cycle.
>
> Thanks
> Yunhong Jiang
>
> Khoa Huynh wrote:
>> Keir Fraser wrote:
>>> I mean forcibly decrement them to zero and free them right there and
>>> then. Of course, as you point out, the problem is that some of the
>>> pages are mapped in domain0. I'm not sure how we can distinguish
>>> tainted refcnts from genuine external references. Perhaps there's a
>>> proper way we should be destructing the full shadow pagetables such
>>> that the refcnts end up at zero.
>>
>> Thanks for your comment. I have done extensive tracing through
>> the domain destruction code in the hypervisor in the last few days.
>>
>> The bottom line: after domain destruction code in the hypervisor
>> is done, all shadow pages were indeed freed up - even though
>> the shadow_tainted_refcnts flag was set. I now believe the
>> remaining pages are genuinely externally referenced (possibly
>> by the qemu device model still running in domain0).
>>
>> Here are more details on what I have found:
>>
>> Ideally, when we destroy or shut down a VMX domain, the general
>> page reference counts should end up at 0 in shadow mode, so that the
>> pages can be released properly from the domain.
>>
>> I have traced quite a bit of code for different scenarios
>> involving Windows XP running in a VMX domain. I only
>> did simple operations in Windows XP, but I tried to destroy
>> the VMX domain at different times (e.g. during Windows XP boot,
>> during simple operations, after Windows XP had been shut down, etc.).
>>
>> For non-VMX (Linux) domains, after we relinquish memory in
>> domain_relinquish_resources(), all pages in the domain's page
>> list indeed had a reference count of 0 and were properly freed from
>> the xen heap - just as we expected.
>>
>> For VMX (e.g., Windows XP) domains, after we relinquish memory in
>> domain_relinquish_resources(), depending on how many activities
>> were done in Windows XP, there were anywhere from 2 to 100 pages
>> remaining just before the domain's structures were freed up
>> by the hypervisor. Most of these pages still had a page
>> reference count of 1 and therefore could not be freed
>> from the heap by the hypervisor. This prevents the rest
>> of the domain's resources from being released, which is why
>> 'xm list' still shows the VMX domains after they have been destroyed.
>>
>> In shadow mode, the following things can be reflected
>> in the page (general) reference counts:
>>
>> (a) General stuff:
>>     - page is allocated (PGC_allocated)
>>     - page is pinned
>>     - page is pointed to by a CR3
>> (b) Shadow page tables (l1, l2, hl2, etc.)
>> (c) Out-of-sync entries
>> (d) Grant table mappings
>> (e) External references (not through grant table)
>> (f) Monitor page table references (external shadow mode)
>> (g) Writable PTE predictions
>> (h) GDTs/LDTs
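
(For reference, a minimal sketch of how the "general stuff" in (a) can
be read off a single page. The field and macro names - count_info,
u.inuse.type_info, PGC_count_mask, PGC_allocated, PGT_pinned,
page_to_pfn() - follow the Xen tree of this era, and the exact spelling
should be treated as an assumption:

    /* Report the general refcount contributors visible on one page. */
    static void dump_page_refs(struct pfn_info *page)
    {
        unsigned long count = page->count_info & PGC_count_mask;

        printk("mfn %lx: refcnt=%lu%s%s\n",
               page_to_pfn(page), count,
               (page->count_info & PGC_allocated)     ? " allocated" : "",
               (page->u.inuse.type_info & PGT_pinned) ? " pinned"    : "");
    }

References from CR3s, shadow tables, out-of-sync entries, grant
mappings and external mappings all fold into the same general count,
which is what makes leftover references hard to attribute.)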
>>
>> So I put in a lot of instrumentation and tracing code,
>> and made sure that the above things were taken into
>> account and removed from the page reference counts
>> during the domain destruction code sequence in the
>> hypervisor. During this code sequence, we disable
>> shadow mode (shadow_mode_disable()) and set the
>> shadow_tainted_refcnts flag. However,
>> much to my surprise, the page reference counts
>> were properly taken care of in shadow mode, and
>> all shadow pages (including those in l1, l2, and hl2
>> tables and in snapshots) were freed up.
>>
>> In particular, here's where each of the things
>> in the above list was taken into account during
>> the domain destruction code sequence in the
>> hypervisor:
>>
>> (a) General stuff:
>> - None of the remaining pages have the PGC_allocated
>> flag set
>> - None of the remaining pages are still pinned
>> - The monitor shadow ref was 0, and all
>> pages pointed to by CR3's were taken care
>> of in free_shadow_pages()
>> (b) All shadow pages (including those pages in
>> l1, l2, hl2, snapshots) were freed properly.
>> I implemented counters to track all shadow
>> page promotions/allocations and demotions/
>> deallocations throughout the hypervisor code,
>> and at the end after we relinquished all domain
>> memory pages, these counters did indeed
>> return to 0 - as we expected.
>> (c) out-of-sync entries -> in free_out_of_sync_state()
>> called by free_shadow_pages().
>> (d) grant table mappings -> the count of active
>> grant table mappings is 0 after the domain
>> destruction sequence in the hypervisor is
>> executed.
>> (e) external references not mapped via grant table
>> -> I believe that these include the qemu-dm
>> pages which still remain after we relinquish
>> all domain memory pages - as the qemu-dm may
>> still be active after a VMX domain has been
>> destroyed.
>> (f) external monitor page references -> all references
>> from monitor page table are dropped in
>> vmx_relinquish_resources(), and monitor table
>> itself is freed in domain_destruct(). In fact,
>> in my code traces, the monitor shadow reference
>> count was 0 after the domain destruction code in the
>> hypervisor.
>> (g) writable PTE predictions -> I didn't see any pages in
>> this category in my code traces, but if there
>> are, they would be freed up in free_shadow_pages().
>> (h) GDTs/LDTs -> these were destroyed in destroy_gdt() and
>> invalidate_shadow_ldt() called from domain_relinquish_
>> resources().
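
(For reference, a sketch of the kind of instrumentation described
above: after domain_relinquish_resources(), walk the domain's page list
and report any page whose general reference count is still non-zero.
The list iterator, lock, and field names follow the Xen tree of this
era and should be treated as assumptions rather than a drop-in patch:

    static void audit_remaining_pages(struct domain *d)
    {
        struct pfn_info *page;
        unsigned int remaining = 0;

        /* The page list must not change under us while we walk it. */
        spin_lock(&d->page_alloc_lock);

        list_for_each_entry ( page, &d->page_list, list )
        {
            unsigned long count = page->count_info & PGC_count_mask;

            if ( count == 0 )
                continue;

            remaining++;
            printk("dom%u: mfn %lx still has refcnt %lu%s\n",
                   (unsigned int)d->domain_id, page_to_pfn(page), count,
                   (page->count_info & PGC_allocated) ? " (PGC_allocated)"
                                                      : "");
        }

        spin_unlock(&d->page_alloc_lock);

        printk("dom%u: %u page(s) still referenced after relinquish\n",
               (unsigned int)d->domain_id, remaining);
    }

The pages this reports would be the 2 to 100 leftover pages mentioned
above that keep the domain visible in 'xm list'.)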
>>
>> Based on the code instrumentation and tracing above, I am
>> pretty confident that the shadow page reference counts
>> were handled properly during the domain destruction code
>> sequence in the hypervisor. There is a problem in keeping
>> track of shadow page counts (domain->arch.shadow_page_count),
>> and I will submit a patch to fix this shortly. However, this
>> does not really impact how shadow pages are handled.
>>
>> Consequently, the pages that still remain after the domain
>> destruction code sequence in the hypervisor are externally
>> referenced and may belong to the qemu device model running
>> in domain0. The fact that qemu-dm is still active for some
>> time after a VMX domain has been torn down in the hypervisor
>> is evident from examining the tools code (python). In fact,
>> when I forcibly freed these remaining pages from the xen heap,
>> the system/dom0 crashed.
>>
>> Am I missing anything? Your comments, suggestions, etc.,
>> are welcome! Thanks for reading this rather long email :-)
>>
>> Khoa H.
>>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel