I found the answer: my mistake was passing all four sl2mfn entries
in v->arch.paging.shadow.l3table[] to sh_walk_l2_table().
In fact I only need to pass v->arch.paging.shadow.l3table[0],
because SHADOW_FOREACH_L2E already loops over all four sl2mfns.
But I have another question about traversing the SPT from level 3 through
level 2 to level 1.
When I walk down to the level 1 SPT, I find several
inconsistencies between the gl1e and sl1e contents, using the same
checks as sh_audit_l1_table(). Is this normal? I thought
they should be consistent at all times.
My purpose is to walk the SPT and GPT at each process context
switch (sh_update_cr3) and collect some statistics first, e.g. on the
dirty, accessed, and present bits.
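
As a minimal sketch of the kind of statistics I mean, the following
standalone C counts the three bits over one table of entries. The names
PTE_PRESENT, PTE_ACCESSED, PTE_DIRTY, struct pte_stats and
count_pte_flags() are my own for illustration (Xen has its own flag
macros); only the bit positions are the architectural x86 ones. I am
assuming that accessed/dirty should be counted only for present entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Architectural x86 page-table entry flag bits. */
#define PTE_PRESENT  (1u << 0)
#define PTE_ACCESSED (1u << 5)
#define PTE_DIRTY    (1u << 6)

struct pte_stats {
    unsigned present;   /* entries with the present bit set */
    unsigned accessed;  /* present entries with the accessed bit set */
    unsigned dirty;     /* present entries with the dirty bit set */
};

/* Count flag bits over a table of n raw entries. Non-present entries
 * are skipped, since their other bits carry no architectural meaning. */
static struct pte_stats count_pte_flags(const uint64_t *table, size_t n)
{
    struct pte_stats s = { 0, 0, 0 };

    for (size_t i = 0; i < n; i++) {
        uint64_t e = table[i];

        if (!(e & PTE_PRESENT))
            continue;
        s.present++;
        if (e & PTE_ACCESSED)
            s.accessed++;
        if (e & PTE_DIRTY)
            s.dirty++;
    }
    return s;
}
```

The idea would be to run such a counter once over the shadow table and once
over the guest table at each level and compare the results.
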
I have now tried another check at the level 2 SPT: skipping every sl1mfn
that does not pass the sh_mfn_is_a_page_table(sl1mfn) check. With that, the
inconsistencies disappear when traversing the level 1 SPT.
Can anyone give me a hint about how to do this correctly? Is there
some special type of SPTE that I should not traverse down into?
Many thanks,
Jui-Hao
On Fri, May 1, 2009 at 10:47 PM, Jui-Hao Chiang <windtracekimo@xxxxxxxxx> wrote:
> Hi, sorry for disturbing you guys again.
>
> Assume the guest's paging level is 2 and the shadow is using 3-level PAE.
> I am now trying to dump the L2 shadow page table information at the
> beginning of sh_update_cr3() as follows (the code is actually copied
> from the sh_audit_l2_table and audit_gfn_to_mfn functions).
>
> The code unexpectedly crashes in guest_l2e_get_flags(*gl2e) in the
> sh_walk_l2_table function I wrote.
> However, the weird part is that the code doesn't crash in gfn =
> guest_l2e_get_gfn(*gl2e), which accesses *gl2e in a similar way
> to guest_l2e_get_flags.
>
> static inline mfn_t
> convert_gfn_to_mfn(struct vcpu *v, gfn_t gfn, mfn_t gmfn)
> {
>     p2m_type_t p2mt;
>
>     if ( !shadow_mode_translate(v->domain) )
>         return _mfn(gfn_x(gfn));
>
>     if ( (mfn_to_page(gmfn)->u.inuse.type_info & PGT_type_mask)
>          != PGT_writable_page )
>         return _mfn(gfn_x(gfn)); // This is a paging-disabled shadow
>     else
>         return gfn_to_mfn(v->domain, gfn, &p2mt);
> }
>
> /* JuiHao: walk the l2 shadow page table based on input sl2mfn */
> static int sh_walk_l2_table(struct vcpu *v, mfn_t sl2mfn, mfn_t x)
> {
>     guest_l2e_t *gl2e, *gp;
>     shadow_l2e_t *sl2e;
>     mfn_t sl1mfn, gl2mfn;
>     gfn_t gfn;
>     mfn_t gmfn;
>     int done = 0;
>
>     /* Follow the backpointer in struct shadow_page_info to get guest l2mfn */
>     gl2mfn = _mfn(mfn_to_shadow_page(sl2mfn)->backpointer);
>     gl2e = gp = sh_map_domain_page(gl2mfn);
>
>     SHADOW_FOREACH_L2E(sl2mfn, sl2e, &gl2e, done, v->domain, {
>
>         gfn = guest_l2e_get_gfn(*gl2e); // ###!!!! Works Fine !!!!!####
>         sl1mfn = shadow_l2e_get_mfn(*sl2e);
>
>         if (mfn_valid(sl1mfn) &&
>             (shadow_l2e_get_flags(*sl2e) & _PAGE_PRESENT)) {
>
>             // We get this gmfn just to double-check that it is equal to sl1mfn
>             gmfn = (guest_l2e_get_flags(*gl2e) & _PAGE_PSE) // ###!!!! CRASH !!!!!####
>                 ? get_fl1_shadow_status(v, gfn)
>                 : get_shadow_status(v, convert_gfn_to_mfn(v, gfn, gl2mfn),
>                                     SH_type_l1_shadow);
>
>             if (mfn_x(gmfn) != mfn_x(sl1mfn)) {
>                 /* Print the raw values: mfn_t itself may be a struct type */
>                 printk("!! gmfn %" PRI_mfn " != sl1mfn %" PRI_mfn "\n",
>                        mfn_x(gmfn), mfn_x(sl1mfn));
>             } else {
>                 printk("going down to traverse level 1 SPT\n");
>             }
>         }
>
>     });
>     sh_unmap_domain_page(gp);
>     return 0;
> }
>
> Could you help a little bit on this?
> Many thanks,
> Jui-Hao
>
> On Fri, Apr 24, 2009 at 9:32 AM, Gianluca Guida
> <gianluca.guida@xxxxxxxxxxxxx> wrote:
>> On Fri, Apr 24, 2009 at 6:23 AM, Jui-Hao Chiang <windtracekimo@xxxxxxxxx>
>> wrote:
>>> I have some additional questions:
>>> (1) For a normal data page, in order to propagate the Dirty or Accessed
>>> bit from the SPTE to the GPTE, the hypervisor needs to set the SPTE
>>> Read-Only. When the write page fault on this data page arrives, the
>>> hypervisor can propagate the Dirty or Accessed bit to the GPTE and set
>>> the SPTE to R/W. My question is: when does the hypervisor make it
>>> Read-Only again? Is there any place in the source code you can point to?
>>
>> What happens is this: the guest has to clear the dirty/accessed bit
>> and then flush the TLB (or invlpg the entry).
>> If the pagetable is mapped read-only (as in levels > 1), the write to
>> the pagetable will trigger the emulator, which will update the entry.
>> Otherwise (if the page is out of sync, which means a writable guest
>> pagetable, and this happens when it's an L1), the TLB flush will do the
>> job of updating the shadow entry.
>>
>> Look at how the sh_propagate function works and when it gets called. It's
>> what you're looking for.
>>
>>> (2) How many shadow pages are maintained for each guest domain? If the
>>> hypervisor keeps only one shadow page table, for the active process in
>>> each guest domain, then during a guest context switch it might
>>> erase the entire shadow page table and reconstruct it for the new
>>> process, which seems like a lot of overhead. I have checked
>>> sh_update_cr3(), but I am not sure of the detailed mechanism.
>>
>> There's a pool of shadow memory that gets reused in a pseudo-LRU
>> manner. Across cr3 switches, toplevel pagetables are kept in memory and
>> unshadowed when evicted by the allocator or when other things happen,
>> mostly based on heuristics and reference counting.
>>
>> Thanks,
>> Gianluca
>>
>> --
>> It was a type of people I did not know, I found them very strange and
>> they did not inspire confidence at all. Later I learned that I had been
>> introduced to electronic engineers.
>> E. W. Dijkstra
>>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel