Hello,
On Sun, May 3, 2009 at 3:39 PM, Jui-Hao Chiang <windtracekimo@xxxxxxxxx> wrote:
> My purpose is to walk down the SPT and GPT during each process context
> switch (sh_update_cr3), and do some statistics first, e.g. dirty,
> access, present bit.
>
> I have now tried another check in the level 2 SPT: skipping those sl1mfn
> that do not pass the sh_mfn_is_a_page_table(sl1mfn) check. With that, the
> inconsistency is gone in the level 1 SPT traversal.
sh_mfn_is_a_page_table is meant to be used for *guest* pages, not shadow pages.
>
> Can anyone give a hint about how to do the right thing? Is there
> some special type of SPTE that I should not traverse down?
No. If a shadow is linked to other shadows (be sure to hold the shadow
lock while traversing them, or you can't be sure of what you're
reading) then it can be accessed by domains, so no funny things should
be mapped in there.
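
To make the locking point concrete, here is a minimal stand-alone sketch of
the kind of statistics pass described above. It is a model, not Xen code:
the struct, the lock_held flag (standing in for the shadow lock) and every
model_* / MODEL_* name are made up for illustration. Only the flag bit
values are the architectural x86 ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural x86 PTE flag bits. */
#define MODEL_PAGE_PRESENT  0x001u
#define MODEL_PAGE_ACCESSED 0x020u
#define MODEL_PAGE_DIRTY    0x040u

#define MODEL_ENTRIES 512   /* entries in one PAE/64-bit table page */

/* Hypothetical stand-in for one shadow table and the lock guarding it. */
struct model_shadow_table {
    bool lock_held;                  /* models "shadow lock is held" */
    uint64_t entry[MODEL_ENTRIES];
};

struct model_pte_stats {
    unsigned present, accessed, dirty;
};

/* Count present/accessed/dirty entries; the caller must hold the lock. */
static struct model_pte_stats
model_collect_stats(const struct model_shadow_table *t)
{
    struct model_pte_stats s = { 0, 0, 0 };
    size_t i;

    /* Without the lock the table may be read in the middle of an update. */
    assert(t != NULL && t->lock_held);

    for (i = 0; i < MODEL_ENTRIES; i++) {
        uint64_t e = t->entry[i];

        if (!(e & MODEL_PAGE_PRESENT))
            continue;   /* flags of a non-present entry mean nothing */
        s.present++;
        if (e & MODEL_PAGE_ACCESSED)
            s.accessed++;
        if (e & MODEL_PAGE_DIRTY)
            s.dirty++;
    }
    return s;
}
```

The assert is the important part of the sketch: the walker refuses to run
unless its caller already holds the lock, instead of taking it itself.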
The only thing you should be careful about is splintered shadows: when a
guest l2e has PSE set (a superpage mapping) we create an artificial L1
(called fl1) containing all the individual 4k mappings. It's very
important to note, for these shadows, that the backpointer will not
contain a link to the guest pagetable, which in fact doesn't exist, but
the *gfn* of the 2MB range we're splintering.
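
As a rough illustration of the backpointer rule, the sketch below decodes a
backpointer differently depending on the shadow type. Again this is a
simplified model and not the real Xen definitions: the enum, the structs and
model_decode_backpointer() are hypothetical names introduced only for this
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the shadow types discussed above. */
enum model_shadow_type {
    MODEL_SH_L1,    /* shadow of a real guest L1 pagetable */
    MODEL_SH_FL1,   /* splintered shadow of a PSE superpage mapping */
    MODEL_SH_L2     /* shadow of a guest L2 pagetable */
};

struct model_shadow_page {
    enum model_shadow_type type;
    uint64_t backpointer;   /* guest mfn, or a gfn for an fl1 shadow */
};

/* What a walker is allowed to conclude from one backpointer. */
struct model_backref {
    bool has_guest_table;    /* false for fl1: no guest L1 exists */
    uint64_t guest_mfn;      /* valid only if has_guest_table */
    uint64_t superpage_gfn;  /* valid only if !has_guest_table */
};

static struct model_backref
model_decode_backpointer(const struct model_shadow_page *sp)
{
    struct model_backref r = { false, 0, 0 };

    assert(sp != NULL);

    if (sp->type == MODEL_SH_FL1) {
        /* Start of the splintered range: never map this as a pagetable. */
        r.superpage_gfn = sp->backpointer;
    } else {
        r.has_guest_table = true;
        r.guest_mfn = sp->backpointer;
    }
    return r;
}
```

A walker built this way would only map and read the guest table when
has_guest_table is true, and would treat an fl1 shadow as a leaf whose guest
side is the single PSE entry in the guest L2.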
Hope this is clear and useful,
Gianluca
>
> Many thanks,
> Jui-Hao
>
>
>
> On Fri, May 1, 2009 at 10:47 PM, Jui-Hao Chiang <windtracekimo@xxxxxxxxx>
> wrote:
>> Hi, sorry for disturbing you guys again.
>>
>> Assume guest's paging level is 2 and shadow is using level 3 PAE.
>> I am now trying to dump the L2 shadow page table information in the
>> beginning of sh_update_cr3() as the following (actually copying the
>> code from sh_audit_l2_table and audit_gfn_to_mfn functions)
>>
>> The code accidentally crashes in guest_l2e_get_flags(*gl2e) of the
>> sh_walk_l2_table I wrote.
>> However, the weird part is the code doesn't crash in gfn =
>> guest_l2e_get_gfn(*gl2e) which is accessing the *gl2e in a similar way
>> as guest_l2e_get_flags.
>>
>> static inline mfn_t
>> convert_gfn_to_mfn(struct vcpu *v, gfn_t gfn, mfn_t gmfn)
>> {
>>     p2m_type_t p2mt;
>>
>>     if ( !shadow_mode_translate(v->domain) )
>>         return _mfn(gfn_x(gfn));
>>
>>     if ( (mfn_to_page(gmfn)->u.inuse.type_info & PGT_type_mask)
>>          != PGT_writable_page )
>>         return _mfn(gfn_x(gfn)); // This is a paging-disabled shadow
>>     else
>>         return gfn_to_mfn(v->domain, gfn, &p2mt);
>> }
>>
>> /* JuiHao: walk the l2 shadow page table based on input sl2mfn */
>> static int sh_walk_l2_table(struct vcpu *v, mfn_t sl2mfn, mfn_t x)
>> {
>>     guest_l2e_t *gl2e, *gp;
>>     shadow_l2e_t *sl2e;
>>     mfn_t sl1mfn, gl2mfn;
>>     gfn_t gfn;
>>     mfn_t gmfn;
>>     int done = 0;
>>
>>     /* Follow the backpointer in struct shadow_page_info to get the
>>      * guest l2mfn */
>>     gl2mfn = _mfn(mfn_to_shadow_page(sl2mfn)->backpointer);
>>     gl2e = gp = sh_map_domain_page(gl2mfn);
>>
>>     SHADOW_FOREACH_L2E(sl2mfn, sl2e, &gl2e, done, v->domain, {
>>
>>         /* ###!!!! Works Fine !!!!!#### */
>>         gfn = guest_l2e_get_gfn(*gl2e);
>>         sl1mfn = shadow_l2e_get_mfn(*sl2e);
>>
>>         if (mfn_valid(sl1mfn) &&
>>             (shadow_l2e_get_flags(*sl2e) & _PAGE_PRESENT)) {
>>
>>             /* We get this gmfn just to double check that it is
>>              * equal to sl1mfn */
>>             /* ###!!!! CRASH (in guest_l2e_get_flags) !!!!!#### */
>>             gmfn = (guest_l2e_get_flags(*gl2e) & _PAGE_PSE)
>>                 ? get_fl1_shadow_status(v, gfn)
>>                 : get_shadow_status(v,
>>                                     convert_gfn_to_mfn(v, gfn, gl2mfn),
>>                                     SH_type_l1_shadow);
>>
>>             if (mfn_x(gmfn) != mfn_x(sl1mfn)) {
>>                 printk("!! gmfn %" PRI_mfn " != sl1mfn %" PRI_mfn "\n",
>>                        mfn_x(gmfn), mfn_x(sl1mfn));
>>             } else {
>>                 printk("going down to traverse level 1 SPT\n");
>>             }
>>         }
>>
>>     });
>>     sh_unmap_domain_page(gp);
>>     return 0;
>> }
>>
>> Could you help a little bit on this?
>> Many thanks,
>> Jui-Hao
>>
>> On Fri, Apr 24, 2009 at 9:32 AM, Gianluca Guida
>> <gianluca.guida@xxxxxxxxxxxxx> wrote:
>>> On Fri, Apr 24, 2009 at 6:23 AM, Jui-Hao Chiang <windtracekimo@xxxxxxxxx>
>>> wrote:
>>>> I have some additional doubts as the following:
>>>> (1) For normal data page, in order to propagate the Dirty or Access
>>>> bit from SPTE to GPTE, the hypervisor needs to set Read-Only in the
>>>> SPTE. When the write page fault of this data page comes, hypervisor
>>>> can propagate the Dirty or Access bit to GPTE and set it to R/W. My
>>>> question is when does the hypervisor make it Read-Only again? Is there
>>>> any place inside the source code you can point out?
>>>
>>> What happens is this: the guest has to clear the dirty/accessed bit
>>> and then flush the tlb (or invlpg the entry).
>>> If the pagetable is mapped read only (as in levels > 1) the write to
>>> the pagetable will trigger the emulator that will update the entry.
>>> Otherwise (if the page is out of sync, which means a writable guest
>>> pagetable, and this happens when it's an L1) the flushtlb will do the
>>> job of updating the shadow entry.
>>>
>>> Look at how the sh_propagate function works and when it gets called.
>>> It's what you're looking for.
>>>
>>>> (2) How many shadow pages are maintained for each guest domain? If the
>>>> hypervisor keep only one shadow page table for the active process in
>>>> each guest domain, then during the guest context-switch, it might
>>>> erase the entire shadow page table, and re-construct it for the new
>>>> process, which seems a lot of overhead. I have checked the
>>>> sh_update_cr3(), but not sure of the detailed mechanism.
>>>
>>> There's a pool of shadow memory that gets reused in a pseudo-LRU
>>> manner. Across cr3 switches, toplevel pagetables are kept in memory
>>> and unshadowed when evicted by the allocator or when other things
>>> happen, mostly based on heuristics and reference counting.
>>>
>>> Thanks,
>>> Gianluca
>>>
>>> --
>>> It was a type of people I did not know, I found them very strange and
>>> they did not inspire confidence at all. Later I learned that I had been
>>> introduced to electronic engineers.
>>> E. W. Dijkstra
>>>
>>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
>
--
It was a type of people I did not know, I found them very strange and
they did not inspire confidence at all. Later I learned that I had been
introduced to electronic engineers.
E. W. Dijkstra