xen-devel

RE: [Xen-devel] Re: Page fault is 4 times faster with XI shadow mechanism

To: "Robert Phillips" <rsp.vi.xen@xxxxxxxxx>, "zhu" <vanbas.han@xxxxxxxxx>
Subject: RE: [Xen-devel] Re: Page fault is 4 times faster with XI shadow mechanism
From: "Ian Pratt" <m+Ian.Pratt@xxxxxxxxxxxx>
Date: Wed, 5 Jul 2006 14:41:42 +0100
Cc: Xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Wed, 05 Jul 2006 06:42:11 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcaeiBv/4Edo85YqS2KBpeRSIDV5FABpOTHw
Thread-topic: [Xen-devel] Re: Page fault is 4 times faster with XI shadow mechanism
> Cc: Xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: [Xen-devel] Re: Page fault is 4 times faster with XI shadow mechanism
> 
> Keir et al have not given any feedback.  Not a peep.  To be generous,
> though, it is a large body of code to digest.
> -- rsp

As I said in private email with Ben when you guys embarked on the design, I 
really don’t believe the approach embodied in this code is the way to go:

The key to getting good shadow pagetable performance is optimizing demand 
faults, and although the XI patch does some useful optimizations to help the 
snapshot logic, it misses the bigger picture. The patch will still take 3 
vmenter/exit cycles per demand fault and drags a lot of data through the L1 
cache. I believe we can do better than this by taking only 2 vmenter/exits and 
accessing less data.
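
For concreteness, the minimal demand-fault path looks roughly like the sketch 
below. The helper names (guest_walk, shadow_install, inject_guest_pf) are 
invented for illustration; this is not code from either patch, just the shape 
of the fast path being argued for:

    #include <stdint.h>

    typedef uint64_t pte_t;

    /* Hypothetical stand-ins for the real shadow interfaces. */
    extern pte_t guest_walk(uint64_t gva);        /* walk guest pagetables */
    extern void  shadow_install(uint64_t gva, pte_t gpte);
    extern void  inject_guest_pf(uint64_t gva, uint32_t error_code);

    /*
     * Demand-fault fast path: resolve the fault inside a single vmexit,
     * touching only the entries on the walk, then resume the guest with
     * one vmenter.  Every extra exit/enter cycle and every extra page
     * dragged through the L1 cache is overhead on the hottest operation
     * in shadow mode.
     */
    void shadow_demand_fault(uint64_t gva, uint32_t error_code)
    {
        pte_t gpte = guest_walk(gva);

        if (!(gpte & 1)) {                    /* not present in guest tables */
            inject_guest_pf(gva, error_code); /* reflect fault to the guest  */
            return;
        }
        shadow_install(gva, gpte);            /* fill in the shadow entry    */
        /* return -> vmenter: one exit/enter round trip for the fault */
    }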

Key to this is moving to a simpler design and avoiding premature optimization. 
We should start off using emulation for pte writes, then add selective 
optimizations to make fork, exit, and Windows process creation faster. 
Rather than taking a writable snapshot, I believe it will be more fruitful to 
adopt the shadow pagetable equivalent of the 'late pin, early unpin' approach 
we now use for PV guests.   
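
A minimal sketch of the emulated-pte-write scheme, under assumed helper names 
(mfn_is_shadowed_pagetable, guest_pte_write, shadow_pte_update) rather than 
real Xen interfaces:

    #include <stdint.h>
    #include <stdbool.h>

    typedef uint64_t pte_t;

    /* Hypothetical helpers -- not the actual Xen interfaces. */
    extern bool mfn_is_shadowed_pagetable(uint64_t mfn);
    extern void guest_pte_write(uint64_t mfn, unsigned int index, pte_t val);
    extern void shadow_pte_update(uint64_t mfn, unsigned int index, pte_t gpte);

    /*
     * Guest pagetables are write-protected in the shadows, so a guest
     * write to a pte traps.  Emulating that one write and folding it
     * into the shadow before resuming keeps each pagetable update to a
     * single vmexit/vmenter and touches only the entry itself -- no
     * snapshot page to copy or diff.
     */
    bool emulate_gpt_write(uint64_t mfn, unsigned int index, pte_t new_gpte)
    {
        if (!mfn_is_shadowed_pagetable(mfn))
            return false;                        /* not a pagetable write */

        guest_pte_write(mfn, index, new_gpte);   /* commit guest's write  */
        shadow_pte_update(mfn, index, new_gpte); /* keep shadow coherent  */
        return true;                             /* resume: one exit      */
    }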
 
We[*] have been working on an entirely new shadow pt implementation that 
embodies this "keep it simple" approach, and are beginning to collect a lot of 
trace data of the patch running a range of operating systems running different 
workloads. We're now in the process of adding in a few optimizations driven by 
this trace data. [One interesting point to note is Linux is quite different in 
the way it updates pagetables from *BSD and Windows NT kernels as it doesn't 
use recursive and foreign linear mappings for updating PTEs, but uses 
lowmem/highpte mappings. It requires quite different optimizations from the 
other OSes.]  

We're hoping to post a draft of our 'shadow2' patch later this week (along with 
a short design doc), but I expect we'll be adding optimizations to it for a few 
weeks more. Unlike the XI patch it's not limited to a 64b hypervisor: it also 
supports 32b and 32b PAE. It supports PV guests as well as HVM, and it is SMP 
safe. [NB: not all the 2-on-2, 2-on-3, 3-on-3, 2-on-4, 3-on-4, and 
4-on-4 combinations have been tested as yet -- we're focussing on X-on-3 as 
this is the hardest.]  Like the XI patch it treats shadow pagetable memory as a 
cache and properly handles evictions and cache size adjustments. Unlike the XI 
patch it doesn't burn memory maintaining pte back pointers -- there are simple 
heuristics to avoid needing these for shooting down writeable mappings.
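
To illustrate the sort of heuristic meant here -- a sketch with hypothetical 
helpers (try_remove_writable_mapping, scan_all_shadows_for_mappings), not the 
shadow2 code itself:

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical helpers -- not the real shadow2 interfaces. */
    extern bool try_remove_writable_mapping(uint64_t mfn, uint64_t gva);
    extern void scan_all_shadows_for_mappings(uint64_t mfn);

    /*
     * Remove writeable mappings of a frame before shadowing it as a
     * pagetable.  Rather than burning memory on per-frame back pointers,
     * guess the guest virtual addresses most likely to map the frame
     * (the faulting va itself, or a fixed-offset direct map such as
     * Linux lowmem) and fall back to the brute-force scan of all
     * shadows only when every guess misses.
     */
    void remove_writable_mappings(uint64_t mfn, uint64_t gfn,
                                  uint64_t last_fault_va,
                                  uint64_t direct_map_base)
    {
        if (try_remove_writable_mapping(mfn, last_fault_va))
            return;                                  /* guess 1 hit */

        if (try_remove_writable_mapping(mfn, direct_map_base + (gfn << 12)))
            return;                                  /* guess 2 hit */

        scan_all_shadows_for_mappings(mfn);          /* slow path   */
    }

The common case hits a guess, so the cost stays near the back-pointer scheme's 
without the two extra pages per smfn.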

I think the new code is *much* easier to understand and maintain than the old 
code (as is the XI patch).

Having ragged on about how I think what we have is better, I should point out 
that the XI patch is definitely a nice piece of work. If it wasn't for the fact 
that it's 64b-only and HVM-only we'd probably have taken it as an interim 
measure. The shadow pagetable code is one of the most complex parts of Xen and 
those that venture in have my respect. (It's the most fun part too!)

Ian

[*] Tim Deegan, George Dunlap, Michael Fetterman

> 
> On 7/2/06, zhu <vanbas.han@xxxxxxxxx> wrote:
> 
>       Really thorough explanation. Now I understand all of your concerns
>       for the design. All of us can tune the code once it is checked in
>       to the unstable tree.
>       BTW: How about the feedback from the Cambridge guys?
> 
> 
> 
> 
> 
>       Robert Phillips wrote:
>       > In XI, the idea is to have a pool of SPTIs all ready to go.  When
>       > a page needs to be shadowed, I simply pull a SPTI off the list,
>       > zero its pages, and it is ready for use.  No further memory
>       > allocation is needed.  This is the critical path and I want it as
>       > short as possible.
>       That's quite reasonable. Another classic example of a space-to-time
>       trade-off.
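
(For illustration: a minimal sketch of the pool scheme Robert describes, with 
an invented struct spti rather than the actual XI structures:)

    #include <stddef.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    /* Invented stand-in for XI's shadow-pagetable-info structure. */
    struct spti {
        struct spti *next_free;       /* free-list link               */
        void *shadow_page;            /* pre-allocated shadow page    */
        void *snapshot_page;          /* pre-allocated snapshot page  */
    };

    static struct spti *spti_free_list;  /* populated at domain creation */

    /*
     * Fault-path allocation: unlink the head of the free list and zero
     * its shadow page.  Constant time, no allocator calls -- all the
     * memory was reserved up front.
     */
    static struct spti *spti_get(void)
    {
        struct spti *s = spti_free_list;

        if (s == NULL)
            return NULL;              /* pool empty: recycle an LRU SPT */
        spti_free_list = s->next_free;
        memset(s->shadow_page, 0, PAGE_SIZE);
        return s;
    }

    static void spti_put(struct spti *s)
    {
        s->next_free = spti_free_list;
        spti_free_list = s;
    }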
>       >
>       > One reason I have backlinks on all guest pages is because one
>       > can't know ahead of time which guest pages are (or will become)
>       > GPTs.  When the code first detects a guest page being used as a
>       > guest page table, it would have to do a linear search to find all
>       > SPTEs that point to the new guest page table, so it can mark them
>       > as readonly.
>       The first time we shadow it, we know it's a GPT and could then
>       connect the backlinks with the SPTE. However, the disadvantage is,
>       just as you have noted, that it will increase the complexity of the
>       critical shadow fault path.
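
(For illustration: a sketch of the back-link walk under discussion, with 
invented types and a hypothetical backlinks_of() lookup -- not the XI code:)

    #include <stdint.h>
    #include <stddef.h>

    typedef uint64_t pte_t;
    #define PTE_RW ((pte_t)1 << 1)        /* x86 writeable bit */

    /* Invented back-link record: one per shadow pte mapping the frame. */
    struct backlink {
        struct backlink *next;
        pte_t *spte;                      /* shadow pte mapping this frame */
    };

    /* Hypothetical per-frame lookup, maintained as sptes are written. */
    extern struct backlink *backlinks_of(uint64_t mfn);

    /*
     * When a frame is first seen in use as a guest pagetable, every
     * shadow pte mapping it must be made read-only so that writes trap.
     * With back links this is a walk of a short per-frame list; without
     * them it is a linear search over all shadow pagetables.
     */
    void write_protect_frame(uint64_t mfn)
    {
        struct backlink *b;

        for (b = backlinks_of(mfn); b != NULL; b = b->next)
            *b->spte &= ~PTE_RW;          /* remember to flush TLBs after */
    }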
>       >
>       > One could do without backlinks altogether if one were willing to
>       > put up with linear searching.  It's a space/performance tradeoff.
>       > I think, with machines now having many megabytes of memory, users
>       > are more concerned about performance than a small memory overhead.
>       >
>       > -- rsp
>       >
>       >
>       > On 7/2/06, zhu < vanbas.han@xxxxxxxxx> wrote:
>       >>
>       >>
>       >> Robert Phillips wrote:
>       >> > [...] on demand (when a guest page needs to be shadowed) and,
>       >> > when the pool runs low, the LRU SPTs are torn down and their
>       >> > SPTIs recycled.
>       >> >
>       >> Well, what I mean is that we should not connect a snapshot page
>       >> with an SPTI when the SPTIs are first reserved. It would be
>       >> better to manage these snapshot pages in a separate dynamic pool.
>       >> BTW: What do you think of the backlink issue mentioned in my
>       >> previous mail?
>       >> > Currently I allocate about 5% of system memory for this purpose
>       >> > (this includes the SPT, its snapshot and the backlink pages)
>       >> > and, with that reasonable investment, we get very good
>       >> > performance.  With more study, I'm sure things could be tuned
>       >> > even better.  (I hope I have properly understood your
>       >> > questions.)
>       >> >
>       >> > -- rsp
>       >> >
>       >> > On 7/1/06, zhu < vanbas.han@xxxxxxxxx > wrote:
>       >> >>
>       >> >> Hi,
>       >> >> After taking some time to dig into your patch about the XI
>       >> >> shadow page table, I have to say it's really a good design and
>       >> >> implementation IMHO, especially the parts about the clear
>       >> >> hierarchy for each smfn, the decision table, and how 32nopae
>       >> >> is supported in a rather elegant way. However, I have several
>       >> >> questions to discuss with you. :-)
>       >> >> 1) It seems the XI shadow pgt reserves all of the possible
>       >> >> resources at an early stage for an HVM domain (the first time
>       >> >> the asi is created). It could be quite proper to reserve the
>       >> >> smfns and sptis. However, do we really need to reserve one
>       >> >> snapshot page for each smfn at first and retain it until the
>       >> >> HVM domain is destroyed? I guess a large number of gpts will
>       >> >> not be modified frequently after they are fully set up. IMHO,
>       >> >> it would be better to manage these snapshot pages dynamically.
>       >> >> Of course, this will change the basic logic of the code; e.g.
>       >> >> you have to sync the shadow pgt when invoking spti_make_shadow
>       >> >> instead of leaving it out of sync, and you can't set up the
>       >> >> entire low-level shadow pgt when invoking resync_spte since it
>       >> >> could cost a lot of time.
>       >> >> 2) The GP back link plays a very important role in the XI
>       >> >> shadow pgt. However, it also causes high memory pressure for
>       >> >> the domain (2 pages for each smfn). For normal guest pages, as
>       >> >> opposed to GPT pages, I guess its usage is limited: we only
>       >> >> refer to the back link for these normal guest pages when
>       >> >> invoking xi_invld_mfn, divide_large_page or dirty logging.
>       >> >> Would it be reasonable to implement the back link only for GPT
>       >> >> pages? Of course, this would increase the complexity of the
>       >> >> code a little.
>       >> >> 3) Can you show us comparative statistics between the current
>       >> >> shadow pgt and the XI pgt for some critical operations, such
>       >> >> as shadow_resync_all, gva_to_gpa, shadow_fault and so on? I'm
>       >> >> really curious about them.
>       >> >>
>       >> >> I have to say I'm not very familiar with the current shadow
>       >> >> pgt implementation, so I may have missed some important
>       >> >> considerations in posting these questions. Please point them
>       >> >> out.
>       >> >> Thanks for sharing your idea and code with us. :-)
>       >> >>
>       >> >> _______________________________________________________
>       >> >> Best Regards,
>       >> >> hanzhu
>       >> >>
>       >> >>
>       >> >>
>       >> >> _______________________________________________
>       >> >> Xen-devel mailing list
>       >> >> Xen-devel@xxxxxxxxxxxxxxxxxxx
>       >> >> http://lists.xensource.com/xen-devel
>       >> >>
>       >> >
>       >> >
>       >> >
>       >>
>       >
>       >
>       >
> 
> 
> 
> 
> 
> --
> --------------------------------------------------------------------
> Robert S. Phillips               Virtual Iron Software
> rphillips@xxxxxxxxxxxxxxx        Tower 1, Floor 2
> 978-849-1220                     900 Chelmsford Street
>                                  Lowell, MA 01851

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
