WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel] Re: Possible regression in "x86-64: reduce range spanned

On Fri, Dec 11, 2009 at 10:15:32AM +0000, Tim Deegan wrote:
> Hi,
> 
> I seem to have missed this thread earlier; sorry about that. 
> 
> At 01:39 +0000 on 11 Dec (1260495591), Simon Horman wrote:
> > * The memory access is to a MMIO region - a control register for the NIC.
> >   The access is made by the e1000 gPXE driver.
> > 
> > * Interestingly, if there is a write to the page (or perhaps even anywhere
> >   in the entire 20-page MMIO region?) before a read, the problem doesn't
> >   occur. So a simple work-around in the gPXE driver is to just write to a
> >   register before reading any.
> > 
> > * In sh_page_fault() the call to gfn_to_mfn_guest_dbg(d, gfn, &p2mt) sets
> >   p2mt to p2m_mmio_direct, which seems to be correct. I'm a bit
> >   stuck in working out what goes wrong from there.
> 
> So from the trace below it looks like the shadow fault handler is trying
> to emulate an access to a passthrough MMIO address.  That's surprising. 
> And if it only happens when the first access is a read, that's a bit 
> suspect too. 

Sorry, I wasn't as clear as I might have been. The problem occurs
if the first access is a write. The work-around is to make
the first access a read.

> > 106706 sh: sh_page_fault__guest_2(): d:v=1:0 va=0xf30000d8 err=2, rip=6f6
> 
> Oh, but this is a write, according to the error code. 
> 
> > 106707 sh: sh_page_fault__guest_2(): 2954: A
> > 106708 sh: sh_page_fault__guest_2(): 2998: B
> > 106709 sh: sh_page_fault__guest_2(): 3077: page_fault_slow_path:
> > 106710 sh: sh_page_fault__guest_2(): 3081: C
> > 106711 sh: sh_page_fault__guest_2(): 3095: rewalk:
> > 106712 sh: sh_page_fault__guest_2(): 3097: D
> > 106713 sh: sh_page_fault__guest_2(): 3106: E
> > 106714 sh: sh_page_fault__guest_2(): 3122: F
> > (XEN) _gfn_to_mfn_type_dbg: current
> > 106715 sh: sh_page_fault__guest_2(): 3141: gfn_to_mfn_guest_dbg: p2mt=5
> > 106716 sh: sh_page_fault__guest_2(): 3143: G
> > 106717 sh: sh_page_fault__guest_2(): 3181: H
> > 106718 sh: sh_page_fault__guest_2(): 3193: I
> > 106719 sh: sh_page_fault__guest_2(): 3204: J
> > 106720 sh: sh_page_fault__guest_2(): 3216: K
> > 106721 shdebug: make_fl1_shadow(): (f3000)=>118644
> 
> FL1 shadow means we're building a shadow L1 that's equivalent to a
> single PSE guest entry, pointing a guest frames f3000 to f31ff.
> 
> > 106722 sh: set_fl1_shadow_status(): gfn=f3000, type=00000002, smfn=118644
> > 106723 shdebug: _sh_propagate(): demand write level 2 guest f30000e7 shadow 
> > 0000000118644067
> > 106724 sh: sh_page_fault__guest_2(): 3241: L
> > 106725 shdebug: _sh_propagate(): demand write level 1 guest f3000067 shadow 
> > 00000000f0500037
> > 106726 sh: sh_page_fault__guest_2(): 3263: M
> > 106727 shdebug: _sh_propagate(): prefetch level 1 guest f3001067 shadow 
> > 00000000f0501037
> > 106728 shdebug: _sh_propagate(): prefetch level 1 guest f3002067 shadow 
> > 00000000f0502037
> 
> [...] and the shadow PTE flags look OK: A, PCD, U/S, R/W, P.
> 
> > 106758 sh: sh_page_fault__guest_2(): 3285: N
> > 106759 sh: sh_page_fault__guest_2(): 3310: O
> > 106760 sh: sh_page_fault__guest_2(): 3319: P
> > 106761 sh: sh_page_fault__guest_2(): 3332: Q
> > 106762 sh: sh_page_fault__guest_2(): 3343: goto emulate;
> 
> That's the interesting one.  I guess your line numbers are different
> because of the extra printout but it must be this:
> 
>     if ( is_hvm_domain(d) 
>          && unlikely(!hvm_wp_enabled(v)) 
>          && regs->error_code == (PFEC_write_access|PFEC_page_present) )
>     {
>         perfc_incr(shadow_fault_emulate_wp);
>         goto emulate;
>     }

Yes, it is that.

> Hah.  That explains why we're emulating.  The guest has CR0.WP clear,
> and this was a write fault that wouldn't have happened on real hardware,
> so we need to emulate it because we can't actually disable CR0.WP or the
> shadow pagetables stop working altogether.
> 
> In fact in this case we don't need to emulate it because retrying would
> be good enough, but in the general case we might.

I'm not sure that I understand why that is true.

> > 106763 sh: sh_page_fault__guest_2(): 3361: emulate:
> > 106764 sh: sh_page_fault__guest_2(): 3367: R
> > 106765 sh: sh_page_fault__guest_2(): 3390: emulate_readonly:
> > 106766 sh: sh_page_fault__guest_2(): 3403: early_emulation:
> > 106767 sh: sh_page_fault__guest_2(): 3405: S
> > 106768 sh: sh_page_fault__guest_2(): emulate: eip=0x6f6 esp=0x3d264
> > 106769 sh: sh_page_fault__guest_2(): 3446: T
> > 106770 sh: sh_page_fault__guest_2(): emulator failure, unshadowing mfn 
> > 0xf0500
> 
> The emulation fails (because the emulator can't/won't map MMIO space)
> and we're about to bail out, try unshadowing everything and retry (at
> which point everything will just work).  Except:
> 
> > 106771 sh: sh_remove_shadows(): d=1, v=0, gmfn=f0500
> 
> sh_remove_shadows isn't ready to be called with an MMIO MFN.  We used to
> get away with this before the m2p was made sparse, but now MMIO holes
> mean m2p holes. 
> 
> Two fixes suggest themselves.  The first is to gate the CR0.WP emulation
> path on mfn_valid().  The emulator won't handle it so it only leads to
> confusion and delay if we try:
> 
> --- a/xen/arch/x86/mm/shadow/multi.c  Fri Dec 11 09:17:09 2009 +0000
a
> +++ b/xen/arch/x86/mm/shadow/multi.c  Fri Dec 11 10:06:39 2009 +0000
> @@ -3305,7 +3305,8 @@
>       * fault was a non-user write to a present page.  */
>      if ( is_hvm_domain(d) 
>           && unlikely(!hvm_wp_enabled(v)) 
> -         && regs->error_code == (PFEC_write_access|PFEC_page_present) )
> +         && regs->error_code == (PFEC_write_access|PFEC_page_present) 
> +         && mfn_valid(gmfn) )
>      {
>          perfc_incr(shadow_fault_emulate_wp);
>          goto emulate;

That resolves the problem that I observed.
> 
> 
> The second is to make sh_remove_shadows() safe against being called with
> a bogus MFN.  I'm not quite decided whether the right answer is to put 
> "if (!mfn_valid(gmfn)) return;" at the top of it (which would stop a
> certain class of crashes) or just "ASSERT(gmfn_valid(gmfn));" to make it
> clearer what's gone wrong.  

Adding "if (!mfn_valid(gmfn)) return;" near the top of sh_remove_shadows()
also resolves the problem that I was seeing.

Thanks (x2) !

Although I'm not really that familiar with the code in question,
I think that I have a preference for both guarding the goto emulate
and adding an ASSERT to sh_remove_shadows(). That is:

Index: xen-unstable.hg/xen/arch/x86/mm/shadow/common.c
===================================================================
--- xen-unstable.hg.orig/xen/arch/x86/mm/shadow/common.c        2009-12-11 
20:31:48.000000000 +0900
+++ xen-unstable.hg/xen/arch/x86/mm/shadow/common.c     2009-12-11 
20:48:48.000000000 +0900
@@ -2752,6 +2752,7 @@ void sh_remove_shadows(struct vcpu *v, m
     };
 
     ASSERT(!(all && fast));
+    ASSERT(mfn_valid(gmfn));
 
     /* Although this is an externally visible function, we do not know
      * whether the shadow lock will be held when it is called (since it
Index: xen-unstable.hg/xen/arch/x86/mm/shadow/multi.c
===================================================================
--- xen-unstable.hg.orig/xen/arch/x86/mm/shadow/multi.c 2009-12-11 
20:47:17.000000000 +0900
+++ xen-unstable.hg/xen/arch/x86/mm/shadow/multi.c      2009-12-11 
20:48:17.000000000 +0900
@@ -3305,7 +3305,8 @@ static int sh_page_fault(struct vcpu *v,
      * fault was a non-user write to a present page.  */
     if ( is_hvm_domain(d) 
          && unlikely(!hvm_wp_enabled(v)) 
-         && regs->error_code == (PFEC_write_access|PFEC_page_present) )
+         && regs->error_code == (PFEC_write_access|PFEC_page_present)
+         && mfn_valid(gmfn) )
     {
         perfc_incr(shadow_fault_emulate_wp);
         goto emulate;


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

<Prev in Thread] Current Thread [Next in Thread>