WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-ia64-devel

Re: [Xen-ia64-devel] Re: [patch 0/7] Kexec: EFI Mapping: Take III

On Thu, Nov 01, 2007 at 04:11:29PM +0900, Simon Horman wrote:
> On Wed, Oct 31, 2007 at 11:50:39AM -0600, Alex Williamson wrote:
> > 
> > On Wed, 2007-10-31 at 18:00 +0900, Simon Horman wrote:
> > > I've been unable to reproduce this problem on my RX2620 with the default
> > > config (which includes the E1000 driver), so unfortunately I think you
> > > will have to do some testing for me. Hopefully this won't take too many
> > > round trips.
> > > 
> > > Below is a rather long patch which does two things:
> > [snip]
> > > 
> > >    (XEN) EFI memory fault addr=0xe00000407fff0010 kr2=0x020202020c020202
> > > 
> > > If you could send me the boot log, or at the very least the lines that
> > > look like the one above, that should tell me something interesting.
> > > Your kernel config may also be interesting if it deviates from the
> > > default config significantly.
> > 
> > Hi Simon,
> > 
> >    Well, unfortunately, I didn't see any of those and it fails the same
> > way with the patch.  I am using the default build config.  I also tried
> > testing on an rx2600 to see if I could figure out how to make it fail,
> > (un)fortunately it failed right away.  Something seems wrong with SAL
> > calls on that system, so it never even finds any I/O devices.  I'm
> > attaching the broken boot log and a good boot log without this series
> > applied for comparison.  The rx2600 is running system firmware 2.31 (the
> > latest available) in cause you have one sitting around to test with.
> > 
> >    On the rx6600, the failing address falls within a memory mapped I/O
> > range with runtime mappings in the EFI MDT.  As a test, I used the old
> > uncached offset for these mappings, and that does allow the rx6600 to
> > boot to dom0.  BTW, there are a lot of build warnings introduced by the
> > header changes that need to get cleaned up.  Let me know if you can
> > reproduce either of these or send me more test cases to try.  Thanks,
> 
> Thanks, I'll scrach my head a bit and get back to you.

Hi Alex,

I have made some slight progress with regards to this problem,
perhaps you or someone else with a rx6600[*] could test out
my latest creation. It applies on top of the series that I posted here:

http://lists.xensource.com/archives/html/xen-ia64-devel/2007-10/threads.html#00268

Firstly I have to say that I have had a little luck in reproducing a
panic that looks suspiciously like yours if I force the hypervisor's
ia64_do_page_fault to handle the mapping of EFI memory even if it knows
the fault is from a guest (in this case dom0). Or in other words, if I
add a bug I see a problem similar to yours.

This let me to suspect that perhaps what is happening on the rx6600 is
that for some reason the hypervisor thinks that a fault for EFI memory
is generated by the hypervisor when infact it is from dom0. I'm not sure
how this could occur. But I wrote a patch to excercise the idea.

As before, could you (or someone else with an rx6600) test the this
patch on top of the existing series and look out for lines that look
like this.

EFI memory fault addr=0xe0000040ffff2010 attr=0x8000000000000008 
kr2=0x0202020c02020202 guest_mode=0 rr[7]=0x0000000000000861

I'm particularly interested in any line like that which might occur just
before a panic. Though actually the entire boot log would be of
interest. I'd also be kind of interested to get the binary hypervisor
image that causes the problem. Perhaps I'm just building things
sufficuently differently not to see the problem.


A bit more about the patch:

Its quite long, but thats mainly because its ripping out parts of the
series to see what breaks.

* It disables as much of the assembly handling of EFI mappings as
  possible, so that we can get into ia64_do_page_fault and log things.
* It uses a very naieve algorithm in C to detect EFI memory, again
  to catch as much as possible an log it - this algoritm does
  not attempt to protect EFI from nasty guests.


About the SAL errors you saw in the rx26200:

I am unable to reproduce these and have not put much energy into it.
I'd like to look into this more closely once the fault problem is fixed.


About the warnings produced by header mangling:

Again, I haven't got to that bit yet.


[*] It would make fixing these problems quite a lot easier if
    I could get access to an HP rx6600 and rx2620 as I am really
    struggling to reprouce the problems here. I know this is tricky,
    but if someone has a way, please get in touch.

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/

Index: xen-unstable.hg/xen/arch/ia64/xen/faults.c
===================================================================
--- xen-unstable.hg.orig/xen/arch/ia64/xen/faults.c     2007-11-10 
11:51:25.000000000 +0900
+++ xen-unstable.hg/xen/arch/ia64/xen/faults.c  2007-11-10 12:07:01.000000000 
+0900
@@ -164,6 +164,7 @@ void ia64_do_page_fault(unsigned long ad
        // FIXME should validate address here
        unsigned long pteval;
        unsigned long is_data = !((isr >> IA64_ISR_X_BIT) & 1UL);
+       unsigned long attr = 0;
        IA64FAULT fault;
        int is_ptc_l_needed = 0;
        ia64_itir_t _itir = {.itir = itir};
@@ -182,17 +183,47 @@ void ia64_do_page_fault(unsigned long ad
        }
 
  again:
+       /* This is to handle page faults for EFI memory from guests
+        * N.B: This is probably safe for dom0, but
+        * completely unsafe for domU */
+       if (address >> 59 == __IA64_EFI_CACHED_OFFSET >> 59) {
+               attr = efi_mem_attributes(address &
+                                         ~__IA64_EFI_CACHED_OFFSET);
+               if (! (attr & EFI_MEMORY_RUNTIME && attr & EFI_MEMORY_WB) )
+                       attr = 0;
+       }
+       else if (address >> 59 == __IA64_EFI_UNCACHED_OFFSET >> 59) {
+               attr = efi_mem_attributes(address &
+                                         ~__IA64_EFI_UNCACHED_OFFSET);
+               if (! (attr & EFI_MEMORY_RUNTIME &&
+                      attr & (EFI_MEMORY_UC|EFI_MEMORY_WC|EFI_MEMORY_WT)) )
+                       attr = 0;
+       }
+       if (attr)
+               printk("EFI memory fault addr=0x%016lx attr=0x%016lx "
+                      "kr2=0x%016lx guest_mode=%d rr[%d]=0x%016lx\n",
+                      address, attr, ia64_get_kr(2), guest_mode(regs),
+                      address >> 61, ia64_get_rr(address >> 61));
+       if (attr && guest_mode(regs)) {
+               /* Removing the following line caues
+                * an unhandled page_fault in dom0 on my hardware,
+                * similar to what Alex Williamson is seeing
+                * without this patch. */
+               attr = 0;
+       }
+
+
        fault = vcpu_translate(current, address, is_data, &pteval,
-                              &itir, &iha);
+                              &itir, &iha, attr);
        if (fault == IA64_NO_FAULT || fault == IA64_USE_TLB) {
                struct p2m_entry entry;
                unsigned long m_pteval;
                m_pteval = translate_domain_pte(pteval, address, itir,
-                                               &(_itir.itir), &entry);
+                                               &(_itir.itir), &entry, attr);
                vcpu_itc_no_srlz(current, is_data ? 2 : 1, address,
-                                m_pteval, pteval, _itir.itir, &entry);
+                                m_pteval, pteval, _itir.itir, &entry, attr);
                if ((fault == IA64_USE_TLB && !current->arch.dtlb.pte.p) ||
-                   p2m_entry_retry(&entry)) {
+                    (!attr && p2m_entry_retry(&entry))) {
                        /* dtlb has been purged in-between.  This dtlb was
                           matching.  Undo the work.  */
                        vcpu_flush_tlb_vhpt_range(address, _itir.ps);
@@ -215,7 +246,9 @@ void ia64_do_page_fault(unsigned long ad
                        // indicate a bad xen pointer
                        printk("*** xen_handle_domain_access: exception table"
                               " lookup failed, iip=0x%lx, addr=0x%lx, "
-                              "spinning...\n", iip, address);
+                              "rr[%d]=0x%lx kr2=0x%lx spinning...\n", iip,
+                              address, (address >> 62),
+                              ia64_get_rr(address >> 62), ia64_get_kr(2));
                        panic_domain(regs, "*** xen_handle_domain_access: "
                                     "exception table lookup failed, "
                                     "iip=0x%lx, addr=0x%lx, spinning...\n",
@@ -770,7 +803,7 @@ ia64_shadow_fault(unsigned long ifa, uns
 
                /* FIXME: gives a chance to tpa, as the TC was valid.  */
 
-               fault = vcpu_translate(v, ifa, 1, &pte, &itir, &iha);
+               fault = vcpu_translate(v, ifa, 1, &pte, &itir, &iha, 0);
 
                /* Try again!  */
                if (fault != IA64_NO_FAULT) {
Index: xen-unstable.hg/xen/arch/ia64/xen/ivt.S
===================================================================
--- xen-unstable.hg.orig/xen/arch/ia64/xen/ivt.S        2007-11-10 
11:52:47.000000000 +0900
+++ xen-unstable.hg/xen/arch/ia64/xen/ivt.S     2007-11-10 12:07:01.000000000 
+0900
@@ -65,7 +65,7 @@
 # define PSR_DEFAULT_BITS      0
 #endif
 
-#if 0
+#if 1
   /*
    * This lets you track the last eight faults that occurred on the CPU.
    * Make sure ar.k2 isn't needed for something else before enabling this...
@@ -115,24 +115,14 @@ ENTRY(itlb_miss)
        DBG_FAULT(1)
        mov r16 = cr.ifa
        mov r31 = pr
-       /* If address belongs to VMM, go to alt tlb handler */
-       movl r26=itlb_miss_identity_map
-       movl r27=fast_tlb_miss_reflect
-       ;;
-       br.cond.spnt branch_on_xen_memory
-       ;;
-itlb_miss_identity_map:
-       /* EFI PAGE size is IA64_GRANULE_SIZE
-        * itir's key should be 0, as should the reserved space
-        * thus we can just set itir = (IA64_GRANULE_SHIFT << 2)
-        */
-       movl r20=IA64_GRANULE_SHIFT
        ;;
-       shl r20=r20,2
+       extr.u r17=r16,59,5
        ;;
-       mov cr.itir=r20
+       /* If address belongs to VMM, go to alt tlb handler */
+       cmp.eq p6,p0=0x1e,r17
+(p6)   br.cond.spnt    late_alt_itlb_miss
+       br.cond.sptk fast_tlb_miss_reflect
        ;;
-       br.cond.sptk late_alt_itlb_miss
 END(itlb_miss)
 
        .org ia64_ivt+0x0800
@@ -142,10 +132,14 @@ ENTRY(dtlb_miss)
        DBG_FAULT(2)
        mov r16=cr.ifa                  // get virtual address
        mov r31=pr
-       movl r26=late_alt_dtlb_miss
-       movl r27=fast_tlb_miss_reflect
        ;;
-       br.cond.spnt branch_on_xen_memory
+       extr.u r17=r16,59,5
+       ;;
+       /* If address belongs to VMM, go to alt tlb handler */
+       cmp.eq p6,p0=0x1e,r17
+(p6)   br.cond.spnt    late_alt_dtlb_miss
+       br.cond.sptk fast_tlb_miss_reflect
+       ;;
 END(dtlb_miss)
 
        .org ia64_ivt+0x0c00
Index: xen-unstable.hg/xen/arch/ia64/xen/mm.c
===================================================================
--- xen-unstable.hg.orig/xen/arch/ia64/xen/mm.c 2007-11-10 11:51:25.000000000 
+0900
+++ xen-unstable.hg/xen/arch/ia64/xen/mm.c      2007-11-10 11:52:50.000000000 
+0900
@@ -502,7 +502,7 @@ gmfn_to_mfn_foreign(struct domain *d, un
 // Xen PAGE_SIZE and return modified pte.  (NOTE: TLB insert should use
 // current->arch.vhpt_pg_shift!)
 u64 translate_domain_pte(u64 pteval, u64 address, u64 itir__, u64* itir,
-                         struct p2m_entry* entry)
+                         struct p2m_entry* entry, unsigned long efi_attr)
 {
        struct domain *d = current->domain;
        ia64_itir_t _itir = {.itir = itir__};
@@ -511,6 +511,18 @@ u64 translate_domain_pte(u64 pteval, u64
        u64 arflags2;
        u64 maflags2;
 
+       /* EFI-Runtime areas are itendity mapped into the same location
+        * that they are maped in Linux with GRANULE size pages.
+        * If efi_attr is non-zero, then the address is EFI-Runtime memory.
+        */
+       if (efi_attr) {
+               /* Copy the whole register. */
+               ((ia64_itir_t*)itir)->itir = _itir.itir;
+               /* Overwrite ps part! */
+               ((ia64_itir_t*)itir)->ps = IA64_GRANULE_SHIFT;
+               return pteval;
+       }
+
        pteval &= ((1UL << 53) - 1);// ignore [63:53] bits
 
        // FIXME address had better be pre-validated on insert
Index: xen-unstable.hg/xen/arch/ia64/xen/vcpu.c
===================================================================
--- xen-unstable.hg.orig/xen/arch/ia64/xen/vcpu.c       2007-11-10 
11:51:25.000000000 +0900
+++ xen-unstable.hg/xen/arch/ia64/xen/vcpu.c    2007-11-10 11:52:57.000000000 
+0900
@@ -7,6 +7,7 @@
  */
 
 #include <linux/sched.h>
+#include <linux/efi.h>
 #include <public/xen.h>
 #include <xen/mm.h>
 #include <asm/ia64_int.h>
@@ -1670,13 +1671,30 @@ vcpu_get_domain_bundle(VCPU * vcpu, REGS
 }
 
 IA64FAULT vcpu_translate(VCPU * vcpu, u64 address, BOOLEAN is_data,
-                        u64 * pteval, u64 * itir, u64 * iha)
+                        u64 * pteval, u64 * itir, u64 * iha,
+                        unsigned long efi_attr)
 {
        unsigned long region = address >> 61;
        unsigned long pta, rid, rr, key = 0;
        union pte_flags pte;
        TR_ENTRY *trp;
 
+       /* EFI-Runtime areas are identity mapped into
+        * the same location that they are maped in Linux.
+        * If efi_attr is non-zero then * (efi_attr & EFI_MEMORY_RUNTIME) is
+        * true and that either (efi_attr & EFI_MEMORY_WB) is true or
+        * (efi_attr & (EFI_MEMORY_UC|EFI_MEMORY_WC|EFI_MEMORY_WT)) is true.
+        */
+       if (efi_attr) {
+               if (efi_attr & EFI_MEMORY_WB)
+                       *pteval = (address & _PAGE_PPN_MASK) | __DIRTY_BITS |
+                               _PAGE_AR_RWX;
+               else
+                       *pteval = (address & _PAGE_PPN_MASK) | __DIRTY_BITS |
+                               _PAGE_AR_RWX | _PAGE_PL_PRIV | _PAGE_MA_UC;
+               return IA64_NO_FAULT;
+       }
+
        if (PSCB(vcpu, metaphysical_mode) && !(!is_data && region)) {
                // dom0 may generate an uncacheable physical address (msb=1)
                if (region && ((region != 4) || (vcpu->domain != dom0))) {
@@ -1806,7 +1824,7 @@ IA64FAULT vcpu_tpa(VCPU * vcpu, u64 vadr
        u64 pteval, itir, mask, iha;
        IA64FAULT fault;
 
-       fault = vcpu_translate(vcpu, vadr, TRUE, &pteval, &itir, &iha);
+       fault = vcpu_translate(vcpu, vadr, TRUE, &pteval, &itir, &iha, 0);
        if (fault == IA64_NO_FAULT || fault == IA64_USE_TLB) {
                mask = itir_mask(itir);
                *padr = (pteval & _PAGE_PPN_MASK & mask) | (vadr & ~mask);
@@ -1820,7 +1838,7 @@ IA64FAULT vcpu_tak(VCPU * vcpu, u64 vadr
        u64 pteval, itir, iha;
        IA64FAULT fault;
 
-       fault = vcpu_translate(vcpu, vadr, TRUE, &pteval, &itir, &iha);
+       fault = vcpu_translate(vcpu, vadr, TRUE, &pteval, &itir, &iha, 0);
        if (fault == IA64_NO_FAULT || fault == IA64_USE_TLB)
                *key = itir & IA64_ITIR_KEY_MASK;
        else
@@ -2315,11 +2333,23 @@ vcpu_rebuild_vhpt(VCPU * vcpu, u64 ps)
 
 void
 vcpu_itc_no_srlz(VCPU * vcpu, u64 IorD, u64 vaddr, u64 pte,
-                 u64 mp_pte, u64 itir, struct p2m_entry *entry)
+                u64 mp_pte, u64 itir, struct p2m_entry *entry,
+                unsigned long efi_attr)
 {
        ia64_itir_t _itir = {.itir = itir};
        unsigned long psr;
 
+       /* EFI-Runtime areas are identity mapped into
+        * the same location that they are maped in Linux.
+        * If efi_attr is non-zero, then the address is EFI-Runtime memory
+        */
+       if (efi_attr) {
+               unsigned long psr = ia64_clear_ic();
+               ia64_itc(IorD, vaddr, pte, _itir.itir);
+               ia64_set_psr(psr);
+               return;
+       }
+
        check_xen_space_overlap("itc", vaddr, 1UL << _itir.ps);
 
        // FIXME, must be inlined or potential for nested fault here!
@@ -2364,12 +2394,12 @@ IA64FAULT vcpu_itc_d(VCPU * vcpu, u64 pt
 
  again:
        //itir = (itir & ~0xfc) | (vcpu->arch.vhpt_pg_shift<<2); // ign dom pgsz
-       pteval = translate_domain_pte(pte, ifa, itir, &(_itir.itir), &entry);
+       pteval = translate_domain_pte(pte, ifa, itir, &(_itir.itir), &entry, 0);
        if (!pteval)
                return IA64_ILLOP_FAULT;
        if (swap_rr0)
                set_virtual_rr0();
-       vcpu_itc_no_srlz(vcpu, 2, ifa, pteval, pte, _itir.itir, &entry);
+       vcpu_itc_no_srlz(vcpu, 2, ifa, pteval, pte, _itir.itir, &entry, 0);
        if (swap_rr0)
                set_metaphysical_rr0();
        if (p2m_entry_retry(&entry)) {
@@ -2392,12 +2422,12 @@ IA64FAULT vcpu_itc_i(VCPU * vcpu, u64 pt
 
       again:
        //itir = (itir & ~0xfc) | (vcpu->arch.vhpt_pg_shift<<2); // ign dom pgsz
-       pteval = translate_domain_pte(pte, ifa, itir, &(_itir.itir), &entry);
+       pteval = translate_domain_pte(pte, ifa, itir, &(_itir.itir), &entry, 0);
        if (!pteval)
                return IA64_ILLOP_FAULT;
        if (swap_rr0)
                set_virtual_rr0();
-       vcpu_itc_no_srlz(vcpu, 1, ifa, pteval, pte, _itir.itir, &entry);
+       vcpu_itc_no_srlz(vcpu, 1, ifa, pteval, pte, _itir.itir, &entry, 0);
        if (swap_rr0)
                set_metaphysical_rr0();
        if (p2m_entry_retry(&entry)) {
Index: xen-unstable.hg/xen/include/asm-ia64/mm.h
===================================================================
--- xen-unstable.hg.orig/xen/include/asm-ia64/mm.h      2007-11-10 
11:51:25.000000000 +0900
+++ xen-unstable.hg/xen/include/asm-ia64/mm.h   2007-11-10 11:52:50.000000000 
+0900
@@ -459,7 +459,8 @@ extern unsigned long dom0vp_unexpose_for
 extern volatile unsigned long *mpt_table;
 extern unsigned long gmfn_to_mfn_foreign(struct domain *d, unsigned long gpfn);
 extern u64 translate_domain_pte(u64 pteval, u64 address, u64 itir__,
-                               u64* itir, struct p2m_entry* entry);
+                               u64* itir, struct p2m_entry* entry,
+                               unsigned long efi_attr);
 #define machine_to_phys_mapping        mpt_table
 
 #define INVALID_M2P_ENTRY        (~0UL)
Index: xen-unstable.hg/xen/include/asm-ia64/vcpu.h
===================================================================
--- xen-unstable.hg.orig/xen/include/asm-ia64/vcpu.h    2007-11-10 
11:51:25.000000000 +0900
+++ xen-unstable.hg/xen/include/asm-ia64/vcpu.h 2007-11-10 11:52:50.000000000 
+0900
@@ -162,7 +162,8 @@ union U_IA64_BUNDLE;
 extern int vcpu_get_domain_bundle(VCPU * vcpu, REGS * regs, u64 gip,
                                   union U_IA64_BUNDLE *bundle);
 extern IA64FAULT vcpu_translate(VCPU * vcpu, u64 address, BOOLEAN is_data,
-                                u64 * pteval, u64 * itir, u64 * iha);
+                               u64 * pteval, u64 * itir, u64 * iha,
+                               unsigned long efi_attr);
 extern IA64FAULT vcpu_tpa(VCPU * vcpu, u64 vadr, u64 * padr);
 extern IA64FAULT vcpu_force_inst_miss(VCPU * vcpu, u64 ifa);
 extern IA64FAULT vcpu_force_data_miss(VCPU * vcpu, u64 ifa);
@@ -182,7 +183,7 @@ extern BOOLEAN vcpu_timer_expired(VCPU *
 extern u64 vcpu_deliverable_interrupts(VCPU * vcpu);
 struct p2m_entry;
 extern void vcpu_itc_no_srlz(VCPU * vcpu, u64, u64, u64, u64, u64,
-                             struct p2m_entry *);
+                             struct p2m_entry *, unsigned long efi_attr);
 extern u64 vcpu_get_tmp(VCPU *, u64);
 extern void vcpu_set_tmp(VCPU *, u64, u64);
 

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel