WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-ia64-devel

RE: [Xen-ia64-devel] PATCH: slightly improve stability

To: "Xu, Anthony" <anthony.xu@xxxxxxxxx>, "Tristan Gingold" <Tristan.Gingold@xxxxxxxx>, <xen-ia64-devel@xxxxxxxxxxxxxxxxxxx>, "Williamson, Alex (Linux Kernel Dev)" <alex.williamson@xxxxxx>
Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
From: "Magenheimer, Dan (HP Labs Fort Collins)" <dan.magenheimer@xxxxxx>
Date: Sun, 30 Apr 2006 20:24:19 -0700
Delivery-date: Sun, 30 Apr 2006 20:24:34 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-ia64-devel-request@lists.xensource.com?subject=help>
List-id: Discussion of the ia64 port of Xen <xen-ia64-devel.lists.xensource.com>
List-post: <mailto:xen-ia64-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-ia64-devel>, <mailto:xen-ia64-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-ia64-devel>, <mailto:xen-ia64-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcZqDLi/x07wah5RQxq1NEcRCTOfrAAVXNQAAAP4liAAGCMvIAAf8DegABBtzRAAIDqNUAAuJwmg
Thread-topic: [Xen-ia64-devel] PATCH: slightly improve stability
Excellent!  I agree you have found a very difficult bug!
I am now up to 167 linux compiles with no segfaults!
Congratulations Anthony!

One minor suggestion:  I think the new added store can
be in the same cycle as the previous load (no stop bit
needed).  I didn't look at the bundling... perhaps it
doesn't matter.

Dan

> -----Original Message-----
> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx] 
> Sent: Saturday, April 29, 2006 11:44 PM
> To: Magenheimer, Dan (HP Labs Fort Collins); Tristan Gingold; 
> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Williamson, Alex (Linux 
> Kernel Dev)
> Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> 
> >From: Magenheimer, Dan 
> >Sent: 2006年4月29日 21:58
> >To: Xu, Anthony; Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx;
> >Williamson, Alex (Linux Kernel Dev)
> >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> >
> >Hi Anthony --
> >
> >With both Tristan's stability patch and your earlier patch,
> >I have completed 103 linux compiles now with no segfaults
> >yet.   Did you see your segfault with Tristan's patch
> >included?
> >
> >I'll continue running over the weekend with the bits I
> >have but if I see a segfault I will add in the additional
> >store in Xen entry (minstate.h) from your newer patch.
> >
> 
> --- a/linux-2.6-xen-sparse/arch/ia64/xen/xenminstate.h        
> Thu Apr 27 02:55:42 2006
> +++ b/linux-2.6-xen-sparse/arch/ia64/xen/xenminstate.h        
> Sat Apr 29 13:14:58 2006
> @@ -155,6 +155,8 @@
>       ;;                                                      
>                               \
>       ld4 r30=[r8];                                           
>                               \
>       ;;                                                      
>                               \
> +     /* set XSI_INCOMPL_REGFR 0 */                           
>                               \
> +     st4 [r8]=r0;                                            
>                               \
>       cmp.eq  p6,p7=r30,r0;                                   
>                               \
>       ;; /* not sure if this stop bit is necessary */         
>                               \
>  (p6) adds r8=XSI_PRECOVER_IFS-XSI_INCOMPL_REGFR,r8;          
> 
> The additional store is necessary.
> 
> In theory, after Guest executes "cover", incomplete frame 
> changes to complete 
> frame. So Guest should set INCOMPL to 0 just after "cover". 
> At least before guest 
> psr.ic and psr.i are turned on.
> 
> Previously, only when Guest executes "rfi", INCOMPL is set to 
> 0. The window 
> between "cover" and "rfi" causes trouble in below scenario.
> 
> 1. Application A calls system call.
> 
> 2. In OS breaks handler entry, INCOMPL is 0. Due to its system call, 
>    Linux kernel doesn't execute "cover".
> 
> 3. Before returning to Application A, schedule happens, 
> Application B begins
>    to run.
> 
> 4. A TLB miss happens on the context of B, this may make 
> INCOMPL 1, before 
>    Returning to B, (that means "rfi" is not executed, and 
> INCOMPL is still 1)
>   schedule happens again. A resumes to run with INCOMPL 1 
> (this is incorrect now).
> 
> 5. As mentioned before, this is system call, "cover" is executed in 
>    ia64_leave_kernel path.  Because INCOMPL is 1, this 
> "cover" is not actually
>    executed, but this "cover" should be executed.
> 
> 5. Thus application A's frame is destroyed. Issue appears.
> 
> 
> I did catch this scenario.
> 
> Thanks,
> Anthony
> 
> 
> >Dan
> >
> >> -----Original Message-----
> >> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
> >> Sent: Saturday, April 29, 2006 12:03 AM
> >> To: Magenheimer, Dan (HP Labs Fort Collins); Tristan Gingold;
> >> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Williamson, Alex (Linux
> >> Kernel Dev)
> >> Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> >>
> >> Hi Dan,
> >>
> >> Yes, we also got a segmentation fault in 1 run out of 30.
> >>
> >> Could you please try this new patch?
> >>
> >> Thanks,
> >> -Anthony
> >>
> >> >-----Original Message-----
> >> >From: Magenheimer, Dan (HP Labs Fort Collins)
> >> [mailto:dan.magenheimer@xxxxxx]
> >> >Sent: 2006年4月28日 22:49
> >> >To: Xu, Anthony; Tristan Gingold; 
> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx;
> >> >Williamson, Alex (Linux Kernel Dev)
> >> >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> >> >
> >> >Hi Anthony --
> >> >
> >> >I tried your patch overnight and still got a segmentation
> >> >fault in 1 run out of 50.  I didn't try Tristan's patch yet,
> >> >so will try both at the same time next... perhaps there
> >> >are two different problems that show up as the segmentation
> >> >fault.
> >> >
> >> >Dan
> >> >
> >> >> -----Original Message-----
> >> >> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
> >> >> Sent: Thursday, April 27, 2006 9:19 PM
> >> >> To: Xu, Anthony; Tristan Gingold;
> >> >> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan (HP Labs
> >> >> Fort Collins); Williamson, Alex (Linux Kernel Dev)
> >> >> Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> >> >>
> >> >> Hi Tristan,
> >> >> Could you please check whether this patch address RSE issue?
> >> >>
> >> >> Yes, Intel QA team is doing the test in the meantime.
> >> >>
> >> >>
> >> >> Thanks,
> >> >> -Anthony
> >> >>
> >> >> >-----Original Message-----
> >> >> >From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> >> >> >[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On
> >> >> Behalf Of Xu, Anthony
> >> >> >Sent: 2006?4?28? 9:48
> >> >> >To: Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx;
> >> >> Magenheimer, Dan (HP
> >> >> >Labs Fort Collins); Alex Williamson
> >> >> >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> >> >> >
> >> >> >>From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> >> >> >>[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On
> >> >> Behalf Of Tristan
> >> >> >>Gingold
> >> >> >>Sent: 2006?4?27? 23:14
> >> >> >>To: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan
> >> >> (HP Labs Fort
> >> >> >>Collins); Alex Williamson
> >> >> >>Subject: [Xen-ia64-devel] PATCH: slightly improve stability
> >> >> >>
> >> >> >>Hi,
> >> >> >>
> >> >> >>as reported earlier, this patch seems to improve stability:
> >> >> crashes are at
> >> >> >>least more coherent and maybe less frequent.
> >> >> >>
> >> >> >>RSE handling seems to have a bug: crahes are now due to
> >> >> either a bad value in
> >> >> >>a stacked register or a use of an invalid stacked register
> >> >> (although cfm
> >> >> >>seems correct in gdb!)
> >> >> >
> >> >> >I'm looking at this too,
> >> >> >Yes there is a bug about handle_lazy_cover.
> >> >> >
> >> >> >void ia64_do_page_fault (unsigned long address, unsigned
> >> >> long isr, struct
> >> >> >pt_regs *regs, unsigned long itir)
> >> >> >{
> >> >> >       unsigned long iip = regs->cr_iip, iha;
> >> >> >       // FIXME should validate address here
> >> >> >       unsigned long pteval;
> >> >> >       unsigned long is_data = !((isr >> 
> IA64_ISR_X_BIT) & 1UL);
> >> >> >       IA64FAULT fault;
> >> >> >
> >> >> >       if ((isr & IA64_ISR_IR) && handle_lazy_cover(current,
> >> >> isr, regs)) return;
> >> >> >
> >> >> >This code sequence is intended to handle following scenario.
> >> >> >
> >> >> >1. Guest executes br.ret, this may cause mandatory RSE load,
> >> >> and this load may
> >> >> >cause TLB miss.
> >> >> >2. VMM gets control, but VMM can't handle this TLB miss
> >> >> itself, then VMM injects
> >> >> >TLB miss to Guest TLB miss handler, when VMM executing "rfi"
> >> >> to jump to Guest
> >> >> >TLB miss handler, this TLB miss happens again.
> >> >> >3. At this time, interrupt_collection_enabled is 0, so
> >> >> handle_lazy_cover
> >> >> >executes "cover" on behalf of Guest, and return to Guest TLB
> >> >> miss handler again,
> >> >> >this time there is no TLB miss.
> >> >> >
> >> >> >
> >> >> >Following code sequence is in ia64_leave_kernel path with
> >> >> psr.ic and psr.i off.
> >> >> >When br.ret.dptk.many b0 is executed, there may be a
> >> >> mandatory load, thus
> >> >> >There may be a tlb miss, according to above description
> >> >> handle_lazy_cover
> >> >> >executes "cover" on behalf of Guest and return to Guest,
> >> >> this is no correct
> >> >> >in this scenario.
> >> >> >
> >> >> >I didn't find an easy way to fix this bug.
> >> >> >
> >> >> >
> >> >> >       mov loc6=0
> >> >> >       mov loc7=0
> >> >> >(pRecurse) br.call.dptk.few b0=rse_clear_invalid
> >> >> >       ;;
> >> >> >       mov loc8=0
> >> >> >       mov loc9=0
> >> >> >       cmp.ne pReturn,p0=r0,in1        // if recursion count
> >> >> != 0, we need to do a
> >> >> >br.ret
> >> >> >       mov loc10=0
> >> >> >       mov loc11=0
> >> >> >(pReturn) br.ret.dptk.many b0
> >> >> >#endif /* !CONFIG_ITANIUM */
> >> >> >#      undef pRecurse
> >> >> >#      undef pReturn
> >> >> >       ;;
> >> >> >       alloc r17=ar.pfs,0,0,0,0        // drop current 
> register frame
> >> >> >       ;;
> >> >> >       loadrs
> >> >> >
> >> >> >Thanks,
> >> >> >Anthony
> >> >> >
> >> >> >
> >> >> >>
> >> >> >>Tested by doing many linux kernel compilation in SMP
> >> domU (> 100).
> >> >> >>
> >> >> >>Tristan.
> >> >> >
> >> >> >_______________________________________________
> >> >> >Xen-ia64-devel mailing list
> >> >> >Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >> >> >http://lists.xensource.com/xen-ia64-devel
> >> >>
> >>
> 
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel
<Prev in Thread] Current Thread [Next in Thread>