With this new patch (not including Tristan's stability patch so far), we have
successfully finished 50 Linux compiles.
We'll continue testing.
Thanks,
-Anthony
>-----Original Message-----
>From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>Sent: April 30, 2006 0:13
>To: Magenheimer, Dan (HP Labs Fort Collins); Xu, Anthony; Tristan Gingold;
>xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Williamson, Alex (Linux Kernel Dev)
>Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
>
>Argh! After 103 successful linux compiles, two of the
>next 10 had a segfault. Restarting again with Anthony's
>updated patch (plus Tristan's stability patch)...
>
>> -----Original Message-----
>> From: Magenheimer, Dan (HP Labs Fort Collins)
>> Sent: Saturday, April 29, 2006 7:58 AM
>> To: 'Xu, Anthony'; Tristan Gingold;
>> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Williamson, Alex (Linux
>> Kernel Dev)
>> Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
>>
>> Hi Anthony --
>>
>> With both Tristan's stability patch and your earlier patch,
>> I have completed 103 linux compiles now with no segfaults
>> yet. Did you see your segfault with Tristan's patch
>> included?
>>
>> I'll continue running over the weekend with the bits I
>> have but if I see a segfault I will add in the additional
>> store in Xen entry (minstate.h) from your newer patch.
>>
>> Dan
>>
>> > -----Original Message-----
>> > From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>> > Sent: Saturday, April 29, 2006 12:03 AM
>> > To: Magenheimer, Dan (HP Labs Fort Collins); Tristan Gingold;
>> > xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Williamson, Alex (Linux
>> > Kernel Dev)
>> > Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
>> >
>> > Hi Dan,
>> >
>> > Yes, we also got a segmentation fault in 1 run out of 30.
>> >
>> > Could you please try this new patch?
>> >
>> > Thanks,
>> > -Anthony
>> >
>> > >-----Original Message-----
>> > >From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>> > >Sent: April 28, 2006 22:49
>> > >To: Xu, Anthony; Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx;
>> > >Williamson, Alex (Linux Kernel Dev)
>> > >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
>> > >
>> > >Hi Anthony --
>> > >
>> > >I tried your patch overnight and still got a segmentation
>> > >fault in 1 run out of 50. I didn't try Tristan's patch yet,
>> > >so will try both at the same time next... perhaps there
>> > >are two different problems that show up as the segmentation
>> > >fault.
>> > >
>> > >Dan
>> > >
>> > >> -----Original Message-----
>> > >> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>> > >> Sent: Thursday, April 27, 2006 9:19 PM
>> > >> To: Xu, Anthony; Tristan Gingold;
>> > >> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan (HP Labs
>> > >> Fort Collins); Williamson, Alex (Linux Kernel Dev)
>> > >> Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
>> > >>
>> > >> Hi Tristan,
>> > >> Could you please check whether this patch addresses the RSE issue?
>> > >>
>> > >> Yes, the Intel QA team is running the test in the meantime.
>> > >>
>> > >>
>> > >> Thanks,
>> > >> -Anthony
>> > >>
>> > >> >-----Original Message-----
>> > >> >From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
>> > >> >[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Xu, Anthony
>> > >> >Sent: April 28, 2006 9:48
>> > >> >To: Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan (HP
>> > >> >Labs Fort Collins); Alex Williamson
>> > >> >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
>> > >> >
>> > >> >>From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
>> > >> >>[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Tristan Gingold
>> > >> >>Sent: April 27, 2006 23:14
>> > >> >>To: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan (HP Labs Fort
>> > >> >>Collins); Alex Williamson
>> > >> >>Subject: [Xen-ia64-devel] PATCH: slightly improve stability
>> > >> >>
>> > >> >>Hi,
>> > >> >>
>> > >> >>as reported earlier, this patch seems to improve stability: crashes are at
>> > >> >>least more coherent and maybe less frequent.
>> > >> >>
>> > >> >>RSE handling seems to have a bug: crashes are now due to either a bad value in
>> > >> >>a stacked register or a use of an invalid stacked register (although cfm
>> > >> >>seems correct in gdb!)
>> > >> >
>> > >> >I'm looking at this too.
>> > >> >Yes, there is a bug related to handle_lazy_cover.
>> > >> >
>> > >> >void ia64_do_page_fault (unsigned long address, unsigned long isr,
>> > >> >                         struct pt_regs *regs, unsigned long itir)
>> > >> >{
>> > >> > unsigned long iip = regs->cr_iip, iha;
>> > >> > // FIXME should validate address here
>> > >> > unsigned long pteval;
>> > >> > unsigned long is_data = !((isr >> IA64_ISR_X_BIT) & 1UL);
>> > >> > IA64FAULT fault;
>> > >> >
>> > >> > if ((isr & IA64_ISR_IR) && handle_lazy_cover(current, isr, regs)) return;
>> > >> >
>> > >> >This code sequence is intended to handle the following scenario.
>> > >> >
>> > >> >1. Guest executes br.ret, which may cause a mandatory RSE load, and this
>> > >> >load may cause a TLB miss.
>> > >> >2. VMM gets control, but VMM can't handle this TLB miss itself, so it injects
>> > >> >the TLB miss into the Guest TLB miss handler. When VMM executes "rfi" to jump
>> > >> >to the Guest TLB miss handler, the TLB miss happens again.
>> > >> >3. At this time, interrupt_collection_enabled is 0, so handle_lazy_cover
>> > >> >executes "cover" on behalf of Guest and returns to the Guest TLB miss
>> > >> >handler again; this time there is no TLB miss.
>> > >> >
>> > >> >
>> > >> >The following code sequence is in the ia64_leave_kernel path, with psr.ic and
>> > >> >psr.i off. When br.ret.dptk.many b0 is executed, there may be a mandatory
>> > >> >load, and thus there may be a TLB miss. According to the description above,
>> > >> >handle_lazy_cover executes "cover" on behalf of Guest and returns to Guest,
>> > >> >which is not correct in this scenario.
>> > >> >
>> > >> >I didn't find an easy way to fix this bug.
>> > >> >
>> > >> >
>> > >> > mov loc6=0
>> > >> > mov loc7=0
>> > >> >(pRecurse) br.call.dptk.few b0=rse_clear_invalid
>> > >> > ;;
>> > >> > mov loc8=0
>> > >> > mov loc9=0
>> > >> > cmp.ne pReturn,p0=r0,in1 // if recursion count != 0, we need to do a br.ret
>> > >> > mov loc10=0
>> > >> > mov loc11=0
>> > >> >(pReturn) br.ret.dptk.many b0
>> > >> >#endif /* !CONFIG_ITANIUM */
>> > >> ># undef pRecurse
>> > >> ># undef pReturn
>> > >> > ;;
>> > >> > alloc r17=ar.pfs,0,0,0,0 // drop current register frame
>> > >> > ;;
>> > >> > loadrs
>> > >> >
>> > >> >Thanks,
>> > >> >Anthony
>> > >> >
>> > >> >
>> > >> >>
>> > >> >>Tested by doing many Linux kernel compilations in an SMP domU (> 100).
>> > >> >>
>> > >> >>Tristan.
>> > >> >
>> > >> >_______________________________________________
>> > >> >Xen-ia64-devel mailing list
>> > >> >Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> > >> >http://lists.xensource.com/xen-ia64-devel
>> > >>
>> >
>>
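
For readers following the quoted analysis of handle_lazy_cover, here is a
minimal C sketch of the check as it is described in the thread. It is only a
model under stated assumptions: the model_* names, the struct fields, and the
in_leave_kernel flag are hypothetical and are not taken from the Xen source,
and the bit position used for MODEL_ISR_IR is an assumption standing in for
the real IA64_ISR_IR. What it tries to show is that the check looks only at
isr.ir and interrupt_collection_enabled, so it cannot tell the intended case
(rfi into the Guest TLB miss handler) from the ia64_leave_kernel case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: stands in for IA64_ISR_IR; the bit position is for the model only. */
#define MODEL_ISR_IR (1UL << 38)

/* Hypothetical, simplified stand-in for the per-vcpu state. */
struct model_vcpu {
    bool interrupt_collection_enabled; /* Guest's virtual psr.ic */
    bool in_leave_kernel;              /* model-only: Guest is in ia64_leave_kernel */
    bool cover_done_for_guest;         /* set when the model "covers" for Guest */
};

/*
 * Model of the lazy-cover decision: with interruption collection off,
 * perform "cover" on behalf of Guest and ask the caller to retry.
 * Note that in_leave_kernel is never consulted; the decision has no
 * way to know which of the two scenarios it is in.
 */
bool model_handle_lazy_cover(struct model_vcpu *v)
{
    assert(v != NULL);
    if (!v->interrupt_collection_enabled) {
        v->cover_done_for_guest = true;
        return true;   /* fault absorbed, retry the same instruction */
    }
    return false;      /* fall through to normal page-fault handling */
}

/* Model of the first check in ia64_do_page_fault as quoted in the thread. */
bool model_page_fault_absorbed(struct model_vcpu *v, unsigned long isr)
{
    return (isr & MODEL_ISR_IR) && model_handle_lazy_cover(v);
}
```

In this model, a vcpu with interrupt_collection_enabled == false gets the
same answer from model_page_fault_absorbed whether in_leave_kernel is true or
false, which is the ambiguity the quoted message points out. A real fix would
need some extra piece of state to separate the two cases; the thread does not
say what that would be.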