Dan,
Could you double-check that the itr mapping the PAL code is still in place just before
invoking ia64_pal_call_static?
Thanks
-Anthony
>-----Original Message-----
>From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>Sent: December 24, 2005 2:22
>To: Xu, Anthony; Yang, Fred; Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>Subject: RE: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel] Console problem on
>domU on tip?
>
>I got up early and spent several hours trying to debug
>this further. By adding timing loops and other debug code
>and moving all the relevant PAL macros around, I proved
>conclusively that the ia64_pal_call_static assembly routine
>is not returning. Next I added an infinite loop to the ivt
>nested TLB handler (which isn't used by Xen except by some
>fast paths that are currently off). With this loop, the
>error message goes away and Xen "freezes". I think this
>proves that the PAL call is inappropriately accessing some
>(unpinned) data location with psr.ic off.
>
>You should note that this is the only PAL call that requires
>psr.ic to be off. I suspect that OSes need to be prepared
>for the possibility that a fault occurs. Linux is not,
>so it never calls the routine. Xen is not prepared either.
>
>Happy holidays!
>
>Dan
>
>> -----Original Message-----
>> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>> Sent: Thursday, December 22, 2005 7:29 PM
>> To: Magenheimer, Dan (HP Labs Fort Collins); Yang, Fred;
>> Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: RE: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel]
>> Console problem on domU on tip?
>>
>> >With CONFIG_IA64_SPLIT_CACHE on, a new user may encounter
>> >the problem on a shipping machine and the symptom is that
>> >the machine immediately crashes when a domU is launched.
>>
>> Dan,
>> That means dom0 can boot with CONFIG_IA64_SPLIT_CACHE on, and
>> PAL_CACHE_FLUSH has been invoked successfully in the process
>> of booting dom0. So this is not a PAL_CACHE_FLUSH issue; there
>> must be some other issue. Could you provide more information
>> about the crash, since we can't reproduce it?
>>
>> Thanks.
>>
>> -Anthony
>>
>>
>> >-----Original Message-----
>> >From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>> >Sent: December 22, 2005 21:26
>> >To: Yang, Fred; Xu, Anthony; Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >Subject: RE: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel] Console problem on
>> >domU on tip?
>> >
>> >With CONFIG_IA64_SPLIT_CACHE on, a new user may encounter
>> >the problem on a shipping machine and the symptom is that
>> >the machine immediately crashes when a domU is launched.
>> >
>> >With CONFIG_IA64_SPLIT_CACHE off, a developer may encounter
>> >a different problem on an unreleased machine.
>> >
>> >I know that you are focused primarily on the unreleased machine,
>> >but in this case, I think we should be cautious for the new user
>> >as the developer knows to change the option when running
>> >on the unreleased machine.
>> >
>> >I will spend some more time on this when I have a chance.
>> >I think it is a real bug (probably PAL accessing some address
>> >which isn't pinned) that occurs only on some boxes due
>> >to some factor like memory configuration.
>> >
>> >Thanks,
>> >Dan
>> >
>> >P.S. The debug output just before the crash was:
>> >ia64_fault: General Exception: IA-64 Reserved Register/Field fault (data access): reflecting
>> >
>> >> -----Original Message-----
>> >> From: Yang, Fred [mailto:fred.yang@xxxxxxxxx]
>> >> Sent: Wednesday, December 21, 2005 10:34 PM
>> >> To: Magenheimer, Dan (HP Labs Fort Collins); Xu, Anthony;
>> >> Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >> Subject: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel]
>> >> Console problem on domU on tip?
>> >>
>> >> Dan,
>> >>
>> >> Can we suggest always turning on #CONFIG_IA64_SPLIT_CACHE as
>> >> the default build configuration? People may not be aware of
>> >> this build flag and miss it on each new build.
>> >>
>> >> All the newer-generation ia64 processors will come with
>> >> split I/D caches, as discussed in the previous mail thread,
>> >> and the Itanium architecture documents the possibility of
>> >> split caches for future implementations. With the option off
>> >> by default, it is a potential bug for all Tiger4 systems
>> >> used for daily development and for future platforms to come.
>> >>
>> >> Your mail also indicates that only the HP rx2620
>> >> system has the issue, not the other HP boxes. Can you track
>> >> down this issue rather than put in a kludge for the rx2620 box?
>> >>
>> >> Thanks,
>> >>
>> >> -Fred
>> >>
>> >>
>> >> Magenheimer, Dan (HP Labs Fort Collins) wrote:
>> >> > Committed (but without removal of ifdefs until we
>> >> > track down this problem).
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>> >> >> Sent: Monday, December 19, 2005 7:15 PM
>> >> >> To: Magenheimer, Dan (HP Labs Fort Collins); Tian, Kevin;
>> >> >> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >> >> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>> >> >>
>> >> >> I guess the firmware on your machine may not implement
>> >> >> this PAL call because there was no split I/D cache at that
>> >> >> time, so when you make this PAL call, it will return
>> >> >> PAL_STATUS_UNIMPLEMENTED. Could you please turn on
>> >> >> CONFIG_IA64_SPLIT_CACHE and try this new patch to see
>> >> >> whether your machine can boot domain0?
>> >> >> If this patch works, could you please remove all
>> >> >> CONFIG_IA64_SPLIT_CACHE macros?
>> >> >>
>> >> >> Thanks
>> >> >> -Anthony
>> >> >>
>> >> >>> -----Original Message-----
>> >> >>> From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>> >> >>> Sent: December 19, 2005 23:48
>> >> >>> To: Xu, Anthony; Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >> >>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>> >> >>>
>> >> >>> I have been distracted tracking another bug...
>> >> >>>
>> >> >>> Here's where I got:
>> >> >>>
>> >> >>> The machine is a new (April 2005) HP rx2620 so it is
>> >> >>> not old firmware. I can't reproduce it on a machine
>> >> >>> with an ITP (which does have older firmware).
>> >> >>>
>> >> >>> This PAL call is never used in Linux, though there is a
>> >> >>> routine coded for it. It is the only
>> >> >>> PAL call coded in Linux that occurs with psr.ic off.
>> >> >>>
>> >> >>> The crash I am seeing occurs either during the PAL call or
>> >> >>> immediately upon return.
>> >> >>>
>> >> >>> Is it OK to
>> >> >>>
>> >> >>>
>> >> >>>> -----Original Message-----
>> >> >>>> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>> >> >>>> Sent: Monday, December 19, 2005 2:02 AM
>> >> >>>> To: Tian, Kevin; Magenheimer, Dan (HP Labs Fort Collins);
>> >> >>>> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >> >>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>> >> >>>>
>> >> >>>> Dan,
>> >> >>>> Have you had time to verify the discussion below?
>> >> >>>>
>> >> >>>> Thanks
>> >> >>>> -Anthony
>> >> >>>>
>> >> >>>>> -----Original Message-----
>> >> >>>>> From: Tian, Kevin
>> >> >>>>> Sent: December 16, 2005 10:16
>> >> >>>>> To: Xu, Anthony; 'Magenheimer, Dan (HP Labs Fort Collins)';
>> >> >>>>> 'xen-ia64-devel@xxxxxxxxxxxxxxxxxxx'
>> >> >>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>> >> >>>>>
>> >> >>>>>> From: Xu, Anthony
>> >> >>>>>> Sent: December 16, 2005 9:54
>> >> >>>>>>
>> >> >>>>>>> Also, why panic if it fails?
>> >> >>>>>>>
>> >> >>>>>
>> >> >>>>> Panic is not required here; we could just print a warning
>> >> >>>>> message. The panic was previously kept there to help our
>> >> >>>>> debugging in the early stage.
>> >> >>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>> Does the problem happen only on VTI? Or both VTI and
>> >> >>>>>>> non-VTI on split-cache machines?
>> >> >>>>>>
>> >> >>>>>> Sometimes, it makes domain0 crash at the very beginning
>> >> >>>>>> of the domain0 boot process, especially on MP machines.
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> Thanks
>> >> >>>>>> -Anthony
>> >> >>>>>
>> >> >>>>> One addition: that problem definitely exists on new
>> >> >>>>> split-cache processors, for dom0/domU. For VTI domains,
>> >> >>>>> we have logic within the device model to ensure consistency.
>> >> >>>>>
>> >> >>>>> Thanks,
>> >> >>>>> Kevin
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>> -----Original Message-----
>> >> >>>>>>> From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>> >> >>>>>>> Sent: December 16, 2005 1:39
>> >> >>>>>>> To: Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >> >>>>>>> Cc: Xu, Anthony
>> >> >>>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>> >> >>>>>>>
>> >> >>>>>>>>> Is this code fragment necessary for VTI to boot domU
>> >> >>>>>>>>> or is it OK to remove?
>> >> >>>>>>>>
>> >> >>>>>>>> The comment is inaccurate and it should be domU. That I/D
>> >> >>>>>>>> cache sync step is mandatory to boot domU on new IA64
>> >> >>>>>>>> processors which have a split L2 I/D cache. Without such
>> >> >>>>>>>> an I/D cache sync, the control panel loads domU's kernel
>> >> >>>>>>>> image, which only affects the D-side cache. If there are
>> >> >>>>>>>> some stale entries in the I-side cache within the same
>> >> >>>>>>>> range of the dom0 image, people will see the machine
>> >> >>>>>>>> going weird.
>> >> >>>>>>>
>> >> >>>>>>> I don't understand... how can there be stale entries in the
>> >> >>>>>>> I-cache? The instructions have just been written to memory
>> >> >>>>>>> (through D-cache) and no instructions in this domain have
>> >> >>>>>>> yet been executed.
>> >> >>>>>>> I do see that the D-cache needs to be flushed so that
>> >> >>>>>>> memory is coherent but are there better ways to do that
>> >> >>>>>>> without a pal call?
>> >> >>>>>>>
>> >> >>>>>>>> Normally the I/D cache sync shouldn't cause any problem.
>> >> >>>>>>>> Possibly there's some problem with the PAL calling code,
>> >> >>>>>>>> like an incorrect ITLB mapping for PAL or a similar issue...
>> >> >>>>>>>
>> >> >>>>>>> Although the ia64_pal_cache_flush routine is defined in
>> >> >>>>>>> linux's pal.h, it doesn't appear to be used anywhere in
>> >> >>>>>>> Linux so there is no use model to copy. I suspect there
>> >> >>>>>>> is some use model for the call that we don't understand,
>> >> >>>>>>> for example maybe it should only be called with physical
>> >> >>>>>>> &progress? It definitely fails every time on one of my
>> >> >>>>>>> (newer) machines and disabling the pal call makes the
>> >> >>>>>>> problem go away.
>> >> >>>>>>>
>> >> >>>>>>>> Though it's intermittent, please
>> >> >>>>>>>> keep this code
>> >> >>>>>>>> there for correctness.
>> >> >>>>>>>
>> >> >>>>>>> Since the call is definitely failing under some
>> >> >>>>>>> circumstances that we don't understand, I'm inclined to
>> >> >>>>>>> at least put the code in an #ifdef CONFIG_SPLIT_CACHE
>> >> >>>>>>>
>> >> >>>>>>> Does the problem happen only on VTI? Or both VTI and
>> >> >>>>>>> non-VTI on split-cache machines?
>> >> >>>>>>>
>> >> >>>>>>> Thanks,
>> >> >>>>>>> Dan
>> >> >>>>>>>
>> >> >>>>>>> P.S. I tried Anthony's patch (which moves the PAL call
>> >> >>>>>>> after new_thread()) but it still crashes.
>> >>
>> >>
>>
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel