RE: [Xen-devel] Need help in debugging partially blocked hypervisor

See my comments embedded. :)

Haitao


Dietmar Hahn wrote:
> The conclusion is that this seems to be a workaround for the endless
> NMI loop. PMIs are very rare events, so this should not cause a
> performance problem.
I totally agree that approach 1 is only a workaround.

> 
> I didn't try your second approach
>> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask *physical
>> PMI* when guest vcpu unmasks virtual PMI.
> but I have some questions:
> 
> - What if the 'physical PMI' is not unmasked in vpmu_do_interrupt and
>   a watchdog NMI would occur before the domU unmasks it?
I think the second NMI will be lost.

> - Is it possible that after handling the NMI (and not unmasking)
>   another domU got running on this CPU and therefore PMI's got lost?
The LVTPC entry in the physical local APIC is saved/restored by Xen on VCPU
switches, so unmasking (or not unmasking) the PMI for one vcpu should have no
impact on another vcpu. When developing vPMU, I treated both the PMU MSRs and
the LVTPC entry in the local APIC as vPMU context. The vPMU context is
saved/restored on the physical HW when vcpus are scheduled, either actively or
lazily (depending on the PMU usage at the time of the switch).
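
To make the context-switch idea concrete, here is a minimal sketch in C. It is a toy model, not the Xen code: the struct and function names (phys_pmu, vpmu_ctx, vpmu_save, vpmu_load, vpmu_switch), the four-entry MSR array and the in_use flag are all made up for illustration, and the lazy path is reduced to "skip vcpus that do not use the PMU".

```c
#include <stdint.h>

#define NUM_PMU_MSRS 4            /* hypothetical size of the modelled MSR set */
#define LVT_MASKED   (1u << 16)   /* mask bit of a local APIC LVT entry */

/* Stand-in for the PMU state of one physical CPU. */
struct phys_pmu {
    uint64_t msr[NUM_PMU_MSRS];
    uint32_t lvtpc;
};

/* Per-vcpu vPMU context: the PMU MSRs plus the LVTPC entry. */
struct vpmu_ctx {
    uint64_t msr[NUM_PMU_MSRS];
    uint32_t lvtpc;
    int      in_use;              /* guest has programmed the PMU */
};

/* Copy hardware state into the context of the vcpu being descheduled. */
static void vpmu_save(struct vpmu_ctx *c, const struct phys_pmu *hw)
{
    for (int i = 0; i < NUM_PMU_MSRS; i++)
        c->msr[i] = hw->msr[i];
    c->lvtpc = hw->lvtpc;
}

/* Copy the context of the vcpu being scheduled into hardware. */
static void vpmu_load(const struct vpmu_ctx *c, struct phys_pmu *hw)
{
    for (int i = 0; i < NUM_PMU_MSRS; i++)
        hw->msr[i] = c->msr[i];
    hw->lvtpc = c->lvtpc;
}

/*
 * Switch from prev to next. Assumption of this model: a vcpu that does
 * not use the PMU is neither saved nor loaded, and the physical PMI is
 * simply left masked while it runs.
 */
static void vpmu_switch(struct vpmu_ctx *prev, struct vpmu_ctx *next,
                        struct phys_pmu *hw)
{
    if (prev->in_use)
        vpmu_save(prev, hw);
    if (next->in_use)
        vpmu_load(next, hw);
    else
        hw->lvtpc |= LVT_MASKED;
}
```

In this model the mask bit that one vcpu leaves in LVTPC is replaced by the next vcpu's own value on every switch, which is why one vcpu's masking should not leak into another.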

> 
> But the real cause of the problem is unknown. As said I saw this only
> on Nehalem. Maybe there is a problem together with the hardware?
> Perhaps your hardware colleagues know something more ;-)
When I found this problem, I just thought it might be a corner case that only
happens on my box (of course, I have only seen this on NHM, too).
I will try to ping the HW guys to see if they have any explanation, since it
has proven to be a general problem on NHM.

Until everything is clear, I think approach 2 is the better solution for now.
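
A rough sketch of what approach 2 means, again as a toy model in C rather than a patch. The names (vcpu_pmi, hw_deliver_pmi, pmi_handler_approach2, guest_write_lvtpc, phys_pmi_enabled) are invented for this example; the only hardware fact relied on is that delivering a PMI sets the mask bit in the LVTPC entry.

```c
#include <stdint.h>

#define LVT_MASKED (1u << 16)     /* mask bit of a local APIC LVT entry */

/* Toy state for one vcpu running on one physical CPU. */
struct vcpu_pmi {
    uint32_t phys_lvtpc;          /* LVTPC of the physical local APIC */
    uint32_t virt_lvtpc;          /* LVTPC of the guest's virtual local APIC */
    int      pending;             /* virtual PMI waiting to be injected */
};

/* Hardware behaviour: delivering a PMI sets the mask bit in LVTPC. */
static void hw_deliver_pmi(struct vcpu_pmi *v)
{
    v->phys_lvtpc |= LVT_MASKED;
}

/*
 * Approach 2, handler side: queue the virtual PMI for the guest but do
 * NOT unmask the physical PMI here.
 */
static void pmi_handler_approach2(struct vcpu_pmi *v)
{
    v->pending = 1;
}

/*
 * Approach 2, guest side: when the guest writes its virtual LVTPC with
 * the mask bit clear, unmask the physical PMI as well.
 */
static void guest_write_lvtpc(struct vcpu_pmi *v, uint32_t val)
{
    v->virt_lvtpc = val;
    if (!(val & LVT_MASKED))
        v->phys_lvtpc &= ~LVT_MASKED;
}

/* Nonzero when the physical PMI can be delivered again. */
static int phys_pmi_enabled(const struct vcpu_pmi *v)
{
    return !(v->phys_lvtpc & LVT_MASKED);
}
```

The point of the model is the ordering: the physical PMI stays masked from delivery until the guest itself unmasks, by which time the guest has normally reloaded the counter.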

> 
> Thanks
> Dietmar
> 
>> 
>>> 
>>> When I met this problem, I remember that I tried two approaches:
>>> 1> Setting the counter to non-zero before unmasking PMI in
>>>    vpmu_do_interrupt;
>>> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask *physical
>>>    PMI* when guest vcpu unmasks virtual PMI.
>>> I remember that approach 2 can fix this issue. But I do not
>>> remember the result of approach 1, since I met this about one year
>>> ago.
>>> It is my understanding that approach 2 is much the same as approach
>>> 1, since normally the guest will set the counter to some negative
>>> value (for example, -100000) before unmasking virtual PMI.
>>> However, approach 2 looks cleaner and more reasonable.
>>> 
>>> Can you give it a try and let me know the result? If neither
>>> works, there might be some problem that I have not met before.
>>> 
>>> BTW: Sorry, I did not see your patch to enable NHM vpmu before. So,
>>> there is no need for me to work on that now. :) 
>>> 
>>> Haitao
>>> 
>>> 
>>> Dietmar Hahn wrote:
>>>> Hi Haitao,
>>>> 
>>>>> Can I know how you enabled vPMU on Nehalem? This is not supported
>>>>> in current Xen.
>>>> 
>>>> http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html
>>>> 
>>>>> 
>>>>> Concerning vpmu support, I totally agree that we can disable this
>>>>> feature by default. If anyone really wants to use it, he can use
>>>>> boot options to turn it on.
>>>> 
>>>> Yes, that's OK for me.
>>>> 
>>>>> I am preparing a patch for that. And I will
>>>>> send a patch to enable NHM vpmu together.
>>>>> 
>>>>> For the problem that Dietmar met, I think I once met this before.
>>>>> Can you add some code in vpmu_do_interrupt that sets the counter
>>>>> you are using to a value other than zero? Please let me know if
>>>>> that can help.
>>>> 
>>>> I don't set the counter to zero. I use 0-val to set the counter.
>>>> Actually I tested on Nehalem with
>>>> - General Perf-counter #2 (0xc3) with CPU_CLK_UNHALTED and
>>>>   val=1100000
>>>> - Fixed counter #1 (0x30a) and val=1100000
>>>> The thing is that in the normal case the overflows of both counters
>>>> appear at nearly the same time. As described, I added some extra
>>>> tracers for xentrace in core2_vpmu_do_interrupt(), so the code looks
>>>> like:
>>>> 
>>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     /* 1. Step */
>>>>     { uint32_t HAHN_l, HAHN_h;
>>>>         HAHN_l = (uint32_t) msr_content;
>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
>>>>         HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);      /* 2. Step */
>>>>     }
>>>>     if ( !msr_content )
>>>>         return 0;
>>>>     core2_vpmu_cxt->global_ovf_status |= msr_content;
>>>>     msr_content = 0xC000000700000000 |
>>>>                   ((1 << core2_get_pmc_count()) - 1);
>>>>     wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);   /* 3. Step */
>>>>
>>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     /* 4. Step */
>>>>     { uint32_t HAHN_l, HAHN_h;
>>>>         HAHN_l = (uint32_t) msr_content;
>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
>>>>         HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);    /* 5. Step */
>>>>
>>>>         rdmsrl(0xc3, msr_content);    /* 6. Step: General counter #2 */
>>>>         HAHN_l = (uint32_t) msr_content;
>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
>>>>         HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);
>>>>         rdmsrl(0x30a, msr_content);   /* 7. Step: Fixed counter #1 */
>>>>         HAHN_l = (uint32_t) msr_content;
>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
>>>>         HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
>>>>     }
>>>> 
>>>> With these tracers I got the following output:
>>>> 
>>>> Last good NMI:
>>>> Both counters caused the NMI. Resetting works OK.
>>>> The counters themselves kept running.
>>>> 2. Step: par1 = 0x01,  high = 0x0002, low =  0x0004 ]
>>>> rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>> 5. Step: par1 = 0x0a,  high = 0x0000, low =  0x0000 ]
>>>> rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>> 6. Step: par1 = 0xc3,  high = 0x0000, low =  0x03c4 ] 
>>>> rdmsrl(0xc3) -> #2 general counter 
>>>> 7. Step: par1 = 0x30a, high = 0x0000, low =  0x02da ] 
>>>> rdmsrl(0x30a) -> #1 fixed counter 
>>>> 
>>>> NMI from where things go wrong:
>>>> Both counters caused the NMI. Resetting does NOT work correctly,
>>>> only for the general counter! The general counter (which caused
>>>> the NMI) seems to be stopped!
>>>> 2. Step: par1 = 0x01,  high = 0x0002, low =  0x0004 ]
>>>> rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>> 5. Step: par1 = 0x0a,  high = 0x0002, low =  0x0000 ]
>>>> rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>> 6. Step: par1 = 0xc3,  high = 0x0000, low =  0x00ec ] 
>>>> rdmsrl(0xc3) -> #2 general counter 
>>>> 7. Step: par1 = 0x30a, high = 0x0000, low =  0x0000 ] 
>>>> rdmsrl(0x30a) -> #1 fixed counter 
>>>> 
>>>> Wrong NMI:
>>>> Only the fixed counter caused the NMI (which was not reset
>>>> during the NMI handling above!). Both counters seem to be stopped!
>>>> 2. Step: par1 = 0x01,  high = 0x0002, low =  0x0000 ]
>>>> rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>> 5. Step: par1 = 0x0a,  high = 0x0002, low =  0x0000 ]
>>>> rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>> 6. Step: par1 = 0xc3,  high = 0x0000, low =  0x00ec ] 
>>>> rdmsrl(0xc3) -> #2 general counter 
>>>> 7. Step: par1 = 0x30a, high = 0x0000, low =  0x0000 ] 
>>>> rdmsrl(0x30a) -> #1 fixed counter 
>>>> 
>>>> And this state remains forever!
>>>> I hope my explanations are understandable ;-)
>>>> 
>>>> Until now I can see this behavior only on a Nehalem processor.
>>>> 
>>>> Thanks.
>>>> Dietmar
>>>> 
>>>>> 
>>>>> Best Regards
>>>>> Shan Haitao
>>>>> 
>>>>> 2009/10/30 Keir Fraser <keir.fraser@xxxxxxxxxxxxx>:
>>>>>> On 30/10/2009 12:20, "Dietmar Hahn"
>>>>>> <dietmar.hahn@xxxxxxxxxxxxxx> wrote: 
>>>>>> 
>>>>>>> I searched the intel processor spec but couldn't find any help.
>>>>>>> So my question is, what is wrong here?
>>>>>>> Can anybody with more knowledge point me in the right direction,
>>>>>>> what can I still do to find the real cause of this?
>>>>>> 
>>>>>> You should probably Cc one of the Intel guys who implemented this
>>>>>> stuff -- I've added Haitao Shan.
>>>>>> 
>>>>>> Meanwhile I'd be interested to know whether things work okay for
>>>>>> you, minus performance counters and the hypervisor hang, if you
>>>>>> return immediately from vpmu_initialise(). Really at minimum we
>>>>>> need such a fix, perhaps with a boot parameter to re-enable the
>>>>>> feature, for 3.4.2 release; allowing guests to hose the
>>>>>> hypervisor like this is of course not on.
>>>>>> 
>>>>>>  -- Keir
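
For reference, the status values in Dietmar's traces quoted above (for example high = 0x0002, low = 0x0004) can be decoded with a few helpers. This is only a sketch in C: the helper names are made up, pmc_count stands in for the value returned by core2_get_pmc_count(), and the 48-bit counter width used in the note after the code is an assumption about the Nehalem counters. The bit layout (general counter n at bit n, fixed counter n at bit 32 + n) follows the architectural definition of the global status MSR.

```c
#include <stdint.h>

/* Rebuild the 64-bit MSR value from the two 32-bit halves in the trace. */
static uint64_t join_halves(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Bit n of the global status: overflow of general-purpose counter n. */
static int pmc_overflowed(uint64_t status, unsigned n)
{
    return (int)((status >> n) & 1);
}

/* Bit 32 + n of the global status: overflow of fixed counter n. */
static int fixed_overflowed(uint64_t status, unsigned n)
{
    return (int)((status >> (32 + n)) & 1);
}

/*
 * Value written to the overflow control MSR to clear everything: the
 * constant from the quoted code (bits 63, 62 and 34:32) plus one bit
 * per general-purpose counter. pmc_count is a plain parameter here.
 */
static uint64_t ovf_ctrl_clear_all(unsigned pmc_count)
{
    return 0xC000000700000000ULL | ((1ULL << pmc_count) - 1);
}

/*
 * Events left until a counter of `width` bits wraps to zero, given its
 * current value. Assumes width < 64.
 */
static uint64_t events_until_overflow(uint64_t value, unsigned width)
{
    uint64_t mask = (1ULL << width) - 1;
    return (mask - (value & mask)) + 1;
}
```

With these helpers, high = 0x0002 / low = 0x0004 reads as "fixed counter #1 and general counter #2 overflowed", and high = 0x0002 / low = 0x0000 after the write to the overflow control MSR reads as "the fixed counter #1 overflow bit was not cleared". Loading a counter with 0 - 1100000 and assuming a 48-bit width leaves 1100000 events until the next overflow.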