Hi Haitao,
> Can I know how you enabled vPMU on Nehalem? This is not supported in
> current Xen.
http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html
>
> Concerning vpmu support, I totally agree that we can disable this
> feature by default. If anyone really wants to use it, he can use boot
> options to turn it on.
Yes, that's OK for me.
> I am preparing a patch for that. And I will
> send a patch to enable NHM vpmu together.
>
> For the problem that Dietmar met, I think I once met this before. Can
> you add some code in vpmu_do_interrupt that sets the counter you are
> using to a value other than zero? Please let me know if that can help.
I don't set the counter to zero. I write 0 - val to the counter, so that it
overflows after val events.
Actually I tested on Nehalem with
- General Perf-counter #2 (0xc3) with CPU_CLK_UNHALTED and val=1100000
- Fixed counter #1 (0x30a) and val=1100000
The thing is that in normal case the overflows of both counters appear
nearly at the same time.
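To illustrate the "0 - val" preload, here is a minimal sketch. The helper names are hypothetical (they are not Xen functions), and the 48-bit counter width is an assumption for the example; real code should take the width from CPUID leaf 0xA.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed counter width for this example only. */
#define PMC_WIDTH 48

/* Hypothetical helper: start value that makes a counter overflow after
 * `period` events, i.e. (0 - period) truncated to the counter width. */
static uint64_t pmc_preload(uint64_t period, unsigned int width)
{
    assert(width > 0 && width < 64);
    uint64_t mask = (UINT64_C(1) << width) - 1;
    return (0 - period) & mask;
}

/* Hypothetical helper: number of events left until `value` wraps to zero. */
static uint64_t pmc_events_until_overflow(uint64_t value, unsigned int width)
{
    assert(width > 0 && width < 64);
    uint64_t mask = (UINT64_C(1) << width) - 1;
    return (mask - (value & mask)) + 1;
}
```

With val=1100000 and a 48-bit counter, pmc_preload() yields 0xFFFFFFEF3720, and pmc_events_until_overflow() on that value gives back 1100000.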
As described, I added some extra trace points for xentrace to
core2_vpmu_do_interrupt(), so the code looks like:
    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);          /* 1. Step */
    {
        uint32_t HAHN_l, HAHN_h;
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);           /* 2. Step */
    }
    if ( !msr_content )
        return 0;
    core2_vpmu_cxt->global_ovf_status |= msr_content;
    msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
    wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);        /* 3. Step */
    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);          /* 4. Step */
    {
        uint32_t HAHN_l, HAHN_h;
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);         /* 5. Step */
        rdmsrl(0xc3, msr_content);      /* 6. Step: general counter #2 */
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);
        rdmsrl(0x30a, msr_content);     /* 7. Step: fixed counter #1 */
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
    }
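As a cross-check of the value written in step 3, the following standalone sketch rebuilds the mask outside of Xen. The function names are hypothetical, and the count of four general counters used in the note below is an assumption for the example (in the handler it comes from core2_get_pmc_count()).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: value written to MSR_CORE_PERF_GLOBAL_OVF_CTRL.
 * Bits 0..pmc_count-1 clear the general counter overflow bits,
 * bits 32..34 clear the fixed counter 0..2 overflow bits,
 * bits 62 and 63 clear the OvfBuffer and CondChgd indicators. */
static uint64_t ovf_ctrl_clear_mask(unsigned int pmc_count)
{
    assert(pmc_count < 32);
    return UINT64_C(0xC000000700000000) | ((UINT64_C(1) << pmc_count) - 1);
}

/* Hypothetical helper: returns 1 if `mask` requests clearing of `bit`. */
static int mask_clears_bit(uint64_t mask, unsigned int bit)
{
    assert(bit < 64);
    return (int)((mask >> bit) & 1);
}
```

For four general counters the mask is 0xC00000070000000F. It contains bit 2 (general counter #2) as well as bit 33 (fixed counter #1), so the write in step 3 should request clearing of both overflow bits.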
With these trace points I got the following output:

Last good NMI:
Both counters caused the NMI. Resetting works OK.
The counters themselves kept running.
2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
5. Step: par1 = 0x0a,  high = 0x0000, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
6. Step: par1 = 0xc3,  high = 0x0000, low = 0x03c4 ] rdmsrl(0xc3) -> #2 general counter
7. Step: par1 = 0x30a, high = 0x0000, low = 0x02da ] rdmsrl(0x30a) -> #1 fixed counter

NMI where things go wrong:
Both counters caused the NMI. Resetting does NOT work correctly; it works only
for the general counter!
The general counter (which caused the NMI) seems to be stopped!
2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec ] rdmsrl(0xc3) -> #2 general counter
7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ] rdmsrl(0x30a) -> #1 fixed counter

Wrong NMI:
Only the fixed counter caused the NMI (which was not reset during the NMI
handling above!).
Both counters seem to be stopped!
2. Step: par1 = 0x01,  high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec ] rdmsrl(0xc3) -> #2 general counter
7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ] rdmsrl(0x30a) -> #1 fixed counter
And this state remains forever!
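
For reading the high/low words in the traces above, a small decoding sketch may help. The helper names are hypothetical; the bit layout assumed here is that of MSR_CORE_PERF_GLOBAL_STATUS, with bit n for general counter n and bit 32+n for fixed counter n.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rebuild the 64-bit MSR value from the two traced words. */
static uint64_t join_words(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Hypothetical helper: 1 if general counter `n` has its overflow bit set. */
static int pmc_overflowed(uint64_t status, unsigned int n)
{
    assert(n < 32);
    return (int)((status >> n) & 1);
}

/* Hypothetical helper: 1 if fixed counter `n` has its overflow bit set. */
static int fixed_overflowed(uint64_t status, unsigned int n)
{
    assert(n < 3);
    return (int)((status >> (32 + n)) & 1);
}
```

Read this way, high = 0x0002, low = 0x0004 means that general counter #2 and fixed counter #1 both overflowed, and high = 0x0002, low = 0x0000 means that only the overflow bit of fixed counter #1 is still set.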
I hope my explanations are understandable ;-)
Until now I can see this behavior only on a Nehalem processor.
Thanks.
Dietmar
>
> Best Regards
> Shan Haitao
>
> 2009/10/30 Keir Fraser <keir.fraser@xxxxxxxxxxxxx>:
> > On 30/10/2009 12:20, "Dietmar Hahn" <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
> >
> >> I searched the Intel processor spec but couldn't find any help.
> >> So my question is, what is wrong here?
> >> Can anybody with more knowledge point me in the right direction, what
> >> can I still do to find the real cause of this?
> >
> > You should probably Cc one of the Intel guys who implemented this stuff --
> > I've added Haitao Shan.
> >
> > Meanwhile I'd be interested to know whether things work okay for you, minus
> > performance counters and the hypervisor hang, if you return immediately from
> > vpmu_initialise(). Really at minimum we need such a fix, perhaps with a boot
> > parameter to re-enable the feature, for 3.4.2 release; allowing guests to
> > hose the hypervisor like this is of course not on.
> >
> > -- Keir
> >
--
Company details: http://ts.fujitsu.com/imprint.html