To:	xen-devel@xxxxxxxxxxxxxxxxxxx
Subject:	Re: [Xen-devel] Need help in debugging partially blocked hypervisor
From:	Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>
Date:	Tue, 3 Nov 2009 10:03:32 +0100
Cc:	"Shan, Haitao" <haitao.shan@xxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Delivery-date:	Tue, 03 Nov 2009 01:04:03 -0800
Envelope-to:	www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to:	<61563CE63B4F854986A895DA7AD3C17709DED5E080@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
List-help:	<mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id:	Xen developer discussion <xen-devel.lists.xensource.com>
List-post:	<mailto:xen-devel@lists.xensource.com>
List-subscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References:	<200910301320.40125.dietmar.hahn@xxxxxxxxxxxxxx> <200911030924.16374.dietmar.hahn@xxxxxxxxxxxxxx> <61563CE63B4F854986A895DA7AD3C17709DED5E080@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent:	KMail/1.12.2 (Linux/2.6.27.29-0.1-pae; KDE/4.3.1; i686; ; )

> No problem. 
> Can you help to test? I have no test box at hand now, which might cause delay.
> 

Sure :-)
Dietmar.

> Haitao
> 
> 
> Dietmar Hahn wrote:
> >> I suspect the guest will reproduce this PMI loop if guest behaves as
> >> you said in this email. But as far as I know, VTune and oprofile do
> >> not behave like that.  
> >> Of course, this approach is still like a workaround (unless I get
> >> confirmation that the HW requires it). This approach is preferable
> >> because it does not change the contents of MSRs. Thus, we have no
> >> impact on guest software that does rely on reading the correct value
> >> from HW. Approach 1 existed just because we knew that in event-based
> >> sampling, counter value on receiving PMI was not used by
> >> OProfile/VTune at all and it was safe to set the counter to some
> >> non-zero value.       
> >> 
> >> Haitao
> >> 
> > 
> > OK, then will you send a patch?
> > Dietmar.
> > 
> >> 
> >> Dietmar Hahn wrote:
> >>> Please see below.
> >>> 
> >>>> See my comments embedded. :)
> >>>> 
> >>>> Haitao
> >>>> 
> >>>> 
> >>>> Dietmar Hahn wrote:
> >>>>> The conclusion is that this seems to be a workaround for the
> >>>>> endless NMI loop. PMIs are very rare events, so this should
> >>>>> not cause a performance problem.
> >>>> I totally agree that this is only a workaround for approach 1.
> >>>> 
> >>>>> 
> >>>>> I didn't try your second approach
> >>>>>> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask
> >>>>>> *physical PMI* when guest vcpu unmasks virtual PMI.
> >>>>> but I have some questions.
> >>>>> 
> >>>>> - What if the 'physical PMI' is not unmasked in vpmu_do_interrupt
> >>>>>   and a watchdog NMI would occur before the domU unmasks it?
> >>>> I think the second NMI will be lost.
> >>>> 
> >>>>> - Is it possible that after handling the NMI (and not unmasking)
> >>>>>   another domU got running on this CPU and therefore PMIs got
> >>>>>   lost?
> >>>> The LVTPC entry in the physical local APIC is saved/restored by Xen
> >>>> on VCPU switches. So unmasking (or not) of the PMI of one vcpu should
> >>>> have no impact on another vcpu. When developing vPMU, I treated both
> >>>> the PMU MSRs and the LVTPC entry in the local APIC as vPMU context.
> >>>> The vPMU context is saved/restored on the physical HW when a vcpu is
> >>>> scheduled, either in an active save/restore manner or a lazy one
> >>>> (depending on the PMU usage at the time of the switch).
> >>>> 
> >>>>> 
> >>>>> But the real cause of the problem is unknown. As said I saw this
> >>>>> only on Nehalem. Maybe there is a problem together with the
> >>>>> hardware? Perhaps your hardware colleagues know something more ;-)
> >>>> When I found this problem, I just thought it might be a corner case
> >>>> that only happens on my box (of course, I only see this on NHM,
> >>>> too). I will try to ping the HW guys to see if there is any
> >>>> explanation, since it is proven to be a general problem on NHM.
> >>>> 
> >>>> But before everything is clear, I think approach 2 is a better
> >>>> solution now.
> >>> 
> >>> What would be the effect if the guest unmasks the PMI (which leads
> >>> to unmasking the 'physical PMI') but doesn't reset the counter to a
> >>> value != 0? Would the guest then be able to trigger the endless NMI loop?
> >>> 
> >>> Dietmar.
> >>> 
> >>>> 
> >>>>> 
> >>>>> Thanks
> >>>>> Dietmar
> >>>>> 
> >>>>>> 
> >>>>>>> 
> >>>>>>> When I met this problem, I remember that I tried two approaches:
> >>>>>>> 1> Setting the counter to non-zero before unmasking PMI in
> >>>>>>>    vpmu_do_interrupt;
> >>>>>>> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask
> >>>>>>>    *physical PMI* when guest vcpu unmasks virtual PMI.
> >>>>>>> I remember that approach 2 can fix this issue. But I do not
> >>>>>>> remember the result of approach 1, since I met this about one
> >>>>>>> year ago. It is my understanding that approach 2 is much the same
> >>>>>>> as approach 1, since normally the guest will set the counter to some
> >>>>>>> negative value (for example, -100000) before unmasking virtual
> >>>>>>> PMI. However, approach 2 looks cleaner and more reasonable.
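
A minimal sketch of approach 2 as a toy model, to make the idea concrete. The struct and function names (toy_vpmu, toy_on_pmi, toy_on_guest_lvtpc_write) are hypothetical and are not Xen's actual code. The only hardware facts assumed are that bit 16 of a local APIC LVT entry is the mask bit and that the local APIC sets it in LVTPC when a PMI is delivered.

```c
#include <assert.h>
#include <stdint.h>

#define LVT_MASKED (1u << 16)   /* mask bit of a local APIC LVT entry */

/* Hypothetical, simplified per-vcpu state; not Xen's real structures. */
struct toy_vpmu {
    uint32_t phys_lvtpc;  /* value held by the physical local APIC */
    uint32_t virt_lvtpc;  /* value the guest sees in its virtual APIC */
};

/*
 * A PMI was delivered: the hardware has masked LVTPC. Under approach 2
 * the handler does NOT unmask the physical entry; it only reflects the
 * masked state to the guest.
 */
static void toy_on_pmi(struct toy_vpmu *v)
{
    assert(v);
    v->phys_lvtpc |= LVT_MASKED;
    v->virt_lvtpc |= LVT_MASKED;
}

/*
 * The guest writes its virtual LVTPC: the physical mask bit follows the
 * guest's mask bit, so the physical PMI is unmasked only once the guest
 * (which normally has reloaded the counter by then) unmasks its own.
 */
static void toy_on_guest_lvtpc_write(struct toy_vpmu *v, uint32_t val)
{
    assert(v);
    v->virt_lvtpc = val;
    if (val & LVT_MASKED)
        v->phys_lvtpc |= LVT_MASKED;
    else
        v->phys_lvtpc &= ~LVT_MASKED;
}
```

In this model a counter that is still zero cannot raise another physical PMI between the handler and the guest's own unmask, which is the window approach 1 tries to close by rewriting the counter.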
> >>>>>>> 
> >>>>>>> Can you give it a try and let me know the result? If neither
> >>>>>>> works, there might be some problems that I have not met before.
> >>>>>>> 
> >>>>>>> BTW: Sorry, I did not see your patch to enable NHM vpmu before.
> >>>>>>> So, there is no need for me to work on that now. :)
> >>>>>>> 
> >>>>>>> Haitao
> >>>>>>> 
> >>>>>>> 
> >>>>>>> Dietmar Hahn wrote:
> >>>>>>>> Hi Haitao,
> >>>>>>>> 
> >>>>>>>>> Can I know how you enabled vPMU on Nehalem? This is not
> >>>>>>>>> supported in current Xen.
> >>>>>>>> 
> >>>>>>>> http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html
> >>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> Concerning vpmu support, I totally agree that we can disable
> >>>>>>>>> this feature by default. If anyone really wants to use it, he
> >>>>>>>>> can use boot options to turn it on.
> >>>>>>>> 
> >>>>>>>> Yes, that's OK for me.
> >>>>>>>> 
> >>>>>>>>> I am preparing a patch for that. And I will
> >>>>>>>>> send a patch to enable NHM vpmu together.
> >>>>>>>>> 
> >>>>>>>>> For the problem that Dietmar met, I think I once met this
> >>>>>>>>> before. Can you add some code in vpmu_do_interrupt that sets
> >>>>>>>>> the counter you are using to a value other than zero? Please
> >>>>>>>>> let me know if that can help.
> >>>>>>>> 
> >>>>>>>> I don't set the counter to zero. I use 0-val to set the
> >>>>>>>> counter. Actually I tested on Nehalem with
> >>>>>>>> - General Perf-counter #2 (0xc3) with CPU_CLK_UNHALTED and
> >>>>>>>>   val=1100000
> >>>>>>>> - Fixed counter #1 (0x30a) and val=1100000
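
The "0-val" preload mentioned above can be sketched as follows. This is only an illustration: the helper name pmc_preload is hypothetical, and the 48-bit counter width is an assumption (real code would query the width via CPUID leaf 0xA).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 48-bit wide counters; query CPUID leaf 0xA in real code. */
#define ASSUMED_COUNTER_WIDTH 48

/*
 * Value to load into a counter so that it overflows (and raises a PMI)
 * after `period` events: "0 - period", truncated to the counter width.
 */
static uint64_t pmc_preload(uint64_t period, unsigned int width)
{
    assert(width >= 1 && width <= 64);
    uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
    return (0 - period) & mask;
}
```

With val=1100000 and the assumed width, pmc_preload(1100000, ASSUMED_COUNTER_WIDTH) is 0xFFFFFFEF3720, so the counter is far from zero right after a correct reload.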
> >>>>>>>> The thing is that in the normal case the overflows of both counters
> >>>>>>>> appear nearly at the same time. As described I added some extra
> >>>>>>>> trace points for xentrace in core2_vpmu_do_interrupt() so the code
> >>>>>>>> looks like:
> >>>>>>>> 
> >>>>>>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     /* 1. Step */
> >>>>>>>>     {
> >>>>>>>>         uint32_t HAHN_l, HAHN_h;
> >>>>>>>>         HAHN_l = (uint32_t) msr_content;
> >>>>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>>>>>         HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);      /* 2. Step */
> >>>>>>>>     }
> >>>>>>>>     if ( !msr_content )
> >>>>>>>>         return 0;
> >>>>>>>>     core2_vpmu_cxt->global_ovf_status |= msr_content;
> >>>>>>>>     msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
> >>>>>>>>     wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);   /* 3. Step */
> >>>>>>>> 
> >>>>>>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     /* 4. Step */
> >>>>>>>>     {
> >>>>>>>>         uint32_t HAHN_l, HAHN_h;
> >>>>>>>>         HAHN_l = (uint32_t) msr_content;
> >>>>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);    /* 5. Step */
> >>>>>>>> 
> >>>>>>>>         rdmsrl(0xc3, msr_content);        /* 6. Step: general counter #2 */
> >>>>>>>>         HAHN_l = (uint32_t) msr_content;
> >>>>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);
> >>>>>>>>         rdmsrl(0x30a, msr_content);       /* 7. Step: fixed counter #1 */
> >>>>>>>>         HAHN_l = (uint32_t) msr_content;
> >>>>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
> >>>>>>>>     }
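
The constant written to MSR_CORE_PERF_GLOBAL_OVF_CTRL in the snippet can be broken down as follows. This is a sketch only: the macro and function names are hypothetical, and the bit layout is the architectural one as I understand it (bits 63 and 62 clear CondChgd and OvfBuffer, bits 32 to 34 clear the three fixed counters, the low bits clear the general counters).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the fields of the overflow-control MSR. */
#define OVF_CTRL_COND_CHGD  (1ULL << 63)    /* clear CondChgd */
#define OVF_CTRL_OVF_BUFFER (1ULL << 62)    /* clear OvfBuffer */
#define OVF_CTRL_FIXED_ALL  (0x7ULL << 32)  /* clear fixed counters 0..2 */

/* Mask that clears every overflow bit for `nr_general` general counters. */
static uint64_t ovf_ctrl_clear_all(unsigned int nr_general)
{
    assert(nr_general < 32);
    return OVF_CTRL_COND_CHGD | OVF_CTRL_OVF_BUFFER | OVF_CTRL_FIXED_ALL |
           ((1ULL << nr_general) - 1);
}
```

Assuming four general counters, ovf_ctrl_clear_all(4) gives 0xC00000070000000F, the same value the snippet computes from 0xC000000700000000 and core2_get_pmc_count().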
> >>>>>>>> 
> >>>>>>>> With these tracers I got the following output:
> >>>>>>>> 
> >>>>>>>> Last good NMI:
> >>>>>>>> Both counters cause the NMI. Resetting works OK.
> >>>>>>>> The counters themselves kept running.
> >>>>>>>> 2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>>>> 5. Step: par1 = 0x0a,  high = 0x0000, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>>>> 6. Step: par1 = 0xc3,  high = 0x0000, low = 0x03c4 ] rdmsrl(0xc3) -> #2 general counter
> >>>>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x02da ] rdmsrl(0x30a) -> #1 fixed counter
> >>>>>>>> 
> >>>>>>>> NMI from where things go wrong:
> >>>>>>>> Both counters cause the NMI. Resetting does NOT work correctly, only
> >>>>>>>> for the general counter! The general counter (which caused the NMI)
> >>>>>>>> seems to be stopped!
> >>>>>>>> 2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>>>> 5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>>>> 6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec ] rdmsrl(0xc3) -> #2 general counter
> >>>>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ] rdmsrl(0x30a) -> #1 fixed counter
> >>>>>>>> 
> >>>>>>>> Wrong NMI:
> >>>>>>>> Only the fixed counter causes the NMI (it was not reset
> >>>>>>>> during the NMI handling above!). Both counters seem to be stopped!
> >>>>>>>> 2. Step: par1 = 0x01,  high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>>>> 5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>>>> 6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec ] rdmsrl(0xc3) -> #2 general counter
> >>>>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ] rdmsrl(0x30a) -> #1 fixed counter
> >>>>>>>> 
> >>>>>>>> And this state remains forever!
> >>>>>>>> I hope my explanations are understandable ;-)
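
For reading the high/low words in the trace output above, a small decoding sketch may help. The helper names are hypothetical. The layout assumed is the architectural one for the global status MSR: bit n reports overflow of general counter n, and bit 32+n reports overflow of fixed counter n.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild the 64-bit MSR value from the two 32-bit trace words. */
static uint64_t status_from_trace(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Non-zero if general counter `idx` has its overflow bit set. */
static int general_overflowed(uint64_t status, unsigned int idx)
{
    assert(idx < 32);
    return (int)((status >> idx) & 1);
}

/* Non-zero if fixed counter `idx` (0..2) has its overflow bit set. */
static int fixed_overflowed(uint64_t status, unsigned int idx)
{
    assert(idx < 3);
    return (int)((status >> (32 + idx)) & 1);
}
```

Read this way, high = 0x0002 with low = 0x0004 means that fixed counter #1 and general counter #2 both overflowed, while high = 0x0002 with low = 0x0000 after the write to the overflow-control MSR means that the overflow bit of fixed counter #1 was not cleared.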
> >>>>>>>> 
> >>>>>>>> Until now I can see this behavior only on a Nehalem processor.
> >>>>>>>> 
> >>>>>>>> Thanks.
> >>>>>>>> Dietmar
> >>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> Best Regards
> >>>>>>>>> Shan Haitao
> >>>>>>>>> 
> >>>>>>>>> 2009/10/30 Keir Fraser <keir.fraser@xxxxxxxxxxxxx>:
> >>>>>>>>>> On 30/10/2009 12:20, "Dietmar Hahn"
> >>>>>>>>>> <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
> >>>>>>>>>> 
> >>>>>>>>>>> I searched the intel processor spec but couldn't find any
> >>>>>>>>>>> help. So my question is, what is wrong here?
> >>>>>>>>>>> Can anybody with more knowledge point me in the right
> >>>>>>>>>>> direction, what can I still do to find the real cause of
> >>>>>>>>>>> this? 
> >>>>>>>>>> 
> >>>>>>>>>> You should probably Cc one of the Intel guys who implemented
> >>>>>>>>>> this stuff -- I've added Haitao Shan.
> >>>>>>>>>> 
> >>>>>>>>>> Meanwhile I'd be interested to know whether things work okay
> >>>>>>>>>> for you, minus performance counters and the hypervisor hang,
> >>>>>>>>>> if you return immediately from vpmu_initialise(). Really at
> >>>>>>>>>> minimum we need such a fix, perhaps with a boot parameter to
> >>>>>>>>>> re-enable the feature, for 3.4.2 release; allowing guests to
> >>>>>>>>>> hose the hypervisor like this is of course not on.
> >>>>>>>>>> 
> >>>>>>>>>>  -- Keir
> >>>> _______________________________________________
> >>>> Xen-devel mailing list
> >>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >>>> http://lists.xensource.com/xen-devel
> 
> 
-- 
Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
