WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] Need help in debugging partially blocked hypervisor

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] Need help in debugging partially blocked hypervisor
From: Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>
Date: Tue, 3 Nov 2009 09:24:16 +0100
Cc: "Shan, Haitao" <haitao.shan@xxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Delivery-date: Tue, 03 Nov 2009 00:24:53 -0800
In-reply-to: <61563CE63B4F854986A895DA7AD3C17709DED5E045@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <200910301320.40125.dietmar.hahn@xxxxxxxxxxxxxx> <200911030852.53272.dietmar.hahn@xxxxxxxxxxxxxx> <61563CE63B4F854986A895DA7AD3C17709DED5E045@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: KMail/1.12.2 (Linux/2.6.27.29-0.1-pae; KDE/4.3.1; i686; ; )
> I suspect the guest will reproduce this PMI loop if the guest behaves as you
> describe in this email. But as far as I know, VTune and OProfile do not
> behave like that.
> Of course, this approach is still a kind of workaround (unless I get
> confirmation that the HW requires doing so). This approach is preferable
> because it does not change the contents of MSRs. Thus, we have no impact on
> guest software that does rely on reading the correct value from HW.
> Approach 1 existed only because we knew that in event-based sampling, the
> counter value on receiving a PMI was not used by OProfile/VTune at all, so
> it was safe to set the counter to some non-zero value.
> 
> Haitao
>
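A toy model of approach 2, for illustration only: the PMI handler leaves the physical LVTPC masked, and only a guest write that unmasks its virtual LVTPC unmasks the physical entry again. The structure, field and function names are hypothetical and are not Xen's vPMU code; only the LVT mask bit (bit 16) and the NMI delivery-mode value (0x400) follow the local APIC layout. The model assumes, as on Intel processors, that the APIC sets the LVTPC mask bit when it delivers a performance-counter interrupt.

```c
#include <stdbool.h>
#include <stdint.h>

#define LVT_MASKED  (1u << 16)  /* local APIC LVT mask bit */
#define LVT_DM_NMI  0x400u      /* delivery mode field = NMI (100b in bits 8-10) */

/* Hypothetical per-vcpu state: the guest's view and the physical entry. */
struct toy_vpmu {
    uint32_t virt_lvtpc;  /* LVTPC value the guest believes is programmed */
    uint32_t phys_lvtpc;  /* value held in the real local APIC */
};

/* PMI handler under approach 2: do NOT unmask the physical LVTPC here. */
static void toy_do_interrupt(struct toy_vpmu *p)
{
    /* Delivery left the physical entry masked; keep it that way and show
     * the guest a masked virtual entry, as real hardware would. */
    p->phys_lvtpc |= LVT_MASKED;
    p->virt_lvtpc |= LVT_MASKED;
}

/* Guest writes its virtual LVTPC: mirror only the mask state physically. */
static void toy_guest_write_lvtpc(struct toy_vpmu *p, uint32_t val)
{
    p->virt_lvtpc = val;
    if ( val & LVT_MASKED )
        p->phys_lvtpc |= LVT_MASKED;
    else
        p->phys_lvtpc &= ~LVT_MASKED;  /* guest re-armed: unmask physical PMI */
}

static bool toy_pmi_deliverable(const struct toy_vpmu *p)
{
    return !(p->phys_lvtpc & LVT_MASKED);
}
```

In this model the physical PMI stays masked from the interrupt until the guest re-arms, which is where a sampling driver normally presets its counter; that is why approach 2 is expected to behave much like approach 1 for well-behaved guests.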

OK, then will you send a patch? 
Dietmar.
 
> 
> Dietmar Hahn wrote:
> > Please see below.
> > 
> >> See my comments embedded. :)
> >> 
> >> Haitao
> >> 
> >> 
> >> Dietmar Hahn wrote:
> >>> The conclusion is that this seems to be a workaround for the
> >>> endless NMI loop. PMIs are a very rare event, so this should not
> >>> cause a performance problem.
> >> I totally agree that approach 1 is only a workaround.
> >> 
> >>> 
> >>> I didn't try your second approach
> >>>> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask *physical
> >>>> PMI* when guest vcpu unmasks virtual PMI.
> >>> but I have some questions:
> >>> 
> >>> - What if the 'physical PMI' is not unmasked in vpmu_do_interrupt
> >>>   and a watchdog NMI would occur before the domU unmasks it?
> >> I think the second NMI will be lost.
> >> 
> >>> - Is it possible that after handling the NMI (and not unmasking)
> >>>   another domU gets to run on this CPU and therefore PMIs get lost?
> >> The LVTPC entry in the physical local APIC is saved/restored by Xen on
> >> VCPU switches. So unmasking (or not) the PMI of one vcpu should have no
> >> impact on another vcpu. When developing vPMU, I treated both the PMU
> >> MSRs and the LVTPC entry in the local APIC as vPMU context. The vPMU
> >> context is saved/restored on the physical HW when vcpus are scheduled,
> >> either in an active save/restore manner or a lazy one (depending on the
> >> PMU usage at the time of the switch).
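The vPMU context described here can be pictured as one per-vcpu structure holding the counter MSRs together with the LVTPC value, swapped as a unit on every context switch. A minimal sketch under stated assumptions: a simulated "physical CPU" stands in for real MSR and APIC accesses, the names and the number of counters are hypothetical, and the lazy save/restore path is not modelled.

```c
#include <stdint.h>
#include <string.h>

#define TOY_NR_CTRS 4  /* hypothetical number of counter MSRs tracked */

/* Simulated per-physical-CPU PMU state (stands in for rdmsr/wrmsr/APIC). */
struct toy_pcpu_pmu {
    uint64_t ctr[TOY_NR_CTRS];
    uint32_t lvtpc;
};

/* vPMU context: counter MSRs plus the LVTPC entry. */
struct toy_vpmu_ctx {
    uint64_t ctr[TOY_NR_CTRS];
    uint32_t lvtpc;
};

static void toy_vpmu_save(struct toy_vpmu_ctx *ctx, const struct toy_pcpu_pmu *hw)
{
    memcpy(ctx->ctr, hw->ctr, sizeof(ctx->ctr));
    ctx->lvtpc = hw->lvtpc;
}

static void toy_vpmu_load(const struct toy_vpmu_ctx *ctx, struct toy_pcpu_pmu *hw)
{
    memcpy(hw->ctr, ctx->ctr, sizeof(hw->ctr));
    hw->lvtpc = ctx->lvtpc;
}

/* Switch from prev to next: the outgoing vcpu's LVTPC mask state is saved
 * with its counters, so it cannot leak into the incoming vcpu. */
static void toy_context_switch(struct toy_vpmu_ctx *prev,
                               const struct toy_vpmu_ctx *next,
                               struct toy_pcpu_pmu *hw)
{
    toy_vpmu_save(prev, hw);
    toy_vpmu_load(next, hw);
}
```

Because the mask bit travels with the context, a PMI left masked by one vcpu is restored only when that vcpu runs again.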
> >> 
> >>> 
> >>> But the real cause of the problem is unknown. As I said, I have seen
> >>> this only on Nehalem. Maybe there is a problem with the
> >>> hardware? Perhaps your hardware colleagues know something more ;-)
> >> When I found this problem, I just thought it might be a corner case
> >> that only happened on my box (of course, I have only seen this on NHM,
> >> too).
> >> I will try to ping the HW guys to see if there is any explanation,
> >> since it has proven to be a general problem on NHM.
> >> 
> >> But until everything is clear, I think approach 2 is the better
> >> solution for now.
> > 
> > What would be the effect if the guest unmasks the PMI (which leads to
> > unmasking the 'physical PMI') but doesn't reset the counter to a
> > value != 0? Would the guest be able to produce the endless NMI loop?
> > 
> > Dietmar.
> > 
> >> 
> >>> 
> >>> Thanks
> >>> Dietmar
> >>> 
> >>>> 
> >>>>> 
> >>>>> When I met this problem, I remember that I tried two approaches:
> >>>>> 1> Setting the counter to non-zero before unmasking PMI in
> >>>>>    vpmu_do_interrupt;
> >>>>> 2> Removing the PMI unmasking from vpmu_do_interrupt and unmasking
> >>>>>    the *physical PMI* when the guest vcpu unmasks the virtual PMI.
> >>>>> I remember that approach 2 fixed this issue. But I do not
> >>>>> remember the result of approach 1, since I met this about one
> >>>>> year ago. My understanding is that approach 2 is much the same as
> >>>>> approach 1, since normally the guest will set the counter to some
> >>>>> negative value (for example, -100000) before unmasking the virtual
> >>>>> PMI.
> >>>>> However, approach 2 looks cleaner and more reasonable.
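The "negative value" convention works because the counters count upward and raise a PMI when they wrap past their top bit. A small sketch of that arithmetic, assuming 48-bit counters (real code should take the width from CPUID leaf 0xA rather than hard-coding it); the helper names are hypothetical.

```c
#include <stdint.h>

/* Mask for a counter that is 'width' bits wide (assumes 0 < width < 64). */
static uint64_t toy_counter_mask(unsigned width)
{
    return (UINT64_C(1) << width) - 1;
}

/* Value to program so that the counter overflows after 'period' events,
 * i.e. -period truncated to the counter width ("0 - val"). */
static uint64_t toy_counter_preset(uint64_t period, unsigned width)
{
    return (UINT64_C(0) - period) & toy_counter_mask(width);
}

/* Increments needed before a counter holding 'value' wraps past its top. */
static uint64_t toy_events_to_overflow(uint64_t value, unsigned width)
{
    return toy_counter_mask(width) - value + 1;
}
```

With a 48-bit counter, `toy_counter_preset(100000, 48)` yields 0xFFFFFFFE7960, which wraps after exactly 100000 events; the "0-val" preset with val=1100000 mentioned later in this thread is the same computation.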
> >>>>> 
> >>>>> Can you give it a try and let me know the result? If neither
> >>>>> works, there might be some problem that I have not met before.
> >>>>> 
> >>>>> BTW: Sorry, I did not see your patch to enable NHM vpmu before.
> >>>>> So, there is no need for me to work on that now. :)
> >>>>> 
> >>>>> Haitao
> >>>>> 
> >>>>> 
> >>>>> Dietmar Hahn wrote:
> >>>>>> Hi Haitao,
> >>>>>> 
> >>>>>>> Can I know how you enabled vPMU on Nehalem? This is not
> >>>>>>> supported in current Xen.
> >>>>>> 
> >>>>>> http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html
> >>>>>> 
> >>>>>>> 
> >>>>>>> Concerning vpmu support, I totally agree that we can disable
> >>>>>>> this feature by default. If anyone really wants to use it, he
> >>>>>>> can use boot options to turn it on.
> >>>>>> 
> >>>>>> Yes, that's OK for me.
> >>>>>> 
> >>>>>>> I am preparing a patch for that. And I will
> >>>>>>> send a patch to enable NHM vpmu together.
> >>>>>>> 
> >>>>>>> For the problem that Dietmar met, I think I met this once
> >>>>>>> before. Can you add some code in vpmu_do_interrupt that sets
> >>>>>>> the counter you are using to a value other than zero? Please
> >>>>>>> let me know if that helps.
> >>>>>> 
> >>>>>> I don't set the counter to zero. I use 0-val to set the counter.
> >>>>>> Actually, I tested on Nehalem with
> >>>>>> - General perf counter #2 (0xc3) counting CPU_CLK_UNHALTED and
> >>>>>>   val=1100000
> >>>>>> - Fixed counter #1 (0x30a) and val=1100000
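For reference, the two MSR numbers used here follow the architectural numbering: general-purpose counters start at IA32_PMC0 = 0xC1 and fixed-function counters at IA32_FIXED_CTR0 = 0x309. The tiny helpers below (hypothetical names) just make that mapping explicit.

```c
#include <stdint.h>

#define TOY_MSR_IA32_PMC0        0x0C1u  /* first general-purpose counter */
#define TOY_MSR_IA32_FIXED_CTR0  0x309u  /* first fixed-function counter */

/* MSR address of general-purpose counter 'idx'. */
static uint32_t toy_pmc_msr(unsigned idx)
{
    return TOY_MSR_IA32_PMC0 + idx;
}

/* MSR address of fixed-function counter 'idx'. */
static uint32_t toy_fixed_ctr_msr(unsigned idx)
{
    return TOY_MSR_IA32_FIXED_CTR0 + idx;
}
```

Fixed counter #1 counts unhalted core cycles, so, assuming the general counter was programmed with the core-cycle variant of CPU_CLK_UNHALTED, both counters see the same event with the same period and should overflow at nearly the same time.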
> >>>>>> Normally the overflows of both counters occur at nearly the same
> >>>>>> time. As described, I added some extra xentrace tracers in
> >>>>>> core2_vpmu_do_interrupt(), so the code looks like this:
> >>>>>> 
> >>>>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);      /* 1. Step */
> >>>>>>     {
> >>>>>>         uint32_t HAHN_l, HAHN_h;
> >>>>>>         HAHN_l = (uint32_t) msr_content;
> >>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>>>         HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);      /* 2. Step */
> >>>>>>     }
> >>>>>>     if ( !msr_content )
> >>>>>>         return 0;
> >>>>>>     core2_vpmu_cxt->global_ovf_status |= msr_content;
> >>>>>>     msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
> >>>>>>     wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);    /* 3. Step */
> >>>>>>
> >>>>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);      /* 4. Step */
> >>>>>>     {
> >>>>>>         uint32_t HAHN_l, HAHN_h;
> >>>>>>         HAHN_l = (uint32_t) msr_content;
> >>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);    /* 5. Step */
> >>>>>>
> >>>>>>         rdmsrl(0xc3, msr_content);      /* 6. Step: general counter #2 */
> >>>>>>         HAHN_l = (uint32_t) msr_content;
> >>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);
> >>>>>>         rdmsrl(0x30a, msr_content);     /* 7. Step: fixed counter #1 */
> >>>>>>         HAHN_l = (uint32_t) msr_content;
> >>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
> >>>>>>     }
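The constant written to MSR_CORE_PERF_GLOBAL_OVF_CTRL in step 3 can be rebuilt bit by bit: bit 63 clears CondChgd, bit 62 clears the OvfBuffer flag, bits 32-34 clear the three fixed-counter overflows, and the low bits clear the general-purpose counters. A sketch with hypothetical names; it uses a 64-bit shift, whereas the original `1 << n` is an `int` shift that is only safe for small counter counts.

```c
#include <stdint.h>

/* Bits of IA32_PERF_GLOBAL_OVF_CTRL (architectural perfmon v2). */
#define TOY_OVF_CLR_FIXED(i)    (UINT64_C(1) << (32 + (i)))
#define TOY_OVF_CLR_OVF_BUFFER  (UINT64_C(1) << 62)
#define TOY_OVF_CLR_COND_CHGD   (UINT64_C(1) << 63)

/* Value that clears every overflow indication for nr_gp general-purpose
 * and nr_fixed fixed-function counters (assumes nr_gp < 32). */
static uint64_t toy_global_ovf_ctrl_all(unsigned nr_gp, unsigned nr_fixed)
{
    uint64_t v = TOY_OVF_CLR_COND_CHGD | TOY_OVF_CLR_OVF_BUFFER;
    unsigned i;

    for ( i = 0; i < nr_fixed; i++ )
        v |= TOY_OVF_CLR_FIXED(i);
    v |= (UINT64_C(1) << nr_gp) - 1;  /* bits 0 .. nr_gp-1 */
    return v;
}
```

For example, with 4 general-purpose and 3 fixed counters this gives 0xC00000070000000F, the same value the code above writes when `core2_get_pmc_count()` returns 4.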
> >>>>>> 
> >>>>>> With these tracers I got the following output:
> >>>>>> 
> >>>>>> Last good NMI:
> >>>>>> Both counters caused the NMI. Resetting works OK.
> >>>>>> The counters themselves kept running.
> >>>>>> 2. Step: par1 = 0x01,  high = 0x0002, low =  0x0004 ]
> >>>>>> rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>> 5. Step: par1 = 0x0a,  high = 0x0000, low =  0x0000 ]
> >>>>>> rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>> 6. Step: par1 = 0xc3,  high = 0x0000, low =  0x03c4 ]
> >>>>>> rdmsrl(0xc3) -> #2 general counter
> >>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low =  0x02da ]
> >>>>>> rdmsrl(0x30a) -> #1 fixed counter
> >>>>>> 
> >>>>>> NMI where things go wrong:
> >>>>>> Both counters caused the NMI. Resetting does NOT work correctly; it
> >>>>>> succeeds only for the general counter! The general counter (which
> >>>>>> caused the NMI) seems to be stopped!
> >>>>>> 2. Step: par1 = 0x01,  high = 0x0002, low =  0x0004 ]
> >>>>>> rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>> 5. Step: par1 = 0x0a,  high = 0x0002, low =  0x0000 ]
> >>>>>> rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>> 6. Step: par1 = 0xc3,  high = 0x0000, low =  0x00ec ]
> >>>>>> rdmsrl(0xc3) -> #2 general counter
> >>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low =  0x0000 ]
> >>>>>> rdmsrl(0x30a) -> #1 fixed counter
> >>>>>> 
> >>>>>> Wrong NMI:
> >>>>>> Only the fixed counter causes the NMI (its overflow was not reset
> >>>>>> during the NMI handling above!). Both counters seem to be stopped!
> >>>>>> 2. Step: par1 = 0x01,  high = 0x0002, low =  0x0000 ]
> >>>>>> rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>> 5. Step: par1 = 0x0a,  high = 0x0002, low =  0x0000 ]
> >>>>>> rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>> 6. Step: par1 = 0xc3,  high = 0x0000, low =  0x00ec ]
> >>>>>> rdmsrl(0xc3) -> #2 general counter
> >>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low =  0x0000 ]
> >>>>>> rdmsrl(0x30a) -> #1 fixed counter
> >>>>>> 
> >>>>>> And this state remains forever!
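To read these traces, the high and low halves logged by HVMTRACE_3D can be reassembled into IA32_PERF_GLOBAL_STATUS, where general-purpose counter i reports overflow in bit i and fixed counter i in bit 32+i. A decoding sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Reassemble the 64-bit status from the traced high/low halves. */
static uint64_t toy_status_from_trace(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Overflow bit of general-purpose counter 'idx'. */
static int toy_gp_overflowed(uint64_t status, unsigned idx)
{
    return (int)((status >> idx) & 1);
}

/* Overflow bit of fixed-function counter 'idx'. */
static int toy_fixed_overflowed(uint64_t status, unsigned idx)
{
    return (int)((status >> (32 + idx)) & 1);
}
```

Applied to the traces: high = 0x0002, low = 0x0004 means fixed counter #1 (bit 33) and general counter #2 (bit 2) overflowed. In the good case step 5 reads back 0, so both bits were cleared; in the failing case step 5 still shows bit 33 set after the write in step 3, and that stale fixed-counter bit is the only one the following NMIs report.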
> >>>>>> I hope my explanations are understandable ;-)
> >>>>>> 
> >>>>>> So far I have seen this behavior only on a Nehalem processor.
> >>>>>> 
> >>>>>> Thanks.
> >>>>>> Dietmar
> >>>>>> 
> >>>>>>> 
> >>>>>>> Best Regards
> >>>>>>> Shan Haitao
> >>>>>>> 
> >>>>>>> 2009/10/30 Keir Fraser <keir.fraser@xxxxxxxxxxxxx>:
> >>>>>>>> On 30/10/2009 12:20, "Dietmar Hahn"
> >>>>>>>> <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
> >>>>>>>> 
> >>>>>>>>> I searched the Intel processor spec but couldn't find any
> >>>>>>>>> help. So my question is, what is wrong here?
> >>>>>>>>> Can anybody with more knowledge point me in the right
> >>>>>>>>> direction? What else can I do to find the real cause of this?
> >>>>>>>> 
> >>>>>>>> You should probably Cc one of the Intel guys who implemented
> >>>>>>>> this stuff -- I've added Haitao Shan.
> >>>>>>>> 
> >>>>>>>> Meanwhile I'd be interested to know whether things work okay
> >>>>>>>> for you, minus performance counters and the hypervisor hang,
> >>>>>>>> if you return immediately from vpmu_initialise(). Really at
> >>>>>>>> minimum we need such a fix, perhaps with a boot parameter to
> >>>>>>>> re-enable the feature, for 3.4.2 release; allowing guests to
> >>>>>>>> hose the hypervisor like this is of course not on.
> >>>>>>>> 
> >>>>>>>>  -- Keir
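Keir's suggested stop-gap can be sketched as a default-off flag checked at the very top of vpmu_initialise(). The option spelling `vpmu`, the flag, the counter and the parsing helper are hypothetical stand-ins, not Xen's real command-line machinery; the point is only that with the flag clear, no vPMU state is set up at all.

```c
#include <stdbool.h>
#include <string.h>

static bool toy_opt_vpmu_enabled;  /* default: vPMU disabled */
static int  toy_vpmu_initialised;  /* counts real initialisations, for illustration */

/* Hypothetical parser for a boot option "vpmu", "vpmu=1" or "vpmu=0". */
static void toy_parse_vpmu_opt(const char *opt)
{
    if ( strcmp(opt, "vpmu") == 0 || strcmp(opt, "vpmu=1") == 0 )
        toy_opt_vpmu_enabled = true;
    else if ( strcmp(opt, "vpmu=0") == 0 )
        toy_opt_vpmu_enabled = false;
}

/* Stand-in for vpmu_initialise(): return immediately unless enabled. */
static void toy_vpmu_initialise(void)
{
    if ( !toy_opt_vpmu_enabled )
        return;
    toy_vpmu_initialised++;  /* real code would set up the per-vcpu PMU context */
}
```

Defaulting to off means a guest cannot reach the PMI path unless the administrator explicitly opts in, which is the property Keir asks for in the 3.4.2 release.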
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >> http://lists.xensource.com/xen-devel
> 
> 
-- 
Dietmar Hahn
TSP ES&S SWE OS                                Telephone: +49 (0) 89 636 40274
Fujitsu Technology Solutions                Email: dietmar.hahn@xxxxxxxxxxxxxx
Otto-Hahn-Ring 6                              Internet:  http://ts.fujitsu.com
D-81739 München                    Company details:ts.fujitsu.com/imprint.html
