WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] Need help in debugging partially blocked hypervisor

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] Need help in debugging partially blocked hypervisor
From: Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>
Date: Fri, 30 Oct 2009 13:20:39 +0100
Delivery-date: Fri, 30 Oct 2009 05:21:09 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <200910211507.05738.dietmar.hahn@xxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <200910211507.05738.dietmar.hahn@xxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: KMail/1.12.2 (Linux/2.6.27.29-0.1-pae; KDE/4.3.1; i686; ; )

Hi,
 
> I need some help in debugging a strange hypervisor behavior together
> with using fully virtualized performance counters.
> 

I added my own tracer to xentrace to find out what the CPU is doing.
Now I can see that in the strange case the CPU is handling endless (and
nothing else!) performance counter NMIs within the hypervisor.

pmu_apic_interrupt
  smp_pmu_apic_interrupt
    vmx_do_pmu_interrupt
      vpmu_do_interrupt

In the normal case, core2_vpmu_do_interrupt() does the following:
        /* 1. Read the cause of the NMI */
        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
        ...
        /* 2. Save the value for the domU */
        ...
        /* 3. Reset the cause */
        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
        /* 4. Inject an NMI into the domU */
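For readers less familiar with this path, the four steps can be modelled in userspace roughly as below. This is a minimal sketch, not the Xen code: the real handler runs in the hypervisor and uses rdmsrl()/wrmsrl(), which need ring 0, so the two MSRs are replaced by a plain struct. The names fake_pmu, fake_vcpu, fake_write_ovf_ctrl() and handle_pmu_interrupt() are hypothetical. The model assumes the architectural behaviour described in the Intel SDM (writing a 1 to a bit of IA32_PERF_GLOBAL_OVF_CTRL clears the matching bit of IA32_PERF_GLOBAL_STATUS); the ovf_ctrl_broken flag only imitates a clear that has no effect and does not explain why that would happen.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two MSRs; the real handler uses
 * rdmsrl()/wrmsrl() on MSR_CORE_PERF_GLOBAL_STATUS and
 * MSR_CORE_PERF_GLOBAL_OVF_CTRL inside the hypervisor. */
struct fake_pmu {
    uint64_t global_status;   /* one overflow bit per counter */
    int ovf_ctrl_broken;      /* 1 = writes to OVF_CTRL have no effect */
};

/* Hypothetical per-vCPU state of the domU. */
struct fake_vcpu {
    uint64_t saved_status;    /* overflow bits reported to the guest */
    int nmis_injected;        /* NMIs queued for the guest */
};

/* Assumed SDM semantics: each 1 bit written to GLOBAL_OVF_CTRL clears
 * the matching GLOBAL_STATUS bit. */
static void fake_write_ovf_ctrl(struct fake_pmu *pmu, uint64_t val)
{
    if (!pmu->ovf_ctrl_broken)
        pmu->global_status &= ~val;
}

/* Steps 1-4, simplified. Returns the GLOBAL_STATUS value still pending
 * after the handler; non-zero means another NMI follows immediately. */
uint64_t handle_pmu_interrupt(struct fake_pmu *pmu, struct fake_vcpu *v)
{
    uint64_t msr_content;

    assert(pmu && v);
    msr_content = pmu->global_status;        /* 1. read the cause */
    if (msr_content == 0)
        return 0;                            /* no counter overflowed */
    v->saved_status |= msr_content;          /* 2. save for the domU */
    fake_write_ovf_ctrl(pmu, msr_content);   /* 3. reset the cause */
    v->nmis_injected++;                      /* 4. inject an NMI */
    return pmu->global_status;
}
```

A non-zero return value means the overflow bit is still pending, so in the real hypervisor the next PMU NMI would be raised as soon as the handler returns.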

This works well for a short time.
Then the hypervisor falls into the endless NMI loop. The cause seems to be
that "3. Reset the cause" no longer works: writing to
MSR_CORE_PERF_GLOBAL_OVF_CTRL doesn't clear MSR_CORE_PERF_GLOBAL_STATUS,
which immediately triggers the next NMI.
I found this by adding another tracer that reads MSR_CORE_PERF_GLOBAL_STATUS
again after writing MSR_CORE_PERF_GLOBAL_OVF_CTRL.
In the normal case it now contains 0; in the strange case the value is
unchanged!
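The extra tracer amounts to a read-back check after step 3. The following minimal userspace sketch, again with a hypothetical struct in place of the real MSRs (fake_pmu, fake_write_ovf_ctrl() and simulate_nmis() are illustrative names, not Xen functions), shows how a clear that has no effect turns into back-to-back NMIs. The loop is capped at max_nmis only so the simulation terminates; the real hardware has no such cap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for MSR_CORE_PERF_GLOBAL_STATUS plus a switch
 * that makes writes to MSR_CORE_PERF_GLOBAL_OVF_CTRL ineffective. */
struct fake_pmu {
    uint64_t global_status;
    int ovf_ctrl_broken;
};

/* Assumed SDM semantics: 1 bits written to OVF_CTRL clear STATUS bits. */
static void fake_write_ovf_ctrl(struct fake_pmu *pmu, uint64_t val)
{
    if (!pmu->ovf_ctrl_broken)
        pmu->global_status &= ~val;
}

/* Takes PMU NMIs while GLOBAL_STATUS is non-zero (at most max_nmis).
 * After each clear it re-reads the status, as the extra tracer does,
 * and counts read-backs that are still non-zero. Returns NMIs taken. */
int simulate_nmis(struct fake_pmu *pmu, int max_nmis, int *stuck_readbacks)
{
    int taken = 0;

    assert(pmu && stuck_readbacks && max_nmis >= 0);
    *stuck_readbacks = 0;
    while (pmu->global_status != 0 && taken < max_nmis) {
        uint64_t before = pmu->global_status;   /* step 1: read cause */
        uint64_t after;

        fake_write_ovf_ctrl(pmu, before);       /* step 3: reset cause */
        after = pmu->global_status;             /* diagnostic re-read */
        if (after != 0) {
            (*stuck_readbacks)++;
            if (*stuck_readbacks == 1)          /* log the first one only */
                fprintf(stderr,
                        "trace: STATUS was %#llx, still %#llx after OVF_CTRL write\n",
                        (unsigned long long)before,
                        (unsigned long long)after);
        }
        taken++;
    }
    return taken;
}
```

Only the first stuck read-back is logged, since an NMI storm would otherwise flood the trace buffer with identical records.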

I searched the Intel processor spec but couldn't find anything helpful.
So my question is: what is wrong here?
Can anybody with more knowledge point me in the right direction, or suggest
what else I can do to find the real cause of this?

Many thanks in advance!
Dietmar.

-- 
Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel