   
 
 

xen-devel

[Xen-devel] Re: Regression in 3.1 causes Xen to use wrong idle routine

To: Len Brown <lenb@xxxxxxxxxx>
Subject: [Xen-devel] Re: Regression in 3.1 causes Xen to use wrong idle routine
From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date: Mon, 14 Nov 2011 09:31:24 -0500
Cc: "linux-acpi@xxxxxxxxxxxxxxx" <linux-acpi@xxxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
Delivery-date: Mon, 14 Nov 2011 06:33:23 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <CAJvTdK=yAek4sJFXUp=kQSmnuE=HpgefN5Q-GLWhMXo=LhJ-2A@xxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4EA7DFD1.9060608@xxxxxxxxxxxxx> <CAJvTdK=yAek4sJFXUp=kQSmnuE=HpgefN5Q-GLWhMXo=LhJ-2A@xxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.21 (2010-09-15)
Hey Len,

> > The problem I see is that select_idle_routine() is called from
> > arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle
> > anymore, it can cause mwait_idle or amd_e400_idle functions to be selected.
> > In testing it seems amd_e400_idle in PVM domU at least does not immediately
> > cause problems, but mwait_idle just causes crashes. From the reports I have
> > this may be related to older hypervisors (3.1 and older) not clearing the
> > mwait capability. But overall there seems to be something wrong in the
> > interaction.
> 
> Why is Xen advertising X86_FEATURE_MWAIT and then crashing
> when the dom0 (or other guests) use what it advertises?

The only case where I've seen this is with Amazon EC2. The other
newer hypervisors (4.1.1 and such) do not trigger this.
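
For reference, the capability being discussed is a single CPUID bit:
leaf 1 reports MONITOR/MWAIT support in ECX bit 3. A minimal sketch of
the check follows; the helper name is made up for illustration and is
not the kernel's own cpu_has() machinery.

```c
#include <stdint.h>

/* CPUID leaf 1 reports MONITOR/MWAIT support in ECX bit 3. */
#define CPUID1_ECX_MWAIT (UINT32_C(1) << 3)

/*
 * Hypothetical helper (not a kernel function): returns 1 if the given
 * leaf-1 ECX value advertises MWAIT, 0 otherwise.
 */
static int ecx_advertises_mwait(uint32_t leaf1_ecx)
{
    return (leaf1_ecx & CPUID1_ECX_MWAIT) != 0;
}
```

A guest only sees whatever ECX value the hypervisor lets through, so if
the bit is set the guest has no reason to doubt that MWAIT works.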

> 
> What versions of Xen have this bug?

Whatever Amazon is using. I think they are running a RHEL5-based hypervisor.

> 
> > I am not really sure whether the logic of calling pm_idle() on all errors
> > from cpuidle_call_idle() is already flawed or the assumption in the Xen
> > patch about being able to prevent the wrong idle function by turning
> > cpuidle off is incorrect.
> 
> The patches above appear to be operating as intended.
> What wasn't expected, was that some version of Xen is deployed that
> advertises the MWAIT feature, but crashes when it is used.

How does that work with AMD? On those machines it ends up calling
amd_e400_idle instead of default_idle. Granted, it does not "BUG" out,
but it does lead to an extra trap-and-emulate (the MSR operation) into
the hypervisor, which is not good.
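
To make the selection order concrete, here is a rough user-space model
of what select_idle_routine() ends up picking. It is a simplified
sketch, not the kernel source: the struct, the enum and the function
name are invented for illustration, and the real code has extra checks
(for example on whether MWAIT is actually usable for idle).

```c
#include <stdbool.h>

/* Labels mirror the idle routines named in this thread. */
enum idle_routine { IDLE_PRESET, IDLE_MWAIT, IDLE_AMD_E400, IDLE_DEFAULT };

/* Hypothetical summary of the inputs the selection depends on. */
struct cpu_model {
    bool pm_idle_preset; /* platform code already set pm_idle */
    bool has_mwait;      /* MWAIT advertised and considered usable */
    bool has_amd_e400;   /* AMD erratum 400 applies to this CPU */
};

/* Model of the selection order: preset wins, then MWAIT, then E400. */
static enum idle_routine model_select_idle(const struct cpu_model *c)
{
    if (c->pm_idle_preset)
        return IDLE_PRESET;
    if (c->has_mwait)
        return IDLE_MWAIT;
    if (c->has_amd_e400)
        return IDLE_AMD_E400;
    return IDLE_DEFAULT;
}
```

With pm_idle no longer set by the Xen setup code, the first branch is
skipped, so a guest that sees the MWAIT bit lands on mwait_idle and an
affected AMD guest lands on amd_e400_idle, never on default_idle.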

> 
> > One quick fix could be to add some Xen case into select_idle_routine() which
> > picks default_idle...
> 
> No.
> 
> Working around this Xen bug for a newly compiled Dom0 is insufficient.
> 
> All guests that also look for MWAIT support w/o asking ACPI
> (ie. all versions of Linux that use intel_idle, such as the last few
> Fedora's, RHEL, SLES etc.)
> will trip over the same Xen bug, even if Dom0 doesn't.
> 
> Xen must not advertise MWAIT support if it doesn't have MWAIT support.

How does that work out when we figure out MWAIT support from CPUID?
Or are you saying that it is correct: if the CPU advertises it, then yes,
advertise it to the Linux kernel?
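
For comparison, hiding the capability from a guest amounts to clearing
that one bit from the leaf-1 ECX value before the guest sees it. A
minimal sketch, with a made-up helper name and no claim about how any
particular hypervisor version implements its CPUID filtering:

```c
#include <stdint.h>

/* MONITOR/MWAIT is reported in CPUID leaf 1, ECX bit 3. */
#define LEAF1_ECX_MWAIT_BIT 3

/*
 * Hypothetical helper: returns the leaf-1 ECX value with the MWAIT bit
 * cleared and every other feature bit left untouched.
 */
static uint32_t hide_mwait(uint32_t leaf1_ecx)
{
    return leaf1_ecx & ~(UINT32_C(1) << LEAF1_ECX_MWAIT_BIT);
}
```

A guest reading the filtered value would then fall through to the
non-MWAIT idle paths.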
