   
 
 

Re: [Xen-devel] Kernel 2.6.39+ hangs when running as HVM guest under Xen

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Subject: Re: [Xen-devel] Kernel 2.6.39+ hangs when running as HVM guest under Xen
From: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
Date: Tue, 09 Aug 2011 09:54:25 -0500
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Bug 791850 <791850@xxxxxxxxxxxxxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
Delivery-date: Tue, 09 Aug 2011 07:55:09 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20110809023848.GB13905@xxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4E3A9799.50503@xxxxxxxxxxxxx> <20110809023848.GB13905@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.18) Gecko/20110617 Lightning/1.0b2 Thunderbird/3.1.11
On 08.08.2011 21:38, Konrad Rzeszutek Wilk wrote:
> On Thu, Aug 04, 2011 at 02:59:05PM +0200, Stefan Bader wrote:
>> Since kernel 2.6.39 we were experiencing strange hangs when booting those as HVM
>> guests in Xen (similar hangs but different places when looking at CentOS 5.4 +
>> Xen 3.4.3 as well as Xen 4.1 and a 3.0 based dom0). The problem only happens
>> when running with more than one vcpu.
>>
> 
> Hey Stefan,
> 
> We were all at the XenSummit and I think did not get to think about this at all.
> Also the merge window opened so that ate a good chunk of time. Anyhow..
>

Ah, right. Know the feeling. :) I am travelling this week, too.

> Is this related to this: 
> http://marc.info/?i=4E4070B4.1020008@xxxxxxxxxxxxxxxxxxxxxx ?
>

At a quick glance it seems to be different. What I was looking at were dom0
setups that worked for HVM guests up to kernel 2.6.38, with lockups at some
point once a later guest kernel was started in SMP mode.

>> I was able to examine some dumps[1] and it always seemed to be a weird
>> situation. In one example (booting 3.0 HVM under Xen 3.4.3/2.6.18 dom0) the
>> lockup always seemed to occur when the delayed mtrr init took place. Cpu#0
>> seemed to have been starting the rendezvous (stop_cpu) but then been interrupted
>> and the other (I was using vcpu=2 for simplicity) was idling somewhere else but
>> had the mtrr rendezvous handler queued up (just seemed to never get started).
>>
>> Things seemed to indicate some IPI problem but to be sure I went to bisect when
>> the problem started. I ended up with the following patch which, when reverted,
>> allows me to bring up a 3.0 HVM guest with more than one CPU without any problems.
>>
>> commit 99bbb3a84a99cd04ab16b998b20f01a72cfa9f4f
>> Author: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
>> Date:   Thu Dec 2 17:55:10 2010 +0000
>>
>>     xen: PV on HVM: support PV spinlocks and IPIs
>>
>>     Initialize PV spinlocks on boot CPU right after native_smp_prepare_cpus
>>     (that switch to APIC mode and initialize APIC routing); on secondary
>>     CPUs on CPU_UP_PREPARE.
>>
>>     Enable the usage of event channels to send and receive IPIs when
>>     running as a PV on HVM guest.
>>
>> Though I have not yet really understood why exactly this happens, I thought I
>> would post the results so far. It feels like either signalling an IPI through the
>> event channel does not come through or goes to the wrong CPU. It did not seem to
>> cause the exact same place to fail. As I said, the 3.0 guest running in the
>> CentOS dom0 was locking up early right after all CPUs were brought up, while
>> during the bisect (using a kernel between 2.6.38 and .39-rc1) the lockup was later.
>>
>> Maybe someone has a clue immediately. I will dig a bit deeper in the dumps in
>> the meantime. Looking at the description, which sounds like using event channels
> 
> Anything turned up?

From the data structures everything seems to be set up correctly.

>> only was intended for PV on HVM guests, it is wrong in the first place to set
>> the xen ipi functions on the HVM side...
> 
> On true HVM - sure, but on PVonHVM it sounds right.

Though exactly that seems to be what is happening: the guest I am looking at is
started as an HVM guest, and the patch changes IPI delivery so that it is
attempted through hypervisor calls instead of the native APIC method.
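
As a rough illustration of the distinction being discussed here (true HVM
versus PV on HVM), the following stand-alone C sketch models the choice of IPI
delivery path. It is not kernel code: the enum and struct names, the
has_vector_callback flag, and select_ipi_backend() are all hypothetical and
exist only for this example. It assumes, as the commit description suggests,
that event-channel IPIs are meant for PV on HVM guests only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guest flavours; illustrative names, not kernel identifiers. */
enum guest_mode { GUEST_PURE_HVM, GUEST_PV_ON_HVM };

/* Hypothetical IPI delivery paths. */
enum ipi_backend { IPI_NATIVE_APIC, IPI_EVENT_CHANNEL };

struct guest {
    enum guest_mode mode;
    /* Assumption of this sketch: event-channel IPIs also need a working
     * per-vcpu callback from the hypervisor into the guest. */
    int has_vector_callback;
};

/* Pick the IPI path: event channels only for PV on HVM with a usable
 * callback, the native APIC path in every other case. */
enum ipi_backend select_ipi_backend(const struct guest *g)
{
    assert(g != NULL);

    if (g->mode == GUEST_PV_ON_HVM && g->has_vector_callback)
        return IPI_EVENT_CHANNEL;

    return IPI_NATIVE_APIC;
}
```

Under these assumptions a guest started as plain HVM would stay on the native
APIC path. The behaviour described above looks as if such a guest ends up on
the event-channel path instead.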

>>
>> -Stefan
>>
>> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/791850
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel

