Re: [Xen-devel] DomU crash during migration when suspending source domain

To:	Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>, "Graham, Simon" <Simon.Graham@xxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject:	Re: [Xen-devel] DomU crash during migration when suspending source domain
From:	Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Date:	Wed, 14 Feb 2007 10:48:55 +0000

Are you migrating between unlike boxes? My guess is that the original box
has processors supporting cacheinfo cpuid leaves and the target box does
not. Migrating to older, less-capable CPUs is definitely hit-and-miss, I'm
afraid. It really is best not to do it!

 -- Keir
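
To make the guess above concrete, here is a rough sketch in C of the kind of pre-migration comparison it implies. It assumes the leaf in question is CPUID leaf 4 (deterministic cache parameters), as the cpuid4_info[] name suggests, and that each host's maximum basic CPUID leaf (the EAX value returned by CPUID leaf 0) is already known. The function names are hypothetical and do not come from Xen or Linux.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 4 reports "deterministic cache parameters" on Intel CPUs. */
#define CACHE_PARAMS_LEAF 4u

/* max_basic_leaf is the value CPUID leaf 0 returns in EAX. */
bool has_cache_params_leaf(uint32_t max_basic_leaf)
{
    return max_basic_leaf >= CACHE_PARAMS_LEAF;
}

/*
 * Hypothetical check: returns false only when the source host exposes
 * leaf 4 but the target host does not, which is the unlike-boxes case
 * guessed at above.
 */
bool cache_leaf_survives_migration(uint32_t src_max_leaf, uint32_t dst_max_leaf)
{
    if (!has_cache_params_leaf(src_max_leaf))
        return true;  /* the guest never saw leaf 4, so nothing is lost */
    return has_cache_params_leaf(dst_max_leaf);
}
```

A management tool could refuse or warn when cache_leaf_survives_migration() returns false. This only illustrates the comparison; it says nothing about how the guest kernel reacts once it is running on the target.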

On 14/2/07 10:36, "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx> wrote:

> Your theory that the cpu_down() is happening too early sounds plausible
> except that cpu_up/cpu_down are both entirely protected by the hotplug lock.
> See their definitions in kernel/cpu.c.
> 
> The notifier calls of interest are CPU_ONLINE and CPU_DEAD. These are the
> events that the cacheinfo code cares about. You can see that both
> notifications are broadcast under the cpu_hotplug_lock, so there should be
> no race possible in which a CPU starts to be taken down before all
> notification work associated with it coming online has completed.
> 
>  -- Keir
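
The serialization argument above can be modelled in a few lines of user-space C. This is a toy model under stated assumptions, not kernel code: a boolean stands in for the hotplug lock, the enum values stand in for CPU_ONLINE and CPU_DEAD, and every identifier below is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_CPUS 4

/* Stand-ins for the CPU_ONLINE and CPU_DEAD notifications. */
enum model_event { MODEL_CPU_ONLINE, MODEL_CPU_DEAD };

static bool hotplug_locked;                    /* models the hotplug lock */
static bool cache_info_ready[MODEL_MAX_CPUS];  /* models per-CPU cacheinfo */

/* Models the cacheinfo notifier: set up on ONLINE, tear down on DEAD. */
void model_cacheinfo_notify(enum model_event ev, unsigned cpu)
{
    assert(hotplug_locked);  /* notifications are only sent under the lock */
    assert(cpu < MODEL_MAX_CPUS);
    cache_info_ready[cpu] = (ev == MODEL_CPU_ONLINE);
}

void model_cpu_up(unsigned cpu)
{
    assert(!hotplug_locked);  /* a real lock would block here instead */
    hotplug_locked = true;
    model_cacheinfo_notify(MODEL_CPU_ONLINE, cpu);
    hotplug_locked = false;
}

void model_cpu_down(unsigned cpu)
{
    assert(!hotplug_locked);  /* cannot start until the up path is done */
    hotplug_locked = true;
    model_cacheinfo_notify(MODEL_CPU_DEAD, cpu);
    hotplug_locked = false;
}

bool model_cache_info_ready(unsigned cpu)
{
    return cpu < MODEL_MAX_CPUS && cache_info_ready[cpu];
}
```

Because both paths take the same lock around their notifications, the teardown for a CPU can only run after the setup for that CPU has completed, which is the property claimed above.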
> 
> On 14/2/07 10:13, "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx> wrote:
> 
>> Is this with a 2.6.16 guest from 3.0.4? This would most likely be a CPU
>> hotplug issue in Linux, but we did do lots of testing of that...
>> 
>>  -- Keir
>> 
>> On 14/2/07 03:42, "Graham, Simon" <Simon.Graham@xxxxxxxxxxx> wrote:
>> 
>>> Just run into an odd DomU crash doing live migration of a 4-VCPU domain
>>> (with 3.0.4, but the code looks the same in 2.6.18/unstable to me). The
>>> actual panic is attached at the end of this, but the bottom line is that
>>> the code in cache_remove_shared_cpu_map (in
>>> arch/i386/kernel/cpu/intel_cacheinfo.c) is attempting to clean up the cache
>>> info for a processor that does not yet have this info set up: the code is
>>> dereferencing a pointer in the cpuid4_info[] array, and looking at the dump
>>> I can see that this is NULL.
>>> 
>>> My working theory here is that we attempted the migration waaay early and
>>> the array of cache info pointers had not yet been set up for all
>>> processors. It would be relatively easy to protect against this by checking
>>> for NULL, but I'm not sure whether that is the correct solution. If anyone
>>> is familiar with this code and can comment on an appropriate fix, I'd be
>>> grateful.
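
For illustration only, the NULL check mentioned above might look roughly like the following user-space sketch. It models the cpuid4_info[] array with a plain array of pointers; the struct, the array, and the function names are hypothetical stand-ins, not the code in intel_cacheinfo.c, and whether such a guard is the right fix is exactly the open question.

```c
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for the per-CPU cache description. */
struct model_cache_info {
    unsigned long shared_cpu_map;
};

/* Models cpuid4_info[]: one pointer per CPU, NULL until set up. */
static struct model_cache_info *model_info[MODEL_NR_CPUS];

/* Set up cache info for a CPU. Returns 1 on success, 0 on failure. */
int model_cache_add(unsigned cpu)
{
    if (cpu >= MODEL_NR_CPUS || model_info[cpu] != NULL)
        return 0;
    model_info[cpu] = calloc(1, sizeof(*model_info[cpu]));
    return model_info[cpu] != NULL;
}

/*
 * Tear down cache info for a CPU. The guard skips CPUs whose info was
 * never set up instead of dereferencing a NULL pointer.
 * Returns 1 if something was freed, 0 if there was nothing to do.
 */
int model_cache_remove(unsigned cpu)
{
    if (cpu >= MODEL_NR_CPUS || model_info[cpu] == NULL)
        return 0;
    free(model_info[cpu]);
    model_info[cpu] = NULL;
    return 1;
}
```

In this model, removing a CPU whose entry is still NULL becomes a no-op. It hides the symptom; it does not explain why the entry was missing in the first place.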
>>> 
>>> One thing I'm really not sure about is the timing of marking the CPUs up
>>> with respect to the trace about initializing CPUs (see console output
>>> below). I can see that the four VCPUs are set up in the cpu_sys_devices
>>> array (which is set up by the code that outputs the 'Initializing CPU#n'
>>> trace), but the array of cache info structures only has an entry for
>>> VCPU 0:



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
