WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] DomU crash during migration when suspending source domain

To: "Graham, Simon" <Simon.Graham@xxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] DomU crash during migration when suspending source domain
From: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Date: Wed, 14 Feb 2007 10:36:09 +0000
Delivery-date: Wed, 14 Feb 2007 02:35:30 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <C1F89152.1B9E%Keir.Fraser@xxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcdP6h4+HveIAzruQ3+gt7NQNapEGwANqzaeAADJUVA=
Thread-topic: [Xen-devel] DomU crash during migration when suspending source domain
User-agent: Microsoft-Entourage/11.3.3.061214
Your theory that cpu_down() is happening too early sounds plausible, except
that cpu_up() and cpu_down() are both entirely protected by the hotplug lock.
See their definitions in kernel/cpu.c.

The notifier calls of interest are CPU_ONLINE and CPU_DEAD. These are the
events that the cacheinfo code cares about. You can see that both
notifications are broadcast under the cpu_hotplug_lock, so there should be
no race possible in which a CPU starts to be taken down before all
notification work associated with it coming online has completed.
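
To illustrate the serialization argument, here is a minimal userspace model.
It is a sketch, not the kernel's code: model_cpu_up(), model_cpu_down(),
cacheinfo_notify() and cache_info_ptrs[] are hypothetical names, and a pthread
mutex stands in for the hotplug lock. The point it shows is only that when the
state change and its notification happen inside one critical section, a
CPU_DEAD callback cannot run until the CPU_ONLINE work for that CPU has
finished.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NR_CPUS 4

/* Event names mirror the kernel's; the values here are arbitrary. */
enum hotplug_event { CPU_ONLINE, CPU_DEAD };

/* Stand-in for the hotplug lock (assumption: a plain mutex is enough here). */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

static int cpu_is_online[NR_CPUS];

/* Stand-in for the per-CPU cache info pointers (cpuid4_info[] in the report). */
struct cache_info { int leaves; };
static struct cache_info *cache_info_ptrs[NR_CPUS];

/*
 * Model of the cacheinfo notifier callback.
 * Returns 0 on success, -1 if allocation failed or if CPU_DEAD arrived
 * for a CPU whose info was never set up (the NULL case from the report).
 */
static int cacheinfo_notify(enum hotplug_event ev, int cpu)
{
    switch (ev) {
    case CPU_ONLINE:
        cache_info_ptrs[cpu] = calloc(1, sizeof(struct cache_info));
        return cache_info_ptrs[cpu] ? 0 : -1;
    case CPU_DEAD:
        if (!cache_info_ptrs[cpu])
            return -1;
        free(cache_info_ptrs[cpu]);
        cache_info_ptrs[cpu] = NULL;
        return 0;
    }
    return -1;
}

/* Bring a CPU up: state change and CPU_ONLINE notification under one lock. */
int model_cpu_up(int cpu)
{
    int ret = -1; /* -1 also means "already online" */

    assert(cpu >= 0 && cpu < NR_CPUS);
    pthread_mutex_lock(&hotplug_lock);
    if (!cpu_is_online[cpu]) {
        cpu_is_online[cpu] = 1;
        ret = cacheinfo_notify(CPU_ONLINE, cpu);
    }
    pthread_mutex_unlock(&hotplug_lock);
    return ret;
}

/* Take a CPU down: state change and CPU_DEAD notification under the same lock. */
int model_cpu_down(int cpu)
{
    int ret = -1; /* -1 also means "not online" */

    assert(cpu >= 0 && cpu < NR_CPUS);
    pthread_mutex_lock(&hotplug_lock);
    if (cpu_is_online[cpu]) {
        cpu_is_online[cpu] = 0;
        ret = cacheinfo_notify(CPU_DEAD, cpu);
    }
    pthread_mutex_unlock(&hotplug_lock);
    return ret;
}
```

In this model, model_cpu_down() for a CPU that never came up returns -1
without touching the pointer array, and a down that follows a completed up
always finds a non-NULL entry. If the real crash shows a NULL entry for an
online CPU, that would suggest the entry was never populated by this path,
rather than a race between up and down.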

 -- Keir

On 14/2/07 10:13, "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx> wrote:

> Is this with a 2.6.16 guest from 3.0.4? This would most likely be a CPU
> hotplug issue in Linux, but we did do lots of testing of that...
> 
>  -- Keir
> 
> On 14/2/07 03:42, "Graham, Simon" <Simon.Graham@xxxxxxxxxxx> wrote:
> 
>> Just ran into an odd DomU crash doing live migration of a 4-VCPU domain (with
>> 3.0.4, but the code looks the same in 2.6.18/unstable to me). The actual panic
>> is attached at the end of this, but the bottom line is that the code in
>> cache_remove_shared_cpu_map (in arch/i386/kernel/cpu/intel_cacheinfo.c) is
>> attempting to clean up the cache info for a processor that does not yet have
>> this info set up: the code is dereferencing a pointer in the cpuid4_info[]
>> array, and looking at the dump I can see that this is NULL.
>> 
>> My working theory here is that we attempted the migration waaay early and the
>> array of cache info pointers was not yet initialized for all processors. It
>> would be relatively easy to protect against this by checking for NULL, but I'm
>> not sure if this is the correct solution or not -- if anyone is familiar with
>> this code and can comment on an appropriate fix, I'd be grateful.
>> 
>> One thing I'm really not sure about is the timing of marking the CPUs up with
>> respect to the trace about initializing CPUs (see console output below) -- I
>> can see that the four VCPUs are set up in the cpu_sys_devices array (which is
>> set up by the code that outputs the 'Initializing CPU#n' trace) but the array
>> of cache info structures only has an entry for VCPU 0:
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel


