xen-devel
Re: [Xen-devel] DomU crash during migration when suspending source domai
Your theory that the cpu_down() is happening too early sounds plausible,
except that cpu_up() and cpu_down() are both entirely protected by the
hotplug lock.
See their definitions in kernel/cpu.c.
The notifier calls of interest are CPU_ONLINE and CPU_DEAD. These are the
events that the cacheinfo code cares about. You can see that both
notifications are broadcast under the cpu_hotplug_lock, so there should be
no race possible in which a CPU starts to be taken down before all
notification work associated with it coming online has completed.
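
To make the serialization argument concrete, here is a minimal userspace
sketch of it. This is not the kernel code: it only models the idea with a
pthread mutex standing in for the hotplug lock, and every name in it
(hotplug_lock, cache_info, model_cpu_up, model_cpu_down, notify_online,
notify_dead) is a hypothetical stand-in chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the kernel's hotplug lock. */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical per-CPU cache info pointers; NULL means "not set up". */
static int *cache_info[NR_CPUS];
static int cpu_is_online[NR_CPUS];

/* Models a CPU_ONLINE callback: allocate the per-CPU info. */
static int notify_online(int cpu)
{
    cache_info[cpu] = calloc(1, sizeof(int));
    return cache_info[cpu] ? 0 : -1;
}

/* Models a CPU_DEAD callback: release the per-CPU info. */
static void notify_dead(int cpu)
{
    free(cache_info[cpu]);
    cache_info[cpu] = NULL;
}

/* Bring a CPU up; the online notification runs before the lock is dropped. */
int model_cpu_up(int cpu)
{
    int ret = -1;

    pthread_mutex_lock(&hotplug_lock);
    if (!cpu_is_online[cpu]) {
        ret = notify_online(cpu);
        if (ret == 0)
            cpu_is_online[cpu] = 1;
    }
    pthread_mutex_unlock(&hotplug_lock);
    return ret;
}

/* Take a CPU down; it cannot start until any in-flight up has finished. */
int model_cpu_down(int cpu)
{
    int ret = -1;

    pthread_mutex_lock(&hotplug_lock);
    if (cpu_is_online[cpu]) {
        cpu_is_online[cpu] = 0;
        notify_dead(cpu);
        ret = 0;
    }
    pthread_mutex_unlock(&hotplug_lock);
    return ret;
}
```

In this model, model_cpu_down() on a CPU whose bring-up has not completed
either blocks on the lock or returns -1 without touching cache_info[], so
the teardown callback should never see a half-initialized entry.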
-- Keir
On 14/2/07 10:13, "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx> wrote:
> Is this with a 2.6.16 guest from 3.0.4? This would most likely be a CPU
> hotplug issue in Linux, but we did do lots of testing of that...
>
> -- Keir
>
> On 14/2/07 03:42, "Graham, Simon" <Simon.Graham@xxxxxxxxxxx> wrote:
>
>> Just ran into an odd DomU crash doing live migration of a 4-VCPU domain
>> (with 3.0.4, but the code looks the same in 2.6.18/unstable to me). The
>> actual panic is attached at the end of this, but the bottom line is that
>> the code in cache_remove_shared_cpu_map (in
>> arch/i386/kernel/cpu/intel_cacheinfo.c) is attempting to clean up the
>> cache info for a processor that does not yet have this info set up. The
>> code is dereferencing a pointer in the cpuid4_info[] array, and looking at
>> the dump I can see that this is NULL.
>>
>> My working theory here is that we attempted the migration waaay early and
>> the array of cache info pointers had not yet been set up for all
>> processors. It would be relatively easy to protect against this by
>> checking for NULL, but I'm not sure if this is the correct solution or
>> not -- if anyone is familiar with this code and can comment on an
>> appropriate fix I'd be grateful.
>>
>> One thing I'm really not sure about is the timing of marking the CPUs up
>> with respect to the trace re initializing CPUs (see console output below)
>> -- I can see that the four VCPUs are set up in the cpu_sys_devices array
>> (which is set up by the code that outputs the 'Initializing CPU#n' trace)
>> but the array of cache info structures only has an entry for VCPU 0:
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel