Hi, Keir, recently a bug is reported for the alloc vmcs page for the new added
CPU. The reason is because when we try to allocate vmcs page, if we need flush
TLB for allocated page, we will hit ASSERT in flush_area_mask in following code.
So my question are:
a) Why do we need make sure the irq is enabeld in flush_area_mask? I assume it
is because this function may takes a long time waiting for all CPU flush TLB,
am I right?
b) Can we still turn to the original patch, i.e. pre-allocate all VMCS pages
for all possible CPU?
Thanks
Yunhong Jiang
void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
{
ASSERT(local_irq_is_enabled());
if ( cpu_isset(smp_processor_id(), *mask) )
flush_area_local(va, flags);
The trace for the error:.
----------------------
(XEN) Xen call trace:
(XEN) [<ffff82c48016d40b>] flush_area_mask+0x2c/0x140
(XEN) [<ffff82c480114da7>] alloc_heap_pages+0x44f/0x493
(XEN) [<ffff82c48011642b>] alloc_domheap_pages+0x115/0x186
(XEN) [<ffff82c4801164ec>] alloc_xenheap_pages+0x50/0xd8
(XEN) [<ffff82c4801b3b4f>] vmx_alloc_vmcs+0x18/0x65
(XEN) [<ffff82c4801b45e9>] vmx_cpu_up+0x4d3/0x72a
(XEN) [<ffff82c4801b83fc>] start_vmx+0x86/0x144
(XEN) [<ffff82c480191703>] init_intel+0x2a2/0x2ff
(XEN) [<ffff82c480191094>] identify_cpu+0xc3/0x23d
(XEN) [<ffff82c48016d7b4>] smp_store_cpu_info+0x3b/0xc1
(XEN) [<ffff82c48016d93b>] smp_callin+0x101/0x227
(XEN) [<ffff82c48016ec85>] start_secondary+0xb3/0x43f
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 56:
(XEN) Assertion 'local_irq_is_enabled()' failed at smp.c:223
>-----Original Message-----
>From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
>Sent: Thursday, November 12, 2009 7:23 PM
>To: Jiang, Yunhong; xen-devel@xxxxxxxxxxxxxxxxxxx
>Subject: Re: [PATCH] Allocate vmcs pages when system booting
>
>On 12/11/2009 10:52, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
>
>> Currently the VMCS page is allocated when the CPU is brought-up and
>> identified, and then spin_debug_enable() is called.
>>
>> This does not work for cpu hot-add. When hot-added CPU is brought-up and try
>> to allocate the vmcs page, it will hit check_lock, because irq is disabled
>> still.
>>
>> This patch allocate the vmcs pages for all possible pages when system
>> booting,
>> so that we don't allocate vmcs anymore when secondary CPU is up.
>>
>> Signed-off-by: Jiang, Yunhong <yunhong.jiang@xxxxxxxxx>
>
>A general point: using cpu_possible_map is not a good idea any longer, since
>it is going to be all-ones as soon as your physical cpu hotplug patches go
>in (I don't intend to make that a build-time option). Hence we could
>potentially massively over-allocate pages with this approach.
>
>The good news is that, when bringing a CPU online, we don't need to worry
>about acquiring IRQ-unsafe locks with IRQS disabled until the CPU is added
>to the cpu_online_map. This is because the race we are trying to avoid with
>the spin-debug checks involves rendezvous of CPUs via IPIs, and a CPU not in
>cpu_online_map will never get waited on by others.
>
>So, your better fix would be to spin_debug_disable() at the top of
>start_secondary(), and spin_debug_enable() just before
>cpu_set(...online_map).
>
>Can you try this alternative fix please?
>
> -- Keir
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|