Re: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen

To:	"Shan, Haitao" <haitao.shan@xxxxxxxxx>, Haitao Shan <maillists.shan@xxxxxxxxx>, "Tian, Kevin" <kevin.tian@xxxxxxxxx>
Subject:	Re: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen
From:	Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Date:	Thu, 11 Sep 2008 15:15:14 +0100
Cc:	xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date:	Thu, 11 Sep 2008 07:15:54 -0700
Envelope-to:	www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to:	<823A93EED437D048963A3697DB0E35DE01C1EB0A@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index:	AckTRQdBPwPE1uAaTHujwHsOZlG02QAgVtYgAA47ZqAAAGuaQAAF9BQs
Thread-topic:	[Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen
User-agent:	Microsoft-Entourage/11.4.0.080122

I applied the patch with the following changes:
 * I rewrote your changes to fixup_irqs(). We should force lazy EOIs *after*
we have serviced any straggling interrupts. Also we should actually clear
the EOI stack so it is empty next time the CPU comes online.
 * I simplified your changes to schedule.c in light of the fact we run in
stop_machine context. Hence we can be quite relaxed about locking, for
example.
 * I removed your change to __csched_vcpu_is_migrateable() and instead put a
similar check in csched_load_balance(). I think this is clearer and also
cheaper.
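
The ordering in the first point can be illustrated with a toy model. This is only a sketch, not the code in fixup_irqs(): struct eoi_stack, struct stragglers, and all three functions are hypothetical names invented for the example. It assumes that servicing an interrupt defers its EOI by pushing the vector onto a per-CPU stack.

```c
#include <stddef.h>

#define EOI_STACK_MAX 16

/* Toy per-CPU stack of vectors whose EOI has been deferred ("lazy"). */
struct eoi_stack {
    int    vec[EOI_STACK_MAX];
    size_t sp;                  /* number of pending entries */
};

/* Toy list of interrupts that arrived but have not been serviced yet. */
struct stragglers {
    int    vec[EOI_STACK_MAX];
    size_t n;
};

/*
 * Assumption of this model: servicing an interrupt defers its EOI by
 * pushing the vector. Entries beyond the stack capacity are dropped.
 */
static void service_stragglers(struct stragglers *in, struct eoi_stack *s)
{
    for (size_t i = 0; i < in->n && s->sp < EOI_STACK_MAX; i++)
        s->vec[s->sp++] = in->vec[i];
    in->n = 0;
}

/* Force every deferred EOI now and leave the stack empty. */
static size_t flush_lazy_eois(struct eoi_stack *s)
{
    size_t issued = 0;

    while (s->sp > 0) {
        s->sp--;    /* real code would issue the EOI for s->vec[s->sp] */
        issued++;
    }
    return issued;
}

/* Ordering from the first point: service first, then flush. */
static size_t offline_irq_cleanup(struct stragglers *in, struct eoi_stack *s)
{
    service_stragglers(in, s);
    return flush_lazy_eois(s);
}
```

In this model, flushing before servicing would leave the stragglers' entries on the stack, so the CPU would come back online with stale pending EOIs.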

I note that the VCPU currently running on the offlined CPU continues to run
there even after __cpu_disable(), and until that CPU does a final run
through the scheduler soon after. I hope it does not matter that there is one
vcpu with v->processor == offlined_cpu for a short while (e.g., what if
another CPU does vcpu_sleep_nosync(v) -> cpu_raise_softirq(v->processor,
...)). I *think* it's actually okay, but I'm not totally certain. Really I
guess this patch needs some stress testing (lots of online/offline cycles
while pausing/unpausing domains, etc). Perhaps we could plumb through a Xen
sysctl and make a small dom0 utility for this purpose?
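
A rough sketch of the core loop such a utility might run. No such sysctl exists yet, so the hooks in struct cpu_hotplug_ops are hypothetical placeholders, and the fake_* backend is there only so the loop can be exercised without a hypervisor.

```c
#include <stddef.h>

/* Hypothetical hooks; a real tool would call whatever sysctl gets added. */
struct cpu_hotplug_ops {
    int  (*cpu_offline)(unsigned int cpu);  /* returns 0 on success */
    int  (*cpu_online)(unsigned int cpu);   /* returns 0 on success */
    void (*toggle_domains)(void);           /* pause/unpause; may be NULL */
};

/*
 * Run up to `cycles` offline/online round trips on `cpu`.
 * Returns the number of completed round trips and stops at the
 * first failing step.
 */
static unsigned int stress_hotplug(const struct cpu_hotplug_ops *ops,
                                   unsigned int cpu, unsigned int cycles)
{
    unsigned int done;

    for (done = 0; done < cycles; done++) {
        if (ops->cpu_offline(cpu) != 0)
            break;
        if (ops->toggle_domains)
            ops->toggle_domains();  /* disturb the scheduler while offline */
        if (ops->cpu_online(cpu) != 0)
            break;
    }
    return done;
}

/*
 * Fake backend for dry runs: counts calls, and fails the offline request
 * whose call number equals fake_fail_at (0 means never fail).
 */
static unsigned int fake_offline_calls, fake_online_calls, fake_fail_at;

static int fake_offline(unsigned int cpu)
{
    (void)cpu;
    return (++fake_offline_calls == fake_fail_at) ? -1 : 0;
}

static int fake_online(unsigned int cpu)
{
    (void)cpu;
    fake_online_calls++;
    return 0;
}
```

A real run would want many cycles, with domains being paused and unpaused from a second process at the same time, since the interesting window is the short period in which v->processor still names the offlined CPU.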

 -- Keir

On 11/9/08 12:33, "Shan, Haitao" <haitao.shan@xxxxxxxxx> wrote:

> Thanks!
> Concerning cpu online/offline development, I have a small question here.
> Since cpu_online_map is very important, code in different subsystems may use
> it extensively. If such code is not designed with cpu online/offline in mind,
> it may introduce race conditions, just like the one fixed in cpu calibration
> rendezvous.
> Currently, we solve it in a find-and-fix manner. Do you have any idea that can
> solve the problem in a cleaner way?
> Thanks in advance.
> 
> Shan Haitao 
> 
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
> Sent: 11 September 2008 19:13
> To: Shan, Haitao; Haitao Shan
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen
> 
> It looks much better. I'll read through, maybe tweak, and most likely then
> check it in.
> 
>  Thanks,
>  Keir
> 
> On 11/9/08 09:02, "Shan, Haitao" <haitao.shan@xxxxxxxxx> wrote:
> 
>> Hi, Keir,
>> 
>> Attached is the updated patch using the methods as you described in
>> another mail.
>> What do you think of this one?
>> 
>> Signed-off-by: Shan Haitao <haitao.shan@xxxxxxxxx>
>> 
>> Best Regards
>> Haitao Shan
>> 
>> Haitao Shan wrote:
>>> Agree. Placing migration in stop_machine context will definitely make
>>> our jobs easier. I will start making a new patch tomorrow. :)
>>> I placed the migration code outside the stop_machine_run context, partly
>>> because I was not sure how long it would take to migrate all the
>>> vcpus away. If it takes too long, all useful work is blocked,
>>> since all cpus are in the stop_machine context. Of course, I also borrowed
>>> the idea from the kernel, which influenced the decision.
>>> 
>>> 2008/9/10 Keir Fraser <keir.fraser@xxxxxxxxxxxxx>:
>>>> I feel this is more complicated than it needs to be.
>>>> 
>>>> How about clearing VCPUs from the offlined CPU's runqueue from the
>>>> very end of __cpu_disable()? At that point all other CPUs are safely
>>>> in softirq context with IRQs disabled, and we are running on the
>>>> correct CPU (being offlined). We could have a hook into the
>>>> scheduler subsystem at that point to break affinities, assign to
>>>> different runqueues, etc. We would just need to be careful not to
>>>> try an IPI. :-) This approach would not need a cpu_schedule_map
>>>> (which is really increasing code fragility imo, by creating possible
>>>> extra confusion about which cpumask is the right one to use in a
>>>> given situation).
>>>> 
>>>> My feeling, unless I've missed something, is that this would make
>>>> the patch quite a bit smaller and with a smaller spread of code
>>>> changes. 
>>>> 
>>>>  -- Keir
>>>> 
>>>> On 9/9/08 09:59, "Shan, Haitao" <haitao.shan@xxxxxxxxx> wrote:
>>>> 
>>>>> This patch implements the CPU offline feature.
>>>>> 
>>>>> Best Regards
>>>>> Haitao Shan
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>>> http://lists.xensource.com/xen-devel
> 
> 



