Concerning the last running vcpu on the dying cpu, I have some thoughts.
Yes, there would be a short time after stop_machine_run during which this vcpu has
v->processor == dying_cpu. But in any case, we set the __VPF_migrating flag for that
vcpu and raised a schedule_softirq on the dying cpu.
This softirq should run immediately after we leave stop_machine context, am I right? If
so, by the time the schedule softirq is executed, this last vcpu will have been migrated
away from the dying cpu. But saving of its context will be delayed to
If another cpu issues a schedule request to this dying cpu
(vcpu_sleep_nosync->cpu_raise_softirq(vc->processor....)) during this time, the
request will be serviced by the above code sequence. So it is safe in such
Am I missing something important? I am not quite confident in these statements,
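The reasoning about the short window can be checked against a toy, single-threaded user-space model. This is only a sketch under stated assumptions: `struct vcpu`, `VPF_MIGRATING`, `cpu_disable()`, `do_schedule_softirq()` and `remote_sleep_request()` are hypothetical stand-ins, not Xen's real code, and a per-CPU "pending" bit stands in for the softirq machinery. It illustrates that a request raised against `v->processor` during the window lands on the dying CPU's already-pending softirq and is handled by the same pass that moves the vcpu away.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; not Xen's real data structures. */
#define NR_CPUS 4
#define VPF_MIGRATING (1UL << 0)   /* stands in for the migrating pause flag */

struct vcpu {
    int processor;                 /* CPU the vcpu is currently assigned to */
    unsigned long pause_flags;
};

static unsigned long cpu_online_mask = (1UL << NR_CPUS) - 1;
static bool softirq_pending[NR_CPUS];   /* schedule softirq pending, per CPU */
static int softirq_runs[NR_CPUS];       /* times each CPU ran the handler */

static void raise_schedule_softirq(int cpu) { softirq_pending[cpu] = true; }

static int first_online_cpu(void) {
    for (int c = 0; c < NR_CPUS; c++)
        if (cpu_online_mask & (1UL << c))
            return c;
    return -1;
}

/* Step 1, in stop_machine context on the dying CPU: drop it from the online
 * mask, flag the running vcpu as migrating and kick the scheduler. */
static void cpu_disable(int dying, struct vcpu *curr) {
    cpu_online_mask &= ~(1UL << dying);
    curr->pause_flags |= VPF_MIGRATING;
    raise_schedule_softirq(dying);
    /* The window opens here: curr->processor == dying until the softirq runs. */
}

/* Step 2, the schedule softirq on 'cpu': a flagged vcpu is moved to an
 * online CPU. Several raises before it runs collapse into one pass. */
static void do_schedule_softirq(int cpu, struct vcpu *curr) {
    if (!softirq_pending[cpu])
        return;
    softirq_pending[cpu] = false;
    softirq_runs[cpu]++;
    if (curr->processor == cpu && (curr->pause_flags & VPF_MIGRATING)) {
        curr->processor = first_online_cpu();
        curr->pause_flags &= ~VPF_MIGRATING;
    }
}

/* What another CPU doing vcpu_sleep_nosync() effectively does in the window. */
static void remote_sleep_request(struct vcpu *v) {
    raise_schedule_softirq(v->processor);
}
```

Because the remote request only sets a bit that is already set, it does not create a second, independent path that could observe the vcpu in an inconsistent state; whether the real code has the same property is exactly what the thread is asking.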
From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
Sent: 11 September 2008 22:15
To: Shan, Haitao; Haitao Shan; Tian, Kevin
Subject: Re: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen
I applied the patch with the following changes:
* I rewrote your changes to fixup_irqs(). We should force lazy EOIs *after*
we have serviced any straggling interrupts. Also we should actually clear
the EOI stack so it is empty next time the CPU comes online.
* I simplified your changes to schedule.c in light of the fact we run in
stop_machine context. Hence we can be quite relaxed about locking, for
* I removed your change to __csched_vcpu_is_migrateable() and instead put a
similar check in csched_load_balance(). I think this is clearer and also
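The ordering in the first point can be illustrated with a toy model. Everything here (`struct eoi_stack`, `pending_eoi`, `straggler`, `fixup_irqs_model()`) is hypothetical and is not the actual fixup_irqs() code; the sketch only shows why servicing straggling interrupts before forcing the lazy EOIs, and then resetting the stack, leaves nothing pending for the next time the CPU comes online.

```c
#include <assert.h>

/* Hypothetical model of a per-CPU lazy-EOI stack; names are illustrative. */
#define NR_CPUS 4
#define EOI_STACK_MAX 8

struct eoi_stack {
    int vector[EOI_STACK_MAX];
    int top;                        /* number of entries in use */
};

static struct eoi_stack pending_eoi[NR_CPUS];
static int straggler[NR_CPUS];      /* in-flight vector per CPU; 0 means none */
static int eois_sent;

static void push_pending_eoi(int cpu, int vec) {
    struct eoi_stack *s = &pending_eoi[cpu];
    assert(s->top < EOI_STACK_MAX);
    s->vector[s->top++] = vec;
}

static void send_eoi(int vec) { (void)vec; eois_sent++; }

/* Handling a straggler with a lazy EOI only records it on the stack. */
static void service_stragglers(int cpu) {
    if (straggler[cpu] != 0) {
        push_pending_eoi(cpu, straggler[cpu]);
        straggler[cpu] = 0;
    }
}

static void fixup_irqs_model(int cpu) {
    struct eoi_stack *s = &pending_eoi[cpu];
    service_stragglers(cpu);                /* 1. let in-flight interrupts land */
    for (int i = s->top - 1; i >= 0; i--)   /* 2. then force every lazy EOI     */
        send_eoi(s->vector[i]);
    s->top = 0;                             /* 3. empty the stack for the next  */
                                            /*    time the CPU comes online     */
}
```

Swapping steps 1 and 2 in this model would leave the straggler's vector on the stack with no EOI sent, which is the situation the reordering is meant to avoid.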
I note that the VCPU currently running on the offlined CPU continues to run
there even after __cpu_disable(), and until that CPU does a final run
through the scheduler soon after. I hope it does not matter that there is one
vcpu with v->processor == offlined_cpu for a short while (e.g., what if
another CPU does vcpu_sleep_nosync(v) -> cpu_raise_softirq(v->processor,
...)). I *think* it's actually okay, but I'm not totally certain. Really I
guess this patch needs some stress testing (lots of online/offline cycles
while pausing/unpausing domains, etc). Perhaps we could plumb through a Xen
sysctl and make a small dom0 utility for this purpose?
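Before any sysctl or dom0 tool exists, the shape of such a stress test can be prototyped against a user-space model. This is a hypothetical sketch: `cpu_offline()`, `vcpu_unpause()`, `stress()` and the invariant are illustrative and do not correspond to Xen functions. It interleaves random offline/online cycles with pause/unpause and checks after every step that no runnable vcpu is assigned to an offline CPU. The model deliberately lets a paused vcpu keep a stale `processor`, so the unpause path has to repair it; that is the kind of interaction the stress test is meant to exercise.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; a real test would drive Xen via a sysctl instead. */
#define NR_CPUS  4
#define NR_VCPUS 6

struct vcpu {
    int processor;
    bool paused;                 /* paused vcpus are not on any runqueue */
};

static struct vcpu vcpus[NR_VCPUS];
static unsigned long online;
static unsigned int rng_state = 12345u;

/* Small deterministic LCG so that any failure is reproducible. */
static unsigned int next_rand(void) {
    rng_state = rng_state * 1103515245u + 12345u;
    return rng_state >> 16;
}

static int popcount(unsigned long m) {
    int n = 0;
    for (; m; m &= m - 1)
        n++;
    return n;
}

static bool cpu_is_online(int cpu) { return (online >> cpu) & 1UL; }

static int lowest_online(void) {
    for (int c = 0; c < NR_CPUS; c++)
        if (cpu_is_online(c))
            return c;
    return -1;
}

/* Offline path walks the runqueue, so only runnable vcpus are moved. */
static void cpu_offline(int cpu) {
    online &= ~(1UL << cpu);
    for (int i = 0; i < NR_VCPUS; i++)
        if (!vcpus[i].paused && vcpus[i].processor == cpu)
            vcpus[i].processor = lowest_online();
}

/* A paused vcpu may still name a CPU that went offline in the meantime. */
static void vcpu_unpause(struct vcpu *v) {
    v->paused = false;
    if (!cpu_is_online(v->processor))
        v->processor = lowest_online();
}

/* Invariant: every runnable vcpu is assigned to an online CPU. */
static bool invariant_holds(void) {
    for (int i = 0; i < NR_VCPUS; i++)
        if (!vcpus[i].paused && !cpu_is_online(vcpus[i].processor))
            return false;
    return true;
}

/* Returns the number of iterations completed, or -1 on the first violation. */
static int stress(int iterations) {
    online = (1UL << NR_CPUS) - 1;
    for (int i = 0; i < NR_VCPUS; i++)
        vcpus[i] = (struct vcpu){ i % NR_CPUS, false };
    for (int it = 0; it < iterations; it++) {
        unsigned int r = next_rand();
        int cpu = (int)(r % NR_CPUS);
        struct vcpu *v = &vcpus[r % NR_VCPUS];
        switch ((r / 7) % 4) {
        case 0:  /* offline, but never the last online CPU */
            if (cpu_is_online(cpu) && popcount(online) > 1)
                cpu_offline(cpu);
            break;
        case 1:  online |= 1UL << cpu; break;
        case 2:  v->paused = true; break;
        default: if (v->paused) vcpu_unpause(v); break;
        }
        if (!invariant_holds())
            return -1;
    }
    return iterations;
}
```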
On 11/9/08 12:33, "Shan, Haitao" <haitao.shan@xxxxxxxxx> wrote:
> Concerning cpu online/offline development, I have a small question here.
> Since cpu_online_map is very important, code in different subsystems may use
> it extensively. If such code is not designed with cpu online/offline in mind,
> it may introduce race conditions, just like the one fixed in cpu calibration
> Currently, we solve it in a find-and-fix manner. Do you have any ideas for
> solving the problem in a cleaner way?
> Thanks in advance.
> Shan Haitao
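One common way to address this class of race is to make readers of the online map bracket their use with a "hotplug read lock", so that an offline operation cannot change the map while a reader is iterating it; Linux's CPU-hotplug read lock (get_online_cpus()/put_online_cpus() at the time) is similar in spirit. The sketch below is a hypothetical illustration of that idea using C11 atomics, not an existing Xen interface: `cpu_maps_read_begin()`, `cpu_maps_read_end()` and `cpu_offline_try()` are invented names, and the offline side fails with `-EBUSY` instead of waiting.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout. All atomics use the default seq_cst order,
 * which the reader/writer handshake below relies on. */
static atomic_ulong online_mask = 0xFUL;        /* CPUs 0-3 online */
static atomic_int map_readers = 0;
static atomic_bool hotplug_in_progress = false;

/* Reader side: returns false while a hotplug operation owns the maps. */
static bool cpu_maps_read_begin(void) {
    if (atomic_load(&hotplug_in_progress))
        return false;
    atomic_fetch_add(&map_readers, 1);
    if (atomic_load(&hotplug_in_progress)) {    /* lost the race: back out */
        atomic_fetch_sub(&map_readers, 1);
        return false;
    }
    return true;
}

static void cpu_maps_read_end(void) { atomic_fetch_sub(&map_readers, 1); }

/* Writer side: refuses with -EBUSY while any reader is inside, so a reader
 * never sees the mask change between begin and end. */
static int cpu_offline_try(unsigned int cpu) {
    bool expected = false;
    if (!atomic_compare_exchange_strong(&hotplug_in_progress, &expected, true))
        return -EBUSY;
    if (atomic_load(&map_readers) != 0) {
        atomic_store(&hotplug_in_progress, false);
        return -EBUSY;
    }
    atomic_fetch_and(&online_mask, ~(1UL << cpu));
    atomic_store(&hotplug_in_progress, false);
    return 0;
}

/* Example reader: the mask is stable for the duration of the bracket. */
static int count_online_cpus(void) {
    if (!cpu_maps_read_begin())
        return -EBUSY;
    unsigned long m = atomic_load(&online_mask);
    int n = 0;
    for (; m; m &= m - 1)
        n++;
    cpu_maps_read_end();
    return n;
}
```

The trade-off is that every subsystem walking the map must opt in, so this narrows the find-and-fix work to "find the readers" rather than removing it.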
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
> Sent: 11 September 2008 19:13
> To: Shan, Haitao; Haitao Shan
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen
> It looks much better. I'll read through, maybe tweak, and most likely then
> check it in.
> On 11/9/08 09:02, "Shan, Haitao" <haitao.shan@xxxxxxxxx> wrote:
>> Hi, Keir,
>> Attached is the updated patch, using the method you described in
>> another mail.
>> What do you think of this one?
>> Signed-off-by: Shan Haitao <haitao.shan@xxxxxxxxx>
>> Best Regards
>> Haitao Shan
>> Haitao Shan wrote:
>>> Agreed. Placing migration in stop_machine context will definitely make
>>> our job easier. I will start on a new patch tomorrow. :)
>>> I placed the migration code outside the stop_machine_run context partly
>>> because I was not sure how long it would take to migrate all the
>>> vcpus away. If it takes too long, all useful work is blocked,
>>> since all cpus are in stop_machine context. Of course, I also borrowed
>>> the idea from the Linux kernel, which influenced that decision.
>>> 2008/9/10 Keir Fraser <keir.fraser@xxxxxxxxxxxxx>:
>>>> I feel this is more complicated than it needs to be.
>>>> How about clearing VCPUs from the offlined CPU's runqueue from the
>>>> very end of __cpu_disable()? At that point all other CPUs are safely
>>>> in softirq context with IRQs disabled, and we are running on the
>>>> correct CPU (being offlined). We could have a hook into the
>>>> scheduler subsystem at that point to break affinities, assign to
>>>> different runqueues, etc. We would just need to be careful not to
>>>> try an IPI. :-) This approach would not need a cpu_schedule_map
>>>> (which is really increasing code fragility imo, by creating possible
>>>> extra confusion about which cpumask is the right one to use in a
>>>> given situation).
>>>> My feeling, unless I've missed something, is that this would make
>>>> the patch quite a bit smaller and with a smaller spread of code
>>>> -- Keir
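The proposed hook can be sketched as a small model of what would run on the dying CPU at the very end of __cpu_disable(). This is an illustration under assumptions: `sched_cpu_disable_hook()`, the flat `vcpus[]` array standing in for the runqueue, and the "first allowed CPU" placement policy are all hypothetical, not Xen's scheduler code. It shows the three pieces named above: affinity is broken only when it names no remaining online CPU, vcpus are reassigned directly, and no IPI is sent, since the other CPUs are spinning in softirq context with IRQs disabled.

```c
#include <assert.h>

/* Hypothetical model; names and structures are illustrative only. */
#define NR_CPUS  4
#define NR_VCPUS 4

struct vcpu {
    int processor;
    unsigned long affinity;   /* bitmask of CPUs the vcpu may run on */
};

static struct vcpu vcpus[NR_VCPUS];
static unsigned long online_mask;

static int first_cpu(unsigned long mask) {
    for (int c = 0; c < NR_CPUS; c++)
        if (mask & (1UL << c))
            return c;
    return -1;
}

/* Runs on the dying CPU itself while every other CPU is held in
 * stop_machine, so plain (unlocked) updates are safe in this model. */
static void sched_cpu_disable_hook(int dying) {
    online_mask &= ~(1UL << dying);
    for (int i = 0; i < NR_VCPUS; i++) {
        struct vcpu *v = &vcpus[i];
        if (v->processor != dying)
            continue;
        unsigned long allowed = v->affinity & online_mask;
        if (allowed == 0) {            /* affinity named only the dying CPU */
            v->affinity = online_mask; /* break it */
            allowed = online_mask;
        }
        /* Reassign directly, no IPI: the target CPU picks the vcpu up on its
         * next scheduler pass after stop_machine returns. */
        v->processor = first_cpu(allowed);
    }
}
```

Note that in this model a vcpu pinned to a set that still contains an online CPU keeps its affinity unchanged; only an otherwise unsatisfiable affinity is widened.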
>>>> On 9/9/08 09:59, "Shan, Haitao" <haitao.shan@xxxxxxxxx> wrote:
>>>>> This patch implements cpu offline feature.
>>>>> Best Regards
>>>>> Haitao Shan
>>>> Xen-devel mailing list