On 16/04/2010 07:42, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
> Thanks for pointing out the stop_machine_run deadlock issue.
>
> After more consideration and internally discussion, seems the key point is,
> the tasklet softirq is something like getting a lock for the current vcpu's
> state(i.e. no one else could change that state unless this softirq is
> finished).
>
> So any block action in softirq context, not just vcpu_pause_sync,
> is dangerous and should be avoided (we can't get a lock and do block action
> per my understanding).
> This is because vcpu's state can only be changed by schedule softirq (am I
> right on this?), while schedule softirq can't prempt other softirq. So, more
> generally, anything that will be updated by a softirq context, and will be
> syncrhozed/blocking waitied in xen's vcpu context is in fact a implicit lock
> held by the softirq.
I think you're at risk of over-analysing the situation. You cannot safely
synchronously pause vcpus from within softirq context. I'm not sure we can
extrapolate further than that.
> To the tricky bug on the stop_machine_run(), I think it is in fact similar to
> the cpu_add_remove_lock. The stop_machine_run() is a block action, so we must
> make sure no one will be blocking to get the lock that is held by
> stop_machine_run() already. At that time, we change all components that try to
> get the cpu_add_remove_lock to be try_lock.
>
> The changes caused by the tasklet is, a new implicit lock is added, i.e. the
> vcpu's state.
Let me be clear: the deadlock I described in stop_machine_run() is *not*
introduced by the c_h_o_c reimplementation. It has been there all along.
Nothing in my description of the deadlock depended on the implementation of
c_h_o_c: it depended only on the baheviour of stop_machine_run itself, which
does not internally use c_h_o_c.
> The straightforward method is like the cpu_add_remove_lock: make everything
> that waiting for the vcpu state change to do softirq between the checking.
Possible, could have impacts of its own of course. We couldn't do
SCHEDULE_SOFTIRQ as we would lose the caller's context, and the caller could
not hold locks when pausing others (although I suppose they generally can't
anyway).
> Maybe the cleaner way is your previous suggestion, that is, put the
> stop_machine_run() in the idle_vcpu(), another way is, turn back to the
> original method, i.e. do it in the schedule_tail.
Argh. That would be annoying!
Another possibility is to... shudder... introduce a timeout. Wait only e.g.,
1ms for all vcpus to enter the STOPMACHINE_PREPARE state and if they don't,
cancel the operation and return -EBUSY. There are already a bunch of
opportunities to return 'spurious' -EBUSY already in the cpu-offline paths,
so tools already need to know to retry at least a certain number of times.
Undoubtedly this is the dirty solution, but it is easy-ish to implement and
should only be allowing us to avoid an extremely rare deadlock possibility.
It would just be critical to pick a large enough timeout!
> Also We are not sure why the continue_hypercall_on_cpu is changed to use
> tasklet. What's the benifit for it? At least I think this is quite confusing,
> because per our understanding, usually hypercall is assumed to execut in
> current context, while this change break the assumption. So any hypercall that
> may use this _c_h_o_c, and any function called by that hypercall, should be
> aware of this, I'm not sure if this is really so correct, at least it may
> cause trouble if someone use this without realize the limitation. From Juergen
> Gross's mail, it seems for cpupool, but I have no idea of the cpupool :-(
The implementation is simpler and lets us get rid of the complexity around
vcpu affinity logic. There aren't that many users of c_h_o_c and most are
still trivially correct. I don't think the changes to c_h_o_c have
introduced any more deadlocks, apart from the freeze_domains issue (which I
believe I have now fixed).
-- Keir
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|