Re: [Xen-devel] [PATCH] Avoid endless loop for vcpu migration

>>> On 15.03.11 at 10:21, Juergen Gross <juergen.gross@xxxxxxxxxxxxxx> wrote:
> On 03/15/11 10:01, Jan Beulich wrote:
>>>>> On 15.03.11 at 09:46, Juergen Gross <juergen.gross@xxxxxxxxxxxxxx> wrote:
>>> On 03/15/11 08:57, Jan Beulich wrote:
>>>>>>> On 15.03.11 at 06:50, Juergen Gross <juergen.gross@xxxxxxxxxxxxxx> wrote:
>>>>> On 03/14/11 16:03, Jan Beulich wrote:
>>>>>>>>> On 14.03.11 at 15:39, Juergen Gross <juergen.gross@xxxxxxxxxxxxxx> wrote:
>>>>>>> On multi-thread multi-core systems an endless loop can occur in
>>>>>>> vcpu_migrate() with credit scheduler. Avoid this loop by changing
>>>>>>> the interface of pick_cpu to indicate a repeated call in this case.
>>>>>> But you're not changing in any way the loop that doesn't get
>>>>>> exited - did you perhaps read my original description as the
>>>>>> pick function itself looping (which - afaict - it doesn't)?
>>>>> I'm changing the way the pick_cpu function is reacting on multiple
>>>>> calls in a loop. If I've understood the idle_bias correctly, updating
>>>>> it in each loop iteration did result in returning another cpu for
>>>>> each call. By updating idle_bias only once, it should return the same
>>>>> cpu in subsequent calls. This should exit the loop in vcpu_migrate.
>>>> You're only decreasing the likelihood of a live lock, as the return
>>>> value of pick_cpu not only depends on idle_bias.
>>> Hmm, then another solution would be to let pick_cpu really return the
>>> proposed cpu from the first iteration, if it doesn't contradict the
>>> allowed settings. It could be sub-optimal, but I don't think this is
>>> critical, as vcpu_migrate is called rarely.
>>> Patch attached.
>> That candidate-is-valid check seems absolutely independent of the
>> particular scheduler used, and hence could be done in the (sole)
>> caller, thus not requiring any change to the scheduler interface.
>> Which at once would eliminate unnecessary calls into pick_cpu (i.e.
>> you'd call it a second time only if the previously selected CPU really
>> is no longer valid to be used for that vCPU).
> True.
> The patch seems to become smaller :-)

This looks good to me now, and it makes quite obvious that there
is a likely exit path from the loop (it can only live lock now if
v->cpu_affinity and/or v->domain->cpupool->cpu_valid are
constantly changing, which could only be due to a misbehaving
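
For readers following the thread, the idea agreed on above (have the caller check whether the already-picked CPU is still allowed, and call pick_cpu again only if it is not) can be sketched roughly as follows. This is not the attached patch and not Xen code: toy_vcpu, cpu_allowed, toy_migrate and rotating_pick are hypothetical names, a plain 64-bit mask stands in for Xen's CPU masks, and the locking done by the real vcpu_migrate() is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: a 64-bit mask instead of Xen's CPU mask type. */
typedef uint64_t cpumask;

struct toy_vcpu {
    int processor;       /* CPU the vCPU is currently assigned to */
    cpumask affinity;    /* stands in for v->cpu_affinity */
    cpumask pool_valid;  /* stands in for v->domain->cpupool->cpu_valid */
};

/* Scheduler-specific pick function; its interface is left unchanged. */
typedef int (*pick_fn)(const struct toy_vcpu *v);

/* Scheduler-independent check: is this CPU still usable for the vCPU? */
bool cpu_allowed(const struct toy_vcpu *v, int cpu)
{
    cpumask allowed = v->affinity & v->pool_valid;
    return cpu >= 0 && cpu < 64 && ((allowed >> cpu) & 1u);
}

int next_pick;  /* state of the toy picker below */

/* Toy picker: returns 0, 1, 2, ... on successive calls and ignores the
 * masks, mimicking a pick function whose answer differs on every call. */
int rotating_pick(const struct toy_vcpu *v)
{
    (void)v;
    return next_pick++ % 64;
}

/* Caller-side loop: pick once, then pick again only while the candidate
 * is not allowed. A loop that instead waited for two consecutive picks to
 * agree would never exit with rotating_pick. Termination here still
 * relies on the picker eventually returning an allowed CPU. */
int toy_migrate(struct toy_vcpu *v, pick_fn pick, int *calls)
{
    assert((v->affinity & v->pool_valid) != 0);  /* some CPU must be usable */

    int new_cpu = pick(v);
    *calls = 1;

    while (!cpu_allowed(v, new_cpu)) {
        new_cpu = pick(v);
        (*calls)++;
    }

    v->processor = new_cpu;
    return new_cpu;
}
```

With a picker that changes its mind on every call, toy_migrate still accepts the first candidate that passes cpu_allowed, so extra pick calls happen only when the candidate has actually become invalid. In this sketch the masks never change while the loop runs; in the hypervisor they can, which is the remaining live-lock condition described above.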