My sense is that:
* Pinning N vcpus to N-M pcpus (where M is a significant fraction of
N) is just a really bad idea; it would be better simply not to do it.
Ideally, when dom0's cpupool shrinks, it would automatically offline
an appropriate number of vcpus; but it shouldn't be difficult for an
administrator to do that themselves (a rough sketch of doing so from
inside dom0 appears after my conclusion below).
* On average, a vcpu shouldn't have to wait more than 60ms or so for
an interrupt. It seems like there's a non-negligible possibility that
there's some kind of bug in interrupt delivery or handling, either on
the Xen side or the Linux side (or, as Jan pointed out, in the
driver). In that case, doing something in the scheduler isn't
actually fixing the problem, it's just making it less likely to
happen. (NB that we've had intermittent failures in the xen.org
testing infrastructure with what look like missed interrupts as well
-- and those weren't on heavily loaded boxes.)
* Even if it is ultimately a scheduler bug, understanding exactly what
the scheduler is doing and why is key to making a proper fix. It's
possible that there's just a simple quirk in the algorithm, such that
a general fix will make everything work better without needing to
introduce a special case for hardware interrupts.
* I'm not opposed in principle to a mechanism that prioritizes vcpus
awaiting hardware interrupts. But I am wary of guessing what the
problem is and then introducing a patch without proper root-cause
analysis. Even if it seems to fix the immediate problem, it may
simply be masking the real problem, and may also cause problems of
its own. The behavior of the scheduler is hard enough to understand
already, and every special case makes it even harder. (A rough sketch
of the kind of special case being discussed follows this list.)
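
To make that concrete, here is a purely hypothetical sketch of the
kind of special case we're talking about: boost a vcpu's priority
when it is woken as the target of a pending hardware interrupt, and
drop the boost once the interrupt has been handled. None of the names
below are the credit scheduler's real structures or hooks; they are
placeholders chosen only to illustrate the shape of such a change.

/* Hypothetical sketch only -- NOT actual Xen scheduler code. */

enum vcpu_prio {
    PRIO_OVER  = -2,    /* has run through its credit  */
    PRIO_UNDER = -1,    /* normal, still has credit    */
    PRIO_BOOST =  0,    /* temporarily boosted         */
};

struct sched_vcpu {
    enum vcpu_prio prio;
    int pending_hw_irq;   /* set by the interrupt delivery path */
    int boosted;          /* remember to undo the boost later   */
};

/* Called when a blocked vcpu is woken up. */
static void sched_vcpu_wake(struct sched_vcpu *sv)
{
    if (sv->pending_hw_irq && sv->prio < PRIO_BOOST) {
        sv->prio = PRIO_BOOST;   /* run ahead of UNDER/OVER vcpus */
        sv->boosted = 1;
    }
    /* ... insert into the runqueue by priority, tickle an idle pcpu ... */
}

/* Called once the vcpu has handled the interrupt (or is descheduled). */
static void sched_vcpu_unboost(struct sched_vcpu *sv)
{
    if (sv->boosted) {
        sv->prio = PRIO_UNDER;   /* fall back to normal priority */
        sv->boosted = 0;
    }
}

Even in this toy form you can see the extra state and the extra
"undo" path that every such special case drags in -- which is exactly
why I'd rather understand the root cause first.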
So to conclude: I think the first answer to someone with this problem
should be, "Make sure that V <= P", where V is the number of virtual
cpus and P is the number of physical cpus the VM can be scheduled on.
If there are still problems, then we need to find out how it is that
interrupts come to be missing before attempting a fix.
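
For the dom0 case Juergen describes, one way to enforce V <= P by
hand is simply to offline the surplus vcpus from inside dom0. A
minimal sketch, assuming a dom0 kernel with the standard Linux
CPU-hotplug sysfs interface; the program name and argument are made
up for illustration (run as root, e.g. "./trim-vcpus 2" after pinning
dom0 to two pcpus):

/* trim-vcpus.c: offline dom0 vcpus above a given count.  Relies only
 * on the standard /sys/devices/system/cpu/cpuN/online interface. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int keep = (argc > 1) ? atoi(argv[1]) : 1;
    int cpu;

    for (cpu = keep; ; cpu++) {
        char path[64];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/online", cpu);
        f = fopen(path, "w");
        if (!f)
            break;              /* no such vcpu -- we're done */
        fputs("0\n", f);        /* request offline */
        fclose(f);
        printf("offlined vcpu %d\n", cpu);
    }
    return 0;
}

(The same thing can of course be done from a shell by echoing 0 into
those files.)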
-George
On Mon, Feb 14, 2011 at 9:58 AM, Jan Beulich <JBeulich@xxxxxxxxxx> wrote:
>>>> On 14.02.11 at 10:38, Juergen Gross <juergen.gross@xxxxxxxxxxxxxx> wrote:
>> On 02/14/11 10:26, Jan Beulich wrote:
>>>>>> On 14.02.11 at 07:59, Juergen Gross<juergen.gross@xxxxxxxxxxxxxx> wrote:
>>>> I used xen-unstable with kernel 2.6.32.24 from SLES11 SP1 on a 12-core
>>>> Intel Nehalem machine. I pinned all 12 Dom0 vcpus to pcpus 1-2 and
>>>> started a parallel build. After about 2 minutes the first missing
>>>> interrupt was reported, a little later the next one; no Xen messages
>>>> were printed:
>>>
>>> That's certainly not too surprising, depending somewhat on the
>>> maximum tolerated latencies. It seems unlikely to me that a 6-fold
>>> CPU over-commit would give stable operation, yet certain
>>> adjustments could probably be made to make it work better (like
>>> temporarily boosting the priority of a hardware interrupt's target
>>> vCPU).
>>
>> I would understand timeouts. But shouldn't the interrupt come in sooner or
>> later? At least the megasas driver seems unable to recover from this
>> problem; as a result, my root filesystem is set to read-only...
>
> I'm sure these interrupts arrive eventually, but the driver not
> seeing them within an expected time window may still make it
> report them as "lost".
>
>> This would mean there is a problem in the megasas driver, correct?
>> And Andre reports stability problems with his machine in similar cases,
>> but in his case the network driver seems to be the cause.
>
> Yes, this certainly depends on how the driver is implemented.
>
>> Are you planning to prepare a patch for boosting the priority of vcpus
>> that are the target of a hardware interrupt? I think I would have to
>> search for some time to find the correct places to change...
>
> So far I had no plan to do so, and I too would have to do some
> looking around. Nor am I convinced everyone would appreciate
> such fiddling with priorities - I was merely suggesting that might
> be one route to go. George?
>
> Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel