Robert Kaiser wrote:
Thanks for your response!
On Friday, 19 September 2008, 16:36:13, Daniel Magenheimer wrote:
Is your mini-OS pinned and you're sure dom0 or other
domains are not getting a piece of the pcpu? If so...
Well as I said: It's my own scheduler that decides. I can see that it gets
invoked a few times, but I checked that it always returns the VCPU that is
running the spinning loop. Thus, if everything outside my scheduler plays by
(what I think are) the rules, that VCPU should be the only one to get access
to the PCPU. (Except for interrupt-level activities, of course). So, just
_assuming_ that interrupt processing does not eat up those tens of
milliseconds, where else can they possibly go?
Any hints as to how I could proceed to pinpoint this problem?
Try your minios test domain w/o your own changes to the Xen scheduler. When
you run the test, use a uniprocessor dom0 bound to cpu 0. Bind your minios
test domain to cpu 1. This will verify your test domain code independent from
your Xen scheduler changes. If your test domain is still seeing large time
jumps, verify that the idle vcpu for cpu 1 is not getting any cpu time. If it
is, your test domain is doing something that is causing it to block.
So far I have
debugged my code by running the entire system on Qemu, using its built-in
debug stub. However, anything time-related behaves completely differently on
Qemu than on real hardware, so I can't use that setup any more. Presently,
I'm trying to get Xen's built-in GDB stub to work; I wonder if that will be
any better than Qemu. AFAIU, the stub would have to preserve TSC register
contents across breakpoints, otherwise the time coordinate perceived by the
system will jump erratically. Not sure if it does that really, so this may
turn out to be another dead end -- oh well..
I've seen anecdotal evidence of long pauses that led me
to wonder about interrupt latency here:
I don't recall the situation or the length of the pause
but perhaps you are seeing something similar. Unfortunately,
I am seeing situations where two subsequent calls to NOW() in the Mini-OS
context deliver time coordinates that differ by 95(!) milliseconds. If this
were due to interrupt latencies, surely that would have been noticed by
I never pursued the answer to the interrupt latency question.
From: Robert Kaiser [mailto:kaiser@xxxxxxxxxxxxxxxxxxxxxxxxxx]
Sent: Friday, September 19, 2008 6:00 AM
Subject: [Xen-devel] Does Xen detect busy-spinning VCPUs?
I'm currently developing/testing a new scheduler for Xen and
I am seeing some
very strange behaviour which I can't seem to pinpoint: For
purposes, I am running a task inside Mini-OS in a tight,
for some time. The loop repeatedly polls NOW() until it
exceeds a certain
time limit. What I am observing is that NOW() seems to "jump"
subsequent reads return values which differ by tens of
milliseconds! I notice
that my scheduler gets invoked a couple of times, but it does
*not* switch to
another VCPU and I doubt that the scheduler invocations alone
take that long.
So the loop should indeed be continuously spinning with sporadic
interruptions in the range of a few microseconds, but not tens of
milliseconds. Yet, this is not what I am seeing. I wonder
where the (P)CPU
goes during those time intervals and so this possibly weird
idea came up that
Xen might use some trickery trying to detect and pause
Is there anything like that in Xen (BTW: this is xen-3.2.1),
and, if there
is, can it be disabled for a given domain?
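The polling loop described above can be sketched roughly as follows. This is only an illustration in Python, not Mini-OS code: `time.monotonic_ns()` stands in for NOW(), and the names `find_gaps` and `spin_and_record`, the 50 ms run time, and the 1 ms reporting threshold are all assumptions made for the example.

```python
import time


def find_gaps(samples_ns, threshold_ns):
    """Return (index, delta_ns) for every pair of consecutive clock
    readings whose difference exceeds threshold_ns."""
    gaps = []
    for i in range(1, len(samples_ns)):
        delta = samples_ns[i] - samples_ns[i - 1]
        if delta > threshold_ns:
            gaps.append((i, delta))
    return gaps


def spin_and_record(duration_ns, clock=time.monotonic_ns):
    """Busy-poll the clock until duration_ns has elapsed and return
    every reading taken. The clock is a parameter so that a fake time
    source can be substituted (hypothetical stand-in for NOW())."""
    start = clock()
    samples = [start]
    while samples[-1] - start < duration_ns:
        samples.append(clock())
    return samples


if __name__ == "__main__":
    # Spin for 50 ms and report any jump larger than 1 ms.
    samples = spin_and_record(50_000_000)
    for index, delta in find_gaps(samples, 1_000_000):
        print(f"sample {index}: jump of {delta / 1e6:.3f} ms")
```

Recording the readings first and analysing them afterwards keeps the spinning loop itself as tight as possible, so the reporting does not add delays of its own to the measurement.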
(Sorry if this is a silly question. Since my code is
experimental and not well
tested yet, there is of course the possibility that I made
mistake. However, I've been staring at code, debug logs, etc.
days now without much success and I am slowly getting
desperate. If Xen
really does pause spinning VCPUs it would explain everything.)
Thanks for any help
Labor für Verteilte Systeme
FH Wiesbaden - University of Applied Sciences tel:
Kurt-Schumacher-Ring 18, 65197 Wiesbaden, Germany fax:
Xen-devel mailing list