What does NOW() translate to in your mini-OS? Is it built
on a TSC read or on top of the paravirtualized time parameters
exported by Xen via the shared page?
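To make the question concrete, here is a rough sketch of what the paravirtualized path usually amounts to: take the system time Xen last published for the VCPU and add the scaled TSC delta since then. The struct follows the layout of Xen's public vcpu_time_info record as far as I remember it; pv_now_ns is a hypothetical name, and passing the TSC value in as a parameter (rather than reading it inside the retry loop) is a simplification so the arithmetic can be tried off-target. Your mini-OS may well do it differently.

```c
#include <stdint.h>

/* Per-VCPU time record as published by Xen in the shared info page
 * (layout assumed from Xen's public headers). */
struct vcpu_time_info {
    uint32_t version;           /* odd while Xen is updating the record */
    uint32_t pad0;
    uint64_t tsc_timestamp;     /* TSC value at the last update */
    uint64_t system_time;       /* ns since boot at the last update */
    uint32_t tsc_to_system_mul; /* 0.32 fixed-point scale factor */
    int8_t   tsc_shift;         /* shift applied to the TSC delta first */
    int8_t   pad1[3];
};

/* Hypothetical helper: convert a raw TSC reading into nanoseconds.
 * A real implementation would read the TSC inside the retry loop. */
static uint64_t pv_now_ns(const volatile struct vcpu_time_info *t, uint64_t tsc)
{
    uint32_t version, mul;
    uint64_t delta, ns;
    int8_t shift;

    do {
        version = t->version;
        __sync_synchronize();          /* read version before the payload */

        delta = tsc - t->tsc_timestamp;
        shift = t->tsc_shift;
        mul   = t->tsc_to_system_mul;
        if (shift < 0)
            delta >>= -shift;
        else
            delta <<= shift;

        /* (delta * mul) >> 32, split so it fits in 64-bit arithmetic */
        ns = t->system_time
           + (delta >> 32) * mul
           + (((delta & 0xffffffffULL) * mul) >> 32);

        __sync_synchronize();          /* read payload before re-checking */
    } while ((version & 1) || version != t->version);

    return ns;
}
```

If NOW() is built this way, a jump can come either from the TSC itself or from Xen republishing tsc_timestamp and system_time between your two reads, so it matters which path you are on.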
Once Xen is entered (for scheduling or anything else), Xen
can choose to do anything it wants. It may do some housekeeping
tasks; for example, there's code to zero out pages from a
destroyed domain that gets executed asynchronously.
But I'd think 95msec is excessive, so if you are absolutely
certain Xen is not scheduling another domain on that physical
CPU during that timeslice, nor moving the mini-OS to another
physical CPU, this is very worthwhile to track down.
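One way to get harder numbers is to have the spinning loop itself record the gaps, so you know how large they are and how often they occur. A minimal sketch, not your actual benchmark: spin_and_measure, gap_stats and the clock_fn callback standing in for NOW() are all hypothetical names, and fake_now is only there so the loop can be tried without Xen (it advances 1 usec per read and jumps 95 msec once).

```c
#include <stdint.h>

typedef uint64_t (*clock_fn)(void);   /* stand-in for NOW(), returns ns */

struct gap_stats {
    uint64_t max_gap_ns;   /* largest difference between consecutive reads */
    uint64_t gaps_over;    /* number of differences above the threshold */
};

/* Spin for duration_ns, tracking the difference between each pair of
 * consecutive clock reads. */
static struct gap_stats spin_and_measure(clock_fn now, uint64_t duration_ns,
                                         uint64_t threshold_ns)
{
    struct gap_stats s = {0, 0};
    uint64_t start = now();
    uint64_t prev = start;

    while (prev - start < duration_ns) {
        uint64_t cur = now();
        uint64_t gap = cur - prev;

        if (gap > s.max_gap_ns)
            s.max_gap_ns = gap;
        if (gap > threshold_ns)
            s.gaps_over++;
        prev = cur;
    }
    return s;
}

/* Fake clock for off-target experiments: 1 usec per read, with a single
 * 95 msec jump on the fifth read. */
static uint64_t fake_now(void)
{
    static uint64_t t = 0, calls = 0;

    t += (++calls == 5) ? 95000000ULL : 1000ULL;
    return t;
}
```

On the real system you would pass your NOW() wrapper instead of fake_now and print the two counters only after the loop ends, so the logging does not disturb the measurement.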
> surely that would have been noticed by someone?
Perhaps not, because most real environments have lots of
scheduling events that could cause gaps like that.
One possible thought: If you turn on tracing (xentrace) and
can isolate the suspect interval in the trace, you might
get some clues as to what is happening.
Finally, Mukesh Rathor has some new gdb technology working with
Xen, though, as you've already discovered, time and debugging
don't mix well. See Mukesh's talk at the last Xen summit
for more info.
> -----Original Message-----
> From: Robert Kaiser [mailto:kaiser@xxxxxxxxxxxxxxxxxxxxxxxxxx]
> Sent: Friday, September 19, 2008 9:47 AM
> To: Daniel Magenheimer
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Does Xen detect busy-spinning VCPUs?
> thanks for your response!
> On Friday, 19 September 2008 16:36:13, Daniel Magenheimer wrote:
> > Is your mini-OS pinned and you're sure dom0 or other
> > domains are not getting a piece of the pcpu? If so...
> Well as I said: It's my own scheduler that decides. I can see that it gets
> invoked a few times, but I checked that it always returns the VCPU that is
> running the spinning loop. Thus, if everything outside my scheduler plays by
> (what I think are) the rules, that VCPU should be the only one to get access
> to the PCPU. (Except for interrupt-level activities, of course). So, just
> _assuming_ that interrupt processing does not eat up those tens of
> milliseconds, where else can they possibly go?
> Any hints as to how I could proceed to pinpoint this problem? So far I have
> debugged my code by running the entire system on Qemu, using its built-in
> debug stub. However, anything time-related behaves completely differently on
> Qemu than on real hardware, so I can't use that setup any more. Presently,
> I'm trying to get Xen's built-in GDB stub to work, I wonder if that will be
> any better than Qemu. AFAIU, the stub would have to preserve TSC register
> contents across breakpoints, otherwise the time coordinate perceived by the
> system will jump erratically. Not sure if it really does that, so this may
> turn out to be another dead end -- oh well..
> > I've seen anecdotal evidence of long pauses that led me
> > to wonder about interrupt latency here:
> > I don't recall the situation or the length of the pause
> > but perhaps you are seeing something similar. Unfortunately,
> > I never pursued the answer to the interrupt latency question.
> I am seeing situations where two subsequent calls to NOW() in the Mini-OS
> context deliver time coordinates that differ by 95(!) milliseconds. If this
> were due to interrupt latencies, surely that would have been noticed by
> someone?
> > > -----Original Message-----
> > > From: Robert Kaiser [mailto:kaiser@xxxxxxxxxxxxxxxxxxxxxxxxxx]
> > > Sent: Friday, September 19, 2008 6:00 AM
> > > To: xen-devel@xxxxxxxxxxxxxxxxxxx
> > > Subject: [Xen-devel] Does Xen detect busy-spinning VCPUs?
> > >
> > >
> > > Hi all,
> > >
> > > I'm currently developing/testing a new scheduler for Xen and I am seeing some
> > > very strange behaviour which I can't seem to pinpoint: For benchmarking
> > > purposes, I am running a task inside Mini-OS in a tight, busy-spinning loop
> > > for some time. The loop repeatedly polls NOW() until it exceeds a certain
> > > time limit. What I am observing is that NOW() seems to "jump" sometimes: two
> > > subsequent reads return values which differ by tens of milliseconds! I notice
> > > that my scheduler gets invoked a couple of times, but it does *not* switch to
> > > another VCPU and I doubt that the scheduler invocations alone take that long.
> > > So the loop should indeed be continuously spinning with sporadic
> > > interruptions in the range of a few microseconds, but not tens of
> > > milliseconds. Yet, this is not what I am seeing. I wonder where the (P)CPU
> > > goes during those time intervals and so this possibly weird idea came up that
> > > Xen might use some trickery trying to detect and pause busy-spinning VCPUs.
> > > Is there anything like that in Xen (BTW: This is xen-3.2.1), and, if there
> > > is, can it be disabled for a given domain?
> > >
> > > (Sorry if this is a silly question. Since my code is experimental and not well
> > > tested yet, there is of course the possibility that I made some stupid
> > > mistake. However, I've been staring at code, debug logs, etc. for several
> > > days now without much success and I am slowly getting desperate. If Xen
> > > really does pause spinning VCPUs it would explain everything.)
> > >
> > > Thanks for any help
> > >
> > > Rob
> > >
> > > --
> > > Robert Kaiser
> > > http://wwwvs.informatik.fh-wiesbaden.de
> > > Labor für Verteilte Systeme
> > > kaiser@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > > FH Wiesbaden - University of Applied Sciences tel: (+49)611-9495-1294
> > > Kurt-Schumacher-Ring 18, 65197 Wiesbaden, Germany fax: (+49)611-9495-1289
> > >
> > > _______________________________________________
> > > Xen-devel mailing list
> > > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > > http://lists.xensource.com/xen-devel