[Xen-ia64-devel] RE: Timer merge

To: "Tian, Kevin" <kevin.tian@xxxxxxxxx>, "Dong, Eddie" <eddie.dong@xxxxxxxxx>, <xen-ia64-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-ia64-devel] RE: Timer merge
From: "Magenheimer, Dan (HP Labs Fort Collins)" <dan.magenheimer@xxxxxx>
Date: Thu, 25 Aug 2005 07:14:42 -0700
Cc: "Mallick, Asit K" <asit.k.mallick@xxxxxxxxx>
Delivery-date: Thu, 25 Aug 2005 14:12:39 +0000
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-ia64-devel-request@lists.xensource.com?subject=help>
List-id: Discussion of the ia64 port of Xen <xen-ia64-devel.lists.xensource.com>
List-post: <mailto:xen-ia64-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-ia64-devel>, <mailto:xen-ia64-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-ia64-devel>, <mailto:xen-ia64-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcWnjSOhSjX2nMKXQI2EGFjNgzyS+AAXrvdAAAAqXDAAAf54IAAFcXoAAATDFSAAADCbAAAOO83gAB29uYAAKFIVoAACuHFQ
Thread-topic: Timer merge

Thanks, Kevin, for your thoughtful answer.

> >The current (non-VTI) code is not perfect, but it *is* fast.  This
> >is due in large part to the fact that all the operations are
> >handled either with "fast reflection" or "fast hyperprivop"
> >code.  This code reflects/emulates operations without requiring
> >all the state save/restore and stack switching that is necessary
> >to call C.  Indeed it is done without even turning on psr.ic
> >because all data that is accessed is pinned by TRs.  As a
> >result each operation takes on the order of 100 cycles, as
> >opposed to 1000-2000 cycles if C must be called.  And there are
> >several (8-10 IIRC) operations per guest timer tick.
> 
> Yes, this is a fast path to reflect guest timer interrupt, 
> which we didn't note before.

Not just the fast path for reflection (once per tick).  Also
the fast path for reading ivr (twice per tick), setting tpr
(once per tick), reading tpr (twice per tick), setting eoi
(once per tick), and (of course) setting itm (once per tick).

These nine operations total ~1000 cycles when using the fast
path and ~15000 when using the slow path.  Multiplied by
1024 Hz, the slow path uses an additional ~1.5% of the total CPU
(above what Linux itself uses) just processing clock ticks.
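
Spelling that out as a quick back-of-envelope check (a standalone
C sketch; the ~1 GHz clock rate is an assumed figure for
illustration, not a measured one):

    /* Sketch only: extra cost of the slow path at 1024 Hz,
     * assuming a ~1 GHz processor. */
    #include <stdio.h>

    int main(void)
    {
        const double cpu_hz      = 1.0e9;   /* assumed ~1 GHz clock   */
        const double tick_hz     = 1024.0;  /* guest timer tick rate  */
        const double fast_cycles = 1000.0;  /* all ops via fast paths */
        const double slow_cycles = 15000.0; /* same ops through C     */

        double extra = (slow_cycles - fast_cycles) * tick_hz; /* cyc/s */
        printf("extra slow-path cost: %.1f%% of the CPU\n",
               100.0 * extra / cpu_hz);     /* prints ~1.4% */
        return 0;
    }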

> But considering that IPF Linux 
> can catch up on lost ticks by using the ITC as a monotonically 
> increasing timer source, the requirement for accuracy of 
> virtual timer injection may not be so strict.

Isn't this a requirement of all operating systems on
IPF since a long PAL call can happen asynchronously?

> To some 
> extent, letting the guest catch up may perform better 
> than triggering as many machine interrupts as the guest 
> wants, because it saves many cycles of context 
> switching.

Delivering all ticks to all guests is certainly not
scalable.  Say there are 1000 lightly-loaded guests sharing
a single-processor server.  The entire processor would be
utilized just delivering all the ticks to each guest!
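
To put numbers on that (same caveat as above: an assumed ~1 GHz
clock, and even granting the ~1000-cycle fast path):

    /* Sketch only: aggregate cost of delivering every tick to
     * 1000 guests, fast path assumed for every operation. */
    #include <stdio.h>

    int main(void)
    {
        double total = 1000.0 * 1024.0 * 1000.0;  /* cycles per second */
        printf("tick delivery: %.0f%% of a 1 GHz CPU\n",
               100.0 * total / 1.0e9);            /* prints ~102% */
        return 0;
    }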

Is this what Xen/x86 does?

> A drawback of this approach is that a guest 
> application may observe time standing still within a small time slot.

Hmmm.... can you explain?  Are you talking about a guest
application that is making system calls to count jiffies
(which I think is a Linux-wide problem) or a guest application
that is reading the itc?  In the current model, the itc
is always monotonically increasing unless the guest operating
system sets itc.

> Of course actual performance difference needs future benchmark 
> data. But this is a factor we need to balance. ;-)

Agreed.  Perhaps we should set a system-wide quota, e.g. no
more than 0.2% total system overhead for the hypervisor processing
guest clock ticks.  (I'm not proposing that 0.2% is the right
number, just using it as an example.)

> >The core Xen code for handling timers is all in C so using
> >any of it will slow every guest timer tick substantially,
> >thus slowing the entire guest substantially.  It may be
> >possible to write the semantic equivalent of the Xen ac_timer
> >code in ia64 assembly, but this defeats the purpose of sharing
> >the core Xen code.  Also, I'm doubtful that walking timer
> >queues can be done with psr.ic off.
> 
> It's cleaner to consolidate all the places that modify the machine 
> itm into one uniform interface (ac_timer here). This conforms to a 
> common interface and also benefits the merge process. If we use the 
> above policy to inject fewer interrupts, the benefit of the 
> assembly path is reduced, while it remains more error-prone.

As discussed in a different thread on xen-devel some time ago,
I believe the ac_timer queue mechanism is an elegant interface
that is overkill for how it is used.  It was pointed out
(by Rolf I believe) that it is used more heavily in SMP.
I was skeptical but couldn't argue because Xen/ia64 doesn't
do SMP yet.

Without changing core code, the ac_timer queue mechanism MUST
be used for scheduling domains.  Since this is less
performance-critical, I am OK with that.
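
For the guest-tick side, the conversion Eddie proposes below
(expire = requested_ITM - current ITC, turned into an absolute
one-shot expiry) boils down to something like this standalone
sketch.  The 1 GHz ITC rate is an assumption for illustration;
this is not the real Xen/ia64 code:

    /* Standalone sketch, not Xen code: map a guest's requested ITM
     * to a one-shot expiry in nanoseconds (assumed 1 GHz ITC). */
    #include <stdint.h>
    #include <stdio.h>

    #define ITC_HZ 1000000000ULL            /* assumed ITC rate */

    static uint64_t itm_to_ns(uint64_t requested_itm, uint64_t current_itc)
    {
        uint64_t delta_cycles = requested_itm - current_itc;
        return delta_cycles * 1000000000ULL / ITC_HZ;
    }

    int main(void)
    {
        uint64_t itc = 5000000ULL;          /* pretend current ITC     */
        uint64_t itm = itc + 976563ULL;     /* next 1024 Hz guest tick */
        printf("one-shot timer fires in %llu ns\n",
               (unsigned long long)itm_to_ns(itm, itc)); /* ~976563 ns */
        return 0;
    }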

> >Note that hypervisor ticks (primarily used for scheduling
> >timeouts) are much less frequent (32 Hz?) so not as
> >performance-sensitive.  The current code does call C for
> >these ticks.
> 
> Now HZ is defined as 100 in config.h; however, the current itm 
> modification policy actually makes this periodic value 
> useless. Even when itm_delta is added and set into itm, an 
> immediately following ac_timer softirq will reprogram the itm 
> to the closest time point in the ac_timer list. 

This sounds like a bug (but on the path for scheduling domains,
not for delivering guest ticks, correct?).

> >In short, I am open to rearchitecting the timer code to
> >better merge with VTI.  However the changes should not have
> >a significant effect on performance.  And anything that calls
> >C code multiple times at 1024 Hz almost certainly will.
> 
> Agreed. Actually this is the area that has lacked enough 
> discussion for a long time. We need to make progress without 
> breaking anything. Since we have begun this discussion, broader 
> usage models should also be considered for future support:
>       - When a guest is presented with multiple vcpus, the current 
> guest Linux SMP boot code will try to sync the itc and thus write to itc.

The current model should handle this just fine using a
delta.  This delta is not currently implemented, but that's
only because setting itc hasn't been an issue yet.
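
A minimal standalone sketch of that delta idea (not the actual
Xen/ia64 implementation; the names are made up for illustration):
the guest reads its ITC through a per-vcpu offset, and a guest
write only updates the offset, so the physical ITC is never
written and stays monotonic:

    /* Standalone sketch: virtualize the guest's view of ar.itc
     * with a per-vcpu delta. */
    #include <stdint.h>
    #include <stdio.h>

    struct vcpu_time {
        int64_t itc_delta;              /* guest ITC = phys ITC + delta */
    };

    static uint64_t phys_itc = 1000000; /* stand-in for reading ar.itc */

    static uint64_t guest_read_itc(struct vcpu_time *vt)
    {
        return phys_itc + vt->itc_delta;
    }

    static void guest_write_itc(struct vcpu_time *vt, uint64_t new_val)
    {
        /* Record an offset instead of touching the physical register. */
        vt->itc_delta = (int64_t)new_val - (int64_t)phys_itc;
    }

    int main(void)
    {
        struct vcpu_time vt = { 0 };
        guest_write_itc(&vt, 42);       /* e.g. SMP boot syncing itc */
        printf("guest itc = %llu\n",
               (unsigned long long)guest_read_itc(&vt)); /* prints 42 */
        return 0;
    }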

>       - When a vcpu is allowed to be scheduled onto a different 
> physical cpu (host SMP), the itc on different physical cpus is 
> unlikely to be exactly the same even after sync.

This is much less frequent so doing extra work here is OK.

>       - For a large machine like a NUMA system, the itcs are 
> completely unsynchronized and driven at different ratios. People 
> need to access a global platform timer so that different cpus 
> have a common monotonic time base.

Agreed.  But this is an operating system problem that
is currently being discussed and solved in the Linux
community.  I would prefer to see the problem solved
by Linux and then (to the extent possible) leverage
that solution.

> All these cases in my head just underline the importance of a 
> scalable and well-organized time mechanism for both system 
> timekeeping and virtual time emulation. Implementing it all in 
> assembly code frightens me. Without a virtualized itc (by 
> offset) and itm, it's difficult to handle the above cases. This 
> is why Eddie gave the proposal at the bottom of this thread.

I'm not proposing that *everything* be implemented in assembly,
just that the architecture and design assume that the
most frequent paths can be implemented in assembly
(and with psr.ic off).  I think this will be hard to do
using Xen core ac_timer queues.

> However, the current assembly approach is also a good research 
> direction to consider. Whether we can short-circuit some 
> special cases is also a way to gain maximum performance. We 
> just need balance, but let's draw out an achievable goal first. ;-)

Well, it's hard to call it a research direction if it's already
implemented and working :-)

As I said, I'm not against a new time architecture/design.  I'm
simply proposing that performance is more important than utilizing
elegant-but-overcomplicated existing core Xen code.

Oh, and of course, that any new architecture/design works properly.
As we've seen from the recent changes to Xen/x86, getting time
working properly is not always easy.

Dan

> >> -----Original Message-----
> >> From: Dong, Eddie [mailto:eddie.dong@xxxxxxxxx]
> >> Sent: Tuesday, August 23, 2005 9:07 PM
> >> To: Magenheimer, Dan (HP Labs Fort Collins)
> >> Cc: Yang, Fred; Tian, Kevin; Xu, Anthony; Mallick, Asit K; Dong, Eddie
> >> Subject: Timer merge
> >>
> >> Dan:
> >>    We are looking for the simplest way to merge the timer code
> >> together now. Probably you still remember the discussion
> >> several months ago; now it looks much easier to merge
> >> together since the XEN ac_timer has moved to a one-shot timer.
> >>
> >>
> >>    The current code sets the machine ITM to the smallest of the
> >> HV timer (HZ=1024), the guest ITM (suppose 1024 Hz) and the next
> >> ac_timer. How about the following changes to support both VTI
> >> & non-VTI?
> >>    1: machine ITM is set only by the next ac_timer (the current
> >> non-VTI HV also does this)
> >>    2: Setting guest ITM will be done by adding a new
> >> ac_timer for that ITM. (expire = requested_ITM - current ITC)
> >>    3: The HV timer is probably not necessary now, but that's up to you.
> >>    4: xen_timer_interrupt gets corresponding modifications to
> >> reflect the above changes.
> >>
> >>
> >>    Any suggestion?
> >> Thx,eddie
> >>
> >>
> 

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel
