This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] Time skew on HP DL785 (and possibly other boxes)

To: "Xen-Devel (E-mail)" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] Time skew on HP DL785 (and possibly other boxes)
From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Date: Fri, 27 Mar 2009 20:49:37 +0000 (GMT)
Cc: john.v.morris@xxxxxx
Delivery-date: Fri, 27 Mar 2009 13:51:03 -0700
(Raising a yellow flag because this could turn into
a serious issue for Xen and it may take quite a bit
of work to come up with a solution.)

We recently measured Xen system time skew on an HP DL785
and found it to be horrible... nearly a quarter millisecond
worst case (with only about 10000 samples so it may get worse).

This box uses 8 quad-core AMD chips connected via
HyperTransport, BUT each chip is on a separate motherboard.
On this system HyperTransport is fast, and cross-node
memory accesses are fast enough that these NUMA systems
need not be treated as NUMA systems from a memory access
perspective.  So Xen just views the system as a 32-CPU box
(other than some code in the memory allocator that tries
to allocate near memory where possible, but silently falls
back to far memory if necessary) and guest vcpus migrate
freely between the nodes.  (Correct?)

However, I'm told that it's not possible to route a clocksource
over HyperTransport, so TSCs on processors on different
motherboards may be VERY different, and apparently the
mechanisms for synchronizing Xen system time across
motherboards may not be up to the challenge.  As a result,
OSes and apps sensitive to time that are running on PV
domains may be in for a rough ride on systems like this.
(HVM domains may run into other problems because time will
apparently stop for a "long time".)
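
To make the concern concrete, here is a rough sketch (not Xen code)
of what a guest could observe if its vcpu migrates between nodes
whose clocks disagree.  Everything in it is an assumption made for
illustration: the helper names, the two-node layout, the 250 us
offset (picked to match the "quarter millisecond" figure above) and
the migration schedule are all hypothetical.

```python
def observed_times(true_times_ns, node_schedule, node_offset_ns):
    """Time the guest reads at each instant, given the node it runs on.

    Each reading is the true time plus the (hypothetical) clock offset
    of the node the vcpu happens to be scheduled on at that instant.
    """
    return [t + node_offset_ns[n] for t, n in zip(true_times_ns, node_schedule)]


def backward_steps(readings):
    """Return (index, delta_ns) for every reading smaller than its predecessor."""
    return [
        (i, readings[i] - readings[i - 1])
        for i in range(1, len(readings))
        if readings[i] < readings[i - 1]
    ]


# Hypothetical setup: node 1 runs 250 us ahead of node 0, and the
# guest reads the clock every 100 us of true time.
true_times = [i * 100_000 for i in range(6)]
schedule = [1, 1, 0, 0, 1, 0]          # node the vcpu runs on at each read
offsets = {0: 0, 1: 250_000}           # per-node clock offset in ns

readings = observed_times(true_times, schedule, offsets)
steps = backward_steps(readings)

for index, delta in steps:
    print(f"read {index}: time went backwards by {-delta} ns")
```

In this toy schedule, every migration from the fast node to the slow
node makes the guest's clock appear to run backwards, even though
each node's own clock is monotonic.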

Since systems like this are targeted for consolidation
and virtualization, I see this as a potentially big problem
as it may appear to real Xen customers as bizarre
non-reproducible problems, such as "make" failing,
leading to questions about the stability and viability
of using Xen.


