Keir:
This has a long history behind it :-) It is not caused by TSC hardware
acceleration, but by how we handle the periodic timer IRQ.
In an x86 VMM, guest platform timers such as the PIT and RTC have a periodic
IRQ (the same holds for the guest local APIC timer) whose pending interrupts
are accumulated (pending_intr_nr) when host time runs ahead. In this periodic
timer model, the guest time seen can be expressed as:
Pit_time = total_pit_int_nr * T0 + CNT0 + Off0

where:
T0:   the PIT period
CNT0: the time elapsed since the last IRQ fired; it can be derived from the
      "Counter" register as CNT0 = (LATCH - 1 - "Counter") * T0 / LATCH
Off0: a constant offset representing the time at which the first IRQ fired
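
As a rough illustration of this model, here is a minimal C sketch; the
struct and field names (pit_state, total_pit_int_nr, etc.) are made up for
the example and are not the real Xen code:

/* Illustrative sketch of the PIT time model above (hypothetical names). */
#include <stdint.h>

struct pit_state {
    uint64_t total_pit_int_nr;  /* PIT IRQs injected into the guest so far */
    uint64_t period_ns;         /* T0: PIT period programmed by the guest  */
    uint32_t latch;             /* LATCH: PIT reload value                 */
    uint32_t counter;           /* current "Counter" register value        */
    uint64_t off0_ns;           /* Off0: offset of the first IRQ           */
};

/* Pit_time = total_pit_int_nr * T0 + CNT0 + Off0, with CNT0 always < T0. */
static uint64_t guest_pit_time_ns(const struct pit_state *s)
{
    uint64_t cnt0 = (uint64_t)(s->latch - 1 - s->counter)
                    * s->period_ns / s->latch;
    return s->total_pit_int_nr * s->period_ns + cnt0 + s->off0_ns;
}
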
Here CNT0 is always less than T0. Say the guest sees guest_pit0 at the
last PIT IRQ injection time; then, before we inject the next PIT IRQ, the
guest time seen from the guest PIT is bounded within (guest_pit0, guest_pit0
+ T0) no matter how much physical time has elapsed. If we synced the guest
TSC with the host TSC using a fixed OFFSET, the guest time seen from the TSC
could run far ahead of the guest PIT time when a lot of host time has
elapsed. Because of this, we freeze guest time (TSC time) at domain switch
time to solve the problem. This works fine for UP guests.
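
A minimal sketch of that TSC-freezing idea follows; the routine and field
names are hypothetical, not the actual Xen code:

/* Freeze the guest TSC while the vcpu is descheduled so it cannot run
 * ahead of the (also stalled) PIT view of time.  Illustrative only. */
#include <stdint.h>

extern uint64_t rdtsc_host(void);   /* read the host TSC */

struct vcpu_time {
    uint64_t tsc_offset;            /* guest_tsc = host_tsc + tsc_offset */
    uint64_t host_tsc_at_deschedule;
};

/* Domain switch out: remember where the host TSC was. */
static void guest_tsc_freeze(struct vcpu_time *t)
{
    t->host_tsc_at_deschedule = rdtsc_host();
}

/* Domain switch in: pull the offset back by the time we were not
 * running, so the guest TSC did not advance while frozen. */
static void guest_tsc_thaw(struct vcpu_time *t)
{
    t->tsc_offset -= rdtsc_host() - t->host_tsc_at_deschedule;
}
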
In an SMP guest, each VP has its own TSC time and local APIC time, which
we need to keep in sync with each other as in the UP approach above. Ideally
this per-VP time should also be synced with platform time such as the PIT
(so that guest time on all VPs is synced). The problem is that some VPs may
be descheduled, leaving no way to inject the periodic timer IRQ, whether for
local APIC time or PIT time. The solution to this would be very complicated,
and it might require changing the scheduler to switch the VPs of the same VM
in and out together.
After some investigation, we think this is too complicated for now, and
we want to go with a simple solution first: each VP's guest time is synced
internally, but guest time among different VPs is out of sync whenever
pending_intr_nr is not 0. Guest applications (which may migrate among
processors) are fine with this, because pending_intr_nr is always 0 before a
guest application gets to execute on a VP, so the application consistently
sees guest time moving forward even if it is migrated to another VP. We
assume the guest kernel's time-critical paths do not migrate between
processors.
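
The invariant relied on here can be sketched roughly as follows; the names
are hypothetical, and one owed tick is delivered per guest entry:

/* Illustrative only: deliver one owed timer tick per vmentry until
 * pending_intr_nr reaches 0; only then does normal guest code run on
 * this vcpu, so a migrated application always sees caught-up time. */
struct periodic_timer {
    unsigned int pending_intr_nr;   /* timer ticks still owed to the guest */
};

extern void queue_timer_irq(void);  /* hypothetical injection helper */

/* Called on the vmentry path; returns 1 if a tick was queued. */
static int pick_up_pending_tick(struct periodic_timer *pt)
{
    if (pt->pending_intr_nr == 0)
        return 0;                   /* this vcpu's time is in sync */
    pt->pending_intr_nr--;
    queue_timer_irq();
    return 1;
}
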
With this approach, the platform timer (PIT here) must be pinned to one
processor (VP0 here), and guest TSC, PIT and local APIC time are synced on
VP0 (the other VPs only sync TSC time and local APIC time). If the PIT IRQ
were routed dynamically to a different VP, say VP1, then the guest time seen
within that VP (here VP1) would not be synced. That causes the "well-known"
guest "lost too many ticks" issue (refer to CSET 7478, 9324).
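
The pinning could be sketched roughly like this (hypothetical names, not
the actual Xen routing code):

/* Pin the platform timer (PIT) interrupt to vcpu 0 so PIT time and
 * vcpu0's TSC/APIC time stay in step.  Illustrative only. */
#define MAX_VCPUS            32
#define PLATFORM_TIMER_VCPU   0

struct vcpu;                        /* opaque per-vcpu state */
struct domain {
    struct vcpu *vcpu[MAX_VCPUS];
};

static struct vcpu *pit_irq_target(struct domain *d)
{
    /* Always deliver the PIT IRQ to the same vcpu rather than letting
     * the virtual interrupt routing pick a different one each time. */
    return d->vcpu[PLATFORM_TIMER_VCPU];
}
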
thx,eddie
-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Keir Fraser
Sent: 27 June 2006 20:53
To: Li, Xin B
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] [PATCH] [HVM] Fix virtual apic irq distribution
On 27 Jun 2006, at 12:39, Li, Xin B wrote:
> But on our VM, we will have to synchronize TSC from time to time, so
> PIT irq handler on different vcpu may see big TSC diff and complain about
> the unreliable TSC, then maybe it will try to do TSC sync, which make
> guest time keeping complex and unreliable.
How out-of-sync do TSCs of different VCPUs get?
I'd like to see a non-optimised TSC mode in which RDTSC always vmexits
and we implement constant-rate always-sync'ed TSC in Xen. It'd be good
to see if we could at least get that time mode working properly, and
we'll need it anyway for save/restore/migration between machines with
(even only slightly) different clock speeds or the guest will get very
confused.
-- Keir
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel