Keir:
This has a long history behind it :-) It is not caused by TSC hardware
acceleration, but by how we handle the periodic timer IRQ.
In an x86 VMM, guest platform timers such as the PIT and RTC have a periodic
IRQ (the same holds for the guest local APIC timer) whose pending interrupts
are accumulated (pending_intr_nr) when host time runs ahead. In this periodic
timer model, the guest time seen can be expressed as:
Pit_time = total_pit_int_nr * T0 + CNT0 + Off0

where:
T0:   the PIT period
CNT0: the time elapsed since the last IRQ fired; it can be derived from the
      "Counter" register as CNT0 = (LATCH - 1 - "Counter") * T0 / LATCH
Off0: a constant offset representing the time at which the first IRQ fired
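
As a rough illustration of this model, here is a minimal C sketch; the
struct and field names (pit_state, total_pit_int_nr, etc.) are made up for
the example and are not the real Xen code:

/* Illustrative sketch of the PIT time model above (hypothetical names). */
#include <stdint.h>

struct pit_state {
    uint64_t total_pit_int_nr;  /* PIT IRQs injected into the guest so far */
    uint64_t period_ns;         /* T0: PIT period programmed by the guest  */
    uint32_t latch;             /* LATCH: PIT reload value                 */
    uint32_t counter;           /* current "Counter" register value        */
    uint64_t off0_ns;           /* Off0: offset of the first IRQ           */
};

/* Pit_time = total_pit_int_nr * T0 + CNT0 + Off0, with CNT0 always < T0. */
static uint64_t guest_pit_time_ns(const struct pit_state *s)
{
    uint64_t cnt0 = (uint64_t)(s->latch - 1 - s->counter)
                    * s->period_ns / s->latch;
    return s->total_pit_int_nr * s->period_ns + cnt0 + s->off0_ns;
}
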
Here CNT0 is always less than T0. Say the guest sees guest_pit0 at the
last PIT IRQ injection time; then, before we inject the next PIT IRQ, the
guest time seen from the guest PIT is bounded within (guest_pit0, guest_pit0
+ T0) no matter how much physical time has elapsed. If we synced the guest
TSC with the host TSC using a fixed OFFSET, the guest time seen from the TSC
could run far ahead of the guest PIT time when a lot of host time has
elapsed. Because of this, we freeze guest time (TSC time) at domain switch
time to solve the problem. This works fine for UP guests.
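
A minimal sketch of that TSC-freezing idea follows; the routine and field
names are hypothetical, not the actual Xen code:

/* Freeze the guest TSC while the vcpu is descheduled so it cannot run
 * ahead of the (also stalled) PIT view of time.  Illustrative only. */
#include <stdint.h>

extern uint64_t rdtsc_host(void);   /* read the host TSC */

struct vcpu_time {
    uint64_t tsc_offset;            /* guest_tsc = host_tsc + tsc_offset */
    uint64_t host_tsc_at_deschedule;
};

/* Domain switch out: remember where the host TSC was. */
static void guest_tsc_freeze(struct vcpu_time *t)
{
    t->host_tsc_at_deschedule = rdtsc_host();
}

/* Domain switch in: pull the offset back by the time we were not
 * running, so the guest TSC did not advance while frozen. */
static void guest_tsc_thaw(struct vcpu_time *t)
{
    t->tsc_offset -= rdtsc_host() - t->host_tsc_at_deschedule;
}
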
In an SMP guest, each VP has its own TSC time and local APIC time, which
we need to keep in sync with each other as in the UP approach above. Ideally
this per-VP time should also be synced with platform time such as the PIT
(so that guest time on all VPs is synced). The problem is that some VPs may
be descheduled, leaving no way to inject the periodic timer IRQ, whether for
local APIC time or PIT time. The solution to this would be very complicated,
and it might require changing the scheduler to switch the VPs of the same VM
in and out together.
After some investigation, we think this is too complicated for now, and
we want to go with a simple solution first: each VP's guest time is synced
internally, but guest time among different VPs is out of sync whenever
pending_intr_nr is not 0. Guest applications (which may migrate among
processors) are fine with this, because pending_intr_nr is always 0 before a
guest application gets to execute on a VP, so the application consistently
sees guest time moving forward even if it is migrated to another VP. We
assume the guest kernel's time-critical paths do not migrate between
processors.
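
The invariant relied on here can be sketched roughly as follows; the names
are hypothetical, and one owed tick is delivered per guest entry:

/* Illustrative only: deliver one owed timer tick per vmentry until
 * pending_intr_nr reaches 0; only then does normal guest code run on
 * this vcpu, so a migrated application always sees caught-up time. */
struct periodic_timer {
    unsigned int pending_intr_nr;   /* timer ticks still owed to the guest */
};

extern void queue_timer_irq(void);  /* hypothetical injection helper */

/* Called on the vmentry path; returns 1 if a tick was queued. */
static int pick_up_pending_tick(struct periodic_timer *pt)
{
    if (pt->pending_intr_nr == 0)
        return 0;                   /* this vcpu's time is in sync */
    pt->pending_intr_nr--;
    queue_timer_irq();
    return 1;
}
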
With this approach, the platform timer (PIT here) must be pinned to one
processor (VP0 here), and guest TSC, PIT and local APIC time are synced on
VP0 (the other VPs only sync TSC time and local APIC time). If the PIT IRQ
were routed dynamically to a different VP, say VP1, then the guest time seen
within that VP (here VP1) would not be synced. That causes the "well-known"
guest "lost too many ticks" issue (refer to CSET 7478, 9324).
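
The pinning could be sketched roughly like this (hypothetical names, not
the actual Xen routing code):

/* Pin the platform timer (PIT) interrupt to vcpu 0 so PIT time and
 * vcpu0's TSC/APIC time stay in step.  Illustrative only. */
#define MAX_VCPUS            32
#define PLATFORM_TIMER_VCPU   0

struct vcpu;                        /* opaque per-vcpu state */
struct domain {
    struct vcpu *vcpu[MAX_VCPUS];
};

static struct vcpu *pit_irq_target(struct domain *d)
{
    /* Always deliver the PIT IRQ to the same vcpu rather than letting
     * the virtual interrupt routing pick a different one each time. */
    return d->vcpu[PLATFORM_TIMER_VCPU];
}
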
thx,eddie
-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Keir Fraser
Sent: 27 June 2006 20:53
To: Li, Xin B
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] [PATCH] [HVM] Fix virtual apic irq distribution
On 27 Jun 2006, at 12:39, Li, Xin B wrote:
> But on our VM, we will have to synchronize TSC from time to time, so
> PIT irq handler on different vcpu may see big TSC diff and complain about
> the unreliable TSC, then maybe it will try to do TSC sync, which make
> guest time keeping complex and unreliable.
How out-of-sync do TSCs of different VCPUs get?
I'd like to see a non-optimised TSC mode in which RDTSC always vmexits
and we implement constant-rate always-sync'ed TSC in Xen. It'd be good
to see if we could at least get that time mode working properly, and
we'll need it anyway for save/restore/migration between machines with
(even only slightly) different clock speeds or the guest will get very
confused.
-- Keir
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel