RE: [Xen-devel] RE: Saving/Restoring IA32_TSC_AUX MSR

To: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>, "Nakajima, Jun" <jun.nakajima@xxxxxxxxx>, Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: RE: [Xen-devel] RE: Saving/Restoring IA32_TSC_AUX MSR
From: "Zhang, Xiantao" <xiantao.zhang@xxxxxxxxx>
Date: Sun, 13 Dec 2009 17:17:34 +0800
Accept-language: en-US
Cc: "Xu, Dongxiao" <dongxiao.xu@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, "Dugger, Donald D" <donald.d.dugger@xxxxxxxxx>
Delivery-date: Sun, 13 Dec 2009 01:18:00 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <bd9393f0-0b46-4c7e-ad45-1a75940469be@default>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <C08B02B7E75BDA4BBAA8F1648BDCC20D56F9D58B@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <bd9393f0-0b46-4c7e-ad45-1a75940469be@default>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Acp6sKaExnNhlIDORtCxS1qLS34UMQBHqpXg
Thread-topic: [Xen-devel] RE: Saving/Restoring IA32_TSC_AUX MSR
Dan Magenheimer wrote:
> Well, although it might be nice to be able to use
> rdtscp and TSC_AUX to determine pcpu/vcpu/pnode/vnode
> information, I think Jeremy and Jan convinced me in
> another thread a couple of months ago that in userland:
> x = vgetcpu()
> do_other_stuff();
> y = vgetcpu()
> if x==1 and y==2, there's no way to determine that
> do_other_stuff() was executed on cpu 1 vs cpu 2,
> or (though unlikely) even on cpu 3.  And if
> x==y==4, there's no guarantee that do_other_stuff()
> is executed on cpu 4.
> If this is true the only safe use of TSC_AUX is for
> its originally designed intent: To determine if two
> successive rdtscp instructions were or were not
> executed on the same processor.  Since this cannot
> be guaranteed in a VM, that's a reasonable argument
> that TSC_AUX shouldn't be exposed at all (meaning the
> rdtscp bit in cpuid should be turned off by Xen).

Why do you think this is the design intent of this instruction?

For guest NUMA support, I think each vcpu of a VM must be pinned to logical
processors that belong to one specific node (that is, vcpu migration between
nodes must be disabled); otherwise, virtual NUMA may suffer a performance
loss.  For example, take a NUMA system with two nodes, where each node has
4 GB of memory and 8 logical processors.  On this Xen system, if we create a
VM with 2 GB of memory and 4 vcpus, Xen may allocate 1 GB of memory from
physical node 0 and another 1 GB from physical node 1.  In this case, if we
virtualize NUMA for this VM, vcpu0 and vcpu1 can be assigned to virtual
node 0, and vcpu2 and vcpu3 to virtual node 1.  We can then safely pin vcpu0
and vcpu1 to physical node 0's 8 logical processors, and accordingly pin vcpu2
and vcpu3 to physical node 1's 8 logical processors.  Since TSC_AUX is
virtualized per vcpu, and its value is saved/restored with the vcpu when the
vcpu migrates, an application that always runs on one virtual processor
should get a fixed value when it calls vgetcpu, even if that vcpu often
migrates among the logical processors of one node.
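
To make the example above concrete, here is a minimal sketch of the
vcpu-to-node pinning it describes.  The names (vcpu_to_vnode, pnode_cpumask,
vcpu_affinity) are hypothetical and are not Xen APIs, and the sketch assumes
that virtual node N is backed by physical node N and that logical processors
are numbered contiguously per node:

```c
#include <assert.h>
#include <stdint.h>

/* Topology from the example: 2 physical nodes, 8 logical processors each,
 * and a 4-vcpu guest split into 2 virtual nodes of 2 vcpus. */
#define NR_PNODES        2u
#define CPUS_PER_PNODE   8u
#define VCPUS_PER_VNODE  2u

/* Hypothetical helper: vcpu0/vcpu1 -> vnode 0, vcpu2/vcpu3 -> vnode 1. */
unsigned vcpu_to_vnode(unsigned vcpu)
{
    return vcpu / VCPUS_PER_VNODE;
}

/* Bitmask of the logical processors in one physical node, assuming
 * contiguous numbering (node 0 = cpus 0-7, node 1 = cpus 8-15). */
uint32_t pnode_cpumask(unsigned pnode)
{
    assert(pnode < NR_PNODES);
    return (uint32_t)0xFFu << (pnode * CPUS_PER_PNODE);
}

/* Affinity mask for a vcpu: every logical processor of the physical node
 * backing its virtual node (assumption: vnode N is backed by pnode N). */
uint32_t vcpu_affinity(unsigned vcpu)
{
    return pnode_cpumask(vcpu_to_vnode(vcpu));
}
```

With these masks a vcpu may still move among the 8 logical processors of its
own node, but never crosses to the other node, which is the property the
argument above relies on.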

Back to this topic: in short, we can't mix the guest's virtual TSC_AUX with
the host's TSC_AUX.  When switching to an HVM vcpu's context, load that vcpu's
virtual TSC_AUX MSR into the physical TSC_AUX MSR; when the vcpu is scheduled
out, load the host's TSC_AUX MSR (which may be used for PV guests).
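
The switching rule above can be sketched roughly as follows.  This is only a
user-space model, not the Xen implementation: the physical MSR is simulated
by a struct field (a real hypervisor would use rdmsr/wrmsr on IA32_TSC_AUX,
MSR 0xC0000103), and all type and function names here are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* One logical processor.  tsc_aux models the physical IA32_TSC_AUX
 * register; host_tsc_aux is the value the host wants loaded whenever no
 * HVM vcpu is running (e.g. for PV guests). */
struct pcpu {
    uint64_t tsc_aux;
    uint64_t host_tsc_aux;
};

/* Per-vcpu state: the guest-visible TSC_AUX travels with the vcpu. */
struct hvm_vcpu {
    uint64_t virt_tsc_aux;
};

/* Switching to an HVM vcpu: load its virtual TSC_AUX into "hardware". */
void hvm_switch_to(struct pcpu *p, const struct hvm_vcpu *v)
{
    p->tsc_aux = v->virt_tsc_aux;
}

/* Scheduling the vcpu out: save the guest's value, restore the host's. */
void hvm_switch_from(struct pcpu *p, struct hvm_vcpu *v)
{
    v->virt_tsc_aux = p->tsc_aux;
    p->tsc_aux = p->host_tsc_aux;
}

/* Move one vcpu across two logical processors and check that the guest
 * sees the same TSC_AUX on both, while each host value is restored. */
void tsc_aux_demo(void)
{
    struct pcpu p0 = { 7, 7 }, p1 = { 9, 9 };
    struct hvm_vcpu v = { 2 };

    hvm_switch_to(&p0, &v);
    assert(p0.tsc_aux == 2);   /* guest value visible while running */
    hvm_switch_from(&p0, &v);
    assert(p0.tsc_aux == 7);   /* host value back on p0 */

    hvm_switch_to(&p1, &v);
    assert(p1.tsc_aux == 2);   /* same guest value after migration */
    hvm_switch_from(&p1, &v);
    assert(p1.tsc_aux == 9);   /* host value back on p1 */
}
```

If guest writes to the MSR are intercepted, the save step on schedule-out
would be unnecessary; the sketch keeps it so that it also covers the
pass-through case.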

> True, as long as the information is ONLY used
> heuristically to obtain pcpu/vcpu/pnode/vnode info,
> and no guarantee of correctness is implied or expected,
> it might be useful some of the time.
> But frankly, if "performance sucks" when the heuristic
> fails due to the fact that the app is running on
> a VM instead of native OS, I'd see that as a problem
> and suggest the proper way to fix that is to define
> more App-to-Xen ABIs so that the app can get the
> real information, not a heuristic.  Which also argues
> for Xen leaving the rdtscp bit in cpuid turned off.
> Dan
>> -----Original Message-----
>> From: Nakajima, Jun [mailto:jun.nakajima@xxxxxxxxx]
>> Sent: Friday, December 11, 2009 12:30 PM
>> To: Jeremy Fitzhardinge; Dan Magenheimer
>> Cc: Keir Fraser; Zhang, Xiantao; Xu, Dongxiao;
>> xen-devel@xxxxxxxxxxxxxxxxxxx; Dugger, Donald D
>> Subject: RE: [Xen-devel] RE: Saving/Restoring IA32_TSC_AUX MSR
>> Jeremy Fitzhardinge wrote on Fri, 11 Dec 2009 at 10:50:29:
>>> On 12/11/09 10:35, Dan Magenheimer wrote:
>>>>> However, the vcpu number is definitely useful to usermode apps,
>>>>> so they can get some idea how they're moved between (v)cpus.  I
>>>>> don't think it will matter to them that it isn't pcpu.
>>>> My point is that an app running on native Linux can
>>>> safely assume that, if TSC_AUX==3 at time T1 and
>>>> TSC_AUX is still 3 at time T2, it is running
>>>> on the same processor and the same node at both T1
>>>> and T2.  In a virtual environment it cannot even
>>>> assume it is running on the same machine.
>>>> Further if the app sees that TSC_AUX==2 at time T3
>>>> and TSC_AUX==3 at time T4, on native Linux it
>>>> can safely assume that it is running on a different
>>>> processor.  While rarer, in a virtual environment,
>>>> this may also be a false assumption.
>>>> That's why I say the information is misleading.
>>>  Sure, but that info is, at best, of heuristic value, and won't
>>> cause any correctness problems if it is wrong.  The performance may
>>> suck, but that's part of the larger problem of running NUMA-aware
>>> code in a virtual environment. 
>> And to utilize various NUMA optimizations in the kernel/apps
>> in the guest, we need "the virtual numa info bears some vague
>> resemblance to the real topology" (from Jeremy's email) with
>> the vcpus bound to the CPU/node.
>> I understand that enabling RDTSCP in HVM will disable the
>> pvrdtscp algorithm if used by the kernel. One way is to mask
>> off the feature in CPUID (by default). Then kernel won't use it.
>> Jun
>> ___
>> Intel Open Source Technology Center
