This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description

To: Keir Fraser <keir.xen@xxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description
From: Wei Huang <wei.huang2@xxxxxxx>
Date: Thu, 14 Apr 2011 17:57:08 -0500
Cc: "'xen-devel@xxxxxxxxxxxxxxxxxxx'" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Thu, 14 Apr 2011 16:02:37 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <C9CD2137.16600%keir.xen@xxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <C9CD2137.16600%keir.xen@xxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv: Gecko/20110303 Thunderbird/3.1.9
Hi Keir,

I ran a quick test to calculate the overhead of __fpu_unlazy_save() and __fpu_unlazy_restore(), which are used to save/restore LWP state. Here are the results:

(1) tsc_total: total time used for context_switch() in x86/domain.c
(2) tsc_unlazy: total time used for __fpu_unlazy_save() + __fpu_unlazy_restore()

One example:
(XEN) tsc_unlazy=0x00000000008ae174
(XEN) tsc_total=0x00000001028b4907

So the overhead is about 0.2% of the total time spent in context_switch(). Of course, this is just one example, but I would expect the overhead ratio to stay below 1% in most cases.


On 04/14/2011 04:09 PM, Keir Fraser wrote:
On 14/04/2011 21:37, "Wei Huang" <wei.huang2@xxxxxxx> wrote:

The following patches support AMD lightweight profiling.

Because LWP state isn't tracked by the CR0.TS bit, we clean up the FPU code to
handle lazy and unlazy FPU states differently. Lazy FPU state (such as
SSE and YMM) is handled when #NM is triggered. Unlazy state, such as LWP,
is saved and restored on each vcpu context switch. To simplify the code,
we also add a mask option to the xsave/xrstor functions.
How much cost does this add to the context-switch path in the (overwhelmingly
likely) case that LWP is not being used by the guest? Is this adding a whole
lot of unconditional overhead for a feature that no one uses?

  -- Keir


Xen-devel mailing list