Re: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description
Hi Keir,
I ran a quick test to calculate the overhead of __fpu_unlazy_save() and
__fpu_unlazy_restore(), which are used to save/restore LWP state. Here
are the results:
(1) tsc_total: total time used for context_switch() in x86/domain.c
(2) tsc_unlazy: total time used for __fpu_unlazy_save() +
__fpu_unlazy_restore()
One example:
(XEN) tsc_unlazy=0x00000000008ae174
(XEN) tsc_total=0x00000001028b4907
So the overhead is about 0.2% of the total time used by context_switch()
(0x8ae174 is roughly 9.1e6 cycles out of 0x1028b4907, roughly 4.34e9
cycles, i.e. ~0.21%). Of course, this is just one example; I would
expect the overhead ratio to stay below 1% in most cases.
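For reference, a minimal sketch of how the two counters can be gathered;
the helper names and the exact accumulation points are assumptions for
illustration, not the actual instrumentation patch:

    #include <stdint.h>

    /* Global accumulators, dumped via printk as in the log above. */
    static uint64_t tsc_total, tsc_unlazy;

    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__ ( "rdtsc" : "=a" (lo), "=d" (hi) );
        return ((uint64_t)hi << 32) | lo;
    }

    /* Instrumented shape of context_switch(): time the unlazy FPU
     * save/restore pair separately from the whole switch path. */
    static void context_switch_instrumented(void (*unlazy_save_restore)(void),
                                            void (*rest_of_switch)(void))
    {
        uint64_t t0 = rdtsc();

        unlazy_save_restore();     /* __fpu_unlazy_save + __fpu_unlazy_restore */
        tsc_unlazy += rdtsc() - t0;

        rest_of_switch();          /* remainder of context_switch() */
        tsc_total += rdtsc() - t0;
    }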
Thanks,
-Wei
On 04/14/2011 04:09 PM, Keir Fraser wrote:
On 14/04/2011 21:37, "Wei Huang" <wei.huang2@xxxxxxx> wrote:
The following patches support AMD lightweight profiling.
Because LWP state isn't tracked by the CR0.TS bit, we clean up the FPU
code to handle lazy and unlazy FPU states differently. Lazy FPU state
(such as SSE and YMM) is handled when #NM is triggered. Unlazy state,
such as LWP, is saved and restored on each vcpu context switch. To
simplify the code, we also add a mask option to the xsave/xrstor
functions.
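A minimal sketch of such masked xsave/xrstor wrappers, assuming the LWP
state component sits at bit 62 of XCR0 (per AMD's documentation); the
wrapper names here are illustrative, not the patch's actual interface:

    #include <stdint.h>

    /* LWP state-component bit in XCR0 (bit 62 per the AMD APM). */
    #define XSTATE_LWP (1ULL << 62)

    /* xsave/xrstor with an explicit EDX:EAX feature mask, so callers can
     * save/restore only the components they care about.  The area must be
     * a 64-byte-aligned XSAVE area. */
    static inline void xsave_mask(void *area, uint64_t mask)
    {
        __asm__ __volatile__ ( "xsave %0"
                               : "=m" (*(uint8_t *)area)
                               : "a" ((uint32_t)mask),
                                 "d" ((uint32_t)(mask >> 32))
                               : "memory" );
    }

    static inline void xrstor_mask(const void *area, uint64_t mask)
    {
        __asm__ __volatile__ ( "xrstor %0"
                               :
                               : "m" (*(const uint8_t *)area),
                                 "a" ((uint32_t)mask),
                                 "d" ((uint32_t)(mask >> 32)) );
    }

On the unlazy path these could then be invoked with just XSTATE_LWP at
each vcpu switch, leaving the lazy components (SSE, YMM) to the #NM
handler as described above.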
How much cost is added to the context-switch path in the (overwhelmingly
likely) case that LWP is not being used by the guest? Is this adding a whole
lot of unconditional overhead for a feature that no one uses?
-- Keir
Thanks,
-Wei
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel