Re: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description
Hi Keir,
I ran a quick test to calculate the overhead of __fpu_unlazy_save() and
__fpu_unlazy_restore(), which are used to save/restore LWP state. Here
are the results:
(1) tsc_total: total time used for context_switch() in x86/domain.c
(2) tsc_unlazy: total time used for __fpu_unlazy_save() +
__fpu_unlazy_restore()
One example:
(XEN) tsc_unlazy=0x00000000008ae174
(XEN) tsc_total=0x00000001028b4907
So the overhead is about 0.2% of the total time used by context_switch()
(0x8ae174 is roughly 9.1e6 cycles out of 0x1028b4907, roughly 4.34e9
cycles, i.e. ~0.21%). Of course, this is just one example; I would
expect the overhead ratio to stay below 1% in most cases.
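For reference, a minimal sketch of how the two counters can be gathered;
the helper names and the exact accumulation points are assumptions for
illustration, not the actual instrumentation patch:

    #include <stdint.h>

    /* Global accumulators, dumped via printk as in the log above. */
    static uint64_t tsc_total, tsc_unlazy;

    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__ ( "rdtsc" : "=a" (lo), "=d" (hi) );
        return ((uint64_t)hi << 32) | lo;
    }

    /* Instrumented shape of context_switch(): time the unlazy FPU
     * save/restore pair separately from the whole switch path. */
    static void context_switch_instrumented(void (*unlazy_save_restore)(void),
                                            void (*rest_of_switch)(void))
    {
        uint64_t t0 = rdtsc();

        unlazy_save_restore();     /* __fpu_unlazy_save + __fpu_unlazy_restore */
        tsc_unlazy += rdtsc() - t0;

        rest_of_switch();          /* remainder of context_switch() */
        tsc_total += rdtsc() - t0;
    }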
Thanks,
-Wei
On 04/14/2011 04:09 PM, Keir Fraser wrote:
On 14/04/2011 21:37, "Wei Huang" <wei.huang2@xxxxxxx> wrote:
The following patches support AMD lightweight profiling.
Because LWP state isn't tracked by the CR0.TS bit, we clean up the FPU
code to handle lazy and unlazy FPU states differently. Lazy FPU state
(such as SSE and YMM) is handled when #NM is triggered. Unlazy state,
such as LWP, is saved and restored on each vcpu context switch. To
simplify the code, we also add a mask option to the xsave/xrstor
functions.
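A minimal sketch of such masked xsave/xrstor wrappers, assuming the LWP
state component sits at bit 62 of XCR0 (per AMD's documentation); the
wrapper names here are illustrative, not the patch's actual interface:

    #include <stdint.h>

    /* LWP state-component bit in XCR0 (bit 62 per the AMD APM). */
    #define XSTATE_LWP (1ULL << 62)

    /* xsave/xrstor with an explicit EDX:EAX feature mask, so callers can
     * save/restore only the components they care about.  The area must be
     * a 64-byte-aligned XSAVE area. */
    static inline void xsave_mask(void *area, uint64_t mask)
    {
        __asm__ __volatile__ ( "xsave %0"
                               : "=m" (*(uint8_t *)area)
                               : "a" ((uint32_t)mask),
                                 "d" ((uint32_t)(mask >> 32))
                               : "memory" );
    }

    static inline void xrstor_mask(const void *area, uint64_t mask)
    {
        __asm__ __volatile__ ( "xrstor %0"
                               :
                               : "m" (*(const uint8_t *)area),
                                 "a" ((uint32_t)mask),
                                 "d" ((uint32_t)(mask >> 32)) );
    }

On the unlazy path these could then be invoked with just XSTATE_LWP at
each vcpu switch, leaving the lazy components (SSE, YMM) to the #NM
handler as described above.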
How much cost is added to the context-switch path in the (overwhelmingly
likely) case that LWP is not being used by the guest? Is this adding a whole
lot of unconditional overhead for a feature that no one uses?
-- Keir
Thanks,
-Wei
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel