Hi Dan,
This isn't the cycle count of a single context switch; it is the total cycle
count accumulated over a period. I dumped the numbers at a random point while a
guest was running.
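
For what it's worth, the arithmetic can be checked with a quick script (a
minimal sketch using the two hex values dumped in the earlier mail):

```python
# Sanity-check the numbers from the thread. These are the cumulative TSC
# counts dumped earlier, not per-switch figures.
tsc_unlazy = 0x00000000008AE174   # cycles spent in __fpu_unlazy_save/restore
tsc_total = 0x00000001028B4907    # cycles spent in context_switch() overall

# tsc_total is about 4.34e9 cycles, i.e. larger than 2^32 -- consistent
# with a count accumulated over many context switches, not one switch.
print(tsc_total)                                # 4337649927
print(round(tsc_unlazy / tsc_total * 100, 2))   # 0.21 (percent)
```

So the ratio works out to roughly 0.21%, matching the ~0.2% figure quoted below.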
Thanks,
-Wei
-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Dan Magenheimer
Sent: Friday, April 15, 2011 3:16 PM
To: Huang2, Wei; Keir Fraser
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description
Wait... a context switch takes over 4 billion cycles?
Not likely!
And please check your division. I get the same answer from "dc" only when
I use lowercase hex digits (and dc complains about unimplemented
characters); otherwise I get 0.033%, which is also unlikely.
> -----Original Message-----
> From: Wei Huang [mailto:wei.huang2@xxxxxxx]
> Sent: Thursday, April 14, 2011 4:57 PM
> To: Keir Fraser
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description
>
> Hi Keir,
>
> I ran a quick test to calculate the overhead of __fpu_unlazy_save() and
> __fpu_unlazy_restore(), which are used to save/restore LWP state. Here
> are the results:
>
> (1) tsc_total: total time used for context_switch() in x86/domain.c
> (2) tsc_unlazy: total time used for __fpu_unlazy_save() +
> __fpu_unlazy_restore()
>
> One example:
> (XEN) tsc_unlazy=0x00000000008ae174
> (XEN) tsc_total=0x00000001028b4907
>
> So the overhead is about 0.2% of the total time spent in
> context_switch(). Of course, this is just one example; I would expect
> the overhead ratio to be below 1% in most cases.
>
> Thanks,
> -Wei
>
>
>
> On 04/14/2011 04:09 PM, Keir Fraser wrote:
> > On 14/04/2011 21:37, "Wei Huang"<wei.huang2@xxxxxxx> wrote:
> >
> >> The following patches support AMD lightweight profiling.
> >>
> >> Because LWP isn't tracked by CR0.TS bit, we clean up the FPU code to
> >> handle lazy and unlazy FPU states differently. Lazy FPU state (such
> as
> >> SSE, YMM) is handled when #NM is triggered. Unlazy state, such as
> LWP,
> >> is saved and restored on each vcpu context switch. To simplify the
> code,
> >> we also add a mask option to xsave/xrstor function.
> > How much cost is added to context switch paths in the (overwhelmingly
> > likely) case that LWP is not being used by the guest? Is this adding
> a whole
> lot of unconditional overhead for a feature that no one uses?
> >
> > -- Keir
> >
> >> Thanks,
> >> -Wei
> >>
> >>
> >>
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >> http://lists.xensource.com/xen-devel
> >
> >
>
>
>