If you partition the perf counters and do not share them, you still need
to enable/disable them on context switch. Otherwise you would be
counting events that happen when other domains are running. Saving and
restoring the counters should probably not be much more expensive than
enabling and disabling them.
Even if this is not true, we could still do lazy save/restore similar to
what is done with FPU registers. Thus we would only need to save and
restore the counters when needed, and only the counters actually in use.
If you use counters in only one domain, the overhead with full perf
counter virtualization would be equivalent to your approach, with the
advantage of transparency to the guest (no need to ask for resources,
etc.). If you use perf counters in multiple domains you may have the
additional overhead of saving/restoring them, but I think that is more
than compensated for by a more powerful abstraction.
I think full virtualization should be the first option. Only if the
overhead proves to be very painful should we consider an alternative.
Not the other way around...
Of course, gathering some data on the overhead of saving/restoring
counters would help clarify this.
>> -----Original Message-----
>> From: Ray Bryant [mailto:raybry@xxxxxxxxxxxxxxxxx]
>> Sent: Thursday, May 25, 2006 11:39 AM
>> To: Santos, Jose Renato G
>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Steve Dobbelstein;
>> Eranian, Stephane
>> Subject: Re: [Xen-devel] Xenoprof in an HVM domain
>> On Thursday 25 May 2006 11:44, Santos, Jose Renato G wrote:
>> > I don't think this is how performance counters should be
>> > virtualized. Virtualizing performance counters should mean
>> > saving/restoring the values of active perf counters on every
>> > VCPU/domain context switch. There should be no need for such an
>> > "xm" command.
>> Agreed, but as a practical matter, reading and writing PMC registers
>> like that may be too slow to actually do on every context switch.
>> Reading and writing PMCs can sometimes be quite slow, depending on
>> the implementation.
>> It's the same kind of argument that leads to lazy save and restore of
>> the FPU registers: if it is done on every context switch, it is
>> simply too slow.
>> Of course, hypervisor context switches >>might<< occur less
>> frequently than process context switches in a native O/S, but
>> thus far I've not seen evidence of this. :-)
>> At any rate, if complete virtualization of PMC's is too slow
>> (data required), then one could treat them as a system
>> resource and allocate them out to the
>> domains as required. That was all I was suggesting.
>> > Performance counter virtualization is not currently supported in
>> > Xen, although it would be nice to have it. With counter
>> > virtualization, guest domains would be able to profile themselves
>> > with unmodified oprofile. This would be useful to enable users to
>> > profile their applications on Xen guests in the same way they are
>> > used to doing on vanilla Linux.
>> My point, exactly.
>> > The current model supported by Xenoprof is system-wide profiling,
>> > where counters are used to profile the collection of domains and
>> > Xen together. This is useful for Xen developers to optimize Xen and
>> > para-virtualized kernels running on Xen.
>> Yes. And it is very helpful in that regard. Don't get me
>> wrong. In essence I'm really asking how xenoprof
>> would/could/should evolve to better support profiling of HVM domains.
>> > Ideally we would like to have support for both system-wide
>> > profiling (for Xen developers) and independent guest profiling with
>> > perf counter virtualization (for Xen users). Adding perf counter
>> > virtualization is on our to-do list. If anybody is interested in
>> > working on this, please let me know. We would appreciate any help
>> > we could get.
>> I'll put it on my todo list. :-)
>> In the meantime, off to get passive domain support working
>> on my latest xenbits-unstable tree.
>> > Thanks
>> > Renato
>> Thank you,
>> Ray Bryant
>> AMD Performance Labs Austin, Tx
>> 512-602-0038 (o) 512-507-7807 (c)