This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] Xenoprof in an HVM domain

To: "Santos, Jose Renato G" <joserenato.santos@xxxxxx>
Subject: Re: [Xen-devel] Xenoprof in an HVM domain
From: Stephane Eranian <eranian@xxxxxxxxxx>
Date: Tue, 30 May 2006 04:58:55 -0700
Address: HP Labs, 1U-17, 1501 Page Mill road, Palo Alto, CA 94304, USA.
Cc: "Eranian, Stephane" <stephane.eranian@xxxxxx>, Steve Dobbelstein <steved@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, Ray Bryant <raybry@xxxxxxxxxxxxxxxxx>
Delivery-date: Tue, 30 May 2006 09:10:01 -0700
E-mail: eranian@xxxxxxxxxx
In-reply-to: <6C21311CEE34E049B74CC0EF339464B96454D5@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Organisation: HP Labs Palo Alto
References: <200605250920.56553.raybry@xxxxxxxxxxxxxxxxx> <6C21311CEE34E049B74CC0EF339464B96454D5@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Reply-to: eranian@xxxxxxxxxx
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.4.1i

On Thu, May 25, 2006 at 09:44:15AM -0700, Santos, Jose Renato G wrote:
> >> -----Original Message-----
> >> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
> >> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of 
> >> Ray Bryant
> >> Sent: Thursday, May 25, 2006 7:21 AM
> >> To: xen-devel@xxxxxxxxxxxxxxxxxxx
> >> Cc: Steve Dobbelstein; Eranian, Stephane
> >> Subject: Re: [Xen-devel] Xenoprof in an HVM domain
>    ... <stuff deleted> ....
> >> This assumes, of course, that one can figure out 
> >> how to virtualize the performance counters, and then the 
> >> hypervisor would have to sort out conflicts, etc, between 
> >> domains that wanted to use the performance counters (perhaps 
> >> these would be a resource the hypervisor could dynamically 
> >> allocate to a domain, by, for 
> >> example, some kind of "xm" command).    
> I don't think this is how performance counters should be virtualized.
> Virtualizing performance counters should save/restore the values of
> active perf counters on every VCPU/domain context switch. There should
> be no need for such an "xm" command.
> Performance counter virtualization is not currently supported in Xen,
> although it would be nice to have it. With counter virtualization, guest
> domains would be able to profile themselves with unmodified oprofile.
> This would be useful to enable users to profile their applications on
> Xen guests in the same way they are used to on vanilla Linux.
I think we need to clearly identify and prioritize the needs.
The first thing to do is to ensure that guest OSes using the PMU when
running native can continue to do so when running virtualized. That holds
true for both para-virtualized and fully virtualized (Pacifica/VT) guests.

This is the highest priority because some OSes do rely on
performance counters. Without such support, they cannot provide the same
kernel-level API to their applications. In other words, certain applications
will fail.

The second need is what XenOprofile is addressing, which is how to get a "global
view" of what is going on in the guests and in the VMM. To me this is a
lower-priority need because the system can function without it. Yet I recognize
it is important for tuning the VMM.

Those two distinct needs are not specific to Xen; in fact, they are exactly
what you need to provide in a native OS, and the perfmon2 subsystem does this.
The global view is equivalent to "system-wide" monitoring and the per-guest
PMU is equivalent to the per-thread mode.

To support per-guest monitoring, the PMU must be virtualized. The counters must
be saved/restored on domain switch. A similar operation is done on thread switch
in the Linux kernel for perfmon2. In general, performance counters are quite
expensive to read, ranging from 35 cycles on Itanium2 to thousands of cycles on
some IA-32 processors. As indicated by Ray/Renato, you can be smart about that:
in perfmon2 we do lazy save/restore of the performance counters, and this has
worked fairly well. I expect domain switches to happen less frequently than
thread switches anyway, and many measurements use only a limited number of PMU
registers.
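That lazy save/restore idea can be sketched roughly as follows. This is an
illustrative simulation only; the struct and function names are hypothetical,
not actual Xen or perfmon2 code. The outgoing domain's counters are written
back only if it uses the PMU at all, and the restore is skipped when the
incoming domain was the last PMU user (its values are still live in hardware):

```c
#include <stdint.h>
#include <string.h>

#define NUM_PMD 4            /* number of performance monitoring counters */

/* Hypothetical per-domain PMU state. */
struct pmu_state {
    uint64_t pmd[NUM_PMD];   /* saved counter values */
    int      used;           /* does this domain use the PMU at all? */
};

static uint64_t hw_pmd[NUM_PMD];       /* stand-in for the real counters */
static struct pmu_state *pmu_owner;    /* whose values are live in hw_pmd */

/* Lazy domain switch: save/restore only when the PMU is actually in
 * use, and skip the restore entirely when the incoming domain already
 * owns the hardware state. */
static void pmu_domain_switch(struct pmu_state *prev, struct pmu_state *next)
{
    if (prev->used && pmu_owner == prev)
        memcpy(prev->pmd, hw_pmd, sizeof(hw_pmd));   /* save outgoing */
    if (next->used && pmu_owner != next) {
        memcpy(hw_pmd, next->pmd, sizeof(hw_pmd));   /* restore incoming */
        pmu_owner = next;
    }
    /* A domain that never touches the PMU costs nothing here. */
}
```

A domain that monitors nothing pays neither the save nor the restore, which is
the point of doing it lazily.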

Another important point is that I do not think that per-guest measurements
should include VMM-level execution, unlike a system-wide measurement. That is
true for both para-virtualized and fully virtualized (VT/Pacifica) guests. This
is important for sampling. I am not sure tools would know what to do with
samples they cannot attribute to code they know about. Furthermore, the goal of
virtualization is to HIDE from guest applications the fact that they run
virtualized. Why would we make an exception for monitoring tools? Note that this
implies that the VMM must turn monitoring off/on upon guest entry/exit.

For system-wide monitoring, you do need visibility into the VMM. Yet monitoring
is driven from a guest domain, most likely domain0. On counter overflow, the VMM
receives the PMU interrupt and the corresponding interrupted IP (IIP). That
information must somehow be conveyed to the monitoring tool. It is not possible
to simply pass the interrupt to domain0 (the controlling domain for the
monitoring session). To solve this problem, XenOprofile uses an in-VMM buffer
where the "samples" are first saved. Then there needs to be a communication
channel with the controlling domain to send a notification when the buffer
becomes full. There needs to be one such buffer per virtual-CPU. Those buffers
only need to be visible to domain0. The whole mechanism should NOT require any
special code in the guest domains, except for domain0. That way it would work
with para-virtualized and fully virtualized guests, be they Linux, Windows or
anything else. In XenOprofile, I understand the buffer is shared via remapping.
I think the interface to set up/control the buffer needs to be more generic. For
instance, certain measurements may need to record in the buffer more than just
the IIP; they may need to also save certain counter values. The controlling
domain needs some interface to express what needs to be recorded in each sample.
Furthermore, it also needs to know how to resume after an overflow, i.e., what
sampling period to reload in the overflowed counter. All this information must
be passed to the VMM because there is no intervention from the controlling
domain until the buffer fills up. Once again, this is not something new. We have
the equivalent mechanism in perfmon2, simply because we support an in-kernel
sampling buffer.
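A minimal sketch of such a per-virtual-CPU sample buffer follows. The layout
and names here are my own invention, not the actual Xenoprof interface: the
VMM's overflow handler appends a sample (IIP plus, optionally, an extra counter
value, per the point above) and reports when domain0 should be notified:

```c
#include <stdint.h>

#define SAMPLES_PER_BUF 64

/* Hypothetical sample layout; a more generic interface would let the
 * controlling domain choose what gets recorded per sample. */
struct sample {
    uint64_t iip;        /* interrupted instruction pointer */
    uint64_t pmd_val;    /* optional extra counter value */
};

/* One such buffer per virtual-CPU, visible only to domain0
 * (e.g. shared via remapping). */
struct sample_buf {
    uint32_t head;                      /* next free slot */
    struct sample s[SAMPLES_PER_BUF];
};

/* Called by the VMM's PMU overflow handler.  Returns 1 when the buffer
 * is full and the controlling domain should be notified. */
static int record_sample(struct sample_buf *b, uint64_t iip, uint64_t pmd)
{
    if (b->head < SAMPLES_PER_BUF) {
        b->s[b->head].iip = iip;
        b->s[b->head].pmd_val = pmd;
        b->head++;
    }
    return b->head == SAMPLES_PER_BUF;
}
```

Between notifications the VMM needs no help from domain0, which is why the
sampling period to reload must be communicated to the VMM up front.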

The next step is to see how the PMU can be shared between a system-wide usage
and a per-guest usage. On some PMU models, this may not be possible due to
hardware limitations, i.e., the lack of full independence of the counters. This
gets into a new level of complexity which has to be managed by the VMM.
Basically, this requires a VMM PMU register allocator per virtual-CPU. This also
implies that consumers cannot expect to systematically have access to the full
PMU each time they ask for it. Note that it may be acceptable for the time being
to say that system-wide and per-guest are mutually exclusive.
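Such an allocator could look like the following sketch (purely hypothetical, not
Xen code): a bitmask of in-use counters per virtual-CPU, where a request may
simply be refused rather than the full PMU being guaranteed:

```c
#include <stdint.h>

#define NUM_CTRS 4

/* Hypothetical per-virtual-CPU allocator for PMU counters shared
 * between system-wide and per-guest sessions. */
struct pmu_alloc {
    uint8_t in_use;     /* bit i set => counter i is taken */
};

/* Try to reserve any free counter; return its index, or -1 when none
 * is available (consumers cannot assume the full PMU is theirs). */
static int pmu_alloc_counter(struct pmu_alloc *a)
{
    for (int i = 0; i < NUM_CTRS; i++) {
        if (!(a->in_use & (1u << i))) {
            a->in_use |= (uint8_t)(1u << i);
            return i;
        }
    }
    return -1;
}

static void pmu_free_counter(struct pmu_alloc *a, int i)
{
    a->in_use &= (uint8_t)~(1u << i);
}
```

On PMU models where counters are not fully independent, the allocator would
additionally have to reject combinations the hardware cannot support.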

Hope this helps.

> The current model supported by Xenoprof is system-wide profiling, where
> counters are used to profile the collection of domains and Xen together.
> This is useful for Xen developers to optimize Xen and para-virtualized
> kernels running on Xen.
> Ideally we would like to have support for both system-wide profiling
> (for Xen developers) and independent guest profiling with perf counter
> virtualization (for Xen users). Adding perf counter virtualization is in
> our to do list. If anybody is interested in working on this please let
> me know.
> We would appreciate any help we could get.
> Thanks
> Renato


