xen-devel
RE: [Xen-devel] Proposal for Xen support of performance monitoring and debug hardware

To: "William Cohen" <wcohen@xxxxxxxxxx>
Subject: RE: [Xen-devel] Proposal for Xen support of performance monitoring and debug hardware
From: "Santos, Jose Renato G (Jose Renato Santos)" <joserenato.santos@xxxxxx>
Date: Mon, 25 Apr 2005 14:21:14 -0700
Cc: Ian Pratt <m+Ian.Pratt@xxxxxxxxxxxx>, Aravind Menon <aravind.menon@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, G John Janakiraman <john@xxxxxxxxxxxxxxxxxxx>, "Turner, Yoshio" <yoshio_turner@xxxxxx>
Delivery-date: Mon, 25 Apr 2005 21:20:58 +0000
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcVJqiwDZn00q/ggQBCxivr2xBxF7AAIEBxA
Thread-topic: [Xen-devel] Proposal for Xen support of performance monitoring and debug hardware
> -----Original Message-----
> From: William Cohen [mailto:wcohen@xxxxxxxxxx]
> Sent: Monday, April 25, 2005 8:18 AM
> To: Santos, Jose Renato G (Jose Renato Santos)
> Cc: Ian Pratt; xen-devel@xxxxxxxxxxxxxxxxxxx; Aravind Menon;
> G John Janakiraman; Turner, Yoshio
> Subject: Re: [Xen-devel] Proposal for Xen support of
> performance monitoring and debug hardware
>
>
> Santos,
>
> Thanks for the comments. I will take a closer look at the Xen
> oprofile support and see what I can incorporate into the proposal.
Good!
[...]
> I was thinking that the xen_msr_allocate function would provide
> some information on how to route the performance monitoring
> hardware. Select scope as GLOBAL for domain 0 to reserve the
> performance monitoring hardware for domain 0. The
> xen_msr_irq_handler sets the irq for performance monitoring to
> route all perf irqs to the domain that reserved the perf HW.
>
In Xenoprof we have a similar notion, in which one domain
receives the interrupts generated by counter overflows in
other domains (which we call passive domains). In this case,
profiling is at a coarser, per-domain level (fine-grained
profiling at the application/function level is lost).
In general, I think enabling a domain to handle perf counter
interrupts for other domains is a good thing, but we should
NOT be limited to this case. It is still useful to have
interrupts delivered to the running domain for system-wide
profiling. I think your interface should enable that option
for the GLOBAL case too.
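To make this concrete, here is a rough sketch of what such an allocation interface might look like. All names here (msr_scope_t, perf_irq_policy_t, perf_irq_target) are illustrative, not taken from the actual proposal or from Xen source:

```c
#include <assert.h>

/* Illustrative sketch only -- these names are hypothetical, not part
 * of the proposal or of Xen. */
typedef enum { MSR_SCOPE_DOMAIN, MSR_SCOPE_GLOBAL } msr_scope_t;

/* Who receives counter-overflow interrupts. */
typedef enum {
    PERF_IRQ_TO_OWNER,    /* route all overflows to the reserving domain */
    PERF_IRQ_TO_CURRENT   /* deliver to whichever domain is running */
} perf_irq_policy_t;

struct msr_allocation {
    msr_scope_t       scope;
    perf_irq_policy_t irq_policy;
};

/* A GLOBAL reservation may still request delivery to the currently
 * running domain, which is the option needed for system-wide
 * profiling with per-domain sample interpretation. */
static int perf_irq_target(const struct msr_allocation *a,
                           int owner_dom, int current_dom)
{
    if (a->scope == MSR_SCOPE_GLOBAL &&
        a->irq_policy == PERF_IRQ_TO_CURRENT)
        return current_dom;
    return owner_dom;
}
```

The point of the extra policy field is only that a GLOBAL reservation should not force all interrupts to the owner.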
>
> > 1) It would not be possible to profile hypervisor code, since
> > interrupts caused by hardware overflow would be handled by the
> > domain. When the domain starts executing, the information about
> > what Xen code was running at the time of MSR overflow is lost.
> > In Xenoprof we handle the MSR interrupts inside the hypervisor
> > and save the PC value at that time, enabling profiling of
> > hypervisor code. An additional complication is the use of
> > normal IRQs instead of NMI. This would prevent performance
> > profiling of some parts of the kernel (including interrupt
> > handlers).
>
> Shouldn't it be possible for the hypervisor to send the needed
> address information to the irq handler in the domain? From the
> address it should be possible to determine that it is a sample
> from the hypervisor. The overhead of moving things from
> hypervisor to domain might be undesirable.
>
Yes, it is possible for the hypervisor to send PC samples to the
domain, but this requires saving the PC value at the time of the
interrupt, i.e. in the physical interrupt handler in the hypervisor.
This is exactly how Xenoprof works: the NMI handler in Xen stores
the PC sample in a buffer and triggers a virtual IRQ in the
domain. The domain's interrupt handler then reads the sample from
the buffer. The overhead is insignificant since this is done through
a shared memory page.
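As a minimal user-space sketch of this sample path (the buffer layout and names are illustrative, not the actual Xenoprof format): the hypervisor side appends the interrupted PC to a ring buffer in the shared page and then raises a virtual IRQ; the domain side drains the buffer in its virtual-IRQ handler.

```c
#include <stdint.h>
#include <stddef.h>

#define SAMPLE_BUF_SIZE 64  /* entries per shared page (illustrative) */

/* Single-producer/single-consumer ring buffer in a shared page. */
struct sample_buf {
    volatile uint32_t head;          /* written by the hypervisor side */
    volatile uint32_t tail;          /* written by the domain side */
    uint64_t pc[SAMPLE_BUF_SIZE];    /* saved program counters */
};

/* Hypervisor side: called from the NMI handler with the
 * interrupted PC; the caller then sends a virtual IRQ. */
static int record_sample(struct sample_buf *b, uint64_t pc)
{
    uint32_t next = (b->head + 1) % SAMPLE_BUF_SIZE;
    if (next == b->tail)
        return -1;                   /* buffer full: drop the sample */
    b->pc[b->head] = pc;
    b->head = next;
    return 0;
}

/* Domain side: the virtual-IRQ handler drains pending samples. */
static size_t drain_samples(struct sample_buf *b, uint64_t *out, size_t max)
{
    size_t n = 0;
    while (b->tail != b->head && n < max) {
        out[n++] = b->pc[b->tail];
        b->tail = (b->tail + 1) % SAMPLE_BUF_SIZE;
    }
    return n;
}
```

Since both sides only touch the shared page, no hypercall is needed per sample; the virtual IRQ can even batch several samples per delivery.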
> I have some reservations about using NMI in this case. With
> OProfile it is quite possible to kill the machine by setting a
> sampling interval smaller than the overhead incurred by the
> interrupt servicing routine. Allowing NMIs would be a way for a
> domain to crash the entire machine. NMIs do allow better
> coverage of code, though.
>
>
Programming performance counters with low values can thrash the
system, but this is not restricted to NMI. Even with maskable
interrupts, the machine will thrash in this case. We should prevent
this by other means: e.g. by preventing the counters from being
programmed with small values to begin with, or by reprogramming the
counter to stop generating interrupts when a high interrupt rate is
detected (i.e. a kind of NMI throttling). I still think NMI is a
better choice: it has better coverage, and avoiding NMI does not
solve the problem you mentioned anyway.
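The first safeguard mentioned above is simple to sketch. For an up-counting counter that interrupts on overflow, a sampling period of P events means writing 2^32 - P (for a 32-bit counter), so the hypervisor can clamp the requested period before computing the reset value. PERF_MIN_PERIOD is an illustrative threshold, not a real Xen constant:

```c
#include <stdint.h>

/* Illustrative minimum number of events between overflows; a real
 * implementation would pick this based on interrupt-handling cost. */
#define PERF_MIN_PERIOD 10000u

/* Clamp the requested period and return the counter reset value.
 * Counters count up and interrupt on overflow, so a period of P
 * means writing 2^32 - P into a 32-bit counter. */
static uint32_t safe_counter_reset(uint32_t requested_period)
{
    uint32_t period = requested_period < PERF_MIN_PERIOD
                          ? PERF_MIN_PERIOD
                          : requested_period;
    return (uint32_t)(0u - period);  /* 2^32 - period, mod 2^32 */
}
```

The rate-based throttling variant would instead watch the observed overflow rate and rewrite the counter (or mask the interrupt source) when it exceeds a budget.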
> > 2) It seems you plan to have interrupts that occur in other
> > domains delivered to the owner of the MSR. A potential problem
> > with this approach is that it could cause additional domain
> > context switching (to schedule the owner domain to handle the
> > interrupt), and this could change your profiling results. In
> > addition, it is not clear how the interrupt handler would get
> > information about the PC sample at the time of MSR overflow.
> > Even if it were possible to receive this information from the
> > hypervisor, we would still need a way to map this PC value to
> > the right process and associated binary file running on the
> > other domain, which seems difficult.
>
> PC values are pretty transient. Memory maps go away. Mapping the
> pc values to something reasonable is still an issue; there is a
> FIXME in the document for this. OProfile has some help in the
> kernel to convert the raw pc value to a dcookie and file offset.
> This help is not available outside the domain.
>
Exactly! That is why it is important to have a framework that does
not prevent interrupts from being delivered to multiple domains for
system-wide profiling. It is better to have domains interpreting
their own samples if we want fine-grained profiling.
> > I think both system-wide profiling and single-domain
> > (virtualized) profiling are important and it would be nice to
> > have both. As Ian mentioned, we cannot have both at the same
> > time, at least for the same MSR. However, it would be possible
> > to have some registers virtualized and others used for
> > system-wide profiling at the same time. It would be nice to
> > have a unified framework that provides both functionalities and
> > a way to select between them.
> >
> > Renato
>
> Slicing and dicing the performance monitoring hardware may be
> possible, but it is a complicated operation. There are lots of
> constraints on which combinations are allowed and which are not.
> Combinations like inter-domain and intra-domain sampling would be
> difficult because the interrupt would be the same. The allocation
> software would have to have a picture of all the domain
> allocations. There are lots of constraints on which registers can
> be used for what on Pentium 4 and ppc64.
>
> For the time being the proposal will address both global and
> virtual modes but not allow concurrent use of the two.
>
I agree with your point. It makes sense to have a simple initial
implementation without concurrent use of global and virtual modes.
But maybe we could have a generic interface that can accommodate
this flexibility; that could avoid interface changes in the future.
Not sure if this is worthwhile, though... Just a thought.
Renato
> -Will
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel