[Xen-devel] Re: [RFC][PATCH] Per-cpu xentrace buffers

To: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] Re: [RFC][PATCH] Per-cpu xentrace buffers
From: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Date: Wed, 20 Jan 2010 17:50:05 +0000
Cc:
Delivery-date: Wed, 20 Jan 2010 09:51:00 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <de76405a1001200938j8210aadkeaf5b64e6833cea9@xxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcqZ92Qf0m6qWJWzSDSf3BEpRy502gAAZtT0
Thread-topic: [RFC][PATCH] Per-cpu xentrace buffers
User-agent: Microsoft-Entourage/12.23.0.091001
Oh, I'm fine with it. I wasn't sure about putting it in for 4.0.0, but
actually plenty is going in for rc2. What do you think?

 -- Keir

On 20/01/2010 17:38, "George Dunlap" <George.Dunlap@xxxxxxxxxxxxx> wrote:

> Keir, would you mind commenting on this new design in the next few
> days?  If it looks like a good design, I'd like to do some more
> testing and get this into our next XenServer release.
> 
>  -George
> 
> On Thu, Jan 7, 2010 at 3:13 PM, George Dunlap <dunlapg@xxxxxxxxx> wrote:
>> In the current xentrace configuration, xentrace buffers are all
>> allocated in a single contiguous chunk, and then divided among logical
>> cpus, one buffer per cpu.  The size of an allocatable chunk is fairly
>> limited, in my experience about 128 pages (512KiB).  As the number of
>> logical cores increases, this means a much smaller maximum trace
>> buffer per cpu; on my dual-socket quad-core Nehalem box with
>> hyperthreading (16 logical cpus), that comes to 8 pages per logical
>> cpu.
>> 
>> The attached patch addresses this issue by allocating per-cpu buffers
>> separately.  This allows larger trace buffers; however, it requires an
>> interface change to xentrace, which is why I'm making a Request For
>> Comments.  (I'm not expecting this patch to be included in the 4.0
>> release.)
>> 
>> The old interface to get trace buffers was fairly simple: you ask for
>> the info, and it gives you:
>> * the mfn of the first page in the buffer allocation
>> * the total size of the trace buffer
>> 
>> The tools then mapped [mfn,mfn+size), calculated where the per-pcpu
>> buffers were, and went on to consume records from them.
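>> 
>> Roughly, the consumer side of the old interface looked something like
>> the following (a sketch only, with approximate names; assume xc_handle
>> and num_cpus have already been set up):
>> 
>>   unsigned long mfn, size;          /* from TBUFOP_get_info */
>>   struct t_buf *bufs, *meta[MAX_CPUS];
>>   int i;
>> 
>>   /* One contiguous mapping covering every cpu's buffer. */
>>   bufs = xc_map_foreign_range(xc_handle, DOMID_XEN, size,
>>                               PROT_READ | PROT_WRITE, mfn);
>> 
>>   /* Per-cpu buffer i lives at a fixed stride into the mapping. */
>>   for ( i = 0; i < num_cpus; i++ )
>>       meta[i] = (struct t_buf *)((char *)bufs + i * (size / num_cpus));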
>> 
>> -- Interface --
>> 
>> The proposed interface works as follows.
>> 
>> * XEN_SYSCTL_TBUFOP_get_info still returns an mfn and a size (so no
>> changes to the library).  However, the mfn now refers to a trace buffer
>> info area (t_info), allocated once at boot time.  The trace buffer
>> info area contains the mfns of the per-pcpu buffers.
>> * The t_info struct contains an array of "offset pointers", one per
>> pcpu.  These are an offset into the t_info data area of an array of
>> mfns for that pcpu.  So logically, the layout looks like this:
>> struct {
>>  int16_t tbuf_size; /* Number of pages per cpu */
>>  int16_t offset[NR_CPUS]; /* Offset into the t_info area of the array */
>>  uint32_t mfn[NR_CPUS][TBUF_SIZE];
>> };
>> 
>> So if NR_CPUS was 16, and TBUF_SIZE was 32, we'd have:
>> struct {
>>  int16_t tbuf_size; /* Number of pages per cpu */
>>  int16_t offset[16]; /* Offset into the t_info area of the array */
>>  uint32_t p0_mfn_list[32];
>>  uint32_t p1_mfn_list[32];
>>  ...
>>  uint32_t p15_mfn_list[32];
>> };
>> * So the new way to map trace buffers is as follows (code sketch below):
>>  + Call TBUFOP_get_info to get the mfn and size of the t_info area, and map
>> it.
>>  + Get the number of cpus
>>  + For each cpu:
>>  - Calculate the offset into the t_info area thus: unsigned long
>> *mfn_list = ((unsigned long*)t_info) + t_info->cpu_offset[cpu];
>>  - Map t_info->tbuf_size mfns from mfn_list using xc_map_foreign_batch()
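>> 
>> Put together, the consumer side would look roughly like this (again a
>> sketch with approximate names, error handling omitted; here I assume
>> the per-cpu offsets count 32-bit mfn entries, per the layout above):
>> 
>>   struct t_info *t_info;
>>   uint32_t *mfn_list;
>>   xen_pfn_t pfns[MAX_PAGES_PER_CPU];
>>   void *buf[MAX_CPUS];
>>   int cpu, j;
>> 
>>   /* Map the small, fixed-size t_info area. */
>>   t_info = xc_map_foreign_range(xc_handle, DOMID_XEN, t_info_size,
>>                                 PROT_READ, t_info_mfn);
>> 
>>   for ( cpu = 0; cpu < num_cpus; cpu++ )
>>   {
>>       /* Offsets are relative to the start of the t_info area. */
>>       mfn_list = (uint32_t *)t_info + t_info->offset[cpu];
>> 
>>       for ( j = 0; j < t_info->tbuf_size; j++ )
>>           pfns[j] = mfn_list[j];
>> 
>>       /* Map this cpu's trace pages in one batch. */
>>       buf[cpu] = xc_map_foreign_batch(xc_handle, DOMID_XEN,
>>                                       PROT_READ | PROT_WRITE,
>>                                       pfns, t_info->tbuf_size);
>>   }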
>> 
>> In the current implementation, the t_info size is fixed at 2 pages,
>> allowing about 2000 pages total to be mapped.  For a 32-way system,
>> this would allow up to 63 pages per cpu (~252KiB).  Bumping this up to
>> 4 pages would allow even larger systems if required.
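>> 
>> (To spell the arithmetic out: 2 pages give 8192 bytes of t_info; after
>> the tbuf_size field and the 32 per-cpu offsets, roughly 8KiB remain
>> for mfn entries, which at 4 bytes each is about 2000 entries.  Divided
>> across 32 pcpus that is ~63 pages, and 63 pages x 4KiB = ~252KiB of
>> trace buffer per cpu.)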
>> 
>> The current implementation also allocates each trace buffer
>> contiguously, since that's the easiest way to get contiguous virtual
>> address space.  But this interface allows Xen the flexibility, in the
>> future, to allocate buffers in several chunks if necessary, without
>> having to change the interface again.
>> 
>> -- Implementation notes --
>> 
>> The t_info area is allocated once at boot.  Trace buffers are
>> allocated either at boot (if a parameter is passed) or when
>> TBUFOP_set_size is called.  Due to the complexity of tracking pages
>> mapped by dom0, unmapping or resizing trace buffers is not supported.
>> 
>> I introduced a new per-cpu spinlock guarding trace data and buffers.
>> This allows per-cpu data to be safely accessed and modified without
>> racing with current tracing events.  The per-cpu spinlock is grabbed
>> whenever a trace event is generated; but in the (very very very)
>> common case, the lock should be in the cache already.
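>> 
>> The shape of that locking is roughly as follows (a sketch only, with
>> hypothetical function and variable names, not the actual patch):
>> 
>>   static DEFINE_PER_CPU(spinlock_t, t_lock);
>> 
>>   static void trace_insert_record(uint32_t event, const void *extra,
>>                                   unsigned int extra_bytes)
>>   {
>>       unsigned long flags;
>>       spinlock_t *lock = &per_cpu(t_lock, smp_processor_id());
>> 
>>       /* Serialise against anything else touching this cpu's buffer. */
>>       spin_lock_irqsave(lock, flags);
>>       /* ... copy the record into this cpu's trace buffer ... */
>>       spin_unlock_irqrestore(lock, flags);
>>   }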
>> 
>> Feedback welcome.
>> 
>>  -George
>> 


