Oh, I'm fine with it. I wasn't sure about putting it in for 4.0.0, but
actually plenty is going in for rc2. What do you think?
-- Keir
On 20/01/2010 17:38, "George Dunlap" <George.Dunlap@xxxxxxxxxxxxx> wrote:
> Keir, would you mind commenting on this new design in the next few
> days? If it looks like a good design, I'd like to do some more
> testing and get this into our next XenServer release.
>
> -George
>
> On Thu, Jan 7, 2010 at 3:13 PM, George Dunlap <dunlapg@xxxxxxxxx> wrote:
>> In the current xentrace configuration, xentrace buffers are all
>> allocated in a single contiguous chunk, and then divided among logical
>> cpus, one buffer per cpu. The size of an allocatable chunk is fairly
>> limited, in my experience about 128 pages (512KiB). As the number of
>> logical cores increase, this means a much smaller maximum per-cpu
>> trace buffer per cpu; on my dual-socket quad-core nehalem box with
>> hyperthreading (16 logical cpus), that comes to 8 pages per logical
>> cpu.
>>
>> The attached patch addresses this issue by allocating per-cpu buffers
>> separately. This allows larger trace buffers; however, it requires an
>> interface change to xentrace, which is why I'm making a Request For
>> Comments. (I'm not expecting this patch to be included in the 4.0
>> release.)
>>
>> The old interface to get trace buffers was fairly simple: you ask for
>> the info, and it gives you:
>> * the mfn of the first page in the buffer allocation
>> * the total size of the trace buffer
>>
>> The tools then mapped [mfn,mfn+size), calculated where the per-pcpu
>> buffers were, and went on to consume records from them.
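>>
>> (A minimal sketch of that old flow, with a hypothetical get_tbufs()
>> wrapper around TBUFOP_get_info; the names here are illustrative, not
>> the actual xentrace code:)
>>
>>     /* Map the single contiguous trace buffer area the old way. */
>>     unsigned long mfn, size;
>>     get_tbufs(xc_handle, &mfn, &size);   /* hypothetical wrapper */
>>     void *tbufs = xc_map_foreign_range(xc_handle, DOMID_XEN, size,
>>                                        PROT_READ | PROT_WRITE, mfn);
>>     /* Per-cpu buffers then live at fixed offsets within [tbufs, tbufs+size). */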
>>
>> -- Interface --
>>
>> The proposed interface works as follows.
>>
>> * XEN_SYSCTL_TBUFOP_get_info still returns an mfn and a size (so no
>> changes to the library). However, the mfn now refers to a trace buffer
>> info area (t_info), allocated once at boot time. The trace buffer
>> info area contains the mfns of the per-pcpu buffers.
>> * The t_info struct contains an array of "offset pointers", one per
>> pcpu. Each is an offset into the t_info data area, giving the location
>> of the array of mfns for that pcpu. So logically, the layout looks like this:
>>     struct {
>>         int16_t  tbuf_size;        /* Number of pages per cpu */
>>         int16_t  offset[NR_CPUS];  /* Offset into the t_info area of the mfn array */
>>         uint32_t mfn[NR_CPUS][TBUF_SIZE];
>>     };
>>
>> So if NR_CPUS was 16, and TBUF_SIZE was 32, we'd have:
>>     struct {
>>         int16_t  tbuf_size;   /* Number of pages per cpu */
>>         int16_t  offset[16];  /* Offset into the t_info area of the array */
>>         uint32_t p0_mfn_list[32];
>>         uint32_t p1_mfn_list[32];
>>         ...
>>         uint32_t p15_mfn_list[32];
>>     };
>> * So the new way to map trace buffers (sketched in code below) is as follows:
>>   + Call TBUFOP_get_info to get the mfn and size of the t_info area, and
>>     map it.
>>   + Get the number of cpus.
>>   + For each cpu:
>>     - Find that cpu's mfn list within the t_info area:
>>       unsigned long *mfn_list = ((unsigned long*)t_info) + t_info->cpu_offset[cpu];
>>     - Map t_info->tbuf_size mfns from mfn_list using xc_map_foreign_batch()
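>>
>> A hedged sketch of that per-cpu mapping step (the exact t_info layout
>> and the units of the offsets are assumptions from the description
>> above, not the final patch; the helper name is made up):
>>
>>     #include <stdint.h>
>>     #include <sys/mman.h>
>>     #include <xenctrl.h>    /* xc_map_foreign_batch(), xen_pfn_t */
>>
>>     /* Map one pcpu's trace buffer, given an already-mapped t_info area. */
>>     static void *map_pcpu_buffer(int xc_handle, void *t_info_area,
>>                                  int16_t tbuf_size, int16_t offset)
>>     {
>>         /* Offsets assumed to be in 32-bit words from the start of t_info. */
>>         uint32_t *mfn_list = (uint32_t *)t_info_area + offset;
>>         xen_pfn_t arr[tbuf_size];
>>         int i;
>>
>>         for ( i = 0; i < tbuf_size; i++ )
>>             arr[i] = mfn_list[i];   /* widen 32-bit mfns to xen_pfn_t */
>>
>>         return xc_map_foreign_batch(xc_handle, DOMID_XEN,
>>                                     PROT_READ | PROT_WRITE, arr, tbuf_size);
>>     }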
>>
>> In the current implementation, the t_info size is fixed at 2 pages,
>> allowing about 2000 pages total to be mapped. For a 32-way system,
>> this would allow up to 63 pages per cpu (~252KiB). Bumping this up to
>> 4 pages would allow even larger systems if required.
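>>
>> (Rough arithmetic behind those numbers, assuming 4KiB pages and 4-byte
>> mfn entries: 2 pages of t_info is 8192 bytes; after the 2-byte size
>> field and a 2-byte offset per cpu, that leaves room for roughly 2000
>> mfns. Across 32 cpus, that is about 63 pages, i.e. ~252KiB of trace
>> buffer, per cpu.)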
>>
>> The current implementation also allocates each trace buffer
>> contiguously, since that's the easiest way to get contiguous virtual
>> address space. But this interface allows Xen the flexibility, in the
>> future, to allocate buffers in several chunks if necessary, without
>> having to change the interface again.
>>
>> -- Implementation notes --
>>
>> The t_info area is allocated once at boot. Trace buffers are
>> allocated either at boot (if a parameter is passed) or when
>> TBUFOP_set_size is called. Due to the complexity of tracking pages
>> mapped by dom0, unmapping or resizing trace buffers is not supported.
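>>
>> (For example, assuming the existing "tbuf_size" Xen command-line
>> parameter is the one used here, booting Xen with tbuf_size=32 would
>> allocate 32 pages of trace buffer per cpu at boot.)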
>>
>> I introduced a new per-cpu spinlock guarding trace data and buffers.
>> This allows per-cpu data to be safely accessed and modified without
>> racing with concurrent tracing events. The per-cpu spinlock is grabbed
>> whenever a trace event is generated; but in the (very very very)
>> common case, the lock should already be in the cache.
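>>
>> (A rough sketch of the idea, not the patch itself; the lock name is
>> illustrative:)
>>
>>     /* In Xen: one spinlock per cpu protecting that cpu's trace state. */
>>     static DEFINE_PER_CPU(spinlock_t, t_lock);
>>
>>     /* On the trace-event path, only the local cpu's lock is taken. */
>>     spinlock_t *lock = &per_cpu(t_lock, smp_processor_id());
>>     unsigned long flags;
>>
>>     spin_lock_irqsave(lock, flags);
>>     /* ... reserve space and copy the record into this cpu's buffer ... */
>>     spin_unlock_irqrestore(lock, flags);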
>>
>> Feedback welcome.
>>
>> -George
>>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel