[Xen-devel] [RFC][PATCH] Per-cpu xentrace buffers

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] [RFC][PATCH] Per-cpu xentrace buffers
From: George Dunlap <dunlapg@xxxxxxxxx>
Date: Thu, 7 Jan 2010 15:13:48 +0000
In the current xentrace configuration, xentrace buffers are all
allocated in a single contiguous chunk, which is then divided among
logical cpus, one buffer per cpu.  The size of an allocatable chunk
is fairly limited, in my experience about 128 pages (512KiB).  As the
number of logical cores increases, this means a much smaller maximum
trace buffer per cpu; on my dual-socket quad-core Nehalem box with
hyperthreading (16 logical cpus), that comes to 8 pages per logical
cpu.

The attached patch addresses this issue by allocating per-cpu buffers
separately.  This allows larger trace buffers; however, it requires an
interface change to xentrace, which is why I'm making a Request For
Comments.  (I'm not expecting this patch to be included in the 4.0
release.)

The old interface to get trace buffers was fairly simple: you ask for
the info, and it gives you:
* the mfn of the first page in the buffer allocation
* the total size of the trace buffer

The tools then mapped [mfn,mfn+size), calculated where the per-pcpu
buffers were, and went on to consume records from them.
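
For concreteness, roughly the shape of that old flow in libxc terms
(a sketch only, with error handling omitted; variable names are
illustrative):

 unsigned long tbufs_mfn, total_size;  /* from TBUFOP_get_info */
 struct t_buf *tbufs;

 /* one contiguous mapping covering every cpu's buffer */
 tbufs = xc_map_foreign_range(xc_handle, DOMID_XEN, total_size,
                              PROT_READ | PROT_WRITE, tbufs_mfn);
 /* cpu i's buffer then sits at a fixed offset inside the mapping */
 struct t_buf *cpu_buf =
     (struct t_buf *)((char *)tbufs + cpu * per_cpu_size);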

-- Interface --

The proposed interface works as follows.

* XEN_SYSCTL_TBUFOP_get_info still returns an mfn and a size (so no
changes to the library).  However, the mfn now refers to a trace
buffer info area (t_info), allocated once at boot time.  The trace
buffer info area contains the mfns of the per-pcpu buffers.
* The t_info struct contains an array of "offset pointers", one per
pcpu.  Each is the offset, within the t_info data area, of that
pcpu's array of mfns.  So logically, the layout looks like this:
struct {
 int16_t tbuf_size; /* Number of pages per cpu */
 int16_t offset[NR_CPUS]; /* Offset into the t_info area of the array */
 uint32_t mfn[NR_CPUS][TBUF_SIZE];
};

So if NR_CPUS was 16, and TBUF_SIZE was 32, we'd have:
struct {
 int16_t tbuf_size; /* Number of pages per cpu */
 int16_t offset[16]; /* Offset into the t_info area of the array */
 uint32_t p0_mfn_list[32];
 uint32_t p1_mfn_list[32];
  ...
 uint32_t p15_mfn_list[32];
};
* The new way to map trace buffers is then as follows (a sketch in C
follows this list):
 + Call TBUFOP_get_info to get the mfn and size of the t_info area, and map it.
 + Get the number of cpus
 + For each cpu:
  - Calculate that cpu's mfn array within the t_info area: uint32_t
*mfn_list = ((uint32_t *)t_info) + t_info->offset[cpu];
  - Map t_info->tbuf_size mfns from mfn_list using xc_map_foreign_batch()
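
A minimal sketch of that loop, assuming the t_info layout above
(num_cpus, tinfo_mfn, tinfo_size and bufs are illustrative; error
handling omitted):

 struct t_info *ti = xc_map_foreign_range(xc_handle, DOMID_XEN,
                                          tinfo_size, PROT_READ,
                                          tinfo_mfn);
 for (int cpu = 0; cpu < num_cpus; cpu++)
 {
     uint32_t *mfn_list = ((uint32_t *)ti) + ti->offset[cpu];
     xen_pfn_t pfns[ti->tbuf_size];

     for (int i = 0; i < ti->tbuf_size; i++)
         pfns[i] = mfn_list[i];       /* widen each mfn to xen_pfn_t */

     bufs[cpu] = xc_map_foreign_batch(xc_handle, DOMID_XEN,
                                      PROT_READ | PROT_WRITE,
                                      pfns, ti->tbuf_size);
 }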

In the current implementation, the t_info size is fixed at 2 pages,
allowing about 2000 pages total to be mapped.  For a 32-way system,
this would allow up to 63 pages per cpu (252KiB per cpu).  Bumping
this up to 4 pages would allow even larger systems if required.
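
The arithmetic behind those numbers, as a small self-contained check
(4KiB pages and the layout above assumed, with NR_CPUS illustratively
set to 32):

 #include <stdint.h>
 #include <stdio.h>

 int main(void)
 {
     unsigned t_info_bytes = 2 * 4096;               /* two pages */
     unsigned header = sizeof(int16_t) * (1 + 32);   /* tbuf_size + offset[32] */
     unsigned mfn_slots = (t_info_bytes - header) / sizeof(uint32_t);

     /* prints: 2031 mfn slots, 63 pages/cpu, 252 KiB/cpu */
     printf("%u mfn slots, %u pages/cpu, %u KiB/cpu\n",
            mfn_slots, mfn_slots / 32, (mfn_slots / 32) * 4);
     return 0;
 }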

The current implementation also allocates each trace buffer
contiguously, since that's the easiest way to get contiguous virtual
address space.  But this interface allows Xen the flexibility, in the
future, to allocate buffers in several chunks if necessary, without
having to change the interface again.

-- Implementation notes --

The t_info area is allocated once at boot.  Trace buffers are
allocated either at boot (if a parameter is passed) or when
TBUFOP_set_size is called.  Due to the complexity of tracking pages
mapped by dom0, unmapping or resizing trace buffers is not supported.

I introduced a new per-cpu spinlock guarding trace data and buffers.
This allows per-cpu data to be safely accessed and modified without
racing with trace events currently being generated.  The per-cpu
spinlock is grabbed whenever a trace event is generated; but in the
(very very very) common case, the lock should be in the cache already.
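
Roughly the shape of that locking, as a sketch only (names like
trace_record and t_lock are illustrative, not the actual patch; see
the attached diff for the real thing):

 static DEFINE_PER_CPU(spinlock_t, t_lock);

 static void trace_record(uint32_t event /* , ... */)
 {
     unsigned long flags;
     spinlock_t *lock = &per_cpu(t_lock, smp_processor_id());

     /* serialize against anyone touching this cpu's trace state */
     spin_lock_irqsave(lock, flags);
     /* ... write the record into this cpu's private buffer ... */
     spin_unlock_irqrestore(lock, flags);
 }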

Feedback welcome.

 -George

Attachment: 20100106-unstable-xentrace-interface.diff
Description: Text Data
