Request for opinion #2:
In order to remove a (last?) concurrency bottleneck in tmem,
I have to replicate a pair of fairly large buffers: one is
two pages and the other is eight pages. (Note that if tmem
ever works on ia64, pagesize is larger.) Since the buffers
are too large for the stack, they are declared as globals
and protected by a single lock. But the buffers are used
for compression, which can take quite a bit of time (up
to tens of thousands of cycles and probably >80% of the
total time spent in tmem), and so are magnets for any spinlock.
I see two solutions: cascading or per-cpu.
In per-cpu, I would allocate at system initialization one
pair of buffers for each cpu (question: num_present_cpus,
num_online_cpus, or num_possible_cpus?). Then no lock
would be needed.
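To make the per-cpu idea concrete, here is a rough user-space sketch,
not hypervisor code. All names (comp_bufs, percpu_bufs_init,
percpu_bufs_get) and the 4096-byte page size are assumptions for
illustration, and malloc/calloc stand in for whatever allocator
would really be used at system initialization.

```c
#include <assert.h>
#include <stdlib.h>

#define BUF_PAGE_SIZE 4096u   /* assumption: 4K pages */
#define SMALL_PAGES   2u      /* the two-page buffer */
#define LARGE_PAGES   8u      /* the eight-page buffer */

struct comp_bufs {            /* hypothetical name */
    unsigned char *small;
    unsigned char *large;
};

static struct comp_bufs *percpu_bufs;
static unsigned int nr_bufs;

/* Allocate one pair of buffers per cpu; 0 on success, -1 on failure. */
int percpu_bufs_init(unsigned int nr_cpus)
{
    unsigned int i;

    if (nr_cpus == 0)
        return -1;
    percpu_bufs = calloc(nr_cpus, sizeof(*percpu_bufs));
    if (percpu_bufs == NULL)
        return -1;
    for (i = 0; i < nr_cpus; i++) {
        percpu_bufs[i].small = malloc(SMALL_PAGES * BUF_PAGE_SIZE);
        percpu_bufs[i].large = malloc(LARGE_PAGES * BUF_PAGE_SIZE);
        if (!percpu_bufs[i].small || !percpu_bufs[i].large)
            goto fail;
    }
    nr_bufs = nr_cpus;
    return 0;

fail:
    /* calloc zeroed the array, so unallocated slots are NULL. */
    for (i = 0; i < nr_cpus; i++) {
        free(percpu_bufs[i].small);
        free(percpu_bufs[i].large);
    }
    free(percpu_bufs);
    percpu_bufs = NULL;
    return -1;
}

/* No lock taken: each cpu only ever touches its own pair. */
struct comp_bufs *percpu_bufs_get(unsigned int cpu)
{
    assert(cpu < nr_bufs);
    return &percpu_bufs[cpu];
}
```

This is only lock-free if the code using a pair cannot move to
another cpu partway through a compression. The nr_cpus argument
is exactly the present/online/possible question above.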
In cascading, I would allocate a small number of pairs
of buffers, perhaps only two or three, and "trylock"
each, falling back to trylock the second if locked,
then the third and so on, then spinlock if all are in
use. Statistically this is probably good enough, unless
I choose too small a number and Xen is running on a huge box.
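As a sketch of the cascade, again in user-space C rather than
hypervisor code: C11 atomic_flag stands in for the real spinlocks,
the names (slot_busy, bufs_acquire, bufs_release) are made up, and
spinning on the last slot when all are busy is my assumption, since
any slot would do.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_BUF_PAIRS 3   /* "perhaps only two or three" */

/* One flag per buffer pair; the initializer count must match NR_BUF_PAIRS. */
static atomic_flag slot_busy[NR_BUF_PAIRS] = {
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* Returns nonzero if slot i was free and is now held by the caller. */
static int slot_trylock(unsigned int i)
{
    return !atomic_flag_test_and_set_explicit(&slot_busy[i],
                                              memory_order_acquire);
}

/* Try each pair in turn; if all are in use, spin on the last one. */
unsigned int bufs_acquire(void)
{
    unsigned int i;

    for (i = 0; i < NR_BUF_PAIRS; i++)
        if (slot_trylock(i))
            return i;

    i = NR_BUF_PAIRS - 1;
    while (!slot_trylock(i))
        ;   /* busy-wait until the holder releases it */
    return i;
}

/* Release the pair previously returned by bufs_acquire(). */
void bufs_release(unsigned int i)
{
    assert(i < NR_BUF_PAIRS);
    atomic_flag_clear_explicit(&slot_busy[i], memory_order_release);
}
```

The caller compresses into pair i between bufs_acquire() and
bufs_release(i). The cost in the uncontended case is one
test-and-set, the same as taking the single lock today.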
I suppose a combination of the two would be to cascade,
but dynamically choose and allocate the quantity of
buffers based on (maybe log+1 of?) the number of cpus
(again, present, online, or possible?). But this is
probably going overboard.
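For what it's worth, the "log+1" sizing could be as small as the
following, assuming I mean floor(log2(ncpus)) + 1. The base-2 choice
and the name nr_buf_pairs are assumptions for illustration.

```c
/* Number of buffer pairs for a cascade: floor(log2(nr_cpus)) + 1.
 * Treats 0 cpus like 1 so the result is never below one pair. */
unsigned int nr_buf_pairs(unsigned int nr_cpus)
{
    unsigned int n = 1;

    while (nr_cpus > 1) {
        nr_cpus >>= 1;
        n++;
    }
    return n;
}
```

That gives 1 pair on a uniprocessor, 4 pairs on 8 cpus, and 7 pairs
on 64 cpus.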
Opinions? And if per-cpu, is the current Xen infrastructure
robust enough in its handling of hot-plug CPUs that I should
handle them too?
Xen-devel mailing list