This is a topic which has been brought up before, but is still something
that plagues certain machines. Let me describe the test scenario, the problem
as I see it, and some of the data that I've collected.
The test is on a large AMD NUMA machine with 128GB of memory and 32 cpus (8 x
quad-core), memory interleaved, running RHEL-5.4 Xen (although I believe the
issue probably affects upstream Xen as well). I install 2 RHEL-5.3 guests, one
with 32GB of memory, and one with 64GB of memory. On the first guest, I run a
continuous ping (just out to the default gateway). While that ping test is
running, on the dom0 I do "xm destroy <64GB_guest>". This takes a while to
complete (as expected), but what is not expected are the huge jumps in ping
response times on the 32GB domain. For instance, in the test I'm currently
running, normal ping response time is ~0.5ms, but during the xm destroy of the
other domain the ping response can jump all the way up to 3000ms or more. Once
the big domain destroy is finished, everything returns to normal.
From what I can tell, the problem lies in page_scrub_softirq(). As a first
test, I disabled page-scrubbing completely (obviously insecure, but just a
test). With no page-scrubbing at all, and direct memory freeing in
free_domheap_pages(), no delays of the kind experienced in the original test
were seen. As a second test, I implemented the page scrubbing inside
free_domheap_pages(), and again, no spikes at all were seen.
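To make the second test concrete, here is a minimal user-space sketch of scrubbing synchronously in the free path. It is not the Xen code: struct sketch_page, scrub_and_free() and the counting hook are hypothetical names made up for illustration, and the "free list" is reduced to a flag. The optional callback only marks where a periodic yield could go if one were wanted.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for a page; this is not Xen's struct page_info. */
struct sketch_page {
    unsigned char data[SKETCH_PAGE_SIZE];
    int on_free_list;
};

/* Placeholder yield hook that only counts how often it was called. */
static unsigned int sketch_yield_count;

static void sketch_count_yield(void)
{
    sketch_yield_count++;
}

/*
 * Scrub every page inline and mark it free, so no scrub list and no
 * scrub lock are involved.  If yield_fn is non-NULL and yield_every is
 * non-zero, yield_fn is called after every yield_every pages, except
 * after the final page.  Returns the number of times yield_fn ran.
 */
static size_t scrub_and_free(struct sketch_page *pages, size_t count,
                             size_t yield_every, void (*yield_fn)(void))
{
    size_t yields = 0;

    for (size_t i = 0; i < count; i++) {
        memset(pages[i].data, 0, SKETCH_PAGE_SIZE);  /* the scrub itself */
        pages[i].on_free_list = 1;                   /* "free" the page */

        if (yield_fn != NULL && yield_every != 0 &&
            (i + 1) % yield_every == 0 && i + 1 < count) {
            yield_fn();
            yields++;
        }
    }
    return yields;
}
```

Calling scrub_and_free(pages, n, 0, NULL) gives the plain inline scrub from the test above; passing a hook and an interval shows where a preemption point would sit.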
I then put things back like they were, and instrumented page_scrub_softirq().
Now, the serialize_lock at the top of the function makes sure only one CPU at a
time comes in here. However, when I instrumented the rest of the function, I
found that when a CPU was in here doing work, it was spending 80-95% of its
time waiting to get the page_scrub_lock (I have raw numbers, if you want to see
them).
At first I thought this was purely contention with the other page_scrub_lock
user in free_domheap_pages(). However, after changing the
spin_lock(&page_scrub_lock) into a spin_trylock() inside page_scrub_softirq(), I
still saw the spikes in the ping test, even though my instrumentation showed I
was only waiting 20-30% of the time on the spinlock. So I can't fully
explain the rest of the spike. Any ideas? Other things I should probe?
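For anyone who wants to picture the trylock experiment, the following is a rough user-space sketch of the pattern, not the actual patch. Everything in it is an assumption for illustration: struct scrub_queue, queue_add() and scrub_batch() are hypothetical names, the lock is a C11 atomic_flag rather than Xen's spinlock, and the miss counter stands in for my instrumentation. The consumer backs off if the lock is busy, detaches a small batch while holding it, and scrubs only after dropping it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical page awaiting scrub; not Xen's struct page_info. */
struct dirty_page {
    struct dirty_page *next;
    unsigned char data[SKETCH_PAGE_SIZE];
};

struct scrub_queue {
    atomic_flag lock;              /* stand-in for page_scrub_lock */
    struct dirty_page *head;       /* pages waiting to be scrubbed */
    unsigned long trylock_misses;  /* touched only by the single consumer */
};

static void queue_init(struct scrub_queue *q)
{
    atomic_flag_clear(&q->lock);
    q->head = NULL;
    q->trylock_misses = 0;
}

static bool queue_trylock(struct scrub_queue *q)
{
    /* test_and_set returns the old value: false means we got the lock. */
    return !atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire);
}

static void queue_unlock(struct scrub_queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Producer side (the free path): spin for the lock, then push one page. */
static void queue_add(struct scrub_queue *q, struct dirty_page *pg)
{
    while (!queue_trylock(q))
        ;  /* busy-wait */
    pg->next = q->head;
    q->head = pg;
    queue_unlock(q);
}

/*
 * Consumer side (the softirq): give up at once if the lock is busy,
 * otherwise detach up to max_pages and scrub them with the lock dropped.
 * Returns the number of pages scrubbed.
 */
static size_t scrub_batch(struct scrub_queue *q, size_t max_pages)
{
    struct dirty_page *batch = NULL, *pg;
    size_t n = 0;

    if (!queue_trylock(q)) {
        q->trylock_misses++;
        return 0;
    }
    while (n < max_pages && q->head != NULL) {
        pg = q->head;
        q->head = pg->next;
        pg->next = batch;
        batch = pg;
        n++;
    }
    queue_unlock(q);

    for (pg = batch; pg != NULL; pg = pg->next)
        memset(pg->data, 0, SKETCH_PAGE_SIZE);
    return n;
}
```

Note that this only removes the consumer's spinning; the producer still takes the lock for every page it queues, which may or may not be related to the remaining latency.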
There are a couple of solutions that I can think of:
1) Just clear the pages inside free_domheap_pages(). I tried this with a 64GB
guest as mentioned above, and I didn't see any ill effects from doing so. It
seems like this might actually be a valid way to go, although then a single CPU
is doing all of the work of freeing the pages (might be a problem on UP systems).
2) Clear the pages inside free_domheap_pages(), but do some kind of yield every
once in a while. I don't know how feasible this would be.
3) Do a lockless FIFO between free_domheap_pages() and page_scrub_softirq()
(since that is all it really is). While this would certainly work, it seems
like a bit of overengineering for this problem.
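As a rough idea of what option 3 could look like, here is a single-producer, single-consumer ring built on C11 atomics. This is a sketch under assumptions, not a proposed patch: struct scrub_ring, ring_push() and ring_pop() are hypothetical names, the ring size is arbitrary, and it assumes exactly one CPU pushes at a time. The single consumer matches what serialize_lock already guarantees, but free_domheap_pages() can run on several CPUs, so a real version would need a multi-producer scheme or per-CPU rings.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8u  /* must be a power of two; value chosen arbitrarily */

struct scrub_ring {
    void *slot[RING_SIZE];
    _Atomic size_t head;  /* next slot to fill; written by the producer */
    _Atomic size_t tail;  /* next slot to drain; written by the consumer */
};

static void ring_init(struct scrub_ring *r)
{
    for (size_t i = 0; i < RING_SIZE; i++)
        r->slot[i] = NULL;
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer (free path).  Returns false if the ring is full. */
static bool ring_push(struct scrub_ring *r, void *page)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false;  /* full: caller has to fall back to something else */

    r->slot[head & (RING_SIZE - 1)] = page;
    /* Release so the consumer sees the slot contents before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer (scrub softirq).  Returns NULL if the ring is empty. */
static void *ring_pop(struct scrub_ring *r)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    void *page;

    if (tail == head)
        return NULL;

    page = r->slot[tail & (RING_SIZE - 1)];
    /* Release so the producer can reuse the slot only after we read it. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return page;
}
```

What to do when ring_push() reports a full ring is left open here; scrubbing that page inline, as in option 1, would be one possible fallback.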
Other ideas? I'm happy to try to implement these, I'm just not sure what we
would prefer to do.
Xen-devel mailing list