Re: [Xen-devel] [PATCH] turn off writable page tables
Fork time scales quite linearly from a small number to a large number of dirty pages.
Below are the min and max:
              1280 pages   128000 pages
wtpt:           813 usec     37552 usec
emulate:       3279 usec    283879 usec
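
(For reference, a minimal user-space sketch of that kind of measurement is below - dirty N pages, then time fork(). The default page count and the single-shot timing are illustrative assumptions, not the exact harness the numbers above came from.)

/*
 * Minimal sketch of a fork-latency test: dirty N pages, then time fork().
 * The default page count and single-shot timing are illustrative
 * assumptions, not the exact harness used for the numbers quoted above.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    long npages = (argc > 1) ? atol(argv[1]) : 1280;
    long pagesz = sysconf(_SC_PAGESIZE);
    struct timeval t0, t1;
    pid_t pid;
    long i, usec;
    char *buf = mmap(NULL, npages * pagesz, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Dirty every page so fork() has that many writable PTEs to deal with. */
    for (i = 0; i < npages; i++)
        buf[i * pagesz] = 1;

    gettimeofday(&t0, NULL);
    pid = fork();
    gettimeofday(&t1, NULL);

    if (pid == 0)
        _exit(0);
    waitpid(pid, NULL, 0);

    usec = (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
    printf("%ld pages dirtied, fork took %ld usec\n", npages, usec);
    return 0;
}

Sweeping the page-count argument from 1280 to 128000 should reproduce the shape of the comparison above, modulo noise.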
Good, at least that suggests that the code works for the usage it was
intended for.
So, in a -perfect-world- this works great. The problem is that most workloads
don't appear to have a large percentage of entries that need to be updated.
I'll go ahead and expand this test to find out what the threshold is to break
even. I'll also see if we can implement a batched call in fork to update the
parent - I hope this will show just as good performance even when most entries
need modification, and even better performance than wtpt when only a small
number of entries are modified.
With license to make more invasive changes to core Linux mm it certainly
should be possible to optimize this specific case with a batched update
fairly easily. You could even go further and implement a 'make all PTEs
in pagetable RO' hypercall, possibly including a copy to the child. This
could potentially work better than the current 'late pin'; at least the
validation would be incremental rather than in one big hit at the end.
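
As a concrete illustration of what such a batched update might look like, here is a rough sketch that write-protects a run of parent PTEs with batched mmu_update hypercalls. struct mmu_update, MMU_NORMAL_PT_UPDATE, DOMID_SELF and HYPERVISOR_mmu_update() are the real Xen interface; pte_machine_addr() and the idea of calling this from copy_pte_range() are assumptions made for illustration, not existing code.

/*
 * Rough sketch only: write-protect a run of parent PTEs with batched
 * mmu_update hypercalls instead of taking a write fault + emulate per
 * entry. Assumed context: Xen paravirt mm code, with the usual pgtable
 * and Xen hypercall headers already included. pte_machine_addr() is a
 * hypothetical helper returning the machine address of the PTE slot.
 */
#define WRPROT_BATCH 32

static void xen_wrprotect_ptes_batched(pte_t *ptep, unsigned long nr)
{
    struct mmu_update req[WRPROT_BATCH];
    unsigned int n = 0;
    unsigned long i;

    for (i = 0; i < nr; i++, ptep++) {
        pte_t pte = *ptep;

        if (!pte_present(pte) || !pte_write(pte))
            continue;

        /* Machine address of the PTE slot, tagged as a normal PT update. */
        req[n].ptr = pte_machine_addr(ptep) | MMU_NORMAL_PT_UPDATE;
        req[n].val = pte_val(pte_wrprotect(pte));

        if (++n == WRPROT_BATCH) {
            if (HYPERVISOR_mmu_update(req, n, NULL, DOMID_SELF) < 0)
                BUG();
            n = 0;
        }
    }

    if (n && HYPERVISOR_mmu_update(req, n, NULL, DOMID_SELF) < 0)
        BUG();
}

The batch size here is arbitrary; the point is just that one hypercall can carry many PTE updates, so the per-entry cost shrinks as the number of writable PTEs per page table grows.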
Ian
FWIW, I found the threshold for emulate vs wtpt. I ran the fork test
with a set number of pages dirtied such that we had x PTEs dirtied per
pte_page.
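
(A sketch of how that dirtying pattern can be generated is below: touch x pages inside each chunk of address space covered by one pte page, then fork. The 4MB-per-pte-page span assumes 32-bit non-PAE x86; the region size and defaults are illustrative.)

/*
 * Sketch of an "x PTEs dirtied per pte page" pattern: touch x pages in
 * each region covered by one page-table page, then fork and time it as
 * in the earlier sketch. PTE_PAGE_SPAN assumes 32-bit non-PAE x86
 * (1024 4KB PTEs per pte page); x is assumed <= 1024.
 */
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define PTE_PAGE_SPAN (4UL << 20)   /* VA covered by one pte page (assumed) */

static void dirty_x_ptes_per_pte_page(char *region, size_t len,
                                      unsigned int x, long pagesz)
{
    size_t off;
    unsigned int i;

    for (off = 0; off + PTE_PAGE_SPAN <= len; off += PTE_PAGE_SPAN)
        for (i = 0; i < x; i++)
            region[off + i * pagesz] = 1;   /* dirty page i of this chunk */
}

int main(int argc, char **argv)
{
    unsigned int x = (argc > 1) ? atoi(argv[1]) : 2;  /* PTEs per pte page */
    size_t len = 64 * PTE_PAGE_SPAN;                  /* 64 pte pages' worth */
    long pagesz = sysconf(_SC_PAGESIZE);
    char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (region == MAP_FAILED)
        return 1;

    dirty_x_ptes_per_pte_page(region, len, x, pagesz);

    /* ...then fork() here and time it, as in the earlier sketch. */
    return 0;
}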
writable-pt
-----------
#pte    usec
 002    5242
 004    5251
 006    5373
 008    5519
 010    5873
 012    5988

emulate
-------
#pte    usec
 002    4922
 004    5265
 006    6074
 008    6991
 010    7806
So, the threshold appears to be around 4 PTEs/page. I was a little
shocked at first at how low this number is, but considering the near
identical performance with the various workloads, it makes sense. All
of the workloads had the vast majority of writable pages flushed with
just 2 PTEs/page changed and a handful with more PTEs/page changed. It
would not surprise me if the overall average was around 4 PTEs/page.
I am having a hard time finding any "enterprise" workloads which have a
lot of PTEs/page right before fork. If anyone can point me to some,
that would be great.
I will look into batching next, but I am curious if simply using a
hypercall instead of a write fault + emulate will make any difference at
all. I'll try that first, then implement the batched update.
Eventually a hypercall which does more would be nice, but I guess we'll
have to convince the Linux maintainers it's a good idea.
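
For comparison with the batched sketch above, the non-batched variant would look roughly like this: one hypercall per PTE write-protect in place of the trap-and-emulate path. xen_ptep_wrprotect_one() is a hypothetical wrapper and pte_machine_addr() is the same assumed helper as before.

/*
 * Sketch of the non-batched variant being compared: one mmu_update
 * hypercall per PTE write-protect, replacing write fault + emulate.
 * Assumed context and helpers as in the batched sketch above.
 */
static void xen_ptep_wrprotect_one(pte_t *ptep)
{
    struct mmu_update req;

    req.ptr = pte_machine_addr(ptep) | MMU_NORMAL_PT_UPDATE;
    req.val = pte_val(pte_wrprotect(*ptep));

    /* Avoids the fault + emulation, but still pays one hypercall
     * transition for every entry touched. */
    if (HYPERVISOR_mmu_update(&req, 1, NULL, DOMID_SELF) < 0)
        BUG();
}

Whether one hypercall per entry beats fault + emulate at all is exactly the open question; the batched version then amortizes the transition cost on top of that.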
-Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel