Re: [Xen-devel] Poor HVM performance with 8 vcpus
Juergen,

I think this problem is a good candidate for xentrace/xenalyze. If you
take a 30-second trace while the benchmark is at its heaviest:

  xentrace -D -e all -T 30 /tmp/[traceid].trace

and then analyze it using xenalyze
(http://xenbits.xensource.com/ext/xenalyze.hg), it should show whether
the shadow performance is due to brute-force search or something else.

If you're using 3.3, you'll have to apply the back-patch to xenalyze to
make it work properly.

If you post the summary output:

  xenalyze -s [traceid].trace > [traceid].summary

I can help interpret it.
-George
On Wed, Oct 7, 2009 at 10:40 AM, Juergen Gross
<juergen.gross@xxxxxxxxxxxxxx> wrote:
> Tim Deegan wrote:
>> At 09:08 +0100 on 07 Oct (1254906487), James Harper wrote:
>>>> At the very least it would be good to have a predictor which
>>>> figured out which of the several heuristics should actually be
>>>> used for a given VM. A simple "try whichever one worked last time
>>>> first" should work fine.
>>>>
>>>> Even smarter would be to just have heuristics for the two general
>>>> classes of mapping (1:1 and recursive), and have the code
>>>> automatically figure out the starting virtual address being used
>>>> for a given guest.
>>>>
>>> Are there any other of these heuristics tucked away in Xen? Would
>>> there be any benefit to specifying the OS being virtualised in the
>>> config? E.g. "os=windows"?
>>
>> It would be better to allow the specific heuristic to be specified in
>> the Xen interface (e.g. that it's a recursive pagetable at a particular
>> address, or a one-to-one mapping). Which isn't to say the python layer
>> couldn't put some syntactic sugar on it.
>>
>> But the bulk of the win will be had from adding BS2000 to the list of
>> heuristics. There's probably some benefit in making the heuristic list
>> pull-to-front, too.
>>
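(For the archive, here's roughly what a pull-to-front list could look
like. Every name below is invented for illustration; this is just the
shape of the idea, not the actual shadow code.)

    /* Hypothetical pull-to-front list of write-access heuristics.
     * A hit moves the winning heuristic to the front, which also
     * gives "try whichever one worked last time first". */
    #include <stdbool.h>

    struct domain;  /* opaque for this sketch */

    typedef bool (*wr_heuristic_t)(struct domain *d, unsigned long gfn);

    #define NR_HEURISTICS 4
    static wr_heuristic_t heuristics[NR_HEURISTICS];

    static bool try_remove_write_access(struct domain *d, unsigned long gfn)
    {
        int i;

        for ( i = 0; i < NR_HEURISTICS; i++ )
        {
            if ( heuristics[i] && heuristics[i](d, gfn) )
            {
                wr_heuristic_t winner = heuristics[i];

                /* Pull-to-front: shift earlier entries down one slot. */
                for ( ; i > 0; i-- )
                    heuristics[i] = heuristics[i - 1];
                heuristics[0] = winner;
                return true;
            }
        }
        return false;  /* caller falls back to brute-force search */
    }

Making the list (or just its front slot) per-domain would give the
per-VM "worked last time" predictor almost for free.
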
>> Automatically detecting 1:1 mappings and linear pagetable schemes would
>> be fun and is probably the Right Thing[tm], but making sure it works
>> with all the OSes that currently work (e.g. all HALs of all Windows
>> versions) will be a significant investment in time. :)
>>
>> Also, before getting too stuck into this it'd be worth running once more
>> with performance counters enabled and checking that this is actually
>> your problem! You should see a much higher number for "shadow writeable
>> brute-force" running BS2000 than running Windows.
>
> I still have the numbers from a test with 6 vcpus, which already showed
> severe performance degradation. I've trimmed them to show only the
> counters for the cpus running BS2000 and no other domain. The test ran
> for 60 seconds.
>
> calls to shadow_alloc 438 427 424 480 436 422
> number of shadow pages in use 2765 2151 2386 2509 4885 1391
> calls to shadow_free 168 132 185 144 181 105
> calls to shadow_fault 65271 69132 60495 53756 73363 52449
> shadow_fault fast path n/p 7347 8081 6713 6134 8521 6112
> shadow_fault fast path error 14 12 15 3 13 11
> shadow_fault really guest fault 24004 25723 22815 19709 27049 19190
> shadow_fault emulates a write 1045 949 1018 995 1015 901
> shadow_fault fast emulate 424 361 449 348 387 314
> shadow_fault fixed fault 32503 34264 29624 26689 36641 26096
> calls to shadow_validate_gl2e 875 748 917 731 795 667
> calls to shadow_validate_gl3e 481 456 443 491 489 446
> calls to shadow_validate_gl4e 104 97 95 112 105 95
> calls to shadow_hash_lookup 2109654 2203254 2228896 2245849 2164727 2309059
> shadow hash hit in bucket head 2012828 2111164 2161113 2177591 2104099 2242458
> shadow hash misses 851 840 841 910 852 838
> calls to get_shadow_status 2110031 2202828 2228769 2246689 2164213 2309241
> calls to shadow_hash_insert 438 436 428 481 437 430
> calls to shadow_hash_delete 168 150 185 154 202 128
> shadow removes write access 335 324 329 385 330 336
> shadow writeable: linux high 130 139 152 155 138 149
> shadow writeable: sl1p 14508 15402 12961 11823 16474 11472
> shadow writeable brute-force 205 185 177 230 192 187
> shadow unshadows for fork/exit 9 12 12 12 18 12
> shadow unshadows a page 10 13 13 13 19 13
> shadow walks guest tables 647527 727336 649397 646601 659655 621289
> shadow checks gwalk 526 544 535 550 614 554
> shadow flush tlb by rem wr perm 235 233 229 268 238 237
> shadow emulates invlpg 14688 15499 14604 12630 16627 11370
> shadow OOS fixup adds 14467 15335 13059 11840 16624 11339
> shadow OOS unsyncs 14467 15335 13058 11840 16624 11339
> shadow OOS evictions 566 449 565 369 589 336
> shadow OOS resyncs 14510 15407 12964 11828 16478 11481
>
> I don't think the "shadow writeable brute-force" path is the problem:
> it only fired around 200 times per vcpu. get_shadow_status looks like
> the more critical candidate, at over 2 million calls per vcpu during
> the 60 second run (roughly 35,000 per second).
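(Those hash counters fit that reading: about 95% of the 2.1M lookups hit
in the bucket head, which is what you'd see if a successful lookup
promotes its entry to the head of its bucket. Purely for illustration,
with all names invented rather than taken from the shadow code, the
shape would be:)

    /* Illustrative move-to-front hash bucket; not the actual
     * shadow hash, just the behaviour the counters suggest. */
    struct shadow_entry {
        unsigned long gfn;           /* guest frame being looked up */
        unsigned long smfn;          /* shadow frame it maps to */
        struct shadow_entry *next;
    };

    #define HASH_BUCKETS 251
    static struct shadow_entry *hash_table[HASH_BUCKETS];

    static struct shadow_entry *hash_lookup(unsigned long gfn)
    {
        struct shadow_entry **bucket = &hash_table[gfn % HASH_BUCKETS];
        struct shadow_entry *prev = NULL, *e;

        for ( e = *bucket; e != NULL; prev = e, e = e->next )
        {
            if ( e->gfn != gfn )
                continue;
            if ( prev != NULL )
            {
                /* Promote to bucket head so the next lookup of the
                 * same gfn hits immediately ("hit in bucket head"). */
                prev->next = e->next;
                e->next = *bucket;
                *bucket = e;
            }
            return e;
        }
        return NULL;
    }

If each lookup is that cheap, the cost is volume rather than collisions:
650k guest-table walks at three or four lookups apiece would account for
nearly all of the 2.1M get_shadow_status calls.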
>
>
> Juergen
>
> --
> Juergen Gross Principal Developer Operating Systems
> TSP ES&S SWE OS6 Telephone: +49 (0) 89 636 47950
> Fujitsu Technology Solutions e-mail: juergen.gross@xxxxxxxxxxxxxx
> Otto-Hahn-Ring 6 Internet: ts.fujitsu.com
> D-81739 Muenchen Company details: ts.fujitsu.com/imprint.html
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel