xen-devel

RE: [Xen-devel] Re: [ANNOUNCE] virtbench now has xen support

To: "Jan Michael" <jan.michael@xxxxxxx>, "Jeremy Fitzhardinge" <jeremy@xxxxxxxx>, "Anthony Liguori" <aliguori@xxxxxxxxxx>, "Rusty Russell" <rusty@xxxxxxxxxxxxxxx>, "Xen Mailing List" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-devel] Re: [ANNOUNCE] virtbench now has xen support
From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
Date: Thu, 24 May 2007 18:11:39 +0200
Cc: Alex Iribarren <alejandro.iribarren@xxxxxxx>
Delivery-date: Thu, 24 May 2007 09:10:16 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <157A1A05-C74F-4055-A71B-3068BFECB61E@xxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AceeGYYG3fuuI94NT1+8utOpuNYmQQAAT6hQ
Thread-topic: [Xen-devel] Re: [ANNOUNCE] virtbench now has xen support
> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of 
> Jan Michael
> Sent: 24 May 2007 16:37
> To: Jeremy Fitzhardinge; Anthony Liguori; Rusty Russell; Xen 
> Mailing List
> Cc: Alex Iribarren
> Subject: Re: [Xen-devel] Re: [ANNOUNCE] virtbench now has xen support
> 
> Hi Everybody,
> 
> On 23.05.2007, at 20:05, Jan Michael wrote:
> > The benchmark passed with the following outcome:
> >
> > Time for one context switch via pipe: 8734 (8640 - 9575)
> > Time for one Copy-on-Write fault: 5898 (5814 - 8963)
> > Time to exec client once: 573046 (565921 - 615390)
> > Time for one fork/exit/wait: 347687 (345750 - 362250)
> > Time to send 4 MB from host: 55785000 (27069625 - 315191500)
> > Time for one int-0x80 syscall: 370 (370 - 403)
> > Time for one syscall via libc: 376 (376 - 377)
> > Time to walk linear 64 MB: 1790875 (1711750 - 3332875)
> > Time to walk random 64 MB: 2254500 (2246000 - 2266250)
> > Time for one outb PIO operation: 721 (717 - 733)
> > DISABLED pte-update: glibc version is too old
> > Time to read from disk (256 kB): 18810406 (14266718 - 24088906)
> > Time for one disk read: 56343 (38593 - 201718)
> > DISABLED vmcall: not a VT guest
> > DISABLED vmmcall: not an SVM guest
> > Time to send 4 MB between guests: 94326750 (79872250 - 729306500)
> > Time for inter-guest pingpong: 130316 (119722 - 186511)
> > Time to sendfile 4 MB between guests: 134768000 (86528000 - 417646000)
> > Time to receive 1000 1k UDPs between guests: 26010000 (23384000 - 66784000)
> 
> I haven't had anything to do with benchmarking in the past, and
> especially not with virtualization benchmarks, so here again are
> some questions related to the results of the benchmarking test:
> 
>       1. What can I read out of every single value which is
> listed above?

The time it takes to perform the particular microbenchmark. 

> Can you please give a short explanation?
>       2. What are the unit(s) of the measured values?

Good question, and I don't know the actual answer. I suspect they are
clock-cycles or perhaps nanoseconds. It's clearly not milliseconds or
microseconds, so it's a "very short time-unit". 
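
To get a feel for the magnitude, here is a minimal sketch (my own, not
virtbench's code) of how one might time a cheap libc syscall in
nanoseconds with clock_gettime(); if the unit really is nanoseconds, the
result should be in the same ballpark as the "syscall via libc" figure
above.

#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    struct timespec start, end;
    long i, iterations = 1000000;
    long long ns;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iterations; i++)
        (void)getppid();                 /* one cheap syscall via libc */
    clock_gettime(CLOCK_MONOTONIC, &end);

    ns = (end.tv_sec - start.tv_sec) * 1000000000LL
       + (end.tv_nsec - start.tv_nsec);
    printf("approx. %lld ns per syscall\n", ns / iterations);
    return 0;
}

(On older glibc you may need to link with -lrt for clock_gettime.)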

>       3. What is a good value and what is a bad value? On
> what do these
> measures depend - hardware or software or both?

They aren't good or bad values as such - they are comparative numbers.
There are no "absolute" good or bad values. If I say "ten seconds", that
may be a good value if you're running 100m. But it's certainly a bad
value for a computer running 10000 instructions, for example. 

Using these values, one could either compare one implementation of Xen
with another, or compare two machines with different specs (e.g.
different processors, different memory types, different network cards or
disks, or whatever). The one with the higher numbers is the slower one. 

>       4. If I get a certain value like this one: Time for one 
> context  
> switch via pipe: 8734 (8640 - 9575). What can I do to improve/tune  
> the performance or the values?

Like any other performance improvement, you'd have to figure out where
the majority of time[1] is spent for this microbenchmark, and then try
to improve that somehow. Running this particular benchmark repeatedly
(unless it runs for a long time by itself) while running "oprofile" on
the machine would give a fair idea of where in the system the time is
spent. 

The numbers in brackets are the lower and upper bounds, and the first
number is the average of several runs. 
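
Just to illustrate that reporting convention (a sketch of my own, not
virtbench's code), reducing a set of per-run results to
"average (min - max)" would look something like this, with made-up
sample values:

#include <stdio.h>

int main(void)
{
    long long runs[] = { 8700, 8640, 9575, 8810, 8745 };  /* made-up samples */
    int i, n = sizeof(runs) / sizeof(runs[0]);
    long long min = runs[0], max = runs[0], sum = 0;

    for (i = 0; i < n; i++) {
        if (runs[i] < min) min = runs[i];
        if (runs[i] > max) max = runs[i];
        sum += runs[i];
    }
    printf("%lld (%lld - %lld)\n", sum / n, min, max);
    return 0;
}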

>       5. I googled through the web to find any results to 
> compare with  
> mine, but I couldn't find anything. Do you have some?

I don't.
>       6. In the README file it is said that virtbench contains
> "low level"
> benchmarks. What do you consider as a "high level" benchmark?

A low-level benchmark is much the same as a "microbenchmark": it tests ONE
particular feature of the system in isolation. For example, "context
switch via pipe" sends a message via a pipe from one process to another
process, and measures the time from sending the message until it has been
received at the other end. This is a good way to measure very specific
parts of a system, but improving such a part by 10%, 20% or 50% may serve
no purpose if it isn't a large portion of some higher-level functionality.
E.g. if you run the Blurg[2] web-server, it may not use pipes at all, so
the performance of Blurg is completely unrelated to the performance of
this particular benchmark. Other operations in the microbenchmark suite
are likely to have some effect on Blurg, but it may also be that a major
portion of Blurg's execution time isn't spent in the OS/hypervisor at all,
so the benchmark results don't really make much difference either way. 
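
To make the pipe example concrete, here is a rough sketch (not
virtbench's actual code) of how such a microbenchmark can be structured:
two processes bounce a single byte back and forth through a pair of
pipes, forcing a context switch on every hop, and the average round-trip
time is reported.

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

#define ROUNDS 100000

int main(void)
{
    int to_child[2], to_parent[2];
    char byte = 'x';
    struct timespec start, end;
    long long ns;
    int i;

    if (pipe(to_child) < 0 || pipe(to_parent) < 0) {
        perror("pipe");
        return 1;
    }

    if (fork() == 0) {                    /* child: echo every byte back */
        for (i = 0; i < ROUNDS; i++) {
            read(to_child[0], &byte, 1);
            write(to_parent[1], &byte, 1);
        }
        _exit(0);
    }

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < ROUNDS; i++) {        /* parent: send, wait for echo */
        write(to_child[1], &byte, 1);
        read(to_parent[0], &byte, 1);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    wait(NULL);

    ns = (end.tv_sec - start.tv_sec) * 1000000000LL
       + (end.tv_nsec - start.tv_nsec);
    printf("approx. %lld ns per round trip (two context switches)\n",
           ns / ROUNDS);
    return 0;
}

Whether virtbench counts one hop or the full round trip I don't know,
but the principle is the same.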

To give another type of example: 

We can measure the horse-power of a car engine. There are several ways
to do this. The most realistic is one that actually uses the car itself
(such as a rolling road), but we can also take the engine out of the
car and measure it without the gearbox, cooling fans, water pumps, and
whatever else can be "removed". This method will of course show
(somewhat) more power, but also give less useful numbers. On the other
hand, all of this is pointless if you can't actually USE the power (e.g.
the suspension isn't good enough to go round corners, the brakes don't
work well enough - so unless you have half a kilometer to stop, you can't
use the maximum speed of the car, etc, etc) on the road/racetrack, right?
So the BEST way to compare two cars would be to use the same (skilled)
driver around a track or a road, to see which performs best. 

Microbenchmarks measure the engine-power, braking power, suspension
springs, etc, etc. High-level/application benchmarks measure the
system's ability to perform a higher-level task, such as web-serving,
file-serving, complex calculation tasks, or some such. 

[1] Figuring out where the majority of time is spent is USUALLY the best
place to start. However, there are cases where small, distributed bits of
code are the major part. In the past, I've seen a case where a function
called many times got inlined, and each individual "call" of the
function didn't amount to much, but since it was called many times
during the overall benchmark, it amounted to a noticeable overhead. In
another case, there was a "trace function" that got called thousands of
times a second, but since tracing was turned off, it didn't actually do
anything but return. This "nothing but return" was about 2% of the
overall time of the "benchmark". However, the effect of actually CALLING
the function (passing a bunch of parameters, and often extracting those
parameters from pointers/data structures) was taking about 15% of the
overall time. Moving the check for whether there was any output to be
done to outside the function call improved the overall performance by
about 16%. Worth having, eh?
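
As a hypothetical illustration of that last case (the names below are
invented, not the actual code involved), the change was essentially from
the first shape to the second:

struct request { int id, len, flags; };

static int trace_enabled = 0;                /* tracing turned off */

static void trace(const char *fmt, ...)      /* "nothing but return" when off */
{
    if (!trace_enabled)
        return;
    /* ... would format and write the trace message here ... */
    (void)fmt;
}

/* Before: every call still sets up and passes all the arguments, even
 * though trace() returns immediately - that call overhead was the ~15%. */
void hot_path_before(struct request *r)
{
    trace("req id=%d len=%d flags=%d", r->id, r->len, r->flags);
    /* ... real work ... */
}

/* After: check trace_enabled at the call site, so the argument passing
 * is skipped entirely when tracing is off. */
#define TRACE(...) do { if (trace_enabled) trace(__VA_ARGS__); } while (0)

void hot_path_after(struct request *r)
{
    TRACE("req id=%d len=%d flags=%d", r->id, r->len, r->flags);
    /* ... real work ... */
}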

[2] Blurg is a fictional web-server, not a product in real life, but for
this example, it doesn't really matter. 

--
Mats
> 
> Ok. Enough of my questions so far. If you answer these ones I may
> have more afterwards.
> Thanks for your help,
> 
>       Jan 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
> 
> 
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel