RE: [Xen-devel] fair scheduling

 

> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of 
> Harry Smith
> Sent: 10 May 2007 12:31
> To: Petersson, Mats
> Cc: Atsushi SAKAI; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] fair scheduling
> 
> 
> 
> On 5/10/07, Petersson, Mats <Mats.Petersson@xxxxxxx> wrote:
> 
>       > -----Original Message-----
>       > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>       > [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx ] On Behalf Of
>       > Harry Smith
>       > Sent: 10 May 2007 09:24
>       > To: Atsushi SAKAI
>       > Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
>       > Subject: Re: [Xen-devel] fair scheduling 
>       >
>       > Hi,
>       >
>       > that's true. But dom0 has 4 VCPUs mapped on 2 physical CPU.
>       > CPU usage -->
>       > case1)   when vm2 doesn't have any load
>       >    dom0 20-25%    vm1  100%       vm2  0% 
>       >
>       > case2)  when vm2 has a compute-intense load
>       >    dom0  20-25%   vm1  100%        vm2  100%
>       >
>       > So my question is that in this case there is 200% of CPU
>       > available to dom0, still it is using only 20-25%,  but in 
>       > case2  webserver throughput in vm1 goes down  by 15-20%. Why
>       > this is so?
>       
>       Here are some architectural features that may affect things:
>       
>       1. Are PCPU's "real" CPU cores or HyperThreaded 
> "virtual" cores?  If the 
>       latter, the answer to why performance goes down is 
> obvious... [Sorry if
>       you covered this in an earlier mail]
> 
> 
> thanks Mats, 
> yeah, I have mentioned that, but I am quite curious about 
> knowing how this affects the performance.  
> So bigger issue comes in my mind is that if I want to manage 
> some dynamic workload with such hyperthreaded dual core CPUs, 
> then how should I estimate resource usage?  Is there any tool 
> which can take my high-level QoS requirements, process them 
> and come up with some scheduling/ workload management scheme 
> such that resource utilization is maximized.  Not only 
> resource utilization but throughput should be maximized 
> because in this case utilization was same in both cases, but 
> throughput affected largely. 

Ok, let's start with one important factor here: CPU-load and actual
performance are two different things. CPU-load isn't easy to measure (on
any reasonably modern processor at least), because the processor,
application and OS interactions are very complex. The only thing we can
reasonably accurately measure is CPU idle-time. So when you see 100%
CPU-load, it actually means that it's spending 0% of the time in idle.
How it uses that time "not in idle" is not necessarily the same when you
put the processor in different circumstances, e.g. the second core on
the same socket being more or less used, memory or I/O operations being
more or less congested, etc, etc.
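
To make that concrete, here is a minimal sketch of how a "CPU-load"
figure is typically derived from idle time. The function name and the
tick counts are made up for illustration; this is not how xentop itself
is implemented.

```python
def busy_percent(idle_before, total_before, idle_after, total_after):
    """Return the non-idle share of a sampling interval, in percent.

    All arguments are hypothetical cumulative tick counters sampled at
    the start and end of the interval.
    """
    total = total_after - total_before
    if total <= 0:
        raise ValueError("total ticks must increase between samples")
    idle = idle_after - idle_before
    return 100.0 * (total - idle) / total


# An interval with no idle ticks reports 100%, whatever the CPU was
# actually achieving during those ticks.
print(busy_percent(0, 0, 0, 1000))
# An interval that was idle for a quarter of the time reports 75%.
print(busy_percent(0, 0, 250, 1000))
```

Note that nothing in the calculation knows how many instructions were
completed, which is why two runs can both show 100% and still differ in
throughput.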

One of the real problems here is cache-utilization (and collisions in
the cache). TLB collisions/pressure can also cause similar increases in
latency for individual instructions. So an instruction that takes (on
average) 4 clock-cycles when there's nothing else to run may take 7
cycles when there's a competing process[1] running on the second core,
because the processor has to reload the cache and/or TLB more often.
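
A back-of-the-envelope sketch of what that means for throughput. It
assumes a purely CPU-bound workload with a fixed number of instructions
per request and a CPU that stays 100% busy; both are simplifying
assumptions of mine, and the function names are invented for the
example.

```python
def relative_throughput(cycles_alone, cycles_contended):
    """Throughput under contention as a fraction of the uncontended rate."""
    return cycles_alone / cycles_contended


def slowdown_for_drop(throughput_drop):
    """Factor by which the average cost per instruction must grow to
    produce the given fractional throughput drop at constant busy time."""
    return 1.0 / (1.0 - throughput_drop)


# The illustrative 4 -> 7 cycle case from the paragraph above.
print(f"{relative_throughput(4, 7):.2f}")
# The 15-20% webserver drop reported in this thread.
print(f"{slowdown_for_drop(0.15):.2f} to {slowdown_for_drop(0.20):.2f}")
```

Under those assumptions, an average per-instruction cost that grows by
a factor of roughly 1.18 to 1.25 would be enough to account for a
15-20% drop, with the load figure still reading 100%.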

Another factor is simply the increased load on the memory bus, where
the first and second processor (core) compete for a shared resource. If
there is only one process running, there is less congestion on the
"road" to the processor, whilst if there are two processes both using
the memory bus intensely, there will be heavy congestion. This is not
helped by memory devices being more likely to need full addressing,
rather than "within page"[2] addressing, which increases the memory
access time noticeably for each of those accesses.

I'm not aware of a specific tool that does load balancing and such
automagically. I don't really know how to solve the problem (other than
the obvious suggestion of "underestimate what the processor can do at
full load"), I'm only trying to explain that it's not just a case of
adjusting the scheduler. The scheduler (whether we talk of the current
or some new fancy one) will not be able to determine how much pressure
a particular domain will put on the memory bus, for example.

And on the account of your specific processor model: you have a
dual-core processor with two hyperthreads per core. This essentially
means that most of the CPU resources are shared when two threads run on
the same physical core. This in turn means that HT0 and HT1 on Core0
will affect each other (i.e. an increase in load on HT0 will affect HT1
and vice versa), and likewise HT0 and HT1 on Core1 will affect each
other.

All four hyperthreads share the same IO and memory bus, so if you have
some load on the memory bus from the first core, the second core will be
affected by this. 
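
A small sketch of how to check which domains end up sharing a physical
core under the pinning described earlier in the thread (dom0 on
cpu0,cpu3, vm1 on cpu1, vm2 on cpu2). The mapping of logical CPU numbers
to cores below is purely an assumption; the real enumeration depends on
the hardware and should be checked on the machine itself.

```python
from collections import defaultdict

# ASSUMPTION: logical CPUs 0 and 1 are the two hyperthreads of core 0,
# and logical CPUs 2 and 3 are the two hyperthreads of core 1.
CORE_OF_CPU = {0: 0, 1: 0, 2: 1, 3: 1}

# Pinning as described earlier in the thread.
PINNING = {"dom0": [0, 3], "vm1": [1], "vm2": [2]}


def domains_per_core(pinning, core_of_cpu):
    """Group domain names by the physical core their pinned CPUs sit on."""
    cores = defaultdict(set)
    for domain, cpus in pinning.items():
        for cpu in cpus:
            cores[core_of_cpu[cpu]].add(domain)
    return {core: sorted(domains) for core, domains in cores.items()}


print(domains_per_core(PINNING, CORE_OF_CPU))
```

With this assumed numbering each guest shares a core with one of dom0's
hyperthreads. A different numbering could put vm1 and vm2 on the same
core instead. Either way, all of them still share the memory and IO
bus.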

> 
> With every increase in number of virtual machine, performance 
> of existing VMs will degrade. Then question is how does 
> Virtualization helps in maximizing resource utilization & 
> giving predictable throughput to applications running inside VMs? 

It helps in cases where you have, say, three different servers, that
each have a longish peak CPU-load of 15%, and you combine all three into
one physical machine. But you will use more than 3 x 15% CPU-load to
achieve the same functionality, because the management of the virtual
machines will take some effort from the CPU. 
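
As a rough sketch of that arithmetic: the per-VM overhead figure below
is a made-up placeholder, not a measured number, and the function name
is invented for the example.

```python
def consolidated_load(peak_loads, overhead_per_vm):
    """Estimate combined CPU-load (in percent of one CPU) when several
    servers are merged onto one host, assuming their peaks coincide and
    each virtual machine adds a fixed management overhead."""
    return sum(peak_loads) + overhead_per_vm * len(peak_loads)


# Three servers peaking at 15% each, with a hypothetical 5% overhead
# per virtual machine: more than the plain 3 x 15% = 45%.
print(consolidated_load([15, 15, 15], 5))
```

The real overhead depends heavily on how much I/O the guests do, since
that work is handled by dom0.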

If you try to squeeze three 100% loads from single-CPU servers into
three virtual machines on a 4-CPU server, then you may not get the same
level of performance - but it does depend A LOT on exactly what the
guests are doing.

[1] I use the word process here quite loosely. It could be another
thread of the same application just as much as another virtual machine. 

[2] Memories have a few small buffers, called "pages", which each
contain a (copy of a) small amount of memory content. Each time a memory
location not covered by a "currently open" page is accessed, the memory
controller must issue a "full address", which means two sets of address
cycles to the memory chips, rather than a single "address within page".

--
Mats  
> 
> 
> 
> 
> 
>       2. Is it possible that cache-contention is affecting 
> the performance?
>       
>       3. Is it possible that memory bandwidth is affecting 
> the performance?
>       
>       4. Is it possible that Page-table/TLB contention is affecting
>       performance?
>       
>       
>       > Why dom0 can't use more CPU to process vm1 & vm2 requests 
>       > separately ?  As we are trying to show that vm1, vm2 are two
>       > OS running independetly, why they affect each other's 
> performance ?
>       
>       Because there are still running on shared hardware, so one OS's 
>       behaviour will affect the overall hardware load, 
> perhaps? I'm not saying
>       it is so, but I suspect it's at least part of the answer.
>       
>       --
>       Mats
>       >
>       > thanks,
>       > Harry
>       >
>       >
>       >
>       >
>       > On 5/10/07, Atsushi SAKAI <sakaia@xxxxxxxxxxxxxx> wrote:
>       >
>       >       Hi,
>       >
>       >       You should check I/O behavior.
>       >
>       >       If I/O occured, 
>       >       other domain(vm1, vm2) data is handled by dom0 
> as driver domain.
>       >
>       >       Thanks
>       >       Atsushi SAKAI
>       >
>       >
>       >       "Harry Smith" < harry.smith272@xxxxxxxxx 
> <mailto:harry.smith272@xxxxxxxxx> 
>       > <mailto:harry.smith272@xxxxxxxxx> > wrote:
>       >
>       >       > Hi Atsushi & Pradeep,
>       >       >
>       >       > thanks for replying back. 
>       >       > I have 4 VCPUs for each of VM.  But the point I
>       > wanted to stress upon is -
>       >       > "This happened even in the case where CPU usage by
>       > both of vm1,vm2 is
>       >       > restricted to 100% each. " 
>       >       > I had pinned all 4 VCPUs of each VM to a single phys.
>       > CPU. & I have 4 phys.
>       >       > CPUs
>       >       > means my vm1 was using cpu1, vm2 using cpu2 &
>       > domain-0 using cpu0,cpu3 
>       >       >
>       >       > Problem is when there is no load on vm2, webserver
>       > performance of vm1 is
>       >       > better.  But when vm2 has some compute-intense load
>       > then vm1 webserver
>       >       > performance goes down.
>       >       > Please note that CPU consumption of vm1 shown by
>       > xentop in both cases is
>       >       > 100%,  still webserver performance goes down 
> by around 15-20%.
>       >       > Even after trying to isolate two VMs, existence of
>       > load on one VM is
>       >       > affecting other.
>       >       >
>       >       > so is it expected behavior ?
>       >       >
>       >       > thanks, 
>       >       > Harry
>       >       >
>       >       >
>       >       >
>       >       > On 5/10/07, pradeep singh rautela 
> <rautelap@xxxxxxxxx> wrote:
>       >       > > 
>       >       > >
>       >       > >
>       >       > > On 5/10/07, Atsushi SAKAI < 
> sakaia@xxxxxxxxxxxxxx> wrote:
>       >       > > >
>       >       > > > One vcpu can use one pcpu at one time. 
>       >       > > > It means 100% is maxium for one vcpu domain.
>       >       > > > If you want to use cpu resources, you should set
>       > more vcpu.
>       >       > >
>       >       > >
>       >       > > Ok, this explains a lot of things.
>       >       > > As i understand this , more VCPUs means more
>       > freedom to hypervisor to
>       >       > > migrate them among physical CPUs, depending on the 
>       > free PCPUs available.
>       >       > >
>       >       > > In general
>       >       > >
>       >       > >                 domU1
>       >       > >                /      |       \
>       >       > >         vcpu1 vcpu2 vcpu3 
>       >       > >
>       >       > > pcpu1 pcpu2 pcpu3 pcpu4 pcpu5 pcpu6
>       >       > >
>       >       > > I mean ,domU1 can run on any vcpu , right? now
>       > vcpu1, vcpu2, vcpu3 share a
>       >       > > one to many reationship between pcpus[1....6]. That 
>       > is a vcpu can run on any
>       >       > > of the pcus available to the Xen hypervisor(unless
>       > i explicitly pin it to ).
>       >       > >
>       >       > >
>       >       > > Is my naive understanding of what you 
> explained is correct? 
>       >       > >
>       >       > > Thank you
>       >       > > ~psr
>       >       > >
>       >       > > > Thanks
>       >       > > > Atsushi SAKAI
>       >       > > >
>       >       > > >
>       >       > > > "pradeep singh rautela" 
> <rautelap@xxxxxxxxx> wrote:
>       >       > > >
>       >       > > > > Hi Atsushi, 
>       >       > > > >
>       >       > > > > On 5/10/07, Atsushi SAKAI <
>       > sakaia@xxxxxxxxxxxxxx> wrote:
>       >       > > > > > 
>       >       > > > > >
>       >       > > > > > You should show detail configuration.
>       >       > > > > > Your information is too short.
>       >       > > > > > 
>       >       > > > > > Anyway I guess each domain has one vcpu.
>       >       > > > > > If so, this is normal behavior.
>       >       > > > > > Because one vcpu cannot allocate two or more 
>       > pcpu at once.
>       >       > > > >
>       >       > > > >
>       >       > > > > Right, but shouldn't Xen hypervisor be capable
>       > of migrating the VCPU
>       >       > > > among 
>       >       > > > > the available PCPUs on a multiprocessor system,
>       > like in this case? And
>       >       > > > > criteria should be the load on the PCPU or the
>       > idle PCPUs.
>       >       > > > > yes/no? 
>       >       > > > >
>       >       > > > > Am i missing something here?
>       >       > > > >
>       >       > > > > Thanks
>       >       > > > > ~psr
>       >       > > > > 
>       >       > > > > Thanks
>       >       > > > > > Atsushi SAKAI
>       >       > > > > >
>       >       > > > > > "Harry Smith" < 
> harry.smith272@xxxxxxxxx <mailto:harry.smith272@xxxxxxxxx> 
>       > <mailto:harry.smith272@xxxxxxxxx> > wrote:
>       >       > > > > >
>       >       > > > > > > hi all, 
>       >       > > > > > >
>       >       > > > > > > I am using xen3.0.3 on dual core
>       > hyperthreaded processor (in all 4
>       >       > > > > > cores).
>       >       > > > > > > There are 2 VMs vm1,vm2 among which vm1 has 
>       > a webserver running on
>       >       > > > it.
>       >       > > > > > >
>       >       > > > > > > While testing the performance of webserver,
>       > when I introduce some 
>       >       > > > load
>       >       > > > > > on
>       >       > > > > > > vm2 which involves some computations the
>       > webserver performance
>       >       > > > goes 
>       >       > > > > > down.
>       >       > > > > > > This happened even in the case where CPU
>       > usage by both of vm1,vm2
>       >       > > > is
>       >       > > > > > > restricted to 100% each. 
>       >       > > > > > >
>       >       > > > > > > Is it expected behavior ?  if yes then how
>       > does one can control
>       >       > > > addition
>       >       > > > > > of 
>       >       > > > > > > new virtual machines as adding every new VM
>       > will result in
>       >       > > > lowering
>       >       > > > > > > performance of other VMs.  Through 
>       > scheduling parameters we can
>       >       > > > just
>       >       > > > > > specify
>       >       > > > > > > amount of CPU to be used in relative sense
>       > (weight) & upper limit 
>       >       > > > (cap).
>       >       > > > > > But
>       >       > > > > > > how to tackle this point.
>       >       > > > > > >
>       >       > > > > > > I am new in this area & wanna set up a lab 
>       > using virtualization,
>       >       > > > so want
>       >       > > > > > to
>       >       > > > > > > find solution for this.
>       >       > > > > > > 
>       >       > > > > > > thanks,
>       >       > > > > > > Harry
>       >       > > > > > >
>       >       > > > > > > we always have a choice...
>       >       > > > > >
>       >       > > > > >
>       >       > > > > >
>       >       > > > > > 
> _______________________________________________
>       >       > > > > > Xen-devel mailing list 
>       >       > > > > > Xen-devel@xxxxxxxxxxxxxxxxxxx
>       >       > > > > > http://lists.xensource.com/xen-devel 
>       >       > > > > >
>       >       > > > >
>       >       > > > >
>       >       > > > >
>       >       > > > > --
>       >       > > > > ---
>       >       > > > > pradeep singh rautela
>       >       > > > >
>       >       > > > > "Genius is 1% inspiration, and 99%
>       > perspiration" - not me :)
>       >       > > > 
>       >       > > >
>       >       > > >
>       >       > >
>       >       > >
>       >       > > --
>       >       > > ---
>       >       > > pradeep singh rautela
>       >       > > 
>       >       > > "Genius is 1% inspiration, and 99% perspiration" -
>       > not me :)
>       >       > >
>       >
>       >
>       >
>       >
>       >
>       >
>       
>       
>       
> 
> 
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel