
RE: [Xen-devel] unnecessary VCPU migration happens again

Petersson, Mats wrote on 7 December 2006 18:52:
>> -----Original Message-----
>> From: Emmanuel Ackaouy [mailto:ack@xxxxxxxxxxxxx]
>> Sent: 07 December 2006 10:38
>> To: Xu, Anthony
>> Cc: Petersson, Mats; xen-devel@xxxxxxxxxxxxxxxxxxx; xen-ia64-devel
>> Subject: Re: [Xen-devel] unnecessary VCPU migration happens again
>> Arguably, if 2 unrelated VCPUs are runnable on a dual socket
>> host, it is useful to spread them across both sockets. This
>> will give each VCPU more achievable bandwidth to memory.
>> What I think you may be arguing here is that the scheduler
>> is too aggressive in this action because the VCPU that blocked
>> on socket 2 will wake up very shortly, negating the host-wide
>> benefits of the migration when it does while still maintaining the
>> costs. 

Yes, you are right, the VCPU that blocked will wake up very shortly.

As Mats mentioned, migration is expensive on the IPF platform:

1. TLB penalty
    Assume a VCPU is migrated from CPU0 to CPU1.
    (1) TLB purge penalty
    The HV must purge all of CPU0's TLB entries, because if this VCPU is
    later migrated back, CPU0 may still contain stale TLB entries.

    IA32 doesn't have this penalty: it purges the whole TLB on every
    VCPU switch anyway, so the purge is not an extra cost caused by
    the migration.
    (2) TLB warm-up penalty
    When the VCPU is migrated to CPU1, it must warm up CPU1's TLB.
    Both IPF and IA32 have this penalty.

2. Cache penalty
    When the VCPU is migrated to CPU1, it must also warm up CPU1's cache.
    Both IPF and IA32 have this penalty.

>> There is a tradeoff here. We could try being less aggressive
>> in spreading stuff over idle sockets. It would be nice to do
>> this with a greater understanding of the tradeoff though. Can
>> you share more information, such as benchmark perf results,
>> migration statistics, or scheduler traces?

I got the following baseline data with the LTP benchmark.

IPF platform:
two sockets, two cores per socket, two threads per core,
i.e. 8 logical CPUs.

Dom0 is UP (one VCPU);
the VTI domain has 4 VCPUs.

It takes 66 minutes to run LTP .

Then I commented out the following code, so there is no unnecessary migration.

With that change it takes 48 minutes to run LTP.

The degradation caused by the migrations is

(66-48)/66 ≈ 27%

That's a "big" degradation!

        while ( !cpus_empty(cpus) )
        {
            nxt = first_cpu(cpus);

            if ( csched_idler_compare(cpu, nxt) < 0 )
            {
                cpu = nxt;
                cpu_clear(nxt, cpus);
            }
            else if ( cpu_isset(cpu, cpu_core_map[nxt]) )
            {
                cpus_andnot(cpus, cpus, cpu_sibling_map[nxt]);
            }
            else
            {
                cpus_andnot(cpus, cpus, cpu_core_map[nxt]);
            }

            ASSERT( !cpu_isset(nxt, cpus) );
        }


> I don't know if I've understood this right or not, but I believe the
> penalty for switching from one core (or socket) to another is higher
> on IA64 than on x86. I'm not an expert on IA64, but I remember
> someone at the Xen Summit saying something to that effect - I think
> it was something like executing a bunch of code to flush the TLB's or
> some such...
>> Emmanuel.

Xen-devel mailing list


