Keir, Emmanuel,
Thanks for the detailed answers and your views. I agree that my small
change should not affect correctness. I didn't see the migrations happen
often; I saw the dom0 vcpus migrate 4 or 5 times between boot and the
start of xend. I still think we should avoid these migrations; why waste
the cache hotness?
How solid is the credit scheduler now for DomUs on an SMP box, on
32-bit, PAE, and 64-bit? It would be a useful data point for me while
debugging the HVM guest issues with the credit scheduler.
Thanks & Regards,
Nitin
-----------------------------------------------------------------------------------
Open Source Technology Center, Intel Corp
>-----Original Message-----
>From: Emmanuel Ackaouy [mailto:ack@xxxxxxxxxxxxx]
>Sent: Friday, June 30, 2006 3:46 AM
>To: Kamble, Nitin A
>Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Keir Fraser; Ian Pratt
>Subject: Re: A credit scheduler issue
>
>Hi Nitin,
>
>On Thu, Jun 29, 2006 at 06:13:51PM -0700, Kamble, Nitin A wrote:
>> I am trying to debug the credit scheduler to solve the many HVM domain
>> instability issues we have found with the credit scheduler.
>
>Great. As Keir pointed out, though, the problems you are seeing
>may not actually be in the credit scheduler itself.
>
>> While debugging I noticed an odd behavior: when running on a 2-CPU
>> system, dom0 gets 2 vcpus by default, and even if there are no other
>> domains running in the system, the dom0 vcpus are getting migrated to
>> different pcpus by the load balancer. I think this is due to the
>> preemption happening in the credit scheduler; it is not necessary, and
>> it is actually wasteful to move vcpus when the number of vcpus in the
>> system is equal to the number of pcpus.
>>
>> I would like to know your thinking about this behavior. Is it
>> intended in the design?
>
>This should be very rare. If a VCPU were woken up and put on
>the runq of an idle CPU, a peer physical CPU that is in the
>scheduler code at that exact time could potentially pick up
>the just woken up VCPU.
>
>We can do things to shorten this window, like not pick up a
>VCPU from a remote CPU that is currently idle and therefore
>probably racing with us to run said newly woken up VCPU on
>its runq. But I'm not sure this happens frequently enough to
>warrant the added complexity. On top of that, it seems to
>me this is more likely to happen to VCPUs that aren't doing
>very much work and therefore would not suffer a performance
>loss from migrating between physical CPUs on occasion.
>
>Are you seeing a lot of these migrations?
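(For illustration, a rough sketch of the idle-peer check described above,
not actual sched_credit.c code: is_idle_vcpu() is a real Xen predicate, but
csched_curr_on() below is a made-up stand-in for however the peer CPU's
currently running VCPU would be looked up.)

    /* Sketch only: skip stealing from a peer CPU that is itself idle,
     * since an idle peer is almost certainly racing with us to run the
     * VCPU that was just woken onto its runq.
     */
    static inline int csched_peer_worth_stealing_from(int peer_cpu)
    {
        /* Hypothetical accessor; see note above. */
        struct vcpu *peer_curr = csched_curr_on(peer_cpu);

        if ( is_idle_vcpu(peer_curr) )
            return 0;

        return 1;
    }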
>
>> I added this small fix to the scheduler to fix this behavior, and
>> with it I see the stability of Xen improved. Win2003 boot was crashing
>> with an unhandled MMIO error on xen64 earlier with the credit
>> scheduler; I am not seeing that crash with this small fix anymore. It
>> is quite possible that there are more bugs I need to catch for HVM
>> domains in the credit scheduler. I would like to know your thoughts on
>> this change.
>
>I don't agree with this change.
>
>When a VCPU is the only member of a CPU's runq, it's still
>waiting for a _running_ VCPU to yield or block. We should
>absolutely be picking up such a VCPU to run elsewhere on
>an idle CPU. Else, you'd end up with two VCPUs time-slicing
>on a processor while other processors in the system are idle.
>
>Your change effectively turns off migration on systems where
>the number of active VCPUs is less than 2 multiplied by the
>number of physical CPUs. I can see why that would hide any
>bugs in the context migrating paths, but that doesn't make
>it right. :-)
>
>>
>> csched_runq_steal(struct csched_pcpu *spc, int cpu, int pri)
>> {
>>     struct list_head *iter;
>>     struct csched_vcpu *speer;
>>     struct vcpu *vc;
>>
>>     /* If there is only one vcpu in the queue then stealing it from
>>      * the queue is not going to help with load balancing.  (Note that
>>      * the test below also holds for an empty runq.)
>>      */
>>     if ( spc->runq.next->next == &spc->runq )
>>         return NULL;
>>
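(As a side note, here is a standalone illustration, not Xen code, of what
this test matches; the minimal list helpers below are re-declared just for
the demo. With a Linux/Xen-style circular list, runq.next->next == &runq
holds for an empty runq as well as for a runq with exactly one entry, so
the early return covers both cases.)

    #include <assert.h>

    struct list_head { struct list_head *next, *prev; };

    static void list_init(struct list_head *h) { h->next = h->prev = h; }

    static void list_add_tail(struct list_head *n, struct list_head *h)
    {
        n->prev = h->prev;
        n->next = h;
        h->prev->next = n;
        h->prev = n;
    }

    int main(void)
    {
        struct list_head runq, a, b;

        list_init(&runq);
        assert(runq.next->next == &runq);   /* empty runq: early return */

        list_add_tail(&a, &runq);
        assert(runq.next->next == &runq);   /* one vcpu queued: early return */

        list_add_tail(&b, &runq);
        assert(runq.next->next != &runq);   /* two queued: stealing considered */

        return 0;
    }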
>>
>> Thanks & Regards,
>>
>> Nitin
>>
>>
>> -----------------------------------------------------------------------------------
>>
>> Open Source Technology Center, Intel Corp
>>
>>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel