Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
To: Stephan Diestelhorst <stephan.diestelhorst@xxxxxxx>
Subject: Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
Date: Wed, 02 Feb 2011 16:14:25 +0100
Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, "Przywara, Andre" <Andre.Przywara@xxxxxxx>, Keir Fraser <keir@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
On 02/02/11 15:39, Stephan Diestelhorst wrote:
> Hi folks,
> long time no see. :-)
> On Tuesday 01 February 2011 17:32:25 Andre Przywara wrote:
>> I asked Stephan Diestelhorst for help, and after I convinced him that
>> removing credit and making SEDF the default again is not an option, he
>> worked together with me on that ;-) Many thanks for that!
>> We haven't come to a final solution yet, but we could gather some debug
>> data. I will simply dump some data here; maybe somebody has got a clue.
>> We will work further on this tomorrow.
> Andre and I have been looking through this further, in particular
> sanity-checking the invariant
>
>     prv->weight >= sdom->weight * sdom->active_vcpu_count
>
> each time someone tweaks the active vcpu count. This happens only in
> __csched_vcpu_acct_start and __csched_vcpu_acct_stop_locked. We managed
> to observe the broken invariant when splitting cpupools.
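(For reference, the check being described boils down to something like the
following. This is an illustrative sketch only, not the actual debug patch;
the field names are those of sched_credit.c, the helper name is made up.)

    /* sanity check, to be called wherever active_vcpu_count is tweaked */
    static void check_weight_invariant(const struct csched_private *prv,
                                       const struct csched_dom *sdom)
    {
        BUG_ON( prv->weight < sdom->weight * sdom->active_vcpu_count );
    }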
> We have the following theory of what happens:
> * some vcpus of a particular domain are currently in the process of
>   being moved to the new pool
The only _vcpus_ to be moved between pools are the idle vcpus, and those
never contribute to accounting in the credit scheduler. We are moving
_pcpus_ only (well, moving a domain between pools moves its vcpus as well,
but in that case the domain is paused).
On the pcpu to be moved the idle vcpu should be running. Obviously you
have found a scenario where this isn't true. I have no idea how that could
happen, as vcpus other than the idle ones are taken into account for
scheduling only if the pcpu is valid in the cpupool, and the pcpu is set
valid only after the BUG_ON you triggered in your tests.
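The gate I am relying on looks roughly like this (paraphrased from memory,
not a literal quote of the sources; cpu_valid is the pool's mask of valid
pcpus):

    /* a non-idle vcpu is only considered on a pcpu that its pool has
     * already marked valid */
    if ( !is_idle_vcpu(vc) &&
         !cpu_isset(cpu, vc->domain->cpupool->cpu_valid) )
        return;    /* not eligible to run on this pcpu (yet) */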
> * some are still left in the old pool (vcpus_old) and some are already
>   in the new pool (vcpus_new)
> * we now have vcpus_old->sdom = vcpus_new->sdom and, following from this,
> * vcpus_old->sdom->weight = vcpus_new->sdom->weight
> * vcpus_old->sdom->active_vcpu_count = vcpus_new->sdom->active_vcpu_count
> * active_vcpu_count thus does not represent the separation of the
>   actual vcpus (it may be the sum, or only the old or the new ones; it
>   does not matter)
> * however, sched_old != sched_new, and thus
> * sched_old->prv != sched_new->prv
> * sched_old->prv->weight != sched_new->prv->weight
> * the prv->weight field hence sees the incremental move of vcpus
>   (through modifications in *acct_start and *acct_stop_locked)
> * if at any point in this half-way migration the scheduler wants to
>   csched_acct, it erroneously checks the wrong active_vcpu_count
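To make the last point concrete: csched_acct() computes each domain's fair
credit share from both numbers, roughly as follows (simplified and written
down from memory, so take the exact expression with a grain of salt):

    /* simplified shape of the fair-share computation in csched_acct();
     * weight_total is the prv->weight of the pool doing the accounting */
    credit_fair = ( credit_total * sdom->weight * sdom->active_vcpu_count
                    + (weight_total - 1) ) / weight_total;

If prv->weight has already been reduced for the vcpus that left the pool
while active_vcpu_count still counts all of them, a single domain's
credit_fair can exceed credit_total, which is exactly what the invariant
above is meant to rule out.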
> Workarounds / fixes (none tried):
> * disable scheduler accounting while half-way migrating a domain
>   (dom->pool_migrating flag and then checking it in csched_acct)
> * temporarily split the sdom structures while migrating, to account for
>   the transient split of vcpus
> * synchronously disable all vcpus, migrate, and then re-enable
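The first workaround would amount to something like this in csched_acct()'s
loop over the active domains (a sketch only; pool_migrating is a
hypothetical flag that does not exist in struct domain today):

    /* hypothetical guard: skip domains that are half-way through a pool
     * move, so their shared active_vcpu_count cannot poison this pool's
     * accounting */
    if ( sdom->dom->pool_migrating )
        continue;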
> Caveats:
> * prv->lock does not guarantee mutual exclusion between (instances of
>   the same) scheduler in different pools
> <rant>
> The general locking policy versus the actual comment situation is a
> nightmare. I know that we have some advanced data-structure folks here,
> but intuitively reasoning about when specific things are atomic and
> mutually excluded is a pain in the scheduler / cpupool code, see the
> issue with the separate prv->locks above.
>
> E.g. the cpupool_unassign_cpu and cpupool_unassign_cpu_helper interplay:
> * cpupool_unassign_cpu unlocks cpupool_lock
> * sets up the continuation calling cpupool_unassign_cpu_helper
> * cpupool_unassign_cpu_helper locks cpupool_lock
> * while intuitively one would think that both should see a consistent
>   snapshot, and hence dropping the lock in the middle is a bad idea
> * also, communicating continuation-local state through global variables
>   mandates that only a single global continuation can be pending
> * reading cpu outside of the lock protection in
>   cpupool_unassign_cpu_helper also smells
> </rant>
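For readers without the source at hand, the criticized pattern has roughly
this shape (heavily condensed and partly from memory; moving_cpu and
moving_pool stand in for the real globals used to pass state to the
continuation):

    int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
    {
        spin_lock(&cpupool_lock);
        /* ... sanity checks, remove cpu from c->cpu_valid ... */
        moving_cpu  = cpu;                 /* continuation state in globals */
        moving_pool = c;
        spin_unlock(&cpupool_lock);        /* lock dropped here ... */
        return continue_hypercall_on_cpu(cpu, cpupool_unassign_cpu_helper, c);
    }

    static long cpupool_unassign_cpu_helper(void *info)
    {
        spin_lock(&cpupool_lock);          /* ... and re-taken here, so the
                                              two halves are not one critical
                                              section */
        /* ... finish removing moving_cpu from moving_pool ... */
        spin_unlock(&cpupool_lock);
        return 0;
    }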
> Despite the rant, it is amazing to see the ability to move running
> things around through this remote-continuation trick! In my (ancient)
> balancer experiments I added hypervisor threads just for side-stepping
> this issue.
I think the easiest way to solve the problem would be to move the cpu to
the new pool in a tasklet. This is possible now because tasklets are
always executed in the idle vcpus.
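Roughly along these lines (an untested sketch of the idea, not a patch;
cpupool_move_cpu_fn and the move_info container are made up, and I am
assuming schedule_cpu_switch() as the underlying pool switch):

    struct cpupool_move_info {             /* made-up state container */
        unsigned int cpu;
        struct cpupool *pool;
    };

    static struct tasklet move_tasklet;
    static struct cpupool_move_info move_info;

    /* runs in the idle vcpu of move_info.cpu, so no guest vcpu can be
     * accounted on that pcpu while its pool is switched */
    static void cpupool_move_cpu_fn(unsigned long data)
    {
        struct cpupool_move_info *info = (struct cpupool_move_info *)data;

        schedule_cpu_switch(info->cpu, info->pool);
    }

    static void cpupool_move_cpu(unsigned int cpu, struct cpupool *c)
    {
        move_info.cpu  = cpu;
        move_info.pool = c;
        tasklet_init(&move_tasklet, cpupool_move_cpu_fn,
                     (unsigned long)&move_info);
        tasklet_schedule_on_cpu(&move_tasklet, cpu);
    }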
OTOH I'd like to understand what is wrong with my current approach...
Juergen
--
Juergen Gross Principal Developer Operating Systems
TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28 Internet: ts.fujitsu.com
D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html