xen-devel

Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split

To: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
Date: Thu, 17 Feb 2011 08:05:00 +0100
Cc: Andre Przywara <andre.przywara@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Diestelhorst, Stephan" <Stephan.Diestelhorst@xxxxxxx>
Delivery-date: Wed, 16 Feb 2011 23:05:52 -0800
In-reply-to: <AANLkTin+rE1=+vpmTg9xeQdYn7_hucSFkrz1qCtiKfkY@xxxxxxxxxxxxxx>
Organization: Fujitsu Technology Solutions
References: <4D41FD3A.5090506@xxxxxxx> <4D4A43B7.5040707@xxxxxxxxxxxxxx> <4D4A72D8.3020502@xxxxxxxxxxxxxx> <4D4C08B6.30600@xxxxxxx> <4D4FE7E2.9070605@xxxxxxx> <4D4FF452.6060508@xxxxxxxxxxxxxx> <AANLkTinoRUQC_suVYFM9-x3D00KvYofq3R=XkCQUj6RP@xxxxxxxxxxxxxx> <4D50D80F.9000007@xxxxxxxxxxxxxx> <AANLkTinKJUAXhiXpKui_XX8XCD6T5fmzNARwHE6Fjafv@xxxxxxxxxxxxxx> <AANLkTinP0z9GynF1RFd8RwzWuqvxYdb+UBE+7xKpX6D4@xxxxxxxxxxxxxx> <4D517051.10402@xxxxxxx> <AANLkTi=MiELBnPFvb6-jzVth+T7aKxP5JMFhVh3Crdmo@xxxxxxxxxxxxxx> <AANLkTikgGNz=imS1xRVVjntY5P=+MuT_Qsb=-h3QHajY@xxxxxxxxxxxxxx> <4D529BD9.5050200@xxxxxxx> <4D52A2CD.9090507@xxxxxxxxxxxxxx> <4D5388DF.8040900@xxxxxxxxxxxxxx> <4D53AF27.7030909@xxxxxxx> <4D53F3BC.4070807@xxxxxxx> <4D54D478.9000402@xxxxxxxxxxxxxx> <4D54E79E.3000800@xxxxxxx> <AANLkTimkRAHtM4CoTskQ7w6B-8Pis4B2+k7=frxM3oyW@xxxxxxxxxxxxxx> <4D5A29C0.4050702@xxxxxxxxxxxxxx> <4D5B9D2B.107@xxxxxxxxxxxxxx> <AANLkTin+rE1=+vpmTg9xeQdYn7_hucSFkrz1qCtiKfkY@xxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.16) Gecko/20101226 Iceowl/1.0b1 Icedove/3.0.11
On 02/16/11 14:54, George Dunlap wrote:
> Andre (and Juergen), can you try again with the attached patch?
>
> What the patch basically does is try to make "cpu_disable_scheduler()"
> do what it seems to say it does. :-)  Namely, the various
> scheduler-related interrupts (both per-cpu ticks and the master tick)
> are a part of the scheduler, so disable them before doing anything, and
> don't enable them until the cpu is really ready to go again.
>
> To be precise:
> * cpu_disable_scheduler() disables ticks
> * schedule_cpu_switch() only enables ticks if adding a cpu to a pool,
>   and does it after inserting the idle vcpu
> * Modify semantics so that {alloc,free}_pdata() don't actually start or
>   stop tickers
>   + Call tick_{resume,suspend} in cpu_{up,down}, respectively
> * Modify credit1's tick_{suspend,resume} to handle the master ticker as well.
>
> With this patch (if dom0 doesn't get wedged due to all 8 vcpus being
> on one pcpu), I can perform thousands of operations successfully.
>
> (NB this is not ready for application yet, I just wanted to check to
> see if it fixes Andre's problem)
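
For readers following the thread, the tick ordering described in the list
above can be sketched as a small model. This is only an illustration under
stated assumptions: struct model_cpu, its fields, and the model_* functions
are hypothetical names invented here, not Xen code and not the attached
patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; these fields are illustrative, not Xen's. */
struct model_cpu {
    bool in_pool;        /* CPU currently belongs to a cpupool */
    bool idle_inserted;  /* idle vcpu has been inserted into the scheduler */
    bool ticks_enabled;  /* per-cpu and master tickers are running */
};

/* Models "cpu_disable_scheduler() disables ticks": the tickers are
 * stopped as part of taking the CPU away from the scheduler. */
static void model_cpu_disable_scheduler(struct model_cpu *c)
{
    c->ticks_enabled = false;
    c->idle_inserted = false;
    c->in_pool = false;
}

/* Models the described schedule_cpu_switch() behaviour: ticks are
 * re-enabled only when the CPU is added to a pool, and only after the
 * idle vcpu has been inserted. */
static void model_schedule_cpu_switch(struct model_cpu *c, bool add_to_pool)
{
    assert(!c->ticks_enabled);   /* ticks must already be off on entry */
    if (!add_to_pool)
        return;                  /* CPU stays free: leave ticks off */
    c->in_pool = true;
    c->idle_inserted = true;
    c->ticks_enabled = true;     /* last step: resume ticks */
}
```

In this model a CPU that has been disabled and not yet assigned to a pool
never has running tickers, which is the invariant the list describes.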

After some thousand iterations the machine hung. After I dumped the Dom0
registers to the console it continued running, then crashed about a second later:

(XEN) cpupool_unassign_cpu(pool=0,cpu=9)
(XEN) cpupool_unassign_cpu(pool=0,cpu=9) ffff83083fff74c0
(XEN) cpupool_unassign_cpu ret=0
(XEN) cpupool_unassign_cpu(pool=0,cpu=4)
(XEN) cpupool_unassign_cpu(pool=0,cpu=4) ffff83083fff74c0
(XEN) cpupool_unassign_cpu ret=0
(XEN) cpupool_assign_cpu(pool=1,cpu=9)
(XEN) cpupool_assign_cpu(pool=1,cpu=9) ffff83083002de40
(XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
(XEN) ----[ Xen-4.1.0-rc5-pre  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    9
(XEN) RIP:    e008:[<ffff82c480126100>] active_timer+0xc/0x37
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 0000000000000000
(XEN) rdx: ffff830839d8ff18   rsi: 0000010dbb628a80   rdi: ffff83083ffbcf98
(XEN) rbp: ffff830839d8fd50   rsp: ffff830839d8fd50   r8:  ffff83083ffbcf90
(XEN) r9:  ffff82c480213680   r10: 00000000ffffffff   r11: 0000000000000010
(XEN) r12: ffff82c4802d3f80   r13: ffff82c4802d3f80   r14: ffff83083ffbcf98
(XEN) r15: ffff83083ffbcfc0   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 000000007809c000   cr2: 0000000000620048
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff830839d8fd50:
(XEN)    ffff830839d8fda0 ffff82c480126ef9 0000000000000000 0000010dbb628a80
(XEN)    0000000000000086 0000000000000009 ffff83083002de40 ffff83083002dd50
(XEN)    0000000000000009 0000000000000009 ffff830839d8fdc0 ffff82c480117906
(XEN)    ffff83083ffa3b40 ffff83083ffa5d70 ffff830839d8fe30 ffff82c4801214fa
(XEN)    ffff83083002dd00 0000000900000100 0000000000000286 ffff8300780da000
(XEN)    ffff83083ffbcf80 ffff83083ffbcf90 ffff82c480247e00 0000000000000009
(XEN)    00000000fffffff0 ffff83083002dd00 0000000000000000 ffff8300781cc198
(XEN)    ffff830839d8fe60 ffff82c4801019ff 0000000000000009 0000000000000009
(XEN)    ffff8300781cc198 ffff830839d990d0 ffff830839d8fe80 ffff82c480101bd9
(XEN)    ffff83107e80c5b0 ffff8300781cc000 ffff830839d8fea0 ffff82c480104f21
(XEN)    0000000000000009 ffff830839d990e0 ffff830839d8fee0 ffff82c480125b6c
(XEN)    ffff82c48024a020 ffff830839d8ff18 ffff82c48024a020 ffff830839d8ff18
(XEN)    ffff830839d99060 ffff830839d99040 ffff830839d8ff10 ffff82c48015645a
(XEN)    0000000000000000 ffff8300780da000 ffff8300780da000 ffffffffffffffff
(XEN)    ffff830839d8fe00 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffffffff8062bda0 ffff880fbb1e5fd8 0000000000000246
(XEN)    0000000000000000 000000010003347d 0000000000000000 0000000000000000
(XEN)    ffffffff800033aa 00000000deadbeef 00000000deadbeef 00000000deadbeef
(XEN)    0000010000000000 ffffffff800033aa 000000000000e033 0000000000000246
(XEN)    ffff880fbb1e5f08 000000000000e02b 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c480126100>] active_timer+0xc/0x37
(XEN)    [<ffff82c480126ef9>] set_timer+0x102/0x218
(XEN)    [<ffff82c480117906>] csched_tick_resume+0x53/0x75
(XEN)    [<ffff82c4801214fa>] schedule_cpu_switch+0x1f1/0x25c
(XEN)    [<ffff82c4801019ff>] cpupool_assign_cpu_locked+0x61/0xd6
(XEN)    [<ffff82c480101bd9>] cpupool_assign_cpu_helper+0x9f/0xcd
(XEN)    [<ffff82c480104f21>] continue_hypercall_tasklet_handler+0x51/0xc3
(XEN)    [<ffff82c480125b6c>] do_tasklet+0xe1/0x155
(XEN)    [<ffff82c48015645a>] idle_loop+0x5f/0x67
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 9:
(XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
(XEN) ****************************************
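
As a reading aid for the assertion text above, here is a minimal sketch of a
status range check of that shape. The enum values, struct model_timer, and
the model_* functions are assumptions made for illustration; they are not
copied from Xen's timer code, and the sketch does not claim to identify the
cause of this crash. It only shows that, if the lowest status value is
reserved for "invalid", a zero-filled timer that was never initialised fails
such a check while an initialised one passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed status values for illustration; not taken from Xen's headers. */
enum model_timer_status {
    MODEL_STATUS_invalid = 0,   /* e.g. zeroed memory, never initialised */
    MODEL_STATUS_inactive,      /* not in use, may be activated */
    MODEL_STATUS_killed,        /* not in use, may not be activated */
    MODEL_STATUS_in_heap,       /* queued */
    MODEL_STATUS_in_list        /* queued on an overflow list */
};

struct model_timer {
    unsigned char status;
};

/* Range check of the same shape as the failed assertion: the status
 * must lie between "inactive" and "in_list" inclusive. */
static bool model_status_in_range(const struct model_timer *t)
{
    return t->status >= MODEL_STATUS_inactive &&
           t->status <= MODEL_STATUS_in_list;
}

/* Hypothetical initialiser: clears the timer and marks it inactive. */
static void model_init_timer(struct model_timer *t)
{
    memset(t, 0, sizeof(*t));
    t->status = MODEL_STATUS_inactive;
}
```

Under these assumptions, arming a timer whose memory was zeroed or reused
without going through the initialiser would trip the check.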


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
