xen-devel

Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split

To: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
Date: Wed, 09 Feb 2011 14:04:08 +0100
Cc: Andre Przywara <andre.przywara@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Diestelhorst, Stephan" <Stephan.Diestelhorst@xxxxxxx>
On 02/09/11 13:27, George Dunlap wrote:
Sorry, forgot the patch...
  -G

On Wed, Feb 9, 2011 at 12:27 PM, George Dunlap
<George.Dunlap@xxxxxxxxxxxxx>  wrote:
On Tue, Feb 8, 2011 at 4:33 PM, Andre Przywara <andre.przywara@xxxxxxx> wrote:
(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24
(XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 24
(XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 24
(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 25
(XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 25
(XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 25
(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 26
(XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 26
(XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 26
(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 27
(XEN) cpu_disable_scheduler: Migrating d0v24 from cpu 27
(XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 27
(XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 27
(XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 28
(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 28
(XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 28
(XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 28
(XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 28
(XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 29

Interesting -- what seems to happen here is that as cpus are disabled,
vcpus are "shovelled" in an accumulative fashion from one cpu to the
next:
* v18,34,42 start on cpu 24.
* When 24 is brought down, they're all migrated to 25; then when 25 is
brought down, to 26, then to 27
* v24 is running on cpu 27, so when 27 is brought down, v24 is added to the mix
* v3 is running on cpu 28, so all of them plus v3 are shoveled onto cpu 29.
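
The accumulation described in the list above can be reproduced with a toy model. This is only a sketch: it assumes each displaced vcpu lands on the next-higher pcpu, and `toy_disable_cpu` and the `vcpu_cpu` array are hypothetical names, not Xen code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: vcpu_cpu[i] is the pcpu that vcpu i sits on. */

/*
 * Take `cpu` offline by moving every vcpu found on it to cpu + 1.
 * Returns the number of vcpus that had to be migrated.
 * (Assumption: the real scheduler picks the target; "next cpu" is
 * only what the summary above observes.)
 */
static int toy_disable_cpu(int *vcpu_cpu, size_t nr_vcpus, int cpu)
{
    int migrated = 0;

    for (size_t i = 0; i < nr_vcpus; i++) {
        if (vcpu_cpu[i] == cpu) {
            vcpu_cpu[i] = cpu + 1;   /* shovel onto the neighbour */
            migrated++;
        }
    }
    return migrated;
}
```

Starting with three vcpus on cpu 24, one on 27 and one on 28, and disabling cpus 24 through 28 in order, each call migrates the whole pile collected so far plus whatever was already on that cpu.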

While that behavior may not be ideal, it should certainly be bug-free.

Another interesting thing to note is that the bug happened on pcpu 32,
but there were no advertised migrations from that cpu.

If I understand the configuration of Andre's machine correctly, pcpu 32 will
be the target of the next migrations. This pcpu is a member of the next NUMA
node, correct?

Could it be there is a problem with the call of domain_update_node_affinity()
from cpu_disable_scheduler() ?
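
For context, a rough model of what a node-affinity update computes. It assumes that a domain's node affinity is the set of nodes owning at least one cpu in the union of its vcpus' affinity masks. The types and the function `toy_node_affinity` are hypothetical stand-ins, not the hypervisor's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: one bit per pcpu (at most 64), one bit per NUMA node. */
typedef uint64_t toy_cpumask_t;
typedef uint32_t toy_nodemask_t;

/*
 * Derive a node mask: node n is included when any of its cpus
 * appears in the union of the vcpus' affinity masks.
 */
static toy_nodemask_t toy_node_affinity(const toy_cpumask_t *vcpu_affinity,
                                        size_t nr_vcpus,
                                        const toy_cpumask_t *node_cpus,
                                        size_t nr_nodes)
{
    toy_cpumask_t all = 0;
    toy_nodemask_t nodes = 0;

    assert(nr_nodes <= 32);

    /* Union of every vcpu's allowed cpus. */
    for (size_t i = 0; i < nr_vcpus; i++)
        all |= vcpu_affinity[i];

    /* A node counts if it shares at least one cpu with that union. */
    for (size_t n = 0; n < nr_nodes; n++)
        if (node_cpus[n] & all)
            nodes |= (toy_nodemask_t)1 << n;

    return nodes;
}
```

In this model, a vcpu whose affinity mask still names a cpu that is being removed keeps that cpu's node in the result, which is the kind of stale state the question above is probing.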

Hmm, I think this could really be the problem.
Andre, could you try the following patch?

diff -r f1fac30a531b xen/common/schedule.c
--- a/xen/common/schedule.c     Wed Feb 09 08:58:11 2011 +0000
+++ b/xen/common/schedule.c     Wed Feb 09 14:02:12 2011 +0100
@@ -491,6 +491,10 @@ int cpu_disable_scheduler(unsigned int c
                         v->domain->domain_id, v->vcpu_id);
                 cpus_setall(v->cpu_affinity);
                 affinity_broken = 1;
+            }
+            if ( cpus_weight(v->cpu_affinity) < NR_CPUS )
+            {
+                cpu_clear(cpu, v->cpu_affinity);
             }

             if ( v->processor == cpu )
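
The effect of the added hunk can be tried outside the hypervisor with plain bitmasks. This is a minimal sketch, not the Xen cpumask API: `TOY_NR_CPUS`, `toy_weight` and `toy_drop_cpu` are hypothetical stand-ins for NR_CPUS, cpus_weight() and the new cpu_clear() call.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 64            /* hypothetical; stands in for NR_CPUS */
typedef uint64_t toy_cpumask_t;   /* one bit per pcpu */

/* Count the set bits in a mask (stand-in for cpus_weight()). */
static unsigned int toy_weight(toy_cpumask_t m)
{
    unsigned int n = 0;

    for (; m; m &= m - 1)   /* clear the lowest set bit each round */
        n++;
    return n;
}

/*
 * Model of the added hunk: when the affinity mask is restricted
 * (fewer than all cpus set), drop the cpu being disabled from it.
 * A mask with every bit set is left untouched.
 */
static toy_cpumask_t toy_drop_cpu(toy_cpumask_t affinity, unsigned int cpu)
{
    assert(cpu < TOY_NR_CPUS);

    if (toy_weight(affinity) < TOY_NR_CPUS)
        affinity &= ~((toy_cpumask_t)1 << cpu);
    return affinity;
}
```

With this model, a vcpu pinned to cpus 0 and 1 ends up pinned to cpu 1 alone once cpu 0 is disabled, while a vcpu whose mask was just reset by cpus_setall() keeps its full mask.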


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
