WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

[Xen-devel] A question about changeset 20621:f9392f6eda79 and Discontinu

To: "andre.przywara@xxxxxxx" <andre.przywara@xxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Subject: [Xen-devel] A question about changeset 20621:f9392f6eda79 and Discontinuous online node
From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Date: Wed, 6 Jan 2010 16:14:32 +0800
Accept-language: en-US
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 06 Jan 2010 00:16:28 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcqOqEbbnXIv78APQI2JXa+pIXFSLw==
Thread-topic: A question about changeset 20621:f9392f6eda79 and Discontinuous online node

Hi Andre and Keir,

We hit a divide-by-zero bug in xend when creating a guest. After checking the
code, it seems to be caused by changeset 20621 (see below for the patch). The
changeset removes the check for len(info['node_to_cpu'][i]) > 0 before
nodeload[i] = int(nodeload[i] * 16 / len(info['node_to_cpu'][i])), so the
division fails if a node has no CPU populated.
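
A minimal sketch of the guard that the changeset dropped, for illustration only.
The function name rank_nodes is hypothetical; nodeload, node_to_cpu and
node_list mirror the names in the patch quoted at the end of this mail, and
sys.maxsize stands in for the Python 2 sys.maxint that xend uses.

```python
import sys

def rank_nodes(nodeload, node_to_cpu, node_list):
    """Return node indices sorted from least to most loaded.

    Nodes with no CPUs, or nodes not in node_list, are pushed to the end
    instead of being divided by a zero CPU count.
    """
    scores = []
    for i, load in enumerate(nodeload):
        ncpus = len(node_to_cpu[i])
        if ncpus > 0 and i in node_list:
            # Normalise the load by the number of CPUs on the node.
            scores.append(int(load * 16 / ncpus))
        else:
            # No CPUs (or node not allowed): never prefer this node.
            scores.append(sys.maxsize)
    return [i for i, _ in sorted(enumerate(scores), key=lambda x: x[1])]

# Node 1 has no CPUs; it is ranked last rather than raising ZeroDivisionError.
print(rank_nodes([4, 0, 2], [[0, 1], [], [2, 3]], [0, 1, 2]))
```

This is roughly the pre-20621 behaviour; it avoids the crash but does not
address the node numbering problem described next.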

A closer look at the code reveals a problem that goes beyond this changeset. As
far as I understand the code, the Xen API and the control panel currently
assume that the online nodes are always contiguous. The XEN_SYSCTL_physinfo
hypercall returns only the number of online nodes, and control panel code such
as tools/python/xen/xend/XendDomainInfo.py iterates over nodes 0 to
nr_nodes - 1.

However, this is not always true, for example when no memory is populated
behind some socket. Consider a NUMA system with 4 pxm domains where pxm 0 and 2
have both CPU and memory populated, while pxm 1 and 3 have only CPUs. The Xen
hypervisor will set up the pxm-to-node mapping for all 4 domains (assuming pxm
maps 1:1 to node), but only nodes 0 and 2 are online (as far as I understand,
under the current memory allocation mechanism only nodes with memory populated
are online).

When the control panel calls XEN_SYSCTL_physinfo, it gets nr_nodes as 2 and
currently assumes that nodes 0 and 1 are online. This is wrong and may cause
various issues.
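
The mismatch can be sketched as follows. This is an illustration under stated
assumptions, not xend code: online_map is a hypothetical per-node online flag
list (physinfo does not return such a thing today), and both helper names are
made up for the example.

```python
def assumed_online(nr_nodes):
    """What the tools assume today: nodes 0..nr_nodes-1 are online."""
    return list(range(nr_nodes))

def actual_online(online_map):
    """Node IDs whose entry in a (hypothetical) per-node online map is True."""
    return [node for node, online in enumerate(online_map) if online]

# Example from the text: 4 nodes, only nodes 0 and 2 have memory.
online_map = [True, False, True, False]
nr_nodes = sum(online_map)          # the count physinfo reports: 2

print(assumed_online(nr_nodes))     # [0, 1]
print(actual_online(online_map))    # [0, 2]
```

With only the count available, the tools have no way to tell that node 1 is
offline and node 2 is online.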

The same contiguity assumption applies to the CPU side. Currently nr_cpus is
returned as num_online_cpus(), which will cause problems if some CPUs are
offlined.

I'm considering whether we can pass this discontinuity information to user
space too, but that requires changing the sysctl interface. Worse, even if we
can change the interface, we may run past the 128-byte limit for the
xen_sysctl hypercall if NR_CPUS (128 today) is changed in the future (struct
xen_sysctl_physinfo is already 104 bytes).
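
A rough size budget, assuming the extra information is carried as an inline
bitmap with one bit per CPU appended to the existing structure. That layout is
an assumption made here for the arithmetic only, and padding and alignment are
ignored. The two constants are the figures quoted above; the helper names are
hypothetical.

```python
SYSCTL_LIMIT = 128      # byte limit for xen_sysctl quoted in the text
PHYSINFO_SIZE = 104     # current size of struct xen_sysctl_physinfo, per the text

def bitmap_bytes(nr_bits):
    """Bytes needed for a bitmap with one bit per CPU, rounded up."""
    return (nr_bits + 7) // 8

def fits(nr_cpus):
    """True if physinfo plus an inline CPU bitmap stays within the limit."""
    return PHYSINFO_SIZE + bitmap_bytes(nr_cpus) <= SYSCTL_LIMIT

for nr_cpus in (128, 192, 256):
    total = PHYSINFO_SIZE + bitmap_bytes(nr_cpus)
    print(nr_cpus, total, fits(nr_cpus))
```

Under these assumptions, 128 CPUs need 16 extra bytes (120 in total), while
256 CPUs need 32 (136 in total), which no longer fits.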

I'd like to get some input from you and the community before I try to fix this
issue. Any suggestions?

Thanks
Yunhong Jiang

diff -r a50c1cbf08ec -r f9392f6eda79 tools/python/xen/xend/XendDomainInfo.py
--- a/tools/python/xen/xend/XendDomainInfo.py   Fri Dec 11 08:58:06 2009 +0000
+++ b/tools/python/xen/xend/XendDomainInfo.py   Fri Dec 11 08:59:54 2009 +0000
@@ -2670,10 +2670,9 @@ class XendDomainInfo:
                                     nodeload[i] += 1
                                     break
                 for i in range(0, nr_nodes):
-                    if len(info['node_to_cpu'][i]) > 0 and i in node_list:
-                        nodeload[i] = int(nodeload[i] * 16 / len(info['node_to_cpu'][i]))
-                    else:
-                        nodeload[i] = sys.maxint
+                    nodeload[i] = int(nodeload[i] * 16 / len(info['node_to_cpu'][i]))
+                    if len(info['node_to_cpu'][i]) == 0 or i not in node_list:
+                        nodelist[i] += 8
                 return map(lambda x: x[0], sorted(enumerate(nodeload), key=lambda x:x[1]))

