WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split

To: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
From: Andre Przywara <andre.przywara@xxxxxxx>
Date: Wed, 2 Feb 2011 11:59:40 +0100
Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, "Diestelhorst, Stephan" <Stephan.Diestelhorst@xxxxxxx>
Delivery-date: Wed, 02 Feb 2011 03:03:38 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4D492C7C.5040005@xxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4D41FD3A.5090506@xxxxxxx> <4D426673.7020200@xxxxxxxxxxxxxx> <4D42A35D.3050507@xxxxxxx> <4D42AC00.8050109@xxxxxxxxxxxxxx> <4D42C153.5050104@xxxxxxx> <4D465F0D.4010408@xxxxxxxxxxxxxx> <4D46CE4F.3090003@xxxxxxx> <AANLkTi=ppBtb1nhdfbhGZa0Rt6kVyopdS3iJPr5fVA1x@xxxxxxxxxxxxxx> <4D483599.1060807@xxxxxxx> <4D48F954.5000103@xxxxxxxxxxxxxx> <4D491AB7.40204@xxxxxxxxxxxxxx> <4D492C7C.5040005@xxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 2.0.0.23 (X11/20090820)
Juergen Gross wrote:
Hi Andre,

could you try the attached patch?
It should verify if your problems are due to the master ticker
kicking in at a time when the cpu is already gone from the cpupool.
That's also what we found yesterday. If the timer routine is triggered before the timer is stopped, but is actually _running_ afterwards, this could lead to problems.

Anyway, the hypervisor still crashes, now at a different BUG_ON():

    /* Start off idling... */
    BUG_ON(!is_idle_vcpu(per_cpu(schedule_data, cpu).curr));
    cpu_set(cpu, prv->idlers);

The complete crash dump was this:

(XEN) Xen BUG at sched_credit.c:389
(XEN) ----[ Xen-4.1.0-rc2-pre  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff82c480118020>] csched_alloc_pdata+0x146/0x197
(XEN) RFLAGS: 0000000000010093   CONTEXT: hypervisor
(XEN) rax: ffff830434322000   rbx: ffff830434492478   rcx: 0000000000000018
(XEN) rdx: ffff82c4802d3ec0   rsi: 0000000000000006   rdi: ffff83043445e100
(XEN) rbp: ffff83043456fce8   rsp: ffff83043456fca8   r8:  00000000deadbeef
(XEN) r9:  ffff830434492478   r10: ffff82c48021a1c0   r11: 0000000000000286
(XEN) r12: 0000000000000018   r13: ffff830a3c70c780   r14: ffff830434492460
(XEN) r15: 0000000000000018   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 0000000805bac000   cr2: 00007fbbdaf71116
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff83043456fca8:
(XEN)    ffff83043456fcb8 ffff830a3c70c780 0000000000000286 0000000000000018
(XEN)    ffff830434492550 ffff830a3c70c690 0000000000000018 ffff82c4802b0880
(XEN)    ffff83043456fd58 ffff82c48011fbb3 ffff82f601020900 0000000000081048
(XEN)    ffff8300c7e42000 0000000000000000 0000800000000000 ffff82c480249000
(XEN)    0000000000000002 0000000000000018 ffff830434492550 0000000000305000
(XEN)    ffff82c4802550e4 ffff82c4802b0880 ffff83043456fd78 ffff82c48010188c
(XEN)    ffff83043456fe40 0000000000000018 ffff83043456fdb8 ffff82c480101b94
(XEN)    ffff83043456fdb8 ffff82c48018380a fffffffe00000286 ffff83043456ff18
(XEN)    0000000001669004 0000000000305000 ffff83043456fef8 ffff82c4801253c1
(XEN)    ffff83043456fde8 ffff8300c7ac0000 0000000000000000 0000000000000246
(XEN)    ffff83043456fe18 ffff82c480106c7f ffff830434577100 ffff8300c7ac0000
(XEN)    ffff83043456fe28 ffff82c480125de4 0000000000000003 ffff82c4802d3f80
(XEN)    ffff83043456fe78 0000000000000282 0000000800000012 0000000400000004
(XEN)    0000000000000000 ffffffff00000018 0000000000000000 00007f7e6a549a00
(XEN)    0000000001669000 0000000000000000 0000000000000000 ffffffffffffffff
(XEN)    0000000000000000 0000000000000080 000000000000002f 0000000001669004
(XEN)    000000000166b004 000000000166d004 00007fffa59ff250 0000000000000033
(XEN)    ffff83043456fed8 ffff8300c7ac0000 00007fffa59ff100 0000000000305000
(XEN)    0000000000000003 0000000000000003 00007cfbcba900c7 ffff82c480207ee8
(XEN)    ffffffff8100946a 0000000000000023 0000000000000003 0000000000000003
(XEN) Xen call trace:
(XEN)    [<ffff82c480118020>] csched_alloc_pdata+0x146/0x197
(XEN)    [<ffff82c48011fbb3>] schedule_cpu_switch+0x75/0x1cd
(XEN)    [<ffff82c48010188c>] cpupool_assign_cpu_locked+0x44/0x8b
(XEN)    [<ffff82c480101b94>] cpupool_do_sysctl+0x1fb/0x461
(XEN)    [<ffff82c4801253c1>] do_sysctl+0x921/0xa30
(XEN)    [<ffff82c480207ee8>] syscall_enter+0xc8/0x122
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) Xen BUG at sched_credit.c:389
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...


Regards,
Andre.


I'm not sure the patch is complete - disabling the master ticker
in csched_tick_suspend might lead to problems with C-states. The
functionality is different, at least.

George, do you think this is correct?


Juergen

On 02/02/11 09:49, Juergen Gross wrote:
On 02/02/11 07:27, Juergen Gross wrote:
On 02/01/11 17:32, Andre Przywara wrote:
Hi folks,

I asked Stephan Diestelhorst for help, and after I convinced him that
removing credit and making SEDF the default again is not an option, he
worked on this together with me ;-) Many thanks for that!
We haven't come to a final solution but could gather some debug data.
I will simply dump some data here, maybe somebody has got a clue. We
will work further on this tomorrow.

First I replaced the BUG_ON with some printks to get some insight:
(XEN) sdom->active_vcpu_count: 18
(XEN) sdom->weight: 256
(XEN) weight_left: 4096, weight_total: 4096
(XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
(XEN) Xen BUG at sched_credit.c:591
(XEN) ----[ Xen-4.1.0-rc2-pre x86_64 debug=y Not tainted ]----

So this shows that the number of VCPUs is out of sync with the
computed weight sum; we have seen a difference of one or two VCPUs (in
this case the weight was computed from 16 VCPUs). It also shows that
the assertion fires in the first iteration of the loop, where
weight_left and weight_total are still equal.

So I additionally instrumented alloc_pdata and free_pdata, the
unprefixed lines come from a shell script mimicking the functionality of
cpupool-numa-split.
------------
Removing CPUs from Pool 0
Creating new pool
Using config file "cpupool.test"
cpupool name: Pool-node6
scheduler: credit
number of cpus: 1
(XEN) adding CPU 36, now 1 CPUs
(XEN) removing CPU 36, remaining: 17
Populating new pool
(XEN) sdom->active_vcpu_count: 9
(XEN) sdom->weight: 256
(XEN) weight_left: 2048, weight_total: 2048
(XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
(XEN) adding CPU 37, now 2 CPUs
(XEN) removing CPU 37, remaining: 16
(XEN) adding CPU 38, now 3 CPUs
(XEN) removing CPU 38, remaining: 15
(XEN) adding CPU 39, now 4 CPUs
(XEN) removing CPU 39, remaining: 14
(XEN) adding CPU 40, now 5 CPUs
(XEN) removing CPU 40, remaining: 13
(XEN) sdom->active_vcpu_count: 17
(XEN) sdom->weight: 256
(XEN) weight_left: 4096, weight_total: 4096
(XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
(XEN) adding CPU 41, now 6 CPUs
(XEN) removing CPU 41, remaining: 12
...
Two things startled me:
1) There is quite some delay between the "Removing CPUs" message from
the script and the actual HV printk showing it's done; why is that not
synchronous?
Removing cpus from Pool-0 requires no switching of the scheduler, so you
see no calls of alloc/free_pdata here.

Looking at the code, it shows that __csched_vcpu_acct_start() is
eventually triggered by a timer; shouldn't it be triggered
synchronously by add/remove events?
The vcpus are not moved explicitly, they are migrated by the normal
scheduler mechanisms, same as for vcpu-pin.

2) It clearly shows that each CPU gets added to the new pool _before_
it gets removed from the old one (Pool-0); isn't that violating the
"only one pool per CPU" rule? Even if that is fine for a short period
of time, maybe the timer kicks in at this very moment, resulting in
violated invariants?
The sequence you are seeing seems to be okay. The alloc_pdata for the
new pool is called before the free_pdata for the old pool.

And the timer is not relevant, as only the idle vcpu should be
running on the moving cpu, and the accounting stuff is never called
during idle.
Uhh, this could be wrong!
The normal ticker doesn't call accounting in idle and it is stopped
during the cpu move. The master_ticker, however, is perhaps handled
wrongly. I'll check this and prepare a patch if necessary.


Juergen




--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel