[Xen-devel] Re: DOM0 Hang on a large box....
>>> On 01.09.11 at 21:20, Mukesh Rathor <mukesh.rathor@xxxxxxxxxx> wrote:
> I'm looking at a system hang on a large box: 160 cpus, 2TB. Dom0 is
> booted with 160 vcpus (don't ask me why :)), and an HVM guest is started
> with over 1.5T RAM and 128 vcpus. The system hangs without much activity
> after a couple of hours. Xen 4.0.2 and a 2.6.32-based 64-bit dom0.
>
> During hang I discovered:
>
> Most of dom0 vcpus are in double_lock_balance spinning on one of the locks:
>
> @ ffffffff800083aa: 0:hypercall_page+3aa pop %r11
> @ ffffffff802405eb: 0:xen_spin_wait+19b test %eax, %eax
> @ ffffffff8035969b: 0:_spin_lock+10b test %al, %al
> @ ffffffff800342f5: 0:double_lock_balance+65 mov %rbx, %rdi
> @ ffffffff80356fc0: 0:thread_return+37e mov 0x880(%r12), %edi
>
> static int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
>         __releases(this_rq->lock)
>         __acquires(busiest->lock)
>         __acquires(this_rq->lock)
> {
>         int ret = 0;
>
>         if (unlikely(!spin_trylock(&busiest->lock))) {
>                 if (busiest < this_rq) {
>                         spin_unlock(&this_rq->lock);
>                         spin_lock(&busiest->lock);
>                         spin_lock_nested(&this_rq->lock,
>                                          SINGLE_DEPTH_NESTING);
>                         ret = 1;
>                 } else
>                         spin_lock_nested(&busiest->lock,
>                                          SINGLE_DEPTH_NESTING);
>         }
>         return ret;
> }
>
>
> The lock is taken, but not sure who the owner is. The lock struct:
>
> @ ffff8800020e2480: 2f102e70 0000000c 00000002 00000000
>
> so slock is: 2f102e70
>
> The remaining vcpus are idling:
>
> ffffffff800083aa: 0:hypercall_page+3aa pop %r11
> ffffffff8000f0c7: 0:xen_safe_halt+f7 addq $0x18, %rsp
> ffffffff8000a5c5: 0:cpu_idle+65 jmp 0:cpu_idle+4e
> ffffffff803558fe: 0:cpu_bringup_and_idle+e leave
>
> But the baffling thing is the vcpu upcall mask is set. The block schedop
> call does local_event_delivery_enable() first thing, so the mask should be
> clear!!!
>
>
> Another baffling thing is the dom0 upcall mask looks fishy:
> @ ffff83007f4dba00: 4924924924924929 2492492492492492
> @ ffff83007f4dba10: 9249249249249249 4924924924924924
> @ ffff83007f4dba20: 2492492492492492 9249249249249249
> @ ffff83007f4dba30: 4924924924924924 0000000092492492
> @ ffff83007f4dba40: 0000000000000000 0000000000000000
> @ ffff83007f4dba50: 0000000000000000 ffffffffc0000000
> @ ffff83007f4dba60: ffffffffffffffff ffffffffffffffff
> @ ffff83007f4dba70: ffffffffffffffff ffffffffffffffff
>
>
> Finally, ticketing is used for spin locks. Hi Jan, what is the largest
> system this was tested on? Have you seen this before?
From the observation of most CPUs sitting in _double_lock_balance()
I would have answered yes, but the odd upcall mask I don't recall
having seen. In any case - is your Dom0 kernel (presumably derived
from ours) up-to-date? That problem I recall was fixed months ago.
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel