On 29.08.2011 22:59, Konrad Rzeszutek Wilk wrote:
> Ok, but I am still unsure where it is hanging in DomU. Can you run with
> 'console=hvc0 debug initcall_debug loglevel=8 earlyprintk=xen' to get an idea
> of what is stuck in the guest?
With "initcall_debug" parameter problem does not appear (at least for
200 domU starts)... It looks like race condition which doesn't happens
on slowed down kernel (by printing lots of debug info). This also
explains why this bug appears only on fast hardware.
> You might also have better luck using
> 'xenctx' to get a stack trace of what is hanging in the guest.
> (you will need the System.map file from the guest's kernel.. but that should
> be fairly easy to extract).
xenctx didn't provide any useful data :/ It always shows the following trace
for a hung domU:
-----------------
rip: ffffffff810013aa hypercall_page+0x3aa
flags: 00001246 i z p
rsp: ffffffff81801ee0
rax: 0000000000000000 rcx: ffffffff810013aa rdx: 0000000000000000
rbx: ffffffff81800010 rsi: 00000000deadbeef rdi: 00000000deadbeef
rbp: ffffffff81801ef8 r8: 0000000000000000 r9: 0000000000000000
r10: 0000000000000000 r11: 0000000000000246 r12: 0000000000000000
r13: 0000000000000000 r14: ffffffffffffffff r15: 0000000000000000
cs: e033 ss: e02b ds: 0000 es: 0000
fs: 0000 @ 0000000000000000
gs: 0000 @ ffff880018ee7000/0000000000000000
Code (instr addr ffffffff810013aa)
cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b
59 c3 cc cc cc cc cc cc cc
Stack:
0000000000000000 0000000000000000 ffffffff810072a0 ffffffff81801f18
ffffffff81012528 ffffffff81800010 ffffffff8185a2a0 ffffffff81801f38
ffffffff81009faf 0000000000000000 6db6db6db6db6db7 ffffffff81801f48
ffffffff813fb388 ffffffff81801f88 ffffffff81875c79 ffffffff81801f88
Call Trace:
[<ffffffff810013aa>] hypercall_page+0x3aa <--
[<ffffffff810072a0>] xen_safe_halt+0x10
[<ffffffff81012528>] default_idle+0x58
[<ffffffff81009faf>] cpu_idle+0x5f
[<ffffffff813fb388>] rest_init+0x68
[<ffffffff81875c79>] start_kernel+0x36f
[<ffffffff81875346>] x86_64_start_reservations+0x131
[<ffffffff81878245>] xen_start_kernel+0x5f1
------------------
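For reference, I invoked xenctx roughly like this (the binary location,
System.map path and domain ID below are placeholders from my setup; the only
option that matters here is -s, which points xenctx at the guest's symbol
table):

  /usr/lib/xen/bin/xenctx -s /path/to/guest/System.map <domid>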
I've collected a few more messages from successful and failed domU starts
(attached). The only differences are the place where "Switched to NOHz mode
on CPU #0" appears, and the presence of the "CE: xen increased min_delta_ns
to ..." and "CE: Reprogramming failure. Giving up" messages.
I think it may be related to:
http://lists.xensource.com/archives/html/xen-devel/2010-07/msg00649.html
(that case was HVM, not PV, but it looks similar).
I've also tried 'xenpm set-max-cstate 0' and tsc_mode=1 in the domU config,
but neither helps. Pinning the vcpu doesn't help either (these domUs have
only 1 vcpu). Is 'xenpm set-max-cstate 0' the same as booting Xen with
max_cstate=0?
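For reference, a rough sketch of the relevant part of the domU config (xm
syntax); only the tsc_mode and vcpu-pinning lines are the settings mentioned
above, the other values are just illustrative:

  name     = "netvm"
  memory   = 400
  vcpus    = 1
  cpus     = "1"        # vcpu pinning that was tried (did not help)
  tsc_mode = 1          # 1 = always emulate TSC (did not help either)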
--
Pozdrawiam / Best Regards,
Marek Marczykowski | RLU #390519
marmarek at mimuw edu pl | xmpp:marmarek at staszic waw pl
Attachments: xenctx-out, xenctx-out2, fwvm-fail1, fwvm-fail2, netvm-fail1,
netvm-ok, netvm-ok2, netvm-ok3 (text documents)