WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

xen-devel

Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Subject: Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped
From: Marek Marczykowski <marmarek@xxxxxxxxxxxx>
Date: Wed, 31 Aug 2011 18:27:54 +0200
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Joanna Rutkowska <joanna@xxxxxxxxxxxxxxxxxxxxxx>, Rafal Wojtczuk <rafal@xxxxxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 31 Aug 2011 09:28:42 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4E5D1B57.7040106@xxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4E5A3F0A.8060700@xxxxxxxxxxxx> <20110829200749.GA17265@xxxxxxxxxxxx> <4E5BF4C3.2050108@xxxxxxxxxxxx> <20110829205938.GB18697@xxxxxxxxxxxx> <4E5D1B57.7040106@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18) Gecko/20110621 Fedora/3.1.11-1.fc14 Lightning/1.0b3pre Thunderbird/3.1.11
On 30.08.2011 19:18, Marek Marczykowski wrote:
> On 29.08.2011 22:59, Konrad Rzeszutek Wilk wrote:
>> Ok, but I am still unsure where it is hanging in DomU. Can you run with
>> 'console=hvc0 debug initcall_debug loglevel=8 earlyprintk=xen' to get an idea
>> of what is stuck in the guest? 
> 
> With the "initcall_debug" parameter the problem does not appear (at least
> for 200 domU starts)... It looks like a race condition which doesn't
> happen on a kernel slowed down by printing lots of debug info. This also
> explains why this bug appears only on fast hardware.
> 
>> You might also have better luck using
>> 'xenctx' to get a stack trace of what is hanging in the guest.
>> (you will need the System.map file from the guest's kernel.. but that should
>> be fairly easy to extract).
> 
> xenctx didn't provide any useful data :/ It always shows the following
> trace for a hung domU:
> -----------------
> rip: ffffffff810013aa hypercall_page+0x3aa
> flags: 00001246 i z p
> rsp: ffffffff81801ee0
> rax: 0000000000000000 rcx: ffffffff810013aa   rdx: 0000000000000000
> rbx: ffffffff81800010 rsi: 00000000deadbeef   rdi: 00000000deadbeef
> rbp: ffffffff81801ef8  r8: 0000000000000000    r9: 0000000000000000
> r10: 0000000000000000 r11: 0000000000000246   r12: 0000000000000000
> r13: 0000000000000000 r14: ffffffffffffffff   r15: 0000000000000000
>  cs: e033      ss: e02b        ds: 0000        es: 0000
>  fs: 0000 @ 0000000000000000
>  gs: 0000 @ ffff880018ee7000/0000000000000000
> Code (instr addr ffffffff810013aa)
> cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b
> 59 c3 cc cc cc cc cc cc cc
> 
> 
> Stack:
>  0000000000000000 0000000000000000 ffffffff810072a0 ffffffff81801f18
>  ffffffff81012528 ffffffff81800010 ffffffff8185a2a0 ffffffff81801f38
>  ffffffff81009faf 0000000000000000 6db6db6db6db6db7 ffffffff81801f48
>  ffffffff813fb388 ffffffff81801f88 ffffffff81875c79 ffffffff81801f88
> 
> Call Trace:
>   [<ffffffff810013aa>] hypercall_page+0x3aa  <--
>   [<ffffffff810072a0>] xen_safe_halt+0x10
>   [<ffffffff81012528>] default_idle+0x58
>   [<ffffffff81009faf>] cpu_idle+0x5f
>   [<ffffffff813fb388>] rest_init+0x68
>   [<ffffffff81875c79>] start_kernel+0x36f
>   [<ffffffff81875346>] x86_64_start_reservations+0x131
>   [<ffffffff81878245>] xen_start_kernel+0x5f1
> ------------------
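
A note on reading that trace, as a rough sketch rather than anything authoritative: assuming the usual layout of the hypercall page (one 32-byte stub per hypercall) and the hypercall numbers from Xen's public xen.h, the offset 0x3aa and the "b8 1d 00 00 00" bytes in the code dump both point at hypercall 29 (sched_op). That fits xen_safe_halt blocking the idle vcpu, which would explain why xenctx shows nothing more interesting. The helper name and the partial name table below are made up for illustration:

```python
# Sketch: map an offset inside hypercall_page to a hypercall number.
# Assumes each hypercall stub occupies 32 bytes in the page.
HYPERCALL_STUB_SIZE = 32

# Partial, hand-copied table of hypercall numbers (illustrative only).
HYPERCALL_NAMES = {
    12: "memory_op",
    20: "grant_table_op",
    24: "vcpu_op",
    29: "sched_op",
    32: "event_channel_op",
}


def decode_hypercall_offset(offset):
    """Return (hypercall number, byte offset within that stub)."""
    return divmod(offset, HYPERCALL_STUB_SIZE)


# rip from the xenctx output: hypercall_page+0x3aa
number, within = decode_hypercall_offset(0x3AA)

# Stub bytes from the xenctx code dump, starting at "51" (push %rcx).
stub = bytes.fromhex("51 41 53 b8 1d 00 00 00 0f 05 41 5b 59 c3")

# Byte 3 is opcode b8 (mov imm32 -> %eax); the next four bytes are the
# little-endian immediate, i.e. the hypercall number loaded before syscall.
assert stub[3] == 0xB8
immediate = int.from_bytes(stub[4:8], "little")

print(number, within, immediate, HYPERCALL_NAMES.get(number, "unknown"))
```

The offset within the stub lands just after the 2-byte syscall instruction, so the vcpu is simply sitting inside the hypercall waiting for an event.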
> 
> I've collected a few more messages from successful and failed domU starts.
> The only differences are the place where "Switched to NOHz mode on CPU #0"
> appears and the presence of the "CE: xen increased min_delta_ns to ..." and
> "CE: Reprogramming failure. Giving up" messages.
> 
> I think it can be related to:
> http://lists.xensource.com/archives/html/xen-devel/2010-07/msg00649.html
> (this was on HVM not PV, but looks similar)
> 
> I've also tried xenpm set-max-cstate 0 and tsc_mode=1 in the domU config,
> but it doesn't help. Pinning the vcpu doesn't help either (these domUs have
> only 1 vcpu). Is 'xenpm set-max-cstate 0' the same as booting xen with
> max_cstate=0?

Looks like tsc_mode=2 solves the problem.
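
For anyone who wants to try the same workaround, here is a minimal sketch of where the setting goes in a domU config file (xm config files use Python syntax). Only tsc_mode = 2 and the single vcpu come from this thread; the name, kernel path and memory size are placeholders. As far as I understand docs/misc/tscmode.txt, mode 2 means the TSC is never emulated, while mode 1 (which did not help) means it is always emulated:

```python
# domU config sketch. Placeholder values are marked as such.
name = "example-domu"          # placeholder, not from the thread
kernel = "/path/to/vmlinuz"    # placeholder, not from the thread
memory = 400                   # MB, placeholder
vcpus = 1                      # the affected domUs have a single vcpu
tsc_mode = 2                   # assumed meaning: never emulate TSC
```

This has not been checked against other hardware, so treat it as a workaround to test rather than a confirmed fix.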

-- 
Pozdrawiam / Best Regards,
Marek Marczykowski         | RLU #390519
marmarek at mimuw edu pl   | xmpp:marmarek at staszic waw pl
