WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

Re: [Xen-devel] State of current Xen debugger

To: Roger Cruz <roger.cruz@xxxxxxxxxxxxxxxxxxx>, Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>, Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Subject: Re: [Xen-devel] State of current Xen debugger
From: Keir Fraser <keir@xxxxxxx>
Date: Tue, 28 Sep 2010 18:06:33 +0100
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx

Yeah, but the performance counters are driven by the same LAPIC timesource
that drives the main LAPIC timer.

 -- Keir

On 28/09/2010 16:40, "Roger Cruz" <roger.cruz@xxxxxxxxxxxxxxxxxxx> wrote:

> 
> 
> By the APIC timer?  When I traced this code I was under the impression that it
> is driven by the performance counters counting cycles and generating an
> interrupt when the counter overflows.  I found that this was the routine being
> called to set up the watchdog:
> 
> static void __pminit setup_p6_watchdog(unsigned counter)
> {
>     unsigned int evntsel;
> 
>     nmi_perfctr_msr = MSR_P6_PERFCTR0;  /* <--- register */
> 
>     clear_msr_range(MSR_P6_EVNTSEL0, 2);
>     clear_msr_range(MSR_P6_PERFCTR0, 2);
> 
>     evntsel = P6_EVNTSEL_INT
>         | P6_EVNTSEL_OS
>         | P6_EVNTSEL_USR
>         | counter;
> 
>     wrmsr(MSR_P6_EVNTSEL0, evntsel, 0);
>     write_watchdog_counter("P6_PERFCTR0");
>     apic_write(APIC_LVTPC, APIC_DM_NMI);
>     evntsel |= P6_EVNTSEL0_ENABLE;
>     wrmsr(MSR_P6_EVNTSEL0, evntsel, 0);
> }
> 
> and then, during the NMI tick handler, this path was executed:
> 
>         else if ( nmi_perfctr_msr == MSR_P6_PERFCTR0 )
>         {
>             /*
>              * Only P6 based Pentium M need to re-unmask the apic vector but
>              * it doesn't hurt other P6 variants.
>              */
>             apic_write(APIC_LVTPC, APIC_DM_NMI);
>         }
>         write_watchdog_counter(NULL);
> 
> 
> 
> static inline void write_watchdog_counter(const char *descr)
> {
>     u64 count = (u64)cpu_khz * 1000;
> 
>     do_div(count, nmi_hz);
>     if(descr)
>         Dprintk("setting %s to -0x%08Lx\n", descr, count);
>     wrmsrl(nmi_perfctr_msr, 0 - count);
> }
> 
> 
> It is also my understanding that during the CPU C3 state transition in
> cpu_idle.c, the APIC timer is turned off.  See the comments below:
> 
>         /*
>          * Before invoking C3, be aware that TSC/APIC timer may be
>          * stopped by H/W. Without carefully handling of TSC/APIC stop issues,
>          * deep C state can't work correctly.
>          */
>         /* preparing APIC stop */
>         lapic_timer_off();  /* <--- APIC timer appears to be turned off here */
> 
>         /* Get start time (ticks) */
>         t1 = inl(pmtmr_ioport);
>         /* Trace cpu idle entry */
>         TRACE_2D(TRC_PM_IDLE_ENTRY, cx->idx, t1);
>         /* Invoke C3 */
>         acpi_idle_do_entry(cx);
>         /* Get end time (ticks) */
>         t2 = inl(pmtmr_ioport);
> 
>         /* recovering TSC */
>         cstate_restore_tsc();  /* <--- our backport of an unstable patch
>                                 *      to keep the TSCs synchronized */
>         /* Trace cpu idle exit */
> 
> 
> Thanks Keir!
> 
> Roger
> 
> -----Original Message-----
> From: Keir Fraser on behalf of Keir Fraser
> Sent: Tue 9/28/2010 11:30 AM
> To: Roger Cruz; Dan Magenheimer; Tim Deegan
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] State of current Xen debugger
> 
> On 28/09/2010 16:21, "Roger Cruz" <roger.cruz@xxxxxxxxxxxxxxxxxxx> wrote:
> 
>> I am still chasing this hard hang in our system with a modified 3.4.2 Xen.  I
>> have upgraded the BIOS and the problem still exists.  The only thing that had
>> appeared to work so far was adding max_cstate=0, but now I have a report
>> that it still hung for one customer who had that flag enabled.  The rest of
>> them have been running successfully for more than a week with this
>> "work-around".  I have isolated the problem to Lenovo machines with Centrino
>> processors.  These guys will stop the TSC when in C3.
>> 
>> What I really need to understand is why the NMI watchdog in Xen is not
>> working and triggering a crash when the CPU hangs.  I was under the
>> impression that NMIs couldn't be masked at all.  Is there any way that Xen
>> could be disabling or changing that behavior?  I know the NMI is being driven
>> by a timer set in the NMI handler.  Could there be a case where this timer is
>> disabled?  Any ideas are welcome!
> 
> The NMI counter gets driven by the APIC timer. Perhaps it needs poking
> somehow on wakeup from C3? My suggestion for debugging this would be to take
> a look at what native Linux does. The NMI perfctr poking logic was all taken
> from (rather old now) upstream Linux.
> 
>  -- Keir
> 
>> Thanks
>> Roger R. Cruz
>>
>> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Roger Cruz
>> Sent: Tuesday, September 14, 2010 11:55 AM
>> To: Dan Magenheimer; Tim Deegan
>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: RE: [Xen-devel] State of current Xen debugger
>> 
>> Hi Dan,
>> 
>> I am using 3.4.2, to which we have made very minor modifications (some
>> backports, for example).
>> 
>> I have not tried your suggestions yet, so I will do that next.  Thanks!
>> 
>> R.
>> 
>> -----Original Message-----
>> From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
>> Sent: Tue 9/14/2010 11:19 AM
>> To: Roger Cruz; Tim Deegan
>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: RE: [Xen-devel] State of current Xen debugger
>> 
>> A couple of thoughts:
>>
>> Have you tried max_cstate=0 (as a Xen boot option)?
>>
>> Also, you didn't say what version of Xen you are using, but playing around
>> with hpet_broadcast (enabling it or force-disabling it as below) might be
>> worth a try.
>>
>> http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html
>>
>> From: Roger Cruz [mailto:roger.cruz@xxxxxxxxxxxxxxxxxxx]
>> Sent: Tuesday, September 14, 2010 8:56 AM
>> To: Tim Deegan
>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: RE: [Xen-devel] State of current Xen debugger
>>
>> Hi Tim, good to hear from you again.
>>
>> I had a pretty good inkling that one of you hardcore developers would say
>> that :-)  Yes, it is pretty well wedged.  I can cause the problem more
>> rapidly by dropping to a single CPU.  When the hang happens, the Xen console
>> is completely dead.  None of the special keys work.
>>
>> I do have hopes that a BIOS upgrade could fix this as a last resort, but I
>> want to see if I can at least understand the problem.  We have a few
>> different machines that are exhibiting similar symptoms, so I have to see if
>> I can find a work-around without requiring every user to upgrade their
>> BIOS :-(
>> 
>> Just in case, what debugger have you been using?  Are there recent
>> instructions on how to set it up that you can point me to?
>> 
>> Thanks
>> Roger
>> 
>> 
>> -----Original Message-----
>> From: Tim Deegan [mailto:Tim.Deegan@xxxxxxxxxx]
>> Sent: Tue 9/14/2010 10:30 AM
>> To: Roger Cruz
>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: Re: [Xen-devel] State of current Xen debugger
>> 
>> Hi,
>> 
>> At 15:22 +0100 on 14 Sep (1284477779), Roger Cruz wrote:
>>> I am trying to debug a problem where the hypervisor is hanging hard.
>>> Not even the NMI watchdog is triggering a reboot.  So I wanted to hook
>>> up a debugger.
>> 
>> Sorry to bring a counsel of despair but if the NMI watchdog isn't
>> working then your chances of getting a working debugger are slim.  It's
>> likely that at least one CPU is very very stuck.  Does the 'd' debug key
>> work on the serial line when the machine is wedged?
>> 
>> On a more cheerful note, I've twice seen hard hangs like this that
>> turned out to be hardware issues, fixable with BIOS upgrades.
>> 
>> Cheers,
>> 
>> Tim.
>> 
>>> What is the state of the current debuggers out there?
>>> Any input on how I should set one up (kdb, gdb, etc.) and pointers to a
>>> good wiki page would be much appreciated.  I did perform a Google search
>>> and found some links, but I want to hear from the current developers as
>>> to what is most stable and useful for debugging this type of hard
>>> hang.  The only serial port I have is on a PCI Express card, as the
>>> laptop has no built-in port.
>> 
>> --
>> Tim Deegan <Tim.Deegan@xxxxxxxxxx>
>> Principal Software Engineer, XenServer Engineering
>> Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
>> 
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel
> 