xen-devel

RE: [Xen-devel] Xen 4 TSC problems

Hi Xen developers,

I need some good tips to make progress on my TSC problem.

First, the problem in brief:

- The clock jumps 50 minutes forward.
        (xm dmesg)
        (XEN) TSC is reliable, synchronization unnecessary
        (XEN) Platform timer is 14.318MHz HPET
        (XEN)  Platform timer appears to have unexpectedly wrapped 10 or more times

        (syslog)
        Sep 28 17:45:06 dnsit11 kernel: [1970548.356130] Clocksource tsc unstable (delta = -2999660112689 ns)
        Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc unstable (delta = -2999662111513 ns)

- I can't reproduce or force the problem.

- It happens on 2 different HP DL 385 G7 machines running Debian squeeze:
        xen-hypervisor-4.0-amd64                4.0.1-2
        dom0 : linux-image-2.6.32-5-xen-amd64          2.6.32-35
        domUs : 5 to 15 Debian machines
        2 * 12-core AMD Opteron(tm) Processor 6174

- I have had this problem since the beginning of September. Before that,
  the machines had been running for 3 months without problems.
        At the beginning of September I did an upgrade (dom0 and domUs):
        linux-image-2.6.32-5-xen-amd64:amd64 (2.6.32-31, automatic)  -> linux-image-2.6.32-5-xen-amd64:amd64 (2.6.32-31, 2.6.32-35)

- What is strange (I don't know if it is linked to the problem):
        /proc/cpuinfo in dom0 gives me:

        cpu MHz         : 3249880.888
  -- or --
        cpu MHz         : 2300454.255
        ....            (different after each reboot)

        In a domU this value is OK (cpu MHz         : 2200.112), and the
        bogomips is also OK (bogomips        : 4400.21).
        If I boot the machine in a non-Xen environment, the values are
        also OK.
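
A quick arithmetic check on the numbers above, offered as a sketch rather
than a diagnosis: the reported delta is close to 3000 seconds, which matches
the 50-minute jump. If one assumes that the HPET main counter is 32 bits wide
and ticks at the nominal 14.31818 MHz (both are assumptions; the boot log
only says "14.318MHz HPET"), one counter wrap takes about 300 seconds, so the
delta is also close to 10 wraps. Whether that is related to the "wrapped 10
or more times" message is not established here. The helper names are made up
for illustration.

```python
# Sketch only. Assumptions not taken from the mail: the HPET main counter
# is 32 bits wide and ticks at the nominal 14.31818 MHz.
HPET_HZ = 14_318_180
HPET_COUNTER_BITS = 32

# Deltas copied from the two syslog lines quoted above (nanoseconds).
DELTAS_NS = {
    "dnsit11": -2999660112689,
    "dnsit22": -2999662111513,
}


def hpet_wrap_seconds(hz=HPET_HZ, bits=HPET_COUNTER_BITS):
    """Seconds until a free-running counter of `bits` bits overflows."""
    return (1 << bits) / hz


def describe_delta(delta_ns):
    """Return (minutes, wraps) for the magnitude of a clocksource delta."""
    seconds = abs(delta_ns) / 1e9
    return seconds / 60.0, seconds / hpet_wrap_seconds()


if __name__ == "__main__":
    for host, delta_ns in DELTAS_NS.items():
        minutes, wraps = describe_delta(delta_ns)
        print(f"{host}: {minutes:.2f} min, {wraps:.4f} HPET wraps")
```

If the counter width or frequency on this platform differs from the assumed
values, the "wraps" column is meaningless; the minutes column depends only on
the logged delta.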
        
I now have an identical machine on which I can run some tests.

Could you give me some tips on what I could test or implement?
        - Hardware problem? Hypervisor problem? dom0 problem?
        - Try another hypervisor version?
        - Try linux-image-3.0.0-1-amd64 3.0.0-3?
        - Try to reproduce the problem? (How? Log it? ...)
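
For the "log it" idea, one possible sketch: scan the dom0 syslog for the
"Clocksource tsc unstable" message shown earlier and report each delta in
seconds, so that the time of every jump is recorded. The function names are
hypothetical, and the log path is an assumption that depends on the local
syslog configuration.

```python
import re

# Matches the dom0 kernel message quoted earlier in this mail.
TSC_UNSTABLE = re.compile(r"Clocksource tsc unstable \(delta = (-?\d+) ns\)")


def find_tsc_jumps(lines):
    """Yield (line, delta_seconds) for each 'Clocksource tsc unstable' line."""
    for line in lines:
        match = TSC_UNSTABLE.search(line)
        if match:
            yield line.rstrip("\n"), int(match.group(1)) / 1e9


def scan_file(path):
    """Return all matches found in a log file.

    The path is supplied by the caller; /var/log/syslog is only a guess
    at where a Debian dom0 keeps kernel messages.
    """
    with open(path, errors="replace") as handle:
        return list(find_tsc_jumps(handle))
```

A call such as scan_file("/var/log/syslog"), run periodically, would show
when each jump was logged; it does not explain or reproduce the jump.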

All help is welcome!

Many thanks,

Philippe





> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of George Dunlap
> Sent: Monday, September 19, 2011 12:40 PM
> To: Dan Magenheimer
> Cc: Keir Fraser; jeremy@xxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx; Philippe
> Simonet; Konrad Wilk
> Subject: Re: [Xen-devel] Xen 4 TSC problems
> 
> On Thu, Sep 15, 2011 at 7:38 PM, Dan Magenheimer
> <dan.magenheimer@xxxxxxxxxx> wrote:
> >> I haven't been following this conversation, so I don't know if this
> >> is relevant, but I've just discovered this morning that the TSC warp
> >> check in Xen is done at the wrong time (before any secondary cpus are
> >> brought up), and thus always returns warp=0.  I've submitted a patch
> >> to do the check after secondary CPUs are brought up; that should
> >> cause Xen to do periodic synchronization of TSCs when there is drift.
> >
> > Wow, nice catch, George!  I wonder if this is the underlying bug for
> > many of the mysterious time problems that have been reported for a
> > year or two now... at least on certain AMD boxes.
> > Any idea when this was introduced?  Or has it always been wrong?
> 
> Well the comment in 20823:89907dab1aef seems to indicate that's where the
> "assume it's reliable on AMD until proven otherwise" started; that would be
> January 2010.
> 
> I looked as far back as 20705:a74aca4b9386, and there the TSC reliability
> checks were again in init_xen_time().  Figuring out where things were before
> then is getting into archeology. :-)
> 
> The comment at the top of init_xen_time() is correct now, but from the time
> it was first written through 4.1 it was just plain wrong -- it said
> init_xen_time() happened after all cpus were up, which has never been true.
> 
>  -George
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
