RE: [Xen-devel] Huge Time went backwards

Hi,

I have found another event from "Time want backwards" with my 3.3.1, so I 
analyzed
a bit deeper and systematically. Maybe there is something, you recognize. Sorry 
this post is so long.

Short summary of system: Gigabyte GA-M56S-S3, AMD 4050e Dualcore, Debian Etch as
base for a self-compiled Xen 3.3.1 with 2.6.18.8 besides the Firewall. Dom0 is
64bit, there is one 64bit DomU, all others are 32bit. The 64bit DomUs and one
32bit DomU have 2 VCPUs, the others have one only.

I use munin to monitor only a few values, two of them are cpu usage of domains 
and
residency on the four available P-States. I noticed a peak at 06:25. I found 
out that 
it's because all Domains are running cron.daily, in addition munin-cron fires 
up (like 
it does all five minutes). 

I generally reduced "Time went backwards" messages by setting sample_rate of 
the ondemand 
govenor to 4sec, using newer Kernels and spreading crons, but this is all cure 
on symtoms,
I guess. 

Today, at 06:25, there was another Time went Backwards in Dom0:

Apr  7 06:25:01 data /USR/SBIN/CRON[24074]: (munin) CMD (if [ -x 
/usr/bin/munin-cron...
Apr  7 06:25:01 data /USR/SBIN/CRON[24075]: (root) CMD (if [ -x 
/etc/munin/plugins/apt_all...
Apr  7 06:25:03 data kernel: Timer ISR/0: Time went backwards: delta=-32282595 
delta_cpu=43717405 
                             shadow=375953403877760 off=31846903 
processed=375953468006000 
                             cpu_processed=375953392006000
Apr  7 06:25:03 data kernel:  0: 375953392006000
Apr  7 06:25:03 data kernel:  1: 375953464006000

It's 32ms, nothing unusual. It's just after munin-cron, I guess the 2 seconds 
are needed
before the graphic calculation starts. So we have a system sleeping well at 
1,0GHz, that
jumps to 2,1GHz. Right then, we have the message. 

I checked with all other Domains. In terms of "Time went backwards", there was 
only one
other Domain, spock, the 32bit 2 VCPU Domain:

Apr  7 06:22:37 spock -- MARK --
Apr  7 06:25:04 spock kernel: Timer ISR/1: Time went backwards: delta=-25302545 
delta_cpu=-21302545 
                              shadow=375953435562320 off=7458 
processed=375953460871440 
                              cpu_processed=375953456871440
Apr  7 06:25:04 spock kernel:  0: 375953452871440
Apr  7 06:25:04 spock kernel:  1: 375953456871440

Again, it was sleeping. It comes up with a different delta. But what made me 
curious is: the
last 6 digits of the per_cpu(processed_system_time) are invariant in bot cases. 
I am not expert
enough to judge, whether this is right.

When browsing through the syslogs, I also saw another event that mentioned TSC: 
all DomUs
besides the 64 Bit DomUs mention TSC unstability:

Apr  2 22:02:22 shields kernel: Clocksource tsc unstable (delta = -263023689 ns)
Apr  2 22:02:32 tuvok kernel: TSC appears to be running slowly. Marking it as 
unstable
Apr  2 22:02:34 kes kernel: TSC appears to be running slowly. Marking it as 
unstable
Apr  2 22:02:33 uhura kernel: TSC appears to be running slowly. Marking it as 
unstable
Apr  2 22:02:34 worf kernel: TSC appears to be running slowly. Marking it as 
unstable

You have to know that this was shortly after a reboot. When rebooting, I will 
start up
all DomUs by xendomains with a 15sec delay. So I thought that it is quite 
unusual that
they find out about unstable TSC at the same time. So I looked up Dom0:

Apr  2 22:02:19 data ntpd[6325]: synchronized to 212.112.228.242, stratum 2
Apr  2 22:02:19 data ntpd[6325]: time reset -2.027414 s
Apr  2 22:02:19 data ntpd[6325]: kernel time sync enabled 0001

I guess *this* could be conected with Tims patch.

As promised, I will shortly try to set up other versions of Xen and later 
kernels. But honestly,
I don't expect the situation to change, as I normally follow the patches build 
into mercurial.

Best Regards,
Carsten.

----- Originalnachricht -----
Von: Carsten Schiers <carsten@xxxxxxxxxx>
Gesendet: Mon, 6.4.2009 23:19
An: dan.magenheimer <dan.magenheimer@xxxxxxxxxx> ; xen-devel 
<xen-devel@xxxxxxxxxxxxxxxxxxx>
Cc: Tim.Deegan <Tim.Deegan@xxxxxxxxxx>
Betreff: AW: RE: AW: RE: [Xen-devel] Huge Time went backwards

Thanks Dan.

As I lost my setup (although I could have restored that - it was an lvm), I set
up everything new. Today I compiled Xen 3.3.1, 3.3-testing and 3.4-unstable and
two kernel pairs for 64 and 32 bit (with and without MSI support; I have some
issues with that in one of my DomUs that gets four PCI devices passed thru).

As I am unsure whether I can simply install xen-unstable tools over 3.3.1, I 
will
set up a copy of my Dom0 and install it there. I modified the extra version so 
that I can use the same boot dir and still can distinguish between the kernels.

I still have not found anything like tagging for the kernel; the tar-ball for 
Xen-3.3.1 will not compile with my Xen. So for kernels, it's now the one from 
today in any case.

Next step is to carefully start testing, beginning with MSI and Xen 3.4, as I
expect the best results there. Then I will shorten the ondemand sampling rate,
which is now at roughly 4 sec. This is to prevent the core to jump their clock
rate too often.

In parallel, I will try to find out where exactly this message is produced and
what it means.

I will report ;-)

Andy yes, it's all pv, no hvm. And all DomUs get their time from Dom0/ntpd. 
And there seems to be no drift.

Best Regards,
Carsten. 

-----Ursprüngliche Nachricht-----
Von: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx] 
Gesendet: Montag, 6. April 2009 23:00
An: Carsten Schiers; xen-devel
Cc: Tim.Deegan
Betreff: RE: AW: RE: [Xen-devel] Huge Time went backwards

Hi Carsten --

I think domain0 (and all PV domains) only use the
one paravirtualized clock based on xen system time.
Changing clock options such as hpet and notsc will
not affect a PV domain, only an HVM domain.

Dan

> -----Original Message-----
> From: Carsten Schiers [mailto:carsten@xxxxxxxxxx]
> Sent: Friday, April 03, 2009 11:11 AM
> To: Dan Magenheimer; xen-devel
> Cc: Tim.Deegan
> Subject: AW: RE: [Xen-devel] Huge Time went backwards
> 
> 
> > Interesting.  This is reported booting dom0, correct?  Are
> > you running NTP in dom0?
> 
> It's in Dom0 log when powernow-k8 is loaded, which is after 
> loading loop 
> and prior on 
> mounting disks. Roughly 4 seconds prior to starting ntpd, so 
> I guess it 
> has no interaction.
> 
> It feels a bit like that 3.3.1 and 3.4 set values differently 
> in the CPU 
> cores, because
> when rebooting the same Xen version two or three times again, 
> it's away. 
> It will come back
> with either Xen version when you switch version. At least it 
> feels that 
> way.
> 
> >> If I get on your nerves with my time keeping issues
> >
> >No, it is good to raise awareness of these issues until they
> >are all fixed.  ESPECIALLY if you see time problems in xen-unstable,
> >it would be good to get them fixed before 3.4 is final.
> 
> Unfortunately, I broke my unstable. Also, it's a 
> semi-productive family 
> server, so slots to
> Test are a bit rare. I try my best to set it up again and do more 
> testing.
> 
> I have this Time went backwards issues from the beginning. 
> Now I tried 
> to set HPET to 32bit,
> although I don't know whether a) BIOS is read to use 32bit instead of 
> 64bit, or b) this makes
> any difference, or c) HEPT and TSC have something in common when it 
> comes to TSC drifts related
> to power management.
> 
> I also wondered, whether I should try booting with notsc, but it's a 
> dual core and I think then
> I need TSC, or don't I?
> 
> BR,
> Carsten.
> 
> 
> -----Original Message-----
> From: Carsten Schiers [mailto:carsten@xxxxxxxxxx]
> Sent: Friday, April 03, 2009 6:52 AM
> To: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: [Xen-devel] Huge Time went backwards
> 
> 
> Just an observation, to whom it may concern: when booting 
> between 3.3.1 
> and
> current xen-3.4-unstable, right after loading powernow-k8, 
> there will be 
> one huge
> Time went backwards messages (500ms upt to 1,5s), which disappears 
> unless
> you change Xen version again.
> 
> BTW: If I get on your nerves with my time keeping issues, just drop a 
> note and 
> I keep calm ;-).
> 
> BR,
> Carsten.
> 
> 
>



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
WARNING - OLD ARCHIVES

xen-devel

RE: [Xen-devel] Huge Time went backwards