Hi,
I have found another "Time went backwards" event with my 3.3.1, so I analyzed
it a bit deeper and more systematically. Maybe there is something you
recognize. Sorry this post is so long.
Short summary of system: Gigabyte GA-M56S-S3, AMD 4050e Dualcore, Debian Etch as
base for a self-compiled Xen 3.3.1 with 2.6.18.8 besides the Firewall. Dom0 is
64bit, there is one 64bit DomU, all others are 32bit. The 64bit DomUs and one
32bit DomU have 2 VCPUs, the others have one only.
I use munin to monitor only a few values; two of them are the CPU usage of the
domains and the residency in the four available P-states. I noticed a peak at
06:25. I found out that this is because all domains run cron.daily at that
time, and in addition munin-cron fires up (as it does every five minutes).
I generally reduced the "Time went backwards" messages by setting the sampling
rate of the ondemand governor to 4 sec, using newer kernels and spreading the
cron jobs, but all of this only treats the symptoms, I guess.
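A minimal sketch of how a 4 second sampling rate could be applied, not necessarily how it was done here. It assumes the ondemand tunable is exposed as a sysfs file named sampling_rate that takes microseconds; the directory path and the helper names are illustrative assumptions and may differ between kernel versions.

```python
from pathlib import Path

# Assumed sysfs location of the ondemand tunables (illustrative only;
# the directory may live elsewhere depending on the kernel version).
ONDEMAND_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq/ondemand")


def seconds_to_sampling_rate(seconds):
    """Convert seconds to the microsecond value written to sampling_rate."""
    return int(round(seconds * 1_000_000))


def set_sampling_rate(seconds, ondemand_dir=ONDEMAND_DIR):
    """Write the sampling rate (hypothetical helper); returns the value written."""
    value = seconds_to_sampling_rate(seconds)
    (ondemand_dir / "sampling_rate").write_text(f"{value}\n")
    return value


if __name__ == "__main__":
    # Only show the value; writing to sysfs needs root and a loaded governor.
    print(seconds_to_sampling_rate(4))
```

Calling set_sampling_rate(4) as root would write the value for cpu0 only; other cores may need the same write.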
Today, at 06:25, there was another Time went Backwards in Dom0:
Apr 7 06:25:01 data /USR/SBIN/CRON[24074]: (munin) CMD (if [ -x
/usr/bin/munin-cron...
Apr 7 06:25:01 data /USR/SBIN/CRON[24075]: (root) CMD (if [ -x
/etc/munin/plugins/apt_all...
Apr 7 06:25:03 data kernel: Timer ISR/0: Time went backwards: delta=-32282595
delta_cpu=43717405
shadow=375953403877760 off=31846903
processed=375953468006000
cpu_processed=375953392006000
Apr 7 06:25:03 data kernel: 0: 375953392006000
Apr 7 06:25:03 data kernel: 1: 375953464006000
It's 32 ms, nothing unusual. It comes just after munin-cron; I guess the 2
seconds are needed before the graph calculation starts. So we have a system
sleeping well at 1.0 GHz that jumps to 2.1 GHz. Right then, we get the message.
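The numbers in the Dom0 message can be cross-checked with a small sketch. It assumes that delta is roughly shadow + off - processed and delta_cpu is roughly shadow + off - cpu_processed; this relation is suggested by the field names and is an assumption, not something confirmed here against the kernel source.

```python
import re

# The Dom0 message from the log above, joined into one line.
LOG = ("Timer ISR/0: Time went backwards: delta=-32282595 delta_cpu=43717405 "
       "shadow=375953403877760 off=31846903 "
       "processed=375953468006000 cpu_processed=375953392006000")


def parse_fields(line):
    """Extract all name=value pairs (values in nanoseconds) as integers."""
    return {key: int(val) for key, val in re.findall(r"(\w+)=(-?\d+)", line)}


def recompute(fields):
    """Assumed relation: current time = shadow + off, minus each processed value."""
    now = fields["shadow"] + fields["off"]
    return now - fields["processed"], now - fields["cpu_processed"]


fields = parse_fields(LOG)
delta, delta_cpu = recompute(fields)
print("recomputed:", delta, delta_cpu)
print("logged - recomputed:", fields["delta"] - delta,
      fields["delta_cpu"] - delta_cpu)
```

Under this assumption both recomputed values differ from the logged ones by the same amount, about 1.3 microseconds, which would fit the offset being read a second time when the message is printed.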
I checked all other domains. In terms of "Time went backwards", there was only
one other domain, spock, the 32bit 2 VCPU domain:
Apr 7 06:22:37 spock -- MARK --
Apr 7 06:25:04 spock kernel: Timer ISR/1: Time went backwards: delta=-25302545
delta_cpu=-21302545
shadow=375953435562320 off=7458
processed=375953460871440
cpu_processed=375953456871440
Apr 7 06:25:04 spock kernel: 0: 375953452871440
Apr 7 06:25:04 spock kernel: 1: 375953456871440
Again, it was sleeping. It comes up with a different delta. But what made me
curious is that the last 6 digits of per_cpu(processed_system_time) are
invariant in both cases. I am not expert enough to judge whether this is right.
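One possible reading of the constant trailing digits, offered as an assumption only: if processed_system_time is advanced in whole timer ticks, all values in one domain keep the same remainder modulo the tick length. The sketch below checks this for the values from both log excerpts above, assuming a 4 ms tick (HZ=250), which is a guess and not stated anywhere in the logs.

```python
TICK_NS = 4_000_000  # assumed tick length: 4 ms, i.e. HZ=250 (a guess)

# processed, per-CPU 0 and per-CPU 1 values taken from the two messages.
dom0 = [375953468006000, 375953392006000, 375953464006000]
spock = [375953460871440, 375953456871440, 375953452871440]


def share_phase(values, tick_ns=TICK_NS):
    """True if all values have the same remainder modulo the tick length."""
    return len({v % tick_ns for v in values}) == 1


def ticks_apart(values, tick_ns=TICK_NS):
    """Distance of each value from the smallest one, in whole ticks."""
    base = min(values)
    return [(v - base) // tick_ns for v in values]


for name, values in (("data (Dom0)", dom0), ("spock", spock)):
    print(name, share_phase(values), ticks_apart(values))
```

With these inputs every difference is a whole number of 4 ms steps, so identical last six digits would be expected rather than suspicious, if the tick assumption holds.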
When browsing through the syslogs, I also saw another event that mentioned TSC:
all DomUs besides the 64bit DomUs report TSC instability:
Apr 2 22:02:22 shields kernel: Clocksource tsc unstable (delta = -263023689 ns)
Apr 2 22:02:32 tuvok kernel: TSC appears to be running slowly. Marking it as
unstable
Apr 2 22:02:34 kes kernel: TSC appears to be running slowly. Marking it as
unstable
Apr 2 22:02:33 uhura kernel: TSC appears to be running slowly. Marking it as
unstable
Apr 2 22:02:34 worf kernel: TSC appears to be running slowly. Marking it as
unstable
You have to know that this was shortly after a reboot. When rebooting, all
DomUs are started by xendomains with a 15 sec delay. So I thought it quite
unusual that they all find out about an unstable TSC at the same time. So I
looked at Dom0:
Apr 2 22:02:19 data ntpd[6325]: synchronized to 212.112.228.242, stratum 2
Apr 2 22:02:19 data ntpd[6325]: time reset -2.027414 s
Apr 2 22:02:19 data ntpd[6325]: kernel time sync enabled 0001
I guess *this* could be connected with Tim's patch.
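To make the timing relation visible, the following sketch computes how many seconds after the ntpd time reset each DomU reported the TSC problem. The timestamps are copied from the logs above; the year 2009 is taken from the mail headers because syslog lines carry none, and it is assumed that the clocks of the domains agree to within about a second.

```python
from datetime import datetime

YEAR = 2009  # syslog has no year; taken from the mail headers


def parse_ts(stamp):
    """Parse a syslog timestamp such as 'Apr 2 22:02:19'."""
    return datetime.strptime(f"{YEAR} {stamp}", "%Y %b %d %H:%M:%S")


NTPD_RESET = parse_ts("Apr 2 22:02:19")  # 'time reset -2.027414 s' in Dom0

TSC_EVENTS = {
    "shields": "Apr 2 22:02:22",
    "tuvok": "Apr 2 22:02:32",
    "kes": "Apr 2 22:02:34",
    "uhura": "Apr 2 22:02:33",
    "worf": "Apr 2 22:02:34",
}


def seconds_after_reset(events, reset):
    """Seconds between the ntpd reset and each DomU message."""
    return {host: int((parse_ts(ts) - reset).total_seconds())
            for host, ts in events.items()}


for host, secs in seconds_after_reset(TSC_EVENTS, NTPD_RESET).items():
    print(f"{host}: +{secs}s")
```

All five messages fall within about 15 seconds after the reset, although the domains were started with a delay.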
As promised, I will shortly try to set up other versions of Xen and later
kernels. But honestly, I don't expect the situation to change, as I normally
follow the patches that go into mercurial.
Best Regards,
Carsten.
----- Original Message -----
From: Carsten Schiers <carsten@xxxxxxxxxx>
Sent: Mon, 6.4.2009 23:19
To: dan.magenheimer <dan.magenheimer@xxxxxxxxxx> ; xen-devel
<xen-devel@xxxxxxxxxxxxxxxxxxx>
Cc: Tim.Deegan <Tim.Deegan@xxxxxxxxxx>
Subject: AW: RE: AW: RE: [Xen-devel] Huge Time went backwards
Thanks Dan.
As I lost my setup (although I could have restored that - it was an lvm), I set
up everything new. Today I compiled Xen 3.3.1, 3.3-testing and 3.4-unstable and
two kernel pairs for 64 and 32 bit (with and without MSI support; I have some
issues with that in one of my DomUs that gets four PCI devices passed thru).
As I am unsure whether I can simply install xen-unstable tools over 3.3.1, I
will set up a copy of my Dom0 and install it there. I modified the extra
version so that I can use the same boot dir and still distinguish between the
kernels.
I still have not found anything like tagging for the kernel; the tar-ball for
Xen-3.3.1 will not compile with my Xen. So for kernels, it's now the one from
today in any case.
Next step is to carefully start testing, beginning with MSI and Xen 3.4, as I
expect the best results there. Then I will shorten the ondemand sampling rate,
which is now at roughly 4 sec. This is to prevent the cores from changing their
clock rate too often.
In parallel, I will try to find out where exactly this message is produced and
what it means.
I will report ;-)
And yes, it's all pv, no hvm. And all DomUs get their time from Dom0/ntpd.
And there seems to be no drift.
Best Regards,
Carsten.
-----Original Message-----
From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
Sent: Monday, April 6, 2009 23:00
To: Carsten Schiers; xen-devel
Cc: Tim.Deegan
Subject: RE: AW: RE: [Xen-devel] Huge Time went backwards
Hi Carsten --
I think domain0 (and all PV domains) only use the
one paravirtualized clock based on xen system time.
Changing clock options such as hpet and notsc will
not affect a PV domain, only an HVM domain.
Dan
> -----Original Message-----
> From: Carsten Schiers [mailto:carsten@xxxxxxxxxx]
> Sent: Friday, April 03, 2009 11:11 AM
> To: Dan Magenheimer; xen-devel
> Cc: Tim.Deegan
> Subject: AW: RE: [Xen-devel] Huge Time went backwards
>
>
> > Interesting. This is reported booting dom0, correct? Are
> > you running NTP in dom0?
>
> It's in the Dom0 log when powernow-k8 is loaded, which is after loading loop
> and prior to mounting disks. Roughly 4 seconds prior to starting ntpd, so I
> guess there is no interaction.
>
> It feels a bit as if 3.3.1 and 3.4 set values differently in the CPU cores,
> because when rebooting the same Xen version two or three times, it's gone.
> It will come back with either Xen version when you switch versions. At least
> it feels that way.
>
> >> If I get on your nerves with my time keeping issues
> >
> >No, it is good to raise awareness of these issues until they
> >are all fixed. ESPECIALLY if you see time problems in xen-unstable,
> >it would be good to get them fixed before 3.4 is final.
>
> Unfortunately, I broke my unstable. Also, it's a semi-productive family
> server, so slots to test are a bit rare. I will try my best to set it up
> again and do more testing.
>
> I have had these Time went backwards issues from the beginning. Now I tried
> to set HPET to 32bit, although I don't know whether a) BIOS is read to use
> 32bit instead of 64bit, or b) this makes any difference, or c) HPET and TSC
> have something in common when it comes to TSC drifts related to power
> management.
>
> I also wondered whether I should try booting with notsc, but it's a dual
> core and I think then I need TSC, or don't I?
>
> BR,
> Carsten.
>
>
> -----Original Message-----
> From: Carsten Schiers [mailto:carsten@xxxxxxxxxx]
> Sent: Friday, April 03, 2009 6:52 AM
> To: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: [Xen-devel] Huge Time went backwards
>
>
> Just an observation, to whom it may concern: when switching between booting
> 3.3.1 and current xen-3.4-unstable, right after loading powernow-k8, there
> will be one huge Time went backwards message (500 ms up to 1.5 s), which
> disappears on later boots unless you change the Xen version again.
>
> BTW: If I get on your nerves with my time keeping issues, just drop a note
> and I will keep calm ;-).
>
> BR,
> Carsten.
>
>
>