WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-users

Re: [Xen-users] Re: DomU clock out of sync (and Dom0 too)

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-users] Re: DomU clock out of sync (and Dom0 too)
From: Steve Allison <stalks@xxxxxxxxxxx>
Date: Sun, 03 Jul 2011 12:03:14 +0100
On 03/07/2011 00:16, Dave Stevens wrote:
Quoting Andy Lee <yikes2000@xxxxxxxxx>:


Dmitry Nedospasov wrote:

I was watching some logs on a domU today and I suddenly noticed that the
timestamps were off by something on the order of 47 seconds. I was
surprised because *I don't* run independent wall clocks. I checked
some other domUs and the "drift" was also very close to that of the
first domU.

I also checked another dom0. Here the domUs were "only" out of sync by
~11 seconds.

The dom0s are all debian squeeze with Xen 4.0.1-2. The domUs are also
debian squeeze and utilizing PV with the ParaVirtOPs in the normal
debian linux-image-2.6.32 kernel.


I've been fighting this problem (clock running +47 seconds) for several
months. My OS setup is like yours, dom0 is Debian Squeeze x64 running Xen
4.0.1-2.  DomU's are Debian Squeeze x64 or Lenny x86:

A friend of mine has been suffering from this same issue, and has yet to find a solution.

Running Squeeze dom0, with a mixture of pvgrub domU's running more Squeeze and CentOS, and 3 Windows HVM domU's. I also believe it was a Supermicro machine, although I'll get confirmation later.

The time shift is also in the region of 48 seconds, but it causes catastrophic failure of the Windows HVM domU's: they BSOD or simply restart after the shift. He has also suffered from dom0 restarting spontaneously; he never found the cause, but suspected it was related to the time shift. The same machine is now running CentOS & KVM.

The machine ran Debian Lenny with Xen 3.x for some time. It had the same time issues then, but they seemed to produce only warnings and no other symptoms. Since the move to Debian Squeeze and Xen 4.0.1 he has been plagued by this issue, and after hours of troubleshooting Xen he reluctantly had to explore and learn CentOS with KVM.

The dom0 restarts happened randomly, at intervals ranging from 1 hour to 1.5 days.

Don't think he gave up easily, though. He tried every combination of:

Debian Xen Kernel / Kernel from Jeremy, 2.6.32-*
Debian Squeeze Xen 4.0.1 / Compiling Xen from source, 4.1.1

I am asking him to write a mail, which either he will post here himself or I'll pass on. I have pieced this mail together from the correspondence we have had over the last couple of weeks.
=======================

On boot dmesg would show

[    0.064660] PM-Timer failed consistency check  (0x0xffffff) - aborting.
=======================

This log was captured prior to the HVM deaths and dom0 reboots:

[31853.028654] hrtimer: interrupt took 48149483 ns
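When scanning a long dmesg for lines like the one above, it can help to pull out the reported durations and convert them from nanoseconds to milliseconds. The following is only a minimal sketch: the function name `parse_hrtimer_lines` and the regular expression are assumptions made for illustration, and the pattern is based solely on the single log line quoted here.

```python
import re

# Matches kernel log lines of the form quoted above, e.g.
#   [31853.028654] hrtimer: interrupt took 48149483 ns
# The pattern is an assumption derived from that one example line.
HRTIMER_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+hrtimer: interrupt took (\d+) ns")


def parse_hrtimer_lines(log_text):
    """Return a list of (uptime_seconds, duration_ms) tuples found in log_text."""
    results = []
    for match in HRTIMER_RE.finditer(log_text):
        uptime_s = float(match.group(1))
        duration_ms = int(match.group(2)) / 1_000_000  # nanoseconds -> milliseconds
        results.append((uptime_s, duration_ms))
    return results


if __name__ == "__main__":
    sample = "[31853.028654] hrtimer: interrupt took 48149483 ns"
    for uptime_s, duration_ms in parse_hrtimer_lines(sample):
        print(f"uptime {uptime_s:.3f} s: interrupt took {duration_ms:.3f} ms")
```

The same function can be fed the full output of `dmesg` (for example read from a saved file) to list every such event with its uptime.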
=======================

Some process crashes on the domU's showed this during heavy I/O; I don't know if it's related...

[266640.072386] INFO: task flush-202:3:8547 blocked for more than 120 seconds.
[266640.072393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[266640.072400] flush-202:3 D 0000000000000002 0 8547 2 0x00000000
[266640.072410] ffff88001fd81c40 0000000000000246 0000000000000000 0000000000000000
[266640.072424] 0000000000000001 0000000000000001 000000000000f9e0 ffff8800136dbfd8
[266640.072437] 0000000000015780 0000000000015780 ffff88001df58e20 ffff88001df59118
[266640.072451] Call Trace:
[266640.072458]  [<ffffffff8102cdcc>] ? pvclock_clocksource_read+0x3a/0x8b
[266640.072467]  [<ffffffff8110e16e>] ? sync_buffer+0x0/0x40
[266640.072474]  [<ffffffff8110e16e>] ? sync_buffer+0x0/0x40
[266640.072481]  [<ffffffff812fb0d2>] ? io_schedule+0x73/0xb7
[266640.072489]  [<ffffffff8110e1a9>] ? sync_buffer+0x3b/0x40

