To:	Mark Adams <mark@xxxxxxxxxxxxxxxxxx>
Subject:	Re: [Xen-devel] Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
From:	Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date:	Wed, 06 Oct 2010 09:23:06 -0700
Cc:	Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date:	Wed, 06 Oct 2010 09:23:59 -0700
Envelope-to:	www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to:	<20101006161529.GA3635@xxxxxxxxxxxxxxxxxx>
List-help:	<mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id:	Xen developer discussion <xen-devel.lists.xensource.com>
List-post:	<mailto:xen-devel@lists.xensource.com>
List-subscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References:	<20101006111618.GA31233@xxxxxxxxxxxxxxxxxx> <4CAC98BF.9010902@xxxxxxxx> <20101006161529.GA3635@xxxxxxxxxxxxxxxxxx>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.9) Gecko/20100921 Fedora/3.1.4-1.fc13 Lightning/1.0b3pre Thunderbird/3.1.4

 On 10/06/2010 09:15 AM, Mark Adams wrote:
> On Wed, Oct 06, 2010 at 08:41:51AM -0700, Jeremy Fitzhardinge wrote:
>>  On 10/06/2010 04:16 AM, Mark Adams wrote:
>>> Hi Xen-Devel's
>>>
>>> Please see my note below regarding a serious issue where my clock jumped
>>> in dom0. I'm sending this through to the devel list as I haven't managed
>>> to glean any clear help from xen-users and the debian bug team are
>>> unsure what could have caused this.
>>>
>>> Can you confirm if the kernel or xen controls the clock in dom0? I also
>>> understand that this could be an underlying hardware issue but I have
>>> another system on exactly the same hardware which hasn't had this occur.
>> The kernel manages its own time, but it uses the Xen system clock as its
>> timebase.  If the Xen system clock is unstable for some reason, then it
>> will affect the kernel's timekeeping.
>>
>> Nothing should be using the tsc clocksource, so I'm not sure why it's
>> reporting any kind of message.  No PV Xen domain can expect the raw
>> tsc to be stable.
> The message was reported in dom0, not domU.

Dom0 is a normal PV domain.  It just has a few more privileges than a
regular domU.
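
If you want to double-check which clocksource the dom0 kernel is actually
using right now, the sysfs clocksource files can be read directly.  A rough
Python sketch, assuming the usual Linux sysfs layout under
/sys/devices/system/clocksource/clocksource0 (the function name and the
sysfs_dir parameter are made up for illustration):

```python
from pathlib import Path

# Usual Linux sysfs location for clocksource0 (assumed to exist in dom0).
DEFAULT_DIR = "/sys/devices/system/clocksource/clocksource0"


def read_clocksources(sysfs_dir=DEFAULT_DIR):
    """Return (current, available) clocksource names read from sysfs."""
    base = Path(sysfs_dir)
    # current_clocksource holds a single name, e.g. "xen".
    current = (base / "current_clocksource").read_text().strip()
    # available_clocksource holds a space-separated list of names.
    available = (base / "available_clocksource").read_text().split()
    return current, available
```

Calling read_clocksources() in dom0 should report "xen" as the current
clocksource if the "Switching to clocksource xen" message in dmesg still
holds.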

>> But the tsc is the basis for the Xen clocksource, and if the tsc is
>> unstable in unexpected ways then it can affect Xen timekeeping.  This
>> can be caused by certain power management modes.
>>
>>> Any advice on how to investigate further or ensure better clock
>>> stability across dom0 and domU would be appreciated. 
>> What type of system is it?  How many CPUs?  What CPU vendor?
> It is a Tyan S7010AGM2NRF with two Intel quad-core Xeon E5620 CPUs.

I forget all the magic options that can affect timekeeping (cc'd Dan,
since this stuff is close to his heart).
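
One thing worth noting is that the delta in the "Clocksource tsc unstable"
message is about the same size as the jump you saw.  This is only unit
conversion, not a diagnosis.  A small Python sketch, where the helper name
delta_minutes is made up and the log line is the one from your report
joined onto a single line:

```python
import re

# Kernel log line quoted in this thread, joined onto one line.
LOG_LINE = ("Oct  2 18:50:33 havhost1 kernel: [623480.977748] "
            "Clocksource tsc unstable (delta = -2999660303788 ns)")


def delta_minutes(line):
    """Return the delta of a 'Clocksource ... unstable' line in minutes.

    Returns None if the line carries no 'delta = N ns' field.
    """
    m = re.search(r"delta = (-?\d+) ns", line)
    if m is None:
        return None
    # Nanoseconds -> seconds -> minutes.
    return int(m.group(1)) / 1e9 / 60


print(round(delta_minutes(LOG_LINE), 2))
```

That works out to just under 50 minutes (negative), which matches the
offset you reported in dom0.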

    J

> Thanks,
> Mark
>
>>> Also is it correct behaviour for Xen to reboot a 2008 R2 HVM domU if
>>> the time moves this much? My guess is that the domU crashed when the
>>> time changed, and was thus rebooted automatically. Strangely the Windows
>>> 2003 server didn't get rebooted.
>> I don't think there would be any direct connection between the dom0 time
>> jump and Windows dying, but if the CPU's tsc and/or Xen's timekeeping is
>> unstable, then Windows might also see a similar time jump and react badly.
>>
>>     J
>>
>>> If you need any more info to help please let me know.
>>>
>>> Thanks,
>>> Mark
>>>
>>> On Mon, Oct 04, 2010 at 01:00:51PM +0100, Mark Adams wrote:
>>>> On Mon, Oct 04, 2010 at 11:01:10AM +0100, Mark Adams wrote:
>>>>> Hi All,
>>>>>
>>>>> I'm running Xen 4.0.1-rc6 Debian squeeze with pvops 2.6.32-21 kernel.
>>>>> Today I noticed (when kerberos to the domain controllers stopped
>>>>> working..) that the clock was 50 minutes out in dom0 -- This caused the
>>>>> HVM windows domain controllers to have the wrong time.
>>>>>
>>>>> I'm not sure if this is a kernel issue or a xen issue, but the only
>>>>> thing related is I can see the following in the kernel log:
>>>>>
>>>>> Oct  2 18:50:33 havhost1 kernel: [623480.977748] Clocksource tsc unstable (delta = -2999660303788 ns)
>>>>>
>>>>> But I also see in the dmesg log that xen is using its own clock.
>>>>>
>>>>> [    7.676563] Switching to clocksource xen
>>>>>
>>>>> I can't identify anything else in the logs to indicate when the time
>>>>> might have changed. I have a few other dom0 at the same level that
>>>>> haven't decided to change the time.
>>>>>
>>>>> Can anyone confirm whether xen controls the time or the kernel? Also
>>>>> when I corrected the time in dom0 it was still wrong in HVM domU -- How
>>>>> long does it take for this to propagate? (I rebooted the VMs to correct
>>>>> it immediately).
>>>>>
>>>>> Any other pointers on how to ensure stability of clocks from dom0 to
>>>>> domU HVM hosts (and pv for that matter..) would be appreciated.
>>>> Some further info on this: it appears the HVM domU (windows server 2008)
>>>> unexpectedly shut down at 18:51, after the unstable clocksource error.
>>>> qemu-dm logs show a reset "reset requested in cpu_handle_ioreq." and
>>>> xend.log shows a reboot 
>>>>
>>>> [2010-10-02 18:51:03 1759] INFO (XendDomainInfo:2088) Domain has shutdown: name=ha-dc1 id=2 reason=reboot.
>>>>
>>>> This is like someone issuing "xm reboot domain" is it not? Is it
>>>> possible that xen could have issued this reboot itself due to a crash? I
>>>> can't see any crash logs.
>>>>
>>>> Cheers,
>>>> Mark
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>> http://lists.xensource.com/xen-devel
>>>

