Ah, I changed my mind. I now think it's due to overflow of a 32-bit usecs
variable in Linux's do_gettimeofday(). Cause is still the same though -- the
guest assuming that the timestamp record from Xen will never be 'very' stale
(in this case, older than 1h12m).
-- Keir
On 18/7/08 12:01, "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx> wrote:
> Okay, I'm pretty sure the reason your patch works is that system time as
> calculated in the guest is still being continually calibrated by
> local_time_calibration() in Xen. You do not turn off that calibration
> function and hence it will continually update the cpu_time structure with
> new timestamps and scale factors. Is this really intended?
>
> The reason mine doesn't work is that the time record exposed to the guest
> *never* changes. The calculation in the guest is thus effectively:
> sys_time = TSC * scale_factor
>
> The problem is that this is sensitive to small errors in the scale factor!
> And unfortunately we compute a new scale factor, to convert to us, as
> scale_factor_ns/1000. The resulting scale factor is obviously less accurate,
> and leads to an accumulating absolute error as the TSC value gets large.
>
> So, when all is said and done, what are we actually trying to achieve here
> and how should it really work? As far as I can see we cannot avoid sending
> an updated time record to guests periodically. But running
> local_time_calibration() only for guests and not for Xen system time seems
> pretty weird to me.
>
> Perhaps we should push this all off until after 3.3?
>
> -- Keir
>
> On 18/7/08 08:24, "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx> wrote:
>
>> You shouldn't have to modify get_s_time(). Guests will still be running the
>> old algorithm (sys_stamp + (now_tsc - stamp_tsc) * scale_tsc), so existing
>> get_s_time() ought to work. In fact it must work.
>>
>> -- Keir
>>
>> On 18/7/08 00:05, "Dan Magenheimer" <dan.magenheimer@xxxxxxxxxx> wrote:
>>
>>> After trying lots of things, I fell back to my known
>>> working version of time.c and adapted it forward to
>>> match tip. The attached patch (on top of 18063) boots
>>> clocksource=tsc and doesn't display the weirdness for
>>> 2-1/2 hours of testing (always showed up around 1 hour before).
>>>
>>> It may not be elegant but it works and tip doesn't (with
>>> clocksource=tsc).
>>>
>>> Dan
>>>
>>>> -----Original Message-----
>>>> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
>>>> Sent: Wednesday, July 16, 2008 6:50 AM
>>>> To: dan.magenheimer@xxxxxxxxxx; Xen-Devel (E-mail)
>>>> Cc: Dave Winchell
>>>> Subject: Re: [PATCH] clocksource=tsc
>>>>
>>>>
>>>> That's a weird set of symptoms. Perhaps dom0's sense of system time is
>>>> diverging from Xen's? I don't see that CPUs can diverge, if
>>>> their TSCs are
>>>> in sync, since we shouldn't be dynamically modifying the
>>>> per-CPU timestamps
>>>> or scale factors.
>>>>
>>>> -- Keir
>>>>
>>>> On 16/7/08 13:43, "Dan Magenheimer"
>>>> <dan.magenheimer@xxxxxxxxxx> wrote:
>>>>
>>>>> Well now I have to take that back. It DOESN'T work yet.
>>>>> I think I am experiencing "Weirdness can happen..."
>>>>> when booting with clocksource=tsc... I was away from
>>>>> the machine overnight but the symptoms I've seen before
>>>>> are that the system becomes less snappy and eventually
>>>>> grinds to a near-halt.
>>>>>
>>>>> Oddly, I can login most of the way on the console
>>>>> and launch new xterm's in my VNC display, but I never
>>>>> get a prompt, and I can't interrupt a process I left
>>>>> running overnight in another xterm. The time display
>>>>> in gnome seems to have frozen about an hour after
>>>>> I booted. Pinging the machine works but ssh'ing to
>>>>> it doesn't ("Connection closed")
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
>>>>>> Sent: Tuesday, July 15, 2008 10:12 PM
>>>>>> To: 'dan.magenheimer@xxxxxxxxxx'; 'Keir Fraser';
>>>> 'Xen-Devel (E-mail)'
>>>>>> Cc: 'Dave Winchell'
>>>>>> Subject: RE: [PATCH] clocksource=tsc
>>>>>>
>>>>>>
>>>>>> OK, ignore that. It looks like you have it all fixed
>>>>>> in 18060. I tried it and it now boots. Thanks!
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
>>>>>>> Sent: Tuesday, July 15, 2008 7:16 PM
>>>>>>> To: 'dan.magenheimer@xxxxxxxxxx'; 'Keir Fraser'; 'Xen-Devel
>>>>>> (E-mail)'
>>>>>>> Cc: 'Dave Winchell'
>>>>>>> Subject: RE: [PATCH] clocksource=tsc
>>>>>>>
>>>>>>>
>>>>>>>>>> Returning to 32-bit read_counter(), and having NULL
>>>>>>>>> read_counter when
>>>>>>>>>> clocksource=tsc would be another possibility...
>>>>>>>
>>>>>>> Well I hacked on 18055 for awhile and just couldn't get it
>>>>>>> to boot. I think local_time_calibration() (and thus
>>>>>>> init_percpu_time()) is necessary for boot, though I'm not really
>>>>>>> sure why. Possibly the "Weirdness can happen..." comment in
>>>>>>> that routine?
>>>>>>>
>>>>>>> Anyway, this patch (on top of 18055) DOES work, returns to the
>>>>>>> 32-bit read_counter, and re-enables local_time_calibration().
>>>>>>> I'd suggest putting off more major surgery for another day.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dan
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
>>>>>>>> Sent: Tuesday, July 15, 2008 10:04 AM
>>>>>>>> To: dan.magenheimer@xxxxxxxxxx; Keir Fraser; Xen-Devel (E-mail)
>>>>>>>> Cc: Dave Winchell
>>>>>>>> Subject: RE: [PATCH] clocksource=tsc
>>>>>>>>
>>>>>>>>
>>>>>>>> Hmmm... 18055 also fails to boot on my machine.
>>>>>>>>
>>>>>>>> Could we perhaps fall back to my original patch and do
>>>>>>>> cleanup later/separately? I also want to try implementing
>>>>>>>> an hpet64-based get_s_time() so will be working more
>>>>>>>> in this code later... but want to get clocksource=tsc
>>>>>>>> working now with minimal code impact given the freeze.
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
>>>>>>>>> Sent: Tuesday, July 15, 2008 9:46 AM
>>>>>>>>> To: 'Keir Fraser'; 'Xen-Devel (E-mail)'
>>>>>>>>> Cc: 'Dave Winchell'
>>>>>>>>> Subject: RE: [PATCH] clocksource=tsc
>>>>>>>>>
>>>>>>>>>> Actually in this mode of operation we hardly need a platform
>>>>>>>>>> timer *at all*.
>>>>>>>>>> The idea is that we let the TSCs free-run, because we know
>>>>>>>>>> they will behave.
>>>>>>>>>> Returning to 32-bit read_counter(), and having NULL
>>>>>>>>> read_counter when
>>>>>>>>>> clocksource=tsc would be another possibility...
>>>>>>>>>
>>>>>>>>> That's essentially what the original tscstable.patch did, though
>>>>>>>>> I was perhaps much uglier in the miscellaneous parts.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|