OK, gathered a bit more data.
On 15.3.2006 23:44, Keir Fraser wrote:
>
> On 15 Mar 2006, at 19:52, Tomas Kopal wrote:
>
>> I was printing out real diff values (detecting min and max over periods
>> of time) and it varied about 40% around the latch value. I didn't want
>> to get too many false positives, so I set it to double the expected
>> value. As the problematic values tend to be quite high, I think this is
>> a safe threshold.
>
> 40% range is huge, given that Xen disables interrupts only for very
> short periods of time.
Sorry, I overshot. It's up to 30%. Here is part of my debug log, measured
with the TSC. Each line gives the minimum and maximum values gathered
over approximately 15 minutes. That is still a lot, but as these are
absolute extremes over a long period of time, it may not be as bad most
of the time.
(XEN) Stats: min_tsc = 7272785, max_tsc = 9728578
(XEN) Stats: min_tsc = 6315016, max_tsc = 10686693
(XEN) Stats: min_tsc = 7287942, max_tsc = 9717338
(XEN) Stats: min_tsc = 7349398, max_tsc = 9653272
(XEN) Stats: min_tsc = 7101256, max_tsc = 9898106
(XEN) Stats: min_tsc = 6246158, max_tsc = 10753919
(XEN) Stats: min_tsc = 6263384, max_tsc = 10999952
(XEN) Stats: min_tsc = 6207822, max_tsc = 10799607
(XEN) Stats: min_tsc = 6919892, max_tsc = 10073639
(XEN) Stats: min_tsc = 6137085, max_tsc = 10864224
(XEN) Stats: min_tsc = 6276877, max_tsc = 10724951
(XEN) Stats: min_tsc = 7151101, max_tsc = 9848466
(XEN) Stats: min_tsc = 7020142, max_tsc = 9978974
(XEN) Stats: min_tsc = 7002859, max_tsc = 9992022
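The percentage can be cross-checked from the log lines themselves. This is a rough sketch only: it assumes the midpoint of min and max stands in for the nominal interval, since the latch value itself does not appear in the log, and the function names are invented for illustration.

```python
import re

# Matches lines such as "(XEN) Stats: min_tsc = 7272785, max_tsc = 9728578".
STATS_RE = re.compile(r"min_tsc = (\d+), max_tsc = (\d+)")

def relative_spread(min_tsc: int, max_tsc: int) -> float:
    """Half-range divided by midpoint: the +/- deviation around the centre."""
    # ASSUMPTION: the midpoint approximates the nominal interval length.
    return (max_tsc - min_tsc) / (max_tsc + min_tsc)

def spreads(log_text: str) -> list:
    """Relative spread for every Stats line found in log_text."""
    return [relative_spread(int(lo), int(hi))
            for lo, hi in STATS_RE.findall(log_text)]

sample = """\
(XEN) Stats: min_tsc = 7272785, max_tsc = 9728578
(XEN) Stats: min_tsc = 6137085, max_tsc = 10864224
"""
for s in spreads(sample):
    print(f"+/- {s:.1%}")
```

For the second sample line this works out to roughly 28% either side of the midpoint.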
>
> If you think that the timer ends up corrupting its count value, but
> continues counting in the mode we originally programmed it to, there
> would be no need for your patch to reprogram the timer. We could just
> clamp diff and let the timer continue to free-run from whatever value it
> corrupted itself to. Would that simpler patch, with no reprogramming,
> work for you?
I tried not resetting the timer, and once the error appeared, it didn't
go away for quite a long time (it did disappear in the end, though). It
definitely was not a one-time problem. That leads me to believe that
the mode IS changed after all, and that it switches to a mode with a
pre-programmed reset value, so the counter does not overflow as expected
in free-running mode and the result of the subtraction is flawed.
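To illustrate why a changed counter mode would break the subtraction, here is a minimal sketch. It assumes a 16-bit down-counter and a reader that computes elapsed ticks modulo 2^16. The reload value 11932 and all names are made up for illustration; this is a simplified model, not the Xen code.

```python
COUNTER_MOD = 1 << 16  # a free-running 16-bit counter has 65536 states

def counter_after(start: int, ticks: int, period: int) -> int:
    """Down-counter value after `ticks` ticks, wrapping every `period` ticks.

    Simplified model: values run period-1 .. 0 and then wrap to period-1.
    """
    return (start - ticks) % period

def elapsed_ticks(prev: int, cur: int) -> int:
    """Elapsed ticks as a reader computes them if it ASSUMES free-run."""
    return (prev - cur) % COUNTER_MOD

TRUE_TICKS = 200
RELOAD = 11932  # hypothetical reload value, chosen only for illustration

# Counter really is free-running: the modular subtraction is exact.
free_run = elapsed_ticks(100, counter_after(100, TRUE_TICKS, COUNTER_MOD))

# Counter reloads at RELOAD instead: a read spanning the wrap is too large.
reloaded = elapsed_ticks(100, counter_after(100, TRUE_TICKS, RELOAD))

print(free_run, reloaded)
```

In this model every read that spans a wrap is off by 65536 minus the reload value, which would be consistent with an error that keeps recurring rather than appearing once.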
>
> I agree with your suspicion that this may be a channel-0 problem. The
> 40% value range points at some serious weirdness.
As you can see from the following snippet, if we can trust the TSC,
channel 0 is not affected when the error occurs; the interrupts still
arrive regularly.
(XEN) Stats: min_tsc = 6125635, max_tsc = 11387514
(XEN) Stats: min_tsc = 5518772, max_tsc = 11457692
(XEN) Stats: min_tsc = 6331729, max_tsc = 10671097
(XEN) PIT Timer HW error: 40750
(XEN) Stats: min_tsc = 6137681, max_tsc = 10868227
(XEN) Stats: min_tsc = 6372589, max_tsc = 10626930
(XEN) Stats: min_tsc = 6096761, max_tsc = 10902500
(XEN) Stats: min_tsc = 6218072, max_tsc = 10784801
(XEN) Stats: min_tsc = 6232108, max_tsc = 10775067
(XEN) Stats: min_tsc = 6088005, max_tsc = 10914563
(XEN) PIT Timer HW error: 39470
(XEN) Stats: min_tsc = 5037753, max_tsc = 11961922
(XEN) Stats: min_tsc = 6189690, max_tsc = 10810470
(XEN) Stats: min_tsc = 6247146, max_tsc = 10754276
Thanks
Tomas