[Xen-devel] Large system boot problems
Here is some debugging of the large memory / pmtimer issue.
(For background, see [Xen-devel] Test results on Unisys ES7000 64x 256gb using
unstable c/s 16693 on 3.2.0 Release Candidate from Jan 9, 2008.)
Symptom: A system with many CPUs and lots of memory can fail
to boot properly. Dom0 gets time-went-backwards
errors and effectively hangs during initialization.
Dom0's init failure is caused by it using bogus values
for CPU0's speed, while the other CPUs have correct
speed info.
Workarounds: Increasing the memory retained by the
hypervisor, via either a dom0_mem or a xenheap argument, delays
the start of dom0 long enough (while that memory is scrubbed)
that the HV's CPU speed calculation self-corrects.
Changing the timer used can also work (pit works for me),
but it's fundamentally a race and I expect that
with the right hardware it could still fail.
Details: With either pmtimer or pit, the initial calculation
of CPU0's speed is bad (at least on a large system). If
dom0 starts quickly enough to read the bad CPU speed
data from the hypervisor's shared area before the hypervisor
corrects it, dom0 is in trouble.
Debug details:
When the Xen boot has sized memory, detected and
booted all the CPUs, and reaches the point of
(XEN) ENABLING IO-APIC IRQs
init_percpu_time gets called for CPU0, and the
cpu_time values recorded are:
(XEN) dump_cpu_time cpu0 addr ffff828c801ca520
(XEN) local_tsc_stamp 1691332805
(XEN) stime_master_stamp 0
(XEN) stime_local_stamp 0
(XEN) Platform timer overflows in 234 jiffies.
(XEN) Platform timer is 3.579MHz ACPI PM Timer
Then domain 0 is loaded, and local_time_calibration for CPU0
gets called and actually does something. The "out count" below
indicates that it was called 315 times and, due to
if ( ((s64)stime_elapsed64 < (EPOCH / 2)) )
effectively did nothing on those calls.
With the huge jump in the TSC value, the calculations in
local_time_calibration go badly wrong:
(XEN) local_time_calibration error factor cpu0 is 0x80000000. out count 315
(XEN) PRE0: tsc=1691332805 stime=0 master=0
(XEN) CUR0: tsc=33466953185 stime=9345455787 master=4641208868 -> -4704246919
(XEN) calibration_mul_frac 4ac8a18d tsc_shift -2
The bogus values here are then used by dom0 to incorrectly determine
the frequency of CPU0, while all other CPUs have correct values.
Xen reported: 13692.820 MHz processor.
For the HV this self-corrects: the next time local_time_calibration
gets called, the data in cpu_time is properly set. But the damage
has been done, and dom0 struggles to make progress, reporting
time going backwards, etc.
The reason that limiting the memory given to dom0 fixes
the problem is that the loop that scrubs the memory
the HV is keeping (scrub_heap_pages) periodically
calls process_pending_timers; if there is enough memory
to scrub, the correction happens before dom0 starts.
This recalls a comment from a vendor a few months ago
saying you needed to add a xenheap argument to make
large memory configurations work.
With clocksource=pit a similar thing happens: the initial
calculation is bad, but it gets fixed before dom0
gets going (debug from PIT):
(XEN) dump_cpu_time cpu0 addr ffff828c801ca520
(XEN) local_tsc_stamp 226384274
(XEN) stime_master_stamp 0
(XEN) stime_local_stamp 0
(XEN) Platform timer overflows in 2 jiffies.
(XEN) Platform timer is 1.193MHz PIT
No "goto out"s are taken; the next call to local_time_calibration
does the bad calculation:
(XEN) Scrubbing Free RAM: .local_time_calibration error factor cpu0 is 0x80000000. out count 0
(XEN) PRE0: tsc=226384274 stime=0 master=0
(XEN) CUR0: tsc=35424564759 stime=10351900878 master=1052517641 -> -9299383237
(XEN) calibration_mul_frac 7a7b2a1a tsc_shift -5
The next call to local_time_calibration fixes it:
(XEN) calibration_mul_frac 969714d2 tsc_shift -1
and dom0 gets the right values:
Xen reported: 3399.956 MHz processor.
Looking for ideas or suggestions on how to solve this issue.
Ideally we'd be able to prevent the bogus calculation in the
first place.
Bill
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel