WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
 
   
 

xen-devel


To: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Subject: Re: [Xen-devel] Test results on Unisys ES7000 64x 256gb using unstablec/s 16693 on 3.2.0 Release Candidate
From: Bill Burns <bburns@xxxxxxxxxx>
Date: Wed, 30 Jan 2008 11:20:40 -0500
Cc: Ian Pratt <Ian.Pratt@xxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, "Carb, Brian A" <Brian.Carb@xxxxxxxxxx>
Delivery-date: Wed, 30 Jan 2008 08:21:11 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <C3C39267.1B7C8%Keir.Fraser@xxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <C3C39267.1B7C8%Keir.Fraser@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 1.5.0.12 (X11/20071129)
Keir Fraser wrote:
> On 28/1/08 14:02, "Bill Burns" <bburns@xxxxxxxxxx> wrote:
> 
>> Ok, some progress. Background is that 3.1.2 (and 3.1.3 at least
>> as it was a week or two ago) fails to boot on a 64 CPU es7000 with
>> over 112GB of memory. This is with both HV & dom0 being x86_64.
>> The symptom is that the dom0 kernel gets time went backwards
>> error during init.
>>
>> The patch at which this first fails is 15137, which is the patch
>> that introduces using the ACPI PM timer as the clock
>> source. If I include the next patch (that allows for clock
>> selection) and choose pit as clock source the system boots
>> fine. Without the arg the ACPI timer is used and I get the hang.
> 
> The obvious question then is what happens to the ACPI PM timer when dom0
> gets more than 112GB of memory. Perhaps it's worth adding some tracing to
> Xen and see whether e.g., the platform timer stops running?
> 
>  -- Keir
> 
> 

I enabled the printk in local_time_calibration in Xen's time.c
and added a similar one to init_cpu_khz in time-xen.c in the
dom0 kernel.

The hypervisor outputs many lines like:
        (XEN) ---10: 00000000 9697086f -1
where the key value is always 969xxxxx...

Until we get to scrubbing free RAM:

        (XEN) Initrd len 0x894600, start at 0xffffffff80702000
        (XEN) Scrubbing Free RAM: ---0: 80000000 498c0b61 -2
        (XEN) .done.
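
For a sense of scale, the two multipliers in the lines above can be
turned into implied TSC frequencies. This is only a rough sketch. It
assumes the scaling rule from Xen's public vcpu_time_info interface,
ns = ((tsc << tsc_shift) * tsc_to_system_mul) >> 32, and it assumes
that the last field of the (XEN) debug line is the shift. The helper
name implied_tsc_mhz is made up for illustration.

```python
# Sketch: implied TSC frequency from a (multiplier, shift) pair.
# ASSUMPTION: ns = ((tsc << shift) * mul) >> 32, i.e. one TSC tick is
# worth (mul / 2**32) * 2**shift nanoseconds.
# ASSUMPTION: the third field of the "(XEN) ---N:" line is that shift.

def implied_tsc_mhz(mul: int, shift: int) -> float:
    """Return the TSC frequency in MHz implied by mul and shift."""
    ns_per_tick = (mul / 2**32) * (2.0 ** shift)
    return 1000.0 / ns_per_tick  # 1000 ns per microsecond -> MHz

# Values copied from the hypervisor output above.
good_mhz = implied_tsc_mhz(0x9697086F, -1)
bad_mhz = implied_tsc_mhz(0x498C0B61, -2)

print(f"typical value : {good_mhz:9.1f} MHz")
print(f"during scrub  : {bad_mhz:9.1f} MHz")
print(f"ratio         : {bad_mhz / good_mhz:9.2f}x")
```

Under those assumptions the usual 969xxxxx values correspond to
roughly 3.4 GHz, while the value logged during the scrub would imply
roughly 13.9 GHz, about four times too fast.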

The bogus 498c0b61 value is seen by the dom0 kernel and is
used to improperly calculate the CPU speed. A further printk
in the dom0's get_time_values_from_xen shows that all the CPUs
except the first have good values, leading right into the
first time went backwards message...

ACPI: Core revision 20060707
get_time_values_from_xen tsc_to_nsec_mul 498c0b61 ver 2
Initializing CPU#1
get_time_values_from_xen tsc_to_nsec_mul 969703ce ver 2
Initializing CPU#2
get_time_values_from_xen tsc_to_nsec_mul 9697099e ver 2
Initializing CPU#3
get_time_values_from_xen tsc_to_nsec_mul 9697068c ver 2
Initializing CPU#4
get_time_values_from_xen tsc_to_nsec_mul 96970374 ver 2
Initializing CPU#5
get_time_values_from_xen tsc_to_nsec_mul 96970a55 ver 2
Initializing CPU#6
get_time_values_from_xen tsc_to_nsec_mul 96970a7c ver 2
Initializing CPU#7
get_time_values_from_xen tsc_to_nsec_mul 96970952 ver 2
Initializing CPU#8
get_time_values_from_xen tsc_to_nsec_mul 969708f3 ver 2
Initializing CPU#9
get_time_values_from_xen tsc_to_nsec_mul 969708f8 ver 2
Initializing CPU#10
get_time_values_from_xen tsc_to_nsec_mul 96970a55 ver 2
Initializing CPU#11
get_time_values_from_xen tsc_to_nsec_mul 969706f6 ver 2
Initializing CPU#12
get_time_values_from_xen tsc_to_nsec_mul 96970bdc ver 2
Initializing CPU#13
get_time_values_from_xen tsc_to_nsec_mul 9697069b ver 2
Initializing CPU#14
get_time_values_from_xen tsc_to_nsec_mul 96970997 ver 2
Initializing CPU#15
get_time_values_from_xen tsc_to_nsec_mul 969707c5 ver 2
Initializing CPU#16
get_time_values_from_xen tsc_to_nsec_mul 969707ff ver 2
Initializing CPU#17
get_time_values_from_xen tsc_to_nsec_mul 969707aa ver 2
Initializing CPU#18
get_time_values_from_xen tsc_to_nsec_mul 9697062a ver 2
Initializing CPU#19
get_time_values_from_xen tsc_to_nsec_mul 969707d7 ver 2
Initializing CPU#20
get_time_values_from_xen tsc_to_nsec_mul 969709be ver 2
Initializing CPU#21
get_time_values_from_xen tsc_to_nsec_mul 9697096f ver 2
Initializing CPU#22
get_time_values_from_xen tsc_to_nsec_mul 96970902 ver 2
Initializing CPU#23
get_time_values_from_xen tsc_to_nsec_mul 969709a8 ver 2
Initializing CPU#24
get_time_values_from_xen tsc_to_nsec_mul 96970778 ver 2
Initializing CPU#25
get_time_values_from_xen tsc_to_nsec_mul 969705ad ver 2
Initializing CPU#26
get_time_values_from_xen tsc_to_nsec_mul 96970b44 ver 2
Initializing CPU#27
get_time_values_from_xen tsc_to_nsec_mul 96970974 ver 2
Initializing CPU#28
get_time_values_from_xen tsc_to_nsec_mul 96970bb4 ver 2
Initializing CPU#29
get_time_values_from_xen tsc_to_nsec_mul 969708c1 ver 2
Initializing CPU#30
get_time_values_from_xen tsc_to_nsec_mul 96970c23 ver 2
Brought up 32 CPUs
Initializing CPU#31
get_time_values_from_xen tsc_to_nsec_mul 9697085b ver 2
get_time_values_from_xen tsc_to_nsec_mul 969703ce ver 2
get_time_values_from_xen tsc_to_nsec_mul 498c0b61 ver 2
Timer ISR/0: Time went backwards: delta=-35017583219 delta_cpu=10416781
shadow=9160708347 off=11318417225 processed=55496708347 
cpu_processed=20468708347
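
The figures in the "Time went backwards" line can be cross-checked
with a little arithmetic. The sketch below assumes, going only by the
field names, that delta is shadow + off - processed and that delta_cpu
is shadow + off - cpu_processed. That relation is a reading of the
log, not something confirmed against the kernel source.

```python
# Sketch: cross-check the fields of the "Time went backwards" line.
# All values are copied from the log above and are in nanoseconds.
# ASSUMPTION: delta     = shadow + off - processed
# ASSUMPTION: delta_cpu = shadow + off - cpu_processed
shadow = 9160708347
off = 11318417225
processed = 55496708347
cpu_processed = 20468708347
logged_delta = -35017583219
logged_delta_cpu = 10416781

now = shadow + off                # current system time as seen by CPU 0
delta = now - processed           # versus globally processed time
delta_cpu = now - cpu_processed   # versus CPU 0's own processed time

print("recomputed delta     :", delta, "residual", delta - logged_delta)
print("recomputed delta_cpu :", delta_cpu, "residual", delta_cpu - logged_delta_cpu)
print("processed - cpu_processed (s):", (processed - cpu_processed) / 1e9)
```

Both recomputed values land within 444 ns of the logged ones, and the
residual is the same for both, which would fit the offset having been
sampled a second time for the printk (again an assumption). The
globally processed time is about 35 seconds ahead of CPU 0's own
processed time, which accounts for the large negative delta.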

Note that this Hypervisor was only built for 32 CPUs,
not all 64.

So the problem seems to occur in the HV itself when it
tries to scrub the free memory. Funny that when it has
lots to scrub, because dom0 is restricted to less memory,
there is no issue. But when there is little to scrub
and a lot of memory for dom0, things go wrong.

 Bill


