WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

[Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB sys

To: "Xen-devel@xxxxxxxxxxxxxxxxxxx" <Xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
From: Mukesh Rathor <mukesh.rathor@xxxxxxxxxx>
Date: Thu, 17 Dec 2009 20:36:36 -0800
Cc: Kurt Hackel <kurt.hackel@xxxxxxxxxx>, Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Delivery-date: Thu, 17 Dec 2009 20:37:58 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Hi,

I finally solved a hang on a 1TB box during our dom0 boot on xen 3.4.0,
that I'd been working on. The hang comes from:

calibrate_delay_direct():
....
        for (i = 0; i < MAX_DIRECT_CALIBRATION_RETRIES; i++) {
                pre_start = 0;
                start_jiffies = jiffies;
                while (jiffies <= (start_jiffies + tick_divider)) {
                        pre_start = start;
                        read_current_timer(&start);
                }
                read_current_timer(&post_start);
...


start_jiffies is set to : INITIAL_JIFFIES == 0xfffedb08

now, timer interrupt comes in and finding delta to be rather
huge (thanks to the page scrubbing of 1TB in xen), makes jiffies
wrap around. This causes hang in the loop, that would resolve after
say several days.

delta: 940b7d68a4, jiffies:00009f8b


I came up with fix (is there a reason it doesn't use 64bit values?) :

             while (jiffies <= (start_jiffies + tick_divider)) {
                   pre_start = start;
                   read_current_timer(&start);
+                  if (jiffies < start_jiffies)  /* jiffies wrapped */
+                          start_jiffies = jiffies;
             }


The other fix I thought of was to change INITIAL_JIFFIES to something 
sooner.


Would appreciate any help, I don't understand xen time management well.

thanks,
Mukesh


PS: I'm attaching output of 'xm debug-key t'.

Attachment: skew.out
Description: Binary data

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel