WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

[Xen-devel] Timer going backwards and Unable to handle kernel NULL point

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] Timer going backwards and Unable to handle kernel NULL pointer
From: "Robert Hulme" <rob@xxxxxxxxxxxx>
Date: Wed, 16 May 2007 09:21:41 +0100
Delivery-date: Wed, 16 May 2007 01:20:01 -0700
Dkim-signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition:x-google-sender-auth; b=OuMsTEjsoywj0pk4TQkipzC4MgDeEPnlsjPY+XRMFWY168RLTempItIZH51hA3LDS34kYEKOm8dcqZ8Gu43zDqlv9divP/a6touPhVVnO8EKNCwLRFAQeVdX85Kebf3TAxOtqWb2dd1oGXApr7YtYpYDX91sI6O7YGdFolQTu4w=
Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:sender:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition:x-google-sender-auth; b=XF0a7cmS2Hf6HpuieDM1bnYT7jl8Kf5+ujHWv3SbhCbki3N6jjmjqkfSQ8nvAMXXP+0WJUVRna3FXFFZj92vtpMaIw/eA/NKJotK7+y7rAzlSm9WSdFm6yMwKEGQDNtOP9LwAIAYLTu5vr+SXKv2kMtdATqzwENkj7DQmkxGlTg=
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Hi,

I wonder if anyone has any idea what the cause of these two problems
is (and if they're related).

I've got a Thunder K8SRE with two dual core Opteron processors and
eight 1 GB SDRAM sticks in it.

Over the last few months I've been experiencing weird crashes with it
which at first I thought was because of a dodgy motherboard, so I
replaced the motherboard, then it looked like the PSU was dodgy (well,
it stopped working), so that got replaced. Finally when that didn't
fix the problem I replaced the CPUs. Unfortunately the problem
continued.

At the time I was running some ancient version of Debian unstable
AMD64 along with a not up to date version of Xen, so I decided the
cause could be software related. So I've now installed Debian stable
and the latest Xen stable source tarball from Xensource.


Booting in to SMP mode resulted in this:

May 16 00:43:37 weebl kernel: Timer ISR/3: Time went backwards:
delta=-12967620 delta_cpu=217015619 shadow=46263756493 off=423276508
processed=46700000000 cpu_processed=46470016761
May 16 00:43:37 weebl kernel:  0: 46680000000
May 16 00:43:37 weebl kernel:  1: 46700016761
May 16 00:43:37 weebl kernel:  2: 45670016761
May 16 00:43:37 weebl kernel:  3: 46470016761
May 16 00:43:37 weebl kernel: Timer ISR/1: Time went backwards:
delta=-14213208 delta_cpu=-4229969 shadow=46311295002 off=434492777
processed=46760000000 cpu_processed=46750016761
May 16 00:43:37 weebl kernel:  0: 46760000000
May 16 00:43:37 weebl kernel:  1: 46750016761
May 16 00:43:37 weebl kernel:  2: 46700016761
May 16 00:43:37 weebl kernel:  3: 46680016761
May 16 00:43:37 weebl kernel: Timer ISR/1: Time went backwards:
delta=-10857604 delta_cpu=9125635 shadow=46311295002 off=457847726
processed=46780000000 cpu_processed=46760016761
May 16 00:43:37 weebl kernel:  0: 46780000000
May 16 00:43:37 weebl kernel:  1: 46760016761
May 16 00:43:37 weebl kernel:  2: 46700016761
May 16 00:43:37 weebl kernel:  3: 46680016761
May 16 00:43:37 weebl kernel: Timer ISR/1: Time went backwards:
delta=-13339199 delta_cpu=6644040 shadow=46311295002 off=475366135
processed=46800000000 cpu_processed=46780016761
May 16 00:43:37 weebl kernel:  0: 46800000000
May 16 00:43:37 weebl kernel:  1: 46780016761
May 16 00:43:37 weebl kernel:  2: 46700016761
May 16 00:43:37 weebl kernel:  3: 46680016761
etc...

Which appears to be this bug:
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=195

So I rebooted with nosmp and things seemed much better. This was at
1AM, but when I woke up this morning although all the domains were
reachable by ping (the box is colocated) I couldn't ssh to them or
access them on port 80 etc.

I rebooted the system remotely back in to the earlier version of Xen /
Debian and found the following errors in the log:

May 16 06:28:34 weebl kernel: PGD 11d4ab067 PUD 11db77067 PMD 0
May 16 06:28:34 weebl kernel: CPU 0
May 16 06:28:34 weebl kernel: Modules linked in: xt_physdev
iptable_filter ip_tables x_tables bridge ipv6 button ac battery raid5
xor ide_cd cdrom serio_raw pcspkr i2c_nforce2 i2c_core floppy
dm_mirror dm_snapshot dm_mod raid1 ide_generic sd_mod tg3 sata_nv
amd74xx libata ide_disk scsi_mod ehci_hcd ohci_hcd usbcore
thermal processor fan
May 16 06:28:34 weebl kernel: Pid: 4261, comm: find Not tainted 2.6.16.33-xen #5
May 16 06:28:34 weebl kernel: RIP: e030:[<ffffffff80195afc>]
<ffffffff80195afc>{inotify_inode_queue_event+286}
May 16 06:28:34 weebl kernel: RSP: e02b:ffff88011d49fec8  EFLAGS: 00010246
May 16 06:28:34 weebl kernel: RAX: ffff880112ec6c70 RBX:
0000000040000010 RCX: 0000000000000000
May 16 06:28:34 weebl kernel: RDX: 0000000000000000 RSI:
0000000040000010 RDI: ffff880112a96c80
May 16 06:28:34 weebl kernel: RBP: fffffffffffffff1 R08:
ffff880112a96a78 R09: 0000000000000001
May 16 06:28:34 weebl kernel: R10: 0000000000000003 R11:
ffffffffff578000 R12: ffff880112a96a78
May 16 06:28:34 weebl kernel: R13: ffff880112a96c80 R14:
0000000000000000 R15: fffffffffffffff1
May 16 06:28:34 weebl kernel: FS:  00002aad53da46d0(0000)
GS:ffffffff80469000(0000) knlGS:0000000000000000
May 16 06:28:34 weebl kernel: CS:  e033 DS: 0000 ES: 0000
May 16 06:28:34 weebl kernel: Process find (pid: 4261, threadinfo
ffff88011d49e000, task ffff8801c0518100)
May 16 06:28:34 weebl kernel: Stack: 0000000000000000 4000001000000000
ffff880112a96c70 0000000040000010
May 16 06:28:34 weebl kernel:        ffff8801bfc8ae80 ffff880112a96a78
ffff880112a8f250 ffff8801c10ee0c0
May 16 06:28:34 weebl kernel:        0000000000576400 ffffffff801734e2
May 16 06:28:34 weebl kernel: Call Trace:
<ffffffff801734e2>{__fput+97} <ffffffff80170d43>{filp_close+89}
May 16 06:28:34 weebl kernel:        <ffffffff801721a6>{sys_close+136}
<ffffffff8010adf6>{system_call+134}
May 16 06:28:34 weebl kernel:        <ffffffff8010ad70>{system_call+0}
May 16 06:28:34 weebl kernel:
May 16 06:28:34 weebl kernel: Code: 4d 8b 7f 10 49 83 ef 10 48 8d 45
10 48 3b 44 24 10 0f 85 2d
May 16 06:28:34 weebl kernel:  <1>Unable to handle kernel NULL pointer
dereference at 0000000000000000 RIP:
May 16 06:28:34 weebl kernel: PGD 11d519067 PUD 11da61067 PMD 0
May 16 06:28:34 weebl kernel: CPU 0
May 16 06:28:34 weebl kernel: Modules linked in: xt_physdev
iptable_filter ip_tables x_tables bridge ipv6 button ac battery raid5
xor ide_cd cdrom serio_raw pcspkr i2c_nforce2 i2c_core floppy
dm_mirror dm_snapshot dm_mod raid1 ide_generic sd_mod tg3 sata_nv
amd74xx libata ide_disk scsi_mod ehci_hcd ohci_hcd usbcore
thermal processor fan
May 16 06:28:34 weebl kernel: Pid: 4692, comm: run-parts Not tainted
2.6.16.33-xen #5
May 16 06:28:34 weebl kernel: RIP: e030:[<ffffffff80195afc>]
<ffffffff80195afc>{inotify_inode_queue_event+286}
May 16 06:28:34 weebl kernel: RSP: e02b:ffff88011db13e48  EFLAGS: 00010246
May 16 06:28:34 weebl kernel: RAX: ffff880112f86950 RBX:
ffff8801bee14d00 RCX: 0000000000000000
May 16 06:28:34 weebl kernel: RDX: 0000000000000000 RSI:
0000000000000001 RDI: ffff880112a96960
May 16 06:28:34 weebl kernel: RBP: fffffffffffffff0 R08:
ffff880112a96758 R09: ffff88011ceba7f0
May 16 06:28:34 weebl kernel: R10: 0000000000000059 R11:
ffff88011d4e23d8 R12: 0000000000000001
May 16 06:28:34 weebl kernel: R13: ffff880112a96960 R14:
0000000000000000 R15: fffffffffffffff0
May 16 06:28:34 weebl kernel: FS:  00002b82087656d0(0000)
GS:ffffffff80469000(0000) knlGS:0000000000000000
May 16 06:28:34 weebl kernel: CS:  e033 DS: 0000 ES: 0000
May 16 06:28:34 weebl kernel: Process run-parts (pid: 4692, threadinfo
ffff88011db12000, task ffff8801c1080860)
May 16 06:28:34 weebl kernel: Stack: 0000000000000000 0000000100000000
ffff880112a96950 ffff8801bee14d00
May 16 06:28:34 weebl kernel:        0000000000000059 0000000000000001
ffff880112a96758 00007fffff89f468
May 16 06:28:34 weebl kernel:        0000000000503010 ffffffff80172e73
May 16 06:28:34 weebl kernel: Call Trace:
<ffffffff80172e73>{vfs_read+313} <ffffffff8017b76b>{kernel_read+65}
May 16 06:28:34 weebl kernel:        <ffffffff8017cc17>{do_execve+299}
<ffffffff8010976f>{sys_execve+54}
May 16 06:28:34 weebl kernel:        <ffffffff8010b285>{stub_execve+61}
May 16 06:28:34 weebl kernel:
May 16 06:28:34 weebl kernel: Code: 4d 8b 7f 10 49 83 ef 10 48 8d 45
10 48 3b 44 24 10 0f 85 2d
May 16 06:28:34 weebl kernel:  <1>Unable to handle kernel paging
request at 0000000000110050 RIP:
May 16 06:28:34 weebl kernel: PGD 11d46f067 PUD 11da4a067 PMD 0
May 16 06:28:34 weebl kernel: CPU 0
May 16 06:28:34 weebl kernel: Modules linked in: xt_physdev
iptable_filter ip_tables x_tables bridge ipv6 button ac battery raid5
xor ide_cd cdrom serio_raw pcspkr i2c_nforce2 i2c_core floppy
dm_mirror dm_snapshot dm_mod raid1 ide_generic sd_mod tg3 sata_nv
amd74xx libata ide_disk scsi_mod ehci_hcd ohci_hcd usbcore
thermal processor fan
May 16 06:28:34 weebl kernel: Pid: 4695, comm: run-parts Not tainted
2.6.16.33-xen #5
May 16 06:28:34 weebl kernel: RIP: e030:[<ffffffff8017a4a3>]
<ffffffff8017a4a3>{cdev_put+6}
May 16 06:28:34 weebl kernel: RSP: e02b:ffff88011db51dc8  EFLAGS: 00010206
May 16 06:28:34 weebl kernel: RAX: 0000000000000000 RBX:
0000000000000010 RCX: 0000000000000000
May 16 06:28:34 weebl kernel: RDX: ffffffffff578000 RSI:
ffff8801c1107ec0 RDI: 0000000000110000
May 16 06:28:34 weebl kernel: RBP: ffff8801c1107ec0 R08:
ffff880112a96438 R09: 0000000000105874
May 16 06:28:34 weebl kernel: R10: 0000000000000000 R11:
ffffffff8026141c R12: ffff880112a96438
May 16 06:28:34 weebl kernel: R13: ffff88011d441e90 R14:
ffff8801c10eebc0 R15: 0000000000503010
May 16 06:28:34 weebl kernel: FS:  00002b82087656d0(0000)
GS:ffffffff80469000(0000) knlGS:0000000000000000
May 16 06:28:34 weebl kernel: CS:  e033 DS: 0000 ES: 0000
May 16 06:28:34 weebl kernel: Process run-parts (pid: 4695, threadinfo
ffff88011db50000, task ffff8801c0518100)
May 16 06:28:34 weebl kernel: Stack: 0000000000000010 ffffffff8017353e
ffff880143724a00 ffff880143724a00
May 16 06:28:34 weebl kernel:        ffff88011db51f58 0000000000000000
ffff88011db51f58 ffffffff8019d278
May 16 06:28:34 weebl kernel:        ffff880143724a00 ffffffff801536de
May 16 06:28:34 weebl kernel: Call Trace:
<ffffffff8017353e>{__fput+189} <ffffffff8019d278>{load_script+92}
May 16 06:28:34 weebl kernel:
<ffffffff801536de>{__alloc_pages+92}
<ffffffff8017b407>{copy_strings+373}
May 16 06:28:34 weebl kernel:
<ffffffff8017b53f>{search_binary_handler+167}
<ffffffff8017cc87>{do_execve+411}
May 16 06:28:34 weebl kernel:        <ffffffff8010976f>{sys_execve+54}
<ffffffff8010b285>{stub_execve+61}
May 16 06:28:34 weebl kernel:
May 16 06:28:34 weebl kernel: Code: 48 8b 5f 50 e8 b4 78 10 00 48 85
db 74 2a 65 8b 04 25 24 00
May 16 06:28:35 weebl kernel:  <6>savelog[4718]: segfault at
0000000000185ca8 rip 0000000000444236 rsp 00007fffffdf1fa0 error 4
May 16 06:28:35 weebl kernel: standard[4732]: segfault at
000000000008cca8 rip 0000000000489330 rsp 00007fffffb1f980 error 4
May 16 06:28:35 weebl kernel: standard[4733]: segfault at
000000000008cca8 rip 0000000000489330 rsp 00007fffffb1f980 error 4
May 16 06:28:35 weebl kernel: PGD 1c5b067 PUD 0
May 16 06:28:35 weebl kernel: CPU 0
May 16 06:28:35 weebl kernel: Modules linked in: xt_physdev
iptable_filter ip_tables x_tables bridge ipv6 button ac battery raid5
xor ide_cd cdrom serio_raw pcspkr i2c_nforce2 i2c_core floppy
dm_mirror dm_snapshot dm_mod raid1 ide_generic sd_mod tg3 sata_nv
amd74xx libata ide_disk scsi_mod ehci_hcd ohci_hcd usbcore
thermal processor fan
May 16 06:28:35 weebl kernel: Pid: 4768, comm: sysklogd Not tainted
2.6.16.33-xen #5
May 16 06:28:35 weebl kernel: RIP: e030:[<ffffffff80119088>]
<ffffffff80119088>{mm_walk+988}
May 16 06:28:35 weebl kernel: RSP: e02b:ffff88011e713bc0  EFLAGS: 00010246
May 16 06:28:35 weebl kernel: RAX: 7fffffffffffffff RBX:
ffff8801128d2008 RCX: 0000000055555555
May 16 06:28:35 weebl kernel: RDX: ffff880008d14000 RSI:
0000000155555555 RDI: 0000000155555555
May 16 06:28:35 weebl kernel: RBP: 8000000000000063 R08:
0000000000000000 R09: ffff880112a94518
May 16 06:28:35 weebl kernel: R10: 0000000112a94000 R11:
00000001128d2000 R12: ffff88011db12000
May 16 06:28:35 weebl kernel: R13: 0000000000000000 R14:
ffff8801c10911c0 R15: ffff880143724200
May 16 06:28:35 weebl kernel: FS:  00002b138b6e86d0(0000)
GS:ffffffff80469000(0000) knlGS:0000000000000000
May 16 06:28:35 weebl kernel: CS:  e033 DS: 0000 ES: 0000
May 16 06:28:35 weebl kernel: Process sysklogd (pid: 4768, threadinfo
ffff88011e712000, task ffff880000c6c820)
May 16 06:28:35 weebl kernel: Stack: ffff88011d598c00 ffff880000c6c820
ffff88011d598c00 ffff880000c6c820
May 16 06:28:35 weebl kernel:        ffffffff8011a275 ffff88011d598c00
ffffffff8011a4a4 0000000000000080
May 16 06:28:35 weebl kernel:        ffff880143724200 ffff880143724200
May 16 06:28:35 weebl kernel: Call Trace:
<ffffffff8011a275>{mm_unpin+427}
<ffffffff8011a4a4>{_arch_exit_mmap+476}
May 16 06:28:35 weebl kernel:
<ffffffff8014fedd>{generic_file_aio_read+52}
<ffffffff801724a6>{do_sync_read+199}
May 16 06:28:35 weebl kernel:        <ffffffff801612d2>{exit_mmap+34}
<ffffffff80128e05>{mmput+40}
May 16 06:28:35 weebl kernel:
<ffffffff8017c509>{flush_old_exec+2443}
<ffffffff80172e73>{vfs_read+313}
May 16 06:28:35 weebl kernel:
<ffffffff8019dd36>{load_elf_binary+1095}
<ffffffff801536de>{__alloc_pages+92}
May 16 06:28:35 weebl kernel:
<ffffffff8017b407>{copy_strings+373}
<ffffffff8017b53f>{search_binary_handler+167}
May 16 06:28:35 weebl kernel:        <ffffffff8017cc87>{do_execve+411}
<ffffffff8010976f>{sys_execve+54}
May 16 06:28:35 weebl kernel:        <ffffffff8010b285>{stub_execve+61}
May 16 06:28:35 weebl kernel:
May 16 06:28:35 weebl kernel: Code: 48 8b 34 ca 48 21 c6 48 c1 e7 0c
48 b8 00 00 00 00 00 88 ff

Has anyone got any idea why this is occuring? I'm completely out of ideas.

Many months ago I thought the random crashing could be because of RAM,
but it's all ECC RAM, and with any of the RAM sticks in it would still
randomly crash.

I'm not sure where to go next - what to try. Does anyone know what the
cause is, or have any idea how to diagnose it?

Thanks
-Rob

--
http://www.robhulme.com/
http://robhu.livejournal.com/

On two occasions, I have been asked [by members of Parliament], "Pray,
Mr. Babbage, if you put into the machine wrong figures, will the right
answers come out?" I am not able to rightly apprehend the kind of
confusion of ideas that could provoke such a question.
-- Charles Babbage

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

<Prev in Thread] Current Thread [Next in Thread>