WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-users

[Xen-users] xen 3.0 amd64 crash... seems to be tied into disk i/o, > 4 g

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] xen 3.0 amd64 crash... seems to be tied into disk i/o, > 4 gig ram
From: Tom Brown <tbrown@xxxxxxxxxxxxx>
Date: Thu, 8 Dec 2005 00:45:10 -0800 (PST)
Delivery-date: Thu, 08 Dec 2005 08:47:24 +0000
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
This seems to be a repeatable crash: just do some disk-intensive work in
domU and then type "sync" :(
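
The reproduction described above (heavy disk writes followed by "sync") could be scripted roughly as follows. This is only a sketch: the function name, the default sizes, and the use of a scratch file are assumptions for illustration, not the workload that was actually run on this box.

```python
import os
import tempfile


def write_then_sync(directory, total_bytes=64 * 1024 * 1024, chunk_bytes=1024 * 1024):
    """Write total_bytes to a scratch file in directory, then flush all
    dirty buffers to disk. Sizes are hypothetical defaults."""
    chunk = os.urandom(chunk_bytes)
    written = 0
    fd, path = tempfile.mkstemp(dir=directory, prefix="sync-repro-")
    try:
        with os.fdopen(fd, "wb") as f:
            while written < total_bytes:
                n = min(chunk_bytes, total_bytes - written)
                f.write(chunk[:n])
                written += n
        # Same request as typing "sync" at the shell: ask the kernel to
        # write out everything that is still dirty.
        os.sync()
    finally:
        # Remove the scratch file only after the sync has been issued.
        os.remove(path)
    return written
```

Run inside domU, for example as `write_then_sync("/var/tmp", total_bytes=2 * 1024**3)`, with the total raised until the dirty data clearly exceeds what is already cached.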

The box is a dual Opteron 720 with 8 GB of RAM, one domU and (duh) one
dom0, each with approx. 500 MB of RAM allocated.

The box has remote power control and a serial console, and I can provide
developer access if it helps. The kernel was compiled locally (on CentOS 4.2
amd64, domU and dom0).

The box seems stable under raw Linux 2.6.14.2, but does generate occasional
MCE messages pointing at the northbridge/GART... I spent a day researching
that and didn't come to any conclusion, other than that it could be a bogus
report specific to amd64 systems with > 4 GB of RAM. There is an IBM page to
that effect for an older RHE system... The box has a 3ware controller and SATA
drives.

Anyhow, any help would be appreciated. I'm probably going to try to see if
the PAE stuff is more stable... but obviously not tonight.

In theory this is a 3.0.0 box, but might be 3.0-testing...

This is pretty Greek to me, but given that it seems reproducible, I should
be able to produce any other info required...?

Or should I be dumping this into bugzilla?

-Tom

From root@xxxxxxxxxxxxxxxxxxxxx Thu Dec  8 00:33:19 2005
Date: Thu, 8 Dec 2005 00:21:56 -0800
From: root <root@xxxxxxxxxxxxxxxxxxxxx>
To: tbrown@xxxxxxxxxxxxx
Subject: oops.2.ksymoops
ksymoops 2.4.11 on x86_64 2.6.12.6-xen0.  Options used
     -V (default)
     -K (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.6.12.6-xen0/ (default)
     -m /boot/System.map-2.6.12.6-xen0 (specified)

No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel paging request at ffff88001e61b000 RIP:
<ffffffff80220bfb>{memcpy+11}
Oops: 0003 [1]
CPU 0
Pid: 0, comm: swapper Not tainted 2.6.12.6-xen0
RIP: e030:[<ffffffff80220bfb>] <ffffffff80220bfb>{memcpy+11}
Using defaults from ksymoops -t elf64-x86-64 -a i386:x86-64
RSP: e02b:ffffffff80525d50  EFLAGS: 00010246
RAX: ffff88001e61b000 RBX: 000000000000500c RCX: 0000000000000200
RDX: 0000000000000000 RSI: ffff8800040a2000 RDI: ffff88001e61b000
RBP: 0000000000000002 R08: 0000000000000002 R09: ffff8800040a2000
R10: ffff8800040a2000 R11: 0000000000000246 R12: 0000000000000000
R13: ffff800000000000 R14: 7fffffffffffffff R15: 6db6db6db6db6db7
FS:  00002aaaaaac9360(0000) GS:ffffffff80511a00(0000) knlGS:0000000055572460
CS:  e033 DS: 0000 ES: 0000
Stack: ffffffff8011a094 ffff8800016a55e8 0000000000000000 ffff880005ac42d8
       ffffffff8011a2cd ffff8800016a55e8 0000000000000000 0000000100000000
       ffff8800147221c0 0000000000000001
Call Trace:<ffffffff8011a094>{__sync_single+100} <ffffffff8011a2cd>{unmap_single+109}
       <ffffffff8011aa40>{swiotlb_unmap_sg+192} <ffffffff802eb517>{tw_interrupt+1799}
       <ffffffff8014cd9d>{handle_IRQ_event+61} <ffffffff8014ce87>{__do_IRQ+167}
       <ffffffff80114dc4>{do_IRQ+52} <ffffffff8010d958>{evtchn_do_upcall+136}
       <ffffffff80111e7d>{do_hypervisor_callback+17} <ffffffff8010f793>{xen_idle+83}
       <ffffffff8010f793>{xen_idle+83} <ffffffff8010f7cf>{cpu_idle+31}
       <ffffffff8052671f>{start_kernel+495} <ffffffff80526193>{_sinittext+403}
Code: f3 48 a5 89 d1 f3 a4 c3 66 66 66 90 66 66 66 90 66 66 66 90


>>RIP; ffffffff80220bfb <memcpy+b/b0>   <=====

>>RAX; ffff88001e61b000 <__start___xen_guest+ffff88001e612144/ffffffff800f7144>
>>RSI; ffff8800040a2000 <__start___xen_guest+ffff880004099144/ffffffff800f7144>
>>RDI; ffff88001e61b000 <__start___xen_guest+ffff88001e612144/ffffffff800f7144>
>>R09; ffff8800040a2000 <__start___xen_guest+ffff880004099144/ffffffff800f7144>
>>R10; ffff8800040a2000 <__start___xen_guest+ffff880004099144/ffffffff800f7144>
>>R13; ffff800000000000 <__start___xen_guest+ffff7fffffff7144/ffffffff800f7144>
>>R14; 7fffffffffffffff <__start___xen_guest+7fffffffffff7143/ffffffff800f7144>
>>R15; 6db6db6db6db6db7 <__start___xen_guest+6db6db6db6dadefb/ffffffff800f7144>

Trace; ffffffff8011a094 <__sync_single+64/70>
Trace; ffffffff8011aa40 <swiotlb_unmap_sg+c0/e0>
Trace; ffffffff8014cd9d <handle_IRQ_event+3d/80>
Trace; ffffffff80114dc4 <do_IRQ+34/50>
Trace; ffffffff80111e7d <do_hypervisor_callback+11/18>
Trace; ffffffff8010f793 <xen_idle+53/70>
Trace; ffffffff8052671f <start_kernel+1ef/200>

Code;  ffffffff80220bfb <memcpy+b/b0>
0000000000000000 <_RIP>:
Code;  ffffffff80220bfb <memcpy+b/b0>   <=====
   0:   f3 48 a5                  repz movsq %ds:(%rsi),%es:(%rdi)   <=====
Code;  ffffffff80220bfe <memcpy+e/b0>
   3:   89 d1                     mov    %edx,%ecx
Code;  ffffffff80220c00 <memcpy+10/b0>
   5:   f3 a4                     repz movsb %ds:(%rsi),%es:(%rdi)
Code;  ffffffff80220c02 <memcpy+12/b0>
   7:   c3                        retq
Code;  ffffffff80220c03 <memcpy+13/b0>
   8:   66                        data16
Code;  ffffffff80220c04 <memcpy+14/b0>
   9:   66                        data16
Code;  ffffffff80220c05 <memcpy+15/b0>
   a:   66                        data16
Code;  ffffffff80220c06 <memcpy+16/b0>
   b:   90                        nop
Code;  ffffffff80220c07 <memcpy+17/b0>
   c:   66                        data16
Code;  ffffffff80220c08 <memcpy+18/b0>
   d:   66                        data16
Code;  ffffffff80220c09 <memcpy+19/b0>
   e:   66                        data16
Code;  ffffffff80220c0a <memcpy+1a/b0>
   f:   90                        nop
Code;  ffffffff80220c0b <memcpy+1b/b0>
  10:   66                        data16
Code;  ffffffff80220c0c <memcpy+1c/b0>
  11:   66                        data16
Code;  ffffffff80220c0d <memcpy+1d/b0>
  12:   66                        data16
Code;  ffffffff80220c0e <memcpy+1e/b0>
  13:   90                        nop

CR2: ffff88001e61b000
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
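
For readers to whom the dump above is equally Greek, a few of its numbers can be decoded mechanically. The sketch below assumes the standard x86 page-fault error-code layout (bit 0: protection violation, bit 1: write, bit 2: user mode) and that RCX is the remaining count of the `repz movsq` at the faulting instruction. The helper names are hypothetical and the values are copied from the first oops.

```python
PAGE_SIZE = 4096  # x86-64 base page size in bytes


def decode_page_fault_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "protection_violation": bool(code & 0x1),  # clear: page not present
        "write": bool(code & 0x2),                 # clear: read access
        "user_mode": bool(code & 0x4),             # clear: kernel mode
    }


def rep_movsq_bytes(rcx):
    """Bytes still to be copied by 'repz movsq' (8 bytes per iteration)."""
    return rcx * 8


# Values from the first oops ("Oops: 0003", RCX, RDI, RAX, CR2).
oops_code = 0x0003
rcx = 0x200
rdi = 0xffff88001e61b000
rax = 0xffff88001e61b000
cr2 = 0xffff88001e61b000

fault = decode_page_fault_error(oops_code)
remaining = rep_movsq_bytes(rcx)
faulted_on_first_store = (rdi == cr2 == rax) and rdi % PAGE_SIZE == 0
```

Under those assumptions the fault is a kernel-mode write to a page that is mapped but not writable, and the copy still had 4096 bytes (one page) to go. Since RDI, RAX and CR2 are identical and page-aligned, the copy appears to have faulted on its very first store into the destination page, although that reading relies on RAX still holding the original destination pointer.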



From root@xxxxxxxxxxxxxxxxxxxxx Thu Dec  8 00:43:16 2005
Date: Thu, 8 Dec 2005 00:40:51 -0800
From: root <root@xxxxxxxxxxxxxxxxxxxxx>
To: tbrown@xxxxxxxxxxxxx
Subject: tmpx3.ksymoops

ksymoops 2.4.11 on x86_64 2.6.12.6-xen0.  Options used
     -V (default)
     -K (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.6.12.6-xen0/ (default)
     -m /usr/src/linux/System.map (default)

No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel paging request at ffff88001e527000 RIP:
<ffffffff80220bfb>{memcpy+11}
Oops: 0003 [1]
CPU 0
Pid: 0, comm: swapper Not tainted 2.6.12.6-xen0
RIP: e030:[<ffffffff80220bfb>] <ffffffff80220bfb>{memcpy+11}
Using defaults from ksymoops -t elf64-x86-64 -a i386:x86-64
RSP: e02b:ffffffff80525d50  EFLAGS: 00010246
RAX: ffff88001e527000 RBX: 0000000000003968 RCX: 0000000000000200
RDX: 0000000000000000 RSI: ffff880003550000 RDI: ffff88001e527000
RBP: 0000000000000002 R08: 0000000000000002 R09: ffff880003550000
R10: ffff880003550000 R11: 0000000000000246 R12: 0000000000000000
R13: ffff800000000000 R14: 7fffffffffffffff R15: 6db6db6db6db6db7
FS:  00002aaaabe8f280(0000) GS:ffffffff80511a00(0000) knlGS:0000000055572460
CS:  e033 DS: 0000 ES: 0000
Stack: ffffffff8011a094 ffff8800016a2088 ffffffff00000000 ffff880005ac42d8
       ffffffff8011a2cd ffff8800016a2088 ffffffff00000000 0000000100000000
       ffff8800078caf20 0000000000000001
Call Trace:<ffffffff8011a094>{__sync_single+100} <ffffffff8011a2cd>{unmap_single+109}
       <ffffffff8011aa40>{swiotlb_unmap_sg+192} <ffffffff802eb517>{tw_interrupt+1799}
       <ffffffff8014cd9d>{handle_IRQ_event+61} <ffffffff8014ce87>{__do_IRQ+167}
       <ffffffff80114dc4>{do_IRQ+52} <ffffffff8010d958>{evtchn_do_upcall+136}
       <ffffffff80111e7d>{do_hypervisor_callback+17} <ffffffff8010f793>{xen_idle+83}
       <ffffffff8010f793>{xen_idle+83} <ffffffff8010f7cf>{cpu_idle+31}
       <ffffffff8052671f>{start_kernel+495} <ffffffff80526193>{_sinittext+403}
Code: f3 48 a5 89 d1 f3 a4 c3 66 66 66 90 66 66 66 90 66 66 66 90


>>RIP; ffffffff80220bfb <bitmap_parse+bb/210>   <=====

>>RAX; ffff88001e527000 <phys_startup_64+ffff88001e426f00/ffffffff7fffff00>
>>RSI; ffff880003550000 <phys_startup_64+ffff88000344ff00/ffffffff7fffff00>
>>RDI; ffff88001e527000 <phys_startup_64+ffff88001e426f00/ffffffff7fffff00>
>>R09; ffff880003550000 <phys_startup_64+ffff88000344ff00/ffffffff7fffff00>
>>R10; ffff880003550000 <phys_startup_64+ffff88000344ff00/ffffffff7fffff00>
>>R13; ffff800000000000 <phys_startup_64+ffff7fffffefff00/ffffffff7fffff00>
>>R14; 7fffffffffffffff <phys_startup_64+7fffffffffeffeff/ffffffff7fffff00>
>>R15; 6db6db6db6db6db7 <phys_startup_64+6db6db6db6cb6cb7/ffffffff7fffff00>

Trace; ffffffff8011a094 <touch_nmi_watchdog+4/30>
Trace; ffffffff8011aa40 <pin_2_irq+60/130>
Trace; ffffffff8014cd9d <kfifo_init+8d/90>
Trace; ffffffff80114dc4 <pda_init+94/110>
Trace; ffffffff80111e7d <handle_lost_ticks+13d/170>
Trace; ffffffff8010f793 <oops_begin+23/70>
Trace; ffffffff8052671f <__log_buf+e15f/20000>

Code;  ffffffff80220bfb <bitmap_parse+bb/210>
0000000000000000 <_RIP>:
Code;  ffffffff80220bfb <bitmap_parse+bb/210>   <=====
   0:   f3 48 a5                  repz movsq %ds:(%rsi),%es:(%rdi)   <=====
Code;  ffffffff80220bfe <bitmap_parse+be/210>
   3:   89 d1                     mov    %edx,%ecx
Code;  ffffffff80220c00 <bitmap_parse+c0/210>
   5:   f3 a4                     repz movsb %ds:(%rsi),%es:(%rdi)
Code;  ffffffff80220c02 <bitmap_parse+c2/210>
   7:   c3                        retq
Code;  ffffffff80220c03 <bitmap_parse+c3/210>
   8:   66                        data16
Code;  ffffffff80220c04 <bitmap_parse+c4/210>
   9:   66                        data16
Code;  ffffffff80220c05 <bitmap_parse+c5/210>
   a:   66                        data16
Code;  ffffffff80220c06 <bitmap_parse+c6/210>
   b:   90                        nop
Code;  ffffffff80220c07 <bitmap_parse+c7/210>
   c:   66                        data16
Code;  ffffffff80220c08 <bitmap_parse+c8/210>
   d:   66                        data16
Code;  ffffffff80220c09 <bitmap_parse+c9/210>
   e:   66                        data16
Code;  ffffffff80220c0a <bitmap_parse+ca/210>
   f:   90                        nop
Code;  ffffffff80220c0b <bitmap_parse+cb/210>
  10:   66                        data16
Code;  ffffffff80220c0c <bitmap_parse+cc/210>
  11:   66                        data16
Code;  ffffffff80220c0d <bitmap_parse+cd/210>
  12:   66                        data16
Code;  ffffffff80220c0e <bitmap_parse+ce/210>
  13:   90                        nop

CR2: ffff88001e527000
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
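
The two ksymoops runs resolve the same RIP (ffffffff80220bfb) to different symbols: memcpy+b with -m /boot/System.map-2.6.12.6-xen0, and bitmap_parse+bb with the default /usr/src/linux/System.map. The raw oops text itself says memcpy+11, so the second map probably does not match the running kernel. The sketch below shows the kind of nearest-symbol lookup that produces this effect. It is not ksymoops' actual code: the `resolve` helper is hypothetical, and the two symbol start addresses are back-computed from the offsets printed in the dumps.

```python
import bisect


def resolve(symbols, address):
    """Return 'name+0xoffset' for the closest symbol at or below address.

    symbols is a list of (start_address, name) pairs sorted by address,
    which is roughly what a System.map file provides.
    """
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return None  # address lies below every known symbol
    start, name = symbols[i]
    return f"{name}+0x{address - start:x}"


rip = 0xffffffff80220bfb

# One-entry stand-ins for the two maps; real maps hold thousands of symbols.
map_boot = [(0xffffffff80220bf0, "memcpy")]       # rip - 0xb
map_src = [(0xffffffff80220b40, "bitmap_parse")]  # rip - 0xbb

from_boot_map = resolve(map_boot, rip)
from_src_map = resolve(map_src, rip)
```

The same address lands in whichever symbol happens to start just below it in the map that was loaded, so the second dump's Trace and Code annotations (touch_nmi_watchdog, pin_2_irq, and so on) are best ignored in favour of the first dump's.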


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users