WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

To: "Tom Brown" <tbrown@xxxxxxxxxxxxx>, <xen-users@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-users] xen 3.0 amd64 crash... seems to be tied into disk i/o, > 4 gig ram
From: "Ian Pratt" <m+Ian.Pratt@xxxxxxxxxxxx>
Date: Thu, 8 Dec 2005 12:03:25 -0000
Cc: ian.pratt@xxxxxxxxxxxx
 
Looking at the oops message, this is with a 3ware card, right?
We've had at least one other report of them causing problems on systems
with >4GB enabled (or maybe it was you?)

Ian

> This seems to be a repeatable crash. Just do some disk-intensive
> stuff in domU and then type "sync" :(
> 
> The box is a dual Opteron 720 with 8 gig of RAM, one domU
> and (duh) one dom0, each with approx. 500 meg of RAM allocated.
> 
> The box has remote power control and a serial console, and I can
> provide developer access if it helps. The kernel was compiled
> locally (on CentOS 4.2 amd64, for both domU and dom0).
> 
> The box seems stable under raw Linux 2.6.14.2, but it does generate
> occasional MCE messages pointing at the northbridge/GART.
> I spent a day researching that and didn't come to any
> conclusion other than that it could be a bogus report specific to
> amd64 systems with > 4 gig of RAM. There is an IBM page to that
> effect for an older RHE system. The box has a 3ware controller
> and SATA drives.
> 
> Anyhow, any help would be appreciated. I'm probably going to 
> try to see if the PAE stuff is more stable... but obviously 
> not tonight.
> 
> In theory this is a 3.0.0 box, but it might be 3.0-testing...
> 
> This is pretty Greek to me, but given that it seems
> reproducible, I should be able to produce any other info required.
> 
> Or should I be dumping this into bugzilla?
> 
> -Tom
> 
> From root@xxxxxxxxxxxxxxxxxxxxx Thu Dec  8 00:33:19 2005
> Date: Thu, 8 Dec 2005 00:21:56 -0800
> From: root <root@xxxxxxxxxxxxxxxxxxxxx>
> To: tbrown@xxxxxxxxxxxxx
> Subject: oops.2.ksymoops
> ksymoops 2.4.11 on x86_64 2.6.12.6-xen0.  Options used
>      -V (default)
>      -K (specified)
>      -l /proc/modules (default)
>      -o /lib/modules/2.6.12.6-xen0/ (default)
>      -m /boot/System.map-2.6.12.6-xen0 (specified)
> 
> No modules in ksyms, skipping objects
> No ksyms, skipping lsmod
> Unable to handle kernel paging request at ffff88001e61b000 RIP:
> <ffffffff80220bfb>{memcpy+11}
> Oops: 0003 [1]
> CPU 0
> Pid: 0, comm: swapper Not tainted 2.6.12.6-xen0
> RIP: e030:[<ffffffff80220bfb>] <ffffffff80220bfb>{memcpy+11} 
> Using defaults from ksymoops -t elf64-x86-64 -a i386:x86-64
> RSP: e02b:ffffffff80525d50  EFLAGS: 00010246
> RAX: ffff88001e61b000 RBX: 000000000000500c RCX: 0000000000000200
> RDX: 0000000000000000 RSI: ffff8800040a2000 RDI: ffff88001e61b000
> RBP: 0000000000000002 R08: 0000000000000002 R09: ffff8800040a2000
> R10: ffff8800040a2000 R11: 0000000000000246 R12: 0000000000000000
> R13: ffff800000000000 R14: 7fffffffffffffff R15: 6db6db6db6db6db7
> FS:  00002aaaaaac9360(0000) GS:ffffffff80511a00(0000) knlGS:0000000055572460
> CS:  e033 DS: 0000 ES: 0000
> Stack: ffffffff8011a094 ffff8800016a55e8 0000000000000000 ffff880005ac42d8
>        ffffffff8011a2cd ffff8800016a55e8 0000000000000000 0000000100000000
>        ffff8800147221c0 0000000000000001
> Call Trace:<ffffffff8011a094>{__sync_single+100} <ffffffff8011a2cd>{unmap_single+109}
>        <ffffffff8011aa40>{swiotlb_unmap_sg+192} <ffffffff802eb517>{tw_interrupt+1799}
>        <ffffffff8014cd9d>{handle_IRQ_event+61} <ffffffff8014ce87>{__do_IRQ+167}
>        <ffffffff80114dc4>{do_IRQ+52} <ffffffff8010d958>{evtchn_do_upcall+136}
>        <ffffffff80111e7d>{do_hypervisor_callback+17} <ffffffff8010f793>{xen_idle+83}
>        <ffffffff8010f793>{xen_idle+83} <ffffffff8010f7cf>{cpu_idle+31}
>        <ffffffff8052671f>{start_kernel+495} <ffffffff80526193>{_sinittext+403}
> Code: f3 48 a5 89 d1 f3 a4 c3 66 66 66 90 66 66 66 90 66 66 66 90
> 
> 
> >>RIP; ffffffff80220bfb <memcpy+b/b0>   <=====
> 
> >>RAX; ffff88001e61b000 <__start___xen_guest+ffff88001e612144/ffffffff800f7144>
> >>RSI; ffff8800040a2000 <__start___xen_guest+ffff880004099144/ffffffff800f7144>
> >>RDI; ffff88001e61b000 <__start___xen_guest+ffff88001e612144/ffffffff800f7144>
> >>R09; ffff8800040a2000 <__start___xen_guest+ffff880004099144/ffffffff800f7144>
> >>R10; ffff8800040a2000 <__start___xen_guest+ffff880004099144/ffffffff800f7144>
> >>R13; ffff800000000000 <__start___xen_guest+ffff7fffffff7144/ffffffff800f7144>
> >>R14; 7fffffffffffffff <__start___xen_guest+7fffffffffff7143/ffffffff800f7144>
> >>R15; 6db6db6db6db6db7 <__start___xen_guest+6db6db6db6dadefb/ffffffff800f7144>
> 
> Trace; ffffffff8011a094 <__sync_single+64/70>
> Trace; ffffffff8011aa40 <swiotlb_unmap_sg+c0/e0>
> Trace; ffffffff8014cd9d <handle_IRQ_event+3d/80>
> Trace; ffffffff80114dc4 <do_IRQ+34/50>
> Trace; ffffffff80111e7d <do_hypervisor_callback+11/18>
> Trace; ffffffff8010f793 <xen_idle+53/70>
> Trace; ffffffff8052671f <start_kernel+1ef/200>
> 
> Code;  ffffffff80220bfb <memcpy+b/b0>
> 0000000000000000 <_RIP>:
> Code;  ffffffff80220bfb <memcpy+b/b0>   <=====
>    0:   f3 48 a5                  repz movsq %ds:(%rsi),%es:(%rdi)   <=====
> Code;  ffffffff80220bfe <memcpy+e/b0>
>    3:   89 d1                     mov    %edx,%ecx
> Code;  ffffffff80220c00 <memcpy+10/b0>
>    5:   f3 a4                     repz movsb %ds:(%rsi),%es:(%rdi)
> Code;  ffffffff80220c02 <memcpy+12/b0>
>    7:   c3                        retq
> Code;  ffffffff80220c03 <memcpy+13/b0>
>    8:   66                        data16
> Code;  ffffffff80220c04 <memcpy+14/b0>
>    9:   66                        data16
> Code;  ffffffff80220c05 <memcpy+15/b0>
>    a:   66                        data16
> Code;  ffffffff80220c06 <memcpy+16/b0>
>    b:   90                        nop
> Code;  ffffffff80220c07 <memcpy+17/b0>
>    c:   66                        data16
> Code;  ffffffff80220c08 <memcpy+18/b0>
>    d:   66                        data16
> Code;  ffffffff80220c09 <memcpy+19/b0>
>    e:   66                        data16
> Code;  ffffffff80220c0a <memcpy+1a/b0>
>    f:   90                        nop
> Code;  ffffffff80220c0b <memcpy+1b/b0>
>   10:   66                        data16
> Code;  ffffffff80220c0c <memcpy+1c/b0>
>   11:   66                        data16
> Code;  ffffffff80220c0d <memcpy+1d/b0>
>   12:   66                        data16
> Code;  ffffffff80220c0e <memcpy+1e/b0>
>   13:   90                        nop
> 
> CR2: ffff88001e61b000
>  <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
> 
> 
> 
> From root@xxxxxxxxxxxxxxxxxxxxx Thu Dec  8 00:43:16 2005
> Date: Thu, 8 Dec 2005 00:40:51 -0800
> From: root <root@xxxxxxxxxxxxxxxxxxxxx>
> To: tbrown@xxxxxxxxxxxxx
> Subject: tmpx3.ksymoops
> 
> ksymoops 2.4.11 on x86_64 2.6.12.6-xen0.  Options used
>      -V (default)
>      -K (specified)
>      -l /proc/modules (default)
>      -o /lib/modules/2.6.12.6-xen0/ (default)
>      -m /usr/src/linux/System.map (default)
> 
> No modules in ksyms, skipping objects
> No ksyms, skipping lsmod
> Unable to handle kernel paging request at ffff88001e527000 RIP:
> <ffffffff80220bfb>{memcpy+11}
> Oops: 0003 [1]
> CPU 0
> Pid: 0, comm: swapper Not tainted 2.6.12.6-xen0
> RIP: e030:[<ffffffff80220bfb>] <ffffffff80220bfb>{memcpy+11} 
> Using defaults from ksymoops -t elf64-x86-64 -a i386:x86-64
> RSP: e02b:ffffffff80525d50  EFLAGS: 00010246
> RAX: ffff88001e527000 RBX: 0000000000003968 RCX: 0000000000000200
> RDX: 0000000000000000 RSI: ffff880003550000 RDI: ffff88001e527000
> RBP: 0000000000000002 R08: 0000000000000002 R09: ffff880003550000
> R10: ffff880003550000 R11: 0000000000000246 R12: 0000000000000000
> R13: ffff800000000000 R14: 7fffffffffffffff R15: 6db6db6db6db6db7
> FS:  00002aaaabe8f280(0000) GS:ffffffff80511a00(0000) knlGS:0000000055572460
> CS:  e033 DS: 0000 ES: 0000
> Stack: ffffffff8011a094 ffff8800016a2088 ffffffff00000000 ffff880005ac42d8
>        ffffffff8011a2cd ffff8800016a2088 ffffffff00000000 0000000100000000
>        ffff8800078caf20 0000000000000001
> Call Trace:<ffffffff8011a094>{__sync_single+100} <ffffffff8011a2cd>{unmap_single+109}
>        <ffffffff8011aa40>{swiotlb_unmap_sg+192} <ffffffff802eb517>{tw_interrupt+1799}
>        <ffffffff8014cd9d>{handle_IRQ_event+61} <ffffffff8014ce87>{__do_IRQ+167}
>        <ffffffff80114dc4>{do_IRQ+52} <ffffffff8010d958>{evtchn_do_upcall+136}
>        <ffffffff80111e7d>{do_hypervisor_callback+17} <ffffffff8010f793>{xen_idle+83}
>        <ffffffff8010f793>{xen_idle+83} <ffffffff8010f7cf>{cpu_idle+31}
>        <ffffffff8052671f>{start_kernel+495} <ffffffff80526193>{_sinittext+403}
> Code: f3 48 a5 89 d1 f3 a4 c3 66 66 66 90 66 66 66 90 66 66 66 90
> 
> 
> >>RIP; ffffffff80220bfb <bitmap_parse+bb/210>   <=====
> 
> >>RAX; ffff88001e527000 <phys_startup_64+ffff88001e426f00/ffffffff7fffff00>
> >>RSI; ffff880003550000 <phys_startup_64+ffff88000344ff00/ffffffff7fffff00>
> >>RDI; ffff88001e527000 <phys_startup_64+ffff88001e426f00/ffffffff7fffff00>
> >>R09; ffff880003550000 <phys_startup_64+ffff88000344ff00/ffffffff7fffff00>
> >>R10; ffff880003550000 <phys_startup_64+ffff88000344ff00/ffffffff7fffff00>
> >>R13; ffff800000000000 <phys_startup_64+ffff7fffffefff00/ffffffff7fffff00>
> >>R14; 7fffffffffffffff <phys_startup_64+7fffffffffeffeff/ffffffff7fffff00>
> >>R15; 6db6db6db6db6db7 <phys_startup_64+6db6db6db6cb6cb7/ffffffff7fffff00>
> 
> Trace; ffffffff8011a094 <touch_nmi_watchdog+4/30>
> Trace; ffffffff8011aa40 <pin_2_irq+60/130>
> Trace; ffffffff8014cd9d <kfifo_init+8d/90>
> Trace; ffffffff80114dc4 <pda_init+94/110>
> Trace; ffffffff80111e7d <handle_lost_ticks+13d/170>
> Trace; ffffffff8010f793 <oops_begin+23/70>
> Trace; ffffffff8052671f <__log_buf+e15f/20000>
> 
> Code;  ffffffff80220bfb <bitmap_parse+bb/210>
> 0000000000000000 <_RIP>:
> Code;  ffffffff80220bfb <bitmap_parse+bb/210>   <=====
>    0:   f3 48 a5                  repz movsq %ds:(%rsi),%es:(%rdi)   <=====
> Code;  ffffffff80220bfe <bitmap_parse+be/210>
>    3:   89 d1                     mov    %edx,%ecx
> Code;  ffffffff80220c00 <bitmap_parse+c0/210>
>    5:   f3 a4                     repz movsb %ds:(%rsi),%es:(%rdi)
> Code;  ffffffff80220c02 <bitmap_parse+c2/210>
>    7:   c3                        retq
> Code;  ffffffff80220c03 <bitmap_parse+c3/210>
>    8:   66                        data16
> Code;  ffffffff80220c04 <bitmap_parse+c4/210>
>    9:   66                        data16
> Code;  ffffffff80220c05 <bitmap_parse+c5/210>
>    a:   66                        data16
> Code;  ffffffff80220c06 <bitmap_parse+c6/210>
>    b:   90                        nop
> Code;  ffffffff80220c07 <bitmap_parse+c7/210>
>    c:   66                        data16
> Code;  ffffffff80220c08 <bitmap_parse+c8/210>
>    d:   66                        data16
> Code;  ffffffff80220c09 <bitmap_parse+c9/210>
>    e:   66                        data16
> Code;  ffffffff80220c0a <bitmap_parse+ca/210>
>    f:   90                        nop
> Code;  ffffffff80220c0b <bitmap_parse+cb/210>
>   10:   66                        data16
> Code;  ffffffff80220c0c <bitmap_parse+cc/210>
>   11:   66                        data16
> Code;  ffffffff80220c0d <bitmap_parse+cd/210>
>   12:   66                        data16
> Code;  ffffffff80220c0e <bitmap_parse+ce/210>
>   13:   90                        nop
> 
> CR2: ffff88001e527000
>  <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
> 
> 
> _______________________________________________
> Xen-users mailing list
> Xen-users@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-users
> 
