WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Subject: Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524
From: Peter Sandin <psandin@xxxxxxxxxx>
Date: Tue, 12 Apr 2011 10:39:52 -0400
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Tue, 12 Apr 2011 07:40:36 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20110407135009.GA7258@xxxxxxxxxxxx>
References: <FEE62514-E221-4DDA-8E28-D0579AB582B8@xxxxxxxxxx> <20110407135009.GA7258@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx

The traces included here are from customer instances, so we don't know exactly 
what they were doing when they hit this. Judging by their IO usage and the 
other information included with their reports, they were swapping heavily in 
most cases. Unfortunately, we haven't been able to reproduce this in a 
controlled environment. If you can suggest any tests that would help reproduce 
the bug or narrow down its source, I can certainly give them a try. Another 
data point that may be helpful: we have only seen this issue with 32-bit 
kernels.

--Peter

On Apr 7, 2011, at 9:50 AM, Konrad Rzeszutek Wilk wrote:

> On Wed, Apr 06, 2011 at 05:59:03PM -0400, Peter Sandin wrote:
>> Hello,
>> 
>> We've got some 2.6.38 domUs that are hitting a bug in mm/swapfile.c. The 
>> issue has only cropped up since we moved to 2.6.38, and it has occurred on 
>> multiple separate physical machines. I've attached the trace from one 
>> instance here; additional instances, along with a copy of the domU kernel 
>> image and configuration, can be found at:
>> 
>> http://thesandins.net/xen/2.6.38/
> 
> What exactly happened to cause this?
>> 
>> ------------[ cut here ]------------
>> kernel BUG at mm/swapfile.c:2524!
>> invalid opcode: 0000 [#1] SMP
>> last sysfs file: /sys/devices/vbd-51728/block/xvdb/stat
>> Modules linked in:
>> 
>> Pid: 539, comm: apache2 Not tainted 2.6.38-linode31 #1
>> EIP: 0061:[<c01a36b6>] EFLAGS: 00010246 CPU: 0
>> EIP is at swap_count_continued+0x176/0x180
>> EAX: f57ba5f8 EBX: ed3c8f00 ECX: f57ba000 EDX: 00000000
>> ESI: ed3c5320 EDI: 00000080 EBP: 000005f8 ESP: eb80be80
>> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
>> Process apache2 (pid: 539, ti=eb80a000 task=cc472be0 task.ti=eb80a000)
>> Stack:
>> ec8e68c0 000195f8 00000040 00000000 c01a37b1 000195f8 ec96ae20 ed14b000
>> 00000000 c01a3a18 00000000 c01955cc 00000000 80000007 00000000 c0106314
>> 00000023 ecf7ac4c 000195f8 ca6e0290 00000000 ffffffe8 b883d664 00000000
>> Call Trace:
>> [<c01a37b1>] ? swap_entry_free+0xf1/0x120
>> [<c01a3a18>] ? swap_free+0x18/0x30
>> [<c01955cc>] ? handle_pte_fault+0x49c/0xac0
>> [<c0106314>] ? check_events+0x8/0xc
>> [<c0196e91>] ? handle_mm_fault+0x101/0x1a0
>> [<c011e81b>] ? do_page_fault+0xfb/0x3e0
>> [<c063f390>] ? _raw_spin_lock_irq+0x10/0x20
>> [<c063fcd6>] ? error_code+0x5a/0x60
>> [<c013f397>] ? sys_rt_sigaction+0x77/0xa0
>> [<c011e720>] ? do_page_fault+0x0/0x3e0
>> [<c063fcd6>] ? error_code+0x5a/0x60
>> [<c0630000>] ? sctp_sockaddr_af+0x20/0x90
>> [<c011e720>] ? do_page_fault+0x0/0x3e0
>> Code: ff 89 d8 e8 cd f7 f7 ff 01 e8 8d 76 00 c6 00 00 ba 01 00 00 00 eb b2 
>> 89 f8 3c 80 0f 94 c0 e9 b9 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 
>> 0f 0b eb fe 66 90 83 ec 10 89 1c 24 89 c3 89 74 24
>> EIP: [<c01a36b6>] swap_count_continued+0x176/0x180 SS:ESP 0069:eb80be80
>> ---[ end trace 41e4a2572fe1ada6 ]---
>> 
>> I've looked at the section of code that is generating the fault, but I'm 
>> a bit out of my depth. Does this look like a Xen-specific issue, or 
>> something that would be better addressed on LKML? Any insight you can 
>> provide on the source of this issue, or a fix for it, would be appreciated.
>> 
>> Thanks,
>> Peter
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel