[Xen-devel] Hunting down an oops in Xen 3.1.0's 2.6.18 kernel
Hey,
I've been beating my head against this bug for the last few days.
After Dom0's memory is reduced, something appears to be trying to
refer to a page that was removed from the machine_to_phys_mapping
table. Despite a lot of tracing, I haven't yet spotted how that could
happen.
System required to reproduce:
- x86_32, with or without PAE
- 2 GB of RAM or more
- Xen 3.1.0's 2.6.18 kernel, or kernels based on it such as Red Hat's 2.6.20 Xen patch
- dom0 started with no memory limit, so it uses most of the 2 GB
The easiest way to reproduce the problem is to reduce dom0's memory
significantly (to something like 150M), either with mem-set or by
starting a very large domU. Then do something: sometimes ls is enough,
other times I start compiling glibc. It is also possible to hit the
issue by reducing memory only a little, but then it takes longer to
trigger, if it triggers at all.
I have been unable to reproduce this with 3.0.4's 2.6.16 kernel, but
2.6.18 will oops on both 3.0.4 and 3.1.0. Also, x86_64 appears to be
OK.
I'm guessing this issue is the same as the oops reported here:
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=975
Below is an example of the oops on my 2.6.18 PAE kernel, with a couple
of extra debugging lines added:
(XEN) mm.c:503:d0 Could not get page ref for pfn 7fffffff
(XEN) mm.c:2324:d0 mfn: 7fffffff, gmfn: 7fffffff, ptr: 7fffffff0c0
(XEN) mm.c:2325:d0 Could not get page for normal update
virtptr: f57a70c0 machineptr: 7fffffff0c0
------------[ cut here ]------------
kernel BUG at arch/i386/mm/hypervisor.c:62!
invalid opcode: 0000 [#1]
SMP
Modules linked in:
CPU: 1
EIP: 0061:[<c0117875>] Not tainted VLI
EFLAGS: 00010296 (2.6.18-xen-r5-try2 #6)
EIP is at xen_l1_entry_update+0xb9/0xde
eax: 0000002d ebx: deadbeef ecx: 00000000 edx: 00000001
esi: deadbeef edi: 00000000 ebp: ecea0c4c esp: ecea0c14
ds: 007b es: 007b ss: 0069
Process bash (pid: 5065, ti=ecea0000 task=ecfe3030 task.ti=ecea0000)
Stack: c037b964 f57a70c0 fffff0c0 000007ff 00000000 00000000 f57a70c0 fffff0c0
000007ff 00000000 00000000 00000000 00000000 00000000 ecea0cc0 c0158693
3536f025 00000000 ed383780 ed3837c8 c04bce70 00000000 00000004 00000000
Call Trace:
[<c0158693>] zap_pte_range+0x265/0x658
[<c0158bf2>] unmap_page_range+0x16c/0x2b4
[<c0158e08>] unmap_vmas+0xce/0x1cb
[<c015f094>] exit_mmap+0x7d/0xf4
[<c011e0cf>] mmput+0x36/0x8c
[<c01782af>] exec_mmap+0x156/0x229
[<c0178a54>] flush_old_exec+0x59/0x25a
[<c01989f4>] load_elf_binary+0x33c/0xc52
[<c0178f06>] search_binary_handler+0x89/0x23c
[<c0197c95>] load_script+0x221/0x23c
[<c0178f06>] search_binary_handler+0x89/0x23c
[<c017920b>] do_execve+0x152/0x1be
[<c010391c>] sys_execve+0x32/0x84
[<c0104dfb>] syscall_call+0x7/0xb
[<b7e13899>] 0xb7e13899
Code: 78 08 83 c4 2c 5b 5e 5f 5d c3 8b 45 e4 8b 55 e8 89 54 24 0c 89
44 24 08 8b 45 e
EIP: [<c0117875>] xen_l1_entry_update+0xb9/0xde SS:ESP 0069:ecea0c14
And, just for kicks, a non-PAE oops:
(XEN) mm.c:503:d0 Could not get page ref for pfn fffff
(XEN) mm.c:2324:d0 mfn: fffff, gmfn: fffff, ptr: fffff060
(XEN) mm.c:2325:d0 Could not get page for normal update
virtptr: fbfa7060 machineptr: fffff060
------------[ cut here ]------------
kernel BUG at arch/i386/mm/hypervisor.c:62!
invalid opcode: 0000 [#1]
SMP
Modules linked in:
CPU: 1
EIP: 0061:[<c01158e1>] Not tainted VLI
EFLAGS: 00010282 (2.6.18-xen-r5-try2 #4)
EIP is at xen_l1_entry_update+0xa1/0xb1
eax: 0000002a ebx: deadbeef ecx: 00000000 edx: 00000001
esi: deadbeef edi: fbfa7060 ebp: c0bcbca0 esp: c0bcbc74
ds: 007b es: 007b ss: 0069
Process bash (pid: 4943, ti=c0bcb000 task=c1fd7030 task.ti=c0bcb000)
Stack: c036508c fbfa7060 fffff060 00000000 fffff060 00000000 00000000 00000000
fbfa7060 3b875025 f3bce3c0 c0bcbd20 c0152f4b c0bcbd10 f35ff840 80018000
00000000 f35bb860 c0bcbd38 003fefe8 00000000 00000001 800c9000 f3be7800
Call Trace:
[<c0152f4b>] unmap_vmas+0x4d4/0x743
[<c0156b36>] exit_mmap+0x7f/0xf4
[<c011b779>] mmput+0x24/0x85
[<c016fd62>] flush_old_exec+0x2de/0xa6d
[<c018fad0>] load_elf_binary+0x51d/0x1a4d
[<c016f23e>] search_binary_handler+0x8d/0x22c
[<c0170eca>] do_execve+0x14d/0x1c9
[<c01034be>] sys_execve+0x2e/0x76
[<c0104e83>] syscall_call+0x7/0xb
[<b7ecb899>] 0xb7ecb899
Code: c1 72 af 0f 0b 22 00 54 29 36 c0 eb a5 8b 45 e4 8b 55 e8 89 44
24 08 89 54 24 0
EIP: [<c01158e1>] xen_l1_entry_update+0xa1/0xb1 SS:ESP 0069:c0bcbc74
The call traces tend to differ, but the above two are pretty common.
The oops is in xen_l1_entry_update almost all of the time, though I have
also seen it in xen_l2_entry_update.
Thanks,
--
Michael Marineau
Oregon State University
mike@xxxxxxxxxxxx
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel