Hi guys,

I'm seeing oops messages on Xen 1.2 on a system we have here. I
currently can't keep the box up for more than a couple of hours.
The box runs Apache, vsftpd, Postfix, and Ecartis (a mailing
list manager).

Hardware: Pentium III 500 MHz, 192 MB RAM. At the moment it's only
running domain 0 (with 32 MB RAM) and one other domain with
the remaining RAM. I've run memtest86 on the box for multiple
passes, and the RAM tests clean. The system runs
on IDE. It has an eepro1000 network card.

General setup: domain 0 runs NFS, and domain 1 runs with an
nfsroot environment. Both domains run iptables for protection
from the internet at large. Iptables is compiled as a series
of modules. I've used MAC-address-limiting entries in the
iptables rules to ensure that spoofed packets can't hit the NFS
server. I don't know if this code has been tested.
Both domains have access to local disk for swap.

Reproducibility: I can make the system oops with vsftpd fairly
consistently, though not every time. Occasionally it will oops
with SpamAssassin instead. In general, the process that oopsed
then sits in disk wait until I restart the domain.

Often there is stuff in the oops that points to NFS problems,
but I've tried compiling the kernel with NFSv2 and with NFSv3
support, and both oops. The trace I have at the moment is not
NFS related. I'll send through other traces as they happen.

The oops below points at vsftpd. Every time I run an oops
through ksymoops I get one or more warnings, such as
"Warning (Oops_read): Code line not seen, dumping what data is
available". This may be because some module that was loaded at
the time of the oops is not loaded when I run ksymoops (after a
reboot), or something similar. I'd appreciate guidance on how to
resolve this, if it's actually a problem.

As indicated in the version string below, I'm using gcc 3.2.2
for compilation. Oopses also occur when I build with gcc 2.95.

Should I try running a "release" kernel that you know works, if
you have one on an FTP site somewhere, to help debug this? I'm
very keen to get the box stable, since right now it crashes
every few hours. It is a live server, though not for a large
user base.

Version:
Linux version 2.4.26-xeno (root@vm) (gcc version 3.2.2) #10 Sun May 16 20:02:54 SAST 2004
ksymoops 2.4.5 on i686 2.4.26-xeno. Options used
-V (default)
-k /proc/ksyms (specified)
-l /proc/modules (specified)
-o /lib/modules/2.4.26-xeno/ (specified)
-m /boot/System.map-2.4.26-xeno (specified)
May 16 22:04:23 tux kernel: invalid operand: 0000
May 16 22:04:23 tux kernel: CPU: 0
May 16 22:04:23 tux kernel: EIP: 0819:[__free_pages_ok+69/752] Not tainted
May 16 22:04:23 tux kernel: EFLAGS: 00010282
May 16 22:04:23 tux kernel: eax: c0127e10 ebx: c1172e70 ecx: c797f734 edx: c0127c60
May 16 22:04:23 tux kernel: esi: 00000000 edi: 00000000 ebp: 00000000 esp: c49b1ec0
May 16 22:04:23 tux kernel: ds: 0821 es: 0821 ss: 0821
May 16 22:04:23 tux kernel: Process vsftpd (pid: 846, stackpage=c49b1000)<1>
May 16 22:04:23 tux kernel: Stack: c00b06a3 c0127de8 c86a5d60 c797f734 00000000 c6340050 c0019860 c797f680
May 16 22:04:23 tux kernel: 00000004 07a820e0 00000000 c6340050 00001000 c0017f15 c1172e70 00000000
May 16 22:04:23 tux kernel: c48588e0 c49b1f6c 00000001 c4d83404 40414000 40015000 00000000 c0017d6f
May 16 22:04:23 tux kernel: Call Trace: [kfree_skbmem+19/48] [set_page_dirty+144/160] [zap_pte_range+389/459] [zap_pmd_range+79/112] [zap_page_range+79/176]
May 16 22:04:27 tux kernel: invalid operand: 0000
May 16 22:04:27 tux kernel: CPU: 0
May 16 22:04:27 tux kernel: EIP: 0819:[__free_pages_ok+69/752] Not tainted
May 16 22:04:27 tux kernel: EFLAGS: 00010286
May 16 22:04:27 tux kernel: eax: c0127e10 ebx: c115769c ecx: c70866d4 edx: c0127c60
May 16 22:04:27 tux kernel: esi: 00000000 edi: 00000000 ebp: 00000000 esp: c49b1ec0
May 16 22:04:27 tux kernel: ds: 0821 es: 0821 ss: 0821
May 16 22:04:27 tux kernel: Process vsftpd (pid: 847, stackpage=c49b1000)<1>
May 16 22:04:27 tux kernel: Stack: c86d7c80 c0127de8 00000001 c70866d4 00001000 c0dc365c c0019860 c7086620
May 16 22:04:27 tux kernel: 00000004 070830e0 00001000 c0dc365c 00002000 c0017f15 c115769c 00000000
May 16 22:04:27 tux kernel: c49b1f34 fbff9000 00000001 c4ed6404 40596000 40198000 00000000 c0017d6f
May 16 22:04:27 tux kernel: Call Trace: [set_page_dirty+144/160] [zap_pte_range+389/459] [zap_pmd_range+79/112] [zap_page_range+79/176] [exit_mmap+175/320]
May 16 22:04:27 tux kernel: invalid operand: 0000
May 16 22:04:27 tux kernel: CPU: 0
May 16 22:04:27 tux kernel: EIP: 0819:[__free_pages_ok+69/752] Not tainted
May 16 22:04:27 tux kernel: EFLAGS: 00010286
May 16 22:04:27 tux kernel: eax: c0127e10 ebx: c1118b34 ecx: c59b6914 edx: c0127c60
May 16 22:04:27 tux kernel: esi: 00000000 edi: 00000000 ebp: 00000000 esp: c375fec0
May 16 22:04:27 tux kernel: ds: 0821 es: 0821 ss: 0821
May 16 22:04:27 tux kernel: Process vsftpd (pid: 845, stackpage=c375f000)<1>
May 16 22:04:27 tux kernel: Stack: 0805386d c0127de8 00000002 00000580 ffffffff c0127d38 c104e018 ffffffff
May 16 22:04:27 tux kernel: 00001aa3 059b50e0 00000000 c6316050 00001000 c0017f15 c1118b34 00000000
May 16 22:04:27 tux kernel: c30a7160 c375ff6c 00000001 c1453404 40414000 40015000 00000000 c0017d6f
May 16 22:04:27 tux kernel: Call Trace: [zap_pte_range+389/459] [zap_pmd_range+79/112] [zap_page_range+79/176] [exit_mmap+175/320] [mmput+83/336]
May 16 22:04:28 tux kernel: <1>Unable to handle kernel NULL pointer dereference at virtual address 00000000
May 16 22:04:28 tux kernel: 00000000
May 16 22:04:28 tux kernel: Oops: 0000
May 16 22:04:28 tux kernel: CPU: 0
May 16 22:04:28 tux kernel: EIP: 0819:[xeno_con_fini+0/-1073741824] Not tainted
May 16 22:04:28 tux kernel: EFLAGS: 00010203
May 16 22:04:28 tux kernel: eax: 00000010 ebx: c1118b34 ecx: c8773864 edx: c1118b34
May 16 22:04:28 tux kernel: esi: c59b6914 edi: c59b691c ebp: c59b6924 esp: c8779ef4
May 16 22:04:28 tux kernel: ds: 0821 es: 0821 ss: 0821
May 16 22:04:28 tux kernel: Process kupdated (pid: 6, stackpage=c8779000)<1>
May 16 22:04:28 tux kernel: Stack: c0019fce c1118b34 00000000 00000000 00000004 c59b6860 c877385c c8773800
May 16 22:04:28 tux kernel: c003f3c3 c59b6914 00000000 c8778000 c87784f0 0000001f c8779fd0 c002e858
May 16 22:04:28 tux kernel: c8778000 c87784f0 c002ec06 00000000 00000000 00000000 c87784f0 00000000
May 16 22:04:28 tux kernel: Call Trace: [filemap_fdatasync+142/192] [sync_unlocked_inodes+163/320] [sync_old_buffers+8/96] [kupdate+342/464] [ret_from_fork+6/32]
Warning (Oops_read): Code line not seen, dumping what data is available
>>eax; c0127e10 <contig_page_data+1b0/3c0>
>>ebx; c1172e70 <_end+1002294/8ad1424>
>>ecx; c797f734 <_end+780eb58/8ad1424>
>>edx; c0127c60 <contig_page_data+0/3c0>
>>esp; c49b1ec0 <_end+48412e4/8ad1424>
>>eax; c0127e10 <contig_page_data+1b0/3c0>
>>ebx; c115769c <_end+fe6ac0/8ad1424>
>>ecx; c70866d4 <_end+6f15af8/8ad1424>
>>edx; c0127c60 <contig_page_data+0/3c0>
>>esp; c49b1ec0 <_end+48412e4/8ad1424>
>>eax; c0127e10 <contig_page_data+1b0/3c0>
>>ebx; c1118b34 <_end+fa7f58/8ad1424>
>>ecx; c59b6914 <_end+5845d38/8ad1424>
>>edx; c0127c60 <contig_page_data+0/3c0>
>>esp; c375fec0 <_end+35ef2e4/8ad1424>
>>ebx; c1118b34 <_end+fa7f58/8ad1424>
>>ecx; c8773864 <_end+8602c88/8ad1424>
>>edx; c1118b34 <_end+fa7f58/8ad1424>
>>esi; c59b6914 <_end+5845d38/8ad1424>
>>edi; c59b691c <_end+5845d40/8ad1424>
>>ebp; c59b6924 <_end+5845d48/8ad1424>
>>esp; c8779ef4 <_end+8609318/8ad1424>
1 warning issued. Results may not be reliable.
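
For triage, the oops reports above could be tallied by faulting
symbol and process. What follows is only a minimal sketch in Python,
assuming the 2.4-style syslog layout quoted above (an "EIP:" line
followed later by a "Process" line). The regular expressions and the
name summarize_oopses are illustrative assumptions, not part of
ksymoops or any kernel tool.

```python
import re
from collections import Counter

# Assumed patterns for the syslog oops lines quoted in this mail.
# EIP line example:  EIP: 0819:[__free_pages_ok+69/752] Not tainted
EIP_RE = re.compile(r"EIP:\s+[0-9a-f]{4}:\[([^\]+]+)\+(-?\d+)/(-?\d+)\]")
# Process line example:  Process vsftpd (pid: 846, stackpage=c49b1000)
PROC_RE = re.compile(r"Process (\S+) \(pid: (\d+)")


def summarize_oopses(lines):
    """Count (symbol, offset, process) triples across oops reports."""
    counts = Counter()
    pending = None  # (symbol, offset) from the most recent EIP line
    for line in lines:
        m = EIP_RE.search(line)
        if m:
            pending = (m.group(1), int(m.group(2)))
            continue
        m = PROC_RE.search(line)
        if m and pending is not None:
            counts[(pending[0], pending[1], m.group(1))] += 1
            pending = None
    return counts


# A few lines taken from the log in this mail, used as sample input.
sample = [
    "May 16 22:04:23 tux kernel: EIP: 0819:[__free_pages_ok+69/752] Not tainted",
    "May 16 22:04:23 tux kernel: Process vsftpd (pid: 846, stackpage=c49b1000)<1>",
    "May 16 22:04:27 tux kernel: EIP: 0819:[__free_pages_ok+69/752] Not tainted",
    "May 16 22:04:27 tux kernel: Process vsftpd (pid: 847, stackpage=c49b1000)<1>",
    "May 16 22:04:28 tux kernel: EIP: 0819:[xeno_con_fini+0/-1073741824] Not tainted",
    "May 16 22:04:28 tux kernel: Process kupdated (pid: 6, stackpage=c8779000)<1>",
]

for (symbol, offset, process), n in summarize_oopses(sample).most_common():
    print(f"{n}x {symbol}+{offset} in {process}")
```

Fed the full syslog instead of the sample list, the same function
would show at a glance whether the failures keep landing on the same
symbol and offset.
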
Thanks very much for the help. If you want, I can put the actual
kernel, System.map, modules, and so forth on a web site for you.

Cheers,
Oskar
--
Oskar Pearson <oskar@xxxxxxxxxxx>
Qualica Technologies (Pty) Ltd
web: http://www.qualica.com/
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/xen-devel