I just tried another bk pull + make world, and it failed because it couldn't gunzip linux-2.4.26.tar.gz. I tried it manually and sure enough it failed. 'xm list' etc. just segfaulted too.
After a reboot, though, the file was fine again, so the corruption in this case was a read error, not a write error. I'm assuming that if I had done enough I/O to flush any buffers and then tried to gunzip the file again, it probably would have worked.
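One way to tell a bad read from bad data on disk, without relying on gunzip, is to record a digest of the tarball while it is known to be good and compare later reads against it. The following is only a minimal Python sketch; the names `file_digest` and `matches_known_good` are made up for illustration. Note that a repeated read may be served from the buffer cache, so a mismatch that disappears after a reboot points at the cached copy rather than at the disk.

```python
import hashlib

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of the file at `path`, read in 1 MB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            h.update(block)
    return h.hexdigest()

def matches_known_good(path, known_good_digest):
    """True if the file currently reads back with the digest recorded earlier."""
    return file_digest(path) == known_good_digest
```

Typical use would be to call `file_digest("linux-2.4.26.tar.gz")` right after the download, keep the result somewhere safe, and call `matches_known_good` whenever corruption is suspected.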
Just prior to this I had run a little C program which just tries to allocate memory in 1 MB chunks until it is killed. After the reboot I tried the same thing again and it appears to be staying up okay now, unfortunately. It almost seems like I only start to get errors after a day or so of uptime and a fair bit of I/O.
Curiously though, the first time I ran my memory exhausting program, all my xenU domains restarted...
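For anyone wanting to reproduce the memory pressure, the idea can be sketched as follows. This is not the original C program, just a Python approximation of the same loop; `allocate_chunks` and its `max_chunks` safety limit are assumptions added here so the sketch can be run without taking the machine down.

```python
CHUNK = 1024 * 1024  # 1 MB

def allocate_chunks(max_chunks=None):
    """Allocate 1 MB chunks until max_chunks is reached or allocation fails.

    With max_chunks=None the loop only stops on MemoryError, or when the
    kernel kills the process, which is the behaviour described above.
    """
    chunks = []
    try:
        while max_chunks is None or len(chunks) < max_chunks:
            # A non-zero fill writes every byte, so the memory is really used
            # rather than just reserved.
            chunks.append(b"\x01" * CHUNK)
    except MemoryError:
        pass
    return chunks
```

Calling `allocate_chunks(64)` holds roughly 64 MB for as long as the returned list is kept alive. On Linux with overcommit enabled, an unbounded run is more likely to end with the process being killed than with a `MemoryError`.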
Since starting this email I have managed to induce corruption again. I'll reboot and try it again without starting any other domains.
The server is a Compaq ProLiant 1600, 2 x 550 MHz P3 with 768 MB of memory. All the memory is ECC and, up until I acquired it for Linux purposes, it was running as another company's main Windows server, so I wouldn't have suspected a hardware issue.
I'll follow up shortly hopefully with some instructions on inducing the corruption on this server for anyone else to try to see if we have a general problem.
There haven't been any fixes in the last 2 days that would correct this problem, have there? I'm a few days out of date, I think.
James
I'm not in a position to test this, but is it possible that the corruption problem could manifest itself after an out-of-memory condition? When I first noticed the corruption I rebooted as quickly as possible so that it didn't continue, and so didn't check, but it's possible that it ran out of memory first. I guess I could test this but don't really want to do anything to risk corruption any further :)
Speaking of memory, I have 3 domains running currently, 0 + 2U, all declared with 128 MB of memory, but xm list shows this:
Dom  Name      Mem(MB)  CPU  State  Time(s)
0    Domain-0  119      0    r----  1293.0
6    gaia      127      1    -b---  81.9
7    mail2     126      0    -b---  1597.9
'free' under mail2 and gaia shows 128124 as the total amount of memory.
I appreciate that maybe something about dom0 means that it shows something different, but why would the other two report different amounts of memory when they both have the same amount? Both are running identical kernels.
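To put the numbers side by side, here is a quick conversion, assuming 'free' is reporting kilobytes (its default) and 1 MB = 1024 kB. The helper name `kb_to_mb` is made up for this sketch.

```python
KB_PER_MB = 1024

def kb_to_mb(kb):
    """Convert kilobytes to megabytes (1 MB = 1024 kB)."""
    return kb / KB_PER_MB

declared_mb = 128       # memory declared for each domain
free_total_kb = 128124  # total reported by 'free' in both gaia and mail2

free_total_mb = kb_to_mb(free_total_kb)
shortfall_kb = declared_mb * KB_PER_MB - free_total_kb

print(f"'free' total: {free_total_mb:.2f} MB")
print(f"below the declared 128 MB by {shortfall_kb} kB")
```

So both guests see about 125 MB from the inside, roughly 3 MB short of what was declared. A gap of that kind is usually memory the kernel reserves for itself at boot, which the 'free' total leaves out. That does not explain why xm list shows 127 for one domain and 126 for the other.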
James
From: Chris Andrews
Sent: Mon 19/07/2004 8:43 AM
To: xen-devel@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] file corruption!!!
On 18 Jul 2004, at 18:48, Ian Pratt wrote:
>>
>> On 17 Jul 2004, at 21:21, Ian Pratt wrote:
>>
>>> It would be very interesting to hear whether you get the problem
>>> with the 2.6.7 xen linux. It might give us a clue as to whether
>>> the problem is with the backend blk driver or within the domain
>>> itself (the 2.6.7 implementation is completely different).
>>
>> I can certainly give the 2.6.7 guest another try. I did have it
>> booting, but I didn't persist with it long enough to tell if there was
>> fs corruption -- there seemed to be issues loading modules, and when I
>> compiled everything in, I got a gpf when racoon tried to use a PF_KEY
>> socket. I'll try and get some useful dumps for both these problems.
>
> I haven't tried loading modules, but I can't think why it
> wouldn't work (assuming the mechanism is basically the same as
> 2.4).
It's different enough to need new userspace tools. The symptoms of
failure are a GPF, and the userspace process stuck in D (be it insmod
or lsmod). The results of feeding the GPF to ksymoops are below (I
hesitate to say it's actually decoded).
> BTW: what's racoon, and what's a PF_KEY socket?
racoon is the ISAKMP daemon used with the 2.6 kernel's KAME IPSec code.
It uses a PF_KEY socket to communicate with the kernel. I've
successfully used it in a 2.4 guest.
Chris.
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
CPU: 0
EIP: 0061:[<c01471a7>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246 (2.6.7-xenU)
eax: 00000600 ebx: c5400000 ecx: 00000001 edx: 00000600
esi: c0102c54 edi: c5089000 ebp: c5087000 esp: c04b1ec4
ds: 0069 es: 0069 ss: 0069
Stack: c0102c50 c5087000 00002000 c122c6a8 c122c6e0 00000001 c01473f8 c122c6a8
       c5087000 fffffffe c0147491 c5087000 00000000 c5055c19 c5084380 c5015000
       fffffffe c5084380 c014753e c5087000 00000001 c012d9c3 c5087000 c5087000
Call Trace:
c04b1ed0: [<c01473f8>] c04b1ee0: [<c0147491>] c04b1f00: [<c014753e>]
c04b1f0c: [<c012d9c3>] c04b1f38: [<c02da440>] c04b1f94: [<c012dc5d>]
c04b1fb4: [<c010a663>]
Code: 0f 22 e2 0f 20 d9 0f 22 d9 0f 22 e0 83 c4 0c 5b 5e 5f c3 e8
>>EIP; c01471a7 <unmap_vm_area+5d/80> <=====
>>ebx; c5400000 <pg0+50c8000/3bcc5000>
>>esi; c0102c54 <swapper_pg_dir+c54/1000>
>>edi; c5089000 <pg0+4d51000/3bcc5000>
>>ebp; c5087000 <pg0+4d4f000/3bcc5000>
>>esp; c04b1ec4 <pg0+179ec4/3bcc5000>
Code; c01471a7 <unmap_vm_area+5d/80>
00000000 <_EIP>:
Code; c01471a7 <unmap_vm_area+5d/80> <=====
0: 0f 22 e2 mov %edx,%cr4 <=====
Code; c01471aa <unmap_vm_area+60/80>
3: 0f 20 d9 mov %cr3,%ecx
Code; c01471ad <unmap_vm_area+63/80>
6: 0f 22 d9 mov %ecx,%cr3
Code; c01471b0 <unmap_vm_area+66/80>
9: 0f 22 e0 mov %eax,%cr4
Code; c01471b3 <unmap_vm_area+69/80>
c: 83 c4 0c add $0xc,%esp
Code; c01471b6 <unmap_vm_area+6c/80>
f: 5b pop %ebx
Code; c01471b7 <unmap_vm_area+6d/80>
10: 5e pop %esi
Code; c01471b8 <unmap_vm_area+6e/80>
11: 5f pop %edi
Code; c01471b9 <unmap_vm_area+6f/80>
12: c3 ret
Code; c01471ba <unmap_vm_area+70/80>
13: e8 00 00 00 00 call 18 <_EIP+0x18>