WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

xen-devel

Re: [Xen-devel] issues getting more than 16M ram to be used without oopsing

To: "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx>
Subject: Re: [Xen-devel] issues getting more than 16M ram to be used without oopsing. 1.2 and 1.3-unstable
From: "Brian Wolfe" <brianw@xxxxxxxxxxxx>
Date: Sun, 4 Apr 2004 14:23:37 -0500 (CDT)
Cc: xen-devel@xxxxxxxxxxxxxxxxxxxxx
Delivery-date: Sun, 04 Apr 2004 20:25:31 +0100
Envelope-to: steven.hand@xxxxxxxxxxxx
Importance: Normal
In-reply-to: <E1BA2cs-0006OY-00@xxxxxxxxxxxxxxxxx>
List-archive: <http://sourceforge.net/mailarchive/forum.php?forum=xen-devel>
List-help: <mailto:xen-devel-request@lists.sourceforge.net?subject=help>
List-id: List for Xen developers <xen-devel.lists.sourceforge.net>
List-post: <mailto:xen-devel@lists.sourceforge.net>
List-subscribe: <https://lists.sourceforge.net/lists/listinfo/xen-devel>, <mailto:xen-devel-request@lists.sourceforge.net?subject=subscribe>
List-unsubscribe: <https://lists.sourceforge.net/lists/listinfo/xen-devel>, <mailto:xen-devel-request@lists.sourceforge.net?subject=unsubscribe>
References: Your message of "Sat, 03 Apr 2004 15:53:51 MDT." <36009.216.166.50.35.1081029231.squirrel@xxxxxxxxxxxxx> <E1BA2cs-0006OY-00@xxxxxxxxxxxxxxxxx>
Sender: xen-devel-admin@xxxxxxxxxxxxxxxxxxxxx
User-agent: SquirrelMail/1.5.0

Bad memory was my first thought too, the first time it failed to run. It
certainly LOOKS like a case of bad RAM. However, the unexpected IO-APIC
message at boot has come with memory faults on other mainboards I've used
in the past year or so with Athlon XP 1900+ and 2000+ CPUs. That, and the
hardware's history of use, make me think that the RAM is OK.

I'll go ahead and run memtest on it for a few hours to see if it finds
anything, and report back with the results once I have finished the
hardware testing. I will also try swapping its memory with the RAM in my
new workstation and see whether that makes any difference.

Workstation RAM: 2 x 512MB Micron PC2100, in use right now under 2.6.4
(Linux enigma 2.6.4-1-k7 #1 Sat Mar 13 22:44:25 EST 2004 i686 GNU/Linux),
Debian unstable.
Xen server RAM: 1 x 512MB Micron PC2100, 2 x 256MB Micron PC2100.

I hope it is as simple as the system not liking the memory that is in it.
I have tried booting both from IDE and from nfsroot with the IDE turned
off. There is one item I did notice during the Xen 1.2 bootup with the
Debian Xen package.

Note the "An unexpected IO-APIC was found" line, printed while it
initializes the APIC hardware. I don't know whether this could be the
source of the issue. This message has caused me grief on other systems,
with symptoms similar to these, until I custom-compiled a newer 2.6 kernel
that supported the APIC natively.

The stock Debian 2.4 kernel also seems to have issues with the IO-APIC,
which is why I compiled custom 2.6 kernels; those found the APIC and used
it without trouble. This is also what led me to believe that the VIA
chipset may be behaving in a way that Xen isn't expecting, and that the
Linux 2.4 and 2.6 kernels happen to avoid triggering.

Maybe Xen pokes around in different ways or areas than the Linux kernel
does, and has found some bad RAM and/or an APIC flaw that I just never ran
into with Linux by sheer chance.

The following is a full dump of the system booting under Xen 1.2, as
compiled by Adam (doogie) for Debian unstable, with 18m for domain 0
(dom0_mem=18000). GRUB, Xen and Xenolinux are all loaded from hda1, and
the root filesystem is mounted from the NFS fileserver. For some reason
dom0_mem=18000 works, as does anything below 16384, but memory sizes
between 16M and 18M fail, as does anything over 18M.

BTW, just smack me if I provide too much info, or not the info y'all
need. :)

------- bootup dump of Debian xen.deb, 1.2 version -----

root  (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
kernel /xen.gz dom0_mem=18000 ser_baud=115200 noht watchdog
   [Multiboot-elf, <0x100000:0x11a138:0x0>, <0x21c000:0x1f4f8:0x29528>, shtab=0x265280, entry=0x100000]
module /xenolinux.gz root=/dev/nfs nfsroot=10.10.10.161:/xen/dom0 rw ip=10.10.10.160::10.10.10.1:255.255.255.0:vhost1:eth0:off console=xencons0
   [Multiboot-module @ 0x286000, 0x12f8cc bytes]

 __  __            _   ____
 \ \/ /___ _ __   / | |___ \
  \  // _ \ '_ \  | |   __) |
  /  \  __/ | | | | |_ / __/
 /_/\_\___|_| |_| |_(_)_____|

 http://www.cl.cam.ac.uk/netos/xen
 University of Cambridge Computer Laboratory

 Xen version 1.2 (root@xxxxxxxxxxxxxxxxxxxxx) (gcc version 3.3.3 (Debian)) Thu Mar 4 12:56:20 CST 2004

Initialised all memory on a 1022MB machine
Reading BIOS drive-info tables at 0xf95f0 and 0xfe819
CPU0: Before vendor init, caps: 0383fbff c1c3fbff 00000000, vendor = 2
CPU caps: 0383fbff c1c3fbff 00000000 00000000
found SMP MP-table at 000f60c0
Memory Reservation 0xf60c0, 4096 bytes
Memory Reservation 0xf0c00, 4096 bytes
ACPI: Searched entire block, no RSDP was found.
ACPI: RSDP located at physical address fc4f7ac0
RSD PTR  v0 [KT600 ]
__va_range(0x3fef3000, 0x68): idx=8 mapped at ffff6000
ACPI table found: RSDT v1 [KT600  AWRDACPI 16944.11825]
__va_range(0x3fef3040, 0x24): idx=8 mapped at ffff6000
__va_range(0x3fef3040, 0x74): idx=8 mapped at ffff6000
ACPI table found: FACP v1 [KT600  AWRDACPI 16944.11825]
__va_range(0x3fef7a00, 0x24): idx=8 mapped at ffff6000
__va_range(0x3fef7a00, 0x5a): idx=8 mapped at ffff6000
ACPI table found: APIC v1 [KT600  AWRDACPI 16944.11825]
__va_range(0x3fef7a00, 0x5a): idx=8 mapped at ffff6000
LAPIC (acpi_id[0x0000] id[0x0] enabled[1])
CPU 0 (0x0000) enabledProcessor #0 Pentium(tm) Pro APIC version 16
IOAPIC (id[0x2] address[0xfec00000] global_irq_base[0x0])
INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x0] trigger[0x0])
INT_SRC_OVR (bus[0] irq[0x9] global_irq[0x9] polarity[0x0] trigger[0x0])
LAPIC_NMI (acpi_id[0x0000] polarity[0x1] trigger[0x1] lint[0x1])
1 CPUs total
Local APIC address fee00000
Enabling the CPU's according to the ACPI table
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID: OEM00000 Product ID: PROD00000000 APIC at: 0xFEE00000
Processor #0 Pentium(tm) Pro APIC version 17
I/O APIC #2 Version 17 at 0xFEC00000.
Enabling APIC mode: Flat.Using 1 I/O APICs
Processors: 2
Initialising domains
Initialising schedulers
Initializing CPU#0
Detected 1852.069 MHz processor.
CPU0: Before vendor init, caps: 0383fbff c1c3fbff 00000000, vendor = 2
CPU caps: 0383fbff c1c3fbff 00000000 00000000
CPU0 booted
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Error: only one processor found.
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
init IO_APIC IRQs
 IO-APIC (apicid-pin) 2-0, 2-10, 2-11, 2-16, 2-18, 2-19, 2-20, 2-21, 2-22 not connected.
..TIMER: vector=0x41 pin1=2 pin2=0
number of MP IRQ sources: 16.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................

IO APIC #2......
.... register #00: 02000000
.......    : physical APIC id: 02
.... register #01: 00178003
.......     : max redirection entries: 0017
.......     : PRQ implemented: 1
.......     : IO APIC version: 0003
An unexpected IO-APIC was found. If this kernel release is less than
three months old please report this to linux-smp@xxxxxxxxxxxxxxx
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
 00 000 00  1    0    0   0   0    0    0    00
 01 0FF 0F  0    0    0   0   0    1    1    49
 02 001 01  0    0    0   0   0    1    1    41
 03 0FF 0F  0    0    0   0   0    1    1    51
 04 0FF 0F  0    0    0   0   0    1    1    59
 05 0FF 0F  0    0    0   0   0    1    1    61
 06 0FF 0F  0    0    0   0   0    1    1    69
 07 0FF 0F  0    0    0   0   0    1    1    71
 08 0FF 0F  0    0    0   0   0    1    1    79
 09 0FF 0F  0    0    0   0   0    1    1    81
 0a 000 00  1    0    0   0   0    0    0    00
 0b 000 00  1    0    0   0   0    0    0    00
 0c 0FF 0F  0    0    0   0   0    1    1    89
 0d 0FF 0F  0    0    0   0   0    1    1    91
 0e 0FF 0F  0    0    0   0   0    1    1    99
 0f 0FF 0F  0    0    0   0   0    1    1    A1
 10 000 00  1    0    0   0   0    0    0    00
 11 0FF 0F  1    1    0   1   0    1    1    A9
 12 000 00  1    0    0   0   0    0    0    00
 13 000 00  1    0    0   0   0    0    0    00
 14 000 00  1    0    0   0   0    0    0    00
 15 000 00  1    0    0   0   0    0    0    00
 16 000 00  1    0    0   0   0    0    0    00
 17 0FF 0F  1    1    0   1   0    1    1    B1
IRQ to pin mappings:
IRQ0 -> 0:2
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ17 -> 0:17
IRQ23 -> 0:23
.................................... done.
Using local APIC timer interrupts.
Calibrating APIC timer for CPU0...
..... CPU speed is 1852.1073 MHz.
..... Bus speed is 336.7467 MHz.
..... bus_scale = 0x000158E5
ACT: Initialising Accurate timers
Time init:
.... System Time: 11615885ns
.... cpu_freq:    00000000:6E645690
.... scale:       00000001:14728FC3
.... Wall Clock:  1081028483s 0us
Start schedulers
Testing NMI watchdog --- CPU#0 okay.
PCI: PCI BIOS revision 2.10 entry at 0xfb8d0, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Using IRQ router VIA [1106/3177] at 00:11.0
PCI->APIC IRQ transform: (B0,I9,P0) -> 17
PCI->APIC IRQ transform: (B0,I18,P0) -> 23
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
00:09.0: 3Com PCI 3c595 Vortex 100baseTx at 0xd000. Vers LK1.1.16
00:09.0: Overriding PCI latency timer (CFLT) setting of 32, new value is 248.
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller on PCI bus 00 dev 89
VP_IDE: detected chipset, but driver not compiled in!
PCI: No IRQ known for interrupt pin A of device 00:11.1. Probably buggy MP table.
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xe000-0xe007, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xe008-0xe00f, BIOS settings: hdc:pio, hdd:pio
hda: ST360021A, ATA DISK drive
hdb: TOSHIBA CD-ROM XM-5602B, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdb: ATAPI 8X CD-ROM drive, 256kB Cache
Uniform CD-ROM driver Revision: 3.12
hda: 117231408 sectors (60022 MB) w/2048KiB Cache, CHS=7297/255/63 PIO (slow!)
SCSI subsystem driver Revision: 1.00
Red Hat/Adaptec aacraid driver (1.1.2 Mar  4 2004 12:55:28)
Device eth0 opened and ready for use.
DOM0: Guest OS virtual load address is c0000000
DOM0: xen_console_init
DOM0: Linux version 2.4.25-xeno-p2 (adam@gradall) (gcc version 3.3.3 (Debian)) #1 Thu Mar 4 12:49:07 CST 2004
DOM0: On node 0 totalpages: 4500
DOM0: zone(0): 4096 pages.
DOM0: zone(1): 404 pages.
DOM0: zone(2): 0 pages.
DOM0: Kernel command line: /xenolinux.gz root=/dev/nfs nfsroot=10.10.10.161:/xen/dom0 rw ip=10.10.10.160::10.10.10.1:255.255.255.0:vhost1:eth0:off console=xencons0
DOM0: Initializing CPU#0
DOM0: Xen reported: 1852.069 MHz processor.
DOM0: Console: colour VGA+ 80x25
DOM0: Calibrating delay loop... 14837.35 BogoMIPS
DOM0: Memory: 16348k/18000k available (978k kernel code, 1652k reserved, 166k data, 52k init, 0k highmem)
DOM0: Dentry cache hash table entries: 4096 (order: 3, 32768 bytes)
DOM0: Inode cache hash table entries: 2048 (order: 2, 16384 bytes)
DOM0: Mount cache hash table entries: 512 (order: 0, 4096 bytes)
DOM0: Buffer cache hash table entries: 1024 (order: 0, 4096 bytes)
DOM0: Page-cache hash table entries: 8192 (order: 3, 32768 bytes)
DOM0: CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
DOM0: CPU: L2 Cache: 512K (64 bytes/line)
DOM0: CPU: AMD Athlon(tm) XP 2500+ stepping 00
DOM0: POSIX conformance testing by UNIFIX
DOM0: Linux NET4.0 for Linux 2.4
DOM0: Based upon Swansea University Computer Society NET3.039
DOM0: Initializing RT netlink socket
DOM0: Starting kswapd
DOM0: VFS: Disk quotas vdquot_6.5.1
DOM0: register_swap_method: method blkdev
DOM0: register_swap_method: method blkdev file
DOM0: register_swap_method: method nfs file
DOM0: Xeno console successfully installed
DOM0: Successfully installed virtual firewall/router interface
DOM0: Starting Xeno Balloon driver
DOM0: pty: 256 Unix98 ptys configured
DOM0: Serial driver version 5.05c (2001-07-08) with no serial options enabled
DOM0: ttyS00 at 0x03f8 (irq = 4) is a 16550A
DOM0: Real Time Clock Driver v1.10f
DOM0: Could not allocate block update interrupt
DOM0: Initializing Cryptographic API
DOM0: NET4: Linux TCP/IP 1.0 for NET4.0
DOM0: IP Protocols: ICMP, UDP, TCP, IGMP
DOM0: IP: routing cache hash table of 512 buckets, 4Kbytes
DOM0: TCP: Hash tables configured (established 1024 bind 2048)
DOM0: IP-Config: Complete:
DOM0:       device=eth0, addr=10.10.10.160, mask=255.255.255.0, gw=10.10.10.1,
DOM0:      host=vhost1, domain=, nis-domain=(none),
DOM0:      bootserver=255.255.255.255, rootserver=10.10.10.161, rootpath=
DOM0: NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
DOM0: Looking up port of RPC 100003/2 on 10.10.10.161
DOM0: Looking up port of RPC 100005/1 on 10.10.10.161
DOM0: VFS: Mounted root (nfs filesystem).
DOM0: Freeing unused kernel memory: 52k freed
DOM0: INIT: version 2.85 booting
DOM0: Hello World!
DOM0: mount: can't find / in /etc/fstab or /etc/mtab
DOM0: Hello World.
DOM0: Loading /etc/console/boottime.kmap.gz
DOM0: Activating swap.
DOM0: Calculating module dependencies... done.
DOM0: Loading modules: via-rhine modprobe: Can't locate module via-rhine
DOM0: 3c59x modprobe: Can't locate module 3c59x
DOM0:
DOM0: Checking all file systems...
DOM0: fsck 1.35 (28-Feb-2004)
DOM0: Setting kernel variables..
DOM0: Mounting local filesystems...
DOM0:  /var/run /var/lock.
DOM0: Running 0dns-down to make sure resolv.conf is ok...done.
DOM0: Cleaning: /etc/network/ifstate.
DOM0: Setting up IP spoofing protection: rp_filter.
DOM0: Configuring network interfaces...SIOCADDRT: File exists
DOM0: done.
DOM0: /etc/rcS.d/S41hostname.dhcp: line 1: host: command not found
DOM0: Starting portmap daemon: portmap.
DOM0: Loading the saved-state of the serial devices...
DOM0: /dev/ttyS0 at 0x03f8 (irq = 4) is a 16550A
DOM0: Initializing random number generator...done.
DOM0: Recovering nvi editor sessions... done.
DOM0: /etc/init.d/rcS: line 54: /etc/rcS.d/S70xfree86-common: Permission denied
DOM0: INIT: Entering runlevel: 2
DOM0: Starting system log daemon: syslogd.
DOM0: Starting kernel log daemon: klogd.
DOM0: Starting portmap daemon: portmap.
DOM0: Starting MTA: 2004-04-03 15:41:35 Failed to open configuration file /etc/exim/exim.conf
DOM0: Starting internet superserver: inetd.
DOM0: Starting OpenBSD Secure Shell server: sshd.
DOM0: Starting NFS common utilities: statd.
DOM0: Starting deferred execution scheduler: atd.
DOM0: Starting periodic command scheduler: cron.
DOM0: INIT: no more processes left in this runlevel
DOM0: INIT: Switching to runlevel: 6
DOM0: Stopping periodic command scheduler: cron.
DOM0: Stopping MTA: No /usr/lib/exim/exim3 found running; none killed.
DOM0: exim.
DOM0: Stopping internet superserver: inetd.
DOM0: Stopping OpenBSD Secure Shell server: sshd.
DOM0: Stopping NFS common utilities: statd.
DOM0: Stopping deferred execution scheduler: atd.
DOM0: Stopping kernel log daemon: klogd.
DOM0: Stopping system log daemon: syslogd.
DOM0: Sending all processes the TERM signal...done.
DOM0: Sending all processes the KILL signal...done.
DOM0: Saving random seed...done.
DOM0: Unmounting remote and non-toplevel virtual filesystems...done.
DOM0: NOT deconfiguring network interfaces: / is an NFS mount
DOM0: Deactivating swap...done.
DOM0: Unmounting local filesystems...done.
DOM0: Rebooting... Restarting system.
Domain 0 killed: rebooting machine!

----------------------

Here is some of the machine's work history, which should explain why I'm
so confident that the hardware is OK.

Before I retasked this machine as a Xen server, it was my workstation. I
have run a variety of kernels on it in the past and pushed the system's
memory consumption quite hard (it used to have a triple-head Xinerama
setup on it).

When it was my workstation I ran 2.4.21 through 2.4.24, 2.6.0-test9, and
2.6.0 through 2.6.4 on it (all custom compiles; the 2.6 builds used
highmem 2G). I have pushed the memory consumption to the point of causing
heavy swapping (700M+ of swap used, < 100M cache used, < 16M free memory).

Before emailing the Xen list, I swapped the RAM sticks with those from
another server that has run flawlessly for the past 12 months as a
production mail server (50,000 email accounts, 400+ simultaneous receiving
sendmail processes).

When the mainboard, CPU, disk, and power supply were my workstation, I did
MANY intensive compiling sessions (the best way to find bad RAM besides
badram and memtest), sometimes several compiles in parallel.


-- 
Brian Wolfe           | Phone 1-(214)-764-1204
President,            | Email  brianw@xxxxxxxxxxxx
TerraBox.com Inc.     |


pub  1024D/73C5A2DF 2003-03-18 Brian Wolfe <brianw@xxxxxxxxxxxx>
     Key fingerprint = 050E 5E3C CF65 4C1E A183  F48F E3E3 5B22 73C5 A2DF
sub  1024g/BB87A3DD 2003-03-18


Keir Fraser said:
>
>> Now, this machine has been used for approx. 5 months without any
>> glitches or oopses. So I'm 99.9999% certain that the hardware is good.
>>
>> I'm using an NFS root since the IDE is only in PIO mode (and to
>> eliminate its use other than to boot the kernels).
>>
>> Any insights?
>>
>> If necessary for debugging, I can provide access to the hardware via
>> serial console. :)
>>
>> Thanks for any help y'all can give!
>
> The crashes look quite random -- I don't think this is a bug in the
> core of Xen. The two most likely possibilities are that you have duff
> memory or that a misconfigured device is trashing memory. I definitely
> wouldn't discount the former, even though native x86 Linux has been
> running okay -- crashes can be very sensitive to memory layout.
>
> It might be worth running a few rounds of memtest on the machine, or
> swapping the memory, or trying to boot Xen on another identical box.
>
> If that doesn't cure it then try swapping out or disabling
> hardware. For example, boot off local disc and disable networking
> ('ifname=dummy'). Since the cause is most likely hardware-related, the
> best approach is to isolate the problem hardware.
>
>  -- Keir
>
> PS. If you build your own Xen/Xenolinux then keep the build trees
> around (or at least, for Xenolinux, the 'vmlinux' file). I can't find
> suitable image files for the tarballs on the Xen website, and without
> them it is very difficult to determine anything from crash dumps.
>



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/xen-devel