xen-devel

[Xen-devel] time-related problems in recent Xen

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] time-related problems in recent Xen
From: Osma Suominen <osma.suominen@xxxxxxxxxxxx>
Date: Fri, 27 May 2005 14:59:23 +0300
Hello,

I have seen the same problem that Niels Toedmann described a few days ago
(see http://thread.gmane.org/gmane.comp.emulators.xen.devel/7054 ).
It appears that some time-related system calls break under high load.

I was able to make a reproducible test case with the 2.0.5 live CD.
Unfortunately I haven't yet been able to test 2.0.6 since I don't have a spare
machine to install that on. I'll wait for the live CD which will hopefully be
released soon.

The easiest way to trigger this is to run a CPU-intensive program in the
background. I've used SETI@Home, but I've seen the same problem during other
high loads as well (however, a simple Python "while 1: pass" isn't enough).
This often breaks the following programs:

* Mailman (IOError in time.sleep())
* Zope (misc time-related errors, or just hangs)
* ncftp (works, but download speed is always reported as 0KB/sec)
* wget (crashes with "acalc_rate: Assertion `msecs >= 0' failed")
* apt-get (crashes with time-related Perl error messages)
* ssh, ssh-keygen (refuses to start, "PRNG not seeded" error)
* top (displays "nan" in the %CPU column)
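
Several of these symptoms (the wget assertion on `msecs >= 0`, the 0KB/sec
rate in ncftp, "nan" in top) would be consistent with the wall clock
occasionally stepping backwards. That is only a guess, not something I have
confirmed. A minimal sketch for checking it, written for a current Python 3
rather than the Python shipped on the live CD; the function names are made up
for this example:

```python
import time


def find_backward_steps(samples):
    """Return (index, previous, current) for every sample below its predecessor."""
    steps = []
    for i in range(1, len(samples)):
        if samples[i] < samples[i - 1]:
            steps.append((i, samples[i - 1], samples[i]))
    return steps


def sample_clock(count=100000, clock=time.time):
    """Read the wall clock `count` times in a tight loop."""
    return [clock() for _ in range(count)]


if __name__ == "__main__":
    # Run this while the CPU-intensive program is active in the background.
    steps = find_backward_steps(sample_clock())
    print("backward steps seen:", len(steps))
    for index, previous, current in steps[:10]:
        print("  sample %d: %.6f -> %.6f" % (index, previous, current))
```

If the count stays at zero under load, the backwards-clock guess is probably
wrong and the cause lies elsewhere.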

Try this:
- boot the 2.0.5 live CD in text mode
- ifup eth0 (assuming you have a DHCP server) in domain 0
  (no need to boot other domains, but you can reproduce this in them as well)
- wget ftp://alien.ssl.berkeley.edu/pub/setiathome-3.08.i686-pc-linux-gnu.tar
- untar and run setiathome (in the background)
- now try some of these:
   - wget the same file again -> crashes
   - in python, run the following:
      import time
      time.sleep(1) -> IOError (not every time though)
   - in the shell, run "sleep 1" -> sleeps forever 
   - try to ssh to another machine -> "PRNG not seeded"
- kill setiathome, the programs start working again (not always though)
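
The Python step above can be turned into a small repeatable check. This is
only a sketch, assuming a current Python 3 (where IOError is an alias of
OSError); `check_sleep` and its parameters are names invented for this
example. It cannot detect the "sleeps forever" case on its own, so run it
under a timeout if you want to catch that too.

```python
import time


def check_sleep(duration=1.0, tolerance=0.5, sleep=time.sleep, clock=time.time):
    """Sleep once and return (ok, elapsed, error).

    ok is False if the sleep raised an error or if the measured elapsed
    time differs from `duration` by more than `tolerance` seconds.
    """
    start = clock()
    try:
        sleep(duration)
    except OSError as exc:  # IOError on older Pythons
        return (False, clock() - start, exc)
    elapsed = clock() - start
    ok = abs(elapsed - duration) <= tolerance
    return (ok, elapsed, None)


if __name__ == "__main__":
    # Run this while setiathome is active in the background.
    for attempt in range(3):
        ok, elapsed, error = check_sleep()
        print("attempt %d: ok=%s elapsed=%.3f error=%r"
              % (attempt, ok, elapsed, error))
```

A negative or wildly wrong `elapsed` value would point at the clock, while an
error with a sane `elapsed` would point at the sleep call itself.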

There are no new messages in dmesg or /var/log/* after what's caused by ifup.
However, after starting up setiathome, the process "python /usr/sbin/xend
start" starts eating lots of CPU, and goes on doing that even if I kill
setiathome. I don't know whether this is normal behavior or not. According to
top it takes 99.7% CPU, but ps reports a more modest 15% to 25%.

I'm still new to Xen, but I'll answer any questions you may have. I'm also
planning to retest with the 2.0.6 live CD when it is released.

As I said, SETI is an easy program to test with, but I've occasionally seen
the problem without SETI as well, on a server running only official Debian
binaries (mysql, exim, apache2, spamd, stunnel, php4).

Here's my /proc/cpuinfo on the liveCD test machine:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 6
cpu MHz         : 733.372
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : yes
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 mmx fxsr sse
bogomips        : 1464.72

However, I've seen this on some server machines as well.
Here's the cpuinfo for one of them:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping        : 1
cpu MHz         : 2995.006
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : yes
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor ds_cpl cid xtpr
bogomips        : 5976.88

-Osma

-- 
*** Osma Suominen / MB Concert Ky ***
