We have recently experienced a few issues on Xen 3.3.1, and
we would appreciate it if one of you could shed some light on them.
First of all, our system configuration is:
A dual Xeon 2.5GHz with 16GB (8 cores)
Xen 3.3.1 from latest xensources distributed with Linux
Dom0 is CentOS 5.2, upgraded a few days ago to CentOS
There are 6 HVM domUs running: 5, with sporadic issues
(see below), are Fedora-10 x86_64, and 1 domU (no issue so far) is a Windows
The 5 Fedora-10 domUs have the latest package upgrades,
including a Linux kernel 184.108.40.206-170.2.35.fc10.x86_64. They have 2 vCPUs each,
between 512MB and 1GB of memory, and 30GB of disk space stored on an internal
Dom0's vCPU is pinned to core 0 (dom0_vcpus_pin)
DomUs are visibly sharing cores 1 to 7 (xm
vcpu-list), although no configuration was done to map them to specific CPUs/cores
Now here are our observations:
The Fedora-10 domUs described above freeze randomly and partially (see below)
after running for some hours.
If there is a pre-existing ssh session on a hung domU,
some commands such as 'ls', 'ps', 'tail -f <file>' and 'free' can be executed,
while commands such as 'top' and 'vmstat' will hang OR sometimes no command at all
xentop shows 0% activity on a hung domU, although I
once observed 100% on another hung one
There is nothing significant in
domU:/var/log/messages and nothing in dom0:/var/log/xen/qemu-dm-… either
Nagios running on dom0 does not really pick this
condition up, as the hung domUs still answer ping and the
Nagios ssh check; note that a new ssh login to a hung domU does not work, although
the basic Nagios TCP check on port 22 still gets an answer
Their time is completely off (see next observation
below) with or without ntpd running
I had the occasion to run 'free' on a few of them, and it
appears that they had enough free memory, i.e. they were not swapping at all
I don't want to speculate on the potential root cause; nevertheless, what would be
the most effective next troubleshooting steps?
Force a domU system dump? And then?
Deep dive into the dom0 logs, although a quick
look through them turned up nothing?
Disable most of the processes on one of these
domUs to identify whether a user process can cause this issue (may be very time
Set the run-level to 3 instead of 5?
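For the "force a domU system dump" option, a minimal sketch run from dom0 could look like the following. It only wraps `xm dump-core`; the dump directory, the `DRY_RUN` switch and the domain name used in the usage note are assumptions for illustration, not part of our setup.

```shell
#!/bin/sh
# Sketch: capture a core dump of a hung guest from dom0 with "xm dump-core".
# DUMP_DIR is a hypothetical location; pick one with enough free space,
# since the dump is roughly the size of the guest's memory.
DUMP_DIR="${DUMP_DIR:-/var/xen/dump}"

dump_domu() {
    # $1 = domain name as shown by "xm list"
    domain="$1"
    out="$DUMP_DIR/$domain-$(date +%Y%m%d-%H%M%S).core"
    if [ -n "$DRY_RUN" ]; then
        # Only print the command so it can be reviewed before running it.
        echo "xm dump-core $domain $out"
    else
        xm dump-core "$domain" "$out"
    fi
}
```

Calling `DRY_RUN=1 dump_domu fedora10-a` prints the command without touching the guest. Whether the resulting file can be analysed for an HVM guest with the usual crash tooling is something that would still need to be verified.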
The 5 Fedora-10 domUs are not keeping their time in sync
We have read various pages
concerning time management for a Linux domU, but we have not yet found anything
conclusive and/or have not been able to set this up properly. The facts are:
Our dom0 runs ntpd and is perfectly synchronized with
external public NTP sources
We initially tried running ntpd on the Fedora-10 domUs,
configured with external public sources, which proved unsuccessful; the
time is usually off by a few minutes
We tried without ntpd; this should be the proper
configuration according to our reading, as the domUs' hardware clock should
sync with their dom0's hardware clock, but alas this was also unsuccessful. In this case, the
domUs end up lagging significantly behind their dom0's time
We have read on a few occasions that there is a parameter
to set with echo 1 > /proc/sys/xen/independent_wallclock in order to
run ntpd on a domU, but /proc/sys/xen doesn't exist on these Fedora-10 domUs.
Is this expected behavior? Should we assume the independent_wallclock setting
is only for PV domUs?
Note that one of the domUs is a 32-bit Windows 2003
server and is perfectly on time, i.e. in sync with its dom0. It runs the
default Windows time service, with no ntpd installed
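To make the independent_wallclock point concrete, here is a small sketch of the check we mean. It enables the setting when the file exists and reports its absence otherwise. The function name and the optional path argument are our own, the latter only there so the function can be tried against a scratch file.

```shell
#!/bin/sh
# Sketch: enable independent_wallclock if the kernel exposes it,
# otherwise say so instead of failing on a missing /proc entry.
enable_independent_wallclock() {
    # Optional $1 overrides the path (hypothetical convenience for trying this out).
    knob="${1:-/proc/sys/xen/independent_wallclock}"
    if [ -w "$knob" ]; then
        echo 1 > "$knob"
        echo "set: $knob = $(cat "$knob")"
    else
        echo "missing: $knob (not exposed by this kernel)"
        return 1
    fi
}
```

On our Fedora-10 domUs this takes the "missing" branch, which is exactly the situation described above.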
The 5 Fedora-10 domUs were installed as HVM domUs, but their kernels seem to see them as
PV. This may be a misunderstanding on our side; however, dmesg on the 5
Fedora-10 domUs shows the message:
“Booting paravirtualized kernel on
We just installed an HVM CentOS 5.3 domU, and this time the
kernel boot message "Booting …" does not appear.
Can we therefore conclude that the presumed HVM Fedora-10
domUs are in fact PV domUs?
Should /proc/sys/xen be present on a PV domU only, or on any
type of domU?
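For anyone wanting to compare guests, the sketch below gathers the hints we looked at in one place. It prints the full "Booting paravirtualized kernel" line (the kernel names a platform after "on", so that part is worth reading), and reports whether /proc/sys/xen and /sys/hypervisor/type are present. The function name, the saved dmesg file and the optional root prefix are assumptions for illustration; /sys/hypervisor/type in particular may simply not exist on a given guest.

```shell
#!/bin/sh
# Sketch: collect hints about how a guest kernel sees its environment.
guest_hints() {
    dmesg_file="$1"   # saved output of "dmesg" (hypothetical file name)
    root="${2:-}"     # optional prefix for /proc and /sys, for trying this out
    # Print the whole boot message, including the platform named after "on".
    grep -i "Booting paravirtualized kernel" "$dmesg_file" \
        || echo "no paravirt boot message"
    if [ -d "$root/proc/sys/xen" ]; then
        echo "/proc/sys/xen: present"
    else
        echo "/proc/sys/xen: absent"
    fi
    if [ -r "$root/sys/hypervisor/type" ]; then
        echo "hypervisor type: $(cat "$root/sys/hypervisor/type")"
    else
        echo "hypervisor type: not reported"
    fi
}
```

Typical use inside a guest would be `dmesg > /tmp/dmesg.txt; guest_hints /tmp/dmesg.txt`.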