Re: [Xen-users] RE: Fedora 10-VM hung, Time issue and /dev/proc/sys/xen

Subject: Re: [Xen-users] RE: Fedora 10-VM hung, Time issue and /dev/proc/sys/xen missing, HVM VMs recognized as PV
Date: Sun, 12 Apr 2009 11:48:50 -0400
Two approaches:

1. Try to get to root cause of problem
2. Work around the problem because you're not paid to do (1)

Build a VM with the same kernel version as Dom0


1. Install sar (sysstat) on one of your DomUs, collecting data at 1-second intervals
2. If you don't see much activity on this VM, run some Linux compiles or the Volano benchmark to keep the VM busy
3. Look at the sar data with kSar. What happens when you get this "hang"?
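
The steps above could be scripted roughly as follows. This is only a sketch, assuming sysstat is already installed in the DomU; the helper names, the file paths and the one-hour default are my own placeholders, not anything from your setup. It records binary samples with sar -o and then dumps them as text with sar -A -f, which is the form kSar can load.

```python
import os
import subprocess


def sar_record_cmd(outfile, interval=1, count=3600):
    """Build the sar command that samples every `interval` seconds,
    `count` times, saving binary data to `outfile`."""
    return ["sar", "-o", outfile, str(interval), str(count)]


def sar_export_cmd(infile):
    """Build the sar command that prints all recorded counters as text."""
    return ["sar", "-A", "-f", infile]


def record_and_export(binfile, textfile, interval=1, count=3600):
    """Record samples, then write a text dump for kSar (hypothetical helper)."""
    # Blocks for roughly interval * count seconds while sar is sampling.
    subprocess.run(sar_record_cmd(binfile, interval, count),
                   stdout=subprocess.DEVNULL, check=True)
    # LC_ALL=C keeps the timestamp format predictable for whatever parses it.
    env = dict(os.environ, LC_ALL="C")
    with open(textfile, "w") as out:
        subprocess.run(sar_export_cmd(binfile), stdout=out, env=env, check=True)
```

For example, record_and_export("/tmp/domu-sar.bin", "/tmp/domu-sar.txt") would sample for about an hour. If the VM hangs before the run finishes, the binary file should still hold the samples written up to that point, and the export step can be run on it afterwards.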

 Can you see if you have a /proc/xen or /sys/xen?
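
A quick way to check this on each DomU, as a sketch: the list of paths is just the ones mentioned in this thread, and the function name and the root parameter (there only so it can be tried against a scratch directory) are my own invention.

```python
import os

# Directories mentioned in this thread; extend as needed.
CANDIDATES = ("/proc/xen", "/sys/xen", "/proc/sys/xen")


def xen_paths_present(root="/", candidates=CANDIDATES):
    """Return {path: True/False} depending on whether each directory exists."""
    found = {}
    for path in candidates:
        found[path] = os.path.isdir(os.path.join(root, path.lstrip("/")))
    return found


if __name__ == "__main__":
    for path, present in xen_paths_present().items():
        print(f"{path}: {'present' if present else 'missing'}")
```

Running it on one of the Fedora-10 DomUs and on the CentOS 5.3 DomU would make it easy to compare the two.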


On Apr 11, 2009, at 7:38 PM, Lionel Raynaud wrote:

Xen Users,
We have recently experienced a few issues on Xen 3.3.1, and we would appreciate it if one of you could shed some light on them.
First of all, our system configuration is:
-          a dual Xeon 2.5GHz with 16GB (8 cores)
-          Xen 3.3.1 from latest xensources distributed with Linux Kernel
-          Dom0 is a CentOS 5.2, upgraded a few days ago to CentOS 5.3
-          There are 6 HVM domUs running: 5 with sporadic issues (see below) run Fedora-10 x86_64, and 1 (no issues so far) runs Windows 2003.
-          The 5 Fedora-10 domUs have the latest package upgrades, including a Linux kernel. They have 2 vCPUs each, between 512MB and 1GB of memory, and 30GB of disk space stored on an internal SATA
-          Dom0’s vCPU is pinned to core 0 (dom0_vcpus_pin)
-          DomUs are visibly sharing cores 1 to 7 (xm vcpu-list), although no configuration was done to map them to specific CPUs/cores
Now here are our observations:
(1)    The Fedora-10 domUs described above are randomly and partially (see below) freezing after running for some hours.
-          If there is a pre-existing ssh session on a hung domU, some commands such as ‘ls’, ‘ps’, ‘tail -f <file>’ and ‘free’ can be executed, while commands such as ‘top’ and ‘vmstat’ hang; sometimes no command works at all
-          xentop displays 0% activity on a hung domU, although I once observed 100% on another hung one
-          There is nothing significant in domU:/var/log/messages, nor in dom0:/var/log/xen/qemu-dm-…
-          Nagios running on dom0 doesn’t really pick this condition up, as the hung domUs still answer ping and the Nagios ssh check; note that ssh to a hung domU doesn’t work, although the basic Nagios TCP check gets an answer on port 22
-          Their time is completely off (see next observation below), with or without ntpd running
-          I had the occasion to run ‘free’ on a few of them, and they appeared to have enough free memory, i.e. they were not swapping at all
=> I don’t want to speculate on the potential root cause; nevertheless, what would be the most effective next troubleshooting steps?
o   Force a domU system dump? And then?
o   Deep dive into dom0 logs although a quick browsing wasn’t successful?
o   Disable most of the processes on one of these domU to identify if a user proc can cause this issue (may be very time consuming)?
o   Set the run-level to 3 instead of 5?
o   etc
(2)    The 5 Fedora-10 domUs are not keeping their time in sync
We have read several pages about time management for a Linux domU, but we haven’t yet found anything conclusive and/or haven’t been able to set this up properly. The facts are:
-          Our dom0 runs ntpd and is perfectly synchronized on external public ntp sources
-          We initially tried running ntpd on the Fedora-10 domUs, configured with external public sources, which proved unsuccessful; the time is usually off by a few minutes
-          We then tried without ntpd. According to our readings this should be the proper configuration, as the domUs’ hardware clock should sync with their dom0’s hardware clock, but alas it was still unsuccessful. In this case, the domUs end up lagging significantly behind their dom0’s time
-          We have read on a few occasions that there is a parameter to set with echo 1 > /proc/sys/xen/independent_wallclock in order to run ntpd on a domU, but /proc/sys/xen doesn’t exist on these Fedora-10 domUs. Is this expected behavior? Should we assume the independent_wallclock setting is only for PV domUs?
-          Note that one of the domUs is a 32-bit Windows 2003 server and is perfectly on time, i.e. in sync with its dom0. It runs the default Windows time service, with no ntpd installed
(3)    The 5 Fedora-10 domUs were installed as HVM domUs, but their kernels appear to see them as PV. This may be a misunderstanding on our side; however, dmesg on the 5 Fedora-10 domUs shows the message:
“Booting paravirtualized kernel on bare hardware”
We just installed an HVM CentOS 5.3 domU, and this time the kernel boot message “Booting …” doesn’t appear.
Therefore, can we conclude that the presumed HVM Fedora-10 domUs are in fact PV domUs?
Should a /proc/sys/xen be present on a PV domU or on any type of domUs?
Thank you,
Lionel Raynaud.
