Hello,
I work at a hosting company. We have tens of Xen dom0 hosts running just fine,
but unfortunately a few of them get out of control.
Reported behaviour:
- dom0 uses more and more memory
- no process can be found using that memory
- at some point, the OOM killer kicks in and kills everything, until even
ssh to the box becomes hard
- when there are really no processes left to kill, things get even worse,
and we are forced to reboot
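
One rough way to put a number on "memory that no process accounts for" is to subtract the usual /proc/meminfo counters from MemTotal. The sketch below is only a heuristic: the choice of counters in ACCOUNTED is an assumption, some counters overlap or are missing on older kernels, and memory held by the balloon driver or the hypervisor may not show up here at all. The sample values are made up for illustration and are not taken from the affected hosts.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value"
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


# Assumption: these counters cover the memory the kernel can attribute to something.
ACCOUNTED = ("MemFree", "Buffers", "Cached", "AnonPages",
             "Slab", "PageTables", "KernelStack")


def unaccounted_kb(info):
    """MemTotal minus the counters above; fields absent from the input count as 0."""
    return info["MemTotal"] - sum(info.get(key, 0) for key in ACCOUNTED)


# Illustrative values only (not from the hosts described in this mail).
SAMPLE = """\
MemTotal:        2097152 kB
MemFree:          102400 kB
Buffers:           51200 kB
Cached:           204800 kB
AnonPages:        307200 kB
Slab:             409600 kB
PageTables:        10240 kB
KernelStack:        2048 kB
"""

print(unaccounted_kb(parse_meminfo(SAMPLE)), "kB unaccounted")  # 1009664 kB unaccounted
```

On a live host the same functions can be fed the contents of /proc/meminfo. A figure that keeps growing between runs would match the pattern described above.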
Configuration summary:
- dom0 with debian/stable, xen 4.0.1
- 512MB of dom0 memory, raised to as much as 2GB after some crashes
I have tried to find something that differs between a working dom0 and a
buggy one, but have not found anything. They are installed from the same
template, with the same packages and the same hardware (apart from serial
numbers and MAC addresses).
I also could not find any report that clearly describes a memory leak in
dom0 ending with the OOM killer.
I have gathered as many logs as I thought could be helpful in the
attachments [1].
Host bk - about to be rebooted, as xend has already been killed
Host sw - 800MB out of 2GB used for nothing
The attachments [1] contain:
- memory graph (by munin) - it might help to see the pattern of memory
usage
Contents (cat) of:
- grub.cfg
- /proc/meminfo
- /proc/slabinfo
- /proc/vmstat
- /var/log/kern.log
- /var/log/xen/xend.log
Output of:
- dmesg
- dpkg -l
- free
- lsmod
- top
- vmstat
- xm info
- xm info -c
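
To follow the leak over time without rebooting, the same files and command outputs can be captured at regular intervals and compared. The sketch below is one possible way to do that, assuming Python 3 is available on the dom0. The function name snapshot, the output file naming, and the use of `top -b -n 1` for a one-shot batch run are choices made for this example. grub.cfg is left out because its path is not given above.

```python
import subprocess
from pathlib import Path

# Files listed above (grub.cfg omitted: its path is not stated in this mail).
FILES = ["/proc/meminfo", "/proc/slabinfo", "/proc/vmstat",
         "/var/log/kern.log", "/var/log/xen/xend.log"]

# Commands listed above; the keys are only used to name the output files.
COMMANDS = {
    "dmesg": ["dmesg"],
    "dpkg-l": ["dpkg", "-l"],
    "free": ["free"],
    "lsmod": ["lsmod"],
    "top": ["top", "-b", "-n", "1"],  # batch mode, a single iteration
    "vmstat": ["vmstat"],
    "xm-info": ["xm", "info"],
    "xm-info-c": ["xm", "info", "-c"],
}


def snapshot(outdir, files=FILES, commands=COMMANDS):
    """Copy each file and save each command's output under outdir."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    for path in files:
        try:
            data = Path(path).read_bytes()
        except OSError as exc:
            data = f"unreadable: {exc}\n".encode()
        (out / Path(path).name).write_bytes(data)
    for name, argv in commands.items():
        try:
            result = subprocess.run(argv, stdout=subprocess.PIPE,
                                    stderr=subprocess.STDOUT, timeout=60)
            data = result.stdout
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing or hung command is recorded instead of aborting the run.
            data = f"failed: {exc}\n".encode()
        (out / f"{name}.txt").write_bytes(data)
```

Calling snapshot() with a fresh directory name each time (for example one per timestamp) gives a series of captures that can be diffed, in particular the /proc/slabinfo and /proc/meminfo copies.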
I'd appreciate any feedback about such behaviour, and would be happy to
provide additional information.
These are production servers; the one thing I would really like to avoid
as much as possible is rebooting them for tests.
Regards,
--
Adrien URBAN
[1] I sent an email with the files attached a few days ago, but it never
made it to the list.
The files can be found here: http://www.hagtheil.net/xen/oom/