Hi,
I am using Xen-2.0.7 on a dual Intel Xeon 2.8GHz system with 4GB of RAM. I am
using 2.6.11 as the kernel for my domain 0. Domain 0 runs Debian Sarge with a
backported Xen 2.0.7 package (only little changes to the Debian 2.0.6 package,
nothing important enough to be mentioned). All kernels were compiled from
vanilla kernels with the Xen patch. The domain Us are using 2.6.11 or 2.4.30
(Debian, SuSE).
I have no problems within the domains and everything is running very smoothly,
except for one thing (which was also not working correctly in xen-2.0.6 for
me): I can save a domain with "xm save <domainname> <suspendfile>" once and I
can restore this domain again, but if I try a second "xm save ..." it simply
seems to hang. Nothing happens, and the last entries in the logs are these
lines:
==> /var/log/xend.log <==
[2005-08-15 20:12:27 xend] INFO (XendMigrate:380) Save BEGIN: ['save', ['id',
'1'], ['state', 'begin'], ['domain', '5'], ['file', '/suspend/vm-ralph']]
[2005-08-15 20:12:27 xend] INFO (XendRoot:113) EVENT> xend.domain.save
['vm-ralph', '5', 'begin', ['save', ['id', '1'], ['state', 'begin'],
['domain', '5'], ['file', '/suspend/vm-ralph']]]
==> /var/log/xfrd.log <==
3808 [INF] XFRD> Accepted connection from 127.0.0.1:3905 on 2
4165 [INF] XFRD> Xfr service for 127.0.0.1:3905
[DEBUG] Conn_init> flags=1
[DEBUG] Conn_init> write stream...
[DEBUG] stream_init>mode=w flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_init> read stream...
[DEBUG] stream_init>mode=r flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_sxpr>
(xfr.hello 1 0)[DEBUG] Conn_sxpr< err=0
[DEBUG] Conn_sxpr>
(xfr.save 5 "(domain (id 5) (name vm-ralph) (memory 127) (maxmem 128) (state
-b---) (cpu 3) (cpu_time 1.583158713) (up_time 1401.25794005) (start_time
1124128146.12) (console (status listening) (id 12) (domain 5) (local_port 12)
(remote_port 1) (console_port 9605)) (devices (vif (idx 0) (vif 0) (mac
aa:00:00:00:00:22) (vifname vif5.0) (ip 212.79.XXX.XXX/32) (evtchn 17 4)
(index 0)) (vbd (idx 0) (vdev 2049) (device 65030) (mode w) (dev sda1) (uname
phy:xen-volumes/vm-ralph) (node xen-volumes/vm-ralph) (index 0)) (vbd (idx 1)
(vdev 2050) (device 65031) (mode w) (dev sda2) (uname
phy:xen-volumes/swap-ralph) (node xen-volumes/swap-ralph) (index 1))) (config
(vm (name vm-ralph) (memory 128) (cpu 3) (image (linux
(kernel /boot/xen-linux-2.6.11-domu-tops1)
(ramdisk /boot/xen-linux-2.6.11-domu-tops1-modules) (root '/dev/sda1 ro')))
(device (vbd (uname phy:xen-volumes/vm-ralph) (dev sda1) (mode w))) (device
(vbd (uname phy:xen-volumes/swap-ralph) (dev sda2) (mode w))) (device (vif
(mac aa:00:00:00:00:22) (ip 212.79.XXX.XXX/32))))))" /suspend/vm-ralph)
[DEBUG] Conn_sxpr< err=0
[1124129547.387983] xc_linux_save start 5
xc_linux_save start 5
I can strace the "xm save" process, but there is not much activity:
xen:/var/log# ps fax |grep xm
4164 pts/0 S+ 0:00 | \_ python /usr/sbin/xm save
vm-ralph /suspend/vm-ralph
xen:/var/log# strace -p 4164
Process 4164 attached - interrupt to quit
recv(3,
An xfrd thread even seems to be spawned, but it shows more or less the same
as the xm save process:
xen:/var/log# ps fax |grep xfrd
3808 ? S 0:00 xfrd
4165 ? SL 0:00 \_ xfrd
xen:/var/log# strace -p 4165
Process 4165 attached - interrupt to quit
read(3,
I can press Ctrl-C and the "xm save" aborts with the following error (I waited
over 3 minutes):
Traceback (most recent call last):
  File "/usr/sbin/xm", line 9, in ?
    main.main(sys.argv)
  File "/usr/lib/python2.3/site-packages/xen/xm/main.py", line 808, in main
    xm.main(args)
  File "/usr/lib/python2.3/site-packages/xen/xm/main.py", line 106, in main
    self.main_call(args)
  File "/usr/lib/python2.3/site-packages/xen/xm/main.py", line 124, in main_call
    p.main(args[1:])
  File "/usr/lib/python2.3/site-packages/xen/xm/main.py", line 276, in main
    server.xend_domain_save(dom, savefile)
  File "/usr/lib/python2.3/site-packages/xen/xend/XendClient.py", line 244, in xend_domain_save
    {'op' : 'save',
  File "/usr/lib/python2.3/site-packages/xen/xend/XendClient.py", line 148, in xendPost
    return self.client.xendPost(url, data)
  File "/usr/lib/python2.3/site-packages/xen/xend/XendProtocol.py", line 79, in xendPost
    return self.xendRequest(url, "POST", args)
  File "/usr/lib/python2.3/site-packages/xen/xend/XendProtocol.py", line 143, in xendRequest
    resp = conn.getresponse()
  File "/usr/lib/python2.3/httplib.py", line 781, in getresponse
    response.begin()
  File "/usr/lib/python2.3/httplib.py", line 273, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.3/httplib.py", line 231, in _read_status
    line = self.fp.readline()
  File "/usr/lib/python2.3/socket.py", line 323, in readline
    data = recv(1)
KeyboardInterrupt
After that it doesn't matter whether I shut down and recreate the domain
before I try to save it a second time. It happens every time after the first
successful save & restore, and sometimes even on the first "xm save"
attempt.
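For anyone who wants to reproduce this, the save/restore cycle can be
scripted so that a hang is reported instead of blocking forever. The
following is only a minimal sketch under a few assumptions: it is written for
a current Python 3 (not the Python 2.3 installed on this system), the
function names run_with_timeout and save_restore_cycles are made up for
illustration, and it assumes the domain is restored with
"xm restore <suspendfile>". The 180 second timeout simply mirrors the 3
minutes I waited above.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and report whether it finished within timeout_s seconds.

    Returns ("ok", returncode) on completion, or ("timeout", None) if the
    command had to be killed because it did not finish in time.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return ("timeout", None)
    return ("ok", proc.returncode)


def save_restore_cycles(domain, suspend_file, cycles=2, timeout_s=180,
                        runner=run_with_timeout):
    """Repeat save/restore and stop at the first failure or hang.

    Returns None if every step succeeded, otherwise a tuple
    (cycle_number, step_name, status) describing the first problem.
    The runner argument is a hypothetical hook so the loop can be
    exercised without a real Xen host.
    """
    for cycle in range(1, cycles + 1):
        steps = (
            ("save", ["xm", "save", domain, suspend_file]),
            # Assumption: restoring takes only the suspend file.
            ("restore", ["xm", "restore", suspend_file]),
        )
        for step, cmd in steps:
            status, returncode = runner(cmd, timeout_s)
            if status != "ok" or returncode != 0:
                return (cycle, step, status)
    return None
```

Calling save_restore_cycles("vm-ralph", "/suspend/vm-ralph") as root on the
domain 0 would be expected to return (2, "save", "timeout") if the hang
described above occurs. Note that killing the hanging "xm save" on timeout
does nothing to clean up the half-saved domain.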
It even seems that Xen leaves the "half-saved" domain in a broken state,
because I cannot shut down the domain correctly after the second "xm save"
attempt. I can ssh into it and type "halt" and it shuts down, but Xen (xm
list) still thinks that the domain is running. Even an "xm destroy
<domainname>" doesn't help. I have to reboot the physical machine to get the
domain working correctly again.
Because this is supposed to become a production system very soon, I would
appreciate help very much. More information (like xm dmesg) is available on
request... ;-PP
--Ralph