Hi,
in preparation for our soon-to-arrive central storage array, I wanted to
test live migration and Remus replication and stumbled upon a problem.
When migrating a test VM (512 MB RAM, idle) between my three servers, two of
them are extremely slow in "receiving" the VM. There is little to no CPU
utilization from xc_restore until shortly before the migration is complete.
The same goes for xm restore.
The xend.log contains:
[2010-06-01 21:16:27 5211] DEBUG (XendCheckpoint:286)
restore:shadow=0x0, _static_max=0x20000000, _static_min=0x0,
[2010-06-01 21:16:27 5211] DEBUG (XendCheckpoint:305) [xc_restore]:
/usr/lib/xen/bin/xc_restore 48 43 1 2 0 0 0 0
[2010-06-01 21:16:27 5211] INFO (XendCheckpoint:423) xc_domain_restore
start: p2m_size = 20000
[2010-06-01 21:16:27 5211] INFO (XendCheckpoint:423) Reloading memory
pages: 0%
[2010-06-01 21:20:57 5211] INFO (XendCheckpoint:423) ERROR Internal
error: Error when reading batch size
[2010-06-01 21:20:57 5211] INFO (XendCheckpoint:423) ERROR Internal
error: error when buffering batch, finishing
These errors appear when receiving a VM via live migration finally
finishes; note the large gap in the timestamps.
The VM is perfectly fine after that, it just takes way too long.
First off, let me explain my server setup; detailed information on my
attempts to narrow down the error follows.
I have three servers running Xen 4 with a 2.6.31.13-pvops kernel, the
current kernel from Jeremy's xen/master git branch.
The guests run vanilla 2.6.32.11 kernels.
The three servers differ slightly in hardware: two are Dell PE 2950s and
one is a Dell R710. The 2950s have two quad-core Xeon CPUs (L5335 and
L5410); the R710 has two quad-core Xeon E5520s.
All machines have 24 GB of RAM.
They are called "tarballerina" (E5520), "xenturio1" (L5335) and
"xenturio2" (L5410).
Currently I use tarballerina for testing purposes, but I don't consider
anything in my setup "stable".
xenturio1 has 27 guests running, xenturio2 has 25.
No guest does anything that would even put a dent into the systems'
performance (LDAP servers, RADIUS, department web servers, etc.).
I created a test VM called "hatest" on my current central iSCSI storage;
it idles around and has 2 VCPUs and 512 MB of RAM.
First I tested xm save/restore:
tarballerina:~# time xm restore /var/saverestore-t.mem
real 0m13.227s
user 0m0.090s
sys 0m0.023s
xenturio1:~# time xm restore /var/saverestore-x1.mem
real 4m15.173s
user 0m0.138s
sys 0m0.029s
When migrating to xenturio1 or 2, the migration takes 181 to 278
seconds; when migrating to tarballerina it takes roughly 30 seconds:
tarballerina:~# time xm migrate --live hatest 10.0.1.98
real 3m57.971s
user 0m0.086s
sys 0m0.029s
xenturio1:~# time xm migrate --live hatest 10.0.1.100
real 0m43.588s
user 0m0.123s
sys 0m0.034s
--- attempt at narrowing it down ---
My first guess was that, since tarballerina had almost no guests running
that did anything, it could be an issue of memory usage by the tapdisk2
processes (each dom0 has been mem-set to 4096 MB).
I then started almost all the VMs that I have on tarballerina:
tarballerina:~# time xm save saverestore-t /var/saverestore-t.mem
real 0m2.884s
tarballerina:~# time xm restore /var/saverestore-t.mem
real 0m15.594s
I tried this several times; sometimes it took 30+ seconds.
Then I started two VMs that run load- and I/O-generating processes
(stress, dd, openssl encryption, md5sum).
But this didn't affect xm restore performance; it was still quite fast:
tarballerina:~# time xm save saverestore-t /var/saverestore-t.mem
real 0m7.476s
user 0m0.101s
sys 0m0.022s
tarballerina:~# time xm restore /var/saverestore-t.mem
real 0m45.544s
user 0m0.094s
sys 0m0.022s
I tried several times again; restore took 17 to 45 seconds.
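The repeated timings above were taken by hand; as a sketch, a small shell helper (hypothetical, not part of the Xen tools) could run and time such a command several times in a row:

```shell
# Hypothetical helper: run a command N times and print each run's
# wall-clock duration in whole seconds. In practice the command would
# be something like: xm restore /var/saverestore-t.mem
time_runs() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        start=$(date +%s)           # seconds since epoch, before the run
        "$@" >/dev/null 2>&1        # run the command, discard its output
        end=$(date +%s)             # seconds since epoch, after the run
        echo "run $((i + 1)): $((end - start))s"
        i=$((i + 1))
    done
}
```

For example, `time_runs 5 xm restore /var/saverestore-t.mem` would print five durations, making the 17-to-45-second spread easy to spot.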
Then I tried migrating the test VM to tarballerina again; it was still
fast, in spite of several running VMs, including the load- and
I/O-generating ones.
This ate almost all available RAM.
CPU times for xc_restore according to the target machine's top:
tarballerina -> xenturio1: 0:05:xx, CPU 2-4%, near the end 40%.
xenturio1 -> tarballerina: 0:04:xx, CPU 4-8%, near the end 54%.
tarballerina:~# time xm migrate --live hatest 10.0.1.98
real 3m29.779s
user 0m0.102s
sys 0m0.017s
xenturio1:~# time xm migrate --live hatest 10.0.1.100
real 0m28.386s
user 0m0.154s
sys 0m0.032s
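The per-process CPU figures above came from watching top interactively; as a non-interactive sketch, the %CPU of the xc_restore process could be sampled with ps (the helper name and the one-second interval are my assumptions):

```shell
# Hypothetical helper: print the %CPU of a given PID once per second,
# for a fixed number of samples. On the target host, $pid would be the
# PID of the running xc_restore process.
sample_cpu() {
    pid=$1; samples=$2
    i=0
    while [ "$i" -lt "$samples" ]; do
        ps -o %cpu= -p "$pid" || break   # stop if the process has exited
        sleep 1
        i=$((i + 1))
    done
}
```

Logging this during a slow migration would show whether xc_restore really sits near-idle for minutes and only spikes at the end, as observed in top.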
So my attempt at narrowing the problem down failed: it is neither the free
memory of the dom0 nor the load, I/O, or memory that the other domUs utilize.
---end attempt---
More info (xm list, meminfo, a table with migration times, etc.) on my
setup can be found here:
http://andiolsi.rz.uni-lueneburg.de/node/37
Someone else had the same error in his logfile; this may or may not be
related:
http://lists.xensource.com/archives/html/xen-users/2010-05/msg00318.html
Further information can be provided, should the need arise.
With best regards
---
Andreas Olsowski <andreas.olsowski@xxxxxxxxxxxxxxx>
Leuphana Universität Lüneburg
System- und Netzwerktechnik
Rechenzentrum, Geb 7, Raum 15
Scharnhorststr. 1
21335 Lüneburg
Tel: ++49 4131 / 6771309
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel