Please cc: me, as I do not receive mail from the list; this also keeps me from
top-posting my reply.
My post last month benchmarked the difference between gplpv 0.9.x and 0.10.x. This
time, at Pasi's suggestion, I upgraded my F8 2.6.21 xenified-kernel based
system to F11 with myoung's beta F12 2.6.31rc pvops kernel. First, a discussion
of using the pvops kernel:
The first problem was drm modesetting. Leaving it enabled caused both plymouth and
Xorg to fault over and over again at startup. (I forget what kind of fault;
this was three weeks ago.) Then I remembered that the F10 release notes
mentioned that drm modesetting for intel video chips (mine is a 945gm) was
experimental. Modesetting works fine with the standard F11 2.6.29 non-xen
kernel. For the xen kernel, I needed to add 'nomodeset' to the xen kernel's
'module' line in grub.
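For reference, the relevant grub.conf stanza ends up looking roughly like this
(the paths and root= value are illustrative; the 'nomodeset' on the dom0
'module' line is the only point):

    title Fedora 11 (2.6.31rc pvops dom0)
            root (hd0,0)
            kernel /xen.gz
            module /vmlinuz-2.6.31-0.1.2.43.rc3.xendom0.fc12.x86_64 ro root=/dev/VolGroup00/LogVol00 nomodeset
            module /initrd-2.6.31-0.1.2.43.rc3.xendom0.fc12.x86_64.img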
Working with a debug kernel is a bit like shooting at a moving target, as you
get an update every few days, altho' this latest kernel -
2.6.31-0.1.2.43.rc3.xendom0.fc12 - has been around for two weeks. Debug
kernels can be noisy in syslog, and slow. They finally shut off the kmemleak
messages that were flooding my syslog in this last kernel. I also had to
disable selinux to prevent a similar flood. (This actually appears to be more
of an F11 problem when upgrading from F10 than a xen problem, as my F11 pv domu
was doing the same thing. Numerous selinux relabels only reduced the frequency
of the messages; they did not stop them.)
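For completeness, the selinux workaround is nothing xen-specific; a sketch:

    # persistent, in /etc/selinux/config
    SELINUX=disabled

    # or per-boot, appended to the dom0 kernel's 'module' line in grub
    ... nomodeset selinux=0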
Then the slowness of a debug kernel bit me: I usually have a number of
monitoring programs running, and I found that leaving 'iptraf' running really
slowed down my winxp domu.
Then I decided to recompile the kernel to try to eliminate a debug option that
'make xconfig' warned incurred a significant performance penalty
(XEN_DEBUG_FS). After loading this kernel, I started having serious problems
with the standard 'yum-cron' service. Every time yum actually found updates to
apply, within 1-3 hours I started getting kernel errors of the form
'page_fault / xen_force_evtchn_callback / _out_of_memory / oom_kill_process',
even though syslog itself showed I had lots of swap left, and 'gkrellm' showed
I had lots of memory left in dom0. The oom killer didn't stop until all my X
window programs had been killed (altho' for the most part system services were
left alone) or I hit the reset button (and of course the machine's desktop and
ssh connections were unresponsive by then). The problem was also characterized
by very high cpu times for the kernel threads events/0 & 1, and kswapd0. I
reloaded the original kernel last weekend, but the oom killer problems
persisted after a yum-cron update. Finally, after cleaning out some leftover yum files
in /var/lock and /var/run, the system has been stable for a few days, through
a few yum updates. I'm slowly re-enabling desktop programs to see if it fails
again. I've got gnome back up on the system console (no iptraf or gnome-
system-monitor). Next step will be to re-enable my kde4 vncserver. My winxp
domu has been running all during this time (qemu-dm only got killed once).
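The cleanup mentioned above was just removing stale yum droppings; roughly the
following (the exact file names are from memory and approximate; check what is
actually there before removing anything):

    ls /var/run /var/lock/subsys | grep -i yum   # see what's lying around
    rm -f /var/run/yum.pid                       # stale pid file (name approximate)
    rm -f /var/lock/subsys/yum-cron              # stale subsys lock (name approximate)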
Since 'loop0' runs at nice -20, and I always renice qemu-dm to -11 for an hvm
domu, my winxp domu console was actually still relatively responsive, so long
as I didn't need dom0 services besides disk. (Rebooting took forever.)
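For completeness, the renice I do after the hvm domu starts is just something
like the following (a sketch; it assumes a single hvm domu, so pgrep finds only
one qemu-dm):

    renice -11 -p $(pgrep qemu-dm)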
Beyond that, I'm still having problems with my wireless card. It associates
and gets an ip fine under the non-xen F11 kernel. Under the xen kernel, tho',
repeatedly running 'iwconfig' shows I'm making and losing association with my
wireless router. It never keeps the association long enough to get an ip. At
one point, after an oom killer session, I also got a bunch of iwl3945 errors,
so I now remove the module at xen boot time via /etc/rc.local.
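The rc.local hack is just a conditional unload keyed off the running kernel
name (a sketch):

    # /etc/rc.local - unload the flaky wireless driver only under the xen dom0 kernel
    if uname -r | grep -q xendom0; then
        modprobe -r iwl3945
    fi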
And now for the benchmarks:
Equipment: Core 2 Duo T5600, 1.83GHz each, 2M, SATA drive configured for
UDMA/100
Old System: fc8 64bit, xen 3.1.2, xen.gz 3.1.4, dom0 2.6.21
New System: f11 64bit, xen 3.3.1, dom0 2.6.31rc pvops
Tested hvm: XP Pro SP3, 2002 32bit w/512M, file backed vbd on local disk,
tested w/ iometer 2006-07-27 (1Gb \iobw.tst, 5min run) & iperf 1.7.0 (1 min
run)
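For reference, the iperf runs were of this general shape; the client-side
commands are the ones quoted with the results below, and the server side was
left at defaults ('dom0-name' is whatever resolves to dom0):

    # on dom0, server side (plain for the tcp run; add -u for the udp run)
    iperf-1.7.0 -s
    iperf-1.7.0 -s -u
    # on the winxp domu, client side; -r also runs the reverse direction
    iperf-1.7.0 -c dom0-name -t 60 -r
    iperf-1.7.0 -c dom0-name -t 60 -r -b 10000000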
Since I don't have particularly fast equipment, the significance of these
numbers will be in the relative difference between the xenified kernel system
and the pvops kernel system. Only gplpv 0.10.0.69 will be tested on winxp,
with /patchtpr. There will be no qemu numbers.
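(For anyone trying to reproduce this: /patchtpr goes on the winxp boot.ini
line alongside the usual gplpv switch. The line below is just a stock XP
example, not copied from my system:

    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="XP Pro SP3" /fastdetect /gplpv /patchtpr
)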
Since this is a file backed vbd, domu numbers are not expected to be faster
than dom0 numbers.
First the old F8 iometer numbers:
test pattern            | domu MB/s | domu %CPU | dom0 MB/s | dom0 %CPU
4k 50% read, 0% random  |    3.01   |    6.46   |    1.98   |     0
32k 50% read            |    3.92   |    0.57   |    1.83   |     0
and now the pvops numbers:
test pattern            | domu MB/s | domu %CPU | dom0 MB/s | dom0 %CPU
4k 50% read, 0% random  |    1.76   |   10.59   |    2.37   |     0
32k 50% read, 0% random |    3.71   |    9.03   |    3.42   |     0
As might be expected for a debug kernel, the 4k numbers are slower for pvops,
with more %CPU. However, the 32k numbers are roughly as fast, but again with
more %CPU. (Btw, James - I should have made this more explicit in last month's
post on the old numbers: using /patchtpr really makes the domu numbers very
close to dom0's. Nice work!)
For network, a tcp test on F8, 'iperf-1.7.0 -c dom0-name -t 60 -r',
gave:
domu->dom0: 2.4 Mb/s (huh?)
dom0->domu: 92 Mb/s (wow!)
For a udp test, requesting a 10Mb/s bandwidth, 'iperf-1.7.0 -c dom0-name -t
60 -r -b 10000000' gave:
domu->dom0: 14.7 kb/s (huh?)
dom0->domu: 8.7 Mb/s w/12% loss
and for pvops:
For a tcp test, 'iperf-1.7.0 -c dom0-name -t 60 -r':
domu->dom0: 2.4 Mb/s (huh?)
dom0->domu: 132 Mb/s (wow!)
For a udp test, requesting a 10Mb/s bandwidth, 'iperf-1.7.0 -c dom0-name -t
60 -r -b 10000000' gave:
domu->dom0: 4.7 kb/s (huh?)
dom0->domu: 9.9 Mb/s w/0% loss (better)
This was with the xennet settings 'Check checksum on RX packets' disabled,
'Checksum Offload' enabled, and MTU = 9000 (altho' nothing else on the bridge
has that high an MTU). The weirdness with one direction being slower than the
other continues, altho' the faster direction for both tcp & udp is better than
the old numbers. For the udp test, the faster direction always gave me the
following kernel trace:
Aug 2 18:10:50 Insp6400 kernel: Call Trace:
Aug 2 18:10:50 Insp6400 kernel: [<ffffffff81063017>] warn_slowpath_common+0x95/0xc3
Aug 2 18:10:50 Insp6400 kernel: [<ffffffff8106306c>] warn_slowpath_null+0x27/0x3d
Aug 2 18:10:50 Insp6400 kernel: [<ffffffff81496f28>] udp_lib_unhash+0x91/0xe6
Aug 2 18:10:50 Insp6400 kernel: [<ffffffff8142d8dc>] sk_common_release+0x45/0xe5
Aug 2 18:10:50 Insp6400 kernel: [<ffffffff81495a27>] udp_lib_close+0x21/0x37
Aug 2 18:10:50 Insp6400 kernel: [<ffffffff814a0340>] inet_release+0x68/0x87
Aug 2 18:10:50 Insp6400 kernel: [<ffffffff814295a5>] sock_release+0x32/0x98
Aug 2 18:10:50 Insp6400 kernel: [<ffffffff81429643>] sock_close+0x38/0x50
Aug 2 18:10:50 Insp6400 kernel: [<ffffffff8113e265>] __fput+0x137/0x1f8
Aug 2 18:10:50 Insp6400 kernel: [<ffffffff8113e353>] fput+0x2d/0x43
Aug 2 18:10:50 Insp6400 kernel: [<ffffffff8113a540>] filp_close+0x77/0x97
Aug 2 18:10:50 Insp6400 kernel: [<ffffffff8113a620>] sys_close+0xc0/0x110
Aug 2 18:10:50 Insp6400 kernel: [<ffffffff810467cf>] sysenter_dispatch+0x7/0x33
Aug 2 18:10:50 Insp6400 kernel: ---[ end trace 4f591a696edea67c ]---
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users