Re: [Xen-devel] domU using linux-2.6.37-xen-next pvops kernel with CONFIG_PARAVIRT_SPINLOCKS
On Tue, Dec 21, 2010 at 8:22 AM, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
On Mon, Dec 20, 2010 at 05:03:13PM -0800, Dante Cinco wrote:
> (Sorry, I accidentally sent the previous post before finishing the summary
> table)
>
> For a couple of months now, we've been trying to track down the slow I/O
> performance in pvops domU. Our system has 16 Fibre Channel devices, all
> PCI-passthrough to domU. We were previously using a 2.6.32 (Ubuntu version)
> HVM kernel and were getting 511k IOPS. We switched to pvops with Konrad's
> xen-pcifront-0.8.2 kernel and were disappointed to see the performance
> degrade to 11k IOPS. After disabling some kernel debug options including
> KMEMLEAK, the performance jumped to 186k IOPS, but that was still well below what we
> were getting with the HVM kernel. We tried disabling spinlock debugging in
> the kernel but it actually resulted in a drop in performance to 70k IOPS.
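
(For anyone trying to reproduce this: the "kernel debug options" mentioned above map to the usual lock/leak debugging Kconfig symbols. The exact set that was toggled is not spelled out here, so the following is only a sketch of likely candidates, assuming the scripts/config helper that ships in the kernel tree:)

    # Sketch only: plausible candidates for the debug options in question;
    # verify against the real .config of the build being tested.
    scripts/config --disable DEBUG_KMEMLEAK \
                   --disable DEBUG_SPINLOCK \
                   --disable DEBUG_LOCK_ALLOC \
                   --disable PROVE_LOCKING \
                   --disable LOCK_STAT
    make oldconfig

Rebuilding with and without options like these is what separates the "debugging enabled" and "debugging disabled" rows in the summary table further down.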
>
> Last week we switched to linux-2.6.37-xen-next and with the same kernel
> debug options disabled, the I/O performance was slightly better at 211k
> IOPS. We tried disabling spinlock debugging again and saw a similar drop in
> performance to 58k IOPS. We searched around for any performance-related
> posts regarding pvops and found two references to CONFIG_PARAVIRT_SPINLOCKS
> (one from Jeremy and one from Konrad):
> http://lists.xensource.com/archives/html/xen-devel/2009-05/msg00660.html
> http://lists.xensource.com/archives/html/xen-devel/2010-11/msg01111.html
>
> Both posts recommended (Konrad strongly) enabling PARAVIRT_SPINLOCKS when
> running under Xen. Since it's enabled by default, we decided to see what
> would happen if we disabled CONFIG_PARAVIRT_SPINLOCKS. With the spinlock
> debugging enabled, we were getting 205k IOPS, but with spinlock debugging
> disabled, the performance leaped to 522k IOPS!!!
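
(To be explicit about the knob: it is the single Kconfig symbol CONFIG_PARAVIRT_SPINLOCKS, which these configs have on by default. A sketch of the two builds being compared, again assuming the in-tree scripts/config helper:)

    # Build A: PV spinlocks on (the default in these configs)
    scripts/config --enable PARAVIRT_SPINLOCKS
    make oldconfig && make

    # Build B: PV spinlocks off (the 522k IOPS case described above)
    scripts/config --disable PARAVIRT_SPINLOCKS
    make oldconfig && make

    # Quick way to see what a given .config actually contains:
    grep PARAVIRT_SPINLOCKS .config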
>
> I'm assuming that this behavior is unexpected.
<scratches his head> You got me. I am really happy to find out that you guys
were able to solve this conundrum.
Are the guests contending for the CPUs (so say you have 4 logical CPUs and
you launch two guests, each wanting 4 vCPUs)? How many CPUs do the guests have?
Are the guests pinned to the CPUs? Which scheduler is the hypervisor using? credit1?
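
(For reference, these are the sort of dom0 checks that answer the questions above with the xm toolstack; output formats vary between Xen versions.)

    # vCPU count, pinning and current placement for every domain:
    xm vcpu-list

    # Scheduler the hypervisor booted with (this is where the
    # "(XEN) Using scheduler: ..." line quoted below comes from):
    xm dmesg | grep -i scheduler

    # Credit-scheduler weight/cap for a particular domain:
    xm sched-credit -d <domain>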
>
We only have one guest, to which we assign 16 vCPUs, each pinned to its respective physical CPU. The system has 24 physical CPUs (dual Westmere). Each of the 16 Fibre Channel devices is affinitized to its own CPU.
(XEN) Using scheduler: SMP Credit Scheduler (credit)
xm sched-credit: Weight=256, Cap=0
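
(Roughly how that pinning and affinitization is done; the domain name and IRQ numbers below are placeholders rather than the actual configuration.)

    # dom0: pin each of the guest's 16 vCPUs to the matching physical CPU.
    # "domU" is a placeholder domain name.
    for i in $(seq 0 15); do
        xm vcpu-pin domU $i $i
    done

    # Inside the guest: steer each Fibre Channel HBA's interrupt to its own
    # CPU. IRQ numbers are placeholders; take the real ones from /proc/interrupts.
    echo 1 > /proc/irq/48/smp_affinity   # CPU 0 (value is a hex bitmask)
    echo 2 > /proc/irq/49/smp_affinity   # CPU 1
    echo 4 > /proc/irq/50/smp_affinity   # CPU 2
    # ... one bit per CPU, through CPU 15.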
> Here's a summary of the kernels, config changes and performance (in IOPS):
>
>                                   pcifront 0.8.2    linux-2.6.37-xen-next
>                                   (pvops)           (pvops)
>
> Spinlock debugging enabled,
> PARAVIRT_SPINLOCKS=y                   186k                 205k
>
> Spinlock debugging disabled,
> PARAVIRT_SPINLOCKS=y                    70k                  58k
>
> Spinlock debugging disabled,
> PARAVIRT_SPINLOCKS=n                   247k                 522k
Whoa... Thank you for the table. My first thought was "whoa, PV byte-locking
spinlocks sure suck", but then I realized that there are some improvements in
2.6.37-xen-next, like in the vmap flushing code...
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel