[Xen-devel] Root cause of the issue that HVM guest boots slowly

To:	"xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject:	[Xen-devel] Root cause of the issue that HVM guest boots slowly with pvops dom0
From:	"Yang, Xiaowei" <xiaowei.yang@xxxxxxxxx>
Date:	Thu, 21 Jan 2010 16:16:42 +0800
Organization:	pdsmsx601.ccr.corp.intel.com

Symptoms
---------
Boot time of UP HVM on one DP Nehalem server (SMT-on):
dom0 kernel   seconds
2.6.18        20
pvops         42

pvops dom0 util% is clearly higher (peak: 230% vs 99%), and the resched IPI
rate peaks at 200+k per second.


Analysis
---------
Xentrace shows that an access to an IDE IOport is 10X slower than accesses to
other IOports that go through QEMU (~300K vs. ~30K).

Ftrace of 'sched events' in dom0 shows frequent context switches
between 'idle' and 'events/?' (the work queue execution kthread) on each idle
vCPU, and events/? runs lru_add_drain_per_cpu(). This explains where the resched
IPIs come from: the kernel uses them to notify an idle vCPU that it has real
work to do. lru_add_drain_per_cpu() is triggered by lru_add_drain_all(), which
is a *costly* sync operation that won't return until each vCPU has executed its
work queue item. Throughout the kernel, there are 9 places that call
lru_add_drain_all(): shm, memory migration, mlock, etc. If an IDE IOport access
invokes one of them, that could be the reason why it's so slow.

Then ftrace of 'syscall' in dom0 confirms the assumption - QEMU really does
call mlock(). It turns out that mlock() is used a lot in Xen (73 places)
to ensure that a dom0 user space buffer passed to the Xen HV by hypercall is
pinned in memory. An IDE IOport access may go through one of them -
HVMOP_set_isa_irq_level.

Searching the kernel change log backwards: in 2.6.28
(http://kernelnewbies.org/Linux_2_6_28), one major change to the mlock
implementation (b291f000: mlock: mlocked pages are unevictable) puts mlocked
pages under the management of the (page frame reclaiming) LRU, and
lru_add_drain_all() is a preparatory operation that flushes the pages held in a
temporary data structure (pagevec) to an LRU list. That's why a 2.6.18 dom0
doesn't see so many resched IPIs.

As an experiment, a hack was tried that omits mlock() before
HVMOP_set_isa_irq_level in pvops dom0, and guest boot time returned to
normal - ~20s.
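
To make the pattern under discussion concrete, here is a minimal sketch of a
user space caller that pins its argument buffer around a call into the
hypervisor. It is not the actual Xen library code: fake_hypercall() and
call_with_pinned_buffer() are hypothetical names, and the stub only reads the
buffer instead of talking to Xen. Only mlock()/munlock() are real calls.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical stand-in for a real hypercall. It only sums the bytes of
 * the argument buffer; nothing here talks to Xen. */
static int fake_hypercall(const void *arg, size_t len)
{
    const unsigned char *p = arg;
    unsigned int sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += p[i];
    return (int)(sum & 0x7f);   /* always non-negative */
}

/* Pin the argument buffer, issue the (stubbed) call, then unpin.
 * Returns the stub's result, or -errno if mlock() fails. */
static int call_with_pinned_buffer(void *arg, size_t len)
{
    assert(arg != NULL && len > 0);

    /* Per the analysis above, on the pvops dom0 kernel this is the step
     * that ends up in lru_add_drain_all(). */
    if (mlock(arg, len) != 0)
        return -errno;

    int ret = fake_hypercall(arg, len);

    munlock(arg, len);          /* best effort; return value ignored */
    return ret;
}
```

Every call pays the mlock() cost, even when the buffer is only a few bytes
long, so a hot path such as a per-interrupt hypercall repeats it constantly.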


Solutions?
-----------
- Limiting the vCPU# of dom0 is always the easiest one - you may call it a
workaround rather than a solution :) It not only reduces the total # of resched
IPIs ( = mlock# * (vCPU#-1)), but also reduces the cost of each handler,
because of spinlock contention. But the impact is still there, more or less,
when vCPU# > 1.


- To remove mlock(), another method of sharing buffers between dom0 user space
apps and the Xen HV is needed.

- More ideas?
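
For reference, the IPI estimate from the first bullet ( mlock# * (vCPU#-1) )
can be worked through with a tiny helper. This is only a sketch of the
arithmetic: resched_ipi_estimate() is a hypothetical name, and the numbers in
the note below are illustrative, not measurements.

```c
#include <assert.h>

/* Estimated number of resched IPIs: each mlock() call wakes every other
 * dom0 vCPU once, i.e. mlock# * (vCPU# - 1). */
static unsigned long resched_ipi_estimate(unsigned long mlock_calls,
                                          unsigned int dom0_vcpus)
{
    assert(dom0_vcpus >= 1);
    return mlock_calls * (unsigned long)(dom0_vcpus - 1);
}
```

Assuming, say, 1000 mlock() calls, a 16-vCPU dom0 would see about 15000 IPIs,
a 2-vCPU dom0 about 1000, and a 1-vCPU dom0 none.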

Thanks,
xiaowei
