WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops

To: "Zhai, Edwin" <edwin.zhai@xxxxxxxxx>, "andreas.olsowski@xxxxxxxxxxxxxxx" <andreas.olsowski@xxxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
From: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Date: Wed, 9 Jun 2010 14:32:35 +0100
Cc: Dave McCracken <dcmccracken@xxxxxxxxx>, Dave McCracken <dcm@xxxxxxxx>
Delivery-date: Wed, 09 Jun 2010 06:33:38 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4C076EB2.9030108@xxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcsC+2XHk9UYEs6ETqCH7I4OOMHtsAE3NKT7
Thread-topic: [Xen-devel] slow live migration / xc_restore on xen4 pvops
User-agent: Microsoft-Entourage/12.24.0.100205
Edwin, Dave,

The issue is clearly that xc_domain_restore now only ever issues
populate_physmap requests for a single extent at a time. This might be okay
when allocating superpages, but that is rarely the case for PV guests
(depends on a rare domain config parameter) and is not guaranteed for HVM
guests either. The resulting performance is unacceptable, especially when
the kernel's underlying mlock() is slow.

It looks to me like the root cause is Dave McCracken's patch
xen-unstable:19639, which Edwin Zhai's patch xen-unstable:20126 merely
builds upon. Ultimately I don't care who fixes it, but I would like a fix
for 4.0.1, which is due for release in the next few weeks; if I have to do
it myself I will simply hack out the above two changesets. I'd rather have
domain restore working in reasonable time than the relatively small
performance boost of guest superpage mappings.

 Thanks,
 Keir

On 03/06/2010 09:58, "Zhai, Edwin" <edwin.zhai@xxxxxxxxx> wrote:

> I assume this is a PV domU rather than HVM, right?
> 
> 1. We need to check whether super pages are the culprit, using SP_check1.patch.
> 
> 2. If that fixes the problem, we need to check further where the extra
> cost comes from: the speculative algorithm, or the super-page population
> hypercall, using SP_check2.patch.
> 
> If SP_check2.patch works, the culprit is the new allocation hypercall (so
> guest creation also suffers); otherwise, it is the speculative algorithm.
> 
> Does it make sense?
> 
> Thanks,
> edwin
> 
> 
> Brendan Cully wrote:
>> On Thursday, 03 June 2010 at 06:47, Keir Fraser wrote:
>>   
>>> On 03/06/2010 02:04, "Brendan Cully" <Brendan@xxxxxxxxx> wrote:
>>> 
>>>     
>>>> I've done a bit of profiling of the restore code and observed the
>>>> slowness here too. It looks to me like it's probably related to
>>>> superpage changes. The big hit appears to be at the front of the
>>>> restore process during calls to allocate_mfn_list, under the
>>>> normal_page case. It looks like we're calling
>>>> xc_domain_memory_populate_physmap once per page here, instead of
>>>> batching the allocation? I haven't had time to investigate further
>>>> today, but I think this is the culprit.
>>>>       
>>> Ccing Edwin Zhai. He wrote the superpage logic for domain restore.
>>>     
>> 
>> Here's some data on the slowdown going from 2.6.18 to pvops dom0:
>> 
>> I wrapped the call to allocate_mfn_list in uncanonicalize_pagetable
>> to measure the time to do the allocation.
>> 
>> kernel  | min call time | max call time
>> 2.6.18  | 4 us          | 72 us
>> pvops   | 202 us        | 10696 us (!)
>> 
>> It looks like pvops is dramatically slower to perform the
>> xc_domain_memory_populate_physmap call!
>> 
>> I'll attach the patch and raw data below.
>>   



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
