Ok, thanks. I'll try to address those comments asap in a new version.
Off the bat:
- sharing deadlock: I can arrange it so that shr_lock is always taken
after get_gfn. That would still require either disabling deadlock
detection or moving shr_lock down the deadlock order. Which one is
preferable?
The real solution that I have queued is removing the global hash table,
and locking each shared page individually using PGT_locked.
- get_mfn: the goal is additional belt-and-braces. It doesn't require an
extra argument: put_gfn will put the mfn that sits in the p2m entry at the
time of the put (roughly as sketched right after this list). A few tricks
are needed to handle remove_page, sharing, etc. Works for me with Windows
and Linux HVM guests (so far).
- I can rename gfn_to_mfn_type_p2m to get_gfn_type_access (since that's
what the remaining uses are for)
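
To make the get_gfn/put_gfn pairing concrete, here is a minimal sketch of
what I have in mind for a typical caller. The comments describing what
put_gfn does internally reflect my intended behaviour rather than anything
the posted patch spells out, and example_lookup() is just an illustrative
name:

    static void example_lookup(struct domain *d, unsigned long gfn)
    {
        p2m_type_t p2mt;
        mfn_t mfn = get_gfn(d, gfn, &p2mt); /* locks the gfn, refs the mfn */

        if ( p2m_is_ram(p2mt) )
        {
            /* ... use the mfn; the p2m entry can't change under our feet ... */
        }

        /* put_gfn() drops the ref on whatever mfn sits in the p2m entry at
         * the time of the put (handling remove_page, sharing, etc.), then
         * unlocks the gfn. */
        put_gfn(d, gfn);
    }
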
Thanks for your feedback on the nested p2m functions, which I don't quite
get yet.
Andres
> Hi,
>
> This mostly looks good; I have a few comments on the detail below.
> If those are addressed I hope I can apply the next version.
>
> Any other maintainers want to object to this renaming?
>
> At 22:28 -0500 on 07 Nov (1320704913), Andres Lagar-Cavilla wrote:
>> @@ -1182,6 +1195,7 @@ int hvm_hap_nested_page_fault(unsigned l
>> mfn_t mfn;
>> struct vcpu *v = current;
>> struct p2m_domain *p2m;
>> + int rc;
>>
>> /* On Nested Virtualization, walk the guest page table.
>> * If this succeeds, all is fine.
>> @@ -1255,8 +1269,8 @@ int hvm_hap_nested_page_fault(unsigned l
>> if ( violation )
>> {
>> p2m_mem_access_check(gpa, gla_valid, gla, access_r, access_w, access_x);
>> -
>> - return 1;
>> + rc = 1;
>> + goto out_put_gfn;
>
> Shouldn't this patch be changing the call to gfn_to_mfn_type_p2m() just
> above here to a get_gfn_*() call too?
>
>> -void hvm_unmap_guest_frame(void *p)
>> +void hvm_unmap_guest_frame(void *p, unsigned long addr, int is_va)
>> {
>> + /* We enter this function with a map obtained in __hvm_map_guest_frame.
>> + * This map performed a p2m query that locked the gfn entry and got
>> + * a ref on the mfn. Must undo */
>> if ( p )
>> + {
>> + unsigned long gfn = INVALID_GFN;
>> +
>> + if ( is_va )
>> + {
>> + if ( addr )
>> + {
>> + uint32_t pfec = PFEC_page_present;
>> + gfn = paging_gva_to_gfn(current, addr, &pfec);
>> + } else {
>> + gfn = INVALID_GFN;
>> + }
>> + } else {
>> + gfn = addr;
>> + }
>> +
>> + if ( gfn != INVALID_GFN )
>> + put_gfn(current->domain, gfn);
>> +
>> unmap_domain_page(p);
>> + }
>> }
>
> This is a pretty weird-looking function now - I think it would be better
> just to make all callers have to remember the GFN. In fact, since a
> guest VCPU can change its pagetables at any time, it's not safe to use
> paging_gva_to_gfn() to regenerate the GFN from a VA.
>
> Also, IIRC the nested-SVM code keeps the hvm_map mapping around for
> quite a long time so in fact this unmap-and-put-gfn may not be the right
> interface. Maybe the caller should put_gfn() explicitly.
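
Agreed that regenerating the gfn from the VA is fragile. If I go with the
explicit put, the caller-side pattern would be roughly the below (a sketch
only: hvm_map_guest_frame_rw() is the existing mapping helper, the
one-argument unmap plus caller-side put_gfn() is the proposed shape, and
where exactly nested SVM drops its long-lived ref still needs thought):

    void *p = hvm_map_guest_frame_rw(gfn);    /* takes the gfn ref inside */
    if ( p )
    {
        /* ... use the mapping, possibly for a long time (nested SVM) ... */
        hvm_unmap_guest_frame(p);             /* back to the one-arg form */
        put_gfn(current->domain, gfn);        /* caller drops the gfn ref */
    }
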
>
>> diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/hvm/svm/svm.c
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -244,9 +244,10 @@ static int svm_vmcb_restore(struct vcpu
>> {
>> if ( c->cr0 & X86_CR0_PG )
>> {
>> - mfn = mfn_x(gfn_to_mfn(v->domain, c->cr3 >> PAGE_SHIFT, &p2mt));
>> + mfn = mfn_x(get_gfn(v->domain, c->cr3 >> PAGE_SHIFT, &p2mt));
>> if ( !p2m_is_ram(p2mt) || !get_page(mfn_to_page(mfn), v->domain) )
>> {
>> + put_gfn(v->domain, c->cr3 >> PAGE_SHIFT);
>> gdprintk(XENLOG_ERR, "Invalid CR3 value=0x%"PRIx64"\n", c->cr3);
>> return -EINVAL;
>> @@ -257,6 +258,8 @@ static int svm_vmcb_restore(struct vcpu
>> put_page(pagetable_get_page(v->arch.guest_table));
>>
>> v->arch.guest_table = pagetable_from_pfn(mfn);
>> + if ( c->cr0 & X86_CR0_PG )
>> + put_gfn(v->domain, c->cr3 >> PAGE_SHIFT);
>> }
>>
>> v->arch.hvm_vcpu.guest_cr[0] = c->cr0 | X86_CR0_ET;
>> @@ -1144,7 +1147,6 @@ static void svm_do_nested_pgfault(struct
>> unsigned long gfn = gpa >> PAGE_SHIFT;
>> mfn_t mfn;
>> p2m_type_t p2mt;
>> - p2m_access_t p2ma;
>> struct p2m_domain *p2m = NULL;
>>
>> ret = hvm_hap_nested_page_fault(gpa, 0, ~0ul, 0, 0, 0, 0);
>> @@ -1161,7 +1163,8 @@ static void svm_do_nested_pgfault(struct
>> p2m = p2m_get_p2m(v);
>> _d.gpa = gpa;
>> _d.qualification = 0;
>> - _d.mfn = mfn_x(gfn_to_mfn_type_p2m(p2m, gfn, &_d.p2mt, &p2ma, p2m_query, NULL));
>> + mfn = get_gfn_query_unlocked(p2m->domain, gfn, &_d.p2mt);
>> + _d.mfn = mfn_x(mfn);
>
> Ah - this is not quite the same thing. This lookup uses a specific p2m
> table (possibly a nested-p2m table reflecting the fact the guest is
> running in nested SVM mode) so it can't be replaced by a call that just
> takes the domain pointer (and will internally use the domain's master
> p2m table).
>
> The naming of 'p2m_get_p2m()' is particularly unhelpful here, I realise.
> It fetches the running p2m, i.e. the one that hardware sees as an NPT
> table; almost everywhere else uses p2m_get_hostp2m(), which
> fetches the master p2m. But in general when you see the
> gfn_to_mfn_*_p2m() functions, that take an explicit p2m pointer, you
> need to be careful.
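
Understood - for the nested paths I'll keep an accessor that takes the
explicit p2m pointer. Under the rename I floated above it would read
roughly as below (get_gfn_type_access() is only my proposed name, so treat
the spelling as tentative; the arguments are unchanged from
gfn_to_mfn_type_p2m()):

    p2m_access_t p2ma;
    /* Explicit-p2m lookup, so a nested p2m is honoured. */
    mfn = get_gfn_type_access(p2m, gfn, &_d.p2mt, &p2ma, p2m_query, NULL);
    _d.mfn = mfn_x(mfn);
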
>
>>
>> __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d);
>> }
>> @@ -1181,7 +1184,7 @@ static void svm_do_nested_pgfault(struct
>> if ( p2m == NULL )
>> p2m = p2m_get_p2m(v);
>> /* Everything else is an error. */
>> - mfn = gfn_to_mfn_type_p2m(p2m, gfn, &p2mt, &p2ma, p2m_guest, NULL);
>> + mfn = get_gfn_guest_unlocked(p2m->domain, gfn, &p2mt);
>
> Likewise here.
>
>> diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/guest_walk.c
>> --- a/xen/arch/x86/mm/guest_walk.c
>> +++ b/xen/arch/x86/mm/guest_walk.c
>> @@ -86,30 +86,33 @@ static uint32_t set_ad_bits(void *guest_
>> return 0;
>> }
>>
>> +/* If the map is non-NULL, we leave this function having
>> + * called get_gfn, you need to call put_gfn. */
>> static inline void *map_domain_gfn(struct p2m_domain *p2m,
>> gfn_t gfn,
>> mfn_t *mfn,
>> p2m_type_t *p2mt,
>> uint32_t *rc)
>> {
>> - p2m_access_t a;
>> -
>> /* Translate the gfn, unsharing if shared */
>> - *mfn = gfn_to_mfn_type_p2m(p2m, gfn_x(gfn), p2mt, &a, p2m_unshare, NULL);
>> + *mfn = get_gfn_unshare(p2m->domain, gfn_x(gfn), p2mt);
>
> Here's another case where we need to handle the nested-hap path; the p2m
> pointer here must be propagated into the lookup function.
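
Yes - map_domain_gfn() should keep taking and using the p2m pointer; with
the tentative rename above, the unsharing lookup would be roughly:

    p2m_access_t a;
    /* Propagate the caller's p2m (possibly a nested one) into the lookup,
     * unsharing if the gfn is currently shared. */
    *mfn = get_gfn_type_access(p2m, gfn_x(gfn), p2mt, &a, p2m_unshare, NULL);
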
>
>> if ( p2m_is_paging(*p2mt) )
>> {
>> ASSERT(!p2m_is_nestedp2m(p2m));
>> p2m_mem_paging_populate(p2m->domain, gfn_x(gfn));
>> + put_gfn(p2m->domain, gfn_x(gfn));
>> *rc = _PAGE_PAGED;
>> return NULL;
>> }
>> if ( p2m_is_shared(*p2mt) )
>> {
>> + put_gfn(p2m->domain, gfn_x(gfn));
>> *rc = _PAGE_SHARED;
>> return NULL;
>> }
>> if ( !p2m_is_ram(*p2mt) )
>> {
>> + put_gfn(p2m->domain, gfn_x(gfn));
>> *rc |= _PAGE_PRESENT;
>> return NULL;
>> }
>> @@ -120,6 +123,9 @@ static inline void *map_domain_gfn(struc
>>
>>
>> /* Walk the guest pagetables, after the manner of a hardware walker. */
>> +/* Because the walk is essentially random, it can cause a deadlock
>> + * warning in the p2m locking code. Highly unlikely this is an actual
>> + * deadlock, because who would walk page table in the opposite order? */
>
> Linear pagetables make this sort of thing more likely, especially if
> they're used by one process to update another process's mappings.
>
> If this is a problem, maybe we'll need to avoid the deadlock either by
> allowing multiple lock-holders in the p2m or by rearranging the a/d
> writeback so it does another p2m lookup -- I'd rather not do that,
> though, since even on 64-bit it will add a lot of memory accesses.
>
> I'm happy to take a version of this big switchover patch that doesn't
> solve this problem, though. It can be sorted out in a separate patch.
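
For the record, one way I read the "multiple lock-holders" option is a
re-entrant per-p2m lock, so that the same CPU re-taking it mid-walk doesn't
trip the deadlock detector. Purely a sketch of the idea - none of these
names exist in the tree:

    struct p2m_relock {
        spinlock_t lock;
        int        owner;     /* physical CPU id, -1 if unowned */
        unsigned   depth;
    };

    static void p2m_relock_acquire(struct p2m_relock *l)
    {
        if ( l->owner == smp_processor_id() )
        {
            l->depth++;       /* already ours: just nest */
            return;
        }
        spin_lock(&l->lock);
        l->owner = smp_processor_id();
        l->depth = 1;
    }

    static void p2m_relock_release(struct p2m_relock *l)
    {
        if ( --l->depth == 0 )
        {
            l->owner = -1;
            spin_unlock(&l->lock);
        }
    }
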
>
>> diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/hap/nested_hap.c
>> --- a/xen/arch/x86/mm/hap/nested_hap.c
>> +++ b/xen/arch/x86/mm/hap/nested_hap.c
>> @@ -146,22 +146,29 @@ nestedhap_walk_L0_p2m(struct p2m_domain
>> mfn_t mfn;
>> p2m_type_t p2mt;
>> p2m_access_t p2ma;
>> + int rc;
>>
>> /* walk L0 P2M table */
>> mfn = gfn_to_mfn_type_p2m(p2m, L1_gpa >> PAGE_SHIFT, &p2mt, &p2ma, p2m_query, page_order);
>
> Again, are we not changing this function's name?
>
>
> Cheers,
>
> Tim.
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel