Re: [Xen-devel] [PATCH for-4.13 v4 2/3] x86/passthrough: fix migration of MSI when using posted interrupts
On 13.11.2019 16:59, Roger Pau Monne wrote:
> @@ -5266,6 +5267,36 @@ void hvm_set_segment_register(struct vcpu *v, enum x86_segment seg,
> alternative_vcall(hvm_funcs.set_segment_register, v, seg, reg);
> }
>
> +int hvm_intr_get_dests(struct domain *d, uint8_t dest, uint8_t dest_mode,
While for IO-APIC and MSI "dest" can't be wider than 8 bits without
virtual interrupt remapping, "intr" imo is generic enough to also
(potentially) include IPIs. I'd therefore recommend making this
uint32_t right away in new code, to cover all current and future
cases.
Also - const for d?
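I.e. something along these lines (just a sketch of the resulting
prototype, combining both remarks):

int hvm_intr_get_dests(const struct domain *d, uint32_t dest,
                       uint8_t dest_mode, uint8_t delivery_mode,
                       unsigned long *vcpus);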
> + uint8_t delivery_mode, unsigned long *vcpus)
> +{
> + struct vcpu *v;
> +
> + switch ( delivery_mode )
> + {
> + case dest_LowestPrio:
> + /*
> + * Get all the possible destinations, but note that lowest priority
> + * mode is only going to inject the interrupt to the vCPU running at
> + * the least privilege level.
"privilege level"? I think you mean "lowest priority" here?
> + *
> + * Fallthrough
> + */
> + case dest_Fixed:
There's not really any fall-through here, and hence I think this part
of the comment would be better dropped.
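I.e. perhaps simply (sketch, reusing the patch's own loop):

    case dest_LowestPrio:
        /*
         * Get all the possible destinations, even though lowest priority
         * mode will deliver the interrupt to just the vCPU running at the
         * lowest priority.
         */
    case dest_Fixed:
        for_each_vcpu ( d, v )
            if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
                __set_bit(v->vcpu_id, vcpus);
        break;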
> + for_each_vcpu ( d, v )
> + if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
> + __set_bit(v->vcpu_id, vcpus);
> + break;
> +
> + default:
> + gprintk(XENLOG_WARNING, "unsupported interrupt delivery mode %u\n",
> + delivery_mode);
gdprintk() perhaps, to keep at least release builds' logs tidy
(and useful)?
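I.e. just (sketch):

    gdprintk(XENLOG_WARNING, "unsupported interrupt delivery mode %u\n",
             delivery_mode);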
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -112,6 +112,25 @@ static void sync_pir_to_irr(struct vcpu *v)
> alternative_vcall(hvm_funcs.sync_pir_to_irr, v);
> }
>
> +void domain_sync_vlapic_pir(struct domain *d, unsigned long *vcpus)
const (twice)?
> +{
> + unsigned int id;
> +
> + if ( !bitmap_weight(vcpus, d->max_vcpus) )
> + return;
Why go over the entire bitmap an extra time here? The find-first
below will do exactly the same, and hence the loop body won't be
entered in this case.
> + for ( id = find_first_bit(vcpus, d->max_vcpus);
> + id < d->max_vcpus;
> + id = find_next_bit(vcpus, d->max_vcpus, id + 1) )
> + {
> + if ( d->vcpu[id] != current )
> + vcpu_pause(d->vcpu[id]);
Isn't this setting us up for a deadlock if two parties come here
for the same domain, and both on a vCPU belonging to that domain
(and with the opposite one's bit set in the bitmap)? But it looks
like d would never be the current domain here - you will want to
assert and comment on this, though. At that point the comparisons
against current can then go away as well afaict.
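Taken together (const parameters, no separate bitmap_weight() pass,
and an assertion replacing the comparisons against current), the
function might then shrink to something like this (sketch only; the
unquoted remainder of the loop body is left out):

void domain_sync_vlapic_pir(const struct domain *d, const unsigned long *vcpus)
{
    unsigned int id;

    /* Pausing vCPUs of the domain we're running on would risk deadlock. */
    ASSERT(d != current->domain);

    /* find_first_bit() simply yields d->max_vcpus for an empty bitmap. */
    for ( id = find_first_bit(vcpus, d->max_vcpus);
          id < d->max_vcpus;
          id = find_next_bit(vcpus, d->max_vcpus, id + 1) )
    {
        vcpu_pause(d->vcpu[id]);
        /* ... rest of the loop body as in the patch ... */
    }
}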
> @@ -345,6 +289,8 @@ int pt_irq_create_bind(
> const struct vcpu *vcpu;
> uint32_t gflags = pt_irq_bind->u.msi.gflags &
> ~XEN_DOMCTL_VMSI_X86_UNMASKED;
> + DECLARE_BITMAP(dest_vcpus, MAX_VIRT_CPUS) = { };
> + DECLARE_BITMAP(prev_vcpus, MAX_VIRT_CPUS) = { };
This is reachable for HVM domains only, isn't it? In which case
why the much larger MAX_VIRT_CPUS (creating two unreasonably big
local variables) instead of HVM_MAX_VCPUS? However, even once
switched I'd be opposed to this - there'd be a fair chance that
the need to deal with these variables might go unnoticed once
the maximum vCPU count for HVM gets increased (which has been a
pending todo item for many years now).
> @@ -420,20 +384,16 @@ int pt_irq_create_bind(
> delivery_mode = MASK_EXTR(pirq_dpci->gmsi.gflags,
> XEN_DOMCTL_VMSI_X86_DELIV_MASK);
>
> - dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
> + hvm_intr_get_dests(d, dest, dest_mode, delivery_mode, dest_vcpus);
> + dest_vcpu_id = bitmap_weight(dest_vcpus, d->max_vcpus) != 1 ?
> + -1 : find_first_bit(dest_vcpus, d->max_vcpus);
> pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
> spin_unlock(&d->event_lock);
>
> pirq_dpci->gmsi.posted = false;
> vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
> - if ( iommu_intpost )
> - {
> - if ( delivery_mode == dest_LowestPrio )
> - vcpu = vector_hashing_dest(d, dest, dest_mode,
> - pirq_dpci->gmsi.gvec);
> - if ( vcpu )
> - pirq_dpci->gmsi.posted = true;
> - }
> + if ( vcpu && iommu_intpost )
> + pirq_dpci->gmsi.posted = true;
One aspect that I'm curious about: How much posting opportunity do
we lose in practice by no longer posting when the guest uses lowest
priority mode with multiple destinations?
> @@ -442,6 +402,9 @@ int pt_irq_create_bind(
> pi_update_irte(vcpu ? &vcpu->arch.hvm.vmx.pi_desc : NULL,
> info, pirq_dpci->gmsi.gvec);
>
> + if ( hvm_funcs.deliver_posted_intr )
> + domain_sync_vlapic_pir(d, prev_vcpus);
Accessing hvm_funcs here looks like a layering violation. This
wants either moving into the function or (seeing the other use)
abstracting away. Seeing the conditional here (and below) I also
notice that you calculate prev_vcpus in vain when there's no
interrupt posting in use.
I guess together with the variable size issue mentioned above a
possible solution would be:
- have one bitmap hanging off of pirq_dpci->gmsi,
- have one bitmap per pCPU,
- populate the new destination bits into the per-pCPU one,
- issue the PIR->IRR sync,
- exchange the per-pCPU and per-DPCI pointers.
You could then leave the pointers at NULL when no posting is to
be used, addressing the apparent layering violation here at the
same time.
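In code this might end up looking something like the below (purely
illustrative sketch: all names are invented, allocation/freeing of
the bitmaps is omitted, and both bitmaps are assumed to be set up
when posting gets enabled):

static DEFINE_PER_CPU(unsigned long *, dpci_dests_scratch);

    /* In pt_irq_create_bind(), for the MSI case: */
    unsigned long *new_dests = this_cpu(dpci_dests_scratch);

    if ( new_dests )    /* stays NULL when interrupt posting isn't in use */
    {
        unsigned long *prev_dests = pirq_dpci->gmsi.dests;

        bitmap_zero(new_dests, d->max_vcpus);
        hvm_intr_get_dests(d, dest, dest_mode, delivery_mode, new_dests);

        /* Sync PIR->IRR for the previous destinations ... */
        domain_sync_vlapic_pir(d, prev_dests);

        /* ... then exchange the per-pCPU and per-DPCI pointers. */
        pirq_dpci->gmsi.dests = new_dests;
        this_cpu(dpci_dests_scratch) = prev_dests;
    }

(gmsi.dests here standing for the suggested per-DPCI bitmap pointer.)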
Jan