RE: [Xen-devel] MSI and VT-d interrupt remapping

Espen Skoglund <espen.skoglund@xxxxxxxxxxxxx> wrote:
> [Yunhong Jiang]
>> xen-devel-bounces@xxxxxxxxxxxxxxxxxxx wrote:
>>> You're right in that Linux does not currently support this.  You
>>> can, however, allocate multiple interrupts using MSI-X.  Anyhow, I
>>> was not envisioning this feature being used directly for
>>> passthrough device access.  Rather, I was considering the case
>>> where a device could be configured to communicate data directly
>>> into a VM (e.g., using multi-queue NICs) and deliver the interrupt
>>> to the appropriate VM.  In this case the frontend in the guest
>>> would not need to see a multi-message MSI device, only the backend
>>> in dom0/the driver domain would need to be made aware of it.
> 
>> Although I don't know of any device with such a usage model (Intel's
>> VMDq uses MSI-X), yes, your usage model would be helpful.  To achieve
>> this we may need to change the protocol between pci backend and pci
>> frontend; in fact, pci_enable_msi/pci_enable_msix could perhaps be
>> combined, with a flag to determine whether the vectors should be
>> contiguous or not.
> 
> This is similar to my initial idea as well.  In addition to being
> contiguous, the vectors allocated for a multi-message MSI request
> would also need to be properly aligned.

Yes, but I don't think we need to add the implementation now. We can
change xen_pci_op to accommodate this requirement; otherwise this will
diverge further from upstream Linux. (Maybe the hypercall needs to be
changed for this requirement as well.)
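
To make the idea concrete, here is a purely illustrative sketch of how
such a combined request might look; the struct, flag and field names
below are made up for discussion, not part of the current
pcifront/pciback protocol:

    /* Illustrative only -- hypothetical names, not the real protocol.
     * One request covers both MSI and MSI-X; a flag says whether the
     * backend must hand back a contiguous, size-aligned vector block
     * (required for multi-message MSI, irrelevant for MSI-X). */
    #include <stdint.h>

    #define XEN_PCI_IRQ_FLAG_CONTIGUOUS  (1u << 0)   /* hypothetical */

    struct xen_pci_enable_irq_req {                   /* hypothetical */
        uint32_t nvec;        /* IN:  number of vectors requested     */
        uint32_t flags;       /* IN:  XEN_PCI_IRQ_FLAG_* bits         */
        uint32_t vector[32];  /* OUT: vectors granted by the backend;
                                 MSI allows at most 32 messages       */
    };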

As for set_irq_affinity, I think it is a general issue, not MSI
related; we can continue to follow up on it.


> 
>> One thing left is how the driver domain can bind the vector to the
>> frontend VM.  Some sanity-check mechanism should be added.
> 
> Well, there exists a domctl for modifying the permissions of a pirq.
> This could be used to grant pirq access to a frontend domain.  Not
> sure if this is sufficient.
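
For what it's worth, a minimal sketch of how a driver domain might use
that domctl through libxc; the wrapper signature below is from memory
and may need checking against the tree:

    /* Sketch: grant the frontend domain access to a pirq from dom0 or
     * the driver domain.  The libxc wrapper signature is from memory;
     * double-check against xenctrl.h. */
    #include <stdint.h>
    #include <xenctrl.h>

    static int grant_pirq_to_frontend(int xc_handle, uint32_t frontend_domid,
                                      uint8_t pirq)
    {
        /* allow_access = 1 grants access, 0 revokes it */
        return xc_domain_irq_permission(xc_handle, frontend_domid, pirq, 1);
    }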
> 
> Also, as discussed in my previous reply dom0 may need the ability to
> reset the affinity of an irq when migrating the destination vcpu.
> Further, a pirq is now always bound to vcpu[0] of a domain (in
> evtchn_bind_pirq).  There is clearly some room for improvement and
> more flexibility here.
> 
> Not sure what the best solution is.  One option is to allow a guest to
> re-bind a pirq to set its affinity, and have such explicitly set
> affinities be automatically updated when the associated vcpu is
> migrated.  Another option is to create unbound ports in a guest domain
> and let a privileged domain bind pirqs to those ports.  The privileged
> domain should then also be allowed to later modify the destination
> vcpu and set the affinity of the bound pirq.
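
As a sketch of the second option, the privileged bind could be modeled
on the existing event channel interface; struct evtchn_bind_pirq below
is as I recall it from the public headers, while the *_remote variant
is hypothetical and does not exist today:

    #include <stdint.h>

    /* Existing interface (from xen/include/public/event_channel.h, as I
     * recall it): the guest itself binds a pirq to a local port, and the
     * port starts out bound to vcpu[0]. */
    struct evtchn_bind_pirq {
        uint32_t pirq;          /* IN                          */
        uint32_t flags;         /* IN: BIND_PIRQ__WILL_SHARE   */
        uint32_t port;          /* OUT: evtchn_port_t          */
    };

    /* Hypothetical privileged operation for the second option above:
     * dom0/the driver domain binds a pirq to an unbound port the guest
     * allocated with EVTCHNOP_alloc_unbound.  Not an existing EVTCHNOP. */
    struct evtchn_bind_pirq_remote {
        uint16_t remote_dom;    /* IN: domid_t of the frontend */
        uint32_t pirq;          /* IN                          */
        uint32_t remote_port;   /* IN: pre-allocated port      */
        uint32_t vcpu;          /* IN: destination vcpu        */
    };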
> 
> 
>> BTW, can you tell me which devices may use this feature?  I'm a bit
>> interested in this.
> 
> I must confess that I do not know of any device that currently uses
> this feature (perhaps Solarflare or NetXen devices have support for
> it), and the whole connection with VT-d interrupt remapping is as of
> now purely academic anyway due to the lack of chipsets with the
> appropriate feature.
> 
> However, the whole issue of binding multiple pirqs of a device to
> different guest domains remains the same even if using MSI-X.
> Multi-message MSI devices only/mostly add some additional restrictions
> upon allocating interrupt vectors.
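
Right; the main restriction is that a multi-message MSI function puts
the message number into the low bits of the MSI data value, so a block
of N messages (N a power of two, at most 32) must be both contiguous
and N-aligned.  Roughly, the allocator-side arithmetic:

    #include <stdint.h>

    /* Multi-message MSI: the function ORs the message number into the
     * low log2(nvec) bits of the data value, so the vector block must
     * be contiguous and aligned to its (power-of-two) size.  Sketch of
     * the alignment step when scanning for free vectors: */
    static uint8_t align_vector_base(uint8_t candidate, unsigned int nvec)
    {
        /* round candidate up to the next multiple of nvec
         * (nvec must be a power of two: 1, 2, 4, 8, 16 or 32) */
        return (uint8_t)((candidate + nvec - 1) & ~(nvec - 1));
    }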
> 
> 
>>>>> I do not think explicitly specifying destination APIC upon
>>>>> allocation is the best idea.  Setting the affinity upon binding
>>>>> the interrupt like it's done today seems like a better approach.
>>>>> This leaves us with dealing with the vectors.
>>> 
>>>> But what should happen when the vcpu is migrated to another
>>>> physical cpu?  I'm not sure about the cost of reprogramming the
>>>> interrupt remapping table; otherwise that would be a good way to
>>>> achieve the affinity.
>>> 
>>> As you've already said, the interrupt affinity is only set when a
>>> pirq is bound.  The interrupt routing is not redirected if the vcpu
>>> it's bound to migrates to another physical cpu.  This can (should?)
>>> be changed in the future so that the affinity is either set
>>> implicitly when migrating the vcpu, or explicitly with a rebind
>>> call by dom0.  In any case the affinity would be reset by the
>>> set_affinity method.
> 
>> Yes, I remember Keir suggested using the interrupt remapping table
>> in VT-d to achieve this; not sure whether that is still OK.
> 
> Relying on the VT-d interrupt remapping table would rule out any Intel
> chipset on the market today, and also the equivalent solution (if any)
> used by AMD and others.
> 
> It seems better to update the IOAPIC entry or MSI capability structure
> directly when redirecting the interrupt, and let io_apic_write() or
> the equivalent function for MSI rewrite the interrupt remapping table
> if VT-d is enabled.  Not sure how much it would cost to rewrite the
> remapping table and perform the respective VT-d interrupt entry cache
> flush; it's difficult to measure without actually having any available
> hardware.  However, I suspect the cost would in many cases be dwarfed
> by migrating the cache working set and by other associated costs of
> migrating a vcpu. 
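
Agreed.  As a rough sketch of that path (all names below are
placeholders for illustration, not actual Xen symbols), the write
routine would hide the remapping update and the interrupt entry cache
flush from the caller:

    #include <stdint.h>

    /* Placeholder prototypes -- stand-ins for illustration only. */
    int  vtd_remapping_enabled(void);
    void vtd_update_irte(unsigned int irq, uint64_t addr, uint32_t data);
    void vtd_flush_iec(unsigned int irq);
    void hw_write_rte_or_msi(unsigned int irq, uint64_t addr, uint32_t data);

    /* Sketch: set_affinity composes a new address/data pair and calls
     * this; with VT-d interrupt remapping enabled only the remapping
     * entry is rewritten and its cached copy invalidated, otherwise the
     * IOAPIC RTE or MSI capability structure is written directly. */
    static void redirect_irq(unsigned int irq, uint64_t addr, uint32_t data)
    {
        if (vtd_remapping_enabled()) {
            vtd_update_irte(irq, addr, data);     /* rewrite remap entry */
            vtd_flush_iec(irq);                   /* flush cached entry  */
        } else {
            hw_write_rte_or_msi(irq, addr, data); /* plain IOAPIC/MSI write */
        }
    }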
> 
>       eSk

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel