RE: [Xen-devel] MSI and VT-d interrupt remapping

Espen Skoglund <espen.skoglund@xxxxxxxxxxxxx> wrote:
> [Yunhong Jiang]
>> xen-devel-bounces@xxxxxxxxxxxxxxxxxxx wrote:
>>> You're right in that Linux does not currently support this.  You
>>> can, however, allocate multiple interrupts using MSI-X.  Anyhow, I
>>> was not envisioning this feature being used directly for
>>> passthrough device access.  Rather, I was considering the case
>>> where a device could be configured to communicate data directly
>>> into a VM (e.g., using multi-queue NICs) and deliver the interrupt
>>> to the appropriate VM.  In this case the frontend in the guest
>>> would not need to see a multi-message MSI device, only the backend
>>> in dom0/the driver domain would need to be made aware of it.
> 
>> Although I don't know of any device with such a usage model (Intel's
>> VMDq uses MSI-X), yes, your usage model would be helpful.  To achieve
>> this we may need to change the protocol between pci backend and pci
>> frontend; in fact, pci_enable_msi/pci_enable_msix could perhaps be
>> combined, with a flag to determine whether the vectors should be
>> contiguous or not.
> 
> This is similar to my initial idea as well.  In addition to being
> contiguous, the vectors allocated for a multi-message MSI request
> would also need to be properly aligned.

Yes, but I don't think we need to add the implementation now. We can
change xen_pci_op to accommodate this requirement; otherwise this will
diverge further from upstream Linux. (Maybe the hypercall needs to be
changed for this requirement as well.)
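
To make the idea concrete, here is a purely illustrative sketch of how
such a combined request might look; the struct, flag and field names
below are made up for discussion, not part of the current
pcifront/pciback protocol:

    /* Illustrative only -- hypothetical names, not the real protocol.
     * One request covers both MSI and MSI-X; a flag says whether the
     * backend must hand back a contiguous, size-aligned vector block
     * (required for multi-message MSI, irrelevant for MSI-X). */
    #include <stdint.h>

    #define XEN_PCI_IRQ_FLAG_CONTIGUOUS  (1u << 0)   /* hypothetical */

    struct xen_pci_enable_irq_req {                   /* hypothetical */
        uint32_t nvec;        /* IN:  number of vectors requested     */
        uint32_t flags;       /* IN:  XEN_PCI_IRQ_FLAG_* bits         */
        uint32_t vector[32];  /* OUT: vectors granted by the backend;
                                 MSI allows at most 32 messages       */
    };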

As for set_irq_affinity, I think it is a general issue, not MSI
related; we can continue to follow up on it.


> 
>> One thing left is how the driver domain can bind the vector to the
>> frontend VM.  Some sanity-check mechanism should be added.
> 
> Well, there exists a domctl for modifying the permissions of a pirq.
> This could be used to grant pirq access to a frontend domain.  Not
> sure if this is sufficient.
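
For what it's worth, a minimal sketch of how a driver domain might use
that domctl through libxc; the wrapper signature below is from memory
and may need checking against the tree:

    /* Sketch: grant the frontend domain access to a pirq from dom0 or
     * the driver domain.  The libxc wrapper signature is from memory;
     * double-check against xenctrl.h. */
    #include <stdint.h>
    #include <xenctrl.h>

    static int grant_pirq_to_frontend(int xc_handle, uint32_t frontend_domid,
                                      uint8_t pirq)
    {
        /* allow_access = 1 grants access, 0 revokes it */
        return xc_domain_irq_permission(xc_handle, frontend_domid, pirq, 1);
    }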
> 
> Also, as discussed in my previous reply dom0 may need the ability to
> reset the affinity of an irq when migrating the destination vcpu.
> Further, a pirq is now always bound to vcpu[0] of a domain (in
> evtchn_bind_pirq).  There is clearly some room for improvement and
> more flexibility here.
> 
> Not sure what the best solution is.  One option is to allow a guest to
> re-bind a pirq to set its affinity, and have such explicitly set
> affinities be automatically updated when the associated vcpu is
> migrated.  Another option is to create unbound ports in a guest domain
> and let a privileged domain bind pirqs to those ports.  The privileged
> domain should then also be allowed to later modify the destination
> vcpu and set the affinity of the bound pirq.
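
As a sketch of the second option, the privileged bind could be modeled
on the existing event channel interface; struct evtchn_bind_pirq below
is as I recall it from the public headers, while the *_remote variant
is hypothetical and does not exist today:

    #include <stdint.h>

    /* Existing interface (from xen/include/public/event_channel.h, as I
     * recall it): the guest itself binds a pirq to a local port, and the
     * port starts out bound to vcpu[0]. */
    struct evtchn_bind_pirq {
        uint32_t pirq;          /* IN                          */
        uint32_t flags;         /* IN: BIND_PIRQ__WILL_SHARE   */
        uint32_t port;          /* OUT: evtchn_port_t          */
    };

    /* Hypothetical privileged operation for the second option above:
     * dom0/the driver domain binds a pirq to an unbound port the guest
     * allocated with EVTCHNOP_alloc_unbound.  Not an existing EVTCHNOP. */
    struct evtchn_bind_pirq_remote {
        uint16_t remote_dom;    /* IN: domid_t of the frontend */
        uint32_t pirq;          /* IN                          */
        uint32_t remote_port;   /* IN: pre-allocated port      */
        uint32_t vcpu;          /* IN: destination vcpu        */
    };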
> 
> 
>> BTW, can you tell me which devices may use this feature?  I'm a bit
>> interested in this.
> 
> I must confess that I do not know of any device that currently uses
> this feature (perhaps Solarflare or NetXen devices have support for
> it), and the whole connection with VT-d interrupt remapping is as of
> now purely academic anyway due to the lack of chipsets with the
> appropriate feature.
> 
> However, the whole issue of binding multiple pirqs of a device to
> different guest domains remains the same even if using MSI-X.
> Multi-message MSI devices only/mostly add some additional restrictions
> upon allocating interrupt vectors.
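
Right; the main restriction is that a multi-message MSI function puts
the message number into the low bits of the MSI data value, so a block
of N messages (N a power of two, at most 32) must be both contiguous
and N-aligned.  Roughly, the allocator-side arithmetic:

    #include <stdint.h>

    /* Multi-message MSI: the function ORs the message number into the
     * low log2(nvec) bits of the data value, so the vector block must
     * be contiguous and aligned to its (power-of-two) size.  Sketch of
     * the alignment step when scanning for free vectors: */
    static uint8_t align_vector_base(uint8_t candidate, unsigned int nvec)
    {
        /* round candidate up to the next multiple of nvec
         * (nvec must be a power of two: 1, 2, 4, 8, 16 or 32) */
        return (uint8_t)((candidate + nvec - 1) & ~(nvec - 1));
    }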
> 
> 
>>>>> I do not think explicitly specifying destination APIC upon
>>>>> allocation is the best idea.  Setting the affinity upon binding
>>>>> the interrupt like it's done today seems like a better approach.
>>>>> This leaves us with dealing with the vectors.
>>> 
>>>> But what should happen when the vcpu is migrated to another
>>>> physical cpu?  I'm not sure about the cost of reprogramming the
>>>> interrupt remapping table; otherwise that would be a good way to
>>>> achieve the affinity.
>>> 
>>> As you've already said, the interrupt affinity is only set when a
>>> pirq is bound.  The interrupt routing is not redirected if the vcpu
>>> it's bound to migrates to another physical cpu.  This can (should?)
>>> be changed in the future so that the affinity is either set
>>> implicitly when migrating the vcpu, or explicitly with a rebind
>>> call by dom0.  In any case the affinity would be reset by the
>>> set_affinity method.
> 
>> Yes, I remember Keir suggested using the interrupt remapping table
>> in VT-d to achieve this; not sure whether that is still OK.
> 
> Relying on the VT-d interrupt remapping table would rule out any Intel
> chipset on the market today, and also the equivalent solution (if any)
> used by AMD and others.
> 
> It seems better to update the IOAPIC entry or MSI capability structure
> directly when redirecting the interrupt, and let io_apic_write() or
> the equivalent function for MSI rewrite the interrupt remapping table
> if VT-d is enabled.  Not sure how much it would cost to rewrite the
> remapping table and perform the respective VT-d interrupt entry cache
> flush; it's difficult to measure without actually having any available
> hardware.  However, I suspect the cost would in many cases be dwarfed
> by migrating the cache working set and by other associated costs of
> migrating a vcpu. 
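
Agreed.  As a rough sketch of that path (all names below are
placeholders for illustration, not actual Xen symbols), the write
routine would hide the remapping update and the interrupt entry cache
flush from the caller:

    #include <stdint.h>

    /* Placeholder prototypes -- stand-ins for illustration only. */
    int  vtd_remapping_enabled(void);
    void vtd_update_irte(unsigned int irq, uint64_t addr, uint32_t data);
    void vtd_flush_iec(unsigned int irq);
    void hw_write_rte_or_msi(unsigned int irq, uint64_t addr, uint32_t data);

    /* Sketch: set_affinity composes a new address/data pair and calls
     * this; with VT-d interrupt remapping enabled only the remapping
     * entry is rewritten and its cached copy invalidated, otherwise the
     * IOAPIC RTE or MSI capability structure is written directly. */
    static void redirect_irq(unsigned int irq, uint64_t addr, uint32_t data)
    {
        if (vtd_remapping_enabled()) {
            vtd_update_irte(irq, addr, data);     /* rewrite remap entry */
            vtd_flush_iec(irq);                   /* flush cached entry  */
        } else {
            hw_write_rte_or_msi(irq, addr, data); /* plain IOAPIC/MSI write */
        }
    }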
> 
>       eSk

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel