WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

RE: [Xen-devel] MSI and VT-d interrupt remapping

To: "Espen Skoglund" <espen.skoglund@xxxxxxxxxxxxx>
Subject: RE: [Xen-devel] MSI and VT-d interrupt remapping
From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Date: Tue, 25 Mar 2008 22:29:12 +0800
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, "Shan, Haitao" <haitao.shan@xxxxxxxxx>
Delivery-date: Tue, 25 Mar 2008 07:35:05 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <18408.65003.87218.946381@xxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <18384.23842.229533.91264@xxxxxxxxxxxxxxxxxx><C3F6F313.1DA70%keir.fraser@xxxxxxxxxxxxx><18385.24520.251144.711784@xxxxxxxxxxxxxxxxxx><391BF3CDD2DC0848B40ACB72FA97AD59031A7567@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <18408.65003.87218.946381@xxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AciOfFlK3GtOYcDtTqiFPpr/l79evgABGSow
Thread-topic: [Xen-devel] MSI and VT-d interrupt remapping
Espen Skoglund <espen.skoglund@xxxxxxxxxxxxx> wrote:
> [Yunhong Jiang]
>>> Right.  The reason for bringing up this suggestion now rather than
>>> later is because MSI support has not yet found its way into
>>> mainline.  Whoever decides on the interface used for registering
>>> MSI and MSI-X interrupts might want to take multi-message MSIs into
>>> account as well.
> 
>> Espen, thanks for your comments.  I remember Linux has no such
>> support, so Linux drivers will not benefit from such an
>> implementation; after all, the driver needs to provide an ISR for the
>> interrupts.  Of course, we need this feature if any OS supports it.  I
>> didn't implement it because it may require changes to various common
>> components and needs more discussion, while Linux has no support for
>> it (also, I was rushing for the 3.2 cut-off at that time :$).
> 
> You're right in that Linux does not currently support this.  You can,
> however, allocate multiple interrupts using MSI-X.  Anyhow, I was not
> envisioning this feature being used directly for passthrough device
> access.  Rather, I was considering the case where a device could be
> configured to communicate data directly into a VM (e.g., using
> multi-queue NICs) and deliver the interrupt to the appropriate VM.  In
> this case the frontend in the guest would not need to see a
> multi-message MSI device, only the backend in dom0/the driver domain
> would need to be made aware of it.

Although I don't know of any device with such a usage model (Intel's
VMDq uses MSI-X), yes, your usage model would be helpful.
To achieve this we may need to change the protocol between the PCI
backend and the PCI frontend; in fact, pci_enable_msi/pci_enable_msix
could perhaps be combined, with a flag to determine whether the vectors
must be contiguous or not.
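Just to make the idea concrete, a combined request could look roughly
like the sketch below; everything in it (struct name, flag, fields) is
hypothetical, not the existing pciback/pcifront interface.

/* Hypothetical sketch only, not the current pciback/pcifront protocol.
 * Idea: fold pci_enable_msi/pci_enable_msix into one request, with a
 * flag saying whether the backend must allocate a contiguous block of
 * vectors (multi-message MSI) or may return arbitrary ones (MSI-X).
 */
#include <stdint.h>

#define XEN_PCI_MSI_CONTIGUOUS  0x1   /* multi-message MSI needs this */

struct xen_pci_enable_msi_req {
    uint32_t domain, bus, devfn;      /* which device */
    uint32_t flags;                   /* e.g. XEN_PCI_MSI_CONTIGUOUS */
    uint32_t nvec;                    /* in: vectors wanted; out: granted */
    uint32_t vector[32];              /* out: allocated vectors/pirqs */
};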

One thing left is how the driver domain can bind the vector to the
frontend VM; some sanity-check mechanism should be added.
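As a rough illustration of the kind of check I mean (the types and
helpers below are minimal stand-ins, not real Xen code): the hypervisor
would refuse the bind unless the pirq has been granted to that frontend.

/* Conceptual sketch only; types reduced to the bare minimum.  The point:
 * when the driver domain asks to bind one of its pirqs into a frontend
 * VM, the hypervisor checks that this pirq was explicitly granted to
 * that VM, similar to the existing per-domain IRQ access checks.
 */
struct domain {
    int id;
    unsigned long irq_caps[4];        /* toy bitmap of granted pirqs (<256) */
};

static int pirq_granted(const struct domain *d, unsigned int pirq)
{
    unsigned int bits = 8 * sizeof(unsigned long);
    return (d->irq_caps[pirq / bits] >> (pirq % bits)) & 1;
}

static int bind_pirq_to_frontend(const struct domain *frontend,
                                 unsigned int pirq)
{
    if (pirq >= 256 || !pirq_granted(frontend, pirq))
        return -1;                    /* reject: not granted to this VM */
    /* ... otherwise proceed with the usual event-channel binding ... */
    return 0;
}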

BTW, can you tell me which devices might use this feature? I'm a bit
interested in it.

> 
>>> I do not think explicitly specifying destination APIC upon
>>> allocation is the best idea.  Setting the affinity upon binding the
>>> interrupt like it's done today seems like a better approach.  This
>>> leaves us with dealing with the vectors.
> 
>> But what should happen when the vcpu is migrated to another physical
>> cpu? I'm not sure about the cost of reprogramming the interrupt
>> remapping table; otherwise, that is a good choice for achieving the
>> affinity.
> 
> As you've already said, the interrupt affinity is only set when a pirq
> is bound.  The interrupt routing is not redirected if the vcpu it's
> bound to migrates to another physical cpu.  This can (should?) be
> changed in the future so that the affinity is either set implicitly
> when migrating the vcpu, or explicitly with a rebind call by dom0.  In
> any case the affinity would be reset by the set_affinity method.

Yes, I remember Keir suggested using the VT-d interrupt remapping table
to achieve this; I'm not sure whether that is still the plan.
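If we do go that way, vcpu migration would only have to rewrite the
destination field of the interrupt remapping table entry (IRTE); the
vector programmed into the device would not change.  A heavily
simplified sketch (the struct layout and bit positions are illustrative
only, check against the VT-d spec; real code must also flush the
interrupt entry cache):

#include <stdint.h>

struct irte {                         /* heavily simplified IRTE model */
    uint64_t lo;                      /* present, vector, destination, ... */
    uint64_t hi;                      /* source-id validation fields */
};

static void irte_set_dest(volatile struct irte *e, uint8_t apic_id)
{
    uint64_t lo = e->lo;
    lo &= ~(0xffULL << 40);           /* clear old destination (xAPIC id) */
    lo |= (uint64_t)apic_id << 40;    /* install the new destination */
    e->lo = lo;                       /* then invalidate the IEC via QI */
}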

> 
>>> My initial thought was to make use of the new msix_entries[] field
>>> in the xen_pci_op structure.  This field is already used as an
>>> in/out parameter for allocating MSI-X interrupts.  The
>>> pciback_enable_msi() function can then attempt to allocate multiple
>>> interrupts instead of a single one, and return the allocated vectors.
>>> 
>>> The current MSI patchset also lacks a set_affinity() function for
>>> changing the APIC destination similar to what is done for, e.g.,
>>> IOAPICs.  Also similar to IOAPICs, the MSI support should have
>>> something like the io_apic_write_remap_rte() for rewriting the
>>> interrupt remapping table when enabled.
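Roughly something like the sketch below, I guess; the structure and the
toy allocator are simplified stand-ins for the shared xen_pci_op path,
not the real definitions.

/* Simplified stand-ins, not the real pciback code or xen_pci_op layout.
 * Idea: reuse the in/out entry array so that enable-MSI can request N
 * messages and hand the allocated vectors back through the same field.
 */
#include <stdint.h>

#define SH_MAX_VEC 32                 /* illustrative limit */

struct msi_enable_op {                /* stand-in for the shared request */
    uint32_t nvec;                    /* in: vectors wanted; out: granted */
    uint32_t entries[SH_MAX_VEC];     /* out: allocated vectors */
};

static uint32_t next_free_vector = 0x30;   /* toy allocator state */

static int alloc_contiguous_vectors(uint32_t nvec, uint32_t *first)
{
    /* Toy allocator.  Multi-message MSI needs the block aligned to its
     * (power-of-two) size, since the low bits of the data register
     * select the message; real code must also respect per-CPU limits. */
    uint32_t start = (next_free_vector + nvec - 1) & ~(nvec - 1);
    *first = start;
    next_free_vector = start + nvec;
    return 0;
}

static int sketch_enable_multi_msi(struct msi_enable_op *op)
{
    uint32_t first, i;

    if (op->nvec == 0 || op->nvec > SH_MAX_VEC)
        return -1;
    if (alloc_contiguous_vectors(op->nvec, &first))
        return -1;                    /* could fall back to nvec = 1 */
    for (i = 0; i < op->nvec; i++)
        op->entries[i] = first + i;   /* contiguous block for multi-MSI */
    return 0;
}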
> 
>> For set_affinity(), what do you mean by changing the APIC
>> destination?  Currently, setting a guest pirq's affinity only affects
>> the event channel; the physical affinity is only applied once, when
>> the pirq is bound.
> 
> With "changing the APIC destination" I meant changing the destination
> CPU of an interrupt while keeping the vector, delivery type,
> etc. intact.
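Just to make sure I follow: in plain x86 MSI terms (without interrupt
remapping) that means touching only the address register; the data
register, with the vector, delivery mode and trigger, stays as it is.
A minimal sketch, assuming the standard MSI address layout with the
destination ID in bits 19:12:

#include <stdint.h>

#define MSI_ADDR_BASE        0xfee00000u     /* fixed MSI address region */
#define MSI_ADDR_DEST_SHIFT  12              /* bits 19:12 = destination ID */
#define MSI_ADDR_DEST_MASK   (0xffu << MSI_ADDR_DEST_SHIFT)

static uint32_t msi_retarget_addr(uint32_t old_addr, uint8_t new_apic_id)
{
    /* keep all other bits (redirection hint, dest mode, ...) untouched */
    return (old_addr & ~MSI_ADDR_DEST_MASK) |
           ((uint32_t)new_apic_id << MSI_ADDR_DEST_SHIFT);
}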
> 
>> As for rewriting the interrupt remapping table, like
>> io_apic_write_remap_rte() does, I think that will be added later as
>> well.
> 
>> I'm also a bit confused by your statement in the previous mail: "The
>> necessary changes would enable a device driver for an MSI capable
>> device to allocate a range of pirqs and bind these to different
>> frontends."  What do you mean by different frontends?
> 
> Different frontends here means multiple instances of frontends
> residing in different VMs, all served by a single backend.  As alluded
> to above, the idea is to have a single backend that has direct access
> to the device, and multiple frontends that somehow share some limited
> direct access to the device.  For example, a multi-queue capable NIC
> could deliver the packets to the queue in the appropriate VM and raise
> an interrupt in that VM without involving the domain of the backend
> driver.

Got it.

> 
>       eSk
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel