[Xen-devel] RE: [VTD] Intel iommu IOTLB flush really slow

To:	Jean Guyader <jean.guyader@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject:	[Xen-devel] RE: [VTD] Intel iommu IOTLB flush really slow
From:	"Kay, Allen M" <allen.m.kay@xxxxxxxxx>
Date:	Mon, 31 Oct 2011 23:00:08 -0700
Accept-language:	en-US
Cc:
Delivery-date:	Mon, 31 Oct 2011 23:01:25 -0700
Envelope-to:	www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to:	<20111031163801.GG19392@xxxxxxxxxxxxxxxxxxxxxxx>
List-help:	<mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id:	Xen developer discussion <xen-devel.lists.xensource.com>
List-post:	<mailto:xen-devel@lists.xensource.com>
List-subscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References:	<20111031163801.GG19392@xxxxxxxxxxxxxxxxxxxxxxx>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index:	AcyX65BzepR+qNlgQF+YjSahZrA8NAAbbXQA
Thread-topic:	[VTD] Intel iommu IOTLB flush really slow

Hi Jean,

I agree plan B is the better solution.  Having batch capability in shadow/HAP 
might be useful for other use cases.

Allen

-----Original Message-----
From: Jean Guyader [mailto:jean.guyader@xxxxxxxxxxxxx] 
Sent: Monday, October 31, 2011 9:38 AM
To: xen-devel@xxxxxxxxxxxxxxxxxxx
Cc: Kay, Allen M
Subject: [VTD] Intel iommu IOTLB flush really slow

Hi,

Some IOMMU DMA remapping engines take a long time to flush the IOTLB.
For instance, on Ibex Peak a single iommu_map_page call can take on the order 
of milliseconds.

In the Intel IOMMU spec you can see that you don't need to flush if the PTE was 
not previously present, so all is well when we are creating a domain because we 
don't need to flush anything. Problems arise when we try to move memory around.

Here is some code from hvmloader, pci.c:190 on xen-unstable:

while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend ) {
    struct xen_add_to_physmap xatp;
    if ( hvm_info->high_mem_pgend == 0 )
        hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
    xatp.domid = DOMID_SELF;
    xatp.space = XENMAPSPACE_gmfn;
    xatp.idx   = --hvm_info->low_mem_pgend;
    xatp.gpfn  = hvm_info->high_mem_pgend++;
    if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
        BUG();
}

This code is triggered when the PCI hole has grown so much that it overlaps 
with the allocated RAM. We then have to relocate the overlapping section to the 
top of memory.

If we follow the code down into Xen, we find that add_to_physmap calls 
set_p2m_entry, which uses either p2m_set_entry or ept_set_entry with an order 
of 0. In other words, we only move one page at a time.

Both implementations update the IOMMU page table with iommu_map_page.
So in the end we are doing a loop of iommu_map_page calls, one per page, driven 
by this loop in hvmloader.

The IOMMU DMA remapping engine of the Intel GPU is very slow to flush. 
So when we try to create a domain that does Intel GPU pass-through with enough 
memory to force a relocation of the top of RAM below 4G, the domain can take 
minutes to start!

There are multiple approaches that we can use to fix this problem, but before I 
start working on a patch I would like to get the list's point of view.

Plan A:
  - Add a new XENMEM add_to_physmap_range that would relocate a gfn range to a 
new gfn.
  - Add a flag in the IOMMU API to delay the IOTLB flush.
  - Add a new API call to flush the IOTLB manually once we have relocated the 
whole range.
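
To make the effect of Plan A concrete, here is a minimal sketch in C. It is a 
toy model only: it counts flushes instead of touching hardware, and every name 
in it (toy_iommu, toy_iommu_map_page, toy_iommu_flush, toy_relocate) is 
hypothetical and not part of the Xen IOMMU API. It assumes the relocation case 
described above, where every remapped page would otherwise trigger a flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of an IOMMU: counts IOTLB flushes instead of touching hardware. */
struct toy_iommu {
    unsigned long flush_count; /* number of (expensive) IOTLB flushes issued */
    bool defer_flush;          /* Plan A flag: skip the per-page flush */
    bool flush_pending;        /* a deferred flush is still owed */
};

/* Hypothetical stand-in for a one-page map call. */
static void toy_iommu_map_page(struct toy_iommu *iommu)
{
    /* ... the IOMMU page-table update would happen here ... */
    if (iommu->defer_flush)
        iommu->flush_pending = true; /* remember that a flush is owed */
    else
        iommu->flush_count++;        /* flush immediately, once per page */
}

/* Hypothetical explicit flush, called once after the whole range is moved. */
static void toy_iommu_flush(struct toy_iommu *iommu)
{
    if (iommu->flush_pending) {
        iommu->flush_count++;
        iommu->flush_pending = false;
    }
}

/* Relocate nr_pages one page at a time; returns the number of flushes. */
static unsigned long toy_relocate(unsigned long nr_pages, bool defer)
{
    struct toy_iommu iommu = { 0, defer, false };

    for (unsigned long i = 0; i < nr_pages; i++)
        toy_iommu_map_page(&iommu);

    toy_iommu_flush(&iommu);
    assert(!iommu.flush_pending); /* no flush may be left outstanding */
    return iommu.flush_count;
}
```

In this model, toy_relocate(n, false) issues n flushes while 
toy_relocate(n, true) issues at most one, which is the saving Plan A is after. 
The price is that the caller must not forget the final flush.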

Plan B:
  - Add a new XENMEM add_to_physmap_range that would relocate a gfn range to a 
new gfn.
  - Add a new set_p2m_entry function that will understand batches of gfns and 
mfns.
  - Implement batch operation for shadow and HAP.
  - Add new IOMMU API to support batch operation
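
As a rough illustration of what a range operation could look like, here is a 
small self-contained sketch in C. It is a toy model and not Xen code: 
toy_p2m, toy_physmap_range and toy_add_to_physmap_range are hypothetical names, 
the field layout is only loosely modelled on xen_add_to_physmap, and the pages 
are moved in ascending order, whereas the hvmloader loop above pairs them in 
reverse.

```c
#include <assert.h>

#define TOY_NR_GFNS     64UL
#define TOY_INVALID_MFN (~0UL)

/* Toy p2m: maps guest frame numbers to machine frame numbers. */
struct toy_p2m {
    unsigned long mfn[TOY_NR_GFNS];
    unsigned long iotlb_flushes; /* flushes issued so far */
};

/* Hypothetical range request. */
struct toy_physmap_range {
    unsigned long idx;  /* first source gfn */
    unsigned long gpfn; /* first destination gfn */
    unsigned long nr;   /* number of pages to move */
};

/* Populate the first 'populated' gfns with made-up mfns. */
static void toy_p2m_init(struct toy_p2m *p2m, unsigned long populated)
{
    for (unsigned long g = 0; g < TOY_NR_GFNS; g++)
        p2m->mfn[g] = (g < populated) ? 1000 + g : TOY_INVALID_MFN;
    p2m->iotlb_flushes = 0;
}

/*
 * Move a whole gfn range in one call and flush once at the end.
 * Returns 0 on success, -1 if the range is out of bounds or overlaps.
 */
static int toy_add_to_physmap_range(struct toy_p2m *p2m,
                                    const struct toy_physmap_range *r)
{
    if (r->nr == 0)
        return 0;
    if (r->nr > TOY_NR_GFNS ||
        r->idx > TOY_NR_GFNS - r->nr ||
        r->gpfn > TOY_NR_GFNS - r->nr)
        return -1;
    if (r->idx < r->gpfn + r->nr && r->gpfn < r->idx + r->nr)
        return -1; /* overlapping source and destination not handled here */

    for (unsigned long i = 0; i < r->nr; i++) {
        p2m->mfn[r->gpfn + i] = p2m->mfn[r->idx + i];
        p2m->mfn[r->idx + i] = TOY_INVALID_MFN;
    }

    p2m->iotlb_flushes++; /* one flush for the whole batch */
    return 0;
}
```

The point of the sketch is only that the batch boundary is visible to the 
lower layer, so the flush can be issued once per range instead of once per 
page without the caller having to manage a deferred-flush flag.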

(A) isn't very nice but has the benefit of not modifying too much code; (B) 
would be the right thing to do but would be quite disruptive in terms of code 
and API changes.

Let me know what you think,
Jean

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
