xen-devel
[Xen-devel] Re: [PATCH 02/15] [swiotlb] Add swiotlb_engine structure for
To: |
Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> |
Subject: |
[Xen-devel] Re: [PATCH 02/15] [swiotlb] Add swiotlb_engine structure for tracking multiple software IO TLBs. |
From: |
Chris Wright <chrisw@xxxxxxxxxxxx> |
Date: |
Tue, 19 Jan 2010 10:43:16 -0800 |
Cc: |
fujita.tomonori@xxxxxxxxxxxxx, jeremy@xxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx, Ian.Campbell@xxxxxxxxxxxxx, joerg.roedel@xxxxxxx, Chris Wright <chrisw@xxxxxxxxxxxx>, iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx, dwmw2@xxxxxxxxxxxxx, alex.williamson@xxxxxx |
Delivery-date: |
Tue, 19 Jan 2010 10:44:31 -0800 |
Envelope-to: |
www-data@xxxxxxxxxxxxxxxxxxx |
In-reply-to: |
<20100119174654.GQ11986@xxxxxxxxxxxxxxxxxxx> |
List-help: |
<mailto:xen-devel-request@lists.xensource.com?subject=help> |
List-id: |
Xen developer discussion <xen-devel.lists.xensource.com> |
List-post: |
<mailto:xen-devel@lists.xensource.com> |
List-subscribe: |
<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe> |
List-unsubscribe: |
<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe> |
References: |
<1263510064-16788-1-git-send-email-konrad.wilk@xxxxxxxxxx> <1263510064-16788-2-git-send-email-konrad.wilk@xxxxxxxxxx> <1263510064-16788-3-git-send-email-konrad.wilk@xxxxxxxxxx> <20100115013308.GB6021@xxxxxxxxxxxxxxxxxxxx> <20100119174654.GQ11986@xxxxxxxxxxxxxxxxxxx> |
Sender: |
xen-devel-bounces@xxxxxxxxxxxxxxxxxxx |
User-agent: |
Mutt/1.5.19 (2009-01-05) |
* Konrad Rzeszutek Wilk (konrad.wilk@xxxxxxxxxx) wrote:
> > > + * Is the DMA (Bus) address within our bounce buffer (start and end).
> > > + */
> > > + int (*is_swiotlb_buffer)(struct swiotlb_engine *, dma_addr_t dev_addr,
> > > + phys_addr_t phys);
> > > +
> >
> > Why is this implementation specific?
>
> In the current implementation, they both use the physical address and
> do a simple check:
>
> return paddr >= virt_to_phys(io_tlb_start) &&
> paddr < virt_to_phys(io_tlb_end);
>
> That for virtualized environments where a PCI device is passed in would
> work too.
>
> Unfortunately the problem is when we provide a channel of communication
> with another domain and we end up doing DMA on behalf of another guest.
> The short description of the problem is that a page of memory is shared
> with another domain and the mapping in our domain is correct (bus->physical)
> the other way (virt->physical->bus) is incorrect for the duration of this page
> being shared. Hence we need to verify that the page is local to our
> domain, and for that we need the bus address to verify that the
> addr == physical->bus(bus->physical(addr)) where addr is the bus
> address (dma_addr_t). If it is not local (shared with another domain)
> we MUST not consider it as a SWIOTLB buffer as that can lead to
> panics and possible corruptions. The trick here is that the phys->virt
> address can fall within the SWIOTLB buffer for pages that are
> shared with another domain and we need the DMA address to do an extra check.
>
> The long description of the problem is:
>
> You are the domain doing some DMA on behalf of another domain. The
> simple example is you are servicing a block device to the other guests.
> One way to implement this is to present a one page ring buffer where
> both domains move the producer and consumer indexes around. Once you get
> a request (READ/WRITE), you use the virtualized channels to "share" that page
> into your domain. For this you have a buffer (2MB or bigger) wherein for
> pages that shared in to you, you over-write the phys->bus mapping.
> That means that the phys->phys translation is altered for the duration
> of this request being out-standing. Once it is completed, the phys->bus
> translation is restored.
>
> Here is a little diagram of what happens when a page is shared (and lets
> assume that we have a situation where virt #1 == virt #2, which means
> that phys #1 == phys #2).
>
> (domain 2) virt#1->phys#1---\
> +- bus #1
> (domain 3) virt#2->phys#2 ---/
>
> (phys#1 points to bus #1, and phys#2 points to bus #1 too).
>
> The converse of the above picture is not true:
>
> /---> phys #1-> virt #1. (domain 2).
> bus#1 +
> \---> phys #2-> virt #2. (domain 3).
>
> phys #1 != phys #2 and hence virt #1 != virt #2.
>
> When a page is not shared:
>
> (domain 2) virt #1->phys #1--> bus #1
> (domain 3) virt #2->phys #2--> bus #2
>
> bus #1 -> phys #1 -> virt #1 (domain 2)
> bus #2 -> phys #2 -> virt #2 (domain 3)
>
> The bus #1 != bus #2, but phys #1 could be same as phys #2 (since
> there are just PFNs). And virt #1 == virt #2.
>
> The reason for these is that each domain has its own memory layout where
> the memory starts at pfn 0, not at some higher number. So each domain
> sees the physical address identically, but the bus address MUST point
> to different areas (except when sharing) otherwise one domain would
> over-write another domain, ouch.
>
> Furthermore when a domain is allocated, the pages for the domain are not
> guaranteed to be linearly contiguous so we can't guarantee that phys == bus.
>
> So to guard against the situation in which phys #1 ->virt comes out with
> an address that looks to be within our SWIOTLB buffer we need to do the
> extra check:
>
> addr == physical->bus(bus->physical(addr)) where addr is the bus
> address
>
> And for scenarios where this is not true (page belongs to another
> domain), that page is not in the SWIOTLB (even thought the virtual and
> physical address point to it).
>
> > > + /*
> > > + * Is the DMA (Bus) address reachable by the PCI device?.
> > > + */
> > > + bool (*dma_capable)(struct device *, dma_addr_t, phys_addr_t, size_t);
>
> I mentioned in the previous explanation that when a domain is allocated,
> the pages are not guaranteed to be linearly contiguous.
>
> For bare-metal that is not the case and 'dma_capable' just checks
> the device DMA mask against the bus address.
>
> For virtualized environment we do need to check if the pages are linearly
> contiguous for the size request.
>
> For that we need the physical address to iterate over them doing the
> phys->bus#1 translation and checking whether the (phys+1)->bus#2
> bus#1 == bus#2 + 1.
Right, for both of those cases I was thinking you could make that the
base logic and the existing helpers to do addr translation would be
enough. But that makes more sense when compiling for a specific arch
(i.e. the checks would be noops and compile away when !xen) as opposed to a
dynamic setup like this.
thanks,
-chris
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|
<Prev in Thread] |
Current Thread |
[Next in Thread>
|
- [Xen-devel] Re: [PATCH 07/15] [swiotlb] In 'swiotlb_free' check iommu_sw pointer., (continued)
- [Xen-devel] Re: [PATCH 05/15] [swiotlb] Respect the io_tlb_nslabs argument value., Chris Wright
- [Xen-devel] Re: [PATCH 04/15] [swiotlb] Search and replace s/io_tlb/iommu_sw->/, Chris Wright
- [Xen-devel] Re: [PATCH 04/15] [swiotlb] Search and replace s/io_tlb/iommu_sw->/, Konrad Rzeszutek Wilk
- [Xen-devel] Re: [PATCH 03/15] [swiotlb] Add swiotlb_register_engine function., Chris Wright
- [Xen-devel] Re: [PATCH 03/15] [swiotlb] Add swiotlb_register_engine function., Konrad Rzeszutek Wilk
- [Xen-devel] Re: [PATCH 02/15] [swiotlb] Add swiotlb_engine structure for tracking multiple software IO TLBs., Chris Wright
- [Xen-devel] Re: [PATCH 02/15] [swiotlb] Add swiotlb_engine structure for tracking multiple software IO TLBs., Konrad Rzeszutek Wilk
- [Xen-devel] Re: [PATCH 02/15] [swiotlb] Add swiotlb_engine structure for tracking multiple software IO TLBs.,
Chris Wright <=
- [Xen-devel] Re: [PATCH 02/15] [swiotlb] Add swiotlb_engine structure for tracking multiple software IO TLBs., FUJITA Tomonori
- [Xen-devel] Re: [PATCH 02/15] [swiotlb] Add swiotlb_engine structure for tracking multiple software IO TLBs., Konrad Rzeszutek Wilk
[Xen-devel] Re: [PATCH 01/15] [swiotlb] fix: Update 'setup_io_tlb_npages' to accept both arguments in either order., Chris Wright
[Xen-devel] Re: [RFC SWIOTLB-0.2], Chris Wright
|
|
|