This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] Re: [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas

To: "H. Peter Anvin" <hpa@xxxxxxxxx>
Subject: [Xen-devel] Re: [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date: Fri, 22 Oct 2010 14:02:31 -0400
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, the arch/x86 maintainers <x86@xxxxxxxxxx>, "Xen-devel@xxxxxxxxxxxxxxxxxxx" <Xen-devel@xxxxxxxxxxxxxxxxxxx>, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>
Delivery-date: Fri, 22 Oct 2010 11:04:57 -0700
In-reply-to: <4CC1BF58.9020001@xxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4CC0C14E.5080205@xxxxxxxx> <4CC0C318.90401@xxxxxxxxx> <4CC0CA07.3000306@xxxxxxxx> <4CC0DEB8.1060309@xxxxxxxxx> <20101022150826.GA23325@xxxxxxxxxxxx> <4CC1BF58.9020001@xxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.20 (2009-06-14)
On Fri, Oct 22, 2010 at 09:44:08AM -0700, H. Peter Anvin wrote:
> On 10/22/2010 08:08 AM, Konrad Rzeszutek Wilk wrote:
> >>
> >> Okay, could you clarify this part a bit?  Why does the kernel need to
> >> know the difference between "pseudo-physical" and "machine addresses" at
> >> all?  If they overlap, there is a problem, and if they don't overlap, it
> >> will be a 1:1 mapping anyway...
> > 
> > The flag (_PAGE_IOMAP) is used when we set the PTE so that the MFN value is
> > used instead of the PFN. We need that b/c when a driver does page_to_pfn()
> > it ends up using the PFN as the bus address to write out register data.
> > 
> > Without this patch, the page->virt->PFN value is used, and since the PFN is
> > not equal to the real MFN, we end up writing to a memory address that the
> > PCI device has no idea about. By setting the PTE with the MFN, the
> > virt->PFN lookup yields the real MFN value.
> > 
> > The drivers I am talking about are mostly, if not all, located in
> > drivers/gpu, and it looks like we are missing two more patches to
> > utilize the patch that Jeremy posted.
> > 
> > Please note that I am _not_ suggesting that the two patches below should
> > go out - I still need to post them on the drm mailing list.
> > 
> I'm still seriously confused.  If I understand this correctly, we're
> talking about DMA addresses here (as opposed to PIO addresses, i.e.
> BARs), right?

Correct. The BARs are ok since they go through the ioremap.
> It's the bimodality that really bothers me.  I understand of course that
> Xen imposes yet another address remapping layer, but I'm having a hard
> time understanding any conditions under with we would need that layer to
> go away, as long as DMA addresses are translated via the DMA APIs -- and
> if they aren't, then iommus will break, too.

That is it. They aren't using the DMA or PCI API completely(*).  Try doing
'iommu=soft swiotlb=force' with your radeon card under baremetal
(I used an ATI ES1000). I think it will grind to a halt during the writeback.

(*): This was with 2.6.34; I haven't touched 2.6.36, and there was a drm/iomem
change, so it might be that this is now working. The incomplete part of the
graphics drivers was that they would not do pci_dma_sync_*. So when the MFN
was programmed into the GTT/GART (check out radeon_gart_bind: the call to
pci_map_page gets the bus address, also known as the MFN), the GPU would have
a virt->MFN mapping. However, on the CPU side, when the driver writes a
texture to a virtual address, the mapping is virt->PFN.

So when we kick the GPU to do its magic, the VM on the graphics card would
translate the virtual address to the MFN, which did not have the data that the
kernel wrote to the PFN. In other words *PFN != *MFN, while we need *PFN == *MFN.
There are two ways of making this work:
 1). PFN == MFN (this is what Jeremy's patch ends up doing); under
     baremetal it has no effect, as baremetal doesn't care what the VM_IO
     flag stands for.
 2). Add a whole bunch of pci_dma_sync_* calls in the appropriate sections
     of the graphics drivers.

I am not qualified to do 2) - that code scares me. Also, 1) is the easier option :-)

I am actually not sure how this works with AMD-Vi or Intel VT-d. I do remember
something about letting certain devices bypass VT-d, and I think I saw the
nouveau driver making the DMAR throw a fit.

> As such, I don't grok this page flag and what it does, and why it's
> needed.  I'm not saying it's *wrong*, I'm saying the design is opaque to
> me and I'm not sure it is the right solution.

I hope my explanation cleared the confusion.
