I've been slowly working on the dma problem I ran into. I thought I was
making progress, but I think I'm up against a wall, so more discussion
and ideas might be helpful.
The problem was that on x86_32 PAE and x86_64, our physical address size
is greater than 32 bits, yet many (most?) io devices can only address
the first 32 bits of memory. So if/when we try to do dma to an address
that has any bits above the low 32 set (call these high addresses), the
address gets truncated and the dma ends up happening to the wrong address.
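
To make the truncation concrete, here is a minimal user-space sketch. The function names are made up for illustration (they are not from xen or linux), and it assumes the device simply drops everything above bit 31:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: does this machine address need more than 32 bits? */
int is_high_addr(uint64_t machine_addr)
{
    return (machine_addr >> 32) != 0;
}

/*
 * Hypothetical model of a 32-bit-only device: it latches the low 32 bits
 * of the address and silently ignores the rest.
 */
uint64_t dma_landing_addr(uint64_t machine_addr)
{
    uint64_t landed = (uint32_t)machine_addr;

    /* Truncation can only move the target downward, never upward. */
    assert(landed <= machine_addr);
    return landed;
}
```

For example, a buffer at 0x140001000 (a bit above 5 gigs) is a high address, and under this model the dma would land at 0x40001000, which belongs to something else entirely.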
I saw this problem on x86_64 with 6 gigs of ram: if I made dom0 too big,
the allocator put it in high memory. The linux kernel booted fine, but the
partition scan failed, and it couldn't mount root.
My original solution was to add another type to the xen zoneinfo array
to divide memory between high and low, and then to allocate low memory
only when a domain needs to do dma or when high memory is exhausted. This
was an easy patch that worked fine. I can provide it if anyone wants it.
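
The allocation policy above can be sketched roughly as follows. This is not the actual patch; the zone names, the struct, and the function are all hypothetical, and a real allocator tracks free lists rather than bare counters:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zone types: memory below 4 gigs and memory above it. */
enum zone { ZONE_LOW = 0, ZONE_HIGH = 1, NR_ZONES = 2 };

struct heap {
    size_t free_pages[NR_ZONES];   /* free page count per zone */
};

/*
 * Pick the zone for an allocation of nr pages.
 * Returns the zone used, or -1 if the request cannot be satisfied.
 */
int alloc_pages_from(struct heap *h, size_t nr, int needs_dma)
{
    assert(h != NULL);

    /* Ordinary requests try high memory first, to preserve low pages. */
    if (!needs_dma && h->free_pages[ZONE_HIGH] >= nr) {
        h->free_pages[ZONE_HIGH] -= nr;
        return ZONE_HIGH;
    }

    /* DMA requests, or high memory exhausted: fall back to low memory. */
    if (h->free_pages[ZONE_LOW] >= nr) {
        h->free_pages[ZONE_LOW] -= nr;
        return ZONE_LOW;
    }

    return -1;
}
```

The point of the ordering is that low pages are the scarce resource, so they are only handed out when nothing else will do.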
On the linux side of things, my first approach was to try to use linux
zones to divide up memory. Currently under xen, all memory is placed in
the dma zone. I was hoping I could loop over memory somewhere, check
the machine address of each page, and place it in the proper zone. The
first problem with this approach is that linux zones are designed more
for dealing with the even smaller isa address space. That aside, linux
seems to make large assumptions about memory being (mostly) contiguous
and most frequently deals with "start" and "size" rather than arrays
of pages. I started looking at the code, thinking that I might change
that, but at some point finally realized that on an abstract level,
what i was fundamentally doing was the exact reason that the pfn/mfn
mapping exists---teaching linux about non-contiguous memory looks fairly
The next approach i started on was to have xen reback memory with
low pages when it went to do dma. dma_alloc_coherent() makes a call
to xen_contig_memory(), which forces a range of memory to be backed
by machine contiguous pages by freeing the buffer to xen, and then
asking for it back. I tried adding another hypercall to request that
dma'able pages be returned. This worked great for the network cards, but
disk was another story. First off, there are several code paths that
do dma without ever calling xen_contig_memory() (which right now is
fine because those paths only ever deal with single pages). I started
down the path of finding those, but in the meantime realized that for
disk, we could be dma'ing to any memory. Additionally, Michael Hohnbaum
reminded me of page flipping. Between these two, it seems reasonable to
think that the pool of free dma'able memory could eventually become
exhausted.
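
A rough model of the rebacking idea, and of how it runs dry, might look like this. It is only a sketch: the pool structure, the p2m array, and reback_low() are hypothetical stand-ins, not the hypercall I added, and it assumes 4k pages:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT    12
/* First machine frame number that lies at or above 4 gigs. */
#define LOW_MFN_LIMIT (1ULL << (32 - PAGE_SHIFT))

/* Hypothetical pool of free low machine frames held by the hypervisor. */
struct low_pool {
    uint64_t *mfns;
    size_t    count;
};

/*
 * Make sure the page at p2m[pfn] is backed by a low machine frame.
 * Returns 0 on success, -1 once the low pool is exhausted.
 */
int reback_low(uint64_t *p2m, size_t pfn, struct low_pool *pool)
{
    assert(p2m != NULL && pool != NULL);

    if (p2m[pfn] < LOW_MFN_LIMIT)
        return 0;                 /* already reachable by a 32-bit device */

    if (pool->count == 0)
        return -1;                /* no low frames left: this is the wall */

    /*
     * Give the page a low frame. The high frame it used to have goes
     * back to the hypervisor, but that does not refill the low pool.
     */
    p2m[pfn] = pool->mfns[--pool->count];
    return 0;
}
```

Every high page that gets dma'd to consumes one low frame here and nothing gives one back, so if disk io can touch arbitrary memory the pool only ever shrinks.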
That is the wall.
Footnote: this will not be a problem on all machines. AMD x86_64 has an
iommu which should make this a non-problem (if the kernel chooses to use
it). Unfortunately, from what I understand, EM64T is not so blessed.
[1] Incidentally, it seems to me that optimally xen_contig_memory()
should just return if order==0.
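
Something like the following is what I have in mind. make_contiguous() is a hypothetical stand-in, not the real xen_contig_memory() (whose signature and internals differ), and it assumes contiguity is the only reason for the exchange:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a "make this range machine-contiguous" helper.
 * Rebacks 2^order pages starting at first_pfn with one run of machine
 * frames beginning at new_base_mfn. Returns the number of pages exchanged.
 */
size_t make_contiguous(uint64_t *p2m, size_t first_pfn, unsigned int order,
                       uint64_t new_base_mfn)
{
    assert(p2m != NULL);

    /* A single page (order 0) is contiguous by definition: nothing to do. */
    if (order == 0)
        return 0;

    size_t nr = (size_t)1 << order;
    for (size_t i = 0; i < nr; i++)
        p2m[first_pfn + i] = new_base_mfn + i;   /* one machine-contiguous run */

    return nr;
}
```

The early return saves a pointless free-and-reallocate round trip for single pages.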
Xen-devel mailing list