> >> page-table directory so that when the GPU accesses the addresses, it
> >> gets the real bus address. I wonder if it fails at that thought -
> >> meaning that the addresses that are written to the page table are
> >> actually the guest page numbers (gpfn) instead of the machine page numbers
> >> (mfn).
> > No, I don't think that's how it works. The user-space write triggers an
> > aio-write -
> which triggers do_page_fault, handle_mm_fault, do_linear_fault, __do_fault
> and finally ttm_bo_vm_fault.
> ttm_bo_vm_fault returns VM_FAULT_NOPAGE
VM_FAULT_NOPAGE means "retry the fault". In other words: I've fixed the
PTE to point to the right PFN.
> - but xen-boot keeps on re-triggering the same fault.
Which probably means that something is not OK with the PTE. What is the
vma->vm_page_prot value before the vm_insert_mixed? (and maybe even
Try also reading the true value of the PTE and seeing what it shows
before and after the vm_insert_mixed.
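To make the before/after comparison easier to read, here is a minimal user-space sketch that splits a raw 64-bit x86-64 PTE value into its architectural bits. The names (struct pte_info, decode_pte, format_pte) are made up for illustration and are not from the attached patch. The three software-available bits (9 to 11) are reported as a plain number, since which of them a given kernel uses for what is not something this sketch assumes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type: the fields of a 4K x86-64 PTE. */
struct pte_info {
    bool present, writable, user, pwt, pcd;
    bool accessed, dirty, pat, global, nx;
    unsigned sw_bits;   /* bits 9..11, available to software */
    uint64_t frame;     /* bits 12..51, the frame number stored in the PTE */
};

/* Split a raw PTE value into its architectural fields. */
static struct pte_info decode_pte(uint64_t raw)
{
    struct pte_info i;

    i.present  = (raw >> 0) & 1;
    i.writable = (raw >> 1) & 1;
    i.user     = (raw >> 2) & 1;
    i.pwt      = (raw >> 3) & 1;
    i.pcd      = (raw >> 4) & 1;
    i.accessed = (raw >> 5) & 1;
    i.dirty    = (raw >> 6) & 1;
    i.pat      = (raw >> 7) & 1;
    i.global   = (raw >> 8) & 1;
    i.sw_bits  = (unsigned)((raw >> 9) & 0x7);
    i.frame    = (raw >> 12) & ((1ULL << 40) - 1);
    i.nx       = (raw >> 63) & 1;
    return i;
}

/* Render one line of text for a log message; returns snprintf's result. */
static int format_pte(char *buf, size_t len, const char *tag, uint64_t raw)
{
    struct pte_info i = decode_pte(raw);

    assert(buf != NULL && tag != NULL);
    return snprintf(buf, len,
                    "%s: raw=%#llx frame=%#llx P=%d RW=%d US=%d PWT=%d PCD=%d "
                    "A=%d D=%d PAT=%d G=%d SW=%u NX=%d",
                    tag, (unsigned long long)raw,
                    (unsigned long long)i.frame,
                    i.present, i.writable, i.user, i.pwt, i.pcd,
                    i.accessed, i.dirty, i.pat, i.global, i.sw_bits, i.nx);
}
```

Feeding it the raw value captured before and after the vm_insert_mixed call (one format_pte call each, with tags such as "before" and "after") should show at a glance whether the present bit, the frame number, or one of the software bits changed.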
I've attached a simple patch I wrote some time ago to get the real MFNs
and its page protection. I think you can adapt it (print_data function to be
to peek at the PTE and its protection values.
There is an extra flag that the PTE can have when running under Xen:
This signifies that the PFN is actually the MFN. In this case, though,
it shouldn't be enabled, because the memory is actually gathered from
alloc_page. But if it is, it might be the culprit.
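Why that flag could matter, as a toy model: as I understand the paravirtualized setup, the guest's frame number is translated through a pfn-to-mfn table when a PTE is built, unless the PTE is marked as already holding an MFN. Everything in the sketch below (toy_p2m, toy_make_pte, TOY_FLAG_FRAME_IS_MFN and its bit position) is hypothetical and only illustrates the idea; it is not the kernel's or Xen's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag and bit position, for illustration only. */
#define TOY_FLAG_FRAME_IS_MFN (1ULL << 10)
#define TOY_FRAME_SHIFT       12
#define TOY_FLAGS_MASK        0xfffULL

/* Toy pfn -> mfn table: index is the guest frame, value the machine frame. */
static const uint64_t toy_p2m[] = { 0x9a000, 0x9a001, 0x4c310, 0x4c311 };
#define TOY_P2M_SIZE (sizeof toy_p2m / sizeof toy_p2m[0])

/*
 * Build a PTE value from a frame number and flag bits.
 * Without the flag, the frame is treated as a guest pfn and translated.
 * With the flag, the frame is taken as a machine frame and used verbatim.
 */
uint64_t toy_make_pte(uint64_t frame, uint64_t flags)
{
    uint64_t mfn = frame;

    if (!(flags & TOY_FLAG_FRAME_IS_MFN)) {
        assert(frame < TOY_P2M_SIZE);
        mfn = toy_p2m[frame];
    }
    return (mfn << TOY_FRAME_SHIFT) | (flags & TOY_FLAGS_MASK);
}
```

In this model, toy_make_pte(2, 0x067) points at machine frame 0x4c310, while the same call with the flag set points at machine frame 2, which is somebody else's memory. If something like that were happening for pages that came from alloc_page, the mapping would not back the address the application expects.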
> when vm_fault calls ttm_tt_get_page, the page is already there, and
> the handler does another vm_insert_page (I changed vm_insert_mixed to
> vm_insert_page/pfn based on io_mem, now the only patch, and it works on
> bare machine) on and on and on.
> What can possibly cause the fault-handler to repeat endlessly?
The VM_FAULT_NOPAGE short-circuits most of the fault-handler and makes it
return back. The application is resumed and retries the operation that
caused the fault - in this case an attempt to write to an address that
was not present. Obviously the second attempt at writing to the address
should have worked without problems.
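The retry behaviour can be reproduced in user space as an analogy (this is not the TTM handler, just the same mechanism seen from the other side). In the sketch below a SIGSEGV handler plays the role of the fault handler: it repairs the mapping with mprotect and returns, and the CPU re-executes the faulting store. The parameter broken_retries is a made-up knob that lets the handler return without repairing anything a few times, which is roughly what an endlessly repeating fault looks like. Calling mprotect from a signal handler is not on the POSIX async-signal-safe list; I am assuming Linux, where it works in practice.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t fault_count;  /* how often the handler ran */
static volatile sig_atomic_t skip_fixes;   /* faults to leave unrepaired */
static void *guard_page;
static size_t guard_len;

/* Stand-in for the fault handler: fix the mapping, then return to retry. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    char *addr = (char *)info->si_addr;

    (void)sig;
    (void)ctx;
    /* A fault outside our page is a real bug, so give up. */
    if (addr < (char *)guard_page || addr >= (char *)guard_page + guard_len)
        _exit(2);

    fault_count++;
    if (skip_fixes > 0) {
        skip_fixes--;      /* return without repairing: the store faults again */
        return;
    }
    if (mprotect(guard_page, guard_len, PROT_READ | PROT_WRITE) != 0)
        _exit(3);
}

/* Returns the number of faults taken before the store succeeded, or -1. */
int run_faulting_write(int broken_retries)
{
    struct sigaction sa, old;
    volatile uint32_t *ptr;
    int ok;

    assert(broken_retries >= 0);
    guard_len = (size_t)sysconf(_SC_PAGESIZE);
    guard_page = mmap(NULL, guard_len, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guard_page == MAP_FAILED)
        return -1;

    fault_count = 0;
    skip_fixes = broken_retries;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(guard_page, guard_len);
        return -1;
    }

    ptr = guard_page;
    *ptr = 0xdeadbeefu;            /* faults; retried after the handler returns */
    ok = (*ptr == 0xdeadbeefu);

    sigaction(SIGSEGV, &old, NULL);
    munmap(guard_page, guard_len);
    return ok ? (int)fault_count : -1;
}
```

With broken_retries set to 0 the store goes through after a single fault, which is the behaviour one would expect after a correct VM_FAULT_NOPAGE. With a non-zero value the same instruction faults once more per unrepaired pass, and if the handler never managed to repair the mapping, the loop would not end.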
> If a wrong page is backed at the user-address, it should create bad_access or
> some other subsequent events - but the system is running fine minus all local
> consoles! If the insertion is to a wrong place, this can happen; but
> the top-level
> trap is the only provider of the address - and the fault address and
> vma address match,
> and the same code works fine on bare-boot.
So you see this fault handler being called endlessly while the machine
is still running and other pieces of code work just fine, right?
> ttm_tt_get_page calls alloc in a loop - so it may allocate multiple pages from
> start/end depending on Highmem memory or not - implying asynchronous
> and mapping.
I thought it had some logic to figure out that it already handled this
page and would return an already allocated page?
> All I want now is *ptr = (uint32_t)data to work as of now!
You are doing a great job at this head-spinning detective work. Much
Xen-devel mailing list