I was able to get some debugging in on this problem and here is what I have found. I re-ran my test with the XEN debug options enabled as Keir suggested (I also put some debug output in netif_map and map_frontend_pages to find out exactly what was failing). The 16th VM's vif timed out again and here is what I saw in the dmesg log:

(XEN) grant_table.c:557:d1 Expanding dom (1) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d2 Expanding dom (2) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d3 Expanding dom (3) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d4 Expanding dom (4) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d5 Expanding dom (5) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d6 Expanding dom (6) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d7 Expanding dom (7) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d8 Expanding dom (8) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d9 Expanding dom (9) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d10 Expanding dom (10) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d11 Expanding dom (11) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d12 Expanding dom (12) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d13 Expanding dom (13) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d14 Expanding dom (14) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d15 Expanding dom (15) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d16 Expanding dom (16) grant table from (4) to (5) frames.
(XEN) mm.c:2605:d0 Could not find L1 PTE for address d1400000
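For anyone who wants to check a longer "xm dmesg" capture for the same pattern, here is a minimal Python sketch that pulls out the grant table expansions and the L1 PTE failures. The regular expressions are inferred only from the messages quoted above, so other XEN versions may word them differently; the names summarize, EXPAND_RE and PTE_RE are made up for this example.

```python
import re

# Grant-table expansion message, as worded in the dmesg output quoted above.
EXPAND_RE = re.compile(
    r"grant_table\.c:\d+:d(\d+) Expanding dom \((\d+)\) grant table "
    r"from \((\d+)\) to \((\d+)\) frames"
)
# L1 PTE lookup failure reported from mm.c; the address is printed in hex.
PTE_RE = re.compile(
    r"mm\.c:\d+:d(\d+) Could not find L1 PTE for address ([0-9a-fA-F]+)"
)


def summarize(log_text):
    """Return (expansions, failures) found in a block of dmesg text."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    expansions = [
        {"domid": int(m.group(2)), "old": int(m.group(3)), "new": int(m.group(4))}
        for m in EXPAND_RE.finditer(flat)
    ]
    failures = [
        {"caller_domid": int(m.group(1)), "address": int(m.group(2), 16)}
        for m in PTE_RE.finditer(flat)
    ]
    return expansions, failures
```

Saving the output of "xm dmesg" to a file and passing its contents to summarize() gives one entry per expansion and one per failure, which makes it easy to see which domain's expansion immediately precedes the error.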
You can see from above that the first 15 VMs are OK, and the 16th VM fails with the last error message in "mm.c" as shown above. I attempted to trace upwards what exactly was failing, so I enabled debug output in "linux-2.6-xen-sparse/drivers/xen/netback/interface.c" (this is where netif_map() is located). I then observed the following output in /var/log/messages when the 16th VM's vif timed out:

(map_frontend_pages:227) Gnttab failure mapping rx_ring_ref!
(netif_map:274) map frontend pages failed [I added this debug output]
vif vif-16-3: 1 mapping shared-frames 2310/2311 port 11

The error message from mm.c displayed in the dmesg log is coming from the function "create_grant_va_mapping" (a call to guest_map_l1e() is failing with NULL).
In summary, it looks like the mapping of the RX shared memory ring is failing (the TX mapping succeeds; it is always the mapping of the RX ring that fails). Another interesting note is that the address dumped in the dmesg log is always the same: d1400000 (I saw the failure about 10 times today and the address never changes).

Also, at Keir's suggestion, I tried the XEN 3.0.4 kernel in my 16th VM (2.6.16.33), and it failed the same way. The only difference is that instead of extending the grant table from 4 to 5 frames, it was extended from 4 to 16 frames:

(XEN) grant_table.c:557:d18 Expanding dom (18) grant table from (4) to (16) frames.
(XEN) mm.c:2605:d0 Could not find L1 PTE for address d1400000
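One thing that can be checked about the constant address d1400000 is where it falls relative to the range covered by a single L1 page table. The sketch below assumes 32-bit x86 with 4 KiB pages, where an L1 table has 1024 entries without PAE and 512 entries with PAE; whether this host runs PAE is not stated here, so both cases are computed. The helper name l1_position is made up for this example. This is only an observation about the address, not a diagnosis.

```python
PAGE_SIZE = 4096  # 4 KiB pages on x86


def l1_position(addr, entries_per_table):
    """Return (byte offset within the span of one L1 table, L1 index)."""
    span = PAGE_SIZE * entries_per_table  # bytes mapped by one L1 table
    offset = addr % span
    return offset, offset // PAGE_SIZE


addr = 0xD1400000  # address reported by mm.c in the dmesg log
for name, entries in (("non-PAE", 1024), ("PAE", 512)):
    offset, index = l1_position(addr, entries)
    print(f"{name}: offset in L1 span = {offset:#x}, L1 index = {index}")
```

In both cases the offset and the index come out as zero, so the address is the first page of a region that would need its own L1 table. That seems at least consistent with the "Could not find L1 PTE" message, but it does not by itself explain why the table is missing.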
I believe the following stack trace represents the trace of the failure (starting from within XenBus, traced by hand):
connect_rings                       linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c
netif_map                           linux-2.6-xen-sparse/drivers/xen/netback/interface.c
map_frontend_pages                  linux-2.6-xen-sparse/drivers/xen/netback/interface.c
__gnttab_map_grant_ref (hypercall)  xen/common/grant_table.c
create_grant_host_mapping           xen/arch/x86/mm.c
create_grant_va_mapping             xen/arch/x86/mm.c
guest_map_l1e                       xen/arch/x86/mm.c (this is the function that is ultimately failing)

Any clue as to what is causing this failure or how to fix it? Is there any other debug info I can provide here that would be of any help in resolving this issue? I have some free time tomorrow to debug this issue, but need some direction; this is in an area of XEN I don't understand very well. I am also thinking about downloading the xen 3.1 unstable release and trying that one to see if the problem also exists there.

Thanks,
Eric

Keir Fraser <keir@xxxxxxxxxxxxx> wrote:

Can you try 3.0.4 domU kernel against 3.1 dom0 kernel, and vice versa? Also, turn on debug tracing in Xen (boot options 'loglvl=all guest_loglvl=all') and see what appears at the end of 'xm dmesg'.
-- Keir
On 13/7/07 02:19, "Eric Tessler" <maiden1134@xxxxxxxxx> wrote:
At the same time in dom0, we see the following error message in /var/log/messages:

"vif vif-16-3: 1 mapping shared-frames 2310/2311 port 11"

(the error message above means that netif_map failed for some reason in XenBus)

If we repeat this same exact test using XEN 3.0.4, we never have any problems. All vifs in all VMs work correctly. This problem must be specific to XEN 3.1.
_______________________________________________ Xen-users mailing list Xen-users@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-users