[Xen-devel] Grant Table Network Issues

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] Grant Table Network Issues
From: Michael Vrable <mvrable@xxxxxxxxxxx>
Date: Sat, 13 Aug 2005 11:59:45 -0700
User-agent: Mutt/1.5.9i

I've been working on getting networking functioning in translated shadow
mode, and have it to the point where it's almost working--some packets
get through some of the time before the machine crashes.

In an effort to narrow down the problem, I've found that the grant table
network interface code in xen-unstable.hg seems to have some stability
problems as well.  Here's one of them: this is with a mostly unmodified
checkout of xen-unstable.hg (from yesterday evening), patched to produce
more debugging output and also with IP checksumming optimizations
disabled (since I was seeing some trouble with those).  After a very
short while, I get a Dom-0 crash.  This transcript is taken from
Domain-0, pinging an unprivileged domain (no shadow modes enabled):

    potemkin58:~# ping
    PING ( 56(84) bytes of data.
    (XEN) gnttab_donate: i=0 mfn=000112aa domid=1 gref=000003c1
    (XEN) (file=grant_table.c, line=1146) gnttab_prepare_for_transfer rd(1) ld(0) ref(961).
    (XEN) (file=grant_table.c, line=1225) gnttab_notify_transfer rd(1) ld(0)
    (XEN) (file=grant_table.c, line=423) Mapping grant ref (706) for domain (1) with flags (6)
    (XEN) (file=grant_table.c, line=119) activate_grant_ref: mapping=0
    (XEN) (file=grant_table.c, line=315) activate_grant_ref: frame=78807
    (XEN) (file=grant_table.c, line=455) map_grant_ref: frame=78807 vaddr=df807000 handle=2
    (XEN) gnttab_donate: i=0 mfn=0007689f domid=1 gref=000003c2
    (XEN) (file=grant_table.c, line=1146) gnttab_prepare_for_transfer rd(1) ld(0) ref(962).
    (XEN) (file=grant_table.c, line=1225) gnttab_notify_transfer rd(1) ld(0)
    (XEN) (file=grant_table.c, line=535) Unmapping grant ref (706) for domain (1) with handle (2)
    (XEN) (file=grant_table.c, line=652) unmap_grant_ref: frame=78807
    (XEN) (file=grant_table.c, line=423) Mapping grant ref (706) for domain (1) with flags (6)
    (XEN) (file=grant_table.c, line=119) activate_grant_ref: mapping=0
    (XEN) (file=grant_table.c, line=315) activate_grant_ref: frame=7b6b0
    (XEN) (file=grant_table.c, line=455) map_grant_ref: frame=7b6b0 vaddr=df808000 handle=2
    kernel BUG at include/linux/skbuff.h:1148 (kmap_skb_frag)!
     [<c03c2a0b>] skb_checksum+0x27b/0x310
     [<c040228b>] icmp_rcv+0x16b/0x1a0
     [<c03dbebb>] ip_local_deliver+0xdb/0x220
     [<c03dc33e>] ip_rcv+0x33e/0x4b0
     [<c03dc630>] ip_rcv_finish+0x0/0x250
     [<c03c7d64>] netif_receive_skb+0x204/0x270
     [<c03c7e89>] process_backlog+0xb9/0x190
     [<c03c800d>] net_rx_action+0xad/0x1a0
     [<c0123f35>] __do_softirq+0xc5/0xf0
     [<c0123feb>] do_softirq+0x8b/0x90
     [<c01240b5>] irq_exit+0x35/0x40
     [<c010f082>] do_IRQ+0x22/0x30
     [<c0106530>] evtchn_do_upcall+0x70/0xa0
     [<c010a758>] hypervisor_callback+0x2c/0x34
     [<c01064ba>] force_evtchn_callback+0xa/0x10
     [<c014b72f>] __pagevec_lru_add+0x15f/0x1c0
     [<c013fb46>] add_to_page_cache+0x76/0xf0
     [<c018957d>] mpage_readpages+0x18d/0x190
     [<c01cd850>] ext3_get_block+0x0/0xc0
     [<c0147e74>] read_pages+0x124/0x170
     [<c01cd850>] ext3_get_block+0x0/0xc0
     [<c0145573>] __alloc_pages+0x2e3/0x430
     [<c0147fe0>] __do_page_cache_readahead+0x120/0x230
     [<c01415cf>] filemap_nopage+0x2ef/0x410
     [<c01528f8>] do_no_page+0xb8/0x3b0
     [<c01502a3>] pte_alloc_map+0x93/0x210
     [<c0152e36>] handle_mm_fault+0xf6/0x240
     [<c01178ec>] do_page_fault+0x19c/0x5f2
     [<c0114246>] old_mmap+0xd6/0x110
     [<c010a93a>] page_fault+0x2e/0x34
    Kernel panic - not syncing: BUG!
    (XEN) Domain 0 shutdown: rebooting machine.

The line causing trouble is "BUG_ON(in_irq())" in kmap_skb_frag.  In this
example, I had tcpdump running in both domains; this seems to trigger the
problem more reliably.  I've also seen a similar crash with a TCP
connection, but it takes a few packets before it shows up: the handshake
completes, and the crash happens about the time data packets come back
from Domain-0.  If checksumming optimizations are enabled, the packets
seem to be dropped instead, so I don't see a crash, but I don't get any
data either.
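
For anyone not familiar with what that check guards against, here is a
minimal user-space sketch of the idea.  It is not the kernel code: every
sim_* name is hypothetical, and the bit layout of the counter is only an
assumption modelled on the kernel's preempt_count.  It shows that a
BUG_ON(in_irq())-style check passes in process and softirq context and
fails only while a hard-IRQ handler is still accounted as running.

```c
#include <stdbool.h>

/* Illustrative layout of a preempt_count-like counter.  The shifts and
 * widths are assumptions for this sketch, not taken from any specific
 * kernel tree. */
#define SIM_SOFTIRQ_SHIFT   8
#define SIM_HARDIRQ_SHIFT   16
#define SIM_SOFTIRQ_OFFSET  (1u << SIM_SOFTIRQ_SHIFT)
#define SIM_HARDIRQ_OFFSET  (1u << SIM_HARDIRQ_SHIFT)
#define SIM_SOFTIRQ_MASK    (0xffu << SIM_SOFTIRQ_SHIFT)
#define SIM_HARDIRQ_MASK    (0xfffu << SIM_HARDIRQ_SHIFT)

/* Simulated per-CPU context counter (hypothetical). */
unsigned int sim_preempt_count;

/* Entering/leaving a hard-IRQ handler bumps the hardirq field. */
void sim_irq_enter(void) { sim_preempt_count += SIM_HARDIRQ_OFFSET; }
void sim_irq_exit(void)  { sim_preempt_count -= SIM_HARDIRQ_OFFSET; }

/* Entering/leaving softirq processing bumps the softirq field. */
void sim_softirq_enter(void) { sim_preempt_count += SIM_SOFTIRQ_OFFSET; }
void sim_softirq_exit(void)  { sim_preempt_count -= SIM_SOFTIRQ_OFFSET; }

/* Analogue of in_irq(): true while any hard-IRQ handler is active. */
bool sim_in_irq(void)
{
    return (sim_preempt_count & SIM_HARDIRQ_MASK) != 0;
}

/* Analogue of in_softirq(): true while softirq processing is active. */
bool sim_in_softirq(void)
{
    return (sim_preempt_count & SIM_SOFTIRQ_MASK) != 0;
}

/* Stand-in for the check in kmap_skb_frag: returns true when a
 * BUG_ON(in_irq()) would NOT fire, i.e. the mapping is permitted. */
bool sim_kmap_skb_frag_allowed(void)
{
    return !sim_in_irq();
}
```

In this model, the softirq in the trace above would only trip the check
if the hardirq count were still nonzero when net_rx_action runs, which
is one thing worth checking on the evtchn_do_upcall path.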

I've been having a difficult time tracking this down, so any help is

--Michael Vrable
