This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

Re: [Xen-devel] Grant Table Network Issues

To: Michael Vrable <mvrable@xxxxxxxxxxx>
Subject: Re: [Xen-devel] Grant Table Network Issues
From: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Date: Sun, 14 Aug 2005 09:29:02 +0100
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20050813185945.GA23341@xxxxxxxxxx>
References: <20050813185945.GA23341@xxxxxxxxxx>

On 13 Aug 2005, at 19:59, Michael Vrable wrote:

> The line causing trouble is "BUG_ON(in_irq())".  In this example, I had
> tcpdump running in both domains; this seems to trigger the problem more
> reliably.  I've also seen a similar crash with a TCP connection, but it
> takes a few packets before this shows up (the handshake completes, and
> the crash happens about the time data packets come back from domain-0;
> if checksumming optimizations are enabled, it seems the packets are
> dropped so I don't see a crash but I don't get any data either).

On the stack trace, at irq_exit() you definitely have no hardirqs or softirqs in progress. But somehow, by kmap_skb_frag(), the hardirq section of the preempt count has become non-zero. You can't have been preempted to another CPU during any of this, because the preempt count is continuously non-zero throughout the original irq handling and the subsequent softirq handling.

The only code between irq_exit() and kmap_skb_frag() on the stack trace is unmodified Linux code. Assuming that is all correct (and presumably the same whether or not grant tables are enabled), I might guess that another interrupt arrives and its handler corrupts things?
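
For readers unfamiliar with the counter being discussed, the following is a minimal userspace sketch of how the hardirq and softirq fields can share one count, and how a check like in_irq() reads only the hardirq field. The bit widths and every model_* name are assumptions made for illustration, loosely modelled on 2.6-era include/linux/hardirq.h. They are not taken from the stack trace, and the real layout varies with kernel version and architecture.

```c
#include <assert.h>

/* Assumed field widths (illustrative only; real kernels differ). */
#define PREEMPT_BITS   8
#define SOFTIRQ_BITS   8
#define HARDIRQ_BITS   12

#define PREEMPT_SHIFT  0
#define SOFTIRQ_SHIFT  (PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT  (SOFTIRQ_SHIFT + SOFTIRQ_BITS)

#define FIELD_MASK(bits, shift) (((1UL << (bits)) - 1) << (shift))
#define SOFTIRQ_MASK   FIELD_MASK(SOFTIRQ_BITS, SOFTIRQ_SHIFT)
#define HARDIRQ_MASK   FIELD_MASK(HARDIRQ_BITS, HARDIRQ_SHIFT)
#define SOFTIRQ_OFFSET (1UL << SOFTIRQ_SHIFT)
#define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)

/* Hypothetical stand-in for the per-task preempt count. */
static unsigned long model_count;

/* Non-zero hardirq field means "in hardirq context". */
static int model_in_irq(void)     { return (model_count & HARDIRQ_MASK) != 0; }
/* Non-zero softirq field means "in softirq context". */
static int model_in_softirq(void) { return (model_count & SOFTIRQ_MASK) != 0; }

static void model_irq_enter(void)     { model_count += HARDIRQ_OFFSET; }
static void model_irq_exit(void)      { model_count -= HARDIRQ_OFFSET; }
static void model_softirq_begin(void) { model_count += SOFTIRQ_OFFSET; }
static void model_softirq_end(void)   { model_count -= SOFTIRQ_OFFSET; }

/*
 * Replays the sequence described above and returns the final
 * model_in_irq() value. The last increment is a hypothetical
 * unbalanced irq_enter, standing in for "something corrupts the
 * hardirq field during softirq processing".
 */
static int model_walkthrough(void)
{
    model_count = 0;

    model_irq_enter();                /* original interrupt arrives */
    assert(model_in_irq());
    model_irq_exit();                 /* hardirq field back to zero */
    assert(!model_in_irq());

    model_softirq_begin();            /* softirq work starts */
    assert(model_in_softirq() && !model_in_irq());

    model_irq_enter();                /* hypothetical unbalanced increment */
    return model_in_irq();            /* non-zero: BUG_ON(in_irq()) would fire */
}
```

In this model, model_walkthrough() returns non-zero only because of the extra, unmatched increment. A balanced enter/exit pair from a nested interrupt would leave the hardirq field at zero again once softirq processing resumes, and model_softirq_end() would then clear the softirq field as usual.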

 -- Keir
