WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

xen-devel

Re: [Xen-devel] dom0 crashing on extreme I/O

To: Diwaker Gupta <diwaker.lists@xxxxxxxxx>
Subject: Re: [Xen-devel] dom0 crashing on extreme I/O
From: Christopher Clark <christopher.clark@xxxxxxxxxxxx>
Date: Thu, 12 Jan 2006 14:41:16 +0000
Cc: Ian Pratt <m+Ian.Pratt@xxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
In-reply-to: <A95E2296287EAD4EB592B5DEEFCE0E9D40A11B@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
References: <A95E2296287EAD4EB592B5DEEFCE0E9D40A11B@xxxxxxxxxxxxxxxxxxxxxxxxxxx>

Hi Diwaker,

Could you add this patch to your build of the domain 0 kernel and try to reproduce the fault again, please?

thanks

Christopher

diff -r 821368442403 linux-2.6-xen-sparse/drivers/xen/netback/netback.c
--- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c        Thu Jan 12 11:45:49 2006
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c        Thu Jan 12 14:36:56 2006
@@ -212,6 +212,14 @@
                vdata   = (unsigned long)skb->data;
                old_mfn = virt_to_mfn(vdata);

+        if ( ((old_mfn & 0xfff) == 0x001) && (old_mfn > 0x10000000UL) )
+        {
+            printk("XXX: nasty mfn from p2m: v:%lx p:%lx m:%lx\n",
+                    vdata, __pa(vdata), old_mfn );
+            /* HACK: let's try shifting it until it looks sane... */
+            old_mfn >>= 12;
+        }
+
                /* Memory squeeze? Back off for an arbitrary while. */
                if ((new_mfn = alloc_mfn()) == 0) {
                        if ( net_ratelimit() )


On 1/12/06, Ian Pratt <m+Ian.Pratt@xxxxxxxxxxxx> wrote:


> I have 3 VMs, two running web servers and the third running
> netperf/iperf. This is a multi-CPU setup, with dom0 on CPU-0
> and all the remaining VMs on a separate CPU.
>
> Currently my dom0 has 528M of memory, while each VM has around 160M.
> Under high loads, the system crashes. I'm pasting a
> representative crash here:
>
> file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 2f016001
> (XEN) (file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 2f017001

Interesting. We've seen this very occasionally before, but this is the
first time on a 32-bit kernel.

The clue is that the errant frame numbers always end in 001 and are
actually valid if you shift them right by 12 bits (>>12).

It would be very helpful if you could work on a minimal repro case,
ideally with only one domU.

Chris: any extra debugging that might be helpful?

Thanks,
Ian


> (XEN) (file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 18fca001
> (XEN) (file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 18fcb001
> (XEN) (file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 2270c001
> (XEN) (file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 2270d001
> ------------[ cut here ]------------
> kernel BUG at drivers/xen/netback/netback.c:335!
> invalid operand: 0000 [#1]
> Modules linked in: ipt_physdev iptable_filter ip_tables video thermal processor fan button battery ac md sworks_agp agpgart dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mptscsih mptbase sd_mod scsi_mod
> CPU:    0
> EIP:    0061:[<c02c6782>]    Not tainted VLI
> EFLAGS: 00010246   (2.6.12.6-xen0)
> EIP is at net_rx_action+0x4c2/0x4f0
> eax: 0000fff7   ebx: df26b620   ecx: 00000042   edx: c04b8920
> esi: dc073480   edi: 00000000   ebp: c04b3900   esp: c0a23d28
> ds: 007b   es: 007b   ss: 0069
> Process ksoftirqd/0 (pid: 2, threadinfo=c0a22000 task=c0a16510)
> Stack: c04b38e0 c0362d90 80000000 c0363b36 dbad7d80 db6fee80 df26b400 5cb34f36
>        00000088 00000000 0003d700 db6ff012 c04b8920 00000106 00a23e2c c05e5000
>        00000000 c0363a90 00000001 00000000 00000000 00000001 c0a16510 00000000
> Call Trace:
>  [<c0362d90>] br_forward_finish+0x0/0x80
>  [<c0363b36>] br_handle_frame_finish+0xa6/0x160
>  [<c0363a90>] br_handle_frame_finish+0x0/0x160
>  [<c01423a5>] kmem_getpages+0x65/0x90
>  [<c013ece2>] __rmqueue+0xb2/0xf0
>  [<c032302d>] nf_iterate+0x5d/0x90
>  [<c0367aa0>] br_nf_pre_routing_finish+0x0/0x420
>  [<c0367aa0>] br_nf_pre_routing_finish+0x0/0x420
>  [<c032336e>] nf_hook_slow+0x6e/0x120
>  [<c0367aa0>] br_nf_pre_routing_finish+0x0/0x420
>  [<c0363a90>] br_handle_frame_finish+0x0/0x160
>  [<c0368549>] br_nf_pre_routing+0x319/0x4a0
>  [<c0367aa0>] br_nf_pre_routing_finish+0x0/0x420
>  [<c032302d>] nf_iterate+0x5d/0x90
>  [<c0363a90>] br_handle_frame_finish+0x0/0x160
>  [<c0363a90>] br_handle_frame_finish+0x0/0x160
>  [<c032336e>] nf_hook_slow+0x6e/0x120
>  [<c0363a90>] br_handle_frame_finish+0x0/0x160
>  [<c0363db3>] br_handle_frame+0x1c3/0x260
>  [<c0363a90>] br_handle_frame_finish+0x0/0x160
>  [<c03188d3>] netif_receive_skb+0x113/0x230
>  [<c02820bf>] tg3_rx+0x2cf/0x490
>  [<c027e246>] tg3_restart_ints+0x26/0xa0
>  [<c02823a6>] tg3_poll+0x126/0x1a0
>  [<c0121660>] ksoftirqd+0x0/0xa0
>  [<c0121660>] ksoftirqd+0x0/0xa0
>  [<c01214ff>] tasklet_action+0x5f/0xa0
>  [<c0121152>] __do_softirq+0x52/0xc0
>  [<c0121207>] do_softirq+0x47/0x60
>  [<c01216b9>] ksoftirqd+0x59/0xa0
>  [<c013079d>] kthread+0xad/0xf0
>  [<c01306f0>] kthread+0x0/0xf0
>  [<c0106855>] kernel_thread_helper+0x5/0x10
> Code: 0f 0b 44 01 38 19 3a c0 90 e9 5a fe ff ff b8 74 64 40 c0 e8 31 ac e5 ff eb 8f c7 04 24 9c 10 3b c0 e8 b3 60 e5 ff 8d 76 00 eb 92 <0f> 0b 4f 01 38 19 3a c0 e9 4c fe ff ff 0f 0b 2a 01 38 19 3a c0
> <0>Kernel panic - not syncing: Fatal exception in interrupt
> (XEN) Domain 0 shutdown: rebooting machine.
>
> NOTE: the line number in netback.c (335) might not be very
> useful for reference. I have some additional instrumentation
> in netback, so the line number might not match the files in
> xen-unstable.hg
>
> Will increasing dom0 memory further help? Or increasing the
> size of the rings?
> --
> Web/Blog/Gallery: http://floatingsun.net
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel