xen-devel

Re: [Xen-devel] xennet: skb rides the rocket messages in domU dmesg

To: Mark Hurenkamp <mark.hurenkamp@xxxxxxxxx>
Subject: Re: [Xen-devel] xennet: skb rides the rocket messages in domU dmesg
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Tue, 01 Jun 2010 09:42:16 -0700
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4C018A9A.7060806@xxxxxxxxx>
References: <4BFD90D6.2020107@xxxxxxxxx> <4BFDA304.3060803@xxxxxxxx> <4C018A9A.7060806@xxxxxxxxx>

On 05/29/2010 02:43 PM, Mark Hurenkamp wrote:
>> That appears to mean that you're getting single packets which are larger
>> than 18 pages long (72k).  I'm not quite sure how that's possible, since
>> I thought the datagram limit is 64k.
>>
>> Are you using nfs over udp or tcp?  (I think tcp, from your stack
>> trace.)
>>
>> Does turning off tso/gso with ethtool make a difference?
>>    
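
The page arithmetic behind the 18-page figure can be sketched as follows. This is only an illustration, assuming 4 KiB pages and a 64 KiB maximum packet; the function name pages_spanned and both constants are made up for this example and are not taken from the netfront driver.

```python
PAGE_SIZE = 4096          # assumption: 4 KiB pages
MAX_PACKET = 64 * 1024    # assumption: the 64k limit mentioned above


def pages_spanned(offset, length, page_size=PAGE_SIZE):
    """Count the pages touched by `length` bytes starting at byte `offset`."""
    if length == 0:
        return 0
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return last - first + 1


if __name__ == "__main__":
    # 18 pages of 4 KiB each is 72 KiB, which is more than 64 KiB.
    print(18 * PAGE_SIZE // 1024, "KiB in 18 pages")
    # A 64 KiB buffer that starts on a page boundary.
    print(pages_spanned(0, MAX_PACKET), "pages, aligned")
    # The same buffer starting at the last byte of a page.
    print(pages_spanned(PAGE_SIZE - 1, MAX_PACKET), "pages, misaligned")
```

Under these assumptions even a badly aligned 64 KiB buffer stays below 18 pages, which is why a packet longer than 18 pages is surprising.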
> Ok, I tried this on the running system, and it did seem to improve
> things, but I'd still see some (other) messages.
> After a reboot, with the new xen/stable-2.6.32.13.x based kernel
> and switching tso and gso off with ethtool, these messages are
> now completely gone (the system has been up for about a day now).

Hm.  I don't think disabling them should be necessary, but the only
downside in doing so is slightly higher per-packet processing cost.
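
For anyone wanting to repeat the test, here is a rough sketch of the ethtool invocation being discussed. It assumes the domU interface is called eth0 (a guess, not something stated in this thread); the helper names offload_cmd and apply_offload are invented for the example. The -K option with the tso and gso keywords is the standard ethtool way to change offload settings.

```python
import subprocess


def offload_cmd(iface, tso=False, gso=False):
    """Build the ethtool argv that sets TSO and GSO on `iface`."""
    def state(enabled):
        return "on" if enabled else "off"
    return ["ethtool", "-K", iface, "tso", state(tso), "gso", state(gso)]


def apply_offload(iface, tso=False, gso=False):
    """Run the command. Needs root and the ethtool binary on PATH."""
    subprocess.run(offload_cmd(iface, tso, gso), check=True)


if __name__ == "__main__":
    # Only print the command here so nothing is changed by accident.
    # "eth0" is an assumed interface name.
    print(" ".join(offload_cmd("eth0")))
```

Settings made this way do not normally survive a reboot, so they have to be reapplied from a boot script if they turn out to be needed.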

>
> I do notice something else though (might have been there before,
> but now it is the only message in domU dmesg), just after starting
> nfs during boot of the domU:
>
> BUG: unable to handle kernel paging request at 00000002dcf32198
> IP: [<ffffffff811cf09a>] bitmap_scnprintf+0x5c/0xb6
> PGD a777067 PUD 0
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/devices/pci-0/pci0000:08/0000:08:02.0/local_cpus

What device is 0000:08:02.0?
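
One way to answer that is to split the address into its domain, bus, device and function parts and look at the matching sysfs directory. A minimal sketch, with the helper names parse_pci_address and sysfs_path invented for the example; /sys/bus/pci/devices/ is the usual sysfs location, and `lspci -s 08:02.0` should report the same device.

```python
import re

# domain:bus:device.function, all hexadecimal
_BDF = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_pci_address(addr):
    """Split a PCI address such as '0000:08:02.0' into its numeric parts."""
    match = _BDF.match(addr)
    if match is None:
        raise ValueError("not a PCI address: %r" % addr)
    domain, bus, device, function = (int(part, 16) for part in match.groups())
    if device > 0x1F:
        raise ValueError("PCI device number out of range: %r" % addr)
    return {"domain": domain, "bus": bus, "device": device, "function": function}


def sysfs_path(addr):
    """Directory holding the vendor, device and class files for `addr`."""
    parse_pci_address(addr)  # validate before building the path
    return "/sys/bus/pci/devices/" + addr


if __name__ == "__main__":
    print(parse_pci_address("0000:08:02.0"))
    print(sysfs_path("0000:08:02.0"))
```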

> CPU 0
> Modules linked in: nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss
> autofs4 ipv6 wm8775 tea5767 cx25840 tuner_simple sunrpc tuner_types
> tda9887 tda8290 tuner msp3400 saa7127 saa7115 ivtv i2c_algo_bit
> cx2341x v4l2_common videodev v4l1_compat xen_fbfront
> v4l2_compat_ioctl32 fb_sys_fops tveeprom sysimgblt joydev i2c_core
> sysfillrect xen_kbdfront syscopyarea xen_netfront raid10 raid456
> async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy
> async_tx raid1 raid0 multipath linear
> Pid: 3468, comm: irqbalance Not tainted 2.6.32.13m7.1 #1
> RIP: e030:[<ffffffff811cf09a>]  [<ffffffff811cf09a>]
> bitmap_scnprintf+0x5c/0xb6
> RSP: e02b:ffff88001cbd9e18  EFLAGS: 00010246
> RAX: ffffffff81527f2b RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000ffe RDI: 0000000000000000
> RBP: ffff88001cbd9e48 R08: 0000000000000010 R09: 0000000000000001
> R10: 0000000000000357 R11: dead000000200200 R12: 0000000000000000
> R13: 0000000000000ffe R14: 00000002dcf32198 R15: ffff880002bbd000
> FS:  00007fc142b6d720(0000) GS:ffff8800046e0000(0000)
> knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000002dcf32198 CR3: 000000001ca58000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process irqbalance (pid: 3468, threadinfo ffff88001cbd8000, task
> ffff88001ded2920)
> Stack:
>  0000000000000200 ffff880002bbd000 ffff88001cbd9f58 ffff880002eeb858
> <0> ffff88001ce8ed10 ffffffff81616230 ffff88001cbd9e68 ffffffff811dd333
> <0> ffff880002eeb878 ffffffff81606368 ffff88001cbd9e98 ffffffff81273574
> Call Trace:
>  [<ffffffff811dd333>] local_cpus_show+0x44/0x57
>  [<ffffffff81273574>] dev_attr_show+0x22/0x49
>  [<ffffffff810a4e8e>] ? __get_free_pages+0x9/0x46
>  [<ffffffff8112fbc2>] sysfs_read_file+0xb4/0x139
>  [<ffffffff810da927>] vfs_read+0xa6/0x103
>  [<ffffffff810daa3a>] sys_read+0x45/0x69
>  [<ffffffff81011b02>] system_call_fastpath+0x16/0x1b
> Code: e0 48 c7 c0 2b 7f 52 81 41 83 ec 20 31 db eb 60 44 89 e2 44 89
> e1 48 63 fb 83 e1 3f c1 fa 06 41 b9 01 00 00 00 48 63 d2 44 89 ee <49>
> 8b 14 d6 29 de 48 d3 ea 49 8d 3c 3f 44 88 c1 41 83 ec 20 49
> RIP  [<ffffffff811cf09a>] bitmap_scnprintf+0x5c/0xb6
>  RSP <ffff88001cbd9e18>
> CR2: 00000002dcf32198
> ---[ end trace 5f520ed1e48e5394 ]---
>
>
> During boot of dom0 I see the following when it is starting my domU
> (seems to be more of a warning):
> BUG: MAX_LOCK_DEPTH too low!
> turning off the locking correctness validator.

Interesting.  That looks like a bug in the core kernel's mmu notifier
machinery that we're using, but the only side-effect is that it will
disable lockdep checking.

> Pid: 5861, comm: qemu-dm Not tainted 2.6.32.13m7.1 #1
> Call Trace:
>  [<ffffffff8106a625>] __lock_acquire+0x431/0x459
>  [<ffffffff810b029d>] ? vma_prio_tree_remove+0x27/0xda
>  [<ffffffff8106a6b1>] lock_acquire+0x64/0x81
>  [<ffffffff810b939d>] ? mm_take_all_locks+0xe5/0x11c
>  [<ffffffff813cdb70>] _spin_lock_nest_lock+0x31/0x66
>  [<ffffffff810b939d>] ? mm_take_all_locks+0xe5/0x11c
>  [<ffffffff813ccc0e>] ? mutex_lock_nested+0x34/0x39
>  [<ffffffff810b939d>] mm_take_all_locks+0xe5/0x11c
>  [<ffffffff810cbcbc>] ? do_mmu_notifier_register+0x56/0x113
>  [<ffffffff810cbcc4>] do_mmu_notifier_register+0x5e/0x113
>  [<ffffffff810cbd94>] mmu_notifier_register+0xe/0x10
>  [<ffffffff8123acdb>] gntdev_open+0x8f/0xcc
>  [<ffffffff81257dc2>] misc_open+0x188/0x21e
>  [<ffffffff810dd1f6>] chrdev_open+0x164/0x185
>  [<ffffffff810dd092>] ? chrdev_open+0x0/0x185
>  [<ffffffff810d8bd5>] __dentry_open+0x149/0x27f
>  [<ffffffff810d8dd1>] nameidata_to_filp+0x3d/0x4e
>  [<ffffffff810e59ed>] do_filp_open+0x4ee/0x9e9
>  [<ffffffff8100e871>] ? xen_force_evtchn_callback+0xd/0xf
>  [<ffffffff8100eff2>] ? check_events+0x12/0x20
>  [<ffffffff811d0637>] ? _raw_spin_unlock+0x8f/0x98
>  [<ffffffff813cdb3a>] ? _spin_unlock+0x26/0x2b
>  [<ffffffff810eedf2>] ? alloc_fd+0x111/0x123
>  [<ffffffff810d89a3>] do_sys_open+0x5e/0x10a
>  [<ffffffff810d8a78>] sys_open+0x1b/0x1d
>  [<ffffffff81011b02>] system_call_fastpath+0x16/0x1b
>
>
> Probably not related, but I see the following message in my dom0 from time
> to time, and if it appears at the 'wrong' moment, it causes my system
> to become completely unusable as soon as a process needs disk access.
>
> ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> ata4.00: BMDMA stat 0x64
> ata4.00: failed command: READ DMA
> ata4.00: cmd c8/00:08:99:13:5c/00:00:00:00:00/ef tag 0 dma 4096 in
>          res 51/40:00:a0:13:5c/00:00:00:00:00/ef Emask 0x9 (media error)
> ata4.00: status: { DRDY ERR }
> ata4.00: error: { UNC }
> ata4.00: configured for UDMA/133
> ata4.01: configured for UDMA/133
> ata4: EH complete
>
> Not sure if this is related though; it could just be a bad disk (it
> always seems to involve the same disk). I'm going to replace the
> disk and see if that makes a difference.

That looks like a real disk error - it's getting uncorrectable read errors.
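
To show where that reading comes from, here is a small sketch that decodes the first two bytes of the "res" line (status 0x51, error 0x40). The bit values follow the ATA status and error register layout; the tables list only the flags I believe libata prints, and the names parse_res and decode are invented for the example.

```python
# ATA status register bits (subset)
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x08, "DRQ"),
    (0x01, "ERR"),
]

# ATA error register bits (subset)
ERROR_BITS = [
    (0x80, "ICRC"),
    (0x40, "UNC"),   # uncorrectable data error
    (0x10, "IDNF"),
    (0x04, "ABRT"),
]


def decode(value, table):
    """Return the names of all flags in `table` that are set in `value`."""
    return [name for bit, name in table if value & bit]


def parse_res(res):
    """Extract (status, error) from a string such as '51/40:00:a0:...'."""
    head = res.split(":", 1)[0]
    status, error = head.split("/")
    return int(status, 16), int(error, 16)


if __name__ == "__main__":
    status, error = parse_res("51/40:00:a0:13:5c/00:00:00:00:00/ef")
    print("status:", decode(status, STATUS_BITS))
    print("error: ", decode(error, ERROR_BITS))
```

With the values from the log this yields DRDY and ERR for the status and UNC for the error, matching the "status" and "error" lines printed by the kernel.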

    J
