Re: [Xen-devel] xennet: skb rides the rocket messages in domU dmesg

Hi

Are you actually using the "xen/next" branch?  I recommend you use
xen/stable-2.6.32.x, since that's tracking all the other bugfixes going
into Linux 2.6.32.

I was using xen/next since some of the features I use were not
in xen/stable at the time. I built a new xen/stable-2.6.32.x yesterday,
which does seem to work fine, so I guess I can follow that branch
now.

To keep consistency with the old recording data, and since I would like to
have all recordings in a single volume, I tried to NFS-mount the recordings
volume from the dom0 on all backends. This resulted in a very unstable
system, to the point where my most important slave backend became unusable.

Unstable how?

The mythtv backends were not able to reliably record shows on an
NFS-mounted filesystem. The ivtv driver would complain about the application
not reading fast enough. This made the backends unusable.

That appears to mean that you're getting single packets which are longer
than 18 pages (72k).  I'm not quite sure how that's possible, since
I thought the datagram limit is 64k.
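
For reference, the page arithmetic behind that remark can be sketched as below. This is only an illustration, assuming 4 KiB pages and the 18-page limit quoted above; the function name is hypothetical and this is not the netfront code itself. It shows that a buffer starting part-way into a page touches more pages than its length alone suggests.

```python
PAGE_SIZE = 4096         # assumption: 4 KiB pages, as on x86
MAX_PAGES = 18           # limit quoted above: 18 pages = 72 KiB
MAX_IP_DATAGRAM = 65535  # largest IPv4 datagram, just under 64 KiB


def pages_spanned(offset, length):
    """Pages touched by `length` bytes that start `offset` bytes into a page."""
    if length == 0:
        return 0
    start = offset % PAGE_SIZE
    # Round (start + length) up to a whole number of pages.
    return (start + length + PAGE_SIZE - 1) // PAGE_SIZE


if __name__ == "__main__":
    print("limit in bytes:", MAX_PAGES * PAGE_SIZE)
    print("aligned 64k datagram:", pages_spanned(0, MAX_IP_DATAGRAM))
    print("worst-case offset:", pages_spanned(PAGE_SIZE - 1, MAX_IP_DATAGRAM))
```

Under these assumptions a single contiguous datagram of the maximum size spans at most 17 pages, which is why a packet exceeding 18 pages is surprising.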

Are you using nfs over udp or tcp?  (I think tcp, from your stack trace.)

Does turning off tso/gso with ethtool make a difference?

OK, I tried this on the running system, and it did seem to improve
things, but I would still see some (other) messages.
After a reboot, with the new xen/stable-2.6.32.13.x based kernel
and with tso and gso switched off with ethtool, these messages are
now completely gone (the system has been up for about a day now).
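
For anyone wanting to repeat this, a minimal sketch of the offload change follows. The interface name eth0 and both helper names are assumptions for illustration; `ethtool -K <iface> tso off gso off` is the standard way to switch these offloads off. It needs root and the ethtool binary inside the domU.

```python
import subprocess


def offload_off_cmd(iface, features=("tso", "gso")):
    """Build the `ethtool -K` argument list that switches the given offloads off."""
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


def disable_offloads(iface):
    """Run the command; raises CalledProcessError if ethtool reports a failure."""
    subprocess.run(offload_off_cmd(iface), check=True)


if __name__ == "__main__":
    # eth0 is an assumed interface name; print the command rather than run it.
    print(" ".join(offload_off_cmd("eth0")))
```

The setting does not survive a reboot, so it has to be reapplied from a boot script.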

I do notice something else though (might have been there before,
but now it is the only message in domU dmesg), just after starting
nfs during boot of the domU:

BUG: unable to handle kernel paging request at 00000002dcf32198
IP: [<ffffffff811cf09a>] bitmap_scnprintf+0x5c/0xb6
PGD a777067 PUD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci-0/pci0000:08/0000:08:02.0/local_cpus
CPU 0

Modules linked in: nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss autofs4 ipv6 wm8775 tea5767 cx25840 tuner_simple sunrpc tuner_types tda9887 tda8290 tuner msp3400 saa7127 saa7115 ivtv i2c_algo_bit cx2341x v4l2_common videodev v4l1_compat xen_fbfront v4l2_compat_ioctl32 fb_sys_fops tveeprom sysimgblt joydev i2c_core sysfillrect xen_kbdfront syscopyarea xen_netfront raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear

Pid: 3468, comm: irqbalance Not tainted 2.6.32.13m7.1 #1

RIP: e030:[<ffffffff811cf09a>] [<ffffffff811cf09a>] bitmap_scnprintf+0x5c/0xb6

RSP: e02b:ffff88001cbd9e18  EFLAGS: 00010246
RAX: ffffffff81527f2b RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000ffe RDI: 0000000000000000
RBP: ffff88001cbd9e48 R08: 0000000000000010 R09: 0000000000000001
R10: 0000000000000357 R11: dead000000200200 R12: 0000000000000000
R13: 0000000000000ffe R14: 00000002dcf32198 R15: ffff880002bbd000
FS:  00007fc142b6d720(0000) GS:ffff8800046e0000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000002dcf32198 CR3: 000000001ca58000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Process irqbalance (pid: 3468, threadinfo ffff88001cbd8000, task ffff88001ded2920)

Stack:
 0000000000000200 ffff880002bbd000 ffff88001cbd9f58 ffff880002eeb858
<0> ffff88001ce8ed10 ffffffff81616230 ffff88001cbd9e68 ffffffff811dd333
<0> ffff880002eeb878 ffffffff81606368 ffff88001cbd9e98 ffffffff81273574
Call Trace:
 [<ffffffff811dd333>] local_cpus_show+0x44/0x57
 [<ffffffff81273574>] dev_attr_show+0x22/0x49
 [<ffffffff810a4e8e>] ? __get_free_pages+0x9/0x46
 [<ffffffff8112fbc2>] sysfs_read_file+0xb4/0x139
 [<ffffffff810da927>] vfs_read+0xa6/0x103
 [<ffffffff810daa3a>] sys_read+0x45/0x69
 [<ffffffff81011b02>] system_call_fastpath+0x16/0x1b

Code: e0 48 c7 c0 2b 7f 52 81 41 83 ec 20 31 db eb 60 44 89 e2 44 89 e1 48 63 fb 83 e1 3f c1 fa 06 41 b9 01 00 00 00 48 63 d2 44 89 ee <49> 8b 14 d6 29 de 48 d3 ea 49 8d 3c 3f 44 88 c1 41 83 ec 20 49

RIP  [<ffffffff811cf09a>] bitmap_scnprintf+0x5c/0xb6
 RSP <ffff88001cbd9e18>
CR2: 00000002dcf32198
---[ end trace 5f520ed1e48e5394 ]---
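
As a cross-check on the trace above: by my reading, the instruction bytes at the marked position in the Code: line (49 8b 14 d6) decode to mov (%r14,%rdx,8),%rdx. The sketch below recomputes that effective address from the dumped registers and compares it with CR2. The register values are copied from the oops; the helper name is hypothetical and the decoding is my assumption, not something confirmed in this thread.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def effective_address(base, index, scale):
    """Address of a base + index*scale memory operand, wrapped to 64 bits."""
    return (base + index * scale) & MASK64


# Values copied from the register dump in the oops.
R14 = 0x00000002DCF32198
RDX = 0x0000000000000000
CR2 = 0x00000002DCF32198

fault_addr = effective_address(R14, RDX, 8)
matches_cr2 = fault_addr == CR2

if __name__ == "__main__":
    print(hex(fault_addr), "matches CR2:", matches_cr2)
```

If the decoding is right, the match suggests the pointer held in R14 was already invalid when bitmap_scnprintf was called from local_cpus_show, rather than an index running off the end of a valid bitmap.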

During boot of dom0 I see the following when it is starting my domU (seems to be more of a warning):

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
Pid: 5861, comm: qemu-dm Not tainted 2.6.32.13m7.1 #1
Call Trace:
 [<ffffffff8106a625>] __lock_acquire+0x431/0x459
 [<ffffffff810b029d>] ? vma_prio_tree_remove+0x27/0xda
 [<ffffffff8106a6b1>] lock_acquire+0x64/0x81
 [<ffffffff810b939d>] ? mm_take_all_locks+0xe5/0x11c
 [<ffffffff813cdb70>] _spin_lock_nest_lock+0x31/0x66
 [<ffffffff810b939d>] ? mm_take_all_locks+0xe5/0x11c
 [<ffffffff813ccc0e>] ? mutex_lock_nested+0x34/0x39
 [<ffffffff810b939d>] mm_take_all_locks+0xe5/0x11c
 [<ffffffff810cbcbc>] ? do_mmu_notifier_register+0x56/0x113
 [<ffffffff810cbcc4>] do_mmu_notifier_register+0x5e/0x113
 [<ffffffff810cbd94>] mmu_notifier_register+0xe/0x10
 [<ffffffff8123acdb>] gntdev_open+0x8f/0xcc
 [<ffffffff81257dc2>] misc_open+0x188/0x21e
 [<ffffffff810dd1f6>] chrdev_open+0x164/0x185
 [<ffffffff810dd092>] ? chrdev_open+0x0/0x185
 [<ffffffff810d8bd5>] __dentry_open+0x149/0x27f
 [<ffffffff810d8dd1>] nameidata_to_filp+0x3d/0x4e
 [<ffffffff810e59ed>] do_filp_open+0x4ee/0x9e9
 [<ffffffff8100e871>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100eff2>] ? check_events+0x12/0x20
 [<ffffffff811d0637>] ? _raw_spin_unlock+0x8f/0x98
 [<ffffffff813cdb3a>] ? _spin_unlock+0x26/0x2b
 [<ffffffff810eedf2>] ? alloc_fd+0x111/0x123
 [<ffffffff810d89a3>] do_sys_open+0x5e/0x10a
 [<ffffffff810d8a78>] sys_open+0x1b/0x1d
 [<ffffffff81011b02>] system_call_fastpath+0x16/0x1b

Probably not related: I see the following message in my dom0 from time to time, and if it appears at the 'wrong' moment, it makes my system completely unusable as soon as a process needs disk access.


ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata4.00: BMDMA stat 0x64
ata4.00: failed command: READ DMA
ata4.00: cmd c8/00:08:99:13:5c/00:00:00:00:00/ef tag 0 dma 4096 in
         res 51/40:00:a0:13:5c/00:00:00:00:00/ef Emask 0x9 (media error)
ata4.00: status: { DRDY ERR }
ata4.00: error: { UNC }
ata4.00: configured for UDMA/133
ata4.01: configured for UDMA/133
ata4: EH complete

Not sure if this is related though; it could be just a bad disk (it seems to always involve the same disk). I'm going to replace the disk and see if that makes a difference.
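
To see which sector the drive is complaining about, the cmd and res lines above can be decoded. The sketch below assumes the usual libata field order (command or status, then feature or error:nsect:lbal:lbam:lbah, then the high-order bytes, then the device register) and 28-bit LBA addressing, which is what READ DMA (0xc8) uses. The function name is hypothetical.

```python
def lba28(taskfile):
    """Extract the 28-bit LBA from a libata cmd/res taskfile string."""
    _first, current, _hob, device = taskfile.split("/")
    _feat_or_err, _nsect, lbal, lbam, lbah = (int(b, 16) for b in current.split(":"))
    # The low nibble of the device register carries LBA bits 27:24.
    return ((int(device, 16) & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal


# Strings copied from the ata4.00 error report.
CMD = "c8/00:08:99:13:5c/00:00:00:00:00/ef"
RES = "51/40:00:a0:13:5c/00:00:00:00:00/ef"

start_lba = lba28(CMD)
failed_lba = lba28(RES)

if __name__ == "__main__":
    print("request starts at LBA", hex(start_lba))
    print("drive reports LBA", hex(failed_lba),
          "(sector", failed_lba - start_lba, "of the request)")
```

Under those assumptions the reported LBA falls inside the 8-sector read that was requested, which together with the UNC error is consistent with a bad sector on that disk.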



Regards,
Mark


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
