[Xen-users] vanilla linux and jumbo frames ? (for AOE)


I seem to have a problem getting packets >= 4096 bytes (jumbo
frames) through to a current vanilla linux kernel. This seems to
be important for AoE performance.

I've been running AOE for quite a while with XEN, but write
performance sucks, so I'm trying to get jumbo frames running...
AFAIK, the old centos linux kernels don't support jumbo frames in
the AOE initiator so I rebuilt one of my 64 bit machines as a 32
bit machine as 32 on 64 doesn't seem to work at least with the
centos 5.1 version of xen.

Anyhow, I ran into an issue. I can configure nice big MTUs ... and
I learned to do it on all the interfaces on the bridge (will move
it to another bridge later) so as to get the bridge MTU to show
the desired target MTU (it seems to be set to the lowest MTU of
any of the devices)... and with some poking around it is working,
I can see large icmp packets running back and forth...
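
A minimal sketch of how one might find which bridge port is holding the bridge MTU down, assuming the usual Linux sysfs layout (/sys/class/net/<bridge>/brif/ for the ports and /sys/class/net/<if>/mtu for each MTU). The interface names in the usage note are hypothetical and the helper names are made up for illustration.

```python
import os

def read_mtu(ifname, sysfs="/sys/class/net"):
    """Read one interface's MTU from sysfs."""
    with open(os.path.join(sysfs, ifname, "mtu")) as f:
        return int(f.read().strip())

def bridge_port_mtus(bridge, sysfs="/sys/class/net"):
    """Map each port enslaved to `bridge` to its MTU."""
    ports = os.listdir(os.path.join(sysfs, bridge, "brif"))
    return {port: read_mtu(port, sysfs) for port in ports}

def limiting_ports(mtus, wanted):
    """Ports whose MTU is below the target; any one of them caps the bridge."""
    return sorted(port for port, mtu in mtus.items() if mtu < wanted)

def effective_bridge_mtu(mtus):
    """The bridge ends up at the lowest MTU of its ports, as observed above."""
    return min(mtus.values())
```

For example, limiting_ports({"peth0": 9000, "vif1.0": 1500}, 9000) returns ["vif1.0"], which is the port that still needs its MTU raised.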

HOWEVER, I can not get ping packets larger than 4052 bytes to
work. I get an error in the domU:


   net eth0: rx->offset: 0, size: 4294967295

ping -s 4052 works
ping -s 4053 does not


Looking for the source of that message it seems to be this line
in drivers/net/xen-netfront.c ... (from the vanilla 2.6.24.3
kernel)

    if (unlikely(rx->status < 0 ||
                 rx->offset + rx->status > PAGE_SIZE)) {
            if (net_ratelimit())
                    dev_warn(dev, "rx->offset: %x, size: %u\n",
                             rx->offset, rx->status);
            xennet_move_rx_slot(np, skb, ref);
            err = -EINVAL;
            goto next;
    }

This seems to suggest that this version of netfront can't handle
a packet bigger than 4096 bytes :(
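
Worth noting about the logged value: 4294967295 is 2^32 - 1, which is what a status of -1 looks like when printed with %u. So it was probably the rx->status < 0 half of the check that fired, not the PAGE_SIZE comparison. The sketch below only redoes that conversion and mirrors the quoted condition; PAGE_SIZE = 4096 is an assumption (x86), and reading -1 as the backend's error status (NETIF_RSP_ERROR in the Xen netif interface header) is likewise an assumption, not something the log confirms.

```python
import ctypes

PAGE_SIZE = 4096  # assumed x86 page size

def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return ctypes.c_int32(value).value

def check_rejects(offset, status):
    """Mirror of the condition quoted from xen-netfront.c."""
    return status < 0 or offset + status > PAGE_SIZE

logged_size = 4294967295           # the "size" from the domU log line
status = as_signed32(logged_size)  # what rx->status would have been

print(status)                    # the signed view of the logged size
print(check_rejects(0, status))  # rejected because status is negative
print(check_rejects(0, 4095))    # a 4095 byte frame alone passes the size test
```

If that reading is right, the frame was already flagged as bad before netfront looked at its length, so the size limit may be enforced somewhere other than this check.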

ethernet overhead is 14?
   gotta be at least 12 bytes + the type field (2?)
IP overhead is: 20 or 24 bytes
ICMP overhead is: 8 bytes

so that's 42 at a minimum.
42+ 4052 = 4094   ... that's pretty darn close to 4096
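
The same arithmetic as a small sketch, assuming an untagged Ethernet header (14 bytes) and an IPv4 header without options (20 bytes). It just maps a ping -s payload size to the on-wire frame length that tcpdump reports.

```python
ETH_HEADER = 14   # dst MAC (6) + src MAC (6) + ethertype (2), no VLAN tag
IP_HEADER = 20    # IPv4 header without options
ICMP_HEADER = 8   # ICMP echo header

def frame_length(ping_payload):
    """Ethernet frame length for a given `ping -s` payload size."""
    return ETH_HEADER + IP_HEADER + ICMP_HEADER + ping_payload

for size in (4052, 4053):
    print(size, frame_length(size))
```

With these assumptions, -s 4052 gives a 4094 byte frame and -s 4053 gives 4095.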

this packet is received:

   19:16:22.935917 00:30:48:78:b2:3a > 00:16:3e:46:a3:d5, ethertype
   IPv4 (0x0800), length 4094: (tos 0x0, ttl  64, id 7342, offset 0,
   flags [none], proto: ICMP (1), length: 4080) X >
   Y: ICMP echo reply, id 54824, seq 1, length 4060

this is apparently discarded:

   19:16:43.677814 00:30:48:78:b2:3a > 00:16:3e:46:a3:d5, ethertype
   IPv4 (0x0800), length 4095: (tos 0x0, ttl  64, id 7343, offset 0,
   flags [none], proto: ICMP (1), length: 4081) X >
   Y: ICMP echo reply, id 61224, seq 1, length 4061

it gets all the way back to the domU and then that check in
xen-netfront.c seems to throw it out :(

3072 byte AOE packets are good enough to speed up read I/O, but I
think AoE desperately needs to write a page at a time, as all my
stats programs (including the standard vmstat) imply that there
is a huge amount of READ I/O happening on the target (vblade or
qaoed) when I try to WRITE out a big file. I'm thinking this is a
consequence of writing 3k to a 4k block... but it is just a
guess. I do know that AoE is supposed to have reasonable
performance when jumbo packets are working... with 3072 byte
packets my read performance is more than 4 times faster than the
write performance... but if every write() requires a read, then
that isn't surprising.
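
To illustrate the guess above (and only the guess; this is not a model of what vblade, qaoed or the target's page cache actually do), here is a rough sketch that counts how often a sequential stream of fixed-size writes covers only part of a 4096 byte block. Each such partial overlap is a place where a target might have to read the block before updating it. The function name and the 4096 byte block size are assumptions for illustration.

```python
BLOCK = 4096  # assumed block/page size on the target

def partial_block_writes(total_bytes, chunk, block=BLOCK):
    """Count (chunk, block) overlaps where a chunk covers only part of a block."""
    partial = 0
    for start in range(0, total_bytes, chunk):
        end = min(start + chunk, total_bytes)
        first = start // block
        last = (end - 1) // block
        for b in range(first, last + 1):
            b_start, b_end = b * block, (b + 1) * block
            # The chunk starts after the block begins or stops before it ends.
            if start > b_start or end < b_end:
                partial += 1
    return partial

print(partial_block_writes(12288, 3072))  # 3072 byte chunks over three blocks
print(partial_block_writes(12288, 4096))  # block-sized chunks
```

Over three blocks, the 3072 byte chunks touch a block partially six times, while 4096 byte chunks never do.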

Any thoughts folks? Sorry, this post is definitely too long.

I suppose I can try booting the centos kernel and using an
"aftermarket" aoe module.... but that is _not_ the solution I'd
like to use. I'd rather use the vanilla kernel and compile it
myself. Amongst other reasons that means I don't need modules and
can use a generic initrd.

-Tom

p.s.

(after some compilation, probing, messing around with the MTUs
on the various interfaces...) yup, using the centos kernel with an
"aftermarket" aoe module _does_ work... the V59 aoe module gives
me real feedback on loading:


   aoe: e2.2: setting 7680 byte data frames

and voila, write I/O is just writes!

   time ( dd if=/dev/zero of=/mnt/BIG2 bs=4096 count=262144; sync)
   real    0m14.079s

hhmm, that seems fishy, that's a tich faster than the native
drive speed (should be 60 MByte/s)... still 14 seconds is a ___
of a lot better than the 82 seconds I was seeing with a 1024
block size... and I suspect that the time difference is simply
that the target hasn't flushed everything to disk yet (vblade
doesn't have logic to run in O_SYNC, qaoed does, but using it
seems to cut performance back down to about 18 MByte/s).


Reads weren't bad (45 MByte/s vs 70, but writes at 12 vs 60 were
pretty sad).
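
For reference, a small sketch that turns the dd timings into throughput. It assumes the 82 second run wrote the same total amount of data (4096 * 262144 bytes, i.e. 1 GiB) and uses decimal megabytes; both are assumptions, since the earlier command line isn't shown.

```python
def throughput_mb_per_s(block_size, count, seconds):
    """Throughput in MByte/s (10^6 bytes) for a dd-style run."""
    return block_size * count / seconds / 1e6

total_bytes = 4096 * 262144
print(total_bytes)

fast = throughput_mb_per_s(4096, 262144, 14.079)  # jumbo frames, 7680 byte data
slow = throughput_mb_per_s(4096, 262144, 82)      # earlier run, same size assumed
print(round(fast, 1), round(slow, 1))
```

That works out to roughly 76 MByte/s for the 14 second run, which is indeed above the 60 MByte/s native drive speed, and roughly 13 MByte/s for the 82 second run.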

OK, so that's the centos/RH 5.1 kernel, with a custom aoe
module, wonder if I can get that into the initrd so I can boot
over AOE.

While I'm here, does anyone have the udev rules that create the
extra /dev/etherd files like rediscover, err, revalidate etc ?

centos 5.1 doesn't like the one from the 2.6.24 kernel :(


-Tom



----------------------------------------------------------------------
tbrown@xxxxxxxxxxxxx   | How often I found where I should be going
http://BareMetal.com/  | only by setting out for somewhere else.
web hosting since '95  | -- R. Buckminster Fuller

