Let's see... the SAN has two controllers with a 4GB cache in each controller.  Each controller has a single 4 x 2Gb FC controller.  Two of those ports go to the switch; the other two create redundant loops with the disk array (going from the controller to one disk array, then to the next disk array, then to the second controller).  The disks are FCATA; there are 30 active disks (plus 2 hot-spares).  The SAN applies RAID across the disks on a per-volume basis, and my e-mail volume is using a RAID10 configuration.
I've done most of the filesystem tuning I can without completely rebuilding the filesystem - atime is turned off.  I've also adjusted the elevator per previous suggestions and played with some of the elevator tuning parameters.  I haven't gotten around to trying something other than XFS yet - it's going to take a while to sync everything over from the existing FS to ext3 or something similar.  I'm also contacting the SAN vendor to get their help with the situation.

On 2009/08/27 at 08:15, John Madden <jmadden@xxxxxxxxxxx> wrote:

> I'm not really sure that bandwidth is an issue - perhaps latency more
> than that.  I don't think the amount of data is what's causing the
> problem; rather the number of transactions that the e-mail system is
> trying to do on the volume.  The file sizes are actually pretty small
> - 1 to 4 KB on average, so I think it's the large number of these
> files that it has to try to read rather than streaming a large amount
> of data.  Both the SAN and the iostat output on both dom0 and domU
> indicate somewhere between 5000 and 20000 kB/s read rates - that's
> somewhere around 40Mb/s to 160Mb/s, which is well within the
> capability of the FC connection.  The SAN is indicating I/O operations
> between 500 and 1500 I/O requests per second, which I assume is what's
> causing the problem.
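
For what it's worth, the unit conversion above checks out, and dividing
throughput by request rate gives a rough idea of how small each request is.
A quick sketch; pairing the low figures together and the high figures
together is my assumption, as is treating a single 2Gb FC port as a nominal
2000 Mb/s:

```python
# Figures taken from the quoted message. Pairing low with low and high
# with high is an assumption, not something the SAN reported.
KB_PER_S = (5000, 20000)   # read throughput in kB/s
IOPS = (500, 1500)         # I/O requests per second
FC_LINK_MBIT = 2000        # one 2Gb FC port, nominal rate (assumption)

def kbytes_to_mbits(kb_per_s):
    """Convert kilobytes per second to megabits per second."""
    return kb_per_s * 8 / 1000

def avg_request_kb(kb_per_s, iops):
    """Average kilobytes transferred per I/O request."""
    return kb_per_s / iops

for kb, io in zip(KB_PER_S, IOPS):
    mbit = kbytes_to_mbits(kb)
    print(f"{kb} kB/s = {mbit:.0f} Mb/s "
          f"({mbit / FC_LINK_MBIT:.0%} of one link), "
          f"~{avg_request_kb(kb, io):.1f} kB per request at {io} iops")
```

So the link is nowhere near saturated, and the requests are small, which
fits the idea that the request rate matters more than the bandwidth.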

What's the backend inside the SAN look like?  Look into amount of cache,
number of spindles, RAID used, what else is using those spindles, etc.

500-1500 iops isn't a lot for a "SAN" in general, but given that your FC
disks are going to get around 200 worst-case iops, you'd still need
quite a few of them to push 1500 continuously (with your cache picking
up some of the spikes).  And that depends on workload (read/write,
random or not, block size) and RAID type.
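
To put rough numbers on "quite a few", here is a back-of-the-envelope
sketch.  It uses the 200 iops per disk figure from above and the usual
RAID10 rule of thumb that each write costs two disk operations (one per
mirror side).  The read/write mixes are made-up examples, and cache is
ignored entirely, so treat the output as a ballpark only:

```python
import math

def backend_iops(frontend_iops, read_fraction, write_penalty):
    """Disk-level operations generated by a given front-end request rate."""
    reads = frontend_iops * read_fraction
    writes = frontend_iops * (1 - read_fraction)
    return reads + writes * write_penalty

def spindles_needed(frontend_iops, read_fraction,
                    per_disk_iops=200, write_penalty=2):
    """Disks required to sustain the load with no help from cache.

    per_disk_iops=200 is the worst-case figure mentioned above;
    write_penalty=2 is the RAID10 rule of thumb.
    """
    total = backend_iops(frontend_iops, read_fraction, write_penalty)
    return math.ceil(total / per_disk_iops)

# Hypothetical read fractions; the real mix has to come from the SAN stats.
for read_fraction in (1.0, 0.7, 0.5):
    disks = spindles_needed(1500, read_fraction)
    print(f"{read_fraction:.0%} reads: {disks} disks for 1500 iops")
```

Anything else sharing those spindles eats into the same budget.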

In case you haven't already, I'd look into the usual filesystem
performance guides and do things like turning off atime and that lot.
My feeling on this is that you're going to need to drive down those iops.

What were your results on trying something other than xfs?


