In case anyone has the same setup and is interested, the conclusions we came to were:
Disabling Delayed Ack:
We're leaving it enabled; disabling it didn't seem to make a significant change to performance. I'd read that a reboot was needed after disabling it, so we did that. VMware KB 1002598 and http://www.dell.com/support/troubleshooting/us/en/19/KCS/KcsArticles/ArticleView?c=us&l=en&s=dhs&docid=599631 say it may need disabling if there's network congestion, but we didn't notice any problems when running perf tests on both our hosts simultaneously, which is probably about as busy as they'll get.
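For reference, if anyone does want to try disabling it themselves, I believe it can be done per adapter from the ESXi shell followed by a reboot (vmhba33 is just an example name, check yours with the list command first):

# find the software iSCSI adapter name
esxcli iscsi adapter list
# show the current adapter-level parameters, DelayedAck should be in the list
esxcli iscsi adapter param get --adapter=vmhba33
# turn Delayed Ack off for that adapter, then reboot the host
esxcli iscsi adapter param set --adapter=vmhba33 --key=DelayedAck --value=false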
IOs per path:
We tested limits of 1, 3, 7, 8, 9, 20 and 100 IOs, and 8800 bytes. They were all similar for random reads and writes with small IO sizes and small numbers of outstanding IOs; we think that's because the bottleneck there was the disk performance. With sequential IO, and random with large IO sizes and lots of outstanding IOs, the max speed varied from about 130 MB/s to 229 MB/s for reads, and about 3/4 of that for writes (I don't have the exact numbers in front of me). The fastest were 8 IOs and 8800 bytes, with nothing to choose between them, so we went for 8. The disk usage of our VMs will be varied but mostly random, so I'm not sure how much difference it will really make, and there's also depping's point above that there might not be any difference in real usage anyway.
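In case it helps anyone repeating the tests, the limit is set per LUN from the ESXi shell, something like this (naa.xxx is a placeholder for the device ID, which you can get from esxcli storage nmp device list):

# switch paths every 8 IOs instead of the default 1000
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxx --type=iops --iops=8
# the bytes-based variant we compared against (roughly one jumbo frame per path switch)
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxx --type=bytes --bytes=8800
# check what a device is currently set to
esxcli storage nmp psp roundrobin deviceconfig get --device=naa.xxx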
Large Receive Offload:
We didn't bother testing it; it wasn't widely mentioned on the forums, and we couldn't see how 2 x 1 Gb links were going to get much faster than we'd already got.
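If anyone does want to poke at it, I think the host-side control is the /Net/TcpipDefLROEnabled advanced setting, but I haven't verified this on our hosts:

# show the current value (1 = enabled)
esxcli system settings advanced list -o /Net/TcpipDefLROEnabled
# disable LRO for the vmkernel TCP/IP stack
esxcli system settings advanced set -o /Net/TcpipDefLROEnabled -i 0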
So the changes/decisions we made were:
- used the max number of disks possible for the storage group
- enabled jumbo frames: set the MTU on both the port group and the vSwitch, don't do it on just the port group like I did and wonder why performance is identical! (example commands after this list)
- didn't disable Delayed Ack or Large Receive Offload
- set the default PSP for VMW_SATP_ALUA to VMW_PSP_RR and made sure the LUNs were using round robin
- set the IOs per path to 8
- didn't use iSCSI port binding, which the Dell doc above incorrectly recommends
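For reference, the ESXi shell equivalents of the jumbo frame and path policy changes look roughly like this (vSwitch1, vmk1 and naa.xxx are placeholders, substitute your own vSwitch, iSCSI vmkernel port and device IDs):

# jumbo frames: the MTU has to go on the vSwitch AND the vmkernel port, not just one of them
esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
esxcli network ip interface set --interface-name=vmk1 --mtu=9000
# make round robin the default PSP for VMW_SATP_ALUA (only affects LUNs claimed after this)
esxcli storage nmp satp set --satp=VMW_SATP_ALUA --default-psp=VMW_PSP_RR
# set it explicitly on an existing LUN, and list devices to check what they're using
esxcli storage nmp device set --device=naa.xxx --psp=VMW_PSP_RR
esxcli storage nmp device list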