I've also posted this on my blog, VirtualLifestyle.nl.
I've been having a lot of latency issues lately with two Dell PowerEdge R310s. These 1U boxes have a low-end controller, a PERC H200. I've been seeing latency spikes in the range of 500-600 ms, which is high enough to make the Linux VMs remount their filesystems read-only, over and over. This happens whenever any of the VMs does even moderate I/O (say, 25+ IOPS): the controller locks up and takes multiple other VMs down with it. It also happens during any operation on the controller itself, like formatting a disk with VMFS, creating a snapshot, consolidating a disk or removing a snapshot.
As you can imagine, performance was abysmal, and something needed to be done. I monitored the guests to see if something inside the guest OS caused the controller to skid out of control (I even down- and upgraded Linux kernels and changed guest filesystems). I also tried different mpt2sas driver versions, different advanced storage settings and many other things, but in the end, nothing really helped.
Until I spotted this post on the Dell Community by a user called 'damirc':
Well, found my solution.
It seems that the LSI SMI-S provider (the health provider for the vSphere console) is not too comfortable with Dell PERC H200 (or LSI 9211-8i) and seriously slows down disk i/o.
Worth a try, right? I removed the VIB ('lsiprovider') and rebooted the host. And hey presto, I could easily push the SSD and H200 controller north of 4,000 IOPS with sub-10 ms latency without any issue, which is pretty good in my view, and certainly a substantial improvement over the latency spikes and horribly low IOPS before. After a couple of hours of testing and monitoring, the previously mentioned issues seem to have completely disappeared now that the SMI-S provider is gone.
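For anyone who wants to try the same thing, here is a rough sketch of the removal from the ESXi shell. It assumes the VIB really is named 'lsiprovider' on your host (check the list output first), and the `run` and `remove_provider` helpers and the `DRY_RUN` switch are my own illustration, not anything from LSI or Dell. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: remove the LSI SMI-S provider VIB from an ESXi host.
# Assumption: the VIB is named "lsiprovider"; confirm it first with
#   esxcli software vib list | grep -i lsi
VIB_NAME="${VIB_NAME:-lsiprovider}"

# Hypothetical helper: with DRY_RUN=1 (the default) a command is only
# printed; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: list the installed VIBs, then remove the named one.
remove_provider() {
    run esxcli software vib list
    run esxcli software vib remove --vibname="$1"
    echo "Reboot the host afterwards (that is what I did)."
}

remove_provider "$VIB_NAME"
```

Run it as-is to see what it would do, then run it again with `DRY_RUN=0` once you are happy with the VIB name. Evacuate or shut down the VMs on the host before the reboot.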
Now, I'm very curious whether others have had similar experiences with the LSI SMI-S provider in conjunction with a Dell PERC H200 or H310. I can't find any confirmed cases, only some unconfirmed ones: 'HP EVA SMI-S Provider Collection Latency Issue' and 'Dell R210 II alternative SATA/SAS RAID controllers?'.
I've filed a support request with LSI (P00099195) to find out whether it's a confirmed bug. Does this apply to a specific VIB version, a specific controller (OEM version, firmware version), or anything else?
I will keep monitoring the issue and doing some more testing on a spare host that still has the issue to see if I can narrow it down. I'll post an update here if appropriate.