Hi all,
I'd like to get input from forum members on optimizing NFS performance for vSphere 5.1. Performance has been good, but I haven't found much shared knowledge around Oracle Sun ZFS Storage Appliances and vSphere 5.1, so I'd like to share what I've tried and hear what others are doing. Here's what I've got so far:
Networking
- Brocade FastIron SX
- Separate VLAN for NFS traffic
- Jumbo Frames
- Flow Control (verification commands for both are after this list)
- STP disabled
- LLDP enabled (I wish the vNetwork Standard Switch supported this the way the vNetwork Distributed Switch does)
- CDP enabled and set to "Both", though the neighbor info never seems to show up on the vSphere side; I use it on the physical switch to confirm the vSS configuration of the ESXi hosts (a quick check of the vSS CDP setting is also shown after this list)
- 10 GbE ports for NFS, 1 GbE for storage appliance management
- vCenter Server 5.1b (947673), ESXi 5.1 (914609), vCloud Director 5.1.1.868405
- vSphere Enterprise Plus licensing
- vNetwork Distributed Switch 5.1 with 2 x 1 GbE uplinks per ESXi host, configured for LACP (Passive on the vDS, Active on the physical switch)
- Storage vDS is dedicated only for vmkernel traffic (VM networks are on separate vDS)
- Each ESXi 5.1 host is configured to use the Storage vDS for vmkernel
- All virtual machines and templates use 1 MB partition alignment offset
- Windows virtual machines have 4k NTFS clusters
- Mixed workloads - a few virtual machines run Microsoft SQL Server (64 KB extents, i.e. 8 x 8k pages), but the databases are not heavily utilized (more of a dev environment). Most virtual machines are J2EE app servers and client workstations used for testing.
- Storage I/O Control enabled on each NFS-based Datastore
- Direct I/O and Network I/O Control are not possible with my ESXi hosts' physical adapters
- Advanced Settings for each ESXi host: I don't have more than 8 NFS volumes per host, so I left NFS.MaxVolumes, Net.TcpipHeapSize, and Net.TcpipHeapMax at their defaults (the commands I'd use to check or change them are after this list)
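A few quick checks from the ESXi shell for the jumbo frame and flow control items above. This is just a sketch of how I verify things; vmnic4 and 192.168.50.10 are placeholders for your own physical uplink and appliance data IP.
  # Confirm the NFS vmkernel interface is at MTU 9000
  esxcli network ip interface list
  # End-to-end jumbo frame test with don't-fragment set
  # (8972 bytes of payload + 28 bytes of ICMP/IP headers = 9000)
  vmkping -d -s 8972 192.168.50.10
  # Show pause frame (flow control) settings on the uplink
  ethtool -a vmnic4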
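Since the CDP neighbor info doesn't show up in the client for me, the vSS listing is how I confirm the CDP setting itself on each host:
  # "CDP Status" shows down/listen/advertise/both per vSwitch
  esxcli network vswitch standard list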
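And for the NFS-related advanced settings, this is how I'd check or raise them if I ever mounted more datastores. The values shown (32 / 32 / 128) are the commonly suggested numbers for larger NFS datastore counts on 5.x, not something I've actually applied:
  # Current values
  esxcli system settings advanced list -o /NFS/MaxVolumes
  esxcli system settings advanced list -o /Net/TcpipHeapSize
  esxcli system settings advanced list -o /Net/TcpipHeapMax
  # Example of raising them (a reboot is required for the heap settings to take effect)
  esxcli system settings advanced set -o /NFS/MaxVolumes -i 32
  esxcli system settings advanced set -o /Net/TcpipHeapSize -i 32
  esxcli system settings advanced set -o /Net/TcpipHeapMax -i 128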
Storage
- Oracle Sun ZFS Storage Appliance 7320
- Two head units, clustered
- Each head unit has 2 x 10 GbE ports, not aggregated with LACP, since each head keeps one port active and the other reserved for failover
- Management of each head unit is through a 1 GbE port on the unit
- Two disk shelves, each with 2 x SSD log devices; one shelf has 20 x 600 GB 15K SAS drives and the other has 20 x 3 TB SAS drives
- One pool for each disk shelf
- Each pool uses the Mirrored data profile
- Each head unit is active for one pool (active/active)
- Update access time on read: unchecked
- Non-blocking mandatory locking: unchecked
- Data deduplication: unchecked
- Data compression: LZJB (fastest)
- Cache device usage: All data and meta-data
- Synchronous write bias: Latency
- Database record size: 128k
- NFSv3 is the only protocol in use (no other traffic to appliances other than management)
- Each NFS share is accessed via a different IP address to promote load sharing on the vSphere side (mount example after this list)
- If I were to use iSCSI in the future, I would probably go with an 8k volume block size
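For reference, this is roughly how I mount each share against its own appliance data IP from the ESXi shell. The IPs, export paths, and datastore names below are made-up placeholders; on the appliances I've used, filesystem shares are exported under /export/<share> by default.
  # One appliance data IP per datastore to spread the load
  esxcli storage nfs add -H 192.168.50.11 -s /export/vmds01 -v ZFS-vmds01
  esxcli storage nfs add -H 192.168.50.12 -s /export/vmds02 -v ZFS-vmds02
  # Confirm the mounts
  esxcli storage nfs list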
I'd love to see Oracle add VAAI support in the future, as Nexenta has done. Not sure whether that's on their roadmap.
Is anyone doing anything differently, or does anyone have thoughts on how to optimize Oracle Sun ZFS Storage Appliance performance in vSphere environments? I've read all of Oracle's best-practice docs, but they're pretty dated (still focused on vSphere 4.x) and not as thorough as what NetApp publishes.
Thanks,
Nate