Hi,
We are running a two-node cluster with shared storage.
Our license is the vSphere Essentials Plus Kit.
Both hosts run ESXi 5.5 Update 1; the specific build is VMvisor-Installer-5.5.0.update01-1623387.
The shared storage is an HP NAS x1600, which we use as an iSCSI target.
We are using 2 of the 6 NICs on each ESXi host to connect to the storage over iSCSI. Each NIC is on a different subnet, and the configuration is based on port binding. Jumbo frames are enabled end to end: on the VMware switches, on the uplink switches, and on the storage adapters. The network is 1 Gigabit. The physical switch is an HP A5120, and this single switch is what separates the vMotion, management, and iSCSI traffic.
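A common way to confirm that jumbo frames really pass end to end on each iSCSI path is a "do not fragment" ping from each bound VMkernel port to the storage (`vmkping -I <vmk> -d -s <size> <target>`). The payload size must leave room for the IP and ICMP headers. Below is a minimal Python sketch of that arithmetic, assuming plain IPv4 (20-byte header, no options) and an 8-byte ICMP header; the VMkernel port name and target IP in the usage line are hypothetical placeholders, not values from our setup.

```python
# Sketch: compute the largest ICMP payload that fits in one unfragmented
# IPv4 packet for a given MTU, and build the matching vmkping command.
# Assumes plain IPv4 (no options) and a standard ICMP echo header.
IPV4_HEADER = 20  # bytes
ICMP_HEADER = 8   # bytes


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in a single packet of size mtu."""
    payload = mtu - IPV4_HEADER - ICMP_HEADER
    if payload <= 0:
        raise ValueError(f"MTU {mtu} is too small")
    return payload


def vmkping_command(vmk: str, target_ip: str, mtu: int = 9000) -> str:
    """Build a don't-fragment vmkping test for one bound VMkernel port.

    vmk and target_ip are hypothetical inputs supplied by the caller.
    """
    return f"vmkping -I {vmk} -d -s {max_ping_payload(mtu)} {target_ip}"


if __name__ == "__main__":
    # Hypothetical VMkernel port and storage portal address.
    print(vmkping_command("vmk1", "192.168.10.50"))
```

For MTU 9000 this yields a payload of 8972 bytes. If that ping fails while a standard-size ping succeeds, some hop on that path is not passing jumbo frames.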
We run around 7-8 VMs on each host, with a separate datastore for each VM. (I do not know whether it is recommended to have multiple LUNs or one or two big LUNs hosting all the VMs. My concern is that if one datastore goes bad, everything on it is gone.)
About two weeks ago, a few VMs suddenly stopped responding. Logging in to vCenter, we found that all the VMs on one particular host were down, while the host itself was alive. So we suspected that this host had lost connectivity to the storage (if the host itself had failed, HA, which works fine for us, would have restarted the VMs). A rescan of the iSCSI storage on that host fixed it.
A few days later, the same thing happened on the other host, but this time even a rescan of the datastores did not help. So we rebooted the host, without success: it hung and failed to boot the hypervisor beyond a certain point.
We then rebooted the other host, which was still live with all its VMs working, and to our surprise it got stuck at the same point. At that stage our whole datacenter was down.
The boot stuck at "Swapobj loaded successfully". Once we shut down the storage, all ESXi hosts booted. Finally, switching the storage back on and rescanning the iSCSI datastores on each host did the trick.
In a nutshell, I think there is an issue with the iSCSI network (correct me if I am wrong).
Please advise how to fix this once and for all.