What version of ESXi are you running?
It sounds like your storage system IS the problem.
Eventually, ESXi will consider a path suspect if it does not get the response time it expects, and different versions of ESXi handle this differently. In my experience, ESXi 4.1 does not handle these issues as gracefully as 5.x. I have hosts of both versions connected to the same datastores: during very high latency, my 4.x hosts tend to log "lost access to volume due to connectivity issues", while the 5.x hosts lose the connection much less frequently but log a bunch of latency warnings instead.
Take a closer look at your storage system while your snapshots are being removed. Look for CPU and/or cache saturation, or for disks that are maxed out and can't handle the load being thrown at them.
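
If you want to compare how often each host logs these events during a snapshot removal, a rough sketch like the one below can count them in a copy of the host's log. The function name and the second phrase are my own assumptions: only "lost access to volume" is quoted from what I see on my hosts, and "latency increased" is my guess at a fragment of the 5.x latency warning, so adjust the phrases to whatever your logs actually contain.

```python
from collections import Counter

# Phrases to search for. The first is quoted from the message described above;
# the second is an ASSUMED fragment of the 5.x latency warning. Edit to taste.
PHRASES = (
    "lost access to volume",
    "latency increased",
)


def count_storage_events(lines, phrases=PHRASES):
    """Count how many lines contain each phrase, ignoring case.

    `lines` can be any iterable of strings, e.g. an open log file.
    Returns a Counter keyed by the lowercased phrase.
    """
    wanted = [p.lower() for p in phrases]
    counts = Counter({p: 0 for p in wanted})
    for line in lines:
        lowered = line.lower()
        for phrase in wanted:
            if phrase in lowered:
                counts[phrase] += 1
    return counts
```

To use it, copy the log off the host, open it with `open(path, errors="replace")`, and pass the file object to `count_storage_events`. Running it against logs from a 4.x and a 5.x host covering the same time window should show whether the pattern I described holds in your environment.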