Channel: VMware Communities: Message List - vSphere™ Storage

VMware, JBOD and Big Data


I see a lot of recommendations to use JBOD for big data applications like Hadoop. The reasoning I hear is that you don't need RAID 5 because:

1. "High availability is provided by the application, so we don't need the protection of a RAID 5, 6, or 10 array."

2. "Performance is better with JBOD on SATA."

3. "Cost is lower."

 

So far these points have not made sense to me, because:

1. For point 1: yes, HA is provided by the application; Hadoop, for example, can tolerate the loss of a single VM with no problem. But I don't see why I would want to make Hadoop do that if I don't need to. Just because high availability is provided at the application layer, why do I need to take it away at the storage layer? If I can lose a single hard drive in a RAID 5 group without losing a VM, why should I give that up and use JBOD just because the application has its own HA?

2. For point 2: it seems to me you can get the performance you want with RAID 5 if the rest of the storage design allows for it. For example, I can add more cache to the array, or I can use SSD instead of SAS/SATA. Yet the solution I see proposed is JBOD on SATA. Why does performance force me to use JBOD on SATA?

3. For point 3: if cost is the constraint, then obviously you have to design around it. But it seems to me that I need to calculate and compare the costs before I can say cost is a constraint. Has everyone in the industry concluded that running big data workloads like Hadoop on EMC is just too expensive? Has anyone found that the cost of using a normal Fibre Channel EMC array, for example, is NOT prohibitive?
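
To make the comparison in point 3 concrete, here is a rough sketch of the kind of arithmetic I have in mind. Every drive count and price in it is a made-up placeholder, not a quote from any vendor. The only inputs that are not hypothetical are that a RAID 5 group gives up one drive's worth of capacity to parity and that HDFS keeps 3 replicas of each block by default. Whether you would lower the replica count when running on top of RAID is itself an assumption you would have to decide on, so it is a parameter here.

```python
def raid5_usable_tb(drives: int, drive_tb: float) -> float:
    """Usable capacity of one RAID 5 group: one drive's worth goes to parity."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drives - 1) * drive_tb


def jbod_usable_tb(drives: int, drive_tb: float) -> float:
    """JBOD exposes every drive at full capacity, with no parity overhead."""
    return drives * drive_tb


def cost_per_effective_tb(total_cost: float, usable_tb: float, replicas: int) -> float:
    """Cost per TB of unique data once application-level replication is applied."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    effective_tb = usable_tb / replicas
    return total_cost / effective_tb


if __name__ == "__main__":
    # Hypothetical inputs: 12 drives of 4 TB each in both designs.
    # The prices are placeholders for illustration only.
    jbod_cost = 12_000.0   # hypothetical: SATA JBOD, HDFS default of 3 replicas
    array_cost = 44_000.0  # hypothetical: shared array, assumed run with 2 replicas

    jbod = cost_per_effective_tb(jbod_cost, jbod_usable_tb(12, 4.0), replicas=3)
    raid = cost_per_effective_tb(array_cost, raid5_usable_tb(12, 4.0), replicas=2)

    print(f"JBOD:   {jbod:.2f} per effective TB")
    print(f"RAID 5: {raid:.2f} per effective TB")
```

With real quotes plugged in, the same two numbers would tell me whether cost is actually a constraint or just assumed to be one.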

 

Please forgive my ignorance and correct my erroneous thinking....

