vanderetto,
I did extensive testing on this all weekend using around 700G of valid test data and gained a much better understanding of what is going on. I recently worked with a client that moved to deduplication on 2012 R2. It was enabled, and around 12-13 hours later we got calls that the VM was off! As I dug into it, it was not off - it was out of space...but wait...there's more!
Not knowing exactly what happened, I added 500G of space to the LUN, extended the datastore, and finally got the machine back online. Once the server was up, I logged in and found 500G of free space on the disk! Deduplication had done its job and found around 250G of space it could dedupe, but at the cost of the thin provisioning on both the SAN and the .vmdk - basically it was useless, and here is why.
Keep in mind that this is post-process deduplication, meaning the dedup engine goes to work some interval after the files are at rest. To do this, the engine reads the files, splits them into chunks, and moves the unique chunks into an area called the "ChunkStore". The ChunkStore lives on the volume being deduped, under a hidden system path (System Volume Information), and that is where all of the writing is going on.
What I found is that MS seems to be using the free space on the drive as a sort of staging/cache area. It writes out into the ChunkStore, runs its algorithm, and finally dedupes the files (sorry - I know that's not the most technical description). On a thin provisioned volume this is going to kill it, and I mean eat it alive. In my customer's case it was a thin provisioned .vmdk sitting on top of a thin provisioned LUN (yes - there are going to be people saying thin on thin is dumb - just remember the use case). What happened was it ate the .vmdk space and didn't return it (it looks like a garbage collection routine is involved), and of course because of this the thin space on the LUN was chopped up like a salad - add a little dedup-style dressing and it was gone!
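To illustrate what I think is happening under the covers, here is a rough sketch in plain Python (nothing to do with the actual dedup engine - the block counts and savings ratio are my assumptions) of why a post-process pass inflates a thin disk even though the guest ends up with more free space:

# Toy model of post-process dedup on a thin-provisioned disk.
# Block counts and the savings ratio are made up purely for illustration.

class ThinDisk:
    def __init__(self):
        self.touched = set()       # blocks ever written -> allocated on the SAN/VMFS
        self.guest_used = set()    # blocks the guest filesystem considers in use

    def write(self, blocks):
        self.touched |= set(blocks)      # thin backing grows on first write
        self.guest_used |= set(blocks)

    def delete(self, blocks):
        self.guest_used -= set(blocks)   # guest frees them, but without UNMAP
                                         # the backing allocation never shrinks

    def report(self, label):
        print(f"{label:18} guest used: {len(self.guest_used):4} | "
              f"allocated on SAN/VMFS: {len(self.touched):4}")

disk = ThinDisk()

# 1. Copy user data: 600 units of files.
original = range(0, 600)
disk.write(original)
disk.report("after data copy")

# 2. Post-process dedup: write deduped chunks into the ChunkStore
#    (brand new blocks), then release the original file blocks.
chunkstore = range(600, 950)          # assume ~42% savings, roughly my test ratio
disk.write(chunkstore)
disk.delete(original)
disk.report("after dedup job")

The guest ends up with more free space, but the thin .vmdk and LUN have grown by the size of the ChunkStore, and nothing ever hands those original blocks back.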
Anyway, I set up a lab environment and painstakingly worked on this all weekend, and got one piece done: watching how it does what it does to thin on thin. I am going to see if I can correct this, but unless your SAN supports thin reclamation (T10 SCSI UNMAP), the only way to get the space back is to move the .vmdk to a new datastore. Good luck if you don't have the space, but again, I am trying to confirm this and find workarounds.
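If you want to put a number on what is stranded before deciding on a move, the back-of-the-napkin math is just what the thin .vmdk has allocated minus what the guest actually still uses (the figures below are placeholders, not from my test):

def stranded_gb(vmdk_allocated_gb, guest_used_gb):
    """Space the thin .vmdk holds on the datastore that the guest has already
    freed. Without T10 UNMAP (or moving the .vmdk to a new datastore) it
    does not come back."""
    return max(vmdk_allocated_gb - guest_used_gb, 0.0)

# Placeholder example: a 1TB thin .vmdk that has ballooned to 900G allocated
# while the guest only reports 450G actually in use.
print(stranded_gb(900, 450), "GB stuck in the thin .vmdk")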
Like you, I didn't find a ton of information out there, but I can tell you this: make sure you read the deduplication planning and setup documentation from MS. There are some warnings in there about being careful in certain situations, but nothing really said about thin on thin.
Unless I find more in my testing (I will post), I would highly recommend staying away from MS deduplication on thin provisioned storage, because the space you think you just saved inside Windows has already been consumed on the SAN as those blocks were written. Moreover, if you have a lot of thin provisioning going on and your SAN is oversubscribed, you are going to be in trouble. Yes, to some of this most everyone out there will say "duh - ya think", but there are good use cases for thin provisioning; you just have to know the environment and how the data is being used, and decide whether thin should be used in a hybrid fashion - in other words, thin on some workloads, not so much on others.
Anyway, below is some data from my testing if you want to see what happened along the way, with timing and such. This was on a non-production server - no user files changing in size and no outside access. Look at the net of it, though:
Space saved in Windows from start to finish: 180G "freed up"
Space "lost" on the SAN: 432.73G
Space "lost" on the datastore: 287G
As you can see, with thin provisioning you technically have a huge loss in space on the back end versus the gain on the Windows file server. Again, this is non-scientific and I winged it to find out more - not claiming any hard facts here - but with deduplication on a thin volume and without some kind of reclamation, where is the gain?
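Just to put the net effect in one place, here is the simple arithmetic on the three numbers above (nothing more than subtraction):

# Net effect of the dedup run in my test (numbers from above).
windows_freed_gb  = 180.0    # free space gained inside the guest
san_written_gb    = 432.73   # new space consumed on the thin LUN
datastore_lost_gb = 287.0    # free space lost on the VMFS datastore

print(f"Guest gained {windows_freed_gb}G, but the SAN consumed {san_written_gb}G - "
      f"a net swing of {san_written_gb - windows_freed_gb:.2f}G against you at the storage layer.")
print(f"On the datastore alone the swing is {datastore_lost_gb - windows_freed_gb:.2f}G.")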
Starting stats: 1TB VMDK, 1TB LUN, 601G of user data copied to the Windows 2012 R2 server, and around 420G free on the datastore before I started.
- Around 10 minutes into the start of deduplication:
  VMware: 425.49G free on the datastore (started with 427.67G)
  Windows: 431G free (463,130,730,496 bytes)
  Already Windows is returning space - but VMware is losing more...
- 20 minute mark:
  VMware: 422.46G free
  (Not a single file changed on the OS from a "user" perspective - no data copies, etc. This is all the deduplication job running.)
- 30 minute mark:
  VMware: 411.82G free
  Windows: 433G free
- 45 minute mark:
  VMware: 407.03G free
  Windows: 434G free (still increasing)
  SAN: 35.14G written
- 60 minute mark:
  VMware: 398.52G free
  Windows: 441G free
  SAN: 42G written
- 90 minute mark:
  VMware: 389.59G free
  Windows: 449G free (dedup job status: 22%)
  SAN: 58.51G written
- 120 minutes:
  VMware: 365.94G free
  Windows: 461G free (dedup job status: 28%)
  SAN: 88.6G written
- 180 minutes:
  VMware: 321.27G free
  Windows: 497G free (dedup job status: 38%)
  SAN: 168G written
- 240 minutes:
  VMware: 282G free
  Windows: 511G free (dedup job status: 50%)
  SAN: 222G written
- 300 minutes:
  VMware: 227G free
  Windows: 539G free (dedup job status: 58%)
  SAN: 296G written
- 360 minutes:
  VMware: 188.64G free
  Windows: 556G free (dedup job status: 88%)
  SAN: 358.51G written
- 420 minutes:
  VMware: 159G free
  Windows: 584G free (dedup job status: 93%)
  SAN: 406.09G written
- 495 minutes (job complete):
  VMware: 140G free
  Windows: 601G free
  SAN: 432.73G written
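One thing you can pull out of the log is the SAN write rate per interval - it ramps up hard once the job gets going and only tapers near the end. A quick bit of Python over the samples above (just the numbers from this post):

# (minutes elapsed, cumulative GB written on the SAN) - taken from the log above
samples = [(45, 35.14), (60, 42.0), (90, 58.51), (120, 88.6), (180, 168.0),
           (240, 222.0), (300, 296.0), (360, 358.51), (420, 406.09), (495, 432.73)]

for (t0, g0), (t1, g1) in zip(samples, samples[1:]):
    rate = (g1 - g0) / ((t1 - t0) / 60)   # GB per hour over the interval
    print(f"{t0:3}-{t1:3} min: {g1 - g0:6.2f}G written ({rate:5.1f} G/hr)")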