VMware Dead Space Reclamation (UNMAP) and Pure Storage

One of the main things I have been doing in my first few weeks at Pure Storage (which has been nothing but awesome so far, by the way) is going through all of our VMware best practices and integration points: testing them, seeing how they work, and checking whether they can be improved. The latest thing I looked into was Dead Space Reclamation (which from here on out I will just refer to as UNMAP) with the Pure Storage FlashArray, specifically on ESXi 5.5. This is a pretty straightforward process, but I did find something interesting that is worth noting.


For those unfamiliar with it, the UNMAP process in ESXi 5.5 was enhanced in how it is performed on the ESXi side compared to earlier releases of ESXi 5.x. Prior to 5.5 it was a vmkfstools CLI option, and due to the way it was executed certain storage arrays could get overwhelmed by heavy UNMAP operations, causing performance degradation for concurrent workloads. This led to a VMware recommendation to run it sparingly or only during non-peak periods. The other issue was that, depending on how much space you told it to reclaim, it was possible to run into temporary out-of-space conditions when the UNMAP balloon file was created. Furthermore, due to file size limits and other reasons, very large datastores could not be fully reclaimed.

VMware recognized these issues and resolved them in 5.5. First off, instead of vmkfstools, UNMAP is now built into the esxcli command. Most importantly though, instead of requiring gigantic balloon files to be created and reclaimed in one sweep, it is now an iterative process: it goes through segments one at a time, reclaiming each and then moving on to the next until all of the “dead” space is reclaimed. This alleviated the performance impact and removed the out-of-space threat present in the previous UNMAP mechanism.

UNMAP with the Pure Storage FlashArray is a very quick and low-impact operation, so it has never suffered from the performance problems that plagued some arrays. The FlashArray simply removes the pointers to the data being reclaimed and moves on. Therefore ESXi can throw a lot of simultaneous UNMAP calls at the array without issue; the more it can send at once, the better.

On to my point… The only real option to change how UNMAP with esxcli behaves is the iteration size. By default (if not specified by the user) esxcli will UNMAP 200 blocks at a time, which is 200 MB in the majority of situations. After reclaiming 200 MB it moves on to the next 200 MB and so on. This number can be increased or decreased as needed. I do not know if there is an upper limit; I haven’t been able to find information on that, so if you know please share! Anyway, this was a configuration variable I wanted to test on the FlashArray. In my previous job I tested different numbers on the VMAX and didn’t see any noticeable difference in time to UNMAP when it was altered, so the recommendation there was to just use the default.
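To get a feel for what the block count changes, here is a rough Python sketch of the iteration arithmetic. It assumes a 1 MB VMFS block size (so 200 blocks is 200 MB) and a hypothetical amount of free space to sweep; the function name and the 1 TiB figure are mine for illustration, not something esxcli reports.

```python
import math

MB_PER_BLOCK = 1  # assumption: 1 MB VMFS block size, so 200 blocks = 200 MB


def unmap_iterations(free_space_mb, blocks_per_iteration=200):
    """Estimate how many reclaim passes are needed to cover free_space_mb."""
    if blocks_per_iteration <= 0:
        raise ValueError("blocks_per_iteration must be positive")
    mb_per_iteration = blocks_per_iteration * MB_PER_BLOCK
    return math.ceil(free_space_mb / mb_per_iteration)


# Hypothetical example: 1 TiB of free space to sweep (not a measured value).
free_mb = 1024 * 1024
for n in (200, 400, 800, 8000):
    print(n, unmap_iterations(free_mb, n))
```

The point is only that the number of passes shrinks in proportion to the block count; how long each pass takes is up to the host and the array.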

I wanted to be able to make a recommendation for the FlashArray as well whether that be the default or something else. Due to the architecture of the array I suspected it might be noticeable since it handles UNMAP so efficiently. Turns out I was correct and the difference was somewhat profound–the FlashArray handled larger values like a champ! Let’s look at the results.

esxcli storage vmfs unmap -l Pure_Datastore -n 8000

***See how to do it in PowerCLI here***
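If you have several datastores to reclaim, the same command can be generated in a loop. Below is a minimal Python sketch, assuming it runs somewhere that both Python and esxcli are available (for example the ESXi shell). The helper names and the second datastore name are hypothetical, and only the `-l` and `-n` options from the command above are used. It defaults to a dry run that just prints the commands.

```python
import subprocess


def build_unmap_command(datastore, block_count=None):
    """Return the esxcli argument list for one datastore.

    Omitting block_count leaves out -n, so esxcli uses its default.
    """
    cmd = ["esxcli", "storage", "vmfs", "unmap", "-l", datastore]
    if block_count is not None:
        cmd += ["-n", str(block_count)]
    return cmd


def unmap_datastores(datastores, block_count=None, dry_run=True):
    """Build (and optionally run) one UNMAP command per datastore, in order."""
    commands = [build_unmap_command(ds, block_count) for ds in datastores]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if esxcli exits non-zero
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical datastore names; the default dry run only prints the commands.
unmap_datastores(["Pure_Datastore", "Pure_Datastore_02"], block_count=8000)
```

Running the datastores one after another, rather than in parallel, keeps the sketch simple; whether parallel runs are worthwhile would need testing.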

I ran UNMAP tests against a 1 TB FlashArray volume using block counts of 200, 400, 800 and 8000. I timed each operation and watched the esxtop data to see the number of UNMAP commands going out and the effective UNMAP throughput. Note that “throughput” is a bit of a misnomer in this case because ESXi is not actually reading or writing that much data; it just reflects how much space ESXi is reclaiming per second. Below are three charts showing these values, with block count on the x axis for all three.

[Charts: total UNMAP commands, UNMAP duration, and UNMAP throughput, each plotted against block count]


As you can note the duration and number of total commands are both asymptotic (they continue to decrease as you up the block count per iteration but it will never hit zero) and the block count has a dramatic effect on both numbers. As a silly aside–reminds me of that math joke–an infinite number of mathematicians enter a bar and the first one orders a beer, the second one half of a beer, the third a fourth, the fourth an eighth and so on. The bartender just says “screw this” and pours them two full beers. But I digress…

Another point to notice is that the number of UNMAP commands being issued falls, and this is a big reason why the duration falls so much: more capacity is reclaimed per UNMAP command, so there are fewer commands to be processed between the ESXi host and the FlashArray, and both have to do less work.

The effective throughput just keeps going up as well, but as I mentioned before this is a logical throughput, not a factor of the SAN pipes. Also note that since the duration was so low there were not a lot of samples to average for the 8000 run, but the result was consistent over the many runs I did at each block count, and if you do the math it is just about what you would expect.

By upping the block count from 200 to 8000 the process took only about 8 seconds instead of almost two minutes. There were no failures in any case and everything was always unmapped and was reflected on the FlashArray as de-allocated immediately. Basically when it comes to UNMAP the FlashArray can take whatever ESXi can throw at it–so feel free to use a larger block count to save you some time!
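As a quick sanity check on those numbers, here is a small Python sketch of the arithmetic. The durations are rounded from the text above ("almost two minutes" taken as 120 seconds, "about 8 seconds" as 8), and the reclaimed amount is a placeholder rather than a measurement, since the ratio does not depend on it.

```python
def speedup(baseline_seconds, new_seconds):
    """How many times faster the new run is than the baseline."""
    if new_seconds <= 0:
        raise ValueError("new_seconds must be positive")
    return baseline_seconds / new_seconds


def effective_throughput_mb_s(reclaimed_mb, duration_seconds):
    """Logical reclaim rate: space released per second, not data moved."""
    return reclaimed_mb / duration_seconds


# Rounded from the text: roughly 120 s at 200 blocks, roughly 8 s at 8000 blocks.
t_200, t_8000 = 120, 8
print(speedup(t_200, t_8000))

# reclaimed_mb is a placeholder, not a measurement; it cancels out of the ratio.
reclaimed_mb = 100_000
ratio = (effective_throughput_mb_s(reclaimed_mb, t_8000)
         / effective_throughput_mb_s(reclaimed_mb, t_200))
print(ratio)
```

For a fixed amount of dead space, the effective throughput scales by the same factor as the duration shrinks, which is why the throughput chart climbs as the duration chart falls.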


17 thoughts on “VMware Dead Space Reclamation (UNMAP) and Pure Storage”

  1. Well, it depended. During UNMAP, any I/O to a different volume on the same host, or to the same or a different volume on another host, was essentially unaffected (I didn’t see anything statistically significant). If workloads were running simultaneously on the same volume on the same ESXi host where UNMAP was initiated, I did see some interference with the workload. I exchanged some emails with VMFS engineers, and it seems that due to semaphore locking on the VMFS, UNMAP can essentially take precedence for that particular volume on that host, which can reduce performance for the workload. To reduce the impact, use a larger block count to greatly shorten the UNMAP process; to eliminate any impact, run UNMAP from a host that isn’t currently using that volume directly, again with a larger block count (this is what I recommend).

  2. Cody,

    Is there a way to automate this unmap process? We have 6 clusters of VDI, all with 3 datastores each from Pure (18 total), and it gets tedious doing 1 after the other. I’m sure more people have a lot more clusters/stores than us and I’m being a baby, but, it is what it is. 🙂

    Thanks and great blog.

    1. The video posted in your link got my wheels turning some, Cody. If I run UNMAP in my environment, and, say a week later, I see my usage % in my Pure console go back up to what it was before the unmap (without any adds/changes to the VDI environment), would you say that’s odd? Because that’s what we see. I’ll run these unmaps, get say 10% back on both my controllers, then in a week they’ll go back up. Again, this is without growing the environment really.

  3. I’d say it would have to be writes inside of guests: writes to new sections of a virtual disk will cause further allocation on the array without any noticeable growth on the VMFS. This is true regardless of virtual disk type. If the allocations are coming from outside of a guest, it could possibly be swap files being deleted and recreated as VMs reboot, etc. Just initial guesses though.

  4. I don’t understand the methodology, or maybe you skipped a step… If you run an UNMAP once, doesn’t it actually free up the space in the LUN? So when you did your iterative testing, doesn’t each subsequent run have less to clear, and thus would run faster, no matter what parameter you used? Or did you restore the LUN content each time from a clone (or otherwise recreate the LUN contents) so that your starting point was always the same?

    1. I refilled the datastore between each iteration with the same VMs (PowerCLI script to clone them over to it). I probably should have mentioned this. Sorry for the confusion.

  5. The value you use and the time required will definitely depend on the array and the size of the LUN/volume.

    I’m in the process of breaking out my two large 32 TB volumes into individual volumes; one per VM.

    I had ~140 VMs spread across two large 32 TB volumes (64 TB total) and it was working just fine. I had no errors in esxtop and the pending I/O was never abnormal (10 or less at all times).

    I’m breaking them out just to improve visibility. I have a Dell Compellent SC4020 and I utilize Dell’s Co-Pilot Optimize service. They send me reports, and in those reports I can see detailed information for each LUN/volume. So by breaking them out I can see what each VM is up to, as well as control replication at a granular level. I do run vRealize but I prefer raw data from the storage and vCenter/ESXi to be honest. We bought vRealize for the cute pie charts and graphs that others may need or want.

    With the Compellent, you can run the esxcli storage vmfs unmap command and that won’t immediately free anything. It will simply tell the array what’s not in use and during the data progression operation those blocks will be unmapped at a low priority.

    The SC4020 I’m running has 35 TB of all flash and if I use a value of -n 8000, the array pushes 14.5 GB/s.

    If I take the default value of 200 by not specifying anything, it can take 8 hours to run on a large 32 TB LUN/volume.

    So right now, I’m having to run those commands on the two large volumes while I slowly move my VMs onto their own dedicated LUN/volumes.

    I’ve also noticed applications such as CommVault that rely on taking snapshots can cause LUN/volume usage to grow until this command is run. So right now, I’m in the process of asking CommVault about Intellisnap vs the VM backup method of taking snaps at the VM level. I would imagine that if the snap is taken and deleted on the array, you wouldn’t end up with mapped non-existent data on the array. That’s my theory based on logic. I’ll have to inquire and test to confirm.

    Dell recommends thick lazy on their thin array, which means VMFS3.EnableBlockDelete can’t be used. However, I’ve confirmed with RAXCO, PerfectStorage 3.0, the array supports zero-fill space recovery. Setting up this software and pushing out agents was very clean/easy. The only thing I’ve found wrong with their software is that it has issues if the service account is longer than a pre-2000 legacy format.

    It’s unfortunate this UNMAP issue is still an issue. Just because the array supports it, doesn’t mean those commands are being issued. This really is an issue with the guest OS IMO. That’s half the issue anyway. The other half being that even if you could force a thick provisioned Windows guest to send an UNMAP command to ESXi to relay to the storage, ESXi would discard/ignore the command. This is because ESXi will not pass that UNMAP command for a thick provisioned VM.

    That means even though PerfectStorage can have the guest OS send an UNMAP command, the host will reject it. I’m in the process of testing out zero-fill, which is technically supported by Compellent.

    Sorry for the long post! 🙂

  6. Hi Cody,

    I am planning to reclaim dead space in the Pure Storage array datastores using your script. I also used your script to report the dead space and it worked well.

    In our client environment, the datastores which have dead space are presented to the esxi 5.1 and esxi 6.0 hosts managed by different vcenters.

    Though the datastores are presented to esxi 6.0 hosts (managed by vcenter1), the VMs residing on those datastores are running on the esxi 5.1 hosts (managed by vcenter2). Is it safe to run the unmap operation on those datastores from the 6.0 hosts, as unmap is supported only from 5.5 and above?

    Please advise.

    1. Yeah that should be completely fine. Though I would look at getting off 5.1 soon as VMware has made it end of normal support life, so if something goes wrong they will only give guidance, not provide fixes.
