This is certainly not my first post about UNMAP, and I am pretty sure it will not be my last, but I think this is one of the more interesting updates of late. vSphere 6.0 has a new feature that supports direct UNMAP operations issued from a guest OS inside a virtual machine. Importantly, this is now supported using a virtual disk instead of the traditional requirement of a raw device mapping.
First, let’s quickly recap the history of UNMAP and VMware:
- Automatic VMFS UNMAP support was introduced in ESXi 5.0. This means that whenever an entire virtual disk was deleted, the target space on the VMFS was reclaimed automatically by the delete operation. Also, after a successful Storage vMotion the source VMFS space was reclaimed as part of the SvMotion cleanup.
- Shortly after 5.0 came out, problems with automatic UNMAP abounded. Storage vMotions timed out because the UNMAP process took a very long time on certain arrays. The heavy I/O workload caused by the UNMAP process overwhelmed certain arrays on their front end and back end, causing poor performance for VMs on the corresponding datastore, or often for any workload on the shared array.
- In 5.0 Patch 2, VMware disabled the UNMAP primitive by default. The option /VMFS3/EnableBlockDelete was changed from 1 to 0. You could re-enable this if you wanted to.
- In 5.0 Update 1, VMware re-introduced UNMAP, but as a CLI-only operation, with official support for running UNMAP via the CLI command “vmkfstools -y”. Also, the option /VMFS3/EnableBlockDelete was completely disabled–even if you enabled it, it did nothing. Completely defunct.
- The performance problems with UNMAP were not mitigated though, so VMware and many storage vendors required the use of UNMAP during maintenance periods only. Furthermore, the UNMAP process was not particularly flexible. You could specify a percentage of free space to be reclaimed (1-99%). If you specified a high percentage, you would reclaim more space, but you risked temporarily filling up the datastore with the balloon file that was created during the UNMAP process, which introduced its own risks. If you used a low percentage you didn’t risk space exhaustion, but you didn’t reclaim a lot of space. Also, it didn’t work well with large datastores.
- In vSphere 5.5 the vmkfstools method was retired and an enhanced version of UNMAP was introduced into esxcli (it could still be done with vmkfstools, but it used the new functionality only). The new process allowed for an iterative UNMAP that by default reclaimed 200 MB at a time–so space exhaustion due to a balloon file was not an issue. Furthermore, it always reclaimed all of the space, so it was much more efficient. The underlying UNMAP SCSI command was also improved in terms of what ESXi could leverage. ESXi now supported 100 block descriptors per UNMAP command instead of 1–possibly making the process faster, or at least more efficient (assuming the underlying storage supported this, as identified by querying the Block Limits VPD page B0). Also, since it was in esxcli, PowerCLI could be used to script this process very easily.
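To make that history concrete, here is roughly what the two CLI methods looked like. This is a sketch; the datastore name is hypothetical:

```shell
# vSphere 5.0 U1 method: run from inside the datastore's root directory.
# "60" means attempt to reclaim up to 60% of free space, using a temporary
# balloon file on the datastore.
cd /vmfs/volumes/MyDatastore
vmkfstools -y 60

# vSphere 5.5 method: iterative reclaim, 200 MB (the default) at a time,
# so there is no risk of filling the datastore with a large balloon file.
esxcli storage vmfs unmap -l MyDatastore -n 200
```

The 5.5 command can also target a datastore by UUID with -u instead of -l, which is handy when scripting against many datastores.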
So this is where we were in vSphere 5.5 for UNMAP outside of a guest (on VMFS dead space).
Let’s get on with it
But what about dead space INSIDE of a VM? As in deleting files from within a guest on a filesystem on a virtual disk.
Prior to vSphere 6, if you wanted to reclaim space inside of a virtual disk it was a semi-arduous process. With the exception of the SE Sparse disk (which is only supported in VMware View today) the only decent option to reclaim space was through the use of some zeroing tool inside the guest, like SDelete in Windows. This was unfortunate because so many guest OSes actually support issuing UNMAP themselves. Because ESXi virtualizes the SCSI layer, even if an OS attempted to send UNMAP down to the virtual disk, the command did nothing and never made it to the array. This zeroing behavior caused unnecessary I/O and inflation of virtual disks if they were thin. This is no longer so in vSphere 6.0.
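For reference, that pre-vSphere-6 workaround looked something like this, run inside the Windows guest (the drive letter is an example):

```shell
REM Zero out free space on drive E: so that array-side zero detection or
REM data reduction can reclaim it. Note the downsides described above:
REM this generates heavy write I/O and inflates thin virtual disks.
sdelete.exe -z E:
```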
Remember, just a bit ago I mentioned the option /VMFS3/EnableBlockDelete? Prior to vSphere 6.0, if you looked at this option and read the description it said, “Enable VMFS block delete.” If you look at it in vSphere 6, it now reads:
Path: /VMFS3/EnableBlockDelete
Type: integer
Int Value: 0
Default Int Value: 0
Min Value: 0
Max Value: 1
String Value:
Default String Value:
Valid Characters:
Description: Enable VMFS block delete when UNMAP is issued from guest OS
Interesting! The description has changed! You may also note it is still disabled by default (default value is 0).
If you enable this option, ESXi will now permit guest OSes that issue UNMAPs to a virtual disk to have them translated down to the array so that the space can be reclaimed. Furthermore, the virtual disk will be shrunk down by the amount of space reclaimed if the virtual disk is thin. From my testing it seems that only a thin virtual disk supports this guest UNMAP functionality–but I am not sure whether that is a VMware restriction, or simply the guest OS declining to issue UNMAP to a virtual disk that reports as “thick,” since it cannot tell that the underlying actual storage is thin. Definitely need to do some research here. So for the purposes of this post I am going to assume that thin virtual disks are required. Which brings us to the requirements…
Please note!! The following list is NOT FROM VMWARE! This is my observation–I am in the middle of vetting out the official behavior, requirements, and architecture with VMware now. So refer to this post as a “hey, look what I found!” anecdotal post. Look for an upcoming post with a lot more details and hopefully official information.
The anecdotal, seemingly-required list:
- Thin virtual disks
- VM hardware version 11
- ESXi 6.0
- EnableBlockDelete set to 1
- Guest OS support of UNMAP
- In 6.0, CBT (Changed Block Tracking) must be turned off. In 6.5 it can stay enabled, as this is fixed in ESXi 6.5!
I tested with two different operating systems: RHEL 6.5 and Windows 2012 R2.
A quick note…
Before I continue on, let’s be clear about what the EnableBlockDelete option actually does. It does NOT enable guest OSes to execute UNMAP–that is always enabled in vSphere 6. Even with this option disabled, an UNMAP run in a guest OS will work and the virtual disk will be shrunk. So read the description again:
“Enable VMFS block delete when UNMAP is issued from guest OS”
What this option does is not allow UNMAP in the guest; rather, it allows ESXi to recognize that the virtual disk was shrunk by UNMAP, and ESXi then issues UNMAP to the underlying storage. So essentially, if you want the guest to UNMAP the space the whole way down to the array–enable this option. When it is enabled, you will see the VAAI Delete counter increment in esxtop–this will show you that something happened, besides of course the space being reclaimed on the array.
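If you want to watch this happen, esxtop is interactive, so the steps are shown here as comments rather than flags:

```shell
# On the ESXi host (via SSH or the shell):
esxtop    # press 'u' for the disk-device screen,
          # press 'f' and toggle the VAAI stats field on,
          # then watch the DELETE counter increment while
          # the guest-initiated UNMAP is passed down to the array
```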
So I didn’t really have any luck getting this to work on Linux. I tried the normal things that work great on RDMs.
UPDATE: So after I wrote this but before I published, after some email exchanges I was pointed to this KB article by Cormac Hogan that explains why Linux doesn’t work
UPDATE 2: Linux now works with ESXi 6.5!!
RHEL 6.x+ with the ext4 filesystem (I haven’t looked into XFS yet) offers two options (that I am aware of) for reclaiming space.
- Mount the filesystem with the discard option: mount -t ext4 -o discard /dev/sdc /mnt/UNMAP This will make Linux automatically issue UNMAP when files are deleted from the filesystem.
- Use the command fstrim. Fstrim is a command that reclaims dead space across a directory or an entire filesystem on demand–this does not require the discard mount option to be set, but it is compatible with filesystems that do have it enabled.
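The on-demand variant looks like this, using the mount point from the example above:

```shell
# Reclaim dead space across the whole filesystem mounted at /mnt/UNMAP;
# -v reports how many bytes were trimmed (requires root)
fstrim -v /mnt/UNMAP
```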
Both of these options work great with RDMs. I could not get either to work with virtual disks, though. After some reading, I suspect RHEL is using TRIM, not UNMAP (the name fstrim seems to be a blatant hint 🙂 ), and I am guessing that TRIM may not be supported by this new feature. RHEL literature references UNMAP quite a bit, but maybe it is just being used in a colloquial sense to refer to space reclamation?
In short, the SCSI (SPC) version reported by ESXi virtual disks is too old for Linux to issue UNMAP to them. So hopefully this will be fixed in the future.
Windows Server 2012 R2
Okay more in my wheelhouse! Yes…I am a Windows guy, deal with it.
Windows I could get to work, mostly.
UPDATE 1: This also works with automatic UNMAP but you need the allocation unit to be 32 K or 64 K for NTFS. See this post Allocation Unit Size and Automatic Windows In-Guest UNMAP on VMware
UPDATE 2: As of ESXi 6.5 P1 you no longer even need to write allocation unit. See this post: In-Guest UNMAP Fix in ESXi 6.5 Part I: Windows
Windows 2012 R2 and Windows 8 introduced UNMAP support to reclaim space from NTFS volumes. Hyper-V has been able to UNMAP through guests for a while because of this. In addition to UNMAP support, Windows redesigned its “defrag” utility to be smarter about SSDs and thinly-provisioned volumes. As you might be aware, defrag operations on an SSD (and often just in general on a VM) are useless and possibly deleterious (unnecessary write amplification, space bloat, etc.).
Defrag is now a utility called “Optimize Drives.”
The utility allows you to schedule operations on a given volume, and it will intelligently decide what type of operation to do depending on the device type. Something like defrag for HDDs, TRIM for SSDs (I believe), and UNMAP for thinly-provisioned volumes.
As you can see, some of the drives are recognized as SSDs and others as thin provisioned volumes. Accordingly, thin virtual disks show up as “thin provisioned” and eagerzeroedthick/zeroedthick virtual disks show up as “Solid state drive.” So, let’s run through an example.
First, ensure EnableBlockDelete is enabled:
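This can be done from the vSphere Web Client, or from the ESXi shell like so:

```shell
# Check the current value of the option
esxcli system settings advanced list -o /VMFS3/EnableBlockDelete

# Enable it (1 = on); no reboot is required
esxcli system settings advanced set -o /VMFS3/EnableBlockDelete -i 1
```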
Then, let’s add a thin virtual disk to my Windows 2012 R2 virtual machine:
We can take a look at the VMDK size on the datastore and it is currently 0 KB:
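One way to check this from the ESXi shell is to compare the provisioned size with the blocks actually consumed on the VMFS. The datastore path and VMDK names here are hypothetical:

```shell
cd /vmfs/volumes/MyDatastore/Win2012R2
ls -lh Win2012R2_1-flat.vmdk   # provisioned size of the virtual disk
du -h  Win2012R2_1-flat.vmdk   # actual space consumed on the VMFS
```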
The array reports the volume as having 433 MB written after reduction–the base VM is on the same datastore:
Now format the volume as NTFS and then copy some files. I am copying over a bunch of vSphere 5.5 ISOs.
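From an elevated prompt in the guest, the format step looks something like this (the drive letter and label are examples). Note the allocation unit: per the update above, automatic UNMAP needs a 32 K or 64 K NTFS allocation unit, so it is worth setting here:

```shell
REM Quick-format the new volume as NTFS with a 64 K allocation unit
format E: /FS:NTFS /A:64K /Q /V:UNMAPTEST
```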
Let’s review the size of the virtual disk now (10 GB) and the datastore reported raw used:
I realize that I don’t need these ISOs anymore (because I am running vSphere 6.0, why do I need 5.5?!) and delete them. Make sure to delete them permanently (Shift+Delete) OR delete them and then empty the Recycle Bin (very important).
Normally, the delete operation will reclaim the space automatically–but this behavior doesn’t seem to work with a virtual disk. So launch the Optimize Drives utility, click the volume, and then choose “Optimize.”
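The same operation can be driven from the command line, which is handy for scripting or scheduling:

```shell
REM CLI equivalent of clicking "Optimize" on a thin-provisioned volume:
REM /L performs a retrim, sending TRIM/UNMAP for the volume's free space
defrag E: /L
```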
The process is pretty quick and will report as OK (100% space efficiency) when done.
Now if we look at the VMDK size it is back to a small number–80 MB. Not the whole way back to zero, but much closer than 10 GB.
From the FlashArray GUI we see the datastore is essentially back to the 433 MB that was there before the copy–it is now 435 MB.
Final thoughts and next steps…
So really for me this has introduced far more questions than answers, but it is a great first step. I plan on doing a lot more digging in the near term so look for more posts shortly. Some of my questions:
UPDATE: Answers from VMware here!
- Does VMware actually support this? Not documented anywhere that I could find yet
- Will it only work with UNMAP? How about TRIM or WRITE SAME with the UNMAP bit set?
- Does it even care that UNMAP was run? Is there direct SCSI integration between the guest and ESXi? Or was it just the shrinking of the VMDK that matters–ESXi sees a VMDK shrink operation occurred and issues its own UNMAP (this is my strong suspicion at this point)?
- If that is the case, is that why thick virtual disks will not work? I think Windows won’t do it because it doesn’t see them as thin provisioned. If what I suspect is the case, though, even if you could force Windows or some other OS to issue UNMAP to those types of disks, it will never make it to the underlying storage.
- Who Framed Roger Rabbit? Never saw the movie.
- When does Windows use TRIM and when UNMAP? Same for Linux?
- How does the optimize drive work (when and what)?
- Why doesn’t auto-reclaim on delete work, while Optimize and the defrag CLI do? Maybe a difference of TRIM vs. UNMAP?
- Is it based on this fling?
- What other OSes work?
- Nested ESXi?
That and plenty more! Nothing but greenfield testing to do here! Stay tuned…