Direct Guest OS UNMAP in vSphere 6.0

This is certainly not my first post about UNMAP and I am pretty sure it will not be my last, but I think this is one of the more interesting updates of late. vSphere 6.0 has a new feature that supports the ability for direct UNMAP operations from inside a virtual machine issued from a Guest OS. Importantly this is now supported using a virtual disk instead of the traditional requirement of a raw device mapping.

First, let’s quickly recap the history of UNMAP and VMware:

  1. Automatic VMFS UNMAP support was introduced in ESXi 5.0. This means that whenever an entire virtual disk was deleted the target space on the VMFS was reclaimed automatically by the delete operation. Also after a successful Storage vMotion the source VMFS space was reclaimed as part of the SvMotion cleanup.
  2. Shortly after 5.0 came out, problems with automatic UNMAP abounded. Storage vMotions timed out because the UNMAP process took a very long time on certain arrays. The heavy I/O workload caused by the UNMAP process overwhelmed certain arrays on their front end and back end, causing poor performance for VMs on the corresponding datastore or often any workload on the shared array.
  3. In 5.0 Patch 2, VMware disabled the UNMAP primitive by default. The option /VMFS3/EnableBlockDelete was changed from 1 to 0. You could re-enable this if you wanted to.
  4. In 5.0 Update 1, VMware re-introduced UNMAP but as a CLI-only operation, formally supporting UNMAP via the CLI command “vmkfstools -y”. Also, the option /VMFS3/EnableBlockDelete was completely disabled–even if you enabled it, it did nothing. Completely defunct.
  5. The performance problems with UNMAP were not mitigated though, so VMware and many storage vendors required that UNMAP be run during maintenance periods only. Furthermore, the UNMAP process was not particularly flexible. You could specify a percentage of free space to be reclaimed (1-99%). If you specified a high percentage, you would reclaim more space, but you risked temporarily filling up the datastore with the balloon file that was created during the UNMAP process, which introduced its own risks. If you used a low percentage you didn’t risk space exhaustion, but you didn’t reclaim a lot of space. Also it didn’t work well with large datastores.
  6. In vSphere 5.5 the vmkfstools method was retired and an enhanced version of UNMAP was introduced into esxcli (it could still be run with vmkfstools, but that simply used the new functionality). The new process allowed for an iterative UNMAP that by default reclaimed 200 MB at a time, so space exhaustion due to a balloon file was not an issue. Furthermore, it always reclaimed all of the space, so it was much more efficient. The underlying UNMAP SCSI command was also improved as to what ESXi could leverage. ESXi now supported 100 block descriptors per UNMAP command instead of 1, possibly making the process faster, or at least more efficient (assuming the underlying storage supported this, as identified by querying the Block Limits VPD page B0). Also since it was in esxcli, PowerCLI could be used to script this process very easily.
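
As a rough sketch of the esxcli-based reclaim described in item 6, the helper below only builds and prints the command line so it can be reviewed before it is run on a host. The datastore label "MyDatastore" is a placeholder, not something from my environment, and the reclaim unit is counted in VMFS blocks, so the 200 MB figure above assumes the 1 MB block size of VMFS-5.

```shell
#!/bin/sh
# Build the esxcli command line for a manual VMFS dead-space reclaim
# (vSphere 5.5 and later). Nothing is executed here; the command is printed.
build_unmap_cmd() {
    label="$1"            # datastore label (placeholder value in the example below)
    blocks="${2:-200}"    # VMFS blocks reclaimed per iteration; 200 is the default
    printf 'esxcli storage vmfs unmap --volume-label=%s --reclaim-unit=%s\n' \
        "$label" "$blocks"
}

# Print the command for a hypothetical datastore named "MyDatastore".
build_unmap_cmd "MyDatastore"
```

Passing a second argument, for example build_unmap_cmd "MyDatastore" 400, changes how many blocks are reclaimed per iteration; whether a larger unit helps depends on the array.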

So this is where we were in vSphere 5.5 for UNMAP outside of a guest (on VMFS dead space).

Let’s get on with it

But what about dead space INSIDE of a VM? As in deleting files from within a guest on a filesystem on a virtual disk.

Prior to vSphere 6, if you wanted to reclaim space inside of a virtual disk it was a semi-arduous process. With the exception of the SE Sparse disk (which is only supported in VMware View today), the only decent option to reclaim space was to run a zeroing tool inside the guest, like SDelete in Windows. This was unfortunate because so many Guest OSes actually support issuing UNMAP themselves. Since ESXi virtualized the SCSI layer, even if an OS attempted to send UNMAP down to the virtual disk, it would not do anything and would not make it to the array. The zeroing workaround caused unnecessary I/O and inflated virtual disks if they were thin. This is no longer so in vSphere 6.0.

Remember, just a bit ago I mentioned the option /VMFS3/EnableBlockDelete? Prior to vSphere 6.0, if you looked at this option and read the description it said, “Enable VMFS block delete.” If you look at it in vSphere 6, it now looks like:

 Path: /VMFS3/EnableBlockDelete
 Type: integer
 Int Value: 0
 Default Int Value: 0
 Min Value: 0
 Max Value: 1
 String Value:
 Default String Value:
 Valid Characters:
 Description: Enable VMFS block delete when UNMAP is issued from guest OS

Interesting! The description has changed! You may also note it is still disabled by default (default value is 0).
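
If you want to check the current value from a script rather than by eye, something like the following sketch should work. It assumes the field layout shown in the listing above and that esxcli is available in the shell you run it from; the function name current_int_value is just something I made up for this example.

```shell
#!/bin/sh
# Pull the current value out of `esxcli system settings advanced list` output.
current_int_value() {
    # Reads the listing on stdin and prints the number after "Int Value:".
    # Anchoring after leading whitespace skips the "Default Int Value:" line.
    sed -n 's/^[[:space:]]*Int Value:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

if command -v esxcli >/dev/null 2>&1; then
    esxcli system settings advanced list -o /VMFS3/EnableBlockDelete | current_int_value
else
    echo "esxcli not found; run this on the ESXi host" >&2
fi
```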

If you enable this option, ESXi will translate UNMAPs that guest OSes issue to a virtual disk down to the array so that the space can be reclaimed. Furthermore, the virtual disk will be shrunk by the amount of space reclaimed if the virtual disk is thin. From my testing it seems that only a thin virtual disk supports this guest UNMAP functionality, but I am not sure whether that is a VMware restriction or the guest OS simply declining to UNMAP virtual disks that are “thick,” since it cannot tell that the underlying storage is actually thin. Definitely need to do some research here. So for the purposes of this post I am going to assume that thin virtual disks are required. Which brings us to the requirements…

Requirements

Please note!! The following list is NOT FROM VMWARE! This is my observation–I am in the middle of vetting out the official behavior, requirements, architecture with VMware now. So refer to this post as a “hey, look what I found!” anecdotal post. Look for an upcoming post with a lot more details and hopefully official information.

Anecdotal, seemingly required, requirements:

I tested with two different operating systems. RHEL 6.5 and Windows 2012 R2.

A quick note…

Before I continue on, let’s be clear about what the EnableBlockDelete option actually does. It does NOT enable Guest OSes to execute UNMAP; that is always enabled in vSphere 6. Even if EnableBlockDelete is not enabled, UNMAP run in a guest OS will work and the virtual disk will be shrunk. So read the description again:

“Enable VMFS block delete when UNMAP is issued from guest OS”

This option does not control whether UNMAP is allowed in the guest. Instead, it allows ESXi to recognize that the virtual disk was shrunk by UNMAP and then issue UNMAP to the underlying storage. So essentially if you want the guest to UNMAP the space the whole way down to the array, enable this option. When it is enabled, you will see the VAAI Delete counter increment in esxtop, which shows that something happened, besides of course the space being reclaimed on the array.

RHEL 6.5

So I didn’t really have any luck getting this to work on Linux. I tried the normal things that work great on RDMs.

UPDATE: So after I wrote this but before I published, after some email exchanges I was pointed to this KB article by Cormac Hogan that explains why Linux doesn’t work

UPDATE 2: Linux now works with ESXi 6.5!!

RHEL 6.x+ with the ext4 filesystem (I haven’t looked into XFS yet) offers two options (that I am aware of) for reclaiming space.

  1. Mount the filesystem with the discard option: mount -t ext4 -o discard /dev/sdc /mnt/UNMAP. This will make Linux automatically issue UNMAP when files are deleted from the filesystem.
  2. Use the command fstrim. Fstrim is a command that will reclaim deadspace across a directory or entire filesystem on demand–this does not require the discard option to be set, but is compatible with filesystems that do have it enabled.

Both of these options work with RDMs great. I could not get either to work with virtual disks though. After some reading, I suspect RHEL is using TRIM not UNMAP (the name fstrim seems to be a blatant hint 🙂 ), and I am guessing that TRIM may not be supported by this new feature. RHEL literature references UNMAP quite a bit, but maybe it is just being used in a colloquial sense to refer to space reclamation?
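
One way to see what the guest thinks before trying either option is to look at the discard limit the kernel publishes for the device. The sketch below assumes the standard sysfs attribute /sys/block/<dev>/queue/discard_max_bytes, where a value of 0 generally means the kernel will not send discards to that device. The device name and mount point are the example values from above, and discard_supported is a helper name I made up.

```shell
#!/bin/sh
# Report whether the kernel will send discards (UNMAP/TRIM) to a block device.
# Usage: discard_supported /sys/block/sdc/queue/discard_max_bytes
discard_supported() {
    limit_file="$1"
    [ -r "$limit_file" ] || return 2      # device or attribute not present
    [ "$(cat "$limit_file")" -gt 0 ]      # 0 means discard is not supported
}

DEV="${DEV:-sdc}"           # example device from the mount command above
MNT="${MNT:-/mnt/UNMAP}"    # example mount point from the mount command above

if discard_supported "/sys/block/$DEV/queue/discard_max_bytes"; then
    echo "$DEV accepts discards; reclaim on demand with: fstrim -v $MNT"
else
    echo "$DEV does not advertise discard support; fstrim and -o discard will not reclaim space"
fi
```

On an RDM I would expect the first message, and on a virtual disk in this release the second, which lines up with the "Operation not supported" style of failure from fstrim.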

 

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2112333

In short, the SPC version in ESXi VMDKs is too old. So hopefully this will be fixed in the future.

Windows Server 2012 R2

Okay more in my wheelhouse! Yes…I am a Windows guy, deal with it.

Getamac
I like his tie.

Windows I could get to work, mostly.

UPDATE 1: This also works with automatic UNMAP but you need the allocation unit to be 32 K or 64 K for NTFS. See this post Allocation Unit Size and Automatic Windows In-Guest UNMAP on VMware

UPDATE 2: As of ESXi 6.5 P1 you no longer even need the right allocation unit. See this post: In-Guest UNMAP Fix in ESXi 6.5 Part I: Windows

Windows 2012 R2 and Windows 8 introduced UNMAP support to reclaim space from NTFS volumes. Hyper-V has been able to UNMAP through guests for a while because of this. In addition to UNMAP support, Microsoft redesigned the Windows “defrag” utility to be smarter about SSDs and thinly-provisioned volumes. As you might be aware, defrag operations on an SSD (and often just in general on a VM) are useless and possibly deleterious (unnecessary write amplification, space bloat etc.).

Defrag is now a utility called “Optimize Drives.”

optimizedrives

The utility allows you to schedule operations on a given volume, and it will intelligently decide what type of operation to do depending on the device type. Something like defrag for HDDs, TRIM for SSDs (I believe), and UNMAP for thinly-provisioned volumes.

As you can see, some of the drives are recognized as SSDs and others as thin provisioned volumes. Accordingly, thin virtual disks show up as “thin provisioned” and eagerzeroedthick/zeroedthick virtual disks show up as “Solid state drive.” So, let’s run through an example.

First, ensure EnableBlockDelete is enabled:

enableblockdelete
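
For anyone who prefers the command line to the GUI, a minimal sketch of the same step is below. It assumes you are in a shell on the ESXi host; the run wrapper is my own helper and simply prints the command when esxcli is not available, so the script is safe to try elsewhere first. As far as I know this is a per-host setting, so it would need to be repeated on each host.

```shell
#!/bin/sh
# Enable /VMFS3/EnableBlockDelete on this host (1 = enabled, 0 = disabled).
run() {
    # Execute the command if its binary exists, otherwise just show it.
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

run esxcli system settings advanced set -o /VMFS3/EnableBlockDelete -i 1
```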

Then, let’s add a thin virtual disk to my Windows 2012 R2 virtual machine:

newvd

We can take a look at the VMDK size on the datastore and it is currently 0 KB:

vdsize_beforecopy

The array reports the volume as having 443 MB written after reduction–the base VM is on the same datastore:

gui_beforecopy

Now format the volume as NTFS and then copy some files. I am copying over a bunch of vSphere 5.5 ISOs.

newntfs

copyfiles

Let’s review the size of the virtual disk now (10 GB) and the datastore reported raw used:

vdsize_aftercopy

datastoresize_Aftercopy

I realize that I don’t need these ISOs anymore (because I am running vSphere 6.0, why do I need 5.5?!) and delete them. Make sure to delete them permanently (shift+delete) OR delete then empty the recycling bin (very important).

delete_files

Normally, the delete operation will reclaim the space automatically–but this behavior doesn’t seem to work with a virtual disk. So launch the Optimize Drives utility. Click the volume and then choose “Optimize”

optimize_before

optimize_during

The process is pretty quick and will report as OK (100% space efficiency) when done.

optimize_After

Now if we look at the VMDK size it is back to a small number–80 MB. Not the whole way back to zero, but much closer than 10 GB.

vdisksize_afterdelete

From the FlashArray GUI we see the datastore is essentially back to the 433 MB that was there before the copy–it is now 435 MB.

datastoresize_afterdelete

Done!

Final thoughts and next steps…

So really for me this has introduced far more questions than answers, but it is a great first step. I plan on doing a lot more digging in the near term so look for more posts shortly. Some of my questions:

UPDATE: Answers from VMware here!

  1. Does VMware actually support this? Not documented anywhere that I could find yet
  2. Will it only work with UNMAP? How about TRIM or WRITE SAME with the UNMAP bit set?
  3. Does it even care that UNMAP was run? Is there direct SCSI integration between the guest and ESXi? Or was it just the shrinking of the VMDK that matters–ESXi sees a VMDK shrink operation occurred and issues its own UNMAP (this is my strong suspicion at this point)?
    1. If that is the case, is that why thick virtual disks will not work? I think Windows won’t do it because it doesn’t see it as thin provisioned. If what I suspect is the case though, even if you could force Windows or some other OS to issue UNMAP to those types of disks it will never make it to the underlying storage.
  4. Who Framed Roger Rabbit? Never saw the movie.
  5. When does Windows use TRIM and when UNMAP? Same for Linux?
    1. How does the optimize drive work (when and what)?
    2. Why doesn’t auto reclaim with delete work when the optimize and the defrag CLI do? Maybe a difference of TRIM vs. UNMAP?
    3. Is it based on this fling?
  6. What other OSes work?
  7. Nested ESXi?

That and plenty more! Nothing but greenfield testing to do here! Stay tuned…

 

29 thoughts on “Direct Guest OS UNMAP in vSphere 6.0”

  1. Cody,

    I thought that the UNMAP/TRIM pass-through was only for VVols (since they are closer to RDMs than to VMDKs, and Cormac posted about it a few weeks ago) but your discovery is great.

    I tested your steps (running esxtop,u) and I can confirm that it only worked on W2012r2 on thin vmdk.
    fstrim on CentOS7 gives me “ioctl failed: Operation not supported”.

    Have you tested on VVols yet ?

    Drew, these TBs are gold, I do read them.

    1. Yeah Linux is looking for SPC-4 support for UNMAP which ESXi doesnt currently advertise. So Linux will always fail at this time. I haven’t tested it with VVols yet–this will be forthcoming though

  2. Interesting

    Windows 8 thin provisionned on an SSD … optimization unavailable
    BUT same conf with Windows 10 … optimized and trimed 🙂

    1. Interesting. I need to do some more research on what/when/why Windows does here. Seems to be a variety of behaviors. Thanks!!

  3. Can this work on vSAN? Got it working like a charm on VMFS, but the Guest (W2K12R2 in my case) Detects the ‘Media Type’ as ‘Hard Disk Drive’ on vSAN vs ‘Thin Provisioned Drive’ when the VM is running on VMFS.

    1. VSAN doesn’t support UNMAP last I saw, so this would not work in that scenario. Would need to confirm with VMware though

  4. Sorry to resurrect an old post, but I went through your steps and can not get the vmdk to shrink.

    -Using 2012 r2
    -hw 11
    -vmtools 10.0.6
    -host has EnableBlockDelete=1
    -DisableDeleteNotify=0 on guest
    -vmdk set to thin

    What am I doing wrong?

    1. Is Change Block Tracking enabled on the guest? This is the most common reason for it not working. Also how does the device in the guest show up (in disk optimizer)? thin provisioned or SSD?

  5. There is no CBT entry in the vmx file. Does it need to be enabled for this? I thought during your VMWorld session, you said it should be disabled.

    Shows as Thin provisioned in Disk Optimizer.

    1. Yeah CBT should be turned off. Since it is showing up as thin provisioned you should be good from that standpoint. How much capacity did you reclaim (or intend to), also is whatever was deleted, removed from the recycling bin?

  6. Created a 10gb partition. Copied 1.5gb to the disk, then deleted and emptied Recycle bin. Ran optimize, but vmdk still showed ~2.2GB size

    1. Hmm. That’s exactly what i’ve done in the past. Is this on Pure Storage? Or other storage? If it is on us, I would like you to open a support case with us. It is technically a VMware problem, but I know we have had a few cases around this we’ve solved so it might be quicker to start with our support. Shoot me an email at cody AT pure storage DOT com

  7. This is driving me crazy right now with my Compellent array. I’m running ESXi 6.X and VMFS.EnableBlockDelete is of course disabled by default. I emailed VMware for answers and they mistakenly even told me VMFS.EnableBlockDelete is enabled by default in ESXi 6.X.

    I was told years ago to never thin on thin. When Dell sold me this SC4020 (all flash) I was told to select Thick Lazy and have been doing that since.

    I’m in the process of breaking out my VMs into their own volumes for various reasons. It’s easier to see individual system IOPS metrics and control replication per VM this way.

    I plan on performing the same tests – create two VMs, two volumes, one thin on thick, and one thin on thin, etc. Move data in/out and see how the volumes on the SAN behave.

    According to Dell these volumes are always thin provisioned and this cannot be changed on the SAN. So I’m currently trying to confirm with them if they have other logic in place to determine and reclaim dead space within a VMDK.

    If you google around you’ll note someone did test the Compellent UNMAP to be working, however, this wasn’t a file deletion WITHIN the VMDK, but rather a deletion of the VMDK itself off of the volume.

    I’m glad I started digging into this before I carved out 200 volumes and moved my VMs in their current lazy thick state. I’ve only shuffled maybe 20 so far before deciding further research was needed.

    I was under the impression the UNMAP command in Win2012R2 would be issued during a file deletion as well as a defrag/optimization. It’s funny, because a few years ago, before Microsoft made these updates to Windows defrag, I actually disabled the defrag tool entirely via GPO for all systems running on my all flash array. I didn’t want it wearing down my flash!

    I’m waiting to hear back from a senior copilot engineer at Dell right now. I’ll post back if they say anything interesting. I’ll also post back once I start testing with my Compellent array.

    1. I’ll be curious how it turns out! The automated UNMAP in Windows does not work with this in-guest reclamation in VMware and I don’t know for sure why. Only disk analyzer does. I have a few suspicions on why and until recently I haven’t had a way to test one of my theories, which I am hoping to get to soon. It might have to do with some SCSI versioning of the Windows UNMAP command through different mechanisms. I can’t speak to how it works on Compellent, but as long as they advertise the volumes as thinly provisioned in the device VPD and the other prerequisites are met it should work. There are some enhancements coming to vSphere in future versions that should make this a bit easier to manage.

      1. Hi Cody,
        Windows Server 2016 on vSphere 6 with VVOLs and the VMFS.EnableBlockDelete=1 set also supports the unmap.

        In addition to Windows Server 2012R2 the file delete unmap also works after a couple of minutes and the space is reclaimed on the array.

        1. I got this to work too, I didnt have my allocation unit set to 64K for the NTFS–apparently this is a requirement.

    2. I couldnt get the automatic unmap to work in NTFS until I formatted the NTFS with a 64K allocation unit. Then it worked. As soon as I deleted the file–unmap was issued.

  8. Hi all,

    In regards to unmap within the VM layer (win 2012r2) This previously wasn’t working because of CBT enabled for Veeam backups.

    Is there any news of this on vSphere 6.5? I noticed unmap is now automated at a storage level but can’t see anything to do with CBT and Windows-layer reclaim

    Thanks
