While the title of this post does sound like a halfway decent Harry Potter novel, this is far more nefarious. Pure Storage, like many other vendors have a best practice around lowering the Disk.DiskMaxIOSize setting on ESXi hosts when using UEFI boot for your Windows VMs. Why? Well:
Yes not having it set in a few situations would cause BSOD. First off, why?
In normal scenarios, you may not see much of a performance difference between the standard IOPS switching-based policy and the latency one. So don’t necessarily expect that switching policies will change anything. But then again, multipathing primarily exists not for healthy states, but instead exists to protect during times of poor health. Continue reading “Latency-based PSP in ESXi 6.7 Update 1: A test drive”
This is my first (but certainly not last post) on the new path selection policy option in vSphere 6.7 Update 1. In reality, this option was introduced in the initial release of 6.7, but it was not officially supported until update 1.
So what is it? Well first off, see the official words from my colleague Jason Massae at VMware here:
Why was this PSP option introduced? Well the most common path selection policy is the NMP Round Robin. This is VMware’s built-in path selection policy for arrays that offer multiple paths. Round Robin was a great way to leverage the full performance of your array by actively using all of the paths simultaneously. Well…almost simultaneously.
In short, VMware only supported one mechanism of LUN ID addressing which is called “peripheral”. A different mechanism is generally encouraged by the SAM called “flat” especially for larger LUN IDs (like 256 and above). If a storage array used flat addressing, then ESXi would not see LUNs from that target. This is often why ESXi could not see LUN IDs greater than 255, as arrays would use flat addressing for LUN IDs that number or higher.
I wrote a blog post a year or so ago about ESXi and storage queues which has received a lot of wonderful feedback (thank you!!) and I eventually turned it into a VMworld session and other engagements:
So in the past year I have had quite a few discussions around this. And one part has always bothered me a bit.
In ESXI, there are a variety of latency metrics:
GAVG. Guest average. Sometimes called “VM observed latency”. This is the amount of time it takes for an I/O to be completed, after it leaves the VM. So through ESXi, through the SAN (or iSCSI network) and committed to the array and acknowledged back.
KAVG. Kernel average. This is how long an I/O is spending in the ESXi kernel. If this is anything but zero, there is some kind of bottleneck (often a maxed out queue)
DAVG. This is how long it takes for the I/O to be sent from host, through the SAN and to the array and acknowledged back.
Storage capacity reporting seems like a pretty straight forward topic. How much storage am I using? But when you introduce the concept of multiple levels of thin provisioning AND data reduction into it, all usage is not equal (does it compress well? does it dedupe well? is it zeroes?).
This multi-part series will break it down in the following sections:
Let’s talk about the ins and outs of these in detail, then of course finish it up with why VVols makes this so much better.
NOTE: Examples in this are given from a FlashArray perspective. So mileage may vary depending on the type of array you have. The VMFS and above layer though are the same for all. This is the benefit of VMFS–it abstracts the physical layer. This is also the downside, as I will describe in these posts.
And I wanted to upgrade my whole lab environment to it and I haven’t set up auto-deploy or update manager yet (I plan to, making all of this much easier to manage). So I wrote a quick and dirty PowerCLI script that updates to the latest patch and if the host doesn’t have any VMs on it, puts it into maintenance mode and reboots it. I will reboot the other ones as needed.
Migrating VMDKs or virtual mode RDMs to VVols is easy: Storage vMotion. No downtime, no pre-creating of volumes. Simple and fast. But physical mode RDMs are a bit different.
As we all begrudgingly admit there are still more than a few Raw Device Mappings out there in VMware environments. Two primary use cases:
Microsoft Clustering. Virtual disks can only be used for Failover Clustering if all of the VMs are on the same ESXi hosts which feels a bit like defeating the purpose. So most opt for RDMs so they can split the VMs up.
Physical to virtual. Sharing copies of data between physical and virtual or some other hypervisor is the most common reason I see these days. Mostly around database dev/test scenarios. The concept of a VMDK can keep your data from being easily shared, so RDMs provide a workaround.
Ahem, I suppose I will prove it out. The real answer is, well maybe. Depends on the array.
So debates have raged on for quite some time around performance of virtual disk types and while the difference has diminished drastically over the years, eagerzeroedthick has always out-performed thin. And therefore many users opted to not use thin virtual disks because of it.
With VMFS-6, space reclamation is now an automatic, but asynchronous process. This is great because, well you don’t have to worry about running UNMAP anymore. But since it is asynchronous (and I mean like 12-24 hours later asynchronous) you lose the instant gratification of reclamation.
So you do find yourself wondering, did it actually reclaim anything?
Besides looking at the array and seeing space reclaimed, how can I see from ESXi if my space was reclaimed?