I wrote a blog post a year or so ago about ESXi and storage queues which has received a lot of wonderful feedback (thank you!!) and I eventually turned it into a VMworld session and other engagements:
So in the past year I have had quite a few discussions around this. And one part has always bothered me a bit.
In ESXI, there are a variety of latency metrics:
GAVG. Guest average. Sometimes called “VM observed latency”. This is the amount of time it takes for an I/O to be completed, after it leaves the VM. So through ESXi, through the SAN (or iSCSI network) and committed to the array and acknowledged back.
KAVG. Kernel average. This is how long an I/O is spending in the ESXi kernel. If this is anything but zero, there is some kind of bottleneck (often a maxed out queue)
DAVG. This is how long it takes for the I/O to be sent from host, through the SAN and to the array and acknowledged back.
Ahem, I suppose I will prove it out. The real answer is, well maybe. Depends on the array.
So debates have raged on for quite some time around performance of virtual disk types and while the difference has diminished drastically over the years, eagerzeroedthick has always out-performed thin. And therefore many users opted to not use thin virtual disks because of it.
I posted a few months back about ESXi queue depth limits and how it affects performance. Just recently, Pure Storage announced our upcoming support for vSphere Virtual Volumes. So, this begs the question, what changes with VVols when it comes to queuing? In a certain view, a lot. But conceptually, actually very little. Let’s dig into this a bit more.
So I am in the middle of updating my best practices guide for vSphere on FlashArray and one of the topics I am looking into providing better guidance around is ESXi queue management. This breaks down to a few things:
Array volume queue depth limit
Datastore queue depth limit
Virtual Machine vSCSI Adapter queue depth limit
Virtual Disk queue depth limit
I have had more than a few questions lately about handling this–either just general queries or performance escalations. And generally from what I have found it comes down to fundamental understanding of how ESXi queuing works. And how the FlashArray plays with it. So I put a blog post together of a use case and walking through solving a performance problem. Explaining concepts along the way.
This is a simple example to explain how queuing works in ESXi
Mileage will vary depending on your workload and configuration
This workload is targeted specifically to make relationships easier to understand
PLEASE do not make changes in your environment at least until you read my conclusion at the end. And frankly not without direct guidance from VMware support.
I am sorry, this is a long one. But hopefully informative!
If you prefer a video, here is my 1 hr VMworld session that goes into depth on what I write below:
In the latest GA release of Purity, version 4.1.5, there have been some nice improvements in how we handle host connectivity/balance reporting. There is a new CLI command to monitor the balance of I/O from a host standpoint as well as how we report/display host connectivity in the FlashArray web GUI. Let’s take a look at these enhancements. In Part 1, I will talk about the CLI enhancement.
Quick post here. I am working on updating some documentation and I wanted to add a bit more color to a section on changing the IO Operations limit for ESXi NMP Round Robin devices. The Pure Storage recommendation is to change this value to one from the default of 1,000. Therefore, ESXi will switch logical paths after each I/O instead of 1,000. There are some performance benefits to this and some evidence for improved failover time (in the case of a path failure) with this setting. I am not going to get into the veracity of these benefits right now. What I wanted to share here is that there is no doubt changing this to 1 makes a big difference to I/O balance on the array itself. Continue reading “ESXi IO Operations Limit Parameter and IO Balance”
I posted a week or so ago about the ESXCLI UNMAP process with vSphere 5.5 on the Pure Storage FlashArray here and came up with the conclusion that larger block counts are highly beneficial to the UNMAP process. So the recommendation was simply use a larger block count than the default to speed up the UNMAP operation, something sufficiently higher than the default of 200 MB. I received a few questions about a more specific recommendation (and had some myself) so I decided to dive into this a little deeper to see if I could provide some guidance that was a little more concrete. In the end a large block count is perfectly fine–if you want to know more details–read on!