Demystifying IO Operation Readouts in ESXi

This doesn’t come up very often these days, but every once in a while it does, and every time I look for documentation on it there never seems to be any. After writing this post I did find a forum post where my friend Drew answers it too. Anyway, let’s quickly explain the situation.

Most block vendors these days tell customers to change the Round Robin path switching frequency for their storage in ESXi from the default of 1,000 I/Os to 1. This makes ESXi switch logical paths for a given device after every I/O instead of after every 1,000. The reason I say this doesn’t come up much anymore is that in modern versions of ESXi (6.0 express patch+, 6.5 U1+ and 6.7+) we (Pure) have rules in ESXi that make sure this is set by default without any user configuration. Many other vendors do as well.
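To make the effect of that limit concrete, here is a minimal sketch of the switching behavior in Python. It is a simplified model, not the actual NMP implementation: the helper name `assign_paths` is hypothetical, and it assumes the path simply advances after every `iops_limit` I/Os.

```python
def assign_paths(paths, num_ios, iops_limit):
    """Return the path each I/O would use under a simplified round-robin model.

    Hypothetical helper: the path changes after every `iops_limit` I/Os.
    """
    if not paths:
        raise ValueError("at least one path is required")
    if iops_limit < 1:
        raise ValueError("iops_limit must be at least 1")
    return [paths[(i // iops_limit) % len(paths)] for i in range(num_ios)]


# Working paths copied from the device listing later in this post.
paths = [
    "vmhba3:C0:T3:L253",
    "vmhba3:C0:T2:L253",
    "vmhba1:C0:T4:L253",
    "vmhba1:C0:T3:L253",
]

# With a limit of 1, eight I/Os touch all four paths.
print(len(set(assign_paths(paths, 8, 1))))        # 4
# With a limit of 1,000, the first 1,000 I/Os all stay on one path.
print(len(set(assign_paths(paths, 1000, 1000))))  # 1
```

With a limit of 1 the load spreads across every path almost immediately, while a limit of 1,000 keeps a burst of I/O on one path before moving on.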

Anyway, when you use VMware tools to check whether a device is configured properly, the output can read differently depending on how the setting was applied.

So if I run the following command:

esxcli storage nmp device list

I see two devices that have slightly different multipathing configurations (or so it seems):

naa.624a93705ee86996f8334fa000011012
   Device Display Name: PURE Fibre Channel Disk (naa.624a93705ee86996f8334fa000011012)
   Storage Array Type: VMW_SATP_ALUA
   Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=0,TPG_state=AO}{TPG_id=1,TPG_state=AO}}
   Path Selection Policy: VMW_PSP_RR
   Path Selection Policy Device Config: {policy=iops,iops=1,bytes=10485760,useANO=0; lastPathIndex=1: NumIOsPending=0,numBytesPending=0}
   Path Selection Policy Device Custom Config:
   Working Paths: vmhba3:C0:T3:L253, vmhba3:C0:T2:L253, vmhba1:C0:T4:L253, vmhba1:C0:T3:L253
   Is USB: false

naa.624a937073e940225a2a52bb0002b7c5
   Device Display Name: PURE Fibre Channel Disk (naa.624a937073e940225a2a52bb0002b7c5)
   Storage Array Type: VMW_SATP_ALUA
   Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=1,TPG_state=AO}{TPG_id=0,TPG_state=AO}}
   Path Selection Policy: VMW_PSP_RR
   Path Selection Policy Device Config: {policy=rr,iops=1,bytes=10485760,useANO=0; lastPathIndex=0: NumIOsPending=0,numBytesPending=0}
   Path Selection Policy Device Custom Config:
   Working Paths: vmhba4:C0:T2:L1, vmhba4:C0:T1:L1, vmhba2:C0:T2:L1, vmhba2:C0:T1:L1
   Is USB: false

Notice a difference? There isn’t much of one except for this part: policy=iops versus policy=rr. But both say iops=1. What does that mean and how is it different?
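If spotting that by eye is tedious, the two "Path Selection Policy Device Config" strings can be compared programmatically. The following is a rough sketch in Python; the function name `parse_psp_config` is hypothetical, and it assumes every setting appears as a simple `key=value` pair, as it does in the output above.

```python
import re


def parse_psp_config(config):
    """Turn a PSP device config string into a dict of key/value strings."""
    return dict(re.findall(r"(\w+)=(\w+)", config))


# The two config strings from the esxcli output above.
iops_dev = ("{policy=iops,iops=1,bytes=10485760,useANO=0; "
            "lastPathIndex=1: NumIOsPending=0,numBytesPending=0}")
rr_dev = ("{policy=rr,iops=1,bytes=10485760,useANO=0; "
          "lastPathIndex=0: NumIOsPending=0,numBytesPending=0}")

a = parse_psp_config(iops_dev)
b = parse_psp_config(rr_dev)

# Keep only the keys whose values differ between the two devices.
differing = {key: (a[key], b.get(key)) for key in a if a[key] != b.get(key)}
print(differing)
```

Besides `policy`, the only other field that differs is `lastPathIndex`, which just reflects which path was used last and is not a configuration setting.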

Well let’s look at the devices in another way.

[root@esxi-01:~] esxcli storage nmp psp roundrobin deviceconfig get -d naa.624a93705ee86996f8334fa000011012
   Byte Limit: 10485760
   Device: naa.624a93705ee86996f8334fa000011012
   IOOperation Limit: 1
   Latency Evaluation Interval: 0 milliseconds
   Limit Type: Iops
   Number Of Sampling IOs Per Path: 0
   Use Active Unoptimized Paths: false

[root@esxi-01:~] esxcli storage nmp psp roundrobin deviceconfig get -d naa.624a937073e940225a2a52bb0002b7c5
   Byte Limit: 10485760
   Device: naa.624a937073e940225a2a52bb0002b7c5
   IOOperation Limit: 1
   Latency Evaluation Interval: 0 milliseconds
   Limit Type: Default
   Number Of Sampling IOs Per Path: 0
   Use Active Unoptimized Paths: false

Notice the Limit Type property. One says Iops and the other says Default. This is a little clearer.
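For checking many devices, the `deviceconfig get` output is easy to turn into a dictionary. This is a small sketch in Python, assuming the output keeps the `Key: value` layout shown above; `parse_deviceconfig` is a hypothetical helper, and it expects only the command output, not the shell prompt line.

```python
def parse_deviceconfig(output):
    """Parse 'Key: value' lines from the deviceconfig get output into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no 'Key: value' separator
            fields[key.strip()] = value.strip()
    return fields


# Output for the second device, copied from above.
sample = """\
   Byte Limit: 10485760
   Device: naa.624a937073e940225a2a52bb0002b7c5
   IOOperation Limit: 1
   Latency Evaluation Interval: 0 milliseconds
   Limit Type: Default
   Number Of Sampling IOs Per Path: 0
   Use Active Unoptimized Paths: false
"""

config = parse_deviceconfig(sample)
print(config["Limit Type"], config["IOOperation Limit"])
```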

What if I create a custom (a.k.a. user) SATP rule and then provision a new device?

[root@esxi-01:~] esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "PURE" -M "FlashArray" -P "VMW_PSP_RR" -O "iops=10" -e "FlashArray SATP Rule"
[root@esxi-01:~] esxcli storage nmp psp roundrobin deviceconfig get -d naa.624a93705ee86996f8334fa00002aff4
   Byte Limit: 10485760
   Device: naa.624a93705ee86996f8334fa00002aff4
   IOOperation Limit: 10
   Latency Evaluation Interval: 0 milliseconds
   Limit Type: Default
   Number Of Sampling IOs Per Path: 0
   Use Active Unoptimized Paths: false

Note that I made the IO Operation limit 10 instead of 1 so I would know the device hit that rule. It still says Default.

So: if you see Default (or policy=rr in the device list), you know that device was configured by a SATP rule, whether a default one or a custom one. If you see Iops (policy=iops), you know someone manually changed that device.
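That rule of thumb can be written down as a tiny lookup. This Python sketch only encodes the observation from this post; the function name `describe_origin` and the wording of the returned strings are my own, and any other limit type is reported as not covered rather than guessed at.

```python
def describe_origin(limit_type):
    """Map a Limit Type (or policy= value) to how the device was likely configured."""
    normalized = limit_type.strip().lower()
    if normalized in ("default", "rr"):
        return "configured by a SATP rule (default or custom)"
    if normalized == "iops":
        return "changed manually on the device"
    # Assumption: other values are outside what this post tested.
    return "not covered by this post"


for value in ("Default", "rr", "Iops"):
    print(value, "->", describe_origin(value))
```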
