ESXi NVMe-oF Namespace IDs, LUNs, and other Identifiers

In the world of SCSI, a storage device is generally addressed by two things:

  1. LUN (Logical Unit Number). This is a number used to address the device down a specific path to a specific array, for a specific host. So it is not really a unique number: it is not guaranteed to be unique to an array, to a host, or to a volume, and every path to a volume could use a different LUN. Think of it like a street address: 100 Maple St. There are many “100 Maple Streets”, so it needs the city, the state/province, and the country to really be meaningful. And a street name can change. So can other things. So it can usually get you what you want, but it isn’t guaranteed.
  2. Serial number. This is a globally unique identifier of the volume. It is entirely unique to that volume and it cannot be changed. It is the same for everyone and everything that uses that volume. To continue our metaphor, think of it as the GPS coordinates of the house instead of the address. It will always get you where you need to go.
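To make the difference concrete, here is a small Python sketch. The adapter names, LUN numbers, and serials are all hypothetical values for illustration only. Grouping paths by serial recovers each volume, while grouping by LUN alone can lump two different volumes together.

```python
from collections import defaultdict
from typing import NamedTuple

class Path(NamedTuple):
    """One path to a volume as a host might see it (all values hypothetical)."""
    adapter: str  # host-side adapter name
    target: int   # target port index
    lun: int      # LUN number on this particular path
    serial: str   # volume serial number reported by the array

# Hypothetical: one volume reached over three paths with two different LUNs,
# plus a second volume that happens to reuse LUN 5 on another adapter.
paths = [
    Path("vmhba1", 0, 5, "SERIAL-A"),
    Path("vmhba1", 1, 5, "SERIAL-A"),
    Path("vmhba2", 0, 7, "SERIAL-A"),
    Path("vmhba3", 0, 5, "SERIAL-B"),
]

def group_by_serial(paths):
    """Group paths by serial number, the identifier that is unique per volume."""
    volumes = defaultdict(list)
    for p in paths:
        volumes[p.serial].append(p)
    return dict(volumes)

def group_by_lun(paths):
    """Group serials by LUN alone, which can merge different volumes."""
    luns = defaultdict(set)
    for p in paths:
        luns[p.lun].add(p.serial)
    return dict(luns)

by_serial = group_by_serial(paths)
by_lun = group_by_lun(paths)
print(len(by_serial["SERIAL-A"]))  # 3 paths to one volume
print(sorted(by_lun[5]))           # ['SERIAL-A', 'SERIAL-B']: LUN 5 is ambiguous
```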

So how does this change with NVMe? Well these things still exist, but how they interact is…different.

Now, first, let me remind you that these concepts are generally vendor neutral, but how things are generated, reported, and sometimes even named varies. I am writing this from a Pure Storage perspective, so keep that in mind.

There is a new concept in NVMe-oF called a namespace. This is essentially interchangeable with a SCSI device or a volume. It is the object you present and put a file system on. In fact, on FlashArray we do not treat it any differently: creating, configuring, and so on are all the same. You create a volume. Whether it is NVMe or SCSI depends only on what host you connect it to (does that host have NVMe initiators or SCSI-based ones?).

An NVMe namespace also has a namespace ID. This is not the serial number (well, not directly). We don’t even show the ID in our UI, but we do tell a host connecting to it via NVMe-oF what the ID is. Let’s walk through presenting it to a VMware ESXi host.

So I will create a new volume:

I will connect it to an ESXi cluster:

You will see it does assign a LUN ID:

More on that in a bit. In vSphere, I click on my host and go to the adapter. You can see the NVMe namespace there:

And of course it shows up in the Devices listing.

So, two things to note. First, the EUI (extended unique identifier). It is very similar to the volume serial number, which was:

Serial:

F439F7C5A4AB425000017158

EUI:

00f439f7c5a4ab4224a9375000017158

VMware just adds two zeroes to the start and inserts the Pure Storage OUI (organizationally unique identifier), 24a937, in the middle, right after the first 14 hex digits of the serial.
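The pattern described above can be sketched in Python as a pair of conversion helpers. This is an illustration, not an official Pure Storage or VMware routine: the split point (after the 14th hex digit) and the lowercase output are inferred from the single serial/EUI pair shown in this post, and the function names are hypothetical.

```python
PURE_OUI = "24a937"  # Pure Storage OUI, as it appears inside the EUI above

def serial_to_eui(serial: str, oui: str = PURE_OUI) -> str:
    """Rebuild the 32-hex-digit EUI from a 24-hex-digit volume serial.

    Assumed layout (inferred from one example): "00" + first 14 serial
    digits + OUI + last 10 serial digits, all lowercase.
    """
    s = serial.lower()
    if len(s) != 24 or any(c not in "0123456789abcdef" for c in s):
        raise ValueError(f"expected 24 hex digits, got {serial!r}")
    return "00" + s[:14] + oui + s[14:]

def eui_to_serial(eui: str, oui: str = PURE_OUI) -> str:
    """Reverse the mapping: drop the leading "00" and the embedded OUI."""
    e = eui.lower()
    if len(e) != 32 or not e.startswith("00") or e[16:22] != oui:
        raise ValueError(f"unexpected EUI layout: {eui!r}")
    return (e[2:16] + e[22:]).upper()

print(serial_to_eui("F439F7C5A4AB425000017158"))  # 00f439f7c5a4ab4224a9375000017158
```

Going from EUI back to serial is handy when you start from an ESXi device listing and want to find the matching FlashArray volume.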

But what about that wildly large LUN ID? Where does that come from? The LUN ID on the array, as you might note, is 253. VMware says it is 94551. Who is correct?

Well, neither, really. There isn’t really a concept of a LUN ID in NVMe-oF. Discovery is done differently, but both products still use LUN IDs to track storage in some way (ESXi uses it for path identification; FlashArray uses it to reserve a volume slot for a host).

ESXi uses the namespace ID for the LUN ID. So where does it get the namespace ID? Let’s walk through how a host sees a new namespace.

First, there are NVMe controllers, in my case four (one per target port):

This is more easily seen via esxcli:

What is helpful in esxcli is the mapping to the controller number as well, because if you watch the ESXi vmkernel.log file you will see some NVMe async events when the namespace is connected:

2021-01-21T23:51:19.185Z cpu41:2097726)WARNING: NVMEIO:2423 Controller 257 receives async event: type 2, info 0, log page ID 4.
2021-01-21T23:51:19.185Z cpu30:2101456)WARNING: NVMEIO:2423 Controller 261 receives async event: type 2, info 0, log page ID 4.
2021-01-21T23:51:19.246Z cpu43:2097953)WARNING: NVMEIO:2423 Controller 259 receives async event: type 2, info 0, log page ID 4.
2021-01-21T23:51:19.246Z cpu30:2101456)WARNING: NVMEIO:2423 Controller 263 receives async event: type 2, info 0, log page ID 4.

You can see each controller sent an async event, which in NVMe terms means that “hey ESXi, something changed, check it out!”.
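The three numbers in those lines map to fields of the NVMe Asynchronous Event completion: event type 2 is a Notice, notice information 0 is Namespace Attribute Changed, and log page 4 is the Changed Namespace List. The Python sketch below decodes a line into those names. The regex is written against the log format shown above only (an assumption; other ESXi builds may word it differently), and the lookup tables cover just a small subset of the NVMe specification values.

```python
import re

# Matches the NVMEIO async-event lines shown above (format assumed from this sample).
ASYNC_EVENT_RE = re.compile(
    r"Controller (?P<ctrl>\d+) receives async event: "
    r"type (?P<type>\d+), info (?P<info>\d+), log page ID (?P<lpid>\d+)"
)

# Small subset of NVMe base-specification values, enough to decode these lines.
EVENT_TYPES = {0: "Error status", 1: "SMART / Health status", 2: "Notice"}
NOTICE_INFO = {0: "Namespace Attribute Changed"}
LOG_PAGES = {4: "Changed Namespace List"}

def decode_async_event(line: str):
    """Return a dict describing an async-event log line, or None if it doesn't match."""
    m = ASYNC_EVENT_RE.search(line)
    if m is None:
        return None
    etype, info, lpid = int(m["type"]), int(m["info"]), int(m["lpid"])
    # Event information is interpreted per event type; only Notice is decoded here.
    if etype == 2:
        info_name = NOTICE_INFO.get(info, f"info {info}")
    else:
        info_name = f"info {info}"
    return {
        "controller": int(m["ctrl"]),
        "type": EVENT_TYPES.get(etype, f"type {etype}"),
        "info": info_name,
        "log_page": LOG_PAGES.get(lpid, f"log page {lpid}"),
    }

line = ("2021-01-21T23:51:19.185Z cpu41:2097726)WARNING: NVMEIO:2423 Controller 257 "
        "receives async event: type 2, info 0, log page ID 4.")
print(decode_async_event(line))
```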

Now, a nice side effect of this is that you no longer need to do a rescan: NVMe ACL changes are communicated automatically. So bye-bye, slow, expensive SCSI bus rescans. This behavior does exist in SCSI, and in fact we implement it via what is called a Unit Attention, but support for this SCSI bi-directional communication is spotty and inconsistent enough that rescans are often still required for SCSI. That said, this behavior is well implemented in SCSI vVols, because there the behavior is specifically defined.

The async event above told ESXi to read a specific log page for changes: log page ID 4 is the Changed Namespace List, which means there are namespace changes. So ESXi discovers the new ones, and you will see something like this:

2021-01-21T23:51:19.247Z cpu25:2098246)NVMEDEV:4244 Discover namespace on controller 263
2021-01-21T23:51:19.247Z cpu45:2098271)NVMEDEV:4244 Discover namespace on controller 259
2021-01-21T23:51:19.185Z cpu25:2098246)NVMEDEV:4244 Discover namespace on controller 261
2021-01-21T23:51:19.185Z cpu45:2098271)NVMEDEV:4244 Discover namespace on controller 257

Then it finds the new namespace on each controller.

2021-01-21T23:51:19.185Z cpu25:2098246)NVMEDEV:3565 Controller 261, construct namespace 94552
2021-01-21T23:51:19.185Z cpu45:2098271)NVMEDEV:3565 Controller 257, construct namespace 94552
2021-01-21T23:51:19.247Z cpu45:2098271)NVMEDEV:3565 Controller 259, construct namespace 94552
2021-01-21T23:51:19.247Z cpu25:2098246)NVMEDEV:3565 Controller 263, construct namespace 94552

It found the new namespace! And its ID. So what is that ID? That ID is calculated and presented by the FlashArray. Our namespace IDs are calculated from the volume serial number: the last four bytes, in fact. So the serial is again:

F439F7C5A4AB425000017158

The last eight hex digits, 00017158, are the last four bytes (this is hexadecimal, so each digit is half a byte). 0x00017158 in decimal is 94552: the namespace ID above! But you might notice the LUN ID is not 94552. It is instead 94551. Huh?

The reason for this is that there isn’t really a correlation between our LUN ID and VMware’s; in fact, ESXi doesn’t know about ours. So VMware is free to choose what that value is. In NVMe, a namespace ID of zero is not allowed: it must be one or higher. In SCSI, LUN IDs can be zero (LUN 0 was traditionally quite common for boot volumes). So VMware maps NVMe namespace IDs to LUN IDs, and the corresponding LUN ID is one lower. This is probably to allow for the possibility of LUN 0, even though zero is not a valid NVMe namespace ID.
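The arithmetic in the last few paragraphs fits in two small Python functions. This is a sketch of the behavior described in this post (FlashArray NSID = last four bytes of the serial; ESXi LUN = NSID minus one), not vendor code, and the function names are hypothetical.

```python
def serial_to_nsid(serial: str) -> int:
    """Namespace ID as described above: the last four bytes (8 hex digits) of the serial."""
    nsid = int(serial[-8:], 16)
    if nsid < 1:
        # NVMe does not allow a namespace ID of 0.
        raise ValueError("serial does not yield a valid NVMe namespace ID")
    return nsid

def nsid_to_esxi_lun(nsid: int) -> int:
    """LUN ID that ESXi shows for a namespace: one lower than the NSID."""
    if nsid < 1:
        raise ValueError("NVMe namespace IDs start at 1")
    return nsid - 1

serial = "F439F7C5A4AB425000017158"
nsid = serial_to_nsid(serial)
print(nsid, nsid_to_esxi_lun(nsid))  # 94552 94551
```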

You can see the paths generated with the LUN ID of 94551:

2021-01-21T23:51:19.247Z cpu23:2097683)HPP: HppClaimPath:3718: ALUA target (vmhba65:C0:T1:L94551)
2021-01-21T23:51:19.186Z cpu23:2097683)HPP: HppClaimPath:3718: ALUA target (vmhba65:C0:T0:L94551)
2021-01-21T23:51:19.185Z cpu11:2097680)HPP: HppClaimPath:3718: ALUA target (vmhba66:C0:T0:L94551)
2021-01-21T23:51:19.247Z cpu11:2097680)HPP: HppClaimPath:3718: ALUA target (vmhba66:C0:T1:L94551)

So this means a few things. First, the LUN ID is ALWAYS the same for a given namespace. The overall paths are unique, of course, but the trailing LUN ID will always be the same in ESXi; the slot on the FlashArray host connection doesn’t matter.
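You can check this against the HPP claim lines above with a short Python sketch that parses the runtime names (adapter:channel:target:LUN). The regex assumes the `vmhbaN:Cx:Ty:Lz` form seen in this log, and the sample lines have their timestamps trimmed for brevity.

```python
import re

# Runtime path name as it appears in the HPP log lines above: adapter:channel:target:LUN.
RUNTIME_NAME_RE = re.compile(
    r"(?P<adapter>vmhba\d+):C(?P<channel>\d+):T(?P<target>\d+):L(?P<lun>\d+)"
)

def parse_runtime_name(text: str):
    """Extract (adapter, channel, target, lun) from the first runtime name in text."""
    m = RUNTIME_NAME_RE.search(text)
    if m is None:
        raise ValueError(f"no runtime name in {text!r}")
    return m["adapter"], int(m["channel"]), int(m["target"]), int(m["lun"])

log_lines = [
    "HPP: HppClaimPath:3718: ALUA target (vmhba65:C0:T1:L94551)",
    "HPP: HppClaimPath:3718: ALUA target (vmhba65:C0:T0:L94551)",
    "HPP: HppClaimPath:3718: ALUA target (vmhba66:C0:T0:L94551)",
    "HPP: HppClaimPath:3718: ALUA target (vmhba66:C0:T1:L94551)",
]
paths = [parse_runtime_name(line) for line in log_lines]
# Every path is distinct, but all of them end in the same LUN ID.
print(len(set(paths)), {p[3] for p in paths})  # 4 {94551}
```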

But the number itself still isn’t globally unique. Global uniqueness requires looking at the full serial, which is what the EUI carries. So, while the LUN ID in ESXi is a better option for mapping today than it is with SCSI, the EUI is still the better option.

These operations in NVMe are FAST. See the below unedited GIF of an NVMe namespace resize.
