Sorry I had to throw up in my mouth a bit.
Anyways. In general I am not a fan of using in-guest iSCSI for VMware environments (or really any virtualized environment), but the question does come up. And I find a surprising number of customers using it in secret. So what of it? Why do I hate it? Why should one use it? Are there reasons to?
So let me preface this post with this: is in-guest iSCSI inherently evil? No, of course not. Are there situations where it makes sense? Yes, and I go into them below. The post is focused on one use case: I need to provision storage from the FlashArray to my VMware VMs. What is the best method? I expect this to be a living document, so please let me know if there are more downsides or upsides and I will add them in.
Let’s take a look.
So first let’s look at the options for ESXi storage. I am going to focus on external block here, as this is a conversation around iSCSI. vSAN vs. file vs. external block is a larger conversation that I will mark, for the purposes of this conversation, as out of scope.
So what are the options? Well:
- VMFS virtual disks. Files on a file system on a block device. Various formats that allow for efficiency or performance or something in between. Natively created and managed in VMware, but the block device does need to be provisioned somehow externally first.
- Physical RDM. A block device that is presented to the ESXi host which is then presented up to the VM directly. Looks just like a block device presented to a physical server, meaning its VPD information is not changed by VMware. It consumes a slot in the SCSI adapter for the VM, but it cannot be managed natively in VMware, just passed through. So resizes, creation, etc. must be done externally.
- Virtual RDM. Somewhat in the middle of the above options, but once again a block volume presented up to VM via ESXi. I/Os go through the vSCSI layer so things like snapshots or Storage vMotion are possible, but still has most of the management limitations of a physical RDM.
- vVol. A lot of advantages here. Best of both worlds mostly between the above, but it does require a lot of vendor support and has a few more moving parts. Overall though extremely attractive.
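For orientation, here is a hedged sketch of how these options look from an ESXi shell. The commands are standard esxcli/vmkfstools; the datastore and VM names are made up for illustration:

```shell
# Sketch: inspecting the block storage options from an ESXi shell.
# Paths and names (datastore1, myvm) are illustrative.

# List the block devices presented to this ESXi host
# (these back VMFS datastores, RDMs, or vVols)
esxcli storage core device list

# List the mounted VMFS datastores
esxcli storage filesystem list

# Check whether a VM's disk descriptor is an RDM --
# an RDM descriptor points at a raw device rather than a flat file
vmkfstools --queryrdm /vmfs/volumes/datastore1/myvm/myvm.vmdk
```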
With all of that being said, what is in-guest iSCSI?
Well, first off, iSCSI is not a storage choice. All of the above options can be used with iSCSI, just like they can all be used with Fibre Channel. It is a connectivity choice, not really a “storage” choice. What we are really talking about here is: who is the initiator? A storage array is the target. The initiator is whoever is on the receiving end of that storage. In the physical server world, this is a Windows host. Or a Linux host. Or whatever.
In a VMware world the initiator is ESXi. ESXi is installed on a physical box with HBAs. Those HBAs have initiator addresses (FC or iSCSI or FCoE or NVMe-oF etc). The array then presents storage to that initiator. ESXi then uses it in one of the above ways.
In-guest iSCSI is another way of saying that the OS inside of the VM is the initiator directly. The array is not presenting the storage to initiators belonging to ESXi, but instead to the OS in the VM. Most OSes (well, pretty much all) do not need a physical HBA to present an iSCSI initiator. Just like ESXi, they offer a software initiator that speaks iSCSI over the TCP/IP network. That’s almost literally the reason people use iSCSI: no special physical network is needed to access block storage.
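As a concrete illustration, here is a minimal sketch of what bringing up an in-guest software initiator looks like on a Linux VM with open-iscsi. The portal IP and target IQN are placeholders; the real values come from your array:

```shell
# Sketch: configuring an in-guest software iSCSI initiator on a Linux VM.
# The portal IP and IQN below are illustrative placeholders.

# Install the open-iscsi initiator tools (Debian/Ubuntu shown)
apt-get install -y open-iscsi

# Discover targets offered by the array's iSCSI portal
iscsiadm -m discovery -t sendtargets -p 192.168.100.10:3260

# Log in to the discovered target
iscsiadm -m node -T iqn.2010-06.com.example:array.target0 \
  -p 192.168.100.10:3260 --login

# The new volume now shows up like any local block device
lsblk
```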
Can you do in-guest FC? Yeah. It’s called NPIV and it’s a horrible thing (in this scenario). Don’t believe me? Fine. It’s not a battle I have any interest in fighting. You’ll eventually agree.
The Dark Side of in-Guest iSCSI
Okay. So what’s the harm? I have a physical server and my storage is fine there. I have a Linux box with iSCSI configured and it rocks!
First off, virtualize that you goofball.
Furthermore, we have a layer of abstraction here that can get you into trouble. Let me count the ways.
- Multipathing. When you present a block device to ESXi, multipathing is configured from ESXi to the array. It has physical NICs or HBAs and a very easy-to-use multipathing system with Round Robin (or whatever). ESXi reports very clearly if something is wrong. Also, from inside the guest, how do you even know if it is multipathed? Even if the I/O is going through two NICs (or even eight), how do you know those are not going through the same physical NIC on the underlying ESXi host? Without a thorough understanding of the whole stack it is tough. Part of the whole point of virtualization is that the OS should not have to care about that stuff.
- Device loss. On that note, what happens if an ESXi host somehow loses its network connection to that array? Someone makes a firewall or network change? Or, more likely, someone makes a vSwitch alteration on the ESXi host that blocks access to that VLAN or network? The VMware admin doesn’t know that the storage is even being used by that VM. The storage bypasses ESXi entirely. It bypasses the vSCSI adapter entirely. It is only known by the array and the OS inside of the VM. So the VMware admin could make a change that cuts off storage access. You might argue that a network admin could cut off storage access to an ESXi host too! True. But that would impact both either way. And with vSphere HA, if access was accidentally cut for one host on the network, vSphere HA would reboot the VM on a host that wasn’t affected. With in-guest iSCSI, ESXi has no knowledge anything was cut off. No reboot. Maybe the boot object of the VM is on different storage, so the VM looks to be running fine. In short, there is a lot of network uncertainty.
- Backup. A common method for backing up VMs is VADP, via a provider like Rubrik, Veeam, Commvault, Cohesity, etc. They talk to VMware and ask, “hey, what storage does this VM have?” Well, if the storage is connected via in-guest iSCSI, that storage is not known by VMware. As far as VMware sees it, the VM is being fully backed up. So it can be easily missed.
- Performance. What NIC on ESXi is that traffic being routed through? It could be the management network. It could be a 1 Gb link, not a 40 Gb one. There could be Network I/O Control configured in ESXi that throttles it (the admin may not even know it is storage traffic).
- Jumbo Frames. Are they configured throughout? Are they not? One more layer to that question.
- Network control. Yes, you might gain storage control, but you pay for it in network control. As mentioned above, you have zero insight into the underlying network, so if something changes somewhere, you still need to interact with the VMware admin.
- Best Practices. This is where it gets ugly. What OS is it? Are there specific best practices for it? Settings? Generally this is taken care of at the ESXi layer. For VMs using the “traditional” storage options, installing VMware Tools and using PVSCSI is good enough. For in-guest iSCSI you need to understand the recommendations for that specific OS. And how to configure multipathing. And should you even configure multipathing? Maybe it is already handled at ESXi?
- Support. Is this OS or OS version even supported by your storage vendor? With ESXi the whole point is that it doesn’t matter. We support ESXi, VMware supports what runs in it.
- Host count. Each VM is now a host on your array. You are managing a lot more hosts.
- vMotion. If the VM gets moved elsewhere will it have access to the storage? If the storage is “known” by VMware (VMFS, RDM, vVol) it will not move it to another host if that host does not see that storage. This is not the case with in-guest. It could move to another host, cluster or even datacenter.
- Storage vMotion. If the move is not only a host move but also a storage move, that move will not include the in-guest storage. You have now lost the ability to non-disruptively move that storage volume.
- Disaster Recovery. Is that VM replicated? If so, is it protected by some kind of DR orchestration like SRM? Well, SRM has no idea about in-guest iSCSI. Even if you script it, you need to manually make sure that if you add or remove a disk, the script is updated.
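Several of these downsides come down to visibility, so if you do run in-guest iSCSI, it is worth checking path state at both layers, not just one. A hedged sketch, with illustrative device names:

```shell
# Sketch: checking iSCSI path visibility at both layers.

# Inside the guest: list active iSCSI sessions and multipath state
iscsiadm -m session
multipath -ll

# On the ESXi host: confirm which vSwitches and physical uplinks exist --
# in-guest iSCSI traffic rides whatever uplink the VM's port group uses,
# which the guest cannot see on its own
esxcli network vswitch standard list
esxcli network nic list
```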
I can think of more reasons, but that’s good for now.
So why would I want to use it?
The Bright Side of in-Guest iSCSI
So there are some cases where it makes sense. Somewhat:
- Large volumes. You need something larger than 62 TB, which is the largest virtual disk VMware supports. With that being said, why not present more than one of those to the VM and then use OS LVM (or whatever) to build a larger file system?
- Shrinking. Ah yes. VMware cannot shrink a virtual disk, no matter what type. Storage vendors like Pure support it. So if this happens a lot, I suppose it might be an option. But why? Our storage is thin provisioned, so “too large” doesn’t really cost anything, and UNMAP from the OS reclaims deleted space. Not a great reason, but a reason nonetheless.
- Snapshots. VMware does not allow restoring a disk from a snapshot of a smaller or larger size. You have to remove the disk (this is vVol or RDM), restore, then add it back. With in-guest you can unmount the device, do what you need, and remount.
- No VMware interaction. I don’t have to talk to VMware to provision, change, or alter my storage. True. But as said above this may not be a good thing.
- 3rd Party stuff. I have seen some 3rd party software that only works with physical servers. Backup proxies are a good example. They need to present storage to themselves and remove it, and have no ability to know ESXi is underneath. To me, this is an RFE to that backup vendor. But it is still a reality for some.
- Mount speed. This is a solid one. Latency matters, and not only for I/O. In situations where microseconds matter to data access, skipping the entire ESXi storage layer and presenting to the guest directly makes a difference. It is faster. No doubt. This is one of the reasons we use this method for container storage provisioning.
- Consistent provisioning method. Whether it is a physical server or a VM, the steps to provision can be the same. Yes, boot is different, but that’s unavoidable. Whether it is a VM, a physical server, or an EC2 instance in AWS, it is the same script or integration. There can be value in this. This is another reason we use it for containers: it doesn’t matter where the container is, a physical server or a VM.
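On the large-volumes point specifically, the LVM alternative mentioned above is only a few commands inside the guest. A sketch, assuming two sub-62 TB virtual disks show up as /dev/sdb and /dev/sdc (illustrative names):

```shell
# Sketch: spanning multiple virtual disks with LVM inside the guest,
# as an alternative to in-guest iSCSI for very large file systems.
# /dev/sdb and /dev/sdc are illustrative device names.

pvcreate /dev/sdb /dev/sdc
vgcreate bigvg /dev/sdb /dev/sdc

# One logical volume across both disks, larger than either alone
lvcreate -l 100%FREE -n biglv bigvg
mkfs.xfs /dev/bigvg/biglv
```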
Cloud Block Store
You might say, well, doesn’t CBS use in-guest iSCSI for storage? Yup. But the majority of the downsides above don’t apply here. The biggest sticking point is the same as with a physical server: initially setting up iSCSI in the OS. There are a lot of great options around automating EC2, and native features readily available in AWS (SSM, for instance). We are working on providing and improving this automation, so that in the end it won’t even matter to you. A big focus for me is cloud-init. Downsides above like networking don’t apply either: unlike with ESXi, the physical network doesn’t exist in AWS (for all intents and purposes), just your security groups, subnets, and VPCs. So there are still multiple levels there to manage. The main thing to remember is that this post is about delivering storage to VMware VMs; use cases outside of that change the considerations above.
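On the cloud-init angle, one simple pattern is passing a shell script as EC2 user-data so the initiator is ready at first boot. A sketch, assuming an Amazon Linux style instance; the portal IP is a placeholder:

```shell
#!/bin/bash
# Sketch: EC2 user-data script (run by cloud-init at first boot) that
# prepares a Linux instance for iSCSI connectivity.
# The portal IP below is an illustrative placeholder.

yum install -y iscsi-initiator-utils device-mapper-multipath
systemctl enable --now iscsid multipathd

# Discover and log in to the array's targets
iscsiadm -m discovery -t sendtargets -p 10.0.1.50:3260
iscsiadm -m node --login
```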
Pure Service Orchestrator
So we do use in-guest iSCSI for PSO to deliver persistent storage to containers. The key here is delivering storage to containers, not really to VMs, and the fastest way to do that is in-guest iSCSI. If the container moves, we don’t have to worry about where its VM is and whether the storage is presented there as well. Furthermore, we don’t need to worry about whether the containers are running on bare metal or VMs, which makes our flex/CSI plugin more portable across K8s distributions. Though with the announcements around Project Pacific/Tanzu, FCDs with vVols will likely change what is done here once containers are first-class citizens inside of VMware.
Can you do it? Sure. Should you do it? Please, please, please, think about it first. There are alternatives. This MIGHT be the best option for you. It might be the worst.