Musing on In-Guest iSCSI


Sorry I had to throw up in my mouth a bit.

Anyways. In general I am not a fan of using in-guest iSCSI for VMware environments (or really any virtualized environment), but the question does come up. And I find a surprising number of customers using it in secret. So what of it? Why do I hate it? Why should one use it? Are there reasons to?

So let me preface this post with a question: is in-guest iSCSI inherently evil? No, of course not. Are there situations where it makes sense? Yes, and I will go into them. The post is focused on this use case: I need to provision storage from the FlashArray to my VMware VMs. What is the best method? I expect this to be a living document, so please let me know if there are more downsides or upsides and I will add them in.

Let’s take a look.

Background

So first let's look at the options for ESXi storage. I am going to focus on external block here, as this is a conversation around iSCSI. vSAN vs. file vs. external block is a larger discussion that I will mark, for our purposes here, as out of scope.

So what are the options? Well:

  • VMFS virtual disks. Files on a file system on a block device. Various formats that allow for efficiency or performance or something in between. Natively created and managed in VMware, but the block device does need to be provisioned somehow externally first.
  • Physical RDM. A block device that is presented to the ESXi host and then passed up to the VM directly. It looks just like a block device presented to a physical server, meaning its VPD information is not changed by VMware. It consumes a slot on the VM's SCSI adapter, and it cannot be managed natively in VMware, just passed through. So resizes, creation, etc. must be done externally.
  • Virtual RDM. Somewhat in the middle of the above options, but once again a block volume presented up to the VM via ESXi. I/Os go through the vSCSI layer, so things like snapshots or Storage vMotion are possible, but it still has most of the management limitations of a physical RDM.
  • vVol. A lot of advantages here. Mostly the best of both worlds between the above options, but it does require a lot of vendor support and has a few more moving parts. Overall, though, extremely attractive.

With all of that being said, what is in-guest iSCSI?

Well, first off, iSCSI is not a storage choice. All of the above options can be used with iSCSI, just like they can all be used with Fibre Channel. It is a connectivity choice, not really a "storage" choice. What we are really talking about here is: who is the initiator? The storage array is the target. The initiator is whoever is on the receiving end of that storage. In the physical server world, this is a Windows host. Or a Linux host. Or whatever.

In a VMware world the initiator is ESXi. ESXi is installed on a physical box with HBAs. Those HBAs have initiator addresses (FC or iSCSI or FCoE or NVMe-oF etc). The array then presents storage to that initiator. ESXi then uses it in one of the above ways.

In-guest iSCSI is another way of saying that the OS inside of the VM is the initiator directly. The array is not presenting the storage to initiators belonging to ESXi, but instead to the OS in the VM. In many OSes (well, pretty much all) you do not need a physical HBA to present an iSCSI initiator. Just like ESXi, they offer the ability to configure a software initiator that speaks iSCSI over the TCP/IP network. That's almost literally the reason people use iSCSI: I don't need a special physical network to access my block storage.
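
To make that concrete, here is a minimal sketch (my illustration, not anyone's official tooling) of what "the guest is the initiator" looks like on a Linux VM with open-iscsi installed. The portal address is a placeholder for whatever interface the array presents.

```python
# Minimal sketch: the guest OS, not ESXi, discovers and logs in to the array.
# Assumes a Linux guest with open-iscsi installed; the portal IP is a placeholder.
import subprocess

PORTAL = "192.168.100.10"  # iSCSI target portal on the array (placeholder)

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Discover the targets the portal offers, then log in to all discovered nodes.
run(["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL])
run(["iscsiadm", "-m", "node", "--login"])

# None of this is visible to ESXi or vCenter: the sessions live entirely inside
# the guest, which is the root of most of the downsides discussed below.
```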

Can you do in-guest FC? Yeah. It’s called NPIV and it’s a horrible thing (in this scenario). Don’t believe me? Fine. It’s not a battle I have any interest in fighting. You’ll eventually agree.

The Dark Side of in-Guest iSCSI

Okay. So what’s the harm? I have a physical server and my storage is fine there. I have a Linux box with iSCSI configured and it rocks!

First off, virtualize that, you goofball.

Furthermore, we have a layer of abstraction here that can get you into trouble. Let me count the ways.

  • Multipathing. When you present a block device to ESXi, multipathing is configured from ESXi to the array. ESXi has physical NICs or HBAs and a very easy-to-use multipathing system with Round Robin (or whatever), and it reports clearly when something is wrong. From inside the guest, how do you even know if the storage is multipathed? Even if the I/O is going through two NICs (or eight, even), how do you know those are not going through the same physical NIC on the underlying ESXi host? Without a thorough understanding of the whole stack it is tough (a quick guest-side check is sketched after this list). Part of the whole reason for virtualization is that the OS should not have to care about that stuff.
  • Device loss. On that note, what happens if an ESXi host somehow loses its network connection to that array? Someone makes a firewall or network change? Or, more likely, someone makes a vSwitch alteration on the ESXi host that blocks access to that VLAN or network for that storage? The VMware admin doesn't even know the storage is being used by that VM. The storage bypasses ESXi entirely. It bypasses the vSCSI adapter entirely. It is only known by the array and the OS inside of the VM. So the VMware admin could make a change that cuts off storage access. You might argue that a network admin could cut off storage access to an ESXi host too! True. But that would impact both either way, and with vSphere HA, if access was accidentally cut for one host on the network, HA would reboot the VM on a host that wasn't affected. With in-guest, ESXi has no knowledge it was cut off. No reboot. Maybe the boot object of the VM is on different storage, so the VM looks to be running fine. In short, there is a lot of network uncertainty.
  • Backup. A common method for backing up VMs is VADP. A provider like Rubrik, Veeam, Commvault, Cohesity, etc. talks to VMware and asks "hey, what storage does this VM have?" Well, if the storage is connected via in-guest iSCSI, that storage is not known by VMware. As far as VMware sees it, the VM is being fully backed up. So it can be easily missed (a sketch of how to audit what vCenter actually knows about is after this list).
  • Performance. What NIC on ESXi is that traffic being routed through? It could be the management network. It could be a 1 Gb link, not a 40 Gb one. There could be Network I/O Control configured in ESXi that throttles it (the admin may not even know there is "storage traffic" on it).
  • Jumbo Frames. Are they configured throughout? Are they not? One more layer to that question.
  • Network control. Yes, you might have storage control, but you pay for it in network control. As mentioned above, you have zero insight into the network, so if something changes somewhere, you still need to interact with VMware.
  • Best Practices. This is where it gets ugly. What OS is it? Are there specific best practices for it? Settings? Generally this is taken care of at the ESXi layer. For VMs using the "traditional" storage options, installing VMware Tools and using PVSCSI is good enough. For in-guest iSCSI you need to understand the recommendations for that specific OS. And how to configure multipathing. And should you even configure multipathing? Maybe it is done at ESXi?
  • Support. Is this OS or OS version even supported by your storage vendor? With ESXi the whole point is that it doesn’t matter. We support ESXi, VMware supports what runs in it.
  • Host count. Each VM is now a host on your array. You are managing a lot more hosts.
  • vMotion. If the VM gets moved elsewhere, will it have access to the storage? If the storage is "known" by VMware (VMFS, RDM, vVol), the VM will not be moved to a host that cannot see that storage. This is not the case with in-guest. It could move to another host, cluster, or even datacenter.
  • Storage vMotion. If the move is not only a host move but also a storage move, Storage vMotion will not move the in-guest storage. You have now lost the ability to non-disruptively move that storage volume.
  • Disaster Recovery. Is that VM replicated? If so, is it protected by some kind of DR orchestration like SRM? Well, SRM has no idea about in-guest iSCSI. Even if you script it, you need to manually make sure the script is updated every time you add or remove a disk.
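
On the multipathing point above, this is roughly the kind of guest-side sanity check you end up having to write. It is a sketch only, assuming a Linux guest with open-iscsi and device-mapper-multipath installed, and note what it cannot tell you: whether those sessions actually leave the ESXi host over different physical NICs.

```python
# Rough sketch, assuming a Linux guest with open-iscsi and multipath-tools installed.
# It counts iSCSI sessions and multipath paths, but it cannot see whether those
# paths actually traverse different physical NICs on the ESXi host underneath.
import subprocess

def iscsi_session_count():
    """Count active iSCSI sessions reported by open-iscsi."""
    out = subprocess.run(["iscsiadm", "-m", "session"], capture_output=True, text=True)
    if out.returncode != 0:  # iscsiadm exits non-zero when there are no sessions
        return 0
    return len([line for line in out.stdout.splitlines() if line.strip()])

def multipath_path_count():
    """Count path lines listed by 'multipath -ll' across all multipath devices."""
    out = subprocess.run(["multipath", "-ll"], capture_output=True, text=True)
    # Path lines look like: "  |- 3:0:0:1 sdb 8:16 active ready running"
    return sum(1 for line in out.stdout.splitlines()
               if " active " in line or " failed " in line)

if __name__ == "__main__":
    print("iSCSI sessions:", iscsi_session_count())
    print("multipath paths:", multipath_path_count())
```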
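
And on the backup point, here is a hedged pyVmomi sketch of the audit a backup or VMware admin could run: list the disks vCenter actually knows about for a VM, then compare that against what the guest OS reports. The vCenter address, credentials, and VM name below are placeholders.

```python
# Hedged sketch using pyVmomi (pip install pyvmomi). It lists the virtual disks
# vCenter knows about for one VM; any LUN mounted over in-guest iSCSI simply
# will not appear here, which is why VADP-based backups can miss it.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def vcenter_visible_disks(si, vm_name):
    """Return (label, capacity in GB, backing type) for each disk vCenter sees."""
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next((v for v in view.view if v.name == vm_name), None)
    view.DestroyView()
    if vm is None:
        raise LookupError(f"VM {vm_name} not found")
    return [(dev.deviceInfo.label,
             dev.capacityInKB / (1024 * 1024),
             type(dev.backing).__name__)
            for dev in vm.config.hardware.device
            if isinstance(dev, vim.vm.device.VirtualDisk)]

if __name__ == "__main__":
    ctx = ssl._create_unverified_context()  # lab shortcut; validate certs in production
    si = SmartConnect(host="vcenter.example.com",          # placeholder
                      user="administrator@vsphere.local",  # placeholder
                      pwd="********", sslContext=ctx)
    try:
        for label, size_gb, backing in vcenter_visible_disks(si, "sql-vm-01"):
            print(f"{label}: {size_gb:.0f} GB ({backing})")
    finally:
        Disconnect(si)
```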

I can think of more reasons, but that's good for now.

So why would I want to use it?

The Bright Side of in-Guest iSCSI

So there are some situations where it makes sense. Somewhat:

  • Large volumes. You need something larger than 62 TB, which is the largest virtual disk VMware supports. With that being said, why not present more than one of those to the VM and then use the OS LVM (or whatever) to make a larger file system? See the sketch after this list.
  • Shrinking. Ah yes. VMware cannot shrink a virtual disk, no matter what type. Storage vendors like Pure support it. So if this happens a lot then I suppose it might be an option. But why? Our storage is thin provisioned, so "too large" doesn't really mean anything, and UNMAP from the OS reclaims deleted space. Not a great reason, but a reason nonetheless.
  • Snapshots. If you are restoring from an array snapshot that is smaller or larger than the current volume, VMware does not allow it. You have to remove the disk (this is for vVols or RDMs), restore, and then add it back. With in-guest you can unmount the device, do what you need, and remount.
  • No VMware interaction. I don’t have to talk to VMware to provision, change, or alter my storage. True. But as said above this may not be a good thing.
  • 3rd Party stuff. I have seen some 3rd party products that only work with physical servers. Backup proxies are a good example. They need to present storage to themselves and remove it, and have no ability to know ESXi is underneath. To me, this is an RFE to that backup vendor. But it is still a reality for some.
  • Mount speed. This is a solid one. Latency matters, and not only for I/O. In situations where microseconds matter to data access, skipping the entire ESXi storage layer and presenting to the guest directly makes a difference. It is faster. No doubt. This is one of the reasons we use this method for container storage provisioning.
  • Consistent provisioning method. Whether it is a physical server or a VM, the steps to provision can be the same. Yes, boot is different, but that's unavoidable. Whether it is a VM, a physical server, or an EC2 instance in AWS, it is the same script or integration to do it. There can be value in this. Another one of the reasons we use it for containers: it doesn't matter where the container is. A physical server or a VM, etc.
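
For the large-volume item above, here is a quick sketch of the alternative I mean: span multiple virtual disks (each up to 62 TB) with LVM inside the guest rather than reaching for in-guest iSCSI. Device names are placeholders for whatever disks ESXi presents.

```python
# Sketch only: pool two ESXi-presented virtual disks into one large filesystem
# with LVM inside the guest. Device names and the XFS choice are placeholders.
import subprocess

DISKS = ["/dev/sdb", "/dev/sdc"]  # e.g. two 62 TB VMFS or vVol virtual disks

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

for disk in DISKS:
    run(["pvcreate", disk])                    # mark each disk as an LVM physical volume
run(["vgcreate", "bigvg"] + DISKS)             # pool them into one volume group
run(["lvcreate", "-l", "100%FREE", "-n", "biglv", "bigvg"])  # one LV spanning both
run(["mkfs.xfs", "/dev/bigvg/biglv"])          # single filesystem larger than 62 TB
```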

Cloud Block Store

You might say, well, doesn't CBS use in-guest iSCSI for storage? Yup. But the majority of the downsides above don't apply here. The biggest sticking point is the same as with a physical server: initially setting up iSCSI in the OS. There are a lot of great options for automating EC2, native features that are readily available in AWS (SSM, for instance). We are working on providing and improving this automation, so that in the end it won't even matter to you. A big focus for me is cloud-init. Downsides above like networking don't really apply either: unlike with ESXi, the physical network doesn't exist in AWS (well, for all intents and purposes), just your security groups, subnets, and VPCs, and those are the levels you manage. The main thing to remember is that this post is about delivering storage to VMware VMs. Use cases outside of that change the considerations above.

Pure Service Orchestrator

So we do use in-guest iSCSI for PSO to deliver persistent storage to containers. The key here is delivering storage to containers, not really to VMs. The fastest way to do this is in-guest iSCSI. If the VM moves, or the container does, we don't have to worry about where it ends up and whether the storage is presented there as well. Furthermore, we don't need to worry about whether the containers are running on bare metal or VMs, which makes our flex/CSI plugin more portable across K8s distributions. Though with the announcements around Project Pacific/Tanzu, FCDs with vVols will likely change what is done here once containers are first-class citizens inside of VMware.

Conclusion

Can you do it? Sure. Should you do it? Please, please, please, think about it first. There are alternatives. This MIGHT be the best option for you. It might be the worst.


15 Replies to “Musing on In-Guest iSCSI”

  1. Couldn't agree more; I see in-guest iSCSI way too often. Typically for Windows failover clusters, with the reason usually being "..well, that's how we did it when the SQL server was physical". I translate this as "We're not sure how to do storage for failover clusters in vSphere".

    When I show them the benefits of the other options, it’s often the first change on the list.

  2. We use it for Veeam servers because that is what they recommended (maybe that has changed). We also use it for MS SQL clusters (we use native SQL backups there). We do enable multipathing in the OS.

    1. Dump Veeam and go to Rubrik, lol.

      I hope you are doing more than just enabling multipath in the OS for iSCSI, VM with two NICs just for iSCSI, each on their own port group tied to separate physical NICs on separate switches on separate subnets hitting the target on separate interfaces. +/- some stuff I am probably missing.

        1. Good, scared me for a sec there.

          Guess there is always zerto, but no idea what their pricing looks like, will have to take a look at Cohesity. I do love Rubrik for being stupid easy and just working, but there are some things I would like to see change, and yes, the pricing is a real kick to the crotch. We are getting ~73% data reduction on it and it is nice to be able to search for files on a vm and whatnot. Going to have to beat my account rep up to get some less sucky pricing, the hardware is cheap, the software not so much.

          1. Haven’t dug into Zerto. We are actually using Cohesity right now as a backup repo. They include a small amount of the “Protect” with the “Platform” so we’ve played with it a bit, but similar to Rubrik it comes with a price tag. But yea we are also getting some really good compression numbers so far using it as a repo.

  3. I moved to in-guest iSCSI from Pure volumes mounted over iSCSI as ESXi datastores. My reason was performance issues with SQL Server. I tried *everything* on the interwebs to get maximum throughput to/from Pure but simply couldn't (go ahead, ask me if I tried x or y). As soon as I did this and enabled MPIO in the guest, I was getting close to 20 Gbps with two physical 10GigE NICs. As a performance engineer, I couldn't settle for anything less.

    Unfortunately (there's always a but), there was a problem that made my benchmark suites more complicated and longer than necessary. Between performance test cases, I wanted to revert the VM to a pristine snapshot. Well, for obvious reasons, that's not supported by VMware. You can do it and get a running VM, but data on one or more volumes will become corrupted. So I have to restore from a pristine backup before every test run. Tradeoffs, tradeoffs…

  4. PurePSO volume, iSCSI in-guest (but I generally use pure-file on FB/NFS for volumes).

    Even if my K8s clusters run on VMware, I have chosen some alternatives to create/manage them.

    I could install the VMware storage plugin, but the principle of VMDKs moving around, and VMs depending on the location of the pod that attaches them, looked a bit sketchy to me, because VMDKs are for VMs, not for containers.

    With vVol and latest evolution this may change.

  5. In-guest iSCSI is awful, and so is iSCSI on bare metal; we support hundreds of hosts on iSCSI. There are so many other operational complexities, like delayed ACK, dealing with IP addresses, blocking iSCSI IPs from talking to other iSCSI IPs, difficult LUN migrations from a legacy array to Pure, etc. Virtual disks and Fibre Channel are no-brainers.

    Customers using WSFC in Production have a choice to use RDM (no way), vvols (see below), dedicating physical blades (no $$$) or in-guest iSCSI.

    We support >700 SQL servers, many of them clustered, physical or virtual. We had planned to move to vVols for WSFC and we were successful doing so. However, we scrapped the entire project because VMware vVols, shared bus, and SCSI-3 won't support expanding disks online. So we have to take down customer-facing clusters to expand disks.

    A complete non-starter and we were very disappointed. VMware needs to fix this.

    1. These are very good points and things we have heard more than a few times (and I agree with). I have passed these comments over to VMware PM and will lean on it. A few questions as they will ask me:
      -How often do you need to expand? Is it reasonable to make the volumes just very large, since with thin provisioning the overage is okay?
      -Is there a move to SQL Always On that will make this less common?

      Thanks!

      Cody

      1. I actually work for Ben and was a bit amused to find this post.

        Our DBAs place bulk requests to expand anywhere up to around 100 volumes on a regular basis (probably once every 1-2 weeks). We could possibly oversize the volumes, but that would create another problem in that our internal customers are somewhat resource hungry and will quickly consume whatever is presented to them.

        Some of our deployments use AAGs, but it's something that I believe is handled more on a case-by-case basis, and I don't believe there is any desire to migrate 100% to AAGs.

        I know the disk expansion issue is a limitation of enabling physical bus sharing mode in vSphere and not a reflection on vVols, but there has to be some way that the ESXi kernel would be able to bypass the disk lock.

        It’s my understanding that this is still a limitation in vSphere 7.0 (including Update 1), so hopefully the folks at Pure and/or VMware can find a solution.

  6. Ha, well if you HAVE to use in-guest like I did because I needed a virtual SQL cluster with clustered disks using different storage policies. Wait, I know what you’re thinking, use vvols!

    vVols are indeed a great choice for per-disk storage policies and a Windows failover cluster, but not if you also want to use SRM to protect those WSFC nodes. Believe it or not, SRM does not (yet) support per-disk storage policy configuration for its protected VMs. I assume the vVols team is moving too fast for the SRM team!

    Anyway, my hand was forced but there are decisions to be made when configuring in-guest iSCSI.

    Do you create 1 ‘iSCSI’ VM port group for your VMs or 2? You’re gonna want that multipathing and resilience at a guest level. If you create only 1 VM port group, you can’t override the physical adapters (think the equivalent to network port binding at a VM port group level).

    If you go with 2 VM port groups, do you inherit the vSwitch settings which will more than likely be 2 active adapters or override at VM port group level effectively tying each VM port group to 1 physical network adapter?

    Google won't help you with that specific use case, and VMware isn't interested because it's 'in-guest', even though it really isn't, since the config we're talking about is done in vCenter.

    Some food for thought if you do decide you have to go down that route…
