S.M.A.R.T. Alerts in vmkernel Log with FlashArray™ Hardware-backed Volumes

Hello- Nelson Elam, a Solutions Engineer on Cody’s team at Pure, guest-writing here again.

If you are a current Pure customer and have had ESXi issues that warranted checking the vmkernel logs of a host, you may have noticed a significant number of messages similar to this for SCSI:

Cmd(0x45d96d9e6f48) 0x85, CmdSN 0x6 from world 2099867 to dev "naa.624a9370f439f7c5a4ab425000024d83" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0

Or this for NVMe-oF:

WARNING: NvmeScsi: 172: SCSI opcode 0x85 (0x45d9757eeb48) on path vmhba67:C0:T1:L258692 to namespace eui.00f439f7c5a4ab4224a937500003f285 failed with NVMe error status: 0x1 translating to SCSI error
ScsiDeviceIO: 4131: Cmd(0x45d9757eeb48) 0x85, CmdSN 0xc from world 2099855 to dev "eui.00f439f7c5a4ab4224a937500003f285" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0

If you reached out to Pure Storage support to ask what the deal is with these, you were likely told that they are 0x85s and nothing to worry about, because it's a VMware-logged error that doesn't indicate anything wrong with Pure devices.

But why would this be logged and what is happening here?

ESXi regularly checks the S.M.A.R.T. status of attached storage devices, including array-backed devices that aren't local. The check arrives at the FlashArray as SCSI opcode 0x85 (ATA PASS-THROUGH); the FlashArray software fails that command and returns the following sense data to the ESXi host:

failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
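
ESXi exposes the same S.M.A.R.T. query through esxcli, so you can issue it by hand if you're curious. A hedged example using the device name from the log above (substitute one of your own device identifiers; against a FlashArray volume, expect the parameters to come back empty or as N/A, since the array rejects the request):

esxcli storage core device smart get -d naa.624a9370f439f7c5a4ab425000024d83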

These can be quite challenging to decode. Luckily, virten.net has a powerful tool for decoding these. When I paste this output into that site, I get the following details:

Host Status [0x0] OK: This status is returned when there is no error on the host side. This is when you will see if there is a status for a Device or Plugin. It is also when you will see Valid sense data instead of Possible sense data.
Device Status [0x2] CHECK_CONDITION: This status is returned when a command fails for a specific reason. When a CHECK CONDITION is received, the ESX storage stack will send out a SCSI command 0x3 (REQUEST SENSE) in order to get the SCSI sense data (Sense Key, Additional Sense Code, ASC Qualifier, and other bits). The sense data is listed after Valid sense data in the order of Sense Key, Additional Sense Code, and ASC Qualifier.
Plugin Status [0x0] GOOD: No error. (ESXi 5.x / 6.x only)
Sense Key [0x5]: ILLEGAL REQUEST
Additional Sense Data [20/00]: INVALID COMMAND OPERATION CODE

The key thing here is the Sense Key, which has a value of ILLEGAL REQUEST. The FlashArray software does not support S.M.A.R.T. SCSI requests from hosts, so it returns ILLEGAL REQUEST to the ESXi host to tell the host that this request type is not supported.

This is for two reasons:

1. Since FlashArray volumes are not physically attached storage devices on the ESXi host, S.M.A.R.T. from the ESXi host doesn't really make sense for them.
2. The FlashArray software handles drive failures and drive health independently of ESXi; monitoring the health of the drives that back these volumes is handled by the FlashArray software, not ESXi. You can read more about this in this datasheet.

Great Nelson, thanks for explaining that. Why are you talking about this now?

Pure has been working with VMware to reduce the noise and unnecessary concern caused by these errors. Seeing a failed ScsiDeviceIO in your vmkernel logs is alarming. In vSphere 7.0U3c, VMware fixed this problem: the message is now logged only once, when the ESXi host boots up, instead of as often as every 15 minutes.

This means that in vSphere 7.0U3c, if you are doing any ESXi host troubleshooting, you no longer have to concern yourself with these errors; for me, it means I won't have to filter them out of my greps anymore when looking into an ESXi issue in my lab. Great news all around!
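
If you are still on an earlier ESXi release, this is roughly the kind of filter I mean; a hedged sketch that simply drops the ILLEGAL REQUEST entries shown above from the default vmkernel log location:

grep ScsiDeviceIO /var/run/log/vmkernel.log | grep -v 'sense data: 0x5 0x20 0x0'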

vSphere Remote Plugin: .local vCenter Domains

Hello- Nelson Elam here! I’m a VMware Solutions Engineer at Pure Storage and wanted to make you aware of an issue we’ve seen crop up a couple of times recently with our vSphere Remote Plugin and provide a quick explanation.

If your vCenter uses a .local domain (vcenter.purestorage.local is one example), you might have seen the following 3 errors in Pure’s vSphere Remote Plugin in vCenter:

  1. In the FlashArray list page, the error “Error retrieving array list. Please try again later.” is returned.
  2. When trying to import arrays via Pure1, the error “Authenticate with Pure1 to use this feature” is returned despite previously successful registration with Pure1 through the plugin.
  3. When adding an array manually, a “no permissions” error is returned.

Resolution:
To resolve this, follow step 14 from the Online Deployment Procedure for the remote plugin by running this command after customizing it to your environment:
pureuser@purestorage-vmware-appliance:~$ puredns setattr --search {your .local domain} --nameservers {ip or FQDN of DNS server}
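
As a purely hypothetical example, if your vCenter lived in the purestorage.local domain and your DNS server were 10.21.8.15, the customized command would look something like this:

pureuser@purestorage-vmware-appliance:~$ puredns setattr --search purestorage.local --nameservers 10.21.8.15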

So what's going on here? When the OVA where you deployed the Remote vSphere Plugin tries to reach your vCenter at a .local domain, it can't resolve the name unless you've provided the appropriate search domain to the OVA. That lookup failure then surfaces as different errors depending on where in vCenter you are interacting with the plugin.

Luckily this is a simple fix despite the seemingly unrelated errors that pop up. Hopefully this was helpful!

Refreshing a VM configuration from a vVol Snapshot

A lot of the time when we talk about vVols and snapshots, we talk about restoring the virtual disks (the data vVols). This of course is a huge benefit of vVols: the virtual disks are 1:1 to a volume on the array, so snapshots (and other array features) can be used at the level of an individual virtual disk. Need to restore a database on virtual disk B (the E:\ drive or whatever)? Just use a snapshot restore to instantly refresh the entire disk. No need to mount a copied datastore, resignature, remove the old disk, etc. Just copy from the snapshot to the vVol volume and re-mount the file system in the guest. Fast and easy.

VMware snapshots exist with vVols too: they create array-based copies. But when you restore from them, you restore the whole VM, and their existence complicates the VM configuration with extra pointers, files, and so on. So a common vVol approach is to use VMware snapshots only temporarily, for backup procedures or for one-off protection of a VM while running an upgrade, and then delete the snapshot once everything checks out.

What if I want to refresh the VM configuration from a snapshot? Keep the data on the disk as is, but refresh the VM config files (VMX mainly) from a snapshot?

This is possible from VMFS but quite complex. For a vVol VM it is really simple. The process:

  1. Shut down the VM.
  2. Copy from the snapshot to the config volume.
  3. Reload the VMX.
  4. Power on the VM.
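
For steps 2 and 3, here is a minimal sketch of what this can look like from the CLI. All of the volume, snapshot, and VM names below are hypothetical (on a real FlashArray the config vVol has an auto-generated name inside the VM's volume group), and the copy can just as easily be done from the FlashArray GUI or the vSphere Plugin:

# On the FlashArray: overwrite the VM's config vVol with the snapshot contents
purevol copy --overwrite vvol-finance-vm1-vg/Config-1a2b3c4d.pre-upgrade vvol-finance-vm1-vg/Config-1a2b3c4d

# On the ESXi host: find the VM's ID, then reload its VMX from the refreshed config vVol
vim-cmd vmsvc/getallvms | grep finance-vm1
vim-cmd vmsvc/reload 42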

So for some background, in a vVol world the VM directory (which houses the VMX file, some logs, virtual disk pointers, and some other frivolities) looks like a folder. But in reality it is a logical pointer to a volume on the array. This volume is called a config vVol, and each "directory" in the vVol datastore maps to one. This config vVol is actually a mini VMFS. See more details here.

Since this is a volume, you can of course take snapshots of it. There are a few ways to do this: either create one-off snapshots of it or protect it with protection policies.
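
For instance, a hedged example of a one-off snapshot of the config vVol from the FlashArray CLI, using the same hypothetical volume name as in the sketch above:

purevol snap --suffix pre-upgrade vvol-finance-vm1-vg/Config-1a2b3c4d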

Continue reading “Refreshing a VM configuration from a vVol Snapshot”

Affecting Persistent Volumes in VMware Tanzu

Note: This is another guest blog by Kyle Grossmiller. Kyle is a Sr. Solutions Architect at Pure and works with Cody on all things VMware.

VMware Tanzu is a game-changing piece of technology for numerous reasons, but probably the most transformational piece of it is also the most apparent: it gives the vCenter admin the ability to provide resources to both consumers of traditional virtual machines and Kubernetes/DevOps users from the same set of compute hosts and storage. This consolidation means that the vCenter admin can more easily see what is being allocated where, as well as gain insight into which applications might be candidates to move from a virtual machine into a container-based environment.

A Tanzu deployment is made up of quite a few moving pieces, and a central one is durable storage made possible by persistent volumes. While container nodes and pods are ephemeral by nature (which is one of their major advantages), the data they consume, produce, and manipulate must be performant, portable, and, often, preserved. So there is obviously a different set of things we care about for persistent data than for the Kubernetes nodes that Tanzu runs in unison with. For the remainder of this post we will show a couple of quick and easy ways you can change your persistent volumes to suit your application needs. There's a bit of work and some choices to be made around getting a Tanzu environment up and running in vSphere, and I'd encourage you to check out the VMware Tanzu User Guide on our Pure Storage support site or Cody's blog series for some additional information.

With that being said, when a persistent volume is created via either dynamic or static provisioning, one of the first things the application developer needs to decide is what will happen to that volume and its data when the application that uses it is no longer needed. The default behavior for an SPBM policy/storageclass assigned to a vSphere Namespace is to delete the volume, but with a simple kubectl patch command line, the persistent volume can be retained for future use.

To make this change, first get the persistent volume name that you want to Retain/save:

 
$ kubectl get pv
 NAME           CAPACITY    RECLAIM POLICY   STATUS 
 pvc-f37c39fd   5Gi         Delete           Bound 

Next, apply this kubectl command line to it to switch the reclaim policy from Delete to Retain:

$ kubectl patch pv (PV_Name) -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

So for our PV example:

$ kubectl patch pv pvc-f37c39fd -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

When we run the kubectl get pv command again, we can see it is set to Retain, so we are all set:

$ kubectl get pv
NAME           CAPACITY    RECLAIM POLICY   STATUS  
pvc-f37c39fd   5Gi         Retain           Bound          

If there is anything close to a certainty in the storage world, it is that the longer a volume exists, the more full of data it will become. This becomes even more of a certainty if a persistent volume is retained and reused across multiple application instances for increasing amounts of time. In the vSphere and Supervisor cluster 7.0U2 release, VMware introduced the capability for Online Volume Expansion. In previous versions, users had to unbind their persistent volume claim from a pod or node prior to resizing it (otherwise known as offline volume expansion); now they can accomplish the same operation without that step. This is a huge advantage, as offline expansion required that the volume effectively be taken out of service when additional space was added to it, which could lead to application downtime. With the online volume expansion enhancement that annoyance goes away completely.

The online volume expansion operation is really simple to do. This time we find the persistent volume claim (which is basically the glue between the persistent volume and the application) that we need to expand:

$ kubectl get pvc
NAME             STATUS  VOLUME        CAPACITY
pvc-vvols-mysql  Bound   pvc-f37c39fd  5Gi      

Now we run the following patch command against the PVC name we found above so that it knows to request additional storage for the persistent volume that it is bound to. In this case, we will ask to expand from 5Gi to 6Gi:

$ kubectl patch pvc pvc-vvols-mysql -p '{"spec": {"resources": {"requests":{"storage": "6Gi"}}}}'

After waiting a few moments for the expansion to complete, we look at the PVC to confirm we have the additional space we asked for, and we can see it has been added:

$ kubectl get pvc
NAME              STATUS   VOLUME         CAPACITY
pvc-vvols-mysql   Bound    pvc-f37c39fd   6Gi     
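
If you want to double-check from inside the workload itself, you can ask the pod how big the mounted filesystem is. A hedged example: the pod name is the one shown as Mounted By in the describe output further down, and /var/lib/mysql is an assumed mount path for this deployment:

$ kubectl exec mysql-deployment-5d8574cb78-xhhq5 -- df -h /var/lib/mysql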

Taking a closer look at the PVC via the describe command shows, under the Events section, that the PV size was indeed increased while it remained mounted to the mysql-deployment pod:

$ kubectl describe pvc
Name:          pvc-vvols-mysql
Namespace:     default
StorageClass:  cns-vvols
Status:        Bound
Volume:        pvc-f37c39fd-dbe9-4f27-abe8-bca85bf9e87c
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
               volumehealth.storage.kubernetes.io/health: accessible
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      6Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    mysql-deployment-5d8574cb78-xhhq5
Events:
  Type     Reason                      Age   From                                             Message
  ----     ------                      ----  ----                                             -------
  Warning  ExternalExpanding           52s   volume_expand                                    Ignoring the PVC: didn't find a plugin capable of expanding the volume; waiting for an external controller to process this PVC.
  Normal   Resizing                    52s   external-resizer csi.vsphere.vmware.com          External resizer is resizing volume pvc-f37c39fd-dbe9-4f27-abe8-bca85bf9e87c
  Normal   FileSystemResizeRequired    51s   external-resizer csi.vsphere.vmware.com          Require file system resize of volume on node
  Normal   FileSystemResizeSuccessful  40s   kubelet, tkc-120-workers-mbws2-68d7869b97-sdkgh  MountVolume.NodeExpandVolume succeeded for volume "pvc-f37c39fd-dbe9-4f27-abe8-bca85bf9e87c"

Those are just a couple of the ways we can update our persistent volumes to do what we need them to do within a Tanzu deployment, and we have really just scratched the surface with these examples. To see how to do more advanced operations, like migrating a persistent volume to a different Tanzu Kubernetes Cluster, please head over to our new Tanzu User Guide. Of course, it is also very important to mention that Portworx combined with Tanzu gives us even more features and functionality, like RBAC, automated backup and recovery, and a whole lot more. Getting deeper into how Portworx interoperates with Tanzu is what I'm working on next, so please stay tuned for some more cool stuff.

What’s new in Pure’s vROps Management Pack 3.1.1 – vVol VM Metrics

Hello- my name is Nelson Elam and I’m a Solutions Engineer at Pure Storage. I am guest writing this blog for use on Cody’s website. I work on Pure’s vROps Management Pack.

With the introduction of Pure’s version 3.1.1 Management Pack for vROps, we now have the ability to monitor vVol VMs and datastores. We’ve updated the user guide here with the new vVol content but I thought it would be nice to cover some of this on this blog as well.

The big feature to come in 3.1.1 is the ability to monitor your Pure-hosted vVol VMs and the volumes backing them. To see some examples of this after installing 3.1.1, first you'll want to click on the Environment tab and then click on FlashArray Resources on the left-hand side:


Next expand PureStorage World, the array your Pure-hosted vVol VM is on, then your VM name, and finally expand the volume group (suffix of -vg unless someone has changed it on your array) to see the volumes backing that VM. There are other ways to navigate to the same objects from this list but I find this way to be the most intuitive.

For my example, I’m going to focus on the only data volume connected to this VM because that will be more interesting to me from a performance perspective. This volume is where the OS for this VM lives.

Looking at our VM, I've focused on Average IOPS and Average Latency (ms). Using the default view, we can see that on April 29th we had a spike in average IOPS to 5224; this was when I powered on the VM. Otherwise, IOPS are really low on average. Looking at latency on this volume, we can see it peaks at 0.82 ms and generally stays below 0.5 ms.

There are many other metrics available to us on these vVol volumes, but you might recognize them because they are the same as any other volume on a Pure Storage FlashArray!

Thank you for taking the time to read this- I hope you found it helpful.

Creating a File Repository on a vVol Datastore with PowerShell

I wrote about a new feature in vSphere 7.0 U2 here:

What’s New in vSphere 7.0 U2 Storage: Creating a File Repository on a vVol Datastore

This allows you to create a large config vVol in a vVol datastore to store large files. Use cases like ISO repositories etc.

This was at launch only available in the direct API, but as of PowerCLI 12.3 it is available in PowerCLI as well.

Pretty simple to do.

Login to vCenter:

Connect-VIServer -Server vcenter-70-siteb

Get the datastore manager object from the service instance of vCenter:

Continue reading “Creating a File Repository on a vVol Datastore with PowerShell”

What’s New in vSphere 7.0 U2 Storage: Creating a File Repository on a vVol Datastore

This feature has many names. Creating a larger config vVol. Creating a sub-vVol datastore. Creating an ISO repository. Etc.

In 7.0U2, VMware added a new feature that supports creating a custom-size config vVol. While this was technically possible in earlier releases, it was not supported. Also, I should note that this is not supported by all vVol vendors, so of course speak to your vendor first.

First to review what a config vVol is check out this post:

What is a Config VVol Anyways?

In short, it is a mini VMFS that gets created when you create a directory in a vVol datastore (most commonly by creating a new VM). This defaults to 4 GB in size: enough to store the general VM files, some logs, VMDK pointers, the VMX file, and some other frivolities.

The issue, though, is that this was not large enough to store large items like ISOs or VIB files. So if you tried to upload something like that to a vVol datastore folder, it would fail with an out-of-space error. And you cannot upload to the root of a vVol datastore, because a vVol datastore is not a file system. So you had to use VMFS or NFS to store those objects.

This is no longer the case.

Continue reading “What’s New in vSphere 7.0 U2 Storage: Creating a File Repository on a vVol Datastore”

What’s New in vSphere 7.0 U2 Storage: Increased iSCSI Path Limit

A continuation of the series I started here.

This is a simple but important one: iSCSI path limits. Since the dawn of time, ESXi has had a disparity in path limits between iSCSI and Fibre Channel: 32 paths for FC and 8 (8!) paths for iSCSI.

Continue reading “What’s New in vSphere 7.0 U2 Storage: Increased iSCSI Path Limit”

What’s New in vSphere 7.0 U2 Storage: Multiple SPBM Configurations

The vSphere “update” releases are much more significant than they used to be; traditionally most of the new features came in the major releases (6.5, 6.7, etc.). vSphere 7.0U2 just released, and there are quite a few storage-related features.

One of the features discussed here:

https://blogs.vmware.com/virtualblocks/2021/03/09/vsphere-7-u2-core-storage/

…is called “SPBM Multiple Snapshot Rule Enhancements”.

What does that mean?

Continue reading “What’s New in vSphere 7.0 U2 Storage: Multiple SPBM Configurations”