vCenter Storage Provider “Refresh certificate” Functionality Restored

This will be a short blog, partially because my teammate Alex Carver already wrote a great blog that covers one workaround for this button not working that uses vCenter’s MOB.

If you have been using self-signed certificates in your vVols environment since vCenter 6.7 and updated to vCenter 7.0, you might have noticed something frustrating when trying to refresh those certificates manually: the button was greyed out! If you were like me, you were probably wondering why this useful functionality was removed and thought maybe it was for security reasons; your concerns might have been validated when searching VMware’s KB system and finding this KB that read like it was functionality that was removed on purpose (recently updated to reflect the current situation better).

Turns out my guess was wrong and that KB was a little misleading. VMware has brought this button’s functionality back in vCenter 7.0U3d and higher. You might say to yourself “that’s great Nelson, but I don’t upgrade my production vCenter whenever a new vCenter version comes out”. If you want a simpler workflow than re-creating the storage providers before you upgrade to newer versions of vCenter when the certificates expire eventually, Alex Carver has the method for you which uses vCenter’s MOB to refresh the storage providers without re-creating them.

Pure Storage’s vROps Management Pack ™ 3.2.0 – New Features and Changes

Pure Storage recently launched a new management pack for vROps that had a number of important fixes and some changes to the interface. You can download it here and find the full release notes here. What’s new?

  • Interface changes
    • Updated icons
    • Restructuring of Pure Storage objects in the Object view of vROps
  • Add Offload Snapshot capacity metric
  • Add FlashArray Software™ version property

Let’s go over the interface changes first. If you navigate to Environment -> Object Browser -> Pure Storage FlashArray -> FlashArray Resources -> PureStorage World and expand an array, the layout will look quite different than what was there before. For starters, the icons have almost all been updated to mirror what you would expect to see on a modern FlashArray Purity version (or vCenter if that is a vCenter object). We made this change to make the vROps management pack experience as close to the FlashArray experience as possible.

Additionally, we moved the structure of the objects around to be more consistent with what you’d expect from the FlashArray. No objects were removed and the same object can be listed in multiple places where it makes sense (for example, if you expand a Hosts group, you will see the pertinent volumes there as well as under the Volumes group).

Next, we’ve added the Offload Snapshot capacity metric in this version as well as a FlashArray Offload Target object. The Offload Target object is visible under Protection and you can see the current space used by that Offload Target in the badge for that object; additionally, there is a Capacity metric for this object that shows historical consumption.

Lastly, you can now retrieve the Purity version of the array directly from vROps to help plan your FlashArrays’ upgrades. This information is found by selecting a FlashArray and going to Metrics -> Properties -> Details -> Purity Version.

Native Pure Storage FlashArray™ File Replication – Purity 6.3


With the release of Purity 6.3, Native FA File replication has been added to the Pure Storage FlashArray™ software. This adds an often important feature to the FA File folder redirection solution I wrote about last year. Pure Storage is referring to this feature as ActiveDR for File Services.

ActiveDR for File Services is a useful feature if you’ve set up or are going to set up folder redirection on FA File and you would like the file data to be replicated asynchronously to a different array, whether that FlashArray hardware is at the same site or a different one. This feature is included with FlashArray.

This allows you to use your FlashArray for native block and file workloads that need the protection that replication provides and allow you to benefit from the great data reduction rate that FlashArray is known for with those replicated file sets.

Now, if you lose a site or an array for some reason, the file workload you have hosted on FA File can be recovered natively on FlashArray easily and quickly.

There are some differences between file and block workloads when it comes to ActiveDR replication. You can read more in the ActiveDR for File Services section of this Pure KB.

Horizon Folder Redirection Hosted on FlashArray™ File

Late last year, I wrote a KB for a solution that I wanted to bring up here- hosting Horizon’s VDI user directories on FlashArray™ File with folder redirection controlled through a group policy object (GPO). I’d like to discuss this for a couple of reasons:

1. Configuring FA File was surprising easy, especially compared to what I remember from setting up a Windows file server was for the same purpose in a previous role.
2. Why I landed on using folder redirection for this KB instead of roaming profiles or another solution for user shares in a VDI environment.

When I have managed or set up VDI environments from scratch in previous jobs, there were always a ton of considerations that went into the VDI environment. From determining the appropriate amount of virtual resources to deploy to each VDI user to determining how much hardware I actually needed to buy to support the full deployment, each step can be more painful than the last. Any opportunity we can take to help ourselves be successful in the project is a good step to invest in. But when that step is easier and I don’t have to invest any resources to get the benefit of improving the success of the project, I have to take a step back and appreciate what just went so well.

ComputerEntryFlashArrayConfiguration.png


It took me roughly 30 minutes to deploy and configure FA File in my existing Active Directory environment in my lab the first time. That included carefully digesting all the applicable new-to-me Pure documentation. From what I can recall with this process from my previous roles, that was at best a 2 hour job with a carefully put together and well documented Active Directory environment with automated Windows server deployments; at worst, that might have taken me a full day or two when I had to build everything from scratch. When any task took a day or more, I always had interrupts that would drag the process out and I ended up taking more time to review what I had done and what I needed to do from a documentation perspective.

AD create dialog.png


On the point of why I used folder redirection instead of roaming profiles with Active Directory, VMware has this very helpful KB that outlines decisions you might make if you are using Dynamic Environment manager (DEM), but I think a lot of the points are applicable even if you aren’t using DEM. I’d like to highlight some disadvantages they list of roaming profiles:

Disadvantages
-Large roaming profiles might get corrupted and cause the individual roaming profile to reset completely. As a result, users might spend a lot of time getting all personalized settings back.
-Roaming profiles do not roam across different operating systems. This results in multiple roaming profiles per user in a mixed environment, like desktops and Terminal Services.
-Potential for unnecessary growth of roaming profile, causing long login times.

When I saw these three specifically, I decided to go with folder redirection instead of roaming profiles. Anytime corruption is mentioned I try to avoid it. With VDI projects (let’s be real, most IT projects), you always want to minimize the impact to the end users partially because it will hurt adoption of it or reduce confidence from different groups in the company.

There is more to come with FA File and data protection, so please keep this blog in mind!

Validating SafeMode configuration on your Pure Storage Fleet with Pure1 API via PowerShell

If you are not familiar with Pure Safemode, you should be, check out the details here:

Or some of my thoughts in general here

Each FlashArray, Cloud Block Store, and FlashBlade has a built-in REST API, but so does Pure1–a place that aggregates all reporting for you in one API. Reporting on a Safemode configuration is a useful tool, to ensure our extra protections are configured (if they aren’t reach out to Pure support–for security reasons customers cannot turn it on themselves, nor off).

The Pure1 REST API has a beta release out (v1.1.b) that includes Safemode reporting in the arrays endpoint and it is super easy to pull via PowerShell.

Install the module (if you haven’t):

Install-Module PureStorage.Pure1

Create a new certificate (if you haven’t already) and retrieve the public key

Login to Pure1.purestorage.com as an admin and add the public key to Pure1 and get the application ID.

If you already created an app ID you do not need to do all of the above each time. Just once.

Now normally you can just connect, but the module auto-connects to the latest GA REST version by default, so before you do you need to set a variable to force it to the beta release:

Now connect.

Next, use Get-PureOneArrays and store it in a variable.

If you look at one of the results, you will see each array returned has a new property:

safe_mode

And there are additional properties available stating what is turned on (if at all)

If you see all-disabled, it means it is not enabled on that platform. Now keep in mind, this is a beta API so it may and likely will change by GA release of the REST–especially since our Safemode work is rapidly expanding on the storage platforms.

S.M.A.R.T. Alerts in vmkernel Log with FlashArray™ Hardware-backed Volumes

Hello- Nelson Elam, a Solutions Engineer on Cody’s team at Pure, guest-writing here again.

If you are a current Pure customer and have had ESXi issues that warranted you checking the vmkernel logs of a host, you may have noticed a significant amount of messages similar to this for SCSI:

Cmd(0x45d96d9e6f48) 0x85, CmdSN 0x6 from world 2099867 to dev "naa.624a9370f439f7c5a4ab425000024d83" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0

Or this for NVMe-oF:

WARNING: NvmeScsi: 172: SCSI opcode 0x85 (0x45d9757eeb48) on path vmhba67:C0:T1:L258692 to namespace eui.00f439f7c5a4ab4224a937500003f285 failed with NVMe error status: 0x1 translating to SCSI error
ScsiDeviceIO: 4131: Cmd(0x45d9757eeb48) 0x85, CmdSN 0xc from world 2099855 to dev "eui.00f439f7c5a4ab4224a937500003f285" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0

If you reached out to Pure Storage support to ask what the deal is with this, you were likely told that these are 0x85s and nothing to worry about because it’s a VMware error that doesn’t mean anything with Pure devices.

But why would this be logged and what is happening here?

ESXi regularly checks the S.M.A.R.T. status of attached storage devices, including for array-backed devices that aren’t local. When the SCSI command is received on the FlashArray software, it returns 0x85 with the following sense data back to the ESXi host:

failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0

These can be quite challenging to decode. Luckily, virten.net has a powerful tool for decoding these. When I paste this output into that site, I get the following details:

TypeCodeNameDescription
Host Status[0x0]OKThis status is returned when there is no error on the host side. This is when you will see if there is a status for a Device or Plugin. It is also when you will see Valid sense data instead of Possible sense Data.
Device Status[0x2]CHECK_CONDITIONThis status is returned when a command fails for a specific reason. When a CHECK CONDITION is received, the ESX storage stack will send out a SCSI command 0x3 (REQUEST SENSE) in order to get the SCSI sense data (Sense Key, Additional Sense Code, ASC Qualifier, and other bits). The sense data is listed after Valid sense data in the order of Sense Key, Additional Sense Code, and ASC Qualifier.
Plugin Status[0x0]GOODNo error. (ESXi 5.x / 6.x only)
Sense Key[0x5]ILLEGAL REQUEST
Additional Sense Data20/00INVALID COMMAND OPERATION CODE

The key thing here is the Sense Key which has a value of ILLEGAL REQUEST. The FlashArray software does not support S.M.A.R.T. SCSI requests from hosts, so the FlashArray software returns ILLEGAL REQUEST to the ESXi host to tell the host we don’t support that request type.

This is for two reasons:

1. Since the FlashArray software’s volumes are not a physically attached storage device on the ESXi host, S.M.A.R.T. from the ESXi host doesn’t really make sense.
2. The FlashArray software handles drive failures and drive health independent of ESXi and monitoring the health of these drives that back the volumes is handled by the FlashArray software, not ESXi. You can read more about this in this datasheet.

Great Nelson, thanks for explaining that. Why are you talking about this now?

Pure has been working with VMware to reduce the noise and unnecessary concern caused by these errors. Seeing a failed ScsiDeviceIO in your vmkernel logs is alarming. In vSphere 7.0U3c, VMware fixed this problem and this will now only log once this when the ESXi host boots up instead of as often as every 15 minutes.

This means that in vSphere 7.0U3c if you are doing any ESXi host troubleshooting you no longer have to concern yourself with these errors; for me, this means I won’t have to filter these out in my greps anymore when looking into an ESXi issue in my lab. Great news all around!

Refreshing a VM configuration from a vVol Snapshot

A lot of the time when we talk about vVols and snapshots we talk about restoring the virtual disks (the data vVols). This of course is a huge benefit of vVols–the virtual disks are 1:1 to a volume on the array so the snapshots (and other array features) can be used at a level of virtual disk. Need to restore a database on virtual disk B (the E:\ drive or whatever), just use the snapshot restore to instantly refresh the entire disk. No need to mount a copied datastore, resignature, remove the old disk etc. etc. Just copy from the snapshot to the vVol volume and re-mount the file system in the guest. Fast and easy.

VMware snapshots exist with vVols too–they create array-based copies. But when you restore from them, you restore the whole VM. And the existence of them complicate the VM configuration–extra pointers and files etc. So a common vVol option is to just temporarily use VMware snapshots for backup procedures or for one off protection of VMs while I run an upgrade etc and then delete it when it works.

What if I want to refresh the VM configuration from a snapshot? Keep the data on the disk as is, but refresh the VM config files (VMX mainly) from a snapshot?

This is possible from a VMFS but quite complex. For a vVol VM this is really simple. Process?

  1. Shutdown VM.
  2. Copy from snapshot to config volume.
  3. Reload VMX
  4. Power-on VM.

So for some background, in a vVol world the VM directory (which houses the VMX file, some logs, virtual disk pointers, and some other frivolities) looks like a folder. But in reality it is a logical pointer to a volume on the array. This volume is called a config vVol and each “directory” in the vVol datastore maps to one. This config vVol is actually a mini VMFS. See more details here

Since this is a volume, you can of course take snapshots of it. There are a few ways to do this, either create one off snapshots of it or through protection policies.

Continue reading “Refreshing a VM configuration from a vVol Snapshot”

Affecting Persistent Volumes in VMware Tanzu

Note: This is another guest blog by Kyle Grossmiller. Kyle is a Sr. Solutions Architect at Pure and works with Cody on all things VMware.

VMware Tanzu is a game-changing piece of technology for numerous reasons, but probably the most transformational piece of it is also the most apparent – it provides the capability for the vCenter admin to give resources for both consumers of traditional virtual machines as well as Kubernetes/DevOps users from the same set of compute hosts and storage. This consolidation means that the vCenter admin can more easily see what is being allocated where, as well as gaining insight into what application(s) might be candidates to make the move into a container-based environment from a virtual machine.

A Tanzu deployment is comprised of quite a few moving pieces and a central piece of this is durable storage made possible by persistent volumes. While container nodes and pods are ephemeral by nature (which is one of their major advantages), the data that they consume, produce and manipulate must be performant, portable and often, saved. So, there is obviously a different set of things we care about for persistent data vs the Kubernetes nodes that Tanzu runs in unison with here. For the remainder of this post we will show a couple of quick and easy ways you can change your persistent volumes to suit your application needs. There’s a bit of work and some choices to be made around getting a Tanzu environment up and running in vSphere, and I’d encourage you to check out the VMware Tanzu User Guide on our Pure Storage support site or Cody’s blog series to get some additional information.

With that being said, when a persistent volume is created via either dynamic or static provisioning, one of the first things the application developer needs to decide is what will happen to that volume and data when the application that uses it itself is no longer needed. The default behavior for an SPBM policy/storageclass assigned to a vSphere Namespace is to delete it, but through a simple kubectl patch command line, the persistent volume can be saved for future usage.

To make this change, first get the persistent volume name that you want to Retain/save:

 
$ kubectl get pv
 NAME           CAPACITY    RECLAIM POLICY   STATUS 
 pvc-f37c39fd   5Gi         Delete           Bound 

Next, apply this kubectl command line to it to switch the reclaim policy from Delete to Retain:

​$ kubectl patch pv (PV_Name) -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

So for our PV example:

$ kubectl patch pv pvc-f37c39fd -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

When we run the kubectl get pv command again, we can see it is set to Retain, so we are all set:

$ kubectl get pv
NAME           CAPACITY    RECLAIM POLICY   STATUS  
pvc-f37c39fd   5Gi         Retain           Bound          

If there is anything close to a certainty in the storage world – it is that the longer a volume exists, the more full of data it will become. This becomes even more of a certainty if a persistent volume is retained and reused across multiple application instances for increasing amounts of time. In the vSphere and Supervisor cluster 7.0U2 release VMware has introduced the capability for Online Volume Expansion. What this means is that while in previous versions users had to unbind their persistent volume claim from a pod or node prior to resizing it (otherwise known as offline volume expansion) – now they are able to accomplish that same operation without that step . This is a huge advantage as the offline expansion required that the volume be effectively be taken out of service when additional space was added to it, which could lead to application downtime. With the online volume expansion enhancement that annoyance goes away completely.

Online volume expansion operation is really simple to do. This time we find the persistent volume claim (which is basically the glue between the persistent volume and the application) that we need to expand:

$ kubectl get pvc
NAME             STATUS  VOLUME        CAPACITY
pvc-vvols-mysql  Bound   pvc-f37c39fd  5Gi      

Now we run the following patch command against the PVC name we found above so that it knows to request additional storage for the persistent volume that it is bound to. In this case, we will ask to expand from 5Gi to 6Gi:

$ kubectl patch pvc pvc-vvols-mysql -p '{"spec": {"resources": {"requests":{"storage": "6Gi"}}}}'

After waiting for a few moments for the expansion to complete, we look at the pvc in order to confirm we have the additional space that we asked for and we can see it has been added:

$ kubectl get pvc
NAME              STATUS   VOLUME         CAPACITY
pvc-vvols-mysql   Bound    pvc-f37c39fd   6Gi     

Taking a closer look at the PVC via the describe command shows that it indeed increased the PV size while it remained mounted to the mysql-deployment node under the events section:

$ kubectl describe pvc
Name:          pvc-vvols-mysql
Namespace:     default
StorageClass:  cns-vvols
Status:        Bound
Volume:        pvc-f37c39fd-dbe9-4f27-abe8-bca85bf9e87c
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
               volumehealth.storage.kubernetes.io/health: accessible
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      6Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    mysql-deployment-5d8574cb78-xhhq5
Events:
  Type     Reason                      Age   From                                             Message
  ----     ------                      ----  ----                                             -------
  Warning  ExternalExpanding           52s   volume_expand                                    Ignoring the PVC: didn't find a plugin capable of expanding the volume; waiting for an external controller to process this PVC.
  Normal   Resizing                    52s   external-resizer csi.vsphere.vmware.com          External resizer is resizing volume pvc-f37c39fd-dbe9-4f27-abe8-bca85bf9e87c
  Normal   FileSystemResizeRequired    51s   external-resizer csi.vsphere.vmware.com          Require file system resize of volume on node
  Normal   FileSystemResizeSuccessful  40s   kubelet, tkc-120-workers-mbws2-68d7869b97-sdkgh  MountVolume.NodeExpandVolume succeeded for volume "pvc-f37c39fd-dbe9-4f27-abe8-bca85bf9e87c"

Those are just a couple of the ways we can update our persistent volumes to do what we need them to do within a Tanzu deployment, and we have really just scratched the surface with these few examples. To see how to do more advanced operations like migrating a persistent volume to a different Tanzu Kubernetes Cluster, please head over to our new Tanzu User Guide. Of course, it also is very important to mention that Portworx combined with Tanzu gives us even more features and functionalities like RBAC, automated backup and recovery and a whole lot more. Getting deeper into how Portworx interoperates with Tanzu is what I’m working on next so please stay tuned for some more cool stuff.

What’s new in Pure’s vROps Management Pack 3.1.1 – vVol VM Metrics

Hello- my name is Nelson Elam and I’m a Solutions Engineer at Pure Storage. I am guest writing this blog for use on Cody’s website. I work on Pure’s vROps Management Pack.

With the introduction of Pure’s version 3.1.1 Management Pack for vROps, we now have the ability to monitor vVol VMs and datastores. We’ve updated the user guide here with the new vVol content but I thought it would be nice to cover some of this on this blog as well.

The big feature to come in 3.1.1 is the ability to monitor your Pure-hosted vVol VMs and the volumes backing them. To see some examples of this after installing 3.1.1, first you’ll want to click on the Environment tab then click on FlashArray Resources on the left hand side:


Next expand PureStorage World, the array your Pure-hosted vVol VM is on, then your VM name, and finally expand the volume group (suffix of -vg unless someone has changed it on your array) to see the volumes backing that VM. There are other ways to navigate to the same objects from this list but I find this way to be the most intuitive.

For my example, I’m going to focus on the only data volume connected to this VM because that will be more interesting to me from a performance perspective. This volume is where the OS for this VM lives.

Looking at our VM, I’ve focused on Average IOPS and Average Latency (ms). Using the default view, we can see that on April 29th, we had a spike in average IOPS to 5224; this was when I powered on this VM. Otherwise, IOPS are really low on average. Looking at latency on this volume, we can see it peaks at .82 ms and generally stays below .5 ms.

There are many other metrics available to us on these vVol volumes, but you might recognize them because they are the same as any other volume on a Pure Storage FlashArray!

Thank you for taking the time to read this- I hope you found it helpful.

Creating a File Repository on a vVol Datastore with PowerShell

I wrote about a new feature in vSphere 7.0 U2 here:

What’s New in vSphere 7.0 U2 Storage: Creating a File Repository on a vVol Datastore

This allows you to create a large config vVol in a vVol datastore to store large files. Use cases like ISO repositories etc.

This was at launch only available in the direct API, but as of PowerCLI 12.3 it is available in PowerCLI as well.

Pretty simple to do.

Login to vCenter:

connect-viserver -Server vcenter-70-siteb

Get the datastore manager object from the service instance of vCenter:

Continue reading “Creating a File Repository on a vVol Datastore with PowerShell”