Migrating From SCSI To NVMe on vCenter (Part 1 – Live Migration)

This is going to be broken up into two parts- first, a live migration where no VMs get powered off during the migration; second, a migration where you temporarily power off VMs attached to the SCSI datastore.

Why would you want to do it one way or another?

Pros of live migration:

  • No VM downtime
  • Simpler configuration changes and overlap. Less to go wrong or mess up

Pros of powering off VMs:

  • The total migration time will be significantly less because no data will have to be moved. Currently VMware doesn’t support XCOPY (even on the same array) for NVMe-oF

Great, you’ve decided on a live migration for your VMs because you don’t care about how long it takes; you just want to minimize downtime of your VMs as much as possible. If you haven’t already, you’ll need to follow the guides Pure Storage has for setting up NVMe-oF in your environment.

Once you’ve configured NVMe-oF in your environment, you’ll need to create the namespace (volume), connect it to the appropriate host group, create the NVMe-oF datastore in vCenter and finally storage vMotion your VMs from the SCSI datastore to the NVMe datastore.

Create the Volume

From a FlashArray perspective, this is identical to SCSI except for the slightly different terms and labels. Cody wrote a nice article explaining the differences. Log into your FlashArray, select (1) Storage then (2) Volumes then click the (3) + on the right hand side of the GUI.

In the window that pops up, populate a (1) Name for the namespace (volume), give it a (2) Provisioned Size then click (3) Create.

Note the volume serial number by going to (1) Storage then (2) Volumes, finding the name of your (3) Volume, then (4) clicking on the hyperlink name of it.

On the next window, note the Serial of the volume. We will use this later in vCenter to validate that we are connecting the right namespace.

Connect The Volume To the Appropriate Host Group

Still in the FlashArray GUI, go back to (1) Storage, select (2) Hosts, then select the (3) Host Group you have created for your NVMe-oF hosts. In this case, I am setting this up for NVMe-FC but the steps will be the same for NVMe-RoCE after you have followed the previously linked KB articles.

Next, click the three vertical dots (I think this is called a hamburger) and select Connect.

For the last step in the FlashArray GUI, select the (1) Namespace (volume) you created before then click (2) Connect.

Create The NVMe-oF Datastore

Switching over to vCenter, we’ll first want to create a datastore from the namespace that we’ve just presented to our host group. This process is easier than with SCSI datastores because you do not have to rescan the storage adapters- all you need to do is create a datastore on top of the NVMe namespace that is already present.

(1) Right click on the vSphere cluster you’ve presented the namespace to, hover over (2) Storage, then click (3) New Datastore.

Select (1) VMFS (currently vVols is unsupported by VMware with NVMe-oF) and click (2) Next.

Specify a (1) Name for your datastore, (2) Select a host that the namespace was presented to, select the (3) namespace from the list and click (4) Next. Validate the serial number of the namespace (volume) from the FlashArray GUI before in the Name column.

Select (1) VMFS 6 (who uses 5 anymore anyways?!) and click (2) Next.

Click (1) Next.

Review the details and click (1) Finish.

Validate the hosts are connected to your newly created NVMe-oF datastore by going to the (1) Storage tab, selecting the (2) Datastore Name and clicking on the (3) Hosts tab. If anything looks incorrect here (not all hosts from the cluster are connected, etc), please review your NVMe-oF configuration for issues.

Storage vMotion the VMs from SCSI-backed Datastore(s) to NVMe-backed Datastore(s)

Staying in the vCenter GUI, select the (1) Hosts and Clusters tab, right click on the (2) VM you want to migrate from SCSI to NVMe then select (3) Migrate… from the list that pops up.

Select (1) Change storage only from the window that pops up and click (2) Next.

Select the (1) NVMe datastore you created before then click (2) Next. Optionally you can modify the storage policies for the VM and the virtual disk format.

Finally, verify the details of the migration and click (1) Finish.

And now wait until the VM has migrated to the NVMe-oF datastore. Migrations in general can be very daunting, but luckily with NVMe-oF, it can be extremely simple. Hopefully you found this helpful.

Pure Storage vSphere Remote Plugin™ 5.1.0 launch: vVol Point-in-Time Recovery

We are excited to announce the launch of the latest version of Pure Storage’s remote vSphere plugin, 5.1.0. It includes a number of bug fixes PLUS a highly sought after feature: vVols VM point-in-time (PiT) recovery!

Why am I excited about this feature?

With vVol PiT VM recovery, you can now easily recover an entire VM that was accidentally deleted (and eradicated) or you can restore the state of a VM back to a point in time that you took a snapshot from vCenter directly while using Pure’s vSphere plugin.

The requirements of this are Pure’s vSphere remote plugin 5.1.0 and Purity™ 6.2.6 or higher for PiT revert and for PiT VM undelete with a vVol VM that has had its FlashArray™ volumes eradicated from the FlashArray itself. If you’re undeleting a vVol VM that has not been eradicated yet, that functionality is present for Purity versions 6.1 and lower.

For PiT VM revert, you will also need to make sure that you have snapshots of all of the volumes associated with the vVol VM except swap- at least one data volume and one configuration volume.

For VM undelete before the volumes have been eradicated, you will need a snapshot of the vVol VM’s configuration volume.

For VM undelete after the vVol-backed VM has been eradicated, you’ll need a FlashArray protection group snapshot of all the VM’s data volumes, managed snapshots and configuration volumes.

Rather than rehash what my teammate Alex Carver has put a lot of work into, I’m just going to link to the KB and videos he created:

Download the new plugin (part of Pure’s OVA), read the release notes and test out vVol PiT recovery today! Like a lot of things, it’s better to have some understanding of what’s happening and why before needing something that might be part of your recovery process. Please note that you can also upgrade in-place from 5.0.0 to 5.1.0 (and future remote plugin releases) by following this guide.

vCenter Storage Provider “Refresh certificate” Functionality Restored

This will be a short blog, partially because my teammate Alex Carver already wrote a great blog that covers one workaround for this button not working that uses vCenter’s MOB.

If you have been using self-signed certificates in your vVols environment since vCenter 6.7 and updated to vCenter 7.0, you might have noticed something frustrating when trying to refresh those certificates manually: the button was greyed out! If you were like me, you were probably wondering why this useful functionality was removed and thought maybe it was for security reasons; your concerns might have been validated when searching VMware’s KB system and finding this KB that read like it was functionality that was removed on purpose (recently updated to reflect the current situation better).

Turns out my guess was wrong and that KB was a little misleading. VMware has brought this button’s functionality back in vCenter 7.0U3d and higher. You might say to yourself “that’s great Nelson, but I don’t upgrade my production vCenter whenever a new vCenter version comes out”. If you want a simpler workflow than re-creating the storage providers before you upgrade to newer versions of vCenter when the certificates expire eventually, Alex Carver has the method for you which uses vCenter’s MOB to refresh the storage providers without re-creating them.

Pure Storage’s vROps Management Pack ™ 3.2.0 – New Features and Changes

Pure Storage recently launched a new management pack for vROps that had a number of important fixes and some changes to the interface. You can download it here and find the full release notes here. What’s new?

  • Interface changes
    • Updated icons
    • Restructuring of Pure Storage objects in the Object view of vROps
  • Add Offload Snapshot capacity metric
  • Add FlashArray Software™ version property

Let’s go over the interface changes first. If you navigate to Environment -> Object Browser -> Pure Storage FlashArray -> FlashArray Resources -> PureStorage World and expand an array, the layout will look quite different than what was there before. For starters, the icons have almost all been updated to mirror what you would expect to see on a modern FlashArray Purity version (or vCenter if that is a vCenter object). We made this change to make the vROps management pack experience as close to the FlashArray experience as possible.

Additionally, we moved the structure of the objects around to be more consistent with what you’d expect from the FlashArray. No objects were removed and the same object can be listed in multiple places where it makes sense (for example, if you expand a Hosts group, you will see the pertinent volumes there as well as under the Volumes group).

Next, we’ve added the Offload Snapshot capacity metric in this version as well as a FlashArray Offload Target object. The Offload Target object is visible under Protection and you can see the current space used by that Offload Target in the badge for that object; additionally, there is a Capacity metric for this object that shows historical consumption.

Lastly, you can now retrieve the Purity version of the array directly from vROps to help plan your FlashArrays’ upgrades. This information is found by selecting a FlashArray and going to Metrics -> Properties -> Details -> Purity Version.

Native Pure Storage FlashArray™ File Replication – Purity 6.3


With the release of Purity 6.3, Native FA File replication has been added to the Pure Storage FlashArray™ software. This adds an often important feature to the FA File folder redirection solution I wrote about last year. Pure Storage is referring to this feature as ActiveDR for File Services.

ActiveDR for File Services is a useful feature if you’ve set up or are going to set up folder redirection on FA File and you would like the file data to be replicated asynchronously to a different array, whether that FlashArray hardware is at the same site or a different one. This feature is included with FlashArray.

This allows you to use your FlashArray for native block and file workloads that need the protection that replication provides and allow you to benefit from the great data reduction rate that FlashArray is known for with those replicated file sets.

Now, if you lose a site or an array for some reason, the file workload you have hosted on FA File can be recovered natively on FlashArray easily and quickly.

There are some differences between file and block workloads when it comes to ActiveDR replication. You can read more in the ActiveDR for File Services section of this Pure KB.

Horizon Folder Redirection Hosted on FlashArray™ File

Late last year, I wrote a KB for a solution that I wanted to bring up here- hosting Horizon’s VDI user directories on FlashArray™ File with folder redirection controlled through a group policy object (GPO). I’d like to discuss this for a couple of reasons:

1. Configuring FA File was surprising easy, especially compared to what I remember from setting up a Windows file server was for the same purpose in a previous role.
2. Why I landed on using folder redirection for this KB instead of roaming profiles or another solution for user shares in a VDI environment.

When I have managed or set up VDI environments from scratch in previous jobs, there were always a ton of considerations that went into the VDI environment. From determining the appropriate amount of virtual resources to deploy to each VDI user to determining how much hardware I actually needed to buy to support the full deployment, each step can be more painful than the last. Any opportunity we can take to help ourselves be successful in the project is a good step to invest in. But when that step is easier and I don’t have to invest any resources to get the benefit of improving the success of the project, I have to take a step back and appreciate what just went so well.

ComputerEntryFlashArrayConfiguration.png


It took me roughly 30 minutes to deploy and configure FA File in my existing Active Directory environment in my lab the first time. That included carefully digesting all the applicable new-to-me Pure documentation. From what I can recall with this process from my previous roles, that was at best a 2 hour job with a carefully put together and well documented Active Directory environment with automated Windows server deployments; at worst, that might have taken me a full day or two when I had to build everything from scratch. When any task took a day or more, I always had interrupts that would drag the process out and I ended up taking more time to review what I had done and what I needed to do from a documentation perspective.

AD create dialog.png


On the point of why I used folder redirection instead of roaming profiles with Active Directory, VMware has this very helpful KB that outlines decisions you might make if you are using Dynamic Environment manager (DEM), but I think a lot of the points are applicable even if you aren’t using DEM. I’d like to highlight some disadvantages they list of roaming profiles:

Disadvantages
-Large roaming profiles might get corrupted and cause the individual roaming profile to reset completely. As a result, users might spend a lot of time getting all personalized settings back.
-Roaming profiles do not roam across different operating systems. This results in multiple roaming profiles per user in a mixed environment, like desktops and Terminal Services.
-Potential for unnecessary growth of roaming profile, causing long login times.

When I saw these three specifically, I decided to go with folder redirection instead of roaming profiles. Anytime corruption is mentioned I try to avoid it. With VDI projects (let’s be real, most IT projects), you always want to minimize the impact to the end users partially because it will hurt adoption of it or reduce confidence from different groups in the company.

There is more to come with FA File and data protection, so please keep this blog in mind!

Validating SafeMode configuration on your Pure Storage Fleet with Pure1 API via PowerShell

If you are not familiar with Pure Safemode, you should be, check out the details here:

Or some of my thoughts in general here

Each FlashArray, Cloud Block Store, and FlashBlade has a built-in REST API, but so does Pure1–a place that aggregates all reporting for you in one API. Reporting on a Safemode configuration is a useful tool, to ensure our extra protections are configured (if they aren’t reach out to Pure support–for security reasons customers cannot turn it on themselves, nor off).

The Pure1 REST API has a beta release out (v1.1.b) that includes Safemode reporting in the arrays endpoint and it is super easy to pull via PowerShell.

Install the module (if you haven’t):

Install-Module PureStorage.Pure1

Create a new certificate (if you haven’t already) and retrieve the public key

Login to Pure1.purestorage.com as an admin and add the public key to Pure1 and get the application ID.

If you already created an app ID you do not need to do all of the above each time. Just once.

Now normally you can just connect, but the module auto-connects to the latest GA REST version by default, so before you do you need to set a variable to force it to the beta release:

Now connect.

Next, use Get-PureOneArrays and store it in a variable.

If you look at one of the results, you will see each array returned has a new property:

safe_mode

And there are additional properties available stating what is turned on (if at all)

If you see all-disabled, it means it is not enabled on that platform. Now keep in mind, this is a beta API so it may and likely will change by GA release of the REST–especially since our Safemode work is rapidly expanding on the storage platforms.

S.M.A.R.T. Alerts in vmkernel Log with FlashArray™ Hardware-backed Volumes

Hello- Nelson Elam, a Solutions Engineer on Cody’s team at Pure, guest-writing here again.

If you are a current Pure customer and have had ESXi issues that warranted you checking the vmkernel logs of a host, you may have noticed a significant amount of messages similar to this for SCSI:

Cmd(0x45d96d9e6f48) 0x85, CmdSN 0x6 from world 2099867 to dev "naa.624a9370f439f7c5a4ab425000024d83" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0

Or this for NVMe-oF:

WARNING: NvmeScsi: 172: SCSI opcode 0x85 (0x45d9757eeb48) on path vmhba67:C0:T1:L258692 to namespace eui.00f439f7c5a4ab4224a937500003f285 failed with NVMe error status: 0x1 translating to SCSI error
ScsiDeviceIO: 4131: Cmd(0x45d9757eeb48) 0x85, CmdSN 0xc from world 2099855 to dev "eui.00f439f7c5a4ab4224a937500003f285" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0

If you reached out to Pure Storage support to ask what the deal is with this, you were likely told that these are 0x85s and nothing to worry about because it’s a VMware error that doesn’t mean anything with Pure devices.

But why would this be logged and what is happening here?

ESXi regularly checks the S.M.A.R.T. status of attached storage devices, including for array-backed devices that aren’t local. When the SCSI command is received on the FlashArray software, it returns 0x85 with the following sense data back to the ESXi host:

failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0

These can be quite challenging to decode. Luckily, virten.net has a powerful tool for decoding these. When I paste this output into that site, I get the following details:

TypeCodeNameDescription
Host Status[0x0]OKThis status is returned when there is no error on the host side. This is when you will see if there is a status for a Device or Plugin. It is also when you will see Valid sense data instead of Possible sense Data.
Device Status[0x2]CHECK_CONDITIONThis status is returned when a command fails for a specific reason. When a CHECK CONDITION is received, the ESX storage stack will send out a SCSI command 0x3 (REQUEST SENSE) in order to get the SCSI sense data (Sense Key, Additional Sense Code, ASC Qualifier, and other bits). The sense data is listed after Valid sense data in the order of Sense Key, Additional Sense Code, and ASC Qualifier.
Plugin Status[0x0]GOODNo error. (ESXi 5.x / 6.x only)
Sense Key[0x5]ILLEGAL REQUEST
Additional Sense Data20/00INVALID COMMAND OPERATION CODE

The key thing here is the Sense Key which has a value of ILLEGAL REQUEST. The FlashArray software does not support S.M.A.R.T. SCSI requests from hosts, so the FlashArray software returns ILLEGAL REQUEST to the ESXi host to tell the host we don’t support that request type.

This is for two reasons:

1. Since the FlashArray software’s volumes are not a physically attached storage device on the ESXi host, S.M.A.R.T. from the ESXi host doesn’t really make sense.
2. The FlashArray software handles drive failures and drive health independent of ESXi and monitoring the health of these drives that back the volumes is handled by the FlashArray software, not ESXi. You can read more about this in this datasheet.

Great Nelson, thanks for explaining that. Why are you talking about this now?

Pure has been working with VMware to reduce the noise and unnecessary concern caused by these errors. Seeing a failed ScsiDeviceIO in your vmkernel logs is alarming. In vSphere 7.0U3c, VMware fixed this problem and this will now only log once this when the ESXi host boots up instead of as often as every 15 minutes.

This means that in vSphere 7.0U3c if you are doing any ESXi host troubleshooting you no longer have to concern yourself with these errors; for me, this means I won’t have to filter these out in my greps anymore when looking into an ESXi issue in my lab. Great news all around!

Refreshing a VM configuration from a vVol Snapshot

A lot of the time when we talk about vVols and snapshots we talk about restoring the virtual disks (the data vVols). This of course is a huge benefit of vVols–the virtual disks are 1:1 to a volume on the array so the snapshots (and other array features) can be used at a level of virtual disk. Need to restore a database on virtual disk B (the E:\ drive or whatever), just use the snapshot restore to instantly refresh the entire disk. No need to mount a copied datastore, resignature, remove the old disk etc. etc. Just copy from the snapshot to the vVol volume and re-mount the file system in the guest. Fast and easy.

VMware snapshots exist with vVols too–they create array-based copies. But when you restore from them, you restore the whole VM. And the existence of them complicate the VM configuration–extra pointers and files etc. So a common vVol option is to just temporarily use VMware snapshots for backup procedures or for one off protection of VMs while I run an upgrade etc and then delete it when it works.

What if I want to refresh the VM configuration from a snapshot? Keep the data on the disk as is, but refresh the VM config files (VMX mainly) from a snapshot?

This is possible from a VMFS but quite complex. For a vVol VM this is really simple. Process?

  1. Shutdown VM.
  2. Copy from snapshot to config volume.
  3. Reload VMX
  4. Power-on VM.

So for some background, in a vVol world the VM directory (which houses the VMX file, some logs, virtual disk pointers, and some other frivolities) looks like a folder. But in reality it is a logical pointer to a volume on the array. This volume is called a config vVol and each “directory” in the vVol datastore maps to one. This config vVol is actually a mini VMFS. See more details here

Since this is a volume, you can of course take snapshots of it. There are a few ways to do this, either create one off snapshots of it or through protection policies.

Continue reading “Refreshing a VM configuration from a vVol Snapshot”

Affecting Persistent Volumes in VMware Tanzu

Note: This is another guest blog by Kyle Grossmiller. Kyle is a Sr. Solutions Architect at Pure and works with Cody on all things VMware.

VMware Tanzu is a game-changing piece of technology for numerous reasons, but probably the most transformational piece of it is also the most apparent – it provides the capability for the vCenter admin to give resources for both consumers of traditional virtual machines as well as Kubernetes/DevOps users from the same set of compute hosts and storage. This consolidation means that the vCenter admin can more easily see what is being allocated where, as well as gaining insight into what application(s) might be candidates to make the move into a container-based environment from a virtual machine.

A Tanzu deployment is comprised of quite a few moving pieces and a central piece of this is durable storage made possible by persistent volumes. While container nodes and pods are ephemeral by nature (which is one of their major advantages), the data that they consume, produce and manipulate must be performant, portable and often, saved. So, there is obviously a different set of things we care about for persistent data vs the Kubernetes nodes that Tanzu runs in unison with here. For the remainder of this post we will show a couple of quick and easy ways you can change your persistent volumes to suit your application needs. There’s a bit of work and some choices to be made around getting a Tanzu environment up and running in vSphere, and I’d encourage you to check out the VMware Tanzu User Guide on our Pure Storage support site or Cody’s blog series to get some additional information.

With that being said, when a persistent volume is created via either dynamic or static provisioning, one of the first things the application developer needs to decide is what will happen to that volume and data when the application that uses it itself is no longer needed. The default behavior for an SPBM policy/storageclass assigned to a vSphere Namespace is to delete it, but through a simple kubectl patch command line, the persistent volume can be saved for future usage.

To make this change, first get the persistent volume name that you want to Retain/save:

 
$ kubectl get pv
 NAME           CAPACITY    RECLAIM POLICY   STATUS 
 pvc-f37c39fd   5Gi         Delete           Bound 

Next, apply this kubectl command line to it to switch the reclaim policy from Delete to Retain:

​$ kubectl patch pv (PV_Name) -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

So for our PV example:

$ kubectl patch pv pvc-f37c39fd -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

When we run the kubectl get pv command again, we can see it is set to Retain, so we are all set:

$ kubectl get pv
NAME           CAPACITY    RECLAIM POLICY   STATUS  
pvc-f37c39fd   5Gi         Retain           Bound          

If there is anything close to a certainty in the storage world – it is that the longer a volume exists, the more full of data it will become. This becomes even more of a certainty if a persistent volume is retained and reused across multiple application instances for increasing amounts of time. In the vSphere and Supervisor cluster 7.0U2 release VMware has introduced the capability for Online Volume Expansion. What this means is that while in previous versions users had to unbind their persistent volume claim from a pod or node prior to resizing it (otherwise known as offline volume expansion) – now they are able to accomplish that same operation without that step . This is a huge advantage as the offline expansion required that the volume be effectively be taken out of service when additional space was added to it, which could lead to application downtime. With the online volume expansion enhancement that annoyance goes away completely.

Online volume expansion operation is really simple to do. This time we find the persistent volume claim (which is basically the glue between the persistent volume and the application) that we need to expand:

$ kubectl get pvc
NAME             STATUS  VOLUME        CAPACITY
pvc-vvols-mysql  Bound   pvc-f37c39fd  5Gi      

Now we run the following patch command against the PVC name we found above so that it knows to request additional storage for the persistent volume that it is bound to. In this case, we will ask to expand from 5Gi to 6Gi:

$ kubectl patch pvc pvc-vvols-mysql -p '{"spec": {"resources": {"requests":{"storage": "6Gi"}}}}'

After waiting for a few moments for the expansion to complete, we look at the pvc in order to confirm we have the additional space that we asked for and we can see it has been added:

$ kubectl get pvc
NAME              STATUS   VOLUME         CAPACITY
pvc-vvols-mysql   Bound    pvc-f37c39fd   6Gi     

Taking a closer look at the PVC via the describe command shows that it indeed increased the PV size while it remained mounted to the mysql-deployment node under the events section:

$ kubectl describe pvc
Name:          pvc-vvols-mysql
Namespace:     default
StorageClass:  cns-vvols
Status:        Bound
Volume:        pvc-f37c39fd-dbe9-4f27-abe8-bca85bf9e87c
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
               volumehealth.storage.kubernetes.io/health: accessible
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      6Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    mysql-deployment-5d8574cb78-xhhq5
Events:
  Type     Reason                      Age   From                                             Message
  ----     ------                      ----  ----                                             -------
  Warning  ExternalExpanding           52s   volume_expand                                    Ignoring the PVC: didn't find a plugin capable of expanding the volume; waiting for an external controller to process this PVC.
  Normal   Resizing                    52s   external-resizer csi.vsphere.vmware.com          External resizer is resizing volume pvc-f37c39fd-dbe9-4f27-abe8-bca85bf9e87c
  Normal   FileSystemResizeRequired    51s   external-resizer csi.vsphere.vmware.com          Require file system resize of volume on node
  Normal   FileSystemResizeSuccessful  40s   kubelet, tkc-120-workers-mbws2-68d7869b97-sdkgh  MountVolume.NodeExpandVolume succeeded for volume "pvc-f37c39fd-dbe9-4f27-abe8-bca85bf9e87c"

Those are just a couple of the ways we can update our persistent volumes to do what we need them to do within a Tanzu deployment, and we have really just scratched the surface with these few examples. To see how to do more advanced operations like migrating a persistent volume to a different Tanzu Kubernetes Cluster, please head over to our new Tanzu User Guide. Of course, it also is very important to mention that Portworx combined with Tanzu gives us even more features and functionalities like RBAC, automated backup and recovery and a whole lot more. Getting deeper into how Portworx interoperates with Tanzu is what I’m working on next so please stay tuned for some more cool stuff.