S.M.A.R.T. Alerts in vmkernel Log with FlashArray™ Hardware-backed Volumes

Hello- Nelson Elam, a Solutions Engineer on Cody’s team at Pure, guest-writing here again.

If you are a current Pure customer and have had ESXi issues that warranted you checking the vmkernel logs of a host, you may have noticed a significant amount of messages similar to this for SCSI:

Cmd(0x45d96d9e6f48) 0x85, CmdSN 0x6 from world 2099867 to dev "naa.624a9370f439f7c5a4ab425000024d83" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0

Or this for NVMe-oF:

WARNING: NvmeScsi: 172: SCSI opcode 0x85 (0x45d9757eeb48) on path vmhba67:C0:T1:L258692 to namespace eui.00f439f7c5a4ab4224a937500003f285 failed with NVMe error status: 0x1 translating to SCSI error
ScsiDeviceIO: 4131: Cmd(0x45d9757eeb48) 0x85, CmdSN 0xc from world 2099855 to dev "eui.00f439f7c5a4ab4224a937500003f285" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0

If you reached out to Pure Storage support to ask what the deal is with this, you were likely told that these are 0x85s and nothing to worry about because it’s a VMware error that doesn’t mean anything with Pure devices.

But why would this be logged and what is happening here?

ESXi regularly checks the S.M.A.R.T. status of attached storage devices, including for array-backed devices that aren’t local. When the SCSI command is received on the FlashArray software, it returns 0x85 with the following sense data back to the ESXi host:

failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0

These can be quite challenging to decode. Luckily, virten.net has a powerful tool for decoding these. When I paste this output into that site, I get the following details:

TypeCodeNameDescription
Host Status[0x0]OKThis status is returned when there is no error on the host side. This is when you will see if there is a status for a Device or Plugin. It is also when you will see Valid sense data instead of Possible sense Data.
Device Status[0x2]CHECK_CONDITIONThis status is returned when a command fails for a specific reason. When a CHECK CONDITION is received, the ESX storage stack will send out a SCSI command 0x3 (REQUEST SENSE) in order to get the SCSI sense data (Sense Key, Additional Sense Code, ASC Qualifier, and other bits). The sense data is listed after Valid sense data in the order of Sense Key, Additional Sense Code, and ASC Qualifier.
Plugin Status[0x0]GOODNo error. (ESXi 5.x / 6.x only)
Sense Key[0x5]ILLEGAL REQUEST
Additional Sense Data20/00INVALID COMMAND OPERATION CODE

The key thing here is the Sense Key which has a value of ILLEGAL REQUEST. The FlashArray software does not support S.M.A.R.T. SCSI requests from hosts, so the FlashArray software returns ILLEGAL REQUEST to the ESXi host to tell the host we don’t support that request type.

This is for two reasons:

1. Since the FlashArray software’s volumes are not a physically attached storage device on the ESXi host, S.M.A.R.T. from the ESXi host doesn’t really make sense.
2. The FlashArray software handles drive failures and drive health independent of ESXi and monitoring the health of these drives that back the volumes is handled by the FlashArray software, not ESXi. You can read more about this in this datasheet.

Great Nelson, thanks for explaining that. Why are you talking about this now?

Pure has been working with VMware to reduce the noise and unnecessary concern caused by these errors. Seeing a failed ScsiDeviceIO in your vmkernel logs is alarming. In vSphere 7.0U3c, VMware fixed this problem and this will now only log once this when the ESXi host boots up instead of as often as every 15 minutes.

This means that in vSphere 7.0U3c if you are doing any ESXi host troubleshooting you no longer have to concern yourself with these errors; for me, this means I won’t have to filter these out in my greps anymore when looking into an ESXi issue in my lab. Great news all around!

Affecting Persistent Volumes in VMware Tanzu

Note: This is another guest blog by Kyle Grossmiller. Kyle is a Sr. Solutions Architect at Pure and works with Cody on all things VMware.

VMware Tanzu is a game-changing piece of technology for numerous reasons, but probably the most transformational piece of it is also the most apparent – it provides the capability for the vCenter admin to give resources for both consumers of traditional virtual machines as well as Kubernetes/DevOps users from the same set of compute hosts and storage. This consolidation means that the vCenter admin can more easily see what is being allocated where, as well as gaining insight into what application(s) might be candidates to make the move into a container-based environment from a virtual machine.

A Tanzu deployment is comprised of quite a few moving pieces and a central piece of this is durable storage made possible by persistent volumes. While container nodes and pods are ephemeral by nature (which is one of their major advantages), the data that they consume, produce and manipulate must be performant, portable and often, saved. So, there is obviously a different set of things we care about for persistent data vs the Kubernetes nodes that Tanzu runs in unison with here. For the remainder of this post we will show a couple of quick and easy ways you can change your persistent volumes to suit your application needs. There’s a bit of work and some choices to be made around getting a Tanzu environment up and running in vSphere, and I’d encourage you to check out the VMware Tanzu User Guide on our Pure Storage support site or Cody’s blog series to get some additional information.

With that being said, when a persistent volume is created via either dynamic or static provisioning, one of the first things the application developer needs to decide is what will happen to that volume and data when the application that uses it itself is no longer needed. The default behavior for an SPBM policy/storageclass assigned to a vSphere Namespace is to delete it, but through a simple kubectl patch command line, the persistent volume can be saved for future usage.

To make this change, first get the persistent volume name that you want to Retain/save:

 
$ kubectl get pv
 NAME           CAPACITY    RECLAIM POLICY   STATUS 
 pvc-f37c39fd   5Gi         Delete           Bound 

Next, apply this kubectl command line to it to switch the reclaim policy from Delete to Retain:

​$ kubectl patch pv (PV_Name) -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

So for our PV example:

$ kubectl patch pv pvc-f37c39fd -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

When we run the kubectl get pv command again, we can see it is set to Retain, so we are all set:

$ kubectl get pv
NAME           CAPACITY    RECLAIM POLICY   STATUS  
pvc-f37c39fd   5Gi         Retain           Bound          

If there is anything close to a certainty in the storage world – it is that the longer a volume exists, the more full of data it will become. This becomes even more of a certainty if a persistent volume is retained and reused across multiple application instances for increasing amounts of time. In the vSphere and Supervisor cluster 7.0U2 release VMware has introduced the capability for Online Volume Expansion. What this means is that while in previous versions users had to unbind their persistent volume claim from a pod or node prior to resizing it (otherwise known as offline volume expansion) – now they are able to accomplish that same operation without that step . This is a huge advantage as the offline expansion required that the volume be effectively be taken out of service when additional space was added to it, which could lead to application downtime. With the online volume expansion enhancement that annoyance goes away completely.

Online volume expansion operation is really simple to do. This time we find the persistent volume claim (which is basically the glue between the persistent volume and the application) that we need to expand:

$ kubectl get pvc
NAME             STATUS  VOLUME        CAPACITY
pvc-vvols-mysql  Bound   pvc-f37c39fd  5Gi      

Now we run the following patch command against the PVC name we found above so that it knows to request additional storage for the persistent volume that it is bound to. In this case, we will ask to expand from 5Gi to 6Gi:

$ kubectl patch pvc pvc-vvols-mysql -p '{"spec": {"resources": {"requests":{"storage": "6Gi"}}}}'

After waiting for a few moments for the expansion to complete, we look at the pvc in order to confirm we have the additional space that we asked for and we can see it has been added:

$ kubectl get pvc
NAME              STATUS   VOLUME         CAPACITY
pvc-vvols-mysql   Bound    pvc-f37c39fd   6Gi     

Taking a closer look at the PVC via the describe command shows that it indeed increased the PV size while it remained mounted to the mysql-deployment node under the events section:

$ kubectl describe pvc
Name:          pvc-vvols-mysql
Namespace:     default
StorageClass:  cns-vvols
Status:        Bound
Volume:        pvc-f37c39fd-dbe9-4f27-abe8-bca85bf9e87c
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
               volumehealth.storage.kubernetes.io/health: accessible
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      6Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    mysql-deployment-5d8574cb78-xhhq5
Events:
  Type     Reason                      Age   From                                             Message
  ----     ------                      ----  ----                                             -------
  Warning  ExternalExpanding           52s   volume_expand                                    Ignoring the PVC: didn't find a plugin capable of expanding the volume; waiting for an external controller to process this PVC.
  Normal   Resizing                    52s   external-resizer csi.vsphere.vmware.com          External resizer is resizing volume pvc-f37c39fd-dbe9-4f27-abe8-bca85bf9e87c
  Normal   FileSystemResizeRequired    51s   external-resizer csi.vsphere.vmware.com          Require file system resize of volume on node
  Normal   FileSystemResizeSuccessful  40s   kubelet, tkc-120-workers-mbws2-68d7869b97-sdkgh  MountVolume.NodeExpandVolume succeeded for volume "pvc-f37c39fd-dbe9-4f27-abe8-bca85bf9e87c"

Those are just a couple of the ways we can update our persistent volumes to do what we need them to do within a Tanzu deployment, and we have really just scratched the surface with these few examples. To see how to do more advanced operations like migrating a persistent volume to a different Tanzu Kubernetes Cluster, please head over to our new Tanzu User Guide. Of course, it also is very important to mention that Portworx combined with Tanzu gives us even more features and functionalities like RBAC, automated backup and recovery and a whole lot more. Getting deeper into how Portworx interoperates with Tanzu is what I’m working on next so please stay tuned for some more cool stuff.

What’s New in vSphere 7.0 U2 Storage: Creating a File Repository on a vVol Datastore

This feature has many names. Creating a larger config vVol. Creating a sub-vVol datastore. Creating an ISO repository. Etc.

In 7.0U2, VMware added a new feature that supports creating a custom size config vVol–while this was technically possible in earlier releases, it was not supported. Also, I should note that this is not supported by all vVol vendors, so of course speak to your vendor first.

First to review what a config vVol is check out this post:

What is a Config VVol Anyways?

In short, it is a mini VMFS that gets created when you create a directory in a vVol datastore (most commonly created by creating a new VM). This defaults to 4 GB in size. Enough to store the general VM files; some logs, VMDK pointers, vmx file, and some other frivolities.

The issue though is that this was not large enough to store large things like ISOs or vib files or whatever. So if you tried to upload something to a vVol datastore folder it would fail with an out-of-space issue. And you cannot upload to the root of a vVol datastore because a vVol datastore is not a file system. So you had to use VMFS or NFS to store those objects.

This is no longer the case.

Continue reading “What’s New in vSphere 7.0 U2 Storage: Creating a File Repository on a vVol Datastore”

Digging into vSphere Workload Management Options

Note: This is another guest blog by Kyle Grossmiller. Kyle is a Sr. Solutions Architect at Pure and works with Cody on all things VMware.

VMware Tanzu is a great way for VMware users to manage their virtual machine environments while in parallel coming up to speed with containers all under the same familiar pane of glass. In fact, that’s possibly the biggest value proposition that Tanzu gives us today: extending vSphere and all of its enterprise features and goodness to the realm of K8s in a recognizable context.

There’s a catch, though. Before one can start to use Tanzu one has to get Tanzu setup. While it’s reasonably straight forward to do so – it is also important to note that there are multiple ways to enable Tanzu and to distinguish the pros/cons of each of them. It’s also key to understand what these underlying components are and how they interact to help troubleshoot any potential problems. This post will focus on two of these methods: vCenter Network Option (HA-Proxy) and NSX-T (VMware Cloud Foundation). Cody covered the 3rd option of directly deploying Tanzu Kubernetes Grid to vSphere in an earlier post that can be found here.

It’s critical to note that you must be at a minimum of vSphere 7 before you can use either of these below methods we will cover. ESXi 6.7U3 and up is supported via the method shown in Cody’s post I linked above. The two Workload Management enablement networking options are built right into the vSphere 7 UI under Workload Management when you add a cluster:

The other important prerequisite is that you will need one or more SPBM (Storage Policy Based Management) policies defined in order to get the Supervisor Cluster up and running. There are a couple of KB articles on our Platform Guide which shows how to do this on the FlashArray and the differences between VMFS and vVols based policies.

So let’s lay out the key components and differences between the vCenter Server Network and NSX-T backed Workload Management options and provide some more information on how to enable each of them…

The vCenter Server Network option allows customers to get Workload Management working in their vSphere environment by deploying a single OVA (known as HA-Proxy) and does not require any other external products like NSX-T or SDDC Manager. This OVA and associated information can be found at this link on GitHub. The main benefit of this option is the relative simplicity of getting Tanzu running since the only items you need to setup are a distributed switch portgroup to handle your Kubernetes ingress/egress ranges and the HA-Proxy OVA itself. The HA-Proxy will act as the load balancer for kubernetes traffic and provide the supervisor cluster API endpoint. The downside is that the HA-Proxy VM represents a single point of failure and larger kubernetes deployments at some point will more than likely overwhelm the CPU/memory resources available to it. This option is best for those who want to look at Tanzu in a POC/exploratory type of setup.

Here’s a quick narrated technical video I created showing how to setup Workload Management with the HA-Proxy OVA:

After Workload Management was enabled, I created another quick demo video showing how to create Namespaces and deploy a Tanzu Kubernetes Guest Cluster:

NSX-T based Workload Management is recommended to be backed/managed by a VMware Cloud Foundation deployment. This option gives you all of the enterprise grade features, resilience and lifecycle management that comes embedded with SDDC Manager. The con of this option is there is more setup work and moving pieces involved than the other Tanzu deployment choices and it requires additional licensing once the trial period expires. The key component that needs to be setup for Tanzu in particular is called an NSX-T Edge Cluster. An Edge Cluster is comprised of at least one (but really it should be two for resiliency) Edge VMs which help route and load balance network traffic from your top of rack switch to the underlying kubernetes deployments. The Edge Cluster deployment can be automated to a large extent from within SDDC Manager via a wizard.

In our lab, I went the route of manually deploying a 2 node Edge Cluster within NSX-T as this gives a better ‘under the hood’ view of how everything works together. As most customers likely do not have top of rack switch access to setup BGP, I also decided to use the static routing option within NSX-T. Here’s a video showing how to set this up end-to-end:

With the EdgeCluster built, the next step is to enable Workload Management which is shown in this video:

Hopefully this post has provided a bit of guidance and insight in terms of what Tanzu solution might be right for your environment. We are continuing to investigate and document how to best leverage the FlashArray with kubernetes so please check back here and our platform guide often for updates. Thanks for reading.

ESXi NVMe-oF Namespace IDs, LUNs, and other Identifiers

In the world of SCSI, a storage device is generally addressed by two things:

  1. LUN–Logical Unit Number. This is a number used to address the device down a specific path to a specific array, for a specific host. So it is not a unique number really, it is not guaranteed to be unique to an array, to a host, or a volume. So for every path to a volume there could be a different LUN number. Think of it like a street address. 100 Maple St. There are many “100 Maple Streets”. So it requires the city, the state/province/etc, the country to really be meaningful. And a street name can change. So can other things. So it can usually get you want you want, but it isn’t guaranteed.
  2. Serial number. This is a globally unique identifier of the volume. This means it is entirely unique for that volume and it cannot be change. It is the same for everyone and everything who uses that volume. To continue our metaphor, look at it like the GPS coordinates of the house instead of the address. It will get you where you need, always.

So how does this change with NVMe? Well these things still exist, but how they interact is…different.

Now, first, let me remind that generally these concepts are vendor neutral, but how things are generated, reported, and even sometimes named vary. So I write this for Pure Storage, so keep that in mind.

Continue reading “ESXi NVMe-oF Namespace IDs, LUNs, and other Identifiers”

Tanzu Kubernetes 1.2 Part 2: Deploying a Tanzu Kubernetes Guest Cluster

So in the previous post, I wrote about deploying a Tanzu Kubernetes Grid management cluster.

VMware defines this as:

A management cluster is the first element that you deploy when you create a Tanzu Kubernetes Grid instance. The management cluster is a Kubernetes cluster that performs the role of the primary management and operational center for the Tanzu Kubernetes Grid instance. This is where Cluster API runs to create the Tanzu Kubernetes clusters in which your application workloads run, and where you configure the shared and in-cluster services that the clusters use.

https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.2/vmware-tanzu-kubernetes-grid-12/GUID-tkg-concepts.html

In other words this is the cluster that manages them all. This is your entry point to the rest. You can deploy applications in it, but not usually the right move. Generally you want to deploy clusters that are specifically devoted to running workloads.

  1. Tanzu Kubernetes 1.2 Part 1: Deploying a Tanzu Kubernetes Management Cluster
  2. Tanzu Kubernetes 1.2 Part 2: Deploying a Tanzu Kubernetes Guest Cluster
  3. Tanzu Kubernetes 1.2 Part 3: Authenticating Tanzu Kubernetes Guest Clusters with Kubectl

These clusters are called Tanzu Kubernetes Cluster. Which VMware defines as:

After you have deployed a management cluster, you use the Tanzu Kubernetes Grid CLI to deploy CNCF conformant Kubernetes clusters and manage their lifecycle. These clusters, known as Tanzu Kubernetes clusters, are the clusters that handle your application workloads, that you manage through the management cluster. Tanzu Kubernetes clusters can run different versions of Kubernetes, depending on the needs of the applications they run. You can manage the entire lifecycle of Tanzu Kubernetes clusters by using the Tanzu Kubernetes Grid CLI. Tanzu Kubernetes clusters implement Antrea for pod-to-pod networking by default.

https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.2/vmware-tanzu-kubernetes-grid-12/GUID-tkg-concepts.html
Continue reading “Tanzu Kubernetes 1.2 Part 2: Deploying a Tanzu Kubernetes Guest Cluster”

vVols, please report to the Principal’s Office! VCF 4.1 and vVols!

Note: This is another guest blog by Kyle Grossmiller. Kyle is a Sr. Solutions Architect at Pure and works with Cody on all things VMware.

In VMware Cloud Foundation (VCF) version 4.1, vVols have taken center stage as a Principal Storage type available for Workload Domain deployments.  This inclusion in one of VMware’s premier products reinforces the continued emphasis on vVols and all the benefits that they enable from VMware.  vVols with iSCSI is particularly exciting to us as this is the first instance of the iSCSI protocol being supported as a Principal Storage type within VCF.  We at Pure Storage are honored to have had a little bit of influence over this added functionality by serving as a design partner for this new feature and we are confident you are going to like what you see!

Someone who is using VMFS datastore with VCF today might ask themselves ‘why vVols’? This is a great question deserving of an expansive answer beyond this blog post.  Fundamentally, though, using vVols enables you to fully use the FlashArray in the way it was intended.  By leverage VASA (VMware API for Storage Awareness) you gain far more granular control and monitoring abilities over your individual VMs.  Native FlashArray capabilities such as snapshots and replication are directly executed against the underlying array via policy-driven constructs.  Further information on these and other benefits with vVols are available here.

Using vVols as Principal Storage is a lot like the methods VCF customers are used to for pre-existing Principal Storage options.  Image an ESXi host, apply a few prerequisites to it, commission it to SDDC manager and create Workload Domains.  Deploying Workload Domains with VMware Cloud Foundation automates and takes all the guesswork out of deploying vCenter and NSX-T for modern use cases such as Kubernetes via Workload Management

Stepping into some specifics for a moment; here’s the process on how to use FlashArray iSCSI and vVols for VCF Workload Domains:

The most fundamental update to SDDC Manager to allow vVols is the capability to register a VASA Provider.  In the below screenshot and following detailed information, we show an example of how you can add a FlashArray using another block protocol:  Fibre Channel:

  1. Provide a descriptive name for the VASA provider.  It is recommended to use the FlashArray name and append it with -ct0 or -ct1 to denote which controller the entry is associated with.
  2. Provide the URL for the VASA provider.  This cannot be the management VIP of the array.  Instead this field needs to be the management IP address associated with one of the controllers.  The URL also is required to have the VASA port and version.xml appended to it.  The format for the URL is:  https://<IP of FlashArrayController>:8084/version.xml
  3. Give a FlashArray user name with the arrayadmin role.  The procedure for how to create such a user can be found here.  While the pureuser account can be used, we recommend creating and using a separate FlashArray user for VASA operations.
  4. Provide the password for the FlashArray username to be used.
  5. Container Name must be Vvol container.  Note that this value is case-sensitive.
  6. For Container Type, select FC from the drop-down menu to use Fibre Channel.
  7. Once all entries are completed, click Save.

Obviously, there’s a lot more to share here so we will be expanding on this substantially in the very near future on our VMware Platform Guide site.

Rounding out this post, I’m happy to show a demo video of just how easy it is to deploy a FC+vVols-based Workload Domain with VMware Cloud Foundation.

SRA 4.0 Released! ActiveDR support

We just released our latest version of our Storage Replication Adapter, version 4.0 for VMware Site Recovery Manager. There are a lot of enhancements in this release and improvements–if you are on 3.1 (or certainly earlier) I recommend an upgrade when you get a chance.

For all the need-to-know information (release notes, user guide, videos, download link, etc.) see here:

https://support.purestorage.com/Solutions/VMware_Platform_Guide/Quick_Reference_by_VMware_Product_and_Integration/Site_Recovery_Manager_Quick_Reference

Continue reading “SRA 4.0 Released! ActiveDR support”

Pure Storage Plugin v4.4.0 for the vSphere Client

A lot going on in the Pure world right now, specifically:

And I have a lot to say on that (stay tuned!) but for now, let’s focus on some new releases.

We have released a few new plugins:

  • vSphere Plugin 4.4.0
  • Storage Replication Adapter 4.0
  • vROPs Management Pack 3.0.2
  • vRealize Orchestrator 3.5.0

Let’s walk through these one by one. In this post, I will go over the vSphere Plugin.

For release notes, installation instructions, etc:

https://support.purestorage.com/Solutions/VMware_Platform_Guide/Quick_Reference_by_VMware_Product_and_Integration/Pure_Storage_Plugin_for_the_vSphere_Client_Quick_Reference

Quite a few bug fixes in this release too.

vSphere Plugin 4.4.0

This release adds support for a few things:

ActiveDR

Datastores can now be provisioned to ActiveDR pods via the plugin:

There is a new tab “Continuous” which is where you will find ActiveDR-enabled pods. The fields show the source pod (where the volume would go), the target pod (where the volume will be replicated to), the source and target arrays (which currently own those pods), the replication direction, and the “lag”. The lag is how far behind the target pod is from the source pod.

When you click on a datastore, you will see a few more pieces of information in the FlashArray summary panel:

This will show the ActiveDR information if the volume of course is in an enabled ActiveDR pair. The plugin also supports all of the usual features with ActiveDR datastores: resize, rename, QoS, snapshot, refresh from snapshot, copy from snapshot.

Demo of provisioning and ActiveDR datastore:

vVol Snapshots

You can create a snapshot of a VM using the standard VMware snapshot tool, but that snapshots every single virtual disk–which you may not want/need. We used to have the ability in the plugin to create a one-off snapshot of a vVol, but removed it due to some early issues that have since been resolved. This feature has been reintroduced:

Now you can click on a vVol-type VM and navigate to the Configure tab and click on Pure Storage – > Virtual Volumes.

You can select a single vVol disk and click Create Snapshot.

This will create a new single snapshot of the volume that is that vVol. You can then restore from it, or copy from it with the other tools.

You can also do this with the home directory (config) vVol. Why would you want to snapshot this? Well because protects your virtual machine configuration. The pointer files, the VMX file, snapshot hierarchies, logs, etc. If you accidentally make a change to the VMX file that breaks your VM (or you made a lot and don’t know what you did) the restore can restore the config without having to restore the entire VM.

The other reason, is “undelete” protection. When you delete a VM, ESXi first deletes all of the files from the config vVol, then it tells the array to delete the volumes. When we delete volumes, we put the volumes in the destroyed volumes folder, then they get permanently deleted in 24 hours (by default) or manually by an admin (unless safemode is turned on and then manual eradication is not possible).

The problem here, is that if you delete a VM, we can restore the config volume itself, but VMware wiped the data from it. So it is blank. VMware does not wipe the data from the virtual disks, so those can be “undeleted” and the original data is still there. So to fully restore an undeleted VM, we need a snapshot of the config vVol. This will restore all of the files.

The ideal option here, is to assign a snapshot storage policy to the home vVol (or even more ideally all of the vVols) to have the array snapshot on a schedule:

So to do this, create a 1 hour snapshot protection group on the FlashArray:

Import the protection group into vSphere as an SPBM policy:

Select and import:

And it is now a policy:

Then assign the policy and the group to the VM (or just the VM home to protect the config).

If you don’t need frequent snapshots of the config vVol and just one will do (or whenever you want), this is what we added. You can select the VM home and click the Create Snapshot button:

Alternatively we have another place to do this. If you click on the VM summary tab and look at the FlashArray panel, there is an Undelete Protection box. If we do not see any snapshots for the config vVol, we will show a warning like below:

What this means, is that we cannot fully restore this VM if it is accidentally deleted. The data, yes. But the VM configuration, no. You can create a snapshot from here too, by clicking Snapshot now…

If it is protected, we will show the timestamp of the latest discovered snapshot:

So if you delete it:

You can restore via the plugin easily:

If the VM configuration is changing a lot–you probably want to protect via schedule. If the VM does not change a lot, then one off snapshots will work fine.

ESXi Host Personality

Also, we now set the ESXi host personality when creating new clusters:

This is important for some ActiveDR and ActiveCluster scenarios, so it is our best practice by default.

For more info on that:

https://support.purestorage.com/Solutions/VMware_Platform_Guide/User_Guides_for_VMware_Solutions/FlashArray_VMware_Best_Practices_User_Guide/bbbFlashArray_Configuration#Setting_the_FlashArray_.E2.80.9CESXi.E2.80.9D_Host_Personality

Deploying VMware Tanzu Kubernetes Grid with Pure Storage vVols Part I: Deploy TKG on vSphere

This is the start of a multi-part series (how many parts? I have no idea). But let’s start at the basics–getting TKG deployed on vSphere.

Prepare Environment

So the first step is to download the two OVAs required:

The HA proxy and the photon appliance itself. Download the latest:

https://my.vmware.com/group/vmware/downloads/info/slug/infrastructure_operations_management/vmware_tanzu_kubernetes_grid/1_x

Continue reading “Deploying VMware Tanzu Kubernetes Grid with Pure Storage vVols Part I: Deploy TKG on vSphere”