In this post I will talk about using PowerCLI to run a test failover for VVol-based virtual machines. One of the many nice things about VVols is that in the VASA 3.0 API this process is largely automated for you. The SRM-like workflow of a test failover is included, so the amount of storage-related PowerShell you have to manually write is fairly minimal.
- PowerCLI and VVols Part I: Assigning a SPBM Policy
- PowerCLI and VVols Part II: Finding VVol UUIDs
- PowerCLI and VVols Part III: Getting VVol UUIDs from the FlashArray
- PowerCLI and VVols Part IV: Correlating a Windows NTFS to a VMDK
- PowerCLI and VVols Part V: Array Snapshots and VVols
- PowerCLI and VVols Part VI: Running a Test Failover
- PowerCLI and VVols Part VII: Synchronizing a Replication Group
- PowerCLI and VVols Part VIII: Running a Planned Migration
With VVols, you don’t really fail over a single VM; you fail over a replication group. While it is certainly possible to recover just a single VM from that failover, for this blog post I will show failing over the entire group.
In this environment, I have the following configured:
- My source vCenter called ac-vcenter-1.purecloud.com. This has access to my source FlashArray and its VASA providers.
- My target vCenter called ac-vcenter-2.purecloud.com. This has access to my target FlashArray and its VASA providers.
My PowerCLI version is 11.1.0. Make sure you are at least running this release–there are important updates that make this process simpler. Older revisions require a few more steps.
A couple of things to remember about this process:
- The “SPBM” cmdlets do not shut down VMs on the source side. Manipulation of VMs (other than making a copy accessible on the target site) is out of the scope of the VVol failover cmdlets.
- The “SPBM” cmdlets do not power on or register the VMs on the target side. They are solely about making the VMs available on site B, though I will give examples of doing that part too.
A few pre-requisites here:
- Make sure the VVol datastore is mounted somewhere in the recovery site. You also of course need the source mounted somewhere too, but I figure you’ll have that done. No datastore = no VMs.
- Register VASA from both arrays in their correct locations. More on that in the next section.
- Configure one or more VMs on the source VVol datastore. Assign them an SPBM storage policy that includes FlashArray replication. If you manually assign them to replication groups on the array instead of using storage policies, this process will not work. Pro-tip: assigning features like replication via SPBM not only tells the VMware admin how they are configured, but also tells vSphere how they are configured.
First things first, connect to both of your vCenters.
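As a minimal sketch of that step, assuming the same credentials work on both vCenters (the variable names $credential, $sourceVc and $targetVc are mine, purely for illustration):

```powershell
# Make sure this session may hold connections to more than one vCenter at a time.
# (Harmless if the session is already in Multiple mode.)
Set-PowerCLIConfiguration -DefaultVIServerMode Multiple -Scope Session -Confirm:$false | Out-Null

# Assumption: one credential is valid for both vCenters.
$credential = Get-Credential

# Connect to the source and target vCenters used in this post.
$sourceVc = Connect-VIServer -Server ac-vcenter-1.purecloud.com -Credential $credential
$targetVc = Connect-VIServer -Server ac-vcenter-2.purecloud.com -Credential $credential

# List what this session is connected to.
$global:DefaultVIServers | Select-Object Name, User
```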
What if I am failing over my VMs inside of a single vCenter? Well, that’s fine of course. Just connect to the one vCenter in that case.
The key is: make sure that your target and source VASA providers are registered to the vCenters that you need to use them with.
If you have two vCenters, make sure that your target VASA providers are registered to the target vCenter and your source VASA providers are registered to your source vCenter.
If you are using one vCenter make sure that both your target and source VASA providers are registered with that vCenter.
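One way to sanity-check the registrations is to list the VASA providers that each connected vCenter reports. This is only a sketch: the property names I select (Url, Status) are my assumption about what the provider objects expose, so adjust them to whatever your PowerCLI version returns.

```powershell
# For every vCenter connected in this session, list its registered VASA providers.
foreach ($vc in $global:DefaultVIServers) {
    Get-VasaProvider -Server $vc |
        Select-Object @{ Name = 'vCenter'; Expression = { $vc.Name } }, Name, Url, Status
}
```

With two vCenters, you would expect the source array’s providers under the source vCenter and the target array’s providers under the target vCenter. With one vCenter, you would expect all of them under it.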
Get Replication Group
The next step is to identify your TARGET replication group that needs to be failed over. VVols fail VMs over at the granularity of a replication group, and this is what is passed to VASA to tell the array to bring up the appropriate storage. There are source replication groups and target replication groups.
On the FlashArray, replication groups are mapped 1:1 to what we call protection groups. A protection group is a consistency group with a local snapshot policy, a replication policy, or both. If the protection group does not have a replication policy, it will NOT have a target protection group.
You can simply list the replication groups, or you can use some kind of source object to get the relevant group (or groups).
Note–if you DO NOT see the source group you are looking for, you likely are connected to only the target vCenter and NOT the source vCenter. Make sure you are connected to the source vCenter too if you are failing VMs between vCenters.
One option is if you have a VM that is in a replication group, you could run something like this:
Get-SpbmReplicationGroup -VM <vm name>
That will return the replication group that the VM belongs to.
Or you can get it from a storage policy. Find your policy with Get-SpbmStoragePolicy, then pass it into Get-SpbmReplicationGroup through the -StoragePolicy parameter.
Or you could just run Get-SpbmReplicationGroup without any inputs and choose your source group from the results. The choice is yours. It doesn’t really matter how you get to the replication group, just that you identify the one you want.
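The three lookups could be sketched as follows. The VM name and policy name are hypothetical placeholders, and the State column in the last command is my assumption about how the group objects distinguish source from target.

```powershell
# Option 1: start from a VM that is in the replication group.
# 'my-vvol-vm' is a placeholder name.
Get-SpbmReplicationGroup -VM (Get-VM -Name 'my-vvol-vm')

# Option 2: start from a storage policy that includes replication.
# 'my-replication-policy' is a placeholder name.
$policy = Get-SpbmStoragePolicy -Name 'my-replication-policy'
Get-SpbmReplicationGroup -StoragePolicy $policy

# Option 3: list every replication group the connected vCenters know about
# and pick the source group by eye.
Get-SpbmReplicationGroup | Select-Object Name, State
```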
Now that we have the source group…
FlashArray source replication groups are named by the array name, followed by a colon, followed by the protection group name. Target groups are named by a UUID and replication group number. To run a test failover, you need the target group, so how do we get the corresponding target group from the source group?
So let’s say (regardless of which method above I used) I have identified the group “sn1-x70-b05-33:1hour” as the source I want to run a test failover from. The simplest way to get to the target group is by using the Get-SpbmReplicationPair cmdlet. I will store my selected replication group in a new object that I will call $repGroup.
I can then pass $repGroup into the -Source parameter of Get-SpbmReplicationPair to get the replication group pair.
The pair includes the target replication group, which I can now store in a new object called $targetGroup.
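Put together, those three steps might look like this. The intermediate variable $repPair is a name I chose for illustration.

```powershell
# Store the source replication group (array name, colon, protection group name).
$repGroup = Get-SpbmReplicationGroup -Name 'sn1-x70-b05-33:1hour'

# Ask only for the pair that has this group as its source.
$repPair = Get-SpbmReplicationPair -Source $repGroup

# The pair object carries both halves; keep the target for the failover cmdlets.
$targetGroup = $repPair.Target
$targetGroup
```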
At this point I can run a test failover; nothing else is needed. But there are some optional parameters, so let’s take a look.
OPTIONAL: Point-in-Time Recovery
The next step is optional. Do you want to fail over to a specific point in time? If you do, you can query the points in time that are available. Otherwise, skip this step.
<target replication group> | Get-SpbmPointInTimeReplica
I store all of my available points in time in $PiTs, then index to the one I want and store it in $PiT. During the test failover command, I can then specify $PiT to make it fail over to that point in time.
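A short sketch of that selection, assuming $targetGroup already holds the target replication group. The index used here is arbitrary; list $PiTs first and pick the entry you actually want.

```powershell
# Collect every point in time available for the target group.
# @() keeps $PiTs an array even if only one replica comes back.
$PiTs = @($targetGroup | Get-SpbmPointInTimeReplica)

# Review what is available before choosing.
$PiTs

# Pick one by index (0 here is purely illustrative).
$PiT = $PiTs[0]
```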
Run Test Failover
The next step is to run the test failover with the Start-SpbmReplicationTestFailover cmdlet.
If you want it to use the latest point-in-time available just run:
Start-SpbmReplicationTestFailover -ReplicationGroup $targetGroup
If you want to specify a point-in-time in the past, then do so:
Start-SpbmReplicationTestFailover -ReplicationGroup <group> -PointInTimeReplica <point in time>
So go ahead and run it. I highly recommend storing the response in an object (I use $vms): the operation will return the VM paths to you, so you can then register the VMs and power them on.
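For example, capturing the result in $vms (the variable the registration loop further down reads from) could look like this:

```powershell
# Test failover to the latest available point in time, keeping the returned VM paths.
$vms = Start-SpbmReplicationTestFailover -ReplicationGroup $targetGroup

# Or, to use a specific point in time chosen earlier, run this instead:
# $vms = Start-SpbmReplicationTestFailover -ReplicationGroup $targetGroup -PointInTimeReplica $PiT

# Show the paths that came back.
$vms
```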
So what actually happens in here? Well a few things:
- The target FlashArray does a “purepgroup copy” operation. This takes a certain pgroup snapshot (which is a consistent snapshot of all of the volumes in that pgroup at that time) and creates a new local pgroup with all of the new volumes created from that pgroup copy.
- The new pgroup is enabled to replicate back to the original FlashArray. This is created to allow for some advanced post-test operations, and it will be cleaned up when the test failover is stopped.
- It then creates new volume groups, one for each VM created in the test, and adds their corresponding volumes to them.
- It then associates the new VMs with the target VVol datastore.
The last step is done by VMware: updating the files in the VVol datastore. This takes the bulk of the time.
Once complete go ahead and register and power-on the VMs. To register, a simple loop like below will do it:
$registeredVms = @()
foreach ($testVm in $vms)
{
    $registeredVms += New-VM -VMFilePath $testVm -ResourcePool <resource pool>
}
This will register all of your VMs. You will likely want to change the New-VM line to register them as appropriate to your environment (which cluster, host, resource pool, folder, etc.).
Now make any changes if necessary to your VMs. Change the networking etc.
Go ahead and power them on!
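As a rough sketch of those two steps, assuming $registeredVms holds the VMs registered by the loop above and that an isolated port group exists for testing ('test-network' is a hypothetical name):

```powershell
foreach ($vm in $registeredVms) {
    # Move every adapter of the test VM to the isolated test network.
    Get-NetworkAdapter -VM $vm |
        Set-NetworkAdapter -NetworkName 'test-network' -Confirm:$false | Out-Null

    # Power on the test copy.
    Start-VM -VM $vm -Confirm:$false
}
```

Depending on your environment, vSphere may raise a question on first power-on asking whether the VM was moved or copied. If it does, it has to be answered before the VM will start, for example with Get-VMQuestion and Set-VMQuestion.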
Once you are done, go ahead and power-off the VMs and unregister them. You can either just unregister them, or you can delete them from disk. It doesn’t really matter–though my recommendation is to just delete them.
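A cleanup sketch along those lines, again assuming $registeredVms holds the test VMs. Drop -DeletePermanently if you only want to unregister them.

```powershell
foreach ($vm in $registeredVms) {
    # Refresh the object so the power state is current.
    $current = Get-VM -Id $vm.Id

    # Power off only if the VM is still running.
    if ($current.PowerState -eq 'PoweredOn') {
        Stop-VM -VM $current -Confirm:$false | Out-Null
    }

    # Unregister the VM and delete its files from the VVol datastore.
    Remove-VM -VM $current -DeletePermanently -Confirm:$false
}
```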
The last step is then to stop the test failover; this allows the array to clean itself up.
Stop-SpbmReplicationTestFailover -ReplicationGroup <target group>
This will clean up any remnants of the test failover (volumes, volume groups, protection groups) on the array.
Failures in Get-SpbmReplicationPair
Sometimes you will get failures when using Get-SpbmReplicationPair. This usually means one of a few things.
First, you may not have specified the source replication group, which means you ran Get-SpbmReplicationPair without any parameters and it queried for everything.
Or (possibly also) you are not connected to all of your required vCenters. PowerCLI queries all available VASA providers from all connected vCenters and then matches the appropriate pairs. If one half of a relationship is not reported by anyone, PowerCLI will report an error for that pair. If you didn’t connect to a vCenter that exclusively owns your target VASA provider, you will see some failures. Or maybe you don’t care about some of the pairs, which means those errors don’t matter. The pairs you need are returned, but it will still report errors for the others, so to keep things clean, be specific with your queries: include the source group in the Get-SpbmReplicationPair cmdlet.
Get-SpbmReplicationPair : 1/28/2019 4:28:36 PM Get-SpbmReplicationPair The target replication group with id
‘e671ca7e-5a3d-3ca3-8258-cf3448c334a6/b70bff9d-ce6b-4ca3-af13-79c3f93e5ac2:1’ for source replication group id
‘e671ca7e-5a3d-3ca3-8258-cf3448c334a6/b70bff9d-ce6b-4ca3-af13-79c3f93e5ac2:1’ could not be found. Please verify that
the vSphere server for peer replication group is connected.
Above I did a query for all replication pairs. In my environment, I have two arrays, both of which have a protection group that does not have replication enabled, which means there is no target group anywhere. For those groups this cmdlet will always return an error. This is why it is good to be specific in this particular query.
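If you do want the full list of pairs anyway, one option is to keep the errors out of the console and collect them separately. This sketch uses the standard PowerShell common parameters and assumes the failures are non-terminating, which matches the behavior described above (the valid pairs are still returned).

```powershell
# Query every pair, silencing the per-pair errors and collecting them in $pairErrors.
$pairs = Get-SpbmReplicationPair -ErrorAction SilentlyContinue -ErrorVariable pairErrors

# Pairs where both halves were found.
$pairs | Select-Object Source, Target

# Messages for the groups whose other half could not be found.
$pairErrors | ForEach-Object { $_.Exception.Message }
```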
Failures in Get-SpbmReplicationGroup
A common place where Get-SpbmReplicationGroup can fail (or partially fail) is when the VVol datastore has not been mounted in the recovery site. Without this object, VMware has no reference for what to query for available replication groups. You will see an error like the one below:
Get-SpbmReplicationGroup : 1/28/2019 4:33:51 PM Get-SpbmReplicationGroup SMS runtime fault on server
‘/VIServer=purecloud\cody@ac-vcenter-2:443/’: Unknown server error. See the event log for details.
In this case, ensure the VVol datastore is properly mounted where you plan to recover.
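A quick way to check could look like the sketch below. It assumes the recovery vCenter is the ac-vcenter-2 connection from earlier and that VVol datastores report a Type of 'VVOL'. Verify both in your environment.

```powershell
# Find the recovery-site vCenter among the current connections.
$targetVc = $global:DefaultVIServers | Where-Object { $_.Name -like 'ac-vcenter-2*' }

# Look for VVol datastores visible in the recovery site.
$vvolDatastores = Get-Datastore -Server $targetVc | Where-Object { $_.Type -eq 'VVOL' }

if (-not $vvolDatastores) {
    Write-Warning 'No VVol datastore found in the recovery site. Mount one before querying replication groups.'
}

# Show which hosts each VVol datastore is mounted on.
foreach ($ds in $vvolDatastores) {
    $hosts = Get-VMHost -Datastore $ds -Server $targetVc
    [pscustomobject]@{
        Datastore = $ds.Name
        MountedOn = ($hosts.Name -join ', ')
    }
}
```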