Site Recovery Manager 6 and Storage DRS Tagging: Part II–FlashArray SRA

The first post in this two-part series was about the general new feature of VMware vCenter Site Recovery Manager 6.0 and Storage DRS. Read about it here. In this post, I am going to take a bit more of a specific look at this when it comes to the FlashArray and our Storage Replication Adapter (SRA).


So to review, there are three tags added by SRM, one to denote the volume is replicated, one to list its consistency group (CG) and one to list its SRM protection group (PG).

A device is listed as replicated as soon as it is seen as a valid replicated device in a SRM device discovery. Its consistency group tag is added at the same time and is based on the consistency group reported by the SRA of that device–if no CG is listed then one is uniquely assigned to that device by SRM for tagging purposes (it is not assigned one in SRM though–just a vCenter tag). Finally if the device is in a protection group a tag is assigned to that device to denote what PG it is in.

Okay so how does this work with the FlashArray, its replication and its SRA?

So first, how does the FlashArray define if a device is replicated and what does it define as a consistency group? Well these essentially are the same thing, all replicated volumes are in a consistency group. I won’t spend a lot of time talking about how our replication works but you can find a blog post here and a white paper here. But for a quick synopsis:

FlashArray replication is controlled by an object we refer to as a protection group (not to be confused with a SRM protection group–for the remainder of this post I will use SRM PG to refer to a SRM protection group or FlashArray PG for a FlashArray one). A FlashArray protection group is a collection of specific volumes, hosts or host groups that have the same local snapshot policy and/or remote replication policy. This provides the associated volumes with an identical and cross-volume consistent protection policy, whether that be with local snapshots and/or via remote replication.

protectiongroup

A device counts as replicated if it is in one of the FlashArray protection groups and that group is enabled on both the source and target FlashArray (both arrays need to acknowledge and accept a protection group if it is configured for remote replication). When a device discovery is then run within SRM, all of the devices in these FlashArray protection groups that involve the “enabled” arrays in SRM are reported to SRM as replicated and will be tagged as such.
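The discovery rule above can be sketched as a small model. This is illustrative Python only, not the SRA's actual code: the `ProtectionGroup` class, its field names and the `replicated_volumes` helper are all hypothetical, and the sketch ignores the additional requirement that the array pair be enabled in SRM.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectionGroup:
    """Hypothetical model of a FlashArray protection group."""
    name: str
    volumes: set = field(default_factory=set)
    enabled_on_source: bool = False
    enabled_on_target: bool = False


def replicated_volumes(pgroups):
    """Return the volumes a device discovery would report as replicated."""
    reported = set()
    for pg in pgroups:
        # Both arrays must have accepted and enabled the group.
        if pg.enabled_on_source and pg.enabled_on_target:
            reported |= pg.volumes
    return reported
```

A volume in a group that only the source array has enabled would not show up in the result.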

A protection group on the FlashArray is also a consistency group–all of the device snapshots for a given point-in-time (whether remote or local) are consistent with one another.

That being said, the Pure Storage SRA does NOT report consistency groups to SRM–we hide this from SRM. The reason this was decided upon was that when you report a consistency group to SRM, SRM forces you to fail them over in tandem, whether that be a test failover or actual failover. This really reduces the granularity of control of device management. Plus even when two or more devices are in a FlashArray consistency group together there is no physical restriction on failing over single volumes–so why force our SRM users to do so? Therefore we decided not to advertise CGs to SRM.

So if we don’t advertise CGs how does the CG tagging work? Well if SRM doesn’t see a CG for a device, it will just create a “fake” CG and assign each non-CG device a unique consistency group tag in vCenter. The problem here is that if each device is tagged with a separate CG (in vCenter), SDRS will not allow Storage vMotion between those devices for fear the VM will no longer be consistent with its peers.
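To make the consequence concrete, here is a minimal sketch of the tagging behavior just described. It is a simplified model, not SRM or SDRS code: the function names and the `generated-cg-N` tag format are made up for illustration, and the real SDRS check may consider more than the CG tag.

```python
import itertools


def assign_cg_tags(devices, reported_cg):
    """Model of CG tagging: use the CG the SRA reports, else invent a unique one.

    devices: list of device (datastore) names.
    reported_cg: dict mapping device -> CG name reported by the SRA (may be empty).
    """
    counter = itertools.count(1)
    tags = {}
    for dev in devices:
        cg = reported_cg.get(dev)
        if cg is None:
            # No CG reported, so the device gets its own "fake" CG tag.
            cg = f"generated-cg-{next(counter)}"
        tags[dev] = cg
    return tags


def sdrs_can_auto_move(tags, src, dst):
    """Simplified rule: automatic moves only between datastores sharing a CG tag."""
    return tags[src] == tags[dst]
```

With an SRA that reports no CGs, `assign_cg_tags(["ds1", "ds2"], {})` gives the two datastores different tags, so `sdrs_can_auto_move` is false for that pair.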

differentCG

So the flexibility provided by not advertising CGs sticks a bit of a wrench into the gears of SDRS in 6.0. So what can be done? There are a few options–I will go over these in detail after listing the options. In no particular order:

  1. Use SDRS in manual mode–the recommendations will still be made, but with warnings–this prevents automatic moves. But an admin can override these warnings and make the recommended moves manually.
  2. Disable auto-tagging by SRM altogether. SRM can still create the tag categories, but you can manually tag the datastores with CGs and SRM PGs.
  3. Disable tagging repair in SRM. Allow SRM to auto-tag, but change them as you see fit–SRM will not “fix” them.

Let’s review the ins and outs of these options.

Option 1: SDRS in Manual Mode

I suppose this is the simplest option and ironically provides for the most automation. This is essentially using all of the default tagging behavior in SRM so that SRM will create the tag categories in vCenter:

  • SRM-com.vmware.vcDr:::status (indicates that the datastore is replicated)
  • SRM-com.vmware.vcDr:::consistencyGroup (indicates what CG the datastore belongs to, if any)
  • SRM-com.vmware.vcDr:::protectionGroup (indicates what SRM PG the datastore belongs to, if any)

But since SRM will assign a unique consistency group tag to every volume it identifies, Storage DRS will never automatically move a virtual machine to or from a datastore tagged as replicated; it will only suggest the move. So this kills part of the “automatic” part of automated Storage DRS, but since the recommendations still appear, the rest of the automation (recommendation and orchestration of the actual moves) is still there. It just needs some admin intervention.

I’d say arguably this is the best option, because it requires the least manual work.

Option 2: Disable auto-tagging altogether

The next option is to disable datastore auto-tagging entirely in SRM. If you disable the option storage.enableSdrsTagging within SRM, datastores will not be tagged at all by SRM. SRM will just create the tag categories mentioned in option 1 above but nothing more. This will allow you to tag them yourself.

The advantage here is that it will allow you to leverage the automated moves of SDRS, because you can manually tag the appropriate datastores with the same user-created consistency group, SRM protection group and replication status tags. This works best if you do not have a very dynamic environment, meaning:

  • You don’t move volumes in and out of protection groups often
  • You don’t move volumes in and out of consistency groups often
  • You don’t start or end replication for volumes often

Of course, even if you do the aforementioned things, this is still workable, but you will have to remain vigilant about tagging, de-tagging and re-tagging your datastores as necessary. To set this up, disable the option on both SRM servers in the storage advanced settings of the SRM interface.

disabletagging

To add the tags, go to the relevant datastore in the vSphere Web Client and click on the summary tab. Locate the “Tags” box and click “Assign…” in the lower right hand corner.

assigntag

Select the icon in the upper left to create the tags.

createtag

Now create, one by one, the three tags for replication status, CG and SRM PG. Note the description doesn’t really matter–make it something that makes sense to you and others. I’d also recommend including the word “manual” or something similar to denote that you created it, not SRM.

pgtagchoose pgtagcreate

The name of the tag is the value and this is the important thing (in addition to of course the category). For the replication tag, make the name “replicated.” For the protection group, ideally make it the name of the SRM protection group that datastore is in. Same goes for the consistency group tag.

Once you have created all three, select them all and click OK in the “Assign Tag” window.

assigntags

The tags are now assigned to the datastore.

tagsassigned

The nice thing is that once you create these tags they can be re-used for other datastores. The only time you need to create a new tag is if you have a different SRM protection group or consistency group that you need to assign to a datastore. So doing my second datastore is easy. The previously created tags will appear in the “Assign Tags” window for the next datastore so you can just choose them and click “Assign.”

Any SDRS recommendations will now be valid for movement in the Datastore Cluster!

recommendations

Don’t forget to update the tags whenever an SRM protection group or consistency group is altered or replication is ceased for a given device.
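If you keep track of manual tags outside of vCenter, the assignments from this option boil down to one mapping per datastore. The following is a rough Python sketch of that bookkeeping. The category names are the ones SRM creates, while `manual_tag_plan` and the example datastore, PG and CG names are hypothetical. It only builds the plan; applying it to vCenter is not shown.

```python
# Tag categories created by SRM (from the list in option 1).
STATUS = "SRM-com.vmware.vcDr:::status"
CG = "SRM-com.vmware.vcDr:::consistencyGroup"
PG = "SRM-com.vmware.vcDr:::protectionGroup"


def manual_tag_plan(datastores, srm_pg, cg_name):
    """Build category -> tag name assignments for datastores that should move together."""
    return {
        ds: {STATUS: "replicated", CG: cg_name, PG: srm_pg}
        for ds in datastores
    }


# Hypothetical names, for illustration only.
plan = manual_tag_plan(["datastore-01", "datastore-02"], "srm-pg-manual", "cg-manual")
```

Because both datastores end up with identical CG and SRM PG tag names, they match the condition that makes SDRS moves between them valid.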

Option 3: Disable SRM tag repair

The third option is to disable SRM tag repair. This is a bit of a mix of the above two options–SRM will still tag volumes upon protection group changes or consistency group changes but if you delete, replace or add tags SRM will not override them.

disablerepair

This will allow you to leverage the dynamic nature of the SRM tagging: SRM will automatically tag the datastores as replicated and add a CG and SRM PG tag. What this means is that you need to delete the CG tag SRM creates and give each set of datastores the same CG tag so they can be used with SDRS.

When this option is disabled I have not found any configuration change operation (yet) in SRM that overrides a manual CG tag. Well…kinda.

Failover/failback will affect this. Also–a reboot of the SRM server/service will replace any user entered PG/CG tags if they differ from what SRM would itself set.

So this option might be semi-attractive, but due to the SRM restart issue you will need to keep an eye on that. You will need to re-apply manual CG tags if that occurs. Something to think about.
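One way to keep an eye on this is to record the CG tags you set by hand and compare them against what vCenter currently shows after a restart or failover. Below is a minimal sketch of that comparison in Python. The `overwritten_cg_tags` helper is hypothetical, and collecting the current tags from vCenter is assumed to happen elsewhere.

```python
def overwritten_cg_tags(expected, actual):
    """Return datastores whose current CG tag differs from the manual one.

    expected: dict datastore -> CG tag name you assigned manually.
    actual:   dict datastore -> CG tag name currently on the datastore.
    The result maps each drifted datastore to (expected_tag, current_tag);
    current_tag is None if the datastore has no CG tag at all.
    """
    return {
        ds: (want, actual.get(ds))
        for ds, want in expected.items()
        if actual.get(ds) != want
    }
```

Any datastore in the result needs its manual CG tag re-applied before SDRS will move VMs to or from it automatically again.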

Ending thoughts…

So the nice thing about this integration is that it doesn’t really require SRM. If you replicate your datastores you can manually create these tag categories and assign them yourself. The lack of CG advertisement by an SRA is a bit of a double-edged sword in this case, but I think the flexibility gained by not advertising CGs to SRM is definitely worth the extra tagging effort in the end.

Good stuff.