The FlashArray Storage Replication Adapter for VMware Site Recovery Manager has supported many:many replication since the 2.0 release of the SRA. Test failover, failover and reprotect work no differently than with 1:1, and neither does the setup of the volumes. The only real difference is how you configure the array managers in SRM. So let’s review how this is done.
Hey all. Just wanted to let you know there is an updated Pure Storage FlashArray Storage Replication Adapter now posted on VMware’s Site Recovery Manager compatibility guide:
So we ran into a customer issue recently with VMware Site Recovery Manager that I have not seen before and have not found any on-point articles on, so I thought I’d share this one. It was an insidious one too: when troubleshooting it I could not find the issue. Eventually one of our rockstar escalation engineers at Pure (Jacob Hopkinson) figured it out after going through the SRM debug logs line by line. It comes down to case sensitivity in iSCSI IQNs. I’ll explain…
Somewhat surprisingly I have been getting a fair amount of questions in the past few months concerning VMware vCenter Site Recovery Manager and Raw Device Mappings (RDMs) and using this with Pure Storage. Common question is whether or not we support this (we do) but more commonly it is about how it works. There is a bit of a misunderstanding on how they differ or do not differ from VMFS management in SRM. So figured I would put a post out to explain this. Old topic somewhat, but worth reviewing for those newer SRM customers. Plus, I haven’t found a whole lot of on-point posts anywhere, so why not?
Quick post here. I recently updated my environment to vCenter 6.0 Update 1 and VMware Site Recovery Manager 6.1, and after my first attempted test failover (and subsequent ones) the test would always fail when it tried to power on the virtual machines. Some powered on and some didn’t. The following errors appeared for about half of my VMs:
I was recently asked how to query SRM for protected VMs and I decided it would make a good quick blog post. There is a great post here on using PowerCLI with SRM, but it doesn’t show how to return per-virtual-machine information by default. It needs a bit more.
All it returns is an SRM-based virtual machine ID, which doesn’t relate to what a user is probably looking for (a virtual machine name). So it needs a few more simple steps. The following script, which can be found on my GitHub page here, does the following things:
- Connects to a vCenter
- Connects to SRM
- Creates a log folder with a time stamp in the name
- Iterates through each Protection Group
- Logs every virtual machine in that protection group
VMware vCenter Site Recovery Manager 6.0 was mostly a compatibility release–getting it to work right with vCenter 6.0 essentially. That being said, there were a few new features (and some nice tweaks in the GUI) included in the release. One of the new features that sparked my interest was SRM and Storage DRS compatibility enhancements.
Ben covers most of the history of this in his post so I will skip over that. Let’s take a closer look at this functionality, though. As an overview, there are three tags that SRM introduces to a datastore:
- SRM-com.vmware.vcDr:::status (indicates that the datastore is replicated)
- SRM-com.vmware.vcDr:::consistencyGroup (indicates what CG the datastore belongs to, if any)
- SRM-com.vmware.vcDr:::protectionGroup (indicates what PG the datastore belongs to, if any)
Replication status is assigned as soon as SRM (and its respective Storage Replication Adapter) discovers the datastore to be replicated through a Device Discovery operation. Upon this discovery a consistency group tag is also assigned. If the volume is not advertised by the SRA as being in a consistency group, a unique one will be created for that volume, basically indicating it is in its own consistency group.
A protection tag is not assigned until the volume is actually added to a protection group. Once the datastore is assigned to a protection group it will receive the tag (remember a volume can only be in one PG and SRM only supports being in one CG so there will always only be one to assign).
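As a rough illustration of the tag lifecycle described above, here is a minimal Python sketch. It only models the behavior and does not call any SRM or vCenter API. The three category names come from SRM; the function names, the tag values ("Replicated") and the naming of the stand-alone consistency group are assumptions made up for the example.

```python
from typing import Dict, Optional

# Tag category names as SRM creates them.
STATUS_CATEGORY = "SRM-com.vmware.vcDr:::status"
CG_CATEGORY = "SRM-com.vmware.vcDr:::consistencyGroup"
PG_CATEGORY = "SRM-com.vmware.vcDr:::protectionGroup"


def tags_on_discovery(datastore: str, advertised_cg: Optional[str] = None) -> Dict[str, str]:
    """Tags a replicated datastore receives after Device Discovery.

    If the SRA does not advertise a consistency group, the datastore
    gets a unique one of its own (the naming here is hypothetical).
    """
    cg = advertised_cg if advertised_cg else f"standalone-{datastore}"
    return {
        STATUS_CATEGORY: "Replicated",  # placeholder tag value
        CG_CATEGORY: cg,
    }


def tags_on_protect(tags: Dict[str, str], protection_group: str) -> Dict[str, str]:
    """Tags after the datastore is added to a protection group.

    A datastore can only be in one PG, so any previous PG tag is replaced.
    """
    updated = dict(tags)
    updated[PG_CATEGORY] = protection_group
    return updated


if __name__ == "__main__":
    discovered = tags_on_discovery("ds01")
    print(discovered)
    print(tags_on_protect(discovered, "pg-prod"))
```

Note that the protection group tag only appears in the second step, which mirrors the point that it is not assigned until the volume is actually added to a protection group.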
So what do these tags do? Well, Storage DRS will note these tags and not make any automatic moves if a Storage vMotion would violate any of them. This means it will not move a VM from one datastore to another if:
1) Source datastore is replicated and target is not
2) Source datastore is NOT replicated and target is
3) Source datastore is in a different consistency group than the target
4) Source datastore is replicated AND in a protection group but target is replicated but NOT in a protection group
Basically Storage DRS will not move a VM from one datastore to another if it deems it to cause a change in the configuration of the protection group or consistency of a virtual machine.
So automatic Storage DRS will never make these moves. It may suggest them if it cannot find a better option, but it will never make a move that will violate these rules. If for some reason you want this to occur you can always override the warning and execute the operation.
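To make the four rules concrete, here is a small Python sketch that checks a proposed move against them. This is only a model of the rules as listed above, not how Storage DRS is actually implemented; the `DatastoreTags` class and the `sdrs_move_violations` function are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatastoreTags:
    """Simplified view of the SRM tags on one datastore (hypothetical model)."""
    replicated: bool
    consistency_group: Optional[str] = None
    protection_group: Optional[str] = None


def sdrs_move_violations(source: DatastoreTags, target: DatastoreTags) -> List[str]:
    """Return the rules a move from source to target would violate."""
    violations = []
    # Rule 1: replicated source, non-replicated target
    if source.replicated and not target.replicated:
        violations.append("source is replicated and target is not")
    # Rule 2: non-replicated source, replicated target
    if not source.replicated and target.replicated:
        violations.append("source is not replicated and target is")
    # Rule 3: different consistency groups
    if source.consistency_group != target.consistency_group:
        violations.append("source and target are in different consistency groups")
    # Rule 4: both replicated, source is in a PG, target is not
    if (source.replicated and source.protection_group is not None
            and target.replicated and target.protection_group is None):
        violations.append("source is in a protection group and target is not")
    return violations


if __name__ == "__main__":
    protected = DatastoreTags(True, "cg1", "pg1")
    unprotected = DatastoreTags(True, "cg1", None)
    # An automatic move is only allowed when the list is empty.
    print(sdrs_move_violations(protected, unprotected))
```

An empty list corresponds to a move that automatic Storage DRS is free to make; anything else would be, at most, a recommendation that you have to override manually.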
Let’s take a look now at the relevant configurable behavior in SRM.
There are four options:
|Setting Name|Description and Default Value|
|---|---|
|storage.enableSdrsStandardTagCategoryCreation|This creates the three tag categories in vCenter for you. Default: enabled.|
|storage.enableSdrsTagging|This actually applies the tags to the datastores when discovered etc. Default: enabled.|
|storage.enableSdrsTaggingRepair|This allows SRM to fix datastore tags when something has changed (PG/CG membership changes for instance). Default: enabled.|
|storage.sdrsTaggingPollInterval|How often SRM checks tags to make sure they are accurate. Default: 50 seconds.|
All of these options are enabled by default. Well, kind of: the last one is an interval, and it is just set to 50 seconds.
So like the table says, the enableSdrsStandardTagCategoryCreation option is pretty straightforward: it creates the three categories. You can, of course, create them yourself if you choose to. I am not sure why you would, though, with the exception of the reason stated in the option description:
“In Federated SSO setups, this flag should be disabled and the tags and tag categories should be manually created.”
When enableSdrsTagging is enabled, SRM will place the correct tags at the appropriate times, that is, when a new device is discovered or its protection group membership changes.
The option enableSdrsTaggingRepair is a little more to think about. When it is disabled, new tags will still be placed on datastores: replicated/CG tags during device discovery, and PG tags upon adding a datastore to a new or different PG. But SRM will not fix or remove them. If you remove a datastore from a PG or delete the PG, the tag will remain. If you delete the SRM-provided tag and replace it with your own, SRM will not fix it. If you add the datastore to a new PG, though, SRM will remove an old PG tag if one exists and then give it the correct one. But it won’t ever do that unless you make that PG change.
A note about the repair functionality. If you decide to delete an SRM-provided tag and make your own, it will not last long if this feature is enabled. SRM will right things quite quickly (50 seconds or less). So if you want more control over this tagging for SRM-related devices, disabling this is an option. Of course, disabling it can easily lead to stale information in the tags, so do so at your own risk.
In general, I think this is a great enhancement. I would like to see more granular control from the SRM side of things (enable/disable CG auto-tagging when a CG doesn’t exist for that device, for instance). This should also have a place in non-SRM environments; it’s just a bit more work because you have to do the tagging yourself.
In Part II, I will take a look at how this works with the FlashArray SRA and what’s involved in that.
This week I received a question from a customer about some slowness they were seeing in the vSphere “Add Storage” wizard. This is a problem that has occurred quite a few times over the years for a variety of different reasons. VMware has fixed most of them, and luckily this latest cause was known and has a relatively simple solution: an option called VMFS.UnresolvedVolumeLiveCheck.
I have been working with VMware’s vCenter Site Recovery Manager since the tail end of the 1.x release and I have to say this is the most excited I have been about a Storage Replication Adapter release that I can remember. Since I started with Pure in late April 2014 I have been working with our development team and product management to design and shape this initial release of the Pure Storage SRA. I have to say it has been a blast: a really great team that does some really amazing work! It is now officially approved and posted on VMware’s compatibility guide and SRA download site:
Quick post here. So I have been reviewing some great posts from @vmKen and @BenMeadowcroft about automating Site Recovery Manager operations with PowerCLI and wanted to give it a try myself. They outlined the process rather clearly in their blogs so it was a breeze to get most of the stuff up and running. But when I went to actually execute a test recovery or a recovery etc. it kept failing! The PowerCLI command to start the recovery was $VMrp.Start($RPmode), with $VMrp being my recovery plan and $RPmode being the recovery plan mode. The command was accepted but the recovery plan never started.
I got the following error in vCenter:
Unable to start the requested operation. Another operation may be in progress. Please wait for it to finish and try again.
Hmm…weird. I could kick off a test from the GUI with no issue so nothing was “interfering” from what I could tell. I thought maybe since I was using Site Recovery Manager 5.8 maybe something had changed so I tried it with my 5.5 environment and got the same result.
Just as I was about to lose my mind it finally occurred to me that I was connecting to the protected vCenter and the protected SRM server (I did enter remote credentials for the recovery SRM server though). While I could query the recovery plan etc. without issue from there, maybe SRM didn’t allow a recovery plan to be started unless you connected directly to the recovery vCenter/SRM server.
So I reconnected to the recovery site and it worked! I guess it makes a difference, so FYI. There might be a workaround to this, and it is definitely possible I missed something that allows it, but this seems to be what you need to do. If you find this isn’t true please let me know!
Thanks Ken and Ben for getting me started!! Cool stuff. Ken’s posts: