Last year Pure Storage introduced built-in replication on the FlashArray 400 series in version 4.0 of our Purity Operating Environment. Our replication offers a variety of benefits, and they center on two things. First, it is completely free. There is no license charge for replication itself or by capacity. All you need is two FlashArrays and a TCP/IP network between them to replicate over. There is no additional hardware to buy for the array and no license package required (all of our software is always free). Second, it is very easy to use: going from a greenfield array to replicating volumes takes maybe five minutes, and in reality probably far less. So I wanted to take some time to review how our replication is set up and how it works. I went over replication briefly when we released Purity 4.0, but I think it is time for a closer look.
Before I go on, I want to mention that our replication best practices guide is now posted. It is really well written, and I think anyone using our replication will find it very helpful. Check it out here:
How is it enabled?
Replication on the FlashArray is simple to enable. As mentioned, if you have a 400 series you already have the hardware, so all you need to do is connect the replication NICs on the FlashArray to your network with standard Ethernet cables and then assign an IP address to the bonded NICs.
Once the replication ports are configured on each array (a one-time configuration), the arrays need to be “paired” (also a one-time configuration). This is also a simple wizard. Log on to either the intended target or the source array (it doesn’t make a difference) and obtain a connection key. Then log in to the other array and launch the array connection wizard. Enter the remote array’s management IP address, the IP address of its replication port, and the connection key. Once that is done you are fully configured and ready to protect your data!
How is it managed?
Remote replication on the FlashArray is managed through an object called a Protection Group. A Protection Group is a logical association of a set of volumes that are replicated together, write-consistently, on the same interval. To a protection group you can add individual volumes, an entire host, or an entire host group. The benefit of adding individual volumes is obviously the granular control; the downside is also the granular control. In other words, if you add a volume to a host and want it replicated, you have to remember to add it to the protection group 🙂 The other methods (hosts or host groups) are less granular: everything presented to the host or host group will be replicated, which you may or may not want. But you do not need to remember to start replicating a volume, or to stop when the volume is removed. When a volume is added to or removed from a host or host group, it is automatically added to or removed from the protection group.
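To make the trade-off concrete, here is a toy Python model of how host-based membership resolves to a set of volumes. `Host`, `HostGroup` and `effective_volumes` are hypothetical names, not part of any Pure Storage API; the point is only that host and host-group members are expanded from their current connections each time, which is why volumes follow their host automatically. For illustration the function accepts all three member kinds at once; the post does not say whether a real protection group allows mixing them.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Host:
    """Hypothetical host with the volumes currently connected to it."""
    name: str
    volumes: Set[str] = field(default_factory=set)


@dataclass
class HostGroup:
    """Hypothetical host group: member hosts plus volumes shared to the group."""
    name: str
    hosts: List[Host] = field(default_factory=list)
    volumes: Set[str] = field(default_factory=set)


def effective_volumes(volumes: Iterable[str] = (),
                      hosts: Iterable[Host] = (),
                      host_groups: Iterable[HostGroup] = ()) -> Set[str]:
    """Resolve which volumes a protection group would replicate right now.

    Explicit volumes are fixed; host and host-group members are expanded
    from their current connections on every call, so connecting or
    disconnecting a volume changes the result without touching the group.
    """
    result = set(volumes)
    for host in hosts:
        result |= host.volumes
    for group in host_groups:
        result |= group.volumes
        for host in group.hosts:
            result |= host.volumes
    return result


if __name__ == "__main__":
    esx1 = Host("esx1", {"ds01"})
    esx2 = Host("esx2", {"ds02"})
    cluster = HostGroup("cluster", [esx1, esx2], {"shared01"})
    print(sorted(effective_volumes(host_groups=[cluster])))  # ['ds01', 'ds02', 'shared01']
    esx1.volumes.add("ds03")  # a new datastore presented to esx1
    print(sorted(effective_volumes(host_groups=[cluster])))  # ['ds01', 'ds02', 'ds03', 'shared01']
```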
A protection group contains a few different pieces:
- Added objects (volumes, hosts or host groups)
- Targets: a protection group can replicate to up to five arrays concurrently
- Protection policy
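The three pieces above map naturally onto a small data structure. This is a hypothetical Python sketch, not Purity’s object model: the field names are invented for illustration, and the only rule it enforces is the five-target limit stated above. Both schedules default to `None`, reflecting that neither local snapshots nor remote replication has to be scheduled.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_TARGETS = 5  # per the post: up to five target arrays concurrently


@dataclass
class ProtectionGroup:
    """Hypothetical model of a protection group's three parts."""
    name: str
    members: List[str] = field(default_factory=list)  # volumes, hosts or host groups
    targets: List[str] = field(default_factory=list)  # paired remote arrays
    # Protection policy: either schedule may be left unset (None).
    snapshot_interval_min: Optional[int] = None     # local snapshots
    replication_interval_min: Optional[int] = None  # remote replication

    def add_target(self, array: str) -> None:
        """Add a paired array as a replication target, enforcing the limit."""
        if array in self.targets:
            return  # already a target; adding it again is a no-op
        if len(self.targets) >= MAX_TARGETS:
            raise ValueError(
                f"{self.name}: at most {MAX_TARGETS} targets are allowed")
        self.targets.append(array)


if __name__ == "__main__":
    pg = ProtectionGroup("prod-sql", members=["sql-data", "sql-log"],
                         replication_interval_min=60)
    pg.add_target("dr-array")
    print(pg)
```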
A protection policy consists of two parts: local snapshot settings and remote replication settings.
The “Snapshot Schedule” controls local snapshots. Do you want local snapshots created on a certain interval? How long should they be retained? The “Replication Schedule” controls remote replication. The same questions apply (interval, which sets your RPO, and retention), plus you can define blackout periods, in case you don’t need to bother replicating at night, for instance.
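The replication schedule’s knobs (interval, retention, blackout window) can be sketched as follows. This is a simplified, hypothetical Python model: it uses a single retention window and one daily blackout window, which may be coarser than what the actual policy settings offer, and none of the names come from Purity.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import List, Optional


@dataclass
class ReplicationSchedule:
    """Hypothetical, simplified replication schedule."""
    interval: timedelta                     # how often to replicate (sets the RPO)
    retention: timedelta                    # how long replicated snapshots are kept
    blackout_start: Optional[time] = None   # daily window with no replication
    blackout_end: Optional[time] = None

    def in_blackout(self, t: datetime) -> bool:
        if self.blackout_start is None or self.blackout_end is None:
            return False
        now = t.time()
        if self.blackout_start <= self.blackout_end:
            return self.blackout_start <= now < self.blackout_end
        # Window wraps past midnight, e.g. 22:00 -> 06:00.
        return now >= self.blackout_start or now < self.blackout_end

    def is_due(self, last_run: Optional[datetime], now: datetime) -> bool:
        """True if a scheduled replication should start at `now`."""
        if self.in_blackout(now):
            return False
        return last_run is None or now - last_run >= self.interval

    def prune(self, snapshot_times: List[datetime], now: datetime) -> List[datetime]:
        """Keep only the snapshots still inside the retention window."""
        return [s for s in snapshot_times if now - s <= self.retention]


if __name__ == "__main__":
    nightly_off = ReplicationSchedule(
        interval=timedelta(hours=1), retention=timedelta(days=1),
        blackout_start=time(22, 0), blackout_end=time(6, 0))
    last = datetime(2024, 1, 1, 12, 0)
    print(nightly_off.is_due(last, datetime(2024, 1, 1, 13, 0)))  # True
    print(nightly_off.is_due(last, datetime(2024, 1, 1, 23, 0)))  # False (blackout)
```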
Local snapshots and remote snapshots do not rely on one another, so you do not need to configure one to permit the other. You can configure both, one, or even neither. If you choose neither, you can replicate manually on demand. And of course you can still replicate on demand when these schedules are active.
How does it work?
So how does replication work on the FlashArray? Our replication is based on our efficient snapshot technology. There is no such thing as a static target volume. Instead, the FlashArray creates snapshots of a volume or a set of volumes and replicates those snapshots to the remote array at an interval designated by the user. The interval determines your Recovery Point Objective (RPO); this could be minutes, hours, or even days depending on the needs of your applications, the available network bandwidth, the change rate, and the workload.
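Because interval, bandwidth and change rate interact, a back-of-the-envelope calculation helps when picking an interval. The sketch below is an illustrative sizing model, not a Pure Storage formula: it assumes a steady change rate, treats GB as 10^9 bytes, and takes an over-the-wire reduction ratio as an input rather than predicting one. The function name `replication_cycle` is hypothetical.

```python
def replication_cycle(interval_min: float, change_gb_per_hour: float,
                      link_gbps: float, wire_reduction: float = 1.0) -> dict:
    """Rough per-cycle sizing for snapshot-based replication (assumed model).

    interval_min        replication interval in minutes
    change_gb_per_hour  unique data changed per hour, in GB (10**9 bytes)
    link_gbps           usable link bandwidth in gigabits per second
    wire_reduction      e.g. 2.0 if the changes compress 2:1 before sending
    """
    changed_gb = change_gb_per_hour * interval_min / 60
    sent_gbit = changed_gb * 8 / wire_reduction  # bytes -> bits, after compression
    transfer_min = sent_gbit / link_gbps / 60    # seconds -> minutes
    return {
        "changed_gb": changed_gb,
        "transfer_min": transfer_min,
        # If a cycle takes longer than the interval, replication falls behind.
        "keeps_up": transfer_min < interval_min,
        # Worst case: failure just before the next snapshot finishes arriving,
        # so the newest complete copy on the target is one interval plus one
        # transfer time old.
        "worst_case_rpo_min": interval_min + transfer_min,
    }


if __name__ == "__main__":
    # 45 GB/h of change, hourly replication, 1 Gb/s link, 2:1 compression.
    print(replication_cycle(60, 45, 1, wire_reduction=2))
    # {'changed_gb': 45.0, 'transfer_min': 3.0, 'keeps_up': True, 'worst_case_rpo_min': 63.0}
```

Under these assumptions an hourly schedule has plenty of headroom on a 1 Gb/s link, whereas a 15-minute interval with 600 GB/h of incompressible change would need 20 minutes per cycle and fall behind.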
Once a snapshot is created, it is sent over the link. If it is the first snapshot, the full unique data set must be sent to the target array. Subsequent snapshots only require transmitting the unique changes since the last snapshot.
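The full-then-incremental pattern can be illustrated with a toy block map. In this hypothetical Python sketch a snapshot is just a dict from block index to contents; the first transfer sends every block, and later transfers send only blocks whose contents differ from the previous snapshot. It deliberately ignores unmapped (deleted) blocks and any metadata a real array would also send.

```python
from typing import Dict, Optional

Blocks = Dict[int, bytes]  # toy snapshot: block index -> block contents


def blocks_to_send(prev: Optional[Blocks], curr: Blocks) -> Blocks:
    """Return the blocks that must cross the link for snapshot `curr`."""
    if prev is None:
        return dict(curr)  # first snapshot: full baseline copy
    # Later snapshots: only blocks that are new or changed since `prev`.
    return {i: data for i, data in curr.items() if prev.get(i) != data}


def apply_on_target(base: Optional[Blocks], delta: Blocks) -> Blocks:
    """Rebuild the new point in time on the target from the last one plus the delta.

    Returns a new dict, so the earlier point in time stays intact.
    """
    rebuilt = dict(base or {})
    rebuilt.update(delta)
    return rebuilt


if __name__ == "__main__":
    snap1 = {0: b"aaaa", 1: b"bbbb", 2: b"cccc"}
    snap2 = {0: b"aaaa", 1: b"BBBB", 2: b"cccc", 3: b"dddd"}
    print(sorted(blocks_to_send(None, snap1)))   # [0, 1, 2]
    print(sorted(blocks_to_send(snap1, snap2)))  # [1, 3]
```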
The data is transmitted over dedicated (and included) 1 Gb/s or 10 Gb/s Ethernet ports across a standard TCP/IP network. No special licensing or hardware purchase is required to enable the replication ports. The data is sent over the wire compressed; the remote array receives it and, of course, applies all of its background data reduction techniques to the replicated data to integrate it with the rest of the data set on the array. So as new data sets are written to the target array, the replicated data will be further deduplicated or compressed. Sending compressed data over the wire acts as a form of WAN optimization: because only compressed data is transmitted, the raw amount of data actually sent is far less than the logical amount written to the volume.
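To illustrate why compressing on the wire cuts what crosses the WAN, here is a small Python sketch that uses the standard-library `zlib` module as a stand-in. The post does not say which compression algorithm Purity uses, so treat the ratio as illustrative only; the function names are hypothetical. The sender compresses each changed block, and the receiver decompresses it before applying its own data reduction.

```python
import zlib
from typing import List, Tuple


def compress_for_wire(blocks: List[bytes], level: int = 6) -> Tuple[List[bytes], int, int]:
    """Compress changed blocks before sending them to the target.

    Returns (payloads, logical_bytes, wire_bytes).
    """
    payloads = [zlib.compress(block, level) for block in blocks]
    logical_bytes = sum(len(block) for block in blocks)
    wire_bytes = sum(len(p) for p in payloads)
    return payloads, logical_bytes, wire_bytes


def receive(payloads: List[bytes]) -> List[bytes]:
    """Target side: restore the original blocks before local data reduction."""
    return [zlib.decompress(p) for p in payloads]


if __name__ == "__main__":
    changed = [b"A" * 4096, b"hello world " * 300, bytes(4096)]
    payloads, logical, wire = compress_for_wire(changed)
    assert receive(payloads) == changed
    print(f"logical {logical} bytes, sent {wire} bytes "
          f"({logical / wire:.1f}:1 on the wire)")
```

Highly repetitive blocks like these compress extremely well; real application data will usually show a much more modest ratio.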
One of the benefits of using snapshots is that the data can be accessed on the remote side without having to stop or alter the replication stream. Snapshots on the FlashArray are not volumes that can be connected to a host; they are simply a point-in-time copy of data. The data in a snapshot is used by “copying” the snapshot to an actual volume. The data itself is of course not written again on the array, since that would be wasteful. Instead, the snapshot’s metadata is copied to an actual volume, and that volume simply points to the original data. Reads go to those same data segments, and when a new write arrives, the original segment preserved by the snapshot remains and a new segment is allocated for the write. This allows a snapshot to be reused over and over.
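The “copy the metadata, not the data” behavior can be modeled in a few lines. This is a toy Python sketch, not Purity’s actual metadata layout: `SegmentStore`, `Volume` and `copy_snapshot_to_volume` are hypothetical names, a volume is just a map from block index to segment id, and garbage collection of unreferenced segments is left out.

```python
import itertools
from typing import Dict, Optional


class SegmentStore:
    """Immutable data segments addressed by id and shared by reference."""

    def __init__(self) -> None:
        self._segments: Dict[int, bytes] = {}
        self._next_id = itertools.count()

    def put(self, data: bytes) -> int:
        seg_id = next(self._next_id)
        self._segments[seg_id] = data
        return seg_id

    def get(self, seg_id: int) -> bytes:
        return self._segments[seg_id]

    def __len__(self) -> int:
        return len(self._segments)


class Volume:
    """A volume is only metadata: block index -> segment id."""

    def __init__(self, store: SegmentStore,
                 block_map: Optional[Dict[int, int]] = None) -> None:
        self.store = store
        self.block_map: Dict[int, int] = dict(block_map or {})

    def write(self, block: int, data: bytes) -> None:
        # Never overwrite a segment in place: allocate a new one and repoint,
        # so any snapshot still referencing the old segment is unaffected.
        self.block_map[block] = self.store.put(data)

    def read(self, block: int) -> bytes:
        return self.store.get(self.block_map[block])

    def snapshot(self) -> Dict[int, int]:
        # A point-in-time copy of the metadata only; no data is duplicated.
        return dict(self.block_map)


def copy_snapshot_to_volume(store: SegmentStore, snap: Dict[int, int]) -> Volume:
    """'Copy' a snapshot to a new volume by copying pointers, not data."""
    return Volume(store, snap)


if __name__ == "__main__":
    store = SegmentStore()
    source = Volume(store)
    source.write(0, b"original")
    snap = source.snapshot()
    clone = copy_snapshot_to_volume(store, snap)
    clone.write(0, b"changed on clone")
    print(source.read(0), clone.read(0), len(store))  # b'original' b'changed on clone' 2
```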
Another benefit of snapshots is that you do not need to pre-create any volumes on the target array. You only need to create a new volume when you actually need to leverage a snapshot.
In the protection group (as viewed from the replication target) there is a listing of the remote snapshots that were created either on schedule or on-demand. See the image below:
There are two columns to note here (the others are obvious enough that I won’t go into them): transferred and snapshots. Transferred is simply the total amount of raw data that was sent over the wire to create that particular snapshot, which is essentially what changed since the previous snapshot was taken. The snapshots column shows how much space would be reclaimed if that snapshot were deleted. So it does not describe the total space consumed: it excludes shared space and counts only unique space. The entry for the latest snapshot, though, also includes the total unique space of the entire protection group, so if you deleted that snapshot this figure would roll down to the previous one (if there is one), and if you created a new one it would roll forward into the new one.
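The difference between the two columns is easiest to see with a small worked example. This hypothetical Python sketch represents each snapshot as the set of data segments it references: the transferred figure counts segments that were new relative to the previous snapshot, while the reclaimable figure counts only segments that no other snapshot (or live volume) still references. It uses logical segment sizes and ignores compression, so the numbers are illustrative only.

```python
from typing import AbstractSet, Dict, List

Snapshot = AbstractSet[int]  # ids of the data segments a snapshot references


def transferred_bytes(snapshots: List[Snapshot], sizes: Dict[int, int]) -> List[int]:
    """Bytes sent for each snapshot: segments not present in the previous one."""
    sent: List[int] = []
    previous: AbstractSet[int] = frozenset()
    for segs in snapshots:
        sent.append(sum(sizes[s] for s in segs - previous))
        previous = segs
    return sent


def reclaimable_bytes(snapshots: List[Snapshot], sizes: Dict[int, int],
                      live: AbstractSet[int] = frozenset()) -> List[int]:
    """Space freed by deleting each snapshot alone: its unshared segments."""
    result: List[int] = []
    for i, segs in enumerate(snapshots):
        others = set(live)
        for j, other in enumerate(snapshots):
            if j != i:
                others |= other
        result.append(sum(sizes[s] for s in segs - others))
    return result


if __name__ == "__main__":
    sizes = {1: 100, 2: 100, 3: 50, 4: 70}
    snaps = [{1, 2}, {1, 3}, {1, 3, 4}]
    print(transferred_bytes(snaps, sizes))  # [200, 50, 70]
    print(reclaimable_bytes(snaps, sizes))  # [100, 0, 70]
```

Note the middle snapshot: it transferred 50 bytes but would free nothing if deleted, because the newest snapshot still references the same segment.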
The snapshots listed here are not necessarily individual snapshots; each one is a point in time for the entire protection group. All volumes are replicated with consistency across the volumes, so unless there is only one volume in the protection group, each entry represents multiple volume snapshots taken at the same time. If you open an individual point-in-time entry here, you will see the individual volume snapshots.
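What makes each entry a single point in time is that every volume in the group is captured at the same instant. The toy Python model below gets that property in the simplest possible way, a single lock held while all volumes are copied; it illustrates write-order consistency under that assumption and is not how Purity actually implements it. `ToyArray` and its methods are hypothetical.

```python
import threading
from typing import Dict, Iterable

Blocks = Dict[int, bytes]


class ToyArray:
    """Several volumes behind one lock (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._volumes: Dict[str, Blocks] = {}

    def write(self, volume: str, block: int, data: bytes) -> None:
        with self._lock:
            self._volumes.setdefault(volume, {})[block] = data

    def group_snapshot(self, names: Iterable[str]) -> Dict[str, Blocks]:
        """One point in time for the whole group: one snapshot per volume.

        Holding the lock while copying every volume means no write can land
        between the first and the last copy, so a write that depends on an
        earlier one (e.g. data after its log record) never appears without it.
        """
        with self._lock:
            return {name: dict(self._volumes.get(name, {})) for name in names}


if __name__ == "__main__":
    array = ToyArray()

    def workload() -> None:
        for i in range(500):
            array.write("sql-log", i, b"log record")  # log first...
            array.write("sql-data", i, b"row")        # ...then the data it covers

    writer = threading.Thread(target=workload)
    writer.start()
    points_in_time = []
    while writer.is_alive():
        points_in_time.append(array.group_snapshot(["sql-data", "sql-log"]))
    writer.join()
    # Every point in time is consistent: never more data rows than log records.
    print(all(len(p["sql-data"]) <= len(p["sql-log"]) for p in points_in_time))
```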
These can then be individually copied to new volumes or pre-existing volumes so they can be presented to a host.
That’s pretty much everything you need to know about replication; it is rather easy. VMware vCenter Site Recovery Manager and our SRA automate the vast majority of this process, except for setting up the replication itself, so in that case recovery is even easier. If you would like more nuts-and-bolts information, the best practices document just posted on community.purestorage.com is definitely worth a read if you’re curious.