Can Pure make cloud storage better?

Cloud Block Store is now GA! About a year ago, we announced our intention to bring Purity (the OS for the FlashArray) to AWS. I wrote a post about it here:

https://www.codyhosterman.com/2018/11/announcing-pure-storage-cloud-block-store-for-aws/

In the past 10 months I have been pretty focused on learning AWS. Not just how to use it, but more importantly, how others are using it. It has been a fun ride. I have already incorporated some of what I've learned into my solution work for on-premises integration, and have some cool stuff coming. A lot of my work has of course been hands-on: deploying EC2, deploying VMware Cloud on AWS, managing S3, CloudFormation, IAM, SSM, and the seemingly billions of other services in AWS. But much of my focus has been on listening to what people have seen, learned, and want to do with public cloud. AWS and the like have had a decent amount of runway now, so there have certainly been some lessons learned.

What have I seen? Well, a few things:

  • Lift and shift is real (sometimes called "move and improve," which is probably more accurate). Do I refactor my apps before or after the move? Many focus on getting them there first, then seeing what they can do. A lot of this depends on what you are trying to do, your timelines, and your costs.
  • Deciding "what" should go into AWS is becoming more common. Yes, sometimes that answer is "all of it." Sometimes not.
  • Multi-cloud does not mean "I can move my app anywhere at any time." Follow-the-cloud applications don't really make much sense. Would you build an application to seamlessly move from ESXi to Hyper-V at any given time? Of course not. While you could, you would likely lose most, if not all, of the benefits of either hypervisor. Building an application that can run on everything means it cannot really use anything well. A duck boat can go on water and land, but does neither particularly well. You have to design for the least common denominator, reducing the value of whatever you put it on. Is AWS the best for this app? Is on-premises the best for that one? Multi-cloud means having a practice to deploy to different places. A consistent practice, e.g. using Terraform to deploy to both AWS and Azure. The days of having a network team, a database team, and a VMware team might be going away. Having an on-premises team, an Azure team, an AWS team, and an application consultancy team is becoming a potential reality. More on this later.
  • The conceptual/architectural problems of yesterday's infrastructure do not go away with public cloud. Not entirely. Do I still have to deal with hardware and cables? No. Do I still have to think about availability or DR? Of course you do.
  • Is everyone going 100% public? No. Is on-premises dead? No. Are people going to use both? Yes. Are some going to only use one or the other? Yes. Is one way or another intrinsically wrong? No. Maybe for you, maybe not for someone else. AWS Outposts and Google Anthos are further evidence of this, I think.

There’s a lot more, but I want to get to my point.

By the way, if you want a great public cloud podcast to listen to, I HIGHLY recommend https://www.screaminginthecloud.com/ by Corey Quinn. It is always a great listen, and has really helped me get out of my "on-premises"-focused thinking.

So can Pure help make public cloud storage better?

The short answer is "damn it, I think it's absolutely worth a shot. I think if we do it right, we can." I've wavered in my opinion since I first heard we were working on it, but I have settled on cautious optimism. There are tons of value-add third-party services on top of AWS. Why can't storage be one of them? VMware Cloud puts vSAN on top of AWS; why can't we put Purity there too?

CBS has been interesting. It has opened me up, and our conversations with our customers, prospects, and partners, to an entirely new world. An entirely new world, though, with eerily similar problems.

Let's start with a simple question. Why does one buy a storage array on-premises in the first place? Why not just buy a bunch of flash drives and call it a day? Well, for many reasons. One is that most people don't want to deal with this type of flash versus that type of flash. They don't want to deal with wear issues, or with connecting the drives, implementing dedupe and compression, protecting the data, and tracking what's next in flash. People buy storage arrays because the company they buy from deals with those problems. We investigate the new technology and implement it. We make sure it works with VMware and Microsoft and Linux. Let us be the flash experts, so you can focus on what is really important to your business.

As we introduce new hardware and software, or denser capacities, you get them in the background, without changing how you use the product. Creating a volume stays the same; provisioning it stays the same. In many ways this is the concept of Evergreen storage. It is also, to an extent, the promise of public cloud. Among other things, you don't need to worry about bad DIMMs or newer CPUs or aging hardware in general. As long as you keep paying your AWS bill, they deal with that. You pay Pure to deal with the storage stuff. You pay AWS to deal with the compute and other infrastructure.

The onus is on Pure and AWS though to keep adding value. Add services, improve the product, so you can focus less and less on the banality of the internals and focus on what your business needs you to focus on. Our goal is to have you WANT to stay on our product, not for you to HAVE to.

With the FlashArray, we have simplified the storage layer and allowed you to take advantage of our expertise around flash.

Great. But more and more organizations are using public clouds like AWS, where we can't simply plop a FlashArray (yes, Direct Connect exists, but cloud-native-ish it is not).

So can we help in a world where physical infrastructure doesn’t “exist”?

Let me preface this by saying that while some of these arguments make sense on their face, it is hard to know for sure until we try. Am I wrong about some things? Probably. But I think the arguments are strong enough to warrant thinking about it more. I am sure some people are going to fiercely argue that I don't have a clue. And maybe I don't. But hey, I learn the most when I am wrong.

IT Fundamentals Don’t Change

In public cloud, the need to understand infrastructure availability, and your own availability requirements, does not go away.

And it is kind of funny these days: when some type of SaaS offering is offline, my immediate reaction is not "dammit, vendor X," it is usually "ugh, what failure at AWS just happened?" (Frankly, that is the wrong reaction for a lot of reasons, but that is a conversation for another time.) Services fail, on-premises and public alike. Granted, when a large failure occurs, you can be sure AWS root-causes it and improves things, making that failure less likely. So in all likelihood, your availability will improve over time. This is once again similar to a FlashArray: as we find problems, we fix them and make the product better. The onus therefore is on us. Or AWS. But in the end a failure still affects your business. So understanding the availability of a service or failure domain, and understanding what to do if it or a part of it fails, is important. SREs exist for a reason. Then you decide whether implementing safeguards around that failure is something your business needs or demands.

Anyways, how do you make things more resilient (or at least recoverable) on-premises? Well, one option is array-based replication (asynchronous, synchronous, or active-active). This provides the ability to make an application, or at least its data, available across failure domains. This is something we can do with AWS too: two CBS instances spread across whatever failure domains you need, using whatever level of replication you need. Though be mindful of networking charges; these can always be a surprise on an AWS bill. So if the level of protection runs into this, do the cost/benefit analysis. It may make sense, it may not. Our data-reducing/differencing replication can certainly help here, but replication traffic is certainly non-zero.
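To make that concrete, here is a minimal sketch of asynchronous replication between two Purity instances (say, two CBS instances in different availability zones), using the purestorage REST 1.x Python client. The endpoints, token, names, and schedule are all hypothetical, and the target array is assumed to already be connected as a replication partner:

```python
# Minimal sketch: async replication between two Purity instances.
# Assumes the purestorage REST 1.x client (pip install purestorage);
# all endpoints, tokens, and names below are hypothetical.
import purestorage

source = purestorage.FlashArray("cbs-az1.example.com", api_token="SOURCE-API-TOKEN")

# Protection group whose member volumes replicate to the target array
# ("cbs-az2"), assumed to already be connected for replication.
source.create_pgroup("app-pg", vollist=["app-data", "app-logs"], targetlist=["cbs-az2"])

# Replicate every 5 minutes. Remember: cross-AZ/region traffic is billed,
# so the schedule is a cost vs. recovery-point trade-off.
source.set_pgroup("app-pg", replicate_enabled=True, replicate_frequency=300)
```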

But protection can be more pointed. Protect against an EBS failure. Protect against an EC2 instance failure. The ability to present shared block storage still exists.
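For instance, here is a hedged sketch of presenting one CBS volume to two EC2 instances over iSCSI, so that an instance failure does not take the data with it. Same purestorage client; the names and IQNs are hypothetical:

```python
# Minimal sketch: shared block storage for two EC2 instances.
# Uses the purestorage REST 1.x client; names/IQNs are hypothetical.
import purestorage

array = purestorage.FlashArray("cbs.example.com", api_token="API-TOKEN")

array.create_volume("cluster-data", "500G")

# Register each EC2 instance as a host by its iSCSI initiator name...
array.create_host("ec2-node1", iqnlist=["iqn.2019-09.com.example:node1"])
array.create_host("ec2-node2", iqnlist=["iqn.2019-09.com.example:node2"])

# ...and connect both to the same volume. If one instance dies, the
# other still sees the data; the volume outlives any single EC2 instance.
array.connect_host("ec2-node1", "cluster-data")
array.connect_host("ec2-node2", "cluster-data")
```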

I think the simplicity of Purity in general can help here. If it is simpler to increase availability, higher availability is simpler to achieve. Sounds like a dumb sentence, but there's truth there. I was once told by a customer that the simplicity of the FlashArray is the "single biggest crutch to their developers in their environment." I, to this day, am not sure if that was meant to be a compliment or not, but I think about it a lot.

Elasticity

One of the key benefits of public cloud is elasticity of resources. I use what I need and return to sender what I don't. A fun example was the recent supercomputer built in AWS: https://www.top500.org/system/179693

This is something that is tough for on-premises. One way or another that resource is there. Maybe at some level it can be reclaimed and handed to another user. But if you need storage, it needs to physically be there, consuming space and power. If you don't need it this second, it is still there. We have made this better by offering charging by usage instead of up-front (called Pure as-a-Service, formerly called ES2), but if you need more storage, someone has to order it, ship it, and plug it in. Same thing with compute, etc.

With public cloud you pay for what you use. We can do this too. As you use storage it can be added; as it goes out of use, the storage footprint can be shrunk. Is this all dynamic and built into CBS today? No, there is certainly more we can do, but the possibilities here are very interesting. If we need more capacity, we could grab it from AWS. If we no longer need it, we could return it. The internals are there: we can shelf-evacuate today on-premises to take advantage of denser flash, so why not repurpose this? Even the system itself can be deployed on demand via CloudFormation, which deploys much faster than a truck can deliver an array or storage shelf.
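As a sketch of what "deployed on demand" looks like in practice, here is a CloudFormation launch via boto3. To be clear, the template URL and parameter names below are hypothetical placeholders, not the actual CBS template interface:

```python
# Minimal sketch: launching a CBS stack on demand with boto3.
# The TemplateURL and Parameters are hypothetical, not the real
# CBS CloudFormation interface.
import boto3

cf = boto3.client("cloudformation", region_name="us-west-2")

cf.create_stack(
    StackName="cbs-demo",
    TemplateURL="https://example-bucket.s3.amazonaws.com/cbs.yaml",  # hypothetical
    Parameters=[
        {"ParameterKey": "VpcId", "ParameterValue": "vpc-0123456789abcdef0"},
        {"ParameterKey": "KeyName", "ParameterValue": "my-keypair"},
    ],
    Capabilities=["CAPABILITY_NAMED_IAM"],  # the stack creates IAM roles
)

# Block until the array is up: minutes, versus weeks for a truck roll.
cf.get_waiter("stack_create_complete").wait(StackName="cbs-demo")
```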

Multi-Cloud

Ah yes. The much-maligned term "multi-cloud." I already stipulated that some definitions of this are somewhat unrealistic. Moving an application at any time to any place is not going to happen very often. I'd say the closest this will get to happening is the VMware Cloud implementations. But that's an entirely different conversation. In general, the effort to build something to be cloud native, but not native to one given cloud, is overkill for most, or maybe even impossible for all intents and purposes. For 99%, I would say investing your developers' time elsewhere makes a lot more sense.

Multi-cloud will more likely mean deploying services to different clouds in similar ways: a common automation layer or practice. Having a common storage layer across on-premises and public cloud (or, as we look into more places for CBS, maybe public to public) offers an identical way of deploying storage while still using what you need in the cloud. A common layer that DOES take advantage of the underlying benefits of that particular place (physical flash or AWS resources). Replication between the two allows for migration, or potentially DR, or portability. It does give you a data "out" if you need to move it. But mainly: provisioning storage in AWS or on-premises is the same, and so is applying storage features, while still having the confidence that the layer below is being used in the right way.

Snapshots and Data Reduction

Probably one of the first arguments around CBS was, "hey, couldn't I use Purity data reduction to reduce my storage costs in AWS?"

Makes sense.

Indeed, we can use our data reduction to dedupe and compress, and the wear leveling and related machinery built around that can reduce the amount of I/O that actually needs to be committed. Add snapshots on top of that, and data reduction in our replication, and money could be saved. Simple OS-native features such as UNMAP can make sure that what is committed down is what is actually in use. Thin provisioning allows capacity to be dealt out while you only pay for what is in use; UNMAP makes that usage 100% accurate. There is a compute cost to running CBS though, and we also don't give CBS away (we sell it), so these things have to actually be used to make the math work. Yes, an on-premises FlashArray and Evergreen can actually save you money over other storage options over time, but only if you use it.
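If you want to see whether those features are actually earning their keep, the space metrics are queryable. A minimal sketch with the purestorage REST 1.x client (the endpoint, token, and volume name are hypothetical):

```python
# Minimal sketch: checking data reduction and thin provisioning savings.
# purestorage REST 1.x client; endpoint/token/volume are hypothetical.
import purestorage

array = purestorage.FlashArray("cbs.example.com", api_token="API-TOKEN")

# Array-wide space metrics, including the overall data reduction ratio.
space = array.get(space=True)
if isinstance(space, list):  # some client versions return a one-item list
    space = space[0]
print("Data reduction: %.1f to 1" % space["data_reduction"])

# Per-volume view: 'size' is what the host sees (thin provisioned);
# 'volumes' is the unique physical space actually consumed.
vol = array.get_volume("app-data", space=True)
print("Provisioned:", vol["size"], "bytes; consumed:", vol["volumes"], "bytes")
```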

But the core value here is the features. Instant snapshots, protection groups (consistency groups), replication, QoS limits, etc. Array-based features that are not cost items. You pay for your capacity and use the features you need.

Our Implementation

The reasons to introduce a new product generally fall into two categories:

  • To solve a customer problem in a new way, or a better way than what exists today
  • To solve a company problem of not being in that market

Sometimes the latter is enough; having a product, even if it isn't really better, can open the door. But this isn't something we want to do at Pure. We have worked hard to offer a product on-premises that we feel is better than what was out there. We made storage simple. We continued to keep it simple. As new hardware options came up, we integrated them in the best way we could. If you don't do the first, the product usually is not successful long-term.

When it came to public cloud, we wanted to solve both problems. We didn't have a play in AWS, outside of Direct Connect. We could have ported Purity into EC2, slapped some EBS behind it, and called it a day. But that doesn't really solve the first bullet up there. So while we could have some new conversations, we wouldn't be able to add much value.

So none of this holds any water unless we start from the right place. Simply porting our FlashArray Purity software to an AWS instance (a virtual appliance, if you will) is not enough. To make this successful, a lift and shift of the Purity code is not enough. It needs to be cloud native. It needs to understand how AWS works and use the unique benefits it offers. This is no different than our challenge when first creating Purity for flash in general. Purity had to be written in a way that understands and takes advantage of flash, or the FlashArray would never have really made it; it would have been no different from anything else out there that slapped flash into the backend. So this is what we did: we treated AWS as a new hardware release. We took Purity and all of its features and treated AWS infrastructure as a new version of hardware to support. CBS, deployed via CloudFormation, leverages a combination of S3 and EBS, and uses EC2 auto-scaling groups to make sure things stay up. We focused on how things could fail, and on making sure Purity reacts as non-disruptively as it would if a controller or drive failed on-premises.

So the same Purity code runs on the FlashArray and CBS. If it sees DirectFlash in a FlashArray, it behaves one way. If it sees older SSDs in a FlashArray//M, it behaves differently. If it sees AWS resources, it behaves differently. Internally. To the end user it is all the same. Creating a volume on CBS is the same exact API call, or PowerShell command, or whatever, as on the FlashArray. We continue to improve Purity, and as AWS offers up new features, we can take advantage of them. Just like AWS improves your environment without you having to make changes, Purity will improve itself as our engineers continue to focus on using AWS resources better (more efficiently, faster, more resiliently, etc.). By viewing AWS as essentially a new hardware platform, but not rewriting Purity, our Purity team can keep adding features that both CBS and FlashArray customers will enjoy. The underlying interaction changes, but the higher-level features (snapshots, replication, QoS) are none the wiser. And therefore neither is the end user. This allows us to make CBS a first-class product, without having to create an entirely new business unit of engineers to rival the FlashArray team in size.
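To illustrate just how literal "the same exact API call" is, here is a minimal sketch with the purestorage Python client; only the management endpoint differs (both endpoints and tokens are hypothetical):

```python
# Minimal sketch: the identical call provisions a volume on a physical
# FlashArray and on CBS. Endpoints and tokens are hypothetical.
import purestorage

flasharray = purestorage.FlashArray("fa.example.com", api_token="ONPREM-TOKEN")
cbs = purestorage.FlashArray("cbs.example.com", api_token="CBS-TOKEN")

# Same API, same semantics; Purity maps it to DirectFlash in one case
# and to EC2/EBS/S3 resources in the other, internally.
for array in (flasharray, cbs):
    array.create_volume("app-data", "1T")
```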

CBS includes all of the VMware goodness. VASA, vVols, VAAI: it is all in there. External storage support for VMware Cloud on AWS is on the public roadmap, so when it is ready, so is CBS (permitting testing and qualification, of course).

There was an interesting thread on Twitter the other day where someone suggested that using managed K8s was a mistake: any bug you run into or feature you lack, you have to wait for that vendor to fix or add it. Who knows how long that will take. Fair point. But one response was, well, what's the solution? Hire more engineers than Google (for instance), or better engineers than Google? Also a fair point. I think there is some relation to CBS here too: could you build all of our features and efficiencies into your applications that need persistent data? Sure. But is that something you have the time, desire, or personnel to do? Is "outsourcing" that work to Pure something that could be helpful? I think for some (many?) the answer is likely "yes."

In the end, all of these small pieces of potential add up to a lot of overall potential, I think. Beta testing, which has been going on for nine months or so, has run through all of the scenarios above. Customers are interested and are trying them. So let's see what happens in practice. Personally, I am excited about the possibilities. I think Pure can make a difference in public cloud, just like we did on-premises.
