FlashStack VMware vSphere Reference Architecture

Last week Pure Storage announced our very first converged architecture offering, appropriately named FlashStack. The initial release of FlashStack is built on Cisco hardware (UCS, of course) and the FlashArray. We presently have two reference architectures: one for VMware Horizon View and one for general-purpose VMware vSphere environments (choose your own guest OSes). My colleague Ravi Venkat (@ravivenk) architected the View ref arch, while I focused on the general vSphere one. In this blog post I am going to give an overview of what we did with the vSphere ref arch. For more information on either, refer to the respective reference architecture white papers at the usual place:

pure_storage_whitepaper_flashstack_horizonview
pure_storage_whitepaper_flashstack_vsphere

The main goal of this reference architecture is to describe performance sizing for the FlashArray, not capacity sizing and not compute sizing. The field already has a very good handle on capacity sizing, and Cisco mastered compute sizing for UCS years ago. So there is no need to re-invent those wheels.

Performance sizing for storage in a general vSphere environment (or, well, any virtualization platform) is tricky. By definition, general virtualization environments are all over the place with their workloads. You don’t and can’t know exactly what people are going to be running on VMware. So basically you have to tell them what the storage can do, and they can work backwards from that, since they know their applications, or at least certainly have a better idea than I do. The problem often run into here is that these sizing white papers turn into marketing vanity numbers. I can do 500,000 IOPS! I can do 9 GB/s! Etc. They use completely unrealistic workloads to generate falsely high benchmark numbers. A common mistake is to use extremely small I/O sizes (8 KB or 4 KB), 100% reads, and the like. That makes for nice charts but is completely useless to the end user when sizing.

So the first thing we did was look at our customer environments through our Cloud Assist platform. Cloud Assist is a really sweet proactive call-home and monitoring tool that we use to make sure our customers are having the right experience and to detect problems before the customers even know about them (it does much more than that, actually). We looked at the ones we knew to be in-scope VMware environments and observed their workloads, I/O sizes, read/write ratios, and all of that.

We observed that the average I/O size was much larger than 8 KB, about 27.6 KB. Of course, not all I/Os were that size; it’s a spectrum like anything else. The spectrum we came up with is as follows:

[Figure: I/O size spectrum]

A mixture of 4 KB, 8 KB, 16 KB, 32 KB, and 64 KB I/Os, averaging out to 27.6 KB.
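
To make the averaging concrete, here is a minimal Python sketch of how a weighted average I/O size falls out of a size mix. The function name and the example fractions are hypothetical, chosen only for illustration; they are not the measured distribution from the figure, so the result will not match the 27.6 KB figure.

```python
def average_io_size_kb(mix):
    """Return the weighted mean I/O size in KB.

    mix maps an I/O size in KB to the fraction of all I/Os of that size.
    The fractions must sum to 1.
    """
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"fractions sum to {total}, expected 1.0")
    return sum(size_kb * fraction for size_kb, fraction in mix.items())


# Hypothetical even mix, for illustration only (NOT the observed spectrum).
example_mix = {4: 0.20, 8: 0.20, 16: 0.20, 32: 0.20, 64: 0.20}
print(f"average I/O size: {average_io_size_kb(example_mix):.1f} KB")
```

Shifting more weight toward the 32 KB and 64 KB buckets pulls the average up quickly, which is why a spectrum with relatively few large I/Os can still average well above 8 KB.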

So what we did was develop a reference virtual machine that we called a “building block”. This building block was a simple VM running VDBench that was set to run its workload upon startup. Each virtual machine ran the same workload but to its own dataset.

[Figure: building block workload]

Each virtual machine does 40 IOPS at a 3:1 I/O ratio, and we chose a hot section of data to run the workload on, as most datasets have hot sections and cold sections. The virtual machine first does a “fill” stage where it just writes the data out (AFA testing should always include overwrites!) and then a workload stage where it actually runs this workload. This virtual machine is then turned into a template and deployed many, many times. In this case, 1,024 times.

Interesting note: it took me only 12.5 minutes to deploy all of those virtual machines with XCOPY on the FlashArray. I wrote a small PowerCLI script to issue these deploy-from-template operations. At first it took me almost 45 minutes, but I realized I was hitting the concurrency limits in vCenter. I made a few changes (multiple templates, etc.) and the time went down to about half. Then I realized that PowerCLI simply couldn’t issue commands fast enough to keep up with the speed of XCOPY on the FlashArray (yes, I used -RunAsync), so I ran concurrent PowerCLI scripts to get it down to 12.5 minutes. Blazingly fast.
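
The general pattern here (spread the clones across several templates and drive each template from its own concurrent worker) can be sketched in a few lines. This is not the PowerCLI script I used. It is a Python illustration of the idea, and `deploy_one`, the template names, and the VM names are all hypothetical stand-ins for whatever actually issues the deploy-from-template call.

```python
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(vm_names, templates):
    """Assign each VM name to a source template in round-robin order."""
    plan = {template: [] for template in templates}
    for index, name in enumerate(vm_names):
        plan[templates[index % len(templates)]].append(name)
    return plan


def deploy_batch(template, names, deploy_one):
    """Deploy every VM in one batch, one after another, from one template."""
    return [deploy_one(template, name) for name in names]


def deploy_all(vm_names, templates, deploy_one):
    """Run one worker per template so the batches proceed concurrently."""
    plan = split_round_robin(vm_names, templates)
    with ThreadPoolExecutor(max_workers=len(templates)) as pool:
        futures = [
            pool.submit(deploy_batch, template, names, deploy_one)
            for template, names in plan.items()
        ]
        return [result for future in futures for result in future.result()]


# Hypothetical stand-in for the real deploy call; it only records the request.
def fake_deploy(template, name):
    return (template, name)


names = [f"bb-vm-{i:04d}" for i in range(1024)]
deployed = deploy_all(names, ["tmpl-a", "tmpl-b", "tmpl-c", "tmpl-d"], fake_deploy)
print(len(deployed))
```

Using more than one template spreads the clone operations across sources, and using more than one worker keeps a single command issuer from becoming the bottleneck.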

Below is a printout of 1,024 building block virtual machines running their workloads at the same time:

[Figure: 8 hosts, 1,024 building block virtual machines]

You can see the latency is well below 1 ms. In total the VMs are doing about 40,000 IOPS and about 1 GB/second of read/write I/O. If you look closely, you will also notice that the average I/O size is 26.6 KB at that point in time.

[Figure: latency]
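
The aggregate numbers follow directly from the building block arithmetic, and the same arithmetic lets you work backwards from a target. Here is a minimal Python sketch; the function names are hypothetical, and it assumes 1 KB = 1,024 bytes and reports throughput in GiB/s.

```python
import math


def aggregate_load(vm_count, iops_per_vm, avg_io_kb):
    """Return (total IOPS, throughput in GiB/s) for identical building blocks.

    Assumes 1 KB = 1,024 bytes, so 1 GiB = 1024 * 1024 KB.
    """
    total_iops = vm_count * iops_per_vm
    throughput_gib_s = total_iops * avg_io_kb / (1024 * 1024)
    return total_iops, throughput_gib_s


def building_blocks_for(target_iops, iops_per_vm):
    """Return how many building blocks add up to at least target_iops."""
    return math.ceil(target_iops / iops_per_vm)


# 1,024 building blocks at 40 IOPS each, 26.6 KB average I/O size.
iops, gib_s = aggregate_load(1024, 40, 26.6)
print(f"{iops} IOPS, {gib_s:.2f} GiB/s")  # prints "40960 IOPS, 1.04 GiB/s"
```

So 1,024 VMs at 40 IOPS each is 40,960 IOPS, and at 26.6 KB per I/O that works out to roughly 1 GB/second, which lines up with the totals above.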

We certainly have customers who are doing more than this, and obviously there is still room to grow on the array. It can certainly do more reads. But I think this is a good comfort spot. It is a bit conservative too, because customers normally won’t have an overwrite pattern this consistent. This is a defensible workload that I think should help customers accurately size their FlashStacks.

So I just wanted to do a quick overview. Head over to the FlashStack launch page for more information on the program!