Data Storage Virtualization Technology: A Great Tool, But No Panacea
I love technology. There is nothing better than waking to the news of the next cool tool, gadget, or blinky thing. However, I have become a bit cynical about recent releases in the storage world. Many of these new “innovations” have been the product of the marketing teams more than the engineering teams at our storage manufacturers. Everyone out there is touting their storage “virtualization” and I have completely lost track of what that term actually means. So my apologies for the next few paragraphs, which are a little more negative than usual.
Head trip: Middle managers for your storage TPS reports…
IBM, Hitachi and EMC have long had head appliances you can shove everything from ghetto RAID to ultra high end storage behind and present it all as a single storage pool from which to carve. In this case we are creating one big “virtual” array from lots of other smaller arrays. I suppose there was a time when the companies out there who needed Petabytes of storage may have wanted this to simplify their lives by having a single interface and had a few hundred thousand dollars burning a hole in their budget. I have never worked with that finance guy (rare as the Loch Ness Monster), but I would love to. These days even mid range arrays scale to just shy of a Petabyte and have a very healthy set of tools so it seems a shame to dumb them down to disk behind this “one ring to rule them all”.
Big Pool: Come on in, the storage is fine…
Here we are talking about putting all (or a big bunch) of your disks into one big RAID and slicing off storage as you need it. Wow, now that I just wrote that I have no idea what is new about this (sneaky marketers). VMware is even doing this in software. I guess what has changed is the messaging from the storage vendors saying that all your storage is ideal for all of this. I get that throwing more spindles at an IO problem is great, but I don’t know if I want to commingle my critical Oracle data on the same spindles as the home drive of my summer intern, who may very well upload his classic Ren and Stimpy collection to the public share during month end cube runs. Call me crazy. I guess if you have so many disks that this does not affect your performance, why not, but again this is the classic mark of the wayward finance guy. Fencing off a little IO for my critical apps still seems like a good idea unless you have some Quality of Service tools in the array to keep business social tasks from starving business critical apps. These QoS tools are becoming more available all the time from the likes of EMC. Net net, I would be wary of a salesperson or their engineer calling this strategy a technology. The worst part is that they are selling it as something to make the IT person’s life easier. I am as lazy as the next guy, but I don’t know if I am going to risk my performance (and maybe my job) to save myself one more trip down a wizard to create a RAID before I carve my LUN.
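To make the “fencing off a little IO” idea concrete, here is a minimal sketch of what a QoS reservation might do in a shared pool. It is not any vendor's implementation: the function name, the workload names, and the IOPS figures are all made up for illustration, and it assumes the reservations add up to no more than the pool can deliver.

```python
def allocate_iops(pool_iops, demands, reserved):
    """Hypothetical QoS split: honour reservations first, then share the rest.

    pool_iops -- total IOPS the shared pool can deliver (assumed figure)
    demands   -- {workload: IOPS it is asking for right now}
    reserved  -- {workload: IOPS fenced off for it}; assumed to fit in the pool
    """
    grants = {}
    remaining = pool_iops
    # Pass 1: a workload with a reservation gets its demand, up to its fence.
    for name, floor in reserved.items():
        grants[name] = min(demands.get(name, 0), floor)
        remaining -= grants[name]
    # Pass 2: split whatever is left in proportion to still-unmet demand.
    unmet = {n: d - grants.get(n, 0) for n, d in demands.items()}
    total_unmet = sum(unmet.values())
    for name, want in unmet.items():
        share = 0 if total_unmet == 0 else min(want, remaining * want / total_unmet)
        grants[name] = grants.get(name, 0) + share
    return grants


# Month end: Oracle wants 8,000 IOPS while the intern's upload wants 12,000,
# and the pool (by assumption) can only deliver 10,000.
demands = {"oracle": 8000, "intern_share": 12000}
no_fence = allocate_iops(10000, demands, reserved={})
fenced = allocate_iops(10000, demands, reserved={"oracle": 8000})
```

Without a fence, the two workloads split the pool in proportion to what they ask for, so Oracle gets half of what it needs. With the reservation, Oracle is served in full and the upload gets whatever is left.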
Performance Pyramid: Step on up, for now…
Now we get into what I would really call a new technology. Here we move away from the classic contiguous LUN to a LUN that is actually an amalgamation of stripes of data from several different RAIDs, potentially from RAIDs in different performance bands. Pillar tried pulling this off in the array by moving data that needed more performance to the edge of the platters. This, in a lab, really would result in better performance. That said, the array still puts data on the inner radials, so the actuator arm is still working very hard and wanders away from your performance sensitive data. It was a great concept, but my experience was that this was a very “gen 1” way of going about it. Generation two has the LUN moving between multiple kinds of disks based on its temporal needs. If the LUN does not need to be fast, it moves down to SATA; if the LUN needs heat, it moves up to 15K SAS/Fibre Channel drives until it qualifies to move back down. Here the algorithm for what moves when is key. If the window of decision is too big, by the time the array decides the data needs more performance, the job that required the performance may already be done. If the window is too small, minor blips from an application could result in hundreds of gigs of data flying around the array, wasting precious IOPS. Here is where the policy engine will rule, and I expect these engines to get very complex very quickly, but the savings from no longer having to overbuild arrays for end of month style processes will be worth it.
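The window trade-off is easier to see with a toy model. The sketch below is a deliberately simple stand-in for a real policy engine: it averages a LUN's IOPS over the last N samples and promotes or demotes the whole LUN on fixed thresholds. The class name, the thresholds (1,000 and 300 IOPS), and the load trace are all assumptions made up for the illustration.

```python
from collections import deque


class TieringPolicy:
    """Hypothetical whole-LUN policy: move on the average IOPS over a window."""

    def __init__(self, window, promote_iops, demote_iops):
        self.samples = deque(maxlen=window)  # keeps only the last `window` samples
        self.promote_iops = promote_iops
        self.demote_iops = demote_iops
        self.tier = "SATA"
        self.moves = 0  # each move means copying the whole LUN between tiers

    def observe(self, iops):
        self.samples.append(iops)
        if len(self.samples) < self.samples.maxlen:
            return self.tier  # not enough history to decide yet
        avg = sum(self.samples) / len(self.samples)
        if self.tier == "SATA" and avg >= self.promote_iops:
            self.tier = "15K"
            self.moves += 1
        elif self.tier == "15K" and avg <= self.demote_iops:
            self.tier = "SATA"
            self.moves += 1
        return self.tier


def simulate(window, load):
    """Run the policy over a load trace; return the tier per sample and the move count."""
    policy = TieringPolicy(window, promote_iops=1000, demote_iops=300)
    tiers = [policy.observe(sample) for sample in load]
    return tiers, policy.moves


# Made-up trace: quiet, a one-sample blip, a four-sample job, then quiet again.
load = [100, 100, 1500, 100, 2000, 2000, 2000, 2000, 100, 100, 100, 100]

small_tiers, small_moves = simulate(1, load)  # reacts to every sample
big_tiers, big_moves = simulate(8, load)      # reacts only to sustained load
```

With a window of one sample, the blip alone triggers a promotion and an immediate demotion, and the LUN moves four times over the trace. With a window of eight, the LUN moves only once, but the promotion lands on the last sample of the job, and the LUN then sits on the fast disks through the quiet period that follows. Real engines presumably add hysteresis, scheduling, and cost models on top of something like this, which is where the complexity comes from.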
Performance Soup: Pay no attention to the man behind the curtain…
So now that we have awesome engines making really good calls, based on educated empirical data, about when the data can move up and down, let’s reduce the LUN to smaller components and move just the busy pieces all the way to flash, so I don’t use any more of the prime real estate than necessary to get the job done. Most of the LUN may actually be on SATA, while other pieces sit on Fibre Channel or SAS. This is the end game until solid state drives are so cheap we all just buy as much as we need for capacity and walk away. This small segment migration is happening now. The array that will win the day will be the one with the smartest (not fastest) head, since it will be up to the rule base to make the magic happen.
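As a rough illustration of small segment placement, the sketch below ranks the pieces of a single LUN by how busy they are and hands out the scarce tiers to the hottest ones first. It is a simplification under stated assumptions: the extent names, the per-extent IOPS numbers, and the slot counts are invented, and a real array would also weigh the cost of moving data.

```python
def place_extents(extent_iops, flash_slots, fc_slots):
    """Hypothetical placement: hottest extents to flash, next to FC/SAS, rest to SATA.

    extent_iops -- {extent name: observed IOPS} for the pieces of one LUN
    flash_slots -- how many extents fit on flash (assumed capacity)
    fc_slots    -- how many extents fit on Fibre Channel or SAS
    """
    # Rank extents from busiest to quietest.
    ranked = sorted(extent_iops, key=extent_iops.get, reverse=True)
    placement = {}
    for rank, extent in enumerate(ranked):
        if rank < flash_slots:
            placement[extent] = "flash"
        elif rank < flash_slots + fc_slots:
            placement[extent] = "fc"
        else:
            placement[extent] = "sata"
    return placement


# One LUN cut into six extents; only a couple of them are actually busy.
lun = {"ext0": 5, "ext1": 900, "ext2": 40, "ext3": 12, "ext4": 300, "ext5": 3}
placement = place_extents(lun, flash_slots=1, fc_slots=2)
```

In this made-up case a single extent ends up on flash, two on Fibre Channel or SAS, and half the LUN stays on SATA, which is the whole point: the expensive tier holds only the data that earns its place there.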
So the good news is that there are some really applicable technologies out there changing the face of storage as we know it. Just don’t let the marketing teams fool you into mistaking old school lazy for next gen cool.