Score one for the marketers: why definitions of 'cloud' based computing frustrate me
July 14th, 2009
Marketing is a great thing, but sometimes it just goes too far. All of this "cloud" talk is a prime example.
The discussion around cloud reminds me of the proverbial spring/summer activity of lying in the yard as kids and watching the clouds go by, calling out what you think this or that cloud is.
"Oooh, look at that one, it's a dragon!"
"What? That's not a dragon, that's a turtle!"
"Uh, huh, but now it's a sheep!"
The cloud is what you want it to be, because it is a giant nebulous concoction of water vapor and ice crystals, constantly changing shape and direction as dictated by the wind. It isn't any one thing.
That's great for childhood games, but as an architect and a consultant, I've just about had enough of all this. IT is a concrete thing. Technology is black and white (or 1 and 0 if you prefer). Nebulousness (is that a word?) does not fit in this environment. You don't get to say to the CEO, "well, it was sort of online, depending on what your definition of 'online' is" (or depending upon your definition of 'is' for you politicos out there). It's on or it's off and the business doesn't care why it was off if they needed it on.
Every IT vendor out there has some definition of 'cloud'. Literally, every single one. Look at the web sites. Everyone can tell you how they fit into the 'cloud', but not a single one of them can tell you what the 'cloud' really is. Of course, VMware and Cisco think they have a good definition, as long as it means running everything on the Cisco unified compute platform with VMware. That's the cloud? Hmmm, I thought 6 months ago that was called consolidation. Amazon thinks they have a definition, as long as it means shipping your data out to their server and storage farm to run compute cycles in their data center. You know, I thought that was just good old-fashioned outsourcing. SAP's got a definition, EMC has one, HP, IBM, everybody out there has one, and they are all centered on the products they've been trying to sell everyone for years. Yes, there are some unique aspects to some of the products (EMC Atmos comes to mind), but for the most part, it's all the same with re-packaging. vSphere and vCloud are ESX Server with some really neat new features (and some promises that are as yet completely undelivered), but it's still ESX Server. The Cisco unified compute platform is servers. Granted, an interesting architecture and management footprint, but let's be real here, they're servers.
What the 'cloud' really is in the IT context is the same thing it is in real life—vapor. It's the second coming of the xSP model that will save all businesses from the pain of having an IT department. I said to someone delivering a cloud pitch the other day "did you activate a time warp just before you came in here and take me back to 1999/2000? These are the same concepts that were pitched at me about ASPs, data center outsourcing, and eCommerce 10 years ago!" At least it sure feels like it to me.
Over the next few posts, I'm going to keep delivering my take on this topic, if nothing else to flesh out some things and vent a little. I think that there is too much hype here, and we're headed to another bubble bursting if we aren't careful. The foundation of a technological breakthrough requires more than just a whiz-bang technology. It requires sound planning, terrific execution, and a goal. I don't think that anyone talking about cloud is providing those. There's a lot of theory, a lot of talking, but not a lot of reality. I'm all for open discussion, but could we maybe back off the child-like excitement that IT is going to be saved and suddenly align with the business based on the 'cloud'?
Well, they did it.
July 9th, 2009
EMC wins the Data Domain battle.
http://www.cnbc.com/id/31829763/site/14081545
No real surprise to me, but an exciting thing for sure. What I find interesting in CNBC's article is the continued speculation that NTAP may chase down CommVault, FalconStor, or Sepaton. Even more interesting is the notion that NTAP might put themselves up for sale. Of course, as an ex-EMCer, I don't find the idea that they might put themselves on the auction block to be a new rumor, but I haven't seen too many folks repeating it outside 176 South Street or EMC field offices.
Will your data benefit from deduplication? Find out by testing dedupe rates with EMC CAT tool download.
July 7th, 2009
Is deduplication really all it’s cracked up to be?
With everyone in the industry talking about deduplication, you can’t go 2 minutes without hearing how great it is, usually along with outlandish claims about deduplication rates. So the question is… is dedupe really all it’s cracked up to be? The answer isn’t really in the deduplication technology itself. It’s actually in the make-up of the data you’re looking to deduplicate. So how do you know if dedupe is the right technology for you?
Do you have a ton of highly-compressed images or multimedia files? These aren’t the ideal data types for deduplication.
Does your environment contain a lot of large databases like SQL, Exchange, and Oracle? If that is the case, dedupe can help, but not as much as those crazy marketing numbers say.
Do you have large file servers? Lots of VMware? Remote offices you need to back up? This is where dedupe really shines and where you can add real efficiencies to your environment. It is also where those numbers like 200:1 or 500:1 come from, and they can actually be beaten in some cases.
Now of course your data doesn’t nicely fit into just one of those categories above. It likely spans two of them, if not all three. So we’re back to the original question…is dedupe the right technology for you? One of the keys to deploying an *effective* deduplication solution is to know where to deploy it, how to deploy it, and what to expect.
Just because you can deduplicate your live VMware or database environment doesn’t necessarily mean you should. There are a lot of implications to trying to deduplicate data that is frequently accessed and performance can severely suffer in some cases. While dedupe is a great technology, it can bring your environment to its knees if implemented incorrectly. I’ll address this in another post because that is a whole different topic. Today I want to focus on how to figure out if, and how, deduplication can benefit your business in the backup process.
Knowing where to deploy deduplication and what rates to expect can really only be determined through an assessment of your current environment. EMC has a great tool called the Commonality Assessment Tool (CAT) that will allow you to look at a subset of your data and see exactly what the commonality is. This tool can be downloaded from the IDS website for free here (click the link for the EMC CAT tool download).
So what is this tool and what can you expect from it? EMC offers a deduplication solution called Avamar, which is a backup software and backup-to-disk appliance all wrapped into one. The CAT Tool is essentially a modified Avamar client that will perform a simulated backup on your server(s) and instead of actually backing the data up, it just tracks what the deduplication rate is and how long the actual backup would have taken. I’ll quickly take you through the process of running this tool and show you how easy it is to figure out exactly how much commonality is in your data.
***One important thing to note before beginning is that the CAT Tool has the same impact on your system as a normal backup client. It is recommended to run it off-hours and not during your regular backup window.
- Download the CAT Tool from the IDS website (click for the deduplication rate test tool download). A link will be emailed to you where you can download a zip file containing the tool.
- Extract the zip file to C:\Avasst. The directory will contain avtar.exe and avasst.exe. *Note: This directory must exist or the tool will not run correctly.
- Open a command prompt by going to Start/Run and running cmd.exe.
- Browse to the CAT directory by typing “cd\Avasst”
- Run the CAT tool by typing “avasst”
- You will be prompted to select a folder to scan. In this example, I will scan the D Drive by entering “d:”. If you want to scan multiple folders at once, see the notes at the end.

- The first time you run the tool, you can expect it to take approximately 1 hour per 100GB of data and 1 hour per million files. However, subsequent runs will be much quicker due to deduplication.
- When the tool has completed running, you will see the following screen:
- Now if you look in the c:\Avasst folder, you will see several files that are tracking the deduplication rates and backup times for your data. They are just raw data and need to be run through a tool in order to interpret the results.
- In order to see the full benefits of deduplication, you will want to run this tool against the same dataset several times (at least 3). You can also run it across several different datasets to see commonality across several servers. Since the commonality tracking is stored locally in the c:\Avasst folder, you will want to mount directories from other servers and scan them from this server across the network.
- When you have scanned your datasets, zip the results and send them to your IDS Engineer to have the results interpreted.
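If you want a rough idea of how long that first scan will tie up a server, the rule of thumb from the steps above (roughly 1 hour per 100GB and 1 hour per million files) can be turned into a quick back-of-the-envelope calculation. This is only a sketch: the function name is made up for illustration, and it assumes the two costs simply add together, which the rule of thumb does not actually spell out. Treat the result as a planning figure, not a promise.

```python
def estimate_first_scan_hours(total_gb: float, file_count: int) -> float:
    """Rough first-run estimate for a CAT scan, in hours.

    Hypothetical helper, not part of the CAT Tool. It applies the
    rule of thumb of ~1 hour per 100 GB and ~1 hour per million files.
    """
    hours_for_data = total_gb / 100.0          # ~1 hour per 100 GB of data
    hours_for_files = file_count / 1_000_000   # ~1 hour per million files
    # Assumption: the two costs are additive. If they overlap in practice,
    # this overestimates, which is the safer direction for scheduling.
    return hours_for_data + hours_for_files


if __name__ == "__main__":
    # Example: a 500 GB file server holding two million files.
    print(f"Estimated first scan: {estimate_first_scan_hours(500, 2_000_000):.1f} hours")
```

Under that additive assumption, 500GB spread across two million files works out to about 7 hours, which is a good reminder to start the first run well outside your backup window.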
Some other notes on the CAT Tool:
- If you want to scan data from other servers, you can simply mount another server’s drive to a drive letter on the local server and scan that drive.
- In a real deduplication solution, all data will be deduplicated globally against other servers and backup sets. Since the assessment tool tracks deduplication locally, you will need to scan all datasets from the same server to see global deduplication benefits.
- The CAT Tool can easily be scheduled using the built-in Windows Scheduler. Instructions are in a Word document included with the CAT Tool download.
- If you want to scan multiple folders at once, you will need to create a silent file that contains the folders you want to scan. Simply create a file named “Silent” with no extension in the c:\Avasst folder. Inside of that file, just put a line for each drive you want to scan (see screenshot below)

**Note that you cannot end any entries with a backslash. For example, C: and D:\Users are valid, but C:\ and D:\Users\ are invalid.
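If you are building the Silent file for a lot of servers, it is easy to let a trailing backslash slip in. Here is a minimal sketch of a script that writes the file and rejects entries that break the rule above. The function names are hypothetical and not part of the CAT Tool, and it assumes the file is plain text with one entry per line, as described in the note above.

```python
from pathlib import Path


def validate_entry(entry: str) -> str:
    """Return the cleaned entry, or raise if it would be invalid."""
    entry = entry.strip()
    if not entry:
        raise ValueError("empty entry")
    if entry.endswith("\\"):
        # C:\ and D:\Users\ are invalid; C: and D:\Users are valid.
        raise ValueError(f"entry must not end with a backslash: {entry!r}")
    return entry


def write_silent_file(entries, tool_dir=r"C:\Avasst"):
    """Write a file named 'Silent' (no extension) with one entry per line.

    Hypothetical helper. Assumes a plain-text file, one folder or drive
    per line, placed in the CAT Tool directory.
    """
    lines = [validate_entry(e) for e in entries]
    target = Path(tool_dir) / "Silent"
    target.write_text("\n".join(lines) + "\n")
    return target


if __name__ == "__main__":
    # Scan all of C: plus the Users folder on D:.
    print(write_silent_file(["C:", r"D:\Users"]))
```

Validating before writing means a bad entry stops the script up front instead of surfacing later as a scan that did not run the way you expected.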
Integrated Data Storage Named to Crain's Fast Fifty List 2009, Ranked 15 of 50
June 3rd, 2009
On behalf of myself and the dedicated employees of the company, I'd like to say that Integrated Data Storage is honored to be named to Crain's 2009 Fast Fifty list. We see it as a noteworthy accomplishment.
Managing the growth hasn't always been easy, but we're truly grateful for the company's success—for the benefit to the bottom line, yes, but more so because it has enabled us to establish long-term relationships with more clients, partnering with them to deliver integrated solutions that meet their complex data management needs.
It's somewhat ironic that the propeller of our growth and widening market reach (we recently expanded to three new markets, Seattle and Kansas City among them) has been our continued narrow focus on a smaller set of products, each best of breed. This absolute focus has allowed our staff of engineers and salespeople to build an immense depth of knowledge in the design and delivery of our products, along with an ability to expertly fine-tune them to create fully integrated and customized solutions for our clients.
In 2002, when our company was founded, the scope of our service was the Chicago area. Last year, for a client with a multinational presence, we successfully rolled out a globe-spanning product implementation of EMC's Avamar that reached across thirty-nine countries. Thanks to the hard work and commitment of our team, like the other businesses on the list, we've come a long way in a short time. Now, instead of taking a breath, we look forward to the challenge of continuing the pace, we hope, for a long time to come.
So it gets better.......(NTAP to acquire CommVault?)
June 2nd, 2009
I don't plan on making this a rumor mill blog, but this one's pretty good. StreetInsider.com and other financial sites are speculating that if EMC wins the battle for Data Domain (how could they not?), NTAP will go after CommVault.
Now THAT's an interesting move that has a wild chance at success! (In case you couldn't tell, that was written with a lot of sarcasm).
If I were CommVault folks, I'd be worried right about now. NTAP has a great track record of buying software companies and turning them into, well, bought companies. Topio went away, Spinnaker is nowhere to really be seen outside of a roadmap slide that shows the same 'Coming Soon' graphic quarter after quarter, the Decru product seems to show up less and less every time I cross paths with NTAP -- are you detecting a theme here?
NTAP does some things very well. They should be rightly credited with driving the unified storage market to the point it is at today, pushing the thinking of the marketplace to places many didn't think it would go. They have proven that 'good enough' in many situations is really good enough, and unified storage platforms can effectively solve many storage needs in the middle and upper middle marketplace.
They also do some things not so well. They don't utilize space on the array very efficiently. Their implementation of Fibre Channel is a little less performant than I'd like it to be at scale. Their track record of acquiring at least moderately successful software companies and turning them into a game-changing feature or product line is, well, less than stellar. They tend to get a little too testy when someone questions their technical methodologies. Nothing horrific. They aren't a bad company, and their technology is good, just not as good as others, in my opinion.
To me, this acquisition just doesn't make sense. NTAP doesn't have a history of being able to sell software effectively. Bringing a company like CommVault into the fold will take superior execution and an understanding that the model for selling software can be quite different from that of selling hardware. It took EMC several tries to grasp this, and it almost didn't work out very well for them. In some cases it is debatable whether they grasped it in time (Legato, anyone?).
How do you go from all but acquiring another hardware player that seems a perfect fit to a forced software buy? You're not going to take CommVault and integrate it into the filers like you could the DD code. Even if they could make it work, it would suck down so much performance on the array you'd have to buy FOUR of them to get anything accomplished in your environment. Do you buy them and just leave them alone, letting them do their thing but add to the bottom line? Maybe, but that doesn't really seem to gain them much. Or do you swallow them up hoping it makes you more attractive to HP and they then gobble you both up to replace their anemic and ailing EVA line and obtain a modern backup product? That doesn't seem to be 'in your face' enough for Donatelli to do, but it might just be enough of a poke in Tucci's eye to do it. It would certainly stir the pot up a little more, and would make some rational sense. It seems more likely to me that HP will pursue heavy R & D in the LeftHand product and try to turn some of that 'Invent' power loose to turn the storage world upside down.
What a weird day this turned out to be.