Friday, July 11, 2014

The Number One Inhibitor to Cloud Storage (Part 2 of 2)!

The number one inhibitor is Access! (Part 2)

I've been feeling bad about delaying this second part of my blog, but in hindsight it was good: EMC acquired TwinStrata in the meantime, validating the whole premise of this post!

Anyway, a few weeks ago I talked about how access, in my view, is the biggest inhibitor to cloud storage. Specifically, the five issues are:

1. How do I get massive amounts of data in-and-out of the cloud?
2. How do I get my application to interface with cloud storage?
3. How do I get cloud storage to fit within my current workflow?
4. How do I figure out what data to move to the cloud?
5. Once the data is moved, how do I know it's in the cloud?

Also, the publisher for my OpenStack Swift book is having this contest:
Book Give-away:

Get a chance to win a free copy of Implementing Cloud Storage with OpenStack Swift just by commenting about the book with the link -! For the contest, we have 7 e-copies of the book Implementing Cloud Storage with OpenStack Swift to be given away to 7 lucky winners.

How you can win:

To win your copy of this book, all you need to do is post a comment below highlighting the reason "why you would like to win this book", along with the link to the book - Implementing Cloud Storage with OpenStack Swift

Note – To win, the winners must also mention the book link in their comments  -

Duration of the contest & selection of winners:
The contest is valid for 1 week (i.e. from 7/25/14 - 8/1/14), and is open to everyone. Winners will be selected on the basis of their comment posted.

I already discussed topics 1 & 2 in part 1. Let's now discuss topics 3-5.

3. How do I get cloud storage to fit within my current workflow?

Cloud storage must fit within existing workflows. Nobody is going to alter their workflow to accommodate cloud storage. Each industry has an existing workflow. For example, if the medical imaging workflow is to store data into a PACS system and always access data through it, then cloud storage must fit in this paradigm through a cloud gateway or native connector. Expecting users to pull data out of their PACS system and store it into cloud storage, even with compelling cost savings, is unrealistic.

Similarly, in media & entertainment there is a very strict workflow, for example: pre-production -> production -> post-production -> distribution -> archival. Again, cloud storage needs to fit into this workflow without requiring any changes. This has numerous access implications in terms of the manner-of-access (direct over WAN vs. bulk import/export vs. utilizing specialized software like Aspera), frequency-of-access (write-heavy or read-heavy), and nature-of-access (collaborative vs. not, very large files ~50GB+ vs. regular large files ~few GB).

4. How do I figure out what data to move to the cloud?

Note: This section ONLY applies to cloud storage that is used for repository/ archival style storage as opposed to a warmer tier of storage.

In clear-cut workflows like medical imaging where, say, any image older than 3 years is suitable to be moved to the cloud, nothing has to be figured out. However, I've met numerous customers who are struggling to figure out where their static data sits and how much of their data is indeed static. They know that they have 100s of TB or PBs of static data, but need help identifying it. This is where file analytics comes in. There are a large number of tools that will scan all your filesystems (including NAS devices) and provide analytics. These analytics may be based on file metadata such as file age and size, or they may look deeper into the content, e.g. finding all PowerPoint files created in Europe that contain the keyword "confidential". In my view, these tools will be invaluable in figuring out what to move to the cloud.
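To make the metadata-based flavor of file analytics a bit more concrete, here is a minimal Python sketch that walks a directory tree and reports how much data has not been modified for a given number of days. It is only an illustration, not how any particular commercial tool works: the function name scan_for_static_data, the StaticDataReport class, and the 3-year default (borrowed from the medical imaging example above) are all my own assumptions, and it uses last-modified time as a stand-in for "static".

```python
import os
import stat
import time
from dataclasses import dataclass


@dataclass
class StaticDataReport:
    """Hypothetical summary of a scan; field names are illustrative."""
    total_files: int = 0
    total_bytes: int = 0
    static_files: int = 0
    static_bytes: int = 0


def scan_for_static_data(root, min_age_days=3 * 365, now=None):
    """Walk `root` and flag regular files not modified in `min_age_days`.

    Returns (report, candidates), where candidates is a sorted list of
    paths that look static based on modification time alone.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # seconds per day
    report = StaticDataReport()
    candidates = []

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets, devices, etc.

            report.total_files += 1
            report.total_bytes += info.st_size
            if info.st_mtime < cutoff:
                report.static_files += 1
                report.static_bytes += info.st_size
                candidates.append(path)

    return report, sorted(candidates)
```

Calling scan_for_static_data("/mnt/nas_share") (a made-up mount point) would return the totals plus a list of candidate files. A real tool would also look at access times, ownership, file types, and content, which this sketch deliberately leaves out.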

5. Once the data is moved, how do I know it's in the cloud?

Note: This section also applies ONLY to cloud storage that is used for repository/ archival style storage as opposed to a warmer tier of storage.

Once the data mover in the previous section moves the data, how does the user know where their data is? Either the application used to access the data has to know about data movement OR the software has to leave stubs or symbolic links behind. It is also important for there to be robust data integrity checking to know that the data moved into the cloud is indeed intact without any corruption.

Net-net, most public cloud and private cloud storage providers will have to spend a lot of energy solving the access problem. Fortunately, my employer EVault, Inc. (part of Seagate) views this as a critical element and has spent a lot of time differentiating in this area :-).
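As a rough sketch of the "leave a stub behind and verify integrity" idea, the Python below copies a file to an archive location, compares SHA-256 checksums before and after, and only then replaces the original with a small JSON stub that records where the data went. Everything here is an assumption made for illustration: a local directory stands in for the cloud store, and the names archive_with_stub, sha256_of, and the ".stub.json" suffix are hypothetical, not taken from any product.

```python
import hashlib
import json
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_with_stub(path, archive_dir):
    """Move `path` into `archive_dir` and leave a JSON stub behind.

    `archive_dir` is a local directory standing in for cloud storage
    (an assumption of this sketch). The original file is removed only
    after the archived copy's checksum matches the source's checksum.
    """
    expected = sha256_of(path)

    os.makedirs(archive_dir, exist_ok=True)
    target = os.path.join(archive_dir, os.path.basename(path))
    shutil.copyfile(path, target)

    # Integrity check: re-read the archived copy and compare digests.
    if sha256_of(target) != expected:
        os.remove(target)
        raise IOError("checksum mismatch after copying " + path)

    # The stub tells users (or applications) where the data now lives.
    stub_path = path + ".stub.json"
    stub = {
        "location": target,
        "sha256": expected,
        "size": os.path.getsize(target),
    }
    with open(stub_path, "w") as f:
        json.dump(stub, f)

    os.remove(path)
    return stub_path
```

With a real object store, the copy step would be an upload and the verification step would typically compare against a checksum reported by the store, but the order of operations (copy, verify, then stub and delete) is the point of the sketch.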


  1. Well, it seems obvious that cloud storage should be able to integrate with data workflows without requiring the customer to re-architect their primary data storage systems. Application vendors, like Moonwalk, can tier data from primary data storage to cloud storage. BridgeHead Software, which specializes in medical data management, can back up PACS data and archive it to cloud storage. These capabilities are generally referred to as information life cycle management or enterprise data management applications. It seems a bit disingenuous of Mr. Kapadia not to mention up front that he is employed by EVault, which was recently pulled completely into Seagate after Seagate bought the company eight years ago. Even though this is not an official EVault or Seagate blog, Mr. Kapadia should mention this fact up front, as it has direct bearing on his last sentence. There is no issue with Mr. Kapadia working for EVault or Seagate; he should just be clear about it up front with his readers.

  2. Well, I also came across a link to this blog from LinkedIn, where Mr. Kapadia mentions that he does marketing at EVault in addition to his OpenStack Swift blogging. That's perfectly legitimate and I recommend that he post the same notice on his Build Cloud Storage blog too.

  3. Yes, I am an EVault/ Seagate employee, and I did mention it in the blog; sorry if it wasn't obvious. I can make it more obvious next time!