Friday, July 11, 2014

The Number One Inhibitor to Cloud Storage (Part 2 of 2)!

The number one inhibitor is Access! (Part 2)

I've been feeling bad about delaying this second part of my blog post, but in hindsight the delay was good: EMC acquired TwinStrata in the meantime, validating the whole premise of this post!

Anyway, a few weeks ago I talked about how access, in my view, is the biggest inhibitor to cloud storage. Specifically, the five issues are:

1. How do I get massive amounts of data in and out of the cloud?
2. How do I get my application to interface with cloud storage?
3. How do I get cloud storage to fit within my current workflow?
4. How do I figure out what data to move to the cloud?
5. Once the data is moved, how do I know it's in the cloud?

Also, the publisher of my OpenStack Swift book is running this contest:
Book Giveaway:

Get a chance to win a free copy of Implementing Cloud Storage with OpenStack Swift just by commenting about the book with the link! For the contest, we have 7 e-copies of Implementing Cloud Storage with OpenStack Swift to be given away to 7 lucky winners.

How you can win:

To win your copy of this book, all you need to do is post a comment below explaining "why you would like to win this book," along with the link to the book: Implementing Cloud Storage with OpenStack Swift

Note: To win, entrants must also mention the book link in their comments.

Duration of the contest & selection of winners:
The contest is valid for 1 week (i.e., from 7/25/14 to 8/1/14) and is open to everyone. Winners will be selected based on the comments they post.

I already discussed topics 1 & 2 in part 1. Let's now discuss topics 3-5.

3. How do I get cloud storage to fit within my current workflow?

Cloud storage must fit within existing workflows. Nobody is going to alter their workflow to accommodate cloud storage, and each industry already has an established workflow. For example, if the medical imaging workflow is to store data in a PACS (picture archiving and communication system) and always access data through it, then cloud storage must fit into this paradigm through a cloud gateway or native connector. Expecting users to pull data out of their PACS and store it in cloud storage, even with compelling cost savings, is unrealistic.

Similarly, media & entertainment has a very strict workflow, for example: pre-production -> production -> post-production -> distribution -> archival. Again, cloud storage needs to fit into this workflow without requiring any changes. This has numerous access implications in terms of manner of access (direct over the WAN vs. bulk import/export vs. specialized transfer software like Aspera), frequency of access (write-heavy or read-heavy), and nature of access (collaborative or not; very large files of ~50 GB+ vs. regular large files of a few GB).

4. How do I figure out what data to move to the cloud?

Note: This section ONLY applies to cloud storage used for repository/archival-style storage, as opposed to a warmer tier of storage.

In clear-cut workflows like medical imaging, where, say, any image older than 3 years is suitable to be moved to the cloud, nothing has to be figured out. However, I've met numerous customers who are struggling to figure out where their static data sits and how much of their data is indeed static. They know they have hundreds of TBs or even PBs of static data, but need help identifying it. This is where file analytics comes in. A large number of tools will scan all your filesystems (including NAS devices) and provide analytics. These analytics may be based on file metadata such as age and size, or they may look deeper into the content, e.g., finding all PowerPoint files created in Europe that contain the keyword "confidential". In my view, these tools will be invaluable in figuring out what to move to the cloud.
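To make the metadata-based analytics above concrete, here is a minimal Python sketch under stated assumptions. It walks a directory tree and totals how many files (and bytes) have not been modified within an age threshold. It is an illustration, not how any particular file analytics product works: it uses last-modification time (mtime) as the only "static" signal, and the names `scan_for_static_data` and `StaticDataReport` are hypothetical. Commercial tools may go much further, e.g., the content-based searches mentioned above.

```python
import os
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class StaticDataReport:
    """Totals for all scanned files and for the subset that looks static."""
    total_files: int = 0
    total_bytes: int = 0
    static_files: int = 0
    static_bytes: int = 0


def scan_for_static_data(root: str, min_age_days: float,
                         now: Optional[float] = None) -> StaticDataReport:
    """Walk `root` and count files whose mtime is older than `min_age_days`.

    Hypothetical helper for illustration; mtime is assumed to be a good
    enough proxy for "static" data.
    """
    if now is None:
        now = time.time()
    cutoff = now - min_age_days * 86400  # seconds per day
    report = StaticDataReport()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks (e.g., stubs left by a data mover)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            report.total_files += 1
            report.total_bytes += st.st_size
            if st.st_mtime < cutoff:
                report.static_files += 1
                report.static_bytes += st.st_size
    return report
```

For example, `scan_for_static_data("/mnt/nas/projects", min_age_days=3 * 365)` would report how much of a (hypothetical) NAS share has gone untouched for three years, mirroring the medical imaging rule of thumb above. Using mtime rather than access time is a deliberate choice: many filesystems are mounted with `noatime` or `relatime`, so access times can be unreliable.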

5. Once the data is moved, how do I know it's in the cloud?

Note: This section also applies ONLY to cloud storage used for repository/archival-style storage, as opposed to a warmer tier of storage.

Once a data mover (the software that actually relocates the data identified in the previous section) moves the data, how does the user know where their data is? Either the application used to access the data has to be aware of the data movement, OR the data mover has to leave stubs or symbolic links behind. Robust data integrity checking is also important, so that users know the data moved into the cloud is indeed intact, without any corruption.
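The stub and integrity-checking ideas can be sketched together. The Python below is a minimal illustration, not any product's implementation. It copies a file into a target directory that stands in for a cloud gateway mount (an assumption; a real data mover would typically talk to the cloud's API), compares SHA-256 digests of the source and the copy, and only then replaces the original with a symbolic link so existing paths keep working. `sha256_of` and `move_with_stub` are hypothetical names, and creating symlinks assumes a POSIX system (or Windows with the required privileges).

```python
import hashlib
import os
import shutil


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def move_with_stub(src: str, archive_dir: str) -> str:
    """Copy `src` into `archive_dir`, verify it, then leave a symlink stub.

    `archive_dir` is a hypothetical stand-in for cloud storage exposed as a
    filesystem path. Returns the absolute destination path.
    """
    before = sha256_of(src)
    os.makedirs(archive_dir, exist_ok=True)
    dest = os.path.abspath(os.path.join(archive_dir, os.path.basename(src)))
    shutil.copy2(src, dest)  # copies data plus timestamps/permissions
    after = sha256_of(dest)
    if before != after:
        os.remove(dest)  # discard the corrupted copy; the original is untouched
        raise OSError(f"checksum mismatch for {src}: {before} != {after}")
    # Only remove the original once the copy is verified intact.
    os.remove(src)
    os.symlink(dest, src)  # the stub: the old path still resolves to the data
    return dest
```

Deleting the original only after the digests match means a failed or corrupted copy never costs the user their data. Note that there is a brief window between `os.remove` and `os.symlink` during which the old path does not exist; a production tool would need to handle that, along with name collisions in the target.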

Net-net, most public cloud and private cloud storage providers will have to spend a lot of energy solving the access problem. Fortunately, my employer EVault, Inc. (part of Seagate) views this as a critical element and has spent a lot of time differentiating in this area :-).



  3. Well, it seems obvious that cloud storage should be able to integrate with data workflows without requiring the customer to re-architect their primary data storage systems. Application vendors like Moonwalk can tier data from primary storage to cloud storage. BridgeHead Software, which specializes in medical data management, can back up PACS data and archive it to cloud storage. These capabilities are generally referred to as information lifecycle management or enterprise data management applications. It seems a bit disingenuous of Mr. Kapadia not to mention up front that he is employed by EVault, which was recently pulled completely into Seagate after Seagate bought the company eight years ago. Even though this is not an official EVault or Seagate blog, Mr. Kapadia should mention this fact up front, as it has direct bearing on his last sentence. There is no issue with Mr. Kapadia working for EVault or Seagate; he should just be clear about it up front with his readers.

  4. Well, I also came across a link to this blog from LinkedIn, where Mr. Kapadia mentions that he does marketing at EVault in addition to his OpenStack Swift blogging. That's perfectly legitimate and I recommend that he post the same notice on his Build Cloud Storage blog too.

  5. Yes, I am an EVault/Seagate employee, and I did mention it in the blog post; sorry if it wasn't obvious. I'll make it more obvious next time!
