Tuesday, May 20, 2014

The Number One Inhibitor to Cloud Storage (Part 1 of 2)!

The number one inhibitor is Access!

Before I jump in, a quick comment:
I’ve been missing-in-action on my blogs because I’ve been busy co-authoring a new book, “Implementing Cloud Storage with OpenStack Swift”. As of today, the book is available at http://www.packtpub.com/implementing-cloud-storage-with-openstack-swift/book (it will be available on Amazon and B&N in a couple of days)! Until tomorrow, readers can take advantage of this 20% off promo code on the Packt website: uwQF3UaR (first 300 orders). Now that the book is complete, I hope to get back to my regular blog schedule, i.e., about one post every two months.
OK, now on to the topic-at-hand. To paraphrase a quote from President Bill Clinton’s 1992 campaign, “It’s the access, stupid” (of course, this comment is addressed to me and not the reader :-). Before the availability of EVault’s Long-Term Storage Service (LTS2), if you had asked me about the key problems with cloud storage, I would have naively gone on-and-on about core storage issues like durability, scalability, management, eventual consistency etc. However, after having talked to numerous customers, my view is different. The number one problem is: how does a user access cloud storage? In fact, I’ve found that most customers accept that the storage piece in cloud storage is taken care of and are much more worried about access. Let me explain the access problem through these five questions:

  1. How do I get massive amounts of data in-and-out of the cloud?
  2. How do I get my application to interface with cloud storage?
  3. How do I get cloud storage to fit within my current workflow?
  4. How do I figure out what data to move to the cloud?
  5. Once the data is moved, how do I know it's in the cloud?
Let’s discuss problems 1-2 in this blog and problems 3-5 in a subsequent blog.

1. How do I get massive amounts of data in-and-out of the cloud?

If I have 50TB of data, the WAN (wide-area network) is fine; but if I have 500TB or 5PB, how do I move data in and out of the cloud? Although this topic applies more to a public cloud, it also applies to a private cloud. See this simple table below.
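
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. The link speeds and the 80% efficiency factor are my own illustrative assumptions, not measured numbers, and `transfer_days` is a hypothetical helper written just for this example.

```python
def transfer_days(size_tb, link_mbps, efficiency=0.8):
    """Days needed to move size_tb (decimal terabytes) over a link_mbps link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable for the transfer (protocol overhead, sharing, etc.).
    """
    bits = size_tb * 1e12 * 8                    # terabytes -> bits
    effective_bps = link_mbps * 1e6 * efficiency  # usable bits per second
    return bits / effective_bps / 86400           # seconds -> days


# Data sizes from the text (50TB, 500TB, 5PB) against assumed link speeds.
for size_tb in (50, 500, 5000):
    for link_mbps in (100, 1000, 10000):
        days = transfer_days(size_tb, link_mbps)
        print(f"{size_tb:>5} TB over {link_mbps:>5} Mbit/s: {days:8.1f} days")
```

Under these assumptions, 50TB over a 1 Gbit/s link takes a little under six days, while 5PB over the same link takes more than a year and a half.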

Therein lies the problem: the WAN is not very efficient for large data transfers. There are three solutions to this problem:
  • Dedicated links: If the data transfer is sustained (as opposed to an initial migration), a dedicated link may be more suitable than a WAN link. If it’s only an initial migration, on the other hand, you need elastic networking (burst spot bandwidth), where users can get a large amount of bandwidth for a short period. However, the concept of elastic networking is in its early days and not easily available.
  • Physical media: Disk or tape shipped via FedEx or UPS can be used for one-time bulk data transfer. This is a reasonably inexpensive way to solve the problem, but it involves a lot of logistical and people issues. My current employer, EVault, is putting a lot of energy here for LTS2.
  • Hybrid cloud storage (or cloud-integrated-storage): In storage, a hybrid cloud means something completely different than in compute. In compute, hybrid means spilling over to a public cloud for excess compute capacity. In storage, it means having the ability to migrate data from a private cloud to a public cloud, or to create an additional copy in a public cloud. In other words, a hybrid cloud can be used for data transfer. A hybrid cloud does not inherently reduce migration time, but a long migration period can be tolerated since the migration happens in the background without disrupting any other operation. Cloud-integrated-storage, or CIS, is a similar concept where a traditional block or file storage device has a back-end interface to cloud storage. CIS accomplishes the same thing as hybrid cloud storage: the migration can occur as a background process. In my opinion, within 3-5 years, every storage product will be CIS enabled.

2. How do I get my application to interface with cloud storage?

Randy Bias, in his recent blog, shows how next gen apps are the future, and how next gen cloud applications will equal existing applications in size by 2018. Unfortunately, this is not the case today! Most applications don’t yet talk to storage using Swift or Amazon S3 compatible HTTP REST APIs and instead rely on traditional block or file interfaces. I had honestly expected more applications to be ported to REST APIs by now. Furthermore, applications ported to one cloud may not necessarily work with another cloud that offers the same API. The two solutions to this application compatibility problem are:
  • Cloud gateways: Cloud gateways perform protocol translation between traditional block or file storage interfaces like iSCSI, CIFS and NFS and REST APIs (Swift or S3). They optionally add other value like WAN optimization, compression, deduplication, encryption, file sharing etc. While these gateways solve the interface problem, they introduce a new one: the data stored in the cloud is in the cloud gateway vendor’s proprietary format. So not ideal!
  • Cloud connectors based on standards: Applications porting natively to cloud storage is the right answer. However, this needs to be done with interop in mind. An organization like SNIA has spent a substantial amount of time creating a new REST API for cloud storage called CDMI that nobody uses. If they or some other organization were to put the same energy into standardization and interop testing around Swift or S3 compatibility, it would solve a huge problem for the industry. More applications would start implementing cloud connectors sooner if they were assured of interop across multiple public clouds.
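
To give a feel for what a native cloud connector does, here is a minimal sketch of an object upload expressed as a Swift-style REST call: a PUT to a container/object path with an `X-Auth-Token` header. The storage URL, token, container and object names are all made up for illustration, and `build_swift_put` is a hypothetical helper, not part of any library. The sketch only builds the request and does not send it.

```python
import urllib.request
from urllib.parse import quote


def build_swift_put(storage_url, token, container, name, data):
    """Build (but do not send) a Swift-style object upload request.

    storage_url is the account URL handed back by the auth service;
    the one used below is a made-up placeholder.
    """
    url = f"{storage_url.rstrip('/')}/{quote(container)}/{quote(name)}"
    req = urllib.request.Request(url, data=data, method="PUT")
    req.add_header("X-Auth-Token", token)
    return req


# Hypothetical endpoint, token and object name, for illustration only.
req = build_swift_put(
    "https://swift.example.com/v1/AUTH_demo",
    "hypothetical-token",
    "backups",
    "2014/05/report.txt",
    b"hello cloud",
)
print(req.get_method(), req.full_url)
# Against a real endpoint, urllib.request.urlopen(req) would send it.
```

The point of the sketch is how little is involved on the wire; the interop problem is that two clouds advertising the same API may still differ in the details around it.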
In my next blog, I'll cover problems 3-5.


  1. If you look at the benefits of cloud computing, you can sum them up as cost savings, elasticity, load bursting, scalability, storage on demand, etc. These are the most commonly advertised benefits of cloud computing, and they are what people cite in the business case for employing either third-party services or a virtualized data center.
    cloud wedge

  2. Hi Amar,

    You are very correct with your second point. I believe everyone is convinced that Object Storage is the future. But on the other hand, it is very hard to actually use it due to its interface and behavior. Let's say that you want to run a Virtual Machine on top of it. There are 2 possibilities: either you change the hypervisor or you change the interface of the storage so it delivers block storage. Option 1 is impossible. Option 2 (this is what Ceph has tried) is very hard to do. It will be hard to make it scalable and performant, let alone cope with eventual consistency. The only real option is, like with cloud gateways for files, building a layer in between the hypervisor and the object storage. This is what we have done with Open vStorage (http://openvstorage.com/): we built a solution which turns object storage into block storage usable by VMs. This is not an easy task as it needs to be scalable, performant, reliable, etc. We also believe that this layer is too important to be proprietary, so that is why we have open-sourced it.

    If you would be interested, I've written a white paper about how you can turn OpenStack Swift into a performing VM storage platform: http://download.openvstorage.com/whitepaper_Swift.php