Monday, December 16, 2013

3 Reasons Customers Will Store Exabytes of Long-Term Data onto EVault's New Cloud Storage Service

Just about every industry is creating massive amounts of digital data. Examples range from media & entertainment (film, broadcast TV/radio content), healthcare (medical images), oil & gas (seismological data),  surveillance (video content), bioinformatics (genome sequencing), finance (records), legal (records & discovery), pharmaceutical (research data), engineering (design data)... The list goes on.

This data needs to be stored reliably. Retention periods are stretching out, in some cases to decades, and I estimate that hundreds of exabytes of cumulative data will need to be stored for the long term between now and 2017.

Although this data is retrieved infrequently, when it is needed it must be available immediately. For instance, a radiologist needing a medical image or a consumer wanting long-tail video content won’t be satisfied waiting more than a few seconds for it. Something has to be done to address these requirements, and it has to be done economically.

Such a storage solution didn't exist (until last week).

Thursday, October 24, 2013

Seagate Kinetic – A Game Changer for Cloud Storage Hardware Architectures

Seagate recently announced a new technology platform called the Kinetic Open Storage Platform that is a genuine game changer for cloud storage hardware architectures (and perhaps other storage architectures as well).  My prediction is that in 2-3 years, cloud storage hardware will be unrecognizable as compared to the classic x86 architecture of today.

Friday, October 4, 2013

Swift Durability and the Mystery of 11 9s

This blog builds on my earlier blog on Swift reliability calculated via MTTDL.

A key measure of cloud storage reliability is a metric called durability. This metric was brought into vogue by Amazon, and it is interesting to note that it wasn't popular before S3. Durability is defined as 1 minus the average annual expected loss of objects, expressed as a percentage. For example, 11 9’s of durability means that if you store 10,000 objects, you can expect to lose a single object every 10,000,000 years on average. The product of the two, i.e. 10^4 objects and 10^7 years, gives you 10^11, which corresponds to the 11 9’s.
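
To make the arithmetic concrete, here is a minimal sketch in Python. The function names are my own, made up for illustration, and the only assumption is the definition above: N 9's of durability means an average annual loss rate of 1 in 10^N objects. Exact fractions are used so floating-point rounding doesn't blur the result.

```python
from fractions import Fraction


def durability_from_nines(nines: int) -> Fraction:
    """Annual durability for a given number of 9's (11 -> 0.99999999999)."""
    return 1 - Fraction(1, 10 ** nines)


def expected_annual_loss(num_objects: int, nines: int) -> Fraction:
    """Average number of objects lost per year at the given durability."""
    return num_objects * Fraction(1, 10 ** nines)


def years_per_lost_object(num_objects: int, nines: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_loss(num_objects, nines)


# 10,000 objects at 11 9's: one object lost every 10,000,000 years on average.
print(years_per_lost_object(10_000, 11))  # 10000000
```

Note that this only restates the definition; it says nothing about how a given system actually achieves that number.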

The question is, can OpenStack Swift match the durability advertised by major cloud storage providers which is 11 9s?

Wednesday, August 14, 2013

OpenStack API Wars – Is Swift API:S3 API::Android:Apple iOS?

I recently wrote an EVault blog about the OpenStack API wars. Although most of the discussion is around Nova, the debate also applies to Swift. In this blog I look at the various API options for a public cloud storage offering, and give my 2 cents on what makes sense. Please check it out!

Also, there's an OpenStack meetup at EVault tomorrow (sponsored by my employer EVault) where Randy Bias of CloudScaling and Boris Renski of Mirantis will debate the API topic. If you haven't registered, there's still time. I think it's going to be a lot more fun than a regular meetup :-)!

Monday, June 10, 2013

Hollywood and OpenStack

I recently wrote an EVault blog about the announcement by the Entertainment Technology Center at USC of their Production in the Cloud project. The announcement further states that the project will utilize OpenStack. I think this is a big win for both sides. The media & entertainment industry will win big by using OpenStack cloud technology to help slash IT costs. OpenStack will win since a key vertical is adopting it in new and interesting ways. My blog explains my views on this topic in more detail; please check it out!

Wednesday, May 1, 2013

OpenStack Swift Comes of Age with the Grizzly Release

I recently wrote an EVault blog about the OpenStack Summit and the Coming of Age of Swift. The blog talks about the dynamics around Swift at the OpenStack Summit rather than specific features of Grizzly (which have been covered by a number of blogs & articles). For example, I talk about the various unconference sessions, which were of very high quality. Please check out the blog.

Friday, January 11, 2013

Microserver Architectures & Cloud Storage

First, apologies for the long pause between postings. I was in the middle of changing jobs -- I'm now at EVault, a Seagate subsidiary that offers cloud-connected backup & restore. I'm excited about this for multiple reasons: i) I get to work on cloud storage full-time rather than as a hobby, and they actually pay me for it ;-) ii) EVault is in San Francisco. Just moving from San Jose to San Francisco means a sea change in culture. iii) While I can't talk specifics, the project I'm working on is very ambitious and cutting edge!

Now to the topic of the post - there is tremendous industry buzz around the potential use of microserver CPUs (also called "wimpy" cores and most often associated with ARM SoCs) for datacenter applications as an alternative to traditional "brawny" x86 CPUs. These are a new class of light-weight power-efficient CPUs that promise to reduce power, real-estate, and cost while delivering the same aggregate performance. It may, of course, take multiple microserver CPUs to match the performance of one traditional CPU. But as we will see further in the post, that may not matter in scale-out architectures.

A microserver, therefore, is a new class of extremely low-power dense server. CPUs with wimpy cores have several common elements to them:

  • Very low power for a given unit of performance
  • Fewer, lighter cores than a typical Xeon or Opteron CPU. Typically these processors also don’t carry the burden of supporting legacy modes, full-blown virtualization, etc.
  • System-on-a-chip (SoC) integration that eliminates an expensive chip-set
  • Need not be based on x86 architecture – While brawny cores in servers are all based on x86, wimpy cores are mostly based on ARM (with the exception of Intel).

There’s a ton of activity from the vendor side, where companies such as AMD, Marvell, Calxeda, Cavium, NVidia, Samsung, Applied Micro, and TI are working on enterprise-class ARM SoCs, while Intel is working on similar Atom products. These companies either already have products or have announced plans for products in this category. The activity is so frenzied that wimpy cores might become a self-fulfilling prophecy!

There is also a lot of buzz from end-users, e.g. this article demonstrates Facebook’s interest in wimpy cores.

However, these new lighter-weight CPUs are not a good fit for all workloads. If one were to broadly classify workloads as A) virtualized B) scale-up C) HPC and D) scale-out, microservers are best suited for scale-out computing. This is because scale-out workloads are typically simple, independent, homogeneous, but numerous and bursty. Scale-out computing is also based on a lot of open-source code which makes it easier to port to new server architectures e.g. ARM.

All of this combined would indicate that wimpy cores are a good fit for cloud storage systems such as Swift. As a reminder, Swift is an open-source cloud storage project that is part of the OpenStack effort. But are microservers really a good fit here? Let’s take a look:

Positives of a microserver architecture for Swift:
  • OpenStack Swift is open-source and runs on Ubuntu. That’s great for microservers since Canonical has taken a leadership role in porting Ubuntu to ARM.
  • OpenStack Swift is indeed a scale-out architecture. This bodes well for microservers to be used here.
  • There is a lot of flexibility in constructing the right compute : memory : storage ratio for OpenStack Swift. In fact, one could argue that rather than sticking 24 drives behind a single- or dual-socket Xeon/Opteron processor, it might actually be a lot more efficient to stick 4 drives behind one microserver CPU, bringing compute a lot closer to storage. This architecture has the promise to reduce cost and power while at the same time improving reliability and performance!

Negatives of microservers for Swift:
  • Most microserver CPUs plan to have SATA interfaces and not SAS. This means the architecture is OK for SATA, but difficult to use for nearline SAS drives.
  • By increasing the number of compute nodes, we are putting more strain on the network. This trade-off would have to be looked at.
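
As a rough back-of-the-envelope illustration of this trade-off, here is a small Python sketch comparing the 24-drives-per-server layout with the 4-drives-per-microserver layout mentioned above. The 240-drive cluster size and the one-network-port-per-node rule are hypothetical assumptions of mine, picked only to make the numbers easy to read; they are not measurements.

```python
import math


def node_layout(total_drives: int, drives_per_node: int) -> dict:
    """Count nodes and network endpoints for a given drives-per-node ratio.

    Assumes (hypothetically) exactly one network port per node.
    """
    nodes = math.ceil(total_drives / drives_per_node)
    return {
        "nodes": nodes,
        "network_ports": nodes,
        # Drives that drop out of the cluster when a single node fails.
        "drives_offline_per_node_failure": drives_per_node,
    }


TOTAL_DRIVES = 240  # hypothetical cluster size, for illustration only

brawny = node_layout(TOTAL_DRIVES, 24)  # 24 drives behind one Xeon/Opteron server
wimpy = node_layout(TOTAL_DRIVES, 4)    # 4 drives behind one microserver CPU

print(brawny)
print(wimpy)
print(wimpy["network_ports"] / brawny["network_ports"])  # 6.0
```

Under these assumptions the microserver layout needs six times as many network endpoints, but a single node failure takes 4 drives offline instead of 24. Whether that is a net win is exactly what real testing would have to show.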

Hopefully one of the above companies will like this use case enough to run some real performance tests and settle the question conclusively. On the surface, though, cloud storage systems such as Swift seem like a good target for a microserver architecture with wimpy cores.

Monday, August 27, 2012

Cold Storage Using OpenStack Swift vs. AWS Glacier

Amazon announced their latest IaaS service last week, called Glacier, which is intended for cold storage of data. It is 10x cheaper than Amazon S3 (ignoring access charges). Amazon S3 is already ridiculously cheap compared to enterprise storage, and Glacier takes it to the next level. With Glacier, retrieval needs to be infrequent and can take hours. This restriction is what makes it "cold" storage. It seems like tape-as-a-service to me, even though Amazon doesn't use this word at all. However, if it walks like a duck and talks like a duck, it must be a duck; in this case, tape.

The cost equation is amazing. OpenStack does not have an answer. Does this spell trouble for OpenStack? I don't think so; rather, I think this is an opportunity. A combination of OpenStack Swift and Linear Tape File System (LTFS) can not only match, but leapfrog AWS Glacier.

Monday, August 6, 2012

Is OpenStack Swift Reliable Enough for Enterprise Use? (Corrected)

CORRECTION: I had incorrectly interpreted the non-correctable error number as being the probability of bit rot. This is not the case. I've been told that the probability of a silent bit-rot error is actually quite low: about 1 (anywhere from a bit up to a sector) in 10^21, and in reality even lower. Even with this 1 in 10^21 number, the MTTDL improves significantly! Apologies to the Swift community for misrepresenting Swift.

In this blog, I’d like to tackle reliability of OpenStack Swift. OpenStack Swift is a very successful open-source object storage project that is suitable for public and private cloud storage. I believe reliability is a really important topic to discuss for enterprise adoption of Swift to progress, even though terms such as mean-time-to-data-loss may put even the most die-hards into a deep slumber :-)!!

Thursday, July 5, 2012

The Significance of Hadoop running on OpenStack Swift

The folks at BigDataCraft are working on integrating Hadoop with OpenStack Swift; see for more. This is really exciting! Most readers might ask the obvious question - Hadoop already runs very well on HDFS. Why would running it on top of Swift be of any interest at all?

There are two ways to answer this question. One is from the end-user point of view and the other is from a Swift-enthusiast point of view. Let's explore each one.

Thursday, April 19, 2012

Zmanda Tackles the Hardware Selection Problem for Swift

With the OpenStack conference going on in San Francisco, we’re hearing about a number of very interesting announcements & developments around Swift, especially the Essex release. There is a lot to discuss, but I’d like to focus on Zmanda, a cloud backup startup that is tackling a very interesting problem – how does one select the right hardware for Swift?

Tuesday, February 28, 2012

End-User Feedback on OpenStack Swift: A Deeper Look at UCSD's Implementation

I had previously blogged about UCSD’s OpenStack Swift Storage Cloud. Subsequently I had the good fortune of chatting with Stephen Meier, manager of the storage group at UCSD SDSC, to get more details about their Swift implementation. Here's a synopsis.

Thursday, February 23, 2012

OpenStack Swift’s New Wins at HP, SoftLayer, Wikimedia - a Tipping Point

OpenStack Swift is open-source object-storage software that can be used to create an Amazon S3-like private or public cloud storage implementation. It started as an internal Rackspace project. Once Rackspace open-sourced Swift under the OpenStack umbrella, the first wave of adopters was Korea Telecom, Internap, and UCSD (see my blog on UCSD’s implementation).

The second set of wins is with HP, SoftLayer, and Wikimedia (operator of Wikipedia). This is not incremental progress; in my view Swift has practically won. First, let’s analyze these three wins, then look at why I believe Swift is going to be the winner, and finally see what it means for the industry at large.

Sunday, February 5, 2012

OpenStack Swift Command Line Reference Document

Brian Garvey, Cloud and Hosting Commentator, and I have collaborated to put together an OpenStack Swift Command Line Reference document. Hopefully this will make life a little bit easier for Swift administrators.

URL for document: