I recently wrote an EVault blog about the recent OpenStack Summit and the Coming of Age of Swift. The blog talks about the dynamics around Swift at the OpenStack Summit rather than specific features of Grizzly (which have been covered by a number of blogs & articles). For example, I talk about the various unconference sessions, which were of very high quality. Please check out the blog.
Build Cloud Storage
Thoughts on Cloud Storage by Amar Kapadia
Wednesday, May 1, 2013
Friday, January 11, 2013
Microserver Architectures & Cloud Storage
First, apologies for the long pause between postings. I was in the middle of changing jobs -- I'm now at EVault, a Seagate subsidiary that offers cloud-connected backup & restore. I'm excited about this for multiple reasons: i) I get to work on cloud storage full-time rather than as a hobby, and they actually pay me for it ;-) ii) EVault is in San Francisco. Just moving from San Jose to San Francisco means a sea change in culture. iii) While I can't talk specifics, the project I'm working on is very ambitious and cutting edge!
Now to the topic of the post - there is tremendous industry buzz around the potential use of microserver CPUs (also called "wimpy" cores and most often associated with ARM SoCs) for datacenter applications as an alternative to traditional "brawny" x86 CPUs.
These are a new class of light-weight power-efficient CPUs that promise to reduce power, real-estate, and cost while delivering the same aggregate performance. It may, of course, take multiple microserver CPUs to match the performance of one traditional CPU. But as we will see further in the post, that may not matter in scale-out architectures.
A microserver, therefore, is a new class of extremely low-power dense server. CPUs with wimpy cores have several common elements to them:
- Very low power for a given unit of performance
- Fewer, lighter cores compared to typical Xeon or Opteron CPUs. Typically these processors also don’t carry the burden of supporting legacy modes, full-blown virtualization, etc.
- System-on-a-chip (SoC) integration that eliminates an expensive chipset
- Need not be based on the x86 architecture – while brawny cores in servers are all based on x86, wimpy cores are mostly based on ARM (with the exception of Intel).
There’s a ton of activity on the vendor side, where companies such as AMD, Marvell, Calxeda, Cavium, NVidia, Samsung, Applied Micro, and TI are working on enterprise-class ARM SoCs, while Intel is working on similar Atom products. These companies either already have products or have announced plans for products in this category. The activity is so frenzied that wimpy cores might become a self-fulfilling prophecy!
There is also a lot of buzz from end-users, e.g. this article demonstrates Facebook’s interest in wimpy cores: http://www.businessweek.com/articles/2012-07-05/wimpy-cores-are-coming-to-facebook-dot-but-whose-cores
However, these new lighter-weight CPUs are not a good fit for all workloads. If one were to broadly classify workloads as A) virtualized B) scale-up C) HPC and D) scale-out, microservers are best suited for scale-out computing. This is because scale-out workloads are typically simple, independent, homogeneous, but numerous and bursty. Scale-out computing is also based on a lot of open-source code, which makes it easier to port to new server architectures, e.g. ARM.
All of this combined would indicate that wimpy cores are a good fit for cloud storage systems such as Swift. As a reminder, Swift is an open-source cloud storage project that is part of the OpenStack effort. But are microservers really a good fit here? Let’s take a look:
Positives of a microserver architecture for Swift:
- OpenStack Swift is open-source and runs on Ubuntu. That’s great for microservers since Canonical has taken a leadership role in porting Ubuntu to ARM.
- OpenStack Swift is indeed a scale-out architecture. This bodes well for using microservers here.
- There is a lot of flexibility in constructing the right compute : memory : storage ratio for OpenStack Swift. In fact, one could argue that rather than sticking 24 drives behind a single- or dual-socket Xeon/Opteron processor, it might actually be a lot more efficient to stick 4 drives behind one microserver CPU, bringing compute a lot closer to storage. This architecture promises to reduce cost and power while improving reliability and performance!
Negatives of microservers for Swift:
- Most microserver CPUs plan to have SATA interfaces and not SAS. This means the architecture is OK for SATA drives, but difficult to use with nearline SAS drives.
- By increasing the number of compute nodes, we are putting more strain on the network. This trade-off would have to be looked at.
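The drive-per-node trade-off in the two lists above can be made concrete with a little arithmetic. The sketch below is only illustrative: the 24 and 4 drives-per-node figures come from the text, while the 240-drive cluster size and the function name are assumptions picked for the example.

```python
import math


def nodes_needed(total_drives: int, drives_per_node: int) -> int:
    """Number of storage nodes required to host total_drives."""
    return math.ceil(total_drives / drives_per_node)


TOTAL_DRIVES = 240  # hypothetical cluster size, chosen only for illustration

brawny_nodes = nodes_needed(TOTAL_DRIVES, 24)  # 24 drives per Xeon/Opteron server
wimpy_nodes = nodes_needed(TOTAL_DRIVES, 4)    # 4 drives per microserver CPU

# Node counts for each layout, and how many times more nodes the
# microserver layout needs for the same number of drives.
print(brawny_nodes, wimpy_nodes, wimpy_nodes / brawny_nodes)
```

Every node needs at least one network endpoint, so under these assumptions the microserver layout has six times as many endpoints for the same set of drives. That is the network trade-off noted in the list of negatives.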
Hopefully one of the above companies will like this use-case enough to run some real performance tests and settle the question conclusively. Superficially, though, cloud storage systems such as Swift seem like a good target for a microserver architecture with wimpy cores.
Monday, August 27, 2012
Cold Storage Using OpenStack Swift vs. AWS Glacier
Amazon announced their latest IaaS service last week called Glacier, which is intended for cold storage of data. It is 10x cheaper than Amazon S3 (ignoring access charges). Amazon S3 is already ridiculously cheap as compared to enterprise storage and Glacier takes it to the next level. With Glacier, retrieval needs to be infrequent and can take hours. This restriction is what makes it "cold" storage. It seems like tape-as-a-service to me even though Amazon doesn't use this word at all. However if it walks like a duck and talks like a duck, it must be a duck. In this case tape.
The cost equation is amazing. OpenStack does not have an answer. Does this mean a problem for OpenStack? I don't think so, rather I think this is an opportunity. A combination of OpenStack Swift and Linear Tape File System (LTFS) can not only match, but leapfrog AWS Glacier.
Monday, August 6, 2012
Is OpenStack Swift Reliable Enough for Enterprise Use? (Corrected)
2/15/2013
CORRECTION: I had incorrectly interpreted the non-correctable error number as being the probability of a bit-rot. This is not the case. I've been told that the probability of a silent bit-rot error is actually quite low, 1 (bit up to sector) in 10^21 (in reality it is even lower) or lower. Even with this 1 in 10^21 number, the MTTDL improves significantly! Apologies to the Swift community for representing Swift in.
==
In this blog, I’d like to tackle reliability of OpenStack Swift. OpenStack Swift is a very successful open-source object storage project that is suitable for public and private cloud storage. I believe reliability is a really important topic to discuss for enterprise adoption of Swift to progress, even though terms such as mean-time-to-data-loss may put even the most die-hards into a deep slumber :-)
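To put the corrected 1-in-10^21 figure in perspective, here is a rough back-of-the-envelope sketch. It treats the number as a per-bit probability and simply multiplies by the number of bits read. That simplification, the 1 PB example size, and the function name are assumptions made for illustration, and this is not a full MTTDL model.

```python
BIT_ROT_PROBABILITY = 1e-21  # per-bit figure quoted in the correction (assumed per bit)


def expected_silent_errors(bytes_read: float,
                           p_per_bit: float = BIT_ROT_PROBABILITY) -> float:
    """Expected number of silent bit-rot events when reading bytes_read bytes."""
    return bytes_read * 8 * p_per_bit


ONE_PETABYTE = 1e15  # bytes (decimal petabyte), hypothetical amount of data read

# Expected silent errors for one full read of a petabyte under these assumptions.
print(expected_silent_errors(ONE_PETABYTE))
```

Under these assumptions, reading a full petabyte gives an expectation of roughly 8 in a million silent errors, which is why the corrected number improves the reliability picture so much.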
Thursday, July 5, 2012
The Significance of Hadoop running on OpenStack Swift
The folks at BigDataCraft are working on integrating Hadoop with OpenStack Swift; see http://bigdatacraft.com/archives/349 for more. This is really exciting! Most readers might ask the obvious question - Hadoop already runs very well on HDFS. Why would running it on top of Swift be of any interest at all?
There are two ways to answer this question. One is from the end-user point of view and the other is from a Swift-enthusiast point of view. Let's explore each one.
Thursday, April 19, 2012
Zmanda Tackles the Hardware Selection Problem for Swift
With the OpenStack conference going on in San Francisco, we’re hearing about a number of very interesting announcements & developments around Swift, especially the Essex release. There is a lot to discuss, but I’d like to focus on Zmanda, a cloud backup startup that is tackling a very interesting problem – how does one select the right hardware for Swift?
Tuesday, February 28, 2012
End-User Feedback on OpenStack Swift: A Deeper Look at UCSD's Implementation
I had previously blogged about UCSD’s OpenStack Swift Storage Cloud. Subsequently I had the good fortune of chatting with Stephen Meier, manager of the storage group at UCSD SDSC, to get more details about their Swift implementation. Here's a synopsis.
Thursday, February 23, 2012
OpenStack Swift’s New Wins at HP, SoftLayer, Wikimedia - a Tipping Point
OpenStack Swift is open-source object-storage software that can be used to create an Amazon S3-like private or public cloud storage implementation. It started as an internal Rackspace project. Once Rackspace open-sourced Swift under the OpenStack umbrella, the first wave of adopters was Korea Telecom, Internap, and UCSD (see my blog on UCSD’s implementation).
The second set of wins is with HP, SoftLayer, and Wikimedia (operator of Wikipedia). This is not incremental progress; in my view, Swift has practically won. First, let’s analyze these three wins, then look at why I believe Swift is going to be the winner, and finally see what it means for the industry at large.
Sunday, February 5, 2012
OpenStack Swift Command Line Reference Document
Brian Garvey, Cloud and Hosting Commentator, and I have collaborated to put together an OpenStack Swift Command Line Reference document. Hopefully this will make life a little bit easier for Swift administrators.
URL for document: http://www.scribd.com/fullscreen/81218981?access_key=key-1jr68gmpk5zxs07l2olu
Friday, January 27, 2012
UCSD’s OpenStack Swift Implementation - a Harbinger of Private Cloud Storage
March/5/12 Note: I've posted a follow up to this blog "End-User Feedback on OpenStack Swift: A Deeper Look at UCSD's Implementation".
The University of California’s San Diego Supercomputer Center (SDSC) introduced a Data Storage Cloud using OpenStack Swift last September, making it the largest educational private cloud storage implementation. Pretty awesome, but the question is whether this is a one-off event or is this the start of a trend? In other words, are there benefits of this implementation that will carry over to other educational institutions, government organizations, and even enterprises? Let’s first look at what SDSC implemented and why. Next we can explore the question at hand.
[Image: Summary of Benefits to UCSD and Whether they Extend Beyond UCSD]
Wednesday, January 4, 2012
Can OpenStack Swift Hit Amazon S3 like Cost Points?
This table summarizes my conclusion on this topic. The executive summary is that cost is indeed competitive and should not keep you from moving forward.
[Image: Is a Swift Storage Cluster Cost Competitive with Amazon S3?]
Monday, December 12, 2011
3 use-cases of OpenStack Swift
If you’ve been following my previous blogs, I’ve covered the technical aspects of installing OpenStack Swift with S3 APIs (object storage for cloud applications) and gone through 6 reasons why someone might want to consider it. But so far we haven’t talked about what you might do with it. Essentially, I see 3 major use-cases for Swift:
Tuesday, November 22, 2011
6 reasons to use OpenStack Swift
Executive Summary
Object storage such as OpenStack Swift has distinct benefits over file and block storage for cloud applications. The six key benefits are:
1. Lower capital expenditure cost via the use of commodity hardware
2. Lower operational costs by simplifying expensive file storage practices e.g. locating files, managing files, lack of failure handling, manual reliability model, poor scalability, and manual storage growth
3. New capabilities such as REST APIs, sharing storage across applications, and the mixing of compute and storage on the same node
4. Production ready code
5. Open-source software
6. Powerful, elegant, yet simple architecture
I believe these benefits are compelling and position Swift as the only viable choice for Cloud Storage applications today (other technologies may emerge over time e.g. CEPH).
Tuesday, November 8, 2011
S3 APIs on OpenStack Swift
DISCLAIMER: The views expressed here are my own and don't necessarily represent my employer Emulex's positions, strategies or opinions.
In the last blog http://bit.ly/u5PqEx, I described how I installed a 6 node Swift cluster on a set of EC2 machines. I mentioned that Swift is the only production-ready open-source software available to build your own S3 like cloud storage (there are other alternative technologies that merit observation such as CEPH to see how they mature). So it is only logical that we try putting S3 APIs on top of the Swift cluster. The reason I am such a big fan of S3 APIs is that the API is a de-facto standard for cloud storage. Numerous applications utilize the S3 API and creating a storage system that exposes this API provides instant interoperability with those applications.
Fortunately, the Diablo version of OpenStack Swift comes with optional S3 APIs that can be enabled. Let's enable and then test-drive the S3 APIs.
Monday, October 24, 2011
Installing an OpenStack Swift cluster on EC2
DISCLAIMER: The views expressed here are my own and don't necessarily represent my employer Emulex's positions, strategies or opinions.
If you want to build your own Amazon S3 like storage (private or public cloud storage), there’s really only one open-source choice – OpenStack Swift (there are other technologies worth watching e.g. CEPH, but I believe Swift is the only one ready for production environments). I decided to test drive it.
I implemented a 6 node Swift cluster on Amazon EC2 (because I don’t own 6 servers). This blog walks through the process. We can discuss the attributes of object stores, the pros & cons of Swift and other advanced topics in subsequent blogs; this one talks about the basic install process. The exercise is quite instructive and gives a really good feel for how Swift is built and how it works. These instructions assume some basic Linux knowledge.
Here is a diagram of the cluster created: