Commercialization of OpenStack Object Storage

April 26, 2011

Joe Arnold, Cloudscaling
Dr. Jinkyung Hwang, KT
Dr. Jaesuk Ahn, KT

Wednesday, April 27, 2011

Building cloud infrastructure for
telcos and service providers


- Thanks to the core Swift team. They've been invaluable in sharing their knowledge about the system.
- We've brought to market several OpenStack Object Storage systems for our customers. We’re leading
the charge on large-scale deployments of OpenStack Object Storage.
- Our focus is on building infrastructure cloud services for telcos and service providers. To do this we've
focused on integrating the hardware, software and operational components so that our customers can
go to market with a fully-integrated stack.

•Cloud Visionaries
•Infrastructure Cloud Services
•End-user Cloud Products
•Very involved in Korean OpenStack Community


- KT has been a visionary in the cloud computing space.
- Cloudscaling has been working with KT for about a year. In that time, Cloudscaling has
helped KT launch infrastructure compute clouds, including an object storage system based on
Swift.
- KT has released end-user cloud products.
- KT kicked off the Korean OpenStack community.

Billions of Objects in S3

[Chart: billions of objects stored in S3 by quarter, Q4 2006 through Q4 2010; vertical axis from 0 to 300]


- Storage is growing.
- Applications are sprouting up for tablets, games and mobile devices. That application data
lives in the cloud.
- Media consumption over the internet is increasing, and so is the volume of that data.
- Need for asset storage is large.
- Users are participating and consuming more than they ever have. Social media, online
video, user-generated content are all contributing to the vast need for easily-consumable
storage systems.

Today’s storage systems need to supply endless storage.

Rackspace runs billions of objects and petabytes of files.

Clearly there is demand for these types of services.

Brief Refresher on
OpenStack Object
Storage


Object Storage

API

Data Storage


- Objects via HTTP
- Not a traditional filesystem
- Not blocks
- GET/PUT/DELETE over a REST API
- Object storage is not a traditional filesystem or a raw block device.
- It’s just containers (folders) and objects (files) that are available via an HTTP API.
- It can’t be mounted like a folder in your OS directly.
- There isn’t random access to files and there can be multiple concurrent writers, so it’s
unsuitable for transactional applications like traditional relational databases. Also, it doesn’t
provide raw data blocks that an operating system can form into a filesystem, so it’s unsuitable
for booting an OS.

- Applications need to be designed with object storage in mind. As object storage is partition
tolerant, it’s not possible to create file-system locks. The newest file wins, and applications
need to account for that.
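
To make the "objects via HTTP" idea concrete, here is a minimal sketch of how a client could build GET/PUT/DELETE requests for an object. The storage URL, container, object name and token are made-up values; the sketch assumes the usual Swift URL layout of storage URL, then container, then object, with the token passed in the `X-Auth-Token` header. It only constructs the requests and does not send them.

```python
from urllib.parse import quote
from urllib.request import Request


def object_request(method, storage_url, container, name, token, body=None):
    """Build (but do not send) an HTTP request for one object.

    storage_url, container, name and token are illustrative values,
    not taken from any real deployment.
    """
    url = f"{storage_url}/{quote(container)}/{quote(name)}"
    return Request(url, data=body, method=method,
                   headers={"X-Auth-Token": token})


# Hypothetical account URL and token, for illustration only.
STORAGE_URL = "https://storage.example.com/v1/AUTH_demo"
TOKEN = "example-token"

upload = object_request("PUT", STORAGE_URL, "photos", "cat.jpg", TOKEN, b"image bytes")
download = object_request("GET", STORAGE_URL, "photos", "cat.jpg", TOKEN)
remove = object_request("DELETE", STORAGE_URL, "photos", "cat.jpg", TOKEN)
```

There is no open/seek/lock step anywhere in this flow: each request carries a whole object, which is why applications have to be designed around it.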

Upload

PUT

Data Storage


- A simplified view of upload.
- A client makes a REST API request to PUT an object into an existing container. The request
is received by the cluster.
- The data is then sent to three locations in the cluster. At least two of the three writes must
be successful before the client is notified that the upload was successful.
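
The two-out-of-three rule can be sketched in a few lines. This is an illustration of the quorum idea only, not Swift's proxy code; the writer callables stand in for the three storage locations and are hypothetical.

```python
def quorum_put(replica_writers, data, quorum=2):
    """Try to write data to every replica; report success if enough writes land.

    replica_writers is a list of callables (hypothetical stand-ins for
    storage nodes) that raise OSError when a write fails.
    """
    successes = 0
    for write in replica_writers:
        try:
            write(data)
            successes += 1
        except OSError:
            pass  # a failed replica is tolerated as long as quorum is met
    return successes >= quorum


def healthy_node(data):
    return None  # pretend the write succeeded


def failed_node(data):
    raise OSError("disk unavailable")


one_failure = quorum_put([healthy_node, failed_node, healthy_node], b"payload")
two_failures = quorum_put([healthy_node, failed_node, failed_node], b"payload")
```

With one failed location the upload is still acknowledged; with two failed locations it is not.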

Download

GET

Data Storage


- A request comes in for an Account/Container/Object. The partition for the object is
determined, and a lookup in the Ring reveals which storage nodes contain that partition. A
request is made to one of the storage nodes to fetch the object and, if that fails, requests are
made to the other nodes.
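
The lookup-then-fallback flow can be sketched as follows. This is a simplified model, loosely based on the idea of hashing the object path to a partition; the partition power, the ring contents and the node names are all assumptions made for illustration.

```python
import hashlib
import struct

PART_POWER = 4  # assumed: 2**4 = 16 partitions, far fewer than a real cluster


def partition_for(account, container, obj, part_power=PART_POWER):
    """Hash the object path and keep the top bits as the partition number."""
    path = f"/{account}/{container}/{obj}".encode()
    digest = hashlib.md5(path).digest()
    return struct.unpack(">I", digest[:4])[0] >> (32 - part_power)


def fetch_object(nodes, read_from_node):
    """Ask each node holding the partition in turn until one answers."""
    for node in nodes:
        try:
            return read_from_node(node)
        except IOError:
            continue  # this replica failed, try the next one
    raise IOError("all replicas failed")


# Hypothetical ring: every partition maps to three node names.
ring = {p: [f"node{(p + i) % 5}" for i in range(3)] for p in range(2 ** PART_POWER)}


def flaky_read(node):
    # Pretend the first node for this partition is down.
    if node == first_node:
        raise IOError("node down")
    return f"object data from {node}"


part = partition_for("AUTH_demo", "photos", "cat.jpg")
first_node = ring[part][0]
result = fetch_object(ring[part], flaky_read)
```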

Horizontal Growth & Concurrency


- OpenStack Object Storage is designed to have linear growth characteristics. As the system
gets larger and requests increase, the performance doesn’t degrade. To scale up, the system
is designed to grow where needed — adding storage nodes to increase storage capacity,
adding compute capacity as requests increase and growing network capacity where there are
choke points.

- Space available alone isn’t a useful statistic. A key benchmark is the storage system’s
concurrency. Swift can be configured to handle a great number of simultaneous
connections.

- It’s great to have the ability to scale the storage system as your customers’ applications
grow.

Fantastic Durability/Availability properties

Durability - Data Persists Availability - Access to the data
Auditors Shared-nothing access tier
Replicators Data served by any Zone
Independent Zones


- Durability:
- As we all know, the 2nd worst thing you can do in this business is lose someone’s data.
The first, of course, being to corrupt a customer’s data. Durability refers to the system’s ability
to not lose or corrupt data.
- These systems are extremely durable. To achieve extreme durability numbers,
-- objects are distributed in triplicate across the cluster.
-- Auditors run to check the integrity of data and detect bitrot.
-- Replicators run to ensure that enough copies are in the cluster. In the event that a device
fails, data is replicated throughout the cluster to ensure that three copies remain.
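
The auditor idea can be illustrated with a checksum comparison. This is a sketch of the concept, not Swift's auditor: it assumes each stored object carries an MD5 checksum recorded at upload time, and flags any object whose current bytes no longer match.

```python
import hashlib


def audit(objects):
    """Return the names of objects whose data no longer matches its checksum.

    objects maps a name to (data, expected_md5_hex); the layout is an
    assumption made for this illustration.
    """
    return [name
            for name, (data, expected) in objects.items()
            if hashlib.md5(data).hexdigest() != expected]


good = b"hello world"
stored = {
    "intact.txt": (good, hashlib.md5(good).hexdigest()),
    # Simulated bitrot: the bytes changed after the checksum was recorded.
    "rotted.txt": (b"hellp world", hashlib.md5(good).hexdigest()),
}

damaged = audit(stored)
```

An object flagged this way would be discarded so that a replicator can restore a good copy from another location.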

- Availability: The ability for the data to be accessed.
- The servers that handle incoming API requests scale up just like any “front-end” tier for a
web application. The system is architected to use a shared-nothing approach and can use the
same proven techniques that many web applications have used to provide high availability.
- Early in a client deployment we went into pre-production (closed beta) without monitoring,
and a server failed without our noticing. There was no service interruption and Swift
dutifully replicated data across to other nodes to keep 3 copies of data in place. We finally
noticed when peak throughput numbers weren’t quite as high as they were previously. This
really points out the robustness of the Swift architecture.

Zones: Failure Boundaries

1 2 3


- Another feature is the ability to define failure zones. Failure zones allow a cluster to be
deployed across physical boundaries, each of which could individually fail. For example, a cluster
could be deployed across several nearby data centers and be able to survive multiple
data center failures.
- Three copies of each piece of data are distributed across zones.
- We go for rack-per-zone. That means we plan for rack outages of storage servers.
- At Swift’s smallest, a zone could be a single drive or a grouping of a few drives. This scale
of deployment is quite useful for creating development / staging environments.

Five Zones

1 2 3 4 5


How this translates into a deployment:
- Everything in Swift is stored, by default, three times. There are three copies of just about
everything the system needs to store data.
- For three copies of the data to be stored, at first blush it seems like it would make
sense for there to be three zones. However, Swift is designed to be a durable, highly-available
system. It needs its three copies of everything – at all times.
- If a Zone goes down in a three-Zone system, there will only be two zones left!
- Five Zones is recommended as a starting point because if a Zone goes down, there will be
other zones for data to be replicated to. Having at least five zones leaves enough wiggle room
to accommodate the occasional Zone failure and enough capacity to replicate data across the
system.
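
The zone arithmetic above can be written down as a small check. This sketch only counts zones; it ignores capacity, which is the other reason the notes give for starting with five.

```python
def can_keep_replicas_in_distinct_zones(total_zones, failed_zones=1, replicas=3):
    """True if each replica can still live in its own zone after failures."""
    return total_zones - failed_zones >= replicas


three_zone_cluster = can_keep_replicas_in_distinct_zones(3)
five_zone_cluster = can_keep_replicas_in_distinct_zones(5)
five_zones_two_down = can_keep_replicas_in_distinct_zones(5, failed_zones=2)
```

A three-zone cluster that loses a zone cannot place three copies in separate zones, while a five-zone cluster can lose two zones and still do so.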

Object Storage for
Service Providers



- I’d like to recommend OpenStack Object Storage (Swift). What else?
- The software has been battle-tested by the huge deployment at Rackspace: billions of
objects and petabytes of storage.
- Something is never ‘proven’ until it’s running at scale. By that measure, Rackspace
Cloud Files (and Swift) is proven. No other available object storage system has been
proven at this scale.

- We at Cloudscaling have been working with Swift since its initial launch in July of last year.
- Now, with KT and other commercial installations, momentum is building behind this project.

- Who should offer it? What does it look like? What should you know going in?

Storage is Not an Island


- Must have a reason to offer storage
- Storage is an anchor service
- Grounded with other compelling services where storage is a component.

- Data is sticky. Application migration is easy. Data migration is tricky. Moving data around is
difficult, often requires downtime, and is trickier to orchestrate.
-- Bring customer data into your ecosystem/platform.
- AWS S3 offered free inbound transfer for a very long time, and it offers low-cost import of
physical media, so that it can get as much of its customers' data into its ecosystem as possible.
-- S3 grew like crazy with EC2 right next door, with 150% year-over-year growth. This is staggering.

- When building a storage product, there must be /compelling/ reasons for customers to put
data into it.
-- That can be:
--- convenience
--- access to compute resources
--- features associated with the uploaded data (transcoding, data processing)
--- even legal or compliance reasons.

Have an Advantage

leading South Korean landline,
mobile, internet, IPTV.


What's your unfair advantage?
- KT is the leading provider in Korea for internet, mobile, and IPTV.
- They have a huge network advantage for providing services to end-users.
- Not only that, South Korea intends to connect every home in the country with gigabit
speeds. http://www.nytimes.com/2011/02/22/technology/22iht-broadband22.html
- KT is in a unique position from a network perspective to offer the platform of services to
serve this market. The media assets, consumer media assets, need a place to reside that is
well connected to the Korean consumer of these services. Regional service providers have a
distinct edge in providing services to their local market.
- Other, out-of-country providers won’t have the same cost advantages or quality of service
for that market.

- Other unique assets from some of our other customers include
-- Colocation facilities and an existing customer base of managed hosting customers
-- Extensive CDN services. Object Storage serves as a jumping-off point for CDN services.

Be Compatible


- The contrarian point here is that for all the advantages you are going to present to your
users, the service needs to remain compatible with the tooling ecosystem.

- At one client meeting we were going down the path of 'differentiation' -- What makes this
product unique? The answer, of course, was -- nothing! That's the point. In fact, we've been
working hard to make sure that the service is compatible with the ecosystem of tools that are
available for end-users of the service.
-- We've worked with and contributed back to the open-source libraries.
-- We've worked with OpenStack vendors like Nasuni and Gladinet to make our 'outside of
Rackspace' implementations work.
- What is distinct is the bundle of services that you provide your customers, the customer base
that you already have, and the network access that you enjoy.

- One of the huge assets that OpenStack brings is the ecosystem of tools that come to the
party:
-- Commercial vendors such as Nasuni and Gladinet
-- Open-source tools such as Cyberduck, fog, and Rackspace's own Cloud Files language
bindings/libraries for C#, Java, Ruby, PHP and Python.
- You don't need to build these per se, but you do need to ensure compatibility with your
service.
-- Lots of little issues needed to be addressed (adding alternate Cloud Files URLs, fixing
port issues with Cyberduck, SSL cert issues with Gladinet, different formats of keys, user
names and passwords), so you will need to make sure that these tools are compatible with your
deployment.

- The differentiation is still important! Differentiation should be in providing services on top
of the infrastructure and building platform services or other infrastructure services based on
storage.

Online Service Providers / Private
Huge Flat Namespace

'Repatriation' from public clouds


I know that this is the service providers track. But it's worthwhile to address folks who are
building online services or who have a need to provide private solutions.

Huge Flat Namespace
- Accounts -> Containers -> Objects
- Proliferation of storage systems requires knowledge of what data is located where. The
extreme scaling options of Swift can solve some of these issues.
- Each storage cluster can grow to be several petabytes, and for regional or additional scaling
the authentication service can route users to different clusters if need be.
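
The routing idea can be sketched as a lookup performed at authentication time: the auth service knows which cluster is home for each account and hands the client a storage URL pointing there. The cluster names, URLs and accounts below are invented for illustration.

```python
# Hypothetical clusters and account placement, for illustration only.
CLUSTERS = {
    "region-a": "https://a.storage.example.com/v1",
    "region-b": "https://b.storage.example.com/v1",
}
ACCOUNT_HOME = {
    "AUTH_alice": "region-a",
    "AUTH_bob": "region-b",
}


def storage_url_for(account):
    """Return the storage URL the auth service would give this account."""
    cluster = ACCOUNT_HOME[account]
    return f"{CLUSTERS[cluster]}/{account}"


alice_url = storage_url_for("AUTH_alice")
bob_url = storage_url_for("AUTH_bob")
```

Clients keep using the same Accounts -> Containers -> Objects paths; only the base URL returned at login differs per cluster.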

'Repatriation' from public clouds
- For those who are thinking about bringing their data back in house, using a system that is
architecturally compatible with popular cloud storage products like S3 and Cloud Files can
make a lot of sense.
- The major reason is that an application doesn't need to be re-architected.
- Use something that still delivers the durability and reliability, not just API compatibility.

Building


Building the System

Ecosystem

Billing Portal
Authentication
Installer Front-End

Network Ops

Hardware

Data Center


You must build it
- There is development effort, so you must consider the R&D expense.
- Ramp up a development team to understand the core of Swift.
- Develop the integration components.

OpenStack Object Storage provides a core of services and functionality.
- You can't just sudo apt-get install openstack
- OpenStack Object Storage is a solid foundation, but it must be supported by a host of
services.

Let’s go into a few.

Billing


Billing
- There is utilization tracking as part of Swift in the Cactus release. It's much better, but it's
still 'tricky'.
- There are many steps involved here. I'll address the things that I think are unique to the
object storage system.
- Charge per GB Stored
- Charge for TX ingress/egress
- Charge for # of API requests
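
The three charge types combine into a monthly bill along these lines. The rates and usage figures below are placeholders chosen for the example, not prices from the talk.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    gb_stored: float       # average GB stored over the month
    gb_transferred: float  # ingress plus egress, in GB
    api_requests: int      # number of API requests


@dataclass
class Rates:
    per_gb_stored: float
    per_gb_transferred: float
    per_1000_requests: float


def monthly_charge(usage, rates):
    """Sum the storage, transfer and request charges for one account."""
    storage = usage.gb_stored * rates.per_gb_stored
    transfer = usage.gb_transferred * rates.per_gb_transferred
    requests = usage.api_requests / 1000 * rates.per_1000_requests
    return storage + transfer + requests


# Placeholder numbers for illustration.
example_rates = Rates(per_gb_stored=0.10, per_gb_transferred=0.05, per_1000_requests=0.01)
example_usage = Usage(gb_stored=100, gb_transferred=50, api_requests=20000)
bill = monthly_charge(example_usage, example_rates)
```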

Pricing

Consumption Pricing vs. Capacity Pricing


Further, there is a decision to be made on consumption-pricing vs. capacity pricing.

When you buy bandwidth, you are typically charged at the 95th percentile. You pay for
bandwidth that goes unused because you’re paying for the capacity to be available, with some
wiggle room for extraordinary bursts.
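
For readers unfamiliar with 95th-percentile billing, here is a sketch of one common way it is computed: sort the periodic bandwidth samples, ignore the top 5%, and bill at the highest remaining sample. The sample values are made up, and real contracts vary in sampling interval and rounding.

```python
def billable_rate(samples_mbps):
    """Return the 95th-percentile sample (highest value after dropping the top 5%)."""
    ordered = sorted(samples_mbps)
    # Integer ceiling of 95% of the sample count, converted to a 0-based index.
    index = (95 * len(ordered) + 99) // 100 - 1
    return ordered[index]


# 100 made-up samples: steady traffic of 1..95 Mbps plus five large bursts.
samples = list(range(1, 96)) + [500, 600, 700, 800, 900]
rate = billable_rate(samples)
```

The five bursts fall in the discarded top 5%, so the customer is billed at the steady-state peak rather than at the burst rate.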

So service providers are having to figure out how to deal with this.
It’s a bigger deal at a smaller scale. A single customer could butt in and consume a large
amount of a cluster on a percentage basis.

Authentication & User Management


Authentication & User Management
- Two real options.
- 1) Use the existing authentication service that is built into Swift. Swift comes with an
authentication service that stores account information within the cluster itself.
-- The benefit is that the cluster is more self-contained and not dependent on any external
services that could result in availability issues for your customers.
-- However, if you're supporting a large customer base that has access to other services and
you want to centralize management of customer accounts and credentials, more integration
effort is required.
- 2) Build your own authentication service. There is an API defined. Build to that spec, and make
sure its scale-out / HA properties are something that you're comfortable with.
-- Benefit is that an authentication system remains centralized and can service a range of
services for the customer. If this is part of a larger IT initiative or part of a broader cloud
computing offering, it’s desirable to provide end-users with a consistent way to manage and
use credentials.
-- Downside is that it's another component to build and maintain.

Load Balancing


Load Balancing
- One of the great properties of the architecture of Swift is its ability to horizontally
scale out to handle increasing API access (GET/PUT/DELETE).
- An incoming request does not need to be processed by a centralized storage controller.
- Load balancing can be handled by many mechanisms that have been refined over the past 15
years.
- The complexity of this setup will vary with the needs of the deployment. It can range from
round-robin DNS or Pound to commercial load-balancing solutions like a
NetScaler. Whatever load balancer is used, a health check needs to be written for the load
balancer to monitor.
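
The combination of round-robin selection and a health check can be sketched like this. It is a toy model of what a load balancer does, not configuration for any particular product; the proxy names and the health-check function are stand-ins.

```python
class RoundRobinBalancer:
    """Hand out backends in turn, skipping any that fail the health check."""

    def __init__(self, backends, is_healthy):
        self.backends = list(backends)
        self.is_healthy = is_healthy  # callable: backend -> bool
        self._next = 0

    def pick(self):
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if self.is_healthy(backend):
                return backend
        raise RuntimeError("no healthy proxy servers")


# Hypothetical proxies; pretend proxy2 is failing its health check.
down = {"proxy2"}
balancer = RoundRobinBalancer(["proxy1", "proxy2", "proxy3"],
                              is_healthy=lambda name: name not in down)
picks = [balancer.pick() for _ in range(3)]
```

In a real deployment the health check would be an HTTP probe against each proxy rather than a set lookup.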

Storage Nodes

24-48 GB RAM
36-48, 2TB Drives
SATA
No RAID
Newish Xeon

The Hardware

Storage Nodes
- 36-48 disk JBODs
- 24-48 GB RAM
- Go for good price/performance CPUs - Xeon E5620s / E5640s.
-- Not just data, also replicators, auditors
- While commodity, these are not JBODs (Just a Bunch of Disks). There is a reasonable
amount of memory and CPU. Metadata needs to be readily available to quickly return objects.
The object stores each run services not only to field incoming requests from the Access Tier,
but to also run replicators, auditors, reapers, etc.
- Our configurations currently run 2TB SATA disks without RAID. We use desktop-grade
drives where we have more-responsive remote hands in the datacenter and enterprise-grade
drives elsewhere.
- SATA desktop drives (not green drives).
-- We once placed an order with a drive vendor (who will go nameless). Given the order
size of ~$300k worth of drives, the vendor refused to fill the order because it was obvious
to them that we were not using the drives for desktop applications.

Proxy Nodes
Proxy Servers
Authentication Servers

24 GB RAM
10 GbE
Newish Xeon


Proxy Nodes
- Go for the “sweet spot” in price/performance (Xeon E5620s / E5640s), as it's better to have
many of them and scale out than to have fewer monster machines.
- Dual 10GbE
- 12-44GB RAM
- Cloudscaling’s deployments segment off an “Access Tier”. This tier is the “Grand Central” of
the Object Storage system. It fields incoming API requests from clients and moves data in and
out of the system. This tier is composed of front-end load balancers, ssl-terminators,
authentication services, and it runs the Proxy server processes.
- These access servers are in their own tier. This enables read/write access to be scaled-out
independently of storage capacity. For example, if the cluster is on the public internet with
demanding needs on ssl-termination and data access, many access servers can be
provisioned. However, if the cluster is on a private network and it is being used primarily for
archival purposes, fewer access servers are needed.
- We deploy a collection of 1U servers to service this tier. These systems use a moderate
amount of RAM and are CPU intensive. As these systems field each incoming API request, we
recommend two high-throughput (10GbE) interfaces: one interface for “front-end” incoming
requests, the other for “back-end” access to the object stores to put and fetch data.
Factors to consider:
- For most publicly-facing deployments, or private deployments available across a
wide-reaching corporate network, SSL will be used to encrypt traffic to the client. SSL adds
significant processing load to establish sessions between clients, and more capacity in the
access layer will need to be provisioned. SSL may not be required for private deployments on
trusted networks.
- Application intensive vs archive oriented. Simply put, the volume of requests will have an
impact on the provisioning of the access tier.

Networking

[Diagram: two aggregation switches, each connected to the proxy servers and to the zone switches that front the object stores]

Networking
- A pair of aggregation switches with two links back to the access network / border
network. The aggregation switches connect to two pools of the Access Tier and to each of the
five Zone switches that connect the Object Stores. All connections to the Access Tier and the
Zones are 10GbE.
- Zone Network
-- Each Zone has a switch to connect itself to the aggregation network. We run a single,
non-redundant switch as the system is designed to sustain a Zone failure. Depending on overall
concurrency desired, Cloudscaling will deploy either a 1GbE or a 10GbE network to the
object stores.
- Remember that when you have a write coming into the proxy server, you have 3x going to
the object stores to write the three replicas. Be sure to account for that when figuring out the
theoretical limits for read/write traffic. Typically, the expected bandwidth coming in is the
ceiling.
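
The 3x write amplification is easy to put into numbers. The function names and link speeds below are assumptions for illustration; the sketch ignores protocol overhead and read traffic.

```python
def backend_write_gbps(client_write_gbps, replicas=3):
    """Traffic sent to the object stores for a given client write rate."""
    return client_write_gbps * replicas


def max_client_write_gbps(frontend_gbps, backend_gbps, replicas=3):
    """Highest client write rate a proxy can sustain given both interfaces."""
    return min(frontend_gbps, backend_gbps / replicas)


# A client writing at 2 Gb/s generates 6 Gb/s toward the object stores.
amplified = backend_write_gbps(2)

# If only 1 Gb/s is expected to arrive, the incoming bandwidth is the ceiling.
inbound_limited = max_client_write_gbps(frontend_gbps=1, backend_gbps=10)

# With 10 Gb/s on each side, the back-end link limits sustained writes.
backend_limited = max_client_write_gbps(frontend_gbps=10, backend_gbps=10)
```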

Raw System Costs


Raw System Costs:
- TCO caveat: There are many components that are part of the TCO of the entire cluster.
-- Facilities, power, cooling, network, NOC staff
-- Many of those factors are site-specific

Raw System Costs

1 Petabyte: ~$750,000 ($0.75/GB)
•2 Agg Switches
•6 Proxy/Auth Servers
•5 ToR Switches
•50 Object Stores
•...and cables, racks, etc

120 Terabyte: ~$95,000 ($0.79/GB)
•2 ToR Switch
•2 Proxy/Auth Servers
•5 Object Stores
•...and cables, rack, etc


- Illustrate hardware pricing as a baseline
- All-in hardware costs (switching, load balancing, storage nodes, optics, cabling, forged
metal for the racks, PDUs)
-- (To note: Amazon's retail pricing for S3 is $0.140 - $0.055 per GB per month)
-- That price is going to go down as hardware prices go down.
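
The per-GB figures on the slide follow directly from the totals. The short calculation below reproduces them, assuming decimal units (1 TB = 1,000 GB and 1 PB = 1,000 TB).

```python
def cost_per_gb(total_cost_usd, capacity_tb):
    """Hardware cost per GB, using decimal units (1 TB = 1,000 GB)."""
    return total_cost_usd / (capacity_tb * 1000)


petabyte_system = round(cost_per_gb(750_000, 1000), 2)  # 1 PB system
small_system = round(cost_per_gb(95_000, 120), 2)       # 120 TB system
```

These are one-time hardware costs, so they are not directly comparable to a recurring per-month service price.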

Understanding TCO

- Total cost of ownership for the cluster should include development costs, hardware and
ongoing costs. These include:
-- Design/Development
-- Hardware
-- Hardware Standup
-- Datacenter Space
-- Power/Cooling
-- Networking
-- Ongoing Software Maintenance and Upgrades
-- Operational Support
-- Customer Support

•Design/Development/Integration
•Hardware
•Hardware Standup
•Datacenter Space
•Power/Cooling
•Network Access
•Ongoing Software Maintenance
•Operational Support
•Customer Support

Planning Checklist
•Product Service Requirements
•Hardware Selection
•Network Design
•Facilities Planning
•Hardware Standup
•Software Provisioning
•System Configuration
•Load Balancing
•Authentication Integration
•Utilization & Billing Integration
•Additional Platform Services
•Monitoring Integration
•Operational Tooling
•Operator Training and Documentation
•Customer Training and Documentation

- There are many pieces that need to come together for a successful project, and many groups
must come together to design, build, deploy, integrate, operate and onboard customers.
Consider these activities during your planning phase:

Assemble a cross-functional team, as there are many hats needed for a successful
standup:
- Data center technicians to help plan the power/cooling needed at the DC.
- Networking experts to help design and plan out the network.
- A great software development team to write the integrations needed and fix issues related to
the software systems of the cluster.
- Systems administrators. Swift is built around common Unix tools, and folks with good systems
administration skills can really help tune a running system.
- A product/sales team who can communicate the value to customers and bring the product
to market.

-- Customer Discovery / Determining Service Requirements
-- Hardware Selection
-- Network Design
-- Facilities Planning
-- Hardware Standup
-- Software Provisioning
-- System Configuration
-- Load Balancing
-- Authentication Integration
-- Utilization & Billing Integration
-- Additional “Value Add” Services
-- Monitoring Development and Integration
-- Operational Tooling
-- Operator Training and Documentation

Storage as a Service


Yes, you can offer storage as a service.
- Don't be 'just storage'. Offer it as part of a suite of services.
- Use OpenStack Object Storage with a commodity hardware stack to develop a
cost-competitive product offering.
- Put together a cross-functional team. Many roles are needed.
- Get help. Feel free to reach out to us. We've deployed over 6 petabytes in several
environments and can help design a solution for your needs.

KT ucloud storage service
with openstack object
storage

OpenStack conference

Jinkyung Hwang
KT Cloud Business Unit/PEG

jkhwang@kt.com

Putting ACTION into practice completes the innovation of our corporate culture.


What we did
□ Swift start-up in Sept. 2010 and initial build-up with Chef deployment in Dec. 2010
- SAIO --> Swift on multi-servers --> Swift on VM --> Swift with Chef
- (callout: Austin 1.1)

□ Deployment in the KT data center
- 1 petabyte
- (callout: Bexar 1.2 with SWauth)

□ Customer service & interworking
- portal, CDN interworking, API server, and other cloud services in KT

□ Beta test service from March 2011~
- hundreds of customers
- with performance testing and system tuning
- (callout: Middleware addition for CDN & Open API)


What we did – automatic deployment
□ Swift deployment with Chef
[Diagram: automatic deployment flow from clean hardware to Swift-ready hardware. Components shown: IPMI (server booting), DHCP server (IP allocation per MAC), TFTP server (OS URL and kickstart URL per MAC), OS image server with mirror repository and kickstart file (OS install), and Chef (role install per IP).]

What we did - services
□ user portal : cs.ucloud.com/ss
* cs: compute service
* ss: storage service

< products >


What we did – Cyberduck, Gladinet, Cloudfuse IW


What we did - architecture
□ KT Swift is based on , designed with

□ Currently, interworking with Cloud services of KT and 3rd party
services with API are underway

[Diagram: the portal, CDN, 3rd party tools, and compute cloud reach the Swift cluster (proxies and storage servers) through the Swift API. Supporting components shown: backend auth & billing systems, RDB, monitoring systems, management console, and a repository.]


What we did – performance test
□ Internal performance tests are underway with massive loads
□ ‘Advanced’ Swift bench code is used & submitted to Launchpad
- http://bazaar.launchpad.net/~jkyoung0/+junk/bench_server/files
- auth create, delete, authenticate (get url & token), container create, delete, file upload,
download and delete

□ Still tuning the cluster before launch


Issues to solve
□ Tuning for best/optimal performance
- the bottleneck appears to be disk I/O rather than network bandwidth
- tuning of system parameters as well as Swift config values is necessary

□ Lookup ID middleware for CDN, API server interworking
- KT added ‘cdn-uri lookup’ and ‘portal-id lookup’ middleware to retrieve the Swift URI
from a CDN URI or user ID
- a general lookup middleware is necessary for service interworking

□ Statistics (1.2.0)
- some values appear incorrect and bugs exist

□ Management & operations tools are necessary
- system monitoring and Swift management, such as a ring re-balancer

□ Revision control visibility for commercial services
- As a service provider, updating with almost no downtime is important.
- Need more visibility on the upgrade path.
- e.g. Ubuntu latest 10.10 vs. Ubuntu LTS (long term support) 10.04

THANK YOU!
