Commercialization of OpenStack: Object Storage
April 26, 2010
Joe Arnold, Cloudscaling
Dr. Jinkyung Hwang, KT
Dr. Jaesuk Ahn, KT
Wednesday, April 27, 2011
Building cloud infrastructure for telcos and service providers
- Thanks to the core Swift team. They've been invaluable in sharing their knowledge about the system.
- We've brought several OpenStack Object Storage systems to market for our customers. We're leading the charge on large-scale deployments of OpenStack Object Storage.
- Our focus is on building infrastructure cloud services for telcos and service providers. To do this we've focused on integrating the hardware, software and operational components so that our customers can go to market with a fully integrated stack.
•Cloud Visionaries
•Infrastructure Cloud Services
•End-user Cloud Products
•Very involved in Korean OpenStack Community
- KT has been a visionary in the cloud computing space.
- Cloudscaling has been working with KT for about a year. In that time, Cloudscaling has helped KT launch infrastructure compute clouds, including an object storage system based on Swift.
- Released end-user cloud products
- Kicked off the Korean OpenStack Community
Billions of Objects in S3
[Chart: billions of objects stored in S3 by quarter, Q4 2006 through Q4 2010; vertical axis from 0 to 300]
- Storage is growing.
- Applications are sprouting up for tablets, games and mobile devices. That application data lives in the cloud.
- Media consumption over the internet is increasing, and so is the volume of that data.
- The need for asset storage is large.
- Users are participating and consuming more than they ever have. Social media, online video and user-generated content all contribute to the vast need for easily consumable storage systems.
Today's storage systems need to supply endless storage.
Rackspace runs billions of objects and petabytes of files.
Clearly there is demand for these types of services.
Brief Refresher on OpenStack Object Storage
Object Storage
[Diagram: API, Data Storage]
- Objects via HTTP
- Not a traditional filesystem
- Not blocks
- GET/PUT/DELETE over a REST API
- Object storage is not a traditional filesystem or a raw block device.
- It's just containers (folders) and objects (files) that are available via an HTTP API.
- It can't be mounted directly like a folder in your OS.
- There isn't random access to files and there can be multiple concurrent writers, so it's unsuitable for transactional applications like traditional relational databases. Also, it doesn't provide raw data blocks that an operating system can form into a filesystem, so it's unsuitable for booting an OS.
- Applications need to be designed with object storage in mind. As object storage is partition tolerant, it's not possible to create filesystem locks. The newest file wins.
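To make the "containers and objects over HTTP" idea concrete, here is a minimal sketch that only prepares the three kinds of request and sends nothing. The host name, account name and token are hypothetical placeholders. The /v1/<account>/<container>/<object> layout and the X-Auth-Token header follow the Swift API as I understand it, so check them against your own deployment.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical values, for illustration only.
STORAGE_URL = "https://storage.example.com/v1/AUTH_demo"
TOKEN = "example-token"


def object_url(container, obj):
    """Build the URL for one object: <storage url>/<container>/<object>."""
    return f"{STORAGE_URL}/{quote(container)}/{quote(obj)}"


def build_request(method, container, obj, data=None):
    """Prepare (but do not send) an HTTP request for one object."""
    return Request(
        object_url(container, obj),
        data=data,
        method=method,
        headers={"X-Auth-Token": TOKEN},
    )


# The whole data API is these three verbs against a URL.
put_req = build_request("PUT", "photos", "cat.jpg", data=b"...image bytes...")
get_req = build_request("GET", "photos", "cat.jpg")
delete_req = build_request("DELETE", "photos", "cat.jpg")
```

Note what is absent: no open/seek/lock calls. A PUT replaces the whole object, which is why the newest write wins.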
Upload
[Diagram: PUT into Data Storage]
- A simplified view of upload.
- A client makes a REST API request to PUT an object into an existing Container. The request is received by the cluster.
- The data is then sent to three locations in the cluster. At least two of the three writes must be successful before the client is notified that the upload was successful.
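The two-of-three rule can be sketched in a few lines. The function names are made up for illustration, and generalizing "two of three" to "a majority of the replicas" is my assumption; the notes only state the three-replica case.

```python
REPLICAS = 3  # copies written per object, as described above


def quorum(replicas=REPLICAS):
    """Smallest number of successful writes that is a majority (assumption)."""
    return replicas // 2 + 1


def upload_succeeded(write_results):
    """write_results holds one boolean per attempted replica write."""
    return sum(write_results) >= quorum(len(write_results))


# One failed write out of three still reports success to the client.
ok = upload_succeeded([True, True, False])
# Two failed writes out of three do not.
failed = upload_succeeded([True, False, False])
```

The missing third copy in the first case is not forgotten: the replicators described later bring the object back up to three copies.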
Download
[Diagram: GET from Data Storage]
- A request comes in for an Account/Container/Object. Its location is determined: a lookup in the Ring reveals which storage nodes contain that partition. A request is made to one of the storage nodes to fetch the object and, if that fails, requests are made to the other nodes.
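The lookup-then-fallback flow can be sketched as below. This is a toy ring, not Swift's: the node names, the tiny partition count and the "consecutive nodes" assignment are all assumptions for illustration. A real ring is a precomputed table that also accounts for zones and device weights.

```python
import hashlib

PART_POWER = 4  # 2**4 = 16 partitions; real clusters use far more
NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical


def partition_for(account, container, obj):
    """Hash the object path and keep the top PART_POWER bits."""
    path = f"/{account}/{container}/{obj}".encode()
    digest = hashlib.md5(path).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - PART_POWER)


def nodes_for(partition, replicas=3):
    """Toy assignment: the partition maps to consecutive nodes."""
    return [NODES[(partition + i) % len(NODES)] for i in range(replicas)]


def fetch(account, container, obj, read_from_node):
    """Try each node holding the partition until one returns the object."""
    for node in nodes_for(partition_for(account, container, obj)):
        try:
            return read_from_node(node)
        except IOError:
            continue  # this replica is unavailable, try the next one
    raise IOError("object unavailable on all replicas")
```

The client never sees the fallback; it just gets the object from whichever replica answered.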
Horizontal Growth & Concurrency
- OpenStack Object Storage is designed to have linear growth characteristics. As the system gets larger and requests increase, the performance doesn't degrade. To scale up, the system is designed to grow where needed: adding storage nodes to increase storage capacity, adding compute capacity as requests increase, and growing network capacity where there are choke points.
- Space available isn't a useful statistic alone. A key benchmark is the storage system's concurrency. Swift can be configured to handle a great number of simultaneous connections.
- It's great to have the ability to scale the storage system as your customers' applications grow.
Fantastic Durability/Availability properties
Durability - Data Persists: Auditors, Replicators, Independent Zones
Availability - Access to the data: Shared-nothing access tier, Data served by any Zone
- Durability:
- As we all know, the 2nd worst thing you can do in this business is lose someone's data. The first, of course, being to corrupt a customer's data. Durability refers to the system's ability to not lose or corrupt data.
- These systems are extremely durable. To achieve extreme durability numbers,
-- objects are distributed in triplicate across the cluster.
-- Auditors run to check the integrity of data and detect bitrot.
-- Replicators run to ensure that enough copies are in the cluster. In the event that a device fails, data is replicated throughout the cluster to ensure that three copies remain.
- Availability: the ability for the data to be accessed.
- The servers that handle incoming API requests scale up just like any "front-end" tier for a web application. The system is architected to use a shared-nothing approach and can use the same proven techniques that many web applications have used to provide high availability.
- Early in a client deployment we went into pre-production (closed beta) without monitoring, and a server failed without our noticing it. There was no service interruption, and Swift dutifully replicated data across to other nodes to keep 3 copies of data in place. We finally noticed when peak throughput numbers weren't quite as high as they had been. This really points out the robustness of the Swift architecture.
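A rough sketch of what the auditor and replicator roles amount to, under the assumption that a checksum (MD5 here) is recorded at upload time and compared later. The function names are illustrative and are not Swift's internal API.

```python
import hashlib

TARGET_COPIES = 3  # the triplicate storage described above


def checksum(data: bytes) -> str:
    """Checksum recorded when the object is first stored (MD5 assumed)."""
    return hashlib.md5(data).hexdigest()


def audit(stored_data: bytes, recorded_checksum: str) -> bool:
    """Auditor role: do the bytes on disk still match the recorded checksum?"""
    return checksum(stored_data) == recorded_checksum


def copies_to_restore(healthy_copies: int, target: int = TARGET_COPIES) -> int:
    """Replicator role: how many copies must be re-created elsewhere."""
    return max(0, target - healthy_copies)


original = b"customer data"
recorded = checksum(original)
bitrot = b"customer dat\x00"  # one corrupted byte

still_good = audit(original, recorded)
corrupted_detected = not audit(bitrot, recorded)
```

A copy that fails the audit counts as lost, so `copies_to_restore(2)` asks for one fresh replica, which is the same path a failed drive or server takes.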
Zones: Failure Boundaries
[Diagram: zones 1, 2, 3]
- Another feature is the ability to define failure zones. Failure zones allow a cluster to be deployed across physical boundaries, each of which could individually fail. For example, a cluster could be deployed across several nearby data centers and be able to survive multiple data center failures.
- Three copies of each piece of data are distributed across zones.
- We go for rack-per-zone. That means we plan for rack outages of storage servers.
- At Swift's smallest, a zone could be a single drive or a grouping of a few drives. This scale of deployment is quite useful for creating development / staging environments.
Five Zones
[Diagram: zones 1, 2, 3, 4, 5]
How this translates into a deployment:
- Everything in Swift is stored, by default, three times. There are three copies of just about everything the system needs to store data.
- In order for three copies of the data to be stored, at first blush it seems like it would make sense for there to be three zones. However, Swift is designed to be a durable, highly available system. It needs its three copies of everything, at all times.
- If a Zone goes down in a three-Zone system, there will only be two zones left!
- Five Zones is recommended as a starting point because if a Zone goes down, there will be other zones for data to be replicated to. Having at least five zones leaves enough wiggle room to accommodate the occasional Zone failure and enough capacity to replicate data across the system.
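The zone-count argument is simple arithmetic, sketched here under the assumption that each replica must sit in a different zone. The function names are illustrative.

```python
REPLICAS = 3


def placeable_replicas(zones_up, replicas=REPLICAS):
    """Replicas that can sit in distinct zones, given the zones still up."""
    return min(replicas, zones_up)


def survives_zone_failures(total_zones, failed=1, replicas=REPLICAS):
    """True if every object can still have all its replicas in distinct zones."""
    return placeable_replicas(total_zones - failed, replicas) == replicas


three_zone_ok = survives_zone_failures(3)           # only 2 zones remain
five_zone_ok = survives_zone_failures(5)            # 4 zones remain
five_zone_two_down = survives_zone_failures(5, failed=2)  # 3 zones remain
```

Four zones would pass the single-failure check too. The five-zone starting point adds headroom for a second failure and spreads the re-replication load over more spare capacity.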
Object Storage for Service Providers
- I'd like to recommend OpenStack Object Storage (Swift), what else?
- The software has been battle-tested by the huge deployment at Rackspace: billions of objects and petabytes of storage.
- Something is never 'proven' until it's running at scale. So, by that measure, Rackspace Cloud Files (and Swift) is proven. No other object-storage-type system available is proven deployed at this scale.
- We at Cloudscaling have been working with Swift since its initial launch in July of last year.
- Now, with KT and other commercial installations, momentum is building behind this project.
- Who should deploy it? What does it look like? What should you know going in?
Storage is Not an Island
- Must have a reason to offer storage
- Storage is an anchor service
- Grounded with other compelling services where storage is a component.
- Data is sticky. Application migration is easy. Data migration is tricky. Moving data around is difficult, often requires downtime, or is trickier to orchestrate.
-- Bring customer data into your ecosystem/platform.
- AWS S3 offered free inbound transfer (TX-in) for a very long time. It offers low-cost physical media import, so that it can get as much of its customers' data into its ecosystem as possible.
-- S3 grew like crazy with EC2 right next door, with 150% y/o/y growth. This is staggering.
- When building a storage product, there must be /compelling/ reasons for customers to put data into it.
-- That can be:
--- convenience
--- access to compute resources
--- features associated with the uploaded data (transcoding, data processing)
--- even legal or compliance reasons.
Have an Advantage
KT: leading South Korean landline, mobile, internet, IPTV.
What's your unfair advantage?
- KT is the leading provider in Korea for internet, mobile, and IPTV.
- They have a huge network advantage for providing services to end-users.
- Not only that, South Korea intends to connect every home in the country at gigabit speeds. http://www.nytimes.com/2011/02/22/technology/22iht-broadband22.html
- KT is in a unique position from a network perspective to offer the platform of services to serve this market. The media assets, consumer media assets, need a place to reside that is well connected to the Korean consumers of these services. Regional service providers have a distinct edge in providing services to their local market.
- Other, out-of-country providers won't have the same cost advantages or quality of service for that market.
- Other unique assets from some of our other customers include
-- Colocation facilities and an existing customer base of managed hosting customers
-- Extensive CDN services. Object Storage serves as a jumping-off point for CDN services.
Be Compatible
- The contrarian point here is that for all the advantages you are going to present to your users, the service needs to remain compatible with the tooling ecosystem.
- At one client meeting we were going down the path of differentiation: what makes this product unique? The answer, of course, was: nothing! That's the point. In fact, we've been working hard to make sure that you are compatible with the ecosystem of tools that are available for end-users of the service.
-- We've worked with and contributed back to the open-source libraries.
-- We've worked with OpenStack vendors like Nasuni and Gladinet to make our implementations outside of Rackspace work.
- What is distinct is the bundle of services that you provide your customers, the customer base that you already have, and the network access that you enjoy.
- One of the huge assets that OpenStack brings is the ecosystem of tools that come to the party:
-- Commercial vendors such as Nasuni and Gladinet
-- Open-source tools such as Cyberduck, fog, and Rackspace's own Cloud Files language bindings/libraries for C#, Java, Ruby, PHP and Python.
- You don't need to build these per se, but you do need to ensure compatibility with your service.
-- Lots of little issues needed to be addressed (adding alternate Cloud Files URLs, fixing port issues with Cyberduck, SSL cert issues with Gladinet, different formats of keys, usernames and passwords), so you will need to make sure that these tools are compatible with your deployment.
- The differentiation is still important! Differentiation should be in providing services on top of the infrastructure and building platform services or other infrastructure services based on storage.
Online Service Providers / Private
- Huge Flat Namespace
- Repatriation from public clouds
I know that this is the service providers track. But it's worthwhile to address folks who are building online services or who have a need to provide private solutions.
Huge Flat Namespace
- Accounts -> Containers -> Objects
- Proliferation of storage systems requires knowledge of what data is located where. The extreme scaling options of Swift can solve some of these issues.
- Each storage cluster can grow to be several petabytes, and for regional or additional scaling the authentication service can route users to different clusters if need be.
Repatriation from public clouds
- For those who are thinking about bringing their data back in house, using a system that is architecturally compatible with popular cloud storage products like S3 and Cloud Files can make a lot of sense.
- The major reason is that an application doesn't need to be re-architected.
- Use something that still delivers the durability and reliability, and is not just API compatible.
Building the System
[Diagram layers: Ecosystem, Billing, Portal, Authentication, Installer, Front-End, Network, Ops, Hardware, Data Center]
You must build it
- Development effort. So you must consider the R&D expense.
- Ramp up a development team to understand the core of Swift
- Development of integration components
OpenStack Object Storage provides a core of services and functionality.
- You can't just sudo apt-get install openstack
- OpenStack Object Storage is a solid foundation, but it must be supported by a host of services. Let's go into a few.
Billing
- There is utilization tracking as part of Swift in the Cactus release. It's much better, but it's still tricky.
- Many steps are involved here. I'll address two things that I think are unique to the object storage system.
- Charge per GB stored
- Charge for TX ingress/egress
- Charge for # of API requests
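The three billing dimensions combine into a monthly charge along these lines. Every rate below is a made-up placeholder, not a price from this talk or from any provider, and the usage figures would come from whatever utilization tracking your deployment exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical rates in dollars, for illustration only."""
    per_gb_stored: float = 0.10
    per_gb_in: float = 0.00
    per_gb_out: float = 0.10
    per_1000_requests: float = 0.01


def monthly_charge(gb_stored, gb_in, gb_out, api_requests, rates=Rates()):
    """Sum the three billing dimensions: storage, transfer, API requests."""
    storage = gb_stored * rates.per_gb_stored
    transfer = gb_in * rates.per_gb_in + gb_out * rates.per_gb_out
    requests = api_requests / 1000 * rates.per_1000_requests
    return round(storage + transfer + requests, 2)


# 500 GB stored, 100 GB in, 200 GB out, one million API requests.
example = monthly_charge(500, 100, 200, 1_000_000)
```

With these placeholder rates the example customer pays for storage (50), egress (20) and requests (10), with ingress free.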
Pricing
Consumption Pricing vs Capacity Pricing
Further, there is a decision to be made on consumption pricing vs. capacity pricing.
When you typically go to buy bandwidth, you are charged at the 95th percentile. You pay for bandwidth that goes unused because you're paying for the capacity to be available, with some wiggle room for extraordinary bursts.
So service providers are having to figure out how to deal with this.
It's a bigger deal at a smaller scale. A single customer could come in and consume a large amount of a cluster on a percentage basis.
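For anyone who hasn't seen 95th-percentile billing worked through: sort the bandwidth samples, throw away the top 5%, and bill on the highest one left. The sample values are invented, and carriers differ on sampling interval and rounding, so treat this as a sketch of the idea only.

```python
def percentile_95(samples_mbps):
    """Highest sample remaining after the top 5% are discarded."""
    ordered = sorted(samples_mbps)
    n = len(ordered)
    # ceil(0.95 * n) using integer arithmetic, then convert to a 0-based index.
    index = (95 * n + 99) // 100 - 1
    return ordered[index]


# 95 quiet samples at 100 Mbps and 5 bursts at 900 Mbps (made-up numbers).
samples = [100] * 95 + [900] * 5

billed_mbps = percentile_95(samples)          # bursts fall in the discarded 5%
average_mbps = sum(samples) / len(samples)    # what was actually consumed
```

Here the provider is billed on 100 Mbps while its customers consumed an average of 140 Mbps. Stretch the bursts to more than 5% of the samples and the billed figure jumps to 900 Mbps. That gap between what you pay upstream and what you can charge downstream is the pricing decision described above.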
Authentication & User Management
- Two real options.
- 1) Use the existing authentication service that is built into Swift. Swift comes with an authentication service that stores account information within the cluster itself.
-- The benefit is that the cluster is more self-contained and not dependent on any external services that could result in availability issues for your customers.
-- However, that means integration. If you're supporting a large customer base that has access to other services, and you want to centralize things so that customers' accounts and credentials are manageable, more integration effort is required.
- 2) Build your own authentication service. There is an API defined. Build to that spec, and make sure its scale-out / HA properties are something that you're comfortable with.
-- The benefit is that the authentication system remains centralized and can serve a range of services for the customer. If this is part of a larger IT initiative or part of a broader cloud computing offering, it's desirable to provide end-users with a consistent way to manage and use credentials.
-- The downside is that it's another component to build and maintain.
Load Balancing
- One of the great properties of the architecture of Swift is its ability to scale out horizontally to handle increasing API access (GET/PUT/DELETE).
- An incoming request does not need to be processed by a centralized storage controller.
- Load balancing can be handled by many mechanisms that have been refined over the past 15 years.
- The complexity of this setup will vary with the needs of the deployment. It can be as simple as round-robin DNS or Pound, or it can use commercial load balancing solutions like a Netscaler. Whatever load balancer is used, a health check needs to be written for the load balancer to monitor.
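A health check can be as small as the sketch below: request a known URL on each proxy and keep the node in rotation only if it answers 200. The /healthcheck path is an assumption here (Swift ships a healthcheck middleware, but confirm the path in your proxy configuration), and a real load balancer would run its own probe rather than this script.

```python
from urllib.error import URLError
from urllib.request import urlopen


def proxy_is_healthy(base_url, path="/healthcheck", timeout=2.0):
    """Return True if the node answers the health-check URL with HTTP 200."""
    try:
        with urlopen(base_url + path, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def healthy_pool(base_urls, check=proxy_is_healthy):
    """Keep only the nodes that currently pass the health check."""
    return [url for url in base_urls if check(url)]
```

The `check` parameter exists so the pool logic can be exercised without a network, for example `healthy_pool(urls, check=lambda url: url in known_good)`.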
Storage Nodes
- 24-48 GB RAM
- 36-48 2TB drives
- SATA
- No RAID
- Newish Xeon
The Hardware
Storage Nodes
- 36-48 disk JBODs
- 24-48 GB RAM
- Go for good price/performance CPUs: Xeon E5620s / E5640s.
-- They don't just serve data; they also run replicators and auditors.
- While commodity, these are not just JBODs (Just a Bunch of Disks). There is a reasonable amount of memory and CPU. Metadata needs to be readily available to quickly return objects. The object stores each run services not only to field incoming requests from the Access Tier, but also to run replicators, auditors, reapers, etc.
- Our configurations currently run 2TB SATA disks without RAID. We use desktop-grade drives where we have more-responsive remote hands in the datacenter and enterprise-grade drives elsewhere.
- SATA desktop drives (not green drives).
-- We placed an order with another drive vendor (who will go nameless). Based on the order size (~$300k worth of drives), the vendor refused to fill the order because it was obvious to them that we were not using the drives for desktop applications.
Proxy Nodes
- Proxy Servers
- Authentication Servers
- 24 GB RAM
- 10 GbE
- Newish Xeon
Proxy Nodes
- Go for the "sweet spot" in price/performance (Xeon E5620s / E5640s), as it's better to have many of them and scale out than to have fewer monster machines.
- Dual 10GbE
- 12-44GB RAM
- Cloudscaling's deployments segment off an "Access Tier". This tier is the "Grand Central" of the Object Storage system. It fields incoming API requests from clients and moves data in and out of the system. This tier is composed of front-end load balancers, SSL terminators and authentication services, and it runs the Proxy server processes.
- These access servers are in their own tier. This enables read/write access to be scaled out independently of storage capacity. For example, if the cluster is on the public internet with demanding needs on SSL termination and data access, many access servers can be provisioned. However, if the cluster is on a private network and is being used primarily for archival purposes, fewer access servers are needed.
- We deploy a collection of 1U servers to service this tier. These systems use a moderate amount of RAM and are CPU intensive. As these systems field each incoming API request, we recommend two high-throughput (10GbE) interfaces: one interface for "front-end" incoming requests, the other for "back-end" access to the object stores to put and fetch data.
Factors to consider:
- For most publicly facing deployments, or private deployments available across a wide-reaching corporate network, SSL will be used to encrypt traffic to the client. SSL adds significant processing load to establish sessions with clients, so more capacity will need to be provisioned in the access layer. SSL may not be required for private deployments on trusted networks.
- Application intensive vs. archive oriented. Simply put, the volume of requests will have an impact on the provisioning of the access tier.
Networking
[Diagram: two aggregation switches, four proxies, two zone switches, four object stores]
Networking
- A pair of aggregation switches with two links back to the access network / border network. The aggregation switches connect to two pools of the Access Tier and to each of the five Zone switches that connect the Object Stores. All connections to the Access Tier and the Zones are 10GbE.
- Zone Network
-- Each Zone has a switch to connect itself to the aggregation network. We run a single, non-redundant switch, as the system is designed to sustain a Zone failure. Depending on the overall concurrency desired, Cloudscaling will deploy either a 1GbE or a 10GbE network to the object stores.
- Remember that when you have a write coming into the proxy server, you have 3x going to the object stores to write the three replicas. Be sure to account for that when figuring out the theoretical limits for read/write traffic. Typically, the expected bandwidth coming in is the ceiling.
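The 3x write fan-out is worth putting into numbers when sizing links. This is a back-of-the-envelope sketch with illustrative link speeds; it ignores protocol overhead, reads and replication traffic.

```python
REPLICAS = 3


def backend_write_gbps(client_write_gbps, replicas=REPLICAS):
    """Traffic from proxies to object stores for a given client write rate."""
    return client_write_gbps * replicas


def max_client_write_gbps(frontend_gbps, backend_gbps, replicas=REPLICAS):
    """Client write rate is capped by the front-end link, and by the
    back-end capacity divided by the number of replicas."""
    return min(frontend_gbps, backend_gbps / replicas)


# 2 Gbps of client uploads becomes 6 Gbps toward the object stores.
fan_out = backend_write_gbps(2)

# One proxy with a 10GbE front-end and a 10GbE back-end interface:
# the back-end link limits sustained writes to about 3.3 Gbps.
single_proxy_limit = max_client_write_gbps(10, 10)
```

Once the aggregate back-end capacity is at least three times the inbound bandwidth, the incoming link becomes the ceiling, which matches the note above.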
Raw System Costs
- TCO caveat: There are many components that are part of the TCO of the entire cluster.
-- Facilities, power, cooling, network, NOC staff
-- Many of those factors are site-specific
Raw System Costs
1 Petabyte: ~$750,000 ($0.75/GB)
- 2 Agg Switches
- 6 Proxy/Auth Servers
- 5 ToR Switches
- 50 Object Stores
- ...and cables, racks, etc.
120 Terabyte: ~$95,000 ($0.79/GB)
- 2 ToR Switches
- 2 Proxy/Auth Servers
- 5 Object Stores
- ...and cables, rack, etc.
- These figures illustrate hardware pricing as a baseline.
- All-in hardware costs (switching, load balancing, storage nodes, optics, cabling, forged metal for the racks, PDUs)
-- (To note: Amazon's retail pricing for S3 is $0.140 - $0.055 per GB per month)
-- That price is going to go down as hardware prices go down.
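The per-GB figures on the slide follow from simple division, sketched below with decimal units (1 TB = 1000 GB). The second function is my own inference rather than something stated in the talk: with 36 drives of 2TB per node and three replicas, five object stores come to 120 TB, which suggests the slide capacities are usable (post-replication) numbers.

```python
def cost_per_gb(total_cost_usd, capacity_tb):
    """Hardware cost per GB of capacity, using 1 TB = 1000 GB."""
    return total_cost_usd / (capacity_tb * 1000)


def usable_tb(nodes, drives_per_node, tb_per_drive, replicas=3):
    """Capacity left after storing every object `replicas` times.

    Assumes (not stated in the talk) that the slide's capacities are usable
    capacity and that nodes carry 36 drives, the low end of the 36-48 range.
    """
    return nodes * drives_per_node * tb_per_drive / replicas


large = round(cost_per_gb(750_000, 1000), 2)   # the 1 petabyte system
small = round(cost_per_gb(95_000, 120), 2)     # the 120 terabyte system
small_capacity = usable_tb(5, 36, 2)           # five object stores
```

Keep in mind when comparing with the S3 prices above that these are one-time hardware costs, while S3's rate recurs every month and includes everything the TCO caveat lists.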
Understanding TCO
- Total cost of ownership for the cluster should include development costs, hardware and ongoing costs.
These include:
-- Design/Development
-- Hardware
-- Hardware Standup
-- Datacenter Space
-- Power/Cooling
-- Networking
-- Ongoing Software Maintenance and Upgrades
-- Operational Support
-- Customer Support
Planning Checklist
•Product Service Requirements
•Hardware Selection
•Network Design
•Facilities Planning
•Hardware Standup
•Software Provisioning
•System Configuration
•Load Balancing
•Authentication Integration
•Utilization & Billing Integration
•Additional Platform Services
•Monitoring Integration
•Operational Tooling
•Operator Training and Documentation
•Customer Training and Documentation
- There are many pieces that need to come together for a successful project, and many groups that must come together to design, build, deploy, integrate, operate and onboard customers.
Assemble a cross-functional team, as there are many hats that are needed for a successful standup:
- Data center technicians to help plan the power/cooling needed at the DC.
- Networking experts to help design and plan out the network.
- A great software development team to write the integrations needed and fix issues related to the software systems of the cluster.
- Systems administrators. Swift is built around common Unix tools, and folks with good systems administration skills can really help tune a running system.
- A product/sales team who can communicate the value to customers and bring the product to market.
Consider these activities during your planning phase:
-- Customer Discovery / Determining Service Requirements
-- Hardware Selection
-- Network Design
-- Facilities Planning
-- Hardware Standup
-- Software Provisioning
-- System Configuration
-- Load Balancing
-- Authentication Integration
-- Utilization & Billing Integration
-- Additional "Value Add" Services
-- Monitoring Development and Integration
-- Operational Tooling
-- Operator Training and Documentation
Storage as a Service
Yes, you can offer storage as a service.
- Don't be just storage; offer it as part of a suite of services.
- Use OpenStack Object Storage with a commodity hardware stack to develop a cost-competitive product offering.
- Put together a cross-functional team. Many roles are needed.
- Get help. Feel free to reach out to us. We've deployed over 6 petabytes in several environments and can help design a solution for your needs.
KT ucloud storage service with OpenStack Object Storage
OpenStack conference
Jinkyung Hwang
KT Cloud Business Unit/PEG
firstname.lastname@example.org
Putting ACTION into practice completes the innovation of our corporate culture.
What we did
□ Swift start-up in Sept. 2010 and initial build-up with Chef deploy in Dec. 2010 (Austin 1.1)
SAIO --> Swift on multi-servers --> Swift on VM --> Swift with Chef
□ Deployment in KT data center (Bexar 1.2 with Swauth)
1 petabyte
□ Customer service & interworking
portal, CDN interworking, API server, and other cloud services in KT
□ Beta test service from March 2011~ (Middleware for CDN & Open API addition)
hundreds of customers
with performance testing and system tuning
What we did – automatic deployment
□ Swift deployment with Chef
[Diagram: automatic deployment, from clean hardware to Swift-ready hardware. Labels: IPMI server booting; DHCP server (IP alloc for MAC); TFTP server (OS url, kickstart url per MAC); OS image mirror repository; kickstart file; auto deploy server; Chef server (roles per IP); install OS; install role; success]
What we did - services
□ user portal : cs.ucloud.com/ss
* cs: compute service
* ss: storage service
< products >
What we did – Cyberduck, Gladinet, Cloudfuse interworking
What we did - architecture
□ KT Swift is based on [logo], designed with [logo]
□ Currently, interworking with KT cloud services and 3rd party services via the API is underway
[Diagram labels: Portal; Swift Cluster (proxies, storage servers); CDN; Swift API; "S DR,A"; 3rd party tools; repository; Compute cloud; Monitoring; Management Console; Backend auth & billing systems; RDB systems]
What we did – performance test
□ Internal performance tests are underway with massive loads
□ 'Advanced' Swift bench code is used & submitted to Launchpad
http://bazaar.launchpad.net/~jkyoung0/+junk/bench_server/files
- auth create, delete, authenticate (get url & token)
- container create, delete
- file upload, download and delete
□ Still tuning the cluster before launch
Issues to solve
□ Tuning for best/optimal performance
- The bottleneck appears to be disk I/O rather than network bandwidth.
- Tuning of system parameters as well as Swift config values is necessary.
□ Lookup ID middleware for CDN and API server interworking
- KT added 'cdn-uri lookup' and 'portal-id lookup' middleware to retrieve the Swift URI from a CDN URI or user ID.
- A general lookup middleware is necessary for service interworking.
□ Statistics (1.2.0)
- Values appear to be incorrect, and bugs exist.
□ Management & operations tools are necessary
- System monitoring and Swift management tools such as a ring re-balancer, etc.
□ Revision control visibility for commercial services
- As a service provider, upgrading with almost no downtime is important.
- Need more visibility on the upgrade path, e.g. latest Ubuntu (v10.10) vs. Ubuntu LTS (long term support, v10.04).
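To show the general shape of a lookup middleware like the one mentioned above, here is a sketch written as plain WSGI. This is not KT's code: the class name, the /cdn/ prefix and the in-memory lookup table are all hypothetical, and a production version would be wired into the proxy pipeline and would query a real mapping service.

```python
class LookupMiddleware:
    """WSGI middleware that rewrites an external ID in the path to a Swift path."""

    def __init__(self, app, lookup_table, prefix="/cdn/"):
        self.app = app                    # the next WSGI app in the pipeline
        self.lookup_table = lookup_table  # external ID -> Swift path (hypothetical)
        self.prefix = prefix

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.startswith(self.prefix):
            external_id = path[len(self.prefix):]
            swift_path = self.lookup_table.get(external_id)
            if swift_path is None:
                start_response("404 Not Found", [("Content-Type", "text/plain")])
                return [b"unknown id\n"]
            # Hand the request on as if the client had asked for the Swift path.
            environ["PATH_INFO"] = swift_path
        return self.app(environ, start_response)


def echo_app(environ, start_response):
    """Stand-in for the rest of the pipeline: returns the path it was given."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["PATH_INFO"].encode()]


pipeline = LookupMiddleware(echo_app, {"abc123": "/v1/AUTH_demo/videos/clip.mp4"})
```

Requests outside the prefix pass through untouched, so the same middleware can sit in front of normal API traffic.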
THANK YOU!
April 26, 2010
Joe Arnold, Cloudscaling
Dr. Jinkyung Hwang, KT
Dr. Jaesuk Ahn, KT