CMPE 297 Lecture: Building Infrastructure Clouds with OpenStack
Lecture for the San Jose State master's program on cloud computing. The topic focuses on using OpenStack to deploy infrastructure clouds with commodity hardware and open source software. Covers virtualization, networking, storage, deployment, and operations.

CMPE 297 Lecture: Building Infrastructure Clouds with OpenStack Document Transcript

  • 1. Building Infrastructure Cloud Computing with OpenStack. Joe Arnold, CMPE 297, April 11, 2011.
  • 2. Overview • Taxonomy of Cloud Computing • Public Clouds, Private Clouds • OpenStack Compute • OpenStack Object Storage
  • 3. Introduction • Aruba AirWave network management • Yahoo! - Bangalore, India • Engine Yard • Cloudscaling
    -- Engineering manager at a startup that was acquired by Aruba Networks. Network management; 10,000s of devices under management.
    -- Bangalore, India office for Yahoo!
    -- Head of engineering at Engine Yard, a Ruby on Rails deployment platform.
    --- Early cloud player. Built our own infrastructure cloud.
    --- Early serious Amazon AWS user.
    ---- They invested in us, but they were tight-lipped about their roadmap.
    --- Managed the team that built the current Ruby on Rails platform.
    - Current: helped found Cloudscaling.
    -- Founded a year ago.
    -- Over the summer, built Korea Telecom's infrastructure cloud.
    -- Un-named customer: huge object storage system, the largest of its kind outside Rackspace (we'll get into that later).
    -- Building infrastructure clouds. Focus on utility computing. Help our customers drive costs down in facility, hardware, software, and operations.
    - Exercise: What's your experience in the cloud?
    -- Used the web? :)
    -- Who has IT experience? Ran a webserver or mailserver, configured a network of computers?
    -- Who has programming experience?
    -- Who has provisioned a machine in the cloud? Amazon, Rackspace, etc.?
  • 4. Building cloud infrastructure for telcos and service providers
    What we do: help telcos and service providers build the most competitive clouds in the world. We build large-scale, highly competitive public clouds for global telcos and service providers.
  • 5. Taxonomy of Cloud Computing: SaaS - Software-as-a-Service; PaaS - Platform-as-a-Service; IaaS - Infrastructure-as-a-Service
    There is a stack of services which all get referred to as "cloud." What is the taxonomy that will frame the discussion?
    - SaaS: Software-as-a-Service. Examples: Salesforce, Gmail.
    - PaaS: Platform-as-a-Service. Examples: AWS Beanstalk, Engine Yard, Twilio.
    - IaaS: Infrastructure-as-a-Service. Examples: Amazon Web Services, GoGrid, Rackspace Cloud.
    - The focus of the discussion will be on IaaS.
    - The lines between them are grey.
  • 6. Virtualization -> Infrastructure Cloud • Server consolidation drives virtualization • Rise of VPS & Enterprise DC Management • Amazon launches S3, EC2 • Tooling emerges around ecosystem
    - Bill Gates "Blue Screen of Death" demo at Comdex.
    - VMware demoed a 'blue screen'. Great for test/dev. A solid little market.
    - The message didn't click until the server consolidation market was "discovered."
    - S3/EC2 launched around 2006.
    - EC2 came out of beta in late 2008.
  • 7. Infrastructure as a Service (IaaS) • Virtual Machine Instances • Network Services • Data Storage • On Demand • Via an API • Pay-per-use • Multi-tenant
    What is considered infrastructure cloud computing? It is not just virtualization of compute resources. It must provide services: virtual machine instances, network services, and data storage, delivered on demand, via an API, pay-per-use, on a multi-tenant platform.
  • 8. Public Cloud vs. Enterprise IT • Major disruption in traditional IT • IT is aggressively being automated • Applications are not migrating to the cloud
    Public Cloud vs. Enterprise IT (or Enterprise Cloud). Public infrastructure clouds blazed the trail; internal IT is looking for the same cost advantages.
    - Major disruption in traditional IT.
    -- I used "vs." rather than "&" because infrastructure cloud is a major disruption in traditional IT.
    -- Hundreds of systems per admin vs. thousands of systems per admin: a system administrator may run, say, 100 systems in traditional IT, while an infrastructure cloud provider will run thousands of systems per administrator/developer.
    - IT is aggressively being automated; infrastructure cloud computing is the next evolution.
    - The evolution is hitting an inflection point where some companies are opting out of traditional system administrators altogether.
    - Example: running a computer-science lab plus automation.
    - Example: the shift in Engine Yard's operating model, from 80% systems : 20% software to 30% systems : 70% software.
    - Generalization: applications are not migrating to the cloud because migration is more expensive than the operational savings.
    -- Greenfield applications are being deployed to the cloud (there has been a dramatic improvement in configuration automation / deployment tools).
    -- Legacy applications are staying put, or being replaced by software-as-a-service (which are greenfield applications deployed in the cloud). How many new Microsoft Exchange installs are happening these days?
  • 9. Cost Basis of Infrastructure Clouds. Exercise: What is the cost basis of infrastructure cloud hardware? Building a business case: • Need margin for: facilities (space, power, cooling), technical operations • You can sell a 10 GB RAM virtual machine for 50 cents an hour ($360/month) • You can buy 40 GB RAM machines • The practical life of the machines is 5 years. How much can you spend on hardware/software to build the infrastructure and still give a reasonable margin (70%) to the rest of the business?
    -- $360/mo * 4 VMs/machine * 60 months of useful life * 30% target expense = $25,920
    -- That's the target that needs to be hit for the machine and its tenancy costs in the system (rack, network ports).
    -- Software licensing costs: hypervisor, infrastructure management, monitoring, operational support systems.
    - AWS's cost basis.
    - To enter the game, your cost basis must be on par with existing infrastructure clouds.
    -- Market price + your value add (control, security, network, physical location, unique features, etc.).
    - Private / enterprise IT is competing with infrastructure cloud computing.
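    A minimal sketch of the arithmetic behind that target number, using only the slide's own assumptions (50 cents/hour, 4 x 10 GB VMs per 40 GB host, 5-year life, 30% hardware budget):

```python
# Back-of-the-envelope cost-basis check for the exercise above.
vm_price_per_month = 0.50 * 720            # 50 cents/hour, ~720 hours/month = ~$360
vms_per_machine = 40 // 10                 # 40 GB host / 10 GB per VM = 4 VMs
useful_life_months = 5 * 12                # 5-year practical life

lifetime_revenue = vm_price_per_month * vms_per_machine * useful_life_months
hardware_budget = lifetime_revenue * 0.30  # 30% expense target leaves 70% margin

print("Lifetime revenue per machine: $%.0f" % lifetime_revenue)   # ~$86,400
print("Target hardware/tenancy cost: $%.0f" % hardware_budget)    # ~$25,920
```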
  • 10. Applications in the Cloud. Not the same services as a traditional DC / hosting: • Networking is different • Storage is different • Design for Failure • Reliance on Configuration Management
    What are the architectural shifts needed to run applications in the cloud?
    - Infrastructure clouds are not just the same compute resources packaged up via an API.
    - Networking is radically different.
    - Storage is different. Reddit's recent outage was due to them using EBS (Amazon's block storage) incorrectly.
    - Design for failure, not high availability.
    - Increased importance of monitoring.
    - Increased importance of configuration management.
  • 11. Compute Virtualization Management • VM Management • Networking Management • Storage Management • User Management • Multi-tenancy • Scheduling • Available via an API • Machine Image Management
    What are the key features for cloud building?
  • 12. OpenStack Introduction • OpenStack is open source infrastructure cloud software • NASA: Compute Management • Rackspace: Object Storage
    OpenStack is a fusion of two projects, one from Rackspace and another from NASA. Rackspace wanted to build the next version of their cloud in the open.
  • 13. Large Community. 53 companies: http://openstack.org/community/companies/
    I do represent a company that is part of the OpenStack community, one of many companies who are part of it.
  • 14. Make a Bigger Pie
    Symbiotic ecosystem: the whole product and experience gets better as more people and companies get involved. These companies want to create a large ecosystem that they are a part of.
    Why would Rackspace want to do this? After we did our launch of OpenStack Object Storage, there was a team of folks at Rackspace who had to go cube-to-cube to explain why it's good for Rackspace.
    It's a common growth pattern: when you're behind, open up. Apple's iPhone (at least initially) and Facebook did this.
    OpenStack has ecosystem potential. There can be a lot of players who are all building stuff around this project. So the next question is: how does OpenStack foster an ecosystem that you can be a part of?
  • 15. OpenStack is Open Source • Apache 2 - permissive license • Organizational Structure • Blueprints • Governance Board • Very Open Culture
    You know the two broad categories of open-source licenses, right?
    - Permissive: Apache 2, BSD.
    - Viral: GPL.
  • 16. OpenStack is very healthy for our industry • Amazon's presence is too dominant • While Amazon is innovative itself... • ...they have created a proprietary ecosystem • A multi-company ecosystem will foster broader innovation and relevant competition
    Follow me for a minute: I'm going to use Amazon AWS as my big bad wolf, because they're the reigning leader in this race right now. Amazon has created a de facto and proprietary ecosystem. They were first and are very, very good.
    Amazon created this ecosystem with a positive feedback loop: the more PaaS offerings run on Amazon, the better their ecosystem becomes. However, if Amazon decides to add a new feature to their platform, they could crush you. It's like if Apple came out with their own version of Angry Birds (Angry iBirds?). There is precedent for this: Cloudera (Hadoop) vs. AWS MapReduce.
    This positive feedback loop is great if you're Amazon. But not everyone can use AWS. OpenStack is the project that offers the promise of a more open ecosystem.
  • 17. Ecosystem: OpenStack, Providers, Tools
    For innovation in cloud computing to continue, we must have many cloud computing infrastructure players to support a diverse ecosystem. An ecosystem of tools is emerging, but we're very early in this cycle for OpenStack. There is an opportunity for a positive feedback loop:
    - The better the tools, the more compelling it will be for more implementations of Swift to come online.
    - The more implementations of OpenStack, the more attractive it becomes to build great tooling around it.
  • 18. OpenStack is part of the solution. Ecosystem: Billing, Portal, Authentication, Installer, Front-End, Network, Ops, Hardware, Data Center
    OpenStack alone doesn't equal a whole service. The OpenStack code provides the kernel of functionality, but there is much software to write, systems to integrate, and hardware choices and design decisions to make. The components are: data center concerns, networking, hardware choices, the access layer, and installation.
  • 19. OpenStack Compute • OpenStack Compute is the virtualization component of OpenStack
    End rant. A bit of a corporate love triangle is going on here...
    - The NASA team: mostly all subcontractors of a company called Anso Labs.
    - Started as a quick hack experiment.
    - NASA had a need to dynamically provision compute. They tried to use Eucalyptus, but it wasn't working out for them at the time. They challenged themselves to work for a week to see what they could piece together on their own. After that week, they were so pleased with their progress that they dumped Eucalyptus and built one themselves.
  • 20. OpenStack Compute: Early & Evolving • Initially designed for NASA compute needs • Evolving to handle needs of service providers / enterprise
    Many things are what I would call 'early':
    - Account management reflects what NASA needed (much work for a service provider to integrate); IPs map to projects.
    - The API has some elements missing.
    - Scheduling is simplistic (round-robin).
    - Multi-tenancy security work remains to be done.
    - In general there is a lot of churn in the code base.
  • 21. OpenStack Compute API • AWS EC2 API • Emerging Standard
    - Currently EC2 interfaces.
    - Will it standardize on the Rackspace API?
    - Unsure how multiple, concurrent APIs will be supported.
    Compute API standards:
    - Standards emerge; you don't need to be declarative about it. Post facto, many application developers use libraries anyway (fog, jclouds, libcloud, etc.).
    - What's important here is that you will be able to stand up a local development environment with the same code that will be powering your infrastructure provider.
    - The AWS EC2 API is an emerging standard.
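    As a rough illustration of what an EC2-compatible interface buys you, here is a sketch using boto (one of the EC2 client libraries of this era) pointed at a Nova EC2 endpoint. The host name and credentials are placeholders, and the port and path reflect commonly documented Nova defaults of the time, so treat them as assumptions rather than a recipe:

```python
import boto
from boto.ec2.regioninfo import RegionInfo

# Point boto (an EC2 client library) at an OpenStack Nova EC2 endpoint.
# Host and credentials are placeholders; 8773 and /services/Cloud were
# typical Nova defaults at the time, but verify for your install.
region = RegionInfo(name="nova", endpoint="nova-api.example.com")
conn = boto.connect_ec2(
    aws_access_key_id="YOUR_EC2_ACCESS_KEY",
    aws_secret_access_key="YOUR_EC2_SECRET_KEY",
    is_secure=False,
    region=region,
    port=8773,
    path="/services/Cloud",
)

# The same calls work against AWS itself, which is the point of an
# "emerging standard": one client library, multiple providers.
images = conn.get_all_images()
reservation = images[0].run(instance_type="m1.small")
print(reservation.instances)
```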
  • 22. OpenStack Compute: Commodity Hardware
    - Image: Keith in the KT data center.
    - There are other drop-in solutions for infrastructure clouds from Cisco/EMC/VMware. This is not one of them.
    - It's designed around off-the-shelf, lower-price-point components.
    - We're using Arista for our networking layer.
    - NASA uses off-the-shelf AoE hardware for storage.
    - I imagine that we'll use Nexenta for iSCSI blocks (when that's available).
    - Compute nodes come direct from a Taiwanese manufacturer, not Dell/IBM/HP.
  • 23. OpenStack Compute Architecture
  • 24. OpenStack Compute Hypervisor Support • KVM • Xen • Citrix XenServer • ~ Hyper-V • ~ VirtualBox
  • 25. Networking • Networking is different • Challenges • Fewer options • Private networks are slow • Benefits • Reconfigurable with an API • Good at flat, large networks
    - Challenges:
    -- Generally more latent than specially designed networking configurations.
    -- Fewer options in how the network is configured.
    -- No hardware options (load balancers, security devices, firewalls, spam filters, deep packet inspection tools, etc.).
    -- Simulated private networks (VLANs, VPNs) are generally slow.
    - What they're good at:
    -- Easily reconfigurable via an API with simple tools.
    -- Good at very large, flat, layer-3 networks, because layer-3 IP networks scale really well.
    -- Routing technology is mature; ECMP/OSPF work well.
    - Implementer's perspective:
    -- When trying to scale to hundreds of switches, thousands of physical servers, and tens of thousands of virtual servers, providing a simpler network is much easier to grow and manage. Only layer-3 networks are designed for this scale.
    -- However, customers don't always want that type of networking. They'd like to take advantage of multicast, choose their own IPs, set up networking tiers, etc. This is especially true of older, legacy applications which had more flexibility during initial network design and build-out.
  • 26. TCP / IP • Ethernet 'switches' • IP 'routes'
    A refresher on the OSI & TCP/IP network stack (diagram):
    - Ethernet switches
    -- Layer 2 is simpler. Each resource has an Ethernet address (MAC address) and the switch forwards the appropriate packets to that physical device.
    -- It's simple and fast, but it doesn't scale. When there are many devices, each switch needs to know where everything is. Adding virtual machines into the mix only compounds the problem.
    -- VLANs are a tool to work around this issue by creating virtual Ethernet networks, but they too have scalability issues (the same goes for spanning tree protocol (STP) and its variants).
    - IP routes
    -- Layer 3 adds an abstraction layer on top of Ethernet.
    -- An IP address provides instructions on how to route a packet to the correct network.
    -- You often hear "what's your IP address?" or "what's the IP address of that machine?", but it's wrong to think of IP that way. An IP address is a _route_ to a particular network.
    -- Thought of in this way, an IP network can be built to an arbitrary size. And we have a working example: the Internet!
    -- The disadvantage of IP is mobility. With Ethernet, switches route packets to the right machines wherever they are on the network. With IP, if a machine needs to move, its route has changed, so packets will not reach it and its IP needs to change.
  • 27. Middle Ground Emerging • Virtualizing Layer 2 "Ethernet" networks as a middle ground • L2 over L3 • OpenFlow
    - Virtualizing layer 2 "Ethernet" networks is emerging as a middle ground.
    -- Currently, layer 2 networks on multi-tenant clouds are generally latent, implemented by routing through a dedicated customer VM, which leads to bottlenecks.
    -- More sophisticated networking tools are coming soon as cloud providers ramp up their game with new implementations (OpenFlow, L2 over L3). Notably:
    --- L2 over L3: layer 2 packets are encapsulated with routable IP, implemented on a ToR switch (optimally) or a host, to allow arbitrary network topologies.
    --- OpenFlow: think dynamic Ethernet switches that are managed by a controller to actively manage forwarding paths.
  • 28. Case Study: Korea Telecom Networking • Provide customer VLANs • Schedule customers to compute racks • Router VM per customer environment • 10 GbE throughout
    Summary:
    - Hard for existing applications to map directly.
    - Dramatic improvements are around the corner to overcome the limits of L2/L3 networking.
  • 29. OpenStack Compute Networking: Flat Network
    - Two options; the first is "flat" L3, with DHCP (and physical VLAN limitations).
    - It only configures VMs... not the underlying networking infrastructure.
  • 30. OpenStack Compute Networking: Flat Network
    - The second option is L2 with VLANs.
    -- The IP must be chosen before the VM is scheduled.
    -- Network settings are injected into the guest.
  • 31. Cloud Storage • VM-Backing Storage • Block Storage • Object Storage
    - Image: Hitachi 2 TB desktop drive we use in our object storage cluster.
  • 32. Cloud Storage: VM Backing • Laying down the OS image • A place for the running VM • Local Disk vs SAN • Ephemeral vs Persistent
    VM-backing storage. The main features being provided here are:
    -- Copying over a master/gold operating-system image and laying it down onto disk (inserting credentials, licenses, formatting partitions, etc.), readying it for the hypervisor to boot.
    -- A place for the running VM.
    Local disk vs. SAN:
    -- Local disk: pro - cheap, fast; con - hard to manage.
    -- SAN: con - expensive, network IO-bound; pro - improved manageability, VM migration possible.
    Ephemeral vs. persistent:
    -- Ephemeral: simpler operational model. Tooling and configuration management help. Opinionated about re-deployable infrastructure.
    -- Persistent: more management. Appealing for traditional admins.
  • 33. Block Storage • Mountable, persistent volumes • Can be provisioned via an API • Features such as snapshotting • Frequently used to run databases or other ... • Implemented with a combination of SAN + object storage system
  • 34. OpenStack Object Storage: API, Data Storage
    Just to baseline: Swift is the project name for OpenStack Object Storage. It's a storage service that is accessed via an API. Via the API you can create containers and PUT objects (data) into them. That's about it. It's not blocks. It's not a filesystem. It needs an ecosystem.
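    A minimal sketch of the container/object calls described above, assuming you already have a storage URL and auth token (both placeholders here); the paths follow Swift's v1 REST API:

```python
import requests

# Placeholders: a real storage URL and token come back from the auth service.
storage_url = "https://swift.example.com/v1/AUTH_demo"
token = {"X-Auth-Token": "AUTH_tk_example"}

# Create a container, PUT an object into it, then read the object back.
requests.put("%s/photos" % storage_url, headers=token)
requests.put("%s/photos/kitten.jpg" % storage_url,
             headers=token, data=open("kitten.jpg", "rb"))
resp = requests.get("%s/photos/kitten.jpg" % storage_url, headers=token)
print(resp.status_code, len(resp.content))
```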
  • 35. Cloud Storage History: S3 ('06 '07 '08 '09 '10)
    This whole thing got started in 2006 when Amazon launched S3, the Simple Storage Service. If everyone can rewind in their heads back to 2006 when S3 came out: it was a strange animal. It made sense, but it was kind of a novelty.
    - No SLA.
    - Paranoia about "outsourcing data."
    But we got used to it. It started with backup tools. When new applications were developed, application developers became really keen on using S3:
    - They didn't need to go out and buy a storage array.
    - No upfront cost.
    - They didn't need to guess at how much they were going to use.
    For these reasons, S3 became more and more baked into the tools that developers were using: Ruby on Rails (Paperclip), Zend PHP, Hadoop (MapReduce).
    At the Ruby on Rails deployment company Engine Yard (which is where I was before Cloudscaling), in 2008 we developed an AWS-based deployment platform. The old deployment system was on in-house hardware. One of its features was a clustered, POSIX-compliant filesystem with GFS: you could have many virtual machines all connecting to the same volume in a relatively sane way. In the move to AWS, we couldn't provide the same type of storage system. But because S3 had permeated the tools developers were using, it wasn't an issue.
  • 36. Cloud Storage History: S3, Cloud Files ('06 '07 '08 '09 '10)
    In 2008, Mosso, a subsidiary of Rackspace, launched its own object storage system called CloudFS, now called Rackspace Cloud Files.
  • 37. Cloud Storage History: S3, Cloud Files, Object Storage ('06 '07 '08 '09 '10)
    And, of course, over this summer the project was open-sourced as part of the OpenStack project. That brings us to the present day. So here we are:
    - Object storage is a big market, with two big players in the space.
    - Rackspace has open-sourced their implementation.
    - This sets the stage for more deployments going forward.
  • 38-42. Open Source Projects: Cyberduck, Ruby Multi-Cloud Library, Filesystem, Rackspace Library
    - Cyberduck: Mac-based GUI. Needed patching; the author has pulled in the changes in the latest build.
    - Fog: Ruby multi-cloud library. Needed patching; Wesley Beary has pulled in our changes (but it still references Rackspace in its interface).
    - Cloudfuse: implements a FUSE-based filesystem. Needed patching; still working on getting the changes merged.
    - Rackspace's libraries: again, some needed patching to support a Swift cluster. Very quick response to folding in patches.
    So if you're thinking of deploying OpenStack Object Storage, there is a reasonable body of open-source tools and client libraries. That brings us to getting this up and running with service providers.
    What's missing from this list are cloud management services. I would love to talk with those who are providing OpenStack support. What it's going to take to provide the real motivation is a large potential customer base, and that's going to show up when there are running, public implementations of OpenStack.
  • 43. Object Storage
    We've been working with OpenStack Object Storage since it came out in late July, because object storage deployments make a lot of sense for our clients. Having a way to conveniently host data is a very good thing for a service provider:
    1) It's good to have data close to the resources they're using (with compute, or content distribution, for example).
    2) It's awfully convenient to have a place for data, whether that's for backups or media assets.
    3) It provides in-house tools to build an application around, anywhere you are using S3 / Cloud Files during application development.
    Now, if you want, you can build and run your own cluster with the performance characteristics you choose, where you choose, with the security model you want.
  • 44. Object Storage: 100 TB usable, $90,000, 5 object stores, 4 proxy servers
    Here is what it looked like.
  • 45. 3 PB raw, 1 PB usable
  • 46. Zones: 1 2 3 4 5
    Early on in the standup process: Swift has this concept of zones. Zones are designed to be physical segmentation of data (racks, power sources, fire sprinkler heads, physical buildings), boundaries that can isolate physical failures.
    The standard configuration is to have data replicated across three zones. So initially we were thinking, well, let's just create three zones. But the problem is, if one of the zones catastrophically fails, there isn't another zone for the data to move into. We wanted to be able to tolerate a whole rack failure and still have a place for data to land, so we chose to go with 5 zones. If we had two total zone failures, and there was enough capacity remaining in the system, we could still run.
    With our small deployment, one machine == one zone. Our larger deployment will have one rack == one zone.
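    To make the zone-count reasoning concrete, here is a small conceptual sketch (not Swift's actual ring code) that places three replicas in distinct zones and re-homes them when a zone fails; the zone counts mirror the slide's example:

```python
import random

def place_replicas(zones, replicas=3):
    """Pick distinct zones for each replica (conceptual, not Swift's ring)."""
    if len(zones) < replicas:
        raise ValueError("not enough healthy zones to hold all replicas")
    return random.sample(zones, replicas)

zones = [1, 2, 3, 4, 5]
placement = place_replicas(zones)
print("initial placement:", placement)

# A whole zone fails: with 5 zones there are still at least 3 healthy zones,
# so every replica that lived in the failed zone has somewhere to move.
# With only 3 zones, the same failure would leave nowhere to rebuild.
failed = placement[0]
healthy = [z for z in zones if z != failed]
print("after zone %d fails:" % failed, place_replicas(healthy))
```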
  • 47. 7 kW
    The data center: one of the first things to note is the power density (or space requirements) of the system.
    - Mechanical things tend to draw a lot of power.
    - In our configuration, to utilize a full rack in a data center, we had to live in a "high-density" neighborhood of the data center.
    - Our configuration runs ten 4U object stores at 7 kW a rack. That's 370 drives per rack.
    - Be careful when powering up whole racks; plan accordingly.
    - The other option for us was to "go wide" and run half racks, where we would use more space.
  • 48. Networking (diagram: redundant Aggregation switches, Access layer at 10 GbE, top-of-rack switches down to 1 GbE)
    The networking: we took a two-tier approach. It starts with a pair of redundant aggregation switches; a single switch would be a single point of failure. All requests go through the "access layer," which connects directly to the aggregation switches at 10 GbE. The access layer contains proxy servers, authentication, load balancers, etc. Each rack has a single switch connected via 10 GbE to the aggregation layer; we went with a single switch as we plan on being able to handle single-rack failures. It tapers down to 1 GbE from the top-of-rack switches to an individual object store.
  • 49. Object Stores: JBOD, 48 GB RAM, 36 x 2 TB drives, no RAID, newish Xeon
    The object stores are beefy: 48 GB RAM, 36 x 2 TB drives, a newish Xeon. These are not just a bunch of disks! The system has a lot of work to do and lots of metadata to keep in memory. Lots of processes need to run to handle the parallelism. Commodity, but enterprise-quality gear. Enterprise drives (SATA, not SAS).
  • 50. Access Layer Servers (AKA Proxy Servers): 24 GB RAM, 10 GbE, newish Xeon
    Access layer, AKA "proxy servers": a Xeon with 24 GB RAM and 10 GbE. Our original deployment bottlenecked here.
    A huge caveat: different usage patterns will dramatically vary the architecture, hardware, and networking mix.
    - Archive: occasionally tar up a wad of data and park it there. Much lower burden on the entire system.
    - Trusted networks: don't need SSL, but want lots of bandwidth.
    - Lots of little PUTs and GETs: proxy servers will need to handle the SSL load of many requests.
    Although I just outlined some hardware here, it's not an exact recipe to follow.
  • 51. Access Layer: Proxy Process, SSL, Load Balancing (Client -> Access Layer)
    How the system is used has a huge impact on what to build. The next uncharted territory we had to figure out was what we're calling the "access layer." You need to run the Swift proxy-server process, which routes requests to the correct location in the object stores (using the ring). In addition, you're also likely to want to handle SSL termination and to load balance across the servers running the proxy processes. We've also heard about using a commercial load balancer here as an HA solution. There are many options:
    - What we're running is round-robin DNS with Nginx terminating SSL directly in front of the proxy.
    - What we're likely to end up with is an HAProxy configuration sharing an IP using VRRP for HA, dumping straight into the proxy processes.
    Being pragmatic, there are other services that need a home as well, such as authentication and the portal; sticking them in the access layer can make a whole lot of sense.
  • 52. Lessons Learned • Lots of RAM • Think parallelism • Extremely durable
    So what did we learn about this configuration?
    - High data rates typically require a lot of parallel accesses; there is often significant per-access latency (tens of ms, 1000x what a SAN or local storage device might show).
    - The tradeoff is that a ton of objects can be retrieved and written at the same time. Having a lot of RAM in aggregate helps: it makes the metadata costs of accessing lots of objects manageable.
    - We grossly undersized the proxy servers.
    We did a ton of testing: bitrot, bad disk, (we happened to have) a bad DIMM, failed node, failed zone, failed switch. This is an embarrassing story, but it's worth telling: we even had one of the zones down for two days early on. Nobody noticed. We noticed when we re-ran some benchmarks and some of the peak performance numbers weren't what they were on a previous run.
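    The "think parallelism" point can be sketched as follows: keep many requests in flight so that the per-object latency is hidden by aggregate throughput. The storage URL, token, and object names are placeholders.

```python
import requests
from concurrent.futures import ThreadPoolExecutor

# Placeholders; in practice these come from your auth service and listings.
storage_url = "https://swift.example.com/v1/AUTH_demo"
headers = {"X-Auth-Token": "AUTH_tk_example"}
objects = ["photos/img%04d.jpg" % i for i in range(200)]

def fetch(name):
    # Each GET may cost tens of milliseconds; issuing them one at a time
    # would leave the cluster idle, so we keep many requests in flight.
    resp = requests.get("%s/%s" % (storage_url, name), headers=headers)
    return name, resp.status_code, len(resp.content)

with ThreadPoolExecutor(max_workers=32) as pool:
    for name, status, size in pool.map(fetch, objects):
        print(name, status, size)
```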
  • 53. Running & Operating Infrastructure Clouds
    When you buy a new car today, you open up the hood and there is a black plastic cover over the engine. To diagnose it, you plug the car into a computer, and there are sensors all over the place to tell you what is wrong and how to fix it. We need to be mindful of the fact that this is just one of many systems that operators run. As much as possible, we'd like to hand them a whole working product that tells them how to operate and maintain it. For there to be a large number of Swift deployments, the system needs to have handles so that operators can maintain their system. Here is what we've assembled so far.
  • 54. Installation: Installer (Chef Server, PXE, REST API, $ cli tool) plus supporting services (TFTP, DHCP, DNS)
    We built an installer. To operate at any reasonable scale, consistency in configuration is hugely important, so we automate the crap out of everything. This installer (which runs as a virtual appliance) brings bare metal to a fully installed node. It runs a PXE server, so machines can ask for a new OS. Our configuration management tool of choice is Chef, so the installer runs chef-server. The net effect is that an operator can use a CLI tool and punch in the MAC address and what role the machine should be, and this system will take it from factory fresh to a fully installed Swift node.
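    A hypothetical sketch of the "punch in a MAC address and a role" step, assuming a REST endpoint like the one described; the URL, payload fields, and role name are invented for illustration and are not the actual tool's API:

```python
import json
import requests

# Hypothetical installer endpoint and payload; the real tool is a CLI that
# talks to the appliance's REST API, so these field names are illustrative.
installer_api = "http://installer.example.com/api/nodes"
node = {
    "mac_address": "00:25:90:ab:cd:ef",    # the factory-fresh machine's NIC
    "role": "swift-object-store",          # what Chef should converge it to
    "zone": 3,                             # which Swift zone it lands in
}

resp = requests.post(installer_api, data=json.dumps(node),
                     headers={"Content-Type": "application/json"})
print(resp.status_code)
# From here the appliance PXE-boots the machine, installs the OS, and
# chef-server converges it into a fully installed Swift node.
```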
  • 55. Infrastructure Services are Software Systems
    A big web application. A software system. Automating the install is not enough.
    A couple of years ago, I was going through the process of choosing a storage vendor. Each sales team that I met with said the same thing: "We're a software company." Now, I think they said that because there is more margin in being a software company. But in truth, they're right: there is a heck of a lot in the software that drives these systems.
    We treat the entire system as if it were a big web application undergoing active development. Change happens; we are continuously adding features or needing to respond to operational issues. We pull from our DevOps roots to build tooling that is capable of driving frequent change into the system. We must be able to perform upgrades in the field with confidence.
  • 56. One-Button Install: Development (Laptop) -> Lab (Virtual Machines) -> Pre-Production (Small Environment) -> Production
    The first thing we put together was automating the install process so that we could have a one-button install of a local development environment. This brings the system up from a fresh Ubuntu install on a single VM somewhere, perfect for doing local development on.
    The next step was to model out a simulated deployment of how we would configure a production environment. In a lab environment, we recreate the production environment with virtual machines. And remember that 'virtual appliance' installer that does the provisioning? We use that same tool so that we can have a self-contained, simulated build of the entire system.
    Next, we have our small-ish environment with one physical machine per zone. We can use that environment as pre-production for changes.
  • 57. Continuous Integration: Development (Laptop), Lab (Virtual Machines)
    Finally, to vet software changes as they are being made, we have a continuous integration setup with Hudson. When code is checked in, we can rebuild in the lab (the system with a bunch of VMs). Tests can be run against that system and report any errors that crop up. A full rebuild takes us about 45 minutes. All these systems put together give us the confidence that the system we've assembled will run well and upgrade smoothly.
    Time-to-live is a big deal for us. We aggressively pursue automation because it enables a repeatable deployment process at scale. Yes, a lot of time was spent automating installs, but we built these systems because we ended up doing a lot of software development and integration to make it run 'as a service'.
  • 58. Operations • Automate agent install • NOC • Replacement rather than repair
    Finally, operators need to know when the system needs work. We bake in monitoring agents as part of the system install. Because we have low-cost install tools, we favor replacement over repair: for example, if a boot volume goes bad or DIMMs fail (basically anything that doesn't involve swapping out a drive), it's easier to replace the component, have the system rebalance itself, and add new equipment. Integrate with existing NOC systems: nobody will act upon an alert that isn't seen.
  • 59. Billing: Utilization (RAM Hours, Storage Usage), Requests
    Billing is a huge topic! We could have an entire talk on this alone. Collect and store those metrics.
    Utilization:
    - You must compute storage 'averages', measuring on some interval you feel comfortable with.
    - Decide if you are going to keep track of small objects / number of requests.
    Compute is usually metered in RAM-hours. Do you bill for requests, or penalize small files? Charge for internal access?
    At one company I was a heavy user of Amazon S3. They called me up out of the blue and told me to re-architect our application because we were creating too many 'buckets' (folders). Apparently we were causing excessive load. He thought he was being helpful, saying "you will save $700 a month!" I appreciated the sentiment of trying to save me money, but the cost of re-architecting the app would have been more than a reasonable payback period in that context. The excessive load was their problem; I was okay paying an extra $700 a month. The moral of the story is to be okay with the usage you price for.
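    A minimal sketch of the "measure on an interval and average" idea for storage and RAM metering; the sample data and rates below are made up for illustration:

```python
# Hourly samples of a tenant's usage (made-up numbers): GB stored and
# GB of RAM allocated to running VMs at each sampling interval.
storage_samples_gb = [120, 120, 150, 150, 150, 90]   # one sample per hour
ram_samples_gb = [40, 40, 40, 80, 80, 80]

hours = len(storage_samples_gb)

# Average storage held over the period, billed as GB-months (~730 h/month),
# plus RAM metered directly as GB-hours.
avg_storage_gb = sum(storage_samples_gb) / float(hours)
storage_gb_months = avg_storage_gb * hours / 730.0
ram_gb_hours = sum(ram_samples_gb)   # each sample covers one hour

storage_rate = 0.10   # $/GB-month, illustrative
ram_rate = 0.05       # $/GB-hour, illustrative
print("storage: %.4f GB-months -> $%.2f"
      % (storage_gb_months, storage_gb_months * storage_rate))
print("RAM: %d GB-hours -> $%.2f"
      % (ram_gb_hours, ram_gb_hours * ram_rate))
```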
  • 60. Pricing: Consumption Pricing vs. Capacity Pricing
    Welcome to the world of consumption pricing vs. capacity pricing. Think of making a dinner reservation at the hippest restaurant in town: they ask for your credit card. Why? Because if you don't show up, they still need to charge you for the meal you didn't eat. They're practicing capacity pricing; they have a fixed number of seats that they need to fill.
    This same practice is the standard in our industry. When you go buy a bit of storage equipment, you buy based on its capacity. You can't say to a vendor, "would you mind installing 100 TB of storage, but I'll only pay for 50, because I only plan on using 50 TB on average." They would laugh you out the door! When you go to buy bandwidth, you are charged at the 95th percentile; you pay for bandwidth that goes unused because you're paying for the capacity to be available, with some wiggle room for extraordinary bursts.
    So service providers are having to figure out how to deal with this. It's a bigger deal at a smaller scale: a single customer could come in and consume a large fraction of a cluster. The averages even out at a larger scale.
  • 61. Authentication (diagram: Client, Authentication Service ("build this"), Internal Authentication, Infrastructure Cloud, with a numbered request flow 1-5)
    - This is a multi-tenant system.
    - The bundled authentication server is not intended to be used by a service provider.
    - You will need to set up a few things in between the core storage cluster and your authentication system.
    - Clients (as in libraries/tools/services) expect to hit an auth server so that they can get 1) a token that can be used with subsequent requests and 2) a URL of the storage system they should hit.
    - This "authentication middleware" must also be capable of taking a token from the storage cluster and verifying that it's still good.
    - So here is how the process works: the client makes a request to the auth server and gets a token + URL; the client makes API calls to the storage cluster with the token; the storage cluster asks the auth server if the token is good.
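    As one concrete illustration of the token + URL handshake described above, here is the Cloud Files / Swift v1.0-style flow; the endpoint and credentials are placeholders, and your own auth middleware may expose a different interface:

```python
import requests

# Steps 1-2: the client authenticates and receives a token plus a storage URL.
# Endpoint and credentials are placeholders for a tempauth/swauth-style service.
auth = requests.get("https://auth.example.com/auth/v1.0",
                    headers={"X-Auth-User": "demo:demo",
                             "X-Auth-Key": "secret"})
token = auth.headers["X-Auth-Token"]
storage_url = auth.headers["X-Storage-Url"]

# Steps 3-5: the client calls the storage cluster with the token; behind the
# scenes the cluster's auth middleware validates the token before serving.
containers = requests.get(storage_url, headers={"X-Auth-Token": token})
print(containers.status_code)
print(containers.text)   # newline-separated list of the account's containers
```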
  • 62. THANK YOU! Joe Arnold, joe@cloudscaling.com, @joearnold