Considerations for building your private cloud (Rackspace)

  • Goal: to give ideas on how to build true private clouds powered by OpenStack software. These slides are meant to serve as a guide, not an end-all-be-all; ultimately you'll have to find the right solution for your company. Ask the audience to hold questions until the end.
  • In a private cloud, you need to understand all of these services and how they interact. Each box on this diagram could be (and probably is) at least one talk here this week. This isn't plug and play yet, but a number of companies are trying to get it there, and many have released installers.
  • Multiple private clouds: this seems counterintuitive to the concept of the cloud, but drastically different hardware requirements may drive it. It can be mitigated with Availability Zones or Cells. The diagram is overly simplified for this talk; it doesn't take HA, out-of-band management, the backup network, etc. into account.
  • Don't paint yourself into an architectural corner by answering just one of these questions; answer all of them when looking at the design. Private clouds generally don't have the luxury of building massive capacity the way a public cloud does.
  • The problem is that you'll have no idea how many virtual machines can run on this host, because it depends on the flavors. That makes it hard to determine the size of your environment. You have a head start if you're already running workloads on a public cloud.
  • The fixed network is the one thing you need to get right (explain why). Assigning a floating IP completely changes the way inbound and outbound connectivity happens. Networking in OpenStack could be an entire talk; in fact, there probably is one, and everyone should go to it.
  • Explain what Glance is. It's hard to realistically guess how many images, or what size of images, you'll be using. It's simpler to standardize on base images and use automation tools to configure services within the instances.
  • I'd love to see a tool that could detect frequently used images and pre-cache them on remote hosts.
  • Give an example of an "object". Call back to "build with the end in mind": it's extremely important to build Swift partitions correctly, since the partition count is fixed when the ring is created. Expand on "zones": a zone could be a drive, a server, or a cabinet.
  • Everyone's network utilization will be different. Understand your current usage and plan accordingly. If you're worried about NIC saturation, break out your OpenStack service traffic (Glance, Nova services) onto a separate network. Single controller: MySQL, rabbitmq-server, Keystone, Glance registry/API, nova-scheduler, nova-api-os-compute, nova-cert, nova-vncproxy, Horizon.
  • Convey why to consider Swift: with 100 nodes and 2000 instances (20 instances per node), any of them could be snapshotting, and that will be a bottleneck. Frontend and backend networks: frontend for external connectivity; backend for instance-to-instance traffic and instance to non-OpenStack servers (dedicated DB,
  • If you're not using another system (Cinder, SAN, NetApp, etc.) for additional storage, IO will need to be a top consideration.
  • Find the right questions to ask: not taking hardware into account, I can ask five questions and build your environment. As I mentioned at the start, every major slide here could be an entire talk, and most of these slides are "lessons learned".
  • Specifically: thoughts on nova-volumes/Cinder, thoughts on pre-caching, and additional thoughts on deployments.

    1. Considerations for Building a Private Cloud
       Ryan Richard, OpenStack Engineer
       ryan.richard@rackspace.com | @rackninja
       October 12, 2012
    2. Whaaa??? [Diagram: OpenStack Folsom architecture; source: http://ken.pepple.info/openstack/2012/09/25/openstack-folsom-architecture/] RACKSPACE® HOSTING | WWW.RACKSPACE.COM
    3. What is a Private Cloud?
       • Generally considered to be smaller than a "public" cloud
       • Fewer than 100 physical servers (for this talk)
       • API endpoints may not be publicly accessible
       • Limited inbound connectivity; use floating IPs to allow inbound connectivity
       • Can be customized for specific workloads (hardware, network, etc.)
       • A company may leverage multiple private clouds
    6. Build with the End in Mind: What are you building for?
       a. Are you building for 10 servers? 20? 100?
       b. Or are you building for 500 instances? 1000? 2000?
       c. Or are you building for 400 CPUs? 3 TB of RAM? 100 TB of disk?
       d. ALL OF THE ABOVE
    8. Build with the End in Mind: Example hardware
       • 12 physical cores: 24 with Hyperthreading, 48 vCPUs with a 2:1 overcommit ratio
       • 128 GB of RAM: 1:1 overcommit ratio
       • 8 x 300 GB drives in RAID 10: ~1.2 TB usable disk space
       How many instances can I run on this physical host?
       (total vCPUs / vCPUs of the smallest flavor) = maximum number of instances
       Double or quadruple this to account for growth when sizing the fixed network range.
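The sizing rule on this slide can be written out as a short calculation. This is a minimal sketch, assuming the smallest flavor has 1 vCPU, a hypothetical 20 compute hosts, a growth factor of 4 (the slide's "quadruple"), and that 3 addresses in the fixed range are reserved (network, broadcast, gateway). The helper names are hypothetical; the `cpu_allocation_ratio` parameter only mirrors the name of the nova.conf option.

```python
def max_instances_per_host(physical_cores, hyperthreading=True,
                           cpu_allocation_ratio=2.0, smallest_flavor_vcpus=1):
    """Slide rule: total vCPUs / vCPUs of the smallest flavor."""
    threads = physical_cores * (2 if hyperthreading else 1)
    total_vcpus = int(threads * cpu_allocation_ratio)  # 24 threads * 2.0 = 48
    return total_vcpus // smallest_flavor_vcpus


def fixed_prefix_length(instances, growth_factor=4, reserved=3):
    """Smallest IPv4 prefix whose address count covers the instance count
    scaled for growth, plus a few reserved addresses (assumed here:
    network, broadcast and gateway)."""
    needed = instances * growth_factor + reserved
    host_bits = (needed - 1).bit_length()  # bits needed to number `needed` addresses
    return 32 - host_bits


if __name__ == "__main__":
    per_host = max_instances_per_host(12)  # 12 cores -> 24 threads -> 48 vCPUs
    total = per_host * 20                  # hypothetical 20 compute hosts
    print(per_host, total, f"/{fixed_prefix_length(total)}")  # 48 960 /20
```

With the slide's "double" instead (`growth_factor=2`), the same 960 instances fit in a /21. The rule bounds only vCPUs; with RAM-heavy flavors, the 128 GB at 1:1 may cap the count sooner.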
    17. Build with the End in Mind: Networking
       Networking is the important part, get it right!
       We can build a cloud with 2 networks (3 if using floating IPs):
       • Host network (physical machine access, OpenStack services): easy to add physical nodes and/or networks
       • Fixed network (instance network): don't try to change the fixed network once in production
       • Floating network: easy to add additional floating networks
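Because the fixed network is the one that is hard to change later, it can help to sanity-check the address plan before deploying. The sketch below uses only Python's standard `ipaddress` module to check that the host, fixed, and (optional) floating ranges don't overlap and that the fixed range is large enough. The function name, the example CIDRs, and the assumption that 3 fixed addresses are reserved are all hypothetical.

```python
import ipaddress


def check_network_plan(host_cidr, fixed_cidr, floating_cidr=None,
                       expected_instances=0, reserved=3):
    """Return a list of problems found in a proposed network layout
    (an empty list means no problems were detected)."""
    nets = {
        "host": ipaddress.ip_network(host_cidr),
        "fixed": ipaddress.ip_network(fixed_cidr),
    }
    if floating_cidr is not None:
        nets["floating"] = ipaddress.ip_network(floating_cidr)

    problems = []
    names = list(nets)
    # Every pair of networks must be disjoint.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if nets[a].overlaps(nets[b]):
                problems.append(f"{a} overlaps {b}")

    # The fixed range must hold every planned instance.
    usable = nets["fixed"].num_addresses - reserved
    if usable < expected_instances:
        problems.append(
            f"fixed range has {usable} usable addresses, "
            f"{expected_instances} needed")
    return problems


if __name__ == "__main__":
    print(check_network_plan("192.168.0.0/24", "10.0.0.0/20",
                             "172.16.0.0/24", expected_instances=3840))
    print(check_network_plan("10.0.0.0/24", "10.0.0.0/16"))
```

The first plan passes (a /20 leaves 4093 addresses for 3840 instances); the second is flagged because the host network sits inside the fixed range.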
    18. Build with the End in Mind: Glance
       • Disk space on the server acting as the Glance backend (file based) will be a limiting factor.
       • Good alternatives: Swift, Cloud Files, NFS (locally mounted)
       • Local disk is considerably faster than the alternatives.
       • Will you be leveraging snapshots? If so, disk space will need to be a serious consideration.
       • If using qcow2, set "snapshot_image_format=qcow2" to help limit disk usage.
    21. Build with the End in Mind: Glance Performance
       • Network throughput is a limitation: 1000 Mb/s = 125 MB/s max (expect ~112 MB/s realistically)
       • Large sequential reads/writes: RAID 5 may be preferred
       • Lean towards disk bandwidth over raw IOPS
       • Reduce the number of images to allow for more efficient local caches on compute nodes (dramatically improving instance creation performance)

       Image size | Not cached     | Cached
       1.4 GB     | 20 secs        | 1 sec
       16.4 GB    | 2 min 21 secs  | 1 sec
       *Times measured from "creating image" to "qemu-img create".
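The throughput figures on this slide translate into a rough estimate of how long an uncached image takes to reach a compute node. This is a minimal sketch, assuming decimal units (1 GB = 1000 MB) and that the network link is the only bottleneck; the function names are hypothetical.

```python
def link_mb_per_s(link_mbit_per_s):
    """Convert a link speed in megabits/s to megabytes/s (8 bits per byte).
    This is the theoretical maximum, before protocol overhead."""
    return link_mbit_per_s / 8


def est_transfer_seconds(image_gb, effective_mb_per_s=112.0):
    """Rough time to copy an image at a given sustained throughput,
    assuming decimal units (1 GB = 1000 MB)."""
    return image_gb * 1000 / effective_mb_per_s


if __name__ == "__main__":
    print(link_mb_per_s(1000))  # 125.0
    for size_gb in (1.4, 16.4):
        print(f"{size_gb} GB: ~{est_transfer_seconds(size_gb):.0f} s")
```

At ~112 MB/s this gives roughly 12 to 13 s for the 1.4 GB image and roughly 146 s for the 16.4 GB image, the same order as the uncached times in the table. A cached image skips the transfer, which is consistent with the cached column staying at about 1 sec.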
    22. To Swift or not to Swift?
    24. To Swift or not to Swift?
       Pros
       • Scalable object storage that works great as a backend for Glance
       • Can be leveraged as object storage for other parts of the business
       • Ability to quickly increase the amount of storage available
       • Extremely stable if designed correctly
       Cons
       • Additional expertise needed to run Swift
       • Architecture (network/Swift components) design is important to get right
       • Depending on initial usage, there may be high up-front costs to populate 5 zones
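One reason Swift can carry high up-front costs is that every object is stored several times across zones, so usable capacity is only a fraction of raw capacity. This is a minimal sketch, assuming the commonly used 3 replicas, equally sized zones, and a planning rule of at least as many zones as replicas (an assumption of this sketch, not something every Swift release enforces). The function name and the 12 TB per zone figure are hypothetical.

```python
def swift_usable_tb(zones, raw_tb_per_zone, replicas=3):
    """Approximate usable capacity of a Swift cluster with equally
    sized zones: raw capacity divided by the replica count."""
    if zones < replicas:
        # Planning rule assumed here: each replica lands in its own zone.
        raise ValueError(f"{zones} zones cannot hold {replicas} replicas "
                         "in separate zones")
    return zones * raw_tb_per_zone / replicas


if __name__ == "__main__":
    # Hypothetical: 5 zones, one 12 TB (raw) storage node per zone.
    print(swift_usable_tb(5, 12))  # 20.0
```

In this example, populating 5 zones with 60 TB of raw disk yields only about 20 TB of usable object storage, before filesystem overhead and free-space headroom.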
    25. Architecture Examples and Thoughts
       1-20 physical servers
       • Single controller (single API endpoint, single scheduler, etc.) should suffice
       • Single network (1 Gbps) for instance connectivity and OpenStack services is sufficient
       • Rackspace "Alamo" installer
       20-50 physical servers
       • Single controller (single API endpoint, single scheduler, etc.) should suffice
       • Investigate Swift as a Glance backend
       • Start looking into ways to break apart various controller services
    28. Architecture Examples and Thoughts
       50-100 servers
       • Keep an eye on the scheduler to make sure it's not a bottleneck
       • Strongly consider Swift, especially for snapshots
       • Consider Availability Zones/Cells (Cells didn't make it into Folsom)
       • Consider "frontend" and "backend" instance networks
       • Two or more instance networks? Set "use_single_default_gateway" in nova.conf
    29. Performance Considerations and Bottlenecks: IO
       • 20-40 instances per physical server cause high random IO
       • Reduce IO as much as possible, e.g. with centralized logging
       • Can be further mitigated with Cinder
    30. Performance Considerations and Bottlenecks: IO
       [Charts: "Async & Random IO" compares random read/write throughput (direct IO) for test instances under different I/O scheduler (cfq, deadline, noop) and cache (cache=none, cache=writeback) settings, with and without Hyperthreading, and for the compute host; "Host vs. Instance" compares randR, randW, seqR and seqW (direct) for the compute host against an instance (cfq, cache=none).]
    31. Final Thoughts: Lessons Learned
       • Standardize on a design that works for your organization
       • Find the right questions to ask
       • It's important to understand OpenStack as a whole
       • OpenStack is still changing often; keep up to date with the current state of the projects
    32. But... this is also a design summit. Open to discussions, thoughts, and questions.
    33. Rackspace is hiring: www.rackertalent.com
