Big Data and OpenStack, a Love Story: Michael Still, Rackspace

Big Data and OpenStack, a Love Story

Audience: Intermediate

Topic: Storage

Abstract: Increasingly we’re being asked to build out clusters of machines to solve big data problems. These clusters can become quite large, reaching up to thousands of machines. Of course, our operational budgets don’t scale linearly like our machine counts do, and we’re asked to do more and more with less. This talk will explore how organisations around the world are using OpenStack to automate the management of their big data implementations, harnessing interesting characteristics of big data workloads along the way.

Speaker Bio: Michael Still, Rackspace

OpenStack core developer and former Nova PTL, as well as an experienced software and reliability engineer. Part of the team that grew Google Mobile into a billion-dollar business. Director of linux.conf.au 2013. Author of The Definitive Guide to ImageMagick (www.imagemagickbook.com) and Practical MythTV (www.mythtvbook.com) from Apress, as well as numerous articles.

OpenStack Australia Day Government - Canberra 2016
http://australiaday.openstack.org.au/canberra-2016/

  1. OPENSTACK AND BIG DATA, A LOVE STORY
      Michael Still, Senior Software Development Manager
      michael.still@rackspace.com or @mikal on Twitter
  2. WHO IS THIS GUY?
      • A Canberran born and bred
      • An OpenStack developer since 2011; first commit merged January 2012
        - https://review.openstack.org/#/c/2899/
      • A Compute Core Reviewer, a former Compute PTL, and a former member of the OpenStack Technical Committee
      • Manager of a team of OpenStack developers spread across Australia
  3. CAUTION, THIS BIT IS A TEST
  6. WHO IS RACKSPACE?
      • Do any of you know who Rackspace is and how it fits into the OpenStack story?
  7. BIG DATA
      • Hopefully we’re all familiar with the term
      • That said, the basic idea is to store and process large amounts of data on commodity equipment
      • Pioneered by Internet companies
      • But now used by many “more traditional” organizations
  8. THE OLD WAY
  9. THE NEW WAY
  12. BIG DATA
      • The most obvious thing here is that machine counts are increasing…
      • We’re talking about hundreds or thousands of machines instead of the one big machine
      • And our operational budgets are not increasing with machine count (of course)
      • So we need to automate more
  15. OPENSTACK COMPUTE
      • From day zero OpenStack supported running virtual machines
      • We call them instances
      • Virtual machines aren’t a great choice for most big data applications, though
        - For example, it’s nice to replicate your data
        - But what if all the VMs holding replicas are on the same hypervisor?
        - There are performance costs as well
      • Big data is about bulk, not artisanal machine orchestration
  18. OPENSTACK BAREMETAL
      • A research project started in 2012
      • It was… horrible
      • But it has been deployed: Yahoo has tens of thousands of machines running this code
      • Luckily, some adults came along in 2013 and turned that research project into a production-quality one
  19. OPENSTACK BAREMETAL
      • The new implementation (Ironic) is a separate OpenStack project
      • It manages machines by talking IPMI / DRAC / iLO / other protocols
      • It integrates with OpenStack Compute so that the same APIs work everywhere
  20. WHICH MEANS…
  21. API CONTROL OF BULK INFRASTRUCTURE
      • We can now build images for all our various big data machine types
        - Management nodes
        - ZooKeeper nodes
        - Data storage / worker nodes
      • And then manage them with simple command line tools
  23. API CONTROL OF BULK INFRASTRUCTURE
      • I’ve spent the last year helping one of our customers do something like this
      • Why a year?
      • Well, they also wanted other things, such as continuous deployment of OpenStack, and that was a lot harder than the Hadoop bits
  27. API CONTROL OF BULK INFRASTRUCTURE
      • That said, based on a simpler version of their deployment, I think I now have some recommendations for how to approach a project like this…
      • ZooKeeper nodes are harder than I thought
      • Management nodes are even harder
      • But data and processing nodes are easy
        - Luckily, these are the vast majority of your machines
      • And this is possible, just harder
  28. API CONTROL OF BULK INFRASTRUCTURE
      • Data and processing nodes
        - Golden image deployments are the way to go
        - Keep your data on non-boot disks
        - To update the OS / image, just rebuild the image and then use nova rebuild
        - Use the preserve-ephemeral option (nova rebuild --preserve-ephemeral) to avoid re-syncing data during a rollout
  29. API CONTROL OF BULK INFRASTRUCTURE
      • ZooKeeper nodes
        - These are harder because all the machines in the ZooKeeper cluster need a shared config listing all their peers
        - We solved this by using an overlay network
        - But floating IPs would probably work in a simpler environment
  30. CANBERRA OPENSTACK MEETUP
      • Tuesday 29 November, 7pm to 9pm
      • https://goo.gl/nxW62K
  32. Copyright © 2016 Rackspace | Rackspace® Fanatical Support® and other Rackspace marks are either registered service marks or service marks of Rackspace US, Inc. in the United States and other countries. Features, benefits and pricing presented depend on system configuration and are subject to change without notice. Rackspace disclaims any representation, warranty or other legal commitment regarding its services except for those expressly stated in a Rackspace services agreement. All other trademarks, service marks, images, products and brands remain the sole property of their respective holders and do not imply endorsement or sponsorship. ONE FANATICAL PLACE | SAN ANTONIO, TX 78218 US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM
  33. Feel free to contact me at: michael.still@rackspace.com or @mikal on Twitter
  34. WHO WE ARE
      • Data centers: 10 worldwide
      • Global footprint: customers in 150 countries
      • Portfolio: Dedicated, Hybrid, Cloud
      • Experts: 6,200 Rackers
      • 3,000+ cloud experts
      • Revenue: over $2B in annualized revenue
      • Fortune 100: we serve the majority of the Fortune 100
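Slide 15 worries about every replica of a piece of data landing on the same hypervisor. Below is a minimal sketch of a check for that, assuming you already have a map from each VM to its hypervisor host and from each data block to the VMs holding its replicas; both maps and the `under_replicated_blocks` helper are hypothetical, not part of any OpenStack API. It counts distinct hypervisors rather than VMs, because the hypervisor is the shared failure domain. Nova's anti-affinity server groups are the usual way to avoid this placement at boot time; this sketch only detects it after the fact.

```python
from typing import Dict, List


def under_replicated_blocks(
    replicas: Dict[str, List[str]],
    host_of: Dict[str, str],
    min_hosts: int = 2,
) -> Dict[str, int]:
    """Return {block: distinct_hypervisors} for blocks spread over too few hosts.

    replicas:  hypothetical map of data block id -> VMs holding a replica
    host_of:   hypothetical map of VM name -> hypervisor host name
    min_hosts: how many distinct hypervisors each block should span
    """
    at_risk = {}
    for block, vms in replicas.items():
        # Count physical hosts, not VMs: the hypervisor is the failure domain.
        distinct_hosts = len({host_of[vm] for vm in vms})
        if distinct_hosts < min_hosts:
            at_risk[block] = distinct_hosts
    return at_risk


if __name__ == "__main__":
    host_of = {"vm1": "hv-a", "vm2": "hv-a", "vm3": "hv-a", "vm4": "hv-b"}
    replicas = {
        "blk-1": ["vm1", "vm2", "vm3"],  # three replicas, one hypervisor
        "blk-2": ["vm1", "vm4", "vm2"],  # three replicas, two hypervisors
    }
    print(under_replicated_blocks(replicas, host_of))  # {'blk-1': 1}
```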
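Slide 19 says the bare metal service manages machines by talking IPMI, DRAC, iLO and similar protocols. As a hedged illustration, the sketch below builds the `openstack baremetal node create` command that enrolls one machine with Ironic. The protocol-to-hardware-type table (`ipmi`, `idrac`, `ilo`) and the `*_address` / `*_username` / `*_password` driver-info keys reflect current Ironic hardware types, which postdate this 2016 talk, so treat them as assumptions to check against the Ironic documentation for your release. Node names, addresses and credentials are placeholders.

```python
import shlex
from typing import Dict, List, Tuple

# Assumed mapping: management protocol -> (Ironic hardware type, driver_info key prefix).
# Based on current Ironic hardware types; verify against your release.
HARDWARE_TYPES: Dict[str, Tuple[str, str]] = {
    "ipmi": ("ipmi", "ipmi"),
    "drac": ("idrac", "drac"),
    "ilo": ("ilo", "ilo"),
}


def enroll_command(name: str, protocol: str, bmc_address: str,
                   username: str, password: str) -> List[str]:
    """Build the argv that enrolls one machine's BMC with the bare metal service."""
    driver, prefix = HARDWARE_TYPES[protocol]
    return [
        "openstack", "baremetal", "node", "create",
        "--name", name,
        "--driver", driver,
        "--driver-info", f"{prefix}_address={bmc_address}",
        "--driver-info", f"{prefix}_username={username}",
        "--driver-info", f"{prefix}_password={password}",
    ]


if __name__ == "__main__":
    # Placeholder values; never hard-code real BMC credentials.
    cmd = enroll_command("data-001", "ipmi", "10.0.0.11", "admin", "secret")
    print(shlex.join(cmd))
```

Once enrolled nodes have been made available, Compute can boot them like any other instance, which is the integration the slide describes.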
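Slide 21 describes building one image per big data machine type and then driving everything from the command line. A minimal sketch of that idea, assuming hypothetical image names, flavor names and a `role-NNN` naming scheme, turns a per-role node count into `openstack server create` invocations:

```python
from typing import Dict, List, Tuple

# Hypothetical golden image and flavor for each big data machine role.
ROLES: Dict[str, Tuple[str, str]] = {
    "mgmt": ("hadoop-mgmt-golden", "bm.mgmt"),
    "zk": ("zookeeper-golden", "bm.small"),
    "data": ("hadoop-data-golden", "bm.storage"),
}


def build_plan(counts: Dict[str, int]) -> List[List[str]]:
    """Return one `openstack server create` argv per node to boot."""
    plan = []
    for role, count in counts.items():
        image, flavor = ROLES[role]
        for i in range(1, count + 1):
            plan.append([
                "openstack", "server", "create",
                "--image", image,
                "--flavor", flavor,
                "--wait",             # wait for the build to complete
                f"{role}-{i:03d}",    # e.g. data-007
            ])
    return plan


if __name__ == "__main__":
    plan = build_plan({"mgmt": 2, "zk": 3, "data": 40})
    print(len(plan), " ".join(plan[-1]))
```

Keeping the role table separate from the counts means scaling out the data tier is a one-number change.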
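Slide 28 recommends golden images plus `nova rebuild` with ephemeral data preserved. A rolling version of that rollout might look like the sketch below. It assumes you rebuild a fixed number of data nodes per batch so that most replicas stay online; the batch size, server names and image name are hypothetical. The `--preserve-ephemeral` flag asks Nova to keep the ephemeral storage rather than wiping it during the rebuild.

```python
from typing import Iterator, List


def rolling_rebuild(servers: List[str], image: str,
                    batch_size: int) -> Iterator[List[List[str]]]:
    """Yield `nova rebuild` argvs for batch_size servers at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(servers), batch_size):
        batch = servers[start:start + batch_size]
        # --preserve-ephemeral asks Nova to keep the ephemeral storage
        # instead of wiping it, so the data need not be re-synced.
        yield [["nova", "rebuild", "--preserve-ephemeral", server, image]
               for server in batch]


if __name__ == "__main__":
    nodes = [f"data-{i:03d}" for i in range(1, 6)]
    for batch in rolling_rebuild(nodes, "hadoop-data-golden-v2", batch_size=2):
        for argv in batch:
            print(" ".join(argv))
        # A real rollout would wait here for the batch to go ACTIVE and for
        # the cluster to report healthy before starting the next batch.
```

Small batches trade rollout speed for keeping enough replicas online; the right size depends on your replication factor.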
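Slide 29 notes that every ZooKeeper node needs the same configuration listing all of its peers. The sketch below renders that shared `zoo.cfg`, plus each node's `myid` file, from one ordered list of stable addresses (for example on an overlay network, as the slide suggests). The addresses, data directory and timing values are illustrative assumptions; 2181, 2888 and 3888 are ZooKeeper's conventional client, peer and leader-election ports.

```python
from typing import List

# Conventional ZooKeeper ports: clients, peer traffic, leader election.
CLIENT_PORT, PEER_PORT, ELECTION_PORT = 2181, 2888, 3888


def zoo_cfg(peers: List[str], data_dir: str = "/var/lib/zookeeper") -> str:
    """Render the zoo.cfg that every ensemble member shares."""
    lines = [
        "tickTime=2000",
        f"dataDir={data_dir}",
        f"clientPort={CLIENT_PORT}",
        "initLimit=10",
        "syncLimit=5",
    ]
    # server.N=host:peerPort:electionPort, with N taken from list position.
    for server_id, addr in enumerate(peers, start=1):
        lines.append(f"server.{server_id}={addr}:{PEER_PORT}:{ELECTION_PORT}")
    return "\n".join(lines) + "\n"


def myid(peers: List[str], my_addr: str) -> str:
    """Contents of <dataDir>/myid for the node at my_addr."""
    return f"{peers.index(my_addr) + 1}\n"


if __name__ == "__main__":
    peers = ["10.1.0.1", "10.1.0.2", "10.1.0.3"]  # hypothetical overlay addresses
    print(zoo_cfg(peers), end="")
    print("myid on 10.1.0.2:", myid(peers, "10.1.0.2"), end="")
```

Because the server IDs come from list position, every node must be given the same list in the same order; stable addresses are what let a rebuilt node rejoin without changing its peers' configuration.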
