Farming Hadoop in the Cloud

Presentation at Berlin Buzzwords on cloud infrastructure APIs/UIs and Hadoop

Published in: Technology

  1. 1. FARMING HADOOP IN THE CLOUD. Steve Loughran, HP Laboratories, June 2010
  2. 2. ABOUT ME: Researcher at HP Laboratories, Bristol, England
  3. 3. Datacentre-scale apps & IaaS
  4. 4. ASF Member: stevel@apache.org
  5. 5. Committer: Ant, Hadoop-common, HDFS, MapReduce
  6. 6. Author: Ant in Action
  7. 7. Somewhat obsessive about testing
  8. 8. CLOUD ELIMINATES: Buying hardware based on predicted load
  9. 9. 2+ week lead time on new hardware, storage
  10. 10. High Availability
  11. 11. Homogeneity
  12. 12. Static machine names, addresses and capabilities
  13. 13. Stable machines
  14. 14. A fast private network
  15. 15. Someone in the datacentre who cares about you
  16. 16. APPLICATIONS MUST BE AGILE: Directory, database or CM service to configure
  17. 17. Applications to handle moving services
  18. 18. Use dynamic DNS services; don’t cache IP addresses
  19. 19. Don’t expect HDD content to last on a single disk
  20. 20. Restart VMs on any app failure. Nothing is static. Nothing lasts.
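The advice to use dynamic DNS and not cache IP addresses can be illustrated with a minimal sketch. It assumes a hypothetical `ServiceLocator` class (not from the talk) that stores only the service's hostname and asks the resolver again on every call, so a service that has moved to a new VM is found at its new address. The system resolver may still cache results on its own; that is outside this sketch.

```python
import socket
from typing import Callable, List


def resolve_ipv4(hostname: str) -> List[str]:
    """Ask the system resolver for the current IPv4 addresses of hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return [info[4][0] for info in infos]


class ServiceLocator:
    """Hypothetical helper: remembers the name, never the address."""

    def __init__(self, hostname: str,
                 resolver: Callable[[str], List[str]] = resolve_ipv4):
        self.hostname = hostname
        self._resolver = resolver

    def current_address(self) -> str:
        # Resolve on every call instead of storing the result of the first lookup.
        addresses = self._resolver(self.hostname)
        if not addresses:
            raise LookupError(f"no address found for {self.hostname}")
        return addresses[0]
```

Passing the resolver in as a parameter is a design choice for this sketch: it keeps the lookup policy separate from the lookup mechanism, so a directory or configuration-management service could be substituted for DNS.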
  21. 21. CLASSIC TEAM ROLES: Business Development, Architecture, Operations, Development
  22. 22. BEFORE. Roles: Business Development, Architecture, Operations, Development. Stages: Design, Code, Test, Staging, Live
  23. 23. RESPONSIBILITIES. Work: Architects design the application
  24. 24. Developers code and test on local machines
  25. 25. Operations buy and configure production machines. Blame: Developers get blame for things not working
  26. 26. Operations get blame for security & availability problems
  27. 27. Getting predictions of demand wrong can kill your project
  28. 28. EVERYTHING BLURS. Roles: Architecture, Development, Business Development, Operations. Stages: Design, Code, Test, Live, Staging
  29. 29. DIFFERENT ROLES - DIFFERENT TOOLS
        Role | Task | Tooling
        Cloud Architect | Design app structure | Text editor (?), PowerPoint (?)
        Cloud Operations | Build VMs, set parameters; manage production | Text under SCM; web & command line
        Developers | Request test VMs | Web, IDE, build tools
        Biz Dev | Worry about money | Web & spreadsheet
  30. 30. Hadoop's datamining services are needed by the developers, biz-dev, operations and infrastructure teams
  31. 31. HADOOP’S ASSUMPTIONS: Hadoop is not agile. Master nodes don’t move
  32. 32. Workers can spin for them
  33. 33. Failed workers get blacklisted
  34. 34. Single, static hostnames
  35. 35. Cache all DNS entries
  36. 36. Disks don't move between hosts
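The "workers can spin" assumption can be pictured as a retry loop against a master whose address never changes. The sketch below is illustrative only and is not Hadoop's code: `spin_until_connected`, its parameters and its defaults are hypothetical names chosen for this example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def spin_until_connected(connect: Callable[[], T],
                         max_attempts: int = 10,
                         delay_seconds: float = 1.0,
                         sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect() until it succeeds or max_attempts is reached."""
    last_error: Optional[OSError] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError as error:
            # Connection failures are OSError subclasses; remember and retry.
            last_error = error
            if attempt < max_attempts:
                sleep(delay_seconds)
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error
```

A loop like this only helps while the master comes back at the same hostname and address. If the master is restarted on a different VM, the worker keeps spinning against a location that no longer hosts it, which is where the static assumptions listed above conflict with cloud infrastructure.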
  37. 37. CLOUDFARMER Role-driven VM allocation UI split by team role
  38. 38. ROLE SPECIFICATION
        worker extends HadoopVMRole {
          description "A Hadoop Worker";
          vmPrefix "vm";
          bootAutoVol "vol-0-1-47";
          min 0;
          recommendedMin 3;
          deploy "worker.sf"
          links extends HadoopServlets {
            hdfs DFS_DATANODE_HTTP_DEFAULT_PORT;
            "Hadoop DFS" ["http", hdfs, "/"];
            "Hadoop DFS metrics" ["http", hdfs, METRICS];
            "Hadoop DFS stacks" ["http", hdfs, STACKS];
            "Hadoop DFS conf" ["http", hdfs, CONF];
            "Hadoop DFS conf JSON" ["http", hdfs, CONF_JSON];
          }
        }
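The `links` entries in the role read as (protocol, port, path) triples that only become URLs once a VM has a hostname. A minimal sketch of that late binding follows. The function `bind_links`, the port value 50075 and the servlet paths are assumptions made for illustration; the slide names the constants but does not give their values.

```python
from typing import Dict, Tuple

# (protocol, port, path), mirroring entries such as ["http", hdfs, "/"] in the role.
LinkTemplate = Tuple[str, int, str]


def bind_links(host: str, templates: Dict[str, LinkTemplate]) -> Dict[str, str]:
    """Turn each link template into a URL for one concrete host."""
    return {
        title: f"{protocol}://{host}:{port}{path}"
        for title, (protocol, port, path) in templates.items()
    }


# Assumed values standing in for DFS_DATANODE_HTTP_DEFAULT_PORT, METRICS, STACKS and CONF.
HDFS_PORT = 50075
WORKER_LINKS: Dict[str, LinkTemplate] = {
    "Hadoop DFS": ("http", HDFS_PORT, "/"),
    "Hadoop DFS metrics": ("http", HDFS_PORT, "/metrics"),
    "Hadoop DFS stacks": ("http", HDFS_PORT, "/stacks"),
    "Hadoop DFS conf": ("http", HDFS_PORT, "/conf"),
}
```

Calling `bind_links("vm12.example.internal", WORKER_LINKS)` would give one URL per link title for that host, which a UI could then list as post-deployment actions.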
  39. 39. WEBAPP LISTS AVAILABLE ROLES VMs with post-deployment actions
  40. 40. HADOOP CLUSTER: MASTER + WORKER Webapp does the binding and cluster config
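Binding a worker to its master amounts to writing the master's hostname into the worker's Hadoop configuration once that hostname is known. The sketch below shows one way this could look and is not the webapp's actual code. `fs.default.name` and `mapred.job.tracker` are the Hadoop property names of that era; the function names and the port numbers are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def worker_site_properties(master_host: str,
                           namenode_port: int = 8020,
                           jobtracker_port: int = 8021) -> Dict[str, str]:
    """Properties pointing a worker at the master's filesystem and job tracker."""
    return {
        "fs.default.name": f"hdfs://{master_host}:{namenode_port}",
        "mapred.job.tracker": f"{master_host}:{jobtracker_port}",
    }


def to_site_xml(properties: Dict[str, str]) -> str:
    """Render properties in Hadoop's <configuration>/<property> XML layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Because the master's hostname is only known after its VM has been allocated, the configuration has to be generated at deployment time rather than baked into the worker image.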
  41. 41. REQUEST COMPLETED
  42. 42. LIST HOSTS IN A ROLE Actions: Add a worker, delete all in role “worker”
  43. 43. LIST ALL HOSTS
  44. 44. VIEW A HOST
  45. 46. GRAPH VIEW
  46. 47. BENEFITS: Infrastructure can do late-binding install/deploy
  47. 48. Feeds into the UI
  48. 49. Web, IDE, build tools, ...
  49. 50. “Role” CPU/network history can aid placement
  50. 51. Other templates: network, aggregate clusters
  51. 52. TODO: VM placement based on (data, user, VM role)
  52. 53. What are the right cloud UIs for biz dev?
  53. 54. Collecting, analysing test runs
  54. 55. Collect, analyse VM histories
  55. 56. Hadoop: lifecycle patches in
  56. 57. Hadoop: make agile against NN & JT
  57. 58. Hadoop: more job monitoring
  58. 59. Q&A
