Two Sides of Google Infrastructure for Everyone Else

A two sided debate about the #GIFEE meme and the art of software adoption in different types of organisations.

  1. The Two Sides Of Google Infrastructure for Everyone Else. Gareth Rushgrove, Puppet
  2. @garethr
  3. Gareth Rushgrove
  4. Introduction: A strange format for a talk
  5. This is a debate
  6. I’ll be debating both sides
  7. Taking opposing viewpoints on the same issue, as a way of exploring it in-depth
  8. The talk is split into two parts: a For part and an Against part
  9. I’d like to explore: - Technical practice evolution - How we adopt software - The organisational context
  10. This house believes…
  11. Successful companies will look like Google in the future, so we should adopt Google-like software and practices today
  12. Important disclaimer: I’ve never worked for Google
  13. For
  14. You’re probably: 1. Struggling with distributed systems 2. Missing out on machine learning 3. Wondering how to scale operations
  15. [Google] have a 10+ year head start
  16. [Google] publish research that influences our industry
  17. MapReduce
  18. Chubby
  19. Borg
  20. [Google] releases (and inspires) software we use
  21. [no text on slide]
  22. Go
  23. from
  24. GFS = HDFS; BigTable = HBase; Protocol Buffers = Thrift or Avro (serialization); Stubby = Thrift or Avro (RPC); ColumnIO = Parquet; Dremel = Impala; Omega = Mesos; Blaze = Pants or Buck; FlumeJava = Crunch; Logsaver = Scribe or Flume; Millwheel = Storm or Samza?; Borgmon/Monarch = Graphite; Dapper = Zipkin (2014, from @avibryant, @joshwills, @skamille, @marius, @wickman)
  25. We have a term for this: #GIFEE
  26. Google Infrastructure for Everyone Else
  27. Distributed systems are hard
  28. Building your own in-house framework is likely a waste of time
  29. From Adrian Colyer, Accel,
  30. Kubernetes is the 3rd generation of Google’s cluster management software
  31. The Kubernetes API provides primitives that make doing the right thing easier
  32. - Orchestration - Logging - Configuration - Self-healing - Storage - Load balancing - Service discovery - Scaling - Batch workloads - Lots more
  33. Exposed via a modern API
  34. Machine learning is going to be massive
  35. “Soon We Won’t Program Computers. We’ll Train Them Like Dogs”
  36. TensorFlow is an open source software library for numerical computation
  37. …
  38. - Nearest neighbour - Linear regression - Recurrent neural networks - Multilayer perceptron - Lots more
  39. Introductory ML docs
  40. “How do I do devops?” (Everyone ever)
  41. [Google] explain how they work too
  42. [no text on slide]
  43. “SRE: Have software engineers do operations” (Dan Luu, ex Google)
  44. Dev SRE Ops (from Matthew Skelton)
  45. The familiar: - Capacity planning - Performance - Change management - Monitoring
  46. The unfamiliar: - Error budget - Strong software engineering skills - 50% operations work cap
  47. A growing ecosystem
  48. Friendly vendors
  49. More friendly vendors
  50. Even more nice vendors
  51. Summing up: For
  52. “Infrastructure” is shifting to a higher level of abstraction
  53. It’s fine to just be a consumer
  54. You should be standing on the shoulders of giants
  55. You should be standing on the shoulders of
  56. Against
  57. Your organisation doesn’t look like Google
  59. Could your organisation look like Google?
  60. How many employees do you have? Google have about 60,000
  61. What proportion of your organisation are software engineers or operations?
  62. 50 percent? Based on the Google annual report, December 2014
  63. How much do you pay software engineers?
  64. Data from Glassdoor, June 2016, based on 14k salaries
  65. The $3 million engineer?
  66. [no text on slide]
  67. Build your own chips?
  68. Could your organisation really look like Google?
  69. “So much of the information in the SRE book makes PERFECT sense if you’re Google” (John Vincent, Ops Hero)
  70. The reality outside Google
  71. <1% of US workers are software engineers or programmers (US Bureau of Labor Statistics 2002: 1,069,000 jobs in a working-age population of 185 million)
  72. Strategic vendor relationships
  73. Different application constraints as well as different organisational constraints
  74. “Goal of SRE team isn’t zero outages – SRE and product devs are incentive aligned to spend the error budget to get maximum feature velocity” (Dan Luu, ex Google)
  75. What if you’re operating an air traffic control system or a nuclear power station? Your goal is probably closer to zero outages
  76. John Vincent SRE review
  77. “bringing a software engineering perspective to a problem isn’t always the best or right solution” (John Vincent, Ops Hero)
  78. Many of Google’s conclusions to operations problems are not unique
  79. [no text on slide]
  80. [no text on slide]
  81. “Innovation happens elsewhere” applies as much to Google as to other organisations
  82. Summing up: Against
  83. “If a human operator needs to touch your system during normal operations, you have a bug. The definition of normal changes as your systems grow” (Carla Geisser, Google SRE)
  84. What is normal for Google may not be suitable for your organisation
  85. “Your startup with a single-purpose application does not have the luxury of having your operations team say ‘I’m sorry you’re over your error budget’” (John Vincent, Ops Hero)
  86. [no text on slide]
  87. Conclusions: If all you take away is…
  88. Who votes… For
  89. Who votes… Against
  90. Who thinks it’s the wrong question?
  91. Context is king
  92. [no text on slide]
  93. “The Overwhelming power of context” (Charity Majors, Ops Person Extraordinaire)
  94. The technology we run, and how we run it, are interlinked
  95. The field of Sociotechnical Systems suggests that all human systems include both a technical system and a social system
  96. Better outcomes are usually obtained by a reciprocal process of joint optimization, through which both the technical system and the social system change
  97. “Containers will not fix your broken culture” (Bridget Kromhout, World’s nicest Ops Person)
  98. “Awesome culture will not fix your broken containers” (Me, paraphrasing Bridget)
  99. We are all collectively evolving the practice of operations
  100. Keep sharing, because it’s a pretty amazing ride
  101. Questions. And thanks for listening
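
The error budget that comes up on slides 46, 74 and 85 is, at its simplest, arithmetic on an availability target. The sketch below is one possible reading and is not taken from the talk: it assumes the target (an SLO) is expressed as a fraction of time available over a fixed window, and the function names, the 30-day window and the 99.9% figure are all illustrative assumptions.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Downtime (in minutes) that an availability target tolerates over a window.

    `slo` is a fraction such as 0.999; the names and the 30-day default
    are assumptions made for this example only.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in the range (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def can_release(slo: float, downtime_minutes: float, window_days: int = 30) -> bool:
    """True while observed downtime is still inside the budget."""
    return downtime_minutes < error_budget_minutes(slo, window_days)


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves about 43.2 minutes of budget.
    print(round(error_budget_minutes(0.999), 1))
    # 10 minutes of downtime so far: budget remains, feature work continues.
    print(can_release(0.999, 10.0))
    # 50 minutes of downtime: the budget is spent.
    print(can_release(0.999, 50.0))
```

Under these assumptions a target of 100% leaves a budget of zero, which is the situation slide 75 describes for air traffic control or a nuclear power station: there is nothing to spend on feature velocity.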