
My talk at LVEE 2016

My talk on Hadoop ops engineering at LVEE 2016

  1. 1. Using the Hadoop stack to build a cloud service for revising VAT declarations ● Alex Chistyakov, Git in Sky ● Grodno, LVEE 2016
  2. 2. Who I am ● Hello, my name is Alex ● Principal Engineer @ Git in Sky ● Hadoop operations engineer ● Former Java developer (not only Java and not so “former” in fact)
  3. 3. Who are you? ● Linux and OSS enthusiasts? ● Software developers? ● DevOps engineers? ● Big data guys?
  4. 4. Well, what is this all about? ● Configuring a Hadoop/HBase cluster is easy
  5. 5. Well, what is this all about? ● Configuring a Hadoop/HBase cluster is easy ● 1) Buy a lot of hardware
  6. 6. Well, what is this all about? ● Configuring a Hadoop/HBase cluster is easy ● 1) Buy a lot of hardware ● 2) Configure the bloody cluster!
  7. 7. Well, what is this all about? ● Configuring a Hadoop/HBase cluster is easy ● 1) Buy a lot of hardware ● 2) Configure the bloody cluster! ● 3) ???
  8. 8. Well, what is this all about? ● Configuring a Hadoop/HBase cluster is easy ● 1) Buy a lot of hardware ● 2) Configure the bloody cluster! ● 3) ??? ● 4) PROFIT!!!
  9. 9. Big Data is hard! ● A customer wants a number of environments for different purposes (dev, testing, staging & production) ● DevOps culture requires repeatability! ● (Observe a beautiful snowflake to the right) ● Business wants to reduce costs
  10. 10. So, we need a detailed plan ● 1) Buy an enterprise subscription from Oracle
  11. 11. So, we need a detailed plan ● 1) Buy an enterprise subscription from Oracle ● ^ FAIL!
  12. 12. So, we need a detailed plan ● 1) Read the manual on the product site
  13. 13. So, we need a detailed plan ● 1) Read the manual on the product site ● 2) Configure everything manually
  14. 14. So, we need a detailed plan ● 1) Read the manual on the product site ● 2) Configure everything manually ● ^ FAIL!
  15. 15. So, we need a detailed plan ● 1) Take Cloudera distribution of Hadoop
  16. 16. So, we need a detailed plan ● 1) Take Cloudera distribution of Hadoop ● 2) Configure everything from a web interface
  17. 17. So, we need a detailed plan ● 1) Take Cloudera distribution of Hadoop ● 2) Configure everything from a web interface ● 3) Don’t forget to buy an enterprise subscription
  18. 18. So, we need a detailed plan ● 1) Take Cloudera distribution of Hadoop ● 2) Configure everything from a web interface ● 3) Don’t forget to buy an enterprise subscription ● 4) ^ MULTIPLE FAILS!!!
  19. 19. A word on proprietary software ● Proprietary software is full of nasty bugs, period
  20. 20. A word on open source software ● Open source software is awesome
  21. 21. Software market in 2016 ● It’s not “proprietary vs open source”
  22. 22. Software market in 2016 ● It’s not “proprietary vs open source” ● It’s “open source vs open source”
  23. 23. Open source vs open source ● Cloudera CDH vs vanilla Apache
  24. 24. So, we need a detailed plan ● 1) Hire a DevOps engineer
  25. 25. So, we need a detailed plan ● 1) Hire a DevOps engineer ● 2) Use Chef or something
  26. 26. So, we need a detailed plan ● 1) Hire a DevOps engineer ● 2) Use Chef or something ● 3) Automate all the things
  27. 27. So, we need a detailed plan ● 1) Hire a DevOps engineer ● 2) Use Chef or something ● 3) Automate all the things ● 4) ???
  28. 28. So, we need a detailed plan ● 1) Hire a DevOps engineer ● 2) Use Chef or something ● 3) Automate all the things ● 4) ??? ● 5) PROFIT!!!
  29. 29. 100 reasons not to use Cloudera CDH ● Cloudera CDH obscures configuration ● Cloudera CDH generates textual configs from the DB ● Cloudera CDH is web-interface centric ● Cloudera CDH is a monolith with vendor lock-in
  30. 30. Our own little open source product ● Based on Ansible (Ansible is like Chef but awesome) ● https://github.com/gitinsky/ansible-hadoop-stack-howto ● https://github.com/gitinsky/ansible-role-*
  31. 31. Problems ● Lack of documentation
  32. 32. Problems ● Lack of documentation ● Lack of manpower
  33. 33. Problems ● Lack of documentation ● Lack of manpower ● Nobody uses our product (except us)
  34. 34. What about the VAT service thing? ● Forget it, it’s not that relevant
  35. 35. Conclusions ● Open source software is awesome ● But Cloudera CDH is not ● We can make open source software better
  36. 36. So long, and thanks for all the fish! ● Ask your questions please ● Alex Chistyakov, Principal Engineer @ Git in Sky ● http://gitinsky.com ● alex@gitinsky.com ● http://meetup.com/DevOps-40
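
Slides 29 and 30 contrast configs generated from a database behind a web interface with automation that keeps configuration as plain text. As a minimal sketch of that idea (not code taken from the ansible-hadoop-stack-howto repository), the Python below renders a Hadoop-style hdfs-site.xml from one plain dictionary per environment. The environment names come from slide 9; the function name render_site_xml, the replication values, and the directory path are assumptions made purely for illustration.

```python
from xml.sax.saxutils import escape


def render_site_xml(properties):
    """Render a Hadoop-style *-site.xml document from a dict of properties.

    Keys are sorted so the same input always yields the same text, which
    keeps the generated file repeatable and easy to diff in version control.
    """
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name in sorted(properties):
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(str(properties[name]))}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


# Hypothetical per-environment settings; values and paths are illustrative only.
ENVIRONMENTS = {
    "dev": {"dfs.replication": 1, "dfs.namenode.name.dir": "/data/dev/namenode"},
    "production": {"dfs.replication": 3, "dfs.namenode.name.dir": "/data/prod/namenode"},
}

if __name__ == "__main__":
    for env_name, props in ENVIRONMENTS.items():
        print(f"# hdfs-site.xml for {env_name}")
        print(render_site_xml(props))
```

Because the environments differ only in data, the same rendering step can serve dev, testing, staging and production, and every change to a setting shows up as a readable text diff rather than a row in a database.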
