Using the Hadoop stack to build a cloud service for revising VAT declarations
Alex Chistyakov
Git in Sky
Grodno, LVEE 2016
Who I am
● Hello, my name is Alex
● Principal Engineer @ Git in Sky
● Hadoop operations engineer
● Former Java developer (not only Java, and not so “former”, in fact)
Who are you?
● Linux and OSS enthusiasts?
● Software developers?
● DevOps engineers?
● Big data guys?
Well, what is this all about?
● Configuring a Hadoop/HBase cluster is easy
● 1) Buy a lot of hardware
● 2) Configure the bloody cluster!
● 3) ???
● 4) PROFIT!!!
Big Data is hard!
● A customer wants a number of environments for different purposes (dev, testing, staging & production)
● DevOps culture requires repeatability!
● (Observe a beautiful snowflake server: unique, hand-configured and impossible to reproduce)
● Business wants to reduce costs
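Repeatability across dev, testing, staging and production largely comes down to generating every environment's configuration from one version-controlled source instead of editing files by hand on each box. A minimal Python sketch of that idea, assuming a hypothetical COMMON/ENVIRONMENTS layout and made-up hostnames; fs.defaultFS and hadoop.tmp.dir are real core-site.xml keys, everything else is illustrative and not taken from the talk's tooling:

```python
import xml.etree.ElementTree as ET

# Hypothetical shared settings; the keys are real core-site.xml
# properties, the values are invented for illustration.
COMMON = {
    "hadoop.tmp.dir": "/var/lib/hadoop/tmp",
}

# Hypothetical per-environment overrides (made-up NameNode hosts).
ENVIRONMENTS = {
    "dev": {"fs.defaultFS": "hdfs://dev-nn:8020", "hadoop.tmp.dir": "/tmp/hadoop"},
    "testing": {"fs.defaultFS": "hdfs://testing-nn:8020"},
    "staging": {"fs.defaultFS": "hdfs://staging-nn:8020"},
    "production": {"fs.defaultFS": "hdfs://prod-nn:8020"},
}


def config_for(env: str) -> dict[str, str]:
    """Merge common settings with environment-specific overrides."""
    return {**COMMON, **ENVIRONMENTS[env]}


def render_site_xml(properties: dict[str, str]) -> str:
    """Render a Hadoop-style *-site.xml document from a flat dict."""
    root = ET.Element("configuration")
    # Sort keys so the same input always yields byte-identical output,
    # which keeps diffs between runs and environments meaningful.
    for name in sorted(properties):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = properties[name]
    ET.indent(root)  # Pretty-print; requires Python 3.9+.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for env in ENVIRONMENTS:
        print(f"--- {env}: core-site.xml ---")
        print(render_site_xml(config_for(env)))
```

Because the output is a pure function of the input data, regenerating a config for any environment always produces the same file, which is exactly what a hand-edited snowflake cannot promise.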
So, we need a detailed plan
● 1) Buy an enterprise subscription from Oracle
● ^ FAIL!
So, we need a detailed plan
● 1) Read the manual on the product site
● 2) Configure everything manually
● ^ FAIL!
So, we need a detailed plan
● 1) Take the Cloudera distribution of Hadoop
● 2) Configure everything from a web interface
● 3) Don’t forget to buy an enterprise subscription
● 4) ^ MULTIPLE FAILS!!!
A word on proprietary software
● Proprietary software is full of nasty bugs, period
A word on open source software
● Open source software is awesome
Software market in 2016
● It’s not “proprietary vs open source”
● It’s “open source vs open source”
Open source vs open source
● Cloudera CDH vs vanilla Apache Hadoop
So, we need a detailed plan
● 1) Hire a DevOps engineer
● 2) Use Chef or something
● 3) Automate all the things
● 4) ???
● 5) PROFIT!!!
100 reasons not to use Cloudera CDH
● Cloudera CDH obscures configuration
● Cloudera CDH generates textual configs from its database
● Cloudera CDH is web-interface-centric
● Cloudera CDH is a monolith with vendor lock-in
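One practical upside of keeping Hadoop configs as plain *-site.xml text files (rather than generating them from rows in a management database) is that they can be reviewed and diffed like any other code. A minimal sketch under that assumption; parse_site_xml and diff_configs are hypothetical helpers, and the hostnames and values are invented:

```python
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml contents for two environments.
STAGING_XML = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://staging-nn:8020</value></property>
  <property><name>hadoop.tmp.dir</name><value>/var/lib/hadoop/tmp</value></property>
</configuration>"""

PRODUCTION_XML = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://prod-nn:8020</value></property>
  <property><name>hadoop.tmp.dir</name><value>/var/lib/hadoop/tmp</value></property>
  <property><name>io.file.buffer.size</name><value>131072</value></property>
</configuration>"""


def parse_site_xml(text: str) -> dict[str, str]:
    """Parse a Hadoop-style *-site.xml document into {name: value}."""
    root = ET.fromstring(text)
    result = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        result[name] = value
    return result


def diff_configs(left: dict[str, str], right: dict[str, str]) -> list[str]:
    """List properties removed (-), added (+) or changed (~) from left to right."""
    changes = []
    for name in sorted(left.keys() | right.keys()):
        if name not in right:
            changes.append(f"- {name} = {left[name]}")
        elif name not in left:
            changes.append(f"+ {name} = {right[name]}")
        elif left[name] != right[name]:
            changes.append(f"~ {name}: {left[name]} -> {right[name]}")
    return changes


if __name__ == "__main__":
    staging = parse_site_xml(STAGING_XML)
    production = parse_site_xml(PRODUCTION_XML)
    for line in diff_configs(staging, production):
        print(line)
```

Running it reports one changed value (fs.defaultFS) and one property that exists only in production (io.file.buffer.size). With text configs in Git, plain `git diff` gives you much of this for free.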
Our own little open source product
● Based on Ansible (Ansible is like Chef but awesome)
● https://github.com/gitinsky/ansible-hadoop-stack-howto
● https://github.com/gitinsky/ansible-role-*
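Ansible modules are designed to be idempotent: applying the same playbook twice should report "changed" the first time and "ok" the second. The hypothetical ensure_file function below sketches that principle in Python for a single config file; it is an illustration of the idea, not how Ansible or the gitinsky roles are actually implemented:

```python
import os
import tempfile
from pathlib import Path


def ensure_file(path: Path, content: str) -> bool:
    """Make `path` contain exactly `content` (hypothetical helper).

    Returns True if the file was changed and False if it was already in the
    desired state, mirroring Ansible's "changed" vs "ok" distinction.
    """
    desired = content.encode("utf-8")
    if path.exists() and path.read_bytes() == desired:
        return False  # Already in the desired state: touch nothing.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then swap it in,
    # so readers never observe a half-written config file.
    # Note: mkstemp creates the file with mode 0o600.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(desired)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)  # Clean up the temp file on any failure.
        raise
    return True


if __name__ == "__main__":
    target = Path(tempfile.gettempdir()) / "demo-hadoop-conf" / "core-site.xml"
    body = "<configuration/>\n"
    print("changed" if ensure_file(target, body) else "ok")
    print("changed" if ensure_file(target, body) else "ok")  # Second run: "ok".
```

Writing to a temporary file and calling os.replace keeps the update atomic on POSIX filesystems; a real tool would also manage ownership and permissions, which this sketch deliberately skips.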
Problems
● Lack of documentation
● Lack of manpower
● Nobody uses our product (except us)
What about the VAT service thing?
● Forget it, it’s not that relevant
Conclusions
● Open source software is awesome
● But Cloudera CDH is not
● We can make open source software better
So long, and thanks for all the fish!
● Ask your questions please
● Alex Chistyakov, Principal Engineer @ Git in Sky
● http://gitinsky.com
● alex@gitinsky.com
● http://meetup.com/DevOps-40

My talk at LVEE 2016
