
On Demand HDP Clusters using Cloudbreak and Ambari

  1. On-Demand HDP Clusters using Cloudbreak and Ambari
     Karthik Karuppaiya, Sr. Engineering Manager, CPE
     Narendra Bidari, Sr. Software Engineer, CPE
     Dublin Hadoop Summit 2016 – Karthik Karuppaiya & Narendra Bidari
  2. Agenda
     1. Introduction
     2. Big Data Platform Challenges
     3. What is the solution?
     4. Self Service Analytics
     5. Going Hybrid Cloud using Cloudbreak
     6. Ingesting Data
  3. Introduction
     • Symantec
       - Symantec is the world leader in providing security software for both enterprises and end users
       - Thousands of enterprises and more than 400 million devices (PCs, tablets and phones) rely on Symantec to help secure their assets from attacks, including their data centers, emails and other sensitive data
     • Cloud Platform Engineering (CPE)
       - Builds consolidated cloud infrastructure and platform services for next-generation, data-powered Symantec applications
       - A big data platform for batch and stream analytics integrated with OpenStack
       - Open source components as building blocks: Hadoop and OpenStack
       - Bridge feature gaps and contribute back
  5. Big Data Platform Challenge
     • Hundreds of millions of users generating billions of events every day from across the globe
     • Hundreds of big data application developers developing thousands of applications
     • At 12 PB and 500+ nodes, the Cloud Platform Engineering Analytics team built the largest security data lake at Symantec
  6. Big Data Platform Challenge
     • Great! Now developers can start building applications on our big data lake
     • Hundreds of developers start building applications using different big data tools
  7. Big Data Platform Challenge
     • Product team developers want quick changes and the latest versions
     • The platform team wants stability!
     • Soon, frustration prevails
  9. What is the Solution?
     • Build and use your own little cluster for development
     • Copy a subset of the data for development purposes
     • Tear down the cluster after development is complete
     • Rinse and repeat
  10. What is the Solution?
     • But building clusters is hard and time consuming
     • Too many services to install and configure
     • Developers are not interested in building and managing clusters
  11. What is the Solution? – Self Service
     • What if we make it really easy to build clusters?
     • Abstract away all the deployment complexity and let developers get their own cluster with one click of a button
  12. What is the Solution? – Extend Ambari
     • What about the services that Ambari does not support out of the box?
     • We write our own custom Ambari stack
  14. Self Service Analytics (SSA) Clusters
     • RESTful web services to allow creation and management of custom clusters
     • Select from pre-defined Ambari Blueprints
     • Spins up VMs on our private OpenStack cloud
     • Installs the HDP stack specified as part of the Ambari blueprint
     • Dashing dashboard to monitor and manage (start/stop/kill) clusters
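
To make the blueprint step on slide 14 concrete, here is a minimal Python sketch of the two JSON documents Ambari expects: a blueprint (which components go in which host group) and a cluster creation template (which hosts fill each group). In Ambari's REST API these are normally POSTed to `/api/v1/blueprints/<name>` and `/api/v1/clusters/<name>`. The blueprint name `ssa-small`, the host names, the password, and the trimmed component lists are assumptions for illustration; they are not taken from the SSA code, and a real blueprint needs a fuller component list.

```python
import json


def make_blueprint(name, stack_version, host_groups):
    """Build an Ambari blueprint document.

    host_groups maps a group name to a (cardinality, [component names]) pair.
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                "name": group,
                "cardinality": str(cardinality),
                "components": [{"name": c} for c in components],
            }
            for group, (cardinality, components) in host_groups.items()
        ],
    }


def make_cluster_template(blueprint_name, hosts_by_group, default_password):
    """Build the cluster creation template that maps real hosts to host groups."""
    return {
        "blueprint": blueprint_name,
        "default_password": default_password,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": h} for h in hosts]}
            for group, hosts in hosts_by_group.items()
        ],
    }


if __name__ == "__main__":
    # Hypothetical example values; component lists are deliberately trimmed.
    blueprint = make_blueprint(
        "ssa-small",
        "2.3",
        {
            "master": (1, ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"]),
            "worker": (3, ["DATANODE", "NODEMANAGER"]),
        },
    )
    template = make_cluster_template(
        "ssa-small",
        {
            "master": ["master-1.example.com"],
            "worker": ["worker-%d.example.com" % i for i in range(1, 4)],
        },
        default_password="changeme",
    )
    print(json.dumps(blueprint, indent=2))
    print(json.dumps(template, indent=2))
```

A self-service layer like the one described would presumably pick one of a few such pre-defined blueprints and fill in the host list from the VMs it has just created.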
  15. Environment
     • Private cloud on OpenStack
     • HDP 2.3.2
     • Ambari 2.1.2
     • Ability to seamlessly support public clouds like AWS
  16. SSA Architecture
  17. SSA Services
  18. SSA Demo
  19. SSA Pros and Cons
     • Pros
       - Automates spinning up clusters
       - Gets a cluster up and running with HDP in minutes
       - Users can spin up/kill clusters at will
       - Central dashboard to manage clusters
       - Customizable to cater to our private cloud
     • Cons
       - Tightly coupled with our specific private cloud infrastructure
       - Not portable to other public cloud vendors
  21. Next Gen SSA
     • This is all great! But we are out of capacity on our private OpenStack cloud.
     • Just use the same software to set up clusters on AWS and Google Cloud. The same code should work, right?
  22. Next Gen SSA – Cloudbreak
     • Cloudbreak
       - Cloudbreak helps to simplify the provisioning of HDP clusters in cloud environments
       - Supports multiple clouds including AWS, Google, Azure and OpenStack
       - Uses Apache Ambari for HDP installation and management
       - Has a nice UI to build and manage clusters
       - Supports automated cluster scaling
  23. Hybrid Cloud Using Cloudbreak - Gaps
     • Cloudbreak's out-of-the-box support for OpenStack is limited
       - No support for Keystone v3, which our private cloud uses
       - No support for native OpenStack APIs
       - Supports only Heat templates for cluster provisioning
     • Now we have two different systems: one for the private OpenStack cloud and one for AWS
  24. Symantec's Contribution to Cloudbreak
     • Keystone v3 support
       - Cloudbreak 1.2, released 03/2016
     • Native OpenStack API support (without using Heat templates)
       - Code development is done; it will be contributed back soon
  25. Cloudbreak – Keystone V3 Screenshot
  26. Cloudbreak – Keystone V3 Project Scope Screenshot
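
For readers unfamiliar with why Keystone v3 needed dedicated support, the sketch below contrasts the request bodies of the two Identity API versions. A v2.0 client posts credentials and a tenant name to `/v2.0/tokens`; a v3 client posts to `/v3/auth/tokens` and must also name the domains of the user and of the project it scopes to. The Python function names and the example user, project and domain values are assumptions for illustration only; this is not Cloudbreak's code.

```python
def keystone_v2_auth(username, password, tenant):
    """Request body for POST /v2.0/tokens (tenant-scoped, no domains)."""
    return {
        "auth": {
            "passwordCredentials": {"username": username, "password": password},
            "tenantName": tenant,
        }
    }


def keystone_v3_auth(username, password, user_domain, project, project_domain):
    """Request body for POST /v3/auth/tokens with project scope.

    On success the token is returned in the X-Subject-Token response header.
    """
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": username,
                        "domain": {"name": user_domain},
                        "password": password,
                    }
                },
            },
            "scope": {
                "project": {
                    "name": project,
                    "domain": {"name": project_domain},
                }
            },
        }
    }


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    body = keystone_v3_auth("ssa-user", "secret", "Default", "analytics", "Default")
    print(body["auth"]["scope"])
```

The extra domain fields are the reason a v2-only client cannot simply be pointed at a v3 endpoint: the credential form has to collect more information, which is presumably what the project scope screenshot on slide 26 shows.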
  28. Where is my data?
     • Great! I now have a shiny new cluster, but what do I do with no data?
  29. Data Ingestion Service
     • Built as a Storm-Trident topology
     • Supports various sources like Kafka, RabbitMQ, HDFS, Cassandra, etc.
     • Pluggable interface to add more sources
     • Samples the data, for testing purposes
     • Generalized template provides the ability to add data transformations
     • Ability to monitor data transfer and control the rate of transfer
     • Working toward exactly-once message delivery
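
The ingestion features on slide 29 (pluggable sources, sampling, an optional transformation, and rate control) can be pictured with a small plain-Python sketch. This is not the Storm-Trident topology the talk describes and does not attempt exactly-once delivery; every name here (`register_source`, `sample`, `throttle`, `ingest`, the `memory` source) is hypothetical and chosen only to show how the pieces could fit together.

```python
import random
import time

# Registry of source factories, keyed by name (the "pluggable interface").
SOURCES = {}


def register_source(name):
    """Decorator that registers a source factory under the given name."""
    def decorator(factory):
        SOURCES[name] = factory
        return factory
    return decorator


@register_source("memory")
def memory_source(items):
    """Trivial in-memory source standing in for Kafka, HDFS, and so on."""
    return iter(items)


def sample(records, fraction, seed=0):
    """Yield roughly `fraction` of the records, reproducibly for a given seed."""
    rng = random.Random(seed)
    for record in records:
        if rng.random() < fraction:
            yield record


def throttle(records, max_per_second, sleep=time.sleep, clock=time.monotonic):
    """Yield records no faster than `max_per_second`."""
    min_interval = 1.0 / max_per_second
    next_allowed = clock()
    for record in records:
        now = clock()
        if now < next_allowed:
            sleep(next_allowed - now)
        next_allowed = max(now, next_allowed) + min_interval
        yield record


def ingest(source_name, source_args, sink, fraction=1.0, transform=None,
           max_per_second=None, seed=0):
    """Read from a registered source, sample, transform, and write to `sink`.

    Returns the number of records delivered to the sink.
    """
    records = SOURCES[source_name](**source_args)
    records = sample(records, fraction, seed)
    if max_per_second is not None:
        records = throttle(records, max_per_second)
    delivered = 0
    for record in records:
        sink(transform(record) if transform else record)
        delivered += 1
    return delivered


if __name__ == "__main__":
    out = []
    events = [{"id": i} for i in range(10)]
    n = ingest("memory", {"items": events}, out.append, fraction=0.5,
               transform=lambda r: dict(r, sampled=True))
    print(n, "of", len(events), "records ingested")
```

Adding a new source in this sketch means registering one more factory, which mirrors the pluggable-interface idea without changing the ingest loop.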
  30. Data Ingestion Service Architecture
  31. Data Ingestion Service Demo
  32. Summary and Future Work
     • A journey towards one-click cluster deployment
     • Cloudbreak: one tool for all clouds
       - Enable Cloudbreak to support our version of OpenStack
       - Enable Cloudbreak to support bare-metal cluster provisioning
       - Single large YARN cluster for a variety of compute and storage loads
     • Open source: use and contribute
       - Work with the community to address gaps
     • SSA and Data Ingestion code already open-sourced
       - https://github.com/symantec/
  33. Thank You! Q & A
     Karthik Karuppaiya, karthik_karuppaiya@symantec.com
     Narendra Bidari, narendra_bidari@symantec.com
