
Provisioning Big Data Platform using Cloudbreak & Ambari


  1. Provisioning Big Data Platform using Cloudbreak & Ambari. Karthik Karuppaiya, Sr. Engineering Manager, CPE, and Vivek Madani, Sr. Principal Software Engineer, CPE. San Jose Hadoop Summit 2016
  2. Agenda: 1. Introduction 2. Big Data Platform Challenges 3. What is the Solution? 4. Self Service Analytics Platform Provisioning 5. Going Hybrid Cloud using Cloudbreak 6. Monitoring & Alerting
  3. Introduction
     • Symantec: the world leader in security software for both enterprises and end users. Thousands of enterprises and more than 400 million devices (PCs, tablets, and phones) rely on Symantec to help secure their assets from attacks, including their data centers, email, and other sensitive data.
     • Cloud Platform Engineering (CPE): builds consolidated cloud infrastructure and platform services for next-generation, data-powered Symantec applications. A big data platform for batch and stream analytics, integrated with both private and public clouds, built from open-source components. Bridges feature gaps and contributes back.
  4. Agenda: 1. Introduction 2. Big Data Platform Challenges 3. What is the Solution? 4. Self Service Analytics Platform Provisioning 5. Going Hybrid Cloud using Cloudbreak 6. Monitoring & Alerting
  5. Big Data Platform Challenge
     • Hundreds of millions of users generating billions of events every day from across the globe
     • Hundreds of big data application developers building thousands of applications
     • At 12 PB and 500+ nodes, the Cloud Platform Engineering Analytics team built the largest security data lake at Symantec
     • Elasticity is built into the platform to optimize costs in the cloud
  6. Big Data Platform Challenge
     • Great! Now developers can start building applications on our big data lake
     • Hundreds of developers start building applications using different big data tools
  7. Big Data Platform Challenge
     • Product team developers want quick changes and the latest versions
     • The platform team wants stability!
     • Soon, frustration prevails
  8. Agenda: 1. Introduction 2. Big Data Platform Challenges 3. What is the Solution? 4. Self Service Analytics Platform Provisioning 5. Going Hybrid Cloud using Cloudbreak 6. Monitoring & Alerting
  9. What is the Solution?
     • Build and use your own small cluster for development
     • Copy a subset of data for development purposes
     • Build elasticity into the platform for cost optimization
     • Tear down the cluster after development is complete
     • Rinse and repeat
  10. What is the Solution?
     • But building clusters is hard and time consuming
     • Too many services to install and configure
     • Developers are not interested in building and managing clusters
  11. What is the Solution? – Self Service
     • What if we make it really easy to build clusters?
     • Abstract away all the deployment complexity and let developers get their own cluster with one click of a button
     • Use the same blueprint for both dev and prod clusters
  12. Agenda: 1. Introduction 2. Big Data Platform Challenges 3. What is the Solution? 4. Self Service Analytics Platform Provisioning 5. Going Hybrid Cloud using Cloudbreak 6. Monitoring & Alerting
  13. Self Service Analytics (SSA) Clusters
     • RESTful web services to allow creation and management of custom clusters
     • Select from pre-defined Ambari Blueprints
     • Can provision infrastructure on OpenStack as well as AWS
     • Installs the HDP stack specified in the Ambari blueprint
     • Dashing dashboard to monitor and manage (start/stop/kill) clusters
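The blueprint-driven flow above can be sketched in a few lines. The host-group layout, component list, and the `dev-minimal` name below are illustrative stand-ins, not the actual SSA blueprints:

```python
import json

def make_blueprint(name, stack_version="2.4"):
    """Build a minimal Ambari blueprint: one master and one worker host group.

    The component layout here is illustrative; real SSA blueprints carry the
    full HDP service list plus configuration overrides.
    """
    return {
        "Blueprints": {"blueprint_name": name,
                       "stack_name": "HDP",
                       "stack_version": stack_version},
        "host_groups": [
            {"name": "master", "cardinality": "1",
             "components": [{"name": "NAMENODE"},
                            {"name": "RESOURCEMANAGER"},
                            {"name": "ZOOKEEPER_SERVER"}]},
            {"name": "worker", "cardinality": "3",
             "components": [{"name": "DATANODE"},
                            {"name": "NODEMANAGER"}]},
        ],
    }

# Registering the blueprint with Ambari is one REST call, e.g.:
#   POST http://<ambari-host>:8080/api/v1/blueprints/dev-minimal
# with the JSON above as the request body.
bp = make_blueprint("dev-minimal")
print(json.dumps(bp["Blueprints"], sort_keys=True))
```

A provisioning service like SSA then only has to pair a stored blueprint with a host mapping to turn a one-click request into a full cluster install.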
  14. Environment
     • Private cloud on OpenStack (Kilo, no Heat)
     • Public cloud on AWS
     • HDP 2.3.2 & 2.4.2
     • Ambari 2.1.2 & 2.2
  15. SSA Architecture
  16. SSA Services
  17. SSA Demo
  18. Ambari Custom Services
     • What about the services that are not supported by Ambari out of the box?
     • We write our own Ambari custom stack
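A custom Ambari stack entry is essentially a service directory containing a `metainfo.xml` and command scripts. The fragment below is a minimal, hypothetical definition (the `MYSERVICE` name and script path are placeholders, not one of the actual Symantec services); Ambari discovers it under the stack's services directory:

```xml
<?xml version="1.0"?>
<metainfo>
  <schemaVersion>2.0</schemaVersion>
  <services>
    <service>
      <name>MYSERVICE</name>
      <displayName>My Service</displayName>
      <version>1.0.0</version>
      <components>
        <component>
          <name>MYSERVICE_MASTER</name>
          <category>MASTER</category>
          <cardinality>1</cardinality>
          <commandScript>
            <!-- Python script implementing install/start/stop/status -->
            <script>scripts/master.py</script>
            <scriptType>PYTHON</scriptType>
          </commandScript>
        </component>
      </components>
    </service>
  </services>
</metainfo>
```

Once such a service is defined, it can be referenced in blueprints like any built-in HDP component.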
  19. Agenda: 1. Introduction 2. Big Data Platform Challenges 3. What is the Solution? 4. Self Service Analytics Platform Provisioning 5. Going Hybrid Cloud using Cloudbreak 6. Monitoring & Alerting
  20. Next Gen SSA
     • This is all great! But it is a lot of work to add more cloud providers
     • It takes a lot of effort to understand each cloud provider's APIs
  21. Next Gen SSA – Cloudbreak
     • Cloudbreak simplifies the provisioning of HDP clusters in cloud environments
     • Supports multiple clouds, including AWS, Google, Azure, and OpenStack
     • Uses Apache Ambari for HDP installation and management
     • Has a nice UI to build and manage clusters
     • Supports automated cluster scaling
  22. AWS Cluster Architecture
     • Symantec datacenter connects to a private subnet in AWS over Direct Connect (10 Gbps), carrying the data and telemetry ingestion pipes
     • The datacenter hosts HDP on bare metal and OpenStack
     • The AWS side uses d3.* and r3.* flavors, LUKS-encrypted volumes, non-EBS root volumes, non-Dockerized HDP, a custom AMI, and enhanced networking
  23. Cloudbreak Demo
  24. Hybrid Cloud Using Cloudbreak – Customization & Contribution
     • Non-Dockerized HDP installation
     • Support for Keystone v3 for OpenStack (Cloudbreak 1.2, released 03/2016)
     • Support for custom AMIs: we have our own hardened images with enhanced networking, volume encryption, etc.
     • Support for non-EBS-backed root volumes
     • Deploy into an existing private VPC/subnet
     • Additional AWS instance flavors: we use r3.* and d3.*, which are not supported by Cloudbreak out of the box
     • We build our own Cloudbreak package from trunk
  25. Cloudbreak – Keystone V3 Screenshot
  26. Cloudbreak – Keystone V3 Project Scope Screenshot
  27. Custom AMI Support
     • Org security mandates using specific hardened AMIs only
     • Created our own hardened image with the software and configuration required by Cloudbreak
     • Allows us to use features like volume encryption and enhanced networking, non-EBS volumes, Symantec-specific configuration (LDAP, repos, DNS, etc.), and the Symantec standard for hostnames
     • Use JDK 1.8 instead of the Java 7 that comes with the Cloudbreak AMI
     • AMI mapping lives in /cloud-aws/src/main/resources/aws-images.yml
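The hardened images are wired in through the `aws-images.yml` file referenced on this slide, which maps regions to AMI IDs. A sketch of the idea with placeholder AMI IDs; the exact key layout varies between Cloudbreak versions:

```yaml
# Region-to-AMI mapping consumed by the cloud-aws module.
# The IDs below are placeholders, not real hardened images.
aws:
  us-east-1: ami-0000aaaa
  us-west-2: ami-0000bbbb
  eu-west-1: ami-0000cccc
```

Rebuilding Cloudbreak with this file pointed at the hardened images is what lets every provisioned node start from the security-approved base.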
  28. Non-Dockerized HDP Support
     Why?
     • No experience running production clusters under Docker
     • Unknowns in the upgrade path for HDP components
     • Encrypted disk volumes had issues working with Docker
     What?
     • Worked with the Cloudbreak team to test a non-Dockerized version of Cloudbreak
     • Provided feedback from our test deployment of the non-Dockerized version
     • The feature is now available in the master branch
  29. Non-EBS Backed Root Volume
     • Changes to the AWS CloudFormation template used by Cloudbreak
     • We use ephemeral storage for root volumes for availability reasons
     • Will contribute this back to Cloudbreak as an option
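Concretely, this kind of change means editing the block-device section of the CloudFormation template that Cloudbreak submits so that instance-store (ephemeral) volumes are mapped in. The fragment below is a sketch with illustrative device names; the root device itself is ultimately dictated by using an instance-store-backed AMI rather than by the template alone:

```json
"BlockDeviceMappings": [
  { "DeviceName": "/dev/sdb", "VirtualName": "ephemeral0" },
  { "DeviceName": "/dev/sdc", "VirtualName": "ephemeral1" }
]
```

Because ephemeral storage does not survive instance stop/termination, this trade favors availability and cost over durability, which suits rebuildable worker nodes.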
  30. Cloudbreak Contribution – In Progress
     • Placement groups
     • Multiple security groups attached to one cluster
     • Multiple-subnet deployment inside a VPC
     • Support for non-EBS root volumes
  31. Agenda: 1. Introduction 2. Big Data Platform Challenges 3. What is the Solution? 4. Self Service Analytics Platform Provisioning 5. Going Hybrid Cloud using Cloudbreak 6. Monitoring & Alerting
  32. Monitoring & Alerting
     Now that we have delivered an elephant, the next question from users is: how is its health?
  33. Monitoring and Alerting
     • Comprehensive dashboards for all environments managed by the platform team
     • Extensive use of Ambari Alerts
     • QueryX: a custom framework to fill the gaps in Ambari Alerts
     • All alerts are sent to an OpenTSDB + Grafana stack
     • Critical alerts go to PagerDuty
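A framework in the role of QueryX ends up writing datapoints to OpenTSDB's HTTP `/api/put` endpoint. The sketch below shows the shape of that call; the metric name, tags, and OpenTSDB host are all illustrative placeholders, not the actual Symantec metrics:

```python
import json
import time
import urllib.request

def make_datapoint(metric, value, **tags):
    """Build one OpenTSDB datapoint in the JSON shape /api/put expects."""
    return {"metric": metric,
            "timestamp": int(time.time()),
            "value": value,
            "tags": tags}

def post_datapoint(point, tsdb_url="http://opentsdb.example.com:4242"):
    """POST a datapoint to OpenTSDB; the URL here is a placeholder."""
    req = urllib.request.Request(
        tsdb_url + "/api/put",
        data=json.dumps(point).encode(),
        headers={"Content-Type": "application/json"})
    # Network call intentionally not made in this sketch:
    return req

point = make_datapoint("hdfs.capacity.used.pct", 72.5,
                       cluster="dev1", host="dn01")
print(point["metric"])
```

Grafana then queries OpenTSDB directly, so anything written this way shows up on the dashboards without extra plumbing.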
  34. Monitoring and Alerting
     • Ambari Metrics Collector + QueryX call the Ambari Metrics API on each cluster (Cluster 1, Cluster 2, Cluster 3, …) and feed OpenTSDB, which backs Grafana
  35. Grafana Dashboards
  36. Grafana Dashboards
  37. Ambari Alerts
  38. Ambari Alerts
  39. Summary and Future Work
     • A journey towards one-click cluster deployment
     • Cloudbreak: one tool for all clouds
       - Contribute back the features developed in-house
       - Enable Cloudbreak to support bare-metal cluster provisioning
       - Auto-scaling using Cloudbreak and Periscope
       - A single large YARN cluster for a variety of compute and storage loads
     • Open source: use and contribute
       - Work with the community to address gaps
     • SSA code is already open-sourced: https://github.com/symantec/
  40. Thank You! Q & A
     Karthik Karuppaiya – karthik_karuppaiya@symantec.com
     Vivek Madani – vivek_madani@symantec.com
