Managing A Multi-Tenant Data Lake

Published in: Technology
  1. Managing A Multi-Tenant Data Lake
  2. Agenda
     • Timeline for Evolution
     • Why Governance
     • Multi-Tenancy Anti-Patterns / Warning Signs
     • Instituting Governance
     • Managing through Chaos Monitoring/Metrics
     • Environment Tools
     • SLA Management
     • Support and Staffing
     • Demo - Command Center
     Copyright 2016 Comcast Corporation. All Rights Reserved
  3. Timeline – 2013
     • 2013 – “The Experiment”
     • Started with 10 node cluster
     • Experimentation with batch processing and enrichment of event data
     • Team assembled from across organization
     • Primarily solving single use case
     • 30 nodes by end of trial
     2 Racks
  4. Timeline – 2014 (H1)
     • 2014 Production “Honeymoon”
     • Added 70 more nodes along with lower environments (Dev & QA)
     • Onboard additional ~20 data sets through batch ETL
     • Supporting a dozen use cases
     5 Racks
  5. Timeline – 2014 (H2)
     • 2014 Production “Tiger’s Tail”
     • Total of 200 nodes to support additional use cases (data science)
     • Total of ~30 more data sets through batch ETL
     • Supporting several dozen use cases and ad-hoc exploration
     • Starting to have difficulty managing resource requests
     9 Racks
  6. Timeline – 2015
     • 2015 Production “Cortez”
     • Adding 250 more nodes to production environment
     • Fully embraced governance
     • Supporting 24x7 production use cases
     19 Racks
  7. Timeline – 2016
     • 2016 Production “Planetary”
     • Adding 1300 more nodes to production environment
     • Standing up separate 500 node data science cluster
     • Spinning off critical compute to boundary satellite clusters
     • Reaping benefits from governance and resource planning
     48 Racks
  8. Why Governance?
     • It’s about establishing acceptable behaviors for the benefit of the community
     • Minimize user/application impact on cluster
     • Users will do whatever is technically possible
     • Everyone has been conditioned to work “smarter not harder”
     • Establishing the guardrails not edicts.
  9. Multi-Tenancy Anti-Patterns
     • Speculative Execution
     • Optional User Training
     • Lack of Resource Isolation
     • Lack of Testing and Measurement
     • Ad-hoc Communication Channels
     • Excessive Resource Utilization/Reservation
     • Informal Service Level Agreements (SLAs)
     Image: Public Domain, Plynn9
  10. Signs of Looming Disaster
     • Pending Jobs
     • Queue Fidgeting
     • Job Rescheduling
     • Non Predictive Workloads
     • Cluster Storage Out Of Balance
     Image: Public Domain, US DOE
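
Two of the warning signs on slide 10, pending jobs and cluster storage out of balance, lend themselves to simple automated checks. The following is a minimal sketch, not the presenters' tooling: the function names, the sample data shape, and the 10-point threshold are assumptions chosen for illustration, and the balance check uses a plain mean of per-node percentages rather than capacity-weighted cluster utilization.

```python
from statistics import mean


def out_of_balance_nodes(used_pct_by_node, threshold_pct=10.0):
    """Return nodes whose DFS-used percentage is more than
    threshold_pct points away from the mean across all nodes.

    used_pct_by_node is a hypothetical mapping of node name to the
    percentage of its storage in use.
    """
    cluster_avg = mean(used_pct_by_node.values())
    return sorted(
        node
        for node, pct in used_pct_by_node.items()
        if abs(pct - cluster_avg) > threshold_pct
    )


def pending_is_growing(pending_samples, window=3):
    """Return True when the pending-application count rose strictly
    across each of the last `window` samples."""
    recent = pending_samples[-window:]
    if len(recent) < window:
        return False
    return all(a < b for a, b in zip(recent, recent[1:]))


if __name__ == "__main__":
    usage = {"dn1": 40.0, "dn2": 45.0, "dn3": 80.0}
    print(out_of_balance_nodes(usage))
    print(pending_is_growing([2, 2, 3, 5, 9]))
```

A steadily rising pending count is treated here as a leading indicator; a real deployment would likely smooth the samples before alerting.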
  11. Instituting Governance
     • Governance is not a technology problem
     • Governance must be solved using
        • People – Who
        • Processes – What / When / How
        • Policy – Why
     • Always employ technology to help with enforcement and measurement
  12. Setting Out Governance Standards – Starting Out
     • Involve the business users to define light-weight policies and processes
        • Onboarding users/applications/tools
        • Resource Utilization Worksheets
        • Deployment checklists
        • Service Level Agreements / Penalties
        • Updates of Governance Standards
     • You MUST socialize and educate your community on these policies and processes
     • Strive for evolution not revolution
  13. Setting Out Governance Standards – Measurement
     • Define universally accepted performance measures
        • Storage
        • Compute
        • System Availability
        • Issues and MTTR
        • Average Completion Time
        • Average Pending Apps
     • Be transparent with results and make them available to entire community
     • Establish monthly performance reviews with key stakeholders
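
Several of the measures on slide 13 reduce to simple arithmetic over operational records. The sketch below shows one way to compute MTTR, system availability, and average pending applications. The `Incident` record and the function names are assumptions for illustration; the deck does not say how these measures were actually computed.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record; times are minutes since an arbitrary epoch."""
    opened_min: float
    resolved_min: float


def mttr_minutes(incidents):
    """Mean time to resolution: average of (resolved - opened) per incident."""
    return mean(i.resolved_min - i.opened_min for i in incidents)


def availability_pct(total_minutes, downtime_minutes):
    """Share of the reporting period during which the system was up."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def average_pending_apps(pending_samples):
    """Average of periodically sampled pending-application counts."""
    return mean(pending_samples)


if __name__ == "__main__":
    incidents = [Incident(0, 30), Incident(100, 190)]
    print(mttr_minutes(incidents))             # resolutions took 30 and 90 minutes
    print(availability_pct(30 * 24 * 60, 432)) # 432 minutes down in a 30-day month
    print(average_pending_apps([4, 6, 8]))
```

Publishing these figures on a fixed monthly cadence matches the slide's advice to be transparent and to review results with stakeholders.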
  14. Setting Out Governance Standards – Enforcement
     • Lock down as many resources as possible
     • Monitor resource utilization for compliance
     • Automate corrective measures
     • It’s all about transitioning from defense to offense and eliminating surprises!
  15. Setting Out Governance Standards – Enforcement
     • Hadoop provides some base capabilities
        • YARN Queues for compute
        • HDFS Quotas/ACLs for storage
     • Implement custom solutions for proactive offensive capabilities
        • Job monitoring and migration (Penalty Box)
        • Dynamic Allocation / Queue Flexing
     • Monitor and track leading indicators (Command Center)
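
The "Penalty Box" idea on slide 15, monitoring jobs and migrating offenders to a restricted queue, can be sketched as a pure decision function. This is an illustration only and not Comcast's implementation: the `RunningApp` record, the guardrail limits, and the queue name `penalty_box` are all assumptions. The generated command uses the `yarn application -movetoqueue <appId> -queue <queue>` form from Hadoop 2.x; newer releases prefer a different flag, so check the CLI reference for your version before running anything.

```python
from dataclasses import dataclass


@dataclass
class RunningApp:
    """Hypothetical snapshot of a running YARN application."""
    app_id: str
    queue: str
    memory_mb: int
    elapsed_min: float


def penalty_box_moves(apps, max_memory_mb, max_elapsed_min,
                      penalty_queue="penalty_box"):
    """Return one CLI argument list per application that breaks a guardrail.

    Nothing is executed here; the caller decides whether to run the
    commands, log them, or ask for confirmation first.
    """
    moves = []
    for app in apps:
        if app.queue == penalty_queue:
            continue  # already penalized
        if app.memory_mb > max_memory_mb or app.elapsed_min > max_elapsed_min:
            moves.append([
                "yarn", "application",
                "-movetoqueue", app.app_id,
                "-queue", penalty_queue,
            ])
    return moves


if __name__ == "__main__":
    apps = [
        RunningApp("application_1_0001", "adhoc", 512_000, 20.0),
        RunningApp("application_1_0002", "adhoc", 64_000, 15.0),
    ]
    for cmd in penalty_box_moves(apps, max_memory_mb=256_000, max_elapsed_min=240):
        print(" ".join(cmd))
```

Keeping the decision separate from execution makes the policy easy to test and lets a dry-run mode report what would be moved.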
  16. Multi-Tenancy: Understanding the Chaos - Monitoring/Metrics
     Image Attribution: Pixabay - Creative Commons CC0
  17. Use Case – Extreme Ad Hoc (Data Science)
  18. Use Case – Extreme Ad Hoc (Data Science)
  19. Challenges? You bet!
  20. Challenges Monitoring and Managing a Multi-tenant Hadoop Environment – Diverse User Community
     Images: Creative Commons
  21. Challenges Monitoring and Managing a Multi-tenant Hadoop Environment – SLAs
     • Diverse SLAs
  22. Challenges Monitoring and Managing a Multi-tenant Hadoop Environment – Governance
     Images: Creative Commons
  23. Challenges Monitoring and Managing a Multi-tenant Hadoop Environment – Monitoring & Forecasting
     Images: Creative Commons
  24. Environment
  25. Our Environment - Tools for Monitoring
     • Standard Hadoop Monitoring
  26. Environment - Tools for Monitoring
     • Command Center
     • Pepperdata
  27. SLA Management
     • Application Timing
     Images: Creative Commons
  28. SLA Management
     • Application Timing
     • Resource Management
     Images: Creative Commons
  29. SLA Management
     • Application Timing
     • Resource Management
     • Capacity Management
     Images: Creative Commons
  30. Support & Staffing
     Images: Creative Commons
  31. Takeaways for DevOps Model in Hadoop
     • Train Your Teams (!!!)
  32. Takeaways for DevOps Model in Hadoop
     • Train Your Teams (!!!)
     • Measure, Forecast and Model
  33. Takeaways for DevOps Model in Hadoop
     • Train Your Teams (!!!)
     • Measure, Forecast and Model
     • Automation and Frameworks
  34. Comcast Command Center
  35. The Command Center: Our Focus
     • Visualizations & Design
  36. The Command Center: Our Focus
     • Visualizations & Design
     • Ease Of Use
  37. The Command Center: Our Focus
     • Visualizations & Design
     • Ease Of Use
     • Extensibility
  38. The Command Center: Our Focus
     • Visualizations & Design
     • Ease Of Use
     • Extensibility
     • Alerting
  39. The Command Center for Monitoring and Alerting
     • Missed SLAs
     • Guardrails broken
     • Definitions
     • Links
     • Containers
     • Queue capacity
     • Status
     • Measures
     • HDFS Usage
     • Queue Usage
     Continuous Evolution
     Continuous Engagement
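
Slide 39 lists "Missed SLAs" first among the conditions the Command Center alerts on. As a rough illustration of what such a check involves, the sketch below flags a run as missed if it finished after its deadline, or if it has not finished and the deadline has already passed. The `SlaRun` record and `missed_slas` function are hypothetical names; the deck does not describe the Command Center's internals.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class SlaRun:
    """Hypothetical record of one scheduled application run."""
    name: str
    deadline: datetime
    finished: Optional[datetime] = None  # None while still running


def missed_slas(runs: List[SlaRun], now: datetime) -> List[str]:
    """Return names of runs that finished late or are overdue as of `now`."""
    missed = []
    for run in runs:
        late_finish = run.finished is not None and run.finished > run.deadline
        overdue = run.finished is None and now > run.deadline
        if late_finish or overdue:
            missed.append(run.name)
    return missed


if __name__ == "__main__":
    now = datetime(2016, 6, 1, 7, 0)
    runs = [
        SlaRun("nightly_etl", datetime(2016, 6, 1, 6, 0), datetime(2016, 6, 1, 5, 45)),
        SlaRun("billing_rollup", datetime(2016, 6, 1, 6, 0), datetime(2016, 6, 1, 6, 30)),
        SlaRun("event_enrichment", datetime(2016, 6, 1, 6, 30)),
    ]
    print(missed_slas(runs, now))
```

Flagging still-running overdue jobs, rather than waiting for them to finish, is what makes the alert actionable while there is still time to intervene.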
  40. Monitoring and Alerting at Comcast
     • The Command Center!
  41. Thanks!
     • Ray Harrison, Principal DevOps Architect, Ray_Harrison@cable.comcast.com
     • Mike Fagan, Principal Big Data Architect, Michael_Fagan@cable.comcast.com
     We Are Hiring!
