Missed SLAs
Poor performance
Failed applications
Underutilized clusters
Low throughput
Unused datasets
Poor data layout
Meeting Performance Goals in
Multi-tenant Hadoop Clusters
Shivnath Babu and Brian Majeska
Who we are
Brian Majeska
Director Engineering Operations, Platform Data Services
YP.com, Glendale, CA 91203
• Manages Operations for YP’s Hadoop and HBase clusters
• Worked with Hadoop ecosystem for 5 years
• System Admin for 13 years
• Previously at eHarmony, CalTech, EarthLink
Shivnath Babu
Associate Professor, Duke University
Co-founder/CTO, Unravel Data Systems
Menlo Park, CA 94025
• R&D on Hadoop, Spark, NoSQL, streaming, & MPP to simplify ongoing app/system management
• Led work on first self-tuning Hadoop platform
• Awards from NSF, IBM, HP
• PhD, Stanford University
Lifecycle of a Multi-tenant Hadoop Cluster
Growth, Diversity, Challenges
Growth
• Pilot: Migrated ETL to Hadoop 5 years ago
• Growth in applications
• ETL: from 3 to 100+ unique workflows in 5 years
• Ad-hoc: 7K jobs a day - 24/7
• Growth in users: 100+ active users
• Growth in systems: HDFS, MapReduce, Hive, Oozie, HBase, Spark, Kafka
Hadoop Growth Over Time
5 Years Ago… 1PB data: Production Cluster
• 300 servers
• 8 cores
• 8 x 1T data drives (2PB)
• 18G RAM
• 1G NIC
Today… 5PB data: Multi-tenant Production Cluster
• 220 servers
• 16 cores
• 12 x 4T data drives (7PB)
• 256G RAM
• 10G NIC
Daily Processing on Multi-tenant Cluster
• 1.3 Billion events
• 300TB HDFS Reads
• 35TB HDFS Writes
With Great Power Comes Great Diversity!
• Diversity in application types
• Diversity in application resource needs
• Diversity in users and their skill-sets
• Diversity in business criticality of workloads
Lifecycle of a Multi-tenant Hadoop Cluster
Growth & Diversity → Challenges
Growth + Diversity → Big Challenges
• More problems
• Harder to diagnose
Cascading failures
Application slowdowns
Rogue applications
Missed SLAs
Stuck jobs
Failed queries
Growth + Diversity → Big Challenges
• More problems
• Harder to diagnose
• Harder to track who is doing what
• Harder to control
CPU/IO/Network usage
Files/Tables/partitions created
Best practices on application performance and cluster usage
Growth + Diversity → Big Challenges
• More problems
• Harder to diagnose
• Harder to track who is doing what
• Harder to control
• Harder to optimize
• Harder to plan
Server configuration
Scheduler parameters
Justifying resource demands
Forecasting capacity needs
For Easier Ongoing Management
of Multi-tenant Hadoop Clusters
Understand, Improve, Control
Understand
What is Going On
Real-life Example: Unpredictable Workflow
Performance
[Figure: panels labeled Bad and Good]
Root Cause of this Resource Contention
But, This is Just One Type of Contention
• At Resource Manager Level
• App admission time
• Container allocation for App Master
• Container allocation for tasks
• Container allocation for Executor
• At Application Level
• Workflow Scheduler, e.g., Oozie
• Query Engine, e.g., HiveServer2
• At Master Daemon Level
• NameNode
• Hive MetaStore
[Figure: panels labeled Bad and Good]
Key Takeaways
Resource contention at different levels affects app performance
• Different apps (Oozie workflows, MapReduce, Spark, Tez) are affected differently
• Manual diagnosis can be hard and time-consuming
Unravel’s approach to diagnose such problems automatically
• Analyzes full-stack monitoring data
• Carefully combines system knowledge with statistical analysis
For Easier Ongoing Management
of Multi-tenant Hadoop Clusters
Understand, Improve, Control
Improve
Performance & Efficiency
Quick Primer on YARN Resource Manager
Image from: http://doc.mapr.com/display/MapR/YARN
Two Ways to Improve Performance & Efficiency
1. At the level of an individual application’s interaction with the Resource Manager
2. At the level of the Resource Manager’s configuration, which affects all applications
Application’s Interaction with the
Resource Manager
1. Number of containers
• MapReduce, Spark, & Tez use different techniques to determine this number
2. Container size
• CPU
• Memory
Image from: http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
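The two knobs above interact: the number of containers a node can run at once is capped by whichever resource, memory or CPU, runs out first. The following is a minimal Python sketch of that arithmetic, not a description of how MapReduce, Spark, or Tez size containers. The node and container sizes are hypothetical values chosen for illustration, and the names `Resources`, `containers_per_node`, and `unused_after_packing` are assumptions, not part of any YARN API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    memory_mb: int
    vcores: int


def containers_per_node(node: Resources, container: Resources) -> int:
    """Number of identical containers that fit on one node.

    The scarcer resource (memory or CPU) sets the limit.
    """
    if container.memory_mb <= 0 or container.vcores <= 0:
        raise ValueError("container size must be positive")
    return min(node.memory_mb // container.memory_mb,
               node.vcores // container.vcores)


def unused_after_packing(node: Resources, container: Resources) -> Resources:
    """Resources left idle on the node once it is packed with containers."""
    n = containers_per_node(node, container)
    return Resources(node.memory_mb - n * container.memory_mb,
                     node.vcores - n * container.vcores)


# Hypothetical node: 64 GB of memory and 16 vcores available to YARN.
node = Resources(memory_mb=65536, vcores=16)
oversized = Resources(memory_mb=8192, vcores=1)  # memory-heavy request
balanced = Resources(memory_mb=4096, vcores=1)
```

With these assumed numbers, the memory-heavy request fits 8 containers and leaves 8 vcores idle, while the balanced request fits 16 containers and leaves nothing idle.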
Application Spawns Too Many Containers
Poorly-sized Containers
Massive Inefficiencies Diagnosed and Eliminated with
Intelligent Container Sizing!
Two Ways to Improve Performance & Efficiency
1. At the level of an individual application’s interaction with the Resource Manager
2. At the level of the Resource Manager’s configuration, which affects all applications
Configuring the Resource Manager (Queues/Pools)
Root
Mem Capacity: 12 GB
CPU Capacity: 24 cores
Marketing
Fair Share Mem: 4 GB
Fair Share CPU: 8 cores
R&D
Fair Share Mem: 4 GB
Fair Share CPU: 8 cores
Sales
Fair Share Mem: 4 GB
Fair Share CPU: 8 cores
Jim’s Team
Fair Share Mem: 2 GB
Fair Share CPU: 4 cores
Bob’s Team
Fair Share Mem: 2 GB
Fair Share CPU: 4 cores
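The fair-share numbers in the queue tree above can be reproduced with a simple even split among sibling queues. This is a minimal Python sketch under stated assumptions: all queues have equal weight, and Jim’s Team and Bob’s Team are placed under R&D purely for illustration, since the transcript does not say which queue is their parent. A real Fair Scheduler also accounts for weights, minimum and maximum resources, and current demand, which this sketch ignores. The names `Queue` and `fair_shares` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Queue:
    name: str
    children: List["Queue"] = field(default_factory=list)


def fair_shares(root: Queue, mem_gb: float, cores: float) -> Dict[str, Tuple[float, float]]:
    """Split each queue's (memory, CPU) evenly among its child queues."""
    shares: Dict[str, Tuple[float, float]] = {}

    def visit(queue: Queue, mem: float, cpu: float) -> None:
        shares[queue.name] = (mem, cpu)
        if queue.children:
            n = len(queue.children)
            for child in queue.children:
                visit(child, mem / n, cpu / n)

    visit(root, mem_gb, cores)
    return shares


# Assumed placement: the two team queues sit under R&D (not stated in the deck).
tree = Queue("Root", [
    Queue("Marketing"),
    Queue("R&D", [Queue("Jim's Team"), Queue("Bob's Team")]),
    Queue("Sales"),
])
shares = fair_shares(tree, mem_gb=12, cores=24)
```

Under these assumptions each top-level queue receives 4 GB and 8 cores, and each team queue receives 2 GB and 4 cores, matching the figures on the slide.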
Configuring the Resource Manager (Parameters)
Image from: http://www.slideshare.net/SumeetSingh1/hadoop-summit-san-jose-2015-towards-slabased-scheduling-on-yarn-clusters
Performance Goals that Need to be Met
• Deadline: ETL workflow should finish by 6:00 AM
• Latency: Average query latency should be under 3 minutes
• Utilization: Cluster utilization should be above 70%
• Predictability: SLA satisfaction rate should be above 95%
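The four goals above can be written down as explicit checks against observed metrics. This is a minimal Python sketch; the thresholds come from the slide, while the `Observed` fields, the `unmet_goals` function, and the sample measurements are hypothetical. It also assumes the ETL finish time can be compared as a time of day, which only holds if the workflow starts and ends on the same calendar day.

```python
from dataclasses import dataclass
from datetime import time
from typing import List


@dataclass(frozen=True)
class Observed:
    etl_finish: time              # time of day the ETL workflow finished
    avg_query_latency_min: float  # average query latency in minutes
    cluster_utilization: float    # fraction between 0 and 1
    sla_satisfaction: float       # fraction between 0 and 1


def unmet_goals(m: Observed) -> List[str]:
    """Return the names of the goals that the observed metrics miss."""
    unmet = []
    if m.etl_finish > time(6, 0):        # Deadline: finish by 6:00 AM
        unmet.append("deadline")
    if m.avg_query_latency_min >= 3.0:   # Latency: under 3 minutes
        unmet.append("latency")
    if m.cluster_utilization <= 0.70:    # Utilization: above 70%
        unmet.append("utilization")
    if m.sla_satisfaction <= 0.95:       # Predictability: above 95%
        unmet.append("predictability")
    return unmet


# Hypothetical day: ETL finished late and the cluster was underused.
sample = Observed(etl_finish=time(6, 20), avg_query_latency_min=2.5,
                  cluster_utilization=0.55, sla_satisfaction=0.97)
result = unmet_goals(sample)
```

For the sample day, `result` lists the deadline and utilization goals as unmet.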
Very complex to configure manually for
good performance and efficiency!
1. YARN does not understand performance goals
2. Too many low-level parameters to be set
3. Need a deep understanding of application
workload & performance requirements
4. Diverse types of application behaviors
5. Workloads change with time
A New Interface to YARN’s Resource Manager
Simple abstraction to specify key performance goals
• Based on past/current performance
• Helps pose operational what-if questions
Powerful functionality
• Learning engine for automated answers to operational what-if questions
• Recommender system to automatically find parameter settings that meet
performance goals
Nonintrusive: no changes needed to YARN
Ask operational what-if questions based on past/current performance:
1. What is the impact of decreasing capacity of ADVERTISING queue by 30%?
2. How to reduce average workflow latency in FINANCIAL queue to 30 minutes?
Resource allocation & app
performance in different queues
Tempo: Robust and Self-Tuning Resource
Management in Multi-tenant Parallel Databases
To appear in VLDB in Sept 2016
Key Types of Performance Goals
Application Workload in the Cluster
Models of Fair/Capacity Schedulers
Learning and Optimization Algorithms
Automated Answers to Operational Questions
Key Takeaways
YARN is powerful enough to meet key multi-tenant resource allocation needs
But, an easy interface to specify & satisfy performance goals is lacking
Our work aims to fill this gap in the ecosystem
For Easier Ongoing Management
of Multi-tenant Hadoop Clusters
Understand, Improve, Control
Control
Multi-tenant Usage by
Enforcing Policies
AutoActions for Policy-based Control
Ops specifies a policy: If resources used by ad-hoc apps in FINANCIAL queue
are slowing down the CEO-Report workflow by more than 20%, then move the
ad-hoc apps to the QUARANTINE queue
Unravel continuously monitors for policy violations
If a policy violation is detected, then Unravel acts via YARN REST APIs
• Helps Ops automate operational processes & get peace of mind
• Unravel maintains complete audit trail for post-mortem investigation
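The example policy above can be sketched as a threshold check followed by one queue-move request per ad-hoc app. This is a minimal Python illustration, not Unravel’s implementation: how the slowdown is attributed to ad-hoc apps is not described in the deck, so comparing a baseline runtime with an observed runtime is an assumption. The endpoint path follows the ResourceManager REST "Cluster Application Queue" API as commonly documented and should be checked against the Hadoop version in use. The host name, application IDs, and function names are hypothetical, and the requests are only built, never sent.

```python
import json
import urllib.request
from typing import List


def slowdown(baseline_min: float, observed_min: float) -> float:
    """Relative slowdown of a workflow against its baseline runtime."""
    return (observed_min - baseline_min) / baseline_min


def build_move_request(rm_url: str, app_id: str, queue: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that moves an app to a queue."""
    body = json.dumps({"queue": queue}).encode("utf-8")
    return urllib.request.Request(
        url=f"{rm_url}/ws/v1/cluster/apps/{app_id}/queue",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def plan_actions(baseline_min: float, observed_min: float,
                 adhoc_app_ids: List[str], rm_url: str,
                 threshold: float = 0.20,
                 target_queue: str = "QUARANTINE") -> List[urllib.request.Request]:
    """Return the requests to issue if the policy is violated, else an empty list."""
    if slowdown(baseline_min, observed_min) <= threshold:
        return []
    return [build_move_request(rm_url, app_id, target_queue)
            for app_id in adhoc_app_ids]


# Hypothetical inputs: a 60-minute workflow that took 75 minutes (25% slower).
rm = "http://rm.example.com:8088"
apps = ["application_1_0001", "application_1_0002"]
actions = plan_actions(60.0, 75.0, apps, rm)
```

Keeping the planning step separate from the sending step means the planned requests can be logged before anything is changed on the cluster, which fits the audit-trail point above.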
Examples of AutoActions
Beginner
• Enforcing best practices on number of tasks and container sizes
Intermediate
• Detecting Rogue Apps and moving them to a capped queue/pool
• Making workflow execution fault-tolerant under YARN kills & OOMs
Expert
• Guaranteeing SLAs by dynamic adjustment of YARN Resource Manager
parameters
• Enabling workload-aware cluster selection to lower Cloud usage costs
Lifecycle of a Multi-tenant Hadoop Cluster
Growth, Diversity, Challenges
For Easier Ongoing Management
of Multi-tenant Hadoop Clusters
Understand, Improve, Control
Get Unravel Trial Edition:
bit.ly/getunravel
UNCOVER ISSUES
UNLEASH RESOURCES
UNRAVEL PERFORMANCE

More Related Content

What's hot

Show me the Money! Cost & Resource Tracking for Hadoop and Storm
Show me the Money! Cost & Resource  Tracking for Hadoop and Storm Show me the Money! Cost & Resource  Tracking for Hadoop and Storm
Show me the Money! Cost & Resource Tracking for Hadoop and Storm DataWorks Summit/Hadoop Summit
 
Opal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific ApplicationsOpal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific ApplicationsSriram Krishnan
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...Databricks
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark Hortonworks
 
Improvements to Flink & it's Applications in Alibaba Search
Improvements to Flink & it's Applications in Alibaba SearchImprovements to Flink & it's Applications in Alibaba Search
Improvements to Flink & it's Applications in Alibaba SearchDataWorks Summit/Hadoop Summit
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Alex Zeltov
 
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)Uri Laserson
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsDr. Mirko Kämpf
 
Real time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stackReal time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stackDataWorks Summit/Hadoop Summit
 
Creating an 86,000 Hour Speech Dataset with Apache Spark and TPUs
Creating an 86,000 Hour Speech Dataset with Apache Spark and TPUsCreating an 86,000 Hour Speech Dataset with Apache Spark and TPUs
Creating an 86,000 Hour Speech Dataset with Apache Spark and TPUsDatabricks
 
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Databricks
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveDataWorks Summit/Hadoop Summit
 
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...Databricks
 
Hadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an exampleHadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an examplehadooparchbook
 
AWS Customer Presentation - VMIX AWS Experience
AWS Customer Presentation - VMIX AWS ExperienceAWS Customer Presentation - VMIX AWS Experience
AWS Customer Presentation - VMIX AWS ExperienceAmazon Web Services
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...Databricks
 
The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Ser...
The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Ser...The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Ser...
The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Ser...Databricks
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jDataWorks Summit
 
Building and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieBuilding and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieDataWorks Summit/Hadoop Summit
 

What's hot (20)

Show me the Money! Cost & Resource Tracking for Hadoop and Storm
Show me the Money! Cost & Resource  Tracking for Hadoop and Storm Show me the Money! Cost & Resource  Tracking for Hadoop and Storm
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
 
Opal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific ApplicationsOpal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific Applications
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark
 
Improvements to Flink & it's Applications in Alibaba Search
Improvements to Flink & it's Applications in Alibaba SearchImprovements to Flink & it's Applications in Alibaba Search
Improvements to Flink & it's Applications in Alibaba Search
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
 
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific Applciations
 
Real time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stackReal time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stack
 
Creating an 86,000 Hour Speech Dataset with Apache Spark and TPUs
Creating an 86,000 Hour Speech Dataset with Apache Spark and TPUsCreating an 86,000 Hour Speech Dataset with Apache Spark and TPUs
Creating an 86,000 Hour Speech Dataset with Apache Spark and TPUs
 
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
 
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
 
Hadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an exampleHadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an example
 
AWS Customer Presentation - VMIX AWS Experience
AWS Customer Presentation - VMIX AWS ExperienceAWS Customer Presentation - VMIX AWS Experience
AWS Customer Presentation - VMIX AWS Experience
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
 
The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Ser...
The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Ser...The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Ser...
The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Ser...
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4j
 
Building and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieBuilding and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache Oozie
 

Viewers also liked

Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudDataWorks Summit/Hadoop Summit
 
certificate 100 best graduates
certificate 100 best graduatescertificate 100 best graduates
certificate 100 best graduatesToma Gaidyte
 
Pillars of Heterogeneous HDFS Storage
Pillars of Heterogeneous HDFS StoragePillars of Heterogeneous HDFS Storage
Pillars of Heterogeneous HDFS StoragePete Kisich
 
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Cloudera, Inc.
 
Lego-like building blocks of Storm and Spark Streaming Pipelines
Lego-like building blocks of Storm and Spark Streaming PipelinesLego-like building blocks of Storm and Spark Streaming Pipelines
Lego-like building blocks of Storm and Spark Streaming PipelinesDataWorks Summit/Hadoop Summit
 
Farming hadoop in_the_cloud
Farming hadoop in_the_cloudFarming hadoop in_the_cloud
Farming hadoop in_the_cloudSteve Loughran
 
Elephant grooming: quality with Hadoop
Elephant grooming: quality with HadoopElephant grooming: quality with Hadoop
Elephant grooming: quality with HadoopRoman Nikitchenko
 
Hadoop do data warehousing rules apply
Hadoop do data warehousing rules applyHadoop do data warehousing rules apply
Hadoop do data warehousing rules applyDataWorks Summit
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsDataWorks Summit/Hadoop Summit
 
Hadoop 2.0 - Solving the Data Quality Challenge
Hadoop 2.0 - Solving the Data Quality ChallengeHadoop 2.0 - Solving the Data Quality Challenge
Hadoop 2.0 - Solving the Data Quality ChallengeInside Analysis
 
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature DataWorks Summit
 
Navigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data DiscoveryNavigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data DiscoveryDataWorks Summit/Hadoop Summit
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionDataWorks Summit/Hadoop Summit
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...StampedeCon
 

Viewers also liked (20)

Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the Cloud
 
Filling the Data Lake
Filling the Data LakeFilling the Data Lake
Filling the Data Lake
 
Hadoop
HadoopHadoop
Hadoop
 
certificate 100 best graduates
certificate 100 best graduatescertificate 100 best graduates
certificate 100 best graduates
 
Pillars of Heterogeneous HDFS Storage
Pillars of Heterogeneous HDFS StoragePillars of Heterogeneous HDFS Storage
Pillars of Heterogeneous HDFS Storage
 
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
 
Zero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using HadoopZero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using Hadoop
 
Lego-like building blocks of Storm and Spark Streaming Pipelines
Lego-like building blocks of Storm and Spark Streaming PipelinesLego-like building blocks of Storm and Spark Streaming Pipelines
Lego-like building blocks of Storm and Spark Streaming Pipelines
 
Farming hadoop in_the_cloud
Farming hadoop in_the_cloudFarming hadoop in_the_cloud
Farming hadoop in_the_cloud
 
Elephant grooming: quality with Hadoop
Elephant grooming: quality with HadoopElephant grooming: quality with Hadoop
Elephant grooming: quality with Hadoop
 
Hadoop do data warehousing rules apply
Hadoop do data warehousing rules applyHadoop do data warehousing rules apply
Hadoop do data warehousing rules apply
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
Hadoop 2.0 - Solving the Data Quality Challenge
Hadoop 2.0 - Solving the Data Quality ChallengeHadoop 2.0 - Solving the Data Quality Challenge
Hadoop 2.0 - Solving the Data Quality Challenge
 
HDFS Tiered Storage
HDFS Tiered StorageHDFS Tiered Storage
HDFS Tiered Storage
 
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
 
Navigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data DiscoveryNavigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data Discovery
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
 
Lambda-less Stream Processing @Scale in LinkedIn
Lambda-less Stream Processing @Scale in LinkedIn Lambda-less Stream Processing @Scale in LinkedIn
Lambda-less Stream Processing @Scale in LinkedIn
 
Stream Processing made simple with Kafka
Stream Processing made simple with KafkaStream Processing made simple with Kafka
Stream Processing made simple with Kafka
 

Meeting Performance Goals in multi-tenant Hadoop Clusters

  • 1. Meeting Performance Goals in Multi-tenant Hadoop Clusters
      Shivnath Babu and Brian Majeska
      Missed SLAs; Poor performance; Failed applications; Underutilized clusters; Low throughput; Unused datasets; Poor data layout
  • 2. Who we are
      Brian Majeska: Director Engineering Operations, Platform Data Services, YP.com, Glendale, CA 91203
      • Manages Operations for YP’s Hadoop and HBase clusters
      • Worked with Hadoop ecosystem for 5 years
      • System Admin for 13 years
      • Previously at eHarmony, CalTech, EarthLink
      Shivnath Babu: Associate Professor, Duke University; Co-founder/CTO, Unravel Data Systems, Menlo Park, CA 94025
      • R&D on Hadoop, Spark, NoSQL, streaming, & MPP to simplify ongoing app/system management
      • Led work on first self-tuning Hadoop platform
      • Awards from NSF, IBM, HP
      • PhD, Stanford University
  • 3. Lifecycle of a Multi-tenant Hadoop Cluster: Growth, Diversity, Challenges
  • 4. Growth
      • Pilot: Migrated ETL to Hadoop 5 years ago
      • Growth in applications
      • ETL: from 3 to 100+ unique workflows in 5 years
      • Ad-hoc: 7K jobs a day - 24/7
      • Growth in users: 100+ active users
      • Growth in systems: HDFS, MapReduce, Hive, Oozie, HBase, Spark, Kafka
  • 5. Hadoop Growth Over Time
      5 Years Ago (1PB data), Production Cluster: 300 servers; 8 cores; 8 x 1T data drives (2PB); 18G RAM; 1G NIC
      Today (5PB data), Multi-tenant Production Cluster: 220 servers; 16 cores; 12 x 4T data drives (7PB); 256G RAM; 10G NIC
  • 6. Daily Processing on Multi-tenant Cluster
      • 1.3 Billion events
      • 300TB HDFS Reads
      • 35TB HDFS Writes
  • 7. With Great Power Comes Great Diversity!
      • Diversity in application types
      • Diversity in application resource needs
      • Diversity in users and their skill-sets
      • Diversity in business criticality of workloads
  • 8. Lifecycle of a Multi-tenant Hadoop Cluster: Growth & Diversity → Challenges
  • 9. Growth + Diversity → Big Challenges
      • More problems
      • Harder to diagnose
      Cascading failures; Application slowdowns; Rogue applications; Missed SLAs; Stuck jobs; Failed queries
  • 10. Growth + Diversity → Big Challenges
      • More problems
      • Harder to diagnose
      • Harder to track who is doing what
      • Harder to control
      CPU/IO/Network usage; Files/Tables/partitions created; Best practices on application performance and cluster usage
  • 11. Growth + Diversity → Big Challenges
      • More problems
      • Harder to diagnose
      • Harder to track who is doing what
      • Harder to control
      • Harder to optimize
      • Harder to plan
      Server configuration; Scheduler parameters; Justifying resource demands; Forecasting capacity needs
  • 12. For Easier Ongoing Management of Multi-tenant Hadoop Clusters: Understand, Improve, Control
  • 13. Understand What is Going On
  • 14. Real-life Example: Unpredictable Workflow Performance
  • 15.
  • 16.
  • 18.
  • 19.
  • 20. Root Cause of this Resource Contention
  • 21. But, This is Just One Type of Contention
      • At Resource Manager Level
          • App admission time
          • Container allocation for App Master
          • Container allocation for tasks
          • Container allocation for Executor
      • At Application Level
          • Workflow Scheduler, e.g., Oozie
          • Query Engine, e.g., HiveServer2
      • At Master Daemon Level
          • NameNode
          • Hive MetaStore
  • 22. Key Takeaways
      Resource contention at different levels affects app performance
      • Different apps (Oozie workflows, MapReduce, Spark, Tez) are affected differently
      • Manual diagnosis can be hard and time-consuming
      Unravel’s approach to diagnose such problems automatically
      • Analyzes full-stack monitoring data
      • Carefully combines system knowledge with statistical analysis
  • 23. For Easier Ongoing Management of Multi-tenant Hadoop Clusters: Understand, Improve, Control
  • 24. Improve Performance & Efficiency
  • 25. Quick Primer on YARN Resource Manager (Image from: http://doc.mapr.com/display/MapR/YARN)
  • 26. Two Ways to Improve Performance & Efficiency
      1. At the level of individual application’s interaction with the Resource Manager
      2. At the level of the Resource Manager’s Configuration that affects all applications
  • 27. Application’s Interaction with the Resource Manager
      1. Number of containers
          • MapReduce, Spark, & Tez use different techniques to determine this number
      2. Container size
          • CPU
          • Memory
      (Image from: http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/)
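As a rough illustration of how container size interacts with node capacity, the sketch below estimates how many containers of a given size fit on one node, taking the tighter of the memory and CPU limits. The node figures (16 cores, 256 GB RAM) come from the cluster description in slide 5; the container sizes and the function name are hypothetical, and the sketch ignores the resources a real node reserves for the OS and Hadoop daemons.

```python
def containers_per_node(node_mem_gb, node_vcores, container_mem_gb, container_vcores):
    """Return how many containers fit on one node and which resource is the limit."""
    by_mem = node_mem_gb // container_mem_gb      # containers that fit by memory
    by_cpu = node_vcores // container_vcores      # containers that fit by CPU
    if by_mem < by_cpu:
        limit = "memory"
    elif by_cpu < by_mem:
        limit = "cpu"
    else:
        limit = "both"
    return min(by_mem, by_cpu), limit

# Node size from slide 5; the container sizes are made-up examples.
print(containers_per_node(256, 16, 4, 1))   # (16, 'cpu')
print(containers_per_node(256, 16, 32, 1))  # (8, 'memory')
```

With small containers the node runs out of cores while most memory sits idle; with large ones memory runs out first. Either mismatch is one way a cluster can end up underutilized.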
  • 28. Application Spawns Too Many Containers
  • 29. Poorly-sized Containers
  • 30. Massive Inefficiencies Diagnosed and Eliminated with Intelligent Container Sizing!
  • 31. Two Ways to Improve Performance & Efficiency
      1. At the level of individual application’s interaction with the Resource Manager
      2. At the level of the Resource Manager’s Configuration that affects all applications
  • 32. Configuring the Resource Manager (Queues/Pools)
      Root: Mem Capacity: 12 GB; CPU Capacity: 24 cores
      Marketing: Fair Share Mem: 4 GB; Fair Share CPU: 8 cores
      R&D: Fair Share Mem: 4 GB; Fair Share CPU: 8 cores
      Sales: Fair Share Mem: 4 GB; Fair Share CPU: 8 cores
      Jim’s Team: Fair Share Mem: 2 GB; Fair Share CPU: 4 cores
      Bob’s Team: Fair Share Mem: 2 GB; Fair Share CPU: 4 cores
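The fair-share numbers on this slide are consistent with an equal split of each parent queue's resources among its children. The sketch below reproduces them with a simple weight-proportional split. It is not the Fair Scheduler's actual algorithm, which also accounts for demand, minimum and maximum shares, and preemption. The `Queue` class and `fair_shares` function are hypothetical names, and placing the two team queues under R&D is an assumption, since the slide text does not say which queue is their parent.

```python
from dataclasses import dataclass, field

@dataclass
class Queue:
    name: str
    weight: float = 1.0
    children: list = field(default_factory=list)

def fair_shares(queue, mem_gb, cores, out=None):
    """Split a queue's resources among its children in proportion to their weights."""
    if out is None:
        out = {}
    out[queue.name] = (mem_gb, cores)
    total = sum(c.weight for c in queue.children)
    for child in queue.children:
        fair_shares(child, mem_gb * child.weight / total, cores * child.weight / total, out)
    return out

tree = Queue("Root", children=[
    Queue("Marketing"),
    # Assumption: the team queues sit under R&D; the slide does not name their parent.
    Queue("R&D", children=[Queue("Jim's Team"), Queue("Bob's Team")]),
    Queue("Sales"),
])

shares = fair_shares(tree, mem_gb=12, cores=24)
for name, (mem, cpu) in shares.items():
    print(f"{name}: {mem:g} GB, {cpu:g} cores")
```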
  • 33. Configuring the Resource Manager (Parameters) Image from: http://www.slideshare.net/SumeetSingh1/hadoop-summit-san-jose-2015-towards-slabased-scheduling-on-yarn-clusters
  • 34. Performance Goals that Need to be Met
      • Deadline: ETL workflow should finish by 6:00 AM
      • Latency: Average query latency should be under 3 minutes
      • Utilization: Cluster utilization should be above 70%
      • Predictability: SLA satisfaction rate should be above 95%
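One way to make goals like these concrete is to write each as a predicate over a measured metric and report which ones are missed. The sketch below does that with the four thresholds from the slide. The metric names and the sample measurements are made up for illustration, and this is not how YARN or any particular tool represents goals.

```python
from datetime import time

# Thresholds are taken from the slide; the metric names are hypothetical.
GOALS = {
    "etl_finish_time": lambda v: v <= time(6, 0),    # deadline: done by 6:00 AM
    "avg_query_latency_min": lambda v: v < 3,        # latency: under 3 minutes
    "cluster_utilization": lambda v: v > 0.70,       # utilization: above 70%
    "sla_satisfaction_rate": lambda v: v > 0.95,     # predictability: above 95%
}

def missed_goals(observed):
    """Return the names of the goals that the observed metrics fail to meet."""
    return [name for name, ok in GOALS.items() if not ok(observed[name])]

# Made-up measurements for a single day.
observed = {
    "etl_finish_time": time(6, 20),
    "avg_query_latency_min": 2.4,
    "cluster_utilization": 0.64,
    "sla_satisfaction_rate": 0.97,
}
print(missed_goals(observed))  # ['etl_finish_time', 'cluster_utilization']
```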
  • 35. Very complex to configure manually for good performance and efficiency! 1. YARN does not understand performance goals 2. Too many low-level parameters to be set 3. Need a deep understanding of application workload & performance requirements 4. Diverse types of application behaviors 5. Workloads change with time
  • 36. A New Interface to YARN’s Resource Manager. Simple abstraction to specify key performance goals • Based on past/current performance • Helps pose operational what-if questions. Powerful functionality • Learning engine for automated answers to operational what-if questions • Recommender system to automatically find parameter settings that meet performance goals. Nonintrusive: no changes needed to YARN
  • 37. Ask operational what-if questions based on past/current performance: 1. What is the impact of decreasing capacity of ADVERTISING queue by 30%? 2. How to reduce average workflow latency in FINANCIAL queue to 30 minutes? Resource allocation & app performance in different queues
  • 38. Tempo: Robust and Self-Tuning Resource Management in Multi-tenant Parallel Databases. To appear in VLDB in Sept 2016
  • 39. Key Types of Performance Goals Application Workload in the Cluster Models of Fair/Capacity Schedulers Learning and Optimization Algorithms Automated Answers to Operational Questions
  • 40. Key Takeaways • YARN is powerful enough to meet key multi-tenant resource allocation needs • But an easy interface to specify & satisfy performance goals is lacking • Our work aims to fill this gap in the ecosystem
  • 41. For Easier Ongoing Management of Multi-tenant Hadoop Clusters: Understand, Improve, Control
  • 42. Control Multi-tenant Usage by Enforcing Policies
  • 43. AutoActions for Policy-based Control. Ops specifies a policy: If resources used by ad-hoc apps in FINANCIAL queue are slowing down the CEO-Report workflow by more than 20%, then move the ad-hoc apps to the QUARANTINE queue. Unravel continuously monitors for policy violations. If a policy violation is detected, then Unravel acts via YARN REST APIs • Helps Ops automate operational processes & get peace of mind • Unravel maintains complete audit trail for post-mortem investigation
  • 44. Examples of AutoActions. Beginner • Enforcing best practices on number of tasks and container sizes. Intermediate • Detecting Rogue Apps and moving them to a capped queue/pool • Making workflow execution fault-tolerant under YARN kills & OOMs. Expert • Guaranteeing SLAs by dynamic adjustment of YARN Resource Manager parameters • Enabling workload-aware cluster selection to lower Cloud usage costs
  • 45. Lifecycle of a Multi-tenant Hadoop Cluster: Growth, Diversity, Challenges. For Easier Ongoing Management of Multi-tenant Hadoop Clusters: Understand, Improve, Control
  • 46. Get Unravel Trial Edition: bit.ly/getunravel UNCOVER ISSUES UNLEASH RESOURCES UNRAVEL PERFORMANCE
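
The queue hierarchy on slide 32 can be reproduced with a small calculation. The following is only a minimal sketch, not how YARN or Unravel computes shares: it assumes every sibling queue has equal weight and ignores demand, minimum/maximum limits, and preemption, all of which a real Fair or Capacity Scheduler configuration would take into account. The `Queue` class and `assign_fair_shares` function are hypothetical names, and placing Jim’s Team and Bob’s Team under Sales is an assumption made purely for illustration, since the transcript does not say which queue is their parent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Queue:
    """Hypothetical queue node; not a YARN class."""
    name: str
    children: List["Queue"] = field(default_factory=list)
    mem_gb: float = 0.0
    cores: float = 0.0


def assign_fair_shares(queue: Queue, mem_gb: float, cores: float) -> None:
    """Give `queue` its share, then split it equally among its children.

    Assumption: all siblings have equal weight.
    """
    queue.mem_gb, queue.cores = mem_gb, cores
    if queue.children:
        n = len(queue.children)
        for child in queue.children:
            assign_fair_shares(child, mem_gb / n, cores / n)


def flatten(queue: Queue) -> Dict[str, Tuple[float, float]]:
    """Return {queue name: (memory in GB, cores)} for the whole tree."""
    shares = {queue.name: (queue.mem_gb, queue.cores)}
    for child in queue.children:
        shares.update(flatten(child))
    return shares


# Hierarchy from slide 32; the parent of the two team queues is assumed.
root = Queue("Root", [
    Queue("Marketing"),
    Queue("R&D"),
    Queue("Sales", [Queue("Jim's Team"), Queue("Bob's Team")]),
])
assign_fair_shares(root, mem_gb=12, cores=24)
shares = flatten(root)

for name, (mem, cpu) in shares.items():
    print(f"{name}: {mem:g} GB, {cpu:g} cores")
```

Under the equal-weight assumption, the three top-level queues each receive a third of the root capacity and the two team queues each receive half of their parent's share, which matches the figures shown on the slide.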

Editor's Notes

  1. Shivnath starts
  2. - MapReduce, Hive, and now Spark and Kafka Cores, Memory, Storage Engineers, Analysts, and First time users Daily, Weekly, Monthly, Quarterly report jobs
  3. Rogue: DNS DoS. Cascading: OOM before YARN. SLA: Monthly and quarterly jobs read more data and use more resources, but only during certain times of the year. Stuck: heartbeats are working, but the process isn’t. Failed: bad drives.
  4. Resource Usage by user Hive: 3000 tables and data ownership NN small files/big files
  5. Tuning the cluster Tuning the scheduler Resource usage per user and department Budget for next year
  6. These are the problems we face. Our goal is to find these problems faster than we have in the past, so we have partnered with Unravel. Let me hand this off to Shivnath, and he can explain more.