Rui Jian, Hao Lin, Facebook Inc.
rjian@fb.com, hlin@fb.com
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
#UnifiedAnalytics #SparkAISummit
About Us
• Rui Jian
– Software Engineer at Facebook (Data Warehouse & Graph Indexing)
– Master of Computer Science (Shanghai Jiao Tong University)
• Hao Lin
– Research scientist at Facebook (Data Warehouse Batch Scheduling)
– PhD in Parallel Computing (Purdue ECE)
Agenda
• Overview
• Tangram Architecture
• Scheduling Policies & Resource Allocation
• Future work
What is Tangram?
The scheduling platform for
• reliably running various batch workloads
• with efficient heterogeneous resource management
• at scale
Tangram Scheduling Targets
• Single jobs: adhoc/periodic
• Batch jobs: adhoc/periodic, malleable
• Gang jobs: adhoc/periodic, rigid
• Long-running jobs: steady and regular; e.g. online training
Why Tangram?
• Various workload characteristics
– ML
– Apache Spark
– Apache Giraph
– Single jobs
• Customized scheduling policies
• Scalability
– Fleet size: hundreds of thousands of worker nodes
– Job scheduling throughput: hundreds of millions of jobs per day
Overview
• What is Tangram?
[Figure: Tangram architecture diagram. Labels: Admin, Job Manager, DB, ML, Resource Manager Master, Agents, Single Job, Gang Job, ML Elastic Scheduler, SQL query, Giraph, Spark, Client Library; steps numbered 1 to 6.]
• Job management
• Request/Release resources
• Resource grant
• Preemption notification
• Launch containers
• Container status change event
[Figure: numbered interactions 1 to 6 among Application, Tangram client, Resource Manager, and Agent]
Agent
• Report schedulable resources and runtime usage
• Health check reports
• Detect labels
• Launch/Kill Containers
• Container recovery
• Resource isolation with cgroup v2
Failure Recovery
• Agent failure
– Scan the recovery directory and recover the running containers
• RM failure
– Both agents and clients hold off communication with the RM until the new master shows up
– Clients sync session info to the new master to help it rebuild its state
– Agents add themselves to the new master
Scheduling Policies
• Hierarchical queue structure
• Jobs to be queued on leaves
• Queue configs:
– min/max resources
– Policy:
• FIFO
• Dominant Resource Fairness (DRF)
• User fairness
• Global
• …
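To make the Dominant Resource Fairness (DRF) policy above concrete, here is a minimal sketch of the general DRF idea: each queue's dominant share is the largest fraction it holds of any one resource, and the queue with the smallest dominant share is served next. The capacity figures, usage numbers, and function names are hypothetical, and this is not a description of Tangram's own implementation.

```python
from typing import Dict

Resources = Dict[str, float]


def dominant_share(usage: Resources, capacity: Resources) -> float:
    """Largest fraction of any single resource type that `usage` occupies."""
    return max(usage.get(r, 0.0) / capacity[r] for r in capacity)


def pick_next(usages: Dict[str, Resources], capacity: Resources) -> str:
    """DRF serves the queue with the smallest dominant share (ties: by name)."""
    return min(sorted(usages), key=lambda q: dominant_share(usages[q], capacity))


# Hypothetical cluster capacity and per-queue usage (illustrative numbers only).
capacity = {"cpuMilliCores": 100_000, "memoryBytes": 400 * 2**30}
usages = {
    # CPU is dominant: 30,000 / 100,000 = 0.30
    "ads": {"cpuMilliCores": 30_000, "memoryBytes": 40 * 2**30},
    # Memory is dominant: 100 GiB / 400 GiB = 0.25
    "feed": {"cpuMilliCores": 10_000, "memoryBytes": 100 * 2**30},
}
next_queue = pick_next(usages, capacity)  # "feed" has the smaller dominant share
```

In a hierarchical queue structure the same comparison would presumably be applied among sibling queues at each level, but the slides do not spell that out.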
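[Figure: example queue hierarchy. Root queue "/"; queues "ads" and "feed"; queues "pipelines" and "interactive"; per-user queues "user1" and "user2". Nodes are annotated with policies (DRF, User Fairness, FIFO) and resource shares (20%/80%, 50%/50%); jobs sit at the leaves.]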
Scheduling Policies
• Jobs ordered by priority, then submission time, within each queue
• Gang jobs are first-class in scheduling and resource allocation
• Lookahead scheduling for better throughput and utilization
• Job starvation prevention
[Figure: example job queue, in order: Gang 200, Gang 20, Single, Gang 4, Single]
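The slides do not describe the lookahead algorithm, so the following is only a sketch of one plausible reading: when the job at the head of the queue does not fit, the scheduler may look past a bounded number of blocked jobs and start later ones, and a job that has been bypassed too often blocks further lookahead so it cannot starve. The parameters `lookahead` and `max_skips`, the `Job` class, and the free-capacity number are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(eq=False)  # compare jobs by identity, not by field values
class Job:
    name: str
    size: int         # containers needed; a gang job needs all of them at once
    skipped: int = 0  # number of passes in which a later job jumped ahead


def schedule_pass(queue: List[Job], free: int,
                  lookahead: int = 3, max_skips: int = 2) -> List[str]:
    """Run one scheduling pass; started jobs are removed from `queue`."""
    started: List[Job] = []
    blocked: List[Job] = []   # jobs that did not fit, in queue order
    bypassed: List[Job] = []  # blocked jobs that some later job jumped ahead of
    for job in queue:
        if job.size <= free:  # all-or-nothing: a gang starts only if it fits whole
            free -= job.size
            started.append(job)
            bypassed = list(blocked)
        elif job.skipped >= max_skips or len(blocked) >= lookahead:
            break             # starving job or window exhausted: stop looking ahead
        else:
            blocked.append(job)
    for job in bypassed:
        job.skipped += 1
    for job in started:
        queue.remove(job)
    return [job.name for job in started]


# The queue from the figure, with a hypothetical 25 free containers.
queue = [Job("Gang 200", 200), Job("Gang 20", 20), Job("Single #1", 1),
         Job("Gang 4", 4), Job("Single #2", 1)]
started = schedule_pass(queue, free=25)
```

With these numbers the pass starts Gang 20, Single #1, and Gang 4, while Gang 200 stays at the head with its skip counter incremented; once the counter reaches `max_skips`, later jobs wait until Gang 200 fits.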
Resource Allocation
• Fine-grained resource specification:
– {cpuMilliCores: 3000, memoryBytes: 200GB}
• Constraints:
– “dataCenter = dc1 & type in [1,2] & kernelVersion > 4.10”
• Job Affinity:
– inSameDatacenter
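As a rough illustration of how a constraint like the one above could be evaluated against host labels, the sketch below encodes the expression as a list of predicates rather than parsing the string. The host names, label values, and helper functions are hypothetical; the one non-obvious detail is that kernel versions are compared as integer tuples, so that 4.9 sorts below 4.10.

```python
from typing import Callable, Dict, List, Tuple

Labels = Dict[str, str]
Constraint = Callable[[Labels], bool]


def version(text: str) -> Tuple[int, ...]:
    """Parse '4.10' into (4, 10) so that 4.9 < 4.10 compares correctly."""
    return tuple(int(part) for part in text.split("."))


# Structured form of: dataCenter = dc1 & type in [1,2] & kernelVersion > 4.10
constraints: List[Constraint] = [
    lambda h: h.get("dataCenter") == "dc1",
    lambda h: h.get("type") in {"1", "2"},
    lambda h: "kernelVersion" in h
    and version(h["kernelVersion"]) > version("4.10"),
]


def matches(host: Labels) -> bool:
    """A host is eligible only if every constraint holds (logical AND)."""
    return all(check(host) for check in constraints)


# Hypothetical hosts and labels.
hosts = {
    "host1": {"dataCenter": "dc1", "type": "2", "kernelVersion": "4.16"},
    "host2": {"dataCenter": "dc1", "type": "3", "kernelVersion": "4.16"},  # wrong type
    "host3": {"dataCenter": "dc1", "type": "1", "kernelVersion": "4.9"},   # kernel too old
}
eligible = [name for name, labels in hosts.items() if matches(labels)]
```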
Resource Allocation
Prefetched Host Cache
• Bypass the steps of host filtering and scoring
• Speed up the allocation process
Host Filtering
• Hard & soft constraints
• Resource constraint
• Label constraint
• Job affinity
Host Scoring and Ordering
• Packing efficiency
• Host healthiness
• Data locality
Commit Allocation
• Bookkeeping of resources
• Update cluster & queue parameters
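The filter, score, and commit stages above can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not Tangram's code: the `Host` and `Request` classes are hypothetical, scoring uses only host health and a best-fit CPU heuristic for packing, and the prefetched host cache, soft constraints, job affinity, and data locality are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

GIB = 2**30


@dataclass
class Host:
    name: str
    free_cpu: int  # milli-cores
    free_mem: int  # bytes
    labels: Dict[str, str] = field(default_factory=dict)
    healthy: bool = True


@dataclass
class Request:
    cpu: int
    mem: int
    labels: Dict[str, str] = field(default_factory=dict)


def passes_filter(host: Host, req: Request) -> bool:
    """Hard constraints: enough free resources and every required label matches."""
    has_room = host.free_cpu >= req.cpu and host.free_mem >= req.mem
    labels_ok = all(host.labels.get(k) == v for k, v in req.labels.items())
    return has_room and labels_ok


def score(host: Host, req: Request) -> Tuple[bool, int]:
    """Prefer healthy hosts, then the tightest CPU fit (packing efficiency)."""
    return (host.healthy, -(host.free_cpu - req.cpu))


def allocate(hosts: List[Host], req: Request) -> Optional[Host]:
    candidates = [h for h in hosts if passes_filter(h, req)]  # host filtering
    if not candidates:
        return None
    best = max(candidates, key=lambda h: score(h, req))       # scoring and ordering
    best.free_cpu -= req.cpu                                  # commit: bookkeeping
    best.free_mem -= req.mem
    return best


hosts = [
    Host("h1", 8000, 64 * GIB, {"type": "1"}),
    Host("h2", 4000, 64 * GIB, {"type": "1"}),
    Host("h3", 3000, 64 * GIB, {"type": "2"}),
]
req = Request(cpu=3000, mem=8 * GIB, labels={"type": "1"})
first = allocate(hosts, req)   # h2: tightest fit among the type-1 hosts
second = allocate(hosts, req)  # h1: h2 no longer has 3000 free milli-cores
```

Committing the allocation immediately is what makes the second request land on a different host: the filter sees the updated bookkeeping.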
Constraint-based Scheduling
• Machine type
• Datacenter
• Region
• CPU architecture
• Host prefix
• …
[Figure: a queue of jobs, some with job constraint type=1 and some with type=2, scheduled onto a merged host pool (types 1 & 2) of Hosts 1 to 5, each labeled with {"type":"1"} or {"type":"2"}]
Preemption
• Guarantee resource availability SLO within and across queues
• Identify the starving jobs and overallocated jobs
• Minimize preemption cost: two-phase protocol
– Only candidates appearing in both phases will be preempted
– The Resource Manager notifies the client of its preemption intent so that the client can take the necessary action, e.g. checkpointing
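A minimal sketch of the two-phase idea: candidates are computed twice, the client is notified after the first phase, and only jobs that are candidates both times are preempted. The slides do not say what each phase evaluates, so this assumes both phases run the same over-allocation check on usage snapshots taken at different times; the fair-share numbers and function names are hypothetical.

```python
from typing import Callable, Dict, List, Set


def overallocated(usage: Dict[str, int], fair_share: Dict[str, int]) -> Set[str]:
    """Jobs currently using more than their (hypothetical) fair share."""
    return {job for job, used in usage.items() if used > fair_share.get(job, 0)}


def two_phase_preempt(snapshot_1: Dict[str, int], snapshot_2: Dict[str, int],
                      fair_share: Dict[str, int],
                      notify: Callable[[str], None]) -> List[str]:
    """Return the jobs to preempt: candidates in phase 1 AND in phase 2."""
    phase_1 = overallocated(snapshot_1, fair_share)
    for job in sorted(phase_1):
        notify(job)  # preemption intent, so the client can e.g. checkpoint
    phase_2 = overallocated(snapshot_2, fair_share)
    return sorted(phase_1 & phase_2)


fair_share = {"a": 10, "b": 10, "c": 10}
snapshot_1 = {"a": 15, "b": 12, "c": 5}  # a and b are over their share
snapshot_2 = {"a": 14, "b": 8, "c": 11}  # b shrank on its own; c grew later
notified: List[str] = []
to_preempt = two_phase_preempt(snapshot_1, snapshot_2, fair_share, notified.append)
```

Here job b is notified but not preempted because it dropped below its share before the second phase, and job c is spared because it was not a candidate in the first phase.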
Cross Datacenter Scheduling
• The growing demand for computation and storage for Hive tables spans data centers
• Stranded capacity with imbalanced load
• Poor data locality and waste of network bandwidth
• Slow reaction when recovering from crises and disasters
Cross Datacenter Scheduling
• Dispatcher Proxy
– Monitors resource consumption across data centers
– Decides which Resource Manager schedules each job
– Provides location hints to the Resource Manager for enforcement
• Planner
– Decides where the data will be placed based on utilization and available resources
[Figure: a Job (job constraint: datacenter=1) submitted through the Dispatcher; Resource Manager 1 and Resource Manager 2; Datacenter 1, Datacenter 2, and Datacenter 3; Table Data]
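The dispatcher's role can be sketched as below: pick a Resource Manager for a job from per-datacenter utilization, honoring a datacenter constraint when the job has one, and return the chosen datacenter as a location hint. The mapping of Resource Managers to datacenters, the utilization numbers, and the least-utilized selection rule are all assumptions for illustration; the slides do not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ResourceManager:
    name: str
    datacenters: List[str]
    utilization: Dict[str, float]  # per-datacenter utilization in [0, 1]


def dispatch(rms: List[ResourceManager],
             required_dc: Optional[str] = None) -> Tuple[str, str]:
    """Return (resource manager name, datacenter hint) for a job."""
    options = [
        (rm.utilization[dc], rm.name, dc)
        for rm in rms
        for dc in rm.datacenters
        if required_dc is None or dc == required_dc
    ]
    if not options:
        raise ValueError(f"no resource manager serves datacenter {required_dc!r}")
    _, rm_name, dc = min(options)  # least-utilized eligible datacenter
    return rm_name, dc


# Hypothetical topology: rm1 serves datacenters 1 and 2, rm2 serves datacenter 3.
rms = [
    ResourceManager("rm1", ["1", "2"], {"1": 0.9, "2": 0.4}),
    ResourceManager("rm2", ["3"], {"3": 0.6}),
]
unconstrained = dispatch(rms)                # goes to the least-loaded datacenter
constrained = dispatch(rms, required_dc="1")  # job constraint: datacenter=1
```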
Future Work
• Mixed workloads managed by a single resource manager
• Run batch workloads with off-peak resources from online services
• Automatic resource tuning for high utilization
• We’re hiring! Contact: rjian@fb.com
22#UnifiedAnalytics #SparkAISummit
DON’T FORGET TO RATE
AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT

More Related Content

What's hot

Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
DataWorks Summit
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
Alluxio, Inc.
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Databricks
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
HostedbyConfluent
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
Vadim Y. Bichutskiy
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper Optimization
Databricks
 
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Chris Fregly
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
Spark Summit
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
DataWorks Summit
 
Delight: An Improved Apache Spark UI, Free, and Cross-Platform
Delight: An Improved Apache Spark UI, Free, and Cross-PlatformDelight: An Improved Apache Spark UI, Free, and Cross-Platform
Delight: An Improved Apache Spark UI, Free, and Cross-Platform
Databricks
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at Facebook
Databricks
 
Ray: Enterprise-Grade, Distributed Python
Ray: Enterprise-Grade, Distributed PythonRay: Enterprise-Grade, Distributed Python
Ray: Enterprise-Grade, Distributed Python
Databricks
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Databricks
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
Joud Khattab
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Databricks
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Databricks
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
Databricks
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Edureka!
 
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkProject Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Databricks
 

What's hot (20)

Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper Optimization
 
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Delight: An Improved Apache Spark UI, Free, and Cross-Platform
Delight: An Improved Apache Spark UI, Free, and Cross-PlatformDelight: An Improved Apache Spark UI, Free, and Cross-Platform
Delight: An Improved Apache Spark UI, Free, and Cross-Platform
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at Facebook
 
Ray: Enterprise-Grade, Distributed Python
Ray: Enterprise-Grade, Distributed PythonRay: Enterprise-Grade, Distributed Python
Ray: Enterprise-Grade, Distributed Python
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
 
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkProject Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
 

Similar to Tangram: Distributed Scheduling Framework for Apache Spark at Facebook

Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay Rai
Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay RaiConquering Hadoop and Apache Spark with Operational Intelligence with Akshay Rai
Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay Rai
Databricks
 
High Performance Deep learning with Apache Spark
High Performance Deep learning with Apache SparkHigh Performance Deep learning with Apache Spark
High Performance Deep learning with Apache Spark
Rui Liu
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
Ilkay Altintas, Ph.D.
 
WorDS of Data Science in the Presence of Heterogenous Computing Architectures
WorDS of Data Science in the Presence of Heterogenous Computing ArchitecturesWorDS of Data Science in the Presence of Heterogenous Computing Architectures
WorDS of Data Science in the Presence of Heterogenous Computing Architectures
Ilkay Altintas, Ph.D.
 
Webinar: Capacity Planning
Webinar: Capacity PlanningWebinar: Capacity Planning
Webinar: Capacity Planning
MongoDB
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity Planning
Norberto Leite
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Wee Hyong Tok
 
Unit ii sem-v-hadoop
Unit ii  sem-v-hadoopUnit ii  sem-v-hadoop
Unit ii sem-v-hadoop
DrChitraDhawale
 
DoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics PlatformDoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics Platform
martinbpeters
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Databricks
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Perficient, Inc.
 
Apache Spark sql
Apache Spark sqlApache Spark sql
Apache Spark sql
aftab alam
 
Parallelizing with Apache Spark in Unexpected Ways
Parallelizing with Apache Spark in Unexpected WaysParallelizing with Apache Spark in Unexpected Ways
Parallelizing with Apache Spark in Unexpected Ways
Databricks
 
Splice Machine's use of Apache Spark and MLflow
Splice Machine's use of Apache Spark and MLflowSplice Machine's use of Apache Spark and MLflow
Splice Machine's use of Apache Spark and MLflow
Databricks
 
Infrastructure for Deep Learning in Apache Spark
Infrastructure for Deep Learning in Apache SparkInfrastructure for Deep Learning in Apache Spark
Infrastructure for Deep Learning in Apache Spark
Databricks
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Precisely
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
Tao Feng
 
PEARC 17: Spark On the ARC
PEARC 17: Spark On the ARCPEARC 17: Spark On the ARC
PEARC 17: Spark On the ARC
Himanshu Bedi
 

Similar to Tangram: Distributed Scheduling Framework for Apache Spark at Facebook (20)

Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay Rai
Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay RaiConquering Hadoop and Apache Spark with Operational Intelligence with Akshay Rai
Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay Rai
 
High Performance Deep learning with Apache Spark
High Performance Deep learning with Apache SparkHigh Performance Deep learning with Apache Spark
High Performance Deep learning with Apache Spark
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 
WorDS of Data Science in the Presence of Heterogenous Computing Architectures
WorDS of Data Science in the Presence of Heterogenous Computing ArchitecturesWorDS of Data Science in the Presence of Heterogenous Computing Architectures
WorDS of Data Science in the Presence of Heterogenous Computing Architectures
 
unit 1 big data.pptx
unit 1 big data.pptxunit 1 big data.pptx
unit 1 big data.pptx
 
Webinar: Capacity Planning
Webinar: Capacity PlanningWebinar: Capacity Planning
Webinar: Capacity Planning
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity Planning
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425
 
Unit ii sem-v-hadoop
Unit ii  sem-v-hadoopUnit ii  sem-v-hadoop
Unit ii sem-v-hadoop
 
DoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics PlatformDoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics Platform
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
Apache Spark sql
Apache Spark sqlApache Spark sql
Apache Spark sql
 
Parallelizing with Apache Spark in Unexpected Ways
Parallelizing with Apache Spark in Unexpected WaysParallelizing with Apache Spark in Unexpected Ways
Parallelizing with Apache Spark in Unexpected Ways
 
Splice Machine's use of Apache Spark and MLflow
Splice Machine's use of Apache Spark and MLflowSplice Machine's use of Apache Spark and MLflow
Splice Machine's use of Apache Spark and MLflow
 
Infrastructure for Deep Learning in Apache Spark
Infrastructure for Deep Learning in Apache SparkInfrastructure for Deep Learning in Apache Spark
Infrastructure for Deep Learning in Apache Spark
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
 
PEARC 17: Spark On the ARC
PEARC 17: Spark On the ARCPEARC 17: Spark On the ARC
PEARC 17: Spark On the ARC
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook

  • 2. Tangram: Distributed Scheduling Framework for Apache Spark at Facebook. Rui Jian, Hao Lin, Facebook Inc. (rjian@fb.com, hlin@fb.com). #UnifiedAnalytics #SparkAISummit
  • 3. About Us
      • Rui Jian
          – Software Engineer at Facebook (Data Warehouse & Graph Indexing)
          – Master of Computer Science (Shanghai Jiao Tong University)
      • Hao Lin
          – Research Scientist at Facebook (Data Warehouse Batch Scheduling)
          – PhD in Parallel Computing (Purdue ECE)
  • 4. Agenda
      • Overview
      • Tangram Architecture
      • Scheduling Policies & Resource Allocation
      • Future work
  • 5. What is Tangram? The scheduling platform for
      • reliably running various batch workloads
      • with efficient heterogeneous resource management
      • at scale
  • 6. Tangram Scheduling Targets
      • Single jobs: ad hoc/periodic
      • Batch jobs: ad hoc/periodic, malleable
      • Gang jobs: ad hoc/periodic, rigid
      • Long-running jobs: steady and regular, e.g. online training
  • 7. Why Tangram?
      • Various workload characteristics
          – ML
          – Apache Spark
          – Apache Giraph
          – Single jobs
      • Customized scheduling policies
      • Scalability
          – Fleet size: hundreds of thousands of worker nodes
          – Job scheduling throughput: hundreds of millions of jobs per day
  • 8. Overview
      • What is Tangram?
      • [Architecture diagram: SQL query, Giraph, Spark, and ML workloads; Job Manager, ML Elastic Scheduler, Admin, and DB; Resource Manager (master); Agents running single and gang jobs; numbered steps 1 to 6]
  • 9. Client Library
      • Job management
      • Request/release resources
      • Resource grant
      • Preemption notification
      • Launch containers
      • Container status change event
      • [Diagram: Application, Tangram client, Resource Manager, and Agent, with numbered interactions 1 to 6]
  • 10. Agent
      • Report schedulable resources and runtime usage
      • Health check reports
      • Detect labels
      • Launch/kill containers
      • Container recovery
      • Resource isolation with cgroup v2
  • 11. Failure Recovery
      • Agent failure
          – Scan the recovery directory and recover the running containers
      • RM (Resource Manager) failure
          – Both agents and clients hold off communication with the RM until the new master shows up
          – Clients sync session info to the new master to help it rebuild its state
          – Agents register themselves with the new master
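The agent-side recovery step on slide 11 could look roughly like the sketch below. The slide only says that the agent scans a recovery directory; the on-disk layout (one JSON file per container with `container_id` and `pid` fields), the function names, and the liveness probe are all assumptions made for illustration, not Tangram's actual format.

```python
import json
import os
from pathlib import Path


def pid_alive(pid):
    """POSIX liveness probe: signal 0 checks existence without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def recover_containers(recovery_dir, is_alive=pid_alive):
    """Rebuild the agent's container table after a restart.

    Hypothetical layout: each running container left a '<id>.json' record
    in recovery_dir. Records that cannot be parsed, or whose process is
    gone, are skipped.
    """
    recovered = {}
    for path in sorted(Path(recovery_dir).glob("*.json")):
        try:
            record = json.loads(path.read_text())
            container_id = record["container_id"]
            pid = int(record["pid"])
        except (ValueError, KeyError, TypeError):
            continue  # corrupt or incomplete record
        if is_alive(pid):
            recovered[container_id] = pid
    return recovered
```

Passing `is_alive` as a parameter keeps the scan testable without real processes.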
  • 12. Scheduling Policies
      • Hierarchical queue structure
      • Jobs to be queued on leaves
      • Queue configs:
          – min/max resources
          – Policy:
              • FIFO
              • Dominant Resource Fairness (DRF)
              • User fairness
              • Global
              • …
      • [Diagram: queue tree rooted at "/" with queues ads, feed, pipelines, and interactive; DRF, User Fairness, and FIFO policies on the nodes; shares of 20%/80% and 50%/50%; user1 and user2 at 50% each; jobs on the leaves]
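To make the DRF policy on slide 12 concrete, here is a minimal sketch of the textbook form of Dominant Resource Fairness: the next allocation goes to the queue whose dominant share (its largest fraction of any one resource) is smallest. The queue names come from the slide's diagram; the capacity and usage numbers are invented, and nothing here is claimed to match Tangram's implementation.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class Resources:
    cpu_milli_cores: int
    memory_bytes: int


def dominant_share(usage, capacity):
    """Largest fraction of any single resource that this queue holds."""
    return max(
        usage.cpu_milli_cores / capacity.cpu_milli_cores,
        usage.memory_bytes / capacity.memory_bytes,
    )


def next_queue_drf(usages, capacity):
    """Pick the queue with the smallest dominant share (ties broken by name)."""
    return min(usages, key=lambda name: (dominant_share(usages[name], capacity), name))


# Illustrative numbers only.
capacity = Resources(cpu_milli_cores=100_000, memory_bytes=400 * GIB)
usages = {
    "ads": Resources(30_000, 40 * GIB),        # CPU-dominant
    "feed": Resources(10_000, 160 * GIB),      # memory-dominant
    "pipelines": Resources(20_000, 60 * GIB),  # CPU-dominant
}
print(next_queue_drf(usages, capacity))
```

With these numbers the dominant shares are 0.30, 0.40, and 0.20, so `pipelines` is served next.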
  • 13. Scheduling Policies
      • Jobs are ordered by priority, then by submission time, within a queue
      • Gang jobs are first-class in scheduling and resource allocation
      • Lookahead scheduling for better throughput and utilization
      • Job starvation prevention
      • [Diagram: example queue holding, in order, Gang 200, Gang 20, Single, Gang 4, Single]
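The interplay of ordering, gang jobs, lookahead, and starvation prevention on slide 13 can be sketched as follows. This is one plausible reading, not the talk's algorithm: the `lookahead` window, the `max_skips` guard, the slot-based resource model, and the rule that a larger priority value wins are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int      # assumption: larger value means more important
    submit_time: float
    containers: int    # gang size; 1 for a single job
    skipped: int = 0   # how many later jobs have been started ahead of this one


def queue_order(jobs):
    """Order by priority first, then by submission time."""
    return sorted(jobs, key=lambda j: (-j.priority, j.submit_time))


def schedule_round(jobs, free_slots, lookahead=3, max_skips=3):
    """Return the names of jobs started in this round.

    A gang job starts only if all of its containers fit (rigid allocation).
    When a job does not fit, up to `lookahead` later jobs are examined so
    free capacity is not left idle. A blocked job that has already been
    skipped `max_skips` times stops the scan, so resources can accumulate
    for it instead of being handed to later jobs.
    """
    started = []
    blocked = []
    examined_past_block = 0
    for job in queue_order(jobs):
        if blocked:
            if examined_past_block >= lookahead:
                break
            examined_past_block += 1
        if job.containers <= free_slots:
            free_slots -= job.containers
            started.append(job.name)
            for earlier in blocked:
                earlier.skipped += 1
        else:
            if job.skipped >= max_skips:
                break  # starvation guard
            blocked.append(job)
    return started
```

Using the queue from the slide's diagram (Gang 200, Gang 20, Single, Gang 4, Single) with 25 free slots, the first round starts the three jobs behind Gang 200 that fit inside the lookahead window; once Gang 200 has been skipped three times, later rounds start nothing until it fits.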
  • 14. Resource Allocation
      • Fine-grained resource specification:
          – {cpuMilliCores: 3000, memoryBytes: 200GB}
      • Constraints:
          – “dataCenter = dc1 & type in [1,2] & kernelVersion > 4.10”
      • Job affinity:
          – inSameDatacenter
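A constraint expression like the one on slide 14 can be evaluated against a host's labels with a small matcher. The sketch below handles only the three clause shapes visible on the slide (`=`, `in [...]`, `>`) joined by `&`; the real grammar is not given in the talk, and the dotted-version comparison for `>` is an assumption chosen so that 4.16 ranks above 4.10 and 4.9 ranks below it.

```python
import re


def parse_version(text):
    """Turn '4.10' into (4, 10) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def parse_constraint(expr):
    """Parse 'a = x & b in [1,2] & c > 4.10' into (label, op, value) triples."""
    clauses = []
    for part in expr.split("&"):
        part = part.strip()
        match = re.fullmatch(r"(\w+)\s+in\s+\[(.*)\]", part)
        if match:
            values = {v.strip() for v in match.group(2).split(",")}
            clauses.append((match.group(1), "in", values))
            continue
        match = re.fullmatch(r"(\w+)\s*(=|>)\s*(\S+)", part)
        if not match:
            raise ValueError(f"unsupported clause: {part!r}")
        clauses.append((match.group(1), match.group(2), match.group(3)))
    return clauses


def host_matches(labels, clauses):
    """True only if the host's labels satisfy every clause."""
    for label, op, value in clauses:
        actual = labels.get(label)
        if actual is None:
            return False
        if op == "=" and actual != value:
            return False
        if op == "in" and actual not in value:
            return False
        if op == ">" and parse_version(actual) <= parse_version(value):
            return False
    return True


clauses = parse_constraint("dataCenter = dc1 & type in [1,2] & kernelVersion > 4.10")
host = {"dataCenter": "dc1", "type": "2", "kernelVersion": "4.16"}
print(host_matches(host, clauses))
```

Label values are kept as strings here, which is why the host's `type` is `"2"` rather than `2`.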
  • 15. Resource Allocation
      • Prefetched host cache
          – Bypasses the host filtering and scoring steps
          – Speeds up the allocation process
      • Host filtering
          – Hard and soft constraints
          – Resource constraint
          – Label constraint
          – Job affinity
      • Host scoring and ordering
          – Packing efficiency
          – Host healthiness
          – Data locality
      • Commit allocation
          – Bookkeeping of resources
          – Update cluster and queue parameters
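The filter, score, and commit stages on slide 15 can be sketched as a single allocation function. The scoring formula below (prefer healthy hosts, then the tightest fit, weighting one core and one GiB equally) is an invented stand-in for the packing-efficiency, healthiness, and locality factors the slide lists; the prefetched host cache and data locality are left out.

```python
from dataclasses import dataclass, field

GIB = 1024 ** 3


@dataclass
class Host:
    name: str
    free_cpu_milli: int
    free_mem_bytes: int
    healthy: bool = True
    labels: dict = field(default_factory=dict)


@dataclass
class Request:
    cpu_milli: int
    mem_bytes: int
    required_labels: dict = field(default_factory=dict)


def passes_filter(host, req):
    """Hard constraints: enough free resources and all required labels present."""
    return (
        host.free_cpu_milli >= req.cpu_milli
        and host.free_mem_bytes >= req.mem_bytes
        and all(host.labels.get(k) == v for k, v in req.required_labels.items())
    )


def packing_score(host, req):
    """Best fit: less leftover capacity after placement gives a higher score."""
    leftover_cores = (host.free_cpu_milli - req.cpu_milli) / 1000
    leftover_gib = (host.free_mem_bytes - req.mem_bytes) / GIB
    return -(leftover_cores + leftover_gib)


def allocate(hosts, req):
    """Filter, score, then commit by deducting resources; None if nothing fits."""
    candidates = [h for h in hosts if passes_filter(h, req)]
    if not candidates:
        return None
    best = max(candidates, key=lambda h: (h.healthy, packing_score(h, req), h.name))
    best.free_cpu_milli -= req.cpu_milli   # bookkeeping on commit
    best.free_mem_bytes -= req.mem_bytes
    return best.name
```

Because the commit step mutates the host's free resources, repeated calls naturally spill over to the next best host once the tightest one is full.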
  • 16. Constraint-based Scheduling
      • Machine type
      • Datacenter
      • Region
      • CPU architecture
      • Host prefix
      • …
      • [Diagram: a merged host pool of type 1 and type 2 machines (Host 1 to Host 5), each labeled {"type":"1"} or {"type":"2"}; jobs in the queue carry the constraint type=1 or type=2]
  • 17. Preemption
      • Guarantee the resource availability SLO within and across queues
      • Identify starving jobs and overallocated jobs
      • Minimize preemption cost with a two-phase protocol
          – Only candidates that appear in both phases are preempted
          – The Resource Manager notifies the client of its preemption intent so that necessary action can be taken, e.g. checkpointing
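The two-phase rule on slide 17 ("only candidates appearing in both phases") amounts to a set intersection over two observations. In the sketch below, what counts as a candidate (usage above a guaranteed amount), the per-job granularity, and the `notify` callback are assumptions; the slide does not say how candidates are chosen or how long the gap between phases is.

```python
def overallocated(usage_by_job, guaranteed_by_job):
    """Jobs currently holding more than their guaranteed amount."""
    return {
        job
        for job, used in usage_by_job.items()
        if used > guaranteed_by_job.get(job, 0)
    }


def two_phase_preempt(snapshot_1, snapshot_2, guaranteed, notify):
    """Return the jobs to preempt, in sorted order.

    Phase 1 marks candidates and sends them a preemption intent so they can
    checkpoint. Phase 2 re-evaluates on a later snapshot. Only jobs that are
    candidates in both phases are preempted, so a job that released
    resources in between is spared.
    """
    phase_1 = overallocated(snapshot_1, guaranteed)
    for job in sorted(phase_1):
        notify(job)
    phase_2 = overallocated(snapshot_2, guaranteed)
    return sorted(phase_1 & phase_2)


guaranteed = {"a": 10, "b": 10, "c": 10}
first = {"a": 15, "b": 12, "c": 8}    # a and b are over their guarantee
second = {"a": 15, "b": 9, "c": 14}   # b shrank, c grew after phase 1
notified = []
victims = two_phase_preempt(first, second, guaranteed, notified.append)
print(victims, notified)
```

Here `b` is warned but spared because it released resources before the second phase, and `c` is not preempted because it was never warned.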
  • 18. Cross Datacenter Scheduling
      • Growing demand for computation and storage for Hive tables spans data centers
      • Stranded capacity with imbalanced load
      • Poor data locality and wasted network bandwidth
      • Slow reaction when recovering from crises and disasters
  • 19. Cross Datacenter Scheduling
      • Dispatcher Proxy
          – Monitors resource consumption across data centers
          – Decides the Resource Manager for scheduling jobs
          – Provides location hints to the Resource Manager for enforcement
      • Planner
          – Decides where the data will be replaced based on utilization and available resources
      • [Diagram: a Job enters the Dispatcher, which routes to Resource Manager 1 or Resource Manager 2 across Datacenter 1, Datacenter 2, and Datacenter 3]
  • 20. Cross Datacenter Scheduling (same text as slide 19)
      • [Diagram build: the job now carries the constraint datacenter=1]
  • 21. Cross Datacenter Scheduling (same text as slide 19)
      • [Diagram build: the job carries the constraint datacenter=1, and Table Data is shown in the data centers]
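The Dispatcher Proxy's routing decision from the cross-datacenter slides might be sketched as below. The precedence (explicit job constraint first, then data location if that datacenter is not too busy, then the least utilized datacenter), the 0.9 threshold, and the datacenter-to-Resource-Manager mapping are all hypothetical; the slides only state that the dispatcher monitors consumption, picks a Resource Manager, and passes a location hint.

```python
def dispatch(job_datacenter, data_datacenter, utilization, rm_for_datacenter,
             busy_threshold=0.9):
    """Return (resource_manager, location_hint) for a job.

    job_datacenter:    datacenter required by the job's constraint, or None
    data_datacenter:   datacenter holding the job's input tables, or None
    utilization:       mapping of datacenter name to utilization in [0, 1]
    rm_for_datacenter: mapping of datacenter name to its Resource Manager
    """
    if job_datacenter is not None:
        target = job_datacenter            # a hard constraint always wins
    elif data_datacenter is not None and utilization[data_datacenter] < busy_threshold:
        target = data_datacenter           # prefer data locality when there is room
    else:
        # Fall back to the least utilized datacenter (ties broken by name).
        target = min(utilization, key=lambda dc: (utilization[dc], dc))
    return rm_for_datacenter[target], target


# Invented values: three datacenters served by two Resource Managers,
# mirroring the shape of the slide's diagram.
utilization = {"dc1": 0.95, "dc2": 0.40, "dc3": 0.70}
rm_for_datacenter = {"dc1": "rm1", "dc2": "rm1", "dc3": "rm2"}
print(dispatch("dc1", None, utilization, rm_for_datacenter))
```

The returned location hint is what the chosen Resource Manager would then enforce as a host constraint.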
  • 22. Future Work
      • Mixed workloads managed by one resource manager
      • Run batch workloads with off-peak resources from online services
      • Automatic resource tuning for high utilization
      • We’re hiring! Contact: rjian@fb.com