SlideShare a Scribd company logo
1 of 25
Graphene – Microsoft
SCOPE on Tez
Hitesh Sharma (Principal Software Eng. Manager)
Anupam (Senior Software Engineer)
Agenda
• Overview of SCOPE and Cosmos
• SCOPE Job Manager responsibilities
• Design of Graphene
• Features required in Tez
Cosmos
Environment
• A Microsoft-internal platform
for building big-data
applications
• Available externally as Azure
Data Lake Analytics
• Enable customers to transform
data of any scale into new
business assets easily at low
cost in the cloud
Cosmos: World’s Biggest YARN Cluster!
Single DC >
40K machines
Multiple DCs
> 500,000 jobs
/ day
~ 3 billion
containers/day
High avg. CPU
utilization
Three Nines
Exabytes in
storage
100s of PB
processed/day
Exabytes of
data moved
SCOPE
• Scripting language for Cosmos
• Influenced by SQL and relational
concepts
• Works great with C# and .NET
• Very extensible
• Auto scale
• Naturally parallelizable computation
• Lower the barrier to write efficient
programs
RawData =
EXTRACT
Clicks:int,
Domain:string
FROM @“RAWWEBDATA.TSV”
USING DefaultTextExtractor();
WebData =
SELECT *,
Domain.Trim().ToUpper()
AS NormalizedDomain
FROM RawData;
OUTPUT WebData
TO “WEBDATA.TSV”
USING DefaultTextOutputter();
CosmosFront-EndService
Optimizer
Job Manager
Compiler
Runtime
Engine
SCOPE platform
Job scale
• Single job can consume > 1PB of
data
• > 15000 concurrent tasks (degree of
parallelism)
• Thousands of vertices
• DAGs can be very wide, very deep,
or both
• Millions of tasks in a job
• Billions of edges
Job Manager
• DAG execution
• Builds execution graph
• Topologically executes the DAG
• Keep track of state of the job/vertices​
• Dynamic DAG updates
• Rack level aggregation
• Broadcast tree
• Fault tolerance
• Handle failures and do revocations
• Detect and mitigate outliers
Job Manager
• Scheduling
• Keep track of cluster resources​
• Distributed scheduler
• Requests bonus or opportunistic containers
to increase utilization
• Can upgrade opportunistic containers
• Container reuse
• Opportunistic containers present some
interesting choices for reuse
• Tricky to implement
Time
Parallel
containers
Max containers
allowed
Job Manager
• Finalization
• Concatenate final outputs​
• Metadata operations​
• Tooling
• Near real-time feedback
• Finding the critical path
• Structured error reporting
Current
Challenges
• Higher cost of ownership
• No AM recovery
• Tied to Cosmos infrastructure
• Memory inefficient
• Native support for interactive
workload
Status and Roadmap
Prototype
Run
benchmarks
like TPC-X,
TeraSort etc.
Offline
flighting of
customer
jobs
First stage of
production
deployment
Late 2017
Mid 2018
Late 2018
Early 2018
Design and
Implementation of
Graphene
Guiding
Principles
Minimal changes
in SCOPE stack
Work with
community
Use Tez
extensibility
Maintain
compatibility
Consume output of
compilation to
generate DAG
Algebr
a
Launch and
communicate with
ScopeEngine
Engine
Produce status,
debugging, and
error details for
existing tooling
Tooling
Interact with
storage layer
Store
Graphene – Integration Points
Graphene – Application Master
GRAPHENE AM
GrapheneDAGAppMaster
DAG
Converter
Algebra
Legend
Tez Component
Uses Tez API
External Component
DAG
Store Client
Input InitializerDAGAppMaster
DAGImpl
Custom Edge
and Vertex Mgr
Tez
Magic
Task
Graphene – Task Execution
Task Container
SCOPE Engine
SCOPE Processor
SCOPE Input SCOPE Output
SCOPE TaskTez
Magic!
GRAPHENE AM
AM Container
Launch Container
InputFailedEvent/DataMovementEvent
InputDataInformationEvent or
DataMovementEvent Task CommandStatus & Error
Legend
Tez Component
Uses Tez API
External Component
Graphene – Tooling Integration
Task Container
SCOPE Engine
SCOPE Task
Periodic Stats and Diag
Legend
Tez Component
Uses Tez API
External Component
Statistics & DiagTez
Magic
GRAPHENE AM
AM Container
JobProfiler:
EventListener
Real Time
Stats
Historic
Stats Task Level Stats
Vertex Level Stats
Experience So Far
Reliability
As expected from
a production ready
software
No major bugs or
reliability issues
Onboarding
Modular and
tested code
Documentation :
Opportunity to
contribute
Community
Very responsive
Special thanks to
Bikas Saha, Kuhu
Shukla, Jonathan
Eagles
Scaling Tez
• Existing Cosmos workloads can have >
15k parallel tasks
• Acquiring and managing these
containers
• Managing communications with
these tasks
• Providing real time progress for
all the tasks
Scaling Tez
• Optimize AM memory
• Metadata management for large
inputs
• Memory pressure under large
event throughput
• Large DAGs with > 2000 vertices
and > 1 million tasks
• Optimizations for deep DAGs
Integrating
with YARN
Opportunistic
containers
• Mechanism to drive up utilization of
cluster
• AM has deep understanding of the
capability
• Effectively using opportunistic
containers in scheduler
• Harder scheduling choices with
container reuse
AM Recovery
• High priority customer ask
• Need to plugin Graphene to this AM
resiliency
• Deterministic and reliable recovery
with dynamic behavior
Conclusion
Microsoft SCOPE analytics running on
Apache YARN and Tez!
Our Journey has just started.
We invite you to collaborate.
References
• Apache Tez: A Unifying Framework for Modeling and Building Data
Processing Applications [SIGMOD, 2015]
• SCOPE: easy and efficient parallel processing of massive data sets
[VLDB, 2008]
• Apollo: Scalable and Coordinated Scheduling for Cloud-Scale
Computing [OSDI, 2014]
• Dryad: distributed data-parallel programs from sequential building
blocks [EuroSys, 2007]
• Lessons learned from scaling YARN to 40k machines in a multi tenancy
environment. [DataWorksSummit, 2017]

More Related Content

What's hot

The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowDataWorks Summit
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningEvans Ye
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...DataWorks Summit
 
Migrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeMigrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeDataWorks Summit
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryDataWorks Summit/Hadoop Summit
 
Running secured Spark job in Kubernetes compute cluster and integrating with ...
Running secured Spark job in Kubernetes compute cluster and integrating with ...Running secured Spark job in Kubernetes compute cluster and integrating with ...
Running secured Spark job in Kubernetes compute cluster and integrating with ...DataWorks Summit
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performanceDataWorks Summit
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...DataWorks Summit
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on DockerDataWorks Summit
 
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobilePractice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobileDataWorks Summit
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesDataWorks Summit
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?DataWorks Summit
 
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...DataWorks Summit
 
Data Highway Rainbow - Petabyte Scale Event Collection, Transport & Delivery ...
Data Highway Rainbow - Petabyte Scale Event Collection, Transport & Delivery ...Data Highway Rainbow - Petabyte Scale Event Collection, Transport & Delivery ...
Data Highway Rainbow - Petabyte Scale Event Collection, Transport & Delivery ...DataWorks Summit
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...DataWorks Summit
 
SAM—streaming analytics made easy
SAM—streaming analytics made easySAM—streaming analytics made easy
SAM—streaming analytics made easyDataWorks Summit
 
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...DataWorks Summit
 

What's hot (20)

The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioning
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...
 
Migrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeMigrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie Mae
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive Industry
 
Running secured Spark job in Kubernetes compute cluster and integrating with ...
Running secured Spark job in Kubernetes compute cluster and integrating with ...Running secured Spark job in Kubernetes compute cluster and integrating with ...
Running secured Spark job in Kubernetes compute cluster and integrating with ...
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performance
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on Docker
 
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobilePractice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China Mobile
 
Time-oriented event search. A new level of scale
Time-oriented event search. A new level of scale Time-oriented event search. A new level of scale
Time-oriented event search. A new level of scale
 
Tame that Beast
Tame that BeastTame that Beast
Tame that Beast
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
 
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
 
Data Highway Rainbow - Petabyte Scale Event Collection, Transport & Delivery ...
Data Highway Rainbow - Petabyte Scale Event Collection, Transport & Delivery ...Data Highway Rainbow - Petabyte Scale Event Collection, Transport & Delivery ...
Data Highway Rainbow - Petabyte Scale Event Collection, Transport & Delivery ...
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
 
SAM—streaming analytics made easy
SAM—streaming analytics made easySAM—streaming analytics made easy
SAM—streaming analytics made easy
 
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
 

Similar to Graphene - Bringing SCOPE Analytics to Apache Tez

Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDBFoundationDB
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukAndrii Vozniuk
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in JavaRuben Badaró
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Anubhav Kale
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexApache Apex
 
Machine Learning on Distributed Systems by Josh Poduska
Machine Learning on Distributed Systems by Josh PoduskaMachine Learning on Distributed Systems by Josh Poduska
Machine Learning on Distributed Systems by Josh PoduskaData Con LA
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingBikas Saha
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web developmentTung Nguyen
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...confluent
 
Stateful streaming and the challenge of state
Stateful streaming and the challenge of stateStateful streaming and the challenge of state
Stateful streaming and the challenge of stateYoni Farin
 
Alluxio - Scalable Filesystem Metadata Services
Alluxio - Scalable Filesystem Metadata ServicesAlluxio - Scalable Filesystem Metadata Services
Alluxio - Scalable Filesystem Metadata ServicesAlluxio, Inc.
 
Cassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityCassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityHiromitsu Komatsu
 
On-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceOn-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceChin Huang
 
Simulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresSimulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresCloudLightning
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez Hortonworks
 
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindAvere Systems
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 

Similar to Graphene - Bringing SCOPE Analytics to Apache Tez (20)

Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
 
try
trytry
try
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in Java
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache Apex
 
Machine Learning on Distributed Systems by Josh Poduska
Machine Learning on Distributed Systems by Josh PoduskaMachine Learning on Distributed Systems by Josh Poduska
Machine Learning on Distributed Systems by Josh Poduska
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web development
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
 
Stateful streaming and the challenge of state
Stateful streaming and the challenge of stateStateful streaming and the challenge of state
Stateful streaming and the challenge of state
 
Alluxio - Scalable Filesystem Metadata Services
Alluxio - Scalable Filesystem Metadata ServicesAlluxio - Scalable Filesystem Metadata Services
Alluxio - Scalable Filesystem Metadata Services
 
Cassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityCassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra Community
 
On-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceOn-boarding with JanusGraph Performance
On-boarding with JanusGraph Performance
 
Simulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresSimulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud Infrastructures
 
Ecss des
Ecss desEcss des
Ecss des
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
 
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Recently uploaded (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Graphene - Bringing SCOPE Analytics to Apache Tez

  • 1. Graphene – Microsoft SCOPE on Tez Hitesh Sharma (Principal Software Eng. Manager) Anupam (Senior Software Engineer)
  • 2. Agenda • Overview of SCOPE and Cosmos • SCOPE Job Manager responsibilities • Design of Graphene • Features required in Tez
  • 3. Cosmos Environment • A Microsoft-internal platform for building big-data applications • Available externally as Azure Data Lake Analytics • Enable customers to transform data of any scale into new business assets easily at low cost in the cloud
  • 4. Cosmos: World’s Biggest YARN Cluster! Single DC > 40K machines Multiple DCs > 500,000 jobs / day ~ 3 billion containers/day High avg. CPU utilization Three Nines Exabytes in storage 100s of PB processed/day Exabytes of data moved
  • 5. SCOPE • Scripting language for Cosmos • Influenced by SQL and relational concepts • Works great with C# and .NET • Very extensible • Auto scale • Naturally parallelizable computation • Lower the barrier to write efficient programs RawData = EXTRACT Clicks:int, Domain:string FROM @“RAWWEBDATA.TSV” USING DefaultTextExtractor(); WebData = SELECT *, Domain.Trim().ToUpper() AS NormalizedDomain FROM RawData; OUTPUT WebData TO “WEBDATA.TSV” USING DefaultTextOutputter();
  • 7. Job scale • Single job can consume > 1PB of data • > 15000 concurrent tasks (degree of parallelism) • Thousands of vertices • DAGs can be very wide, very deep, or both • Millions of tasks in a job • Billions of edges
  • 8. Job Manager • DAG execution • Builds execution graph • Topologically executes the DAG • Keep track of state of the job/vertices​ • Dynamic DAG updates • Rack level aggregation • Broadcast tree • Fault tolerance • Handle failures and do revocations • Detect and mitigate outliers
  • 9. Job Manager • Scheduling • Keep track of cluster resources​ • Distributed scheduler • Requests bonus or opportunistic containers to increase utilization • Can upgrade opportunistic containers • Container reuse • Opportunistic containers present some interesting choices for reuse • Tricky to implement Time Parallel containers Max containers allowed
  • 10. Job Manager • Finalization • Concatenate final outputs​ • Metadata operations​ • Tooling • Near real-time feedback • Finding the critical path • Structured error reporting
  • 11. Current Challenges • Higher cost of ownership • No AM recovery • Tied to Cosmos infrastructure • Memory inefficient • Native support for interactive workload
  • 12. Status and Roadmap Prototype Run benchmarks like TPC-X, TeraSort etc. Offline flighting of customer jobs First stage of production deployment Late 2017 Mid 2018 Late 2018 Early 2018
  • 14. Guiding Principles Minimal changes in SCOPE stack Work with community Use Tez extensibility Maintain compatibility
  • 15. Consume output of compilation to generate DAG Algebr a Launch and communicate with ScopeEngine Engine Produce status, debugging, and error details for existing tooling Tooling Interact with storage layer Store Graphene – Integration Points
  • 16. Graphene – Application Master GRAPHENE AM GrapheneDAGAppMaster DAG Converter Algebra Legend Tez Component Uses Tez API External Component DAG Store Client Input InitializerDAGAppMaster DAGImpl Custom Edge and Vertex Mgr Tez Magic Task
  • 17. Graphene – Task Execution Task Container SCOPE Engine SCOPE Processor SCOPE Input SCOPE Output SCOPE TaskTez Magic! GRAPHENE AM AM Container Launch Container InputFailedEvent/DataMovementEvent InputDataInformationEvent or DataMovementEvent Task CommandStatus & Error Legend Tez Component Uses Tez API External Component
  • 18. Graphene – Tooling Integration Task Container SCOPE Engine SCOPE Task Periodic Stats and Diag Legend Tez Component Uses Tez API External Component Statistics & DiagTez Magic GRAPHENE AM AM Container JobProfiler: EventListener Real Time Stats Historic Stats Task Level Stats Vertex Level Stats
  • 19. Experience So Far Reliability As expected from a production ready software No major bugs or reliability issues Onboarding Modular and tested code Documentation : Opportunity to contribute Community Very responsive Special thanks to Bikas Saha, Kuhu Shukla, Jonathan Eagles
  • 20. Scaling Tez • Existing Cosmos workloads can have > 15k parallel tasks • Acquiring and managing these containers • Managing communications with these tasks • Providing real time progress for all the tasks
  • 21. Scaling Tez • Optimize AM memory • Metadata management for large inputs • Memory pressure under large event throughput • Large DAGs with > 2000 vertices and > 1 million tasks • Optimizations for deep DAGs
  • 22. Integrating with YARN Opportunistic containers • Mechanism to drive up utilization of cluster • AM has deep understanding of the capability • Effectively using opportunistic containers in scheduler • Harder scheduling choices with container reuse
  • 23. AM Recovery • High priority customer ask • Need to plugin Graphene to this AM resiliency • Deterministic and reliable recovery with dynamic behavior
  • 24. Conclusion Microsoft SCOPE analytics running on Apache YARN and Tez! Our Journey has just started. We invite you to collaborate.
  • 25. References • Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications [SIGMOD, 2015] • SCOPE: easy and efficient parallel processing of massive data sets [VLDB, 2008] • Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing [OSDI, 2014] • Dryad: distributed data-parallel programs from sequential building blocks [EuroSys, 2007] • Lessons learned from scaling YARN to 40k machines in a multi tenancy environment. [DataWorksSummit, 2017]

Editor's Notes

  1. We are here to talk about how we are looking to power SCOPE with Tez.
  2. We will do a quick overview of Cosmos and SCOPE Then we will talk about the role job manager plays in the system How we are looking to fix some of the problems that we have by leveraging Tez Anupam will dig a little deeper into design of Graphene He will be talking about the challenges in front of us and why we need your help to take Tez to the next level
  3. A Microsoft-internal platform for building big-data applications​ ​ Used across Microsoft by Bing, Azure, Windows, Office for data mining and analysis. Available externally as Azure Data Lake Analytics Lets user focus on transforming data to gain insights while we focus on operating the platform at lower COGS.
  4. SCOPE is the main scripting language for Cosmos. Targeted for large-scale data analysis. You could run a script over 1GB, TB, or a PB and we handle scaling that. SQL like language that allows C# and .NET devs to get started easily. On the right is a sample SCOPE script. In this case we are reading a TSV file and running a select statement on that, add a new column, and output that as a new file. Users can easily define their own functions and even implement their own versions of operators like extractors, processors, and outputters. Users just write the scripts thinking it is going to run on a single machine and we scale it out on the cluster. This means that the nitty-gritties of dealing from failures and retries is not something a user should worry about
  5. Users submit a SCOPE script from VS using Scope studio plugin. The script goes through Cosmos Job Service and FE where it is compiled by Scope compiler. Compiler produces an AST representation of the script along with the codegen DLLs for user code and other artifacts. Optimizer makes decisions about execution plan, parallelism, and generates an algebra. Job manager, which is us, parses the algebra and starts executing the DAG on the cluster. As part of the execution JM launches Scope engine on the tasks which provides implementations of many standard physical operators. JM gives the Scope engine input paths to read and outputs to produce. Typically outputs of one vertex become input to some other vertex and DAG execution continues. --- SCOPE compiler and optimizer are responsible for generating an efficient execution plan and the runtime.
  6. So what are the responsibilities of the Job Manager? DAG execution JM is the central and coordinating process for all processing vertices within an application. The primary function of the JM is to construct the runtime DAG from the compile time representation of a DAG and execute over it. The JM schedules a DAG vertex onto the cluster nodes when all the inputs are ready. JM can also do dynamic updates to the graph like a pod level aggregation or build a broadcast tree. Fault tolerance The Job Manager monitors progress of all executing vertices. Failing vertices are re-executed a limited number of times and if there are too many failures, the job is terminated. JM also detects slower tasks in a vertex and reexecutes them elsewhere on the cluster.
  7. Scheduling When a task is ready then JM looks for a machine in the cluster to run the task upon. The global cluster load information used by each JM is provided through the cooperation of two additional entities in the system: a Resource Monitor (RM) for each cluster and a Process Node (PN) on each server. The RM aggregates load information from PNs across the cluster continuously, providing a global view of the cluster status for each JM to make informed scheduling decisions. It also enforces token limits.. Users typically give a job some tokens to run. Each token amounts to 2 cores and 6GB. JM ensures that the resources used by the job never exceed the allocated number of tokens.
  8. When the job finishes then the JM finalizes the outputs so they become visible to the user. It also supports some custom metadata operations like catalog updates.
  9. Go through details. Explains design and implementation decisions for Graphene and how we use Tez
  10. 1m Once we decided to implement SCOPE AM using Tez. We decided upon certain ground rules or guiding principles we would use to accomplish this goal. Hitesh already gave us an idea about the scale of Cosmos and SCOPE workloads, and how critical they are for Microsoft’s business. The compiler, optimizer, execution engine and tooling will be minimally changed, in order to allow for a staged transition. Tez has very powerful set of APIs to allow any system to plugin. We will be using these extensibility points as much as possible. Finally, for features that we feel the need to add to Tez, we will be working with the community and making them work generally for all Tez users as much as possible. With these ground rules set we started working on porting Scope to run on top of Tez.
  11. 3m The need to seamlessly upgrade from current job manager to graphene implies that graphene should be a drop-in replacement for current job manager. As Hitesh showed, doing this at Cosmos scale while being the backbone of Microsoft’s analytics need implies least perturbation. This meant that the SCOPE AM on Tez had to mimic existing job manager kind of behavior. Graphene has 4 unique integration point in Cosmos SCOPE stack not native to Tez. This introduction of our guiding principles and integration points will be helpful to understand our implementation and the rationale behind our design choices.
  12. 5m
  13. 6m
  14. 7m
  15. 8m
  16. 10m Bring learnings from Job Manager back to Tez.
  17. 12m
  18. 13m
  19. 15m
  20. 16m
  21. 16m