SlideShare a Scribd company logo
How Microsoft built
and scaled Cosmos
Cosmos
— Cosmos is a large Scale Data processing system
— In use by thousands of internal users at Microsoft
— Distributed filesystem contains exabytes of data
— High-level SQL-like language to run jobs
processing up to petabytes at a time
Outline
— What made Cosmos successful
— Language
— Data sharing
— Technical Challenges
— Scalability challenges and architecture
— Supporting lower latency workload
— Conclusion
Language: Scope
— SQL-Like language
— Support structured data and unstructured data
— Easy to use and learn
Q = SSTREAM “queries.ss”;
U = SSTREAM “users.ss”;
J= SELECT *, Math.Round(Q.latency) AS l
FROM Q,U WHERE Q.uid==U.uid;
OUTPUT J TO “output.txt”
“SCOPE: Parallel Databases Meet MapReduce” Jingren Zhou, Nicolas Bruno, Ming-chuan Wu, Paul Larson, Ronnie Chaiken, Darren Shakib,
The VLDB Journal, 2012
Scope
— C# extensibility
— Supports user defined objects
input = EXTRACT user, session, blob
FROM "log_%n.txt?n=1...10"
USING DefaultTextExtractor;
SELECT user, session,
new RequestInfo(blob) AS request
FROM input
WHERE
request.Browser.IsChrome()
“SCOPE: Parallel Databases Meet MapReduce” Jingren Zhou, Nicolas Bruno, Ming-chuan Wu, Paul Larson, Ronnie Chaiken, Darren Shakib,
The VLDB Journal, 2012
Scope Distributed Execution
— Queries are parsed into a logical operator tree
— The optimizer transforms the query into a physical
operator graph, which is then compiled into
binaries
— The physical operator graph and binaries are
handed to a scheduler for execution
“Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing” Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping
Qian, Ming Wu, and Lidong Zhou, in Proc. of the 2014 OSDI Conference (OSDI'14)
Scope Optimizer
Data Sharing
— Users share data by reference
— Teams put their data in Cosmos because that is
where the data they want to join against is
— Skype, Windows, Xbox, Bing, Ads, Office, and
more
http://research.microsoft.com/en-us/events/fs2011/helland_cosmos_big_data_and_big_challenges.pdf
https://azure.microsoft.com/en-us/blog/behind-the-scenes-of-azure-data-lake-bringing-microsoft-s-big-data-experience-to-
hadoop/
“Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing” Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping
Qian, Ming Wu, and Lidong Zhou, in Proc. of the 2014 OSDI Conference (OSDI'14)
Network Effect
• Teams put their data in Cosmos because that is
where the data they want to join against is
http://research.microsoft.com/en-us/events/fs2011/helland_cosmos_big_data_and_big_challenges.pdf
JETS operates a high-scale, modern data pipeline for Office
Telemetry data from clients and services are combined into both
custom (app domain specific) and common System Health data
sets in Cosmos.
organization reports to surface [..] release risks and telemetry information
telemetry information using map/reduce COSMOS
WSD organization is responsible for delivering security and non-
security fixes to Windows OSes to billions of customers, every
month on patch Tuesday
Are you interested in building the BI platform for Bing Ads?
Experience with working on C#, C++, or Java, Cosmos, is highly
desirable.
Are you excited about delivering the next generation personal
assistant, Cortana, to millions of people using Windows worldwide?
Experience with“Big Data” technologies like Cosmos
Data Sharing
— Users share data by reference
— Teams put their data in Cosmos because that is
where the data they want to join against is
— Skype, Windows, Xbox, Bing, Ads, Office, and
more
— This drives huge scalability requirements
— Cluster size exceed 50,000 servers
http://research.microsoft.com/en-us/events/fs2011/helland_cosmos_big_data_and_big_challenges.pdf
https://azure.microsoft.com/en-us/blog/behind-the-scenes-of-azure-data-lake-bringing-microsoft-s-big-data-experience-to-
hadoop/
“Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing” Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping
Qian, Ming Wu, and Lidong Zhou, in Proc. of the 2014 OSDI Conference (OSDI'14)
Outline
— What made Cosmos successful
— Language
— Data sharing
— Technical Challenges
— Scalability challenges and architecture
— Supporting lower latency workload
— Conclusion
Plan Optimizations
— At large scale, query plan manipulations are
required to improve efficiency of sort, aggregation
and broadcast
Aggregation
Broadcast Joins
Parallel Sort
Scaling the Execution:
Apollo (OSDI’14)
— A large number of users share execution resources
for data locality
— How to minimize latency while maximizing cluster
utilization?
— Challenges:
— Scale
— Heterogeneous workload
— Maximizing utilization
“Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing” Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping
Qian, Ming Wu, and Lidong Zhou, in Proc. of the 2014 OSDI Conference (OSDI'14)
Heterogeneous Workload
Dynamic Workload
How to effectively use resources while maintaining
performance guarantees with a dynamic workload?
Architecture
— For scalability, the architecture adopts a fully
decentralized control plane
— Each job has its own scheduler instance
— Each scheduler is making independent decisions
informed by global information
Architecture
•
Scheduler:
There is one scheduler per job for
scalability
The scheduler makes local
decision and directly dispatch
tasks to process nodes
Architecture
Process Nodes:
Execute tasks on behalf of job
managers
Provides local resource isolation
Send status update aggregated
by a resource monitor
Architecture
Resource Monitor:
Aggregates status information
from process node
Provides the cluster load
information to schedulers to
inform future scheduling
Architecture
The queue at the PN allows the scheduler
to reason about future resource availability
Representing Load
— How to concisely represent load?
— Represents the expected wait time to
acquire resources
— Integrated into a scheduler cost model
Optimizing for various
factors
Optimizing for various
factors
— To make optimal scheduling decisions, multiple factors have to be
considered at the same time
— Input location
— Network topology
— Wait time
— Initialization time
— Machine health, probability of failure
Scheduler Performance
Ideal scheduler (Capacity Constraint)
Ideal Scheduler (Infinite Capacity)
Baseline
Apollo
The Cosmos scheduler performs within 5% of the
ideal trace driven scheduler
Utilization
Cosmos maintains a median utilization above 80%
on weekdays while supporting latency-sensitive
workloads
More in the paper
— Scheduler cost model
— Opportunistic scheduling
— Stable matching
“Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing”
Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping Qian,
Ming Wu, and Lidong Zhou, in Proc. of the 2014 OSDI Conference (OSDI'14)
Outline
— What made Cosmos successful
— Language
— Data sharing
— Technical Challenges
— Scalability challenges and architecture
— Supporting lower latency workload
— Conclusion
Supporting lower latency
workloads
— As the customer base increased, the workload
diversified
— Users request the ability to get interactive
latencies, on the same data
— While Apollo can scale to jobs processing
petabytes of data, it has undesirable overhead for
smaller jobs
Supporting lower latency
workloads
— How to provide interactive latencies at cloud scale?
— How to provide fault tolerance in an interactive
context?
JetScope (VLDB ’15)
— Provide interactive capabilities on Cosmos &
Scope
— Paradigm shift in the execution model:
— Stream intermediate results
— Gang scheduling
Intermediate Results
Streaming
— JetScope avoids materializing intermediates to disk
— Tasks writes to a local service, StreamNet, which
manages communications
— Challenges:
— Deadlock on ordered merge when using finite
communication buffers
— Too many connections
Gang Scheduling
— To achieve minimal latency, JetScope starts all
tasks at the same time (gang scheduling)
— Execution overlap in tasks allows an increase in
parallelism
— Challenge: Scheduler deadlock
— Two schedulers incrementally acquire resources
— Resources run out, neither jobs can execute
— Solution: Admission control
—Chance of failure increases with number of
servers touched
—A job could fail repeatedly and never complete
—We need a fault tolerance mechanism that
doesn’t impact performance
—Details are in the paper
39
Fault Tolerance
How does JetScope scale?
Latency(seconds)
0
13
25
38
50
Q1 Q4 Q6 Q12 Q15
10TB with 200 servers 1TB with 20 servers
Similar latency
after 10x scale
increase
40
Conclusion
—Cosmos is a large scale distributed data processing
system
—Store exabytes of data on many clusters, that can
contain over 50,000 servers
—Provides both batch processing and interactive
processing
—Has a fully decentralized control plane for scalability
—Operates a high utilization to maintain low query
cost
41

More Related Content

What's hot

Achieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentAchieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environment
Rakuten Group, Inc.
 
Introducing MemSQL 4
Introducing MemSQL 4Introducing MemSQL 4
Introducing MemSQL 4
SingleStore
 
Modeling the Smart and Connected City of the Future with Kafka and Spark
Modeling the Smart and Connected City of the Future with Kafka and SparkModeling the Smart and Connected City of the Future with Kafka and Spark
Modeling the Smart and Connected City of the Future with Kafka and Spark
SingleStore
 
Data integration with Apache Kafka
Data integration with Apache KafkaData integration with Apache Kafka
Data integration with Apache Kafka
confluent
 
Journey to the Real-Time Analytics in Extreme Growth
Journey to the Real-Time Analytics in Extreme GrowthJourney to the Real-Time Analytics in Extreme Growth
Journey to the Real-Time Analytics in Extreme Growth
SingleStore
 
Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf
 
Real-Time Analytics with Spark and MemSQL
Real-Time Analytics with Spark and MemSQLReal-Time Analytics with Spark and MemSQL
Real-Time Analytics with Spark and MemSQL
SingleStore
 
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIuser Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
confluent
 
The Future of ETL - Strata Data New York 2018
The Future of ETL - Strata Data New York 2018The Future of ETL - Strata Data New York 2018
The Future of ETL - Strata Data New York 2018
confluent
 
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
HostedbyConfluent
 
The Future of ETL Isn't What It Used to Be
The Future of ETL Isn't What It Used to BeThe Future of ETL Isn't What It Used to Be
The Future of ETL Isn't What It Used to Be
confluent
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analytics
confluent
 
Introduction to MemSQL
Introduction to MemSQLIntroduction to MemSQL
Introduction to MemSQL
SingleStore
 
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
confluent
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
HostedbyConfluent
 
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
HostedbyConfluent
 
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
HostedbyConfluent
 
Stateful, Stateless and Serverless - Running Apache Kafka® on Kubernetes
Stateful, Stateless and Serverless - Running Apache Kafka® on KubernetesStateful, Stateless and Serverless - Running Apache Kafka® on Kubernetes
Stateful, Stateless and Serverless - Running Apache Kafka® on Kubernetes
confluent
 
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
confluent
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
confluent
 

What's hot (20)

Achieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentAchieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environment
 
Introducing MemSQL 4
Introducing MemSQL 4Introducing MemSQL 4
Introducing MemSQL 4
 
Modeling the Smart and Connected City of the Future with Kafka and Spark
Modeling the Smart and Connected City of the Future with Kafka and SparkModeling the Smart and Connected City of the Future with Kafka and Spark
Modeling the Smart and Connected City of the Future with Kafka and Spark
 
Data integration with Apache Kafka
Data integration with Apache KafkaData integration with Apache Kafka
Data integration with Apache Kafka
 
Journey to the Real-Time Analytics in Extreme Growth
Journey to the Real-Time Analytics in Extreme GrowthJourney to the Real-Time Analytics in Extreme Growth
Journey to the Real-Time Analytics in Extreme Growth
 
Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7
 
Real-Time Analytics with Spark and MemSQL
Real-Time Analytics with Spark and MemSQLReal-Time Analytics with Spark and MemSQL
Real-Time Analytics with Spark and MemSQL
 
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIuser Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
 
The Future of ETL - Strata Data New York 2018
The Future of ETL - Strata Data New York 2018The Future of ETL - Strata Data New York 2018
The Future of ETL - Strata Data New York 2018
 
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
 
The Future of ETL Isn't What It Used to Be
The Future of ETL Isn't What It Used to BeThe Future of ETL Isn't What It Used to Be
The Future of ETL Isn't What It Used to Be
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analytics
 
Introduction to MemSQL
Introduction to MemSQLIntroduction to MemSQL
Introduction to MemSQL
 
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
 
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
 
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
 
Stateful, Stateless and Serverless - Running Apache Kafka® on Kubernetes
Stateful, Stateless and Serverless - Running Apache Kafka® on KubernetesStateful, Stateless and Serverless - Running Apache Kafka® on Kubernetes
Stateful, Stateless and Serverless - Running Apache Kafka® on Kubernetes
 
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
 

Similar to How Microsoft Built and Scaled Cosmos

IT TRENDS AND PERSPECTIVES 2016
IT TRENDS AND PERSPECTIVES 2016IT TRENDS AND PERSPECTIVES 2016
IT TRENDS AND PERSPECTIVES 2016
Vaidheswaran CS
 
Manage Microservices & Fast Data Systems on One Platform w/ DC/OS
Manage Microservices & Fast Data Systems on One Platform w/ DC/OSManage Microservices & Fast Data Systems on One Platform w/ DC/OS
Manage Microservices & Fast Data Systems on One Platform w/ DC/OS
Mesosphere Inc.
 
云计算及其应用
云计算及其应用云计算及其应用
云计算及其应用
lantianlcdx
 
Mihai_Nuta
Mihai_NutaMihai_Nuta
Mihai_Nuta
Mihai Nuta
 
FredMcLainResumeB
FredMcLainResumeBFredMcLainResumeB
FredMcLainResumeB
Fred McLain
 
Above the cloud joarder kamal
Above the cloud   joarder kamalAbove the cloud   joarder kamal
Above the cloud joarder kamal
Joarder Kamal
 
Containerizing couchbase with microservice architecture on mesosphere.pptx
Containerizing couchbase with microservice architecture on mesosphere.pptxContainerizing couchbase with microservice architecture on mesosphere.pptx
Containerizing couchbase with microservice architecture on mesosphere.pptx
Ravi Yadav
 
ZCloud Consensus on Hardware for Distributed Systems
ZCloud Consensus on Hardware for Distributed SystemsZCloud Consensus on Hardware for Distributed Systems
ZCloud Consensus on Hardware for Distributed Systems
Gokhan Boranalp
 
newSkills_09
newSkills_09newSkills_09
newSkills_09
Yue Chao Qin
 
Scientific Cloud Computing: Present & Future
Scientific Cloud Computing: Present & FutureScientific Cloud Computing: Present & Future
Scientific Cloud Computing: Present & Future
stratuslab
 
Santosh kumarpandi
Santosh kumarpandiSantosh kumarpandi
Santosh kumarpandi
Santosh Kumar Pandi
 
QoE-Aware Traffic Steering using OpenFlow
QoE-Aware Traffic Steering using OpenFlowQoE-Aware Traffic Steering using OpenFlow
QoE-Aware Traffic Steering using OpenFlow
US-Ignite
 
Karthik Balasubramanian (Resume)
Karthik Balasubramanian (Resume)Karthik Balasubramanian (Resume)
Karthik Balasubramanian (Resume)
karthik_bala
 
Designing High performance & Scalable Middleware for HPC
Designing High performance & Scalable Middleware for HPCDesigning High performance & Scalable Middleware for HPC
Designing High performance & Scalable Middleware for HPC
Object Automation
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.ppt
Vipin Singhal
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.ppt
TamilKnowledgebase
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.ppt
geminass1
 
Cloud Computing: Concepts, Technologies and Business Implications
Cloud Computing: Concepts, Technologies and Business ImplicationsCloud Computing: Concepts, Technologies and Business Implications
Cloud Computing: Concepts, Technologies and Business Implications
ssuser1cafbd1
 
Naresh Babu
Naresh BabuNaresh Babu
Naresh Babu
Naresh Babu Keshava
 
Introduction to Cloud Computing
Introduction to Cloud ComputingIntroduction to Cloud Computing
Introduction to Cloud Computing
Animesh Chaturvedi
 

Similar to How Microsoft Built and Scaled Cosmos (20)

IT TRENDS AND PERSPECTIVES 2016
IT TRENDS AND PERSPECTIVES 2016IT TRENDS AND PERSPECTIVES 2016
IT TRENDS AND PERSPECTIVES 2016
 
Manage Microservices & Fast Data Systems on One Platform w/ DC/OS
Manage Microservices & Fast Data Systems on One Platform w/ DC/OSManage Microservices & Fast Data Systems on One Platform w/ DC/OS
Manage Microservices & Fast Data Systems on One Platform w/ DC/OS
 
云计算及其应用
云计算及其应用云计算及其应用
云计算及其应用
 
Mihai_Nuta
Mihai_NutaMihai_Nuta
Mihai_Nuta
 
FredMcLainResumeB
FredMcLainResumeBFredMcLainResumeB
FredMcLainResumeB
 
Above the cloud joarder kamal
Above the cloud   joarder kamalAbove the cloud   joarder kamal
Above the cloud joarder kamal
 
Containerizing couchbase with microservice architecture on mesosphere.pptx
Containerizing couchbase with microservice architecture on mesosphere.pptxContainerizing couchbase with microservice architecture on mesosphere.pptx
Containerizing couchbase with microservice architecture on mesosphere.pptx
 
ZCloud Consensus on Hardware for Distributed Systems
ZCloud Consensus on Hardware for Distributed SystemsZCloud Consensus on Hardware for Distributed Systems
ZCloud Consensus on Hardware for Distributed Systems
 
newSkills_09
newSkills_09newSkills_09
newSkills_09
 
Scientific Cloud Computing: Present & Future
Scientific Cloud Computing: Present & FutureScientific Cloud Computing: Present & Future
Scientific Cloud Computing: Present & Future
 
Santosh kumarpandi
Santosh kumarpandiSantosh kumarpandi
Santosh kumarpandi
 
QoE-Aware Traffic Steering using OpenFlow
QoE-Aware Traffic Steering using OpenFlowQoE-Aware Traffic Steering using OpenFlow
QoE-Aware Traffic Steering using OpenFlow
 
Karthik Balasubramanian (Resume)
Karthik Balasubramanian (Resume)Karthik Balasubramanian (Resume)
Karthik Balasubramanian (Resume)
 
Designing High performance & Scalable Middleware for HPC
Designing High performance & Scalable Middleware for HPCDesigning High performance & Scalable Middleware for HPC
Designing High performance & Scalable Middleware for HPC
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.ppt
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.ppt
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.ppt
 
Cloud Computing: Concepts, Technologies and Business Implications
Cloud Computing: Concepts, Technologies and Business ImplicationsCloud Computing: Concepts, Technologies and Business Implications
Cloud Computing: Concepts, Technologies and Business Implications
 
Naresh Babu
Naresh BabuNaresh Babu
Naresh Babu
 
Introduction to Cloud Computing
Introduction to Cloud ComputingIntroduction to Cloud Computing
Introduction to Cloud Computing
 

More from SingleStore

Five ways database modernization simplifies your data life
Five ways database modernization simplifies your data lifeFive ways database modernization simplifies your data life
Five ways database modernization simplifies your data life
SingleStore
 
How Kafka and Modern Databases Benefit Apps and Analytics
How Kafka and Modern Databases Benefit Apps and AnalyticsHow Kafka and Modern Databases Benefit Apps and Analytics
How Kafka and Modern Databases Benefit Apps and Analytics
SingleStore
 
Architecting Data in the AWS Ecosystem
Architecting Data in the AWS EcosystemArchitecting Data in the AWS Ecosystem
Architecting Data in the AWS Ecosystem
SingleStore
 
Building the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free LifeBuilding the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free Life
SingleStore
 
Converging Database Transactions and Analytics
Converging Database Transactions and Analytics Converging Database Transactions and Analytics
Converging Database Transactions and Analytics
SingleStore
 
Building a Machine Learning Recommendation Engine in SQL
Building a Machine Learning Recommendation Engine in SQLBuilding a Machine Learning Recommendation Engine in SQL
Building a Machine Learning Recommendation Engine in SQL
SingleStore
 
MemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastMemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks Webcast
SingleStore
 
An Engineering Approach to Database Evaluations
An Engineering Approach to Database EvaluationsAn Engineering Approach to Database Evaluations
An Engineering Approach to Database Evaluations
SingleStore
 
Building a Fault Tolerant Distributed Architecture
Building a Fault Tolerant Distributed ArchitectureBuilding a Fault Tolerant Distributed Architecture
Building a Fault Tolerant Distributed Architecture
SingleStore
 
Stream Processing with Pipelines and Stored Procedures
Stream Processing with Pipelines  and Stored ProceduresStream Processing with Pipelines  and Stored Procedures
Stream Processing with Pipelines and Stored Procedures
SingleStore
 
Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017
SingleStore
 
Image Recognition on Streaming Data
Image Recognition  on Streaming DataImage Recognition  on Streaming Data
Image Recognition on Streaming Data
SingleStore
 
Spark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
Spark Summit Dublin 2017 - MemSQL - Real-Time Image RecognitionSpark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
Spark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
SingleStore
 
The State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and BeyondThe State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and Beyond
SingleStore
 
How Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data ManagementHow Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data Management
SingleStore
 
Teaching Databases to Learn in the World of AI
Teaching Databases to Learn in the World of AITeaching Databases to Learn in the World of AI
Teaching Databases to Learn in the World of AI
SingleStore
 
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid CloudGartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
SingleStore
 
Gartner Catalyst 2017: Image Recognition on Streaming Data
Gartner Catalyst 2017: Image Recognition on Streaming DataGartner Catalyst 2017: Image Recognition on Streaming Data
Gartner Catalyst 2017: Image Recognition on Streaming Data
SingleStore
 
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and SparkSpark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
SingleStore
 
Real-Time Analytics at Uber Scale
Real-Time Analytics at Uber ScaleReal-Time Analytics at Uber Scale
Real-Time Analytics at Uber Scale
SingleStore
 

More from SingleStore (20)

Five ways database modernization simplifies your data life
Five ways database modernization simplifies your data lifeFive ways database modernization simplifies your data life
Five ways database modernization simplifies your data life
 
How Kafka and Modern Databases Benefit Apps and Analytics
How Kafka and Modern Databases Benefit Apps and AnalyticsHow Kafka and Modern Databases Benefit Apps and Analytics
How Kafka and Modern Databases Benefit Apps and Analytics
 
Architecting Data in the AWS Ecosystem
Architecting Data in the AWS EcosystemArchitecting Data in the AWS Ecosystem
Architecting Data in the AWS Ecosystem
 
Building the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free LifeBuilding the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free Life
 
Converging Database Transactions and Analytics
Converging Database Transactions and Analytics Converging Database Transactions and Analytics
Converging Database Transactions and Analytics
 
Building a Machine Learning Recommendation Engine in SQL
Building a Machine Learning Recommendation Engine in SQLBuilding a Machine Learning Recommendation Engine in SQL
Building a Machine Learning Recommendation Engine in SQL
 
MemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastMemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks Webcast
 
An Engineering Approach to Database Evaluations
An Engineering Approach to Database EvaluationsAn Engineering Approach to Database Evaluations
An Engineering Approach to Database Evaluations
 
Building a Fault Tolerant Distributed Architecture
Building a Fault Tolerant Distributed ArchitectureBuilding a Fault Tolerant Distributed Architecture
Building a Fault Tolerant Distributed Architecture
 
Stream Processing with Pipelines and Stored Procedures
Stream Processing with Pipelines  and Stored ProceduresStream Processing with Pipelines  and Stored Procedures
Stream Processing with Pipelines and Stored Procedures
 
Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017
 
Image Recognition on Streaming Data
Image Recognition  on Streaming DataImage Recognition  on Streaming Data
Image Recognition on Streaming Data
 
Spark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
Spark Summit Dublin 2017 - MemSQL - Real-Time Image RecognitionSpark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
Spark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
 
The State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and BeyondThe State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and Beyond
 
How Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data ManagementHow Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data Management
 
Teaching Databases to Learn in the World of AI
Teaching Databases to Learn in the World of AITeaching Databases to Learn in the World of AI
Teaching Databases to Learn in the World of AI
 
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid CloudGartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
 
Gartner Catalyst 2017: Image Recognition on Streaming Data
Gartner Catalyst 2017: Image Recognition on Streaming DataGartner Catalyst 2017: Image Recognition on Streaming Data
Gartner Catalyst 2017: Image Recognition on Streaming Data
 
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and SparkSpark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
 
Real-Time Analytics at Uber Scale
Real-Time Analytics at Uber ScaleReal-Time Analytics at Uber Scale
Real-Time Analytics at Uber Scale
 

Recently uploaded

Research proposal seminar ,Research Methodology
Research proposal seminar ,Research MethodologyResearch proposal seminar ,Research Methodology
Research proposal seminar ,Research Methodology
doctorzlife786
 
Supervised Learning (Data Science).pptx
Supervised Learning  (Data Science).pptxSupervised Learning  (Data Science).pptx
Supervised Learning (Data Science).pptx
TARIKU ENDALE
 
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
fatima shekh$A17
 
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
gargnatasha985
 
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
revolutionary575
 
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
dizzycaye
 
ch8_multiplexing cs553 st07 slide share ss
ch8_multiplexing cs553 st07 slide share ssch8_multiplexing cs553 st07 slide share ss
ch8_multiplexing cs553 st07 slide share ss
MinThetLwin1
 
Potential Uses of the Floyd-Warshall Algorithm as appropriate
Potential Uses of the Floyd-Warshall Algorithm as appropriatePotential Uses of the Floyd-Warshall Algorithm as appropriate
Potential Uses of the Floyd-Warshall Algorithm as appropriate
huseindihon
 
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
tanupasswan6
 
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
satpalsheravatmumbai
 
Biometric Question Bank 2021 - 1 Soln-1.pdf
Biometric Question Bank 2021 - 1 Soln-1.pdfBiometric Question Bank 2021 - 1 Soln-1.pdf
Biometric Question Bank 2021 - 1 Soln-1.pdf
Joel Ngushwai
 
DU degree offer diploma Transcript
DU degree offer diploma TranscriptDU degree offer diploma Transcript
DU degree offer diploma Transcript
uapta
 
M44.pdf dairy management farm report of an
M44.pdf dairy management farm report of anM44.pdf dairy management farm report of an
M44.pdf dairy management farm report of an
ManjuBv2
 
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
45unexpected
 
potential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in generalpotential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in general
huseindihon
 
potential development of the A* search algorithm specifically
potential development of the A* search algorithm specificallypotential development of the A* search algorithm specifically
potential development of the A* search algorithm specifically
huseindihon
 
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy DsouzaOpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata
 
Celonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptxCelonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptx
AnujaGaikwad28
 
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
ginni singh$A17
 
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
saadkhan1485265
 

Recently uploaded (20)

Research proposal seminar ,Research Methodology
Research proposal seminar ,Research MethodologyResearch proposal seminar ,Research Methodology
Research proposal seminar ,Research Methodology
 
Supervised Learning (Data Science).pptx
Supervised Learning  (Data Science).pptxSupervised Learning  (Data Science).pptx
Supervised Learning (Data Science).pptx
 
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
 
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
 
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
 
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
 
ch8_multiplexing cs553 st07 slide share ss
ch8_multiplexing cs553 st07 slide share ssch8_multiplexing cs553 st07 slide share ss
ch8_multiplexing cs553 st07 slide share ss
 
Potential Uses of the Floyd-Warshall Algorithm as appropriate
Potential Uses of the Floyd-Warshall Algorithm as appropriatePotential Uses of the Floyd-Warshall Algorithm as appropriate
Potential Uses of the Floyd-Warshall Algorithm as appropriate
 
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
 
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
 
Biometric Question Bank 2021 - 1 Soln-1.pdf
Biometric Question Bank 2021 - 1 Soln-1.pdfBiometric Question Bank 2021 - 1 Soln-1.pdf
Biometric Question Bank 2021 - 1 Soln-1.pdf
 
DU degree offer diploma Transcript
DU degree offer diploma TranscriptDU degree offer diploma Transcript
DU degree offer diploma Transcript
 
M44.pdf dairy management farm report of an
M44.pdf dairy management farm report of anM44.pdf dairy management farm report of an
M44.pdf dairy management farm report of an
 
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
 
potential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in generalpotential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in general
 
potential development of the A* search algorithm specifically
potential development of the A* search algorithm specificallypotential development of the A* search algorithm specifically
potential development of the A* search algorithm specifically
 
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy DsouzaOpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
 
Celonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptxCelonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptx
 
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
 
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
 

How Microsoft Built and Scaled Cosmos

  • 1. How Microsoft built and scaled Cosmos
  • 2. Cosmos — Cosmos is a large Scale Data processing system — In use by thousands of internal users at Microsoft — Distributed filesystem contains exabytes of data — High-level SQL-like language to run jobs processing up to petabytes at a time
  • 3. Outline — What made Cosmos successful — Language — Data sharing — Technical Challenges — Scalability challenges and architecture — Supporting lower latency workload — Conclusion
  • 4. Language: Scope — SQL-Like language — Support structured data and unstructured data — Easy to use and learn Q = SSTREAM “queries.ss”; U = SSTREAM “users.ss”; J= SELECT *, Math.Round(Q.latency) AS l FROM Q,U WHERE Q.uid==U.uid; OUTPUT J TO “output.txt” “SCOPE: Parallel Databases Meet MapReduce” Jingren Zhou, Nicolas Bruno, Ming-chuan Wu, Paul Larson, Ronnie Chaiken, Darren Shakib, The VLDB Journal, 2012
  • 5. Scope — C# extensibility — Supports user defined objects input = EXTRACT user, session, blob FROM "log_%n.txt?n=1...10" USING DefaultTextExtractor; SELECT user, session, new RequestInfo(blob) AS request FROM input WHERE request.Browser.IsChrome() “SCOPE: Parallel Databases Meet MapReduce” Jingren Zhou, Nicolas Bruno, Ming-chuan Wu, Paul Larson, Ronnie Chaiken, Darren Shakib, The VLDB Journal, 2012
  • 6. Scope Distributed Execution — Queries are parsed into a logical operator tree — The optimizer transforms the query into a physical operator graph, which is then compiled into binaries — The physical operator graph and binaries are handed to a scheduler for execution “Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing” Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping Qian, Ming Wu, and Lidong Zhou, in Proc. of the 2014 OSDI Conference (OSDI'14)
  • 8. Data Sharing — Users share data by reference — Teams put their data in Cosmos because that is where the data they want to join against is — Skype, Windows, Xbox, Bing, Ads, Office, and more http://research.microsoft.com/en-us/events/fs2011/helland_cosmos_big_data_and_big_challenges.pdf https://azure.microsoft.com/en-us/blog/behind-the-scenes-of-azure-data-lake-bringing-microsoft-s-big-data-experience-to- hadoop/ “Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing” Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping Qian, Ming Wu, and Lidong Zhou, in Proc. of the 2014 OSDI Conference (OSDI'14)
  • 9. Network Effect • Teams put their data in Cosmos because that is where the data they want to join against is http://research.microsoft.com/en-us/events/fs2011/helland_cosmos_big_data_and_big_challenges.pdf JETS operates a high-scale, modern data pipeline for Office Telemetry data from clients and services are combined into both custom (app domain specific) and common System Health data sets in Cosmos.
  • 10. organization reports to surface [..] release risks and telemetry information telemetry information using map/reduce COSMOS WSD organization is responsible for delivering security and non- security fixes to Windows OSes to billions of customers, every month on patch Tuesday
  • 11. Are you interested in building the BI platform for Bing Ads? Experience with working on C#, C++, or Java, Cosmos, is highly desirable.
  • 12. Are you excited about delivering the next generation personal assistant, Cortana, to millions of people using Windows worldwide? Experience with“Big Data” technologies like Cosmos
  • 13. Data Sharing — Users share data by reference — Teams put their data in Cosmos because that is where the data they want to join against is — Skype, Windows, Xbox, Bing, Ads, Office, and more — This drives huge scalability requirements — Cluster size exceed 50,000 servers http://research.microsoft.com/en-us/events/fs2011/helland_cosmos_big_data_and_big_challenges.pdf https://azure.microsoft.com/en-us/blog/behind-the-scenes-of-azure-data-lake-bringing-microsoft-s-big-data-experience-to- hadoop/ “Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing” Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping Qian, Ming Wu, and Lidong Zhou, in Proc. of the 2014 OSDI Conference (OSDI'14)
  • 14. Outline — What made Cosmos successful — Language — Data sharing — Technical Challenges — Scalability challenges and architecture — Supporting lower latency workload — Conclusion
  • 15. Plan Optimizations — At large scale, query plan manipulations are required to improve efficiency of sort, aggregation and broadcast
  • 19. Scaling the Execution: Apollo (OSDI’14) — A large number of users share execution resources for data locality — How to minimize latency while maximizing cluster utilization? — Challenges: — Scale — Heterogeneous workload — Maximizing utilization “Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing” Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping Qian, Ming Wu, and Lidong Zhou, in Proc. of the 2014 OSDI Conference (OSDI'14)
  • 21. Dynamic Workload How to effectively use resources while maintaining performance guarantees with a dynamic workload?
  • 22. Architecture — For scalability, the architecture adopts a fully decentralized control plane — Each job has its own scheduler instance — Each scheduler is making independent decisions informed by global information
  • 23. Architecture • Scheduler: There is one scheduler per job for scalability The scheduler makes local decision and directly dispatch tasks to process nodes
  • 24. Architecture Process Nodes: Execute tasks on behalf of job managers Provides local resource isolation Send status update aggregated by a resource monitor
  • 25. Architecture Resource Monitor: Aggregates status information from process node Provides the cluster load information to schedulers to inform future scheduling
  • 26. Architecture The queue at the PN allows the scheduler to reason about future resource availability
  • 27. Representing Load — How to concisely represent load? — Represents the expected wait time to acquire resources — Integrated into a scheduler cost model
  • 29. Optimizing for various factors — To make optimal scheduling decisions, multiple factors have to be considered at the same time — Input location — Network topology — Wait time — Initialization time — Machine health, probability of failure
  • 30. Scheduler Performance Ideal scheduler (Capacity Constraint) Ideal Scheduler (Infinite Capacity) Baseline Apollo The Cosmos scheduler performs within 5% of the ideal trace driven scheduler
  • 31. Utilization Cosmos maintains a median utilization above 80% on weekdays while supporting latency-sensitive workloads
  • 32. More in the paper — Scheduler cost model — Opportunistic scheduling — Stable matching “Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing” Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping Qian, Ming Wu, and Lidong Zhou, in Proc. of the 2014 OSDI Conference (OSDI'14)
  • 33. Outline — What made Cosmos successful — Language — Data sharing — Technical Challenges — Scalability challenges and architecture — Supporting lower latency workload — Conclusion
  • 34. Supporting lower latency workloads — As the customer base increased, the workload diversified — Users request the ability to get interactive latencies, on the same data — While Apollo can scale to jobs processing petabytes of data, it has undesirable overhead for smaller jobs
  • 35. Supporting lower latency workloads — How to provide interactive latencies at cloud scale? — How to provide fault tolerance in an interactive context?
  • 36. JetScope (VLDB ’15) — Provide interactive capabilities on Cosmos & Scope — Paradigm shift in the execution model: — Stream intermediate results — Gang scheduling
  • 37. Intermediate Results Streaming — JetScope avoids materializing intermediates to disk — Tasks writes to a local service, StreamNet, which manages communications — Challenges: — Deadlock on ordered merge when using finite communication buffers — Too many connections
  • 38. Gang Scheduling — To achieve minimal latency, JetScope starts all tasks at the same time (gang scheduling) — Execution overlap in tasks allows an increase in parallelism — Challenge: Scheduler deadlock — Two schedulers incrementally acquire resources — Resources run out, neither jobs can execute — Solution: Admission control
  • 39. —Chance of failure increases with number of servers touched —A job could fail repeatedly and never complete —We need a fault tolerance mechanism that doesn’t impact performance —Details are in the paper 39 Fault Tolerance
  • 40. How does JetScope scale? Latency(seconds) 0 13 25 38 50 Q1 Q4 Q6 Q12 Q15 10TB with 200 servers 1TB with 20 servers Similar latency after 10x scale increase 40
  • 41. Conclusion —Cosmos is a large scale distributed data processing system —Store exabytes of data on many clusters, that can contain over 50,000 servers —Provides both batch processing and interactive processing —Has a fully decentralized control plane for scalability —Operates a high utilization to maintain low query cost 41