Multi-Tenant Data Cloud with YARN & Helix
Kishore Gopalakrishna (@kishore_b_g)
LinkedIn - Data infra: Helix, Espresso
Yahoo - Ads infra: S4
What is YARN
Next Generation Compute Platform
Hadoop 1.0: MapReduce on HDFS
Hadoop 2.0: MapReduce and Others (Batch, Interactive, Online, Streaming) on YARN (cluster resource management) on HDFS
Enables
[Figure: containers of applications A (A1-A3), B (B1-B5), and C (C1-C5) spread across one cluster, on top of YARN and an HDFS/Common Area]
YARN Architecture
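[Figure: the Client submits a job (App Package) to the Resource Manager; the Node Managers report node status to the Resource Manager; the Application Master sends container requests to the Resource Manager; the Node Managers host the Application Master and the Containers]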
So, let’s build something
Example System
- Generate data in Hadoop
- Use it for serving
[Figure: Generate Data (M/R, HDFS), then Serve (Redis servers)]
Example System
Application Master:
- Request containers
- Assign work
- Handle failure
- Handle workload changes
Requirements
Big Data :-)
Partitioned, replicated
Fault tolerant, Scalable
Efficient resource utilization
[Figure: Generate Data (M/R, HDFS), then Serve (servers)]
Allocation + Assignment
M/R job generates partitioned data (p1 to p6) in HDFS
Multiple servers to serve the partitioned data
Container Allocation - data affinity, rack aware placement
Partition Assignment - affinity, even distribution
Replica Placement - on different physical machines
[Figure: three servers holding p1 p2 p5 p4 / p3 p4 p1 p6 / p5 p6 p3 p2]
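A minimal Python sketch of the assignment and replica placement rules on this slide. The round-robin offset scheme and the names `assign_partitions` and `server1` to `server3` are illustrative assumptions, not the algorithm Helix actually uses.

```python
from collections import defaultdict

def assign_partitions(partitions, servers, replicas):
    """Spread replicas evenly; replicas of one partition land on distinct servers."""
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    assignment = defaultdict(list)
    for i, partition in enumerate(partitions):
        for r in range(replicas):
            # Offsetting by the replica index keeps copies on different servers.
            server = servers[(i + r) % len(servers)]
            assignment[server].append(partition)
    return dict(assignment)

partitions = [f"p{i}" for i in range(1, 7)]        # p1 .. p6
servers = ["server1", "server2", "server3"]        # hypothetical names
layout = assign_partitions(partitions, servers, replicas=2)
```

With 6 partitions, 2 replicas, and 3 servers, every server ends up with 4 replicas, the same load per server as in the figure.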
Failure Handling
On Failure - Even load distribution, while waiting for new container
Acquire new container close to data if possible
Assign failed partitions to new container
[Figure: partition layout (p1 to p6) across Server 1 to Server 4 before and after a server failure]
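The "even load distribution while waiting for a new container" step could be sketched as follows. This is an assumption-laden illustration: the greedy least-loaded choice, the function name `redistribute`, and the server names are hypothetical and are not taken from Helix.

```python
def redistribute(layout, failed):
    """Move the failed server's partitions to the least-loaded survivors."""
    survivors = {s: list(parts) for s, parts in layout.items() if s != failed}
    for partition in layout[failed]:
        # A survivor that already holds a replica of this partition is skipped.
        candidates = [s for s in survivors if partition not in survivors[s]]
        if not candidates:
            continue  # nothing eligible; wait for the replacement container
        target = min(candidates, key=lambda s: (len(survivors[s]), s))
        survivors[target].append(partition)
    return survivors

# Layout as read from the slide figure (server names are assumed).
layout = {
    "server1": ["p1", "p2", "p5", "p4"],
    "server2": ["p3", "p4", "p1", "p6"],
    "server3": ["p5", "p6", "p3", "p2"],
}
after = redistribute(layout, failed="server1")
```

In this example both survivors end up serving all six partitions, so the extra load is split evenly until a replacement container is acquired.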
Workload Changes
Monitor - CPU, Memory, Latency, TPS
Workload change - Acquire/Release containers
Container change - Re-distribute work
[Figure: three servers holding p1 p2 p5 p4 / p3 p4 p1 p6 / p5 p6 p3 p2, plus a newly added server holding p4 p6 p2]
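One way to re-distribute work after acquiring a container while moving as few partitions as possible is sketched below. The donor-selection rule and the names `add_server` and `server4` are assumptions made for illustration.

```python
def add_server(layout, new_server):
    """Rebalance onto a new server, moving only what is needed to even out load."""
    result = {s: list(parts) for s, parts in layout.items()}
    result[new_server] = []
    moves = []
    while True:
        # Take from the currently most loaded existing server.
        donor = max((s for s in result if s != new_server),
                    key=lambda s: len(result[s]))
        if len(result[donor]) - len(result[new_server]) <= 1:
            break  # load is already even
        movable = [p for p in result[donor] if p not in result[new_server]]
        if not movable:
            break
        partition = movable[-1]
        result[donor].remove(partition)
        result[new_server].append(partition)
        moves.append((partition, donor, new_server))
    return result, moves

layout = {
    "server1": ["p1", "p2", "p5", "p4"],
    "server2": ["p3", "p4", "p1", "p6"],
    "server3": ["p5", "p6", "p3", "p2"],
}
result, moves = add_server(layout, "server4")
```

Only three replicas move here, one from each existing server, and every server ends up with three.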
Service Discovery
Discover everything, what is running where
Dynamically updated on changes
[Figure: two Clients locate partitions on the servers through Service Discovery]
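A small sketch of the "what is running where" view a client might keep. The function name `build_routing_table` and the layout are hypothetical; a real system would rebuild this view from change notifications rather than by hand.

```python
def build_routing_table(layout):
    """Invert server -> partitions into partition -> servers."""
    table = {}
    for server, parts in layout.items():
        for partition in parts:
            table.setdefault(partition, []).append(server)
    return table

layout = {
    "server1": ["p1", "p2", "p5", "p4"],
    "server2": ["p3", "p4", "p1", "p6"],
    "server3": ["p5", "p6", "p3", "p2"],
}
table = build_routing_table(layout)

# On a change (here server1 disappears) the view is rebuilt.
layout_after = {s: parts for s, parts in layout.items() if s != "server1"}
table_after = build_routing_table(layout_after)
```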
Building YARN Application
Writing AM is Hard and Error Prone
Handling Faults, Workload Changes is non-trivial and often overlooked
- Request container: How many containers, Where
- Assign work: Place partitions & replicas, Affinity
- Workload changes: acquire/release containers, Minimize movement
- Fault Handling: Detect non-trivial failures, new vs. reuse containers
- Other: Service Discovery, Monitoring
Is there something that can make this easy?
Apache Helix
What is Helix?
Built at LinkedIn, 2+ years in production
Generic cluster management framework
Contributed to Apache, now a TLP: helix.apache.org
Decoupling cluster management from core functionality
Helix at LinkedIn
[Figure: LinkedIn data pipeline, in production: User Writes, OracleDB, Change Capture, Change Consumers, Search Index, Data Replicator, ETL, HDFS, Analytics]
Helix at LinkedIn
In Production
- Over 1000 instances covering over 30000 partitions
- Over 1000 instances for change capture consumers
- As many as 500 instances in a single Helix cluster
(all numbers are per-datacenter)
Others Using Helix
Helix concepts
Resource (Database, Index, Topic, Task)
Partitions: p1 p2 p3 p4 p5 p6
Replicas: r1 r2 r3
Containers: three Container Processes
Assignment?
States: bootstrap, Serve
Helix Concepts
State Model and Constraints
States: Stop, bootstrap, Serve
StateCount = Replication factor: 3
             State Constraints           Transition Constraints
Partition    Serve: 3, bootstrap: 0      Max T1 transitions in parallel
Resource     -                           Max T2 transitions in parallel
Node         No more than 10 replicas    Max T3 transitions in parallel
Cluster      -                           Max T4 transitions in parallel
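To make the state model concrete, here is a rough Python sketch. The transition set, the ordering Stop to bootstrap to Serve, and all names are assumptions for illustration; Helix itself defines state models through its own Java API, which is not reproduced here.

```python
from collections import Counter

# Assumed legal transitions for a Stop -> bootstrap -> Serve model.
ALLOWED = {("STOP", "BOOTSTRAP"), ("BOOTSTRAP", "SERVE"),
           ("BOOTSTRAP", "STOP"), ("SERVE", "STOP")}
NEXT_STEP = {"STOP": "BOOTSTRAP", "BOOTSTRAP": "SERVE"}
# Steady-state counts per partition, as on the slide: Serve: 3, bootstrap: 0.
TARGET_COUNTS = {"SERVE": 3, "BOOTSTRAP": 0}

def meets_state_constraints(replica_states):
    """True when the per-partition state counts match the targets."""
    counts = Counter(replica_states.values())
    return all(counts[state] == want for state, want in TARGET_COUNTS.items())

def next_transitions(replica_states, max_parallel):
    """Pick at most max_parallel legal transitions that move toward SERVE."""
    pending = []
    for replica, state in sorted(replica_states.items()):
        if state in NEXT_STEP:
            move = (state, NEXT_STEP[state])
            assert move in ALLOWED
            pending.append((replica,) + move)
    return pending[:max_parallel]

states = {"r1": "SERVE", "r2": "BOOTSTRAP", "r3": "STOP"}
batch = next_transitions(states, max_parallel=1)
```

The `max_parallel` argument plays the role of a transition constraint such as "Max T1 transitions in parallel".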
Helix Architecture
[Figure: the Controller (Target Provider, Provisioner, Rebalancer) assigns work via callback to three Participants hosting partitions P1 to P8, with state labels stop, bootstrap, server; metrics flow back to the Controller; two Clients act as spectators for Service Discovery]
Helix Controller
High-Level Overview
Inputs: Resource Config, Constraints, Objectives
Controller: TargetProvider, Provisioner, Rebalancer
- TargetProvider: Number of Containers
- Provisioner: talks to YARN RM
- Rebalancer: Task -> Container Mapping
Helix Controller
Target Provider
Determine how many containers are required along with the spec
Implementations: Fixed, CPU, Memory, Bin Packing
Monitoring system provides usage information
Default implementations, Bin Packing can be used to customize further
Inputs to TargetProvider:
- Resources p1,p2 .. pn
- Existing containers c1,c2 .. cn
- Health of tasks, containers: cpu, memory, health
- Allocation constraints: Affinity, rack locality
- SLA: Fixed: 10 containers; CPU headroom: 30%; Memory Usage: 70%; time: 5h
Outputs:
- Number of containers
- release list
- acquire list
- Container spec: cpu: x, memory: y, location: L
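As a rough illustration of what a target provider computes, the sketch below sizes the container count from a TPS goal. The per-container capacity `tps_per_container` and the function names are assumptions; only the 30% headroom and the 1 to 25 container bounds echo numbers that appear in the slides.

```python
import math

def target_containers(target_tps, tps_per_container, cpu_headroom=0.30,
                      min_containers=1, max_containers=25):
    """Containers needed to serve target_tps while keeping CPU headroom free."""
    usable_tps = tps_per_container * (1.0 - cpu_headroom)
    needed = math.ceil(target_tps / usable_tps)
    # Clamp to the configured bounds.
    return max(min_containers, min(max_containers, needed))

def plan(existing, target):
    """How many containers to acquire or release to reach the target."""
    delta = target - existing
    return {"acquire": max(delta, 0), "release": max(-delta, 0)}

# Assumed capacity: one container sustains 100,000 TPS at full CPU.
target = target_containers(1_000_000, tps_per_container=100_000)
change = plan(existing=10, target=target)
```

The provisioner would then turn the acquire and release counts into requests against the YARN RM.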
Helix Controller
Provisioner
Given the container spec, interact with YARN RM to acquire/release, NM to start/stop containers
YARN provisioner: interacts with YARN RM and subscribes to notifications
Helix Controller
Rebalancer
Based on the current nodes in the cluster and constraints, find an assignment of task to node
Modes: Auto, Semi-Auto, Static, User defined
Based on the FSM, compute & fire the transitions to Participants
Inputs to Rebalancer:
- Tasks t1,t2 .. tn
- Existing containers c1,c2 .. cn
- Allocation constraints & objectives: Affinity, rack locality, Even distribution of tasks, Minimize movement while expanding
Output:
- Assignment: C1: t1,t2; C2: t3,t4
Example System: Helix-Based Solution
Solution
Configure App
Configure Target Provider
Configure Provisioner
Configure Rebalancer
[Figure: Generate Data (M/R, HDFS), then Serve (servers)]
Configure App
App Name: Partitioned Data Server
App Master Package: /path/to/GenericHelixAppMaster.tar
App package: /path/to/RedisServerLauncher.tar
App Config: DataDirectory: hdfs:/path/to/data
Configure Target Provider
TargetProvider: RedisTargetProvider
Goal: Target TPS: 1 million
Min containers: 1
Max containers: 25
Configure Provisioner
YARN RM: host:port
Configure Rebalancer
Partitions: 6
Replica: 2
Max partitions per container: 4
Rebalancer.Mode: AUTO
Placement: Data Affinity
FailureHandling: Even distribution
Scaling: Minimize Movement
All of the above goes into app_config_spec.yaml
Example System: Helix-Based Solution
Launch Application
yarn_app_launcher.sh app_config_spec.yaml
Helix + YARN
[Figure: the Client submits a job to the YARN Resource Manager, which launches the AM on a Node Manager; the Application Master runs the Helix Controller (Target Provider, Provisioner, Rebalancer), requests containers from the Resource Manager, launches containers on the Node Managers, and assigns work to three participants holding p1 p2 p5 p4 / p3 p4 p1 p6 / p5 p6 p3 p2]
Auto Scaling
Non-linear scaling from 0 to 1M TPS and back
Failure Handling: Random Faults
Recovering from faults at 1M TPS (5%, 10%, 20% failures/min)
Summary
[Stack, top to bottom: Others (Batch, Interactive, Online, Streaming); HELIX (container + task management); YARN (cluster resource management); HDFS]
Fault tolerance, Expansion handled transparently
Generic Application Master
Efficient resource utilization by task model
Questions?
Website: helix.apache.org, #apachehelix
Twitter: @apachehelix, @kishore_b_g
Mail: user@helix.apache.org
Team: Kanak Biscuitwala, Zhen Zhang
We love helping & being helped




One Grid to rule them all: Building a Multi-tenant Data Cloud with YARN

  • 1. Multi-Tenant Data Cloud with YARN & Helix. Kishore Gopalakrishna (@kishore_b_g). LinkedIn - Data infra: Helix, Espresso. Yahoo - Ads infra: S4.
  • 2. What is YARN: Next Generation Compute Platform. Hadoop 1.0: MapReduce on HDFS. Hadoop 2.0: MapReduce and Others (Batch, Interactive, Online, Streaming) on YARN (cluster resource management) on HDFS. [Diagram: YARN enables containers A1-A3, B1-B5, C1-C5 to share one cluster]
  • 3. YARN Architecture. Client (submit job), Resource Manager, Node Managers (node status), Application Master (container request), Container, App Package, HDFS/Common Area.
  • 4. So, let’s build something
  • 5. Example System. Generate data in Hadoop (M/R, HDFS). Use it for serving (Redis servers).
  • 6. Example System. Requirements: Big Data :-); Partitioned, replicated; Fault tolerant, Scalable; Efficient resource utilization. Application Master: Request Containers; Assign work; Handle Failure; Handle workload Changes.
  • 7. Allocation + Assignment. M/R job generates partitioned data (p1 p2 p3 p4 p5 p6) in HDFS. Multiple servers to serve the partitioned data: Server 1 (p1 p2 p5 p4), Server 2 (p3 p4 p1 p6), Server 3 (p5 p6 p3 p2). Partition Assignment - affinity, even distribution. Replica Placement - on different physical machines. Container Allocation - data affinity, rack-aware placement.
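
The placement rules on slide 7 (even distribution, replicas on different machines) can be illustrated with a small round-robin sketch. This is only an illustration under assumed names (`assign_partitions`, `server1` and so on are hypothetical); it ignores data affinity and rack awareness, and it is not how Helix itself computes placements.

```python
from collections import defaultdict


def assign_partitions(partitions, servers, replicas):
    """Round-robin placement: replica r of partition i goes to server (i + r) mod n.

    Hypothetical helper for illustration. Replicas of one partition land on
    distinct servers as long as replicas <= len(servers).
    """
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    assignment = defaultdict(list)
    for i, partition in enumerate(partitions):
        for r in range(replicas):
            server = servers[(i + r) % len(servers)]
            assignment[server].append(partition)
    return dict(assignment)


# Six partitions, two replicas each, three servers (the numbers used on the slides).
partitions = [f"p{i}" for i in range(1, 7)]
servers = ["server1", "server2", "server3"]
assignment = assign_partitions(partitions, servers, replicas=2)

for server, hosted in assignment.items():
    print(server, hosted)
```

With these inputs every server ends up hosting four replicas, and no server holds two copies of the same partition.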
  • 8. Failure Handling. On Failure - Even load distribution, while waiting for new container. Acquire new container close to data if possible. Assign failed partitions to new container. [Diagram: partitions p1-p6 on Server 1 to Server 4 before and after a server failure]
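
The "even load distribution while waiting for a new container" step on slide 8 could look roughly like the sketch below. The function name and the starting layout are assumptions made for illustration (the layout follows one reading of the slide 7 diagram); the greedy least-loaded choice is a stand-in, not the policy Helix actually uses.

```python
def reassign_on_failure(assignment, failed):
    """Spread the failed server's partitions over the survivors.

    Hypothetical helper: each orphaned partition goes to the least-loaded
    survivor that does not already hold a replica of it.
    """
    survivors = {s: list(ps) for s, ps in assignment.items() if s != failed}
    if not survivors:
        raise ValueError("no surviving servers")
    for partition in assignment.get(failed, []):
        candidates = [s for s, ps in survivors.items() if partition not in ps]
        if not candidates:
            continue  # every survivor already holds this partition
        target = min(candidates, key=lambda s: len(survivors[s]))
        survivors[target].append(partition)
    return survivors


# Assumed starting layout: three servers, two replicas per partition.
before = {
    "server1": ["p1", "p2", "p5", "p4"],
    "server2": ["p3", "p4", "p1", "p6"],
    "server3": ["p5", "p6", "p3", "p2"],
}
after = reassign_on_failure(before, failed="server3")
print(after)
```

Once a replacement container arrives, the same partitions would be handed over to it; that step is left out here.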
  • 9. Workload Changes. Monitor - CPU, Memory, Latency, TPS. Workload change - Acquire/Release containers. Container change - Re-distribute work. [Diagram: partitions p1-p6 re-distributed as servers are added]
  • 10. Service Discovery. Discover everything, what is running where. Dynamically updated on changes. [Diagram: Clients locating partitions on Server 1 to Server 3 through Service Discovery]
  • 11. Building YARN Application. Writing AM is Hard and Error Prone. Handling Faults, Workload Changes is non-trivial and often overlooked. Request container: How many containers; Where. Assign work: Place partitions & replicas; Affinity. Workload changes: acquire/release containers; Minimize movement. Faults Handling: Detect non-trivial failures; new v/s reuse containers. Other: Service Discovery; Monitoring. Is there something that can make this easy?
  • 13. What is Helix? Built at LinkedIn, 2+ years in production. Generic cluster management framework. Contributed to Apache, now a TLP: helix.apache.org. Decoupling cluster management from core functionality.
  • 14. Helix at LinkedIn, In Production. [Diagram: User Writes, Oracle DB, Change Capture, Change Consumers, Data Replicator, Index, Search Index, ETL, HDFS, Analytics]
  • 15. Helix at LinkedIn, In Production. Over 1000 instances covering over 30000 partitions. Over 1000 instances for change capture consumers. As many as 500 instances in a single Helix cluster. (All numbers are per-datacenter.)
  • 17. Helix concepts. Resource (Database, Index, Topic, Task). Partitions: p1 p2 p3 p4 p5 p6. Replicas: r1 r2 r3. Container Processes. Assignment?
  • 18. Helix Concepts: State Model and Constraints. States: Stop, bootstrap, Serve. StateCount = Replication factor: 3. Partition: state constraint Serve: 3, bootstrap: 0; transition constraint Max T1 transitions in parallel. Resource: state constraint -; transition constraint Max T2 transitions in parallel. Node: state constraint No more than 10 replicas; transition constraint Max T3 transitions in parallel. Cluster: state constraint -; transition constraint Max T4 transitions in parallel.
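
To make the state model on slide 18 concrete, here is a minimal sketch of a per-partition state machine that enforces a state-count constraint. The transition edges are an assumption (the slide names the states Stop, bootstrap, and Serve but not the arrows), `PartitionReplicas` is a hypothetical class, and none of this is the Helix API.

```python
# Assumed transition table; only the state names come from the slide.
ALLOWED_TRANSITIONS = {
    ("stop", "bootstrap"),
    ("bootstrap", "serve"),
    ("bootstrap", "stop"),
    ("serve", "stop"),
}


class PartitionReplicas:
    """Tracks the state of each replica of one partition (illustrative only)."""

    def __init__(self, max_serving=3):
        self.max_serving = max_serving  # state constraint, e.g. "Serve: 3"
        self.states = {}  # replica name -> current state

    def count(self, state):
        return sum(1 for s in self.states.values() if s == state)

    def transition(self, replica, new_state):
        current = self.states.get(replica, "stop")  # unknown replicas start stopped
        if (current, new_state) not in ALLOWED_TRANSITIONS:
            raise ValueError(f"illegal transition {current} -> {new_state}")
        if new_state == "serve" and self.count("serve") >= self.max_serving:
            raise RuntimeError("state constraint violated: too many serving replicas")
        self.states[replica] = new_state


replicas = PartitionReplicas(max_serving=3)
for name in ("r1", "r2", "r3"):
    replicas.transition(name, "bootstrap")
    replicas.transition(name, "serve")
print(replicas.states)
```

A fourth replica could still bootstrap here, but moving it to `serve` would be rejected until one of the three serving replicas stops.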
  • 19. Helix Architecture. Controller (Target Provider, Provisioner, Rebalancer) assigns work via callback to Participants. Participants host partitions P1-P8 (states: stop, bootstrap, server) and report metrics. Clients act as spectators for Service Discovery.
  • 21. Helix Controller: Target Provider. Determine how many containers are required along with the spec. Implementations: Fixed, CPU, Memory, Bin Packing. Monitoring system provides usage information. Default implementations; Bin Packing can be used to customize further. Inputs: Resources p1,p2 .. pn; Existing containers c1,c2 .. cn; Health of tasks, containers (cpu, memory, health); Allocation constraints (Affinity, rack locality); SLA (Fixed: 10 containers; CPU headroom: 30%; Memory Usage: 70%; time: 5h). Outputs: Number of containers; release list; acquire list; Container spec (cpu: x, memory: y, location: L).
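
As a rough illustration of what a target provider decides, the sketch below turns an observed load into a container count and an acquire/release plan. The function names, the per-container capacity figure, and the sizing formula are assumptions made for this example; only the 30% headroom and the 1 to 25 container bounds echo numbers that appear on the slides.

```python
import math


def target_containers(observed_tps, tps_per_container, headroom=0.30,
                      min_containers=1, max_containers=25):
    """Containers needed to serve observed_tps while keeping `headroom` spare.

    Hypothetical sizing rule: each container is only loaded up to
    (1 - headroom) of its assumed capacity.
    """
    usable = tps_per_container * (1.0 - headroom)
    needed = math.ceil(observed_tps / usable)
    return max(min_containers, min(max_containers, needed))


def plan(current, target):
    """Translate a target count into acquire/release counts."""
    return {"acquire": max(0, target - current),
            "release": max(0, current - target)}


# Assumed capacity of 100,000 TPS per container; the goal is 1 million TPS.
target = target_containers(observed_tps=1_000_000, tps_per_container=100_000)
print(target, plan(current=10, target=target))
```

A real provider would also emit the container spec (cpu, memory, location) and would pick which specific containers to release.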
  • 22. Helix Controller: Provisioner. Given the container spec, interact with YARN RM to acquire/release, NM to start/stop containers. YARN: Interacts with YARN RM and subscribes to notifications.
  • 23. Helix Controller: Rebalancer. Based on the current nodes in the cluster and constraints, find an assignment of task to node. Modes: Auto, Semi-Auto, Static, User defined. Inputs: Tasks t1,t2 .. tn; Existing containers c1,c2 .. cn; Allocation constraints & objectives (Affinity, rack locality, Even distribution of tasks, Minimize movement while expanding). Output: Assignment (C1: t1,t2; C2: t3,t4). Based on the FSM, compute & fire the transitions to Participants.
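
The "even distribution" and "minimize movement while expanding" objectives on slide 23 can be sketched with a simple greedy rebalance: keep every task where it is unless a container is more than one task heavier than the lightest one. The function and the container/task names are hypothetical, and this is a stand-in for illustration rather than the algorithm behind Helix's AUTO mode.

```python
def rebalance(assignment, containers):
    """Even out task counts across `containers`, moving as few tasks as possible.

    Hypothetical helper. Returns (new_assignment, number_of_moved_tasks).
    """
    if not containers:
        raise ValueError("need at least one container")
    result = {c: list(assignment.get(c, [])) for c in containers}
    moved = 0

    # Tasks on containers that no longer exist must be placed somewhere.
    orphans = [t for c, ts in assignment.items() if c not in result for t in ts]
    for task in orphans:
        target = min(result, key=lambda c: len(result[c]))
        result[target].append(task)
        moved += 1

    # Shift one task at a time from the heaviest to the lightest container.
    while True:
        heaviest = max(result, key=lambda c: len(result[c]))
        lightest = min(result, key=lambda c: len(result[c]))
        if len(result[heaviest]) - len(result[lightest]) <= 1:
            break
        result[lightest].append(result[heaviest].pop())
        moved += 1
    return result, moved


# Expanding from two containers to three: only two of the six tasks move.
current = {"c1": ["t1", "t2", "t3"], "c2": ["t4", "t5", "t6"]}
new_assignment, moved = rebalance(current, ["c1", "c2", "c3"])
print(new_assignment, moved)
```

Tasks that stay put need no state transitions, which is why limiting movement matters when each move implies a stop and a bootstrap.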
  • 24. Example System: Helix-Based Solution. Solution: Configure App; Configure Target Provider; Configure Provisioner; Configure Rebalancer. [Diagram: Generate Data (M/R, HDFS) and Serve (servers)]
  • 25. Example System: Helix-Based Solution (app_config_spec.yaml). Configure App: App Name: Partitioned Data Server; App Master Package: /path/to/GenericHelixAppMaster.tar; App package: /path/to/RedisServerLauncher.tar; App Config: DataDirectory: hdfs:/path/to/data. Configure target provider: TargetProvider: RedisTargetProvider; Goal: Target TPS: 1 million; Min container: 1; Max containers: 25. Configure Provisioner: YARN RM: host:port. Configure Rebalancer: Partitions: 6; Replica: 2; Max partitions per container: 4; Rebalancer.Mode: AUTO; Placement: Data Affinity; FailureHandling: Even distribution; Scaling: Minimize Movement.
  • 27. Helix + YARN. Client: submit job. YARN Resource Manager: Launch AM. Application Master: Helix Controller (Target Provider, Provisioner, Rebalancer); request cntrs; assign work. Node Managers: launch containers. Participants: participant 1 (p1 p2 p5 p4), participant 2 (p3 p4 p1 p6), participant 3 (p5 p6 p3 p2).
  • 28. Auto Scaling. Non-linear scaling from 0 to 1M TPS and back.
  • 29. Failure Handling: Random Faults. Recovering from faults at 1M TPS (5%, 10%, 20% failures/min).
  • 30. Summary. Stack: HDFS; YARN (cluster resource management); HELIX (container + task management); Others (Batch, Interactive, Online, Streaming). Fault tolerance, Expansion handled transparently. Generic Application Master. Efficient resource utilization by task model.