OUR SINGULARITY?
WE ARE PLURAL!
Mohamed Mehdi BEN AISSA
Big Data Practice Manager – Finaxys
Big Data ITO - CACIB
linkedin.com/in/mehdi-ben-aissa/ @Ben_Aissa_mehdi
HOW TO DESIGN A DISASTER RECOVERY PLAN FOR HDP CLUSTERS?
2017-09
PLAN
I. INTRODUCTION
II. BIG DATA DRP ARCHITECTURES
III. STRETCH CLUSTER ARCHITECTURE
IV. HDP ARCHITECTURE
V. HDFS : STRETCH CLUSTER CONFIGURATION
VI. YARN : STRETCH CLUSTER CONFIGURATION
VII. CONCLUSION
INTRODUCTION
• SLA (Service-Level Agreement): an agreement on particular aspects of the service (quality, availability, responsibilities), including:
o RTO (Recovery Time Objective): the targeted duration of time, and the service level, within which a business process must be restored after a disaster
o RPO (Recovery Point Objective): the maximum targeted period during which data might be lost
• Goals:
o RTO = 0 (24/7 availability)
o RPO = 0 (no data loss)
o Cost = 0
o Consistency
o Performance
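As a minimal sketch of how these two objectives can be checked after an incident, the snippet below computes the achieved RPO and RTO from three timestamps. The function names and the sample timestamps are hypothetical and are not taken from the slides.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_replicated_at: datetime, disaster_at: datetime) -> timedelta:
    """Data written after the last replicated point is considered lost."""
    return max(disaster_at - last_replicated_at, timedelta(0))


def achieved_rto(disaster_at: datetime, service_restored_at: datetime) -> timedelta:
    """Time during which the business process was unavailable."""
    return service_restored_at - disaster_at


def meets_objectives(rpo: timedelta, rto: timedelta,
                     rpo_target: timedelta, rto_target: timedelta) -> bool:
    """True when both measured values are within their targets."""
    return rpo <= rpo_target and rto <= rto_target


# Hypothetical incident: last replication at 11:45, disaster at 12:00,
# service restored at 12:30.
disaster = datetime(2017, 9, 1, 12, 0, 0)
rpo = achieved_rpo(datetime(2017, 9, 1, 11, 45, 0), disaster)
rto = achieved_rto(disaster, datetime(2017, 9, 1, 12, 30, 0))

print(rpo, rto)
print(meets_objectives(rpo, rto, timedelta(hours=1), timedelta(hours=1)))
```

With targets of zero for both values, the same incident would fail the check, which is why RTO = 0 and RPO = 0 push the design toward synchronous replication.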
BIG DATA DRP ARCHITECTURES
BIG DATA DRP ARCHITECTURES : MULTI-CLUSTER ARCHITECTURE VS STRETCH CLUSTER
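[Figure: (1) Multi-cluster architecture: Cluster 1 in Data Center 1 and Cluster 2 in Data Center 2, linked by data replication. (2) Stretch cluster: one cluster spanning Data Center 1, Data Center 2 and Data Center 3, with replication inside the cluster.]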
STRETCH CLUSTER : ARCHITECTURE
[Figure: stretch cluster layout across DC1, DC2 and DC3, showing gateway nodes, control nodes, master nodes, worker nodes and witness nodes.]
HORTONWORKS DATA PLATFORM ARCHITECTURE
HORTONWORKS DATA PLATFORM : ARCHITECTURE (1)
HORTONWORKS DATA PLATFORM : ARCHITECTURE (2)
[Figure: HDP functional layers: Data Collect & Storage, Data Processing, Serving Layer. Labelled components: Apache Kafka, NoSQL Database, Indexed Data Search.]
HORTONWORKS DATA PLATFORM : ARCHITECTURE (3)
[Figure: HDP components: HDFS, Kafka, YARN, HBase, Solr, Flume, Spark, Tez, MapReduce, Hive, Oozie, Sqoop.]
HDFS : STRETCH CLUSTER CONFIGURATION
STRETCH CLUSTER : HDFS ARCHITECTURE
[Figure: HDFS stretch cluster topology.
DC1: Namenode, Zookeeper 2-3, Journalnode 2-3, Datanodes 1-4 (Rack1, Rack2).
DC2: Namenode, Zookeeper 4-5, Journalnode 4-5, Datanodes 5-8 (Rack3, Rack4).
DC3: Zookeeper 1, Journalnode 1.]
STRETCH CLUSTER : HDFS ARCHITECTURE - DEFAULT CONFIGURATION
Cluster Config:
dfs.replication = 4
[Figure: HDFS stretch cluster topology across DC1, DC2 and DC3, with four replicas of block B placed on the datanodes.]
STRETCH CLUSTER : HDFS ARCHITECTURE – RACK AWARENESS (1)
Cluster Config:
dfs.replication = 4
Rack.awareness
[Figure: HDFS stretch cluster topology across DC1, DC2 and DC3, with the four replicas of block B drawn as a group of three and a single replica.]
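Rack awareness in Hadoop is commonly fed by a topology script (referenced by the `net.topology.script.file.name` property) that receives host names or IP addresses as arguments and prints one network location per argument. The sketch below is one possible script for the layout in the slides. The host names, the two-datanodes-per-rack split and the `/dcN/rackN` naming are assumptions made for illustration; the slides do not give them.

```python
#!/usr/bin/env python3
"""Hypothetical topology script: maps each host to a /datacenter/rack path."""
import sys

# Assumed mapping: Datanodes 1-4 sit in DC1 (Rack1, Rack2) and
# Datanodes 5-8 in DC2 (Rack3, Rack4). Host names are placeholders.
RACK_BY_HOST = {
    "datanode1": "/dc1/rack1",
    "datanode2": "/dc1/rack1",
    "datanode3": "/dc1/rack2",
    "datanode4": "/dc1/rack2",
    "datanode5": "/dc2/rack3",
    "datanode6": "/dc2/rack3",
    "datanode7": "/dc2/rack4",
    "datanode8": "/dc2/rack4",
}

# Same depth as the known paths, so every leaf sits at one level of the tree.
DEFAULT_LOCATION = "/default-dc/default-rack"


def resolve(hosts):
    """Return one network location per host, in the order given."""
    return [RACK_BY_HOST.get(host, DEFAULT_LOCATION) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes one or more hosts and reads the locations from stdout.
    print(" ".join(resolve(sys.argv[1:])))
```

Rack awareness alone spreads replicas across racks, not across data centers, which is the gap the following slides address with HVE.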
STRETCH CLUSTER : HDFS ARCHITECTURE – RACK AWARENESS (2)
Cluster Config:
dfs.replication = 4
Rack.awareness
[Figure: HDFS stretch cluster topology across DC1, DC2 and DC3, with block B replicas drawn in two groups (three and two).]
STRETCH CLUSTER : HDFS ARCHITECTURE – HVE (1)
Cluster Config:
dfs.replication = 4
Rack.awareness
HVE (Hadoop Virtualization Extensions)
[Figure: HDFS stretch cluster topology across DC1, DC2 and DC3, with the four replicas of block B spread across the datanodes.]
STRETCH CLUSTER : HDFS ARCHITECTURE – HVE (2)
Cluster Config:
dfs.replication = 4
Rack.awareness
HVE (Hadoop Virtualization Extensions)
[Figure: DC1 and DC2 drawn apart, each holding two replicas of block B. DC1: Namenode, Zookeeper 2-3, Journalnode 2-3, Datanodes 1-4 (Rack1, Rack2). DC2: Namenode, Zookeeper 4-5, Journalnode 4-5, Datanodes 5-8 (Rack3, Rack4). DC3: Zookeeper 1, Journalnode 1. Annotation: Timeout: 10 min.]
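The property this slide illustrates, two replicas of each block in each data center, can be expressed as a small check. The sketch below is not HDFS code: it only tests whether a given placement keeps at least one replica readable after the loss of any single data center. The datanode-to-DC mapping follows the slides; the function names are hypothetical.

```python
from collections import Counter

# Datanodes 1-4 are in DC1 and Datanodes 5-8 in DC2, as drawn on the slides.
DC_BY_DATANODE = {
    f"datanode{i}": ("dc1" if i <= 4 else "dc2") for i in range(1, 9)
}


def replicas_per_dc(replica_nodes):
    """Count how many replicas of one block live in each data center."""
    return Counter(DC_BY_DATANODE[node] for node in replica_nodes)


def survives_any_dc_loss(replica_nodes):
    """True if at least one replica remains after losing any single DC."""
    counts = replicas_per_dc(replica_nodes)
    all_dcs = set(DC_BY_DATANODE.values())
    return all(
        sum(n for dc, n in counts.items() if dc != lost) > 0
        for lost in all_dcs
    )


# Four replicas packed into DC1: the block is lost if DC1 goes down.
print(survives_any_dc_loss(["datanode1", "datanode2", "datanode3", "datanode4"]))
# Two replicas per data center, as on this slide.
print(survives_any_dc_loss(["datanode1", "datanode3", "datanode5", "datanode7"]))
```

A check like this can be run against block location reports during resiliency tests to confirm that the placement policy behaves as intended.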
STRETCH CLUSTER : HDFS ARCHITECTURE – TIMEOUT
Cluster Config:
dfs.replication = 4
Rack.awareness
HVE (Hadoop Virtualization Extensions)
Heartbeat.recheck-interval: 5 min -> 5 s
[Figure: DC1 and DC2 drawn apart, each holding two replicas of block B; DC3 hosts Zookeeper 1 and Journalnode 1. Annotation: Timeout: 10 s.]
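For reference, the interval after which the NameNode marks a datanode as dead is derived from two settings: `dfs.namenode.heartbeat.recheck-interval` (milliseconds, default 300000) and `dfs.heartbeat.interval` (seconds, default 3). The sketch below applies the usual formula, twice the recheck interval plus ten heartbeat intervals. The slide's round figures (10 min, then 10 s) appear to reflect the first term only, so treat the comparison as an approximation rather than a correction.

```python
def dead_node_timeout_seconds(recheck_interval_ms: int,
                              heartbeat_interval_s: int = 3) -> float:
    """Seconds without heartbeat before a datanode is considered dead.

    Formula: 2 * recheck interval + 10 * heartbeat interval.
    """
    total_ms = 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s
    return total_ms / 1000.0


# Default recheck interval of 5 minutes.
print(dead_node_timeout_seconds(300_000))  # 630.0 seconds, about 10.5 minutes
# Recheck interval lowered to 5 seconds, as on the slide.
print(dead_node_timeout_seconds(5_000))    # 40.0 seconds
```

With the default heartbeat interval, the heartbeat term alone contributes 30 seconds, so getting closer to 10 seconds would also require lowering `dfs.heartbeat.interval`.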
STRETCH CLUSTER : HDFS ARCHITECTURE – SPLIT BRAIN (1)
Cluster Config:
dfs.replication = 4
Rack.awareness
HVE (Hadoop Virtualization Extensions)
Heartbeat.recheck-interval: 5 min -> 5 s
[Figure: HDFS stretch cluster topology across DC1, DC2 and DC3, with four replicas of block B spread across the datanodes.]
STRETCH CLUSTER : HDFS ARCHITECTURE – SPLIT BRAIN (2)
Cluster Config:
dfs.replication = 4
Rack.awareness
HVE (Hadoop Virtualization Extensions)
Heartbeat.recheck-interval: 5 min -> 5 s
[Figure: HDFS stretch cluster topology across DC1, DC2 and DC3, with an X marking a failure and the block B replicas drawn in two groups of two.]
STRETCH CLUSTER : HDFS ARCHITECTURE – SPLIT BRAIN (3)
Cluster Config:
dfs.replication = 4
Rack.awareness
HVE (Hadoop Virtualization Extensions)
Heartbeat.recheck-interval: 5 min -> 5 s
[Figure: HDFS stretch cluster topology across DC1, DC2 and DC3, with an X marking a failure and the block B replicas drawn in two groups of two.]
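The role of DC3 in split-brain scenarios comes down to majority arithmetic. ZooKeeper and the JournalNode quorum both need a strict majority of their members, and the slides place two members in DC1, two in DC2 and one in DC3. The sketch below checks which sets of mutually reachable data centers still hold a majority; the function names are hypothetical.

```python
# Quorum members per data center, as drawn on the slides
# (the same counts apply to Zookeeper and to the Journalnodes).
MEMBERS_PER_DC = {"dc1": 2, "dc2": 2, "dc3": 1}


def quorum_size(total_members: int) -> int:
    """Smallest strict majority of the ensemble."""
    return total_members // 2 + 1


def has_quorum(members_per_dc, reachable_dcs) -> bool:
    """True if the members in the reachable data centers form a majority."""
    total = sum(members_per_dc.values())
    alive = sum(n for dc, n in members_per_dc.items() if dc in reachable_dcs)
    return alive >= quorum_size(total)


# Loss of any single data center leaves a majority.
print(has_quorum(MEMBERS_PER_DC, {"dc2", "dc3"}))  # DC1 lost
print(has_quorum(MEMBERS_PER_DC, {"dc1", "dc3"}))  # DC2 lost
print(has_quorum(MEMBERS_PER_DC, {"dc1", "dc2"}))  # DC3 lost

# Link between DC1 and DC2 cut: only the side that still reaches DC3 wins.
print(has_quorum(MEMBERS_PER_DC, {"dc1", "dc3"}))
print(has_quorum(MEMBERS_PER_DC, {"dc2"}))
```

Without the third site, a 3 + 2 split across two data centers would lose its majority whenever the larger site fails, which is why the witness members are placed in DC3.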
YARN : STRETCH CLUSTER CONFIGURATION
STRETCH CLUSTER : YARN ARCHITECTURE (1)
Cluster Config:
Rack.awareness
HVE (Hadoop Virtualization Extensions)
[Figure: YARN stretch cluster topology. Two Resource Managers and Node Managers 1-8 spread over Rack1 to Rack4 in DC1 and DC2; Zookeeper 2-3 in DC1, Zookeeper 4-5 in DC2, Zookeeper 1 in DC3. Four markers labelled A are drawn on the Node Managers.]
STRETCH CLUSTER : YARN ARCHITECTURE (2)
Cluster Config:
Rack.awareness
HVE (Hadoop Virtualization Extensions)
Node Labels
[Figure: YARN stretch cluster topology across DC1, DC2 and DC3, with Node Managers tagged Node.label: dc1 or Node.label: dc2 and markers labelled A drawn in pairs on the Node Managers.]
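As a rough sketch of how such labels are usually declared, the helper below builds the two `yarn rmadmin` commands that register the labels and attach them to Node Managers. It only produces command strings and does not run them. The host names and the host-to-label mapping are placeholders, since the slides do not state which Node Managers carry which label, and node labels must also be enabled on the cluster (`yarn.node-labels.enabled`).

```python
def node_label_commands(labels_by_host):
    """Build rmadmin commands for a hypothetical host-to-label mapping."""
    labels = sorted(set(labels_by_host.values()))
    add_labels = 'yarn rmadmin -addToClusterNodeLabels "{}"'.format(",".join(labels))

    # One "host=label" pair per Node Manager, separated by spaces.
    mapping = " ".join(
        f"{host}={label}" for host, label in sorted(labels_by_host.items())
    )
    assign_labels = f'yarn rmadmin -replaceLabelsOnNode "{mapping}"'
    return [add_labels, assign_labels]


# Placeholder hosts: two Node Managers per data center.
commands = node_label_commands({
    "nm1": "dc1",
    "nm2": "dc1",
    "nm5": "dc2",
    "nm6": "dc2",
})
for command in commands:
    print(command)
```

Queues then need access to the `dc1` and `dc2` labels in the scheduler configuration before applications can request containers on a given data center.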
STRETCH CLUSTER : YARN ARCHITECTURE - DC FAILURE
Cluster Config:
Rack.awareness
HVE (Hadoop Virtualization Extensions)
Node Labels
[Figure: YARN stretch cluster topology across DC1, DC2 and DC3 during a data center failure, with markers labelled A drawn in pairs on the Node Managers.]
STRETCH CLUSTER : YARN ARCHITECTURE - SPLIT BRAIN (1)
Cluster Config:
Rack.awareness
HVE (Hadoop Virtualization Extensions)
Node Labels
[Figure: YARN stretch cluster topology across DC1, DC2 and DC3, with markers labelled A drawn in pairs on the Node Managers.]
STRETCH CLUSTER : YARN ARCHITECTURE – SPLIT BRAIN (2)
Cluster Config:
Rack.awareness
HVE (Hadoop Virtualization Extensions)
Node Labels
[Figure: YARN stretch cluster topology across DC1, DC2 and DC3, with an X marking a failure and markers labelled A drawn in pairs on the Node Managers.]
STRETCH CLUSTER : YARN ARCHITECTURE – SPLIT BRAIN (3)
Cluster Config:
Rack.awareness
HVE (Hadoop Virtualization Extensions)
Node Labels
[Figure: YARN stretch cluster topology across DC1, DC2 and DC3, with an X marking a failure and markers labelled A drawn in pairs on the Node Managers.]
CONCLUSION
• No single architecture can satisfy every requirement at once:
o RPO = 0
o RTO = 0
o Performance
o Consistency
• You can combine several architectures in the same cluster (hybrid architectures)
• Monitoring tools are required to track the replication process and to give global visibility into the cluster status
• Resiliency and performance tests are required to validate your DRP architecture
THANK YOU

KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 

DRP (Stretch Cluster) for HDP - Future of Data : Paris

  • 2. OUR SINGULARITY? WE ARE PLURAL! Mohamed Mehdi BEN AISSA, Big Data Practice Manager – Finaxys; Big Data ITO - CACIB; linkedin.com/in/mehdi-ben-aissa/; @Ben_Aissa_mehdi
  • 3. HOW TO DESIGN A DISASTER RECOVERY PLAN FOR HDP CLUSTERS? 2017-09
  • 4. PLAN: I. INTRODUCTION; II. BIG DATA DRP ARCHITECTURES; III. STRETCH CLUSTER ARCHITECTURE; IV. HDP ARCHITECTURE; V. HDFS : STRETCH CLUSTER CONFIGURATION; VI. YARN : STRETCH CLUSTER CONFIGURATION; VII. CONCLUSION
  • 6. INTRODUCTION • SLA (Service-Level Agreement): the agreed aspects of the service (quality, availability, responsibilities), including: o RTO (Recovery Time Objective): the targeted duration of time and service level within which a business process must be restored after a disaster o RPO (Recovery Point Objective): the maximum targeted period in which data might be lost • Goals: RTO = 0 (24/7 availability), RPO = 0, Cost = 0, Consistency, Performance
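
The RTO and RPO definitions above can be made concrete with a small calculation. The following is a minimal sketch, not part of the original slides: the function names, the timestamps, and the 15-minute and 1-hour targets are all hypothetical and only illustrate how an incident could be measured against the two objectives.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_replicated_write: datetime, disaster: datetime) -> timedelta:
    """Data-loss window: writes made after the last replicated write are lost."""
    return max(disaster - last_replicated_write, timedelta(0))


def achieved_rto(disaster: datetime, service_restored: datetime) -> timedelta:
    """Downtime: how long the business process stayed unavailable."""
    return max(service_restored - disaster, timedelta(0))


def meets_objectives(rpo: timedelta, rto: timedelta,
                     rpo_target: timedelta, rto_target: timedelta) -> bool:
    """True when both measured values are within their targets."""
    return rpo <= rpo_target and rto <= rto_target


# Hypothetical incident timeline (assumed values, for illustration only).
disaster = datetime(2017, 9, 1, 12, 0, 0)
rpo = achieved_rpo(datetime(2017, 9, 1, 11, 45, 0), disaster)
rto = achieved_rto(disaster, datetime(2017, 9, 1, 12, 30, 0))
ok = meets_objectives(rpo, rto, timedelta(minutes=15), timedelta(hours=1))
print(rpo, rto, ok)
```

The goal "RTO = 0, RPO = 0" corresponds to both functions returning a zero duration, which is why the slides treat it as an ideal rather than a routine result.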
  • 7. BIG DATA DRP ARCHITECTURES
  • 8. BIG DATA DRP ARCHITECTURES : MULTI-CLUSTER ARCHITECTURE VS STRETCH CLUSTER [Diagram: (1) Multi-cluster Architecture, (2) Stretch Cluster. Labels: Cluster 1, Cluster 2, Data Center 1, Data Center 2, Data Center 3, Data Replication]
  • 9. STRETCH CLUSTER : ARCHITECTURE [Diagram: DC1, DC2, DC3. Labels: Control Nodes (x2), Gateway Node (x2), Master Nodes, Worker Nodes, Witness Nodes]
  • 11. HORTONWORKS DATA PLATFORM : ARCHITECTURE (1)
  • 12. HORTONWORKS DATA PLATFORM : ARCHITECTURE (2) [Diagram labels: Data Collect & Storage, Data Processing, Serving Layer, Apache Kafka, NoSQL Database, Indexed Data Search]
  • 13. HORTONWORKS DATA PLATFORM : ARCHITECTURE (3) [Diagram labels: HDFS, Kafka, YARN, HBase, Solr, Flume, Spark, Tez, MapReduce, Hive, Oozie, Sqoop]
  • 14. HDFS : STRETCH CLUSTER CONFIGURATION
  • 15. STRETCH CLUSTER : HDFS ARCHITECTURE [Diagram: DC1 hosts Zookeeper 2-3, Journalnode 2-3, a Namenode, and Datanodes 1-4 on Rack1 and Rack2; DC2 hosts Zookeeper 4-5, Journalnode 4-5, a Namenode, and Datanodes 5-8 on Rack3 and Rack4; DC3 hosts Zookeeper 1 and Journalnode 1]
  • 16. STRETCH CLUSTER : HDFS ARCHITECTURE - DEFAULT CONFIGURATION [Diagram: same topology as slide 15, with four block markers (B)] Cluster Config: dfs.replication = 4
  • 17. STRETCH CLUSTER : HDFS ARCHITECTURE – RACK AWARENESS (1) [Diagram: same topology as slide 15, with four block markers (B)] Cluster Config: dfs.replication = 4; Rack.awareness
  • 18. STRETCH CLUSTER : HDFS ARCHITECTURE – RACK AWARENESS (2) [Diagram: same topology as slide 15, with five block markers (B)] Cluster Config: dfs.replication = 4; Rack.awareness
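
Rack awareness in Hadoop is usually driven by a topology script, referenced by the `net.topology.script.file.name` property, which receives host names or IP addresses as arguments and prints one rack path per argument. The sketch below shows what such a script could look like for the layout on these slides. The host names, the two-datanodes-per-rack split, and the `/dcN/rackN` path scheme are assumptions made for illustration; they are not given in the deck.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop topology script: maps hosts to /datacenter/rack paths."""
import sys

# Assumed host names and rack assignment (two datanodes per rack).
RACK_BY_HOST = {
    "datanode1": "/dc1/rack1",
    "datanode2": "/dc1/rack1",
    "datanode3": "/dc1/rack2",
    "datanode4": "/dc1/rack2",
    "datanode5": "/dc2/rack3",
    "datanode6": "/dc2/rack3",
    "datanode7": "/dc2/rack4",
    "datanode8": "/dc2/rack4",
}

# Unknown hosts fall back to a placeholder at the same depth as the other paths.
DEFAULT_RACK = "/default-dc/default-rack"


def resolve(hosts):
    """Return one rack path per requested host, in the same order."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes one or more hosts on the command line.
    print("\n".join(resolve(sys.argv[1:])))
```

Encoding the data center as the first path component keeps the site visible in every rack path, which is what later checks on replica placement can rely on.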
  • 19. STRETCH CLUSTER : HDFS ARCHITECTURE – HVE (1) [Diagram: same topology as slide 15, with four block markers (B)] Cluster Config: dfs.replication = 4; Rack.awareness; HVE (Hadoop Virtualization Extensions)
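
With `dfs.replication = 4`, what matters for disaster recovery is whether the replicas of a block are spread over both data centers. The sketch below is not the HDFS placement policy and not HVE itself; it is a hypothetical checker, assuming rack paths of the form `/dcN/rackN`, that tests whether a given set of replica locations would survive the loss of one data center.

```python
from collections import Counter


def replicas_per_dc(replica_paths):
    """Count replicas per data center, taking the first path component as the DC."""
    return Counter(path.strip("/").split("/")[0] for path in replica_paths)


def survives_any_dc_loss(replica_paths, data_dcs=("dc1", "dc2")):
    """True when at least two data centers each hold a replica of the block."""
    counts = replicas_per_dc(replica_paths)
    return sum(1 for dc in data_dcs if counts[dc] > 0) >= 2


# Hypothetical placement spread over four racks in two data centers.
spread = ["/dc1/rack1", "/dc1/rack2", "/dc2/rack3", "/dc2/rack4"]

# Hypothetical placement on two different racks of the same data center:
# rack-diverse, yet exposed to a single site failure.
single_site = ["/dc1/rack1", "/dc1/rack2", "/dc1/rack2", "/dc1/rack1"]

print(survives_any_dc_loss(spread), survives_any_dc_loss(single_site))
```

The second placement shows why being spread over several racks is not, on its own, evidence of data center diversity.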
  • 20. STRETCH CLUSTER : HDFS ARCHITECTURE – HVE (2) [Diagram: same topology as slide 15, with four block markers (B) in two pairs] Cluster Config: dfs.replication = 4; Rack.awareness; HVE (Hadoop Virtualization Extensions). Timeout: 10 min
  • 21. STRETCH CLUSTER : HDFS ARCHITECTURE – TIMEOUT [Diagram: same topology as slide 15, with four block markers (B) in two pairs] Cluster Config: dfs.replication = 4; Rack.awareness; HVE (Hadoop Virtualization Extensions); Heartbeat.recheck-interval: 5 min -> 5 s. Timeout: 10 s
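
The effect of the recheck interval on the timeout can be worked out numerically. In HDFS, a datanode is considered dead after 2 × `dfs.namenode.heartbeat.recheck-interval` + 10 × `dfs.heartbeat.interval`, with defaults of 300000 ms and 3 s. The sketch below applies that formula; the function name is hypothetical. The 10 min and 10 s figures on the slides match the first term of the formula; the heartbeat term adds 30 s when the default 3 s interval is kept.

```python
def datanode_dead_timeout_s(recheck_interval_ms: int, heartbeat_interval_s: int = 3) -> float:
    """Seconds before the namenode marks a silent datanode as dead.

    recheck_interval_ms  -> dfs.namenode.heartbeat.recheck-interval (milliseconds)
    heartbeat_interval_s -> dfs.heartbeat.interval (seconds)
    """
    return 2 * recheck_interval_ms / 1000 + 10 * heartbeat_interval_s


default_timeout = datanode_dead_timeout_s(300_000)  # 5 min recheck interval
tuned_timeout = datanode_dead_timeout_s(5_000)      # 5 s recheck interval
print(default_timeout, tuned_timeout)
```

Under this formula, lowering only the recheck interval leaves a floor of ten heartbeat intervals, so reaching a timeout close to 10 s would also require a shorter heartbeat interval.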
  • 22. STRETCH CLUSTER : HDFS ARCHITECTURE – SPLIT BRAIN (1) [Diagram: same topology as slide 15, with four block markers (B)] Cluster Config: dfs.replication = 4; Rack.awareness; HVE (Hadoop Virtualization Extensions); Heartbeat.recheck-interval: 5 min -> 5 s
  • 23. STRETCH CLUSTER : HDFS ARCHITECTURE – SPLIT BRAIN (2) [Diagram: same topology as slide 15, with four block markers (B) and an X marker] Cluster Config: dfs.replication = 4; Rack.awareness; HVE (Hadoop Virtualization Extensions); Heartbeat.recheck-interval: 5 min -> 5 s
  • 24. STRETCH CLUSTER : HDFS ARCHITECTURE – SPLIT BRAIN (3) [Diagram: same topology as slide 15, with four block markers (B) and an X marker] Cluster Config: dfs.replication = 4; Rack.awareness; HVE (Hadoop Virtualization Extensions); Heartbeat.recheck-interval: 5 min -> 5 s
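
The split-brain slides rest on majority quorum: ZooKeeper and the JournalNodes each need more than half of their members to stay reachable. With five members placed two in DC1, two in DC2, and one in the witness site DC3, the arithmetic can be sketched as below. The dictionary and function names are hypothetical; the 2/2/1 placement is read from the diagrams.

```python
# Members of the ensemble per site, as drawn on the slides (2 + 2 + 1 = 5).
MEMBERS_BY_DC = {"dc1": 2, "dc2": 2, "dc3": 1}


def has_quorum(reachable_dcs, members=MEMBERS_BY_DC):
    """True when the reachable sites hold a strict majority of the ensemble."""
    total = sum(members.values())
    votes = sum(members[dc] for dc in reachable_dcs)
    return votes > total // 2


# Hypothetical partition: DC1 is cut off from DC2 and DC3.
isolated_side = has_quorum({"dc1"})
majority_side = has_quorum({"dc2", "dc3"})
print(isolated_side, majority_side)
```

Only one side of any partition can hold three of the five votes, so at most one namenode can remain active. This is the role of the single witness member in DC3.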
  • 25. YARN : STRETCH CLUSTER CONFIGURATION
  • 26. STRETCH CLUSTER : YARN ARCHITECTURE (1) [Diagram: Zookeeper 1-5 across DC1, DC2, DC3; two Resource Managers; Node Managers 1-8 on Rack1-Rack4; four A markers] Cluster Config: Rack.awareness; HVE (Hadoop Virtualization Extensions)
  • 27. STRETCH CLUSTER : YARN ARCHITECTURE (2) [Diagram: same topology as slide 26, with eight A markers] Cluster Config: Rack.awareness; HVE (Hadoop Virtualization Extensions); Node Labels (Node.label: dc1, Node.label: dc2)
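
Node labels let a scheduler treat each data center as a named partition of the cluster. The sketch below is a plain illustration of that idea, not YARN code: the host names, the assignment of Node Managers 1-4 to `dc1` and 5-8 to `dc2`, and the round-robin spreading rule are all assumptions.

```python
# Assumed labelling: Node Managers 1-4 sit in DC1, 5-8 in DC2.
NODE_LABELS = {
    f"nodemanager{i}": ("dc1" if i <= 4 else "dc2") for i in range(1, 9)
}


def nodes_with_label(label, labels=NODE_LABELS):
    """Node managers eligible for a request that targets the given label."""
    return sorted(node for node, node_label in labels.items() if node_label == label)


def spread_across_labels(instances, labels=("dc1", "dc2")):
    """Round-robin a number of application instances over the labels."""
    plan = {label: 0 for label in labels}
    for i in range(instances):
        plan[labels[i % len(labels)]] += 1
    return plan


print(nodes_with_label("dc2"))
print(spread_across_labels(8))
```

Spreading instances evenly over both labels means that losing one data center removes only half of the running instances, leaving the remainder to be rescheduled on the surviving label.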
  • 28. STRETCH CLUSTER : YARN ARCHITECTURE - DC FAILURE [Diagram: same topology as slide 26, with twelve A markers] Cluster Config: Rack.awareness; HVE (Hadoop Virtualization Extensions); Node Labels
  • 29. STRETCH CLUSTER : YARN ARCHITECTURE - SPLIT BRAIN (1) [Diagram: same topology as slide 26, with eight A markers] Cluster Config: Rack.awareness; HVE (Hadoop Virtualization Extensions); Node Labels
  • 30. STRETCH CLUSTER : YARN ARCHITECTURE – SPLIT BRAIN (2) [Diagram: same topology as slide 26, with twelve A markers and an X marker] Cluster Config: Rack.awareness; HVE (Hadoop Virtualization Extensions); Node Labels
  • 31. STRETCH CLUSTER : YARN ARCHITECTURE – SPLIT BRAIN (2) [Diagram: same topology as slide 26, with eight A markers and an X marker] Cluster Config: Rack.awareness; HVE (Hadoop Virtualization Extensions); Node Labels
  • 33. CONCLUSION • No single architecture can meet every need at once: o RPO = 0 o RTO = 0 o Performance o Consistency • Several architectures can be combined in the same cluster: Hybrid Architectures • Monitoring tools are required to track the replication process and give global visibility into cluster status • Resiliency and performance tests are required to validate the DRP architecture