Non-Stop Hadoop: Adding R-A-S to your 
Hadoop clusters using a Globally Consistent 
HDFS Namespace 
Presented by Chris Almond @ Phoenix Data Conference 
October 2014
REALIZING THE POSSIBILITIES OF BIG DATA 2 
WWW.WANDISCO.COM 
For Today 
Who am I and what is this about? 
At Work: 
chris.almond@wandisco.com 
On line: 
www.linkedin.com/in/chrisalmond/ 
www.twitter.com/calmo 
Session Description: 
Hadoop has quickly evolved into the system of
choice for storing and processing Big Data,
and is now widely used to support mission-critical
applications that operate within ‘data lake’ style
infrastructures. A critical requirement of such
applications is continuous operation even in the
event of various system failures. This requirement
has driven adoption of multi-data center Hadoop
architectures, a.k.a. geo-distributed or global
Hadoop. In this session we will provide a brief
introduction to WANdisco, then dig into how our
Non-Stop Hadoop solution addresses real world use
cases, and also show a live demonstration of
Non-Stop NameNode operation across two
WAN-connected Hadoop clusters.
WANdisco Background 
• WANdisco: Wide Area Network Distributed Computing 
– Enterprise ready, high availability software solutions that enable globally distributed 
organizations to meet today’s data challenges of secure storage, scalability and availability 
• Leader in tools for software engineers – Subversion 
– Apache Software Foundation sponsor 
• Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND) 
• US patented active-active replication technology granted, November 2012 
• Global locations 
– San Ramon (CA) 
– Chengdu (China) 
– Tokyo (Japan) 
– Boston (MA) 
– Sheffield (UK) 
– Belfast (UK)
Customers
Non-Stop Hadoop 
Non-Intrusive Plugin 
Provides Continuous Availability 
In the LAN / Across the WAN 
Active/Active
Key Problem For Multi Cluster Hadoop 
LAN / WAN 
Enterprise Ready Hadoop 
Characteristics of Mission Critical Applications 
• Require Continuous Availability 
– SLAs, Regulatory Compliance
• Require HDFS to be Deployed Globally 
– Share Data Between Data Centers 
– Data is Consistent and Not Eventual 
• Ease Administrative Burden 
– Reduce Operational Complexity 
– Simplify Disaster Recovery 
– Lower RTO/RPO 
• Allow Maximum Utilization of Resources
– Within the Data Center 
– Across Data Centers
The difficulty realizing the data lake…
… is that data spans the entire world
Breaking Away from Active/Passive 
What’s in a NameNode 
Single Standby 
• Inefficient utilization of resources
– Journal Nodes 
– ZooKeeper Nodes 
– Standby Node 
• Performance Bottleneck 
• Still tied to the beeper 
• Limited to LAN scope 
Active / Active 
• All resources utilized 
– Only NameNode configuration 
– Scale as the cluster grows 
– All NameNodes active 
• Load balancing 
• Set resiliency (# of active NN) 
• Global Consistency
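A small Python sketch of what "set resiliency" can mean in practice, assuming the majority-quorum rule this deck describes under "How DConE Works". The function names are hypothetical and the arithmetic is the generic majority rule, not a documented WANdisco sizing formula.

```python
def majority(n):
    """Smallest number of NameNodes that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many of n NameNodes can fail while a majority remains."""
    return n - majority(n)


# Illustrative cluster sizes only.
for n in (3, 4, 5, 7):
    print(f"{n} NameNodes: majority={majority(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Under this rule an even node count adds no extra failure tolerance over the next lower odd count, which is one reason odd sizes are a common choice for quorum systems.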
Breaking Away from Active/Passive 
What’s in a Data Center 
Standby Datacenter 
• Idle Resource 
– Single Data Center Ingest 
– Disaster Recovery Only 
• One way synchronization 
– DistCp 
• Error Prone 
– Clusters can diverge over time 
• Difficult to scale > 2 Data Centers 
– Complexity of sharing data increases
Active / Active 
• DR Resource Available 
– Ingest at all Data Centers 
– Run Jobs in both Data Centers 
• Replication is Multi-Directional 
– active/active 
• Absolute Consistency 
– Single HDFS spans locations 
• ‘N’ Data Center support 
– Global HDFS allows appropriate data to be shared
One Cluster Approach
• Example Applications
– HBASE
– RT Query
– Map Reduce
• Poor Resource Management
– Data Locality Issues
– Network Use
– Complex
Multiple Clusters
Creating Multiple Clusters 
• Example Applications
– HBASE
– RT Query
– Map Reduce
• Need to share data between clusters
– DistCp / Stale Data
– Inefficient use of storage and/or network
– Some clusters may not be available
Multiple Clusters
Cluster Zones 
Zoning for Optimal Efficiency 
1 HDFS, 100% Consistency
Absolute Consistency
Maximum Resource Use
Lower Recovery Time/Point
Multi Datacenter Hadoop 
Disaster Recovery 
WAN Replication
Replicate Only What You Want
Better Utilization of Power/Cooling
Lower TCO
LAN Speed Performance
Technical Overview 
Hadoop Powered by WANdisco
Multi Data Center Hadoop Today 
What's wrong with the status quo 
Periodic Synchronization 
DistCp 
Parallel Data Ingest 
Load Balancer, Streaming
Multi Data Center Hadoop Today 
Hacks currently in use 
Periodic Synchronization 
DistCp 
• Runs as a MapReduce job
• DR Data Center is read only
• Over time, Hadoop clusters become inconsistent
• Manual and labor intensive process to reconcile differences
• Inefficient use of the network
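To make the divergence point concrete, here is a minimal Python sketch. It is a toy model, not Hadoop or DistCp code: each namespace is a plain dict of path to checksum, and the paths, checksums and function names are all hypothetical. It shows what a reconciliation pass has to find once changes land on the primary between two periodic copies.

```python
def periodic_copy(primary):
    """Model a one-way bulk copy: DR becomes a snapshot of the primary."""
    return dict(primary)


def namespace_diff(primary, dr):
    """Return the paths that differ between two namespace snapshots."""
    return {
        "missing_on_dr": sorted(p for p in primary if p not in dr),
        "extra_on_dr": sorted(p for p in dr if p not in primary),
        "changed": sorted(p for p in primary
                          if p in dr and primary[p] != dr[p]),
    }


# Hypothetical paths and checksums, for illustration only.
primary = {"/data/a": "c1", "/data/b": "c2"}
dr = periodic_copy(primary)      # the clusters agree right after the copy

primary["/data/b"] = "c3"        # file rewritten on the primary
primary["/data/c"] = "c4"        # new file ingested on the primary
del primary["/data/a"]           # file deleted on the primary

drift = namespace_diff(primary, dr)
print(drift)
```

Until the next copy runs, every entry in `drift` is stale or missing data on the DR side, and the length of the copy interval bounds how far the two clusters can drift apart.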
Multi Data Center Hadoop Today 
Hacks currently in use 
Parallel Data Ingest 
Load Balancer, Flume 
• Hiccups in either Hadoop cluster cause the two file systems to diverge
• Potential to run out of buffer when WAN is down
• Requires constant attention and sys-admin hours to keep running
• Data created on the cluster is not replicated
• Streaming technologies (like Flume) used for data redirection only work for streaming data
DConE 
Distributed Coordination Engine 
• WANdisco’s patented WAN-capable Paxos implementation
– Mathematically proven 
– Provides distributed co-ordination of File system metadata 
• Active/Active (All locations) 
• Create, Modify, Delete 
• Shared nothing (No Leader) 
• No restrictions on distance between datacenters 
– US Patent granted for time independent implementation of Paxos 
• Not based on SAN block device synchronization such as EMC SRDF 
– SAN block replication has distance limits resulting from the inability of file systems such as NTFS and ext4 to tolerate long RTTs to block storage
– Possible distribution of corrupted blocks
PAXOS 
Paxos is a family of protocols for solving consensus in a network of unreliable processors.
Consensus is the process of agreeing on one result among a group of participants.
This problem becomes difficult when the participants or their communication medium may experience failures.
PAXOS 
Leslie Lamport: Any node that proposes after a decision has been reached must communicate with a node in the majority. The protocol guarantees that it will learn the previously agreed upon value from that majority.
http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html
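The property in the quote can be sketched with a textbook single-decree Paxos round in Python. This is a minimal in-memory illustration of classic Paxos, not DConE and not WANdisco code; the class, the function, the proposal numbers and the example values are all assumptions made for the sketch.

```python
class Acceptor:
    """One Paxos acceptor: its promise and its last accepted proposal."""

    def __init__(self):
        self.promised = -1          # highest proposal number promised
        self.accepted_n = -1        # number of the last accepted proposal
        self.accepted_value = None  # value of the last accepted proposal

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, -1, None

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(reachable, cluster_size, n, value):
    """Run one proposal round; return the chosen value, or None."""
    majority = cluster_size // 2 + 1

    # Phase 1: collect promises from the acceptors we can reach.
    replies = [a.prepare(n) for a in reachable]
    granted = [(an, av) for ok, an, av in replies if ok]
    if len(granted) < majority:
        return None

    # If any acceptor already accepted a value, adopt the one with the
    # highest proposal number instead of our own value.
    prior_n, prior_value = max(granted, key=lambda p: p[0])
    if prior_n >= 0:
        value = prior_value

    # Phase 2: ask the same acceptors to accept the value.
    accepted = sum(1 for a in reachable if a.accept(n, value))
    return value if accepted >= majority else None


nodes = [Acceptor() for _ in range(3)]
first = propose(nodes[:2], 3, n=1, value="mkdir /a")  # reaches 2 of 3
late = propose(nodes[1:], 3, n=2, value="mkdir /b")   # overlaps on nodes[1]
stale = propose(nodes, 3, n=1, value="mkdir /c")      # reuses an old number
print(first, late, stale)
```

The late proposer asked for a different value, but because any two majorities of three nodes share at least one member, it learns the earlier decision from `nodes[1]` and re-proposes that value instead of its own.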
PAXOS
http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-paxos.pdf
http://css.csail.mit.edu/6.824/2014/papers/paxos-simple.pdf
“Contrary to conventional wisdom, we were able to use Paxos to build a highly available system that provides reasonable latencies for interactive applications while synchronously replicating writes across geographically distributed datacenters.”
http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf …
How DConE Works 
WANdisco Active/Active Replication 
• Majority Quorum
– A fixed number of participants
– The majority must agree for change
• Failure
– Failed nodes are unavailable
– Normal operation continues on nodes with quorum
• Recovery / Self Healing
– Nodes that rejoin stay in safe mode until they are caught up
• Disaster Recovery
– A complete loss can be brought back from another replica
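The four behaviours above can be sketched as a toy replicated log in Python. This is an illustrative model under simplifying assumptions (in-memory nodes, no network, agreement reduced to a head count), not how DConE is implemented; all class and method names are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.safe_mode = False
        self.log = []               # agreed transaction ids, in order


class Cluster:
    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}

    def active(self):
        """Nodes that are up and fully caught up."""
        return [n for n in self.nodes.values() if n.up and not n.safe_mode]

    def has_quorum(self):
        return len(self.active()) > len(self.nodes) // 2

    def propose(self, tx_id):
        """Apply tx_id on all active nodes if a majority is available."""
        if not self.has_quorum():
            return False
        for node in self.active():
            node.log.append(tx_id)
        return True

    def fail(self, name, lose_data=False):
        node = self.nodes[name]
        node.up = False
        if lose_data:               # models a complete loss of the replica
            node.log = []

    def rejoin(self, name):
        """Return in safe mode, replay missed transactions, then resume."""
        node = self.nodes[name]
        node.up = True
        node.safe_mode = True
        peers = self.active()
        if peers:
            source = max(peers, key=lambda p: len(p.log))
            node.log.extend(source.log[len(node.log):])
        node.safe_mode = False


cluster = Cluster(["A", "B", "C"])
cluster.propose(168)
cluster.propose(169)
cluster.fail("C", lose_data=True)     # C is lost entirely
ok_with_two = cluster.propose(170)    # A and B still form a majority
cluster.fail("B")
ok_with_one = cluster.propose(171)    # A alone has no quorum
cluster.rejoin("B")
cluster.propose(171)
cluster.rejoin("C")                   # C is rebuilt from another replica
print({n.name: n.log for n in cluster.nodes.values()})
```

Writes are refused rather than applied on a minority, which is what keeps the logs identical once the failed nodes catch up.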
[Diagram: nodes A, B and C each hold an ordered transaction log (TX id 168 to 173); Proposal and Agree messages for transactions 170 to 173 pass between the nodes]
Architecture of a Non-Stop Hadoop
Use Case: Disaster Recovery 
Use Cases 
• Data is as current as possible (no periodic syncs)
• Doesn’t require monitoring and consistency checking
• Virtually zero downtime to recover from regional data center failure
• Regulatory compliance
Use Case: Multi Data-Center
Ingest and multi-tenant workloads
• Ingest and analyze anywhere
• Analyze Everywhere
– Fraud Detection
– Equity Trading Information
– New Business
– Etc…
• Backup Datacenter(s) can be used for work
– No idle resource
Use Case: Zones 
• Maximize Resource Utilization
– No idle standby
• Isolate Dev and Test Clusters
– Share data not resource
• Carve off hardware for a specific group
– Prevents a bad map/reduce job from bringing down the cluster
• Guarantee Consistency and availability of data
– Data is instantly available
Use Case: Heterogeneous Hardware (Zones) 
In memory analytics 
• Mixed Hardware Profiles
– Memory, Disk, CPU
– Isolate memory-hungry processing (Storm/Spark) from regular jobs
• Share data, not processing
– Isolate lower priority (dev/test) work
Data Reservoir
Use Cases
• Data Marts
– Restrict access to relevant data
– Create Quick Clusters
• Feeder Sites (Data Tributaries)
– Ingest Only
[Diagram labels: Data Ocean, Feeder Site, Accounting Mart, Banking Mart]
Regulatory Compliance
• Basel III
– Consistency of Data
• Data Privacy Directive
– Data Sovereignty
• Data doesn’t leave country of origin
[Diagram labels: Compliance, Regulation, Guidelines]
5 Reasons your Hadoop Deployment Needs WANdisco
Non-Stop Hadoop Demonstration

More Related Content

What's hot

Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
WANdisco Plc
 
HDFS Tiered Storage
HDFS Tiered StorageHDFS Tiered Storage
HDFS Tiered Storage
DataWorks Summit/Hadoop Summit
 
Introduction to GlusterFS Webinar - September 2011
Introduction to GlusterFS Webinar - September 2011Introduction to GlusterFS Webinar - September 2011
Introduction to GlusterFS Webinar - September 2011
GlusterFS
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS Federation
Adam Kawa
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Esther Kundin
 
Disaster Recovery in the Hadoop Ecosystem: Preparing for the Improbable
Disaster Recovery in the Hadoop Ecosystem: Preparing for the ImprobableDisaster Recovery in the Hadoop Ecosystem: Preparing for the Improbable
Disaster Recovery in the Hadoop Ecosystem: Preparing for the Improbable
Stefan Kupstaitis-Dunkler
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
DataWorks Summit
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
sudhakara st
 
Geo-based content processing using hbase
Geo-based content processing using hbaseGeo-based content processing using hbase
Geo-based content processing using hbase
Ravi Veeramachaneni
 
Hadoop Fundamentals I
Hadoop Fundamentals IHadoop Fundamentals I
Hadoop Fundamentals I
Romeo Kienzler
 
Cross-DC Fault-Tolerant ViewFileSystem @ Twitter
Cross-DC Fault-Tolerant ViewFileSystem @ TwitterCross-DC Fault-Tolerant ViewFileSystem @ Twitter
Cross-DC Fault-Tolerant ViewFileSystem @ Twitter
DataWorks Summit/Hadoop Summit
 
Hadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesHadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologies
Kelly Technologies
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
musrath mohammad
 
Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2
hdhappy001
 
Apache hadoop basics
Apache hadoop basicsApache hadoop basics
Apache hadoop basicssaili mane
 
Design Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational DatabasesDesign Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databases
guestdfd1ec
 
Tutorial Haddop 2.3
Tutorial Haddop 2.3Tutorial Haddop 2.3
Tutorial Haddop 2.3
Atanu Chatterjee
 
Hadoop
Hadoop Hadoop
Hadoop
ABHIJEET RAJ
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoopMohit Tare
 

What's hot (20)

Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)
 
HDFS Tiered Storage
HDFS Tiered StorageHDFS Tiered Storage
HDFS Tiered Storage
 
Introduction to GlusterFS Webinar - September 2011
Introduction to GlusterFS Webinar - September 2011Introduction to GlusterFS Webinar - September 2011
Introduction to GlusterFS Webinar - September 2011
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS Federation
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Disaster Recovery in the Hadoop Ecosystem: Preparing for the Improbable
Disaster Recovery in the Hadoop Ecosystem: Preparing for the ImprobableDisaster Recovery in the Hadoop Ecosystem: Preparing for the Improbable
Disaster Recovery in the Hadoop Ecosystem: Preparing for the Improbable
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
 
Geo-based content processing using hbase
Geo-based content processing using hbaseGeo-based content processing using hbase
Geo-based content processing using hbase
 
Hadoop Fundamentals I
Hadoop Fundamentals IHadoop Fundamentals I
Hadoop Fundamentals I
 
Cross-DC Fault-Tolerant ViewFileSystem @ Twitter
Cross-DC Fault-Tolerant ViewFileSystem @ TwitterCross-DC Fault-Tolerant ViewFileSystem @ Twitter
Cross-DC Fault-Tolerant ViewFileSystem @ Twitter
 
Hadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesHadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologies
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2
 
Apache hadoop basics
Apache hadoop basicsApache hadoop basics
Apache hadoop basics
 
Design Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational DatabasesDesign Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databases
 
Tutorial Haddop 2.3
Tutorial Haddop 2.3Tutorial Haddop 2.3
Tutorial Haddop 2.3
 
Hadoop
Hadoop Hadoop
Hadoop
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 

Viewers also liked

Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
Hortonworks
 
Large scale ETL with Hadoop
Large scale ETL with HadoopLarge scale ETL with Hadoop
Large scale ETL with Hadoop
OReillyStrata
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks
Hortonworks
 
AWS re:Invent 2016: Disaster Recovery and Business Continuity for Systemicall...
AWS re:Invent 2016: Disaster Recovery and Business Continuity for Systemicall...AWS re:Invent 2016: Disaster Recovery and Business Continuity for Systemicall...
AWS re:Invent 2016: Disaster Recovery and Business Continuity for Systemicall...
Amazon Web Services
 
Business Continuity And Disaster Recovery Notes
Business Continuity And Disaster Recovery NotesBusiness Continuity And Disaster Recovery Notes
Business Continuity And Disaster Recovery NotesAlan McSweeney
 
android technology presentation
android technology presentationandroid technology presentation
android technology presentation
Nishul Tomar
 
Disaster Recovery Presentation
Disaster Recovery PresentationDisaster Recovery Presentation
Disaster Recovery PresentationTimSchaefer
 
An Introduction to Disaster Recovery Planning
An Introduction to Disaster Recovery PlanningAn Introduction to Disaster Recovery Planning
An Introduction to Disaster Recovery PlanningNEBizRecovery
 
The A to Z Guide to Business Continuity and Disaster Recovery
The A to Z Guide to Business Continuity and Disaster RecoveryThe A to Z Guide to Business Continuity and Disaster Recovery
The A to Z Guide to Business Continuity and Disaster Recovery
Sirius
 
Disaster Recovery Plan for IT
Disaster Recovery Plan for ITDisaster Recovery Plan for IT
Disaster Recovery Plan for IThhuihhui
 

Viewers also liked (10)

Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Large scale ETL with Hadoop
Large scale ETL with HadoopLarge scale ETL with Hadoop
Large scale ETL with Hadoop
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks
 
AWS re:Invent 2016: Disaster Recovery and Business Continuity for Systemicall...
AWS re:Invent 2016: Disaster Recovery and Business Continuity for Systemicall...AWS re:Invent 2016: Disaster Recovery and Business Continuity for Systemicall...
AWS re:Invent 2016: Disaster Recovery and Business Continuity for Systemicall...
 
Business Continuity And Disaster Recovery Notes
Business Continuity And Disaster Recovery NotesBusiness Continuity And Disaster Recovery Notes
Business Continuity And Disaster Recovery Notes
 
android technology presentation
android technology presentationandroid technology presentation
android technology presentation
 
Disaster Recovery Presentation
Disaster Recovery PresentationDisaster Recovery Presentation
Disaster Recovery Presentation
 
An Introduction to Disaster Recovery Planning
An Introduction to Disaster Recovery PlanningAn Introduction to Disaster Recovery Planning
An Introduction to Disaster Recovery Planning
 
The A to Z Guide to Business Continuity and Disaster Recovery
The A to Z Guide to Business Continuity and Disaster RecoveryThe A to Z Guide to Business Continuity and Disaster Recovery
The A to Z Guide to Business Continuity and Disaster Recovery
 
Disaster Recovery Plan for IT
Disaster Recovery Plan for ITDisaster Recovery Plan for IT
Disaster Recovery Plan for IT
 

Similar to WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
Alluxio, Inc.
 
Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5
Chris Nauroth
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
tcloudcomputing-tw
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
Chris Nauroth
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
inside-BigData.com
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
Ramesh Pabba - seeking new projects
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
Ramesh Pabba - seeking new projects
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Community
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
DataWorks Summit
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
shrey mehrotra
 
New Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesNew Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference Architectures
Kamesh Pemmaraju
 
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Red_Hat_Storage
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
DataWorks Summit
 
Semantic web meetup 14.november 2013
Semantic web meetup 14.november 2013Semantic web meetup 14.november 2013
Semantic web meetup 14.november 2013Jean-Pierre König
 
Big Data Benchmarking with RDMA solutions
Big Data Benchmarking with RDMA solutions Big Data Benchmarking with RDMA solutions
Big Data Benchmarking with RDMA solutions
Mellanox Technologies
 
getFamiliarWithHadoop
getFamiliarWithHadoopgetFamiliarWithHadoop
getFamiliarWithHadoop
AmirReza Mohammadi
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
DataWorks Summit/Hadoop Summit
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
Subhas Kumar Ghosh
 
Simplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-hSimplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-h
Precisely
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesDataWorks Summit
 

Similar to WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014 (20)

How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...



WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014

  • 1. Non-Stop Hadoop: Adding R-A-S to your Hadoop clusters using a Globally Consistent HDFS Namespace Presented by Chris Almond @ Phoenix Data Conference October 2014
  • 2. REALIZING THE POSSIBILITIES OF BIG DATA 2 WWW.WANDISCO.COM For Today: Who am I and what is this about? At Work: chris.almond@wandisco.com Online: www.linkedin.com/in/chrisalmond/ www.twitter.com/calmo Session Description: Hadoop has quickly evolved into the system of choice for storing and processing Big Data, and is now widely used to support mission-critical applications that operate within ‘data lake’ style infrastructures. A critical requirement of such applications is continuous operation even in the event of various system failures. This requirement has driven adoption of multi-data center Hadoop architectures, a.k.a. geo-distributed or global Hadoop. In this session we will provide a brief introduction to WANdisco, then dig into how our Non-Stop Hadoop solution addresses real world use cases, and also show a live demonstration of Non-Stop NameNode operation across two WAN connected Hadoop clusters.
  • 3. WANdisco Background
      • WANdisco: Wide Area Network Distributed Computing
        – Enterprise ready, high availability software solutions that enable globally distributed organizations to meet today’s data challenges of secure storage, scalability and availability
      • Leader in tools for software engineers
        – Subversion
        – Apache Software Foundation sponsor
      • Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND)
      • US patent granted for active-active replication technology, November 2012
      • Global locations
        – San Ramon (CA)
        – Chengdu (China)
        – Tokyo (Japan)
        – Boston (MA)
        – Sheffield (UK)
        – Belfast (UK)
  • 4. Customers
  • 5. Non-Stop Hadoop
      • Non-Intrusive Plugin
      • Provides Continuous Availability
      • In the LAN / Across the WAN
      • Active/Active
  • 6. Key Problem For Multi Cluster Hadoop: LAN / WAN
  • 7. Enterprise Ready Hadoop: Characteristics of Mission Critical Applications
      • Require Continuous Availability
        – SLAs, Regulatory Compliance
      • Require HDFS to be Deployed Globally
        – Share Data Between Data Centers
        – Data is Consistent, Not Eventually Consistent
      • Ease Administrative Burden
        – Reduce Operational Complexity
        – Simplify Disaster Recovery
        – Lower RTO/RPO
      • Allow Maximum Utilization of Resource
        – Within the Data Center
        – Across Data Centers
  • 8. The difficulty realizing the data lake…
  • 9. … is that data spans the entire world
  • 10. Breaking Away from Active/Passive: What’s in a NameNode
      Single Standby
      • Inefficient utilization of resource
        – Journal Nodes
        – ZooKeeper Nodes
        – Standby Node
      • Performance Bottleneck
      • Still tied to the beeper
      • Limited to LAN scope
      Active / Active
      • All resources utilized
        – Only NameNode configuration
        – Scale as the cluster grows
        – All NameNodes active
      • Load balancing
      • Set resiliency (# of active NN)
      • Global Consistency
  • 11. Breaking Away from Active/Passive: What’s in a Data Center
      Standby Datacenter
      • Idle Resource
        – Single Data Center Ingest
        – Disaster Recovery Only
      • One way synchronization
        – DistCp
      • Error Prone
        – Clusters can diverge over time
      • Difficult to scale > 2 Data Centers
        – Complexity of sharing data increases
      Active / Active
      • DR Resource Available
        – Ingest at all Data Centers
        – Run Jobs in both Data Centers
      • Replication is Multi-Directional
        – active/active
      • Absolute Consistency
        – Single HDFS spans locations
      • ‘N’ Data Center support
        – Global HDFS allows appropriate data to be shared
  • 12. One Cluster Approach
      • Example Applications
        – HBase
        – RT Query
        – MapReduce
      • Poor Resource Management
        – Data Locality Issues
        – Network Use
        – Complex Multiple Clusters
  • 13. Creating Multiple Clusters
      • Example Applications
        – HBase
        – RT Query
        – MapReduce
      • Need to share data between clusters
        – DistCp / Stale Data
        – Inefficient use of storage and/or network
        – Some clusters may not be available
      [Figure: Multiple Clusters]
  • 14. Cluster Zones: Zoning for Optimal Efficiency
      • 1 HDFS
      • 100% Consistency
  • 15. Multi Datacenter Hadoop Disaster Recovery
      • Absolute Consistency
      • Maximum Resource Use
      • Lower Recovery Time/Point
      • WAN Replication
      • Replicate Only What You Want
      • Better Utilization of Power/Cooling
      • Lower TCO
      • LAN Speed Performance
  • 16. Technical Overview: Hadoop Powered by WANdisco
  • 17. Multi Data Center Hadoop Today: What's wrong with the status quo
      • Periodic Synchronization: DistCp
      • Parallel Data Ingest: Load Balancer, Streaming
  • 18. Multi Data Center Hadoop Today: Hacks currently in use
      Periodic Synchronization: DistCp
      • Runs as MapReduce
      • DR Data Center is read only
      • Over time, Hadoop clusters become inconsistent
      • Manual and labor intensive process to reconcile differences
      • Inefficient use of the network
  • 19. Multi Data Center Hadoop Today: Hacks currently in use
      Parallel Data Ingest: Load Balancer, Flume
      • Hiccups in either Hadoop cluster cause the two file systems to diverge
      • Potential to run out of buffer when WAN is down
      • Requires constant attention and sys-admin hours to keep running
      • Data created on the cluster is not replicated
      • Streaming technologies (like Flume) used for data redirection only cover streaming data
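
The drift described in slides 18 and 19 can be illustrated with a toy model. The sketch below is not DistCp and does not use any Hadoop API: each cluster namespace is assumed to be a plain dictionary from path to content version, and `one_way_sync` and `namespace_diff` are hypothetical helper names introduced only for this illustration.

```python
# Toy model only: a namespace is a dict {path: content_version}.
# one_way_sync and namespace_diff are hypothetical helpers, not Hadoop APIs.

def one_way_sync(source: dict, target: dict) -> None:
    """Copy every path from source to target (periodic, one direction only)."""
    target.update(source)


def namespace_diff(a: dict, b: dict) -> dict:
    """Report how two namespaces have drifted apart."""
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "differs": sorted(p for p in a.keys() & b.keys() if a[p] != b[p]),
    }


primary = {"/logs/day1": "v1"}
dr = {}

one_way_sync(primary, dr)          # scheduled copy: clusters match for a moment
primary["/logs/day2"] = "v1"       # ingested after the last copy: DR is stale
dr["/reports/q3"] = "v1"           # created at the DR site: never copied back

drift = namespace_diff(primary, dr)
print(drift)
```

Running another `one_way_sync(primary, dr)` clears the stale entry, but `/reports/q3` still exists only at the DR site, which is the kind of difference that has to be reconciled by hand.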
  • 20. DConE: Distributed Coordination Engine
      • WANdisco’s patented WAN capable Paxos implementation
        – Mathematically proven
        – Provides distributed co-ordination of file system metadata
      • Active/Active (All locations)
      • Create, Modify, Delete
      • Shared nothing (No Leader)
      • No restrictions on distance between datacenters
        – US Patent granted for time independent implementation of Paxos
      • Not based on SAN block device synchronization such as EMC SRDF
        – SAN block replication has distance limits resulting from the inability of file systems such as NTFS and ext4 to tolerate long RTTs to block storage
        – Possible distribution of corrupted blocks
  • 21. PAXOS: Paxos is a family of protocols for solving consensus in a network of unreliable processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communication medium may experience failures.
  • 22. PAXOS: Leslie Lamport: Any node that proposes after a decision has been reached must communicate with a node in the majority. The protocol guarantees that it will learn the previously agreed upon value from that majority. http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-paxos.pdf http://css.csail.mit.edu/6.824/2014/papers/paxos-simple.pdf
  • 23. PAXOS: “Contrary to conventional wisdom, we were able to use Paxos to build a highly available system that provides reasonable latencies for interactive applications while synchronously replicating writes across geographically distributed datacenters.” http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf …
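
The property quoted from Lamport on slide 22 can be seen in a minimal sketch of textbook single-decree Paxos. This is not DConE, whose implementation is proprietary and not shown in the slides; `Acceptor`, `propose`, the synchronous in-process calls and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                            # highest proposal number promised
    accepted: Optional[Tuple[int, str]] = None    # (number, value) last accepted

    def prepare(self, n: int):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n: int, value: str) -> bool:
        """Phase 2: accept unless a higher-numbered promise has been made."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(reachable: List[Acceptor], cluster_size: int, n: int, value: str):
    """Run one proposal against the reachable acceptors.

    Returns the decided value, or None if no majority could be gathered.
    """
    majority = cluster_size // 2 + 1
    replies = [a.prepare(n) for a in reachable]
    granted = [acc for ok, acc in replies if ok]
    if len(granted) < majority:
        return None
    # If any acceptor in the majority already accepted a value, adopt the
    # one with the highest proposal number instead of our own value.
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior, key=lambda p: p[0])[1]
    acks = sum(a.accept(n, value) for a in reachable)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors[:2], 3, 1, "create /a")    # third node unreachable
second = propose(acceptors[1:], 3, 2, "delete /a")   # overlaps the first majority
stale = propose(acceptors[:2], 3, 1, "mkdir /b")     # reuses an old number
```

Because any two majorities of three nodes share at least one acceptor, the second proposer learns the earlier decision from that node and re-proposes `create /a` rather than its own value; the stale proposal is refused outright.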
  • 24. How DConE Works: WANdisco Active/Active Replication
      • Majority Quorum
        – A fixed number of participants
        – The Majority must agree for change
      • Failure
        – Failed nodes are unavailable
        – Normal operation continues on nodes with quorum
      • Recovery / Self Healing
        – Nodes that rejoin stay in safe mode until they are caught up
      • Disaster Recovery
        – A complete loss can be brought back from another replica
      [Figure: nodes A, B and C, each with a transaction log (TX id 168 to 173); Proposals 170 to 173 and the corresponding Agree messages.]
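
The quorum, failure and recovery behaviour listed on slide 24 can be sketched as a small simulation. It is a simplification under stated assumptions, not WANdisco's implementation: agreement is reduced to a majority check, every live voter applies each agreed transaction immediately, nodes in safe mode are assumed not to vote, and `Replica`, `Cluster` and their methods are hypothetical names. The starting transaction id 168 is taken from the slide's figure.

```python
class Replica:
    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.safe_mode = False
        self.log = []              # agreed transaction ids, applied in order


class Cluster:
    def __init__(self, names, first_tx: int = 168):
        self.replicas = {n: Replica(n) for n in names}   # fixed membership
        self.agreed = []                                  # global agreed sequence
        self.next_tx = first_tx

    def voters(self):
        """Assumption: only nodes that are up and out of safe mode vote."""
        return [r for r in self.replicas.values() if r.up and not r.safe_mode]

    def submit(self):
        """Agree on the next transaction if a majority of all members can vote."""
        if len(self.voters()) <= len(self.replicas) // 2:
            return None                                   # no quorum, no change
        tx = self.next_tx
        self.next_tx += 1
        self.agreed.append(tx)
        for r in self.voters():
            r.log.append(tx)
        return tx

    def fail(self, name: str) -> None:
        self.replicas[name].up = False

    def rejoin(self, name: str) -> None:
        """A returning node stays in safe mode until it has caught up."""
        r = self.replicas[name]
        r.up = True
        r.safe_mode = True

    def catch_up(self, name: str) -> None:
        """Replay the agreed sequence, then leave safe mode."""
        r = self.replicas[name]
        r.log = list(self.agreed)
        r.safe_mode = False


cluster = Cluster(["A", "B", "C"])
cluster.submit()        # agreed by A, B and C
cluster.fail("C")
cluster.submit()        # A and B still form a majority
cluster.rejoin("C")     # C is back but behind, so it waits in safe mode
cluster.catch_up("C")   # C replays what it missed and resumes
```

With two of the three nodes down, `submit()` returns `None`: the surviving node cannot change the namespace on its own, which is what keeps the replicas from diverging.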
  • 25. Architecture of a Non-Stop Hadoop
  • 26. Use Case: Disaster Recovery
      • Data is as current as possible (no periodic syncs)
      • Doesn’t require monitoring and consistency checking
      • Virtually zero downtime to recover from regional data center failure
      • Regulatory compliance
  • 27. Use Case: Multi Data-Center Ingest and multi-tenant workloads
      • Ingest and analyze anywhere
      • Analyze Everywhere
        – Fraud Detection
        – Equity Trading Information
        – New Business
        – Etc…
      • Backup Datacenter(s) can be used for work
        – No idle resource
  • 28. Use Case: Zones
      • Maximize Resource Utilization
        – No idle standby
      • Isolate Dev and Test Clusters
        – Share data not resource
      • Carve off hardware for a specific group
        – Prevents a bad map/reduce job from bringing down the cluster
      • Guarantee Consistency and availability of data
        – Data is instantly available
  • 29. Use Case: Heterogeneous Hardware (Zones): In-memory analytics
      • Mixed Hardware Profiles
        – Memory, Disk, CPU
        – Isolate memory-hungry processing (Storm/Spark) from regular jobs
      • Share data, not processing
        – Isolate lower priority (dev/test) work
  • 30. Data Reservoir Use Cases
      • Data Marts
        – Restrict access to relevant data
        – Create Quick Clusters
      • Feeder Sites (Data Tributaries)
        – Ingest Only
      [Figure labels: Data Ocean, Feeder Site, Accounting Mart, Banking Mart]
  • 31. Regulatory Compliance
      • Basel III
        – Consistency of Data
      • Data Privacy Directive
        – Data Sovereignty
      • Data doesn’t leave country of origin
      [Figure labels: Compliance, Regulation, Guidelines]
  • 32. 5 Reasons your Hadoop Deployment Needs WANdisco
  • 33. Non-Stop Hadoop Demonstration