High level design for replicating HBase bulk loaded data. Detailed design document can be found at https://issues.apache.org/jira/browse/HBASE-13153.
It was presented at Apache HBase Meetup 2015, Bangalore.
Floating on a RAFT: HBase Durability with Apache Ratis (DataWorks Summit)
In a world with a myriad of distributed storage systems to choose from, the majority of Apache HBase clusters still rely on Apache HDFS. Theoretically, any distributed file system could be used by HBase. One major reason HDFS predominates is the specific durability guarantees HBase's write-ahead log (WAL) requires, which HDFS provides correctly. However, HBase's use of HDFS for WALs can be replaced with sufficient effort.
This talk will cover the design of a "Log Service" that can be embedded inside HBase and provides the level of durability HBase requires for its WALs. Apache Ratis (incubating) is a Java library implementation of the Raft consensus protocol and is used to build this Log Service. We will cover the design choices of the Ratis Log Service, comparing and contrasting it with other log-based systems that exist today. Next, we'll cover how the Log Service "fits" into HBase and the necessary changes to HBase that enable this. Finally, we'll discuss how the Log Service can simplify the operational burden of HBase.
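The durability contract a Raft-based log provides can be illustrated with a toy majority-commit rule. This is a minimal sketch of the general Raft commit criterion, not the Apache Ratis API; the function name and representation are illustrative.

```python
def commit_index(match_index, quorum_size):
    """Toy Raft commit rule: a log entry is durable once a majority of
    replicas have persisted it. `match_index` holds, per replica, the
    highest log index that replica has acknowledged."""
    majority = quorum_size // 2 + 1
    # Sort descending; the value at position (majority - 1) is the
    # highest index stored on at least `majority` replicas.
    acked = sorted(match_index, reverse=True)
    return acked[majority - 1]

# A 5-node group where replicas have acked up to indices 9, 7, 7, 4, 2:
# index 7 is on 3 of 5 nodes, so WAL entries up to 7 are durable.
print(commit_index([9, 7, 7, 4, 2], 5))  # -> 7
```

This is exactly the property a WAL needs: an acknowledged write survives the loss of any minority of replicas, which is what lets such a log service stand in for HDFS.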
CBlocks - POSIX-compliant file systems for HDFS (DataWorks Summit)
With YARN running Docker containers, it is possible to run applications that are not HDFS-aware inside these containers. It is hard to customize these applications since most of them assume a POSIX file system with rewrite capabilities. In this talk, we will dive into how we created a block storage, how it is being tested internally, and the storage containers which make it all possible.
The storage container framework was developed as part of Ozone (HDFS-7240). This talk will also explore the current state of Ozone along with CBlocks, covering the architecture of storage containers, how replication is handled, scaling to millions of volumes, and I/O performance optimizations.
Hadoop was born much earlier than the Cloud Native era. But the question is still the same: what can it offer in the time of Kubernetes, containerization and hybrid clouds?
Apache Hadoop Ozone is a new subproject of Hadoop. It has a generic low-level binary layer, the Hadoop Distributed Data Storage (HDDS) layer, and an S3-compatible Object Store implementation on top of it.
But the HDDS data storage layer is not just for the object store. It can be used for multiple purposes: to enhance the scalability of HDFS, or to provide block-level access to the managed storage space. With this approach, the same Hadoop Ozone cluster can provide Hadoop file system based storage, object store space, and block-level storage.
Storage is still a hot topic with Kubernetes and in Cloud Native environments. The Container Storage Interface specification is a vendor-neutral standard for providing storage plugins for multiple container orchestration systems.
Quadra provides block-level access on top of the Hadoop Distributed Data Storage layer and is a first-class citizen of the containerized world. It implements the Container Storage Interface and can work as a Kubernetes dynamic volume provisioner.
In this talk we will demonstrate how Hadoop Ozone storage can be used from containers. We will explain the basic storage types of Kubernetes clusters and show how Hadoop Ozone and Quadra can help solve the storage problem in an industry-standard way.
Overview of HBase cluster replication feature, covering implementation details as well as monitoring tools and tips for troubleshooting and support of Replication deployments.
Most users know HDFS as the reliable store of record for big data analytics. HDFS is also used to store transient and operational data when working with cloud object stores, such as Azure HDInsight and Amazon EMR. In these settings - but also in more traditional, on-premises deployments - applications often manage data stored in multiple storage systems or clusters, requiring a complex workflow for synchronizing data between filesystems to achieve goals for durability, performance, and coordination.
Building on existing heterogeneous storage support, we add a storage tier to HDFS to work with external stores, allowing remote namespaces to be "mounted" in HDFS. This capability not only supports transparent caching of remote data as HDFS blocks, it also supports (a)synchronous writes to remote clusters for business continuity planning (BCP) and supports hybrid cloud architectures.
This idea was presented at last year’s Summit in San Jose. Lots of progress has been made since then and active development is ongoing at the Apache Software Foundation on branch HDFS-9806, driven by Microsoft and Western Digital. We will discuss the refined design & implementation and present how end-users and admins will be able to use this powerful functionality.
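The transparent caching and asynchronous remote writes described above can be sketched as a simple read-through layer. This is a toy model under illustrative names, not the HDFS-9806 implementation:

```python
class ReadThroughMount:
    """Toy model of a mounted remote namespace: reads are served from a
    local block cache when possible and fetched from the remote store
    (and cached) on a miss; writes land in the local tier and are queued
    for asynchronous replication to the remote cluster (BCP-style)."""

    def __init__(self, remote):
        self.remote = remote          # dict: path -> bytes, stand-in for the external store
        self.local_cache = {}         # remote data cached as local blocks
        self.pending_sync = []        # writes awaiting asynchronous replication

    def read(self, path):
        if path not in self.local_cache:          # cache miss: fetch and cache
            self.local_cache[path] = self.remote[path]
        return self.local_cache[path]

    def write(self, path, data):
        self.local_cache[path] = data             # local write completes immediately
        self.pending_sync.append(path)            # remote copy happens later

    def sync(self):
        for path in self.pending_sync:            # asynchronous push to the remote cluster
            self.remote[path] = self.local_cache[path]
        self.pending_sync.clear()
```

The key design point the abstract alludes to: local clients see HDFS-speed reads and writes, while durability against cluster loss is achieved eventually via the background sync.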
OpenStack and Ceph case study at the University of Alabama (Kamesh Pemmaraju)
The University of Alabama at Birmingham gives scientists and researchers a massive, on-demand, virtual storage cloud using OpenStack and Ceph for less than $0.41 per gigabyte. This is a session from the OpenStack Summit given by Kamesh Pemmaraju of Dell and John Paul of the University of Alabama. It details how the university IT staff deployed a private storage cloud infrastructure using the Dell OpenStack cloud solution with Dell servers, storage, networking and OpenStack, and Inktank Ceph. After assessing a number of traditional storage scenarios, the University partnered with Dell and Inktank to architect a centralized cloud storage platform that was capable of scaling seamlessly and rapidly, was cost-effective, and that could leverage a single hardware infrastructure for the OpenStack compute and storage environment.
Now that you've seen HBase 1.0, what's ahead in HBase 2.0, and beyond—and why? Find out from this panel of people who have designed and/or are working on 2.0 features.
HBase Tales From the Trenches - Short stories about most common HBase operati... (DataWorks Summit)
While HBase is the most logical answer for use cases requiring random, real-time read/write access to Big Data, it is not trivial to design applications that make the most of it, nor is it the simplest system to operate. Because it depends on and integrates with other components of the Hadoop ecosystem (ZooKeeper, HDFS, Spark, Hive, etc.) and external systems (Kerberos, LDAP), and because its distributed nature requires a "Swiss clockwork" infrastructure, many variables must be considered when investigating anomalies or even outages. Add to the equation the fact that HBase is still an evolving product, with different release versions currently in use, some of which carry genuine software bugs. In this presentation, we'll go through the most common HBase issues faced by different organisations, describing the identified causes and resolution actions from my last five years supporting HBase for our heterogeneous customer base.
Updated version of my talk about Hadoop 3.0 with the newest community updates.
Talk given at the codecentric Meetup Berlin on 31.08.2017 and on Data2Day Meetup on 28.09.2017 in Heidelberg.
State of Containers and the Convergence of HPC and BigData (inside-BigData.com)
In this deck from 2018 Swiss HPC Conference, Christian Kniep from Docker Inc. presents: State of Containers and the Convergence of HPC and BigData.
"This talk will recap the history of Linux Containers and what constitutes them, before laying out how the technology is employed by various engines and what problems these engines have to solve. Afterward, Christian will elaborate on why the advent of standards for images and runtimes moved the discussion from building and distributing containers to orchestrating containerized applications at scale. In conclusion, attendees will get an update on how containers foster the convergence of Big Data and HPC workloads and the state of native HPC containers."
Learn more: http://docker.com
and
http://www.hpcadvisorycouncil.com/events/2018/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Changes Expected in Hadoop 3 | Getting to Know Hadoop 3 Alpha | Upcoming Hado... (Edureka!)
This Edureka tutorial on Hadoop 3 (Hadoop Blog series: https://goo.gl/LFesy8) will help you focus on the changes that are expected in Hadoop 3, as it's still in the alpha phase. The Apache community has incorporated many changes in Apache Hadoop 3 and is still working on some of them. So, we will be taking a broader look at the expected changes in Hadoop 3:
1. Support For Erasure Coding In HDFS
2. YARN Timeline Service V.2
3. Shell Script Rewrite
4. Shaded Client Jars
5. Support For Opportunistic Containers
6. Mapreduce Task-level Native Optimization
7. Support For More Than 2 NameNodes
8. Default Ports Of Multiple Services Have Been Changed
9. Intra-DataNode Balancer
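The erasure-coding change in item 1 replaces 3x replication with parity math. The idea can be seen in a toy single-parity example; real HDFS 3 uses Reed-Solomon codes such as RS(6,3) that tolerate multiple losses, while plain XOR below tolerates only one:

```python
def xor_parity(blocks):
    """Compute a single XOR parity block over equal-length data blocks.
    Toy stand-in for erasure coding: one extra block instead of full copies."""
    parity = bytes(len(blocks[0]))
    for b in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, b))
    return parity

def reconstruct(surviving_blocks, parity):
    """Rebuild the one missing data block by XOR-ing the parity with
    every surviving block (XOR cancels the known blocks out)."""
    missing = parity
    for b in surviving_blocks:
        missing = bytes(x ^ y for x, y in zip(missing, b))
    return missing

data = [b"abc", b"def", b"ghi"]
p = xor_parity(data)
# lose data[1]; rebuild it from the parity and the other two blocks
assert reconstruct([data[0], data[2]], p) == b"def"
```

The storage win is the point: three data blocks here cost one parity block (1.33x overhead) instead of two extra replicas (3x), at the price of CPU work on reconstruction.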
Scalable PHP Applications in Kubernetes (Robert Lemke)
Kubernetes is also called the "distributed Linux of the cloud" – which implies that it provides fundamental infrastructure, which can solve a lot of challenges. Let’s see how PHP applications fit into this picture. In this presentation, we are going to explore when Kubernetes is a good fit for operating your PHP application and how it can be done in practice. We’ll look at the whole lifecycle: how to build your application, create or choose the right Docker images, deploy and scale, and how to deal with performance and monitoring. At the end you will have a good understanding about all the different stages and building blocks for running a PHP application with Kubernetes in production.
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor... (Data Con LA)
Apache Tez is a library to build data processing engines in Hadoop/YARN. It takes care of many common building blocks like scheduling, fault tolerance, speculation, security etc. so that the engine can focus on its core features. E.g. Apache Hive can focus on SQL optimization. There has been rapid adoption in projects like Hive, Pig, Flink, Cascading, Scalding and commercial products like Datameer and Syncsort. We will provide a brief overview of Tez and then look at new features for job monitoring in the Tez UI and performance debugging tools for Tez applications. Finally we will explore upcoming features like hybrid scheduling that open up new areas of performance and functionality.
The Hadoop Cluster Administration course at Edureka starts with the fundamental concepts of Apache Hadoop and Hadoop Cluster. It covers topics to deploy, manage, monitor, and secure a Hadoop Cluster. You will learn to configure backup options, diagnose and recover node failures in a Hadoop Cluster. The course will also cover HBase Administration. There will be many challenging, practical and focused hands-on exercises for the learners. Software professionals new to Hadoop can quickly learn the cluster administration through technical sessions and hands-on labs. By the end of this six week Hadoop Cluster Administration training, you will be prepared to understand and solve real world problems that you may come across while working on Hadoop Cluster.
Linux-Stammtisch Juli 2019, Munich: Talk by Mario-Leander Reimer (@LeanderReimer, Principal Software Architect at QAware)
Abstract: Only a few years ago the move towards microservice architecture was the first big disruption in software engineering: instead of running monoliths, systems were now build, composed and run as autonomous services. But this came at the price of added development and infrastructure complexity. Serverless and FaaS seem to be the next disruption, they are the logical evolution trying to address some of the inherent technology complexity we are currently faced when building cloud native apps.
FaaS frameworks are currently popping up like mushrooms: Knative, Kubeless, OpenFn, Fission, OpenFaaS or OpenWhisk, to name just a few. But which one of these is safe to pick and use in your next project? Let's find out. This session will start off by briefly explaining the essence of Serverless application architecture. We will then define a criteria catalog for FaaS frameworks and continue by comparing and showcasing the most promising ones.
Docker for developers on Mac and Windows (Docker, Inc.)
The whole Docker ecosystem exists today because of every single developer who found ways of using Docker to improve how they build software; whether streamlining production deployments, speeding up continuous integration systems or standing up an application on your laptop to hack on. In this talk we want to take a step back and look at where Docker sits today from the software developers point of view - and then jump ahead and talk about where it might go in the future. In this talk, we’ll discuss:
* Making Docker an everyday part of the developing software on the desktop, with Docker for Windows and Docker for Mac
* Docker Compose, and the future of describing applications as code
* How Docker provides the best tools for developing applications destined to run on any Kubernetes cluster
This session should be of interest to anyone who writes software; from people who want to hack on a few personal projects, to polyglot open source programmers and to professional developers working in tightly controlled environments. Everyone deserves a better developer experience.
Apache Hadoop 3 is coming! As the next major milestone for Hadoop and big data, it attracts everyone's attention as it showcases several bleeding-edge technologies and significant features across all components of Apache Hadoop: Erasure Coding in HDFS, Docker container support, Apache Slider integration and native service support, Application Timeline Service version 2, Hadoop library updates and client-side classpath isolation, etc. In this talk, we will first update the status of the Hadoop 3.0 release work in the Apache community and the feasible path through alpha and beta towards GA. Then we will dive deep into each new feature, including its development progress and maturity status in Hadoop 3. Last but not least, as a new major release, Hadoop 3.0 will contain some incompatible API or CLI changes which could make upgrades challenging for downstream projects and existing Hadoop users - we will go through these major changes and explore their impact on other projects and users.
Speaker: Sanjay Radia, Founder and Chief Architect, Hortonworks
Docker Storage: Designing a Platform for Persistent Data (Docker, Inc.)
Docker containers have popularised the concept of read-only/immutable infrastructure and led to changes in system and application architecture across the IT industry. However, nearly every application generates some data that will need to persist long after the lifespan of the container that generated it. This talk will look at the best practices around persistent storage with containers, from design advice on constructing your application/container to the functionality provided by storage vendors through the Docker Volume driver plugins.
This session from DockerCon 18 covers the types of applications that have a requirement of persistent or shared storage, it discusses the various implementation methods with Docker and finally looks at automating the process with both Swarm and Kubernetes.
The original deck in PPTX is available here https://www.dropbox.com/s/pzqi0wbaxdqeca7/DCSF18_Docker%20Storage.pptx?dl=0
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
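At-the-source validation of the kind described in point 4 can be as simple as running declarative rules over each record before it enters the pipeline. A minimal sketch; the rule names and the ad-campaign fields are illustrative:

```python
def validate(records, rules):
    """Run per-field rules over records; return (clean, errors) so bad
    rows are caught at the source rather than causing downstream issues."""
    clean, errors = [], []
    for i, rec in enumerate(records):
        failed = [name for name, check in rules.items() if not check(rec)]
        if failed:
            errors.append({"row": i, "failed": failed})   # for root cause analysis
        else:
            clean.append(rec)
    return clean, errors

# Illustrative rules for a hypothetical ad-campaign record
rules = {
    "has_campaign_id": lambda r: bool(r.get("campaign_id")),
    "spend_non_negative": lambda r: r.get("spend", 0) >= 0,
    "ctr_in_range": lambda r: 0.0 <= r.get("ctr", 0.0) <= 1.0,
}
```

Because each error records which rule failed on which row, the output doubles as the transparency trail that lineage tracking relies on.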
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Opendatabay - Open Data Marketplace (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
1. HUAWEI TECHNOLOGIES CO., LTD.
www.huawei.com
Namenode High Availability
Uma Maheswara Rao G
maheswara@huawei.com
umamahesh@apache.org
2. Who am I
• R&D Tech Lead at Huawei
• Apache Hadoop Committer at ASF
• Active contributor to Apache BookKeeper at ASF
• Main development focus on HDFS
NameNode HA
3. Huawei Hadoop R&D – In a Glance
Hadoop Development
• Secondary Index in HBase
• HDFS NN HA (Hadoop-2)
• BookKeeper as shared storage for NN HA (Hadoop-2)
• HDFS NN HA (Hadoop-1)
• MapReduce ResourceManager HA (Hadoop-2 / YARN)
• MapReduce JobTracker HA (Hadoop-1)
• Hive HA
Stabilization
• Raised over 650 defects since Jan’11
• Fixed over 500 defects since Jan’11, and contributed back to community
4. Namenode High Availability
In 2011, we implemented HA in Hadoop 0.20.1 at Huawei, based on a Backup Namenode (BNN) and ZooKeeper:
• Intelligent clients - find the active NN from the configured NNs
• Streaming edits to the BNN
• Sending block reports to both the active NN and the BNN
• BNN does periodic checkpoints
• ZK based leader election
• Achieved hot standby and automatic switchover
However, the BNN based solution did not address the double failure scenario.
5. What is Double Failure here?
(Diagram: NN-1 starts as active with NN-2 as backup; checkpoints are taken at t1 and t2; X-operation logs land in NN1 but could not be streamed to the BNN (NN2), so NN2 is not aware of the X-operation logs from NN1 - data loss!)
• NN1 is active and NN2 is the Backup node
• NN1 failed to stream edits to the Backup node and removed the stream
• NN1 received X-operation edit logs, which NN2 is not aware of
• Now NN1 crashed and NN2 becomes active
• NN2 continues to work, but is not aware of the edits after checkpoint t1
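The failure sequence on this slide can be replayed as a toy simulation. The class and variable names are illustrative, not HDFS code; the point is only that local-only edits after a broken stream vanish on failover:

```python
class Namenode:
    def __init__(self):
        self.edits = []          # edit log entries since the last checkpoint

def double_failure_demo():
    """Replay the slide's scenario: edits applied on the active NN after
    the stream to the backup broke are lost when the backup takes over."""
    nn1, nn2 = Namenode(), Namenode()
    stream_alive = True

    def apply_edit(op):
        nn1.edits.append(op)            # the active NN always logs locally
        if stream_alive:
            nn2.edits.append(op)        # the backup only sees streamed edits

    apply_edit("op-A")                  # replicated normally
    stream_alive = False                # NN1 fails to stream and removes the stream
    apply_edit("op-X")                  # the X-operation the backup never sees
    # NN1 crashes; NN2 becomes active with a stale log
    lost = [op for op in nn1.edits if op not in nn2.edits]
    return lost

print(double_failure_demo())  # -> ['op-X']
```

Putting the edit log on shared storage, as the following slides propose, removes this window: the new active reads the log from the shared medium instead of relying on what was streamed to it.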
6. In 2011-2012, the community started discussing a shared storage approach:
• HDFS-1623: discussed the double failure issue
• Store the edit logs at a common place and share them with both Namenodes
• Huawei collaborated with the community on HDFS-1623 development
Constraint with HDFS-1623: the shared storage itself is an SPOF; a highly available shared storage is required.
7. NameNode HA Architecture
(Diagram: Active and Standby Namenodes, each with a local Failover Controller monitoring NN health; a ZooKeeper ensemble (ZK 1..n); shared storage written by the Active NN and read by the Standby; DataNodes (DN 1..n) sending block reports to both Namenodes.)
• Both NameNodes start as standby and try loading the edits
• A new process, the FailOver Controller (ZKFC), is responsible for monitoring and failover
• On ZK based leader election, ZKFC issues a command to its local NN to become Active or Standby
• Only the Active NN writes to shared storage; the Standby NN reads from it
• All DataNodes send block reports to both Namenodes to achieve hot standby
8. HUAWEI TECHNOLOGIES CO., LTD. Page 8
Shared storage options
NFS: may not be a good fit for many deployments, since NFS is an external device, is costly, and offers little control over timeouts, etc.
QuorumJournalManager (QJM): newly implemented in HDFS; it had not yet shipped in any Apache release.
BookKeeperJournalManager (BKJM): integrates BookKeeper-based shared storage into HDFS; available from Hadoop 2 onwards.
Motivations to use BookKeeper as shared storage
BookKeeper's design was largely inspired by the NameNode edit-log use case; it can work as shared storage for an HA NameNode.
Can support federated NameNodes.
Dynamic scaling up/down of storage nodes (bookies) – useful when HBase and federated NNs also use it.
Multiple-disk support – reduces disk seeks.
Round-robin reads from the quorum – distributes the load.
HDFS-234 – BookKeeper-based journal manager, available in Hadoop 2 releases and trunk.
Quorum-based commits in trunk – parallel writes to the write-quorum bookies, waiting for the ack quorum (see BOOKKEEPER-208).
Motivations to use BookKeeper as shared storage
BookKeeper is in production use at Yahoo! for guaranteed delivery of log messages to Hedwig servers.
The code has been in Apache for more than a year and has an active list of contributors and committers.
BookKeeper at a glance
Bookie: a storage node.
Ledger: a log file.
Ensemble: the group of bookies storing a ledger.
Writes go to quorums of bookies: parallel writes to the quorum, waiting only for the ack quorum (in BK 4.2).
Reads come from the same quorum.
Throughput: bounded by the maximum latency among the ack-quorum bookies' responses.
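The quorum-write behaviour above can be illustrated with a small sketch: an entry is written to the write-quorum bookies in parallel and the add completes once ack-quorum bookies respond, so its latency is the ack-quorum-th fastest response. The latency numbers below are made up:

```python
# Model of BookKeeper's parallel quorum write: wait for the first
# `ack_quorum` responses out of the write quorum.

def add_entry_latency(bookie_latencies_ms, ack_quorum):
    """Latency of one add: parallel writes, completes at the
    ack_quorum-th fastest bookie response."""
    return sorted(bookie_latencies_ms)[ack_quorum - 1]

# Write quorum of 3 bookies, one of which is slow.
latencies = [5, 40, 12]    # per-bookie response times (ms), illustrative

print(add_entry_latency(latencies, ack_quorum=2))  # 12 -- slow bookie (40 ms) not waited on
print(add_entry_latency(latencies, ack_quorum=3))  # 40 -- full-quorum ack pays the max latency
```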
NN HA Architecture with BK as shared storage
[Architecture diagram: the same HA layout, with the shared storage replaced by a BookKeeper cluster (BK 1..n); each NameNode (Active and Standby) embeds a BKJM through which it writes to and reads from the BK cluster; Failover Controllers and NN health monitoring as before]
The BookKeeper Journal Manager (BKJM) is a NameNode plugin implementation; it uses the BK client to read from and write to the BK cluster.
ZooKeeper Failover Controller (ZKFC):
Depends on ZooKeeper for leader election.
Monitors the health of the local NameNode.
Invokes RPC calls on the NN to switch its state when a leader election completes.
Triggers a leader election when the health monitor finds the NN is bad.
Fencing: the ZKFC kills the remote NN before invoking the switch commands, to ensure a single Active NN at any time.
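The ZKFC behaviour above can be sketched as a simulation. The mock below stands in for ZooKeeper leader election; it is not the real ZKFC or ZooKeeper API:

```python
# Simulated ZKFC loop: health-check the local NN, run leader election, and
# transition the winner to Active (the real ZKFC would also fence first).

class MockZooKeeper:
    """Stand-in for ZK leader election: first healthy claimant wins."""
    def __init__(self):
        self.leader = None

    def try_acquire(self, who):
        if self.leader is None:
            self.leader = who
        return self.leader == who

class ZKFC:
    def __init__(self, nn_name, zk):
        self.nn, self.zk, self.state = nn_name, zk, "standby"

    def on_health_check(self, healthy):
        if healthy and self.zk.try_acquire(self.nn):
            # The real ZKFC fences the previous active NN here before
            # issuing the transition-to-Active RPC.
            self.state = "active"
        elif not healthy:
            self.state = "standby"

zk = MockZooKeeper()
zkfc1, zkfc2 = ZKFC("NN1", zk), ZKFC("NN2", zk)
zkfc1.on_health_check(healthy=True)   # NN1 wins the election
zkfc2.on_health_check(healthy=True)   # NN2 loses, stays standby
print(zkfc1.state, zkfc2.state)       # active standby
```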
Special fencing scenarios:
What if the remote node is not accessible to kill?
• This requires writing a fence method against the shared storage.
Note: the shared storage must support fencing for this purpose.
How does BK-based shared storage handle fencing?
• BookKeeper allows a single writer per ledger, which eliminates the need for strict ZKFC-level fencing.
• BKJM solves the issue of concurrent ledger creation by two Active NameNodes – see HDFS-3452.
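The single-writer fencing property described above can be illustrated with a mock ledger. This models only the fencing semantics (once a new reader fences the ledger, the old writer's adds are rejected), not the real BookKeeper client API:

```python
# Mock of ledger fencing: after a recovery open, writes by the old
# writer fail, so two NameNodes can never both append to the edit log.

class Ledger:
    def __init__(self):
        self.entries, self.fenced = [], False

    def add_entry(self, data):
        if self.fenced:
            raise IOError("ledger fenced: old writer rejected")
        self.entries.append(data)

    def open_for_recovery(self):
        self.fenced = True          # new active NN fences the ledger
        return list(self.entries)

ledger = Ledger()
ledger.add_entry("edit-1")          # old active NN writes
edits = ledger.open_for_recovery()  # new active NN takes over and fences
try:
    ledger.add_entry("edit-2")      # old NN's write now fails
except IOError as e:
    print(e)                        # ledger fenced: old writer rejected
print(edits)                        # ['edit-1']
```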
Other Enhancements/Work In Progress
BOOKKEEPER-237 – automatic recovery on bookie failures. Importance: avoids data loss when multiple bookies crash.
Entry-read latency improvement via round-robin + speculative reads with a smaller timeout – avoids latency during failover.
Support for shared-storage upgrades has already been developed.
Further improvements and work in progress are tracked in HDFS-3399.
Take a look at the BookKeeper community's work:
https://issues.apache.org/jira/browse/BOOKKEEPER
BookKeeper Auto Recovery (BOOKKEEPER-237)
Auto-recovery is a new module in BookKeeper.
It re-replicates ledgers automatically when bookies fail.
Auditor:
Watches the set of available bookies.
Detects the failed bookie's ledgers.
Publishes those ledger ids to ZK.
ReplicationWorker:
Monitors the ledger ids published by the Auditor.
Scans for and finds the missing fragments of each ledger.
Initiates the re-replication.
Cleans up the re-replicated ledger ids from ZK.
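The Auditor/ReplicationWorker split can be sketched as follows. A plain set stands in for the ZK queue of under-replicated ledgers; all names are illustrative, not the BookKeeper auto-recovery API:

```python
# Toy model of BOOKKEEPER-237 auto-recovery: the Auditor publishes ids of
# ledgers hit by a bookie failure; ReplicationWorkers re-replicate the
# missing fragments and clean up the published ids.

underreplicated = set()             # stands in for the ZK queue

def auditor_on_bookie_failure(ledgers_by_bookie, failed_bookie):
    """Auditor: find ledgers stored on the failed bookie, publish their ids."""
    underreplicated.update(ledgers_by_bookie.get(failed_bookie, []))

def replication_worker(live_bookies, replicas):
    """Worker: re-replicate each published ledger's fragments, then clean up."""
    while underreplicated:
        ledger_id = underreplicated.pop()
        replicas[ledger_id] = list(live_bookies)   # copy fragments to live bookies

ledgers_by_bookie = {"bookie-1": [7, 9], "bookie-2": [9]}
replicas = {7: ["bookie-1"], 9: ["bookie-1", "bookie-2"]}

auditor_on_bookie_failure(ledgers_by_bookie, "bookie-1")
replication_worker(["bookie-2", "bookie-3"], replicas)
print(sorted(replicas[7]))          # ['bookie-2', 'bookie-3']
print(underreplicated)              # set() -- queue cleaned up
```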
Testing
Tested on an HA cluster with HBase, MapReduce, Hive, and Huawei OM.
500+ automated HA integration tests run in daily CI.
Verified fault-tolerance tests with the BK recovery feature – see the list in HDFS-3399.
Ran periodic kill-node scenarios to verify failover with BK – no data loss observed.
In our test cluster we saw no significant degradation, only the expected linear degradation as the quorum size increases.