HBase is a distributed, scalable, big data store modeled after Google's Bigtable. The document outlines the key aspects of HBase, including that it uses HDFS for storage, Zookeeper for coordination, and can optionally use MapReduce for batch processing. It describes HBase's architecture with a master server distributing regions across multiple region servers, which store and serve data from memory and disks.
With the public confession of Facebook, HBase is on everyone's lips when it comes to the discussion around the new "NoSQL" area of databases. In this talk, Lars will introduce and present a comprehensive overview of HBase. This includes the history of HBase, the underlying architecture, available interfaces, and integration with Hadoop.
Chicago Data Summit: Apache HBase: An Introduction - Cloudera, Inc.
Apache HBase is an open source distributed data-store capable of managing billions of rows of semi-structured data across large clusters of commodity hardware. HBase provides real-time random read-write access as well as integration with Hadoop MapReduce, Hive, and Pig for batch analysis. In this talk, Todd will provide an introduction to the capabilities and characteristics of HBase, comparing and contrasting it with traditional database systems. He will also introduce its architecture and data model, and present some example use cases.
Hadoop World 2011: Advanced HBase Schema Design - Cloudera, Inc.
While running a simple key/value based solution on HBase usually requires an equally simple schema, it is less trivial to operate a different application that has to insert thousands of records per second.
This talk will address the architectural challenges when designing for either read or write performance imposed by HBase. It will include examples of real world use-cases and how they can be implemented on top of HBase, using schemas that optimize for the given access patterns.
Introduction to HBase. HBase is a NoSQL database that has seen a tremendous increase in popularity in recent years. Large companies such as Facebook, LinkedIn, and Foursquare use HBase. In this presentation we will address questions such as: What is HBase? How does it compare to relational databases? What is the architecture? How does HBase work? What about schema design? What about IT resources? These questions should help you consider whether this solution might be suitable in your case.
Jesse Anderson (Smoking Hand)
This early-morning session offers an overview of what HBase is, how it works, its API, and considerations for using HBase as part of a Big Data solution. It will be helpful for people who are new to HBase, and also serve as a refresher for those who may need one.
HBase can be an intimidating beast for someone considering its adoption. For what kinds of workloads is it well suited? How does it integrate into the rest of my application infrastructure? What are the data semantics upon which applications can be built? What are the deployment and operational concerns? In this talk, I'll address each of these questions in turn. As supporting evidence, both high-level application architecture and internal details will be discussed. This is an interactive talk: bring your questions and your use-cases!
Apache HBase™ is the Hadoop database, a distributed, scalable, big data store. It is a column-oriented database management system that runs on top of HDFS.
Apache HBase is an open source NoSQL database that provides real-time read/write access to those large data sets. ... HBase is natively integrated with Hadoop and works seamlessly alongside other data access engines through YARN.
HBase Advanced Schema Design - Berlin Buzzwords - June 2012 - larsgeorge
While running a simple key/value based solution on HBase usually requires an equally simple schema, it is less trivial to operate a different application that has to insert thousands of records per second. This talk will address the architectural challenges when designing for either read or write performance imposed by HBase. It will include examples of real world use-cases and how they
http://berlinbuzzwords.de/sessions/advanced-hbase-schema-design
Speaker: Jesse Anderson (Cloudera)
As optional pre-conference prep for attendees who are new to HBase, this talk will offer a brief Cliff's Notes-level overview of architecture, API, and schema design. The architecture section will cover the daemons and their functions; the API section will cover HBase's Get, Put, and Scan classes; and the schema design section will cover how HBase differs from an RDBMS and how much effort to place on schema and row-key design.
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera - Cloudera, Inc.
Apache HDFS, the file system on which HBase is most commonly deployed, was originally designed for high-latency high-throughput batch analytic systems like MapReduce. Over the past two to three years, the rising popularity of HBase has driven many enhancements in HDFS to improve its suitability for real-time systems, including durability support for write-ahead logs, high availability, and improved low-latency performance. This talk will give a brief history of some of the enhancements from Hadoop 0.20.2 through 0.23.0, discuss some of the most exciting work currently under way, and explore some of the future enhancements we expect to develop in the coming years. We will include both high-level overviews of the new features as well as practical tips and benchmark results from real deployments.
In this session you will learn:
What is Big Data?
What is Hadoop?
Overview of Hadoop Ecosystem
Hadoop Distributed File System or HDFS
Hadoop Cluster Modes
Yarn
MapReduce
Hive
Pig
Zookeeper
Flume
Sqoop
For more information, visit: https://www.mindsmapped.com/courses/big-data-hadoop/hadoop-developer-training-a-step-by-step-tutorial/
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce - Cloudera, Inc.
Most developers are familiar with the topic of “database design”. In the relational world, normalization is the name of the game. How do things change when you’re working with a scalable, distributed, non-SQL database like HBase? This talk will cover the basics of HBase schema design at a high level and give several common patterns and examples of real-world schemas to solve interesting problems. The storage and data access architecture of HBase (row keys, column families, etc.) will be explained, along with the pros and cons of different schema decisions.
This talk examines HBase client options available to application developers working with HBase. The focus is framed on, but not limited to, building webapps.
Hortonworks Technical Workshop: HBase For Mission Critical Applications - Hortonworks
HBase adoption continues to explode amid rapid customer success and unbridled innovation. HBase with its limitless scalability, high reliability and deep integration with Hadoop ecosystem tools, offers enterprise developers a rich platform on which to build their next generation applications. In this workshop we will explore HBase SQL capabilities, deep Hadoop ecosystem integrations and deployment & management best practices.
Sept 17 2013 - THUG - HBase a Technical Introduction - Adam Muise
HBase Technical Introduction. This deck includes a description of memory design, write path, read path, some operational tidbits, SQL on HBase (Phoenix and Hive), as well as HOYA (HBase on YARN).
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja) - tatsuya6502
This is the Japanese translation of the presentation at Tokyo HBase Meetup (July 1, 2011)
Author:
Jonathan Gray
Software Engineer / HBase Committer at Facebook
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database - Edureka!
NoSQL covers a wide range of database technologies that were developed in response to the surging volume of stored data. Relational databases struggle to cope with this volume and face agility challenges. This is where NoSQL databases come into play, and their features have made them popular. The session covers the following topics to help you choose the right NoSQL database:
Traditional databases
Challenges with traditional databases
CAP Theorem
NoSQL to the rescue
A BASE system
Choose the right NoSQL database
Apache Hive provides SQL-like access to your stored data in Apache Hadoop. Apache HBase stores tabular data in Hadoop and supports update operations. The combination of these two capabilities is often desired; however, the current integration shows limitations such as performance issues. In this talk, Enis Soztutar will present an overview of Hive and HBase and discuss new updates and improvements from the community on the integration of these two projects. Various techniques used to reduce data exchange and improve efficiency will also be presented.
If you've used a modern, interactive map such as Google or Bing Maps, you've consumed "map tiles". Map tiles are small images rendering a piece of the mosaic that is the whole map. Using conventional means, rendering tiles for the whole globe at multiple resolutions is a huge data processing effort. Even highly optimized, it spans a couple TBs and a few days of computation. Enter Hadoop. In this talk, I'll show you how to generate your own custom tiles using Hadoop. There will be pretty pictures.
We start by looking at distributed database features that impact latency. Then we take a deeper look at the HBase read and write paths with a focus on request latency. We examine the sources of latency and how to minimize them.
Design and Research of Hadoop Distributed Cluster Based on Raspberry - IJRESJOURNAL
ABSTRACT: To save cost, a Hadoop distributed cluster based on Raspberry Pi is designed for the storage and processing of massive data. This paper expounds the two core technologies in the Hadoop software framework: the HDFS distributed file system architecture and the MapReduce distributed processing mechanism. The construction method of the cluster is described in detail, and the Hadoop distributed cluster platform is successfully constructed on two Raspberry Pi boards. The technical knowledge about Hadoop is well understood in theory and practice.
These slides cover the very basics of Hadoop architecture, in particular HDFS. This was my presentation in the first Delhi Hadoop User Group (DHUG) meetup held at Gurgaon on 10th September 2011. Loved the positive feedback. I'll also upload a more elaborate version covering Hadoop mapreduce architecture as well soon. Most of the stuff covered in these slides can be found in Tom White's book as well (See the last slide)
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi... - Cognizant
A guide to using Apache Hadoop as your open source big data platform of choice, including the vendors that make various Hadoop flavors, related open source tools, Hadoop capabilities and suitable applications.
Hadoop Interview Questions and Answers by rohit kapa - kapa rohit
Hadoop Interview Questions and Answers - More than 130 real time questions and answers covering Hadoop HDFS, MapReduce, and administrative concepts, by rohit kapa
Data Storage and Management project Report - Tushar Dalvi
This paper evaluates the performance of random reads and random writes in HBase and Cassandra and compares the results obtained from various operations on Ubuntu.
Well-defined introduction about working with Big Data and introduction to the Hadoop Ecosystem.
Let me know if anything is required. Happy to help.
Ping me google #bobrupakroy.
Talk soon!
1. Apache HBase
For Architects
Nick Dimiduk, Hortonworks
Strata/Hadoop World Barcelona, 2014-11-21
Licensed under a Creative Commons
Attribution-ShareAlike 3.0 Unported License.
3. Agenda
• Background
– (how did we get here?)
• TL;DR
– (don’t waste my time!)
• High-level Architecture
– (where are we?)
• Anatomy of a RegionServer
– (how does this thing work?)
• By Example
– (how do I use it?)
• Resources
– (where do we go from here?)
4. Background
5. So what is HBase anyway?
• Bigtable paper from Google, 2006, Chang et al.
– “Bigtable is a sparse, distributed, persistent multi-dimensional sorted map.”
– http://research.google.com/archive/bigtable.html
• Key Features:
– Distributed storage across cluster of machines
– Random, online read and write data access
– Schemaless data model (“NoSQL”)
– Self-managed data partitions
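The Bigtable definition quoted on this slide can be made concrete with a toy model. The Python sketch below is not HBase code: the class name `SortedTable` and its methods are hypothetical, and it only illustrates a sparse map kept sorted by (row, column, timestamp), so that a point lookup finds the newest version of a cell and a scan walks a contiguous range of rows.

```python
from bisect import bisect_left, insort


class SortedTable:
    """Toy (row, column, timestamp) -> value map kept in sorted key order.

    Hypothetical illustration only; it is not the HBase client API.
    """

    def __init__(self):
        self._keys = []   # sorted list of (row, column, -timestamp)
        self._cells = {}  # key -> value; absent cells take no space (sparse)

    def put(self, row, column, timestamp, value):
        # Negate the timestamp so newer versions of a cell sort first.
        key = (row, column, -timestamp)
        if key not in self._cells:
            insort(self._keys, key)
        self._cells[key] = value

    def get(self, row, column):
        """Return the newest value stored for (row, column), or None."""
        i = bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan(self, start_row, stop_row):
        """Yield (row, column, timestamp, value) for start_row <= row < stop_row."""
        i = bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._cells[self._keys[i]]
            i += 1
```

Because the keys are sorted, rows that are adjacent in key order are adjacent in storage, which is why row-key design matters so much for scan performance.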
6. Apache Hadoop Dependencies
• Apache Hadoop Distributed Filesystem (HDFS)
– Distributed, fault-tolerant, throughput-optimized data storage
– The Google File System, 2003, Ghemawat et al.
– http://research.google.com/archive/gfs.html
• Apache ZooKeeper (ZK)
– Distributed, available, reliable coordination system
– The Chubby Lock Service …, 2006, Burrows
– http://research.google.com/archive/chubby.html
• Apache Hadoop MapReduce (MR)
– Distributed, fault-tolerant, batch-oriented data processing
– MapReduce: …, 2004, Dean and Ghemawat
– http://research.google.com/archive/mapreduce.html
7. TL;DR
8. So what is HBase anyway?
[Figure 2.1. Schematic picture of an LSM-tree of two components: a C1 tree on disk and a C0 tree in memory.]
Figure 2.1 reproduced from O’Neil, Patrick, et al. "The log-structured merge-tree (LSM-tree)." Acta Informatica 33.4 (1996): 351-385.
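The two-component structure in Figure 2.1 can be sketched in a few lines. This is a simplified illustration under stated assumptions: the class name `TwoComponentLSM` and the `c0_limit` parameter are hypothetical, the "disk" component is just a sorted in-process list, and a whole-component merge stands in for the rolling merge described in the LSM-tree paper.

```python
from bisect import bisect_left


class TwoComponentLSM:
    """Toy LSM-tree: writes land in C0 (memory) and are merged into C1 ("disk")."""

    def __init__(self, c0_limit=2):
        self.c0 = {}              # in-memory component (C0 tree)
        self.c1 = []              # sorted (key, value) pairs standing in for the C1 tree
        self.c0_limit = c0_limit  # hypothetical threshold that triggers a merge

    def put(self, key, value):
        self.c0[key] = value
        if len(self.c0) >= self.c0_limit:
            self._merge()

    def _merge(self):
        # Simplification: merge all of C0 into C1 at once; newer C0 entries win.
        merged = dict(self.c1)
        merged.update(self.c0)
        self.c1 = sorted(merged.items())
        self.c0 = {}

    def get(self, key):
        # Check the newest data (C0) first, then fall back to C1.
        if key in self.c0:
            return self.c0[key]
        i = bisect_left(self.c1, (key,))
        if i < len(self.c1) and self.c1[i][0] == key:
            return self.c1[i][1]
        return None
```

Writes never modify C1 in place: they are buffered in memory and later written out in sorted order, which is what makes the design friendly to sequential storage.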
9. So what is HBase anyway?
[Figure: LSM-tree components (C1 tree on disk, C0 tree in memory), extended with a cache on the memory side and mapped onto a DataNode (disk) and a RegionServer (memory).]
10. So what is HBase anyway?
[Figure 3.7: HBase RegionServer and HDFS DataNode processes collocated on each host. Each host is annotated with the LSM-tree components: C1 tree on disk, C0 tree in memory, plus a cache.]
11. So what is HBase anyway?
Book excerpt shown on the slide:
... store/access data on HDFS. The master process does the distribution of regions among RegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients as a single namespace, all RegionServers have access to the same persisted files in the file system and can therefore host any region (figure 3.8). By physically collocating DataNodes and RegionServers, you can use the data locality property; that is, RegionServers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In some HBase deployments, the MapReduce framework isn’t deployed at all if the workload is primarily random reads and writes. In other deployments, where the MapReduce processing is also a part of the workloads, TaskTrackers, DataNodes, and HBase RegionServers can run together.
Figure 3.6 A table consists of multiple smaller chunks called regions.
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
[Figure overlay: each DataNode/RegionServer host is annotated with LSM-tree components (C1 tree on disk, C0 tree in memory, cache).]
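The excerpt on this slide says that a table is split into regions, that the master distributes regions among RegionServers, and that any RegionServer can host any region. A minimal sketch of that idea follows. The function names, the server names, and the round-robin policy are all assumptions made for illustration; HBase's actual assignment and balancing logic is more involved.

```python
from bisect import bisect_right


def assign_round_robin(region_start_keys, servers):
    """Map each region (identified by its start key) to a server.

    Round-robin is an illustrative stand-in, not HBase's real balancer.
    """
    return {
        start: servers[i % len(servers)]
        for i, start in enumerate(region_start_keys)
    }


def locate(row_key, region_start_keys, assignment):
    """Return (region start key, server) for the region containing row_key.

    region_start_keys must be sorted and begin with "" so that the first
    region covers every key below the second region's start key.
    """
    i = bisect_right(region_start_keys, row_key) - 1
    start = region_start_keys[i]
    return start, assignment[start]
```

Because regions are contiguous, sorted row-key ranges, finding the region for a row is a binary search over the start keys; reassigning a region only changes the mapping, since the files themselves live in the shared HDFS namespace.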
12. So what is HBase anyway?
[Slide repeats the book excerpt on regions, RegionServers, and DataNode collocation, together with Figure 3.6 (A table consists of multiple smaller chunks called regions) and Figure 3.7 (HBase RegionServer and HDFS DataNode processes are typically collocated on the same host).]
HBase deployments, the MapReduce framework isn’t deployed at all if the workload is
primarily random reads and writes. In other deployments, where the MapReduce pro-cessing
HBase deployments, the MapReduce framework isn’t deployed at all if the workload is
C1 tree C0 tree
primarily Disk Memory
C1 tree C0 tree
random reads and writes. In other deployments, where the MapReduce pro-cessing
Disk Memory Cache
C1 tree C0 tree
can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In some
can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In some
can theoretically read and write to local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In some
is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-
is also Disk Memory
C1 tree C0 tree
a part of the workloads, TaskTrackers, DataNodes, and HBase Region-
C1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk Memory Cache
C1 tree C0 tree
is also a part of Disk C1 tree the Memory
C0 tree
workloads, TaskTrackers, DataNodes, and HBase Region-
DataNode RegionServer DataNode RegionServer C1 tree C0 tree
C1 tree C0 tree
Disk Memory
Disk Memory
C1 tree C0 tree
C1 tree C0 tree
Disk Memory Cache
Disk Memory Cache
C1 tree C0 tree
C1 tree C0 tree
Disk Memory
Disk Memory
C1 tree C0 tree
C1 tree C0 tree
Figure 3.7 HBase RegionServer and HDFS DataNode processes Licensed to Nick Dimiduk <ndimiduk@DataNode RegionServer DataNode RegionServer DataNode RegionServer
Disk Memory
DataNode RegionServer DataNode RegionServer DataNode RegionServer
C1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk Memory Cache
C1 tree C0 tree
Disk Memory
C1 tree C0 tree
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Disk Memory
Servers can run together.
HBase deployments, the MapReduce framework isn’t deployed at all if the workload is
primarily random reads and writes. In other deployments, where the MapReduce pro-cessing
HBase deployments, the MapReduce framework isn’t deployed at all if the workload is
C1 tree C0 tree
primarily Disk Memory
C1 tree C0 tree
random reads and writes. In other deployments, where the MapReduce pro-cessing
Disk Memory Cache
Servers can run together.
HBase deployments, the MapReduce framework isn’t deployed at all if the workload is
C1 tree C0 tree
C1 tree C0 tree
primarily Disk Memory
C1 tree C0 tree
random Disk Memory
C1 tree C0 tree
reads and writes. In other deployments, where the MapReduce pro-cessing
Disk Memory Cache
Disk Memory Cache
DataNode RegionServer DataNode RegionServer DataNode RegionServer
DataNode RegionServer DataNode RegionServer Disk Memory
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically Licensed to Nick Dimiduk <ndimiduk@gmail.com>
DataNode RegionServer DataNode RegionServer Disk Memory
Disk Memory
C1 Figure 3.7 HBase RegionServer and HDFS DataNode processes Licensed to Nick Dimiduk <ndimiduk@C1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk Memory Cache
C1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk Memory Cache
C1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk Memory Cache
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Figure C1 tree C0 tree
3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-
is also Disk Memory
C1 tree C0 tree
a part of the workloads, TaskTrackers, DataNodes, and HBase Region-
is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Disk Memory
Figure C1 tree C0 tree
3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on Disk Memory
C1 tree C0 tree
Licensed to Nick Dimiduk <ndimiduk@gmail.com>
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Disk Memory
C1 tree C0 tree
Disk Memory
C1 tree C0 tree
DataNode RegionServer DataNode RegionServer DataNode RegionServer
DataNode RegionServer DataNode RegionServer Disk Memory
C1 Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically C1 tree C0 tree
Disk Memory
C1 tree C0 tree
DataNode RegionServer DataNode RegionServer Disk Memory
C1 Figure 3.7 HBase RegionServer and HDFS DataNode processes tree C0 tree
Disk Memory
C1 tree C0 tree
Disk Memory
Servers can run together.
Servers can run together.
C1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk Memory Cache
Servers can run together.
C1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk Memory Cache
C1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk Memory Cache
C1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk Memory Cache
C1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk Memory Cache
C1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk Memory Cache
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Figure C1 tree C0 tree
3.7 HBase C1 RegionServer and HDFS DataNode processes are typically collocated on Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on Licensed to Nick Dimiduk <ndimiduk@gmail.com>
"2011-07-04" Disk Memory
C1 tree C0 tree
1368396302 "fourth of July"
tree C0 tree
Disk Memory
C1 tree C0 tree
tree C0 tree
Disk Memory
C1 tree C0 tree
tree C0 tree
Disk Memory
C1 tree C0 tree
DataNode RegionServer DataNode RegionServer DataNode RegionServer
DataNode RegionServer DataNode RegionServer DataNode RegionServer
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Disk Memory
C1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk Memory Cache
DataNode RegionServer DataNode RegionServer C1 tree C0 tree
C1 tree C0 tree
C1 tree C0 tree
Disk Memory
Disk Memory
Disk Memory
C1 tree C0 tree
C1 tree C0 tree
C1 tree C0 tree
Disk Memory Cache
Disk Memory Cache
Disk Memory Cache
C1 Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically C1 tree C0 tree
C1 Disk Memory
C1 tree C0 tree
Disk Memory
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on DataNode RegionServer DataNode RegionServer DataNode RegionServer
Disk Memory
Disk Memory
tree C0 tree
Disk Memory
C1 tree C0 tree
Disk Memory
tree C0 tree
Disk Memory
C1 tree C0 tree
Disk Memory
Licensed under a Creative Commons
Attribution-ShareAlike 3.0 Unported License.
14. Logical Data Model
Table A
rowkey  column family  column qualifier  timestamp   value
a       cf1            "bar"             1368394583  7
                                         1368394261  "hello"
                       "foo"             1368394583  22
                                         1368394925  13.6
                                         1368393847  "world"
        cf2            "2011-07-04"      1368396302  "fourth of July"
                       1.0001            1368387684  "almost the loneliest number"
b       cf2            "thumb"           1368387247  [3.6 kb png data]
Diagram labels: Rows, Column Families
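The table on this slide can be read as a nested map from rowkey to column family to qualifier to timestamped versions. The following is a minimal sketch of that reading in plain Python, not the HBase client API. The names `table_a`, `latest`, and `cells` are hypothetical, and the grouping of the versions under the "bar" and "foo" qualifiers is an assumption taken from the order of the extracted slide text.

```python
# Illustrative in-memory picture of Table A from the slide (not HBase itself).
# Assumption: 7 and "hello" are versions of "bar"; 22, 13.6, "world" of "foo".
table_a = {
    "a": {
        "cf1": {
            "bar": {1368394583: 7, 1368394261: "hello"},
            "foo": {1368394583: 22, 1368394925: 13.6, 1368393847: "world"},
        },
        "cf2": {
            "2011-07-04": {1368396302: "fourth of July"},
            1.0001: {1368387684: "almost the loneliest number"},
        },
    },
    "b": {"cf2": {"thumb": {1368387247: b"<3.6 kb png data>"}}},
}


def latest(table, rowkey, family, qualifier):
    """Return (timestamp, value) for the newest version of one cell."""
    versions = table[rowkey][family][qualifier]
    ts = max(versions)
    return ts, versions[ts]


def cells(table):
    """Yield flat (rowkey, family, qualifier, timestamp, value) tuples,
    newest version first within each qualifier."""
    for rowkey, families in table.items():
        for family, qualifiers in families.items():
            for qualifier, versions in qualifiers.items():
                for ts in sorted(versions, reverse=True):
                    yield rowkey, family, qualifier, ts, versions[ts]


print(latest(table_a, "a", "cf1", "foo"))  # (1368394925, 13.6)
print(sum(1 for _ in cells(table_a)))      # 8
```

Each flat tuple produced by `cells` corresponds to one line of the slide's table, which is why a row with many qualifiers and versions expands into many key-values.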
15. Logical Architecture
Table A
• Rows a through p, split across Region 1, Region 2, Region 3, Region 4
• Region Server 7
– Table A, Region 1
– Table A, Region 2
– Table G, Region 1070
– Table L, Region 25
• Region Server 86
– Table A, Region 3
– Table C, Region 30
– Table F, Region 160
– Table F, Region 776
• Region Server 367
– Table A, Region 4
– Table C, Region 17
– Table E, Region 52
– Table P, Region 1116
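Because rowkeys are sorted and each region covers a contiguous key range, finding the server for a rowkey amounts to a binary search over region start keys. The sketch below illustrates that idea only. The split points are an assumption (the slide shows rows a through p and four regions, but not the boundaries), and `locate` is a hypothetical helper rather than how the HBase client resolves regions.

```python
import bisect

# Assumed split points: each region covers [start_key, next_start_key).
# An even split of rows a..p is assumed because the slide does not show boundaries.
REGION_START_KEYS = ["a", "e", "i", "m"]
REGION_NAMES = ["Region 1", "Region 2", "Region 3", "Region 4"]

# Region-to-server assignment for Table A as listed on the slide.
ASSIGNMENT = {
    "Region 1": "Region Server 7",
    "Region 2": "Region Server 7",
    "Region 3": "Region Server 86",
    "Region 4": "Region Server 367",
}


def locate(rowkey):
    """Return (region, server) for a rowkey via binary search over start keys."""
    idx = bisect.bisect_right(REGION_START_KEYS, rowkey) - 1
    if idx < 0:
        raise KeyError(f"{rowkey!r} sorts before the first region")
    region = REGION_NAMES[idx]
    return region, ASSIGNMENT[region]


print(locate("b"))  # ('Region 1', 'Region Server 7')
print(locate("k"))  # ('Region 3', 'Region Server 86')
```

The same server also hosts regions of other tables (Table G, Table L, and so on), so a real assignment map would be keyed by table and region together.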
16. Physical Architecture
Figure: physical architecture diagram. Labels: HBase Client, ZooKeeper, Master, NameNode, RegionServer and DataNode pairs, HBase, HDFS.
17. User API
• {rowkey => {family => {qualifier => {version => value}}}}
– Think: nested TreeMap (Java), OrderedDictionary (C#), OrderedDict (Python)
• Basic data operations: GET, PUT, DELETE
• SCAN over range of key-values
– benefit of the sorted rowkey business
– this is how you implement any kind of "complex query" *
• GET, SCAN support Filters
– Push application logic to RegionServers
• INCREMENT, APPEND, CheckAnd{Put,Delete}
– Server-side, atomic data operations, can be contentious!
* This is also a foundational component in what we refer to as "schema design" in this "schemaless" database.
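The operations listed on this slide can be illustrated with a toy, single-process store that keeps rowkeys sorted. This is a sketch under stated assumptions, not the HBase client API: `SortedStore` and its method names are hypothetical, values are flattened to one per rowkey, and a lock stands in for server-side atomicity. Python's `OrderedDict` preserves insertion order rather than sort order, so the sketch keeps a sorted key list with `bisect` instead.

```python
import bisect
import threading


class SortedStore:
    """Toy stand-in for one table: rowkeys are kept in sorted order."""

    def __init__(self):
        self._keys = []   # sorted rowkeys
        self._rows = {}   # rowkey -> value
        self._lock = threading.Lock()

    def _insert_key(self, rowkey):
        # Caller must hold the lock.
        if rowkey not in self._rows:
            bisect.insort(self._keys, rowkey)

    def put(self, rowkey, value):
        with self._lock:
            self._insert_key(rowkey)
            self._rows[rowkey] = value

    def get(self, rowkey):
        return self._rows.get(rowkey)

    def delete(self, rowkey):
        with self._lock:
            if rowkey in self._rows:
                del self._rows[rowkey]
                self._keys.pop(bisect.bisect_left(self._keys, rowkey))

    def scan(self, start, stop, row_filter=None):
        """Yield (rowkey, value) for start <= rowkey < stop.
        The optional filter runs inside the store, next to the data."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            value = self._rows[key]
            if row_filter is None or row_filter(key, value):
                yield key, value

    def increment(self, rowkey, amount=1):
        """Atomically add to a counter and return the new value."""
        with self._lock:
            self._insert_key(rowkey)
            self._rows[rowkey] = self._rows.get(rowkey, 0) + amount
            return self._rows[rowkey]

    def check_and_put(self, rowkey, expected, value):
        """Write value only if the current value equals expected."""
        with self._lock:
            if self._rows.get(rowkey) != expected:
                return False
            self._insert_key(rowkey)
            self._rows[rowkey] = value
            return True


store = SortedStore()
for key, value in [("user#alice", 3), ("user#bob", 12),
                   ("video#1", 7), ("user#carol", 9)]:
    store.put(key, value)

# Prefix scan: '$' is the character after '#', so the range
# ["user#", "user$") covers every rowkey that starts with "user#".
print(list(store.scan("user#", "user$")))
print(list(store.scan("user#", "user$", row_filter=lambda k, v: v > 5)))
print(store.increment("user#alice"))             # 4
print(store.check_and_put("user#bob", 12, 13))   # True
print(store.check_and_put("user#bob", 12, 14))   # False
```

The prefix scan shows why rowkey layout matters: related rows are only cheap to read together if their keys sort next to each other. The single lock also hints at why atomic operations can be contentious when many writers target the same row.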
18. Anatomy of a RegionServer
19. So what is HBase anyway?
Figure: LSM-tree diagram with the C0 tree in memory, the C1 tree on disk, and a memory cache, alongside a DataNode and RegionServer pair.
26. Web-scale Database
Figure labels: App server (five instances)
Counters
Sessions
User profiles
Social Media
Application
Data
Workload chart labels: Latency, Throughput; Write, Read
27. Search Search
“BigIndex”
Figure labels: Document store; Search (three instances); App server (five instances)
Workload chart labels: Latency, Throughput; Write, Read
28. Materialized View
Figure labels: App server (five instances)
Workload chart labels: Latency, Throughput; Write, Read
29. ETL Assist
Workload chart labels: Latency, Throughput; Write, Read
30. Lambda Architecture
Figure labels: App server (five instances)
31. Resources
32. Join the Community!
• hbase.apache.org
– hbase.apache.org/book.html
– blogs.apache.org/hbase/
– hbase.apache.org/mail-lists.html
• IRC: irc.freenode.net #hbase
• JIRA: issues.apache.org/jira/browse/HBASE
• Source: git clone git://git.apache.org/hbase.git
• In person
– HBaseCon, hbasecon.com
– Hadoop Summit, hadoopsummit.org
– Strata / Hadoop World, strataconf.com
– Local meetup near you!
33. HBase 0.98/1.0
• Hardening
– Stability, Reliability, Availability, Performance
• Horizontal Scalability
– 1000s of machines
• Availability
– Speed of Recovery (MTTR), Region Replicas
• Improved Multi-tenancy
– RPC priorities/QoS, namespace management
• Cell-level security
• Semantic Versioning
• Client API cleanup
34. Thanks!
HBase in Action, by Nick Dimiduk and Amandeep Khurana, foreword by Michael Stack (Manning)
hbaseinaction.com
Nick Dimiduk
github.com/ndimiduk
@xefyr
n10k.com
strataeucftw
slideshare.net/xefyr/hbase-for-architects