Advance HBase and Zookeeper - Module 8

Example: Mail Inbox
<userId> : <colfam> : <messageId> : <timestamp> : <email-message>
12345 : data : 5fc38314-e290-ae5da5fc375d : 1307097848 : "Hi Lars, ..."
12345 : data : 725aae5f-d72e-f90f3f070419 : 1307099848 : "Welcome, and ..."
12345 : data : cc6775b3-f249-c6dd2b1a7467 : 1307101848 : "To Whom It ..."
12345 : data : dcbee495-6d5e-6ed48124632c : 1307103848 : "Hi, how are ..."
OR
12345-5fc38314-e290-ae5da5fc375d : data : : 1307097848 : "Hi Lars, ..."
12345-725aae5f-d72e-f90f3f070419 : data : : 1307099848 : "Welcome, and ..."
12345-cc6775b3-f249-c6dd2b1a7467 : data : : 1307101848 : "To Whom It ..."
12345-dcbee495-6d5e-6ed48124632c : data : : 1307103848 : "Hi, how are ..."
Both layouts have the same storage requirements.
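
The two layouts above can be compared with a small sketch. This is a toy in-memory model, not HBase client code: the helper names (`wide_cells`, `tall_cells`, `stored_key_bytes`) are hypothetical, and the byte count assumes only that each stored cell carries its own row key, column family, and qualifier.

```python
# Toy model of the two mail-inbox layouts; all names here are illustrative.
messages = [
    ("12345", "5fc38314-e290-ae5da5fc375d", 1307097848, "Hi Lars, ..."),
    ("12345", "725aae5f-d72e-f90f3f070419", 1307099848, "Welcome, and ..."),
]

def wide_cells(msgs):
    """One row per user; the message ID is the column qualifier."""
    return [(user, "data", msg_id, ts, body) for user, msg_id, ts, body in msgs]

def tall_cells(msgs):
    """One row per message; the row key is <userId>-<messageId>, qualifier empty."""
    return [(f"{user}-{msg_id}", "data", "", ts, body)
            for user, msg_id, ts, body in msgs]

def stored_key_bytes(cell):
    """Bytes spent on row key + family + qualifier for one cell (assumed model)."""
    row, family, qualifier, _ts, _body = cell
    return len(row) + len(family) + len(qualifier)

wide = wide_cells(messages)
tall = tall_cells(messages)

# The message ID moves from the qualifier into the row key, so the per-cell
# key size differs only by the "-" separator in this accounting.
extra = [stored_key_bytes(t) - stored_key_bytes(w) for w, t in zip(wide, tall)]

# In the tall layout, one user's inbox is a contiguous range of row keys.
inbox = [cell for cell in tall if cell[0].startswith("12345-")]
```

In this model the choice between the layouts is about access pattern (one wide row per user versus a prefix scan over many narrow rows) rather than about disk space.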

Secondary Indexes
Although HBase has no native support for secondary indexes, there are
use cases that need them. The usual requirement is that a client can look
up a cell not only by its primary coordinates (the row key, column
family name, and qualifier) but also by an alternative coordinate. In
addition, it should be possible to scan a range of rows from the main
table, ordered by the secondary index.
• Client-managed
• Indexed-Transactional HBase
• Indexed HBase
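
As an illustration of the client-managed option, the sketch below keeps a main table and a separate index table in sync from the client side. It is an in-memory model only: `ClientManagedIndex` and its methods are hypothetical names, not part of any HBase API.

```python
import bisect

class ClientManagedIndex:
    """Toy model: a main table plus an index table maintained by the client."""

    def __init__(self, indexed_qualifier):
        self.indexed_qualifier = indexed_qualifier
        self.main = {}    # row key -> {qualifier: value}
        self.index = []   # sorted list of (indexed value, row key)

    def put(self, row_key, columns):
        # Two separate writes. In a real cluster these would not be atomic,
        # so the client has to cope with a failure between them.
        old = self.main.get(row_key)
        if old is not None and self.indexed_qualifier in old:
            self.index.remove((old[self.indexed_qualifier], row_key))
        self.main[row_key] = dict(columns)
        if self.indexed_qualifier in columns:
            bisect.insort(self.index, (columns[self.indexed_qualifier], row_key))

    def scan_by_index(self, start, stop):
        """Return main-table rows ordered by the indexed value in [start, stop)."""
        lo = bisect.bisect_left(self.index, (start,))
        hi = bisect.bisect_left(self.index, (stop,))
        return [(row_key, self.main[row_key]) for _, row_key in self.index[lo:hi]]

idx = ClientManagedIndex("sender")
idx.put("12345-aaa", {"sender": "lars@example.com", "subject": "Hi"})
idx.put("12345-bbb", {"sender": "anna@example.com", "subject": "Welcome"})
idx.put("67890-ccc", {"sender": "bob@example.com", "subject": "Hello"})
by_sender = [row_key for row_key, _ in idx.scan_by_index("a", "c")]
```

The index table is simply another sorted table whose key is the secondary value, which is why a range scan over it returns main-table rows in secondary-index order.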

Coprocessors
• Think of this as a small MapReduce framework that distributes
work across the entire cluster.
• A coprocessor lets you run arbitrary code directly on each
region server.
• It executes the code on a per-region basis, giving trigger-like
functionality.
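
The per-region execution model can be pictured with a small simulation. This is not the HBase coprocessor API: `Region`, `Table`, and the observer callback are hypothetical names used only to show a trigger-like hook that runs beside each region, and a row count computed per region and summed by the caller.

```python
class Region:
    """Toy region holding rows in the key range [start_key, end_key)."""

    def __init__(self, start_key, end_key, observers=()):
        self.start_key = start_key
        self.end_key = end_key          # None means "no upper bound"
        self.rows = {}
        self.observers = list(observers)

    def contains(self, row_key):
        return self.start_key <= row_key and (
            self.end_key is None or row_key < self.end_key)

    def put(self, row_key, value):
        # Trigger-like hook: every observer runs before the write is applied.
        for observer in self.observers:
            value = observer(row_key, value)
        self.rows[row_key] = value

    def count_rows(self):
        # Work done locally, next to the data, for this region only.
        return len(self.rows)

class Table:
    def __init__(self, regions):
        self.regions = regions

    def put(self, row_key, value):
        region = next(r for r in self.regions if r.contains(row_key))
        region.put(row_key, value)

    def row_count(self):
        # Each region computes a partial result; the caller only combines them.
        return sum(r.count_rows() for r in self.regions)

audit = []

def audit_observer(row_key, value):
    audit.append(row_key)   # side effect, as a database trigger might do
    return value

table = Table([Region("", "m", [audit_observer]),
               Region("m", None, [audit_observer])])
for key, value in [("apple", "a"), ("melon", "b"), ("zebra", "c")]:
    table.put(key, value)
```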

Zookeeper
• An open source server that reliably coordinates distributed
processes.
• Apache ZooKeeper provides operational services for a Hadoop
cluster.
• ZooKeeper provides a distributed configuration service, a
synchronization service and a naming registry for distributed
systems.
• Distributed applications use ZooKeeper to store and mediate
updates to important configuration information.

Zookeeper Service : Data Model
• Znode
– An in-memory data node in the ZooKeeper data tree
– Znodes form a hierarchical namespace
– UNIX-like notation for paths
• Types of Znode
– Persistent
– Ephemeral
• Flags of Znode
– Sequential numbers
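
The data model above can be sketched as a small in-memory tree. `ZNodeTree` is a hypothetical class written for illustration, not the ZooKeeper client API. It assumes the usual behavior: an ephemeral znode disappears when its owning session ends and cannot have children, and the sequential flag appends a zero-padded counter maintained by the parent.

```python
class ZNodeTree:
    """Toy znode namespace with persistent, ephemeral and sequential nodes."""

    def __init__(self):
        self.nodes = {"/": {"data": b"", "owner": None}}
        self.counters = {}   # parent path -> next sequence number

    def create(self, path, data=b"", ephemeral_owner=None, sequential=False):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if self.nodes[parent]["owner"] is not None:
            raise ValueError("ephemeral znodes cannot have children")
        if sequential:
            n = self.counters.get(parent, 0)
            self.counters[parent] = n + 1
            path = f"{path}{n:010d}"   # counter appended to the requested name
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = {"data": data, "owner": ephemeral_owner}
        return path

    def close_session(self, session_id):
        # Ephemeral znodes are removed when the session that created them ends.
        for path in [p for p, n in self.nodes.items() if n["owner"] == session_id]:
            del self.nodes[path]

    def children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.nodes
                      if p != path and p.startswith(prefix)
                      and "/" not in p[len(prefix):])

tree = ZNodeTree()
tree.create("/locks")                                   # persistent
first = tree.create("/locks/lock-", ephemeral_owner="session-1", sequential=True)
second = tree.create("/locks/lock-", ephemeral_owner="session-2", sequential=True)
tree.close_session("session-1")
remaining = tree.children("/locks")
```

Combining the ephemeral type with the sequential flag, as in this usage, is the basis of common coordination recipes such as locks and leader election.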

The ZooKeeper service can run in two modes.
• In standalone mode, there is a single ZooKeeper server, which is
useful for testing due to its simplicity (it can even be embedded in
unit tests), but provides no guarantees of high-availability or
resilience.
• In production, ZooKeeper runs in replicated mode, on a cluster of
machines called an ensemble. ZooKeeper achieves high-availability
through replication, and can provide a service as long as a majority of
the machines in the ensemble are up.
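
The majority rule translates into simple arithmetic. The function names below are illustrative, not part of ZooKeeper.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)

def is_available(ensemble_size, servers_up):
    return servers_up >= quorum_size(ensemble_size)

summary = {n: (quorum_size(n), tolerated_failures(n)) for n in (1, 3, 4, 5, 6)}
```

Here a five-server ensemble tolerates two failures, and so does a six-server one, which is why ensembles are usually given an odd number of machines.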
Zookeeper Service: Implementation

Zookeeper Service: Consistency

Zookeeper Service: Sessions
• A ZooKeeper client is configured with the list of servers in the ensemble.
On startup, it tries to connect to one of the servers in the list.
• Once a connection has been made with a ZooKeeper server, the server
creates a new session for the client.
• Sessions are kept alive by the client sending ping requests (also known as
heartbeats) whenever the session is idle for longer than a certain period.
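
The keep-alive behavior can be sketched with an explicit clock. `Session` and `Client` are hypothetical classes, and the timeout and ping interval are assumed values chosen for the example. A real client derives its ping schedule from the negotiated session timeout.

```python
class Session:
    """Server-side view: the session expires if nothing is heard in time."""

    def __init__(self, timeout, now=0.0):
        self.timeout = timeout
        self.last_heard = now

    def touch(self, now):
        self.last_heard = now   # any request or ping refreshes the session

    def expired(self, now):
        return now - self.last_heard > self.timeout

class Client:
    """Client-side view: send a ping once the connection has been idle."""

    def __init__(self, session, ping_interval, now=0.0):
        self.session = session
        self.ping_interval = ping_interval
        self.last_sent = now

    def tick(self, now):
        """Return True if a ping was sent at time `now`."""
        if now - self.last_sent >= self.ping_interval:
            self.session.touch(now)
            self.last_sent = now
            return True
        return False

session = Session(timeout=6.0)
client = Client(session, ping_interval=2.0)
pings = [t for t in range(1, 11) if client.tick(float(t))]
```

Once the client stops ticking (for example because it crashed), no further pings arrive and the session expires after the timeout has elapsed.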
