Apache HBase - Just the Basics

NoSQL datastore built on top of HDFS
HBase is a column-family-oriented database
Based on the Google Bigtable paper
Uses HDFS for storage
Data can be retrieved quickly or batch processed with MapReduce
What Is Apache HBase?

Big data volumes (TB to PB)
High throughput
Variable columns
Random reads and writes
HBase Use Cases

HBase has a Java API
It is the only first-class citizen
There are other programmatic interfaces
A REST interface allows HTTP access
A Thrift gateway allows non-Java programmatic access
There are non-native SQL interfaces
Apache Phoenix, Impala, Presto, Hive
Accessing HBase

// Assumes the enclosing class extends org.apache.hadoop.conf.Configured,
// which provides setConf() and getConf()
setConf(HBaseConfiguration.create(getConf()));
Connection connection = null;
Table table = null;
try {
    // Define the table and column family for the data
    TableName TABLE_NAME = TableName.valueOf("hbasetable");
    byte[] CF = Bytes.toBytes("colfamily");
    // Connect to the table
    connection = ConnectionFactory.createConnection(getConf());
    table = connection.getTable(TABLE_NAME);
    // Create a put for the row key and add a column to it
    Put p = new Put(Bytes.toBytes("rowkey"));
    p.addColumn(CF, Bytes.toBytes("columnqual"), Bytes.toBytes(42.0d));
    // Write the new column to the row
    table.put(p);
} finally {
    // Close everything down
    if (table != null) {
        table.close();
    }
    if (connection != null) {
        connection.close();
    }
}
Puts

// Define the table and column family for the data
TableName TABLE_NAME = TableName.valueOf("hbasetable");
byte[] CF = Bytes.toBytes("colfamily");
// Connect to the table (connection and table are declared and closed
// as in the Puts example)
connection = ConnectionFactory.createConnection(getConf());
table = connection.getTable(TABLE_NAME);
// Create a get with the row key you want
Get g = new Get(Bytes.toBytes("rowkey"));
// Get the row and the bytes for the cell
Result result = table.get(g);
byte[] value = result.getValue(CF, Bytes.toBytes("columnqual"));
// Yes, your client will need to know the type of data in the cell
double doubleValue = Bytes.toDouble(value);
Gets
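
The note that the client must know the cell's type can be illustrated without a cluster. The following is a minimal sketch in plain Java, not HBase code: `CellTypeSketch` and its methods are hypothetical names, and it assumes the value was stored as 8 big-endian IEEE 754 bytes, which is what `Bytes.toBytes(double)` is understood to produce. The same bytes decode to the original value only when read back as the type that was written.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of the HBase API.
final class CellTypeSketch {

    // Encode a double as 8 big-endian bytes (assumed to mirror Bytes.toBytes(double)).
    static byte[] encodeDouble(double value) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(value).array();
    }

    // Decode the cell as the type that was actually written.
    static double decodeAsDouble(byte[] cell) {
        return ByteBuffer.wrap(cell).getDouble();
    }

    // Decode the same bytes as a long: legal, but the number is meaningless.
    static long decodeAsLong(byte[] cell) {
        return ByteBuffer.wrap(cell).getLong();
    }

    public static void main(String[] args) {
        byte[] cell = encodeDouble(42.0d);
        System.out.println("as double: " + decodeAsDouble(cell));
        System.out.println("as long:   " + decodeAsLong(cell));
    }
}
```

A cell is only a byte array, so nothing stops a reader from picking the wrong decoder. Keeping the type of each column qualifier documented alongside the table definition is one way to avoid that.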

Architecting for an RDBMS is about relationships or normalizing data
Architecting for HBase is about access patterns or denormalizing data
Questions to ask:
How is data being accessed?
What is the fastest way to read/write data?
What is the optimal way to organize data?
Differences With RDBMS

Treating HBase like a relational database will lead to abject failure
Abject Failure

Actual engineering goes into row key design
You only have one index or primary key
Getting this primary key right takes time and effort
You don't just use an auto-incrementing number
Multiple pieces of data are often in the row key
Row Keys
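
A row key that combines several pieces of data might be built as in the sketch below. This is one possible layout under stated assumptions, not a recommended or canonical design: the class `RowKeySketch`, the `BUCKETS` constant, the device ID and timestamp fields, the salt prefix, and the reversed timestamp are all hypothetical choices made for illustration. It also assumes non-negative IDs and timestamps, and that row keys are compared as unsigned bytes in lexicographic order.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical composite row key: [salt (1 byte)][deviceId (4 bytes)][reversed timestamp (8 bytes)]
final class RowKeySketch {

    // Hypothetical number of salt buckets used to spread writes across key ranges.
    static final int BUCKETS = 16;

    static byte[] rowKey(int deviceId, long timestampMillis) {
        // Salt derived from the device ID, so one device always lands in the same bucket.
        byte salt = (byte) Math.floorMod(Integer.hashCode(deviceId), BUCKETS);
        // Reversing the timestamp makes newer rows sort before older ones.
        long reversedTs = Long.MAX_VALUE - timestampMillis;
        return ByteBuffer.allocate(1 + Integer.BYTES + Long.BYTES)
                .put(salt)
                .putInt(deviceId)
                .putLong(reversedTs)
                .array();
    }

    public static void main(String[] args) {
        byte[] older = rowKey(7, 1_000L);
        byte[] newer = rowKey(7, 2_000L);
        // A negative comparison result means the newer row sorts first.
        System.out.println("newer sorts first: " + (Arrays.compareUnsigned(newer, older) < 0));
    }
}
```

Which fields go into the key, and in what order, should follow from how the data will be read: here a reader that knows the device ID can recompute the salt and find that device's most recent rows first.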

Table schemas require design and thought
The access pattern should be known ahead of time
General best practices:
Fewer, bigger (denormalized) tables
Spend more time designing up front
Use bulk loading for incremental or time series data
Schema Design

Jesse Anderson (Smoking Hand)
jesse@smokinghand.com
@jessetanderson
Conclusion
