HBase Lightning Talk

Apache HBase
Scott Leberknight

"Bigtable is a distributed storage
system for managing structured data
that is designed to scale to a very
large size: petabytes of data across
thousands of commodity
servers. Many projects at Google
store data in Bigtable including web
indexing, Google Earth, and Google
Finance."

- Bigtable: A Distributed Storage System
for Structured Data
http://labs.google.com/papers/bigtable.html

"A Bigtable is a sparse, distributed, persistent
multidimensional sorted map"

distributed

sparse

column-oriented

versioned

"The map is indexed by a row key,
column key, and a timestamp; each
value in the map is an uninterpreted array
of bytes."

(row key, column key, timestamp) => value
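One way to picture this mapping is as nested sorted maps. The sketch below is a hypothetical in-memory model, not the HBase or Bigtable implementation: the class name SortedCellMap and its methods are made up for illustration, and keys are plain strings for readability, whereas both systems treat keys as byte arrays.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of (row key, column key, timestamp) => value.
// Not HBase code: it only illustrates the sorted, sparse, versioned map.
class SortedCellMap {
    // row key -> column key -> timestamp (newest first) -> value bytes
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<>();

    void put(String rowKey, String columnKey, long timestamp, byte[] value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(columnKey, k -> new TreeMap<>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns up to maxVersions values for one cell, newest first.
    // A missing row or column yields an empty list (the map is sparse).
    List<byte[]> get(String rowKey, String columnKey, int maxVersions) {
        List<byte[]> result = new ArrayList<>();
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = rows.get(rowKey);
        if (columns == null) {
            return result;
        }
        NavigableMap<Long, byte[]> versions = columns.get(columnKey);
        if (versions == null) {
            return result;
        }
        for (byte[] value : versions.values()) {
            if (result.size() == maxVersions) {
                break;
            }
            result.add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        SortedCellMap blog = new SortedCellMap();
        // Two versions of the same cell; 2 and 6 stand in for real timestamps.
        blog.put("20120407145045", "info:author", 2L, "John".getBytes(StandardCharsets.UTF_8));
        blog.put("20120407145045", "info:author", 6L, "John Doe".getBytes(StandardCharsets.UTF_8));
        for (byte[] value : blog.get("20120407145045", "info:author", 3)) {
            System.out.println(new String(value, StandardCharsets.UTF_8));
        }
    }
}
```

Keeping the timestamps in descending order means the newest version of a cell is always the first one read.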

Key Concepts:
row key => 20120407152657

column family => "personal:"

column key => "personal:givenName",
"personal:surname"

timestamp => 1239124584398
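As a small illustration of how a column key breaks down, the sketch below splits a key at its first colon into a column family and a qualifier. ColumnKey is a hypothetical helper written for this example, not an HBase class; treating a key without a colon as a bare family name is an assumption of the sketch.

```java
// Hypothetical helper, not part of the HBase API: splits a column key
// such as "personal:givenName" into its family and qualifier.
final class ColumnKey {
    final String family;
    final String qualifier;

    private ColumnKey(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    static ColumnKey parse(String columnKey) {
        int separator = columnKey.indexOf(':');
        if (separator < 0) {
            // Assumption: no separator means the whole string names a family.
            return new ColumnKey(columnKey, "");
        }
        // Everything after the first colon is the qualifier (possibly empty).
        return new ColumnKey(columnKey.substring(0, separator),
                             columnKey.substring(separator + 1));
    }

    public static void main(String[] args) {
        ColumnKey key = ColumnKey.parse("personal:givenName");
        System.out.println(key.family + " / " + key.qualifier);
    }
}
```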

Row Key         Timestamp  Column Family "info:"               Column Family "content:"
20120407145045  t7         "info:summary" "An intro to..."
                t6         "info:author" "John Doe"
                t5                                             "Google's Bigtable is..."
                t4                                             "Google Bigtable is..."
                t3         "info:category" "Persistence"
                t2         "info:author" "John"
                t1         "info:title" "Intro to Bigtable"
20120320162535  t4         "info:category" "Persistence"
                t3                                             "CouchDB is..."
                t2         "info:author" "Bob Smith"
                t1         "info:title" "Doc-oriented..."

Get row 20120407145045...
Row Key         Timestamp  Column Family "info:"               Column Family "content:"
20120407145045  t7         "info:summary" "An intro to..."
                t6         "info:author" "John Doe"
                t5                                             "Google's Bigtable is..."
                t4                                             "Google Bigtable is..."
                t3         "info:category" "Persistence"
                t2         "info:author" "John"
                t1         "info:title" "Intro to Bigtable"
20120320162535  t4         "info:category" "Persistence"
                t3                                             "CouchDB is..."
                t2         "info:author" "Bob Smith"
                t1         "info:title" "Doc-oriented..."

Use HBase when you need random, realtime read/write
access to your Big Data. This project's goal is the
hosting of very large tables -- billions of rows X
millions of columns -- atop clusters of commodity
hardware. HBase is an open-source, distributed,
versioned, column-oriented store modeled after
Google's Bigtable.

- http://hbase.apache.org/

HBase Shell
hbase(main):001:0> create 'blog', 'info', 'content'
0 row(s) in 4.3640 seconds
hbase(main):002:0> put 'blog', '20120320162535', 'info:title', 'Document-oriented storage using CouchDB'
hbase(main):003:0> put 'blog', '20120320162535', 'info:author', 'Bob Smith'
hbase(main):004:0> put 'blog', '20120320162535', 'content:', 'CouchDB is a document-oriented...'
hbase(main):005:0> put 'blog', '20120320162535', 'info:category', 'Persistence'
hbase(main):006:0> get 'blog', '20120320162535'
COLUMN          CELL
content:        timestamp=1239135042862, value=CouchDB is a doc...
info:author     timestamp=1239135042755, value=Bob Smith
info:category   timestamp=1239135042982, value=Persistence
info:title      timestamp=1239135042623, value=Document-oriented...

HBase Shell

hbase(main):015:0> get 'blog', '20120407145045', {COLUMN=>'info:author', VERSIONS=>3 }
timestamp=1239135325074, value=John Doe
timestamp=1239135324741, value=John
hbase(main):016:0> scan 'blog', { STARTROW => '20120300', STOPROW => '20120400' }
ROW COLUMN+CELL
20120320162535 column=content:, timestamp=1239135042862, value=CouchDB is...
20120320162535 column=info:author, timestamp=1239135042755, value=Bob Smith
20120320162535 column=info:category, timestamp=1239135042982, value=Persistence
20120320162535 column=info:title, timestamp=1239135042623, value=Document...
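The scan above returns only the March row because rows are kept sorted by key, and a scan covers the range from STARTROW (inclusive) up to STOPROW (exclusive). The sketch below models that behavior with a java.util.TreeMap; RowRangeScan is a hypothetical name and the map stands in for a table, so this is an illustration rather than how HBase implements scans.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of a row-key range scan over a sorted table.
class RowRangeScan {
    // Returns the row keys in [startRow, stopRow): start inclusive, stop exclusive.
    static List<String> scan(NavigableMap<String, String> table,
                             String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        // Row key -> title, using the two blog rows from the slides.
        NavigableMap<String, String> blog = new TreeMap<>();
        blog.put("20120407145045", "Intro to Bigtable");
        blog.put("20120320162535", "Document-oriented storage using CouchDB");

        // Same bounds as the shell example: only the March 2012 row matches.
        System.out.println(scan(blog, "20120300", "20120400"));
    }
}
```

Because the row keys are fixed-width timestamps, a string prefix such as 201203 is enough to bound a whole month.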

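import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Create a new table with three column families
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);

String tableName = "people";
HTableDescriptor desc = new HTableDescriptor(tableName);
desc.addFamily(new HColumnDescriptor("personal"));
desc.addFamily(new HColumnDescriptor("contactinfo"));
desc.addFamily(new HColumnDescriptor("creditcard"));
admin.createTable(desc);

System.out.printf("%s is available? %b\n",
    tableName, admin.isTableAvailable(tableName));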

import static org.apache.hadoop.hbase.util.Bytes.toBytes;

// Add some data into 'people' table
HTable table = new HTable(conf, "people");
Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("givenName"),
    toBytes("John"));
put.add(toBytes("personal"), toBytes("mi"), toBytes("M"));
put.add(toBytes("personal"), toBytes("surname"),
    toBytes("Connor"));
put.add(toBytes("contactinfo"), toBytes("email"),
    toBytes("john.connor@gmail.com"));
table.put(put);
table.flushCommits();
table.close();

Finding data:

get (by row key)

scan (by row key ranges, filtering)

// Get a row. Ask for only the data you need.
HTable table = new HTable(conf, "people");
Get get = new Get(toBytes("connor-john-m-43299"));
get.setMaxVersions(2);
get.addFamily(toBytes("personal"));
get.addColumn(toBytes("contactinfo"), toBytes("email"));
Result result = table.get(get);

// Update existing values, and add a new one
Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("surname"),
toBytes("Smith"));
put.add(toBytes("contactinfo"), toBytes("email"),
toBytes("john.m.smith@gmail.com"));
put.add(toBytes("contactinfo"), toBytes("address"),
toBytes("San Diego, CA"));
table.put(put);
table.flushCommits();
table.close();

// Scan rows, starting at row key "smith-"
Scan scan = new Scan(toBytes("smith-"));
scan.addColumn(toBytes("personal"), toBytes("givenName"));
scan.addColumn(toBytes("contactinfo"), toBytes("email"));
scan.addColumn(toBytes("contactinfo"), toBytes("address"));
scan.setFilter(new PageFilter(numRowsPerPage));
ResultScanner scanner = table.getScanner(scan);
try {
    for (Result result : scanner) {
        // process result...
    }
} finally {
    scanner.close(); // release the scanner's resources
}
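The PageFilter in the scan above caps the rows returned for one page. To request the following page, one commonly used approach (an assumption here, the slides do not show it) is to start the next scan at the smallest row key that sorts after the last row already seen. ScanPaging and nextStartRow are hypothetical names; with the HBase client, the returned bytes would be passed to new Scan(...) as the start row.

```java
import java.util.Arrays;

// Hypothetical paging helper, not an HBase class.
class ScanPaging {
    // Smallest row key that sorts strictly after lastRow in lexicographic
    // byte order: lastRow followed by a single 0x00 byte.
    static byte[] nextStartRow(byte[] lastRow) {
        // Arrays.copyOf pads the extra position with zero.
        return Arrays.copyOf(lastRow, lastRow.length + 1);
    }

    public static void main(String[] args) {
        byte[] lastRow = "smith-john-m-43299".getBytes();
        byte[] next = nextStartRow(lastRow);
        System.out.println(Arrays.toString(next));
    }
}
```

Since the start row of a scan is inclusive, appending the zero byte skips the last row of the previous page without skipping any row after it.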

Data Modeling

Row key design

MATCH TO DATA ACCESS PATTERNS

WIDE VS. NARROW ROWS
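As a sketch of matching row keys to an access pattern: the blog example uses fixed-width timestamp keys such as 20120407152657, which sort chronologically. If the usual read were "newest posts first", one option is to reverse the ordering in the key itself. RowKeys and both methods are hypothetical, and the reversed-key variant is an illustrative assumption rather than something taken from the talk.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical row-key helpers for time-ordered data (not HBase API).
class RowKeys {
    private static final DateTimeFormatter KEY_FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Oldest first: fixed-width timestamp keys sort chronologically.
    static String chronological(LocalDateTime time) {
        return time.format(KEY_FORMAT);
    }

    // Newest first: subtracting from Long.MAX_VALUE reverses the sort order.
    // Assumes times at or after 1970-01-01 UTC so the subtraction cannot overflow.
    static String newestFirst(LocalDateTime time) {
        long millis = time.toInstant(ZoneOffset.UTC).toEpochMilli();
        return String.format("%019d", Long.MAX_VALUE - millis);
    }

    public static void main(String[] args) {
        LocalDateTime march = LocalDateTime.of(2012, 3, 20, 16, 25, 35);
        LocalDateTime april = LocalDateTime.of(2012, 4, 7, 15, 26, 57);

        System.out.println(chronological(april));
        // Negative result: the March key sorts before the April key.
        System.out.println(chronological(march).compareTo(chronological(april)));
        // Positive result: with reversed keys the April row sorts first.
        System.out.println(newestFirst(march).compareTo(newestFirst(april)));
    }
}
```

Either way the key is fixed-width, so string order and time order stay in step and a scan over a key range returns rows in the order the application wants to read them.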

References

http://shop.oreilly.com/product/0636920014348.do

http://shop.oreilly.com/product/0636920021773.do
(3rd edition pub date is May 29, 2012)

http://hbase.apache.org/

(my info)

scott.leberknight at nearinfinity.com
www.nearinfinity.com/blogs/
twitter: sleberknight
