HBase Lightning Talk

Slides for a lightning talk on HBase that I gave at the Near Infinity (www.nearinfinity.com) spring 2012 conference.


The associated sample code is on GitHub at https://github.com/sleberknight/basic-hbase-examples

Slide 1: Apache HBase, Scott Leberknight

Slide 2: Background

Slide 3: Google Bigtable

Slide 4: "Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable including web indexing, Google Earth, and Google Finance."
    - Bigtable: A Distributed Storage System for Structured Data, http://labs.google.com/papers/bigtable.html

Slide 5: "A Bigtable is a sparse, distributed, persistent multidimensional sorted map"
    - Bigtable: A Distributed Storage System for Structured Data, http://labs.google.com/papers/bigtable.html

Slide 6: wtf?

Slide 7: distributed, sparse, column-oriented, versioned

Slide 8: "The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes."
    - Bigtable: A Distributed Storage System for Structured Data, http://labs.google.com/papers/bigtable.html

    (row key, column key, timestamp) => value

Slide 9: Key concepts
    row key       => 20120407152657
    column family => "personal:"
    column key    => "personal:givenName", "personal:surname"
    timestamp     => 1239124584398
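
The "(row key, column key, timestamp) => value" definition can be made concrete with a small, in-memory Java sketch. It is a toy model, not the HBase client API: the class SortedCellMap and its methods are hypothetical, row and column keys sort as Java strings rather than raw bytes, and timestamps are kept newest-first as an assumption that mirrors how HBase returns versions. Cells that were never written take no space, which is what "sparse" means here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory model of the Bigtable/HBase data model (not the HBase API):
// (row key, column key, timestamp) => uninterpreted byte[] value.
class SortedCellMap {
    // row key -> column key ("family:qualifier") -> timestamp -> value.
    // Rows and columns sort ascending; timestamps sort descending (newest first).
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<>();

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // The column family is the part of the column key before the first ':'.
    static String family(String columnKey) {
        int colon = columnKey.indexOf(':');
        return colon < 0 ? columnKey : columnKey.substring(0, colon);
    }

    void put(String row, String column, long timestamp, byte[] value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, NavigableMap<Long, byte[]>>())
            .computeIfAbsent(column, c -> new TreeMap<Long, byte[]>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Exact lookup on all three coordinates; a cell that was never written returns null.
    byte[] get(String row, String column, long timestamp) {
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, byte[]> versions = columns.get(column);
        return versions == null ? null : versions.get(timestamp);
    }

    // Every stored cell coordinate as "row/column/timestamp", in the map's sort order.
    List<String> cellKeys() {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, NavigableMap<String, NavigableMap<Long, byte[]>>> row : rows.entrySet()) {
            for (Map.Entry<String, NavigableMap<Long, byte[]>> col : row.getValue().entrySet()) {
                for (Long ts : col.getValue().keySet()) {
                    keys.add(row.getKey() + "/" + col.getKey() + "/" + ts);
                }
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        SortedCellMap map = new SortedCellMap();
        map.put("20120407152657", "personal:surname", 1239124584398L, bytes("Connor"));
        map.put("20120407152657", "personal:givenName", 1239124584398L, bytes("John"));
        map.cellKeys().forEach(System.out::println);
        System.out.println(family("personal:givenName"));
    }
}
```

Printing cellKeys() shows one sort order across all three coordinates: row key first, then column key, then timestamp from newest to oldest.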

Slide 10:

    Row Key          Timestamp   Column Family "info:"               Column Family "content:"
    20120407145045   t7          "info:summary" "An intro to..."
                     t6          "info:author" "John Doe"
                     t5                                              "Google's Bigtable is..."
                     t4                                              "Google Bigtable is..."
                     t3          "info:category" "Persistence"
                     t2          "info:author" "John"
                     t1          "info:title" "Intro to Bigtable"
    20120320162535   t4          "info:category" "Persistence"
                     t3                                              "CouchDB is..."
                     t2          "info:author" "Bob Smith"
                     t1          "info:title" "Doc-oriented..."

Slide 11: Get row 20120407145045... (the slide repeats the table from slide 10)
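
A plain get on a row returns the most recent version of each column, which is why "info:author" reads "John Doe" (t6) rather than "John" (t2), and why the content: column shows the t5 text. The toy Java sketch below models that one row as column -> timestamp -> value. It is not the HBase API: the class GetRowSketch is hypothetical, t1..t7 are stored as the numbers 1..7, and values are Strings instead of byte[] for readability.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model (not the HBase API): one row stored as column -> (timestamp -> value),
// used to show what a plain "get row" returns: the newest version of every column.
class GetRowSketch {
    static Map<String, String> latestPerColumn(Map<String, NavigableMap<Long, String>> row) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, NavigableMap<Long, String>> column : row.entrySet()) {
            // Timestamps sort ascending here, so lastEntry() is the newest version.
            result.put(column.getKey(), column.getValue().lastEntry().getValue());
        }
        return result;
    }

    // Row 20120407145045 from the example table, with t1..t7 stored as 1..7.
    static Map<String, NavigableMap<Long, String>> exampleRow() {
        Map<String, NavigableMap<Long, String>> row = new TreeMap<>();
        addVersion(row, "info:title", 1, "Intro to Bigtable");
        addVersion(row, "info:author", 2, "John");
        addVersion(row, "info:category", 3, "Persistence");
        addVersion(row, "content:", 4, "Google Bigtable is...");
        addVersion(row, "content:", 5, "Google's Bigtable is...");
        addVersion(row, "info:author", 6, "John Doe");
        addVersion(row, "info:summary", 7, "An intro to...");
        return row;
    }

    private static void addVersion(Map<String, NavigableMap<Long, String>> row,
                                   String column, long timestamp, String value) {
        row.computeIfAbsent(column, c -> new TreeMap<Long, String>()).put(timestamp, value);
    }

    public static void main(String[] args) {
        latestPerColumn(exampleRow())
                .forEach((column, value) -> System.out.println(column + " = " + value));
    }
}
```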

Slide 12: "Use HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable."
    - http://hbase.apache.org/

Slide 13: HBase Shell

    hbase(main):001:0> create 'blog', 'info', 'content'
    0 row(s) in 4.3640 seconds

    hbase(main):002:0> put 'blog', '20120320162535', 'info:title', 'Document-oriented storage using CouchDB'
    0 row(s) in 0.0330 seconds

    hbase(main):003:0> put 'blog', '20120320162535', 'info:author', 'Bob Smith'
    0 row(s) in 0.0030 seconds

    hbase(main):004:0> put 'blog', '20120320162535', 'content:', 'CouchDB is a document-oriented...'
    0 row(s) in 0.0030 seconds

    hbase(main):005:0> put 'blog', '20120320162535', 'info:category', 'Persistence'
    0 row(s) in 0.0030 seconds

    hbase(main):006:0> get 'blog', '20120320162535'
    COLUMN               CELL
     content:            timestamp=1239135042862, value=CouchDB is a doc...
     info:author         timestamp=1239135042755, value=Bob Smith
     info:category       timestamp=1239135042982, value=Persistence
     info:title          timestamp=1239135042623, value=Document-oriented...
    4 row(s) in 0.0140 seconds

Slide 14: HBase Shell

    hbase(main):015:0> get 'blog', '20120407145045', {COLUMN => 'info:author', VERSIONS => 3}
    timestamp=1239135325074, value=John Doe
    timestamp=1239135324741, value=John
    2 row(s) in 0.0060 seconds

    hbase(main):016:0> scan 'blog', {STARTROW => '20120300', STOPROW => '20120400'}
    ROW                  COLUMN+CELL
     20120320162535      column=content:, timestamp=1239135042862, value=CouchDB is...
     20120320162535      column=info:author, timestamp=1239135042755, value=Bob Smith
     20120320162535      column=info:category, timestamp=1239135042982, value=Persistence
     20120320162535      column=info:title, timestamp=1239135042623, value=Document...
    4 row(s) in 0.0230 seconds
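
The scan above works because the row keys appear to be yyyyMMddHHmmss strings, so their byte order is also chronological order, and because the start row is inclusive while the stop row is exclusive. Neither bound has to be an existing row: "20120400" simply sorts after every March 2012 key. A toy Java sketch of that range selection over a TreeMap (the class RowRangeScan is hypothetical and is not the HBase API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model (not the HBase API) of a row-range scan over sorted row keys.
class RowRangeScan {
    // Start row inclusive, stop row exclusive, as with STARTROW/STOPROW in the shell.
    static List<String> scan(NavigableMap<String, String> table, String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> blog = new TreeMap<>();
        blog.put("20120320162535", "Document-oriented storage using CouchDB");
        blog.put("20120407145045", "Intro to Bigtable");
        // Every key starting with "201203" sorts between the two bounds.
        System.out.println(scan(blog, "20120300", "20120400"));
    }
}
```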

Slide 15: Got byte[]?
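
In HBase, row keys, column names, and values are all byte arrays, and rows are sorted by comparing those arrays byte by byte as unsigned values. How a value is turned into bytes therefore decides how it sorts. The sketch below uses only the JDK: Arrays.compareUnsigned (Java 9+) stands in for HBase's own byte comparator, and longBytes reproduces the big-endian 8-byte layout that Bytes.toBytes(long) produces. The class name ByteOrderSketch is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of key ordering as unsigned, lexicographic byte[] comparison.
class ByteOrderSketch {
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Big-endian 8-byte encoding of a long.
    static byte[] longBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Compares byte by byte, treating each byte as 0..255; a shorter prefix sorts first.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        // As strings, "10" sorts before "9" because '1' < '9'.
        System.out.println(compare(utf8("10"), utf8("9")) < 0);
        // As big-endian longs, 9 sorts before 10 (for non-negative values).
        System.out.println(compare(longBytes(9), longBytes(10)) < 0);
        // A negative long has its sign bit set, so it sorts after every non-negative one.
        System.out.println(compare(longBytes(-1), longBytes(10)) > 0);
    }
}
```

The last case is a common surprise: numeric keys that can be negative need a different encoding if they must sort numerically.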
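
Slide 16:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    // Create a new table with three column families
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    String tableName = "people";
    HTableDescriptor desc = new HTableDescriptor(tableName);
    desc.addFamily(new HColumnDescriptor("personal"));
    desc.addFamily(new HColumnDescriptor("contactinfo"));
    desc.addFamily(new HColumnDescriptor("creditcard"));
    admin.createTable(desc);
    System.out.printf("%s is available? %b%n", tableName, admin.isTableAvailable(tableName));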

Slide 17:

    import static org.apache.hadoop.hbase.util.Bytes.toBytes;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;

    // Add some data into the people table
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "people");
    Put put = new Put(toBytes("connor-john-m-43299"));  // row key
    put.add(toBytes("personal"), toBytes("givenName"), toBytes("John"));
    put.add(toBytes("personal"), toBytes("mi"), toBytes("M"));
    put.add(toBytes("personal"), toBytes("surname"), toBytes("Connor"));
    put.add(toBytes("contactinfo"), toBytes("email"), toBytes("john.connor@gmail.com"));
    table.put(put);
    table.flushCommits();
    table.close();
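
Slide 16 fixes the column families when the table is created, while the Put on slide 17 chooses qualifiers such as "givenName" and "mi" on the fly, with no schema change. A toy Java model of that rule follows. The class FamilySchemaSketch is hypothetical and not the HBase API; real HBase rejects a write to an undeclared family with a NoSuchColumnFamilyException rather than the IllegalArgumentException used here.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy schema model (not the HBase API): families are declared up front,
// qualifiers inside a declared family are free-form.
class FamilySchemaSketch {
    private final Set<String> families;
    // "family:qualifier" -> value for a single row, enough to show the rule.
    private final Map<String, String> cells = new TreeMap<>();

    FamilySchemaSketch(Set<String> families) {
        this.families = families;
    }

    void put(String family, String qualifier, String value) {
        if (!families.contains(family)) {
            throw new IllegalArgumentException("No such column family: " + family);
        }
        cells.put(family + ":" + qualifier, value);
    }

    Map<String, String> cells() {
        return cells;
    }

    public static void main(String[] args) {
        FamilySchemaSketch people =
                new FamilySchemaSketch(Set.of("personal", "contactinfo", "creditcard"));
        people.put("personal", "givenName", "John");
        people.put("contactinfo", "email", "john.connor@gmail.com");
        System.out.println(people.cells());
        try {
            people.put("address", "city", "San Diego");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```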

Slide 18: Finding data
    get (by row key)
    scan (by row key ranges, filtering)
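
Slide 19:

    // Get a row. Ask for only the data you need.
    // (toBytes is statically imported from org.apache.hadoop.hbase.util.Bytes, as on slide 17)
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "people");
    Get get = new Get(toBytes("connor-john-m-43299"));
    get.setMaxVersions(2);                                   // up to 2 versions per column
    get.addFamily(toBytes("personal"));                      // every column in "personal"
    get.addColumn(toBytes("contactinfo"), toBytes("email")); // one column from "contactinfo"
    Result result = table.get(get);
    table.close();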
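
The Get above combines three selectors: addFamily asks for every column in "personal", addColumn asks for a single column in "contactinfo", and setMaxVersions(2) caps how many versions of each column come back. A toy Java model of that selection follows; GetSelectionSketch is hypothetical, uses String values instead of byte[], and is not the HBase API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Toy model (not the HBase API) of what a Get selects from one row:
// whole families, single "family:qualifier" columns, and at most maxVersions per column.
class GetSelectionSketch {
    // column ("family:qualifier") -> timestamp, newest first -> value
    private final NavigableMap<String, NavigableMap<Long, String>> row = new TreeMap<>();

    void put(String column, long timestamp, String value) {
        row.computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
           .put(timestamp, value);
    }

    // Returns "column/timestamp=value" strings in row order, newest version first.
    List<String> get(Set<String> families, Set<String> columns, int maxVersions) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, NavigableMap<Long, String>> entry : row.entrySet()) {
            String column = entry.getKey();
            String family = column.substring(0, column.indexOf(':'));
            if (!families.contains(family) && !columns.contains(column)) {
                continue; // not requested, so not returned
            }
            int returned = 0;
            for (Map.Entry<Long, String> version : entry.getValue().entrySet()) {
                if (returned == maxVersions) {
                    break;
                }
                out.add(column + "/" + version.getKey() + "=" + version.getValue());
                returned++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        GetSelectionSketch person = new GetSelectionSketch();
        person.put("personal:givenName", 1L, "John");
        person.put("personal:surname", 1L, "Connor");
        person.put("personal:surname", 2L, "Smith");
        person.put("contactinfo:email", 1L, "john.connor@gmail.com");
        person.put("contactinfo:email", 2L, "john.m.smith@gmail.com");
        person.put("contactinfo:address", 2L, "San Diego, CA");
        // Mirrors addFamily("personal"), addColumn("contactinfo", "email"), setMaxVersions(2).
        person.get(Set.of("personal"), Set.of("contactinfo:email"), 2).forEach(System.out::println);
    }
}
```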

Slide 20:

    // Update existing values, and add a new one
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "people");
    Put put = new Put(toBytes("connor-john-m-43299"));
    put.add(toBytes("personal"), toBytes("surname"), toBytes("Smith"));
    put.add(toBytes("contactinfo"), toBytes("email"), toBytes("john.m.smith@gmail.com"));
    put.add(toBytes("contactinfo"), toBytes("address"), toBytes("San Diego, CA"));
    table.put(put);
    table.flushCommits();
    table.close();
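
The update on slide 20 does not overwrite anything in place: each Put writes a new cell with a newer timestamp, and reads return the newest one by default. How many older versions are kept is a per-column-family setting (VERSIONS), whose default has changed between HBase releases; real HBase also hides excess versions from reads and removes them during major compactions rather than immediately. The toy model below (hypothetical class VersionedColumn, not the HBase API) simply trims on every write.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model (not the HBase API): an update adds a new timestamped version,
// and only the newest maxVersions versions of the column are kept.
class VersionedColumn {
    private final int maxVersions;
    // timestamp, newest first -> value
    private final NavigableMap<Long, String> versions =
            new TreeMap<Long, String>(Comparator.reverseOrder());

    VersionedColumn(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        // In newest-first order, the oldest versions are the last entries.
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();
        }
    }

    String latest() {
        return versions.firstEntry().getValue();
    }

    // All retained values, newest first.
    List<String> history() {
        return new ArrayList<>(versions.values());
    }

    public static void main(String[] args) {
        VersionedColumn surname = new VersionedColumn(3); // assumed VERSIONS setting
        surname.put(1L, "Connor");
        surname.put(2L, "Smith");
        System.out.println(surname.latest() + " " + surname.history());
    }
}
```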

Slide 21:

    // Scan rows, starting at row key "smith-"...
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "people");
    Scan scan = new Scan(toBytes("smith-"));
    scan.addColumn(toBytes("personal"), toBytes("givenName"));
    scan.addColumn(toBytes("contactinfo"), toBytes("email"));
    scan.addColumn(toBytes("contactinfo"), toBytes("address"));
    int numRowsPerPage = 10; // example page size
    scan.setFilter(new PageFilter(numRowsPerPage));
    ResultScanner scanner = table.getScanner(scan);
    try {
        for (Result result : scanner) {
            // process result...
        }
    } finally {
        scanner.close();
    }
    table.close();
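
Three details matter when paging through rows that share a prefix. A start row alone does not bound a scan: "smithers-..." sorts after every "smith-..." key, so a stop row such as "smith." (the prefix with its last character incremented) is needed to stay inside the prefix. PageFilter is evaluated separately on each region server, so a scan can return more rows than the page size, and the client should also enforce the limit. To fetch the next page, start just after the last row seen by appending a zero byte to it. A toy Java sketch of those three ideas over a TreeMap (hypothetical class PrefixPager, strings instead of byte[], not the HBase API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model (not the HBase API) of paging through all rows that share a prefix.
class PrefixPager {
    // Exclusive stop row for a prefix: bump the last character, so "smith-" -> "smith.".
    // Assumes a non-empty prefix whose last character is not Character.MAX_VALUE.
    static String stopRowFor(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    // One page: at most pageSize row keys in [startRow, stopRowFor(prefix)).
    static List<String> page(NavigableMap<String, String> table, String prefix,
                             String startRow, int pageSize) {
        List<String> rows = new ArrayList<>();
        for (String row : table.subMap(startRow, true, stopRowFor(prefix), false).keySet()) {
            if (rows.size() == pageSize) {
                break; // client-side limit, since a server-side page filter may overshoot
            }
            rows.add(row);
        }
        return rows;
    }

    // Appending '\0' gives the smallest key that sorts strictly after lastRow.
    static String nextStartRow(String lastRow) {
        return lastRow + '\0';
    }

    public static void main(String[] args) {
        NavigableMap<String, String> people = new TreeMap<>();
        for (String key : new String[] {"connor-john-m-43299", "smith-anna-b-101",
                "smith-bob-c-102", "smith-carl-d-103", "smithers-w-104"}) {
            people.put(key, "...");
        }
        String start = "smith-";
        List<String> rows;
        do {
            rows = page(people, "smith-", start, 2);
            System.out.println(rows);
            if (!rows.isEmpty()) {
                start = nextStartRow(rows.get(rows.size() - 1));
            }
        } while (rows.size() == 2);
    }
}
```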

Slide 22: Data Modeling
    Row key design
    Match to data access patterns
    Wide vs. narrow rows
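
Two common ways to match row keys to access patterns, sketched as hypothetical Java helpers in a class named RowKeys. The first builds a composite key shaped like the "connor-john-m-43299" key used earlier, so that a scan starting at "smith-" finds everyone with that surname. The second is a reverse-timestamp key (Long.MAX_VALUE minus the timestamp), a widely used HBase idiom that makes the newest rows sort first. The lowercase normalization and 19-digit zero-padding are assumptions of this sketch.

```java
import java.util.Locale;

// Hypothetical row-key helpers illustrating "match the key to the access pattern".
class RowKeys {
    // Surname first, so a prefix scan on "surname-" finds every person with that
    // surname; the trailing id keeps keys unique.
    static String personKey(String surname, String givenName, String middleInitial, long id) {
        return String.join("-",
                surname.toLowerCase(Locale.ROOT),
                givenName.toLowerCase(Locale.ROOT),
                middleInitial.toLowerCase(Locale.ROOT),
                Long.toString(id));
    }

    // Subtracting from Long.MAX_VALUE reverses the order; zero-padding to 19 digits
    // (the width of Long.MAX_VALUE) keeps string order equal to numeric order.
    // Assumes a non-negative epochMillis.
    static String reverseTimestampKey(long epochMillis) {
        return String.format("%019d", Long.MAX_VALUE - epochMillis);
    }

    public static void main(String[] args) {
        System.out.println(personKey("Connor", "John", "M", 43299));
        System.out.println(reverseTimestampKey(1239135042623L));
        System.out.println(reverseTimestampKey(1239135325074L)); // newer, so sorts first
    }
}
```

Which design fits depends on the reads: keys that start with the value you look up by turn common queries into short range scans, while keys that start with a steadily increasing value concentrate new writes at one end of the table.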

Slide 23: References
    http://shop.oreilly.com/product/0636920014348.do
    http://shop.oreilly.com/product/0636920021773.do (3rd edition pub date is May 29, 2012)
    http://hbase.apache.org

Slide 24: (my info)
    scott.leberknight at nearinfinity.com
    www.nearinfinity.com/blogs/
    twitter: sleberknight