Introduction to HBase - Phoenix HUG 5/14
 



Agenda:
- HBase Overview
- HBase APIs
- MapR Tables
- Example
- Securing tables

    Presentation Transcript

    • Introduction to Apache HBase
      • jwalsh@mapr.com
      • © 2014 MapR Technologies
    • Agenda
      • HBase Overview
      • HBase APIs
      • MapR Tables
      • Example
      • Securing tables
    • What's HBase?
      • A NoSQL database
        – Synonym for "non-traditional" database
      • A distributed columnar data store
        – Storage layout implies performance characteristics
      • The "Hadoop" database
      • A semi-structured database
        – No rigid requirement to define columns or even data types in advance
        – It's all bytes to HBase
      • A persistent sorted map of maps
        – The programmer's view
    • Relational Database Model vs. NoSQL
      • RDBMS: MySQL, Oracle, MS SQL Server, DB2, Postgres…
      • Non-relational database models used with Big Data
        – Key-value: Riak, Redis
        – Column-oriented: MapR Tables, HBase, Cassandra
        – Document-oriented: MongoDB, CouchDB
        – Graph: Neo4j, OrientDB
    • Relational Model
      • RDBMS (Relational Database Management System)
        – Standard persistence model
        – Data is normalized and split into tables when stored
          • Typed and structured before it is stored
        – Joined back together when read
        – Structured Query Language (SQL)
      • Pros
        – Many business rules map well to a tabular structure and relationships
          • The layout of the data is known in advance
        – Transactions handle concurrency and consistency
        – Provides an efficient and robust structure for storing data
    • Column Oriented
      • Each row is indexed by a key
        – Data is stored sorted by key
      • Data is stored by columns, grouped into column families
        – Each family is a file of column values laid out in sorted order by row key
        – Contrast this with a traditional row-oriented database, where rows are stored together with fixed space allocated for each row
      [Diagram: rows keyed by customer id (row keys axxx … gxxx); column family CF1 (colA, colB, colC) holds customer address data and CF2 (colA, colB, colC) holds customer order data]
    • HBase is…
      • A distributed column-oriented database built on top of HDFS/MapR-FS
      • An open-source implementation of Google's Bigtable
        – Semi-structured data
        – Commodity hardware
        – Horizontal scalability
        – Part of the Hadoop ecosystem, and integrated with MapReduce
      • The Hadoop application to use when you require real-time random read/write access to very large datasets
      • A fault-tolerant way of storing large quantities of sparse data
    • What is HBase? (Cluster View)
      • ZooKeeper (ZK)
      • HMaster (HM)
      • Region Servers (RS)
      • For MapR, there is less delineation between control and data nodes.
      [Diagram: master servers (ZooKeeper nodes, two HMasters, and the NameNode) alongside slave servers, each running a Region Server together with a Data Node]
    • What is a Region?
      • The basic partitioning/sharding unit of HBase
      • Each region is assigned a range of keys it is responsible for
      • Region servers serve data for reads and writes
      [Diagram: a client accessing regions (rows with a key plus col B and col C values) stored in containers, with ZooKeeper nodes and the HMaster coordinating]
    • HBase Data Model: Row Keys
      • Row keys identify the rows in an HBase table.
      [Diagram: a table with row keys axxx … sxxx and column families CF1 (colA through colC) and CF2 (colA through colD), with rows grouped into regions R1 (axxx … gxxx), R2 (hxxx … jxxx), and R3 (kxxx … sxxx)]
    • Rows are Stored in Sorted Order
      • Row keys are sorted by their binary values
        – The sort is lexicographic at the byte level
        – Comparison runs left to right
      • Example:
        – The strings 1, 2, 3, …, 99, 100 sort as: 1, 10, 100, 11, 12, …, 2, 20, 21, …, 9, 90, 91, …, 98, 99
        – The strings 001, 002, 003, …, 099, 100 sort as: 001, 002, 003, …, 099, 100
        – What if the row keys were numbers converted to fixed-size binary?
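The question on this slide can be explored with a small sketch. `RowKeyOrder` below is a hypothetical helper, not part of HBase. It compares keys the way the slide describes: as unsigned bytes, left to right. That is also how HBase's `Bytes.compareTo` orders `byte[]` keys. The sketch encodes numbers three ways:

- as plain strings
- as zero-padded strings
- as fixed-size 8-byte big-endian longs, the layout `Bytes.toBytes(long)` produces

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: orders row keys as unsigned bytes, compared left to right.
class RowKeyOrder {

    // Lexicographic comparison of unsigned bytes; when one key is a prefix of
    // the other, the shorter key sorts first.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // treat each byte as unsigned (0..255)
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // A number written as a string, e.g. "10".
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // A number as a fixed-size, 8-byte, big-endian value (ByteBuffer's default order).
    static byte[] fixedLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Sorts string keys by their UTF-8 bytes, as a table would store them.
    static List<String> sortAsRowKeys(String... keys) {
        List<String> out = new ArrayList<>(Arrays.asList(keys));
        out.sort((x, y) -> compare(utf8(x), utf8(y)));
        return out;
    }
}
```

With this comparator, "10" sorts before "2", and zero-padded strings sort numerically. Fixed-size big-endian longs also sort numerically, as long as they are non-negative. A negative long starts with byte 0xff and therefore sorts after every positive value. Signed keys need extra handling (for example, flipping the sign bit) before being used this way.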
    • Tables are Split into Regions of Contiguous Keys
      • Tables are partitioned into key ranges (regions)
      • Region = contiguous keys, served by nodes (RegionServers)
      • Regions are spread across the cluster: S1, S2, …
      [Diagram: Region 1 holds key range axxx … gxxx and Region 2 holds lxxx … zxxx, each region carrying column families CF1 and CF2; one RegionServer serves Regions 2 and 3]
      Source: diagram from Lars George's HBase: The Definitive Guide.
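Here is a hypothetical in-memory model of the key-range idea. It is not the HBase client's actual region lookup. Each region is identified by its start key and serves keys from that start key up to, but not including, the next region's start key. Finding the region for a row is therefore a "greatest start key not above the row key" lookup. The server names and start keys are made up for illustration. Java `String` ordering stands in for byte ordering, which matches for ASCII keys.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: regions keyed by start key. A region serves keys from its
// start key (inclusive) up to the next region's start key (exclusive).
class RegionLocator {
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    // The first region should start at the empty key so every row is covered.
    void addRegion(String startKey, String server) {
        serverByStartKey.put(startKey, server);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String serverFor(String rowKey) {
        Map.Entry<String, String> region = serverByStartKey.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return region.getValue();
    }
}
```

Because regions only record where they start, splitting a region amounts to adding one more start key. Rows on either side of the new boundary then resolve to different regions.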
    • HBase Data Model: Cells
      • The value of each cell is specified by its complete coordinates:
        – RowKey → Column Family → Column → Version: Value
        – Key:CF:Col:Version:Value
      [Figure: example cell. RowKey: smithj; column key (CF:Qualifier): Data:street; version and value: 1273456780 0 Main street]
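These coordinates correspond to the "sorted map of maps" view and can be sketched directly with nested `TreeMap`s. This is a hypothetical model for illustration, not HBase code. HBase keys are `byte[]`, while this sketch uses `String`s for readability. The versions map is kept newest first, because HBase returns the newest version of a cell first.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of the logical layout, as sorted maps of maps:
// row -> column family -> qualifier -> version (timestamp) -> value.
class CellModel {
    private final NavigableMap<String,
            NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows = new TreeMap<>();

    // Writes one cell at full coordinates. Versions are ordered newest first.
    void put(String row, String family, String qualifier, long version, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Collections.reverseOrder()))
            .put(version, value);
    }

    // Reads the value at exactly these coordinates, or null if absent.
    String get(String row, String family, String qualifier, long version) {
        NavigableMap<Long, String> versions = versionsOf(row, family, qualifier);
        return versions == null ? null : versions.get(version);
    }

    // Reads the newest version when no version is given, or null if absent.
    String getLatest(String row, String family, String qualifier) {
        NavigableMap<Long, String> versions = versionsOf(row, family, qualifier);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    private NavigableMap<Long, String> versionsOf(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families = rows.get(row);
        if (families == null) {
            return null;
        }
        NavigableMap<String, NavigableMap<Long, String>> columns = families.get(family);
        return columns == null ? null : columns.get(qualifier);
    }
}
```

A missing row, family, or column simply has no map entry. This is the "cells remain empty and consume no storage" property from the next slide.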
    • Sparsely-Populated Data
      • Missing values: cells remain empty and consume no storage
      [Diagram: the row-key table with rows axxx … sxxx across column families CF1 and CF2, where many cells are empty]
    • HBase Data Model Summary
      • Efficient and flexible
        – Storage is allocated for columns only as needed on a given row
          • Great for sparse data
          • Great for data of widely varying size
        – Columns can be added at any time without impact
        – Compression and versioning are usually built in and take advantage of column-family storage (like data stored together)
      • Highly scalable
        – Data is sharded among regions based upon key
          • Regions are distributed across the cluster
        – Grouping by key = related data stored together
      • Finding data
        – The key implies the region and server; the column family implies the file
        – Any data can be reached efficiently by key
    • Basic Table Operations
      • Create the table and define column families before data is imported
        – But not the row keys or the number/names of columns
      • Basic data access operations (CRUD):
        – put: inserts data into rows (both add and update)
        – get: accesses data from one row
        – scan: accesses data from a range of rows
        – delete: deletes a row, or specific column families, columns, or versions within a row
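The four operations listed on this slide can be pinned down with a small in-memory model. This hypothetical sketch stores one value per row in a `TreeMap`; it is not the HBase client API. It does mirror two HBase behaviors:

- A put on an existing row overwrites the value rather than failing.
- A scan's start row is inclusive, while its stop row is exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of put/get/scan/delete on a single column, with rows kept
// sorted by key. It illustrates the semantics only; it is not the HBase client API.
class CrudModel {
    private final TreeMap<String, String> rows = new TreeMap<>();

    // put: inserts the row if it is new, otherwise overwrites (add and update).
    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // get: reads one row; returns null if the row does not exist.
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    // scan: returns the row keys in [startRow, stopRow) in sorted order,
    // mirroring an HBase Scan whose start row is inclusive and stop row exclusive.
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    // delete: removes one row.
    void delete(String rowKey) {
        rows.remove(rowKey);
    }
}
```

Because rows are kept sorted, a scan is a contiguous range read rather than a search. The row-key design questions in the employee example depend on this property.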
    • CRUD Operations Follow a Pattern (mostly)
      • Most common pattern
        – Instantiate an object for the operation: Put put = new Put(key)
        – Add or set attributes to specify what you need: put.add(…)
        – Execute the operation against the table: myTable.put(put)

        // Insert value1 into rowKey at columnFamily:columnName1
        Put put = new Put(rowKey);
        put.add(columnFamily, columnName1, value1);
        myTable.put(put);

        // Retrieve the value at columnFamily:columnName1 from rowKey
        Get get = new Get(rowKey);
        get.addColumn(columnFamily, columnName1);
        Result result = myTable.get(get);
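    • Put Example
        import org.apache.hadoop.hbase.client.HTable;
        import org.apache.hadoop.hbase.client.HTableInterface;
        import org.apache.hadoop.hbase.client.Put;
        import org.apache.hadoop.hbase.util.Bytes;

        // hbaseConfig is a Configuration created elsewhere, e.g. HBaseConfiguration.create()
        byte[] invTable = Bytes.toBytes("/path/Inventory");
        byte[] stockCF = Bytes.toBytes("stock");
        byte[] quantityCol = Bytes.toBytes("quantity");
        long amt = 24L;
        HTableInterface table = new HTable(hbaseConfig, invTable);
        Put put = new Put(Bytes.toBytes("pens"));
        put.add(stockCF, quantityCol, Bytes.toBytes(amt));
        table.put(put);
      • Result in the Inventory table: row "pens", column stock:quantity = 24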
    • Put Operation: Add Method
      • Once a Put instance is created, you call an add method on it
      • Typically, you add a value for a specific column in a column family
        – ("column name" and "qualifier" mean the same thing)
      • Optionally, you can set a timestamp for a cell
        Put add(byte[] family, byte[] qualifier, long ts, byte[] value)
        Put add(byte[] family, byte[] qualifier, byte[] value)
    • Put Operation: Single Put Example (adding multiple column values to a row)
        byte[] tableName = Bytes.toBytes("/path/Shopping");
        byte[] itemsCF = Bytes.toBytes("items");
        byte[] penCol = Bytes.toBytes("pens");
        byte[] noteCol = Bytes.toBytes("notes");
        byte[] eraserCol = Bytes.toBytes("erasers");
        HTableInterface table = new HTable(hbaseConfig, tableName);
        // Put takes the row key as byte[], so convert the String first
        Put put = new Put(Bytes.toBytes("mike"));
        put.add(itemsCF, penCol, Bytes.toBytes(5L));
        put.add(itemsCF, noteCol, Bytes.toBytes(5L));
        put.add(itemsCF, eraserCol, Bytes.toBytes(2L));
        table.put(put);
    • Bytes Class
      • org.apache.hadoop.hbase.util.Bytes
        – Provides methods to convert Java types to and from byte[] arrays
        – Supports String, boolean, short, int, long, double, and float
        – API docs: http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/util/Bytes.html
      • Example:
        byte[] bytesTablePath = Bytes.toBytes("/path/Shopping");
        String myTable = Bytes.toString(bytesTablePath);
        byte[] amountBytes = Bytes.toBytes(1000L);
        long amount = Bytes.toLong(amountBytes);
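For readers without HBase on the classpath, the conversions on this slide can be approximated with the JDK alone. `ByteConversions` below is a hypothetical stand-in for `org.apache.hadoop.hbase.util.Bytes`, not its source. It assumes UTF-8 for strings and an 8-byte big-endian layout for `long`, which are the encodings `Bytes.toBytes(String)` and `Bytes.toBytes(long)` use.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only stand-in for a few org.apache.hadoop.hbase.util.Bytes methods.
class ByteConversions {

    // String -> UTF-8 bytes.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // UTF-8 bytes -> String.
    static String toString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // long -> 8 big-endian bytes (ByteBuffer's default byte order).
    static byte[] toBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // 8 big-endian bytes -> long; rejects any other length.
    static long toLong(byte[] b) {
        if (b.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + b.length);
        }
        return ByteBuffer.wrap(b).getLong();
    }
}
```

The encoding matters when data is read back. Values typed into the HBase shell, such as '90000' in the employee example, are stored as string bytes. That is why the employee code parses salaries with `Integer.parseInt(Bytes.toString(...))` rather than `Bytes.toInt`.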
    • Get Operation: Single Get Example
        byte[] tableName = Bytes.toBytes("/path/Shopping");
        byte[] itemsCF = Bytes.toBytes("items");
        byte[] penCol = Bytes.toBytes("pens");
        HTableInterface table = new HTable(hbaseConfig, tableName);
        // Row keys are case-sensitive bytes: "mike" matches the row written in the Put example; "Mike" would not
        Get get = new Get(Bytes.toBytes("mike"));
        get.addColumn(itemsCF, penCol);
        Result result = table.get(get);
        byte[] val = result.getValue(itemsCF, penCol);
        if (val != null) {
            System.out.println("Value: " + Bytes.toLong(val));
        }
    • Get Operation: Add and Set Methods
      • A Get with no add calls returns everything in the row
      • To narrow down the results, call add
        – addFamily: get all columns of a specific family
        – addColumn: get a specific column
      • To narrow the results further, specify more details with one or more set calls, then call add
        – setTimeRange: retrieve columns within a specific range of version timestamps
        – setTimestamp: retrieve columns with a specific timestamp
        – setMaxVersions: set the number of versions of each column to be returned
        – setFilter: add a filter
        get.addColumn(columnFamilyName, columnName1);
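The interaction of `setTimeRange` and `setMaxVersions` can be modeled on a single column's versions. This is a hypothetical sketch, not HBase's implementation. It assumes the HBase semantics that a time range includes its minimum timestamp and excludes its maximum, and that versions come back newest first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of Get.setTimeRange plus Get.setMaxVersions on one column.
class VersionFilter {

    // versions: timestamp -> value for one column.
    // Keeps timestamps in [minStamp, maxStamp), newest first, and returns at
    // most maxVersions values.
    static List<String> select(TreeMap<Long, String> versions,
                               long minStamp, long maxStamp, int maxVersions) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e :
                versions.subMap(minStamp, true, maxStamp, false).descendingMap().entrySet()) {
            if (out.size() == maxVersions) {
                break;
            }
            out.add(e.getValue());
        }
        return out;
    }
}
```

An HBase `Get` returns one version per column by default. Calling `select` with an unbounded range and `maxVersions` of 1 reproduces that default; setting a larger `maxVersions` exposes older values.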
    • Result: Retrieve a Value from a Result
        public static final byte[] ITEMS_CF = Bytes.toBytes("items");
        public static final byte[] PENS_COL = Bytes.toBytes("pens");
        Get g = new Get(Bytes.toBytes("Adam"));
        g.addColumn(ITEMS_CF, PENS_COL);
        Result result = table.get(g);
        byte[] b = result.getValue(ITEMS_CF, PENS_COL);
        long valueInColumn = Bytes.toLong(b);
      • API docs: http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/client/Result.html
      • Sample row "Adam": items:pens = 18, items:notepads = 7, items:erasers = 10
    • Other APIs
      • Not covering append, delete, and scan
      • Not covering administrative APIs
    • Tables and Files in a Unified Storage Layer
      • The MapR filesystem is an integrated system
        – Tables and files live in a unified filesystem, based on MapR's enterprise-grade storage layer
      [Diagram: three stacks compared. Apache HBase on Hadoop: HBase JVM over HDFS JVM over ext3 FS over disks. Apache HBase on MapR Filesystem: HBase JVM over MapR-FS over disks. M7 tables integrated into the filesystem: the HBase API and HDFS API served directly by MapR-FS over disks]
    • Portability
      • MapR tables use the HBase data model and API
      • Apache HBase applications work as-is on MapR tables
        – No need to recompile
        – No vendor lock-in
      [Diagram: the HBase API and HDFS API on top of MapR-FS and disks]
    • MapR M7 Table Storage
      • Table regions live inside a MapR container
        – Served by the MapR fileserver service running on nodes
        – HBase RegionServer and HBase Master services are not required
      [Diagram: a client reading and writing regions stored in containers across nodes]
    • MapR Tables vs. HBase
      • Other NoSQL (service disruptions): compaction delays, manual administration, poor reliability, lengthy disaster recovery
      • MapR tables (24x7 uptime): no compaction delays, easy administration, strong consistency, rapid recovery, 2x Cassandra performance, 3x HBase performance
    • MapR M7 vs. HBase: Mixed Load (50-50)
    • Example: Employee Database
      • Column family: base
        – lastName
        – firstName
        – address
        – SSN
      • Column family: salary
        – "Dynamic" columns
        – year:salary
      • Row key
        – lastName:firstName? Not unique
        – Unique id? Can't search easily
        – lastName:firstName:id? Can't search by id
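The third row-key option can be made concrete. `EmployeeKeys` below is a hypothetical sketch that assumes names never contain ':' and keys are ASCII. It builds lastName:firstName:id keys and reads all employees with one last name as a single range. The stop row is computed by incrementing the last character of the prefix, a common way to turn a prefix into an exclusive scan bound. A `TreeMap` stands in for the sorted table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of the lastName:firstName:id row-key option.
// Assumes names never contain ':' and that row keys are ASCII.
class EmployeeKeys {

    // The id suffix keeps keys unique when two employees share a name.
    static String rowKey(String lastName, String firstName, String id) {
        return lastName + ":" + firstName + ":" + id;
    }

    // Smallest key greater than every key starting with prefix: increment the
    // prefix's last character (assumed not to be Character.MAX_VALUE).
    static String stopRowForPrefix(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    // Range read over a sorted table: all rows for one last name are contiguous.
    static List<String> byLastName(TreeMap<String, String> table, String lastName) {
        String start = lastName + ":";
        return new ArrayList<>(table.subMap(start, true, stopRowForPrefix(start), false).keySet());
    }
}
```

Including the ':' separator in the prefix keeps "Williams" rows out of a "William" lookup. Looking up by id alone still requires reading every row with this key design, which is the trade-off the slide points out.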
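    • Source: Employee Class
        import java.util.HashMap;
        import java.util.Map;

        public class Employee {
            String key;
            String lastName, firstName, address;
            String ssn;
            // Initialized so that salary entries can be added directly via getSalary().put(...)
            Map<Integer, Integer> salary = new HashMap<Integer, Integer>();
            …
        }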
    • Source: "Schema"
        byte[] BASE_CF = Bytes.toBytes("base");
        byte[] SALARY_CF = Bytes.toBytes("salary");
        byte[] FIRST_COL = Bytes.toBytes("firstName");
        byte[] LAST_COL = Bytes.toBytes("lastName");
        byte[] ADDRESS_COL = Bytes.toBytes("address");
        byte[] SSN_COL = Bytes.toBytes("ssn");
        // userdirectory and shortName are defined elsewhere in the application
        String tableName = userdirectory + "/" + shortName;
        byte[] TABLE_NAME = Bytes.toBytes(tableName);
    • Source: "Get Table"
        // Create the pool once and reuse it; a new pool per call would defeat pooling
        private final HTablePool pool = new HTablePool();

        HTableInterface getTable() {
            return pool.getTable(TABLE_NAME);
        }
    • Source: "Get Row"
      • Whole row
        Get g = new Get(Bytes.toBytes(key));
        Result result = getTable().get(g);
      • Just the base column family
        Get g = new Get(Bytes.toBytes(key));
        g.addFamily(BASE_CF);
        Result result = getTable().get(g);
    • Source: "Parse Row"
        Employee e = new Employee();
        e.setKey(Bytes.toString(r.getRow()));
        e.setLastName(getString(r, BASE_CF, LAST_COL));
        e.setFirstName(getString(r, BASE_CF, FIRST_COL));
        e.setAddress(getString(r, BASE_CF, ADDRESS_COL));
        e.setSsn(getString(r, BASE_CF, SSN_COL));

        // Returns the cell value as a String, or "" if the column is absent
        String getString(Result r, byte[] cf, byte[] col) {
            byte[] b = r.getValue(cf, col);
            return (b != null) ? Bytes.toString(b) : "";
        }
    • Source: "Parse Row" (continued)
        // Get salary information: each qualifier in the salary family is a year
        Map<byte[], byte[]> m = r.getFamilyMap(SALARY_CF);
        if (m != null) { // guard against null, e.g. when no row was found
            for (Map.Entry<byte[], byte[]> entry : m.entrySet()) {
                Integer year = Integer.parseInt(Bytes.toString(entry.getKey()));
                Integer amt = Integer.parseInt(Bytes.toString(entry.getValue()));
                e.getSalary().put(year, amt);
            }
        }
    • Example
      • Create a table using MCS (the MapR Control System)
      • Create a table and column families using maprcli
        $ maprcli table create -path /user/keys/employees
        $ maprcli table cf create -path /user/keys/employees -cfname base
        $ maprcli table cf create -path /user/keys/employees -cfname salary
    • Example
      • Populate with sample data using the HBase shell
        hbase> put '/user/keys/employees', 'k1', 'base:lastName', 'William'
        hbase> put '/user/keys/employees', 'k1', 'base:firstName', 'John'
        hbase> put '/user/keys/employees', 'k1', 'base:address', '123 street, springfield, VA'
        hbase> put '/user/keys/employees', 'k1', 'base:ssn', '999-99-9999'
        hbase> put '/user/keys/employees', 'k1', 'salary:2010', '90000'
        hbase> put '/user/keys/employees', 'k1', 'salary:2011', '91000'
        hbase> put '/user/keys/employees', 'k1', 'salary:2012', '92000'
        hbase> put '/user/keys/employees', 'k1', 'salary:2013', '93000'
        …
    • Example
      • Fetch the record using a Java program
        $ ./run employees get k1
        Use command get against table /user/keys/employees
        Employee record: Employee [key=k1, lastName=William, firstName=John, address=123 first street, springfield, VA, ssn=999-99-9999, salary={2010=90000, 2011=91000, 2012=92000, 2013=93000}]
    • What Didn't I Consider?
      • Row key
      • Secondary ways of searching
        – Other tables as indexes?
      • Long-term data evolution
        – Avro?
        – Protobufs?
      • Security
        – SSN is sensitive
        – Salary looks kind of sensitive
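The "other tables as indexes" question can be sketched as a second sorted table that maps each employee id to the main table's row key. This is a hypothetical model using `TreeMap`s in place of tables. In HBase the two writes would be separate puts to two tables and are not applied atomically, so the application must tolerate or repair an index that briefly disagrees with the main table.

```java
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch of "other tables as indexes": a second sorted table maps
// each employee id to the main table's row key.
class IdIndex {
    private final TreeMap<String, String> mainTable = new TreeMap<>(); // rowKey -> record
    private final TreeMap<String, String> idIndex = new TreeMap<>();   // id -> rowKey

    // Writes the record and its index entry. In HBase these would be two
    // separate puts to two tables, and they are not applied atomically.
    void putEmployee(String lastName, String firstName, String id, String record) {
        String rowKey = lastName + ":" + firstName + ":" + id;
        mainTable.put(rowKey, record);
        idIndex.put(id, rowKey);
    }

    // Lookup by id: one get on the index, then one get on the main table.
    Optional<String> getById(String id) {
        String rowKey = idIndex.get(id);
        return rowKey == null ? Optional.empty() : Optional.ofNullable(mainTable.get(rowKey));
    }
}
```

The design trades an extra write per record for two point reads instead of a full scan when searching by id.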
    • MapR Tables Security
      • Access Control Expressions (ACEs)
        – Boolean logic to control access at the table, column family, and column level
    • ACE Highlights
      • The creator of a table has all rights by default
        – Others have none
      • Admin rights can be granted without granting read/write rights
      • Defaults for column families are set at the table level
      • Access to data depends on column family and column access controls
      • Boolean logic
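The Boolean logic behind ACEs can be illustrated with a small evaluator. This is a hypothetical sketch, not MapR's implementation. It assumes a subset of the ACE notation:

- `u:<user>` and `g:<group>` terms
- `p` for public access
- the operators `!` (not), `&` (and), and `|` (or), with parentheses

The sketch gives `&` higher precedence than `|`, as most Boolean notations do. It does not model how table, column-family, and column ACEs combine.

```java
import java.util.Set;

// Hypothetical evaluator for a subset of ACE-style expressions:
// u:<user>, g:<group>, p (public), with ! (not), & (and), | (or), and ( ).
class AceEvaluator {
    private final String src;
    private final String user;
    private final Set<String> groups;
    private int pos;

    private AceEvaluator(String ace, String user, Set<String> groups) {
        this.src = ace.replace(" ", ""); // spaces are ignored
        this.user = user;
        this.groups = groups;
    }

    // True if the expression grants access to this user with these groups.
    static boolean allows(String ace, String user, Set<String> groups) {
        AceEvaluator p = new AceEvaluator(ace, user, groups);
        boolean result = p.orExpr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos + " in " + ace);
        }
        return result;
    }

    // orExpr := andExpr ('|' andExpr)*   (lowest precedence)
    private boolean orExpr() {
        boolean v = andExpr();
        while (peek('|')) {
            pos++;
            boolean rhs = andExpr(); // parse first so || cannot skip input
            v = v || rhs;
        }
        return v;
    }

    // andExpr := notExpr ('&' notExpr)*
    private boolean andExpr() {
        boolean v = notExpr();
        while (peek('&')) {
            pos++;
            boolean rhs = notExpr();
            v = v && rhs;
        }
        return v;
    }

    // notExpr := '!' notExpr | '(' orExpr ')' | term
    private boolean notExpr() {
        if (peek('!')) {
            pos++;
            return !notExpr();
        }
        if (peek('(')) {
            pos++;
            boolean v = orExpr();
            expect(')');
            return v;
        }
        return term();
    }

    // term := 'u:' name | 'g:' name | 'p'
    private boolean term() {
        if (src.startsWith("u:", pos)) {
            pos += 2;
            return name().equals(user);
        }
        if (src.startsWith("g:", pos)) {
            pos += 2;
            return groups.contains(name());
        }
        if (peek('p')) {
            pos++;
            return true;
        }
        throw new IllegalArgumentException("bad term at " + pos + " in " + src);
    }

    // Reads a user or group name up to the next operator or parenthesis.
    private String name() {
        int start = pos;
        while (pos < src.length() && "|&!()".indexOf(src.charAt(pos)) < 0) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("empty name at " + start + " in " + src);
        }
        return src.substring(start, pos);
    }

    private boolean peek(char c) {
        return pos < src.length() && src.charAt(pos) == c;
    }

    private void expect(char c) {
        if (!peek(c)) {
            throw new IllegalArgumentException("expected '" + c + "' at " + pos + " in " + src);
        }
        pos++;
    }
}
```

Under this model, an expression such as `u:keys` denies fred, while `u:keys | u:fred` allows him. That is the kind of per-column-family split the demo slides describe for base and salary; the exact ACEs used in the demo are not shown in the deck.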
    • MapR Tables Security
      • Leverages MapR security when enabled
        – Wire-level authentication
        – Wire-level encryption
        – Trivial to configure
          • Most reasonable settings by default
          • No Kerberos required!
        – Portable
          • No MapR-specific APIs
    • Example
      • Enable cluster security
        # configure.sh -C hostname -Z hostname -secure -genkeys
      • Yes, that's it!
        – Now all Web UI and CLI access requires authentication
        – Traffic is now authenticated using encrypted credentials
        – Most traffic is encrypted, and bulk data transfer traffic can be encrypted
    • Example
      • Fetch the record using a Java program when not authenticated
        $ ./run employees get k1
        Use command get against table /user/keys/employees
        14/03/14 18:42:39 ERROR fs.MapRFileSystem: Exception while trying to get currentUser
        java.io.IOException: failure to login: Unable to obtain MapR credentials
    • Example
      • Fetch the record using a Java program
        $ maprlogin password
        [Password for user 'keys' at cluster 'my.cluster.com': ]
        MapR credentials of user 'keys' for cluster 'my.cluster.com' are written to '/tmp/maprticket_1000'
        $ ./run employees get k1
        Use command get against table /user/keys/employees
        Employee record: Employee [key=k1, lastName=William, firstName=John, address=123 first street, springfield, VA, ssn=999-99-9999, salary={2010=90000, 2011=91000, 2012=92000, 2013=93000}]
    • Example
      • Fetch the record using a Java program as a user who is not authorized to access the table
        $ maprlogin password
        [Password for user 'fred' at cluster 'my.cluster.com': ]
        MapR credentials of user 'fred' for cluster 'my.cluster.com' are written to '/tmp/maprticket_2001'
        $ ./run /user/keys/employees get k1
        Use command get against table /user/keys/employees
        2014-03-14 18:49:20,2787 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:7318 Thread: 139674989631232 Error in DBGetRPC for table /user/keys/employees, error: Permission denied(13)
        Exception in thread "main" java.io.IOException: Error: Permission denied(13)
    • Example
      • Set ACEs to allow read access to base information but not salary
      • Fetch the whole record using a Java program
        $ ./run /user/keys/employees get k1
        Use command get against table /user/keys/employees
        2014-03-14 18:53:15,0806 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:7318 Thread: 139715048077056 Error in DBGetRPC for table /user/keys/employees, error: Permission denied(13)
        Exception in thread "main" java.io.IOException: Error: Permission denied(13)
    • Example
      • Set ACEs to allow read access to base information but not salary
      • Fetch just the base record using a Java program
        $ ./run employees getbase k1
        Use command get against table /user/keys/employees
        Employee record: Employee [key=k1, lastName=William, firstName=John, address=123 first street, springfield, VA, ssn=999-99-9999, salary={}]
    • References
      • http://www.mapr.com/blog/getting-started-mapr-security-0
      • http://www.mapr.com/
      • http://hadoop.apache.org/
      • http://hbase.apache.org/
      • http://tech.flurry.com/2012/06/12/137492485/
      • http://en.wikipedia.org/wiki/Lexicographical_order
      • HBase in Action, Nick Dimiduk and Amandeep Khurana
      • HBase: The Definitive Guide, Lars George
      • Note: this presentation includes materials from the MapR HBase training classes
    • www.hbasebook.com
    • https://github.com/larsgeorge/hbase-book
    • Q&A
      • jwalsh@mapr.com
      • Engage with us! @mapr, maprtech, MapR, mapr-technologies
    • HBase Architecture