Dancing with the elephant h base1_final

Published in: Technology

  1. Dancing With The Elephant
  2. We will discuss
     • Introduction to Hadoop
     • HBase: Definition, Storage Model, Use Cases
     • Basic data access from the shell
     • Hands-on with the HBase API
  3. What is Hadoop
     • Framework for distributed processing of large datasets (Big Data)
     • HDFS + MapReduce
     • HDFS (data): distributed filesystem responsible for storing data across the cluster
       ◦ Provides replication on cheap commodity hardware
       ◦ NameNode and DataNode processes
     • MapReduce (processing): may be covered in a future session
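
As a rough illustration of what replication costs, the sketch below does the storage arithmetic for a single file. It assumes a 128 MB block size and a replication factor of 3, the usual Hadoop 2.x defaults for `dfs.blocksize` and `dfs.replication` (the slides do not give these numbers). `HdfsMath` and its methods are hypothetical helpers, not a Hadoop API.

```java
/** Back-of-the-envelope HDFS storage math (hypothetical helper, not a Hadoop API). */
public class HdfsMath {
    static final long MB = 1024L * 1024L;

    /** Number of HDFS blocks a file occupies: ceil(fileBytes / blockBytes). */
    static long blocks(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    /** Raw disk consumed. HDFS does not pad the final partial block,
        so raw usage is simply file size times the replication factor. */
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long file = 1000L * MB;      // an illustrative 1000 MB file
        long blockSize = 128L * MB;  // assumed dfs.blocksize
        int replication = 3;         // assumed dfs.replication
        long b = blocks(file, blockSize);
        System.out.println(b + " blocks, " + (b * replication) + " block replicas, "
                + rawBytes(file, replication) / MB + " MB of raw disk");
        // prints: 8 blocks, 24 block replicas, 3000 MB of raw disk
    }
}
```

Each of the 8 blocks lives on 3 different DataNodes, which is what lets the cluster lose a single commodity machine without losing data.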
  4. HBase: What
     • A sparse, distributed, persistent, multidimensional sorted map (as defined in Google's Bigtable paper)
     • Distributed NoSQL database built on top of HDFS
  5. RDBMS Woes (with Massive Data)
     • Scaling is hard and expensive
     • Relational features and secondary indexes have to be turned off to scale
     • Hard to do quick reads at larger table sizes (500 GB)
     • Single points of failure
     • Schema changes
  6. HBase: Why
     • Scalable: just add nodes as your data grows
     • Distributed: leverages the advantages of Hadoop's HDFS
     • Built on top of Hadoop: as part of the ecosystem, it can be integrated with many tools
     • High performance for reads and writes
       ◦ Short-circuit reads (local HDFS blocks are read directly, bypassing the DataNode)
       ◦ Single reads: 1 to 10 ms; scans: hundreds of rows in 10 ms
     • Schema-less (only column families are declared up front)
     • Production-ready with data on the order of petabytes
  7. HBase: Storage Model 1
  8. HTable
     • Tables are split into regions
     • Region: a contiguous range of row keys [start, end), stored in sorted order
     • Regions split as the table grows (the split size can be configured, e.g. via hbase.hregion.max.filesize)
     • The table schema defines the column families
     • (Table, RowKey, ColumnFamily, ColumnName, Timestamp) → Value
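
Because regions cover contiguous, sorted row-key ranges, finding the region that holds a row key is a "greatest start key that is less than or equal to the row" lookup. The sketch below models that with a `TreeMap` keyed by region start key. The region names and split points are made up, and real clients obtain region boundaries from HBase's catalog table (`hbase:meta` in recent versions, `.META.` in older ones) rather than a local map. Keys are compared as unsigned bytes, the way HBase orders row keys, so `row10` sorts before `row3`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of region lookup: each region owns row keys in [startKey, nextStartKey). */
public class ToyRegionLookup {
    // Region start key -> region name, ordered byte by byte as unsigned values.
    private final TreeMap<byte[], String> regionsByStart =
            new TreeMap<byte[], String>(Arrays::compareUnsigned);

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    void addRegion(String startKey, String regionName) {
        regionsByStart.put(bytes(startKey), regionName);
    }

    /** Returns the region whose [start, end) range contains rowKey. */
    String locate(String rowKey) {
        // The owning region is the one with the greatest start key <= rowKey.
        Map.Entry<byte[], String> e = regionsByStart.floorEntry(bytes(rowKey));
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        ToyRegionLookup lookup = new ToyRegionLookup();
        lookup.addRegion("", "region-A");     // the first region starts at the empty key
        lookup.addRegion("row3", "region-B"); // hypothetical split point
        lookup.addRegion("row7", "region-C"); // hypothetical split point

        System.out.println(lookup.locate("row10")); // region-A ("row10" < "row3" byte-wise)
        System.out.println(lookup.locate("row3"));  // region-B (start key is inclusive)
        System.out.println(lookup.locate("row69")); // region-B
        System.out.println(lookup.locate("row9"));  // region-C
    }
}
```

The `row10` case is the practical takeaway: numeric row keys stored as text need zero padding (e.g. `row03`, `row10`) if they are expected to sort numerically.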
  9. HTable (Data Structure)
     • SortedMap(RowKey, List(SortedMap(Column, List(Value, Timestamp))))
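
The nested-map view above can be written down almost literally with Java's `TreeMap`. This is a minimal in-memory sketch, not HBase code: it assumes `String` keys and values for readability (HBase stores everything as `byte[]`) and folds the column family into the column name as `family:qualifier` instead of keeping a separate family level. Timestamps are kept newest first, mirroring the way a Get returns the latest version by default.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of the HTable layout: row -> "family:qualifier" -> timestamp -> value. */
public class SortedMapTable {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Stores one cell version. Cells never written have no entry at all (the map is sparse). */
    void put(String row, String column, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, NavigableMap<Long, String>>())
            // Timestamps sorted newest first, so firstEntry() is the latest version.
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Collections.reverseOrder()))
            .put(ts, value);
    }

    /** Latest value of a cell, or null if the cell was never written. */
    String getLatest(String row, String column) {
        NavigableMap<Long, String> versions = versions(row, column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    /** Value as of a point in time: the newest version with timestamp <= asOf. */
    String getAsOf(String row, String column, long asOf) {
        NavigableMap<Long, String> versions = versions(row, column);
        if (versions == null) return null;
        // In newest-first order, ceilingEntry(asOf) is the largest timestamp <= asOf.
        Map.Entry<Long, String> e = versions.ceilingEntry(asOf);
        return e == null ? null : e.getValue();
    }

    private NavigableMap<Long, String> versions(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        SortedMapTable t = new SortedMapTable();
        t.put("row1", "colfam1:qual1", 100L, "val1");
        t.put("row1", "colfam1:qual1", 200L, "val1-updated");
        t.put("row1", "colfam1:qual2", 100L, "val2");

        System.out.println(t.getLatest("row1", "colfam1:qual1"));     // val1-updated
        System.out.println(t.getAsOf("row1", "colfam1:qual1", 150L)); // val1
        System.out.println(t.getLatest("row1", "colfam2:missing"));   // null
    }
}
```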
  10. HBase: Data Read/Write
      • Get: random read
      • Scan: sequential read
      • Put: write/update
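
The three access patterns map naturally onto a sorted map: a Get is a point lookup, a Scan walks a contiguous key range in order, and a Put inserts or overwrites. The sketch below models only that idea with a plain `TreeMap` from row key to a single value; it is not the HBase client API, and the class name is hypothetical. The start-inclusive, stop-exclusive range mirrors the [start, end) convention that regions and HBase scan start/stop rows follow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of Get / Scan / Put over rows kept in sorted row-key order. */
public class GetScanPutModel {
    private final NavigableMap<String, String> rows = new TreeMap<>();

    /** Put: insert a row, or overwrite it if the key already exists. */
    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    /** Get: random read of a single row by key (null if absent). */
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    /** Scan: sequential read of the row keys in [startRow, stopRow), in sorted order. */
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        GetScanPutModel t = new GetScanPutModel();
        t.put("user#003", "carol");
        t.put("user#001", "alice");
        t.put("user#002", "bob");
        t.put("user#001", "alice-v2"); // a second Put to the same row updates it

        System.out.println(t.get("user#001"));              // alice-v2
        System.out.println(t.scan("user#001", "user#003")); // [user#001, user#002]
    }
}
```

Insertion order does not matter: because the rows are kept sorted, a scan always returns them in key order, which is why row-key design drives scan performance.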
  11. HBase: Data Access Clients
      • Demo of the HBase shell
      • Java API
  12. HBase: API
      • Connection
      • DDL
      • DML
      • Filters
      • Hands-on
  13. HBase: API
      • Configuration: holds the details of where to find the cluster, plus tunable settings
      • HConnection: represents a connection to the cluster
      • HBaseAdmin: handles DDL operations (create, list, drop, alter)
      • HTable (HTableInterface): a handle on a single HBase table; sends "commands" to the table (Put, Get, Scan, Delete, Increment)
  14. HBase: API: DDL
      Group name: ddl (Data Definition Language)
      Commands: alter, create, describe, disable, drop, enable, exists, is_disabled, is_enabled, list
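  15. HBase: API: DDL

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.hbase.HBaseConfiguration;
          import org.apache.hadoop.hbase.HColumnDescriptor;
          import org.apache.hadoop.hbase.HTableDescriptor;
          import org.apache.hadoop.hbase.client.HBaseAdmin;

          Configuration conf = HBaseConfiguration.create();
          // Clients locate the cluster through ZooKeeper (60010 was the Master's web UI port)
          conf.set("hbase.zookeeper.quorum", "localhost");
          HBaseAdmin admin = new HBaseAdmin(conf);
          HTableDescriptor desc = new HTableDescriptor("testtable"); // no stray spaces in names
          desc.addFamily(new HColumnDescriptor("colfam1"));
          desc.addFamily(new HColumnDescriptor("colfam2"));
          admin.createTable(desc);
          admin.close();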
  16. HBase: API: DML
      Group name: dml (Data Manipulation Language)
      Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate
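  17. HBase: API: DML PUT

          Configuration conf = HBaseConfiguration.create();
          HTable table = new HTable(conf, "testtable");
          Put put = new Put(Bytes.toBytes("row1"));  // row key
          // add(family, qualifier, value): two cells in the same row and family
          put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), Bytes.toBytes("val1"));
          put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"), Bytes.toBytes("val2"));
          table.put(put);
          table.close();                             // release the table's resources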
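  18. HBase: API: DML GET

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.hbase.HBaseConfiguration;
          import org.apache.hadoop.hbase.client.Get;
          import org.apache.hadoop.hbase.client.HTable;
          import org.apache.hadoop.hbase.client.Result;
          import org.apache.hadoop.hbase.util.Bytes;

          Configuration conf = HBaseConfiguration.create();
          HTable table = new HTable(conf, "testtable");
          Get get = new Get(Bytes.toBytes("row1"));
          // Restrict the Get to the single column colfam1:qual1
          get.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
          Result result = table.get(get);
          byte[] val = result.getValue(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
          System.out.println("Value: " + Bytes.toString(val));
          table.close();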
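  19. HBase: API: DML SCAN

          // table: the open HTable from the previous slides
          Scan scan1 = new Scan();                 // no start/stop row: scans the whole table
          ResultScanner scanner1 = table.getScanner(scan1);
          try {
              for (Result res : scanner1) {
                  System.out.println(res);
              }
          } finally {
              scanner1.close();                    // always release the server-side scanner
          }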
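
Scans are often restricted to rows that share a key prefix, for example all rows of one user. One common way to do this is to start at the prefix and stop at the smallest key that sorts after every key with that prefix, found by incrementing the last byte that is not 0xFF; later HBase clients wrap the same idea in `Scan.setRowPrefixFilter`. The helper below only computes that stop row in plain Java so it runs without a cluster. `PrefixScan.stopRowForPrefix` is a hypothetical name, and passing its result to the scan's stop row is assumed rather than shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Computes the exclusive stop row for a prefix scan (hypothetical helper). */
public class PrefixScan {
    /**
     * Smallest key that sorts after every key starting with `prefix`.
     * Returns an empty array when no such key exists (empty or all-0xFF prefix),
     * which HBase treats as "no stop row", i.e. scan to the end of the table.
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1); // drop trailing 0xFF bytes
                stop[i]++;                                  // bump the last non-0xFF byte
                return stop;
            }
        }
        return new byte[0];
    }

    public static void main(String[] args) {
        byte[] prefix = "user#00".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(prefix);
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // user#01
    }
}
```

With start row `user#00` and stop row `user#01`, the scan returns exactly the rows whose keys begin with `user#00`, without reading the rest of the table.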
  20. Other Projects around HBase
      • SQL layer: Phoenix, Hive, Impala
      • Object persistence: Lily, Kundera
  21. Follow-Up
      • Part 2:
        ◦ Building a key-value data store in HBase
        ◦ Challenges we faced in SMART
      • {Rahul, vinay}@briotribes.com
  22. Shout-out To
  23. HBase: Use Case (Facebook)
      • Facebook Messaging:
        ◦ Titan
        ◦ 1.5M ops per second at peak
        ◦ 6B+ messages per day
        ◦ 16 columns per operation across different families
      • Facebook Insights:
        ◦ Puma
        ◦ Provides developers and Page owners with metrics about their content
        ◦ > 1M counter increments per second
