Basic concepts for implementing Create, Read, Update, and Delete operations on HBase using the Java API.


Published in: Software


  1. HBase CRUD: Use the Java API for Create, Read, Update, Delete operations
  2. Agenda: • Intro • Create • Insert • Update • Delete • Read – Table Scan • Read – Get Field • Conclusions
  3. Intro: A rowkey uniquely identifies each row in an HBase table, whereas the other key components, such as the column family, column qualifier, and timestamp, are used to locate a particular piece of data within the table. The HBase API provides the following operations to support CRUD: • Put • Get • Delete • Scan • Increment. The source code for this presentation is available on GitHub:
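To make these cell coordinates concrete, here is a minimal JDK-only sketch (not the HBase client API) that models a cell key as (row, family, qualifier, timestamp) and sorts keys the way HBase orders them: row, family, and qualifier ascending, timestamp descending, so the newest version of a cell comes first. The `CellKey` class and the `info` column family are hypothetical; real HBase compares raw byte arrays, which this sketch approximates with `String` comparison (equivalent for ASCII keys).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of an HBase cell key: row, column family, qualifier, timestamp.
// Real HBase compares raw byte[] keys; String comparison matches that for ASCII keys.
final class CellKey {
    final String row;
    final String family;
    final String qualifier;
    final long timestamp;

    CellKey(String row, String family, String qualifier, long timestamp) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
    }

    // Row, family and qualifier sort ascending; timestamp sorts descending,
    // so the newest version of a cell is encountered first.
    static final Comparator<CellKey> HBASE_ORDER =
            Comparator.comparing((CellKey k) -> k.row)
                    .thenComparing(k -> k.family)
                    .thenComparing(k -> k.qualifier)
                    .thenComparing(Comparator.comparingLong((CellKey k) -> k.timestamp).reversed());

    @Override
    public String toString() {
        return row + "/" + family + ":" + qualifier + "/" + timestamp;
    }
}

class CellKeyDemo {
    public static void main(String[] args) {
        List<CellKey> keys = new ArrayList<>();
        keys.add(new CellKey("PaulRK", "info", "user_name", 100L));
        keys.add(new CellKey("MikeRK", "info", "user_name", 100L));
        keys.add(new CellKey("MikeRK", "info", "user_name", 200L));
        keys.add(new CellKey("MikeRK", "info", "user_mail", 150L));
        keys.sort(CellKey.HBASE_ORDER);
        keys.forEach(System.out::println);
    }
}
```

Running `CellKeyDemo` lists all of MikeRK's cells before PaulRK's, and within MikeRK's `user_name` column the version at timestamp 200 comes before the one at 100.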
  4. Create: A new table is created in the ‘Enabled’ state. You can check the table creation in Hue (Cloudera CDH 5.1.0) and in the hbase shell.
  5. Insert: Use HConnection.getTable() instead of HTablePool, as the latter is deprecated in 0.94 and 0.95/0.96 and removed in 0.98.
  6. Insert: All table manipulations go through HTableInterface; HTable represents a particular table in HBase. The HTable class is not thread-safe, because concurrent modifications are not safe. Hence, an application should use a separate HTable instance for each thread. Multiple HTable instances created with the same configuration reference can share the same underlying HConnection instance. The RowKey is the main point to consider when designing the table structure. Because HBase stores rows sorted by key, sequential keys tend to land in the same region; use a compound RowKey built with a hashing algorithm such as SHA-1 or MD5 (with an additional reverse-timestamp part) to spread them out.
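A minimal JDK-only sketch of such a compound RowKey: an MD5 prefix of a natural key spreads writes across regions, and a reverse timestamp (`Long.MAX_VALUE - timestamp`) makes the newest entries for the same key sort first. The `userId` input, the choice of MD5 over SHA-1, and the 16 + 8 byte layout are assumptions for illustration, not the layout used in the presentation's source code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical compound rowkey: MD5(userId) (16 bytes) followed by a reverse timestamp (8 bytes).
final class RowKeys {
    private RowKeys() {}

    static byte[] compoundKey(String userId, long timestampMillis) {
        byte[] hash;
        try {
            hash = MessageDigest.getInstance("MD5")
                    .digest(userId.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
        // Reverse timestamp: newer (larger) timestamps give smaller values,
        // so newer rows with the same hash prefix sort first.
        long reverseTs = Long.MAX_VALUE - timestampMillis;
        // ByteBuffer is big-endian by default, so byte order matches numeric order
        // for the non-negative reverseTs values produced here.
        return ByteBuffer.allocate(hash.length + Long.BYTES)
                .put(hash)
                .putLong(reverseTs)
                .array();
    }

    // Unsigned lexicographic byte comparison, the order HBase uses for rowkeys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

With this layout, a scan over one user's hash prefix returns that user's newest rows first. The trade-off is that range scans in natural-key order (for example, all users from A to M) are no longer possible, because hashing destroys that ordering.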
  7. Update: Data in HBase is versioned. By default, HBase releases before 0.96 keep the last 3 values of each column (0.96 and later keep 1). Use the HColumnDescriptor.setMaxVersions(n) method to override this value.
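The following JDK-only sketch models a single column that retains at most `maxVersions` values, newest first, which is what `setMaxVersions(n)` configures. It is an in-memory illustration with a hypothetical `VersionedColumn` class, not HBase code: real HBase hides excess versions on read and physically removes them during compaction rather than at write time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of one HBase column with a version limit.
final class VersionedColumn {
    private final int maxVersions;
    // Timestamps sorted descending, so the first entry is the newest version.
    private final NavigableMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    VersionedColumn(int maxVersions) {
        if (maxVersions < 1) {
            throw new IllegalArgumentException("maxVersions must be >= 1");
        }
        this.maxVersions = maxVersions;
    }

    // An "update" is simply another Put with a newer timestamp.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        // Drop the oldest versions beyond the limit (real HBase does this at compaction).
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();
        }
    }

    // Newest value, or null if nothing has been written.
    String latest() {
        Map.Entry<Long, String> e = versions.firstEntry();
        return e == null ? null : e.getValue();
    }

    // All retained values, newest first.
    List<String> allVersions() {
        return new ArrayList<>(versions.values());
    }
}
```

For example, with `maxVersions = 3`, writing "a", "b", "c", "d" at timestamps 1 to 4 leaves "d", "c", "b", and the oldest value "a" is no longer visible.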
  8. Delete: After the delete, the value of the “user_name” qualifier reverts to its previous version.
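The behaviour on this slide, where deleting the newest version exposes the previous one, matches what `Delete.deleteColumn(family, qualifier)` does in the 0.98 client (it deletes only the latest version, while `deleteColumns` deletes all versions). The JDK-only sketch below models that effect on an in-memory map of versions; the `ColumnVersions` class, its methods, and the sample values are hypothetical. Real HBase writes a delete marker (tombstone) rather than removing data immediately.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of one column's versions (newest timestamp first).
final class ColumnVersions {
    private final NavigableMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Models deleting only the latest version: the previous value becomes visible.
    void deleteLatestVersion() {
        versions.pollFirstEntry();
    }

    // Models deleting every version of the column.
    void deleteAllVersions() {
        versions.clear();
    }

    // Newest visible value, or null if the column has no versions left.
    String get() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }
}
```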
  9. Read – Table Scan: Table Scan... PaulRK Paul
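A scan walks rows in sorted rowkey order, optionally bounded by a start row (inclusive) and a stop row (exclusive), which is how the start and stop rows of an HBase `Scan` behave. The JDK-only sketch below models that with a `TreeMap`; the `ScanModel` class and the `AnnRK`/`Ann` row are hypothetical, while `PaulRK`/`Paul` and `MikeRK`/`Mike` come from the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory "table": rowkey -> user_name, kept in sorted rowkey order.
final class ScanModel {
    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String userName) {
        rows.put(rowKey, userName);
    }

    // Full table scan: every row, in ascending rowkey order.
    List<String> scanAll() {
        return format(rows);
    }

    // Bounded scan: startRow inclusive, stopRow exclusive.
    List<String> scan(String startRow, String stopRow) {
        return format(rows.subMap(startRow, true, stopRow, false));
    }

    // Renders each row as "rowKey value", like the slide's output.
    private static List<String> format(Map<String, String> view) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : view.entrySet()) {
            out.add(e.getKey() + " " + e.getValue());
        }
        return out;
    }
}
```

Because the stop row is exclusive, `scan("MikeRK", "PaulRK")` returns MikeRK's row but not PaulRK's.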
  10. Read – Get Field: Get a particular field... rowKey = MikeRK, user_name: Mike; rowKey = MikeRK, user_mail:
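A Get addresses one row by its rowkey and, optionally, specific columns; in the HBase client, `Result.getValue(family, qualifier)` returns null when the requested cell does not exist. The JDK-only sketch below models that lookup; the `GetModel` class and the `info` column family are hypothetical, and the output format simply imitates the slide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model: rowkey -> (column "family:qualifier" -> latest value).
final class GetModel {
    private final Map<String, Map<String, String>> table = new HashMap<>();

    void put(String rowKey, String family, String qualifier, String value) {
        table.computeIfAbsent(rowKey, k -> new HashMap<>()).put(family + ":" + qualifier, value);
    }

    // Like a Get restricted to one column: null when the row or the cell does not exist.
    String get(String rowKey, String family, String qualifier) {
        Map<String, String> row = table.get(rowKey);
        return row == null ? null : row.get(family + ":" + qualifier);
    }

    // Formats a result like the slide, e.g. "rowKey = MikeRK, user_name: Mike".
    String describe(String rowKey, String family, String qualifier) {
        String value = get(rowKey, family, qualifier);
        return "rowKey = " + rowKey + ", " + qualifier + ": " + (value == null ? "" : value);
    }
}
```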
  11. Conclusions:
      • HTable is expensive. Creating HTable instances comes at a cost: each new instance scans the .META. table to check whether the table actually exists, which makes creation slow. Hence, creating a new HTable instance for each request is not recommended when the number of concurrent requests is very high.
      • Scan caching. A scan can be configured to retrieve a batch of rows in every RPC call it makes to HBase. This can be set per scanner with the setCaching(int) method on the Scan object, or as a client-wide default with the hbase.client.scanner.caching property in the hbase-site.xml configuration file.
      • Increment. Increment Column Value (ICV) is exposed both as an Increment command object, like the other operations, and as a method on HTableInterface. It changes an integral value stored in an HBase cell without reading it back first. The data manipulation happens in HBase, not in your client application, which makes it fast. It also avoids a possible race condition in which another client modifies the same cell between your read and your write.
      • Filter. A filter is a predicate that executes in HBase instead of on the client. When you specify a Filter in your Scan, HBase uses it to determine whether a record should be returned. This avoids a lot of unnecessary data transfer and keeps the filtering work on the server instead of placing that burden on the client. Any implementation of org.apache.hadoop.hbase.filter.Filter can be applied; HBase provides a number of built-in filters, and you can also write your own.
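To see why a server-side increment avoids the race condition, it helps to model the "server" as a store whose increment performs the read, the addition, and the write as one atomic step, which is the property `incrementColumnValue` on HTableInterface provides in HBase. The JDK-only sketch below uses `ConcurrentHashMap.merge` for that atomic step; the `CounterStore` class, the cell name, and the thread and iteration counts are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a server holding counter cells.
final class CounterStore {
    private final ConcurrentMap<String, Long> cells = new ConcurrentHashMap<>();

    // Atomic increment: read, add and write happen as one step on the "server",
    // so concurrent callers cannot overwrite each other's updates.
    long increment(String cell, long amount) {
        return cells.merge(cell, amount, Long::sum);
    }

    long get(String cell) {
        return cells.getOrDefault(cell, 0L);
    }

    // Starts `threads` threads that each increment the same cell `perThread` times.
    static long runConcurrently(int threads, int perThread) throws InterruptedException {
        CounterStore store = new CounterStore();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    store.increment("row1/info:visits", 1L);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return store.get("row1/info:visits");
    }
}
```

The final count always equals `threads * perThread`. A client-side version that reads the value, adds one, and writes it back can lose updates under the same workload, because two clients may read the same old value before either writes.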
  12. Thank you! (ushin.evgenij)