002 hbase clientapi

the HBase training series part II

Published in: Technology

  1. Scott Miao, 2012/7/5
  2. Agenda
     - Course credit
     - Hands-on
       - Install tm-puppet
     - Client API: the basics
     - Hands-on
       - Write your own CRUD code
     - Client API: advanced features
     - All material refers to HBase: The Definitive Guide - http://www.amazon.com/HBase-Definitive-Guide-Lars-George/dp/1449396100/ref=sr_1_1?ie=UTF8&qid=1339060175&sr=8-1
  3. General Notes (1/2)
     - All data-mutating operations are atomic on a per-row basis
     - Create HTable instances only once per thread
       - HTable is not thread-safe
       - HTablePool can be used instead
     - Get familiar with the API docs
  4. General Notes (2/2)
     - Configuration
       - Load hbase-default.xml and hbase-site.xml from the CLASSPATH
       - Set properties on the hadoop CLI
       - Set properties in Java code
       - Precedence: Java code > hadoop CLI > hbase-site.xml > hbase-default.xml
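The precedence order on this slide can be pictured as a lookup through layers, highest priority first. The following is a minimal plain-Java sketch of that idea; it is not HBase's own Configuration class, and the class name `LayeredConfig` and all property values in `demo()` are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of layered configuration lookup (hypothetical, not an HBase class). */
class LayeredConfig {
    // Layers ordered from highest to lowest precedence.
    private final List<Map<String, String>> layersHighToLow;

    LayeredConfig(List<Map<String, String>> layersHighToLow) {
        this.layersHighToLow = layersHighToLow;
    }

    /** Returns the value from the first layer that defines the key, or null if none does. */
    String get(String key) {
        for (Map<String, String> layer : layersHighToLow) {
            String value = layer.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Builds the four layers from the slide with made-up example values. */
    static LayeredConfig demo() {
        Map<String, String> javaCode = new HashMap<String, String>();
        javaCode.put("hbase.client.write.buffer", "4194304");

        Map<String, String> hadoopCli = new HashMap<String, String>();

        Map<String, String> hbaseSite = new HashMap<String, String>();
        hbaseSite.put("hbase.client.write.buffer", "1048576");
        hbaseSite.put("hbase.client.scanner.caching", "10");

        Map<String, String> hbaseDefault = new HashMap<String, String>();
        hbaseDefault.put("hbase.client.write.buffer", "2097152");
        hbaseDefault.put("hbase.client.scanner.caching", "1");
        hbaseDefault.put("hbase.regionserver.lease.period", "60000");

        return new LayeredConfig(Arrays.asList(javaCode, hadoopCli, hbaseSite, hbaseDefault));
    }
}
```

With these example layers, the write buffer size comes from the Java-code layer, the scanner caching from hbase-site.xml, and the lease period falls through to hbase-default.xml.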
  5. Client API: The basics
     - Put
     - Get
     - Delete
     - Batch operations
     - Row Locks
     - Scan
     - Source code: https://github.com/larsgeorge/hbase-book
  6. Put method - Single Put
     - ch03/client.PutExample
     - Notice the timestamp (ts)
       - HBase sets ts if the user does not provide one
       - ts determines the version of a cell in HBase (3 versions are kept by default)
       - A client-supplied ts may confuse HBase versioning if the client's time zone is not identical to the server's
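To see why a client-supplied timestamp can confuse versioning, it helps to model one cell as a map from timestamp to value that keeps only the newest N entries. This is a simplified plain-Java sketch, not HBase code: the class `VersionedCell` is hypothetical, and real HBase removes excess versions lazily rather than on every write.

```java
import java.util.Collections;
import java.util.TreeMap;

/** Plain-Java model of one cell with timestamp-based versions (hypothetical, not an HBase class). */
class VersionedCell {
    private final int maxVersions;
    // Reverse order: the first entry is always the newest timestamp.
    private final TreeMap<Long, String> versions =
            new TreeMap<Long, String>(Collections.reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    /** Stores a value under the given timestamp and drops the oldest versions beyond the limit. */
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry = smallest (oldest) timestamp
        }
    }

    /** Returns the value with the highest timestamp, regardless of write order. */
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    boolean hasVersion(long timestamp) {
        return versions.containsKey(timestamp);
    }

    int versionCount() {
        return versions.size();
    }
}
```

In this model, a write that arrives later but carries an older timestamp does not become the latest value, which is the kind of surprise the slide warns about.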
  7. Put method - KeyValue
     - The low-level data bytes in the client APIs
       - <row-key>/<family>:<qualifier>/<version>/<type>/<value-length>
       - Put.add(KeyValue kv);
       - Map<byte[], List<KeyValue>> Put.getFamilyMap();
  8. Put method - Client-side write buffer (1/3)
     - Collects put operations so that they are sent in one RPC call to the server(s)
     - ch03/client.PutWriteBufferExample sample code
       - long getWriteBufferSize()
       - void setWriteBufferSize(long writeBufferSize)
       - Default is 2 MB
       - Configuration property
         - hbase.client.write.buffer
  9. Put method - Client-side write buffer (2/3)
     - Round-trip time
       - The time it takes for a client to send a request and for the server to send a response over the network
       - Does not include the data-transfer time
         - Data size is a factor
       - On average, 1 ms on a LAN
         - 1000 round-trips per second
     - Use case
       - Small data size but many requests to send
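The effect of the write buffer on the number of round-trips can be estimated with a small model. This is a plain-Java sketch under simplifying assumptions: a single server, a flush as soon as the buffered bytes exceed the configured size, and a hypothetical class name `WriteBufferModel`; it does not call the HBase client.

```java
/** Plain-Java model that counts RPCs for buffered puts (hypothetical, not an HBase class). */
class WriteBufferModel {
    private final long bufferSizeBytes;
    private long bufferedBytes = 0;
    private int pendingPuts = 0;
    private int rpcCount = 0;

    WriteBufferModel(long bufferSizeBytes) {
        this.bufferSizeBytes = bufferSizeBytes;
    }

    /** Adds one put to the buffer and flushes once the buffer size is exceeded. */
    void put(long putSizeBytes) {
        bufferedBytes += putSizeBytes;
        pendingPuts++;
        if (bufferedBytes > bufferSizeBytes) {
            flush();
        }
    }

    /** Sends everything that is buffered as one RPC; does nothing when the buffer is empty. */
    void flush() {
        if (pendingPuts == 0) {
            return;
        }
        rpcCount++;
        bufferedBytes = 0;
        pendingPuts = 0;
    }

    int rpcCount() {
        return rpcCount;
    }

    /** Time spent on round-trips alone, ignoring data-transfer time. */
    static long roundTripMillis(int rpcs, long roundTripTimeMillis) {
        return rpcs * roundTripTimeMillis;
    }
}
```

For 10,000 puts of 1 KB each and a 2 MB buffer, this model needs 5 RPCs (about 5 ms of round-trip time at 1 ms each), versus 10,000 RPCs (about 10 s) without buffering.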
  10. Put method - Client-side write buffer (3/3)
  11. Put method - List of Puts
      - ch03/client.PutListExample
      - ch03/client.PutListErrorExample2
  12. Put method - Atomic compare-and-set
      - A check is performed before the put
      - ch03/client.CheckAndPutExample
      - The check and the put cannot span multiple rows
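The compare-and-set idea is that the check and the write happen as one atomic step on a single row. The sketch below models one row in plain Java; the class `CheckAndPutModel` and its method signature are assumptions for illustration and differ from the real HTable call, which takes byte arrays and a Put.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Plain-Java model of an atomic check-then-put on one row (hypothetical, not an HBase class). */
class CheckAndPutModel {
    // Column name ("family:qualifier") to value, for a single row.
    private final Map<String, String> row = new HashMap<String, String>();

    /**
     * Writes newValue to putColumn only if checkColumn currently holds expected.
     * An expected value of null means the checked column must not exist yet.
     * The method is synchronized so that the check and the put form one atomic step.
     */
    synchronized boolean checkAndPut(String checkColumn, String expected,
                                     String putColumn, String newValue) {
        if (!Objects.equals(row.get(checkColumn), expected)) {
            return false;
        }
        row.put(putColumn, newValue);
        return true;
    }

    synchronized String get(String column) {
        return row.get(column);
    }
}
```

A first call that expects the column to be absent succeeds; repeating the same call fails because the column now exists.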
  13. Get method - Single Gets
      - client.GetExample
      - Result class
        - Contains all the matching cells
  14. Get method - List of Gets
      - client.GetListExample
      - client.GetListErrorExample
      - client.GetRowOrBeforeExample
        - Finds the specified row key
        - Returns the previous row if it is not found
        - Returns null if nothing is found
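The row-or-before lookup corresponds to a floor lookup in a sorted map. The following plain-Java sketch uses a TreeMap with String keys as a stand-in for HBase's byte-ordered row keys; the class `RowOrBeforeModel` is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of a "row or the one before it" lookup (hypothetical, not an HBase class). */
class RowOrBeforeModel {
    // Sorted by row key, like rows in an HBase table.
    private final TreeMap<String, String> rows = new TreeMap<String, String>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    /** Returns the matching row key, else the closest smaller key, else null. */
    String getRowOrBefore(String rowKey) {
        Map.Entry<String, String> entry = rows.floorEntry(rowKey);
        return entry == null ? null : entry.getKey();
    }
}
```

With rows row1, row2, and row4 stored, looking up row3 yields row2, and looking up row0 yields null.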
  15. Delete method - Single Deletes
      - client.DeleteExample
  16. Delete method - List of Deletes
      - client.DeleteListExample
      - client.DeleteListErrorExample
  17. Delete method - Atomic compare-and-delete
      - client.CheckAndDeleteExample
  18. Batch Operations
      - client.BatchExample
        - Works like Put operations, but does not use the client-side write buffer
  19. Row Locks (1/3)
      - Two types of lock
        - Server-side lock
          - Servers create a lock implicitly on your behalf, just for the duration of the call
        - Client-side lock
          - Clients can also acquire explicit locks and use them across multiple operations on the same row
          - RowLock class
      - client.RowLockExample
  20. Row Locks (2/3)
      - Avoid using row locks whenever possible
      - Do Gets require a lock?
        - No. While a mutation is in progress, all reading clients see the previous state of all columns
  21. Row Locks (3/3)
      - When is a RowLock released?
        - When the current lock is released explicitly
        - When the lease on the lock has expired
          - Configuration property on the server side
            - hbase.regionserver.lease.period
            - Default is 1 min.
          - org.apache.hadoop.hbase.regionserver.LeaseException:
  22. Scans (1/3)
      - A technique akin to cursors in database systems
        - Scans make use of the underlying sequential, sorted storage layout that HBase provides
      - Narrowing the scan's scope plays to the strengths of HBase
        - Since data is stored in column families, the storage files of unrelated families are not read at all
  23. Scans (2/3)
      - Scan(byte[] startRow, byte[] stopRow)
        - [startRow, stopRow)
      - Scan addFamily(byte[] family)
      - Scan addColumn(byte[] family, byte[] qualifier)
      - Scan setTimeRange(long minStamp, long maxStamp) throws IOException
      - Scan setTimeStamp(long timestamp)
      - Scan setMaxVersions()
      - Scan setMaxVersions(int maxVersions)
      - Scan setFilter(Filter filter)
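The half-open range [startRow, stopRow) means the start row is included and the stop row is not. A sorted map shows the same behavior; the sketch below is plain Java with String keys standing in for byte-array row keys, and the class `ScanRangeModel` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Plain-Java model of a scan over a half-open row range (hypothetical, not an HBase class). */
class ScanRangeModel {
    /** Returns the row keys in [startRow, stopRow): start inclusive, stop exclusive. */
    static List<String> scan(TreeMap<String, String> table, String startRow, String stopRow) {
        return new ArrayList<String>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    /** Builds a small sorted table with keys row-1 .. row-5 for demonstration. */
    static TreeMap<String, String> demoTable() {
        TreeMap<String, String> table = new TreeMap<String, String>();
        for (int i = 1; i <= 5; i++) {
            table.put("row-" + i, "value-" + i);
        }
        return table;
    }
}
```

Scanning the demo table from row-2 to row-4 returns row-2 and row-3 only.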
  24. Scans (3/3)
      - ch03/client.ScanExample
      - Scans do not ship all the matching rows to the client in one RPC
        - One call would use up too many resources and take a long time
      - ResultScanner wraps the Result instance for each row and exposes them through an iterator
        - Just like a JDBC ResultSet
  25. Scans - Caching (1/2)
      - For small rows in a huge data set
      - Table level
        - void HTable.setScannerCaching(int scannerCaching)
      - Scan level
        - void Scan.setCaching(int caching)
      - In the configuration file (hbase-site.xml)
        - hbase.client.scanner.caching
        - Whether it takes effect depends on whether you set it on the client or the server side
  26. Scans - Caching (2/2)
      - Need to find a sweet spot between
        - A low number of RPCs
        - The memory used on the client and server side
      - Scanners use the same lease-based mechanism as RowLock
        - org.apache.hadoop.hbase.client.ScannerTimeoutException: 65094ms passed since the last invocation, timeout is currently set to 60000
      - ch03/client.ScanTimeoutExample
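The lease mechanism behind the timeout message above can be sketched as a timer that must be renewed within the lease period. This is a plain-Java model under the assumption that each call renews the lease; the class `LeaseModel` and its exception type are illustrative and are not the HBase implementation.

```java
/** Plain-Java model of a lease that expires when it is not renewed in time (hypothetical). */
class LeaseModel {
    private final long leasePeriodMillis;
    private long lastInvocationMillis;

    LeaseModel(long leasePeriodMillis, long nowMillis) {
        this.leasePeriodMillis = leasePeriodMillis;
        this.lastInvocationMillis = nowMillis;
    }

    /**
     * Renews the lease if it is still valid; otherwise throws.
     * The caller passes the current time, which keeps the model deterministic.
     */
    void touch(long nowMillis) {
        long elapsed = nowMillis - lastInvocationMillis;
        if (elapsed > leasePeriodMillis) {
            throw new IllegalStateException(elapsed
                    + "ms passed since the last invocation, timeout is currently set to "
                    + leasePeriodMillis);
        }
        lastInvocationMillis = nowMillis;
    }
}
```

A client that spends more than the lease period processing one batch of cached rows before asking for the next one would hit this condition, which is one reason not to set the caching value too high.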
  27. Scans - Batching
      - For very large rows
        - Rows that do not fit into the memory of the client process
        - Batching works on the column level
      - void Scan.setBatch(int batch)
      - For example, if your row has 17 columns and you set the batch to 5
        - You get four Result instances, with 5, 5, 5, and 2 columns
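The 17-column example can be checked with a few lines of arithmetic. The helper below is a plain-Java sketch; the class `ScanBatchModel` is hypothetical and only computes how many columns each Result would carry.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of how batching splits one row into several Results (hypothetical). */
class ScanBatchModel {
    /** Returns the number of columns carried by each Result for a single row. */
    static List<Integer> resultSizes(int columnsInRow, int batch) {
        if (batch <= 0) {
            throw new IllegalArgumentException("batch must be positive");
        }
        List<Integer> sizes = new ArrayList<Integer>();
        for (int remaining = columnsInRow; remaining > 0; remaining -= batch) {
            sizes.add(Math.min(batch, remaining));
        }
        return sizes;
    }
}
```

Calling `ScanBatchModel.resultSizes(17, 5)` gives 5, 5, 5, 2, matching the slide.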
  28. Scans - Caching & Batching (1/3)
      - ch03/client.ScanCacheBatchExample
      - 10 rows * 20 columns per row = 200 columns
  29. Scans - Caching & Batching (2/3)
      - RPCs = (Rows * Cols per Row) / Min(Cols per Row, Batch Size) / Scanner Caching
             = (10 * 20) / Min(20, 20) / 5
             = 200 / 20 / 5
             = 2
      - Plus 1 or 2 requests to open/close the scanner: 3 or 4 RPCs in total
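The formula on this slide is easy to turn into a small calculator. The sketch below is plain Java with a hypothetical class name `ScanRpcEstimate`; rounding up when the numbers do not divide evenly is an assumption added here, since the slide only shows a case that divides cleanly.

```java
/** Plain-Java calculator for the RPC estimate on the slide (hypothetical helper). */
class ScanRpcEstimate {
    /**
     * Data-carrying RPCs = (rows * colsPerRow) / min(colsPerRow, batch) / caching.
     * Uneven divisions are rounded up (an assumption, not stated on the slide).
     */
    static int dataRpcs(int rows, int colsPerRow, int batch, int caching) {
        int columnsPerResult = Math.min(colsPerRow, batch);
        int resultsPerRow = (colsPerRow + columnsPerResult - 1) / columnsPerResult;
        int results = rows * resultsPerRow;
        return (results + caching - 1) / caching;
    }
}
```

`ScanRpcEstimate.dataRpcs(10, 20, 20, 5)` returns 2; adding the 1 or 2 requests for opening and closing the scanner gives the 3 or 4 from the slide.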
  30. Scans - Caching & Batching (3/3)
      - 1 table, 9 rows, with some columns
      - Caching set to 6, batch set to 3
  31. Hands-On - Write your own CRUD code (1/3)
      - In the hbase shell
        - Create a table
          - A table 'MY_SECOND_TABLE'
          - With two column families, 'FAM_1' and 'FAM_2'
      - In Java code
        - Put
          - Two values
        - Scan the table
        - Delete
          - One value
        - Get the one remaining value
  32. Hands-On - Write your own CRUD code (2/3)
      - Environment
        - Let project_home = ${git_home}/hbase-training/002/hands-on/${your_name}
        - mkdir ${project_home}
        - cp -rf ${git_home}/hbase-training/002/projects/training-002 ${project_home}
      - Write your Java code in
        - ${project_home}/src/main/java/client/CrudTest.java
  33. Hands-On - Write your own CRUD code (3/3)
      - Requirements
        - After you have completed your code
        - Run these commands in ${project_home}
          - Build the jar file
            - mvn clean package
          - Run the jar file you built
            - sh bin/run.sh > output.txt
          - output.txt
            - Shows that the code ran successfully and prints the HBase data
            - I will verify this file in git
      - Commit and push to git
  34. Client API: Advanced Features
      - Filters
      - Counters
      - Coprocessors
      - HTablePool
      - Connection Handling
  35. Filters
      - Get
        - Direct access to data
      - Scan
        - Use start/end key
      - Filters
        - More limiting selectors for the query
        - Applied on the server side
        - Including
          - Column families, column qualifiers, timestamps or ranges, version number
  36. Filters - How Filters work
  37. Filters - Hierarchy
      - Various Filter implementations for your needs
      - You can also write your own implementation
  38. Filters - Comparison Filters
      - They take a comparison operator and a comparator instance

      Class Name              Description
      RowFilter               Filters based on the row key
      FamilyFilter            Filters based on the column family
      QualifierFilter         Filters based on the column qualifier
      ValueFilter             Filters based on the column value
      DependentColumnFilter   Uses the timestamp of the reference column and includes all other columns that have the same timestamp
  39. Filters - CompareFilter Operators
  40. Filters - CompareFilter Comparators
  41. Filters - CompareFilter example
      - ch04/filters.RowFilterExample
  42. Filters - Dedicated Filters
      - Mainly used in Scans; they basically filter out entire rows

      Class Name                       Description
      SingleColumnValueFilter          Filters based on the value of a single column
      SingleColumnValueExcludeFilter   Like SingleColumnValueFilter, but the tested column is excluded from the result
      PrefixFilter                     All rows that match this prefix are returned to the client
      PageFilter                       Controls how many rows per page are returned
      KeyOnlyFilter                    Accesses just the keys of each KeyValue, omitting the actual data
      FirstKeyOnlyFilter               Accesses the key of the first column in each row and bypasses the rest
  43. Filters - Dedicated Filters

      Class Name               Description
      InclusiveStopFilter      Changes the Scan range from [startRow, stopRow) to [startRow, stopRow]
      TimestampFilter          Returns only cells whose timestamp (version) is in the specified list of timestamps (versions)
      ColumnCountGetFilter     Returns only the first N columns of a row; intended for HBase tests
      ColumnPaginationFilter   Similar to PageFilter, but pages through the columns in a row
      ColumnPrefixFilter       Analogous to PrefixFilter, which filters on row key prefixes; this filter does the same for columns
      RandomRowFilter          Includes random rows in the result
  44. Filters - Decorating Filters

      Class Name         Description
      SkipFilter         Wraps a given filter and extends it to exclude an entire row when the wrapped filter hints for a KeyValue to be skipped
      WhileMatchFilter   Aborts the entire scan once a piece of information is filtered
  45. Filters - FilterList
      - In practice, you may want more than one filter applied to reduce the data returned to your client application
      - Operators
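HBase's FilterList combines its member filters with one of two operators, MUST_PASS_ALL or MUST_PASS_ONE. The sketch below models that combination with plain Java predicates rather than HBase filters; the class `FilterListModel` is hypothetical and evaluates whole items instead of individual KeyValues.

```java
import java.util.List;
import java.util.function.Predicate;

/** Plain-Java model of combining filters with AND / OR semantics (hypothetical, not an HBase class). */
class FilterListModel {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    /** Builds one predicate from many: all must pass, or at least one must pass. */
    static <T> Predicate<T> combine(final Operator op, final List<Predicate<T>> filters) {
        return item -> {
            for (Predicate<T> filter : filters) {
                boolean passed = filter.test(item);
                if (op == Operator.MUST_PASS_ALL && !passed) {
                    return false; // one failure is enough to reject
                }
                if (op == Operator.MUST_PASS_ONE && passed) {
                    return true; // one success is enough to accept
                }
            }
            // No early exit: ALL means everything passed, ONE means nothing passed.
            return op == Operator.MUST_PASS_ALL;
        };
    }
}
```

With a "starts with row-" filter and an "ends with 5" filter, row-4 passes under MUST_PASS_ONE but not under MUST_PASS_ALL.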
  46. Filters - FilterList
      - ch04/filters.FilterListExample
        - How the filters of the first scan are combined
        - How the filters of the second scan are combined
  47. Filters - Custom Filters
      - If none of the existing filters meets your needs
        - You can write one yourself
      - Write a Filter implementation that extends
        - Filter
        - FilterBase
  48. Filters - Custom Filters
      - ch04/filters.CustomFilter
      - ch04/filters.CustomFilterExample
      - Custom filter deployment (costly)
        1. Build the jar file
        2. Put the jar file on every region server
        3. Append the jar file path to $CLASSPATH in hbase-env.sh
        4. Restart all HBase daemons
  49. Filters - Summary
  50. Filters - Summary
  51. Counters
      - Many applications collect statistics
        - Such as clicks or views in online advertising
        - They used to collect the data in logfiles that would subsequently be analyzed
      - The counter is all you need!
  52. Counters - shell
      - Create a table
        - create 'counters', 'daily', 'weekly', 'monthly'
      - Initialize a counter
        - incr 'counters', '20110101', 'daily:hits', 1
        - Let's do it twice
      - Get your counter
        - get_counter 'counters', '20110101', 'daily:hits'
  53. Counters - shell
      - You can also fine-tune your counter
        - incr 'counters', '20110101', 'daily:hits', 0
        - incr 'counters', '20110101', 'daily:hits', -1
      - Do not use put in place of incr, even though a counter is also a value
        - Data type issue: long vs. String
      - Use get_counter, not get
        - Its output is more human-readable
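The "long vs. String" issue comes down to byte layout: a counter is stored as an 8-byte long, while a value written as text is stored as the bytes of its characters. The sketch below shows this in plain Java; the class `CounterEncodingModel` is hypothetical and only mimics the encoding, not the HBase increment call.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Plain-Java model of counter storage as an 8-byte long (hypothetical, not an HBase class). */
class CounterEncodingModel {
    /** Encodes a long as 8 big-endian bytes, the layout a counter cell is assumed to use. */
    static byte[] asLong(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    /** Encodes the text form, which is what a put of a string value would store. */
    static byte[] asString(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    static long read(byte[] stored) {
        if (stored.length != 8) {
            throw new IllegalArgumentException("not an 8-byte long: " + stored.length + " bytes");
        }
        return ByteBuffer.wrap(stored).getLong();
    }

    /** Adds delta to a stored counter; a delta of 0 just reads, a negative delta decrements. */
    static byte[] increment(byte[] stored, long delta) {
        return asLong(read(stored) + delta);
    }
}
```

The text "1" occupies a single byte, so treating it as a counter fails in this model, which illustrates why a counter should only ever be written through incr.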
  54. Counters - API
      - Single counters
        - ch04/client.IncrementSingleExample
      - Multiple counters
        - ch04/client.IncrementMultipleExample
  55. Coprocessors
      - With the coprocessor feature in HBase, you can even move part of the computation to where the data lives
      - It acts like a small MapReduce framework that can distribute the work across the entire cluster
  56. Coprocessors
      - Two types
        - Observer
          - Trigger-like
        - Endpoint
          - Stored procedure-like
      - Use cases
        - Aggregate functions: sum(), avg()
        - Integrity checks: when some data is put, other data must exist
        - Authentication, authorization, and auditing
          - Based on coprocessors since HBase 0.92
  57. Coprocessors - Coprocessor Class
      - Priorities are defined in the Coprocessor.Priority enumeration
  58. Coprocessors - Coprocessor Class
      - States are defined in the Coprocessor.State enumeration
  59. Coprocessor - Main Classes
  60. Coprocessor - Flow
  61. Coprocessor - Loading from Configuration
      - Add the following description to hbase-site.xml
      - Region, master, and WAL are different observer types
      - The order of the fully qualified class names in the value determines the execution order
      - Deploy the classes the same way as custom filters
      - Applies to every table and region
  62. Coprocessor - Loading from table descriptor
      - Use HTableDescriptor.setValue(String key, String value)
      - Key spec.
        - COPROCESSOR[$<number>]
        - Ex.
          - COPROCESSOR$1
      - Value spec.
        - <jarFilePath>|<classFullyQualifiedName>|<priority>
        - Ex.
          - "hdfs://localhost:8020/users/leon/test.jar|coprocessor.Test|SYSTEM"
        - jarFilePath can use any protocol supported by the Hadoop FileSystem class
      - ch04/coprocessor.LoadWithTableDescriptorExample
      - Only for regions of the specified table
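The value spec on this slide is a pipe-separated string with three fields. The sketch below parses and formats such a string in plain Java so the structure is easy to see; the class `CoprocessorSpec` is a hypothetical helper, not part of HBase, and it accepts only the three-field form shown here.

```java
/** Plain-Java helper for the "<jarFilePath>|<class>|<priority>" value format (hypothetical). */
class CoprocessorSpec {
    final String jarFilePath;
    final String className;
    final String priority;

    CoprocessorSpec(String jarFilePath, String className, String priority) {
        this.jarFilePath = jarFilePath;
        this.className = className;
        this.priority = priority;
    }

    /** Splits the value on '|' and expects exactly three fields. */
    static CoprocessorSpec parse(String value) {
        String[] parts = value.split("\\|", -1); // -1 keeps empty trailing fields
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 fields but found " + parts.length);
        }
        return new CoprocessorSpec(parts[0].trim(), parts[1].trim(), parts[2].trim());
    }

    /** Builds the string that would be passed as the table descriptor value. */
    String format() {
        return jarFilePath + "|" + className + "|" + priority;
    }

    /** Builds the key for the n-th coprocessor, for example COPROCESSOR$1. */
    static String key(int number) {
        return "COPROCESSOR$" + number;
    }
}
```

Parsing the example value from the slide yields the jar path hdfs://localhost:8020/users/leon/test.jar, the class coprocessor.Test, and the priority SYSTEM.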
  63. Coprocessor - Observer
      - Callback functions (hooks) are executed when certain events occur
      - Known as triggers in a DBMS

      Observer Type    Description
      RegionObserver   Observes events bound to the regions of a table
      MasterObserver   Observes events bound to administrative or DDL-type operations (cluster-wide events)
      WALObserver      Observes events bound to WAL (write-ahead log) processing
  64. Coprocessor - Observer main classes
  65. Coprocessor - RegionObserver and Region Life Cycle
  66. Coprocessor - RegionObserver Classes
      - Handling region life cycle events
      - Handling client API events
      - ch04/coprocessor.RegionObserverExample
  67. Coprocessor - MasterObserver Classes
      - ch04/coprocessor.MasterObserverExample
  68. Coprocessor - Endpoint
      - User code can be deployed to the servers hosting the data, for example to perform server-local computations
      - Known as stored procedures in a DBMS
      - Can be combined with observer implementations to interact directly with the server-side state
  69. Coprocessor - Endpoint main classes
      - ch04/coprocessor.RowCountProtocol
      - ch04/coprocessor.RowCountEndpoint
      - ch04/coprocessor.EndpointExample
      - ch04/coprocessor.EndpointProxyExample
  70. Coprocessor - Single Region vs. Range of Regions
  71. HTablePool
      - Creating an HTable instance takes a few seconds to complete
      - That is not acceptable in a highly contended environment with thousands of requests per second
      - You could keep one HTable instance for multiple uses, but it is not thread-safe
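The idea behind a table pool is to pay the creation cost once and hand instances out to one caller at a time. The sketch below is a generic plain-Java pool that illustrates the pattern; the class `SimplePool` and its methods are assumptions for illustration and are not the HTablePool API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Plain-Java model of an object pool for expensive instances (hypothetical, not HTablePool). */
class SimplePool<T> {
    private final Deque<T> idle = new ArrayDeque<T>();
    private final Supplier<T> factory;
    private final int maxIdle;
    private int createdCount = 0;

    SimplePool(Supplier<T> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    /** Hands out an idle instance if one exists, otherwise creates a new one. */
    synchronized T borrow() {
        T instance = idle.pollFirst();
        if (instance == null) {
            createdCount++;
            instance = factory.get();
        }
        return instance;
    }

    /** Returns an instance for reuse; instances beyond maxIdle are simply dropped. */
    synchronized void giveBack(T instance) {
        if (idle.size() < maxIdle) {
            idle.addFirst(instance);
        }
    }

    synchronized int createdCount() {
        return createdCount;
    }
}
```

Each borrowed instance is used by one thread at a time, which is how a pool works around the fact that the pooled object itself is not thread-safe.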
  72. HTablePool - Sample code
  73. Connection Handling
      - Use the shared connection whenever you can
  74. Connection Handling - Main Classes
  75. Connection Handling - Features
      - Shares ZooKeeper connections
        - Used for the initial lookup of where user table regions are located
      - Caches common resources
        - Region locations are cached on the client side after the first round-trips to ZooKeeper and other servers
        - When a lookup fails
          - Ex. a region was split
          - A built-in retry mechanism refreshes the stale cache information
      - Do not forget to release your shared connection
        - HTable.close()
        - HTablePool.closeTablePool(…)
  76. Intermission