
003 admin featuresandclients


Published in: Technology


  1. HBase Admin API & Available Clients (Scott Miao, 2012/7/12)
  2. Agenda
     • Course Credit
     • HBaseAdmin APIs: HTableDescriptor, HColumnDescriptor, HBaseAdmin
     • Available Clients: Interactive Clients, Batch Clients, Shell, Web-based UI
  3. Course Credit
     • Showing up: 30 points
     • Asking questions: 5 points per question
     • Hands-on: 40 points
     • 70 points are required to pass this course
     • Credit is calculated once for each finished course
     • The result will be sent by mail to you and your supervisor
  4. Hadoop RPC framework
     • Writable interface
       - void write(DataOutput out) throws IOException; serializes the object's data so it can be sent to the remote side
       - void readFields(DataInput in) throws IOException; deserializes the remote data into a new instance for subsequent operations
     • Parameterless constructor
       - Hadoop instantiates an empty object
       - It then calls readFields() to deserialize the remote data into that object
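A minimal, self-contained sketch of the Writable contract described above. To avoid a Hadoop dependency it declares its own `Writable` interface with the same two method signatures as `org.apache.hadoop.io.Writable`; the `UserRecord` class and the `roundTrip()` helper are hypothetical and only simulate sending an object to a remote side through a byte buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableRoundTrip {

    // Local stand-in with the same two methods as org.apache.hadoop.io.Writable
    // (assumption: declared here so the sketch compiles without Hadoop).
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical payload type, used only for illustration.
    static class UserRecord implements Writable {
        String name;
        long lastLogin;

        // Parameterless constructor: the receiver first creates an empty instance.
        UserRecord() {}

        UserRecord(String name, long lastLogin) {
            this.name = name;
            this.lastLogin = lastLogin;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            // Serialize the fields in a fixed order.
            out.writeUTF(name);
            out.writeLong(lastLogin);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Read the fields back in exactly the order write() used.
            name = in.readUTF();
            lastLogin = in.readLong();
        }
    }

    // Simulates an RPC hop: serialize into bytes, then instantiate an empty
    // object and let readFields() populate it from those bytes.
    static UserRecord roundTrip(UserRecord original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            original.write(out);
            out.flush();

            UserRecord copy = new UserRecord();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        UserRecord copy = roundTrip(new UserRecord("alice", 42L));
        System.out.println(copy.name + " " + copy.lastLogin); // prints "alice 42"
    }
}
```

The essential constraint is symmetry: readFields() must consume the fields in the same order and with the same types that write() produced them.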
  5. 5. HTableDescriptor Constructor HTableDescriptor(); HTableDescriptor(String name); HTableDescriptor(byte[] name); HTableDescriptor(HTableDescriptor desc); ch05/admin.CreateTableExample Can be used to fine-tune the table’s performance5
  6. HTableDescriptor – Logical vs. physical views
  7. HTableDescriptor – Properties
     • Name: specifies the table name.
       byte[] getName(); String getNameAsString(); void setName(byte[] name);
     • Column families: specifies the column families of the table.
       void addFamily(HColumnDescriptor family); boolean hasFamily(byte[] c); HColumnDescriptor[] getColumnFamilies(); HColumnDescriptor getFamily(byte[] column); HColumnDescriptor removeFamily(byte[] column);
     • Maximum file size: specifies the maximum size a region within the table can grow to.
       long getMaxFileSize(); void setMaxFileSize(long maxFileSize);
       It is really about the maximum size of each store, so a better name would be maxStoreSize. The default is 256 MB; a larger value may be required when you have a lot of data.
  8. HTableDescriptor – Properties (continued)
     • Read-only: by default, all tables are writable. If the flag is set to true, you can only read from the table and not modify it at all.
       boolean isReadOnly(); void setReadOnly(boolean readOnly);
     • MemStore flush size: the MemStore is an in-memory store that buffers values before writing them to disk as a new storage file; this setting is the size at which it gets flushed. The default is 64 MB.
       long getMemStoreFlushSize(); void setMemStoreFlushSize(long memstoreFlushSize);
     • Deferred log flush: when enabled, write-ahead-log entries are synced to disk periodically in the background instead of on every write. The default is false.
       synchronized boolean isDeferredLogFlush(); void setDeferredLogFlush(boolean isDeferredLogFlush);
     • Miscellaneous options: arbitrary key/value pairs stored with the table definition that can be retrieved if necessary.
       byte[] getValue(byte[] key); String getValue(String key); Map<ImmutableBytesWritable, ImmutableBytesWritable> getValues(); void setValue(byte[] key, byte[] value); void setValue(String key, String value); void remove(byte[] key);
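To make the MemStore flush size concrete, here is a toy model, not HBase's implementation: writes go into a sorted in-memory buffer, and once the buffered bytes reach the configured threshold the buffer is written out as a new immutable "store file". Sizes are approximated as key bytes plus value bytes, and a repeated key simply overwrites its value, whereas real HBase tracks heap usage and keeps multiple versions. The `flush()` method plays the role of a manual flush request; the class and all its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of a MemStore with a flush threshold (illustrative only).
public class MemStoreModel {
    private final long flushSize;  // analogous to HTableDescriptor.setMemStoreFlushSize()
    private TreeMap<String, String> memstore = new TreeMap<>();  // kept sorted by key
    private long currentSize = 0;
    private final List<SortedMap<String, String>> storeFiles = new ArrayList<>();

    public MemStoreModel(long flushSize) {
        this.flushSize = flushSize;
    }

    public void put(String key, String value) {
        String previous = memstore.put(key, value);
        if (previous != null) {
            currentSize -= sizeOf(key, previous);  // the overwritten value no longer counts
        }
        currentSize += sizeOf(key, value);
        if (currentSize >= flushSize) {
            flush();  // threshold reached: persist automatically
        }
    }

    // Writes the buffered, already sorted entries out as a new immutable "store file".
    public void flush() {
        if (memstore.isEmpty()) {
            return;
        }
        storeFiles.add(Collections.unmodifiableSortedMap(memstore));
        memstore = new TreeMap<>();
        currentSize = 0;
    }

    public int storeFileCount() {
        return storeFiles.size();
    }

    public long bufferedBytes() {
        return currentSize;
    }

    private static long sizeOf(String key, String value) {
        return key.getBytes(StandardCharsets.UTF_8).length
                + value.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

In this model a smaller threshold simply produces more, smaller store files, while a larger one keeps more data in memory between flushes.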
  9. 9. HColumnDescriptor A more appropriate name would be HColumnFamilyDescriptor The family name must be printable You cannot simply rename them later Constructor HColumnDescriptor(); HColumnDescriptor(String familyName), HColumnDescriptor(byte[] familyName); HColumnDescriptor(HColumnDescriptor desc); HColumnDescriptor(byte[] familyName,int maxVersions,String compression, boolean inMemory,boolean blockCacheEnabled,int timeToLive, String bloomFilter); HColumnDescriptor(byte [] familyName,int maxVersions,String compression, boolean inMemory,boolean blockCacheEnabled,int blocksize, int timeToLive,String bloomFilter,int scope);9
  10. HColumnDescriptor – Column families vs. store files
  11. HColumnDescriptor – Properties
     • Name: specifies the column family name. A column family cannot be renamed; instead, create a new family with the desired name and copy the data over using the API.
       byte[] getName(); String getNameAsString();
     • Maximum versions: a form of predicate deletion; specifies how many versions of each value you want to keep. The default is 3.
       int getMaxVersions(); void setMaxVersions(int maxVersions);
     • Compression: HBase has pluggable compression algorithm support. The default is NONE.
  12. HColumnDescriptor – Properties (continued)
     • Block size: all store files are divided into smaller blocks that are loaded during a get or scan operation. The default is 64 KB (HDFS, by contrast, uses a block size of 64 MB by default).
       synchronized int getBlocksize(); void setBlocksize(int s);
     • Block cache: HBase reads entire blocks of data for efficient I/O and retains these blocks in an in-memory cache so that subsequent reads do not need any disk operation. The default is true.
       boolean isBlockCacheEnabled(); void setBlockCacheEnabled(boolean blockCacheEnabled);
       If your use case only ever performs sequential reads on a particular column family, it is advisable to disable it.
     • Time-to-live (TTL): a form of predicate deletion. A threshold, in seconds, based on the timestamp of a value; internal housekeeping automatically checks whether a value has exceeded its TTL and removes it.
       int getTimeToLive(); void setTimeToLive(int timeToLive);
       By default, values are kept forever (TTL set to Integer.MAX_VALUE).
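Maximum versions and TTL are both forms of predicate deletion. The sketch below models, for a single column, which cell timestamps (in milliseconds) would still be visible given maxVersions and a TTL in seconds. It is a simplified illustration that assumes a cell counts as expired when its timestamp is older than "now minus TTL"; HBase applies these rules inside its read path and compactions, not through a method like the hypothetical `surviving()` shown here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified model of max-versions + TTL "predicate deletion" for one column.
public class VersionRetention {
    // Integer.MAX_VALUE is the "keep forever" default mentioned on the slide.
    public static final int FOREVER = Integer.MAX_VALUE;

    // Returns the timestamps (ms) that survive, newest first.
    public static List<Long> surviving(List<Long> timestamps, int maxVersions,
                                       int ttlSeconds, long nowMillis) {
        long oldestAllowed = (ttlSeconds == FOREVER)
                ? Long.MIN_VALUE
                : nowMillis - ttlSeconds * 1000L;  // TTL is given in seconds
        List<Long> newestFirst = new ArrayList<>(timestamps);
        newestFirst.sort(Comparator.reverseOrder());

        List<Long> kept = new ArrayList<>();
        for (long ts : newestFirst) {
            if (kept.size() >= maxVersions) {
                break;  // versions beyond maxVersions are dropped
            }
            if (ts < oldestAllowed) {
                break;  // expired; every remaining cell is older, hence expired too
            }
            kept.add(ts);
        }
        return kept;
    }
}
```

For example, with timestamps 1000 to 5000, maxVersions = 3, and the FOREVER default, only 5000, 4000, and 3000 remain; adding a 6-second TTL at now = 10000 further trims the result to 5000 and 4000.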
  13. HColumnDescriptor – Properties (continued)
     • In-memory: builds on the block cache, which HBase uses to keep entire blocks of data in memory for efficient sequential access to values; setting the flag gives this family's blocks a higher priority to stay cached. The default is false.
       boolean isInMemory(); void setInMemory(boolean inMemory);
       Good for small column families with few values, such as the passwords of a user table, so that logins can be processed very fast.
     • Bloom filter: improves lookup times, given a specific access pattern. Since Bloom filters add overhead in terms of storage and memory, they are turned off by default.
     • Replication scope: enables multiple clusters that ship local updates across the network so that they are applied to the remote copies. The default is 0, meaning the family is not replicated.
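To show why a Bloom filter speeds up lookups, here is a generic toy Bloom filter over row keys. A negative answer means the key is definitely absent, so a store file could be skipped without reading it; a positive answer only means "maybe present". The bit-array size, the FNV-1a hash, and the double-hashing scheme are choices made for this sketch and are not HBase's actual Bloom filter implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Toy Bloom filter over row keys (illustrative only).
public class ToyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public ToyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    public void add(String rowKey) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(rowKey, i));
        }
    }

    // false => definitely absent (the file could be skipped); true => maybe present.
    public boolean mightContain(String rowKey) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(rowKey, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: probe i uses h1 + i * h2, deriving k positions from two hashes.
    private int index(String key, int i) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(data);
        int h2 = fnv1a(data) | 1;  // keep h2 odd so successive probes differ
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // 32-bit FNV-1a hash.
    private static int fnv1a(byte[] data) {
        int hash = 0x811C9DC5;
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }
}
```

The overhead mentioned on the slide is visible even here: the bit array costs memory, and every check computes several hash positions.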
  14. 14. HBaseAdmin Just like a DDL in RDBMSs Create tables with specific column families Check for table existence Alter table and column family definitions Drop tables And more…14
  15. HBaseAdmin – Basic Operations
     • boolean isMasterRunning()
     • HConnection getConnection()
     • Configuration getConfiguration()
     • void close()
  16. HBaseAdmin – Table Operations
     • Table-related admin API
       - Several calls exist in both a synchronous and an asynchronous variant, e.g. createTable() vs. createTableAsync()
     • Create table
       - Samples: ch05/admin.CreateTableExample, ch05/admin.CreateTableWithRegionsExample
       - numRegions must be at least 3; otherwise, the call throws an exception
       - This ensures that you end up with at least a minimum set of regions
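The region-count rule can be illustrated with a small sketch. It assumes the contract documented for HBaseAdmin.createTable(desc, startKey, endKey, numRegions): startKey becomes the end key of the first region and endKey the start key of the last region, so numRegions regions need numRegions - 1 evenly spaced split keys. To stay self-contained it uses long keys instead of the byte[] keys HBase actually works with; `splitPoints()` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

public class PreSplitSketch {
    // Computes numRegions - 1 evenly spaced split keys from startKey to endKey (inclusive).
    // Resulting regions: (-inf, startKey), [startKey, k1), ..., [endKey, +inf).
    // Assumes (endKey - startKey) * (numRegions - 2) fits in a long.
    public static List<Long> splitPoints(long startKey, long endKey, int numRegions) {
        if (numRegions < 3) {
            throw new IllegalArgumentException("Must create at least three regions");
        }
        if (startKey >= endKey) {
            throw new IllegalArgumentException("Start key must be smaller than end key");
        }
        int intervals = numRegions - 2;  // regions lying between startKey and endKey
        List<Long> points = new ArrayList<>();
        for (int i = 0; i <= intervals; i++) {
            points.add(startKey + (endKey - startKey) * i / intervals);
        }
        return points;
    }
}
```

For instance, splitPoints(0, 100, 6) yields the five keys 0, 25, 50, 75, and 100, which bound six regions. With fewer than three regions the first and last regions would already use up the whole key space, which is one way to read the slide's minimum.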
  17. HBaseAdmin – Table Operations (continued)
     • Does table exist
       - Sample: ch05/admin.ListTablesExample
       - Use existing table names; otherwise, org.apache.hadoop.hbase.TableNotFoundException will be thrown
     • Delete table
       - Sample: ch05/admin.TableOperationsExample
       - A table must be disabled before it can be deleted
       - Disabling a table can take a very long time, up to several minutes, depending on how much data is still in the servers' memory and not yet persisted to disk: undeploying a region requires all of its data to be written to disk first
     • isTableAvailable() vs. isTableEnabled()/isTableDisabled()
  18. HBaseAdmin – Table Operations (continued)
     • Modify table
       - Sample: ch05/admin.ModifyTableExample
     • HTableDescriptor.equals()
       - Compares the current instance with the specified one
       - Returns true if they match in all properties, including the contained column families and their respective settings
  19. HBaseAdmin – Schema Operations
     • Besides the modifyTable() call, HBaseAdmin provides dedicated methods
     • Make sure the table to be modified is disabled first
     • All of these calls are asynchronous
       - void addColumn(String tableName, HColumnDescriptor column)
       - void addColumn(byte[] tableName, HColumnDescriptor column)
       - void deleteColumn(String tableName, String columnName)
       - void deleteColumn(byte[] tableName, byte[] columnName)
       - void modifyColumn(String tableName, HColumnDescriptor descriptor)
       - void modifyColumn(byte[] tableName, HColumnDescriptor descriptor)
  20. HBaseAdmin – Cluster Operations (methods in the HBaseAdmin class)
     • static void checkHBaseAvailable(Configuration conf): verifies that the client application can communicate with the remote HBase cluster; it either silently succeeds or throws an error
     • ClusterStatus getClusterStatus(): retrieves a ClusterStatus instance containing detailed information about the cluster status
     • void closeRegion(String regionname, String hostAndPort) / void closeRegion(byte[] regionname, String hostAndPort): close regions that have previously been deployed to region servers. This bypasses any master notification: the region is closed directly by the region server, unseen by the master node
     • void flush(String tableNameOrRegionName) / void flush(byte[] tableNameOrRegionName): tell the MemStore instances of the region or table to flush the cached modifications to disk. Otherwise, the data is only written once the MemStore flush size is reached
     • These calls are for advanced users, so check these APIs in the documentation and handle them with care
  21. HBaseAdmin – Cluster Operations (continued)
     • void compact(String tableNameOrRegionName) / void compact(byte[] tableNameOrRegionName): minor compaction. Compactions can potentially take a long time to complete; the work is executed in the background by the server hosting the named region, or by all servers hosting any region of the given table
     • void majorCompact(String tableNameOrRegionName) / void majorCompact(byte[] tableNameOrRegionName): major compaction
     • void split(String tableNameOrRegionName) / void split(byte[] tableNameOrRegionName) / …: split a specific region or a whole table
  22. HBaseAdmin – Cluster Operations (continued)
     • void assign(byte[] regionName, boolean force) / void unassign(byte[] regionName, boolean force): when a client requires a region to be deployed to or undeployed from the region servers, it can invoke these calls
     • void move(byte[] encodedRegionName, byte[] destServerName): moves a region from its current region server to a new one. destServerName can be set to null to pick a new server at random
     • boolean balanceSwitch(boolean b): switches the region balancer on or off
     • boolean balancer(): starts the process of moving regions from servers with more deployed regions to those with fewer
     • void shutdown(): shuts down the entire cluster
     • void stopMaster(): stops the master server
     • void stopRegionServer(String hostnamePort): stops a particular region server only
     • Once invoked, the affected servers are stopped; there is no delay and no way to revert the process
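The balancer() description, moving regions from servers with more deployed regions to those with fewer, can be illustrated with a naive greedy sketch that works on region counts only. This is an assumption-laden illustration, not HBase's load balancer algorithm: the hypothetical `balance()` repeatedly moves one region from the most loaded server to the least loaded one until their counts differ by at most one.

```java
import java.util.HashMap;
import java.util.Map;

// Naive greedy balancing over region counts (illustration only).
public class NaiveBalancer {
    // Returns a new server -> region-count map; the input map is not modified.
    public static Map<String, Integer> balance(Map<String, Integer> regionsPerServer) {
        Map<String, Integer> load = new HashMap<>(regionsPerServer);
        if (load.size() < 2) {
            return load;
        }
        while (true) {
            String most = null;
            String least = null;
            for (Map.Entry<String, Integer> e : load.entrySet()) {
                if (most == null || e.getValue() > load.get(most)) {
                    most = e.getKey();
                }
                if (least == null || e.getValue() < load.get(least)) {
                    least = e.getKey();
                }
            }
            if (load.get(most) - load.get(least) <= 1) {
                return load;  // balanced: counts differ by at most one region
            }
            // Move one region from the most loaded server to the least loaded one.
            load.put(most, load.get(most) - 1);
            load.put(least, load.get(least) + 1);
        }
    }
}
```

Starting from 10, 2, and 0 regions on three servers, this ends with 4 regions on each. Every move narrows the gap between the two extremes, so the loop always terminates.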
  23. HBaseAdmin – Cluster Status Information
     • HBaseAdmin.getClusterStatus() gives you more detailed information about your HBase cluster
     • Related classes: ClusterStatus, ServerName (replaces HServerInfo), HServerLoad, RegionLoad
     • Sample: ch05/admin.ClusterStatusExample
  24. Available Clients
     • HBase comes with a variety of clients that can be used from various programming languages
     • Interactive clients: Native Java API, REST, Thrift, Avro
     • Batch clients: MapReduce, Hive, Pig
     • Shell
     • Web-based UI
  25. Available Clients (revisited): the same list as the previous slide, with some clients annotated "We've already done"
  26. Batch Clients – MapReduce framework
     • HDFS: a distributed filesystem
     • MapReduce: a distributed processing framework
  27. Batch Clients – MapReduce framework
  28. Batch Clients – MapReduce InputFormat and TableInputFormat
  29. Batch Clients – MapReduce Mapper and TableMapper
  30. Batch Clients – MapReduce Reducer and TableReducer
  31. Batch Clients – MapReduce OutputFormat and TableOutputFormat
  32. Batch Clients – MapReduce Sample
     • Sample: ch07/mapreduce.Driver
     • How to run
       - As the root account, in the HBase shell:
           create 'testtable_mr', 'data'
       - As the hbase-user account:
           cd ${GIT_HOME}/hbase-training/002/projects/hbase-book/ch07
           hadoop fs -copyFromLocal test-data.txt /tmp
           hadoop jar target/hbase-book-ch07-1.0.jar ImportFromFile -t testtable_mr -i /tmp/test-data.txt -c data:json
     • How to use
       - Running the JAR without arguments shows the usage:
           hadoop jar target/hbase-book-ch07-1.0.jar
  33. Batch Clients – Pig
     • Apache Pig project
       - A platform to analyze large amounts of data
       - It has its own high-level query language, called Pig Latin
       - Pig Latin uses an imperative programming style to formulate the steps involved in transforming the input data to the final output
       - This is the opposite of Hive's declarative approach, which emulates SQL (HiveQL)
       - Combined with the power of Hadoop and the MapReduce framework
  34. Batch Clients – Pig Latin Sample
         -- Load data from a file and write it to HBase
         raw = LOAD 'tutorial/data/excite-small.log' USING PigStorage('\t') AS (user, time, query);
         T = FOREACH raw GENERATE CONCAT(CONCAT(user, '\u0000'), time), query;
         STORE T INTO 'excite' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam1:query');
         -- Load the records that were just written back from HBase
         R = LOAD 'excite' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam1:query', '-loadKey') AS (key: chararray, query: chararray);
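The Pig sample builds each row key as the user, a NUL character ('\u0000'), and the time. The Java sketch below mirrors that key layout and shows why it is convenient: the separator sorts below every printable character, so all rows of one user form a single contiguous range from user + '\u0000' up to user + '\u0001', the kind of range a scan with start and stop rows covers. The class, its methods, and the sample data are hypothetical. A TreeMap stands in for the sorted HBase table; Java's String ordering matches HBase's unsigned byte ordering for ASCII keys like these.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Mirrors the Pig key layout CONCAT(CONCAT(user, NUL), time); illustrative only.
public class CompositeRowKey {
    static final char SEPARATOR = '\u0000';        // sorts before any printable character
    static final char AFTER_SEPARATOR = '\u0001';  // smallest character after the separator

    static String rowKey(String user, String time) {
        return user + SEPARATOR + time;
    }

    static String userOf(String rowKey) {
        return rowKey.substring(0, rowKey.indexOf(SEPARATOR));
    }

    // All rows of one user lie in [user + SEPARATOR, user + AFTER_SEPARATOR),
    // i.e. the start/stop rows a scan for that user would use.
    static SortedMap<String, String> rowsForUser(TreeMap<String, String> table, String user) {
        return table.subMap(user + SEPARATOR, user + AFTER_SEPARATOR);
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(rowKey("bob", "t1"), "query A");
        table.put(rowKey("bobby", "t2"), "query B");
        table.put(rowKey("bob", "t3"), "query C");
        System.out.println(rowsForUser(table, "bob").values()); // prints [query A, query C]
    }
}
```

Note that "bobby" does not leak into the range for "bob", because the separator compares lower than the "b" that follows "bob" in "bobby".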
  35. Shell
     • We already used it in course #1: hbase shell
     • The majority of commands have a direct match with a method provided by either the client or the administrative API
     • Commands are grouped into five categories, representing their semantic relationships
  36. Shell – General
  37. Shell – Data definition
  38. Shell – Data manipulation
  39. Shell – Tools
  40. Shell – Replication
  41. Web-based UI
     • Master UI (http://${your_host}:8110/master.jsp)
       - Main page
       - User Table page
       - ZooKeeper page
     • Region Server UI
     • Shared pages
       - Local logs
       - Thread dump
       - Log level
  42. Phew~ Finally done… Orz