11. 11
Yahoo! Cloud Serving Benchmark
• 3 HBase nodes on Solaris zones
Operation   Throughput      Average Response Time   Max Response Time
Write       1808 writes/s   1.6 ms                   0.02% > 1 s (due to region splitting)
Read        9846 reads/s    0.3 ms                   45 ms
13. 13
Setting up Hadoop
• Supported Platforms
• Linux – best
• Solaris – ok. Just works
• Windows – not recommended
• Required Software
• JDK 1.6.x
• SSH
• Packages
• Cloudera
14. 14
Match Hadoop & HBase Version
Hadoop version            HBase version   Compatible?
0.20.3 release            0.90.x          NO
0.20-append               0.90.x          YES
0.20.5 release            0.90.x          YES
0.21.0 release            0.90.x          NO
0.22.x (in development)   0.90.x          NO
15. 15
Running Modes of Hadoop
• Standalone Operation
By default, Hadoop runs in non-distributed mode as a single Java process, which is useful for
debugging.
• Pseudo-Distributed Operation
Run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs
in a separate Java process.
• Fully-Distributed Operation
Runs on a cluster; this is the real production environment.
18. 18
Map Reduce Job
MapReduce is a programming model for data processing on Hadoop. It works by breaking the
processing into two phases: the map phase and the reduce phase. Each phase has key-value pairs
as input and output, the types of which may be chosen by the programmer.
• Mapper
A Mapper usually processes the input one line at a time, ignoring the useless lines and collecting
the useful information into <Key, Value> pairs.
• Reducer
Receives the <Key, <Value1, Value2, …>> pairs grouped from the Mappers' output, aggregates the
values, and writes the results as <Key, Value> pairs.
20. 20
Serialization in Hadoop
Java type   Writable implementation
int         IntWritable
long        LongWritable
boolean     BooleanWritable
byte        ByteWritable
float       FloatWritable
double      DoubleWritable
String      Text
null        NullWritable
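These built-in Writables cover the common primitive types. As the notes at the end of this deck mention, you can also implement your own types against the Writable interface. A minimal sketch (the PageView type and its fields are purely illustrative, not part of the slides):
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Illustrative custom type serialized with Hadoop's Writable contract.
public class PageView implements Writable {
  private String page;
  private long hits;

  public PageView() {}                                       // Hadoop needs a no-arg constructor
  public PageView(String page, long hits) { this.page = page; this.hits = hits; }

  public void write(DataOutput out) throws IOException {     // serialize fields in a fixed order
    out.writeUTF(page);
    out.writeLong(hits);
  }

  public void readFields(DataInput in) throws IOException {  // deserialize in the same order
    page = in.readUTF();
    hits = in.readLong();
  }
}
A type used as a map output key would additionally need to implement WritableComparable so it can be sorted during the shuffle.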
21. 21
Example: WordCount
Before we jump into the details, let's walk through an example MapReduce application to get a flavour of how it
works. WordCount is a simple application that counts the number of occurrences of each word in a given input set.
// imports: java.io.IOException, java.util.Iterator, java.util.StringTokenizer,
//          org.apache.hadoop.io.*, org.apache.hadoop.mapred.*
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
      word.set(tokenizer.nextToken());
      output.collect(word, one);
    }
  }
}

public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));
  }
}
Notes on the code:
• The type parameters of Mapper give the input and output <Key, Value> data formats.
• Map and Reduce must extend MapReduceBase and implement the Mapper / Reducer interfaces.
• map() puts each word as the Key and its occurrence count (1) as the Value into the collector.
• The Reducer's input <Key, Value> data format must match the output format of the Mapper.
22. 22
MapReduce Job Configuration
Before running a MapReduce job, the following fields should be set:
• Mapper Class
The mapper class you wrote, to be run by the job.
• Reducer Class
The reducer class you wrote, to be run by the job.
• Input Format & Output Format
Define the format of all inputs and outputs. A large number of formats are supported in the
Hadoop library.
• OutputKeyClass & OutputValueClass
The data type classes of the outputs that Mappers send to Reducers.
23. 23
Example: WordCount
Code to run the job
public class WordCount {
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);
}
}
Notes on the code: set the output key & value classes; set the Mapper & Reducer classes;
set the InputFormat & OutputFormat classes; set the input & output paths.
25. 25
Example in perf-log
Here is an example of using MapReduce to analyze the log files in the perf-log project.
The log files contain two kinds of record, and each record is a single line:
Event Level
Request Level
26. 26
Example Using MapReduce
Here we use a MapReduce job to calculate the most used event of each day. The event records are
collected in the map phase and the most used event per day is determined in the reduce phase (see the sketch after the data flow below).
Input log (event records interleaved with request records):
  event PLT_LOGIN
  request record…
  request record…
  event PM_HOME
  request record…
  event PM_OPENFORM
  request record…
  request record…
  request record…
  event CDP_LOGOUT
  request record…
  request record…
  request record…
  …

Map output — (date, event) pairs:
  (11/12, PLT_LOGIN) (11/12, PM_HOME) (11/12, PLT_LOGIN) (11/12, PM_LOGOUT) …
  (11/13, CDP_LOGIN) (11/13, CDP_LOGIN) …

Shuffle (automatic) — values grouped by key:
  (11/12, [PLT_LOGIN, PM_HOME, PLT_LOGIN, PLT_LOGOUT…])
  (11/13, [CDP_LOGIN, CDP_LOGIN…])

Reduce output — the most used event per day:
  (11/12, PLT_LOGIN)
  (11/13, PM_HOME)
  (11/14, CDP_HOME)
  (11/15, PLT_HOME)
  …
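A sketch of such a job in the same old-style API as the WordCount example. The MostUsedEvent class name is illustrative, and the parsing assumes a hypothetical event-record layout of "<date> event <EVENT_NAME> …"; adapt it to the real perf-log format.
import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class MostUsedEvent {
  // Mapper: emit a (date, event name) pair for every event-level record, skip request records.
  public static class EventMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
      String[] fields = value.toString().split("\\s+");
      if (fields.length >= 3 && "event".equals(fields[1])) { // hypothetical "<date> event <NAME>" layout
        output.collect(new Text(fields[0]), new Text(fields[2]));
      }
    }
  }

  // Reducer: count the events of each day and emit the most used one.
  public static class EventReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
      Map<String, Integer> counts = new HashMap<String, Integer>();
      while (values.hasNext()) {
        String event = values.next().toString();
        Integer c = counts.get(event);
        counts.put(event, c == null ? 1 : c + 1);
      }
      String mostUsed = null;
      int best = -1;
      for (Map.Entry<String, Integer> e : counts.entrySet()) {
        if (e.getValue() > best) { mostUsed = e.getKey(); best = e.getValue(); }
      }
      output.collect(key, new Text(mostUsed)); // (date, most used event)
    }
  }
}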
28. 28
Table Structure
Tables in HBase have the following features:
1. They are large, sparsely populated tables.
2. Each row has a row key.
3. Table rows are sorted by row key, the table’s primary key.
By default, the sort is byte-ordered.
4. Row columns are grouped into column families. A table’s
column families must be specified up front as part of the
table schema definition and can not be changed.
5. New column family members can be added on demand.
29. 29
Table Structure
Here is the table structure of "perflog" in the perf-log
project:
row key   column family "event"                    column family "req"
          (qualifiers: event_name, event_id, …)    (qualifiers: req1, req1_id, …, req2, req2_id, …)
row1      xxx   xxx   …                            xxx   xxx   …   xxx   xxx   …
row2      xxx   xxx   …                            xxx   xxx   …   xxx   xxx   …
Each xxx cell holds a column value.
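Assuming the structure above, a minimal sketch of creating such a schema through the HBase 0.90 client API (the shell's create command, shown on a later slide, does the same thing):
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Create the "perflog" table with its two column families declared up front.
HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
HTableDescriptor desc = new HTableDescriptor("perflog");
desc.addFamily(new HColumnDescriptor("event"));  // column families are fixed at table creation
desc.addFamily(new HColumnDescriptor("req"));    // new qualifiers inside a family can be added on demand
admin.createTable(desc);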
30. 30
Column Design
When designing column families and qualifiers, pay
attention to the following two points:
1. Keep the number of column families in your schema low.
HBase currently does not do well with anything above two or three column families.
2. Keep column family and qualifier names as short as
possible.
Operating on a table in HBase causes a very large number of comparisons on column names,
so short names improve performance.
31. 31
HBase Command Shell
HBase provides a command shell to operate the
system. Here are some example commands:
• Status
• Create
• List
• Put
• Scan
• Disable & Drop
33. 33
API to Operate Tables in HBase
There are four main methods to operate a table in
HBase:
• Get
• Put
• Scan
• Delete
** Put and Scan are widely used in the perf-log project.
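A minimal sketch of the four operations against the 0.90 client API; the table name, row key, and column used here are illustrative:
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

HTable table = new HTable(HBaseConfiguration.create(), "perflog");

// Put: add a new row or update an existing one
Put put = new Put(Bytes.toBytes("row1"));
put.add(Bytes.toBytes("event"), Bytes.toBytes("name"), Bytes.toBytes("PLT_LOGIN"));
table.put(put);

// Get: read the attributes of one row
Result row = table.get(new Get(Bytes.toBytes("row1")));
byte[] name = row.getValue(Bytes.toBytes("event"), Bytes.toBytes("name"));

// Scan: iterate over multiple rows
ResultScanner scanner = table.getScanner(new Scan());
for (Result r : scanner) {
  // process each row here
}
scanner.close();

// Delete: remove a row
table.delete(new Delete(Bytes.toBytes("row1")));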
34. 34
Using Put & Scan in HBase
When using put in HBase, notice:
• AutoFlush
• WAL on Puts
When using scan in HBase, notice:
• Scan Attribute Selection
• Scan Caching
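A sketch of where these knobs live in the 0.90 client API, assuming an HTable named table and a Put named put as in the earlier example; the values are illustrative:
// Put-side tuning
table.setAutoFlush(false);  // buffer Puts client-side; far fewer RPCs when writing a lot of data
put.setWriteToWAL(false);   // skip the Write-Ahead Log: faster, but data is lost on RegionServer failure
table.put(put);
table.flushCommits();       // flush the client write buffer explicitly (close() also flushes it)

// Scan-side tuning
Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("event"));  // select only the column family you actually need
scan.setCaching(500);                    // transfer 500 rows per RPC instead of the default 1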
35. 35
Using Scan with Filter
HBase filters are a powerful feature that can greatly enhance your effectiveness
working with data stored in tables. Four filters are used in the perf-log project:
• SingleColumnValueFilter
You can use this filter when you have exactly one column that decides if an entire row
should be returned or not.
• RowFilter
This filter gives you the ability to filter data based on row keys.
• PageFilter
You paginate through rows by employing this filter.
• FilterList
Enables you to use several filters at the same time.
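For example, a SingleColumnValueFilter keeping only PLT_LOGIN events and a RowFilter matching part of the row key can be attached to a scan as sketched below; the column names follow the perflog example and the comparator value is illustrative:
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.Bytes;

// Keep only rows whose event name column equals "PLT_LOGIN"
Filter eventFilter = new SingleColumnValueFilter(
    Bytes.toBytes("event"), Bytes.toBytes("name"),
    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("PLT_LOGIN"));

// Keep only rows whose row key contains the given substring
Filter rowFilter = new RowFilter(
    CompareFilter.CompareOp.EQUAL, new SubstringComparator("11/12"));

Scan scan = new Scan();
scan.setFilter(eventFilter);  // or rowFilter, or a FilterList combining several filters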
36. 36
Using Scan with Filter
• PageFilter
There is a fundamental issue with filtering on physically separate servers: filters run on
different region servers in parallel and cannot retain or communicate their current state
across those boundaries, and each filter scans at least up to pageCount rows before ending
the scan. Thus you may get more rows than you really want.
Filter filter = new PageFilter(5); // 5 is the pageCount
int totalRows = 0;
byte[] lastRow = null;
while (true) {
  Scan scan = new Scan();
  scan.setFilter(filter);
  if (lastRow != null) {
    // start the next page just after the last row already seen
    byte[] startRow = Bytes.add(lastRow, new byte[] { 0 });
    scan.setStartRow(startRow);
  }
  ResultScanner scanner = table.getScanner(scan);
  int localRows = 0;
  Result result;
  while ((result = scanner.next()) != null) {
    localRows++;
    totalRows++;
    lastRow = result.getRow();
  }
  scanner.close();
  if (localRows == 0) break; // no more rows: stop paging
}
37. 37
Using Scan with Filter
• FilterList
When using multiple filters with a FilterList, note that adding the filters to the FilterList
in different orders produces different results.
pageFilter = new PageFilter(5);
singleColumnValueFilter = new SingleColumnValueFilter(
    Bytes.toBytes("event"), Bytes.toBytes("name"),
    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("PLT_LOGIN"));

// Page filter first: take the first 5 records, then return the ones
// whose event name is "PLT_LOGIN".
filterList = new FilterList();
filterList.addFilter(pageFilter);
filterList.addFilter(singleColumnValueFilter);

// Column value filter first: take all the records whose event name is
// "PLT_LOGIN", then return the first 5 of them.
filterList = new FilterList();
filterList.addFilter(singleColumnValueFilter);
filterList.addFilter(pageFilter);
38. 38
Map Reduce with HBase
Here is an example:
static class MyMapper<K, V> extends MapReduceBase implements Mapper<LongWritable, Text, K, V> {
  private HTable table;

  @Override
  public void configure(JobConf jc) {
    super.configure(jc);
    try {
      this.table = new HTable(HBaseConfiguration.create(), "table_name");
    } catch (IOException e) {
      throw new RuntimeException("Failed HTable construction", e);
    }
  }

  @Override
  public void close() throws IOException {
    super.close();
    table.close();
  }

  public void map(LongWritable key, Text value, OutputCollector<K, V> output, Reporter reporter) throws IOException {
    Put p = new Put(…); // a Put needs a row key (byte[]); build it from your input record
    …                   // set your own column values on the put
    table.put(p);
  }
}
39. 39
Bulk Load
HBase includes several methods of loading data into tables. The most
straightforward method is to either use a MapReduce job, or use the normal
client APIs; however, these are not always the most efficient methods.
The bulk load feature uses a MapReduce job to output table data in HBase's
internal data format, and then directly loads the data files into a running
cluster. Using bulk load will use less CPU and network resources than simply
using the HBase API.
Data Files → MapReduce Job → HFiles → HBase
40. 40
Bulk Load
Notice that we use HFileOutputFormat as the output format of the MapReduce job used to
generate the HFiles. However, the HFileOutputFormat provided by the HBase library does
NOT support writing multiple column families into HFiles.
A multi-family-capable version of HFileOutputFormat can be found here:
https://review.cloudera.org/r/1272/diff/1/?file=17977#file17977line93
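A sketch of the typical bulk-load flow, assuming a job whose mapper emits ImmutableBytesWritable/Put pairs; the output path and table name are illustrative:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "perflog");

Job job = new Job(conf, "perflog-bulkload");
// ... set the jar, mapper, input format and input path of your own job here ...
FileOutputFormat.setOutputPath(job, new Path("/tmp/perflog-hfiles"));
HFileOutputFormat.configureIncrementalLoad(job, table);  // partitions & sorts output to match the table's regions
job.waitForCompletion(true);

// Move the generated HFiles directly into the running cluster.
new LoadIncrementalHFiles(conf).doBulkLoad(new Path("/tmp/perflog-hfiles"), table);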
41. 41
Thank You, and Questions
See More About Hadoop & HBase:
http://confluence.successfactors.com/display/ENG/Programming+experience+on+Hadoop+&+HBase
Editor's Notes
http://hadoop.apache.org/common/docs/r0.20.2/hdfs_design.html
Data blocks are automatically replicated across Data Nodes. Fault-tolerant. Default number of replicas is 3.
Share-nothing architecture. Add data nodes to increase disk capacity and I/O throughput.
Due to replication and internal structure, the usable capacity will be less than 1/3 of the raw capacity.
Name Node manages file system’s metadata.
SPOF. Need HA and backup.
Workload increases along with files/blocks number and operations. Potential bottleneck.
Job Tracker manages Map/Reduce job execution. Often runs along with Name Node.
Job is split into tasks. Task Tracker manages task execution. Runs on Data Nodes.
Natural distributed parallel computing architecture.
Web console to monitor job/task.
The "hadoop" command runs jobs and manages nodes and the file system. In particular, "hadoop fs" provides many Unix-like commands to access HDFS.
HMaster manages region servers. It normally runs with Hadoop NameNode together.
Data are sorted by row key and split into regions, which are managed by region server. Region servers often run on data nodes.
Each region includes one MemStore and several store files.
Data writes are recorded in the Write-Ahead Log (HLog; by default it is flushed to disk every second) and written into the MemStore.
When Memstore becomes full, it is flushed to HDFS as a store file.
Full operations: get, put, scan, delete.
GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.
Required software for Linux and Windows include:
Java 1.6.x, preferably from Sun, must be installed.
ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
Additional requirements for Windows include:
Cygwin - Required for shell support in addition to the required software above.
1. Hadoop Version
The most used Hadoop versions:
0.20.203.X
The current stable version. Does NOT contain the entire new MapReduce API and does NOT have the sync attribute on HDFS. Currently used in the perf-log project.
0.20.205.X
The current beta version. Does NOT contain the entire new MapReduce API, but has the sync attribute on HDFS.
0.21.X
The newest version. Provides the entire new MapReduce API, but is unstable, unsupported, does not include security, and cannot run with HBase.
2. Running HBase on Hadoop
2. Running HBase on Hadoop
The newest version of HBase is 0.90.x. This version of HBase will only run on Hadoop 0.20.x; it will not run on Hadoop 0.21.x (nor 0.22.x). HBase will lose data unless it is running on an HDFS that has a durable sync. Hadoop 0.20.2 and Hadoop 0.20.203.0 do NOT have this attribute. Choose one of the following solutions:
HBase bundles an instance of the Hadoop jar under its lib directory. The bundled Hadoop was made from the Apache branch-0.20-append branch at the time of the HBase release and has the sync attribute. Replace the Hadoop jar you are running on your cluster with the Hadoop jar found in the HBase lib directory.
You could use the Cloudera or MapR distributions. Cloudera's CDH3 is Apache Hadoop 0.20.x plus patches including all of the 0.20-append additions needed to add a durable sync. CDH3 ships both Hadoop and HBase.
Just use Hadoop 0.20.205.0. Since this release includes a merge of the append/hsync/hflush capabilities from the 0.20-append branch, it can support HBase in secure mode. But it is a beta version.
In Hadoop MapReduce, interprocess communication between nodes in the system is implemented using remote procedure calls (RPCs). The RPC protocol uses serialization to render the message into a binary stream to be sent to the remote node, which then deserializes the binary stream into the original message.
Hadoop uses its own serialization format, Writables, which is certainly compact and fast (but not so easy to extend, or use from languages other than Java).
The Hadoop library provides many basic data types to be used in MapReduce, and you can also implement your own data structures according to the Writable interfaces.
Status
Show the status of all nodes in HBase.
Create
Create a table.
List
List all the existing tables.
Put
Put either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists).
Scan
Scan allows iteration over multiple rows for specified attributes of a certain table.
Disable & Drop
To delete a table, first disable it, then drop it.
Get
Get returns attributes for a specified row.
Put
Put either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists).
Scan
Scan allows iteration over multiple rows for specified attributes. It can be used with filters and provides powerful query functions on HBase.
Delete
Delete removes a row from a table.
When using put in HBase, notice:
AutoFlush
AutoFlush gives real-time behaviour: you can see the row immediately after it is added to the table. But when performing a lot of Puts, make sure that setAutoFlush is set to false on your HTable instance. Otherwise, the Puts will be sent one at a time to the RegionServer. With autoFlush = false, these messages are not sent until the write-buffer is filled, which reduces the number of client RPC calls. To explicitly flush the messages, call flushCommits. Calling close on the HTable instance will invoke flushCommits.
WAL on Puts
WAL means Write-Ahead Log. Turning this off means that the RegionServer will not write the Put to the Write-Ahead Log, only into the memstore, which improves performance. However, turning it off is not recommended, because if there is a RegionServer failure there will be data loss.
When using scan in HBase, notice:
Scan Attribute Selection
Whenever a Scan is used to process large numbers of rows, be aware of which attributes are selected. Call scan.addFamily to select only the specific columns you want rather than getting the entire row, because attribute over-selection is a non-trivial performance penalty over large datasets.
Scan Caching
When performing a large number of Scans, make sure that the input Scan instance has setCaching set to something greater than the default (which is 1). Setting this value to 500, for example, will transfer 500 rows at a time to the client to be processed. There is a cost/benefit to having a large cache value because it costs more memory for both the client and the RegionServer, so bigger isn't always better.
HBase can be both the input and output of a Map Reduce Job. In the perf-log project, we use HBase as the output of the MR job and it is best to obey the following rules:
Get one HTable instance
There is a cost instantiating an HTable, so if you do this for each insert, you may have a negative impact on performance. Hence our setup of HTable in the configure() step.
Skip the Reducer if possible
When writing a lot of data to an HBase table from a MR job and specifically where Puts are being emitted from the Mapper, skip the Reducer step. When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other Reducers that will most likely be off-node. It's far more efficient to just write directly to HBase.