SDEC2011 Introducing Hadoop
1. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC.
Copyright for all other & referenced work is retained by their respective owners.
Introducing Hadoop
Mastering Hadoop Map-reduce for Data Analysis
Shashank Tiwari
blog: shanky.org | twitter: @tshanky
st@treasuryofideas.com
What is Hadoop
HDFS Architecture
Namenode/Datanode, JobTracker/TaskTracker
MapReduce
ZK Namespace
Essential HBase Schema
Multi-dimensional View
A Map/Hash View
{
  "row_key_1" : {
    "name" : { "first_name" : "Jolly", "last_name" : "Goodfellow" },
    "location" : { "zip": "94301" }
  }
}
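The map view above can be read as nested maps: row key, then column family, then column. The following is a minimal Python sketch of that reading, using only the values shown on the slide. The helper name `get_cell` is hypothetical, and plain dictionaries stand in for the table; this is not the HBase client API.

```python
# Nested-map picture of one HBase row:
# row key -> column family -> column -> value.
# Plain Python dicts only; nothing here talks to HBase.
table = {
    "row_key_1": {
        "name": {"first_name": "Jolly", "last_name": "Goodfellow"},
        "location": {"zip": "94301"},
    }
}


def get_cell(table, row_key, family, column):
    """Return the value at (row key, column family, column), or None if absent."""
    return table.get(row_key, {}).get(family, {}).get(column)


print(get_cell(table, "row_key_1", "name", "first_name"))  # Jolly
print(get_cell(table, "row_key_1", "location", "zip"))     # 94301
print(get_cell(table, "row_key_1", "location", "city"))    # None
```

A missing column simply has no entry in the inner map, which is why the last lookup returns None instead of an empty placeholder.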
Architectural View (HBase)
The Persistence Mechanism
The underlying file format
Installing & Setting up Hadoop
• Required software: Java 1.6.x, ssh + sshd
• Download
• Install
• Configure
  • single-node
  • pseudo-distributed
  • cluster
Download
• Source: http://hadoop.apache.org/
• Version:
  • 0.20.203.x -- current stable
  • 0.20.x -- previous stable
• Includes: Hadoop Common (common utilities), HDFS, MapReduce
Install
• Extract: tar zxvf hadoop-0.20.203.0rc1.tar.gz
• Move & Create Symbolic Link
  • ln -s hadoop-0.20.203.0 hadoop
• On Windows
  • http://developer.yahoo.com/hadoop/tutorial/module3.html
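The extract and symbolic-link steps can also be scripted. The sketch below is one possible way to do it with Python's standard library, assuming the archive unpacks into a single top-level directory (as the `ln -s hadoop-0.20.203.0 hadoop` step implies). The function name `extract_and_link` and the destination path in the usage comment are hypothetical.

```python
import os
import tarfile


def extract_and_link(archive_path, dest_dir, link_name="hadoop"):
    """Extract a .tar.gz release into dest_dir and point link_name at its top-level directory."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # Collect the first path component of every member.
        top_levels = {m.name.split("/")[0] for m in tar.getmembers() if m.name}
        if len(top_levels) != 1:
            raise ValueError("expected a single top-level directory in the archive")
        # Only extract archives from a source you trust.
        tar.extractall(dest_dir)
    top = top_levels.pop()
    link_path = os.path.join(dest_dir, link_name)
    if os.path.islink(link_path):
        os.remove(link_path)  # replace a link left over from a previous install
    # Relative link, equivalent to: ln -s <top> <link_name> run inside dest_dir
    os.symlink(top, link_path)
    return link_path


# Example call (hypothetical destination directory):
# extract_and_link("hadoop-0.20.203.0rc1.tar.gz", "/opt")
```

Keeping the link relative means the install directory can be moved as a whole without breaking it.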
Configure -- single-node
• Edit: conf/hadoop-env.sh
  • Set JAVA_HOME
• Default configuration is single-node
• Run bin/hadoop without arguments to list the command options
• Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html
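Setting JAVA_HOME amounts to having one active `export JAVA_HOME=...` line in conf/hadoop-env.sh. The sketch below shows one way to make that edit on the file's text, assuming the file may already hold a commented-out export line. The function name `set_java_home` and the Java path in the sample are hypothetical; substitute the location of your own Java 1.6 install.

```python
import re

# Matches an export of JAVA_HOME, whether active or commented out.
JAVA_HOME_LINE = re.compile(r"^[ \t]*#?[ \t]*export[ \t]+JAVA_HOME=.*$", re.MULTILINE)


def set_java_home(env_text, java_home):
    """Return hadoop-env.sh text with an active JAVA_HOME export set to java_home."""
    export_line = f"export JAVA_HOME={java_home}"
    if JAVA_HOME_LINE.search(env_text):
        # Rewrite the first export line found; leave the rest of the file untouched.
        return JAVA_HOME_LINE.sub(lambda _match: export_line, env_text, count=1)
    # No export line present: append one at the end.
    return env_text.rstrip("\n") + "\n" + export_line + "\n"


# Sample text and Java path are illustrative only.
sample = "# The java implementation to use.\n# export JAVA_HOME=/path/to/old/jdk\n"
print(set_java_home(sample, "/path/to/jdk1.6"))
```

To apply it, read conf/hadoop-env.sh, pass its contents through the function, and write the result back.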
Configure -- pseudo-distributed
• Edit: conf/core-site.xml (configure the HDFS NameNode address)
• Edit: conf/hdfs-site.xml (configure HDFS replication factor)
• Edit: conf/mapred-site.xml (configure MapReduce JobTracker daemon)
• Enable ssh to localhost (without passphrase)
• Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html
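Each of the three files holds a `<configuration>` element with one `<property>` per setting. The sketch below renders the pseudo-distributed values given in the referenced single-node setup guide (`fs.default.name`, `dfs.replication`, `mapred.job.tracker`). The helper name `render_site_xml` is hypothetical, and the output is printed rather than written to disk so it can be reviewed first.

```python
import xml.etree.ElementTree as ET

# Pseudo-distributed settings as listed in the 0.20.203 single-node setup guide.
PSEUDO_DISTRIBUTED = {
    "conf/core-site.xml": {"fs.default.name": "hdfs://localhost:9000"},
    "conf/hdfs-site.xml": {"dfs.replication": "1"},
    "conf/mapred-site.xml": {"mapred.job.tracker": "localhost:9001"},
}


def render_site_xml(properties):
    """Build a Hadoop <configuration> document from a name -> value mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


for path, props in PSEUDO_DISTRIBUTED.items():
    print(path)
    print(render_site_xml(props))
```

A replication factor of 1 fits this mode because there is only one DataNode to hold each block.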
Start Hadoop
• Format HDFS: bin/hadoop namenode -format
• Start all daemons: bin/start-all.sh
• Verify logs
• Browse the web interface:
  • Namenode: http://localhost:50070/
  • JobTracker: http://localhost:50030/
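Before opening a browser, a quick check that something is listening on the two web interface ports can save time. The sketch below only tests whether a TCP connection succeeds; it says nothing about daemon health, so the logs still need checking. The function name `is_listening` is hypothetical; the hosts and ports are the ones listed above.

```python
import socket

# Web interface endpoints named on the slide.
WEB_UIS = {
    "Namenode": ("localhost", 50070),
    "JobTracker": ("localhost", 50030),
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unresolvable: treat all as "not listening".
        return False


for name, (host, port) in WEB_UIS.items():
    state = "listening" if is_listening(host, port) else "not reachable"
    print(f"{name} web interface at http://{host}:{port}/ is {state}")
```

If a port is not reachable shortly after bin/start-all.sh, the daemon logs are the next place to look.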
Take Hadoop for a test-drive
• Run examples (hadoop-examples-0.20.203.0.jar)
• Grep using regular expressions
  • Copy files to HDFS: bin/hadoop fs -put bin input
  • Grep the input for text beginning with ‘start’
  • Verify output on HDFS: bin/hadoop fs -cat output/*
  • Copy output to local filesystem & verify: bin/hadoop fs -get output output && cat output/*
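To see what the grep example computes, it helps to mimic it locally: a map step emits each regex match with a count of 1, and a reduce step sums the counts per matched string. The sketch below is a single-process imitation in plain Python, not the Hadoop example's own code. The sample lines and the pattern `start[a-z.-]+` are made up for illustration.

```python
import re
from collections import Counter


def map_matches(line, pattern):
    """Map step: emit (matched_text, 1) for every regex match in one input line."""
    return [(m.group(0), 1) for m in pattern.finditer(line)]


def reduce_counts(pairs):
    """Reduce step: sum the counts for each distinct matched string."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def local_grep(lines, regex):
    """Run the map and reduce steps in-process; return (text, count) pairs, most frequent first."""
    pattern = re.compile(regex)
    pairs = [pair for line in lines for pair in map_matches(line, pattern)]
    totals = reduce_counts(pairs)
    # Descending count, then alphabetical so ties come out in a stable order.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up input standing in for the files copied to HDFS.
sample_lines = [
    "usage: start-all.sh",
    "start-dfs.sh starts the HDFS daemons",
    "run start-all.sh or start-dfs.sh then start-mapred.sh",
    "start-all.sh",
]
for text, count in local_grep(sample_lines, r"start[a-z.-]+"):
    print(count, text)
```

On a cluster the map step runs in parallel over blocks of the input files, and the framework groups the emitted pairs by key before the reduce step runs.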
Configure -- cluster
• References:
  • http://hadoop.apache.org/common/docs/r0.20.203.0/cluster_setup.html (official documentation)
  • http://developer.yahoo.com/hadoop/tutorial/module7.html (Managing a Hadoop Cluster. Source: YDN)
  • http://wiki.datameer.com/display/DAS1/Hadoop+Cluster+Configuration+Tips
Questions?
• blog: shanky.org | twitter: @tshanky
• st@treasuryofideas.com