SDEC2011 Introducing Hadoop
    SDEC2011 Introducing Hadoop Presentation Transcript

    • Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC. Copyright for all other & referenced work is retained by their respective owners.
    • Introducing Hadoop: Mastering Hadoop Map-reduce for Data Analysis
      • Shashank Tiwari
      • blog: shanky.org | twitter: @tshanky
      • st@treasuryofideas.com
    • What is Hadoop
    • HDFS Architecture
    • Namenode/Datanode, JobTracker/TaskTracker
    • MapReduce
    • ZK Namespace
    • Essential HBase Schema
    • Multi-dimensional View
    • A Map/Hash View
      {
        "row_key_1" : { "name" : {
          "first_name" : "Jolly", "last_name" : "Goodfellow"
        } } },
        "location" : { "zip": "94301" },
    • Architectural View (HBase)
    • The Persistence Mechanism
    • The underlying file format
    • Installing & Setting up Hadoop
      • Required software: Java 1.6.x, ssh + sshd
      • Download
      • Install
      • Configure
        • single-node
        • pseudo-distributed
        • cluster
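The prerequisites listed above can be checked before downloading anything. The following is a minimal sketch, not part of the original slides: `check_cmd` is a hypothetical helper name, and the script only reports what is on the `PATH`. On many systems `sshd` lives in `/usr/sbin` and may be reported as missing for a non-root user even when it is installed.

```shell
#!/bin/sh
# Report whether a command needed by Hadoop is available on PATH.
# check_cmd is a hypothetical helper, not part of Hadoop.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
    fi
}

# java, ssh and sshd are the prerequisites named on the slide.
for cmd in java ssh sshd; do
    check_cmd "$cmd"
done

# Show the Java version line if Java is present; the slide asks for 1.6.x.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | head -n 1
fi
```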
    • Download
      • Source: http://hadoop.apache.org/
      • Version:
        • 0.20.203.x -- current stable
        • 0.20.x -- previous stable
      • Includes
        • Hadoop Common -- common utilities, HDFS, MapReduce
    • Install
      • Extract: tar zxvf hadoop-0.20.203.0rc1.tar.gz
      • Move & Create Symbolic Link
        • ln -s hadoop-0.20.203.0 hadoop
      • On Windows
        • http://developer.yahoo.com/hadoop/tutorial/module3.html
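The extract-and-symlink steps above can be wrapped in one small function. This is a sketch under assumptions: `install_hadoop` and its three arguments are hypothetical names, and the destination directory in the usage comment is only illustrative. The tarball and directory names are the ones shown on the slide.

```shell
#!/bin/sh
# install_hadoop TARBALL DEST_DIR VERSION_DIR
# Extracts TARBALL into DEST_DIR and points DEST_DIR/hadoop at VERSION_DIR.
# (Hypothetical helper; the slide only shows the tar and ln commands.)
install_hadoop() {
    tarball=$1
    dest=$2
    version_dir=$3

    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1

    # -f replaces an existing link; -n stops ln from descending into an
    # existing "hadoop" link that already points at a directory.
    ln -sfn "$version_dir" "$dest/hadoop"
}

# Example (destination path is illustrative):
# install_hadoop hadoop-0.20.203.0rc1.tar.gz "$HOME/opt" hadoop-0.20.203.0
```

Keeping a version-neutral `hadoop` link means scripts and environment variables can refer to one path and survive a later upgrade.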
    • Configure -- single-node
      • Edit: conf/hadoop-env.sh
        • Set JAVA_HOME
      • Default configuration is single-node
      • Run bin/hadoop with no arguments to list the command options
      • Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html
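Setting `JAVA_HOME` can be scripted rather than edited by hand. The sketch below assumes `conf/hadoop-env.sh` is an ordinary shell file that Hadoop's launcher scripts source, so an assignment appended at the end overrides any earlier one. `set_java_home` is a hypothetical helper, and the JDK path in the usage comment is illustrative only.

```shell
#!/bin/sh
# set_java_home ENV_FILE JAVA_HOME_PATH
# Appends an export line to ENV_FILE (normally conf/hadoop-env.sh).
# Hypothetical helper; the last assignment in a sourced file wins.
set_java_home() {
    env_file=$1
    java_home=$2
    printf '\nexport JAVA_HOME="%s"\n' "$java_home" >> "$env_file"
}

# Example (JDK location is illustrative and varies by system):
# set_java_home conf/hadoop-env.sh /usr/lib/jvm/java-6-sun
```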
    • Configure -- pseudo-distributed
      • Edit: conf/core-site.xml (configure HDFS daemon)
      • Edit: conf/hdfs-site.xml (configure HDFS replication factor)
      • Edit: conf/mapred-site.xml (configure MapReduce JobTracker daemon)
      • Enable ssh to localhost (without passphrase)
      • Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html
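As an illustration of the three edits above, the function below writes minimal versions of the files. The property names and values (`fs.default.name`, `dfs.replication`, `mapred.job.tracker`, ports 9000 and 9001) follow the single-node setup guide referenced on the slide. `write_pseudo_conf` itself is a hypothetical helper, and it overwrites any existing files in the target directory.

```shell
#!/bin/sh
# write_pseudo_conf CONF_DIR
# Writes minimal pseudo-distributed settings into CONF_DIR.
# Hypothetical helper; it overwrites existing files of the same name.
write_pseudo_conf() {
    conf_dir=$1
    mkdir -p "$conf_dir" || return 1

    # core-site.xml: address of the HDFS NameNode
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

    # hdfs-site.xml: one replica, since there is only one DataNode
    cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

    # mapred-site.xml: address of the JobTracker
    cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
}

# Example: write_pseudo_conf conf
#
# Passphrase-less ssh to localhost, as given in the 0.20 setup guide
# (newer OpenSSH releases may require a different key type than dsa):
#   ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
#   cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
```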
    • Start Hadoop
      • Format HDFS: bin/hadoop namenode -format
      • Start all daemons: bin/start-all.sh
      • Verify logs
      • Browse the web interface:
        • Namenode: http://localhost:50070/
        • JobTracker: http://localhost:50030/
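The start-up sequence above, collected into one function as a rough sketch. `start_hadoop` is a hypothetical wrapper that takes the Hadoop install directory as its only argument; the two commands and the URLs are the ones on the slide. Formatting wipes the NameNode's metadata, so in practice it is run once on a fresh install, not on every start.

```shell
#!/bin/sh
# start_hadoop HADOOP_HOME
# Formats HDFS, starts all daemons, and prints the web UI addresses.
# Hypothetical wrapper. The body runs in a subshell, so the caller's
# working directory is left unchanged.
start_hadoop() (
    hadoop_home=$1
    cd "$hadoop_home" || exit 1

    # Destroys existing HDFS metadata: intended for a first-time setup only.
    bin/hadoop namenode -format || exit 1

    # Starts NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker.
    bin/start-all.sh || exit 1

    echo "Namenode:   http://localhost:50070/"
    echo "JobTracker: http://localhost:50030/"
)

# Example: start_hadoop "$HOME/opt/hadoop"
```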
    • Take Hadoop for a test-drive
      • Run examples (hadoop-examples-0.20.203.0.jar)
      • Grep using regular expressions
        • Copy files to HDFS: bin/hadoop fs -put bin input
        • Grep for files which have text beginning with ‘start’
        • Verify output on HDFS: bin/hadoop fs -cat output/*
        • Copy output to local filesystem & verify: bin/hadoop fs -get output output && cat output/*
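The slide does not show the grep command itself, so the sketch below fills that step with assumptions. The bundled `grep` example takes an input directory, an output directory, and a regular expression. The default pattern `start[a-z.]*` is a guess at what "text beginning with 'start'" meant, and `run_grep_example` is a hypothetical wrapper. The final copy-to-local step from the slide is left out.

```shell
#!/bin/sh
# run_grep_example HADOOP_HOME [REGEX]
# Copies the local bin/ directory into HDFS, runs the bundled grep
# example over it, and prints the result stored in HDFS.
# Hypothetical wrapper; the default REGEX is an assumption.
run_grep_example() (
    hadoop_home=$1
    pattern=$2
    [ -n "$pattern" ] || pattern='start[a-z.]*'

    cd "$hadoop_home" || exit 1

    # Upload the local bin/ directory to HDFS as "input".
    bin/hadoop fs -put bin input || exit 1

    # The job fails if the "output" directory already exists in HDFS.
    bin/hadoop jar hadoop-examples-0.20.203.0.jar grep input output "$pattern" || exit 1

    # Quoted so that HDFS, not the local shell, expands the wildcard.
    bin/hadoop fs -cat 'output/*'
)

# Example: run_grep_example "$HOME/opt/hadoop"
```

The output lists each matched string with the number of times it occurred in the input files.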
    • Configure -- cluster
      • References:
        • http://hadoop.apache.org/common/docs/r0.20.203.0/cluster_setup.html (official documentation)
        • http://developer.yahoo.com/hadoop/tutorial/module7.html (Managing a Hadoop Cluster. Source: YDN)
        • http://wiki.datameer.com/display/DAS1/Hadoop+Cluster+Configuration+Tips
    • Questions?
      • blog: shanky.org | twitter: @tshanky
      • st@treasuryofideas.com