Sdec2011 shashank-introducing hadoop
Sdec2011 shashank-introducing hadoop Presentation Transcript

  • 1. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC. Copyright for all other & referenced work is retained by their respective owners.
      Introducing Hadoop
      Mastering Hadoop Map-reduce for Data Analysis
      Shashank Tiwari
      blog: shanky.org | twitter: @tshanky | st@treasuryofideas.com
  • 2. What is Hadoop
  • 3. HDFS Architecture
  • 4. Namenode/Datanode, JobTracker/TaskTracker
  • 5. MapReduce
  • 6. ZK Namespace
  • 7. Essential HBase Schema
  • 8. Multi-dimensional View
  • 9. A Map/Hash View
      {
        "row_key_1" : {
          "name" : { "first_name" : "Jolly", "last_name" : "Goodfellow" },
          "location" : { "zip": "94301" },
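The map view on this slide can be read as nested hashes: row key, then column family, then column qualifier, then value. The sketch below models that with plain Python dictionaries. The helper names `get_cell` and `put_cell` are hypothetical and are not part of any HBase client API, and the model leaves out the timestamp (version) dimension that HBase also keeps for each cell.

```python
# A nested-dictionary model of the slide's map view:
# row key -> column family -> column qualifier -> value.
# Assumption: "name" and "location" are column families of one row.
row_store = {
    "row_key_1": {
        "name": {"first_name": "Jolly", "last_name": "Goodfellow"},
        "location": {"zip": "94301"},
    }
}


def get_cell(store, row_key, family, qualifier):
    """Return the value at (row, family, qualifier), or None if absent."""
    return store.get(row_key, {}).get(family, {}).get(qualifier)


def put_cell(store, row_key, family, qualifier, value):
    """Set a cell, creating the row and column family on demand."""
    store.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value


if __name__ == "__main__":
    print(get_cell(row_store, "row_key_1", "name", "first_name"))
    put_cell(row_store, "row_key_2", "location", "zip", "10001")
    print(get_cell(row_store, "row_key_2", "location", "zip"))
```

Rows need not share the same qualifiers: `row_key_2` above has only a `zip` cell, which is the sparse layout the map view is meant to convey.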
  • 10. Architectural View (HBase)
  • 11. The Persistence Mechanism
  • 12. The underlying file format
  • 13. Installing & Setting up Hadoop
      • Required software: Java 1.6.x, ssh + sshd
      • Download
      • Install
      • Configure
          • single-node
          • pseudo-distributed
          • cluster
  • 14. Download
      • Source: http://hadoop.apache.org/
      • Version:
          • 0.20.203.x -- current stable
          • 0.20.x -- previous stable
      • Includes
          • Hadoop Common -- common utilities, HDFS, MapReduce
  • 15. Install
      • Extract: tar zxvf hadoop-0.20.203.0rc1.tar.gz
      • Move & Create Symbolic Link
          • ln -s hadoop-0.20.203.0 hadoop
      • On Windows
          • http://developer.yahoo.com/hadoop/tutorial/module3.html
  • 16. Configure -- single-node
      • Edit: conf/hadoop-env.sh
          • Set JAVA_HOME
      • Default configuration is single-node (standalone, non-distributed)
      • Run bin/hadoop without arguments to list the command options
      • Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html
  • 17. Configure -- pseudo-distributed
      • Edit: conf/core-site.xml (configure the HDFS NameNode address)
      • Edit: conf/hdfs-site.xml (configure HDFS replication factor)
      • Edit: conf/mapred-site.xml (configure MapReduce JobTracker daemon)
      • Enable ssh to localhost (without passphrase)
      • Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html
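The three files listed on this slide each hold a small set of name/value properties. As a rough sketch, the script below generates them with the values used in the official 0.20.203.0 single-node setup guide (`hdfs://localhost:9000`, a replication factor of `1`, and `localhost:9001`). Those values come from that guide, not from the slides, and the helper names `render_site_xml` and `write_conf` are made up for this illustration. The demo writes to a temporary directory so that a real `conf/` is not overwritten.

```python
import os
import tempfile
from xml.etree import ElementTree as ET

# Assumed property values, taken from the official single-node setup
# guide for 0.20.203.0; adjust host names and ports for your machine.
PSEUDO_DISTRIBUTED = {
    "core-site.xml": {"fs.default.name": "hdfs://localhost:9000"},
    "hdfs-site.xml": {"dfs.replication": "1"},
    "mapred-site.xml": {"mapred.job.tracker": "localhost:9001"},
}


def render_site_xml(properties):
    """Render a dict of properties as a Hadoop *-site.xml document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0"?>\n' + body + "\n"


def write_conf(conf_dir):
    """Write the three pseudo-distributed config files into conf_dir."""
    os.makedirs(conf_dir, exist_ok=True)
    for filename, properties in PSEUDO_DISTRIBUTED.items():
        path = os.path.join(conf_dir, filename)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(render_site_xml(properties))


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp(prefix="hadoop-conf-")
    write_conf(demo_dir)
    print("wrote", sorted(os.listdir(demo_dir)), "to", demo_dir)
```

A replication factor of 1 fits a single machine because there is only one DataNode to hold each block.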
  • 18. Start Hadoop
      • Format HDFS: bin/hadoop namenode -format
      • Start all daemons: bin/start-all.sh
      • Verify logs
      • Browse the web interface:
          • Namenode: http://localhost:50070/
          • JobTracker: http://localhost:50030/
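Once the daemons are started, the two web interfaces listed on this slide can also be probed from a script instead of a browser. The following is a minimal sketch that assumes the default ports shown on the slide. `is_reachable` is a hypothetical helper, and a successful HTTP response only suggests that the daemon's web server is up, not that the cluster is healthy.

```python
import urllib.error
import urllib.request

# Web interface URLs as given on the slide (default ports).
WEB_UIS = {
    "NameNode": "http://localhost:50070/",
    "JobTracker": "http://localhost:50030/",
}


def is_reachable(url, timeout=3.0):
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or HTTP error status.
        return False


if __name__ == "__main__":
    for name, url in WEB_UIS.items():
        state = "up" if is_reachable(url) else "not reachable"
        print(name, url, state)
```

If either interface is not reachable, the daemon logs mentioned on the slide are the next place to look.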
  • 19. Take Hadoop for a test-drive
      • Run examples (hadoop-examples-0.20.203.0.jar)
      • Grep using regular expressions
          • Copy files to HDFS: bin/hadoop fs -put bin input
          • Grep for files which have text beginning with ‘start’
          • Verify output on HDFS: bin/hadoop fs -cat output/*
          • Copy output to local filesystem & verify: bin/hadoop fs -get output output && cat output/*
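The slide does not show the grep command itself. In the official single-node guide the job is launched in the form `bin/hadoop jar hadoop-examples-*.jar grep input output '<regex>'`. To give a feel for what such a job computes, here is a small in-memory imitation of the map and reduce steps: the map step emits each regex match with a count of 1, and the reduce step sums the counts per match. It is a sketch only. The pattern `start[a-z.-]+` and the sample lines are assumptions chosen for illustration, and nothing here runs on Hadoop.

```python
import re
from collections import Counter


def map_matches(line, pattern):
    """Map step: emit (matched_text, 1) for every regex match in a line."""
    for match in pattern.finditer(line):
        yield match.group(0), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each distinct match."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def grep_job(lines, regex):
    """Run map then reduce, returning matches sorted by descending count."""
    pattern = re.compile(regex)
    pairs = (pair for line in lines for pair in map_matches(line, pattern))
    totals = reduce_counts(pairs)
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical input lines standing in for the scripts copied from bin/.
sample_lines = [
    "bin/start-dfs.sh",
    "bin/start-mapred.sh",
    "exec bin/start-dfs.sh",
    "echo stopping",
]

if __name__ == "__main__":
    for text, count in grep_job(sample_lines, r"start[a-z.-]+"):
        print(f"{count}\t{text}")
```

On a real cluster the map calls run in parallel across input splits and the framework groups the emitted pairs by key before the reduce step; the generator expression above stands in for that shuffle.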
  • 20. Configure -- cluster
      • References:
          • http://hadoop.apache.org/common/docs/r0.20.203.0/cluster_setup.html (official documentation)
          • http://developer.yahoo.com/hadoop/tutorial/module7.html (Managing a Hadoop Cluster. Source: YDN)
          • http://wiki.datameer.com/display/DAS1/Hadoop+Cluster+Configuration+Tips
  • 21. Questions?
      • blog: shanky.org | twitter: @tshanky
      • st@treasuryofideas.com