Sdec2011 shashank-introducing hadoop
Published in: Technology
Transcript of "Sdec2011 shashank-introducing hadoop"

  1. Introducing Hadoop
     Mastering Hadoop Map-reduce for Data Analysis
     Shashank Tiwari
     blog: shanky.org | twitter: @tshanky
     st@treasuryofideas.com
     Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC. Copyright for all other & referenced work is retained by their respective owners.
  2. What is Hadoop
  3. HDFS Architecture
  4. Namenode/Datanode, JobTracker/TaskTracker
  5. MapReduce
  6. ZK Namespace
  7. Essential HBase Schema
  8. Multi-dimensional View
  9. A Map/Hash View
     {
       "row_key_1" : {
         "name" : { "first_name" : "Jolly", "last_name" : "Goodfellow" },
         "location" : { "zip": "94301" }
       }
     }
  10. Architectural View (HBase)
  11. The Persistence Mechanism
  12. The underlying file format
  13. Installing & Setting up Hadoop
      • Required software: Java 1.6.x, ssh + sshd
      • Download
      • Install
      • Configure
        • single-node
        • pseudo-distributed
        • cluster
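A quick way to confirm the required software before downloading is to look for each command on the PATH. The sketch below is only an illustration: the helper name `have_cmd` is hypothetical, and `sshd` is often installed under a system directory such as /usr/sbin, so it may be reported as missing for an ordinary user even when it is installed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report on each prerequisite named on the slide.
for cmd in java ssh sshd; do
    if have_cmd "$cmd"; then
        echo "found:   $cmd"
    else
        echo "missing: $cmd"
    fi
done

# java prints its version on stderr; the slide calls for 1.6.x.
if have_cmd java; then
    java -version 2>&1 | head -n 1
fi
```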
  14. Download
      • Source: http://hadoop.apache.org/
      • Version:
        • 0.20.203.x -- current stable
        • 0.20.x -- previous stable
      • Includes
        • Hadoop Common -- common utilities, HDFS, MapReduce
  15. Install
      • Extract: tar zxvf hadoop-0.20.203.0rc1.tar.gz
      • Move & Create Symbolic Link
        • ln -s hadoop-0.20.203.0 hadoop
      • On Windows
        • http://developer.yahoo.com/hadoop/tutorial/module3.html
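The extract and symlink steps can be wrapped in one small function. This is a sketch under assumptions: the function name `install_hadoop` and the choice of install directory are hypothetical, and it assumes the archive has a single top-level directory whose name does not start with "./".

```shell
#!/bin/sh
# Unpack a Hadoop release tarball and point a version-independent
# "hadoop" symlink at the extracted directory.
#   $1 = path to the tarball
#   $2 = install directory (hypothetical choice, e.g. $HOME/opt)
install_hadoop() {
    tarball=$1
    dest=$2
    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1
    # Take the top-level directory name from the archive listing, since
    # the tarball name (…0rc1) and directory name (…0) can differ.
    release_dir=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
    # -n replaces an existing link instead of creating one inside it.
    ln -sfn "$release_dir" "$dest/hadoop"
}
```

For the release named on the slide, a call might look like `install_hadoop hadoop-0.20.203.0rc1.tar.gz "$HOME/opt"`, after which `$HOME/opt/hadoop` points at the extracted release.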
  16. Configure -- single-node
      • Edit: conf/hadoop-env.sh
        • Set JAVA_HOME
      • Default configuration is single-node
      • Run bin/hadoop (with no arguments, to list command options)
      • Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html
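Setting JAVA_HOME amounts to adding one export line to conf/hadoop-env.sh. The function below is a minimal sketch, not the only way to do it; the name `set_java_home` is hypothetical, and it appends a line rather than editing the commented-out example that the file typically ships with.

```shell
#!/bin/sh
# Append a JAVA_HOME export to hadoop-env.sh unless an active
# (uncommented) one is already present.
#   $1 = path to conf/hadoop-env.sh
#   $2 = JDK installation directory
set_java_home() {
    env_file=$1
    jdk_dir=$2
    if grep -q '^export JAVA_HOME=' "$env_file"; then
        echo "JAVA_HOME already set in $env_file"
    else
        # Quote the value so paths containing spaces survive.
        printf 'export JAVA_HOME="%s"\n' "$jdk_dir" >> "$env_file"
    fi
}
```

From the install directory this could be called as `set_java_home conf/hadoop-env.sh "$JAVA_HOME"`; running it twice leaves a single export line.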
  17. Configure -- pseudo-distributed
      • Edit: conf/core-site.xml (configure HDFS daemon)
      • Edit: conf/hdfs-site.xml (configure HDFS replication factor)
      • Edit: conf/mapred-site.xml (configure MapReduce JobTracker daemon)
      • Enable ssh to localhost (without passphrase)
      • Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html
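The three edits can be sketched as a script that writes one property into each file. The property names and values (fs.default.name = hdfs://localhost:9000, dfs.replication = 1, mapred.job.tracker = localhost:9001) follow the single-node setup guide referenced on the slide; the function names are hypothetical, and the script overwrites the target files, so it is best pointed at a fresh conf directory.

```shell
#!/bin/sh
# Write a Hadoop site file containing a single property.
#   $1 = output file, $2 = property name, $3 = property value
write_site_file() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# Write the three pseudo-distributed config files into a conf directory.
#   $1 = conf directory (existing files there are overwritten)
write_pseudo_conf() {
    conf_dir=$1
    mkdir -p "$conf_dir" || return 1
    write_site_file "$conf_dir/core-site.xml"   fs.default.name    hdfs://localhost:9000
    write_site_file "$conf_dir/hdfs-site.xml"   dfs.replication    1
    write_site_file "$conf_dir/mapred-site.xml" mapred.job.tracker localhost:9001
}

# Passphrase-less ssh to localhost is a one-time manual step, shown here
# as comments because it changes files under ~/.ssh:
#   ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
#   cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
#   ssh localhost    # should now log in without prompting
```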
  18. Start Hadoop
      • Format HDFS: bin/hadoop namenode -format
      • Start all daemons: bin/start-all.sh
      • Verify logs
      • Browse the web interface:
        • Namenode: http://localhost:50070/
        • JobTracker: http://localhost:50030/
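The format and start steps can be run together on a first launch, as in the sketch below. The function name `format_and_start` is hypothetical, and the `jps` check is an addition not on the slide (jps ships with the JDK and lists running Java processes). Formatting is meant for a brand-new installation only, since it initializes HDFS metadata from scratch.

```shell
#!/bin/sh
# First-run helper: format HDFS, then start all daemons.
#   $1 = Hadoop install directory
format_and_start() {
    hadoop_home=$1
    if [ ! -x "$hadoop_home/bin/hadoop" ]; then
        echo "no executable bin/hadoop under $hadoop_home" >&2
        return 1
    fi
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "$hadoop_home" || exit 1
        bin/hadoop namenode -format || exit 1
        bin/start-all.sh || exit 1
        # In pseudo-distributed mode, expect NameNode, DataNode,
        # SecondaryNameNode, JobTracker and TaskTracker in the listing.
        if command -v jps >/dev/null 2>&1; then
            jps
        fi
    )
}
```

After it returns, the Namenode and JobTracker pages listed on the slide should respond once the daemons have finished starting.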
  19. Take Hadoop for a test-drive
      • Run examples (hadoop-examples-0.20.203.0.jar)
      • Grep using regular expressions
        • Copy files to HDFS: bin/hadoop fs -put bin input
        • Grep for files which have text beginning with ‘start’
        • Verify output on HDFS: bin/hadoop fs -cat output/*
        • Copy output to local filesystem & verify: bin/hadoop fs -get output output && cat output/*
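The slide does not spell out the grep invocation itself, so the sketch below fills it in using the `grep` program from the bundled examples jar, which takes an input directory, an output directory and a regular expression. The function name `run_grep_example` and the sample pattern `start[a-z.-]*` are assumptions chosen to match "text beginning with 'start'".

```shell
#!/bin/sh
# Run the bundled grep example over the scripts in bin/.
#   $1 = Hadoop install directory
#   $2 = regular expression to search for (sample value is an assumption)
run_grep_example() {
    hadoop_home=$1
    pattern=$2
    (
        cd "$hadoop_home" || exit 1
        # Copy the local bin/ directory into HDFS as "input".
        bin/hadoop fs -put bin input || exit 1
        # The job fails if "output" already exists on HDFS.
        bin/hadoop jar hadoop-examples-0.20.203.0.jar grep input output "$pattern" || exit 1
        # Quote the glob so HDFS expands it rather than the local shell.
        bin/hadoop fs -cat 'output/*' || exit 1
        # Copy the result to the local filesystem and print it.
        bin/hadoop fs -get output output && cat output/*
    )
}
```

A call such as `run_grep_example "$HOME/opt/hadoop" 'start[a-z.-]*'` would run all four steps in order. Quoting `output/*` matters once a local `output` directory exists, because an unquoted glob would then be expanded against local files.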
  20. Configure -- cluster
      • References:
        • http://hadoop.apache.org/common/docs/r0.20.203.0/cluster_setup.html (official documentation)
        • http://developer.yahoo.com/hadoop/tutorial/module7.html (Managing a Hadoop Cluster. Source: YDN)
        • http://wiki.datameer.com/display/DAS1/Hadoop+Cluster+Configuration+Tips
  21. Questions?
      • blog: shanky.org | twitter: @tshanky
      • st@treasuryofideas.com