Sdec2011 Introducing Hadoop
Transcript

  • 1. Introducing Hadoop
      Mastering Hadoop MapReduce for Data Analysis
      Shashank Tiwari
      blog: shanky.org | twitter: @tshanky | st@treasuryofideas.com
      Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC. Copyright for all other & referenced work is retained by their respective owners.
  • 2. What is Hadoop
  • 3. HDFS Architecture
  • 4. Namenode/Datanode, JobTracker/TaskTracker
  • 5. MapReduce
  • 6. ZK Namespace
  • 7. Essential HBase Schema
  • 8. Multi-dimensional View
  • 9. A Map/Hash View
      {
        "row_key_1" : {
          "name" : {
            "first_name" : "Jolly",
            "last_name" : "Goodfellow"
          },
          "location" : { "zip": "94301" },
          ...
  • 10. Architectural View (HBase)
  • 11. The Persistence Mechanism
  • 12. The underlying file format
  • 13. Installing & Setting up Hadoop
      • Required software: Java 1.6.x, ssh + sshd
      • Download
      • Install
      • Configure
          • single-node
          • pseudo-distributed
          • cluster
  • 14. Download
      • Source: http://hadoop.apache.org/
      • Versions (as of 2011):
          • 0.20.203.x -- current stable release
          • 0.20.x -- previous stable release
      • Includes:
          • Hadoop Common -- common utilities, HDFS, MapReduce
  • 15. Install
      • Extract: tar zxvf hadoop-0.20.203.0rc1.tar.gz
      • Move the extracted directory into place and create a symbolic link:
          • ln -s hadoop-0.20.203.0 hadoop
      • On Windows:
          • http://developer.yahoo.com/hadoop/tutorial/module3.html
  • 16. Configure -- single-node
      • Edit conf/hadoop-env.sh:
          • Set JAVA_HOME
      • The default configuration is single-node (standalone)
      • Run bin/hadoop without arguments to list its command options
      • Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html
  • 17. Configure -- pseudo-distributed
      • Edit conf/core-site.xml (set the HDFS NameNode address, fs.default.name)
      • Edit conf/hdfs-site.xml (set the HDFS replication factor, dfs.replication)
      • Edit conf/mapred-site.xml (set the MapReduce JobTracker address, mapred.job.tracker)
      • Enable ssh to localhost without a passphrase
      • Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html
  • 18. Start Hadoop
      • Format HDFS: bin/hadoop namenode -format
      • Start all daemons: bin/start-all.sh
      • Check the daemon logs for errors (written to logs/ under the Hadoop directory by default)
      • Browse the web interfaces:
          • NameNode: http://localhost:50070/
          • JobTracker: http://localhost:50030/
  • 19. Take Hadoop for a test-drive
      • Run the bundled examples (hadoop-examples-0.20.203.0.jar)
      • Grep using regular expressions:
          • Copy files to HDFS: bin/hadoop fs -put bin input
          • Run the grep example to count occurrences of text beginning with 'start'
          • Verify the output on HDFS: bin/hadoop fs -cat output/*
          • Copy the output to the local filesystem and verify it: bin/hadoop fs -get output output && cat output/*
  • 20. Configure -- cluster
      • References:
          • http://hadoop.apache.org/common/docs/r0.20.203.0/cluster_setup.html (official documentation)
          • http://developer.yahoo.com/hadoop/tutorial/module7.html (Managing a Hadoop Cluster; source: YDN)
          • http://wiki.datameer.com/display/DAS1/Hadoop+Cluster+Configuration+Tips
  • 21. Questions?
      • blog: shanky.org | twitter: @tshanky
      • st@treasuryofideas.com
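
Slides 3 and 4 (HDFS Architecture; Namenode/Datanode) survive here only as titles. The split they name works like this: the NameNode holds only metadata (which blocks make up each file and which DataNodes hold their replicas), and the DataNodes hold the block data itself. The Python sketch below is a toy model of that bookkeeping, not HDFS code. It assumes the 0.20.x defaults of a 64 MB block size (dfs.block.size) and a replication factor of 3 (dfs.replication). It places replicas round-robin, whereas real HDFS uses a rack-aware placement policy. The class and node names are hypothetical.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 64 * 1024 * 1024  # 0.20.x default for dfs.block.size (bytes)
REPLICATION = 3                 # default for dfs.replication

# (offset in file, block length, DataNodes holding a replica)
Block = tuple[int, int, list[str]]


@dataclass
class ToyNameNode:
    """Hypothetical NameNode model: tracks metadata only, never block contents."""
    datanodes: list[str]
    block_map: dict[str, list[Block]] = field(default_factory=dict)
    cursor: int = 0  # round-robin position (simplification, not rack-aware)

    def add_file(self, path: str, size: int,
                 block_size: int = BLOCK_SIZE,
                 replication: int = REPLICATION) -> list[Block]:
        # A block cannot have more replicas than there are DataNodes.
        replication = min(replication, len(self.datanodes))
        blocks: list[Block] = []
        for offset in range(0, size, block_size):
            length = min(block_size, size - offset)  # last block may be short
            nodes = [self.datanodes[(self.cursor + i) % len(self.datanodes)]
                     for i in range(replication)]
            self.cursor = (self.cursor + 1) % len(self.datanodes)
            blocks.append((offset, length, nodes))
        self.block_map[path] = blocks
        return blocks


if __name__ == "__main__":
    nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
    for offset, length, nodes in nn.add_file("/user/demo/big.log", 150 * 1024 * 1024):
        print(offset, length, nodes)
```

For a 150 MB file the model produces two full 64 MB blocks and a 22 MB tail, each on three distinct nodes. Only the metadata changes; no bytes are moved.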
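
Slide 5 (MapReduce) is also title-only in this transcript. The data flow it refers to runs in three steps: map, then shuffle/sort by key, then reduce. The single-process Python sketch below walks a word count through those phases. It illustrates the flow only; it is not Hadoop's Java API, and it ignores partitioning, combiners and parallel execution on TaskTrackers. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

KV = tuple[str, int]
Mapper = Callable[[str], Iterable[KV]]
Reducer = Callable[[str, list[int]], Iterable[KV]]


def word_count_map(line: str) -> Iterator[KV]:
    # Map: emit (word, 1) for each whitespace-separated token.
    for word in line.split():
        yield word.lower(), 1


def sum_reduce(key: str, values: list[int]) -> Iterator[KV]:
    # Reduce: all values for one key arrive together; add them up.
    yield key, sum(values)


def run_job(records: Iterable[str], mapper: Mapper, reducer: Reducer) -> list[KV]:
    # Shuffle: group every intermediate value under its key.
    groups: dict[str, list[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Sort + reduce: Hadoop hands each reducer its keys in sorted order.
    output: list[KV] = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


if __name__ == "__main__":
    lines = ["the quick brown fox", "The lazy dog", "the end"]
    for word, count in run_job(lines, word_count_map, sum_reduce):
        print(f"{word}\t{count}")
```

Passing the mapper and reducer in as plain functions mirrors how a Hadoop job is assembled from separately supplied map and reduce classes.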
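
Slide 6 (ZK Namespace) refers to ZooKeeper's hierarchical namespace. In it, znodes are named by slash-separated absolute paths, much like a filesystem, and each znode can hold data as well as children. The hypothetical in-memory class below models three of the namespace rules:

- a parent must exist before a child is created;
- a path can be created only once;
- a znode with children cannot be deleted.

It is not the ZooKeeper client API. It leaves out sessions, ephemeral and sequential nodes, versions and watches.

```python
class ZNodeTree:
    """Hypothetical in-memory model of the ZooKeeper namespace (no networking)."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {"/": b""}  # the root always exists

    @staticmethod
    def _parent(path: str) -> str:
        parent = path.rsplit("/", 1)[0]
        return parent or "/"

    def create(self, path: str, data: bytes = b"") -> None:
        if not path.startswith("/") or path.endswith("/") or "//" in path:
            raise ValueError(f"invalid path: {path!r}")
        if path in self._data:
            raise KeyError(f"node already exists: {path}")
        if self._parent(path) not in self._data:
            raise KeyError(f"parent does not exist: {path}")
        self._data[path] = data

    def get_data(self, path: str) -> bytes:
        return self._data[path]

    def get_children(self, path: str) -> list[str]:
        if path not in self._data:
            raise KeyError(f"no such node: {path}")
        prefix = "/" if path == "/" else path + "/"
        # Direct children only: one path segment below the prefix.
        return sorted(p[len(prefix):] for p in self._data
                      if p != "/" and p.startswith(prefix)
                      and "/" not in p[len(prefix):])

    def delete(self, path: str) -> None:
        if path == "/":
            raise ValueError("the root cannot be deleted")
        if self.get_children(path):
            raise KeyError(f"node has children: {path}")
        del self._data[path]
```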
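
Slides 7 to 9 describe HBase's schema as a multi-dimensional map. In slide 9's map view, a row key contains column families (name, location), which in turn contain column qualifiers (first_name, last_name, zip). HBase also keeps timestamped versions of each cell. The hypothetical class below models that nesting with plain Python dicts (row key -> column family -> qualifier -> timestamp -> value) to show how a single cell is addressed. It is not the HBase client API, and its read rule (return the newest version at or before a given timestamp) is a simplification.

```python
from typing import Optional

# row key -> column family -> column qualifier -> timestamp -> value
Table = dict[str, dict[str, dict[str, dict[int, str]]]]


class ToyHTable:
    """Hypothetical model of HBase's logical data model (not the HBase API)."""

    def __init__(self, families: list[str]) -> None:
        # Column families are fixed by the schema; qualifiers are free-form.
        self.families = set(families)
        self.rows: Table = {}

    def put(self, row: str, family: str, qualifier: str, value: str, ts: int) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cell = (self.rows.setdefault(row, {})
                .setdefault(family, {})
                .setdefault(qualifier, {}))
        cell[ts] = value  # each write adds a version keyed by timestamp

    def get(self, row: str, family: str, qualifier: str,
            ts: Optional[int] = None) -> Optional[str]:
        # Newest version at or before ts (newest overall when ts is None).
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier, {})
        eligible = [t for t in versions if ts is None or t <= ts]
        return versions[max(eligible)] if eligible else None


if __name__ == "__main__":
    table = ToyHTable(["name", "location"])
    table.put("row_key_1", "name", "first_name", "Jolly", ts=1)
    table.put("row_key_1", "name", "last_name", "Goodfellow", ts=1)
    table.put("row_key_1", "location", "zip", "94301", ts=1)
    print(table.get("row_key_1", "name", "first_name"))
```

Reads use `.get` with empty-dict defaults, so looking up a missing cell returns None and does not create empty rows or families as a side effect.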
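
The first step on slide 16 is to set JAVA_HOME in conf/hadoop-env.sh. If you want to script that edit, a function like the hypothetical set_java_home below does it. It rewrites the first `export JAVA_HOME=` line (whether or not that line is commented out) or appends such a line if none exists. The JDK paths in the tests are placeholders, not recommended install locations.

```python
import re
from pathlib import Path

# Matches an "export JAVA_HOME=..." line, commented out or not.
# [ \t]* (not \s*) so the pattern never swallows neighbouring newlines.
_JAVA_HOME_LINE = re.compile(r"^[ \t]*#?[ \t]*export[ \t]+JAVA_HOME=.*$", re.MULTILINE)


def set_java_home(env_sh: str, java_home: str) -> str:
    """Return the hadoop-env.sh text with JAVA_HOME set to java_home."""
    line = f"export JAVA_HOME={java_home}"
    if _JAVA_HOME_LINE.search(env_sh):
        # A replacement function keeps backslashes in java_home literal.
        return _JAVA_HOME_LINE.sub(lambda _m: line, env_sh, count=1)
    separator = "\n" if env_sh and not env_sh.endswith("\n") else ""
    return env_sh + separator + line + "\n"


def update_hadoop_env(conf_dir: str, java_home: str) -> None:
    # Rewrites conf_dir/hadoop-env.sh in place.
    path = Path(conf_dir) / "hadoop-env.sh"
    path.write_text(set_java_home(path.read_text(), java_home))
```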
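
Slide 17 lists the three files to edit for pseudo-distributed mode. The Python sketch below writes them, assuming the property names and values from the Hadoop 0.20 single-node setup guide: fs.default.name = hdfs://localhost:9000, dfs.replication = 1 and mapred.job.tracker = localhost:9001. The ports are that guide's examples, not requirements. Note that the script overwrites any existing site files in the target directory.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Values from the 0.20.x single-node setup guide (pseudo-distributed section).
PSEUDO_DISTRIBUTED = {
    "core-site.xml": {"fs.default.name": "hdfs://localhost:9000"},  # NameNode URI
    "hdfs-site.xml": {"dfs.replication": "1"},  # one DataNode -> one replica
    "mapred-site.xml": {"mapred.job.tracker": "localhost:9001"},  # JobTracker host:port
}


def write_site_files(conf_dir: str,
                     settings: dict[str, dict[str, str]] = PSEUDO_DISTRIBUTED) -> None:
    """Write one <configuration> file per entry in settings (overwrites existing files)."""
    os.makedirs(conf_dir, exist_ok=True)
    for filename, props in settings.items():
        root = ET.Element("configuration")
        for name, value in props.items():
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = name
            ET.SubElement(prop, "value").text = value
        ET.indent(root)  # pretty-print; requires Python 3.9+
        ET.ElementTree(root).write(os.path.join(conf_dir, filename),
                                   encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Demo into a throwaway directory; point it at $HADOOP_HOME/conf for real use.
    with tempfile.TemporaryDirectory() as conf:
        write_site_files(conf)
        with open(os.path.join(conf, "core-site.xml"), encoding="utf-8") as f:
            print(f.read())
```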
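
Slide 19 does not show the actual grep command. The examples jar's grep program takes `<inDir> <outDir> <regex> [<group>]`, so the missing step would look something like `bin/hadoop jar hadoop-examples-0.20.203.0.jar grep input output 'start[a-z.]*'`. The regular expression there is an assumption, because the slide only says "beginning with 'start'". The program runs two jobs: the first counts each matching string, and the second sorts those counts in descending order, writing `count<TAB>match` lines. If you want something to check the HDFS output against, the Python function below is a local, single-process approximation of the same computation. The function names are hypothetical, and breaking ties alphabetically is a choice of this sketch, not something Hadoop guarantees.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Iterable


def local_grep(texts: Iterable[str], pattern: str, group: int = 0) -> list[str]:
    """Count regex matches line by line; return 'count<TAB>match', highest first."""
    regex = re.compile(pattern)
    counts: Counter = Counter()
    for text in texts:
        for line in text.splitlines():          # one record per line of input
            for match in regex.finditer(line):  # "map": emit every match
                counts[match.group(group)] += 1  # "reduce": sum per matched string
    # Mirror the second (sort) job: descending count; ties alphabetical here.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{count}\t{matched}" for matched, count in ranked]


def grep_dir(directory: str, pattern: str) -> list[str]:
    # Read every regular file in directory (e.g. a local copy of bin/).
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    return local_grep((p.read_text(errors="replace") for p in files), pattern)
```

Running `grep_dir("bin", "start[a-z.]*")` on the same bin directory that was copied to HDFS should give the same counts as `bin/hadoop fs -cat output/*`, although lines with equal counts may appear in a different order.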
