Playing with Hadoop 2013-10-31

Published in: Education, Technology

  1. Visma Consulting, 2013-10-30: Playing with Hadoop. Søren Lund (slu), slu@369.dk
  2. Needed to run Hadoop
     You need the following to run Hadoop:
       • Java JDK
       • Linux server
       • Hadoop tarball
     I'm using the following:
       • JDK 1.6.24 64 bit
       • Ubuntu 12.04 LTS 64 bit
       • Hadoop 1.0.4
     Could not get JDK7 + Hadoop 2.2 to work
  3. Installing Hadoop
  4. Install Java
  5. Set up Java home and path
  6. Add hadoop user
  7. Install Hadoop and add to path
  8. Create SSH key for hadoop user
  9. Accept SSH key
  10. Disable IPv6
  11. Reboot and check installation
  12. Running an example job
  13. Calculate Pi
  14. Estimated value of Pi
  15. Three modes of operation
      Pi was calculated in local standalone mode
        • it is the default mode (i.e. no configuration needed)
        • all components of Hadoop run in a single JVM
      Pseudo-distributed mode
        • components communicate using sockets
        • a separate JVM is spawned for each component
        • it is a minicluster on a single host
      Fully distributed mode
        • components are spread across multiple machines
  16. Configuring for pseudo-distributed mode
  17. Create base directory for HDFS
  18. Set JAVA_HOME
  19. Edit core-site.xml
  20. Edit hdfs-site.xml
  21. Edit mapred-site.xml
  22. Log out and log on as hadoop
  23. Format HDFS
  24. Start HDFS
  25. Start MapReduce
  26. Create home directory & test data
  27. Running Word Count
  28. First let's try the example jar
  29. Inspect the result
  30. Compile and run our own jar: https://gist.github.com/soren/7213273
  31. Inspect result
  32. Run improved version: https://gist.github.com/soren/7213453
  33. Inspect (improved) result
  34. The Web User Interface
        • HDFS: http://localhost:8070/
        • MapReduce: http://localhost:8030/
        • File Browser: http://localhost:8075/browseDirectory.jsp?namenodeInfoPort
      Note: this is with port forwarding in VirtualBox (50030 → 8030, 50070 → 8070, 50075 → 8075)
  35. Now you can go play with Hadoop...
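
The content of slides 13 and 14 (Calculate Pi, Estimated value of Pi) is not part of this transcript. As a rough illustration of how such a job can split its work into map and reduce steps, here is a minimal sketch in plain Python, with no Hadoop involved. It assumes a quasi-Monte Carlo approach based on a Halton sequence; the names halton, map_task, reduce_task and estimate_pi, and the default sample counts, are made up for this sketch and are not taken from the slides.

```python
def halton(index, base):
    """Return element `index` (1-based) of the Halton sequence for `base`."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def map_task(offset, samples):
    """Hypothetical map step: count points inside/outside the quarter circle.

    Each task works on its own slice of the sequence, starting after `offset`.
    """
    inside = 0
    for i in range(offset + 1, offset + samples + 1):
        x, y = halton(i, 2), halton(i, 3)
        if x * x + y * y <= 1.0:
            inside += 1
    return inside, samples - inside


def reduce_task(partials):
    """Hypothetical reduce step: sum the partial counts and scale to Pi."""
    inside = sum(p[0] for p in partials)
    outside = sum(p[1] for p in partials)
    return 4.0 * inside / (inside + outside)


def estimate_pi(num_maps=4, samples_per_map=10_000):
    """Run all map tasks one after another, then reduce their results."""
    partials = [
        map_task(m * samples_per_map, samples_per_map) for m in range(num_maps)
    ]
    return reduce_task(partials)


if __name__ == "__main__":
    print(estimate_pi())
```

Raising num_maps or samples_per_map in this sketch gives more points and usually a closer estimate; on a real cluster the map tasks would run in parallel rather than in a loop.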

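Slides 27 to 33 run a Word Count job, but the commands and output are not part of this transcript (the Java sources are in the linked gists). The sketch below only illustrates the map, shuffle and reduce phases in plain Python, so it can be tried without a cluster. It is not the author's code; the function names and the sample lines are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group the emitted values by key, between the map and reduce steps."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Sum all counts that were emitted for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(w, c) for w, c in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["hello hadoop", "hello world"]  # made-up test data
    for word, total in word_count(sample).items():
        print(f"{word}\t{total}")
```

This version splits on whitespace only, so "Hadoop" and "hadoop," would count as different words.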