
Hadoop



A brief introduction to the concepts behind Hadoop and its installation. Made during the CSA course at NTHU, June 2011.



  1. 陳柏翰, CS13, http://about.me/sihalon, Computer System Administration 2011
  2. "Only the sky is above it; no mountain stands its equal. Raise your head and the red sun is near; look back and the white clouds lie below." - Kou Zhun (Song dynasty), "Mount Hua" (華山)
  3. Outline: existing cloud services; the ideas behind Hadoop; single-node Hadoop installation; a simple example
  4. What is the cloud? Gmail, YouTube, Google Docs…
  5. Simply put: any application service you can use over the Internet.
  6. Existing cloud computing services: Windows, Google, Amazon, Yahoo, Plurk, … What is behind them?
  7. Hadoop: Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data.
  8. What is Hadoop? An open-source cloud platform (framework); a solution for computing over massive data; stable and scalable.
  9. Yahoo and Hadoop: an Apache project, funded, developed, and deployed by Yahoo. Joined the Hadoop project in 2006. By 2008: 2,000 servers running more than 10,000 Hadoop virtual machines, storing 5 petabytes of web content and analyzing 1 trillion web links.
  10. Features: Scalable - can store and process massive amounts of data. Economical - runs on clusters built from ordinary PCs. Efficient - processes files in parallel across nodes for fast response. Reliable - when a node fails, the system automatically retrieves replicated data and redeploys computing resources.
  11. Architecture: HDFS - the file system of the Hadoop project. MapReduce - parallel processing of petabyte-scale and larger datasets. HBase - a database system for massive data.
  12. Divide and Conquer: an algorithm design strategy (分而治之) that fits well into program architectures for computation over large-scale data.
  13. Divide and Conquer. Example 1: estimating an area with a grid of squares. Example 2: tiling a region with L-shaped tiles.
  14. Divide and Conquer: MapReduce word count on the sentence "I am a tiger, you are also a tiger". Each map task emits a (word, 1) pair per word; the shuffle groups identical words; each reduce task sums its group, producing a,2; tiger,2; and also, am, are, I, you each 1. [diagram]
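The map → shuffle → reduce flow in the diagram above can be imitated with plain Unix tools; this is only an illustration of the idea, not Hadoop itself:

```shell
# Word count in the MapReduce style:
#   "map"     - tr emits one word per line
#   "shuffle" - sort brings identical words together
#   "reduce"  - uniq -c sums each group
echo "I am a tiger, you are also a tiger" \
  | tr -d ',' \
  | tr ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn
# "a" and "tiger" each appear twice; every other word once
```

In real Hadoop the same three phases run in parallel across many machines, which is what makes the approach scale.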
  15. The different roles in a cluster
  16. Building Hadoop: the master runs the Namenode and JobTracker; Node1, Node2, and Node3 each run a Datanode and TaskTracker in Java on Linux. [diagram]
  17. Let's fly up to the cloud - Demo Time
  18. Supported Platforms: GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes. Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.
  19. Environment: Ubuntu Linux 10.04 LTS; Hadoop 0.20.2, released in February 2010.
  20. Required Software: Java 1.6.x, preferably from Sun, must be installed. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
  21. Sun Java 6
      1. Add the repository to your apt repositories: $ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
      2. Update the source list: $ sudo apt-get update
  22. Sun Java 6
      3. Install sun-java6-jdk: $ sudo apt-get install sun-java6-jdk
      4. Select Sun's Java as the default on your machine: $ sudo update-java-alternatives -s java-6-sun
  23. Sun Java 6
      5. Check whether it succeeded: $ java -version
  24. Configuring SSH (you can find the ssh software in the Software Center by searching for "ssh")
  25. Configuring SSH
      1. Generate an SSH key for the current user: $ ssh-keygen -t rsa -P ""
      2. Enable SSH access to your local machine with the newly created key: $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
      (as in cat test1.txt >> test2.txt, ">>" redirects and appends)
  26. Configuring SSH
      3. Test by connecting to your local machine (you should install ssh first): $ ssh localhost
  27. Disabling IPv6: $ sudo joe /etc/sysctl.conf
      # disable ipv6
      net.ipv6.conf.all.disable_ipv6 = 1
      net.ipv6.conf.default.disable_ipv6 = 1
      net.ipv6.conf.lo.disable_ipv6 = 1
      then $ reboot
  28. Disabling IPv6. Check whether IPv6 is enabled on your machine (0 means enabled, 1 means disabled): $ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
  29. Hadoop Installation. Download Hadoop from the Apache mirrors: http://www.apache.org/dyn/closer.cgi/hadoop/core
      $ cd /home/csa
      $ wget http://apache.ntu.edu.tw/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
  30. Hadoop Installation
      $ sudo tar xzf hadoop-0.20.2.tar.gz
      $ sudo mv hadoop-0.20.2 hadoop
  31. Hadoop Package Topology
      bin/     executables such as start-all.sh, stop-all.sh, hadoop
      conf/    default configuration directory: environment variables, the list of worker nodes (slaves)
      docs/    Hadoop API and documentation
      contrib/ extra useful packages, e.g. the Eclipse plugin
      lib/     all libraries needed to develop Hadoop projects or compile Hadoop programs, e.g. jetty, kfs
      src/     Hadoop source code
      build/   folder produced when Hadoop is compiled
      logs/    default log directory (the path can be changed)
  32. Update the .bashrc of each user who wants to use Hadoop: $ sudo joe /home/csa/.bashrc
      # Set Hadoop-related environment variables
      export HADOOP_HOME=/home/csa/hadoop
      # Add Hadoop bin/ directory to PATH
      export PATH=$PATH:$HADOOP_HOME/bin
  33. Configuration. Point JAVA_HOME at your Sun JDK/JRE 6 directory: $ joe /home/csa/hadoop/conf/hadoop-env.sh
      # The java implementation to use. Required.
      export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.24
  34. Configuration: in file conf/core-site.xml, in file conf/mapred-site.xml, and in file conf/hdfs-site.xml
  35. <!-- In: conf/core-site.xml -->
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
        <description>The name of the default file system.</description>
      </property>
  36. <!-- In: conf/mapred-site.xml -->
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:54311</value>
        <description>For the MapReduce job tracker.</description>
      </property>
  37. <!-- In: conf/hdfs-site.xml -->
      <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
      </property>
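One detail the fragments above leave implicit: in each of these files the <property> elements must sit inside a single <configuration> element. A minimal conf/core-site.xml assembled from the values above would look like:

```xml
<?xml version="1.0"?>
<!-- conf/core-site.xml -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

conf/mapred-site.xml and conf/hdfs-site.xml follow the same pattern with their own properties.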
  38. Formatting the name node! $ /home/csa/hadoop/bin/hadoop namenode -format
  39. Starting your single-node cluster
      $ /home/csa/hadoop/bin/start-all.sh
      $ jps
  40. jps should list: JobTracker, TaskTracker, NameNode, DataNode
  41. Congratulations! You have just set up a single-node cluster.
  42. Hadoop Web Interfaces
      http://localhost:50030/ - web UI for the MapReduce job tracker(s)
      http://localhost:50060/ - web UI for the task tracker(s)
      http://localhost:50070/ - web UI for the HDFS name node(s)
  43. Common commands. Operate on the Hadoop file system with: $ bin/hadoop fs -<instruction> …
  44. MapReduce Demo: WordCount
  45. Divide and Conquer: the MapReduce word-count diagram again. "I am a tiger, you are also a tiger" is mapped to (word, 1) pairs and reduced to a,2; tiger,2; with the remaining words counted once each. [diagram]
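As a sketch of how the demo might go, using the examples jar that ships with the 0.20.2 release (the HDFS directory names input and output, and the choice of input file, are arbitrary choices for illustration):

```shell
cd /home/csa/hadoop
# copy a local text file into HDFS
bin/hadoop fs -mkdir input
bin/hadoop fs -put conf/core-site.xml input/
# run the bundled WordCount example
bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output
# print the resulting word counts
bin/hadoop fs -cat 'output/part-*'
```

This assumes the single-node cluster started on the earlier slides is running.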
  46. Why WordCount? Google, Facebook
  47. References. Thanks to the NCHC Cloud Computing Research Group ( Link here ! )
  48. Thanks for listening!
