Hadoop
A simple introduction to the concepts and installation of Hadoop. Made during the CSA course at NTHU, June 2011.

Hadoop Presentation Transcript

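  • 陳柏翰, CS13, http://about.me/sihalon. Computer System Administration 2011
  • "Only the sky is above it; no other mountain stands level with it. Raise your head and the red sun is near; look back and the white clouds lie low." Kou Zhun (Song dynasty), "Mount Hua"
  • Outline: existing cloud services; the concepts behind Hadoop; single-node Hadoop installation; a simple example
  • What is the cloud? Gmail, YouTube, Google Docs…
  • Simply put, any application service that you can use over the Internet
  • Existing cloud computing services: Windows, Google, Amazon, Yahoo, Plurk, … What is behind them?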
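  • Hadoop: Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data.
  • What is Hadoop? An open-source cloud platform (framework); a solution for computing on massive data sets; stable and scalable.
  • Yahoo and Hadoop: Hadoop is an Apache project that Yahoo funds, develops, and uses.
    2006: Yahoo began participating in Hadoop.
    2008: 2,000 servers, running more than 10,000 Hadoop virtual machines, holding 5 petabytes of web content, and analyzing 1 trillion web links.
  • Features
    Massive: able to store and process large volumes of data
    Economical: runs on clusters built from ordinary PCs
    Efficient: processes distributed files in parallel for fast responses
    Reliable: when a node fails, the system automatically and promptly retrieves backup data and redeploys computing resources
  • Architecture
    HDFS: the file system of the Hadoop project
    MapReduce: parallel processing of data sets at petabyte scale and above
    HBase: a database system for massive data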
  • Divide and Conquer 演算法(Algorithms):  Divide and Conquer  分而治之 在程式設計的軟體架構內,適合使用在大 規模數據的運算中
  • Divide and Conquer範例一:方格法求面積 範例二:鋪滿 L 形磁磚
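As a minimal illustration of the split, solve, and combine pattern (not one of the two examples on the slide), the sketch below sums a list recursively. The function name `dc_sum` is made up for this example.

```python
def dc_sum(values):
    """Sum a list by divide and conquer: split, solve each half, combine."""
    if len(values) == 0:
        return 0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    left = dc_sum(values[:mid])    # solve the left half
    right = dc_sum(values[mid:])   # solve the right half
    return left + right            # combine the two partial results


print(dc_sum(list(range(1, 101))))
```

The two halves do not depend on each other, which is what makes this pattern a natural fit for spreading work across many machines.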
  • Divide and Conquer: word count for "I am a tiger, you are also a tiger"
    map: every word is emitted as a (word, 1) pair: I,1 am,1 a,1 tiger,1 you,1 are,1 also,1 a,1 tiger,1
    reduce: pairs with the same word are grouped and summed: a,2 also,1 am,1 are,1 I,1 tiger,2 you,1
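The word count flow on this slide can be imitated in plain Python. This is only a single-process sketch of the idea, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are invented here, and the grouping step stands in for what the framework does between map and reduce.

```python
import re
from collections import defaultdict


def map_phase(text):
    """Emit a (word, 1) pair for every word in the text."""
    return [(word, 1) for word in re.findall(r"[A-Za-z]+", text)]


def shuffle(pairs):
    """Group the emitted counts by word."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(text):
    return reduce_phase(shuffle(map_phase(text)))


counts = word_count("I am a tiger, you are also a tiger")
for word in sorted(counts, key=str.lower):
    print(f"{word},{counts[word]}")
```

In a real cluster the map calls run on different nodes over different pieces of the input, and each reducer receives all pairs for the words assigned to it.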
  • The various roles
  • Building Hadoop: a NameNode and a JobTracker sit above three nodes (Node1, Node2, Node3). Each node runs Data and Task services on Java on Linux.
  • Let's fly up to the cloud together - Demo Time
  • Supported Platforms GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes. Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.
  • Environment Ubuntu Linux 10.04 LTS Hadoop 0.20.2 - released on February 2010
  • Required Software JavaTM 1.6.x, preferably from Sun, must be installed. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
  • Sun Java 6
    1. Add the partner repository to your apt repositories:
       $ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
    2. Update the source list:
       $ sudo apt-get update
  • Sun Java 6
    3. Install sun-java6-jdk:
       $ sudo apt-get install sun-java6-jdk
    4. Select Sun's Java as the default on your machine:
       $ sudo update-java-alternatives -s java-6-sun
  • Sun Java 6
    5. Check whether the installation succeeded:
       $ java -version
  • Configuring SSH (you can find the ssh software in the Software Center by searching for "ssh")
  • Configuring SSH
    1. Generate an SSH key for the current user:
       $ ssh-keygen -t rsa -P ""
    2. Enable SSH access to your local machine with this newly created key:
       $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
       (>> is append redirection: cat test1.txt >> test2.txt appends test1.txt to the end of test2.txt)
  • Configuring SSH
    3. Test by connecting to your local machine (ssh must be installed first):
       $ ssh localhost
  • Disabling IPv6 $ sudo joe /etc/sysctl.conf #disable ipv6 net.ipv6.conf.all.disable_ipv6 = 1 net.ipv6.conf.default.disable_ipv6 = 1 net.ipv6.conf.lo.disable_ipv6 = 1 $ reboot
  • Disabling IPv6check whether IPv6 is enabled on your machine ( 0 means enabled, 1 means disabled ) $ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
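The same check can be scripted. The sketch below reads the flag file named on the slide and applies the 0/1 rule; the function name `ipv6_disabled` and its `flag_path` parameter are assumptions made for illustration.

```python
from pathlib import Path

# Path taken from the slide; the file exists only on Linux systems with IPv6 support.
IPV6_FLAG = Path("/proc/sys/net/ipv6/conf/all/disable_ipv6")


def ipv6_disabled(flag_path=IPV6_FLAG):
    """Return True if the flag file contains 1 (disabled), False if it contains 0."""
    return Path(flag_path).read_text().strip() == "1"


if __name__ == "__main__" and IPV6_FLAG.exists():
    print("IPv6 disabled:", ipv6_disabled())
```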
  • Hadoop Installation: download Hadoop from the Apache Mirrors, http://www.apache.org/dyn/closer.cgi/hadoop/core
    $ cd /home/csa
    $ wget http://apache.ntu.edu.tw/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
  • Hadoop Installation
    $ sudo tar xzf hadoop-0.20.2.tar.gz
    $ sudo mv hadoop-0.20.2 hadoop
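  • Hadoop Package Topology
    bin/ : executables, such as start-all.sh, stop-all.sh, hadoop
    conf/ : default configuration directory: environment variable settings and the slaves list of worker nodes
    docs/ : Hadoop API and documentation
    contrib/ : additional useful packages, such as the Eclipse plug-in
    lib/ : all libraries needed to develop Hadoop projects or compile Hadoop programs, such as jetty and kfs
    src/ : Hadoop source code
    build/ : output directory produced when building Hadoop during development
    logs/ : default log directory (the path can be changed)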
  • Update .bashrc for each user who wants to use Hadoop
    $ sudo joe /home/csa/.bashrc
    # Set Hadoop-related environment variables
    export HADOOP_HOME=/home/csa/hadoop
    # Add Hadoop bin/ directory to PATH
    export PATH=$PATH:$HADOOP_HOME/bin
  • Configuration: set JAVA_HOME to the Sun JDK/JRE 6 directory
    $ joe /home/csa/hadoop/conf/hadoop-env.sh
    # The java implementation to use. Required.
    export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.24
  • Configuration In file conf/core-site.xml In file conf/core-site.xml In file conf/mapred-site.xml
  • <!-- In: conf/core-site.xml --><property> <name>hadoop.tmp.dir</name> <value>/app/hadoop/tmp</value> <description>A base for other temporary irectories.</description></property><property> <name>fs.default.name</name> <value>hdfs://localhost:9000</value> <description>The name of the default file system. </description></property>
  • <!-- In: conf/mapred-site.xml --><property> <name>mapred.job.tracker</name> <value>localhost:54311</value> <description> For MapReduce job tracker </description></property>
  • <!-- In: conf/hdfs-site.xml --><property> <name>dfs.replication</name> <value>1</value> <description>Default block replication. The actual number ofreplications can be specified when the file is created. The default is usedif replication is not specified in create time. </description></property>
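The slides show only the `<property>` elements. In the actual files they sit inside a single `<configuration>` root element. As a hedged sketch, the script below renders the name/value pairs from the slides into complete documents; the names `SITE_FILES` and `render_site_file` are invented for this example, and the descriptions are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Property names and values copied from the slides.
SITE_FILES = {
    "core-site.xml": {
        "hadoop.tmp.dir": "/app/hadoop/tmp",
        "fs.default.name": "hdfs://localhost:9000",
    },
    "mapred-site.xml": {"mapred.job.tracker": "localhost:54311"},
    "hdfs-site.xml": {"dfs.replication": "1"},
}


def render_site_file(properties):
    """Wrap name/value pairs in <property> elements under a <configuration> root."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


for filename, properties in SITE_FILES.items():
    print(f"<!-- {filename} -->")
    print(render_site_file(properties))
```

The script only prints the documents; writing them into the conf/ directory is left to the reader so that existing files are not overwritten by accident.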
  • Formatting the name node
    $ /home/csa/hadoop/bin/hadoop namenode -format
  • Starting your single-node cluster
    $ /home/csa/hadoop/bin/start-all.sh
    $ jps
  • Processes listed by jps: Jps, JobTracker, TaskTracker, NameNode, DataNode
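To check the jps listing without reading it by eye, a small parser is enough. This is a sketch under assumptions: the function name `missing_daemons` is made up, the process IDs in the sample are invented, and the expected set holds only the four daemons named on the slide (a real start-all.sh run may list more processes).

```python
# Daemon names taken from the slide.
EXPECTED_DAEMONS = {"JobTracker", "TaskTracker", "NameNode", "DataNode"}


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2:  # each jps line is "<pid> <main class name>"
            running.add(parts[1])
    return set(expected) - running


# Hypothetical sample; the PIDs are invented.
sample = """4825 NameNode
5391 TaskTracker
5462 Jps
4986 DataNode
5283 JobTracker"""

print(missing_daemons(sample) or "all expected daemons are running")
```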
  • Congratulation! You just setup a single-node cluster
  • Hadoop Web Interfaces http://localhost:50030/– web UI for MapReduce job tracker(s) http://localhost:50060/– web UI for task tracker(s) http://localhost:50070/– web UI for HDFS name node(s)
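A quick way to see whether the three web interfaces are listening is to try a TCP connection to each port. The sketch below uses the ports from the slide; the names `WEB_UIS` and `port_open` are assumptions for illustration, and an open port shows only that something is listening, not that the page renders correctly.

```python
import socket

# Ports taken from the slide.
WEB_UIS = {
    "MapReduce job tracker": 50030,
    "task tracker": 50060,
    "HDFS name node": 50070,
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


for name, port in WEB_UIS.items():
    status = "up" if port_open("localhost", port) else "not reachable"
    print(f"{name}: http://localhost:{port}/ is {status}")
```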
  • 常用指令 操作 hadoop 檔案系統指令 $ bin/hadoop fs -Instruction …
  • MapReduce Demo WordCount
  • Divide and ConquerI am a tiger, you are also a tiger a,2 also,1 I,1 a,2 am,1 am,1 a, 1 also,1 are,1map a,1 am,1 a,1 reduce I,1 also,1 are,1 tiger,2 tiger,1 am,1 you,1 are,1 you,1map are,1 I,1 tiger,1 I, 1 tiger,1 tiger,2 also,1 you,1 reduce you,1map a, 1 tiger,1
  • Why wordcount ? Google Facebook
  • References: Thanks for … NCHC Cloud Computing Research Group ( Link here ! )
  • Thanks for listening