陳柏翰
                CS13 http://about.me/sihalon
Computer System Administration 2011
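Only the heavens are above it;
no other mountain stands as its equal.
Lift your head and the red sun is near;
look back and the white clouds lie low.

Kou Zhun (Song dynasty), "Mount Hua"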
Outline
• Existing cloud services
• The concepts behind Hadoop
• Single-node Hadoop installation
• A simple example
What is the cloud?
• Gmail
• YouTube
• Google Docs
• …
Simply put: any application service that you can use over the Internet.
Existing cloud computing services
• Windows
• Google
• Amazon
• Yahoo
• Plurk
• ……
What is behind them?
Hadoop
Hadoop is a software platform that lets one easily write and run
applications that process vast amounts of data
What is Hadoop?

• An open-source cloud platform (framework)
• A solution for computing over massive amounts of data
• Stable and scalable
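Yahoo : Hadoop
• An Apache project that Yahoo funds, develops, and uses
  – 2006: Yahoo began participating in Hadoop.
  – 2008: 2,000 servers.
      Running more than 10,000 Hadoop virtual machines.
      5 petabytes of web content.
      Analyzing 1 trillion web links.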
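Features
•   Massive
    – Can store and process large amounts of data

•   Economical
    – Runs on clusters built from ordinary PCs

•   Efficient
    – Processes distributed files in parallel to respond quickly

•   Reliable
    – When a node fails, the system automatically and promptly retrieves
    backup data and redeploys computing resources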
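Architecture
• HDFS
  - The file system of the Hadoop project

• MapReduce
  - Parallel processing of data sets at petabyte scale and above

• HBase
  - A database system for massive amounts of data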
Divide and Conquer
• Algorithm:
  Divide and Conquer (分而治之)

• As a software architecture in programming, it is well suited to
  computation over large-scale data
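The divide, solve, combine idea can be sketched in a few lines of Python. This is only an illustration under assumed names: `split`, `solve`, `combine`, and `total` are hypothetical helpers and do not come from the slides or from Hadoop. Each chunk could in principle be handled by a different machine.

```python
def split(values, parts):
    """Divide: cut the list into at most `parts` contiguous chunks."""
    size = max(1, -(-len(values) // parts))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def solve(chunk):
    """Conquer: solve one small sub-problem (here, a partial sum)."""
    return sum(chunk)


def combine(partials):
    """Combine: merge the partial results into the final answer."""
    return sum(partials)


def total(values, parts=3):
    """Sum `values` by dividing, solving each chunk, and combining."""
    return combine(solve(chunk) for chunk in split(values, parts))


if __name__ == "__main__":
    print(total(list(range(1, 11))))
```

The chunks are independent of each other, which is what makes this pattern a good fit for large data sets spread over many nodes.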
Divide and Conquer

Example 1: estimating an area with a grid   Example 2: covering a board with L-shaped tiles
Divide and Conquer
Input: I am a tiger, you are also a tiger

map 1:     I,1   am,1   a,1
map 2:     tiger,1   you,1   are,1
map 3:     also,1   a,1   tiger,1

sorted:    a,1   a,1   also,1   am,1   are,1   I,1   tiger,1   tiger,1   you,1

reduce 1:  a,2   also,1   am,1   are,1
reduce 2:  I,1   tiger,2   you,1

result:    a,2   also,1   am,1   are,1   I,1   tiger,2   you,1
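The word-count flow on this slide can be imitated in plain Python. This is a minimal single-process sketch, not Hadoop's API: the function names `map_words`, `shuffle`, `reduce_counts`, and `word_count` are assumptions made for illustration, and the three input chunks mirror the three map tasks in the diagram.

```python
from collections import defaultdict


def map_words(chunk):
    """Map step: emit a (word, 1) pair for every word in the chunk."""
    return [(word.strip(","), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted counts by word, between the map and reduce steps."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(chunks):
    """Run map on every chunk, then shuffle and reduce the results."""
    pairs = [pair for chunk in chunks for pair in map_words(chunk)]
    return reduce_counts(shuffle(pairs))


if __name__ == "__main__":
    chunks = ["I am a", "tiger, you are", "also a tiger"]
    counts = word_count(chunks)
    for word in sorted(counts, key=str.lower):
        print(f"{word},{counts[word]}")
```

In a real cluster the map calls run on different nodes and the grouping is done by the framework; here everything runs in one process so the data flow is easy to follow.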
The various roles
Building Hadoop
  Namenode


  JobTracker



Data            Task   Data           Task   Data          Task



       Java                   Java                  Java


       Linux                  Linux                 Linux


       Node1                  Node2                 Node3
Let's fly up to the cloud together

     - Demo Time
Supported Platforms
• GNU/Linux is supported as a development and production platform.
  Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
• Win32 is supported as a development platform. Distributed operation has
  not been well tested on Win32, so it is not supported as a production
  platform.
Environment
• Ubuntu Linux 10.04 LTS
• Hadoop 0.20.2
  - released in February 2010
Required Software
• Java™ 1.6.x, preferably from Sun, must be installed.

• ssh must be installed and sshd must be running to use the Hadoop
  scripts that manage remote Hadoop daemons.
Sun Java 6
1. Add the Canonical partner repository to your apt repositories:
   $ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
2. Update the source list:
   $ sudo apt-get update
3. Install sun-java6-jdk:
   $ sudo apt-get install sun-java6-jdk
4. Select Sun's Java as the default on your machine:
   $ sudo update-java-alternatives -s java-6-sun
5. Check that the installation succeeded:
   $ java -version
Configuring SSH
(You can find the ssh software in the Software Center by searching for "ssh". Install ssh before continuing.)
1. Generate an SSH key for the current user:
   $ ssh-keygen -t rsa -P ""
2. Enable SSH access to your local machine with this newly created key:
   $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
   (>> appends the redirected output to a file, as in cat test1.txt >> test2.txt)
3. Test by connecting to your local machine:
   $ ssh localhost
Disabling IPv6
   $ sudo joe /etc/sysctl.conf
   # disable ipv6
   net.ipv6.conf.all.disable_ipv6 = 1
   net.ipv6.conf.default.disable_ipv6 = 1
   net.ipv6.conf.lo.disable_ipv6 = 1

   $ sudo reboot
After the reboot, check whether IPv6 is enabled on your machine
(0 means enabled, 1 means disabled):

   $ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
Hadoop Installation
Download Hadoop from the Apache mirrors:
http://www.apache.org/dyn/closer.cgi/hadoop/core

   $ cd /home/csa
   $ wget http://apache.ntu.edu.tw/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
   $ sudo tar xzf hadoop-0.20.2.tar.gz
   $ sudo mv hadoop-0.20.2 hadoop
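Hadoop Package Topology
• bin/      Executables, such as start-all.sh, stop-all.sh, and hadoop
• conf/     Default configuration directory: environment variables and the
            worker node list (slaves)
• docs/     Hadoop API reference and documentation
• contrib/  Additional useful packages, such as the Eclipse plug-in
• lib/      Libraries needed to develop Hadoop projects or compile Hadoop
            programs, such as jetty and kfs
• src/      Hadoop source code
• build/    Output directory when building Hadoop
• logs/     Default directory for log files (the path can be changed)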
Update .bashrc for each user who wants to use Hadoop
   $ sudo joe /home/csa/.bashrc

   # Set Hadoop-related environment variables
   export HADOOP_HOME=/home/csa/hadoop
   # Add Hadoop bin/ directory to PATH
   export PATH=$PATH:$HADOOP_HOME/bin
Configuration
Set JAVA_HOME to the Sun JDK/JRE 6 directory:

   $ joe /home/csa/hadoop/conf/hadoop-env.sh

   # The java implementation to use. Required.
   export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.24
Configuration
• In file conf/core-site.xml
• In file conf/mapred-site.xml
• In file conf/hdfs-site.xml
<!-- In: conf/core-site.xml -->
<property>
          <name>hadoop.tmp.dir</name>
          <value>/app/hadoop/tmp</value>
          <description>A base for other temporary directories.</description>
</property>
<property>
          <name>fs.default.name</name>
          <value>hdfs://localhost:9000</value>
          <description>The name of the default file system. </description>
</property>
<!-- In: conf/mapred-site.xml -->
<property>
          <name>mapred.job.tracker</name>
          <value>localhost:54311</value>
          <description> For MapReduce job tracker </description>
</property>
<!-- In: conf/hdfs-site.xml -->
<property>
          <name>dfs.replication</name>
          <value>1</value>
          <description>Default block replication. The actual number of
replications can be specified when the file is created. The default is used
if replication is not specified in create time. </description>
</property>
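Each of these settings is a name/value pair inside a `<property>` element. In the actual site files the properties sit inside a `<configuration>` root element, which the slides omit. The sketch below, in Python, reads such a file's text into a dictionary so the values can be checked before starting the cluster. The helper `read_properties` and the `SAMPLE` text are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text: two of the properties from the slides,
# wrapped in the <configuration> root element that site files use.
SAMPLE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> in the text."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name", default="").strip():
            prop.findtext("value", default="").strip()
        for prop in root.iter("property")
    }


if __name__ == "__main__":
    for name, value in read_properties(SAMPLE).items():
        print(f"{name} = {value}")
```

A malformed file makes `ET.fromstring` raise a parse error, which is a quick way to catch a missing closing tag.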
Formatting the name node!
   $ /home/csa/hadoop/bin/hadoop namenode -format
Starting your single-node cluster

 $ /home/csa/hadoop/bin/start-all.sh
 $ jps
Jps
JobTracker
TaskTracker
NameNode
DataNode
Congratulations!
• You just set up a single-node cluster
Hadoop Web Interfaces
• http://localhost:50030/ – web UI for MapReduce job tracker(s)
• http://localhost:50060/ – web UI for task tracker(s)
• http://localhost:50070/ – web UI for HDFS name node(s)
Common commands
• Commands for working with the Hadoop file system
   $ bin/hadoop fs -Instruction …
MapReduce Demo
   WordCount
Why WordCount?
• Google
• Facebook
References
           Thanks for …
• NCHC Cloud Computing Research Group ( Link here ! )
Thanks for listening
