Updated: 2016/06/28
Example & homework: https://github.com/Phate334/MapReduceExample
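The slides have been updated and reorganized.
These slides show how to deploy a Hadoop cluster with Cloudera and how to set up your own development environment so that you can easily test your MapReduce applications.
The slides are divided into three parts:
1. Setting up a Hadoop cluster. This year's version shortens this part and only provides some reference links.
2. Preparing a development environment in which MapReduce programs can be tested on the local machine, without having to submit them to the cluster every time.
3. Three simple examples that introduce the MapReduce framework.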
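That slide deck describes the environment that has to be prepared when setting up a Hadoop cluster with CDH (Cloudera's Distribution Including Apache Hadoop), including hardware specifications, system environment, and software versions.
Related resources collected in the future will be placed in the following two Diigo libraries: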
https://www.diigo.com/user/phate334/cloudera
https://www.diigo.com/user/phate334/hadoop
How to plan a hadoop cluster for testing and production environment (Anna Yen)
Athemaster shares its experience with new Hadoop users in planning hardware specs, server initialization, and role deployment. Two testing environments and three production environments are presented as case studies.
Establish The Core of Cloud Computing Application by Using Hazelcast (Chinese) (Joseph Kuo)
The concept of cloud computing has been around for several years. Many of us can roughly imagine what it is and some of us can describe it, but only a few know how to implement it. Are NoSQL, MapReduce, or Big Data equivalent to cloud computing? Can a service be called cloud-based just because it uses one of those tools? Many companies and groups have declared that their online services are cloud-based or that they use cloud computing, but are those claims all true? Beyond these questions, where should we start if we want to build a cloud-based service that is distributed, flexible, reliable, available, scalable, and stable? This session aims to lead you through the gate of mysteries and into the beautiful realm of cloud computing using powerful tools such as Hazelcast. Join us on a journey to the core of cloud computing applications!
https://cyberjos.blog/java/seminar/jcconf-2014-establish-the-core-of-cloud-computing-application-by-using-hazelcast/
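6. Some Information
• Cloudera vs Hortonworks vs MapR: Comparing Hadoop Distributions
• Products that include Apache Hadoop or derivative works and Commercial Support
• Big Data and the Hadoop ecosystem, Hadoop distributions, and Hadoop-based enterprise applications (大数据和Hadoop生态圈,Hadoop发行版和基于Hadoop的企业级应用)
• A ten-year review of Hadoop and a development forecast (Hadoop十年解读与发展预测)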
62. Map input format
• job.setInputFormatClass(TextInputFormat.class)
– An InputFormat for plain text files. Files are broken into lines. Either linefeed or carriage-return are used to signal end of line. Keys are the position in the file, and values are the line of text.
Sources: Input Splits in Hadoop’s MapReduce; Unveiling InputFormat: a powerful tool for controlling MapReduce task execution (揭秘InputFormat:掌控Map Reduce任务执行的利器)
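1. If the last line of a block is cut off, the reader continues into the next block to fetch the complete line.
2. If the block being read is not the first block, the first (partial) line is skipped, because the reader of the previous block has already fetched it.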
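The two boundary rules above can be sketched in plain Java without any Hadoop dependency. This is only an illustration under stated assumptions, not Hadoop's actual record reader: the class name SplitLineSketch and its methods are hypothetical, only '\n' is treated as a line terminator, and the "file" is a small in-memory byte array cut into two fixed-size splits. Each record is printed as the byte offset of the line (the key) followed by the line text (the value).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch, no Hadoop dependency. SplitLineSketch and its methods are
// hypothetical names; only '\n' is treated as a line terminator here.
class SplitLineSketch {

    // Index of the first byte after the next '\n' at or after 'from' (or data.length).
    static int nextLineStart(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    // Records ("offset<TAB>line") emitted by the reader of split [start, start + length).
    static List<String> readSplit(byte[] data, int start, int length) {
        List<String> records = new ArrayList<>();
        int end = start + length;
        int pos = start;
        if (start != 0) {
            // Rule 2: not the first split, so skip the first line;
            // the reader of the previous split is responsible for it.
            pos = nextLineStart(data, start);
        }
        // Rule 1: any line that starts at or before the split end is read in full,
        // even if its bytes continue past the boundary.
        while (pos <= end && pos < data.length) {
            int next = nextLineStart(data, pos);
            int lineEnd = (data[next - 1] == '\n') ? next - 1 : next;
            String line = new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8);
            records.add(pos + "\t" + line);
            pos = next;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "aaa\nbbbb\ncc\n".getBytes(StandardCharsets.UTF_8);
        // The boundary at byte 6 falls in the middle of "bbbb".
        System.out.println(readSplit(data, 0, 6)); // lines starting at offsets 0 and 4
        System.out.println(readSplit(data, 6, 6)); // only the line starting at offset 9
    }
}
```

With these rules every line is emitted exactly once, no matter where the split boundary falls.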
63. Example 1
Map
map(KEYIN key, VALUEIN value,
org.apache.hadoop.mapreduce.Mapper.Context context)
Called once for each key/value pair in the input split.
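To make "called once for each key/value pair" concrete, here is a minimal plain-Java sketch of that calling pattern. It does not use the Hadoop API: Context below is a hypothetical stand-in for Mapper.Context, run() imitates the framework loop, and the input (one integer per line, emitted under a constant key "sum") is an assumption for illustration, since the slides do not show the input of Example 1.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Plain-Java sketch, no Hadoop dependency. All names here are hypothetical.
class MapCallSketch {

    // Stand-in for Mapper.Context: it only collects the emitted key/value pairs.
    static class Context {
        final List<Map.Entry<String, Integer>> out = new ArrayList<>();

        void write(String key, int value) {
            out.add(new AbstractMap.SimpleEntry<>(key, value));
        }
    }

    // Receives one record: the byte offset of a line (key) and the line text (value).
    // Assumption: each non-empty line holds one integer.
    static void map(long offset, String line, Context context) {
        String text = line.trim();
        if (!text.isEmpty()) {
            context.write("sum", Integer.parseInt(text));
        }
    }

    // Imitates the framework: map() is invoked once per record of the split.
    static Context run(List<String> lines) {
        Context context = new Context();
        long offset = 0;
        for (String line : lines) {
            map(offset, line, context);
            offset += line.length() + 1; // +1 for the line terminator
        }
        return context;
    }

    public static void main(String[] args) {
        Context context = run(Arrays.asList("3", "4", "5"));
        System.out.println(context.out);
    }
}
```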
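67. Example 1
Reduce
• As described in the previous section, all pairs with the same key have already been gathered together and placed in a java.lang.Iterable object.
• Next, a for-each loop adds up all the values.
• Our goal is to output the total, so KEYOUT is set to NullWritable.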
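The reduce step described above can be sketched in plain Java as follows. This is an illustration rather than the code from the example repository: ReduceSumSketch, groupByKey, and the sample pairs are hypothetical, groupByKey stands in for the framework's shuffle, and returning only the sum plays the role of writing a NullWritable key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch, no Hadoop dependency. All names here are hypothetical.
class ReduceSumSketch {

    // Shuffle stand-in: gathers all values that share the same key.
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: a for-each loop over the Iterable adds up the values.
    // Only the sum is returned; the key is deliberately left out of the output.
    static long reduce(String key, Iterable<Integer> values) {
        long sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = Arrays.asList(
                new AbstractMap.SimpleEntry<>("sum", 3),
                new AbstractMap.SimpleEntry<>("sum", 4),
                new AbstractMap.SimpleEntry<>("sum", 5));
        for (Map.Entry<String, List<Integer>> group : groupByKey(pairs).entrySet()) {
            System.out.println(reduce(group.getKey(), group.getValue()));
        }
    }
}
```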