Getting Started with Application Development on the Eclipse and Hadoop Platform

Eclipse Hadoop Plugin MapReduce


  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
  • 东西不错,分享精神可就一般啦。
    Are you sure you want to
    Your message goes here
    Processing…
  • good
    Are you sure you want to
    Your message goes here
    Processing…
  • a good mannual for hadoop development configuration.
    Are you sure you want to
    Your message goes here
    Processing…
  • how to get it
    Are you sure you want to
    Your message goes here
    Processing…
  • like it
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Presentation Transcript

    • Getting Started with Application Development
      on the Eclipse and Hadoop Platform
      Ximing (Li Zhenhua)
      2010-4-12
    • Table of Contents
    • 1. Are You Ready?
      • Operating systems
      • Linux 64-bit, for running Hadoop
      • Windows, for running Eclipse
      • Java SE Development Kit (JDK)
      • Download: http://java.sun.com/javase/downloads/widget/jdk6.jsp
      • You need both the Windows 32-bit and the Linux 64-bit version
      • Cygwin (skip this if you develop on Linux)
      • Download: http://cygwin.com/
      • Ant
      • Download: http://ant.apache.org/bindownload.cgi
      • You need the Linux version
      • Eclipse, Galileo release
      • Download: http://www.eclipse.org/downloads/
      • Get "Eclipse IDE for Java Developers"
      • Both the Windows 32-bit and the Linux 64-bit version
      • Hadoop
      • Official 0.19.2 download: http://www.apache.org/dyn/closer.cgi/hadoop/core/
    • 2. Setting Up Windows - WorkDir
      Note: this guide installs Hadoop 0.19.2; wherever the configuration differs from the official release, the difference is called out.
      Working directory (WorkDir) = D:\SearchCenter\Jwork
      Put the Windows downloads of JDK 1.6, Eclipse, and Hadoop 0.19.2 into the working directory.
    • 2. Setting Up Windows - Installing the JDK
      1. Run the JDK installer.
      2. Set the system environment variables:
      JAVA_HOME = C:\Program Files\Java\jdk1.6.0_19 (your install directory)
      Path = append "; C:\Program Files\Java\jdk1.6.0_19\bin" to the existing Path value
    • 2. Setting Up Windows - Installing Cygwin
      1. Double-click the Cygwin installer, setup.exe.
      2. Choose "Install from Internet", Next; set the install directory, Next; choose a direct Internet connection, Next; pick a download site, Next.
      3. On the package-selection screen, search for "ssh" and mark OpenSSH for installation; do the same for "ssl"; then continue until the installation completes.
    • 2. Setting Up Windows - Installing Cygwin
      4. Set the environment variable: append the path D:\Program Files\cygwin\bin to Path.
    • 3. Setting Up Linux - WorkDir
      Note: choose a working directory that suits your own setup; this demo uses:
      Development server: dev5
      Operating system: Linux, 64-bit, 4U7 release
      Working directory (WorkDir) = /home/zhenhua
      Put the Linux downloads of the JDK, Ant, Eclipse, and Hadoop into the working directory.
      1. wget http://apache.etoak.com//hadoop/core/hadoop-0.19.2/hadoop-0.19.2.tar.gz
      2. Unpack the archive with gzip -d followed by tar -xf.
    • 3. Setting Up Linux - Installing the JDK
      Note: most systems ship with a JDK; if yours does not, install one yourself (details omitted).
    • 3. Setting Up Linux - Installing Ant
      Unpack the apache-ant-1.8 package into the working directory and rename it to ant-1.8.
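After renaming the directory you will likely want Ant's bin directory on your PATH so the build commands in the next slides can find it. A minimal sketch, assuming the WorkDir layout used in this demo (the exact paths are this guide's example, not a requirement):

```shell
# Put the freshly unpacked Ant on the PATH (paths follow the demo WorkDir)
export ANT_HOME=/home/zhenhua/ant-1.8
export PATH="$ANT_HOME/bin:$PATH"
echo "$PATH"
```

Add the two export lines to ~/.bash_profile if you want them to survive a new login shell.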
    • 3. Setting Up Linux - Compiling Hadoop
      Build the official Hadoop 0.19.2 release; the build is done when it prints BUILD SUCCESSFUL.
    • 3. Setting Up Linux - Compiling the Hadoop Plugin
      Building the Hadoop 0.19.2 plugin for Eclipse requires a change to the Java source:
      edit /home/zhenhua/hadoop-0.19.2/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/launch/HadoopApplicationLaunchShortcut.java
    • 3. Setting Up Linux - Compiling the Hadoop Plugin
      Building the official Hadoop 0.19.2 plugin for Eclipse also requires a change to build.xml:
      edit /home/zhenhua/hadoop-0.19.2/src/contrib/eclipse-plugin/build.xml and change
      ${hadoop.root}/build/hadoop-${version}-core.jar to
      ${hadoop.root}/hadoop-${version}-core.jar
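The build.xml edit above is a one-line substitution, so it can also be applied with sed. Sketched here against a throwaway stand-in file (the excerpt below is illustrative, not the real build.xml; on a real tree, run the sed line against the path given above):

```shell
# Create a stand-in file that mimics the relevant line of build.xml
mkdir -p /tmp/plugin-demo
cat > /tmp/plugin-demo/build.xml <<'EOF'
<project>
  <property name="core.jar"
            value="${hadoop.root}/build/hadoop-${version}-core.jar"/>
</project>
EOF
# Point the plugin build at the core jar in the source root instead of build/
sed -i 's|${hadoop.root}/build/hadoop-${version}-core.jar|${hadoop.root}/hadoop-${version}-core.jar|' /tmp/plugin-demo/build.xml
cat /tmp/plugin-demo/build.xml
```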
    • 3. Setting Up Linux - Compiling the Hadoop Plugin
      Build the Hadoop 0.19.2 plugin for Eclipse.
      3. In the build command, -Declipse.home=/home/zhenhua/eclipse is the directory where you downloaded and unpacked Eclipse, and -Dversion=0.19.2 sets the version stamped on the built plugin.
      The built official plugin ends up at /home/zhenhua/hadoop-0.19.2/build/contrib/eclipse-plugin/hadoop-0.19.2-eclipse-plugin.jar
    • 3. Setting Up Linux - Configuring Hadoop
      4. Configure the Hadoop runtime environment:
      edit /home/zhenhua/hadoop-0.19.2/conf/hadoop-env.sh and set JAVA_HOME;
      edit /home/zhenhua/hadoop-0.19.2/conf/hadoop-site.xml
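The slide does not show the hadoop-site.xml contents, so here is a minimal pseudo-distributed sketch, written as a shell heredoc so the whole step is one command. The dev5.corp.alimama.com:9100 address comes from the WordCount example later in the deck; the JobTracker port 9101, the dfs.replication setting, and the JAVA_HOME path are this sketch's assumptions, not values from the slides:

```shell
# Generate a minimal pseudo-distributed hadoop-site.xml (0.19.x-era keys).
# A stand-in directory is used here; on a real machine write into
# hadoop-0.19.2/conf instead.
CONF=/tmp/hadoop-conf-demo
mkdir -p "$CONF"
cat > "$CONF/hadoop-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://dev5.corp.alimama.com:9100</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>dev5.corp.alimama.com:9101</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
# The one required hadoop-env.sh line; the JDK path is illustrative
echo 'export JAVA_HOME=/usr/java/jdk1.6.0_19' >> "$CONF/hadoop-env.sh"
cat "$CONF/hadoop-site.xml"
```

dfs.replication of 1 matches a single-node setup; the default of 3 would leave every block under-replicated.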
    • 3. Setting Up Linux - Starting Hadoop
      5. Start the Hadoop runtime:
      /home/zhenhua/hadoop-0.19.2/bin/start-all.sh
      This guide runs Hadoop in pseudo-distributed mode: a single server hosts all five Java services - namenode, secondarynamenode, datanode, jobtracker, and tasktracker - which is also convenient for unit testing during development.
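A quick way to confirm all five daemons are up is the JDK's jps tool. The slide only names the services, so this sketch checks a sample capture of what jps should print (the PIDs below are made up; on a real machine, run jps itself and look for the same five names):

```shell
# Expected shape of `jps` output after start-all.sh in pseudo-distributed
# mode; PIDs are illustrative.
sample_jps='11201 NameNode
11302 SecondaryNameNode
11403 DataNode
11504 JobTracker
11605 TaskTracker'
missing=0
for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
  echo "$sample_jps" | grep -q "^[0-9]* $d\$" || { echo "missing: $d"; missing=1; }
done
echo "missing=$missing"
```

If any daemon is absent on a real cluster, its log under hadoop-0.19.2/logs is the first place to look.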
    • 4. Installing the Eclipse Plugin on Windows
      1. Copy the Eclipse plugin you built on Linux to your Windows development machine and put it into Eclipse's plugin directory, D:\SearchCenter\JWork\eclipse-java-galileo-SR2-win32\eclipse\plugins
    • 4. Installing the Eclipse Plugin on Windows
      2. Start Eclipse and choose Window > Show View > Other to open the Show View dialog; under the MapReduce Tools node select Map/Reduce Locations and click OK.
    • 4. Installing the Eclipse Plugin on Windows
      3. A Map/Reduce Locations panel appears at the lower right; right-click it, choose New Hadoop location from the context menu, and point it at the remote Hadoop environment you configured.
    • 5. Your First Map/Reduce Project
      1. Open Eclipse and choose File > New > Project to open the new-project wizard; select Map/Reduce Project, click Next, set the project name, and click Finish.
    • 5. Your First Map/Reduce Project
      2. Create a second project named Hadoop19 the same way; you will use it later when debugging Hadoop applications.
    • 5. Your First Map/Reduce Project
      3. Import the Hadoop source code into the Hadoop19 project.
    • 5. Your First Map/Reduce Project
      4. Write your first Map/Reduce program, WordCount.java.
    • 5. Your First Map/Reduce Project
      4. Write your first Map/Reduce program, WordCount.java:
      import java.io.IOException;
      import java.util.*;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.conf.*;
      import org.apache.hadoop.io.*;
      import org.apache.hadoop.mapred.*;
      import org.apache.hadoop.util.*;

      public class WordCount {
        public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
          private final static IntWritable one = new IntWritable(1);
          private Text word = new Text();
          public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
              word.set(tokenizer.nextToken());
              output.collect(word, one);
            }
          }
        }
        public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
          public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
              sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
          }
        }
        public static void main(String[] args) throws Exception {
          JobConf conf = new JobConf(WordCount.class);
          conf.setJobName("wordcount");
          conf.setOutputKeyClass(Text.class);
          conf.setOutputValueClass(IntWritable.class);
          conf.setMapperClass(Map.class);
          conf.setCombinerClass(Reduce.class);
          conf.setReducerClass(Reduce.class);
          conf.setInputFormat(TextInputFormat.class);
          conf.setOutputFormat(TextOutputFormat.class);
          conf.setWorkingDirectory(new Path("hdfs://dev5.corp.alimama.com:9100/ajoin/quickstart"));
          FileInputFormat.setInputPaths(conf, new Path("offer"));
          FileOutputFormat.setOutputPath(conf, new Path("out"));
          JobClient.runJob(conf);
        }
      }
    • 5. Your First Map/Reduce Project
      5. Run the Map/Reduce program WordCount.java.
    • 6. Debugging Your Map/Reduce Program
      1. Set breakpoints in the Java source, then right-click and choose Debug As > Java Application.
    • 6. Debugging Your Map/Reduce Program
      2. The Debug perspective opens, and you can single-step through every detail of the program's execution.
    • 6. Debugging Your Map/Reduce Program
      3. When the debugger steps into Hadoop itself it cannot find the source; this is where the Hadoop19 project you created earlier comes in. Click Edit Source Lookup Path, click Add, choose Java Project, select the Hadoop19 project, and click OK. You can now follow Hadoop's internal source as it executes. Done!