
基于Eclipse和Hadoop平台应用开发入门手册 (A Getting-Started Guide to Application Development on the Eclipse and Hadoop Platform)

Eclipse Hadoop Plugin MapReduce

Published in: Technology


  1. 基于Eclipse和Hadoop平台应用开发入门手册 (A Getting-Started Guide to Application Development on the Eclipse and Hadoop Platform), by 西铭 (李振华), 2010-04-12
  2. Contents
  3. 1. Are you ready?
     - Operating systems: Linux 64-bit to run Hadoop; Windows to run Eclipse
     - Java SE Development Kit (JDK): download from http://java.sun.com/javase/downloads/widget/jdk6.jsp; you need both the Windows 32-bit and the Linux 64-bit version
     - Cygwin (skip this if you develop on Linux): download from http://cygwin.com/
     - Ant: download from http://ant.apache.org/bindownload.cgi; needed on the Linux machine
     - Eclipse, Galileo release: download from http://www.eclipse.org/downloads/; get "Eclipse IDE for Java Developers", both the Windows 32-bit and the Linux 64-bit version
     - Hadoop: official 0.19.2 release from http://www.apache.org/dyn/closer.cgi/hadoop/core/
  20. 2. Setting up the Windows environment - WorkDir
      Note: this walkthrough installs Hadoop 0.19.2; configuration differences from the official release are pointed out where they occur.
      Working directory (workdir) = D:\SearchCenter\Jwork
      Put the downloaded Windows versions of JDK 1.6, Eclipse, and Hadoop 0.19.2 into the working directory.
  21. 2. Setting up the Windows environment - installing the JDK
      1. Run the JDK installer.
      2. Set the system environment variables:
         JAVA_HOME = C:\Program Files\Java\jdk1.6.0_19 (your install directory)
         Path = append "; C:\Program Files\Java\jdk1.6.0_19\bin" to the existing value
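After setting the variables (and opening a fresh console), you can check what the JVM actually sees with a tiny program; this is a sketch for verification only, not part of the original deck, and the class name is arbitrary:

```java
// Prints the version and home directory of the JDK that is actually on the path.
// If java.home does not point at the directory you just installed, the
// environment variables were not picked up.
public class JdkCheck {
    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
    }
}
```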
  22. 2. Setting up the Windows environment - installing Cygwin
      1. Run the Cygwin setup.exe.
      2. Choose "Install from Internet", Next; set the install directory, Next; choose direct Internet access, Next; pick a download site, Next.
      3. On the package-selection screen, search for "ssh" and mark OpenSSH for installation; do the same for "ssl"; then continue until the installation finishes.
  23. 2. Setting up the Windows environment - installing Cygwin
      4. Set the environment variable: append D:\Program Files\cygwin\bin to Path.
  24. 3. Setting up the Linux environment - WorkDir
      Note: pick a working directory that suits your situation; this demo uses:
      Development server: dev5
      Operating system: Linux 64-bit, 4U7 release
      Working directory (WorkDir) = /home/zhenhua
      Put the Linux versions of the JDK, Ant, Eclipse, and Hadoop into the working directory.
      1. wget http://apache.etoak.com//hadoop/core/hadoop-0.19.2/hadoop-0.19.2.tar.gz
      2. Unpack the archive with gzip -d followed by tar -xf.
  25. 3. Setting up the Linux environment - installing the JDK
      Note: most systems ship with a JDK; if yours does not, install one yourself (details omitted).
  26. 3. Setting up the Linux environment - installing Ant
      Put the unpacked apache-ant-1.8 package into the working directory and rename it to ant-1.8.
  27. 3. Setting up the Linux environment - compiling Hadoop
      Compile the official Hadoop-0.19.2 release; it is done when Ant reports BUILD SUCCESSFUL.
  28. 3. Setting up the Linux environment - compiling the Hadoop plugin
      Compiling the Hadoop-0.19.2 plugin for Eclipse requires a Java source change:
      edit /home/zhenhua/hadoop-0.19.2/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/launch/HadoopApplicationLaunchShortcut.java
  29. 3. Setting up the Linux environment - compiling the Hadoop plugin
      Compiling the official Hadoop-0.19.2 plugin for Eclipse also requires a build.xml change:
      edit /home/zhenhua/hadoop-0.19.2/src/contrib/eclipse-plugin/build.xml and change
      ${hadoop.root}/build/hadoop-${version}-core.jar to
      ${hadoop.root}/hadoop-${version}-core.jar
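The build.xml edit above simply repoints the plugin's classpath from the build output directory to the pre-built core jar in the source root. Assuming the entry is an ordinary Ant `pathelement` (the element name is an assumption; only the location values come from the slide), the change looks like:

```
<!-- before (assumed surrounding element): -->
<pathelement location="${hadoop.root}/build/hadoop-${version}-core.jar"/>
<!-- after: -->
<pathelement location="${hadoop.root}/hadoop-${version}-core.jar"/>
```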
  30. 3. Setting up the Linux environment - compiling the Hadoop plugin
      3. In the build command, -Declipse.home=/home/zhenhua/eclipse is the directory of the downloaded and unpacked Eclipse, and -Dversion=0.19.2 sets the version of the compiled plugin.
      The compiled plugin for the official release ends up at /home/zhenhua/hadoop-0.19.2/build/contrib/eclipse-plugin/hadoop-0.19.2-eclipse-plugin.jar.
  31. 3. Setting up the Linux environment - configuring Hadoop
      4. Configure the Hadoop runtime:
      edit /home/zhenhua/hadoop-0.19.2/conf/hadoop-env.sh and set JAVA_HOME;
      edit /home/zhenhua/hadoop-0.19.2/conf/hadoop-site.xml.
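The deck does not show the hadoop-site.xml contents, but a minimal pseudo-distributed configuration for Hadoop 0.19 can be sketched as follows. The namenode host and port are taken from the HDFS URL used by the sample job later in the deck; the jobtracker port (9101) and the replication factor are assumptions to adapt to your own setup:

```
<configuration>
  <!-- HDFS namenode; host and port match the hdfs:// URL used by the sample job -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://dev5.corp.alimama.com:9100</value>
  </property>
  <!-- JobTracker address; port 9101 is an assumed value, not from the slides -->
  <property>
    <name>mapred.job.tracker</name>
    <value>dev5.corp.alimama.com:9101</value>
  </property>
  <!-- single-node setup, so keep one copy of each block -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```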
  32. 3. Setting up the Linux environment - starting Hadoop
      5. Start the Hadoop runtime:
      /home/zhenhua/hadoop-0.19.2/bin/start-all.sh
      This walkthrough configures Hadoop in pseudo-distributed mode: five Java daemons (namenode, secondarynamenode, datanode, jobtracker, tasktracker) run on a single server, which is also convenient for unit testing during development.
  33. 4. Installing the Eclipse plugin in the Windows development environment
      1. Copy the Eclipse plugin compiled under Linux to your Windows development environment, and put it into Eclipse's plugin directory, D:\SearchCenter\JWork\eclipse-java-galileo-SR2-win32\eclipse\plugins.
  34. 4. Installing the Eclipse plugin in the Windows development environment
      2. Start Eclipse, click Window => Show View => Other to open the Show View dialog, select Map/Reduce Locations under the MapReduce Tools node, and click OK.
  35. 4. Installing the Eclipse plugin in the Windows development environment
      3. A Map/Reduce Locations view appears at the bottom right of the workbench; right-click in it, choose New Hadoop location from the context menu, and fill in the remote Hadoop environment you configured earlier.
  36. 5. Your first Map/Reduce project
      1. In Eclipse, click File => New => Project to open the new-project wizard, select Map/Reduce Project, click Next, set a project name, and click Finish.
  37. 5. Your first Map/Reduce project
      2. Create a second project named Hadoop19 the same way; it will be used later for debugging Hadoop applications.
  38. 5. Your first Map/Reduce project
      3. Import the Hadoop source code into the Hadoop19 project.
  39. 5. Your first Map/Reduce project
      4. Write your first Map/Reduce program, WordCount.java.
  40. 5. Your first Map/Reduce project
      The complete WordCount.java:

```java
import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        conf.setWorkingDirectory(new Path("hdfs://dev5.corp.alimama.com:9100/ajoin/quickstart"));

        FileInputFormat.setInputPaths(conf, new Path("offer"));
        FileOutputFormat.setOutputPath(conf, new Path("out"));

        JobClient.runJob(conf);
    }
}
```
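The heart of the map step is whitespace tokenization with StringTokenizer, and the reduce step just sums the emitted 1s. Both can be tried with the plain JDK, no cluster needed; this standalone sketch (class and method names are illustrative, not from the deck) mirrors that logic in a single method:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class LocalWordCount {
    // Counts words in one line the same way the Mapper/Reducer pair does:
    // tokenize on whitespace (map), then sum the per-word 1s (reduce).
    static Map<String, Integer> count(String line) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            String word = tokenizer.nextToken();
            Integer n = counts.get(word);
            counts.put(word, n == null ? 1 : n + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello hadoop hello eclipse"));
    }
}
```

Running this locally is a quick way to convince yourself of the per-line semantics before submitting the real job to the cluster.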
  41. 5. Your first Map/Reduce project
      5. Run the Map/Reduce program WordCount.java.
  42. 6. Debugging your Map/Reduce program
      1. Set breakpoints in the Java source, right-click, and choose Debug As => Java Application.
  43. 6. Debugging your Map/Reduce program
      2. In the Debug perspective you can single-step through every detail of the program's execution.
  44. 6. Debugging your Map/Reduce program
      3. When the debugger steps into Hadoop internals it cannot find the source at first; this is where the Hadoop19 project you created earlier comes in. Click Edit Source Lookup Path, then Add, choose Java Project, select the Hadoop19 project, and click OK. You can now follow the execution of Hadoop's internal source code. Done!
