hadoop setting yes

Notes on setting up Hadoop, compiled by someone new to Linux while following several blogs.

  1. Hadoop Setting. Eunsil Yoon, Medical Informatics Laboratory, Department of Biomedical Engineering, College of Medicine, Seoul National Univ.
  2. Linux Setting (version 2012.12)
      1. Linux installation
         - Install CentOS from a bootable USB drive
      2. Hadoop installation
         - hadoop-1.0.4.tar
         - Path: /usr/local/hadoop-1.0.4
         - Set the Java path in Hadoop's conf directory: open /usr/local/hadoop-1.0.4/conf/hadoop-env.sh and type the setting shown in Screenshot 1
      3. Java installation
         - Install from jdk-7u10-linux-x64.rpm
         - Path: /usr/java/jdk1.7.0_10
         - Set the Java path: open /etc/profile and add the following lines (Screenshot 2):
             export JAVA_HOME=/usr/java/jdk1.7.0_10
             export PATH=$PATH:$JAVA_HOME/bin
             export CLASSPATH="."
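A typo in /etc/profile (for example a misspelled variable name) leaves Java unusable without any error message, so a quick check helps. This is a minimal sketch with hypothetical class and method names: it confirms that JAVA_HOME points at a directory containing an executable bin/java. /etc/profile is read by login shells, so run `source /etc/profile` or log in again before trying it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: checks that JAVA_HOME points at a Java installation. */
public class JavaHomeCheck {

    /** True if home/bin/java exists as a regular, executable file. */
    static boolean hasJavaBinary(Path home) {
        Path javaBin = home.resolve("bin").resolve("java");
        return Files.isRegularFile(javaBin) && Files.isExecutable(javaBin);
    }

    public static void main(String[] args) {
        // Version of the JVM actually running this program (may differ from JAVA_HOME)
        System.out.println("java.version = " + System.getProperty("java.version"));
        String javaHome = System.getenv("JAVA_HOME");
        if (javaHome == null || javaHome.isEmpty()) {
            System.out.println("JAVA_HOME is not set in this shell");
        } else if (hasJavaBinary(Paths.get(javaHome))) {
            System.out.println("JAVA_HOME looks usable: " + javaHome);
        } else {
            System.out.println("No executable bin/java under JAVA_HOME: " + javaHome);
        }
    }
}
```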
  3. Screenshot 1
  4. Screenshot 2
  5. /etc/hosts
  6. Network setting: configuration of the Linux PC next to the scanner
  7. /usr/local/hadoop-1.0.4/conf/core-site.xml
  8. /usr/local/hadoop-1.0.4/conf/hdfs-site.xml
  9. /usr/local/hadoop-1.0.4/conf/mapred-site.xml
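The three *-site.xml files above share one format: a <configuration> element holding <property> entries, each with a <name> and a <value>. The sketch below (hypothetical class name) reads such a file with the JDK's built-in DOM parser and prints what is actually configured. fs.default.name is a standard Hadoop 1.x key, but the value in the fallback sample is an assumption taken from a typical single-node setup, not the author's setting.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Prints the name/value pairs of a Hadoop *-site.xml file. */
public class SiteXmlReader {

    /** Collects every <property><name>..</name><value>..</value></property> pair, in file order. */
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element prop = (Element) nodes.item(i);
                String name = childText(prop, "name");
                if (name != null) {
                    props.put(name, childText(prop, "value"));
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable *-site.xml document", e);
        }
    }

    /** Trimmed text of the first descendant element with the given tag, or null if absent. */
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws IOException {
        String xml;
        if (args.length > 0) {
            // e.g. /usr/local/hadoop-1.0.4/conf/core-site.xml
            xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        } else {
            // Hypothetical single-node core-site.xml content, used when no file is given
            xml = "<configuration><property>"
                + "<name>fs.default.name</name><value>hdfs://localhost:9000</value>"
                + "</property></configuration>";
        }
        for (Map.Entry<String, String> e : readProperties(xml).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```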
  10. Following the example program (wordcount)
      • Hadoop ships example code, packaged as a jar file, so developers can learn Hadoop easily.
      • We use the wordcount program (jar) to count the words in the hadoop-env.sh file.
      • First, upload hadoop-env.sh to HDFS (Screenshot 3, first line):
        – ./bin/hadoop fs -put conf/hadoop-env.sh conf/hadoop-env.sh
      • Once it is uploaded, run the jar file with the hadoop command (Screenshot 3, second line). The input is the conf/hadoop-env.sh file and the output is written to the wordcount_output folder:
        – ./bin/hadoop jar hadoop-examples-*.jar wordcount conf/hadoop-env.sh wordcount_output
      • While the job runs, the MapReduce progress and various log messages are printed.
      • Check the output stored in HDFS (with the -cat option of the fs command):
        – ./bin/hadoop fs -cat wordcount_output/part-r-00000
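What the wordcount job computes can be reproduced in a single process: the map step emits (word, 1) for each whitespace-separated token, the shuffle groups the 1s by word, and the reduce step sums them. The sketch below only illustrates that data flow (the class name and sample lines are made up); it is not how Hadoop executes the job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Single-process illustration of the wordcount data flow (no Hadoop involved). */
public class WordCountSimulation {

    static Map<String, Integer> wordCount(List<String> lines) {
        // Map phase + shuffle: emit a 1 for every token and group the 1s by word.
        // TreeMap keeps the words sorted, much like the sorted keys a reducer receives.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line); // whitespace split, as in WordCountMapper
            while (itr.hasMoreTokens()) {
                grouped.computeIfAbsent(itr.nextToken(), k -> new ArrayList<>()).add(1);
            }
        }
        // Reduce phase: sum the grouped values for each word
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello hadoop", "hello hdfs"); // made-up input
        for (Map.Entry<String, Integer> e : wordCount(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue()); // word<TAB>count, like part-r-00000
        }
    }
}
```

With the two sample lines, main prints `hadoop 1`, `hdfs 1`, `hello 2` (tab-separated, one pair per line), the same word-TAB-count layout as part-r-00000.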
  11. Screenshot 3
  12. Screenshot 4
  13. Linux Setting (version 2013.01) • Ubuntu 12.04 LTS • Java installed from jdk6u38 (version 1.6.0_38) • Ubuntu offers no separate interface for logging in to the root account, so root access has to be set up manually (next page) 2013.01.15
  14. Memory (disk partition) setting, sizes in MB • /boot 500 • swap 16384 • /usr 15360 • / 10240 • /var 5120 • /tmp 5120 • /home: the remaining space 2013.01.15
  15. Network setting: configuration of the PC provided by the professor 2013.01.15
  16. Creating the root account 2013.01.15
  17. ./bin/hadoop namenode -format 2013.01.15
  18. Wordcount example completed successfully 2013.01.15
  19. wordcount_output/part-r-00000 2013.01.15
  20. fs (filesystem) command -lsr (recursive listing) 2013.01.15
  21. Installing yum 2013.01.15
  22. Installing nabi (Korean input method) 2013.01.15
  23. Installing openssh-server 2013.01.15
  24. NameNode information can be viewed at localhost:50070 2013.01.15
  25. Live node information can be viewed on the NameNode page 2013.01.15
  26. Eclipse setting
  27. Eclipse • Download and launch Eclipse • Create a Java project • Set the project name, then click 'Add External JARs' under Libraries and include the jar files, including everything inside lib • Finish • build.xml – Programs you develop must be compiled, packaged into a jar file, and uploaded to the Hadoop home directory on the server where the NameNode is installed – Running build.xml with Ant produces the jar file 2013.01.15
  28. New > File: create build.xml 2013.01.15
  29. Right-click build.xml > Run As 2013.01.15
  30. p. 81: Hadoop provides a Java API so that HDFS files can also be controlled from Java applications. Download the source code from Blrunner.com and save it as a .java file. In Eclipse, create a package with the same name as in the example and put the .java file in it; the errors then go away. 2013.01.16
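The p. 81 example's source is not reproduced in the slides. If it writes a string with writeUTF and reads it back with readUTF (an assumption), the byte format can be tried locally: Hadoop's FSDataOutputStream and FSDataInputStream extend java.io.DataOutputStream and java.io.DataInputStream, so the encoding is the same. This stdlib sketch (hypothetical class name, in-memory streams instead of HDFS) shows the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Local stand-in for an HDFS write/read round trip using writeUTF/readUTF. */
public class WriteUtfRoundTrip {

    /** Encodes text as DataOutputStream.writeUTF does: 2-byte length + modified UTF-8. */
    static byte[] encode(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    /** Reads back one string written by writeUTF. */
    static String decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new IllegalArgumentException("not writeUTF data", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = encode("Hello, HDFS");
        System.out.println("bytes written: " + data.length); // 2 length bytes + 11 text bytes
        System.out.println("read back: " + decode(data));
    }
}
```

Because writeUTF puts a 2-byte length in front of the text, `hadoop fs -cat` on a file written this way may show a couple of stray bytes before the string.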
  31. Run build.xml to create the jar file; opening the package path inside the jar shows that the .java files are included 2013.01.16
  32. The generated jar file is placed in the workspace by default, but it must be copied under the hadoop directory to be recognized 2013.01.16
  33. Run the jar file, passing the input data and the output file name 2013.01.16
  34. fs -ls shows the HDFS folder in which the file was created 2013.01.16
  35. Print the contents of input.txt with the -cat command 2013.01.16
  36. Example1_20130119
  37. 1. Copy the pathology data, as a txt file, into the hadoop directory
  38. 2. Load it into HDFS
  39. 3. Run MapReduce
  40. 3-1. Run MapReduce
  41. 3-2. Run MapReduce
  42. 4. Check the MapReduce output
  43. 5. Download the output file and check it • The text output was moved into Excel and sorted by count in descending order • Output: – _user_medinfo_path_output_part-r-00000
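The Excel step above (sorting by count, descending) can also be done in a few lines of Java. This is a minimal sketch that assumes the default TextOutputFormat layout of one word-TAB-count pair per line; the class name is hypothetical and the path of the downloaded output file is passed as the first argument.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sorts word<TAB>count reducer output by count, highest first. */
public class SortByCount {

    /** One "word<TAB>count" line of reducer output. */
    static final class WordCountLine {
        final String word;
        final long count;

        WordCountLine(String word, long count) {
            this.word = word;
            this.count = count;
        }
    }

    /** Parses word<TAB>count lines (others are skipped) and sorts them by count, descending. */
    static List<WordCountLine> sortDescending(List<String> lines) {
        List<WordCountLine> result = new ArrayList<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // blank or malformed line
            }
            try {
                long count = Long.parseLong(line.substring(tab + 1).trim());
                result.add(new WordCountLine(line.substring(0, tab), count));
            } catch (NumberFormatException e) {
                // value is not a number: skip the line
            }
        }
        // Descending by count; equal counts fall back to alphabetical order
        Collections.sort(result, (a, b) -> {
            int byCount = Long.compare(b.count, a.count);
            return byCount != 0 ? byCount : a.word.compareTo(b.word);
        });
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SortByCount <local copy of part-r-00000>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (WordCountLine w : sortDescending(lines)) {
            System.out.println(w.word + "\t" + w.count);
        }
    }
}
```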
  44. WordCountMapper.java
      package wikibooks.hadoop.chapter04;
      import java.io.IOException;
      import java.util.StringTokenizer;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;
      // Input: (byte offset of a line, line text). Output: (word, 1) for every word in the line.
      public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
          // Writable objects reused for every record instead of allocating new ones
          private static final IntWritable one = new IntWritable(1);
          private Text word = new Text();
          @Override
          public void map(LongWritable key, Text value, Context context)
                  throws IOException, InterruptedException {
              // Split the line into whitespace-separated tokens
              StringTokenizer itr = new StringTokenizer(value.toString());
              while (itr.hasMoreTokens()) {
                  word.set(itr.nextToken());
                  context.write(word, one);
              }
          }
      }
  45. WordCountReducer.java
  46. WordCount.java
  48. Example 2 (p. 116), 2013.01.29
  49. US airline on-time statistics released in 2009 by the ASA (American Statistical Association), http://stat-computing.org/dataexpo/2009. The data gives arrival and departure details for all commercial flights within the US from 1987 to 2008.
  50. Download the data
  51. Details of 1987.csv (29 columns) * the values are mostly numeric
  52. 22 files in total, 11.4 GB
  53. Loading into HDFS (took 6 minutes)
  54. Check the list of loaded data with ./bin/hadoop fs -ls input
  55. DepartureDelayCount run time (19 minutes)
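The DepartureDelayCount source is not shown in the slides. Assuming it counts flights that left late (DepDelay > 0) under a year-and-month key, the same logic can be sketched in plain Java for spot-checking results on a single file such as 1987.csv. The class name is hypothetical, and the column positions (Year = 0, Month = 1, DepDelay = 15, counting from 0) and the "NA" marker for missing values are assumptions about the Data Expo 2009 layout that should be checked against the header row.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-process sketch of a departure-delay count per "Year,Month". */
public class DepartureDelaySketch {

    // ASSUMED 0-based column positions in the Data Expo 2009 CSV files
    static final int YEAR = 0;
    static final int MONTH = 1;
    static final int DEP_DELAY = 15;

    /** Plays the role of map + reduce for one CSV line: updates counts in place. */
    static void countLine(String line, Map<String, Integer> counts) {
        String[] cols = line.split(",", -1);
        if (cols.length <= DEP_DELAY) {
            return; // malformed line
        }
        int delay;
        try {
            delay = Integer.parseInt(cols[DEP_DELAY].trim());
        } catch (NumberFormatException e) {
            return; // header row ("DepDelay") or missing value ("NA")
        }
        if (delay > 0) {
            counts.merge(cols[YEAR] + "," + cols[MONTH], 1, Integer::sum);
        }
    }

    static Map<String, Integer> countDelayedDepartures(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // keys sorted as strings
        for (String line : lines) {
            countLine(line, counts);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args: local CSV files such as 1987.csv, streamed line by line
        Map<String, Integer> counts = new TreeMap<>();
        for (String file : args) {
            try (BufferedReader reader =
                     Files.newBufferedReader(Paths.get(file), StandardCharsets.ISO_8859_1)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    countLine(line, counts);
                }
            }
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```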
  56. Output results • Map phase (117 tasks): 30 seconds; reduce phase (1 task): 18 minutes 30 seconds • Map input records = 117,161,410 • Map output records = Reduce input records = 47,765,560 • Reduce output records = 245
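A few rough figures follow from the numbers on slides 52 to 56. The load rate takes 1 GB = 1024 MB. The delayed share assumes the mapper emits one record per delayed departure and nothing for other flights, and it ignores the 22 CSV header rows. The last line is the reducer's share of the total run time.

```latex
\begin{aligned}
\text{HDFS load rate} &\approx \frac{11.4 \times 1024\ \text{MB}}{6 \times 60\ \text{s}} = \frac{11673.6\ \text{MB}}{360\ \text{s}} \approx 32.4\ \text{MB/s} \\
\text{delayed share} &\approx \frac{47{,}765{,}560}{117{,}161{,}410} \approx 0.408 \\
\text{reduce share of run time} &= \frac{18.5\ \text{min}}{19\ \text{min}} \approx 0.97
\end{aligned}
```

With a single reduce task receiving all 47.7 million intermediate records, the reduce phase accounts for almost the entire run.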
