Medical Informatics Laboratory
Department of Biomedical Engineering
College of Medicine, Seoul National Univ.
Eunsil Yoon
Hadoop Setting
Linux Setting (version: December 2012)
1. Install Linux
- Put the boot program on a USB drive and install CentOS
2. Install Hadoop
- hadoop-1.0.4.tar
- Path: /usr/local/hadoop-1.0.4
- Set the Java path in the Hadoop conf directory
- Open /usr/local/hadoop-1.0.4/conf/hadoop-env.sh
- Type the settings shown in Screenshot 1
3. Install Java
- Install from jdk-7u10-linux-x64.rpm
- Path: /usr/java/jdk1.7.0_10
- Set the Java path
- Open /etc/profile and add the following lines (Screenshot 2):
- export JAVA_HOME=/usr/java/jdk1.7.0_10
- export PATH=$PATH:$JAVA_HOME/bin
- export CLASS_PATH="."
Screenshot 1
Screenshot 2
/etc/hosts
Network setting
Settings of the Linux PC next to the scanner
/usr/local/hadoop-1.0.4/conf/core-site.xml
/usr/local/hadoop-1.0.4/conf/hdfs-site.xml
/usr/local/hadoop-1.0.4/conf/mapred-site.xml
Following the example program (wordcount)
• Hadoop ships example code, packaged as a jar file, so that developers can learn Hadoop easily
• We will use the wordcount program (jar) to count how often each word occurs in hadoop-env.sh
• First, upload hadoop-env.sh to HDFS (Screenshot 3, first line)
– ./bin/hadoop fs -put conf/hadoop-env.sh conf/hadoop-env.sh
• Once the upload finishes, run the jar file with the hadoop command (Screenshot 3, second line)
– The input is the file conf/hadoop-env.sh and the output goes to the wordcount_output directory
– ./bin/hadoop jar hadoop-examples-*.jar wordcount conf/hadoop-env.sh wordcount_output
• While it runs, the MapReduce job prints its progress and various log messages
• Check the output stored in HDFS (using the cat option of the fs command)
– ./bin/hadoop fs -cat wordcount_output/part-r-00000
Screenshot 3
Screenshot 4
Linux Setting (version: January 2013)
• Ubuntu 12.04 LTS
• Java installed from JDK 6u38 (1.6.0_38)
• Ubuntu provides no interface for logging in to the root account by default, so root login has to be enabled manually (next page)
2013.01.15
Memory setting
• /boot 500MB
• swap 16384 MB
• /usr 15360 MB
• / 10240 MB
• /var 5120 MB
• /tmp 5120 MB
• /home: the remainder
2013.01.15
Network setting
Settings of the PC provided by the professor
2013.01.15
Creating the root account
2013.01.15
./bin/hadoop namenode -format
2013.01.15
Wordcount example completed successfully
2013.01.15
wordcount_output/part-r-00000
2013.01.15
fs (filesystem) command: lsr
2013.01.15
Installing yum
2013.01.15
Installing nabi
2013.01.15
Installing openssh-server
2013.01.15
NameNode information can be viewed at localhost:50070
2013.01.15
Live node information can be viewed from the NameNode page
2013.01.15
Eclipse setup
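Eclipse
• Download Eclipse and run it
• Create a Java project
• Set the project name, then click 'external jar' under Libraries and add the Hadoop jar files, including everything in lib
• Finish
• build.xml
– Programs under development must be compiled, packaged as a jar file, and uploaded to the Hadoop home directory on the server where the NameNode is installed
– Running build.xml with ant generates the jar file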
2013.01.15
New  file  build.xml 생성
2013.01.15
Right-click build.xml > Run As
2013.01.15
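p. 81: Hadoop provides a Java API so that HDFS files can also be controlled from Java applications
Download the source code from Blrunner.com and save the relevant file as a .java file.
In Eclipse, create a package with the same name as in the example and put the .java file inside it; the errors then disappear.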
2013.01.16
Run build.xml to generate the jar file
Browsing into the path inside the jar file shows that the Java files are included
2013.01.16
The generated jar file is placed in the workspace by default, but it must be copied under the hadoop directory to be recognized
2013.01.16
Run the jar file, supplying the output file name and the input data
2013.01.16
fs -ls shows which HDFS directory the file was created in
2013.01.16
Print the contents of input.txt with the -cat command
2013.01.16
Example1_20130119
1. Copy the pathology data into the hadoop directory as a txt file
2. Load into HDFS
3. Run MapReduce
3-1. Run MapReduce
3-2. Run MapReduce
4. Check the MapReduce output
5. Download the output file and check it
• The text output was moved into Excel and sorted by count in descending order
• Output:
– _user_medinfo_path_output_part-r-00000
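The descending sort done in Excel can also be sketched in plain Java. This is only an illustration, not part of the original workflow: it assumes Java 8 or later and that each line of the downloaded part-r-00000 file is a word and a count separated by a tab, which is the default text output layout. The class name SortWordCounts and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortWordCounts {

    /** One "word<TAB>count" record from the job output (assumed layout). */
    static final class Entry {
        final String word;
        final int count;

        Entry(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    /** Parses tab-separated lines and returns them sorted by count, largest first. */
    static List<Entry> sortByCountDescending(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip blank or malformed lines
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            entries.add(new Entry(word, count));
        }
        entries.sort(Comparator.comparingInt((Entry e) -> e.count).reversed());
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows standing in for the downloaded output file
        List<String> sample = Arrays.asList("alpha\t3", "beta\t10", "gamma\t1");
        for (Entry e : sortByCountDescending(sample)) {
            System.out.println(e.word + "\t" + e.count);
        }
    }
}
```

To use it on the real file, the sample list could be replaced with the lines read from the downloaded output.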
WordCountMapper.java
package wikibooks.hadoop.chapter04;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends
        Mapper<LongWritable, Text, Text, IntWritable> {

    // Reused output value: every token is emitted with a count of 1
    private final static IntWritable one = new IntWritable(1);
    // Reused output key holding the current word
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the input line on whitespace and emit (word, 1) for each token
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
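What the mapper computes, together with a reducer, can be tried without a cluster. The sketch below is plain Java with no Hadoop dependency and is not the code from the slides: it assumes the reducer simply sums the 1s emitted for each word, as in the standard wordcount example, and the class name LocalWordCount and the sample lines are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    /** Map step: one entry per whitespace-separated token, standing in for (word, 1). */
    static List<String> map(String line) {
        List<String> emitted = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            emitted.add(itr.nextToken());
        }
        return emitted;
    }

    /** Shuffle and reduce steps: group by word in sorted key order and sum the 1s. */
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : map(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input lines
        List<String> lines = Arrays.asList("read a book", "write a book");
        for (Map.Entry<String, Integer> e : countWords(lines).entrySet()) {
            // Same "word<TAB>count" shape as the part-r-00000 output
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The TreeMap stands in for the sort that MapReduce performs on keys before the reduce phase, so the printed order matches what a single reducer would write.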
WordCountReducer.java
WordCount.java
Example2 (p. 116), 20130129
U.S. flight on-time statistics data released in 2009 by the ASA (American Statistical Association)
http://stat-computing.org/dataexpo/2009
Provides arrival and departure details for all commercial flights within the United States from 1987 to 2008
Download the data
1987.csv file details (29 columns)
* The data is stored in numeric form
22 files in total, 11.4 GB
Load into HDFS (took 6 minutes)
Check the list of loaded data with ./bin/hadoop fs -ls input
DepartureDelayCount run time (19 minutes)
Output results
• The map phase (117 tasks) took 30 seconds and the reduce phase (1 task) took 18 minutes 30 seconds.
• Map input records = 117,161,410
• Map output & Reduce input records = 47,765,560
• Reduce output records = 245
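The map output being much smaller than the map input is what one would expect if the mapper discards rows and emits only delayed departures. The slides do not show the DepartureDelayCount source, so the following plain-Java sketch rests on stated assumptions: that the job counts flights with a positive departure delay per year and month, and that in the 29-column CSV layout the year, month, and DepDelay fields sit at 0-based positions 0, 1, and 15. The class name LocalDepartureDelayCount and the sample rows are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalDepartureDelayCount {

    // Assumed 0-based column positions in the airline on-time CSV files
    static final int YEAR = 0;
    static final int MONTH = 1;
    static final int DEP_DELAY = 15;

    /** Map step: returns the key "year,month" for a delayed departure, otherwise null. */
    static String mapKey(String csvLine) {
        String[] cols = csvLine.split(",");
        if (cols.length <= DEP_DELAY) {
            return null; // row is too short to hold a departure delay
        }
        try {
            if (Integer.parseInt(cols[DEP_DELAY]) > 0) {
                return cols[YEAR] + "," + cols[MONTH];
            }
        } catch (NumberFormatException e) {
            // header row or a missing value such as "NA": emit nothing
        }
        return null;
    }

    /** Reduce step: sums the emitted records per "year,month" key. */
    static Map<String, Integer> countDelays(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String key = mapKey(line);
            if (key != null) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical rows, truncated after the columns this sketch reads
        List<String> rows = Arrays.asList(
            "1987,10,14,3,741,730,912,849,PS,1451,NA,91,79,NA,23,11,SAN,SFO,447",
            "1987,10,15,4,729,730,903,849,PS,1451,NA,94,79,NA,14,-1,SAN,SFO,447",
            "1987,11,2,1,NA,730,NA,849,PS,1451,NA,NA,79,NA,NA,NA,SAN,SFO,447");
        for (Map.Entry<String, Integer> e : countDelays(rows).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

With a single reduce task, every emitted record passes through one reducer, which fits the timing above where the reduce phase dominates the run.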
