hadoop setting yes
 


Notes on setting up Hadoop, compiled from various blogs by someone using Linux for the first time.

    Presentation Transcript

    • Medical Informatics Laboratory Department of Biomedical engineering College of Medicine , Seoul National Univ. Eunsil Yoon Hadoop Setting Eunsil Yoon
    • Linux Setting (version 2012.12)
      1. Install Linux
         - Install CentOS from a bootable USB drive
      2. Install Hadoop
         - Archive: hadoop-1.0.4.tar
         - Path: /usr/local/hadoop-1.0.4
         - Set the Java path in the Hadoop conf directory: open /usr/local/hadoop-1.0.4/conf/hadoop-env.sh and type the setting shown in Screenshot 1
      3. Install Java
         - Install from jdk-7u10-linux-x64.rpm
         - Path: /usr/java/jdk1.7.0_10
         - Set the Java path by adding the following lines to /etc/profile (Screenshot 2):
           export JAVA_HOME=/usr/java/jdk1.7.0_10
           export PATH=$PATH:$JAVA_HOME/bin
           export CLASS_PATH="."
    • Screenshot 1
    • Screenshot 2
    • /etc/hosts
    • Network setting: configuration of the Linux PC next to the scanner
    • /usr/local/hadoop-1.0.4/conf/core-site.xml
    • /usr/local/hadoop-1.0.4/conf/hdfs-site.xml
    • /usr/local/hadoop-1.0.4/conf/mapred-site.xml
    • Following the example program (wordcount)
      • Hadoop ships example code, packaged as a jar file, so that developers can learn Hadoop easily.
      • The wordcount program (jar) is used here to count the words in the hadoop-env.sh file.
      • First, upload hadoop-env.sh to HDFS (first line of Screenshot 3):
        – ./bin/hadoop fs -put conf/hadoop-env.sh conf/hadoop-env.sh
      • Once it is uploaded, run the jar file with the hadoop command (second line of Screenshot 3). The input is the conf/hadoop-env.sh file and the output goes to the wordcount_output folder:
        – ./bin/hadoop jar hadoop-examples-*.jar wordcount conf/hadoop-env.sh wordcount_output
      • While it runs, the MapReduce job prints its progress and various log messages.
      • Check the output stored in HDFS (using the cat parameter of the fs command):
        – ./bin/hadoop fs -cat wordcount_output/part-r-00000
    • Screenshot 3
    • Screenshot 4
    • Linux Setting (version 2013.01)
      • Ubuntu 12.04 LTS
      • Java installed from jdk6u38 (1.6.0_38)
      • Ubuntu provides no interface for logging in as root by default, so root login has to be enabled manually (next page).
      2013.01.15
    • Memory setting (disk partitions)
      • /boot: 500MB
      • swap: 16384MB
      • /usr: 15360MB
      • /: 10240MB
      • /var: 5120MB
      • /tmp: 5120MB
      • /home: remainder
      2013.01.15
    • Network setting: configuration of the PC provided by the professor 2013.01.15
    • Creating the root account 2013.01.15
    • ./bin/hadoop namenode -format 2013.01.15
    • Wordcount example completed successfully 2013.01.15
    • wordcount_output/part-r-00000 2013.01.15
    • fs (filesystem) command: lsr 2013.01.15
    • Installing yum 2013.01.15
    • Installing nabi 2013.01.15
    • Installing openssh-server 2013.01.15
    • localhost:50070: NameNode information can be viewed here 2013.01.15
    • Live node information can be viewed from the NameNode page 2013.01.15
    • Eclipse setup
    • Eclipse
      • Download Eclipse and run it
      • Create a Java project
      • After setting the project name, click 'external jar' under Libraries and include the jar files and everything inside lib
      • Finish
      • build.xml
        – Programs under development must be compiled, packaged as a jar file, and uploaded to the Hadoop home directory on the server where the NameNode is installed
        – Running build.xml with ant generates the jar file
      2013.01.15
    • New > File > create build.xml 2013.01.15
    • Right-click build.xml > Run As 2013.01.15
    • p. 81: Hadoop provides a Java API so that HDFS files can also be controlled from Java applications. Download the source code from Blrunner.com and save it as a .java file. In Eclipse, create a package with the same name as in the example and put the .java file inside it; the error then disappears. 2013.01.16
    • Run build.xml to generate the jar file. Following the path inside the jar file shows that the java file is included. 2013.01.16
    • The generated jar file is placed in the workspace by default, but it must be copied under the hadoop directory to be recognized. 2013.01.16
    • Run the jar file, specifying the output file name and the input data. 2013.01.16
    • fs -ls shows which HDFS folder the file was created in. 2013.01.16
    • Print the contents of input.txt with the -cat command. 2013.01.16
    • Example1_20130119
    • 1. Copy the pathology data into the hadoop directory as a txt file
    • 2. Load it into HDFS
    • 3. Run MapReduce
    • 3-1. Run MapReduce
    • 3-2. Run MapReduce
    • 4. Check the MapReduce output
    • 5. Download the output file and check it
      • The text output was moved to Excel and sorted by count in descending order
      • Output:
        – _user_medinfo_path_output_part-r-00000
    • WordCountMapper.java

      package wikibooks.hadoop.chapter04;

      import java.io.IOException;
      import java.util.StringTokenizer;

      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

          // Reused output objects: the constant count 1 and the current word.
          private final static IntWritable one = new IntWritable(1);
          private Text word = new Text();

          // Splits each input line on whitespace and emits (word, 1) for every token.
          public void map(LongWritable key, Text value, Context context)
                  throws IOException, InterruptedException {
              StringTokenizer itr = new StringTokenizer(value.toString());
              while (itr.hasMoreTokens()) {
                  word.set(itr.nextToken());
                  context.write(word, one);
              }
          }
      }
    • WordCountReducer.java
    • WordCount.java
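The reducer and driver slides survive only as titles in this transcript. As a rough illustration of how a mapper, a reducer, and a driver fit together, the following plain-Java sketch simulates the map, shuffle, and reduce steps in memory. It does not use the Hadoop API and is not the code from the slides; the class name WordCountFlowSketch, its method names, and the sample input are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// In-memory simulation of the wordcount data flow (assumption: illustrative only, no Hadoop classes).
final class WordCountFlowSketch {

    // Map step: emit a (word, 1) pair for every whitespace-separated token in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle step: group the emitted values by key, with keys kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum all values collected for a single key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: run map over every line, shuffle the pairs, then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((word, ones) -> result.put(word, reduce(ones)));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input lines, for illustration only.
        run(Arrays.asList("read a book", "write a book"))
            .forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Each printed line is a word and its count separated by a tab, which is the general shape of the part-r-00000 file inspected earlier with fs -cat.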
    • Example2. p. 116. 130129
    • US flight operation statistics released in 2009 by the ASA (American Statistical Association): http://stat-computing.org/dataexpo/2009. The data set provides arrival and departure details for all commercial flights within the US from 1987 to 2008.
    • Download the data
    • 1987.csv file details (29 columns) * The data is stored in numeric form
    • 22 files in total, 11.4GB
    • Loading into HDFS (took 6 minutes)
    • ./bin/hadoop fs -ls input: check the list of loaded data
    • DepartureDelayCount run time (19 minutes)
    • Output results
      • The map phase (117 tasks) took 30 seconds and the reduce phase (1 task) took 18 minutes 30 seconds.
      • Map input records = 117,161,410
      • Map output & Reduce input records = 47,765,560
      • Reduce output records = 245
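The slides do not show what DepartureDelayCount computes. One plausible reading, given that map output is smaller than map input and that only 245 records come out of the reducer, is that the mapper emits a key only for flights that departed late and the reducer counts them per year and month. The plain-Java sketch below illustrates that reading without the Hadoop API. The class and method names are hypothetical, and the column positions (Year first, Month second, DepDelay sixteenth of 29) are an assumption about the CSV layout that should be checked against the downloaded files.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a departure-delay count (assumption: not the job actually run in the slides).
final class DepartureDelaySketch {
    // Assumed zero-based column positions in the 29-column CSV.
    static final int YEAR = 0;
    static final int MONTH = 1;
    static final int DEP_DELAY = 15;

    // Map step: return "year,month" when the flight departed late, otherwise null.
    // Header rows and missing values such as "NA" fail to parse and are skipped.
    static String mapKey(String csvLine) {
        String[] cols = csvLine.split(",", -1);
        if (cols.length <= DEP_DELAY) {
            return null;
        }
        try {
            int delay = Integer.parseInt(cols[DEP_DELAY].trim());
            return delay > 0 ? cols[YEAR] + "," + cols[MONTH] : null;
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Reduce step: count how many delayed departures fall under each "year,month" key.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String key = mapKey(line);
            if (key != null) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Hypothetical helper that builds a 29-column row with "NA" in every unused column.
    static String sampleRow(int year, int month, String depDelay) {
        String[] cols = new String[29];
        Arrays.fill(cols, "NA");
        cols[YEAR] = String.valueOf(year);
        cols[MONTH] = String.valueOf(month);
        cols[DEP_DELAY] = depDelay;
        return String.join(",", cols);
    }

    public static void main(String[] args) {
        // Made-up rows, not taken from the real data set.
        List<String> lines = Arrays.asList(
            sampleRow(1987, 10, "11"),
            sampleRow(1987, 10, "-1"),
            sampleRow(1987, 11, "NA"),
            sampleRow(1988, 1, "3"));
        count(lines).forEach((key, n) -> System.out.println(key + "\t" + n));
    }
}
```

Under this reading, the gap between map input records and map output records would correspond to flights that were on time, early, or had no recorded departure delay.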