6. Step 1/4 : Collecting & Distributing Data
Flume
Distributes and stores collected data across the multiple configured server addresses
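The distribution idea above can be sketched in a few lines of Python. This is only an illustration of round-robin fan-out, not Flume's API or configuration format; the `distribute` helper, the server addresses, and the log events are all hypothetical.

```python
from collections import defaultdict
from itertools import cycle


def distribute(events, servers):
    """Assign each event to a server address in round-robin order."""
    if not servers:
        raise ValueError("at least one server address is required")
    assignment = defaultdict(list)
    targets = cycle(servers)  # endlessly repeats the configured addresses
    for event in events:
        assignment[next(targets)].append(event)
    return dict(assignment)


# Hypothetical addresses and events, for illustration only.
servers = ["10.0.0.1:4545", "10.0.0.2:4545", "10.0.0.3:4545"]
events = [f"log line {i}" for i in range(7)]

placement = distribute(events, servers)
for server, batch in placement.items():
    print(server, len(batch))
```

In a real deployment the equivalent behaviour would come from the agent's configuration rather than from application code.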
7. Step 2/4 : Data Storage
HDFS: Hadoop Distributed File System
YARN: Yet Another Resource Negotiator, manages resources across the connected servers
Saving data through MapReduce-based data compression
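For readers new to the MapReduce model mentioned above, here is a minimal single-process sketch of its three phases (map, shuffle, reduce). It is not Hadoop code; the function names and the sample records are assumptions made for illustration, and the compression of job output is not shown.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse the values collected for one key into a single count."""
    return key, sum(values)


# Hypothetical log records, for illustration only.
records = ["error disk full", "warn disk slow", "error network down"]

pairs = [pair for record in records for pair in map_phase(record)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

On a cluster, the map and reduce functions run in parallel on different servers and the shuffle moves data between them.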
8. Step 3/4 : Data Search
Hive: HiveQL-based search engine (similar to SQL)
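To give a feel for how SQL-like such a query is, the sketch below runs a simple aggregate with Python's built-in `sqlite3` module as a local stand-in. It does not connect to Hive; the `logs` table and its rows are hypothetical. A basic `SELECT ... GROUP BY ... ORDER BY` of this shape should read the same in HiveQL, although dialect details can differ.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (host TEXT, level TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [
        ("web1", "ERROR"),
        ("web2", "INFO"),
        ("web1", "INFO"),
        ("web3", "ERROR"),
        ("web2", "INFO"),
    ],
)

# Count log entries per level, most frequent first.
query = """
SELECT level, COUNT(*) AS n
FROM logs
GROUP BY level
ORDER BY n DESC, level
"""
rows = conn.execute(query).fetchall()
for level, n in rows:
    print(level, n)
conn.close()
```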
9. Step 4a/4 : Data Classification
Mahout: machine learning library for distributed processing
10. Step 4b/4 : Data Classification
TonY (TensorFlow on YARN) + Distributed TensorFlow
Work in progress
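As background for this step, distributed TensorFlow learns its cluster layout from the `TF_CONFIG` environment variable, a JSON document naming every task and the role of the current process. The sketch below only builds that JSON with the standard library; the `build_tf_config` helper and the host names are hypothetical, and how a launcher on YARN fills these values in is assumed rather than shown.

```python
import json


def build_tf_config(workers, parameter_servers, task_type, task_index):
    """Return a TF_CONFIG-style JSON string for one task in the cluster."""
    config = {
        "cluster": {"worker": workers, "ps": parameter_servers},
        "task": {"type": task_type, "index": task_index},
    }
    return json.dumps(config)


# Hypothetical host:port pairs, for illustration only.
workers = ["node1:2222", "node2:2222"]
parameter_servers = ["node3:2222"]

tf_config = build_tf_config(workers, parameter_servers, "worker", 1)
print(tf_config)
```

Each process in the job would receive the same cluster section but its own task type and index.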
11. Hadoop OR Spark?
Hadoop and Spark Comparison (Disk vs RAM)
Spark (RAM): Fast (x10), Real-time processing, Expensive, Easy (one package)
Hadoop (Disk): Slow, Batch-based processing, Inexpensive, Hard (many packages)
The ecosystem should be chosen according to the characteristics of the data
13. Hadoop made easy by…
[DEMO]
Cloudera
Oozie
Hue
Hadoop Web Monitoring