SlideShare a Scribd company logo
1 of 14
Download to read offline
1©	Cloudera,	Inc.	All	rights	reserved.
Hadoop	10th Birthday	and
Hadoop	3	Alpha
Dongjin	Seo	– Cloudera	Korea,	SE
2©	Cloudera,	Inc.	All	rights	reserved.
Agenda
Ⅰ. Hadoop 10th Birthday
Ⅱ. Hadoop 3 Alpha
3©	Cloudera,	Inc.	All	rights	reserved.
Apache	Hadoop	at	10
Apache	
Hadoop
4©	Cloudera,	Inc.	All	rights	reserved.
Apache	Hadoop’s	Timeline
[2002]
• Doug	Cutting	and	Mike	Cafarella
create	Nutch,	an	open	source	
web	crawler
[2003]
• Google	publishes	its	“Google	
File	System”	paper
[2004]
• Cutting	&	Cafarella implement	
Nutch features	that	will	become	
HDFS
• Google	publishes	its	
“MapReduce”	paper
The	Invention	Years
2002	~	2004
The	Incubation	Years
2005	~	2007
The	Coming-Out	Years
2008	~	2009
The	Rapid	Adaption	Years
2010	~	2015
[2005]
• Cafarella spearheads	an	
implementation	of	MapReduce	
in	Nutch
[2006]
• Cutting	joins	Yahoo!;	starts	
Hadoop	subproject	by	carving	
code	from	Nutch
• First	Apache	release	of	Hadoop
[2007]
• First	Hadoop	User	Group	
meeting
• Community	contributions	begin	
to	rise	steeply
[2008]
• Hadoop	becomes	a	Top	Level	
ASF	project
• Yahoo!	launches	world’s	largest	
Hadoop	application
• Hive,	Hadoop’s	first	SQL	
framework,	becomes	a	Hadoop	
sub-project
• Cloudera,	first	company	to	
commercialize	Hadoop,	is	
founded
• Initial	Apache	release	of	Pig
[2009]
• Cutting	joins	Cloudera	as	its	
chief	architect
[2010-11]
• The	extended	Hadoop	
community	busily	builds	out	a	
plethora	of	new	components	
(Crunch,	Sqoop,	Flume,	Oozie,	
etc)
[2012-15]
• HDFS	NameNode HA,	YARN,		
significant	new	features	for	
enterprise	adoption
• Impala joins	the	ecosystem
• Spark becomes	a	Top	Level	ASF	
project
• Kudu,	the	first	native	storage	
option	for	Hadoop	since	HBase,	
joins	the	ASF	Incubator
5©	Cloudera,	Inc.	All	rights	reserved.
2006 2008 2009 2010 2011 2012 2013
Core	Hadoop
(HDFS,	
MapReduce)
HBase
ZooKeeper
Solr
Pig
Core	Hadoop
Hive
Mahout
HBase
ZooKeeper
Solr
Pig
Core	Hadoop
Sqoop
Avro
Hive
Mahout
HBase
ZooKeeper
Solr
Pig
Core	Hadoop
Flume
Bigtop
Oozie
HCatalog
Hue
Sqoop
Avro
Hive
Mahout
HBase
ZooKeeper
Solr
Pig
YARN
Core	Hadoop
Spark
Tez
Impala
Kafka
Drill
Flume
Bigtop
Oozie
HCatalog
Hue
Sqoop
Avro
Hive
Mahout
HBase
ZooKeeper
Solr
Pig
YARN
Core	Hadoop
Parquet
Sentry
Spark
Tez
Impala
Kafka
Drill
Flume
Bigtop
Oozie
HCatalog
Hue
Sqoop
Avro
Hive
Mahout
HBase
ZooKeeper
Solr
Pig
YARN
Core	Hadoop
The	stack	is	continually	evolving	and	growing!
2007
Solr
Pig
Core	Hadoop
Knox
Flink
Parquet
Sentry
Spark
Tez
Impala
Kafka
Drill
Flume
Bigtop
Oozie
HCatalog
Hue
Sqoop
Avro
Hive
Mahout
HBase
ZooKeeper
Solr
Pig
YARN
Core	Hadoop
2014 2015-
Kudu
RecordService
Ibis
Falcon
Knox
Flink
Parquet
Sentry
Spark
Tez
Impala
Kafka
Drill
Flume
Bigtop
Oozie
HCatalog
Hue
Sqoop
Avro
Hive
Mahout
HBase
ZooKeeper
Solr
Pig
YARN
Core	Hadoop
Evolution	of	the	Hadoop Platform
6©	Cloudera,	Inc.	All	rights	reserved.
Why	Did	Hadoop	Succeed?
Open	source	community	and	license
A	large	and	diverse	community	of	developers	has	historically	made,	and	continues	to	make,	the	
Hadoop	ecosystem	among	the	most	active	and	engaged	in	history,	while	the	Apache	License	
lowers	the	barrier	to	entry	for	users.
Extensibility/adaptability	
With	the	possible	exception	of	Linux,	no	other	complex	platform	has	evolved	on	so	many	levels,	
and	so	quickly,	to	meet	user	requirements	over	time.
A	strong	focus	on	systems
The	roots	of	Hadoop	are	in	making	distributed	computing	infrastructure	more	accessible	by	
application	developers.	That	continuing	focus	continues	to	bear	fruit	in	areas	like	resource	
management	and	security.
7©	Cloudera,	Inc.	All	rights	reserved.
Apache	Hadoop	3	Alpha	Release!!
HDFS	Erasure	Coding	(HDFS-EC)
Release	Date:	03	September,	2016
Changes:	about	3,000
3.0.0-alpha1
Major	new	changes
YARN	Timeline	Service	v.2
Shell	Script	Rewrite
MapReduce	task-level	native	optimization
Support	for	Multiple	Stanby NameNodes
Java	8	Minimum	Runtime	Version
New	Default	Ports	for	Several	Services
Intra-DataNode Balancer
Reworked	daemon	and	task	heap	management
8©	Cloudera,	Inc.	All	rights	reserved.
Apache	Hadoop	3	Alpha	Major	New	Changes
HDFS	Erasure	Coding	(HDFS-EC)
Release	Date:	03	September,	2016
Changes:	about	3,000
3.0.0-alpha1
Major	new	changes
Erasure Coding ?
- Fault-tolerance를 위한 데이터 보존 기법 중 하나로 흔히 RAID-5에서 사용되는 기법입니다.
데이터 저장 시 EC	Codec으로 데이터를 균일한 사이즈의 Data	cell/Parity	cell로 인코딩하며 이와
반대로 데이터 로드 시 Data	cell과 Parity	cell로 구성된 EC	Group에서 유실된 cell에 대해서 해당
그룹에 남아있는 cell들로 부터 재구성하여 원본 데이터 복구하는 디코딩 작업을 실행
HDFS-EC
ü EC의 Reed-Solomon algorithm을 수행하는 Intel	ISA-L	(Intelligence Stroage Acceleration
Library)	사용하여 스토리지 성능,	처리량,	보안,	안정성 개선
ü EC는 Exclusive-OR	공식 기반이지만 Multiple failure 을 보장하지 못하는 불안정한
부분이 존재하여,	Reed-Solomon algorithm을 적용하여 Multiple failures를 보장
ü Cloudera	+	Intel	+	Hadoop Community 합작하여 빌드
ü 각 개별 디렉토리에 ‘hdfs erasurecode -setPolicy’	커맨드로 policy 적용
Erasure Coding의 필요성
ü 복제개수 3개는 장애대응에 용이하나 기회비용이 비쌈 (200%	overhead in	storage space
and	other resources (e.g.	network	bandwidth when writing the data))	
ü 일반적인 Operations 에서는 추가적으로 복제된 블럭에 대해 접근을 잘 하지 않음
Erasure Coding의 효과
ü 더 적은 용량으로도 Fault-tolerance 보장
ü 3x	replication 에 비해 storage cost ~50%	감소 효과
9©	Cloudera,	Inc.	All	rights	reserved.
Apache	Hadoop	3	Alpha	Major	New	Changes
YARN	Timeline	Service	v.2
Release	Date:	03	September,	2016
Changes:	about	3,000
3.0.0-alpha1
Major	new	changes
YARN	Timeline Service v.2
- Event,	Metrics와 같은 컨테이너 관련 정보 및 Map,	Reduce	Task	관련 Application 정보를 WebUI를
통해 확인 할 수 있도록 제공되는 서비스
1) Improving	scalability	and	reliability	of	Timeline	Service	
ü 확장성이 높고, 분산 저장 가능한 아키텍처(e.g.	HBase)를 채택하여 신뢰성 향상
2) enhancing	usability	by	introducing	flows	and	aggregation
ü YARN	Application	의 단계별 논리적 Flow 제공
ü Flow	레벨에서의 metrics에 대한 Aggregation 지원
Shell	Script	Rewrite
ü 오랫동안 가지고 있던 버그 수정 및 새로운 기능 추가를 위해 Hadoop	쉘스크립트가 수정 됨
ü 기존 쉘스크립트 버전에 대해 완벽하게 호환되지 않아 Hadoop	환경변수를 사용하거나 shell	
command를 이용하는 사용자에게 영향주는 부분에 대한 검토가 필요
more	info:	
https://issues.apache.org/jira/browse/HADOOP-9902
https://issues.apache.org/jira/secure/attachment/12599817/more-info.txt
10©	Cloudera,	Inc.	All	rights	reserved.
Apache	Hadoop	3	Alpha	Major	New	Changes
MapReduce	task-level	native	optimization
Release	Date:	03	September,	2016
Changes:	about	3,000
3.0.0-alpha1
Major	new	changes
NativeTask ?
- 데이터 프로세싱에 초첨을 맞춘 native	computing	unit으로 Hadoop	MapReduce를 위한 고성능
C++	API	&	runtime
NativeTask의 필요성
ü I/O	bottleneck.	Most	Hadoop	workloads	are	data	intensive,	so	if	no	compression	is	used	
for	input,	mid-output,	and	output,	I/O(disk,	network)	could	be	a	bottleneck.
ü Inefficient	implementation.(Map	side	sort, Serialization/Deserialization, Shuffle, Data	
locality, Scheduling	&	starting	overhead)
ü Inflexible	programming	paradigm. - limits	its	performance
NativeTask의 효과
ü NativeTask 를 map	output	collector에 적용하여 shuffle-intensive	job의 성능을 30%	이상
향상
more	info:	
https://issues.apache.org/jira/browse/MAPREDUCE-2841
Java	side
(to	bypass	normal	java	
data	flow))
Native	side
(Actual	computation))
JNI
delegate the data
processing
11©	Cloudera,	Inc.	All	rights	reserved.
Apache	Hadoop	3	Alpha	Major	New	Changes
Release	Date:	03	September,	2016
Changes:	about	3,000
3.0.0-alpha1
Major	new	changes
Support	for	Multiple	Stanby NameNodes
ü 기존 single	Active	Namenode, single	Stanby Namenode로만 구성이 가능했던 아키텍처에서
다중 Stanby Namenode로 구성이 가능하도록 변경
ü 더 높은 수준의 High-Availability를 제공
ü Namenode 는 5개 초과 X, 3개를 추천함 à communication	overheads 고려
more	info:	
http://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-project-dist/hadoop-
hdfs/HDFSHighAvailabilityWithQJM.html
Java	8	Minimum	Runtime	Version
ü Java	7에 대한 오라클의 공식적인 지원이 종료(April	2015)되어 Java	8으로 변경할 수 밖에 없는
상황이 됨
ü 이에 따라 Hadoop	3의 최소 자바 버전은 Java	8	로 변경 됨
12©	Cloudera,	Inc.	All	rights	reserved.
Apache	Hadoop	3	Alpha	Major	New	Changes
New	Default	Ports	for	Several	Services
Release	Date:	03	September,	2016
Changes:	about	3,000
3.0.0-alpha1
Major	new	changes
ü Hadoop	시작 시 bind	error를 피하기 위해 Namenode,	Secondary	NN,	Datanode,	KMS의 port를
변경
ü 대규모 클러스터에서의 rolling	restart의 신뢰성 향상에 기여할 것으로 예상
Namenode	ports:	50470	-->	9871,	50070	-->	9870,	8020	-->	9820			
Secondary NN	ports:	50091	-->	9869,	50090	-->	9868			
Datanode	ports:	50020	-->	9867,	50010	-->	9866,	50475	-->	9865,	50075	-->	9864	
KMS	port:	16000	-->	9600	
more	info:	
https://issues.apache.org/jira/browse/HDFS-9427
https://issues.apache.org/jira/browse/HADOOP-12811
Intra-DataNode Balancer
ü 디스크 추가/변경으로 발생되는 Datanode 내의 디스크들에 대한 데이터 적재 불균형을 HDFS	
Command(diskbalancer)로 해결
more	info:	
http://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-project-dist/hadoop-
hdfs/HDFSCommands.html
13©	Cloudera,	Inc.	All	rights	reserved.
Apache	Hadoop	3	Alpha	Major	New	Changes
Release	Date:	03	September,	2016
Changes:	about	3,000
3.0.0-alpha1
Major	new	changes
Reworked	daemon	and	task	heap	management
ü 호스트의 메모리 사이즈를 기반으로 자동으로 튜닝해주는 기능 추가
ü HADOOP_HEAPSIZE는 deprecated	됨
ü 기존에 비해 간단하게 map/reduce	heap	사이즈 설정이 가능하게 되어 요구되는
heap사이즈를 task설정이나 Java	옵션으로 명시해야되는 수고가 없어졌음
more	info:	
https://issues.apache.org/jira/browse/HADOOP-10950
https://issues.apache.org/jira/browse/MAPREDUCE-5785
Conclusion
v Hadoop의 핵심인 Component(HDFS, MARECUDE)의 큰 변화는 또 다른 큰 발전을 위한 발돋음
ü HDFS	저장공간 활용도 증가 à ROI 증가, TOC 감소 à 초기 도입에 대한 장벽이 낮아짐
ü Multiple	Stanby Namenode, heap 튜닝 기능 à 운영환경에 대한 용이성 증가 à 운영에
투자되는 비용을 클러스터 확장, 개선과 같은 곳에 투자할 수 있는 기회 증가
v 변화한 Hadoop과 연관 Component들의 변화가 기대 됨
ü HIVE	의 Query	속도 개선?
ü HBase의 Throuthput 증가?
ü 데이터 압축 효율성 증가?
ü etc
v 이러한 큰 변화 뒤에는 Community와 구성원들의 관심과 사랑이 있기에 가능
14©	Cloudera,	Inc.	All	rights	reserved.
감사합니다
djseo@cloudera.com

More Related Content

Similar to Hadoop 10th Birthday and Hadoop 3 Alpha

[Open Technet Summit 2014] 쓰기 쉬운 Hadoop 기반 빅데이터 플랫폼 아키텍처 및 활용 방안
[Open Technet Summit 2014] 쓰기 쉬운 Hadoop 기반 빅데이터 플랫폼 아키텍처 및 활용 방안[Open Technet Summit 2014] 쓰기 쉬운 Hadoop 기반 빅데이터 플랫폼 아키텍처 및 활용 방안
[Open Technet Summit 2014] 쓰기 쉬운 Hadoop 기반 빅데이터 플랫폼 아키텍처 및 활용 방안치완 박
 
On premise db & cloud database
On premise db & cloud databaseOn premise db & cloud database
On premise db & cloud databaseOracle Korea
 
하둡 알아보기(Learn about Hadoop basic), NetApp FAS NFS Connector for Hadoop
하둡 알아보기(Learn about Hadoop basic), NetApp FAS NFS Connector for Hadoop하둡 알아보기(Learn about Hadoop basic), NetApp FAS NFS Connector for Hadoop
하둡 알아보기(Learn about Hadoop basic), NetApp FAS NFS Connector for HadoopSeungYong Baek
 
Mastering devops with oracle 강인호
Mastering devops with oracle 강인호Mastering devops with oracle 강인호
Mastering devops with oracle 강인호Inho Kang
 
서울 하둡 사용자 모임 발표자료
서울 하둡 사용자 모임 발표자료서울 하둡 사용자 모임 발표자료
서울 하둡 사용자 모임 발표자료Teddy Choi
 
데이터 레이크 알아보기(Learn about Data Lake)
데이터 레이크 알아보기(Learn about Data Lake)데이터 레이크 알아보기(Learn about Data Lake)
데이터 레이크 알아보기(Learn about Data Lake)SeungYong Baek
 
올챙이 현재와 미래
올챙이 현재와 미래올챙이 현재와 미래
올챙이 현재와 미래cho hyun jong
 
Deview 2013 :: Backend PaaS, CloudFoundry 뽀개기
Deview 2013 :: Backend PaaS, CloudFoundry 뽀개기Deview 2013 :: Backend PaaS, CloudFoundry 뽀개기
Deview 2013 :: Backend PaaS, CloudFoundry 뽀개기Nanha Park
 
Big data application architecture 요약2
Big data application architecture 요약2Big data application architecture 요약2
Big data application architecture 요약2Seong-Bok Lee
 
Open standard open cloud engine (3)
Open standard open cloud engine (3)Open standard open cloud engine (3)
Open standard open cloud engine (3)uEngine Solutions
 
2017 red hat open stack(rhosp) function overview (samuel,2017-0516)
2017 red hat open stack(rhosp) function overview (samuel,2017-0516)2017 red hat open stack(rhosp) function overview (samuel,2017-0516)
2017 red hat open stack(rhosp) function overview (samuel,2017-0516)SAMUEL SJ Cheon
 
Hadoop Introduction (1.0)
Hadoop Introduction (1.0)Hadoop Introduction (1.0)
Hadoop Introduction (1.0)Keeyong Han
 
ApacheCon2011 에서는 무슨일이
ApacheCon2011 에서는 무슨일이ApacheCon2011 에서는 무슨일이
ApacheCon2011 에서는 무슨일이Sangmin Lee
 
HashiTalk 2021 - Terraform 도입과 파이프라인 구축 및 운영
HashiTalk 2021 - Terraform 도입과 파이프라인 구축 및 운영HashiTalk 2021 - Terraform 도입과 파이프라인 구축 및 운영
HashiTalk 2021 - Terraform 도입과 파이프라인 구축 및 운영JooHyung Kim
 
[225]빅데이터를 위한 분산 딥러닝 플랫폼 만들기
[225]빅데이터를 위한 분산 딥러닝 플랫폼 만들기[225]빅데이터를 위한 분산 딥러닝 플랫폼 만들기
[225]빅데이터를 위한 분산 딥러닝 플랫폼 만들기NAVER D2
 
DevOps - CI/CD 알아보기
DevOps - CI/CD 알아보기DevOps - CI/CD 알아보기
DevOps - CI/CD 알아보기SeungYong Baek
 
Java 초보자를 위한 hadoop 설정
Java 초보자를 위한 hadoop 설정Java 초보자를 위한 hadoop 설정
Java 초보자를 위한 hadoop 설정HyeonSeok Choi
 
(Red hat]private cloud-osp-introduction(samuel)2017-0530(printed)
(Red hat]private cloud-osp-introduction(samuel)2017-0530(printed)(Red hat]private cloud-osp-introduction(samuel)2017-0530(printed)
(Red hat]private cloud-osp-introduction(samuel)2017-0530(printed)SAMUEL SJ Cheon
 
공개소프트웨어 기반 주요 클라우드 전환 사례
공개소프트웨어 기반 주요 클라우드 전환 사례공개소프트웨어 기반 주요 클라우드 전환 사례
공개소프트웨어 기반 주요 클라우드 전환 사례rockplace
 

Similar to Hadoop 10th Birthday and Hadoop 3 Alpha (20)

[Open Technet Summit 2014] 쓰기 쉬운 Hadoop 기반 빅데이터 플랫폼 아키텍처 및 활용 방안
[Open Technet Summit 2014] 쓰기 쉬운 Hadoop 기반 빅데이터 플랫폼 아키텍처 및 활용 방안[Open Technet Summit 2014] 쓰기 쉬운 Hadoop 기반 빅데이터 플랫폼 아키텍처 및 활용 방안
[Open Technet Summit 2014] 쓰기 쉬운 Hadoop 기반 빅데이터 플랫폼 아키텍처 및 활용 방안
 
On premise db & cloud database
On premise db & cloud databaseOn premise db & cloud database
On premise db & cloud database
 
하둡 알아보기(Learn about Hadoop basic), NetApp FAS NFS Connector for Hadoop
하둡 알아보기(Learn about Hadoop basic), NetApp FAS NFS Connector for Hadoop하둡 알아보기(Learn about Hadoop basic), NetApp FAS NFS Connector for Hadoop
하둡 알아보기(Learn about Hadoop basic), NetApp FAS NFS Connector for Hadoop
 
Mastering devops with oracle 강인호
Mastering devops with oracle 강인호Mastering devops with oracle 강인호
Mastering devops with oracle 강인호
 
Hadoop administration
Hadoop administrationHadoop administration
Hadoop administration
 
서울 하둡 사용자 모임 발표자료
서울 하둡 사용자 모임 발표자료서울 하둡 사용자 모임 발표자료
서울 하둡 사용자 모임 발표자료
 
데이터 레이크 알아보기(Learn about Data Lake)
데이터 레이크 알아보기(Learn about Data Lake)데이터 레이크 알아보기(Learn about Data Lake)
데이터 레이크 알아보기(Learn about Data Lake)
 
올챙이 현재와 미래
올챙이 현재와 미래올챙이 현재와 미래
올챙이 현재와 미래
 
Deview 2013 :: Backend PaaS, CloudFoundry 뽀개기
Deview 2013 :: Backend PaaS, CloudFoundry 뽀개기Deview 2013 :: Backend PaaS, CloudFoundry 뽀개기
Deview 2013 :: Backend PaaS, CloudFoundry 뽀개기
 
Big data application architecture 요약2
Big data application architecture 요약2Big data application architecture 요약2
Big data application architecture 요약2
 
Open standard open cloud engine (3)
Open standard open cloud engine (3)Open standard open cloud engine (3)
Open standard open cloud engine (3)
 
2017 red hat open stack(rhosp) function overview (samuel,2017-0516)
2017 red hat open stack(rhosp) function overview (samuel,2017-0516)2017 red hat open stack(rhosp) function overview (samuel,2017-0516)
2017 red hat open stack(rhosp) function overview (samuel,2017-0516)
 
Hadoop Introduction (1.0)
Hadoop Introduction (1.0)Hadoop Introduction (1.0)
Hadoop Introduction (1.0)
 
ApacheCon2011 에서는 무슨일이
ApacheCon2011 에서는 무슨일이ApacheCon2011 에서는 무슨일이
ApacheCon2011 에서는 무슨일이
 
HashiTalk 2021 - Terraform 도입과 파이프라인 구축 및 운영
HashiTalk 2021 - Terraform 도입과 파이프라인 구축 및 운영HashiTalk 2021 - Terraform 도입과 파이프라인 구축 및 운영
HashiTalk 2021 - Terraform 도입과 파이프라인 구축 및 운영
 
[225]빅데이터를 위한 분산 딥러닝 플랫폼 만들기
[225]빅데이터를 위한 분산 딥러닝 플랫폼 만들기[225]빅데이터를 위한 분산 딥러닝 플랫폼 만들기
[225]빅데이터를 위한 분산 딥러닝 플랫폼 만들기
 
DevOps - CI/CD 알아보기
DevOps - CI/CD 알아보기DevOps - CI/CD 알아보기
DevOps - CI/CD 알아보기
 
Java 초보자를 위한 hadoop 설정
Java 초보자를 위한 hadoop 설정Java 초보자를 위한 hadoop 설정
Java 초보자를 위한 hadoop 설정
 
(Red hat]private cloud-osp-introduction(samuel)2017-0530(printed)
(Red hat]private cloud-osp-introduction(samuel)2017-0530(printed)(Red hat]private cloud-osp-introduction(samuel)2017-0530(printed)
(Red hat]private cloud-osp-introduction(samuel)2017-0530(printed)
 
공개소프트웨어 기반 주요 클라우드 전환 사례
공개소프트웨어 기반 주요 클라우드 전환 사례공개소프트웨어 기반 주요 클라우드 전환 사례
공개소프트웨어 기반 주요 클라우드 전환 사례
 

More from Dataya Nolja

How to Study Mathematics for ML
How to Study Mathematics for MLHow to Study Mathematics for ML
How to Study Mathematics for MLDataya Nolja
 
Music Data Start to End
Music Data Start to EndMusic Data Start to End
Music Data Start to EndDataya Nolja
 
Find a Leak Time in the Schedule
Find a Leak Time in the ScheduleFind a Leak Time in the Schedule
Find a Leak Time in the ScheduleDataya Nolja
 
A Financial Company Story of Bringing Open Source and ML in
A Financial Company Story of Bringing Open Source and ML inA Financial Company Story of Bringing Open Source and ML in
A Financial Company Story of Bringing Open Source and ML inDataya Nolja
 
Practice, Practice, Practice and do the Dirty Work
Practice, Practice, Practice and do the Dirty WorkPractice, Practice, Practice and do the Dirty Work
Practice, Practice, Practice and do the Dirty WorkDataya Nolja
 
Predicting People Who May Get off at the Next Station
Predicting People Who May Get off at the Next StationPredicting People Who May Get off at the Next Station
Predicting People Who May Get off at the Next StationDataya Nolja
 
Endless Trial-and-Errors for Data Collecting
Endless Trial-and-Errors for Data CollectingEndless Trial-and-Errors for Data Collecting
Endless Trial-and-Errors for Data CollectingDataya Nolja
 
Log Design Case Study
Log Design Case StudyLog Design Case Study
Log Design Case StudyDataya Nolja
 
Let's Play with Data Safely
Let's Play with Data SafelyLet's Play with Data Safely
Let's Play with Data SafelyDataya Nolja
 
Things Data Scientists Should Keep in Mind
Things Data Scientists Should Keep in MindThings Data Scientists Should Keep in Mind
Things Data Scientists Should Keep in MindDataya Nolja
 
Things Happend between JDBC and MySQL
Things Happend between JDBC and MySQLThings Happend between JDBC and MySQL
Things Happend between JDBC and MySQLDataya Nolja
 
Human-Machine Interaction and AI
Human-Machine Interaction and AIHuman-Machine Interaction and AI
Human-Machine Interaction and AIDataya Nolja
 
Julia 0.5 and TensorFlow
Julia 0.5 and TensorFlowJulia 0.5 and TensorFlow
Julia 0.5 and TensorFlowDataya Nolja
 
Zeppelin and Open Source Ecosystem and Silicon Valley
Zeppelin and Open Source Ecosystem and Silicon ValleyZeppelin and Open Source Ecosystem and Silicon Valley
Zeppelin and Open Source Ecosystem and Silicon ValleyDataya Nolja
 
Kakao Bank Powered by Open Sources
Kakao Bank Powered by Open SourcesKakao Bank Powered by Open Sources
Kakao Bank Powered by Open SourcesDataya Nolja
 
Open Source is My Job
Open Source is My JobOpen Source is My Job
Open Source is My JobDataya Nolja
 
Creating Value through Data Analysis
Creating Value through Data AnalysisCreating Value through Data Analysis
Creating Value through Data AnalysisDataya Nolja
 
How to Make Money from Data - Global Cases
How to Make Money from Data - Global CasesHow to Make Money from Data - Global Cases
How to Make Money from Data - Global CasesDataya Nolja
 
Structured Streaming with Apache Spark
Structured Streaming with Apache SparkStructured Streaming with Apache Spark
Structured Streaming with Apache SparkDataya Nolja
 
How to Create Value from Data, and Its Difficulty
How to Create Value from Data, and Its DifficultyHow to Create Value from Data, and Its Difficulty
How to Create Value from Data, and Its DifficultyDataya Nolja
 

More from Dataya Nolja (20)

How to Study Mathematics for ML
How to Study Mathematics for MLHow to Study Mathematics for ML
How to Study Mathematics for ML
 
Music Data Start to End
Music Data Start to EndMusic Data Start to End
Music Data Start to End
 
Find a Leak Time in the Schedule
Find a Leak Time in the ScheduleFind a Leak Time in the Schedule
Find a Leak Time in the Schedule
 
A Financial Company Story of Bringing Open Source and ML in
A Financial Company Story of Bringing Open Source and ML inA Financial Company Story of Bringing Open Source and ML in
A Financial Company Story of Bringing Open Source and ML in
 
Practice, Practice, Practice and do the Dirty Work
Practice, Practice, Practice and do the Dirty WorkPractice, Practice, Practice and do the Dirty Work
Practice, Practice, Practice and do the Dirty Work
 
Predicting People Who May Get off at the Next Station
Predicting People Who May Get off at the Next StationPredicting People Who May Get off at the Next Station
Predicting People Who May Get off at the Next Station
 
Endless Trial-and-Errors for Data Collecting
Endless Trial-and-Errors for Data CollectingEndless Trial-and-Errors for Data Collecting
Endless Trial-and-Errors for Data Collecting
 
Log Design Case Study
Log Design Case StudyLog Design Case Study
Log Design Case Study
 
Let's Play with Data Safely
Let's Play with Data SafelyLet's Play with Data Safely
Let's Play with Data Safely
 
Things Data Scientists Should Keep in Mind
Things Data Scientists Should Keep in MindThings Data Scientists Should Keep in Mind
Things Data Scientists Should Keep in Mind
 
Things Happend between JDBC and MySQL
Things Happend between JDBC and MySQLThings Happend between JDBC and MySQL
Things Happend between JDBC and MySQL
 
Human-Machine Interaction and AI
Human-Machine Interaction and AIHuman-Machine Interaction and AI
Human-Machine Interaction and AI
 
Julia 0.5 and TensorFlow
Julia 0.5 and TensorFlowJulia 0.5 and TensorFlow
Julia 0.5 and TensorFlow
 
Zeppelin and Open Source Ecosystem and Silicon Valley
Zeppelin and Open Source Ecosystem and Silicon ValleyZeppelin and Open Source Ecosystem and Silicon Valley
Zeppelin and Open Source Ecosystem and Silicon Valley
 
Kakao Bank Powered by Open Sources
Kakao Bank Powered by Open SourcesKakao Bank Powered by Open Sources
Kakao Bank Powered by Open Sources
 
Open Source is My Job
Open Source is My JobOpen Source is My Job
Open Source is My Job
 
Creating Value through Data Analysis
Creating Value through Data AnalysisCreating Value through Data Analysis
Creating Value through Data Analysis
 
How to Make Money from Data - Global Cases
How to Make Money from Data - Global CasesHow to Make Money from Data - Global Cases
How to Make Money from Data - Global Cases
 
Structured Streaming with Apache Spark
Structured Streaming with Apache SparkStructured Streaming with Apache Spark
Structured Streaming with Apache Spark
 
How to Create Value from Data, and Its Difficulty
How to Create Value from Data, and Its DifficultyHow to Create Value from Data, and Its Difficulty
How to Create Value from Data, and Its Difficulty
 

Hadoop 10th Birthday and Hadoop 3 Alpha