SlideShare a Scribd company logo
Introduc)on	to	big	data	
Bắt	đầu	nghiên	cứu	từ	đâu	và	như	thế	nào	
Viet-Trung	Tran	
School	of	InformaAon	and	CommunicaAon	Technology
How	big	is	big	data?
How	big	is	big	data?
What	people	talk	about	big	data	(2008)
and	then	...	(2011)
The	human	face	of	data	(2014)
more	and	more...	(2014)
What	is	big	data?	(2014)	
2014
Big	data	5'V	
Big	data	is	a	term	for	data	sets	that	are	so	large	
or	complex	that	tradiAonal	data	processing	
applicaAon	soLware	is	inadequate	to	deal	with	
them	(wikipedia)
Where	big	data	come	from?
•  E-commerce	
•  Social	networks	
•  Internet	of	things	
•  Data-intensive	experiments	(bioinformaAcs,	quantum	physics,	etc)
Data	is	the	new	oil
Big	data	–	big	value		
source:	wipro.com
Big	data	job	trends
Big	challenges
Big	data	technology	stack
Hadoop	ecosystem
We	need	a	system	that	scales	
•  TradiAonal	tools	are	overwhelmed	
•  Slow	disks,	unreliable	machines,	parallelism	is	not	easy	
•  3	challenges	
•  Reliable	storage	
•  Powerful	data	processing	
•  Efficient	visualizaAon		
20
What	is	Apache	Hadoop?	
• Scalable	and	economical	data	storage	and	processing	
•  The	Apache	Hadoop	soLware	library	is	a	framework	that	allows	for	the	
distributed	processing	of	large	data	sets	across	clusters	of	computers	
using	simple	programming	models.	It	is	designed	to	scale	out	from	
single	servers	to	thousands	of	machines,	each	offering	local	
computaAon	and	storage.	Rather	than	rely	on	hardware	to	deliver	
high-availability,	the	library	itself	is	designed	to	detect	and	handle	
failures	at	the	applicaAon	layer,	so	delivering	a	highly-available	service	
on	top	of	a	cluster	of	computers,	each	of	which	may	be	prone	to	
failures	(commodity	hardware).	
• Heavily	inspired	by	Google	data	architecture	
21
Hadoop	main	components	
• Storage:	Hadoop	distributed	file	system	(HDFS)	
• Processing:	MapReduce	framework	
• System	uAliAes:		
• Hadoop	Common:	The	common	uAliAes	that	support	the	
other	Hadoop	modules.	
• Hadoop	YARN:	A	framework	for	job	scheduling	and	cluster	
resource	management.	
22
Scalability	
• Distributed	by	design	
• Hadoop	can	run	on	cluster		
• Individual	servers	within	a	cluster	are	called	nodes	
• each	node	may	both	store	and	process	data		
• Scale	out	by	adding	more	nodes	to	increase	
scalability	
• Up	to	several	thousand	nodes		
23
Fault	tolerance	
• Cluster	of	commodity	servers	
•  Hardware	failure	is	the	norm	rather	than	the	excepAon	
•  Built	with	redundancy		
• File	loaded	into	HDFS	are	replicated	across	nodes	in	the	
cluster		
•  If	a	node	failed,	its	data	is	re-replicated	using	one	of	the	copies		
• Data	processing	jobs	are	broken	into	individual	tasks	
•  Each	task	takes	a	small	amount	of	data	as	input		
•  Parallel	tasks	execuAon	
•  Failed	tasks	also	get	rescheduled	elsewhere	
• RouAne	failures	are	handled	automaAcally	without	any	loss	of	
data		
24
Hadoop	distributed	file	system	
• Provides	inexpensive	and	reliable	storage	for	massive	
amounts	of	data		
• OpAmized	for	big	files	(100	MB	to	several	TBs	file	sizes)	
• Hierarchical	UNIX	style	file	system	
•  (e.g.,	/hust/soict/hello.txt)	
•  UNIX	style	file	ownership	and	permissions	
• There	are	also	some	major	deviaAons	from	UNIX		
•  Append	only	
•  Write	once	read	many	Ames	
25
HDFS	Architecture	
• Master/slave	architecture	
• HDFS	master:	namenode	
•  Manage	namespace	and	
metadata	
•  Monitor	datanode	
• HDFS	slave:	datanode	
•  Handle	read/write	the	actual	
data	
26
Data	processing:	MapReduce	
• MapReduce	framework	is	the	Hadoop	default	data	processing	
engine		
• MapReduce	is	a	programming	model	for	data	processing	
•  it	is	not	a	language,	a	style	of	processing	data	created	by	Google	
• The	beauty	of	MapReduce	
•  Simplicity	
•  Flexibility	
•  Scalability	
27
a MR job = {Isolated Tasks}n
•  MapReduce	divides	the	workload	into	mulAple	independent	tasks	and	
schedule	them	across	cluster	nodes	
•  A	work	performed	by	each	task	is	done	in	isola-on	from	one	another	
•  The	 amount	 of	 communicaAon	 which	 can	 be	 performed	 by	 tasks	 is	
mainly	limited	for	scalability	reasons	
•  The	 communicaAon	 overhead	 required	 to	 keep	 the	 data	 on	 the	 nodes	
synchronized	at	all	Ames	would	prevent	the	model	from	performing	reliably	and	
efficiently	at	large	scale	
28
Data Distribution
•  In a MapReduce cluster, data is usually managed by a distributed
file systems (e.g., HDFS)
•  Move code to data and not data to code
29
Input	data:	A	large	file	
Node	1	
	
Chunk	of	input	data	
Node	2	
	
Chunk	of	input	data	
Node	3	
	
Chunk	of	input	data
Keys and Values
•  The	programmer	in	MapReduce	has	to	specify	two	funcAons,	the	map	
func-on	and	the	reduce	func-on	that	implement	the	Mapper	and	the	
Reducer	in	a	MapReduce	program	
•  In	MapReduce	data	elements	are	always	structured	as		
key-value	(i.e.,	(K,	V))	pairs	
•  The	map	and	reduce	funcAons	receive	and	emit	(K,	V)	pairs	
(K,	V)	
Pairs	
Map	
FuncAon	
(K’,
V’)
Pairs
Reduce	
FuncAon	
(K’’,
V’’)
Pairs
Input Splits Intermediate Outputs Final Outputs
Partitions
§  In	MapReduce,	intermediate	output	values	are	not	usually		
reduced	together	
§  All	values	with	the	same	key	are	presented	to	a	single		
Reducer	together	
§  More	 specifically,	 a	 different	 subset	 of	 intermediate	 key	 space	 is	
assigned	to	each	Reducer	
§  These	subsets	are	known	as	par--ons	
Different colors represent
different keys (potentially)
from different Mappers
Partitions are the input to Reducers
MapReduce	example	
• Input:	text file containing order ID, employee	name, and sale
amount 	
• Output: sum of all sales per employee 	
32
•  Hadoop	splits	job	into	many	individual	map	tasks		
•  Number	of	map	tasks	is	determined	by	the	amount	of	input	data		
•  Each	map	task	receives	a	porAon	of	the	overall	job	input	to	process		
•  Mappers	process	one	input	record	at	a	Ame		
•  For	each	input	record,	they	emit	zero	or	more	records	as	output		
•  In	this	case,	the	map	task	simply	parses	the	input	record		
•  And	then	emits	the	name	and	price	fields	for	each	as	output		
33	
Map	phrase
• Hadoop	automaAcally	sorts	and	merges	output	from	all	map	
tasks		
•  This	intermediate	process	is	known	as	the		shuffle	and	sort		
•  The	result	is	supplied	to	reduce	tasks		
34	
Shuffle	&	sort	phrase
•  Reducer	input	comes	from	the	shuffle	and	sort	process		
•  As	with	map,	the	reduce	funcAon	receives	one	record	at	a	Ame		
•  A	given	reducer	receives	all	records	for	a	given	key		
•  For	each	input	record,	reduce	can	emit	zero	or	more	output	records		
•  Our	reduce	funcAon	simply	sums	total	per	person		
•  And	emits	employee	name	(key)	and	total	(value)	as	output		
35	
Reduce	phrase
Data	flow	for	the	en)re	MapReduce	job	
36
Word Count Dataflow
MapReduce - Dataflow
Hadoop	ecosystem	
• Many	related	tools	integrate	with	Hadoop	
• Data	analysis			
• Database	integraAon		
• Workflow	management		
• These	are	not	considered	‘core	Hadoop’		
• Rather,	they	are	part	of	the	‘Hadoop	ecosystem’		
• Many	are	also	open	source	Apache	projects		
39
Apache	Pig	
•  Apache	Pig	builds	on	Hadoop	to	offer	high	level	data	processing		
•  Pig	is	especially	good	at	joining	and	transforming	data		
•  The	Pig	interpreter	runs	on	the	client	machine		
•  Turns	PigLaAn	scripts	into	MapReduce	jobs		
•  Submits	those	jobs	to	the	cluster		
40
Apache	Hive	
•  Another	abstracAon	on	top	of	MapReduce	
•  Reduce	development	Ame		
•  HiveQL:	SQL-like	language	
•  The	Hive	interpreter	runs	on	the	client	machine		
•  Turns	PigLaAn	scripts	into	MapReduce	jobs		
•  Submits	those	jobs	to	the	cluster		
41
Apache	Hbase	
•  HBase	is	a	distributed	column-oriented	data	store	built	on	top	of	HDFS	
•  Is	considered	as		the	Hadoop	database		
•  Data	is	logically	organized	into	tables,	rows	and	columns	
•  terabytes,	and	even	petabytes	of	data	in	a	table	
•  Tables	can	have	many	thousands	of	columns	
•  Scales	to	provide	very	high	write	throughput		
•  Hundreds	of	thousands	of	inserts	per	second	
•  Fairly	primiAve	when	compared	to	RDBMS		
•  NoSQL	:	There	is	no	high/level	query	language			
•  Use	API	to	scan	/	get	/	put	values	based	on	keys	
42
Apache	sqoop	
•  Sqoop	is	a	tool	designed	for	efficiently	
transferring	bulk	data	between	Apache	
Hadoop	and	structured	datastores	such	as	
relaAonal	databases.	
•  It	can	import	all	tables,	a	single	table,	or	a	
porAon	of	a	table	into	HDFS		
•  Via	a	Map/only	MapReduce	job		
•  Result	is	a	directory	in	HDFS	containing	
comma/delimited	text	files		
•  Sqoop	can	also	export	data	from	HDFS	back	
to	the	database		
43
Apache	KaZa	
• Kawa	is	a	distributed	streaming	
plaxorm	
•  It	lets	you	publish	and	subscribe	to	
streams	of	records.	In	this	respect	it	is	
similar	to	a	message	queue	or	enterprise	
messaging	system.	
•  It	lets	you	store	streams	of	records	in	a	
fault-tolerant	way.	
•  It	lets	you	process	streams	of	records	as	
they	occur.	
44
Apache	Oozie	
• Oozie	is	a	workflow	scheduler	system	to	manage	Apache	
Hadoop	jobs.	
• Oozie	Workflow	jobs	are	Directed	Acyclical	Graphs	(DAGs)	of	
acAons.	
• Oozie	supports	many	workflow	acAons,	including		
•  ExecuAng	MapReduce	jobs		
•  Running	Pig	or	Hive	scripts		
•  ExecuAng	standard	Java	or	shell	programs		
•  ManipulaAng	data	via	HDFS	commands		
•  Running	remote	commands	with	SSH		
•  Sending	e/mail	messages		
45
Apache	Zookeeper	
• Apache	ZooKeeper	is	a	highly	reliable	distributed	
coordinaAon	service	
• Group	membership	
• Leader	elecAon	
• Dynamic	ConfiguraAon	
• Status	monitoring	
• All	of	these	kinds	of	services	are	used	in	some	form	
or	another	by	distributed	applicaAons	
46
Big	data	pla^orm:	Hadoop	ecosystem
Hortonwork	data	pla^orm	(HDP)	
48
Cloudera's	Distribu)on	for	Hadoop	(CDH)	
49
AWS	big	data	ecosystem
Google	cloud	data	pla^orm
Big	data	management
Large-scale	machine	learning	–	h2o
Spark	Mllib	-	scalable	machine	learning	
•  Built	on	Spark		
•  Free	performance	gains
Bắt	đầu	nghiên	cứu	từ	đâu	và	
như	thế	nào?
Big	data	related	job	)tles	
source:	datacamp.com
source:	datacamp.com
source:	datacamp.com
source:	datacamp.com
Skillset
How		to	land	big	data	related	jobs	
•  Learn	to	code		
•  Coursera	
•  Udacity		
•  Freecodecamp	
•  Codecademy	
•  Math,	Stats	and	machine	learning		
•  Kaggle	
•  Hadoop,	NoSQL,	Spark	
•  VisualizaAon	and	ReporAng	
•  Tableau	
•  Pentahoo	
•  Meetup	&	Share	
•  Find	a	mentor	
•  Internships,	projects
Data	science	method	
Source:	Founda-onal	Methodology	for	Data	Science,	IBM,	2015	
1.	Formulate	a	quesAon	
3.	Analyze	data	
4.	Product	
2.	Gather	data
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
% Answered
Baseline	
12/2007	
8/2008	
5/2009	
10/2009	
11/2010	
12/2008	
5/2008	
Now	Playing	in	the	
Winners	Cloud	
4/2010	
Precision	 DeepQA:	Incremental	Progress	in	Precision	and	Confidence	
6/2007-11/2010
Cleaning	Big	Data:	Most	Time-Consuming,	Least	
Enjoyable	Data	Science	Task,	Survey	Says	
•  Data	preparaAon	accounts	for	about	80%	of	the	work	of	data	scienAsts	
source:	h~ps://www.forbes.com/
Cleaning	Big	Data:	Most	Time-Consuming,	Least	
Enjoyable	Data	Science	Task,	Survey	Says	
•  57%	of	data	scienAsts	regard	cleaning	and	organizing	data	as	the	least	
enjoyable	part	of	their	work	and	19%	say	this	about	collecAng	data	
sets.
Thank	you	for	your	a~enAon!	
Q&A

More Related Content

What's hot

Big Data Analytics - Introduction
Big Data Analytics - IntroductionBig Data Analytics - Introduction
Big Data Analytics - Introduction
Alex Meadows
 
Data Mining
Data MiningData Mining
Data Mining
Medicaps University
 
Dwdmunit1 a
Dwdmunit1 aDwdmunit1 a
Dwdmunit1 a
bhagathk
 
Seminaire bigdata23102014
Seminaire bigdata23102014Seminaire bigdata23102014
Seminaire bigdata23102014
Raja Chiky
 
Data science and the future of statistics
Data science and the future of statisticsData science and the future of statistics
Data science and the future of statistics
Piet J.H. Daas
 
Datamining
DataminingDatamining
Datamining
greenstarvijay
 
Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1
DanWooster1
 
Data Mining Overview
Data Mining OverviewData Mining Overview
Data Mining Overview
Golda Margret Sheeba J
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introduction
butest
 
Unit 3 part 2
Unit  3 part 2Unit  3 part 2
Unit 3 part 2
MohammadAsharAshraf
 
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
KamleshKumar394
 
Elementary Concepts of data minig
Elementary Concepts of data minigElementary Concepts of data minig
Elementary Concepts of data minig
Dr Anjan Krishnamurthy
 
Data Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A SurveyData Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A Survey
Editor IJCATR
 
Analytics and Data Mining Industry Overview
Analytics and Data Mining Industry OverviewAnalytics and Data Mining Industry Overview
Analytics and Data Mining Industry Overview
Gregory Piatetsky-Shapiro
 
Data Mining Algorithm and New HRDSD Theory for Big Data
Data Mining Algorithm and New HRDSD Theory for Big DataData Mining Algorithm and New HRDSD Theory for Big Data
Data Mining Algorithm and New HRDSD Theory for Big Data
KamleshKumar394
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
AbcdDcba12
 
Data Mining: Future Trends and Applications
Data Mining: Future Trends and ApplicationsData Mining: Future Trends and Applications
Data Mining: Future Trends and Applications
IJMER
 
RESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEW
ieijjournal
 
Are you ready for BIG DATA?
Are you ready for BIG DATA?Are you ready for BIG DATA?
Are you ready for BIG DATA?
Putchong Uthayopas
 

What's hot (19)

Big Data Analytics - Introduction
Big Data Analytics - IntroductionBig Data Analytics - Introduction
Big Data Analytics - Introduction
 
Data Mining
Data MiningData Mining
Data Mining
 
Dwdmunit1 a
Dwdmunit1 aDwdmunit1 a
Dwdmunit1 a
 
Seminaire bigdata23102014
Seminaire bigdata23102014Seminaire bigdata23102014
Seminaire bigdata23102014
 
Data science and the future of statistics
Data science and the future of statisticsData science and the future of statistics
Data science and the future of statistics
 
Datamining
DataminingDatamining
Datamining
 
Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1
 
Data Mining Overview
Data Mining OverviewData Mining Overview
Data Mining Overview
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introduction
 
Unit 3 part 2
Unit  3 part 2Unit  3 part 2
Unit 3 part 2
 
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
 
Elementary Concepts of data minig
Elementary Concepts of data minigElementary Concepts of data minig
Elementary Concepts of data minig
 
Data Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A SurveyData Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A Survey
 
Analytics and Data Mining Industry Overview
Analytics and Data Mining Industry OverviewAnalytics and Data Mining Industry Overview
Analytics and Data Mining Industry Overview
 
Data Mining Algorithm and New HRDSD Theory for Big Data
Data Mining Algorithm and New HRDSD Theory for Big DataData Mining Algorithm and New HRDSD Theory for Big Data
Data Mining Algorithm and New HRDSD Theory for Big Data
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Data Mining: Future Trends and Applications
Data Mining: Future Trends and ApplicationsData Mining: Future Trends and Applications
Data Mining: Future Trends and Applications
 
RESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEW
 
Are you ready for BIG DATA?
Are you ready for BIG DATA?Are you ready for BIG DATA?
Are you ready for BIG DATA?
 

Similar to Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017

How to crack Big Data and Data Science roles
How to crack Big Data and Data Science rolesHow to crack Big Data and Data Science roles
How to crack Big Data and Data Science roles
UpXAcademy
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
VedanteePathak
 
Working with real world data
Working with real world dataWorking with real world data
Working with real world data
PayamBarnaghi
 
Getting started in Data Science (April 2017, Los Angeles)
Getting started in Data Science (April 2017, Los Angeles)Getting started in Data Science (April 2017, Los Angeles)
Getting started in Data Science (April 2017, Los Angeles)
Thinkful
 
Big_Data_ML_Madhu_Reddiboina
Big_Data_ML_Madhu_ReddiboinaBig_Data_ML_Madhu_Reddiboina
Big_Data_ML_Madhu_Reddiboina
Madhu Reddiboina
 
Semantic Computing Executive Briefing
Semantic Computing Executive Briefing Semantic Computing Executive Briefing
Semantic Computing Executive Briefing
Graeme Wood
 
Semantic Computing Executive Briefing
Semantic Computing Executive Briefing Semantic Computing Executive Briefing
Semantic Computing Executive Briefing
Semanticsoftware
 
Big data
Big dataBig data
Opportunities in Data Science.ppt
Opportunities in Data Science.pptOpportunities in Data Science.ppt
Opportunities in Data Science.ppt
SwapnilTelrandhe1
 
Semantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data ContextSemantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data Context
Murad Daryousse
 
Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science
TJ Stalcup
 
New professional careers in data
New professional careers in dataNew professional careers in data
New professional careers in data
David Rostcheck
 
Misceb intro2014
Misceb intro2014Misceb intro2014
Misceb intro2014
Lee Schlenker
 
Big Data & the importance of Data Science
Big Data & the importance of Data ScienceBig Data & the importance of Data Science
Big Data & the importance of Data Science
Wim Van Leuven
 
Data Mining Intro
Data Mining IntroData Mining Intro
Data Mining Intro
ShubhamSamrat5
 
01Intro.ppt
01Intro.ppt01Intro.ppt
01Intro.ppt
AidaMustapha6
 
01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt
admsoyadm4
 
01Intro.ppt
01Intro.ppt01Intro.ppt
01Intro.ppt
VaibhavGupta447155
 
data mining
data miningdata mining
data mining
AMITKUMAR202236
 
Getting Started in Data Science
Getting Started in Data ScienceGetting Started in Data Science
Getting Started in Data Science
Thinkful
 

Similar to Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017 (20)

How to crack Big Data and Data Science roles
How to crack Big Data and Data Science rolesHow to crack Big Data and Data Science roles
How to crack Big Data and Data Science roles
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Working with real world data
Working with real world dataWorking with real world data
Working with real world data
 
Getting started in Data Science (April 2017, Los Angeles)
Getting started in Data Science (April 2017, Los Angeles)Getting started in Data Science (April 2017, Los Angeles)
Getting started in Data Science (April 2017, Los Angeles)
 
Big_Data_ML_Madhu_Reddiboina
Big_Data_ML_Madhu_ReddiboinaBig_Data_ML_Madhu_Reddiboina
Big_Data_ML_Madhu_Reddiboina
 
Semantic Computing Executive Briefing
Semantic Computing Executive Briefing Semantic Computing Executive Briefing
Semantic Computing Executive Briefing
 
Semantic Computing Executive Briefing
Semantic Computing Executive Briefing Semantic Computing Executive Briefing
Semantic Computing Executive Briefing
 
Big data
Big dataBig data
Big data
 
Opportunities in Data Science.ppt
Opportunities in Data Science.pptOpportunities in Data Science.ppt
Opportunities in Data Science.ppt
 
Semantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data ContextSemantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data Context
 
Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science
 
New professional careers in data
New professional careers in dataNew professional careers in data
New professional careers in data
 
Misceb intro2014
Misceb intro2014Misceb intro2014
Misceb intro2014
 
Big Data & the importance of Data Science
Big Data & the importance of Data ScienceBig Data & the importance of Data Science
Big Data & the importance of Data Science
 
Data Mining Intro
Data Mining IntroData Mining Intro
Data Mining Intro
 
01Intro.ppt
01Intro.ppt01Intro.ppt
01Intro.ppt
 
01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt
 
01Intro.ppt
01Intro.ppt01Intro.ppt
01Intro.ppt
 
data mining
data miningdata mining
data mining
 
Getting Started in Data Science
Getting Started in Data ScienceGetting Started in Data Science
Getting Started in Data Science
 

More from Viet-Trung TRAN

Dynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value StoreDynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value Store
Viet-Trung TRAN
 
Pregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớnPregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớn
Viet-Trung TRAN
 
Mapreduce simplified-data-processing
Mapreduce simplified-data-processingMapreduce simplified-data-processing
Mapreduce simplified-data-processing
Viet-Trung TRAN
 
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của FacebookTìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Viet-Trung TRAN
 
giasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case studygiasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case study
Viet-Trung TRAN
 
Giasan.vn @rstars
Giasan.vn @rstarsGiasan.vn @rstars
Giasan.vn @rstars
Viet-Trung TRAN
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
Viet-Trung TRAN
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
Viet-Trung TRAN
 
Large-Scale Geographically Weighted Regression on Spark
Large-Scale Geographically Weighted Regression on SparkLarge-Scale Geographically Weighted Regression on Spark
Large-Scale Geographically Weighted Regression on Spark
Viet-Trung TRAN
 
Recent progress on distributing deep learning
Recent progress on distributing deep learningRecent progress on distributing deep learning
Recent progress on distributing deep learning
Viet-Trung TRAN
 
success factors for project proposals
success factors for project proposalssuccess factors for project proposals
success factors for project proposals
Viet-Trung TRAN
 
GPSinsights poster
GPSinsights posterGPSinsights poster
GPSinsights poster
Viet-Trung TRAN
 
OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents
Viet-Trung TRAN
 
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Viet-Trung TRAN
 
Deep learning for nlp
Deep learning for nlpDeep learning for nlp
Deep learning for nlp
Viet-Trung TRAN
 
Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015
Viet-Trung TRAN
 
From neural networks to deep learning
From neural networks to deep learningFrom neural networks to deep learning
From neural networks to deep learning
Viet-Trung TRAN
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forests
Viet-Trung TRAN
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
Viet-Trung TRAN
 
3 - Finding similar items
3 - Finding similar items3 - Finding similar items
3 - Finding similar items
Viet-Trung TRAN
 

More from Viet-Trung TRAN (20)

Dynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value StoreDynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value Store
 
Pregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớnPregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớn
 
Mapreduce simplified-data-processing
Mapreduce simplified-data-processingMapreduce simplified-data-processing
Mapreduce simplified-data-processing
 
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của FacebookTìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
 
giasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case studygiasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case study
 
Giasan.vn @rstars
Giasan.vn @rstarsGiasan.vn @rstars
Giasan.vn @rstars
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
 
Large-Scale Geographically Weighted Regression on Spark
Large-Scale Geographically Weighted Regression on SparkLarge-Scale Geographically Weighted Regression on Spark
Large-Scale Geographically Weighted Regression on Spark
 
Recent progress on distributing deep learning
Recent progress on distributing deep learningRecent progress on distributing deep learning
Recent progress on distributing deep learning
 
success factors for project proposals
success factors for project proposalssuccess factors for project proposals
success factors for project proposals
 
GPSinsights poster
GPSinsights posterGPSinsights poster
GPSinsights poster
 
OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents
 
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
 
Deep learning for nlp
Deep learning for nlpDeep learning for nlp
Deep learning for nlp
 
Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015
 
From neural networks to deep learning
From neural networks to deep learningFrom neural networks to deep learning
From neural networks to deep learning
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forests
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
 
3 - Finding similar items
3 - Finding similar items3 - Finding similar items
3 - Finding similar items
 

Recently uploaded

在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 

Recently uploaded (20)

在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 

Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017