Chula Data	Science
Center	of	Excellence	in	Multi-Disciplinary
Big	Data	Analytics
Big	Data	and	Data	Science	
Learning	Path
Digital Transformation #แบ่งปัน ("sharing")
Asst. Prof. Natawut Nupairoj, Ph.D.
Head of Department
Dept. of Computer Engineering
Faculty of Engineering
Chulalongkorn University
natawut.n@chula.ac.th
@natawutn
http://natawutn.wordpress.com
http://www.slideshare.net/natawutnupairoj
The New Equation
Data Science = Sensors + Big Data + Data Analytics
Data	Analytics	Simplified
Descriptive
• “A.Natawut drinks about 1 cup of coffee a day”
Diagnostic
• “The number of cups that A.Natawut drinks depends on the number of meetings he has each day”
Predictive
• “A.Natawut has 2 meetings tomorrow, so it is very likely that he will drink 2 cups tomorrow”
Prescriptive
• “Tell the secretary to prepare one cup in the morning and one in the afternoon for A.Natawut”
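The four levels above can be sketched in a few lines of Python. This is only an illustration: the daily log is invented, and the function names `predict_cups` and `plan_coffee` are hypothetical. A simple least-squares line stands in for whatever model a real analysis would use.

```python
from statistics import mean

# Hypothetical daily log of (meetings, cups of coffee); invented for illustration.
log = [(0, 0), (1, 1), (2, 2), (1, 1), (3, 3), (1, 1), (0, 0)]
meetings = [m for m, _ in log]
cups = [c for _, c in log]

# Descriptive: what happened? Average cups per day.
avg_cups = mean(cups)

# Diagnostic: why? Fit cups = intercept + slope * meetings by least squares.
m_bar, c_bar = mean(meetings), mean(cups)
slope = (sum((m - m_bar) * (c - c_bar) for m, c in log)
         / sum((m - m_bar) ** 2 for m in meetings))
intercept = c_bar - slope * m_bar

# Predictive: what will happen? Expected cups for a given number of meetings.
def predict_cups(n_meetings):
    return intercept + slope * n_meetings

# Prescriptive: what should we do? Split the predicted cups across the day.
def plan_coffee(n_meetings):
    n = round(predict_cups(n_meetings))
    return {"morning": (n + 1) // 2, "afternoon": n // 2}

print(f"Descriptive: about {avg_cups:.1f} cups per day")
print(f"Diagnostic: {slope:.1f} extra cups per meeting")
print(f"Predictive: {predict_cups(2):.1f} cups expected for 2 meetings")
print(f"Prescriptive: {plan_coffee(2)}")
```

Each step builds on the previous one: the prescriptive plan depends on the prediction, which depends on the relationship found in the diagnostic step.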
Sensors	=	App	/	IoT /	Social	Network
Big	Data	=	Processing	Capabilities
Data	Analytics	=	Domain-Oriented	Machine	Learning
Introducing	FDA-Approved	
Ingestible	Sensors	in	Pills
http://www.forbes.com/sites/singularity/2012/08/09/no-more-skipping-your-medicine-fda-approves-first-digital-pill/
Case	study:	Predictive	Policing
Used by 60 cities in the US, e.g. Atlanta and LA
Source:	http://www.forbes.com/sites/ellenhuet/2015/02/11/predpol-predictive-policing
NHK	Documentary:	Disaster	Big	Data	- Key	to	recovery
Key	Question
“How many people still reside in each area?”
Challenges
• How	to	process	big	data?
• 122M	subscribers	+	2.5	years	of	data	=	200TB-300TB
• How	to	analyze	data?
• What is the definition of a “resident”?
• How to sample mobile subscribers correctly?
• How	can	we	understand	the	results?
• How	to	visualize	data?
• How to tell the story?
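A back-of-envelope check puts the quoted volume in perspective. The sketch below assumes decimal terabytes and 365-day years; both are assumptions, and only the subscriber count, the time span, and the 200TB-300TB range come from the slide.

```python
SUBSCRIBERS = 122_000_000   # from the slide
DAYS = 2.5 * 365            # 2.5 years, ignoring leap days (assumption)
TB = 10 ** 12               # decimal terabyte in bytes (assumption)

def bytes_per_subscriber_day(total_tb):
    """Average bytes stored per subscriber per day for a given total volume."""
    return total_tb * TB / (SUBSCRIBERS * DAYS)

low = bytes_per_subscriber_day(200)
high = bytes_per_subscriber_day(300)
print(f"{low:.0f} to {high:.0f} bytes per subscriber per day")
```

Under these assumptions the total works out to roughly 1.8 KB to 2.7 KB per subscriber per day: each record is tiny, and the challenge comes from the number of subscribers and days.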
“Data	Science	is	a	Team	Sport”	– DJ	Patil
[Venn diagram. Circles: Domain Knowledge, Math & Statistics, Computer Science. Overlaps: Statistical Research, Data Processing, Machine Learning. Center: Data Scientist.]
Data Scientist Skills in the Context of NHK Documentary
[Venn diagram. Circles: Domain Knowledge, Math & Statistics, Computer Science. Overlaps: Statistical Research, Data Processing, Machine Learning.]
• How to store 300TB of data?
• How to process 300TB effectively?
• How about data cleansing?
• How to visualize data?
• How to sample data correctly?
• How to turn geolocation into structured data?
• How to predict population accurately?
• How to define “residence”?
• How to distinguish local residents from workers?
• How to utilize these results?
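Two of the questions above, turning geolocation into structured data and separating residents from workers, can be made concrete with a small sketch. Everything here is an assumption for illustration and not the method used in the documentary: the ping format, the 0.01-degree grid, the night-hours rule, and the names `to_cell`, `home_cells` and `residents_per_cell` are all hypothetical.

```python
from collections import Counter, defaultdict

# Assumed definition: "residence" is where a subscriber is mostly seen at night.
NIGHT_HOURS = {22, 23, 0, 1, 2, 3, 4, 5}

def to_cell(lat, lon):
    """Structure a raw coordinate by snapping it to a 0.01-degree grid cell."""
    return (round(lat, 2), round(lon, 2))

def home_cells(pings):
    """Map each subscriber to the cell holding most of their night-time pings."""
    night = defaultdict(Counter)
    for subscriber, hour, lat, lon in pings:
        if hour in NIGHT_HOURS:
            night[subscriber][to_cell(lat, lon)] += 1
    return {s: cells.most_common(1)[0][0] for s, cells in night.items()}

def residents_per_cell(pings):
    """Count subscribers whose inferred home falls in each cell."""
    return Counter(home_cells(pings).values())

# Invented pings: (subscriber_id, hour_of_day, latitude, longitude).
pings = [
    ("a", 23, 38.2601, 140.8802),  # "a" is in the cell at night
    ("a", 2, 38.2604, 140.8799),
    ("a", 14, 38.3000, 140.9000),  # daytime, elsewhere
    ("b", 10, 38.2602, 140.8801),  # "b" is in the same cell only by day
    ("b", 1, 38.4001, 141.0002),   # "b" spends the night in another cell
]
print(residents_per_cell(pings))
```

In this toy data, subscriber "b" appears in the first cell only during the day, so the rule treats "b" as a worker there and counts only "a" as a resident. Changing `NIGHT_HOURS` changes the answer, which is why the definition of “residence” is a domain question and not just a technical one.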
Modern	Data	Science	Team
Source: http://www.slideshare.net/continuumio/why-open-data-science-matters-gartner-bi-analytics-summit-16
Understanding / Preparation / Modeling / Evaluation / Deployment
http://nirvacana.com/thoughts/becoming-a-data-scientist/
Most In-Demand Skills for Data Scientists in 2016
Source:	https://www.crowdflower.com/what-skills-should-data-scientists-have-in-2016/
Final	Thoughts
• A Good Data Scientist Communicates Effectively To Business Users
• A Good Data Scientist Knows Your Business
• A Good Data Scientist Understands Statistical Phenomena
• A Good Data Scientist Makes Efficient Predictions
• A Good Data Scientist Provides Production-Ready Solutions
• A Good Data Scientist Can Work On A Mass Scale
https://blog.dataiku.com/2013/11/10/the-six-core-skills-of-a-data-scientist
