So You Want To Be A Data Scientist?
What It Means To Be A Data Scientist
About:Me
Mohd Izhar Firdaus Ismail
- Current: Solution Architect @ ABYRES Enterprise
Technologies Sdn Bhd
- Open Source Activist & (self-proclaimed) Hacker, Open Data
Advocate, Fedora Ambassador, Data Architect, Data Engineer,
Consultant, Python Programmer, Analyst, Trainer, and a bunch of
other hats ;-)
- Contributing to Open Source projects for over 8 years
- Over 6 years building systems related to data, content,
information and knowledge management
- http://linkedin.com/in/kagesenshi
- izhar@abyres.net / kagesenshi.87@gmail.com
The People I Work For
● Open Source Technology
Company
– Specializes in Cloud, Big Data &
Enterprise Application
Development
– Red Hat & Hortonworks Partner
● IT Consulting & Professional
Services around Open Source
Software
– Design, development,
implementation and training
services
– Consulting practice around
leveraging Open Source
technologies and implementing
Big Data projects
● The largest organized mafia of
pure play open source geeks in
Malaysia ;-)
Before I Start
Some people call me a data scientist,
But I don't consider myself one (yet)
(( it's a personal integrity thing – Machine Learning & Stats are not (yet) my strong points ))
But I do work quite a bit with data: designing applications,
infrastructure, algorithms, processes and pipelines for big data
workloads – from data acquisition to visualization
Who is A Data Scientist?
"Data scientists are involved with gathering data,
massaging it into a tractable form, making it tell its
story, and presenting that story to others."
- Mike Loukides, VP, O’Reilly Media.
"A data scientist is someone who can obtain, scrub,
explore, model and interpret data, blending hacking,
statistics and machine learning. Data scientists not only are
adept at working with data, but appreciate data itself as a
first-class product."
- Hilary Mason, Data Scientist at Accel, Scientist
Emeritus at bitly, co-founder of HackNY.
What's With The Superhuman
Requirements?
Domain Knowledge & Soft Skills
● Knowledge to find what matters
– Knowing the statistics does not mean knowing
what the results signify for a
business
– Business rules, terminologies, problem solving
techniques, scientific theories & formulas
– Identifying actionable information
● Problem solving & Hacker mindset
– New & creative ways to find, acquire,
transform, manipulate, mash up, and use
data
– Possibly unconventional uses of the same
result
– Knowing what data is needed, and how to get
it, to solve a particular business problem
Math & Statistics
● People use your output for
decision making – wrong numbers
can lead to bad decisions
– Lies, damned lies, and statistics
● Machine Learning
– Predict future values
– Analyze patterns in structured and
unstructured data
– Automated decision support
systems
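A tiny illustration of the "wrong numbers, bad decisions" point, as a sketch only: the order values below are made up, and the last one is assumed to be a data-entry error. The same data gives two very different "averages" depending on which statistic is reported.

```python
from statistics import mean, median

# Hypothetical order values; the last one is assumed to be a typo
# (an extra digit) that nobody noticed before reporting.
orders = [120, 135, 110, 128, 142, 1310]

def summarize(values):
    """Return mean and median side by side so they can be compared."""
    return {"mean": mean(values), "median": median(values)}

summary = summarize(orders)
print(summary)
```

Reporting the mean alone would suggest a typical order is more than twice as large as the median says it is, which is the kind of number that ends up in a bad decision.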
Programming & Database
● Programming
– Calculating a few thousand rows in Excel might be
okay, but dealing with distributed processing needs
some skill
● Querying distributed data – you don't want a query that
is stuck on a single core of a cluster with hundreds of nodes
– Simple visualizations can be done with drag-drop
builders, complex visualizations will require you to get
your hands dirty
– Advanced decision system capabilities can only be
implemented through some sort of rule programming
– Develop data pipelines, both batch and streaming
– Develop data collection, scraping, machine learning &
artificial intelligence software
● Database
– Ingesting data from various types of sources,
managing data format, data storage, governance
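The "don't get stuck on a single core" point can be sketched roughly as below. This is a toy, not how any particular cluster engine works: threads stand in for worker nodes, and the function names and sample partitions are made up for illustration. The idea is that each partition produces a small partial result, and only those partials are combined.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum_count(partition):
    """Runs independently on one partition and returns a small partial result."""
    return sum(partition), len(partition)

def distributed_mean(partitions, workers=4):
    """Combine per-partition (sum, count) pairs into one global mean."""
    # Threads are only a stand-in here; on a real cluster each partition
    # would be processed on the node that holds it.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(distributed_mean(partitions))
```

Note that the partials are (sum, count) pairs rather than per-partition means: averaging the means of unevenly sized partitions would give a different, wrong answer.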
Communication & Visualization
● Spreading information and discoveries
– Presenting data in a form that
non-scientists can understand
– Knowing how to explain to business users
why a result matters, how it can be
used to benefit the business,
organization, society
● Identifying patterns through visual
analysis
– Some insights might not be obvious when
presented in columns and rows
– Knowing how to visualize information so
as to make hidden patterns more obvious
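As a rough sketch of why rows and columns can hide a pattern, the snippet below draws plain-text bars for some made-up weekly sales figures. The data and the `text_bar_chart` helper are hypothetical; a real chart library would do this better, but even crude bars make the Friday spike and the weekend drop jump out.

```python
def text_bar_chart(series, width=40):
    """Render {label: value} as horizontal bars scaled to the largest value."""
    largest = max(series.values())
    lines = []
    for label, value in series.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<4}{bar} {value}")
    return "\n".join(lines)

# Hypothetical figures, for illustration only.
weekly_sales = {"Mon": 120, "Tue": 118, "Wed": 121, "Thu": 119,
                "Fri": 240, "Sat": 60, "Sun": 58}
print(text_bar_chart(weekly_sales))
```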
Data Science
VS
Data Engineering
The Key Differences
● Data Science
– Problem solving through
strategies around data
– Hindsight, Insight,
Foresight
– Understanding of patterns,
behaviors, etc
– Automated Data Driven
Decision Making
● Data Engineering
– Ingestion pipelines
– Data integration
– Data enrichment
– Data cleansing
– Data preparation
– Data pipeline
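To make the data engineering side a bit more concrete, here is a minimal sketch of ingestion, cleansing and enrichment chained as Python generators. Everything in it is assumed for illustration: the comma-separated input, the field names and the region lookup table are invented, and a real pipeline would sit on top of proper ingestion and storage tools.

```python
def ingest(raw_lines):
    """Ingestion: split raw comma-separated lines into fields."""
    for line in raw_lines:
        yield line.split(",")

def cleanse(records):
    """Cleansing: trim whitespace, drop rows whose amount is missing or not numeric."""
    for name, amount in records:
        name, amount = name.strip(), amount.strip()
        try:
            yield name.lower(), float(amount)
        except ValueError:
            continue  # skip rows that cannot be repaired

def enrich(records, regions):
    """Enrichment: attach a region looked up from a reference table."""
    for name, amount in records:
        yield {"customer": name, "amount": amount,
               "region": regions.get(name, "unknown")}

# Hypothetical raw input and reference data.
raw = ["Alice , 10.5", "bob,", "Carol,7", "dave,abc"]
regions = {"alice": "north", "carol": "south"}
prepared = list(enrich(cleanse(ingest(raw)), regions))
print(prepared)
```

The prepared records are what the data science side would then pick up for analysis.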
Hadoop?
Hadoop is for Big Data
● Core of "Big Data"
– Techniques, technologies &
strategies, to handle ingestion,
storage, and processing of high
velocity, high volume, high
variety datasets
– Historical data, and not just
current state
– Transaction + interaction +
observation = Big Data
Data Science Needs Big Data
"The reaction of one man could be forecast by no known mathematics;
the reaction of a billion is something else again"
– Asimov
● Without rich historical data, analysis and development become
more challenging
– Patterns start to show themselves in rich historical data
– Models that are accurate with small data might start to fall apart when
more parameters/data are introduced
● Start collecting data today! You never know when you will need it,
and when you do, the historical data will be there for you to mine
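The Asimov quote can be played with in a few lines of simulation. This is only a sketch under made-up assumptions: a "true" conversion rate of 0.30 that we pretend not to know, and simulated yes/no observations. Estimates from a handful of observations tend to be unreliable, while estimates from many observations tend to settle close to the true rate.

```python
import random

def estimate_rate(true_rate, n, rng):
    """Estimate a rate from n simulated yes/no observations."""
    hits = sum(1 for _ in range(n) if rng.random() < true_rate)
    return hits / n

rng = random.Random(42)  # fixed seed so runs are repeatable
TRUE_RATE = 0.30         # assumed 'real' rate; unknown in practice

for n in (10, 1_000, 100_000):
    est = estimate_rate(TRUE_RATE, n, rng)
    print(f"n={n:>7}: estimate={est:.3f}, error={abs(est - TRUE_RATE):.3f}")
```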
Getting Started With Data Science
Some tips for beginners
Attn.
● Courses, trainings, documents, tools, etc will definitely
help you to establish your foundations and basics in
data science
– but, like any technical field, what is important is your ability to
mash everything up and apply it to solve problems
● Anybody can learn how to draw, anybody can draw, but
not everybody can be an artist.
Domain & Business
● Learn more about your industry (or your target industry)
● Learn what makes it tick, which numbers matter, and
what scientific knowledge surrounds the domain
● Businesses exist for the key purpose of making profit,
which usually translates to: increase sales & reduce
cost
– Find out how to help your organization's business by collecting
and analyzing data to produce visualizations that will help the
organization make more profit
Math & Statistics
● Find those old textbooks you had from university, and
study them again ;-)
● Learn, understand and start to apply how statistics can
be used for estimation and prediction.
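As a small taste of statistics used for estimation and prediction, the sketch below fits a straight line by ordinary least squares and uses it to forecast the next point. The monthly sales figures are invented and deliberately perfectly linear so the arithmetic is easy to follow; real data will be noisier.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical data: sales grow by 2 units every month.
months = [1, 2, 3, 4, 5]
sales = [10, 12, 14, 16, 18]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # predicted sales for month 6
print(slope, intercept, forecast)
```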
Programming & Information System
● If you don't know programming yet, start picking up a language
– I suggest Python as it has a strong background in scientific computing
communities, and was designed by a mathematician – Guido van Rossum
– Though I'm a biased parseltongue :P
– Books:
● Packt's Practical Data Analysis
● How to Think Like A Computer Scientist
● SQL is important
– Pretty much the most mature method for declaring data queries
● Pick up Big Data technologies to help you handle massive datasets
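On the SQL point, a minimal sketch using Python's built-in `sqlite3` module shows what "declaring" a query means: you state what result you want and the engine decides how to compute it. The `sales` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("south", 250.0), ("north", 50.0), ("south", 25.0)],
)

# Declare the result wanted: total sales per region, largest first.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()
print(rows)
conn.close()
```

The same style of query carries over to many of the SQL-on-Big-Data engines, which is a good reason to learn it early.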
One more thing
http://pysiphae.rtfd.org
Thanks
Contact:
Izhar Firdaus (KageSenshi)
izhar@abyres.net / kagesenshi.87@gmail.com
+60172792765
