10 tips from a young data scientist
Nuno Carneiro (nc@nunocarneiro.com)
www.linkedin.com/in/nunocarneiro
17/04/2018
Agenda
1. Data is here to stay
2. There is abundant learning material online
3. There are many types of data scientists
4. It usually looks like: Understand - Data prep - Analyze - Deploy/Recommend
5. Data prep is 80% of the work
6. Start with the end
7. Projects with quick feedback time are easier
8. Be thorough with your analysis
9. Build insights and provide recommendations
10. Do data science for good
+ Extra: Description of a case study
About me
10 tips from a young data scientist
1. Data is here to stay
The amount of available data is growing exponentially. Making use of it will
be crucial for any successful company.
2. There is abundant learning material online
Learn Python:
● Codecademy
Learn Machine Learning:
● Coursera ML Course
● Coursera Deep Learning Course
● Caltech ML course
Learn Python + data science:
● Dataquest
Work on real projects/compete with others:
● Kaggle
● Numerai
● DrivenData
For anything else: Google + Stack Overflow
3. There are many types of data scientists
Data Science fields:
● Classification
● Recommender Systems
● Time-series analysis
● Regression
● Forecasting
● NLP
● ...
Data Science profiles:
● Business analyst
● Data engineer
● Quant
● Consultant
● ...
Backgrounds:
● Statistics
● Business
● Software Engineering
● ...
Data scientists perform very different tasks depending on the problem they are trying to solve. Being a data scientist can mean travelling the world as a consultant or spending the whole day in a research office.
4. It usually looks like this:
Understand - Data prep - Analyze - Recommend or Deploy
When you present recommendations or deploy a model, decision makers will look for an intuitive explanation of your conclusions. Help them make sense of the results by explaining your methodology.
5. Data preparation is 80% of the work
Data preparation is a non-linear process that usually takes most of the time in a data science project. Don’t underestimate the effort it requires. It typically involves:
● Extract data
● Transfer files
● Combine files
● Understand variables
● Create new variables
● Treat errors
● Define target
● Document data treatment
● Select usable data
● Generate datasets for analysis
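As an illustration of several of these steps (combine files, treat errors, create new variables, document data treatment), here is a minimal sketch using only the Python standard library. The file contents, column names, error rules (negative amounts treated as entry errors, unparseable dates excluded) and the reference date are all hypothetical assumptions chosen for the example; in a real project these rules would be agreed with the people who own the data.

```python
import csv
import io
from datetime import date

# Two hypothetical raw extracts, e.g. files transferred from different systems.
# Column names and values are illustrative assumptions, not a real dataset.
CUSTOMERS_CSV = """customer_id,signup_date,country
1,2017-01-05,PT
2,2017-02-11,
3,not a date,ES
"""

ORDERS_CSV = """customer_id,amount
1,20.5
1,35.0
2,-10
3,12.0
"""

# Assumed reference date used to compute tenure (hypothetical).
REFERENCE_DATE = date(2018, 4, 17)


def load(text):
    """Parse CSV text into a list of dicts (stand-in for 'extract data')."""
    return list(csv.DictReader(io.StringIO(text)))


def prepare(customers, orders, log):
    """Combine both sources, treat errors and create new variables.

    Every treatment decision is appended to `log` so it can be documented.
    """
    # Treat errors: negative order amounts are assumed to be entry errors.
    totals = {}
    for row in orders:
        amount = float(row["amount"])
        if amount < 0:
            log.append(f"dropped negative order for customer {row['customer_id']}")
            continue
        totals[row["customer_id"]] = totals.get(row["customer_id"], 0.0) + amount

    dataset = []
    for row in customers:
        # Treat errors: a row with an unparseable date is excluded.
        try:
            signup = date.fromisoformat(row["signup_date"])
        except ValueError:
            log.append(f"excluded customer {row['customer_id']}: invalid signup_date")
            continue
        # Missing categorical values get an explicit placeholder instead of "".
        if not row["country"]:
            log.append(f"customer {row['customer_id']}: missing country set to UNKNOWN")
        dataset.append({
            "customer_id": row["customer_id"],
            "country": row["country"] or "UNKNOWN",
            # New variable combined from the orders file: total spend.
            "total_spend": totals.get(row["customer_id"], 0.0),
            # New variable: days between signup and the reference date.
            "tenure_days": (REFERENCE_DATE - signup).days,
        })
    return dataset


if __name__ == "__main__":
    treatment_log = []
    rows = prepare(load(CUSTOMERS_CSV), load(ORDERS_CSV), treatment_log)
    for r in rows:
        print(r)
    for entry in treatment_log:
        print("-", entry)
```

Keeping the treatment log next to the data is one simple way to make the "Document data treatment" step happen as a side effect of the preparation itself rather than as an afterthought.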
6. Start with the end
The first step in any data science project is to understand how the outcome will be used.
● What are you trying to learn or predict?
● How is the outcome going to be used?
● Which performance measures will be used as success criteria?
● How will the outcome have an impact on the business or on people’s lives? For example, you can optimize the prevention of customer churn, but this could still drain the business of its customers.
In classification problems, for example, defining the target is one of the first and most important steps. It requires a good understanding of all the questions above.
When you start any data science project, the first task should be to
understand the business and how the project outcome will be used.
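A minimal sketch of what a target definition can look like for the churn example, assuming a hypothetical rule: a customer who bought something before a cutoff date is labelled as churned if they make no purchase in the following 90 days. The function name, the (customer_id, purchase_date) data format and the 90-day window are illustrative assumptions; changing any of them changes what a model learns to predict, which is why the questions above need answers first.

```python
from datetime import date, timedelta


def churn_labels(purchases, cutoff, horizon_days=90):
    """Label each customer active before `cutoff` as churned (1) or retained (0).

    purchases: iterable of (customer_id, purchase_date) pairs.
    Hypothetical definition: churned = no purchase in the `horizon_days`
    days starting at `cutoff`.
    """
    end = cutoff + timedelta(days=horizon_days)
    active_before = set()
    bought_after = set()
    for customer_id, day in purchases:
        if day < cutoff:
            active_before.add(customer_id)
        elif day < end:
            bought_after.add(customer_id)
    # Customers with no history before the cutoff are left out: "will this
    # customer leave?" only makes sense for existing customers.
    return {c: int(c not in bought_after) for c in sorted(active_before)}


purchases = [
    ("a", date(2018, 1, 10)),
    ("a", date(2018, 2, 1)),
    ("b", date(2018, 1, 20)),
    ("c", date(2018, 2, 15)),  # new customer: no history before the cutoff
]
print(churn_labels(purchases, cutoff=date(2018, 2, 1)))  # {'a': 0, 'b': 1}
```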
7. Projects with quick feedback time are easier
Target definition - Prediction - Action - Feedback
If your project has fast feedback cycles, it is easier to get an advantage out of machine learning.
● Example of a long feedback cycle: credit default prediction (30-year mortgages…);
● Example of a quick feedback cycle: daily sales forecasting.
While the project is under development, it should also get feedback from external stakeholders as fast as possible:
● Iterate fast;
● Interact with your client (external or internal) at every step of the way.
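To make the idea of a quick feedback cycle concrete, the sketch below simulates daily sales forecasting with a deliberately naive baseline (tomorrow's forecast equals today's sales; this baseline and the sales figures are assumptions for illustration only). Each new day of actual sales immediately yields an error measurement, whereas a model for 30-year mortgage defaults would wait years for the same signal.

```python
def walk_forward_errors(daily_sales):
    """Forecast each next day with a naive baseline and record the error.

    The absolute error for a day is known as soon as that day's actual
    sales arrive, which is what makes the feedback cycle short.
    """
    errors = []
    for today, tomorrow in zip(daily_sales, daily_sales[1:]):
        forecast = today  # naive baseline: tomorrow looks like today
        errors.append(abs(tomorrow - forecast))
    return errors


sales = [100, 110, 105, 120]  # hypothetical daily sales
errors = walk_forward_errors(sales)
print(errors)  # [10, 5, 15]
print(sum(errors) / len(errors))  # 10.0
```

Any candidate model can be compared against such a baseline day by day, so a poor change is noticed within days rather than years.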
8. Be thorough with your analysis
A small mistake in the code can easily lead to wrong conclusions.
More often, though, the main source of errors is a small mistake in reasoning, such as a wrongly defined target or the inclusion of future information (data that would not be available at the time of the prediction).
If you are doing something that few people understand, your most important currency is trust. Be very thorough with your analysis to avoid mistakes that could cost you that trust.
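One common way future information sneaks in is building features from events recorded after the moment of prediction. Below is a minimal sketch of a guard against this, assuming a hypothetical event format of (timestamp, name, value) tuples: only events strictly before the prediction time are used, and later events overwrite earlier ones.

```python
from datetime import datetime


def features_at(events, prediction_time):
    """Build features using only events observed strictly before `prediction_time`.

    events: list of (timestamp, name, value) tuples (hypothetical format).
    Events at or after the prediction time are future information and are
    ignored; using them would make offline results look better than what
    the model can achieve in production.
    """
    features = {}
    for ts, name, value in sorted(events, key=lambda e: e[0]):
        if ts < prediction_time:
            features[name] = value  # most recent past value wins
    return features


events = [
    (datetime(2018, 1, 1), "balance", 100),
    (datetime(2018, 1, 5), "balance", 80),
    (datetime(2018, 1, 10), "status", "defaulted"),  # happens after the prediction
]
print(features_at(events, datetime(2018, 1, 6)))  # {'balance': 80}
```

In this example the "status" event is essentially the outcome itself; if it leaked into the features, the model would look nearly perfect in testing and fail in practice.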
9. Build insights and provide recommendations
Many data scientists get lost in analysis and fail to draw conclusions from their work.
It is very rare to find a data scientist who combines business understanding with analytical skills and mastery of the data science tools (coding, ML, etc.).
“So, what?” - Always ask yourself what the conclusion of your work is. Whether you run statistical tests, plot data in charts, or analyze prediction backtest results, always aim for two things: 1) building insights; 2) providing recommendations.
10. Do data science for good
Use your skills for good. Even with good intentions, watch out for unintended negative side effects of your work.
Be careful when creating tools that will not only predict but also influence future events.
Sometimes, small decisions we take when coding can have a big impact on other people’s lives.
PS: Check out the platform mentioned in Tip 2, DrivenData. Their slogan speaks for itself: “Data science competitions to save the world.”
Extra: Case study
Case Study: Credit Card Fraud Detection in E-Commerce
http://dx.doi.org/10.1016/j.dss.2017.01.002
Thank you
Nuno Carneiro (nc@nunocarneiro.com)
www.linkedin.com/in/nunocarneiro
