What data scientists really do,
according to 50 data scientists
Hugo Bowne-Anderson
@hugobowne
A bit about me
➔ Hugo Bowne-Anderson, data scientist at DataCamp
◆ Undergrad in sciences/humanities (double math major)
◆ PhD in Pure Mathematics (UNSW, Sydney)
◆ Applied math research in cell biology (Yale University,
Max Planck Institute)
◆ Python curriculum engineer at DataCamp
◆ Host of DataFramed, the DataCamp podcast
➔ Data science at a high level
➔ The nitty gritty of data science
➔ Future data science
Today’s topics of discussion
➔ Data science at a high level
◆ Data science as hype
◆ What data scientists do (take one)
◆ The emergence of modern data science
◆ Data science and the decision function
➔ The nitty gritty of data science
➔ Future data science
Today’s topics of discussion
Where is data science in the Gartner Hype cycle?
Where is big data in the Gartner Hype cycle?
What Data Scientists Actually Do
➔ Build Venn Diagrams
➔ Have Imposter Syndrome
➔ Love Logistic Regression
Today’s data scientists do three things...
Venn Diagrams
I ran a Google image search for “Venn diagram” and look what I found:
Drew Conway on building data science teams
➔ “Data Science” diagram, NOT “Data
Scientist” diagram
Data science is ill-defined
➔ In terms of techniques, career trajectory, job titles
➔ “Do you think that imprecise ethics, no standards of practice,
and a lack of consistent vocabulary are not enough challenges
for us today?” -- Hilary Mason
➔ Brandon Rohrer (Facebook), in an article called “Imposter
Syndrome”:
Imposter syndrome
➔ Fernando Perez,
co-founder & co-lead of Project Jupyter
Logistic regression
➔ Simple, communicable, interpretable
➔ May not perform as well as extreme gradient
boosting or a CNN, but what are you optimizing
for?
➔ What is your (business) question?
➔ Claudia Perlich (Dstillery, Two Sigma), Chris
Volinsky (AT&T) + many more...
How data scientists think
Humans are horrible at statistics: thought experiment
➔ Police breathalyzers have a 5% false positive rate,
0% false negative rate.
➔ 1 in 1,000 drivers are driving drunk.
➔ Police stop a driver at random & the test is positive.
➔ What is the probability that the driver is actually drunk?
Michael Betancourt
We are horrible at statistics: thought experiment
➔ Imagine you do this 1,000 times.
➔ 999 people will NOT be drunk but, due to false
positives, ~50 will test positive.
➔ 1 person will be drunk and will test positive.
➔ Only 1 out of the ~51 people who test positive is actually drunk!
➔ <2% of the positive test results are true positives.
➔ This is known as the base-rate fallacy.
➔ It’s a data scientist’s job to correct these heuristics and
biases within organisations AND society.
Michael Betancourt
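The breathalyzer thought experiment above can be checked with Bayes' theorem. The sketch below is only an illustration: the function name `posterior_drunk` and its parameter names are made up for this example, and the numbers are the ones stated on the slides (1 in 1,000 drivers drunk, 5% false positive rate, 0% false negative rate).

```python
def posterior_drunk(prior, false_positive_rate, false_negative_rate=0.0):
    """Return P(drunk | positive test) using Bayes' theorem.

    prior: fraction of drivers who are drunk (the base rate).
    false_positive_rate: P(positive test | sober).
    false_negative_rate: P(negative test | drunk).
    """
    true_positive_rate = 1.0 - false_negative_rate
    # Total probability of a positive test: drunk drivers who test
    # positive plus sober drivers who test positive by mistake.
    p_positive = prior * true_positive_rate + (1.0 - prior) * false_positive_rate
    return prior * true_positive_rate / p_positive


# Numbers from the thought experiment on the slides.
p = posterior_drunk(prior=1 / 1000, false_positive_rate=0.05)
print(f"P(drunk | positive test) = {p:.4f}")  # about 0.0196, just under 2%
```

Raising `prior` in the call shows how strongly the answer depends on the base rate, which is exactly the quantity that intuition tends to ignore.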
Thinking and talking probabilistically
➔ FiveThirtyEight, highly reputable pollsters, gave Clinton a
71.4% chance of winning and Trump a 28.6% chance.
➔ I.e., Trump winning, according to this model, was more likely
than flipping two coins and getting two heads, which has a
25% chance.
Andrew Gelman
Allen Downey
Thinking and talking probabilistically
➔ Various pollsters predicted that Secretary Clinton would win
with probabilities ranging from 70% to 95%.
➔ It was common to speak of these predictions as though all
experts agreed that Clinton would win and simply had
different levels of certainty.
➔ That makes the mistake of interpreting the models as
deterministic, not probabilistic.
➔ Even if an event is only 10% likely, that’s about as likely as
flipping three coins and getting all heads, which we wouldn’t
find all too surprising.
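The coin-flip comparisons on these slides are easy to verify. This is a small sketch, not part of the original talk; the helper `all_heads` is a hypothetical name, and the forecast figures (28.6% and 10%) are taken from the slides.

```python
from fractions import Fraction


def all_heads(n_coins):
    """Probability that n fair coins all land heads."""
    return Fraction(1, 2) ** n_coins


two_heads = all_heads(2)    # 1/4  = 25%
three_heads = all_heads(3)  # 1/8  = 12.5%

trump_forecast = 0.286      # FiveThirtyEight figure quoted on the slide
unlikely_event = 0.10       # the "only 10% likely" event from the slide

# A 28.6% event is more likely than two heads in a row.
print(trump_forecast > two_heads)
# A 10% event is in the same ballpark as three heads in a row.
print(float(three_heads), unlikely_event)
```

Framing a forecast as a number of coin flips is one way to talk about a probabilistic model without making it sound deterministic.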
Data Science Emerged
➔ Google
➔ BuzzFeed
➔ LinkedIn
Data Science in Tech
1. Lay a solid data foundation to perform robust analytics
2. Use online experiments to achieve sustainable growth
3. Build machine-learning pipelines and personalized data
products to make better business decisions
Robert Chang
Machine Learning in tech
Airbnb: Customer Lifetime Value (Robert Chang)
Rogati’s AI Hierarchy of Needs
The AI Hierarchy of Needs, Monica Rogati
Data science and the decision function
➔ Data scientists answer business questions & are one of
several inputs into the decision-making process.
➔ Renee Teate (HelioCampus, Becoming a Data Scientist):
Data science and communication
➔ Communication skills (and data translation)
➔ “Which skill is more important for a data scientist: the ability
to use the most sophisticated deep learning models, or the
ability to make good PowerPoint slides?” (Jonathan Nolis, DS
consultant & leader, Seattle Area)
➔ As a data scientist, when do you schedule meetings with
decision makers?
When to schedule meetings?
➔ Extraneous factors in
judicial decisions
(Danziger et al., PNAS,
2011)
Data science and the decision function
➔ But wait, there’s more:
➔ “Within each session, unrepresented prisoners usually go last and are less likely
to be granted parole than prisoners with attorneys.” -- Overlooked factors in the
analysis of parole decisions (Weinshall-Margel et al., 2011, PNAS)
➔ Be wary of the narrative fallacy.
Data science embeddings in orgs
➔ Centre of Excellence: dedicated data science team
➔ Embedded model: data scientists embedded in every team
➔ Hybrid model (e.g. Airbnb, DataCamp)
➔ Data scientist as “decision intelligence” operative
➔ Data science at a high level
➔ The nitty gritty of data science
◆ What data scientists actually do
◆ Types of data science
➔ Future data science
Today’s topics of discussion
What Data Scientists Actually Do
Today’s data scientist...
What Data Scientists Actually Do
➔ Data Collection and Cleaning
➔ Building dashboards and reports
➔ Data visualization
➔ Building models (statistical inference & machine learning)
➔ Communicating results to stakeholders
➔ Business decisions are then made!
Today’s data scientist...
What Data Scientists Actually Do
This will change...
Data Scientists Becoming More Specialized
➔ Type A data scientist
◆ Is an analyst, traditional statistician
➔ Type B data scientist
◆ Is building machine-learning models
Generalist ---> Specialist
Emily Robinson
Coda: data science consultancy
➔ “For consulting particularly, it makes sense if you are a generalist...the
skillset of understanding what the client needs and figuring out how
to get to the point where you can offer a solution.” -- Vicki Boykis
Types Of Data Science
1. Business Intelligence
2. Machine Learning
3. Decision Science
Jonathan Nolis breaks data science into 3 components
Types Of Data Science
➔ Taking data the company already has
➔ Getting that data to the right people
➔ In the form of dashboards, reports, emails
1. Business Intelligence (descriptive analytics)
Types Of Data Science
➔ Put models continuously into production
➔ E.g. customer lifetime value (LTV) at Airbnb
2. Machine Learning (predictive analytics)
Types Of Data Science
➔ Take the insight discovered from the data science work
➔ Use it to help company decision making
➔ E.g. what do you do if your data science work tells you a
particular type of customer will churn?
3. Decision Science (prescriptive analytics)
➔ Data science at a high level
➔ The nitty gritty of data science
➔ Future data science
◆ What data scientists need to learn
◆ The future of data science
Today’s topics of discussion
Skills Data Scientists Should Have
➔ Deep learning
➔ AI
➔ Recurrent neural networks
Skills To Focus On
Skills Data Scientists Should Have
➔ Asking the right questions
➔ Turning business questions into data science questions (and
answers!)
➔ Learning on the fly
➔ Communicating well
➔ Explaining complex results to non-technical stakeholders
➔ Critical thinking and quantitative skills will remain in demand
Skills To Focus On
“Do you think that imprecise ethics, no
standards of practice, and a lack of
consistent vocabulary are not enough
challenges for us today?”
Hilary Mason
Data ethics or lack thereof
Ethics of Data Science
➔ We’re approaching a consensus that ethical standards need to come from
within data science itself, as well as from legislators, grassroots movements and
other stakeholders.
“We need to have that ethical
understanding, we need to have that
training, and we need to have
something akin to a Hippocratic Oath.”
GitHub Senior Machine Learning Data Scientist Omoju Miller
Ethics of Data Science
But do we need oaths, checklists and/or codes of conduct? This conversation is
just now heating up!
DrivenData has put together Deon, a command-line tool that allows you to easily
add an ethics checklist to your data science projects.
Ethics of Data Science
Part of this movement involves a re-emphasis on interpretability in models, as
opposed to black-box models.
Ethics of Data Science
➔ Used to predict recidivism rate
➔ Then used to inform parole decisions
COMPAS Recidivism Risk Score
Ethics of Data Science: Cathy O’Neil
➔ Old definition of DS: “A data savvy, quantitatively minded
coding literate problem solver.”
➔ New definition: “Data science doesn't just predict the future. It
causes the future.”
➔ Algorithmic audits, including a sensitivity analysis
➔ Ethical matrix: “rows are the stakeholders, the columns are the
concerns.”
Ethics and OSS
“If you’re using the open source data science stack, then
you’re benefiting from others’ work given freely to you. I’d
say it makes perfect sense to give back.” -- Eric Ma
Problems with practice
The garden of forking assumptions
➔ “One assumption people make a lot is that data is objective.” -- Lukas Vermeer,
Booking.com.
The garden of forking assumptions
➔ First impressions last and humans look for patterns
➔ When data scientists perform analyses, they are also prone to confirmation bias
➔ The same data is often used for inspiration AND validation
◆ “Split your damned data” -- Cassie Kozyrkov
➔ Humans use confidence and precision as a proxy for accuracy
➔ The industry lacks a consistent workflow (although see projects such as
Cookiecutter data science)
➔ The state of the industry is similar to medicine before randomized controlled trials.
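One way to act on “Split your damned data” is to set aside a validation portion before any exploration happens, so that the data used for inspiration is never reused for validation. The following is a minimal sketch using only the standard library; the function name `split_data`, the 20% holdout fraction and the placeholder rows are assumptions made for illustration, not something prescribed in the talk.

```python
import random


def split_data(rows, holdout_fraction=0.2, seed=0):
    """Shuffle rows and split them into (exploration, validation) lists.

    The validation part is meant to stay untouched until a hypothesis
    formed on the exploration part is ready to be tested.
    """
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = list(rows)          # copy so the caller's data is not mutated
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    validation = shuffled[:n_holdout]
    exploration = shuffled[n_holdout:]
    return exploration, validation


rows = list(range(100))            # placeholder records for the example
exploration, validation = split_data(rows)
print(len(exploration), len(validation))
```

Fixing the seed means the same rows land in the validation set on every run, so a peek at the exploration set today cannot leak into the validation set tomorrow.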
The looming credibility crisis
“Will we even have data science in
10 years? I remember a world where
we didn’t and it wouldn’t surprise me
if the title goes the way of
‘webmaster’.”
Hilary Mason
Where is data science in the Gartner Hype cycle?
What will the slope of enlightenment look like?
“Very few companies expect only professional writers to know how to write. So
why ask only professional data scientists to understand and analyze data, at least
at a basic level?”
-- Jonathan Cornelissen, CEO, DataCamp in HBR
Thank you!
Hugo Bowne-Anderson
@hugobowne
@datacamp

 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 

What data scientists really do, according to 50 data scientists

  • 1. What data scientists really do, according to 50 data scientists Hugo Bowne-Anderson @hugobowne
  • 2. A bit about me ➔ Hugo Bowne-Anderson, data scientist at DataCamp ◆ Undergrad in sciences/humanities (double math major) ◆ PhD in Pure Mathematics (UNSW, Sydney) ◆ Applied math research in cell biology (Yale University, Max Planck Institute) ◆ Python curriculum engineer at DataCamp ◆ Host of DataFramed, the DataCamp podcast
  • 3. ➔ Data science at a high level ➔ The nitty gritty of data science ➔ Future data science Today’s topics of discussion
  • 4. ➔ Data science at a high level ◆ Data science as hype ◆ What data scientists do (take one) ◆ The emergence of modern data science ◆ Data science and the decision function ➔ The nitty gritty of data science ➔ Future data science Today’s topics of discussion
  • 5. Where is data science in the Gartner Hype cycle?
  • 6. Where is big data in the Gartner Hype cycle?
  • 7. Where is data science in the Gartner Hype cycle?
  • 8. ➔ Data science at a high level ◆ Data science as hype ◆ What data scientists do (take one) ◆ The emergence of modern data science ◆ Data science and the decision function ➔ The nitty gritty of data science ➔ Future data science Today’s topics of discussion
  • 9. What Data Scientists Actually Do ➔ Build Venn Diagrams ➔ Have Imposter Syndrome ➔ Love Logistic Regression Today’s data scientists do three things...
  • 10. Venn Diagrams I google image searched “Venn diagram” and look what I found:
  • 13. Drew Conway on building data science teams ➔ “Data Science” diagram, NOT “Data Scientist” diagram
  • 14. Data science is ill-defined ➔ In terms of techniques, career trajectory, job titles ➔ “Do you think that imprecise ethics, no standards of practice, and a lack of consistent vocabulary are not enough challenges for us today?” -- Hilary Mason ➔ Brandon Rohrer (Facebook), in an article called “Imposter Syndrome”:
  • 15. Imposter syndrome ➔ Fernando Perez, co-founder & co-lead of Project Jupyter
  • 16. Logistic regression ➔ Simple, communicable, interpretable ➔ May not perform as well as extreme gradient boosting or a CNN, but what are you optimizing for? ➔ What is your (business) question? ➔ Claudia Perlich (Dstillery, Two Sigma), Chris Volinsky (AT&T) + many more...
  • 18. Humans are horrible at statistics: thought experiment ➔ Police breathalyzers have a 5% false positive rate and a 0% false negative rate. ➔ 1 in 1,000 drivers is driving drunk. ➔ Police stop a driver at random & the test is positive. ➔ What is the probability that the driver is actually drunk? Michael Betancourt
  • 19. We are horrible at statistics: thought experiment ➔ Imagine you do this 1,000 times. ➔ 999 people will NOT be drunk but, due to false positives, ~50 will test positive. ➔ 1 person will be drunk and will test positive. ➔ Only 1 out of 51 positive tests is a driver who is actually drunk! ➔ <2% of the positive test results are true positives. ➔ This is known as the base-rate fallacy. ➔ It’s a data scientist’s job to correct these heuristics and biases within organisations AND society. Michael Betancourt
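The breathalyzer arithmetic above can be checked with Bayes' rule. The following is a minimal sketch, not part of the original slides; the function name `posterior_drunk` and its parameter names are illustrative assumptions, and the numbers are the ones stated in the thought experiment.

```python
def posterior_drunk(prior, false_positive_rate, false_negative_rate=0.0):
    """Probability that a driver is drunk given a positive test (Bayes' rule)."""
    # Chance that a drunk driver tests positive.
    true_positive_rate = 1.0 - false_negative_rate
    # Overall chance of a positive test: true positives plus false positives.
    p_positive = true_positive_rate * prior + false_positive_rate * (1.0 - prior)
    return true_positive_rate * prior / p_positive


# Values from the slides: 1 in 1,000 drivers is drunk, 5% false positive rate,
# 0% false negative rate.
p = posterior_drunk(prior=1 / 1000, false_positive_rate=0.05)
print(f"P(drunk | positive test) = {p:.4f}")
```

The exact answer is slightly below the 1-in-51 estimate on the slide, because 5% of 999 is a little under 50. Either way, fewer than 2% of positive tests are true positives.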
  • 20. Thinking and talking probabilistically ➔ FiveThirtyEight, highly reputable pollsters, gave Clinton a 71.4% chance of winning and Trump a 28.6% chance. ➔ I.e., Trump winning, according to this model, was more likely than flipping two coins and getting two heads, which has a 25% chance. Andrew Gelman Allen Downey
  • 21. Thinking and talking probabilistically ➔ Various pollsters predicted that Secretary Clinton would win with probabilities ranging from 70% to 95%. ➔ It was common to speak of these predictions as though all experts agreed that Clinton would win and simply had different levels of certainty. ➔ That makes the mistake of interpreting the models as deterministic, not probabilistic. ➔ Even if an event is only 10% likely, that’s about as likely as flipping three coins and getting all heads, which we wouldn’t find all too surprising.
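The coin-flip comparisons on these slides are easy to verify. This is a small sketch under the assumption of fair, independent coins; the helper name `prob_all_heads` is hypothetical and not from the talk.

```python
def prob_all_heads(n_coins):
    """Probability that n fair, independent coins all land heads."""
    return 0.5 ** n_coins


two_heads = prob_all_heads(2)    # benchmark used for the 28.6% forecast
three_heads = prob_all_heads(3)  # benchmark used for a 10% event

# Forecast probabilities quoted on the slides.
trump_forecast = 0.286
unlikely_event = 0.10

print(f"Two heads: {two_heads:.3f} vs forecast {trump_forecast:.3f}")
print(f"Three heads: {three_heads:.3f} vs event {unlikely_event:.3f}")
```

Two heads in a row is somewhat less likely than the 28.6% forecast, and three heads in a row is in the same range as a 10% event, which is the point of the comparison: neither outcome should be surprising.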
  • 23. ➔ Data science at a high level ◆ Data science as hype ◆ What data scientists do (take one) ◆ The emergence of modern data science ◆ Data science and the decision function ➔ The nitty gritty of data science ➔ Future data science Today’s topics of discussion
  • 24. Data Science Emerged ➔ Google ➔ BuzzFeed ➔ LinkedIn
  • 25. Data Science in Tech 1. Lay a solid data foundation to perform robust analytics 2. Use online experiments to achieve sustainable growth 3. Build machine-learning pipelines and personalized data products to make better business decisions Robert Chang
  • 26. Machine Learning in tech Airbnb: Customer Lifetime Value (Robert Chang)
  • 27. Rogati’s AI Hierarchy of Needs The AI Hierarchy of Needs, Monica Rogati
  • 28. ➔ Data science at a high level ◆ Data science as hype ◆ What data scientists do (take one) ◆ The emergence of modern data science ◆ Data science and the decision function ➔ The nitty gritty of data science ➔ Future data science Today’s topics of discussion
  • 29. Data science and the decision function ➔ Data scientists answer business questions & are one of several inputs into the decision-making process. ➔ Renee Teate (Heliocampus, Becoming a Data Scientist):
  • 30. Data science and communication ➔ Communication skills (and data translation) ➔ “Which skill is more important for a data scientist: the ability to use the most sophisticated deep learning models, or the ability to make good PowerPoint slides?” (Jonathan Nolis, DS consultant & leader, Seattle Area) ➔ As a data scientist, when do you schedule meetings with decision makers?
  • 31. When to schedule meetings? ➔ Extraneous factors in judicial decisions (Danziger et al., PNAS, 2011)
  • 32. Data science and the decision function ➔ But wait, there’s more: ➔ “Within each session, unrepresented prisoners usually go last and are less likely to be granted parole than prisoners with attorneys.” -- Overlooked factors in the analysis of parole decisions (Weinshall-Margel et al., 2011, PNAS) ➔ Be wary of the narrative fallacy.
  • 33. Data science embeddings in orgs ➔ Centre of Excellence: dedicated data science team ➔ Embedded model: data scientists embedded in every team ➔ Hybrid model (e.g. Airbnb, DataCamp) ➔ Data scientist as “decision intelligence” operative
  • 35. ➔ Data science at a high level ➔ The nitty gritty of data science ◆ What data scientists actually do ◆ Types of data science ➔ Future data science Today’s topics of discussion
  • 36. What Data Scientists Actually Do Today’s data scientist...
  • 37. What Data Scientists Actually Do ➔ Data collection and cleaning ➔ Building dashboards and reports ➔ Data visualization ➔ Building models (statistical inference & machine learning) ➔ Communicating results to stakeholders ➔ Business decisions are then made! Today’s data scientist...
  • 38. What Data Scientists Actually Do This will change...
  • 39. Data Scientists Becoming More Specialized ➔ Type A data scientist ◆ Is an analyst, traditional statistician ➔ Type B data scientist ◆ Is building machine-learning models Generalist ---> Specialist Emily Robinson
  • 40. Coda: data science consultancy ➔ “For consulting particularly, it makes sense if you are a generalist...the skillset of understanding what the client needs and figuring out how to get to the point where you can offer a solution.” -- Vicki Boykis
  • 41. ➔ Data science at a high level ➔ The nitty gritty of data science ◆ What data scientists actually do ◆ Types of data science ➔ Future data science Today’s topics of discussion
  • 42. Types Of Data Science 1. Business Intelligence 2. Machine Learning 3. Decision Science Jonathan Nolis breaks data science into 3 components
  • 43. Types Of Data Science ➔ Taking data the company already has ➔ Getting that data to the right people ➔ In the form of dashboards, reports, emails 1. Business Intelligence (descriptive analytics)
  • 44. Types Of Data Science 1.
  • 45. Types Of Data Science ➔ Put models continuously into production ➔ E.g. LTV at Airbnb 2. Machine Learning (predictive analytics)
  • 46. Types Of Data Science ➔ Take the insight discovered from the data science work ➔ Use it to help company decision making ➔ E.g. what do you do if your data science work tells you a particular type of customer will churn? 3. Decision Making (prescriptive analytics)
  • 47. ➔ Data science at a high level ➔ The nitty gritty of data science ➔ Future data science ◆ What data scientists need to learn ◆ The future of data science Today’s topics of discussion
  • 48. Skills Data Scientists Should Have ➔ Deep learning ➔ AI ➔ Recurrent neural networks Skills To Focus On
  • 49. Skills Data Scientists Should Have ➔ Asking the right questions ➔ Turning business questions into data science questions (and answers!) ➔ Learning on the fly ➔ Communicating well ➔ Explaining complex results to non-technical stakeholders ➔ Critical thinking and quantitative skills will remain in demand Skills To Focus On
  • 50. ➔ Data science at a high level ➔ The nitty gritty of data science ➔ Future data science ◆ What data scientists need to learn ◆ The future of data science Today’s topics of discussion
  • 51. “Do you think that imprecise ethics, no standards of practice, and a lack of consistent vocabulary are not enough challenges for us today?” Hilary Mason
  • 52. Data ethics or lack thereof
  • 53. Ethics of Data Science ➔ We’re approaching a consensus that ethical standards need to come from within data science itself, as well as from legislators, grassroots movements and other stakeholders.
  • 54. “We need to have that ethical understanding, we need to have that training, and we need to have something akin to a Hippocratic Oath.” GitHub Senior Machine Learning Data Scientist Omoju Miller
  • 55. Ethics of Data Science But do we need oaths, checklists and/or codes of conduct? This conversation is just now heating up! DrivenData has put together Deon, a command line tool, that allows you to easily add an ethics checklist to your data science projects.
  • 56. Ethics of Data Science Part of this movement involves a re-emphasis on interpretability in models, as opposed to black-box models.
  • 57. Ethics of Data Science ➔ Used to predict recidivism rate ➔ Then used to inform parole decisions COMPAS Recidivism Risk Score
  • 58. Ethics of Data Science: Cathy O’Neil ➔ Old definition of DS: “A data savvy, quantitatively minded coding literate problem solver.” ➔ New definition: “Data science doesn't just predict the future. It causes the future.” ➔ Algorithmic audits, including a sensitivity analysis ➔ Ethical matrix: “rows are the stakeholders, the columns are the concerns.”
  • 60. Ethics and OSS “If you’re using the open source data science stack, then you’re benefiting from others’ work given freely to you. I’d say it makes perfect sense to give back.” -- Eric Ma
  • 62. The garden of forking assumptions ➔ “One assumption people make a lot is that data is objective.” -- Lukas Vermeer, Booking.com.
  • 63. The garden of forking assumptions ➔ First impressions last and humans look for patterns ➔ When data scientists perform analyses, they are also prone to confirmation bias ➔ The same data is often used for inspiration AND validation ◆ “Split your damned data” -- Cassie Kozyrkov ➔ Humans use confidence and precision as a proxy for accuracy ➔ The industry lacks a consistent workflow (although see projects such as Cookiecutter Data Science) ➔ The state of the industry is similar to medicine before randomized controlled trials.
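One way to act on “Split your damned data” is to set aside a validation portion before any exploration begins. The sketch below uses only the Python standard library; the function name `split_rows`, the 80/20 ratio, and the fixed seed are illustrative assumptions, not recommendations from the talk.

```python
import random


def split_rows(rows, validation_fraction=0.2, seed=0):
    """Shuffle rows once and return (exploration, validation) subsets."""
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical dataset of 100 row identifiers.
rows = list(range(100))
exploration, validation = split_rows(rows)

# Look for patterns in `exploration` only; touch `validation` once, to test them.
print(len(exploration), len(validation))
```

Keeping the two subsets disjoint means a pattern found during exploration is tested on data that played no part in inspiring it.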
  • 65. “Will we even have data science in 10 years? I remember a world where we didn’t and it wouldn’t surprise me if the title goes the way of ‘webmaster’.” Hilary Mason
  • 66. Where is data science in the Gartner Hype cycle?
  • 67. What will the slope of enlightenment look like? “Very few companies expect only professional writers to know how to write. So why ask only professional data scientists to understand and analyze data, at least at a basic level?” -- Jonathan Cornelissen, CEO, DataCamp in HBR