Data Science on a Budget: 
Maximizing Insight and Impact 
Nicholas Arcolano, Ph.D. 
Senior Data Scientist 
@arcolano 
Photo by giuseppemilo / CC BY
A little background… 
• Spent 10 years at MIT Lincoln Laboratory 
working in ballistic missile defense and cyber 
security research 
• Areas of interest: statistics, machine learning, 
parallel computing, “big data” 
• Realized these things had been collectively 
re-branded as “data science” 
• Started calling myself a “data scientist” and 
joined a start-up 
Nicholas Arcolano – Data Science on a Budget – November 2014
What does a data scientist do? 
• Something that happens at the intersection of 
statistics, machine learning, and computer science 
• Usually involves data (typically lots of it) 
• Actually, this isn’t the most critical question to be 
worrying about 
A better question… 
• What does a data team do? 
• Basically, two things: 
1. Use data to help the rest of the company understand what our 
users are doing 
2. Help the rest of the company use this information to improve our 
product and our business 
The Company
• Started in 2008
• Based in Boston
• About 50 people
• 4-person data team
Our Product
• RunKeeper app for GPS and manual tracking of running, walking, cycling, other activities
• Long-term fitness goals, training plans, and performance insights
• iOS, Android, web, 3rd party devices
The Data
• 37 million users
• 450 million fitness activities
• 200 billion GPS points
• 17 billion interactions and events
[Diagram of company teams: PRODUCT, SYSTEMS, DATA, MARKETING, EXECUTIVE, BUSINESS DEVELOPMENT, USER EXPERIENCE, QUALITY ASSURANCE, SUPPORT]
“DATA SCIENCE”
• analytics and business intelligence
• modeling and forecasting
• data systems and archiving
• user research and testing
• data-driven features
• data stories and visualizations
How can we accomplish all this, quickly and 
with a small team? 
It’s hard… but here are some steps to 
making it easier 
Step 1: Communicate. A lot. 
• You have a lot to learn about the rest of the company 
– Every part of the company has its own blend of tools, systems, processes, 
environments 
– Every part has data it understands and cares about 
– Every part knows things that affect the data that you won’t see— 
user interviews, support feedback, product bugs, system failures 
• You also have a lot to teach people 
– What data we have 
– What it can—and can’t—do 
– Empower people to “think with data” 
Step 1: Communicate. A lot. 
• Be patient—sometimes you 
have to say the same things 
many times 
• You may be the only one 
looking at certain data—if you 
see something, say something! 
Setting expectations
• Anticipated impact of data exploration: things our data team will discover are exciting new things and things we already knew
• Actual impact of data exploration: things our data team will discover are bugs, missing data, and bad data; things we already knew; and exciting new things
Step 2: Move quickly but carefully. 
“Wisely and slow. They stumble that 
run fast.” 
– Friar Laurence, from 
Shakespeare’s Romeo and Juliet 
Step 2: Move quickly but carefully. 
• On moving fast… 
– Data science can work well in an agile framework 
– Make assumptions, but understand them 
– Don’t be afraid to provide caveats 
• On being cautious… 
– Bad analysis is worse than no analysis 
– Make time for data QA 
– Use common sense: if it seems too good (or bad) to be true, it usually is 
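One way to make time for data QA is to script a few sanity checks before any analysis runs. The sketch below is only an illustration: the field names (distance_km, duration_s) and the 50 km/h speed threshold are hypothetical assumptions, not the actual RunKeeper schema or rules.

```python
def qa_report(activities, max_speed_kmh=50.0):
    """Count records that are missing fields or look physically implausible."""
    report = {"total": 0, "missing": 0, "non_positive": 0, "too_fast": 0}
    for record in activities:
        report["total"] += 1
        distance = record.get("distance_km")  # hypothetical field name
        duration = record.get("duration_s")   # hypothetical field name
        if distance is None or duration is None:
            report["missing"] += 1
        elif distance <= 0 or duration <= 0:
            report["non_positive"] += 1
        elif distance / (duration / 3600) > max_speed_kmh:
            # Average speed above the (assumed) threshold: too good to be true.
            report["too_fast"] += 1
    return report

# Made-up example records, one per failure mode plus one plausible run.
activities = [
    {"distance_km": 5.0, "duration_s": 1800},   # plausible 10 km/h run
    {"distance_km": None, "duration_s": 1200},  # missing distance
    {"distance_km": 0.0, "duration_s": 900},    # zero distance
    {"distance_km": 42.0, "duration_s": 600},   # 252 km/h average speed
]
report = qa_report(activities)
print(report)
```

Counting each kind of problem, rather than silently dropping bad rows, keeps the caveats visible when results are shared.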
Step 3: Keep it simple. 
• Go for lots of small, quick wins 
• Learn and iterate 
• Resist the urge to show everyone 
how smart you are by doing 
something super complicated 
Step 3: Keep it simple. 
• Do the “stupid thing” first 
– It helps build understanding 
– It helps uncover issues with the data 
– It may turn out that you’re not even solving the right problem 
– It may actually work pretty well 
• When in doubt, favor a simpler method that you understand better 
over a more complex one 
– Easier to implement 
– Easier to debug 
– Easier to explain to others 
You don’t have to use all the data 
• Sometimes, using all the data is the right thing to do: 
SELECT COUNT(userid) FROM rk_user; 
• Sometimes, though, you can solve your problem entirely with a 
small data set 
• Benefits 
– Easier computation and data wrangling means faster results 
– “Curse of dimensionality” is a real thing 
– Mitigate bad assumptions (lack of stationarity, different product versions, 
changing environments, regional and seasonal effects, etc.) 
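As a rough illustration of solving a problem with a small data set, the sketch below estimates a proportion from a simple random sample and attaches a normal-approximation 95% confidence interval. The population, the 30% "active" rate, and the sample size are made-up assumptions, not RunKeeper figures.

```python
import math
import random

def estimate_proportion(population, sample_size, seed=0):
    """Estimate the share of 1s in a 0/1 population from a simple random sample.

    Returns (estimate, margin), where margin is the half-width of a
    normal-approximation 95% confidence interval.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    p_hat = sum(sample) / sample_size
    margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, margin

# Hypothetical population: 1,000,000 users, 30% flagged as "active".
population = [1] * 300_000 + [0] * 700_000
p_hat, margin = estimate_proportion(population, sample_size=10_000)
print(f"estimate: {p_hat:.3f} +/- {margin:.3f}")
```

Here a 1% sample already pins the rate down to within about a percentage point, which is often enough to make the decision at hand.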
Step 4: Use the right tools. 
• In any given scenario, the “right 
tool” is one of the following: 
– The tool you already know and are 
comfortable with 
– Something you don’t know but 
suspect would work really well 
– Something that doesn’t exist yet 
• It’s up to you to figure out which 
one it is 
Languages and technologies I used 
during 10 years at my last job 
Languages and technologies I’ve used 
during 1 year at my current job 
Step 4: Use the right tools. 
• Be comfortable using a variety of tools 
• Make time to learn new ones 
• Build your own tools for repeatable 
analysis—once you know it’s worth it 
• Open source: take advantage of the hard 
work of others, but make sure you 
understand what you’re using 
• Give back 
Step 4: Use the right tools. 
• Many of the same principles apply to your “analytical toolkit” 
• Try to learn when to stick with a well-worn approach and when 
to try something new 
• Be skeptical of the conventional wisdom 
– Just because a metric or analytical approach is common doesn’t mean it’s 
the right thing to do for your situation 
– Typical example: A/B testing 
Hypothesis testing (“A/B testing”) 
[Diagram: USERS are split into GROUP A (“Control”, 90%, standard flow) and GROUP B (“Treatment”, 10%, experimental flow); the # of successes and failures from each group feed a test statistic, which leads to the DECISION (“reject/accept null hypothesis”)]
“Null hypothesis”: treatment has no effect 
“Alternate hypothesis”: treatment has some effect
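The flow above can be sketched with a two-proportion z-test, one common choice of test statistic for success/failure counts (the slides do not say which test was used). The counts below are hypothetical, chosen only to mirror a 90%/10% split.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference in success rates between two groups.

    Returns (z, p_value) using the pooled-proportion standard error.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: P(|Z| > |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: control converts at 10.0%, treatment at 11.0%.
z, p_value = two_proportion_z_test(success_a=9_000, n_a=90_000,
                                   success_b=1_100, n_b=10_000)
alpha = 0.05  # assumed significance level
decision = "reject" if p_value < alpha else "fail to reject"
print(f"z = {z:.2f}, p = {p_value:.4f}: {decision} the null hypothesis")
```

The decision is phrased as "fail to reject" rather than "accept" because a non-significant result does not show that the treatment has no effect.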
Thoughts about A/B testing 
• A/B testing is hard to do well 
– Need lots of data and good estimates of baseline rates to have a chance at significance 
– Need lots of data infrastructure to do it quickly on a large scale 
– Need to manage variables such as multiple testing, changes in product and environment, 
interactions between tests, subjects 
– Need to make sure tests align with high-level vision and learning goals 
• An A/B test can help with one very specific decision, but typically will not... 
– Help you understand how multiple different factors interact 
– Predict long-term reactions (the “taste test” phenomenon)—need longitudinal study 
– Always give you the answer you want—results may be null or inconclusive 
– Tell you anything of any value whatsoever if you did it wrong 
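To see why lots of data and a good baseline estimate matter, the sketch below applies the standard normal-approximation sample-size formula for comparing two proportions. It assumes equal group sizes, a two-sided test, and the illustrative baseline and lift values shown; none of these numbers come from the talk, and an uneven split such as 90%/10% would need more users in total.

```python
import math
from statistics import NormalDist

def sample_size_per_group(baseline, lift, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect an absolute lift.

    Uses the normal-approximation formula for a two-sided test of two
    proportions with equal group sizes.
    """
    p1 = baseline
    p2 = baseline + lift
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / lift ** 2)

# Hypothetical scenario: 10% baseline rate, hoping to detect +1 point.
n_one_point = sample_size_per_group(baseline=0.10, lift=0.01)
# Halving the detectable lift roughly quadruples the required sample.
n_half_point = sample_size_per_group(baseline=0.10, lift=0.005)
print(n_one_point, n_half_point)
```

Because the required sample grows with the inverse square of the lift, small expected effects quickly push a test beyond what a modest user base can support.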
Thoughts about A/B testing 
Even when performed “correctly”, an A/B 
test may not tell you what you think it does
Step 5: Have faith and have fun 
• Don’t try to understand everything all at once—keep looking from multiple 
angles and trust that more understanding will come in time 
Step 5: Have faith and have fun 
• Working with data from millions of engaged users is awesome 
• Helping your company have a real impact on their lives is even 
more awesome 
• All the tools are available to do truly amazing things 
• Make sure everyone knows how much you love the data, and 
they will grow to love it too 
Things we’re still working on 
• Synthesizing knowledge and communicating results 
• Data-driven products and features 
• Analytics and instrumentation 
• Giving back (open source, blogging, tutorials, talks) 
Thanks for listening! Questions? 
nicholas.arcolano@runkeeper.com 
http://arcolano.com 
@arcolano 
http://www.runkeeper.com

 

Data Science on a Budget: Maximizing Insight and Impact - Nicholas Arcolano PhD

  • 1. Data Science on a Budget: Maximizing Insight and Impact Nicholas Arcolano, Ph.D. Senior Data Scientist @arcolano Photo by giuseppemilo / CC BY
  • 2. A little background… • Spent 10 years at MIT Lincoln Laboratory working in ballistic missile defense and cyber security research • Areas of interest: statistics, machine learning, parallel computing, “big data” • Realized these things had been collectively re-branded as “data science” • Started calling myself a “data scientist” and joined a start-up Nicholas Arcolano – Data Science on a Budget – November 2014
  • 3. What does a data scientist do?
  • 4. What does a data scientist do? • Something that happens at the intersection of statistics, machine learning, and computer science • Usually involves data (typically lots of it) • Actually, this isn’t the most critical question to be worrying about
  • 5. A better question… • What does a data team do? • Basically, two things: 1. Use data to help the rest of the company understand what our users are doing 2. Help the rest of the company use this information to improve our product and our business
  • 6. The Company: • Started in 2008 • Based in Boston • About 50 people • 4-person data team. The Data: • 37 million users • 450 million fitness activities • 200 billion GPS points • 17 billion interactions and events. Our Product: • RunKeeper app for GPS and manual tracking of running, walking, cycling, other activities • Long-term fitness goals, training plans, and performance insights • iOS, Android, web, 3rd party devices
  • 7. [Diagram of company teams: PRODUCT, SYSTEMS, DATA, MARKETING, EXECUTIVE, BUSINESS DEVELOPMENT, USER EXPERIENCE, QUALITY ASSURANCE, SUPPORT. Diagram label: “DATA SCIENCE”.] • analytics and business intelligence • modeling and forecasting • data systems and archiving • user research and testing • data-driven features • data stories and visualizations
  • 8. How can we accomplish all this, quickly and with a small team? It’s hard… but here are some steps to making it easier
  • 9. Step 1: Communicate. A lot.
  • 10. Step 1: Communicate. A lot.
  • 11. Step 1: Communicate. A lot. • You have a lot to learn about the rest of the company – Every part of the company has its own blend of tools, systems, processes, environments – Every part has data it understands and cares about – Every part knows things that affect the data that you won’t see: user interviews, support feedback, product bugs, system failures • You also have a lot to teach people – What data we have – What it can (and can’t) do – Empower people to “think with data”
  • 12. Step 1: Communicate. A lot. • Be patient: sometimes you have to say the same things many times • You may be the only one looking at certain data, so if you see something, say something!
  • 13. Setting expectations [Figure: two charts titled “Things our data team will discover”. Anticipated impact of data exploration: exciting new things; things we already knew. Actual impact of data exploration: bugs, missing data, and bad data; things we already knew; exciting new things.]
  • 14. Step 2: Move quickly but carefully. “Wisely and slow. They stumble that run fast.” – Friar Laurence, from Shakespeare’s Romeo and Juliet
  • 15. Step 2: Move quickly but carefully. • On moving fast… – Data science can work well in an agile framework – Make assumptions, but understand them – Don’t be afraid to provide caveats • On being cautious… – Bad analysis is worse than no analysis – Make time for data QA – Use common sense: if it seems too good (or bad) to be true, it usually is
  • 16. Step 3: Keep it simple. • Go for lots of small, quick wins • Learn and iterate • Resist the urge to show everyone how smart you are by doing something super complicated
  • 17. Step 3: Keep it simple. • Do the “stupid thing” first – It helps build understanding – It helps uncover issues with the data – It may turn out that you’re not even solving the right problem – It may actually work pretty well • When in doubt, favor a simpler method that you understand better over a more complex one – Easier to implement – Easier to debug – Easier to explain to others
  • 18. You don’t have to use all the data • Sometimes, using all the data is the right thing to do: SELECT COUNT(userid) FROM rk_user; • Sometimes, though, you can solve your problem entirely with a small data set • Benefits – Easier computation and data wrangling means faster results – “Curse of dimensionality” is a real thing – Mitigate bad assumptions (lack of stationarity, different product versions, changing environments, regional and seasonal effects, etc.)
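The point of slide 18 can be illustrated with a rough sketch: a simple random sample often estimates a rate about as well as a full scan does. Everything below is an assumption made for illustration, not something from the talk: the data is synthetic, and the names `estimate_rate` and `all_users` are hypothetical.

```python
import math
import random


def estimate_rate(flags, sample_size, seed=0):
    """Estimate the share of True values from a simple random sample.

    Returns the estimate and a normal-approximation 95% margin of error.
    """
    rng = random.Random(seed)
    sample = rng.sample(flags, sample_size)  # sampling without replacement
    p_hat = sum(sample) / sample_size
    margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, margin


# Synthetic stand-in data (an assumption): one boolean per user,
# True for roughly 30% of them.
data_rng = random.Random(42)
all_users = [data_rng.random() < 0.30 for _ in range(200_000)]

full_rate = sum(all_users) / len(all_users)              # uses every row
sample_rate, margin = estimate_rate(all_users, 10_000)   # uses 5% of the rows

print(f"full data: {full_rate:.4f}")
print(f"sample:    {sample_rate:.4f} +/- {margin:.4f}")
```

With 10,000 sampled rows the margin of error is already under one percentage point, which is often enough to make a decision. Exact counts, such as the `SELECT COUNT(userid)` query on the slide, are the case where the full table is still the right tool.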
  • 19. Step 4: Use the right tools. • In any given scenario, the “right tool” is one of the following: – The tool you already know and are comfortable with – Something you don’t know but suspect would work really well – Something that doesn’t exist yet • It’s up to you to figure out which one it is
  • 20. Step 4: Use the right tools. [Figure: languages and technologies I used during 10 years at my last job, next to languages and technologies I’ve used during 1 year at my current job.] • Be comfortable using a variety of tools • Make time to learn new ones • Build your own tools for repeatable analysis, once you know it’s worth it • Open source: take advantage of the hard work of others, but make sure you understand what you’re using • Give back
  • 21. Step 4: Use the right tools. • Many of the same principles apply to your “analytical toolkit” • Try to learn when to stick with a well-worn approach and when to try something new • Be skeptical of the conventional wisdom – Just because a metric or analytical approach is common doesn’t mean it’s the right thing to do for your situation – Typical example: A/B testing
  • 22. Hypothesis testing (“A/B testing”) [Diagram: USERS are split into GROUP A “Control” (90%, standard flow) and GROUP B “Treatment” (10%, experimental flow). Each group produces a # of successes, failures; these feed a test statistic, which leads to a DECISION: “reject/accept null hypothesis”.] “Null hypothesis”: treatment has no effect. “Alternate hypothesis”: treatment has some effect.
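The flow on slide 22 (success counts per group, then a test statistic, then a decision) could be sketched as a pooled two-proportion z-test. The talk does not say which test statistic was used, so the choice of test is an assumption, as are the function name `two_proportion_z_test` and the made-up counts for a 90%/10% split.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided pooled z-test for a difference between two success rates."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Under the null hypothesis both groups share one rate, so pool them.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 90,000 control users, 10,000 treatment users.
z, p_value = two_proportion_z_test(
    success_a=4_500, n_a=90_000,   # control: 5.0% success rate
    success_b=560, n_b=10_000,     # treatment: 5.6% success rate
)

alpha = 0.05  # significance level, chosen before looking at the data
decision = "reject" if p_value < alpha else "fail to reject"
print(f"z = {z:.2f}, p = {p_value:.4f}: {decision} the null hypothesis")
```

The slide labels the decision “reject/accept”; the sketch prints “fail to reject” because a non-significant result does not show that the treatment has no effect.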
  • 23. Thoughts about A/B testing • A/B testing is hard to do well – Need lots of data and good estimates of baseline rates to have a chance at significance – Need lots of data infrastructure to do it quickly on a large scale – Need to manage variables such as multiple testing, changes in product and environment, interactions between tests, subjects – Need to make sure tests align with high-level vision and learning goals • An A/B test can help with one very specific decision, but typically will not... – Help you understand how multiple different factors interact – Predict long-term reactions (the “taste test” phenomenon); this needs a longitudinal study – Always give you the answer you want: results may be null or inconclusive – Tell you anything of any value whatsoever if you did it wrong
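To put a number on “need lots of data and good estimates of baseline rates”, here is a rough sketch of the standard sample-size approximation for comparing two proportions. It assumes two equally sized groups (unlike the 90%/10% split on slide 22), a two-sided test, and conventional defaults of 5% significance and 80% power. The function name `users_per_group` and the example rates are hypothetical.

```python
import math
from statistics import NormalDist


def users_per_group(baseline, lift, alpha=0.05, power=0.80):
    """Approximate users needed in EACH group to detect an absolute lift.

    Assumes equal group sizes and a two-sided two-proportion z-test.
    """
    p1 = baseline
    p2 = baseline + lift
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / lift ** 2)


# Hypothetical scenario: a 5% baseline conversion rate.
n_small_lift = users_per_group(baseline=0.05, lift=0.005)  # detect 5.0% -> 5.5%
n_large_lift = users_per_group(baseline=0.05, lift=0.010)  # detect 5.0% -> 6.0%

print(f"users per group for a 0.5 point lift: {n_small_lift}")
print(f"users per group for a 1.0 point lift: {n_large_lift}")
```

Halving the lift you want to detect roughly quadruples the required sample, and the answer depends directly on the baseline rate, which is why a poor baseline estimate can leave a test underpowered.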
  • 24. Thoughts about A/B testing Even when performed “correctly”, an A/B test may not tell you what you think it does
  • 25. Step 5: Have faith and have fun • Don’t try to understand everything all at once; keep looking from multiple angles and trust that more understanding will come in time
  • 26. Step 5: Have faith and have fun • Working with data from millions of engaged users is awesome • Helping your company have a real impact on their lives is even more awesome • All the tools are available to do truly amazing things • Make sure everyone knows how much you love the data, and they will grow to love it too
  • 27. Things we’re still working on • Synthesizing knowledge and communicating results • Data-driven products and features • Analytics and instrumentation • Giving back (open source, blogging, tutorials, talks)
  • 28. Thanks for listening! Questions? nicholas.arcolano@runkeeper.com http://arcolano.com @arcolano http://www.runkeeper.com