“I've come up with a set of rules that describe our
reactions to technologies:
1. Anything that is in the world when you’re born is
normal and ordinary and is just a natural part of the
way the world works.
2. Anything that's invented between when you’re fifteen
and thirty-five is new and exciting and revolutionary
and you can probably get a career in it.
3. Anything invented after you're thirty-five is against
the natural order of things.”
― Douglas Adams, The Salmon of Doubt
Should you hire a data
science team?
Bertil Hatt
Head of Data science
RentalCars.com
Data science at
RentalCars.com
Bertil Hatt …Nick Burgoyne
Raise your hand if
your service has:
• Meaningful automated decision or recommendation
• Running in production without individual human control
• Self-learning and updating on a schedule
Keep your hand up
if your own team is:
• Dedicated to building models, not reports
• Has three full time employees or more
• Including at least two full-time modellers
Should you hire
a data scientist?
What to do before you do
This talk is not about any
particular experience
Contents
• Hiring is cool but not always appropriate
• How to tell if it is the right time to grow
• If we have time: Examples of data not ready
The hype is real
And it is a good thing
Image: (c) Olga Tarkovskiy.
The hype is blurring the lines
And it is less great
What do we mean
by ‘data scientist’
Analyst: uses data to answer
ad hoc questions
uses data streams to build alerts,
automated reports & dashboards
Statistician: builds statistical models
builds self-updating models
used to automate decisions
ML researcher: imagines new types of models
Trophy data scientist
Expensive to hire, happy to churn
That is the real problem
Are you ready?
A simple 12-point score
The Joel test
Joel Spolsky’s test for
code-based projects:
• 12 Yes/No questions
• Easy to tell
• Easy to start
https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-steps-to-better-code/
12-point test for readiness to
implement data science
• Do your strategy, effort and product team
structure make sense? Are they aligned?
• Is your data easy to get at and up to date?
Are your analysts building the right datasets?
• Would a model find a place to sit?
Are your product &
goals clearly defined?
Is the company goal broken down into team metrics in a
mutually exclusive, completely exhaustive (MECE) way?
Can a new team member describe the current effort
& its impact on metrics at the end of their first week?
Are the metrics known by everyone, i.e. reported, audited
& challenged? Do developers know other teams' key metrics?
…
Is your data easy to get?
Do you have an up-to-date metrics dictionary with a
detailed explanation of each number, incl. edge cases?
Are the reference analytical tables audited daily?
Totals add up with production, trends make sense, etc.
Are simple data requests answered within an hour? i.e.
can a question that fits in two lines be answered with a single query?
…
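One way to picture the daily audit ("totals add up with production") is a small comparison of per-day totals. This is only a sketch: the function name, the dictionaries of daily totals and the dates are all made up for illustration, and a real audit would read both sides from the databases.

```python
def audit_daily_totals(production_totals, analytics_totals, tolerance=0.0):
    """Return the days where the analytics total drifts from production.

    Both arguments map a day (string) to a total; a day missing on either
    side is reported with None. All names here are hypothetical.
    """
    mismatches = {}
    for day in sorted(set(production_totals) | set(analytics_totals)):
        prod = production_totals.get(day)
        ana = analytics_totals.get(day)
        if prod is None or ana is None or abs(prod - ana) > tolerance:
            mismatches[day] = (prod, ana)
    return mismatches


# Illustrative numbers only: one day drifts, one day is missing from analytics.
production = {"2019-03-01": 120, "2019-03-02": 98, "2019-03-03": 110}
analytics = {"2019-03-01": 120, "2019-03-02": 95}
problems = audit_daily_totals(production, analytics)
```

An empty result means the two sides agree; anything else is worth an alert before analysts build on the table.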
Extract, Transform, Load (diagram)
Sources: production database (read replica), more services
Analytics database:
• Normalised tables: references, events
• Denormalised individual tables:
  - Per item with all information
  - UTC, local time, DoW, ∂
  - One per concept: order, campaign, action per day
• Aggregated tables:
  - Pageviews to sessions
  - Customer first & last
  - Daily metrics (all)
Outputs: inferences, machine learning training datasets, reports, automated audits & alerts
Is your data up-to-date?
Do you use version control on the ETL?
Do you monitor your analysts' queries for bad patterns?
Do you discuss improvements at a weekly review?
Are there fewer than three days between a product release
& the subsequent ETL update (tested, reviewed, pushed)?
Is the team handling the ETL aware of the product
schedule, including refactoring & service splits?
…
ML in production (diagram)
Components: client, production server (JS), ML server (Python), trained model, production db
Flows: features, prediction; train, test & copy
ML in production (diagram)
Components: client, production server (JS), ML server (Python), production db
Flows: features, prediction
recommendation_placeholder.py

import os
import pymssql

def get_smart_recommendation(user_input):
    # Placeholder naive model: ignores the input and returns five products.
    # Connection: the analytics server address is read from the environment.
    analytics_server = os.environ['ANALYTICS_URL']
    conn = pymssql.connect(analytics_server)
    cursor = conn.cursor()
    cursor.execute('SELECT TOP 5 product_id FROM sales;')
    five_suggested_product_list = []
    for row in cursor:
        # Each row is a tuple; keep only the product_id column.
        five_suggested_product_list.append(row[0])
    conn.close()
    return five_suggested_product_list
Are you logging your
(placeholder) model?
Do you have estimates (from past A/B tests) of
how model quality impacts your metrics?
Do you log the user input, suggestion, user action & model
version? Are those logs processed & audited?
Do you have a placeholder machine learning environment
with a training server fed from the analytics pipeline?
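The logging question can be made concrete with one record per suggestion. The sketch below assumes a JSON-lines file as the log; the field names, the function names and the "placeholder-v0" version label are hypothetical, not taken from the talk.

```python
import json
from datetime import datetime, timezone


def build_log_record(user_input, suggestion, user_action, model_version):
    """Bundle the four things the checklist asks to log, plus a UTC timestamp."""
    return {
        "logged_at_utc": datetime.now(timezone.utc).isoformat(),
        "user_input": user_input,
        "suggestion": suggestion,
        "user_action": user_action,
        "model_version": model_version,
    }


def append_log(record, path):
    """Append one record as a JSON line so the file stays easy to audit."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")


# Hypothetical example: the placeholder suggested three products, the user took one.
record = build_log_record(
    user_input={"search": "compact car"},
    suggestion=[101, 102, 103],
    user_action={"clicked": 102},
    model_version="placeholder-v0",
)
# append_log(record, "recommendation_log.jsonl")  # uncomment to write to disk
```

Logging the model version even for the placeholder means later models can be compared against it on the same footing.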
Questions?
1. Overall goal split MECE in team KPI?
2. Can newbies describe current impact?
3. Are the metrics known by everyone?
4. Up-to-date metrics dictionary?
5. Are analytical tables audited daily?
6. Are queries answered in an hour?
7. Version control on ETL?
8. Monitor analysts' queries & feedback?
9. Less than 3 days from launch to ETL update?
10. Estimate impact of model quality?
11. Do you log input, suggestion & user action?
12. Placeholder ML environment?
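Like the Joel test, the score is simply the number of "yes" answers out of 12. A minimal sketch of that tally follows; the shortened question labels and the example answers are illustrative, and the talk does not state a pass mark.

```python
CHECKLIST = [
    "Overall goal split MECE in team KPI?",
    "Can newbies describe current impact?",
    "Are the metrics known by everyone?",
    "Up-to-date metrics dictionary?",
    "Are analytical tables audited daily?",
    "Are queries answered in an hour?",
    "Version control on ETL?",
    "Monitor analysts' queries & feedback?",
    "Less than 3 days from launch to ETL update?",
    "Estimate impact of model quality?",
    "Log input, suggestion & user action?",
    "Placeholder ML environment?",
]


def readiness_score(answers):
    """Count the yes answers; expects one boolean per checklist question."""
    if len(answers) != len(CHECKLIST):
        raise ValueError("expected one answer per checklist question")
    return sum(1 for answer in answers if answer)


def open_points(answers):
    """List the questions still answered 'no', i.e. what to fix first."""
    return [q for q, answer in zip(CHECKLIST, answers) if not answer]


# Made-up example: product and data questions pass, the rest do not yet.
answers = [True] * 6 + [False] * 6
score = readiness_score(answers)
todo = open_points(answers)
```

The list of open points is arguably more useful than the number itself, since it says what to work on before hiring.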
• Hiring is cool but not always appropriate
• How to tell if it is the right time to grow
  • Product & goals clear?
  • Data clear & updated?
  • Model structure ready?
• Examples of product & data not ready
  • Growth & fake users
  • Customer service actions
  • Local dates & D1 retention
• Hiring is cool but not always appropriate
• How to tell if it is the right time to grow
• Examples of product & data not ready
Four cases where
lack of clarity
led to bad models
Real cases of real models
gone wrong in production
Growth accounting
• Break down active users into inflow and outflow
• Late identification of fake accounts
• Predicting user retention:
• fake accounts are more active
• and more likely to be gone
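The breakdown of active users can be sketched with sets of user ids for two consecutive periods. The function, the category names (new, resurrected, retained, churned) and the sample ids are assumptions for illustration; the talk does not give its exact definitions.

```python
def growth_accounting(previous_active, current_active, seen_before_previous):
    """Split the change in active users between two periods into categories.

    previous_active / current_active: ids active in each period.
    seen_before_previous: ids active at some point before the previous period.
    """
    previous_active = set(previous_active)
    current_active = set(current_active)
    incoming = current_active - previous_active
    return {
        "new": len(incoming - set(seen_before_previous)),
        "resurrected": len(incoming & set(seen_before_previous)),
        "retained": len(current_active & previous_active),
        "churned": len(previous_active - current_active),
    }


# Hypothetical ids: "a" leaves, "d" is brand new, "e" comes back after a gap.
previous = {"a", "b", "c"}
current = {"b", "c", "d", "e"}
counts = growth_accounting(previous, current, seen_before_previous={"e"})

# Accounting identity: current = previous + new + resurrected - churned.
balanced = len(current) == (
    len(previous) + counts["new"] + counts["resurrected"] - counts["churned"]
)
```

Fake accounts identified late would sit in "new" one period and in "churned" the next, which is how they can distort a retention model trained on these counts.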
Customer service
• All agent-based operations need to be listed and logged
• Without an exhaustive list, user state might change
• To recommend a common response, you’d miss a step
Clear hierarchy of types
for recommendation
• Player/user actions, partners need a meaningful structure
• Classify food for recommendation:
• Restaurant chain: name or financial data?
• Food type: is Fusion a type of Asian?
• How to cold-start new restaurant, new customer?
Dates (local time zones)
• User daily rhythm is heavily dependent on local time
• Store timestamps in UTC, but also store the local time of day of the action
• D1 retention difference: Canada vs. New Zealand
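To see why the same UTC timestamp needs local fields, the sketch below converts one event for two users. It assumes fixed UTC offsets (-5 and +12 hours, ignoring daylight saving) as stand-ins for a Canadian and a New Zealand user; a real pipeline would use proper time zone names, and the function and field names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def with_local_fields(utc_timestamp, utc_offset_hours):
    """Keep the UTC timestamp and add the local date, hour and day of week."""
    local = utc_timestamp.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return {
        "timestamp_utc": utc_timestamp.isoformat(),
        "local_date": local.date().isoformat(),
        "local_hour": local.hour,
        "local_day_of_week": local.isoweekday(),  # Monday=1 ... Sunday=7
    }


# One event, stored once in UTC, seen from two illustrative fixed offsets.
event = datetime(2019, 6, 1, 3, 30, tzinfo=timezone.utc)
canada = with_local_fields(event, -5)
new_zealand = with_local_fields(event, 12)
```

Here the event falls late on Friday evening for one user and on Saturday afternoon for the other, so a "next day" retention window cut on UTC dates treats the two users differently.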

Are you ready for Data science? A 12 point test

  • 1. “I've come up with a set of rules that describe our reactions to technologies: 1. Anything that is in the world when you’re born is normal and ordinary and is just a natural part of the way the world works. 2. Anything that's invented between when you’re fifteen and thirty-five is new and exciting and revolutionary and you can probably get a career in it. 3. Anything invented after you're thirty-five is against the natural order of things.” ― Douglas Adams, The Salmon of Doubt
  • 2. Should you hire a data science team? Bertil Hatt Head of Data science RentalCars.com
  • 4. Raise your hand if your service has: • Meaningful automated decision or recommendation • Running in production without individual human control • Self-learning and updating on a schedule
  • 5. Keep your hand up if your own team is: • Dedicated to building models, not reports • Has three full time employees or more • Including at least two full-time modellers
  • 6. Should you hire a data scientist? What to do before you do
  • 7. This talk is not about any particular experience
  • 8. Contents • Hiring is cool but not always appropriate • How to tell if it is the right time to grow • If we have time: Examples of data not ready
  • 9. The hype is real And it is a good thing Image: (c) Olga Tarkovskiy.
  • 10. The hype is blurring the lines And that is less great
  • 11. What do we mean by ‘data scientist’? Analyst: uses data to answer ad hoc questions uses data streams to build alerts, automated reports & dashboards Statistician: builds statistical models builds self-updating models used to automate decisions ML researcher: imagines new types of models
  • 12. Trophy data scientist Expensive to hire, happy to churn That is the real problem
  • 13. Are you ready? A simple 12-point score
  • 14. The Joel test Joel Spolsky’s test for code-based projects: • 12 Yes/No • Easy to tell • Easy to start https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-steps-to-better-code/
  • 15. 12-point test for readiness to implement data science • Are your strategy, effort and product team structure making sense? Are they aligned? • Is your data easy to get at and up to date? Are your analysts building the right datasets? • Would a model find a place to sit in?
  • 16. Are your product & goals clearly defined? Is the company goal broken down into team metrics in a mutually exclusive, completely exhaustive (MECE) way? Can a new team member describe the current effort & its impact on metrics at the end of their first week? Are the metrics known by everyone, i.e. reported, audited & challenged? Do developers know other teams' key metrics? …
  • 17. Is your data easy to get? Do you have an up-to-date metrics dictionary with a detailed explanation of each number, including edge cases? Are the reference analytical tables audited daily? Totals add up with production, trends make sense, etc. Are simple data requests answered within an hour? i.e. can a question that fits in two lines be answered with a single query? …
  • 18. Extract, Transform, Load: Production database, Read replica, Analytics database, More services. Aggregated tables: - Pageviews to sessions - Customer first & last - Daily metrics (all). Denormalised individual tables: - Per item with all information - UTC, local time, DoW, ∂ - One per concept: order, campaign, action per day. Normalised tables: references, events. Inferences. Machine learning training datasets. Reports. Automated audits & alerts.
  • 19. Is your data up-to-date? Do you use version control on the ETL? Do you monitor your analysts' queries for bad patterns? Do you talk about improvements at a weekly improvement review? Are there fewer than three days between a product release & the subsequent ETL update (tested, reviewed, pushed)? Is the team handling the ETL aware of the product schedule, including refactoring & service splits? …
  • 20. ML in production Production server (JS) Client ML server (Python) Features Prediction Trained model Train, test & copy Production db
  • 21. ML in production Production server (JS) Client ML server (Python) Features Prediction Production db recommendation_placeholder.py import os import pymssql def get_smart_recommendation(input): analytics_server = os.environ['ANALYTICS_URL'] conn = pymssql.connect(analytics_server) cursor = conn.cursor() cursor.execute('SELECT TOP 5 product_id FROM sales;') five_suggested_product_list = [] for row in cursor: five_suggested_product_list.append(row) return five_suggested_product_list Placeholder naive model Connection naive model
  • 22. Are you logging your (placeholder) model? Do you have estimates (from past A/B tests) of how model quality impacts your metrics? Do you log the user input, suggestion, user action & model version? Are those logs processed & audited? Do you have a placeholder machine learning environment with a training server fed from the analytics pipeline?
  • 23. Questions? 1. Overall goal split MECE into team KPIs? 2. Can newbies describe current impact? 3. Are the metrics known by everyone? 4. Up-to-date metrics dictionary? 5. Are analytical tables audited daily? 6. Are queries answered in an hour? 7. Version control on ETL? 8. Monitor analysts' queries & feedback? 9. Less than 3 days from launch to ETL update? 10. Estimate impact of model quality? 11. Do you log input & user action? 12. Placeholder ML environment? • Hiring is cool but not always appropriate How to tell if it is the right time to grow • Product & goals clear? • Data clear & updated? • Model structure ready? • Examples of product & data not ready • Growth & fake users • Customer service actions • Local dates & D1 retention
  • 24. • Hiring is cool but not always appropriate • How to tell if it is the right time to grow • Examples of product & data not ready
  • 25. Four cases where lack of clarity led to bad models Real cases of real models gone wrong in production
  • 26. Growth accounting • Break down Active Users into inflow and outflow • Late identification of fake accounts • Predicting user retention: • fake accounts are more active • and more likely to be gone
  • 27. Customer service • All agent-based operations need to be listed and logged • Without an exhaustive list, user state might change • To recommend a common response, you’d miss a step
  • 28. Clear hierarchy of types for recommendation • Player/user actions, partners need a meaningful structure • Classify food for recommendation: • Restaurant chain: name or financial data? • Food type: is Fusion a type of Asian? • How to cold-start a new restaurant or a new customer?
  • 29. Dates (local time zones) • User daily rhythm is heavily dependent on local time • Store timestamps in UTC but also store the local time of day of each action • D1 Retention difference Canada vs. New Zealand
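
The point about local dates on slide 29 can be made concrete with a minimal Python sketch. It assumes timestamps are stored in UTC and that each user has a known IANA time zone name. The helper names `local_fields` and `is_d1_retained`, and the definition of D1 retention as "came back on the next local calendar day", are illustrative assumptions and are not taken from the talk.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def local_fields(ts_utc, tz_name):
    """Keep the UTC timestamp and add the user's local date and hour."""
    local = ts_utc.astimezone(ZoneInfo(tz_name))
    return {
        "ts_utc": ts_utc,
        "local_date": local.date(),
        "local_hour": local.hour,
    }


def is_d1_retained(signup_utc, return_utc, tz_name):
    """Assumed definition: the user returned on the next local calendar day."""
    tz = ZoneInfo(tz_name)
    signup_day = signup_utc.astimezone(tz).date()
    return_day = return_utc.astimezone(tz).date()
    return return_day == signup_day + timedelta(days=1)


# The same two UTC events, seen from two countries (hypothetical data).
signup = datetime(2024, 1, 15, 20, 0, tzinfo=timezone.utc)
came_back = datetime(2024, 1, 16, 8, 0, tzinfo=timezone.utc)

for tz_name in ("America/Toronto", "Pacific/Auckland"):
    print(tz_name, is_d1_retained(signup, came_back, tz_name))
```

With these hypothetical timestamps, the user in Toronto signs up in the afternoon and returns the next local day, while for the user in Auckland both events fall on the same local day. A retention metric computed on UTC dates alone would treat the two identically.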

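The placeholder model of slide 21 and the logging questions of slide 22 can be combined in one small sketch. This is an illustration under stated assumptions, not the speaker's implementation: an in-memory `sqlite3` database stands in for the analytics server that the slide reaches through `pymssql`, the query ranks products by number of sales (the slide's query has no ordering), and `MODEL_VERSION`, `log_event` and the list used as a log sink are hypothetical names.

```python
import json
import sqlite3
from datetime import datetime, timezone

MODEL_VERSION = "placeholder-top5-v0"  # hypothetical version label


def get_placeholder_recommendation(conn, user_input, limit=5):
    """Naive model: ignores user_input and returns the best-selling products."""
    rows = conn.execute(
        "SELECT product_id FROM sales "
        "GROUP BY product_id "
        "ORDER BY COUNT(*) DESC, product_id "
        "LIMIT ?",
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]


def log_event(log, user_input, suggestion, user_action):
    """Record the four fields listed on slide 22, plus a UTC timestamp."""
    record = {
        "logged_at_utc": datetime.now(timezone.utc).isoformat(),
        "model_version": MODEL_VERSION,
        "user_input": user_input,
        "suggestion": suggestion,
        "user_action": user_action,
    }
    log.append(json.dumps(record))  # a list stands in for a real log sink
    return record


# Hypothetical sales data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?)",
    [("a",), ("a",), ("a",), ("b",), ("b",), ("c",)],
)

log = []
user_input = {"user_id": 42}
suggestion = get_placeholder_recommendation(conn, user_input, limit=2)
log_event(log, user_input, suggestion, user_action="clicked:a")
print(suggestion, len(log))
```

Because the interface and the log format are already in place, a trained model can later replace the body of `get_placeholder_recommendation` without changing the production server or the audit pipeline.
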
Editor's Notes

  1. Sending emails, granting rebate but not to everyone New user can receive something without human intervention Discover new rules
  2. Building reports is close to data science, but not same profile Three: significant independent effort Two modellers: do full time
  3. Illustrative Don’t focus on where we are
  4. Job roles less clear Not really a problem: Novice can learn Experts like the technicality
  5. Before we go to explain the problem, I need to clarify
  6. No clear objectives No access to data No implementation path No resources to fix any of that
  7. Joel Spolsky is a thought leader in how to make great software There were complicated ways of estimating quality of environment He wanted to simplify it to a short check list Each can be set up by individual decisions; some expensive but none head-scratcher Few are dependent on each other
  8. How much do you score on Joel’s test too? Were you surprised by the score?
  9. Do you give space for challenges to metrics? None of those are about data science: Because those will be defined later, when you have one DS Happy to do those too
  10. Ask those to the most junior analyst Managers often too far to know the real struggle
  11. Did steal some quite obviously from Joel Monitor for joins, new tables, querying production table Pretext to teach good, legible effective SQL
  12. What most data scientists can handle: - can you host a server?
  13. With an example? Notice that no matter what the input, five most common Boiler plate that a coder can easily produce