Many companies start their big data and AI journey by hiring a team of data scientists, giving them some data, and expecting them to work miracles. Although this may yield results, it is not an efficient way to use data scientists. We will explain the problems that occur and how to adapt the context to get business value from data scientists.
- Why data science teams might fail to deliver results
- What data scientists need to be efficient
- What talent you need in addition to data scientists
6. www.mimeria.com
Wrong data scientists?
What we asked them:
● When to use Jaccard or cosine distance?
● How do you implement an LSTM with TensorFlow?
● When do you terminate an A/B test?
What we should have asked:
● How to get data from a PO using email only?
● How to recover Hadoop namenode?
● How to debug AWS "403 permission denied"?
● How to get sysadmin to open firewall from Jupyter to MySQL?
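As a point of reference for the kind of question in the first group, the Jaccard versus cosine choice can be sketched in a few lines. This is a minimal illustration, not material from the talk: it assumes Jaccard distance is applied to sets (presence or absence only) and cosine distance to count vectors, and the function names and sample data are hypothetical.

```python
import math


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A & B| / |A | B|; ignores how often an item occurs."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def cosine_distance(u: dict, v: dict) -> float:
    """1 - cosine similarity of two sparse count vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical play counts per artist for two users (illustrative data only).
alice = {"abba": 10, "bach": 1}
bob = {"abba": 1, "bach": 10}

# Same set of artists, so the Jaccard distance sees no difference ...
print(jaccard_distance(set(alice), set(bob)))
# ... while the cosine distance reflects the very different weights.
print(cosine_distance(alice, bob))
```

Under these assumptions, Jaccard fits when only membership matters and cosine fits when magnitudes matter.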
7. Machine learning products
Credits: “Hidden Technical Debt in Machine Learning Systems”, Google, NIPS 2015
[Figure: components of a machine learning system. Size = effort, colour = code complexity. Boxes: Configuration, Data collection, Monitoring, Serving infrastructure, Feature extraction, Process management tools, Analysis tools, Machine resource management, Data verification, ML.]
8. Machine learning products
[Figure: the same system diagram as on slide 7, with a "Data science" label added. Boxes: Configuration, Data collection, Monitoring, Serving infrastructure, Feature extraction, Process management tools, Analysis tools, Machine resource management, Data verification, ML.]
10. Data science hierarchy of needs
Credits: “The data science hierarchy of needs”, Monica Rogati
[Figure: pyramid, top to bottom: AI, Deep learning, A/B testing, Machine learning, Analytics, Segments, Curation, Anomaly detection, Data infrastructure, Pipelines, Instrumentation, Data collection.]
11. Data science hierarchy of needs
Credits: “The data science hierarchy of needs”, Monica Rogati
[Figure: the same pyramid as on slide 10, with a "Data science" label added.]
15. AI first
● Might work once or twice
● Not a sustainable strategy
● Machine learning is difficult
● Low return on investment
[Figure: top of the pyramid (AI, Deep learning, A/B testing, Machine learning), annotated with Value and Effort.]
16. AI last
● Lots of low-hanging fruit
○ Push notifications
○ Simple recommendations
○ Risk & forecasting
○ Reporting
○ Product insights
○ Data-driven product development
○ Anomaly detection
○ ...
● High return on investment
● Media attention != business value
[Figure: lower levels of the pyramid (Analytics, Segments, Curation, Anomaly detection, Data infrastructure, Pipelines, Instrumentation, Data collection), annotated with Value and Effort.]
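To illustrate how little machinery some of the items listed under "AI last" can require, here is a minimal sketch of anomaly detection without any machine learning. It is an assumption made for illustration, not the method from the talk: a value is flagged when it deviates from the mean of a trailing window by more than a chosen number of standard deviations. The function name, window size, threshold, and sample data are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(values, window=7, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` sample standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily signup counts with one obvious drop (illustrative data).
daily_signups = [100, 104, 98, 101, 97, 103, 99, 102, 15, 100]
print(find_anomalies(daily_signups))
```

A check like this depends mostly on having a reliable daily pipeline that delivers the numbers, which is the lower part of the pyramid.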
17. How do we make best use of data scientists?
● They need
○ Supporting roles
○ Continuous access to fresh data
○ Feedback from validation, monitoring, ...
● But where, how, from whom?
18. What do we need?
[Figure: the machine learning system diagram (Configuration, Data collection, Monitoring, Serving infrastructure, Feature extraction, Process management tools, Analysis tools, Machine resource management, Data verification, ML), annotated with roles: Data engineering, Domain expertise, Product management, QA.]
19. What do we want?
[Figure: the machine learning system diagram (Configuration, Data collection, Monitoring, Serving infrastructure, Feature extraction, Process management tools, Analysis tools, Machine resource management, Data verification, ML), annotated with roles: Data engineering, Data science, Frontend, Domain expertise, DevOps / DataOps, QA, Product management.]
20. Most data-driven products
[Figure: the system diagram without the ML box (Configuration, Data collection, Monitoring, Serving infrastructure, Feature extraction, Process management tools, Analysis tools, Machine resource management, Data verification), annotated with roles: Data engineering, Domain expertise, Product management.]
22. Service-oriented architectures
● Data lives with services
● Heterogeneous coupling
[Figure: apps and services, each service with its own DB, connected to a data warehouse through a mix of mechanisms: Poll, Aggregate logs, NFS, Hourly dump, ETL, Queue, scp, HTTP.]
33. What to do with my data scientists?
● Get them out into production
● Pair them with
○ Data engineers
○ Domain experts
○ Product owners
● Invest in processing capabilities
34. Key takeaways
● Machine learning is a team sport
● Solid data processing is necessary
● Learning happens in production
Lars Albertsson, founder of Mimeria
Data-value-as-a-service - tailored data platforms & data pipelines