APPLIED DATA SCIENCE
Giovanni Lanzani – Chief Science Officer
GoDataDriven
@gglanzani
WHO AM I
01 Italy
02 Leiden University
03 KPMG
04 GoDataDriven
WHAT IS MACHINE LEARNING
LEARNING FROM DATA
• You have some (or lots of) data
• You need to generalize
BEST MODEL
• Which one would you choose here?
• It’s about making a tradeoff
• This tradeoff is the most important job of the product owner (PO)
• A 100% correct answer might not exist!
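The tradeoff can be made concrete with a minimal sketch. The data below is invented for illustration: the true relation is assumed to be y = 2x, and the training labels carry alternating noise. A model that memorizes the training set looks perfect on the data it has seen, while a plain straight line generalizes better to new points. The function names are hypothetical, not from any library.

```python
# Toy data (invented for illustration): the true relation is y = 2x,
# and the training labels carry alternating noise of +1 / -1.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 1, 5, 5, 9, 9]
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [1, 3, 5, 7, 9]


def fit_memorizer(xs, ys):
    """1-nearest-neighbour: predict the label of the closest training point."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict


def fit_line(xs, ys):
    """Ordinary least squares for a single feature (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


memorizer = fit_memorizer(train_x, train_y)
line = fit_line(train_x, train_y)

for name, model in [("memorizer", memorizer), ("line", line)]:
    print(f"{name}: train MSE = {mse(model, train_x, train_y):.3f}, "
          f"test MSE = {mse(model, test_x, test_y):.3f}")
```

The memorizer has zero training error but a larger error on the held-out points; the line is imperfect on both sets yet closer to the truth on unseen data. Which one to ship is a judgment call, not a purely technical one.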
WHAT’S DATA SCIENCE ULTIMATELY
• It’s about creating value from data
• Using Machine Learning, Advanced Analytics, and visualization
WHEN YOU SAY DATA SCIENCE, COMPANIES UNDERSTAND
• All the things big data
• Predictive modeling & Advanced Analytics
• More money
• Do all the cool things the others are doing
HOW TO GET THERE
TRADITIONAL DATA WAREHOUSE ARCHITECTURE
[Diagram: EDW serving data consumers: web app, dashboard / reporting, traditional business app]
AND NOW?
[Diagram: "?" serving data consumers: web app, dashboard / reporting, traditional business app, API]
WHAT COMPANIES GOT
• A lot of POCs
• A lot of screenshots, presentations, and dashboards on a laptop
• Nice stories to tell their network about those screenshots, and especially those dashboards
• Headaches, with data and infrastructure even more scattered than before
BUT…
• We got a data scientist working on trees and forests
• Neural networks!
• Deep learning!!!
WHAT DO COMPANIES ACTUALLY NEED
• Put things into production
• They don’t teach that in any data science course or MOOC (that I know of)
THE THREE HURDLES
Credit to Jon Shave gdd.li/lavaredo
OVERSIMPLIFYING
Requirements
Data Sources
Exploration
Modeling
Products
Feedback
Data scientist
ML engineer
Data engineer
Data engineer
KAGGLE CURSE
• gdd.li/toldYouSo
• Many data scientists approach the problem at hand with a Kaggle-like mentality: delivering the best model in absolute terms, no matter what the practical implications are.
• In reality, it's not the best model that we implement, but the one that combines quality and practicality: a continuous balancing act
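• Netflix Prize: the winning ensemble was never fully put into production, because the accuracy gain did not justify the engineering effort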
SOLVING THEM
BUSINESS CASE
Business case for
• True Positives
• True Negatives
Cost of
• False Positives
• False Negatives
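One way to turn this slide into numbers is to attach a monetary value to each of the four outcomes and compare models on net value instead of accuracy. The sketch below is only an illustration: the class names, the counts, and the euro amounts are all hypothetical and would come from the business in practice.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives


@dataclass
class OutcomeValues:
    tp: float  # gain per true positive
    tn: float  # gain per true negative
    fp: float  # cost per false positive (as a positive number)
    fn: float  # cost per false negative (as a positive number)


def net_value(counts: ConfusionCounts, values: OutcomeValues) -> float:
    """Business value of a model: gains from correct calls minus costs of errors."""
    gains = counts.tp * values.tp + counts.tn * values.tn
    costs = counts.fp * values.fp + counts.fn * values.fn
    return gains - costs


def accuracy(counts: ConfusionCounts) -> float:
    """Share of correct predictions, ignoring what each outcome is worth."""
    total = counts.tp + counts.tn + counts.fp + counts.fn
    return (counts.tp + counts.tn) / total


# Hypothetical amounts: missing a positive is far more expensive than a false alarm.
values = OutcomeValues(tp=100.0, tn=0.0, fp=10.0, fn=200.0)

# Hypothetical results of two models on the same 1000 cases.
model_a = ConfusionCounts(tp=80, tn=850, fp=50, fn=20)
model_b = ConfusionCounts(tp=95, tn=800, fp=100, fn=5)

for name, counts in [("A", model_a), ("B", model_b)]:
    print(f"model {name}: accuracy = {accuracy(counts):.3f}, "
          f"net value = {net_value(counts, values):.0f}")
```

With these made-up amounts, model B is less accurate than model A but worth more, because it trades cheap false positives for expensive false negatives.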
DATA
Data {insert something here} should be pro grade
SKILLS
• Can you take part in actually building production-quality systems, OR are you just proficient enough in R or Python to hack together a prototype on a very small dataset?
• Supply of the second group keeps growing while demand is flat or shrinking
• Especially as executives get burned by “data scientists” who don't know how to help them build things of value
HIRING
• Companies that are not engineering driven often have trouble hiring good technical people
• The “IQ” test is not really representative of applied data science
• At GoDataDriven we do an “at home, at your convenience” assessment
• Real dataset, real business question, real product
• Models are software: treat them as such
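Treating a model as software means, among other things, validating its inputs and covering it with automated tests. The sketch below is a hypothetical example, not GoDataDriven's assessment or method: `predict_proba` is a stand-in logistic scorer with made-up weights, and the tests are plain `test_*` functions of the kind a runner such as pytest would collect.

```python
import math


def predict_proba(features, weights, bias=0.0):
    """Logistic score for one record; a stand-in for any trained model."""
    if len(features) != len(weights):
        raise ValueError(
            f"expected {len(weights)} features, got {len(features)}")
    if any(math.isnan(v) for v in features):
        raise ValueError("features must not contain NaN")
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid: never exponentiate a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


WEIGHTS = [0.8, -0.5]  # hypothetical, as if loaded from a trained model


def test_output_is_a_probability():
    for features in ([0, 0], [1e6, 0], [-1e6, 0], [3.2, -1.5]):
        assert 0.0 <= predict_proba(features, WEIGHTS) <= 1.0


def test_zero_input_scores_one_half():
    assert predict_proba([0, 0], WEIGHTS) == 0.5


def test_score_rises_with_positively_weighted_feature():
    assert predict_proba([2, 0], WEIGHTS) > predict_proba([1, 0], WEIGHTS)


def test_rejects_wrong_number_of_features():
    try:
        predict_proba([1.0], WEIGHTS)
    except ValueError:
        return
    raise AssertionError("expected a ValueError for a short feature vector")
```

Tests like these are cheap to run on every change, so a retrained or refactored model that breaks its contract is caught before it reaches production.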
TAKEAWAYS
• POs should know “their stuff”
• Automate all the data movements
• Hire data scientists who are good at programming (or hire machine learning engineers)
QUESTIONS?
• We’re hiring
• Data & Machine Learning Engineers!
• career@godatadriven.com
