2. @delahuntyondata
WHAT I’M GOING TO SPEAK ABOUT
1. A brief history of the database and what its development
can teach us about ML.
2. Data as an enabler for ML: What does great look like?
3. Challenges and possible solutions.
3. @delahuntyondata
ASSUMPTIONS
● You know some stuff about data.
● You have at least a working knowledge of what ML is
and what's involved.
● You probably know more than me!
9. @delahuntyondata
Approx. year-end share price:
1992: $0.69
2002: $11.56
2016: $39.10
The growth in value of Oracle is a good approximation for the
growth of database technology
12. @delahuntyondata
In a world where prediction is cheap,
what does that allow us to do?
➔What existing questions can I get better answers
to?
➔What new questions can I now ask that I wasn't
able to before?
➔What new kinds of data can I analyse that would
otherwise be uneconomic?
19. @delahuntyondata
Supervised Learning needs Labelled Data
SELECT
Customer
,CASE WHEN trunc(last_active) > TO_DATE('01082018', 'DDMMYYYY') -- date format assumed to be DDMMYYYY
THEN 0
ELSE 1
END AS Is_Churn
FROM
fact_customer_activity
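The same labelling rule can be sketched in Python. This is a minimal illustration, not the speaker's pipeline: the customer names and dates are made up, and the cutoff assumes the slide's '01082018' means 1 August 2018.

```python
from datetime import date

# Hypothetical activity records: (customer, last_active date).
activity = [
    ("alice", date(2018, 9, 14)),
    ("bob", date(2018, 6, 2)),
    ("carol", date(2018, 8, 1)),
]

# Assumed cutoff: the slide's '01082018' read as 1 August 2018.
CUTOFF = date(2018, 8, 1)


def label_churn(rows, cutoff):
    """Return (customer, is_churn) pairs.

    Mirrors the CASE expression: 0 if the customer was active
    after the cutoff, otherwise 1 (treated as churned).
    """
    return [
        (customer, 0 if last_active > cutoff else 1)
        for customer, last_active in rows
    ]


labels = label_churn(activity, CUTOFF)
for customer, is_churn in labels:
    print(customer, is_churn)
```

The output column Is_Churn is the label a supervised model would then be trained to predict.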
22. @delahuntyondata
In the world of BI, the journey ended when the data
had made its way to a reporting or visualisation tool
23. @delahuntyondata
In the world of ML, it is much more common that your
intended destination is a customer facing application
Credit: Unsplash
24. @delahuntyondata
When It Goes Wrong
BI/Reporting
● Someone is "angry with IT" but the customer is not directly
impacted.
ML/Customer Facing
● Customer experience and the bottom line can be hit more
quickly and more adversely.
25. @delahuntyondata
How Machine Learning differs from
Business Intelligence
In order to successfully navigate the Machine Learning world:
➢ New skills, capabilities and potentially new teams need to be
invested in.
➢ The traditional BI/Data team need to be given the room to
expand, grow their knowledge and develop their capability.
28. @delahuntyondata
Data Developer Vs Data Engineer
Data Developer | Data Engineer
● ETL | ● Distributed computing frameworks
● Closed source tools | ● Open source packages
● SQL | ● R, Python, Scala, C#
● Batch | ● Batch & Real time
● Relational Database | ● File Storage
Reference: “Successfully Transitioning Your Team from Data Warehousing to Big Data” - Uli Bethke, Sonra
35. @delahuntyondata
Machine Learning Architecture Components
- “Nice to haves”
● Model repo
● Feature store
● Automated model quality checks
● Versioning
● Auto-deploy; CI/CD
● Containers
● Hyperparameter optimisation
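As one illustration of an automated model quality check, a deployment gate might compare a candidate model's metrics with the current baseline. The function name, the metric dictionary shape and both thresholds below are assumptions made for this sketch, not part of any specific tool.

```python
def passes_quality_gate(candidate, baseline,
                        min_accuracy=0.80, max_regression=0.01):
    """Decide whether a candidate model may replace the baseline.

    candidate / baseline: dicts with an 'accuracy' key (hypothetical shape).
    min_accuracy: absolute floor the candidate must reach (assumed value).
    max_regression: how far below the baseline it may fall (assumed value).
    """
    if candidate["accuracy"] < min_accuracy:
        return False
    return candidate["accuracy"] >= baseline["accuracy"] - max_regression


# A slightly better candidate passes; a clearly worse one is blocked.
print(passes_quality_gate({"accuracy": 0.85}, {"accuracy": 0.84}))
print(passes_quality_gate({"accuracy": 0.82}, {"accuracy": 0.90}))
```

A check like this would typically run in the CI/CD step before auto-deploy, so a regression stops the release instead of reaching customers.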
36. @delahuntyondata
Offline Vs Online Models
Offline Model
● Preparing for the exam by getting all the content up front
and studying it.
Online Model
● Learning as each question comes up and optimising your
exam answers over time.
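The contrast can be sketched with the simplest possible "model", an average used as a prediction. This is a toy illustration only; the names fit_offline and OnlineMean are invented for the example, and a real online model would update many parameters in the same incremental way.

```python
def fit_offline(values):
    """Offline: see all the data up front, then compute the estimate once."""
    return sum(values) / len(values)


class OnlineMean:
    """Online: refine the estimate one observation at a time."""

    def __init__(self):
        self.n = 0
        self.estimate = 0.0

    def update(self, value):
        # Incremental mean: nudge the estimate towards the new value.
        self.n += 1
        self.estimate += (value - self.estimate) / self.n
        return self.estimate


observations = [10.0, 12.0, 11.0, 15.0]

offline_estimate = fit_offline(observations)

online_model = OnlineMean()
history = [online_model.update(v) for v in observations]

print(offline_estimate)  # one answer, after studying everything
print(history)           # an answer that improves after each question
```

Both approaches end at the same estimate on the same data; the difference is that the online model has a usable answer after every observation and never needs the full dataset in one place.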
43. @delahuntyondata
Keeping a Product Focus
● Prove out a hypothesis to achieve a customer
outcome.
● Release value. Otherwise it's an expensive
research project.
44. @delahuntyondata
Skills
● ML skills are a hot commodity.
● Develop, attract, retain.
● “One talented engineer is better than 10
average engineers”.
47. @delahuntyondata
Explainability
● Explainability can be difficult, particularly
with deep neural networks, and can inhibit
adoption.
● Progress has been made:
• Local Interpretable Model-Agnostic
Explanations (LIME)
• Feature importance
• SHAP
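One common way to compute feature importance is permutation importance: shuffle one feature at a time and measure how much accuracy drops. The sketch below uses plain Python with a made-up toy model and data; it is not LIME or SHAP, and all names in it are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_features,
                           repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced.
            shuffled = [r[:j] + (v,) + r[j + 1:]
                        for r, v in zip(rows, column)]
            drops.append(base - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / repeats)
    return importances


def toy_model(row):
    """Toy classifier that only ever looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


rows = [(0.1, 0.9), (0.9, 0.2), (0.3, 0.8),
        (0.7, 0.1), (0.2, 0.4), (0.8, 0.6)]
labels = [toy_model(r) for r in rows]

importances = permutation_importance(toy_model, rows, labels, n_features=2)
print(importances)
```

Because the toy model ignores feature 1, shuffling it changes nothing and its importance is zero, while shuffling feature 0 hurts accuracy. The method treats the model as a black box, which is why it also applies to models that are hard to inspect directly.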
49. @delahuntyondata
The Future
● The impact we see will spread and evolve.
● It becomes normal.
● ML model building becomes commoditised.
● Applications will continue to be deep and narrow.
ML is currently a differentiator but only for another 5-10 years…
50. @delahuntyondata
“In the 1920s and 30s we imagined steel men walking
around factories holding hammers, and in the 1950s we
imagined humanoid robots walking around the kitchen
doing the housework. We didn't get robot servants - we
got washing machines.”
– Benedict Evans
Extremely dumb but does one thing really well.
Deep and Narrow
-> Think Washing Machines not Terminator
51. @delahuntyondata
Takeaways
1. Data is the bedrock of Machine Learning. Get your data in a good
state.
2. Have a clear problem to solve and a clear set of criteria to
measure success against.
3. Keep it stupidly simple, particularly at the start. There is a lot of
value in getting to 80% accuracy.
• "Start supervised, start small, start with good data" - Danny Lange
4. Build a pipeline (people and tech stack)
5. Measure and iterate
• There will be failures; expectation management is critical to adoption.