6. Data Scientist, SW Engineer – Separate domains
Diagram: Model Building, Model Deployment, Data Store, Content Delivery, Data Capture, Analytics
Data Scientist realm – Exploration, Analysis, Accuracy
Software Engineer realm – Production, Delivery, Scale
7. Continuum – Broad, Deep
Deep – Data Scientist
- Good enough model
- More data, better data
- Feature selection
Broad – SW Engineering
- Basic ML, scale and automation
- Rapid experimentation, fail fast
- Continuous feedback, update
8. Continuum – Broad, Deep
Execution at scale
- Model building
- Deploying
- Pipeline, Single flow
Optimization
- ML algorithms
- Data: more of it, higher quality
- Features, analysis
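The "Pipeline, single flow" point can be sketched as one entry point that runs capture, feature building, training and deployment in a fixed order. This is a minimal sketch, assuming in-memory events and a plain dict standing in for a model registry; the stage names and the popularity-ranking "model" are hypothetical stand-ins, not the method from these slides.

```python
from typing import Callable, Dict, List

# Hypothetical stage signature: each stage reads and extends a shared context dict.
Stage = Callable[[Dict], Dict]


def capture(ctx: Dict) -> Dict:
    # Data capture stand-in: keep only events that carry both a user and an item.
    ctx["events"] = [e for e in ctx["raw_events"] if e.get("user") and e.get("item")]
    return ctx


def build_features(ctx: Dict) -> Dict:
    # Feature stand-in: number of interactions per item.
    counts: Dict[str, int] = {}
    for e in ctx["events"]:
        counts[e["item"]] = counts.get(e["item"], 0) + 1
    ctx["item_counts"] = counts
    return ctx


def train(ctx: Dict) -> Dict:
    # Basic "model": items ranked by interaction count, ties broken by item id.
    counts = ctx["item_counts"]
    ctx["model"] = sorted(counts, key=lambda item: (-counts[item], item))
    return ctx


def deploy(ctx: Dict) -> Dict:
    # Deployment stand-in: publish the model under the next version number.
    registry = ctx["registry"]
    version = len(registry) + 1
    registry[version] = ctx["model"]
    ctx["deployed_version"] = version
    return ctx


# One flow, executed end to end in a fixed order.
PIPELINE: List[Stage] = [capture, build_features, train, deploy]


def run_pipeline(raw_events: List[Dict], registry: Dict[int, List[str]]) -> Dict:
    ctx: Dict = {"raw_events": raw_events, "registry": registry}
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx
```

With every stage behind one call, a re-run on new data produces a new model version without a hand-off between the data science and engineering sides.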
9. How much is good enough?
Diagram: Model Building, Data Store, Data Capture, Model Deployment, Serving Predictions/Recommendations, A/B Testing/Experiments
- Start thin, build as you go
- Go beyond the prototype
- Build enough to support the first use case
- Not the best model, but much better than random
- Time box: x weeks
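One way to make "much better than random" concrete is to compare a simple model's hit rate on held-out interactions with the hit rate a uniformly random top-k list would get. This is a minimal sketch, assuming (user, item) pairs, a popularity recommender as the "good enough" model, and a hypothetical shipping bar of 2x lift over random; none of these choices come from the slides.

```python
from typing import Callable, Dict, List, Tuple

Recommender = Callable[[str, int], List[str]]


def popularity_recommender(train_pairs: List[Tuple[str, str]]) -> Recommender:
    # Deliberately simple model: recommend the most frequently seen items to everyone.
    counts: Dict[str, int] = {}
    for _, item in train_pairs:
        counts[item] = counts.get(item, 0) + 1
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return lambda user, k: ranked[:k]


def hit_rate(recommend: Recommender, test_pairs: List[Tuple[str, str]], k: int) -> float:
    # Fraction of held-out (user, item) pairs whose item shows up in that user's top-k.
    hits = sum(1 for user, item in test_pairs if item in recommend(user, k))
    return hits / len(test_pairs)


def is_good_enough(model_hit_rate: float, n_items: int, k: int, min_lift: float = 2.0) -> bool:
    # A uniformly random k-item list contains any given item with probability k / n_items.
    random_hit_rate = k / n_items
    # Hypothetical bar: ship once the model beats random by at least min_lift times.
    return model_hit_rate >= min_lift * random_hit_rate


train_pairs = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u1", "b"), ("u2", "b"), ("u3", "c")]
test_pairs = [("u4", "a"), ("u5", "b"), ("u6", "j")]
rate = hit_rate(popularity_recommender(train_pairs), test_pairs, k=2)
print(round(rate, 2))                          # 0.67
print(is_good_enough(rate, n_items=10, k=2))   # True (random baseline is 0.2)
```

The comparison is cheap enough to run inside the time box, and it gives a clear stop condition before the model moves to A/B testing.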
12. Offline, Batch Processing Flow
Diagram components:
- Applications, Interactions
- Api Layer – Data capture
- Data Store
- Other sources of data (non-transactional, offline)
- Train: periodically batch model training
- Run: periodically run models
- Pre-computed Content
- Api Layer – Serving recommendations, insights
- Customer facing applications
- Feedback
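The batch flow can be sketched as a periodic job that trains and runs the model for every known user and writes the results to a store, so that the serving API only does a lookup. This is a minimal sketch, assuming a popularity model, (user, item) interaction pairs, and an in-memory dict as the pre-computed content store; all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def batch_train_and_run(
    interactions: Iterable[Tuple[str, str]], top_n: int
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Periodic batch job: 'train' a popularity model, then 'run' it for every known user."""
    pairs = list(interactions)
    popularity = Counter(item for _, item in pairs)
    ranked = sorted(popularity, key=lambda item: (-popularity[item], item))

    seen: Dict[str, Set[str]] = {}
    for user, item in pairs:
        seen.setdefault(user, set()).add(item)

    # Pre-computed content: per-user top-N, skipping items the user already interacted with.
    precomputed = {
        user: [item for item in ranked if item not in items][:top_n]
        for user, items in seen.items()
    }
    fallback = ranked[:top_n]  # for users the batch run has not seen yet
    return precomputed, fallback


def serve_recommendations(
    precomputed: Dict[str, List[str]], fallback: List[str], user: str
) -> List[str]:
    # Serving API: a cheap key lookup, no model evaluation at request time.
    return precomputed.get(user, fallback)


store, fallback = batch_train_and_run(
    [("u1", "a"), ("u2", "a"), ("u2", "b"), ("u3", "c"), ("u3", "a"), ("u1", "b")], top_n=2
)
print(serve_recommendations(store, fallback, "u3"))   # ['b']
print(serve_recommendations(store, fallback, "new"))  # ['a', 'b']
```

The trade-off is freshness: serving latency stays low and predictable, but feedback only reaches users after the next batch run.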
13. Real-time Processing Flow
Diagram components:
- Applications, Interactions
- Api Layer – Data capture
- Data Store
- Other sources of data (non-transactional, offline)
- Train: periodically batch model training
- Model Store, In-memory cache
- Run: Api Layer – Compute on-the-fly
- Customer facing applications
- Feedback
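In the real-time flow, the trained model is held in an in-memory cache and evaluated per request, so request-time context can change the result. This is a minimal sketch, assuming a hypothetical linear per-item model with two features (a bias and an "is mobile" flag derived from a hypothetical `device` field) and a dict standing in for the model store; `functools.lru_cache` plays the in-memory cache.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

# Hypothetical model store: per-item weights for (bias, is_mobile) features, keyed by version.
MODEL_STORE: Dict[str, Dict[str, Tuple[float, float]]] = {
    "v1": {"a": (1.0, 0.0), "b": (0.2, 2.0), "c": (0.5, 0.5)},
}


@lru_cache(maxsize=4)
def load_model(version: str) -> Dict[str, Tuple[float, float]]:
    # First request for a version reads the model store; later requests reuse the in-memory copy.
    return dict(MODEL_STORE[version])


def recommend_on_the_fly(version: str, context: Dict[str, str], k: int) -> List[str]:
    model = load_model(version)
    # Request-time feature derived from the incoming call (hypothetical "device" field).
    is_mobile = 1.0 if context.get("device") == "mobile" else 0.0
    scores = {item: w_bias + w_mobile * is_mobile for item, (w_bias, w_mobile) in model.items()}
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


print(recommend_on_the_fly("v1", {"device": "desktop"}, k=2))  # ['a', 'c']
print(recommend_on_the_fly("v1", {"device": "mobile"}, k=2))   # ['b', 'a']
```

Compared with the batch flow, each request costs a model evaluation, but results reflect the current context; training itself can still run periodically in batch.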
16. Architecture Blueprint
Diagram components:
- Applications
- APIs: Api (real-time content), Api (delivery), Api (events), Api (feedback), Api (analytics)
- Pre-computed cache, Feature Data
- Event Log
- RT Analytics
- Signal Data
- Model Building: train models periodically
- Model Deployment: run models; re-run new models periodically
- Data flows: Transaction, Event; Feedback, behavior; In-App Data; Personalized Content; Raw data or features
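The event side of the blueprint (Api (events) and Api (feedback) writing to an Event Log, RT Analytics turning that log into Signal Data) can be sketched with an append-only log and a consumer that tracks its read offset. This is a minimal sketch, assuming in-memory storage, hypothetical event kinds "view" and "click", and click-through rate as the example signal; a production system would use a durable log rather than a Python list.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    user: str
    item: str
    kind: str  # hypothetical kinds: "view", "click"
    ts: float


class EventLog:
    """Append-only event log; each consumer tracks its own read offset."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        # Api (events) / Api (feedback) would call this for every captured event.
        self._events.append(event)

    def read_from(self, offset: int) -> List[Event]:
        return self._events[offset:]


class RealTimeAnalytics:
    """Consumes new log entries and maintains signal data (click-through rate per item)."""

    def __init__(self, log: EventLog) -> None:
        self.log = log
        self.offset = 0
        self.views: Counter = Counter()
        self.clicks: Counter = Counter()

    def poll(self) -> int:
        # Process only events appended since the previous poll.
        new_events = self.log.read_from(self.offset)
        for e in new_events:
            if e.kind == "view":
                self.views[e.item] += 1
            elif e.kind == "click":
                self.clicks[e.item] += 1
        self.offset += len(new_events)
        return len(new_events)

    def ctr(self, item: str) -> float:
        views = self.views[item]
        return self.clicks[item] / views if views else 0.0


log = EventLog()
for ev in [Event("u1", "a", "view", 1.0), Event("u2", "a", "view", 2.0),
           Event("u1", "a", "click", 3.0), Event("u2", "b", "view", 4.0)]:
    log.append(ev)
rt = RealTimeAnalytics(log)
print(rt.poll())     # 4
print(rt.ctr("a"))   # 0.5
print(rt.poll())     # 0
```

Because the log is shared, the same events can also feed the periodic model training, which keeps the real-time signals and the trained models consistent.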
24. Questions?
Please reach out with any questions or comments. I would love to hear from you. My contact info:
email: mukul.sood@gmail.com or
LinkedIn: www.linkedin.com/in/muksudny
Thank You!
25. Acknowledgements
The following references were used for this presentation:
• Towards Data Science blog: Personalization, ML
• Data Science Lifecycle: https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/lifecycle
• Pivotal Greenplum, Cloud Foundry ML platform
• How can we design an Intelligent Recommendation Engine: https://uxplanet.org/how-can-we-design-an-intelligent-recommendation-engine-b9bb1db4d050
• QCon San Francisco 2015 talk: https://www.slideshare.net/InfoQ/takes-a-village-to-raise-a-machine-learning-model?qid=a6608306-b305-40f4-a624-46bd852e79fb&v=&b=&from_search=16