No REST till Production – Building and Deploying 9 Models to Production in 3 weeks


The state of the art in productionizing machine learning models today primarily addresses building RESTful APIs. In the digital ecosystem, RESTful APIs are a necessary, but not sufficient, part of the complete solution for productionizing ML models. And according to recent research by the McKinsey Global Institute, applying AI in marketing and sales has the most potential value.

In the digital ecosystem, productionizing ML models at an accelerated pace becomes easy with:

  • A Feature Store with commonly used features that is available to all data scientists
  • Feature Stores that distill visitor behavior into ready-to-use feature vectors in a semi-supervised manner
  • A data pipeline that can support the challenging demands of the digital ecosystem to feed the Feature Store on an ongoing basis
  • Pipeline templates that support the challenging demands of the digital ecosystem and that feed the Feature Store, predict, and distribute predictions on an ongoing basis

With these, a major electronics manufacturer was able to build and productionize a new model in 3 weeks. The use case for the model is retargeting advertising: it analyzes the behavior of website visitors and builds customized audiences of the visitors who are most likely to purchase 9 different products. Using the model, this manufacturer was able to maintain the same level of purchases with half of the retargeting media spend, increasing the efficiency of their marketing spend by 100%.
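
The 100% figure can be checked with a short calculation. The abstract does not define efficiency, so the sketch below assumes it means purchases per unit of media spend, with purchases $P$ held constant and spend $S$ halved:

```latex
\[
E_{\text{before}} = \frac{P}{S}, \qquad
E_{\text{after}} = \frac{P}{S/2} = \frac{2P}{S}, \qquad
\frac{E_{\text{after}} - E_{\text{before}}}{E_{\text{before}}}
  = \frac{2P/S - P/S}{P/S} = 1 = 100\%.
\]
```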



  1. WIFI SSID: Spark+AISummit | Password: UnifiedDataAnalytics

  2. Charmee Patel, Syntasa
     No REST till Production – Building and Deploying 9 Models to Production in 3 weeks
     #UnifiedDataAnalytics #SparkAISummit

  3. About SYNTASA
     • Offices in Washington, DC and London
     • Marketing AI Platform used by large enterprises
     • Fits natively in all Hadoop distros and clouds
     • Customers include several household brands

  4. About SYNTASA
     • 50+ production models
     • 100s of behavioural data sources
     • 100s of experimental models
     • ~1B unique visitors and customer activities
     • 30B Million events monthly
     • Billions of predictions served
     • Trillions of historical records

  5. Why care about behavioural data?
     • Media optimisation
     • Recommendation
     • Fraud detection
     • Churn reduction
     Diagram labels: Company, Mobile, Web, IVR, email, CRM, Financials, ERP

  6. Our Christmas Project
     Support media buying decisions for certain product segments
     Background
     • Clickstream data
     • ~2M visitors a day
     • ~100k SKUs
     • Products of interest – <0.1% conversion rate
     Existing marketing activity
     • Building rules-based audiences
     • Using black-box AI models in their Martech and Adtech tools
     We built bespoke models using their behavioral + enterprise data

  7. Challenges
     • High volume
     • Complex
     • Non-stationary
     • Hard to featurise
     • Training requires the full data
     • Reliability in productionizing the model
     • Timely inference at scale
     • Models drift

  8. Diagram: a User Activity & Time axis (ticks 1 to 8) labelled with four Lookback Windows and four Predictions

  9. Feature Store
     Features @ Visitor level
     • Last 7 days
     • Interaction with certain pages, products, cart
     • ~400 form elements that were available in tracking
     • Total general activity
     • Features include zero and non-zero counts of fields and one-hot encoded values
     Initially ~1,000 features; down-weighting features based on variance resulted in ~400 features

  10. Experiment setup
      3 datasets
      • Training period Nov 2018, split into test & train
      • Additional evaluation on Dec 2018
      Statistical Metrics
      • F1 score due to class imbalance
      Business Metrics
      • We have a good model, but what does that mean for the campaign?
      • Campaigns need a minimum sample size for A/B testing
      • How do we find the right audience and confirm projected positive results for that audience?
      • Lift projections
        – Lift @ 5%
        – Lift @ 20%

  11. Accelerating Experimentation

  12. Abstract Away Design Patterns

  13. Process Template
      Dataset → Processes → Dataset
      • aka Functors
      Why Processes?
      • UDFs/UDAFs are not always the right fit
      • Custom transformers on top of Spark transform are too cumbersome
      • Abstracts away Spark idiosyncrasies
      • Allows re-use by team members of different skill levels
      • Battle tested and unit tested

  14. Experiments
      Multiclass model X
      • Severe class imbalance (<0.1%)
      • Poor learning and evaluation metrics
      What if we build several binary models?
      • Initial results promising
      Several algorithms and hyperparameters tested (LR, RF, GBM)
      First best model results – Random forest
      • Learning (F1 score) – 0.9
      • Eval on test split (F1 score) – 0.85
      • Eval on December – 0.7!!
      • Lift @ 5% – 9.5x
      Next best model results – Logistic Regression
      • Learning (F1 score) – 0.89
      • Eval on test split (F1 score) – 0.87
      • Eval on December – 0.78
      • Lift @ 5% – 9x

  15. Production
      • Several models for each product
      • Ensemble predictions for each product separately
      • Call REST API to push predictions @ scale to Ad Networks

  16. Overall App Flow

  17. Campaign Results
      Chart: Performance Comparison (CTR, Conversion Rate; axis 0% to 250%; series Rule-based, Algo-Bespoke, Algo-MC)
      Chart: Marketing Activity Share (Impressions, Clicks, Conversions; axis 0% to 120%; series Rule-based, Algo-Bespoke, Algo-MC)

  18. DON’T FORGET TO RATE AND REVIEW THE SESSIONS
      SEARCH SPARK + AI SUMMIT
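
Slides 10 and 14 report lift at 5% and 20% but do not spell out how lift is computed. The sketch below assumes the common definition: the conversion rate among the top-scored fraction of visitors divided by the conversion rate over all visitors. The function name `lift_at` and the toy data are hypothetical, and the code is plain Python rather than the Spark job the talk would have used.

```python
from typing import Sequence


def lift_at(scores: Sequence[float], labels: Sequence[int], fraction: float) -> float:
    """Lift in the top `fraction` of visitors ranked by model score.

    Assumed definition: conversion rate inside the top slice divided by
    the conversion rate over all visitors.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("lift is undefined without any positive labels")

    # Rank visitors from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # Size of the top slice, at least one visitor.
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    base_rate = total_positives / len(labels)
    return top_rate / base_rate


# Toy data: ten visitors, the two converters have the highest scores.
toy_scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_lift = lift_at(toy_scores, toy_labels, 0.2)
```

Under this definition a lift of 9.5x at 5% would mean the top 5% of scored visitors convert at 9.5 times the overall rate, which is the kind of figure a campaign can use to size an audience.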

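
Slide 13 describes a process template in which every step takes a dataset and returns a dataset, so steps compose freely. A minimal sketch of that idea follows. It is not Syntasa's implementation: a list of dictionaries stands in for a Spark dataset so the example runs without Spark, and `keep_event`, `count_per_visitor`, and `pipeline` are hypothetical names.

```python
from functools import reduce
from typing import Callable, Dict, List

# Stand-in for a Spark dataset: a list of row dictionaries (an assumption
# made so the sketch runs without Spark).
Dataset = List[Dict[str, object]]
Process = Callable[[Dataset], Dataset]


def keep_event(event_type: str) -> Process:
    """Hypothetical process: keep only rows with the given event type."""
    def run(rows: Dataset) -> Dataset:
        return [row for row in rows if row["event"] == event_type]
    return run


def count_per_visitor(output_column: str) -> Process:
    """Hypothetical process: one output row per visitor with a row count."""
    def run(rows: Dataset) -> Dataset:
        counts: Dict[object, int] = {}
        for row in rows:
            counts[row["visitor"]] = counts.get(row["visitor"], 0) + 1
        return [{"visitor": v, output_column: n} for v, n in counts.items()]
    return run


def pipeline(*processes: Process) -> Process:
    """Compose processes left to right into a single process."""
    def run(rows: Dataset) -> Dataset:
        return reduce(lambda data, process: process(data), processes, rows)
    return run


clicks = [
    {"visitor": "a", "event": "cart_add"},
    {"visitor": "a", "event": "page_view"},
    {"visitor": "b", "event": "cart_add"},
    {"visitor": "a", "event": "cart_add"},
]
cart_adds = pipeline(keep_event("cart_add"), count_per_visitor("cart_adds"))
features = cart_adds(clicks)
```

Because a composed pipeline is itself a process, it can be reused as a single step inside a larger pipeline, and each step can be unit tested on a small in-memory dataset.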