Personalization Journey: From Single Node to Cloud Streaming

Agenda
Stefanos Doltsinis
Machine Learning Architect
Kostas Andrikopoulos
Big Data Architect

About
▪ Kaizen is a top GameTech company in Greece and one of the fastest growing in Europe.
▪ At Kaizen we use technology to offer the best possible products and services to those who trust us for their entertainment.

AIM: Offer personalized services to our customers
▪ Personalized content
▪ Personalized offers

A bit of history - initial workflow
▪ Several data sources
▪ Data warehouse, databases, files, etc.
▪ Training on local workstation
▪ Model / application deployment (Docker)

Architecture Bottlenecks and Challenges
▪ Data
▪ Data availability
▪ Time traveling
▪ Noisy label / no label
▪ Features
▪ Recalculation
▪ Model
▪ Versioning
▪ Experiment tracking / logs
▪ Dedicated VMs
▪ Scalability
▪ Application dockerization
▪ Model versioning
Application / Machine learning

Journey Log: Day 210
▪ Databricks & Azure
▪ Real-time Data flows
▪ Feature creation
▪ Model predictions
▪ Batch Data flows
▪ Model training
▪ ETL
▪ MLflow
▪ Experiment Tracking
▪ Model registry
▪ Delta Lake
▪ Single Source of Truth
▪ ACID transactions
▪ Time travel
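
The Delta Lake time travel listed above can be sketched roughly as follows. This is a minimal illustration, not the team's actual code: the helper names and the table path in the usage note are hypothetical, and `spark` is assumed to be an existing SparkSession with Delta Lake available (as on Databricks). The `versionAsOf` and `timestampAsOf` reader options are the ones Delta Lake documents for reading past snapshots.

```python
from typing import Optional


def time_travel_options(version: Optional[int] = None,
                        timestamp: Optional[str] = None) -> dict:
    """Build Delta reader options that select one past snapshot of a table."""
    # Exactly one selector must be given: a version number or a timestamp.
    if (version is None) == (timestamp is None):
        raise ValueError("pass exactly one of version or timestamp")
    if version is not None:
        return {"versionAsOf": str(version)}
    return {"timestampAsOf": timestamp}


def read_snapshot(spark, path: str, **selector):
    """Load a Delta table as it was at a given version or timestamp.

    `spark` is an existing SparkSession (assumption); `path` is a
    hypothetical table location.
    """
    reader = spark.read.format("delta")
    for key, value in time_travel_options(**selector).items():
        reader = reader.option(key, value)
    return reader.load(path)
```

For example, `read_snapshot(spark, "/mnt/delta/features", version=12)` (hypothetical path and version) would return the feature table as it was at that version, which is one way to rebuild a training set exactly as it looked when the labels were produced.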

Designing Data Pipelines (What, Why) => How
Use case 1. Pipelines with low latency
▪ What, Why
▪ Input:
▪ Structured data stored in Kafka in Avro format
▪ Latency up to 10 sec
▪ Output:
▪ Avro messages dispatched to Kafka
▪ consumed directly by microservices
▪ How
▪ Use Structured Streaming for both:
▪ feature generation
▪ model prediction
▪ Use Kafka for low latency and pipelining between data flows
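
A rough PySpark sketch of the low-latency flow described above: read Avro from Kafka, score, write Avro back to Kafka. Everything specific here is an assumption for illustration: the event schema, column names, topic names, the one-second trigger, and `model`, which is taken to be an already fitted Spark ML `PipelineModel` that also handles feature assembly. For brevity the sketch collapses feature generation and prediction into one stream, whereas the slides describe them as separate flows pipelined through Kafka. It also assumes plain Avro payloads (not the schema-registry wire format) and that the `spark-avro` package is on the cluster.

```python
import json

# Hypothetical Avro schema for the incoming events (not from the slides).
EVENT_SCHEMA = json.dumps({
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "customer_id", "type": "long"},
        {"name": "amount", "type": "double"},
    ],
})


def kafka_options(servers: str, topic: str, source: bool) -> dict:
    """Options for the Kafka source (subscribe) or the Kafka sink (topic)."""
    options = {"kafka.bootstrap.servers": servers}
    options["subscribe" if source else "topic"] = topic
    return options


def start_low_latency_pipeline(spark, model, servers, in_topic, out_topic,
                               checkpoint):
    """Start a streaming query: Kafka (Avro) -> model -> Kafka (Avro)."""
    # Imported lazily so the helpers above work without Spark installed.
    from pyspark.sql.avro.functions import from_avro, to_avro
    from pyspark.sql.functions import col, struct

    raw = (spark.readStream.format("kafka")
           .options(**kafka_options(servers, in_topic, source=True))
           .load())

    # Decode the Avro payload carried in the Kafka `value` column.
    events = (raw.select(from_avro(col("value"), EVENT_SCHEMA).alias("e"))
              .select("e.*"))

    # Assumption: the fitted pipeline adds a `prediction` column.
    scored = model.transform(events)

    # The Kafka sink expects a binary or string column named `value`.
    out = scored.select(
        to_avro(struct(col("customer_id"), col("prediction"))).alias("value"))

    return (out.writeStream.format("kafka")
            .options(**kafka_options(servers, out_topic, source=False))
            .option("checkpointLocation", checkpoint)
            .trigger(processingTime="1 second")  # assumed micro-batch interval
            .start())
```

A short trigger interval is one way to stay within a latency budget of a few seconds; the right value depends on cluster size and message volume.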

Designing Data Pipelines (What, Why) => How
Use case 2. Pipelines with average latency
▪ What, Why
▪ Input:
▪ Structured data stored in Kafka in Avro format
▪ Delta tables
▪ Latency of a few minutes
▪ Output:
▪ Delta tables
▪ PostgreSQL tables
▪ How
▪ Use Structured Streaming for both:
▪ feature generation
▪ model prediction
▪ Use batch processing for feature vector generation
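
One possible way to write each micro-batch of predictions to both a Delta table and a PostgreSQL table is Structured Streaming's `foreachBatch`. The sketch below is an assumption about how such a sink could look, not the pipeline from the talk: the function names, paths, table name, one-minute trigger and JDBC settings are all hypothetical, and `scored_stream` is taken to be a streaming DataFrame that already holds predictions.

```python
def jdbc_properties(user: str, password: str) -> dict:
    """Connection properties for the PostgreSQL JDBC driver."""
    return {"user": user, "password": password,
            "driver": "org.postgresql.Driver"}


def make_batch_writer(delta_path: str, jdbc_url: str, table: str,
                      properties: dict):
    """Return a foreachBatch function that writes to Delta and PostgreSQL."""
    def write_batch(batch_df, batch_id):
        # Cache the micro-batch so it is not recomputed for the second sink.
        batch_df.persist()
        try:
            batch_df.write.format("delta").mode("append").save(delta_path)
            # Plain appends are not idempotent: a retried batch can
            # duplicate rows in PostgreSQL unless the table deduplicates.
            batch_df.write.jdbc(url=jdbc_url, table=table, mode="append",
                                properties=properties)
        finally:
            batch_df.unpersist()
    return write_batch


def start_average_latency_pipeline(scored_stream, writer, checkpoint: str):
    """Run the sink every minute (assumed interval for 'a few minutes')."""
    return (scored_stream.writeStream
            .foreachBatch(writer)
            .option("checkpointLocation", checkpoint)
            .trigger(processingTime="1 minute")
            .start())
```

Writing through `foreachBatch` keeps a single streaming query for both outputs, at the cost of handling retries and duplicates on the PostgreSQL side yourself.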

Sportsbook Personalization
▪ Some numbers
▪ ~3K unique games per day
▪ ~ breaks down to markets
▪ ~300K unique events per year
▪ Our aim is to
▪ provide personalized content
▪ improve the experience
▪ increase loyalty

Architecture and technical overview
▪ Collaborative filtering
▪ Rating utility matrix
▪ Historical customer preferences
▪ Spark MLlib - ALS
▪ Daily trainings
▪ ~600M transactions annually
▪ ~400K customers / ~300K unique events
▪ ~500M daily recommendations
▪ Dynamic content matching
▪ Mean average precision (MAP), top 100: ~0.7
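
To make the MAP figure above concrete, here is a small pure-Python sketch of mean average precision at k. It follows one common definition (average precision normalised by `min(number of relevant items, k)`); the slides do not say which variant was used, so treat the normalisation as an assumption. Function names and the sample data are illustrative only, and each recommendation list is assumed to contain distinct items.

```python
from typing import Dict, List, Set


def average_precision_at_k(recommended: List[str], relevant: Set[str],
                           k: int) -> float:
    """Average precision of one ranked list, cut off at position k."""
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant), k)


def mean_average_precision_at_k(recs: Dict[str, List[str]],
                                truth: Dict[str, Set[str]],
                                k: int = 100) -> float:
    """Mean of average precision over customers with at least one relevant item."""
    users = [user for user, items in truth.items() if items]
    if not users:
        return 0.0
    total = sum(average_precision_at_k(recs.get(user, []), truth[user], k)
                for user in users)
    return total / len(users)


# Illustrative data: two customers, ranked event recommendations.
recs = {"c1": ["a", "b", "c"], "c2": ["x"]}
truth = {"c1": {"a", "c"}, "c2": {"y"}}
print(mean_average_precision_at_k(recs, truth, k=3))
```

In this toy example the first customer scores (1/1 + 2/3) / 2 and the second scores 0, so the mean is 5/12. In a Spark setting the recommendation lists would typically come from the fitted ALS model rather than from hand-written dictionaries.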

Real Time Bonus Computation
▪ Reward increases loyalty
▪ ~40% of customer support communication
▪ ~4.5M bonus reward assessments per year
▪ Manual and periodic assessments
▪ Real-time decision on bonus eligibility and allocation

Architecture and technical overview
▪ Feature / prediction streaming
▪ Binary Classification / MLlib Gradient Boosting
▪ MLflow
▪ Experiment tracking
▪ Model deployment
▪ Model registry
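
The combination above (MLlib gradient boosting for binary classification, tracked and registered with MLflow) could look roughly like the sketch below. It is illustrative only: the hyperparameter values, column names, metric name and registered model name are assumptions, and `train_df` is taken to be a Spark DataFrame with numeric feature columns and a 0/1 label. The metric is computed on the training data purely to show the logging call; a real run would evaluate on a held-out set.

```python
from typing import List, Optional


def gbt_params(max_iter: int = 50, max_depth: int = 5,
               step_size: float = 0.1) -> dict:
    """Hypothetical hyperparameters for MLlib's GBTClassifier."""
    return {"maxIter": max_iter, "maxDepth": max_depth, "stepSize": step_size}


def train_and_register(train_df, feature_cols: List[str], label_col: str,
                       registered_name: str, params: Optional[dict] = None):
    """Fit a GBT pipeline, log it to MLflow and register the model."""
    # Imported lazily so gbt_params() works without Spark or MLflow installed.
    import mlflow
    import mlflow.spark
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import GBTClassifier
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.feature import VectorAssembler

    params = params or gbt_params()
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    gbt = GBTClassifier(featuresCol="features", labelCol=label_col, **params)

    with mlflow.start_run():
        mlflow.log_params(params)  # experiment tracking
        model = Pipeline(stages=[assembler, gbt]).fit(train_df)

        # Area under ROC on the training data, for illustration only.
        evaluator = BinaryClassificationEvaluator(labelCol=label_col)
        mlflow.log_metric("train_auc", evaluator.evaluate(model.transform(train_df)))

        # Logging with a registered name also creates a registry version.
        mlflow.spark.log_model(model, "model",
                               registered_model_name=registered_name)
    return model
```

Keeping the feature assembler inside the logged pipeline means the streaming job can load one artifact from the registry and call `transform` directly on raw feature columns.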

Future steps
▪ Real-time applications
▪ Feature store and reusability
▪ Cassandra
▪ MLflow Model Serving
▪ Use Redis for key-value lookup use cases
