Personalization Journey: From Single Node to Cloud Streaming
Agenda
Stefanos Doltsinis, Machine Learning Architect
Kostas Andrikopoulos, Big Data Architect
About
▪ Kaizen is a top GameTech company in Greece and one of the fastest growing in Europe.
▪ At Kaizen we use technology to offer the best possible products and services to those who trust us for their entertainment.
AIM: Offer personalized services to our customers
▪ Personalized content
▪ Personalized offers
A bit of history - initial workflow
▪ Several data sources
▪ Data warehouse, databases, files, etc.
▪ Training on local workstation
▪ Model / application deployment (Docker)
Architecture Bottlenecks and Challenges
Machine learning
▪ Data
▪ Data availability
▪ Time traveling
▪ Noisy label / no label
▪ Features
▪ Recalculation
▪ Model
▪ Versioning
▪ Experiment tracking / logs
Application
▪ Dedicated VMs
▪ Scalability
▪ Application dockerization
▪ Model versioning
Journey Log: Day 210
▪ Databricks & Azure
▪ Real-time Data flows
▪ Feature creation
▪ Model predictions
▪ Batch Data flows
▪ Model training
▪ ETL
▪ MLflow
▪ Experiment Tracking
▪ Model registry
▪ Delta Lake
▪ Single Source of Truth
▪ ACID transactions
▪ Time travel
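The Delta Lake time travel bullet can be illustrated with a minimal sketch. It assumes a Spark session with Delta Lake available (as on Databricks). The helper name `read_snapshot` and any table path passed to it are hypothetical; `versionAsOf` and `timestampAsOf` are the Delta reader options that pin a read to an earlier table state.

```python
def read_snapshot(spark, path, version=None, timestamp=None):
    """Read a Delta table as of a past version or timestamp (hypothetical helper).

    `spark` is an existing SparkSession; `path` is the Delta table location.
    With neither argument set, the latest state of the table is returned.
    """
    if version is not None and timestamp is not None:
        raise ValueError("pass either version or timestamp, not both")
    reader = spark.read.format("delta")
    if version is not None:
        # Pin the read to a specific commit version of the table.
        reader = reader.option("versionAsOf", version)
    elif timestamp is not None:
        # Pin the read to the table state at a given timestamp string.
        reader = reader.option("timestampAsOf", timestamp)
    return reader.load(path)
```

Reading training data from a pinned version is one way to keep a training run reproducible after the table has been updated, which addresses the "time traveling" challenge listed earlier.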
Designing Data Pipelines (What, Why) => How
Use case 1: Pipelines with low latency
▪ What, Why
▪ Input:
▪ Structured data stored in Kafka in Avro format
▪ Latency: up to 10 seconds
▪ Output:
▪ Avro messages dispatched to Kafka
▪ Consumed directly by microservices
▪ How
▪ Use Structured Streaming for both:
▪ feature generation
▪ model prediction
▪ Use Kafka for low latency and for pipelining between data flows
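A Kafka-to-Kafka flow of this kind might be wired up roughly as follows. This is a sketch under stated assumptions, not the speakers' implementation: the function name, topic names, broker address, checkpoint directory, and the 1-second trigger are all hypothetical. The `transform` argument stands in for the steps named on the slide (Avro decoding, for instance with `from_avro` from `pyspark.sql.avro.functions`, then feature generation, model prediction, and re-encoding into the `value` column that the Kafka sink expects).

```python
def build_prediction_stream(spark, bootstrap_servers, in_topic, out_topic,
                            checkpoint_dir, transform):
    """Start a Kafka -> transform -> Kafka Structured Streaming query (sketch)."""
    # Subscribe to the input topic; each row carries binary `key` and `value`.
    source = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", in_topic)
        .load()
    )

    # Hypothetical user-supplied step: decode, build features, score, re-encode.
    scored = transform(source)

    # Publish results to the output topic for downstream consumers.
    return (
        scored.writeStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("topic", out_topic)
        .option("checkpointLocation", checkpoint_dir)
        .trigger(processingTime="1 second")  # assumed micro-batch interval
        .start()
    )
```

Chaining two such queries through an intermediate topic (one for features, one for predictions) is one way to read "pipelining between data flows".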
Designing Data Pipelines (What, Why) => How
Use case 2: Pipelines with average latency
▪ What, Why
▪ Input:
▪ Structured data stored in Kafka in Avro format
▪ Delta tables
▪ Latency: a few minutes
▪ Output:
▪ Delta tables
▪ PostgreSQL tables
▪ How
▪ Use Structured Streaming for both:
▪ feature generation
▪ model prediction
▪ Use batch processing for feature vector generation
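For the second use case, one common way to land a stream in both Delta and PostgreSQL is `foreachBatch`, sketched below. The helper names, the Delta path, the JDBC URL, the table name, and the 1-minute trigger are assumptions for illustration; the talk does not say which sink mechanism was actually used.

```python
def make_batch_writer(delta_path, jdbc_url, table, jdbc_props):
    """Build a foreachBatch callback that writes to Delta and PostgreSQL (sketch)."""
    def write_batch(batch_df, batch_id):
        # Cache the micro-batch so it is not recomputed for the second sink.
        batch_df.persist()
        try:
            batch_df.write.format("delta").mode("append").save(delta_path)
            batch_df.write.jdbc(url=jdbc_url, table=table,
                                mode="append", properties=jdbc_props)
        finally:
            batch_df.unpersist()
    return write_batch


def start_feature_stream(stream_df, checkpoint_dir, writer):
    """Run `writer` on every micro-batch of a streaming DataFrame."""
    return (
        stream_df.writeStream
        .foreachBatch(writer)
        .option("checkpointLocation", checkpoint_dir)
        .trigger(processingTime="1 minute")  # assumed; matches minutes-level latency
        .start()
    )
```

`foreachBatch` gives at-least-once delivery by default, so a replayed micro-batch can be appended twice; deduplicating on `batch_id` or using an upsert would be needed if the PostgreSQL table must stay exact.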
Personalization Journey
Sportsbook Personalization
▪ Some numbers
▪ ~3K unique games per day
▪ ~ breaks down to markets
▪ ~300K unique events per year
▪ Our aim is to
▪ provide personalized content
▪ improve experience
▪ increase loyalty
Architecture and technical overview
▪ Collaborative filtering
▪ Rating utility matrix
▪ Historical customer preferences
▪ Spark MLlib - ALS
▪ Daily training
▪ ~600M transactions annually
▪ ~400K customers / ~300K unique events
▪ ~500M daily recommendations
▪ Dynamic content matching
▪ Mean average precision (MAP), top 100: ~0.7
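The MAP figure above is a ranking metric, and a minimal pure-Python sketch shows how such a number can be computed. The function names are hypothetical, and the normalization by `min(k, number of relevant items)` is one common convention; the slides do not state which variant was used.

```python
def average_precision_at_k(recommended, relevant, k=100):
    """Average precision of one ranked list, truncated at k."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for rank, item in enumerate(recommended[:k], start=1):
        # Count each relevant item once, at the first rank where it appears.
        if item in relevant and item not in seen:
            hits += 1
            score += hits / rank  # precision at this rank
        seen.add(item)
    return score / min(k, len(relevant))


def mean_average_precision(recs_by_user, relevant_by_user, k=100):
    """Mean of AP@k over users that have at least one relevant item."""
    scores = [
        average_precision_at_k(recs_by_user.get(user, []), relevant, k)
        for user, relevant in relevant_by_user.items()
        if relevant
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Here `recs_by_user` maps each customer to a ranked list of recommended events, and `relevant_by_user` maps each customer to the events they actually interacted with in the evaluation window.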
Personalization Journey
Real-Time Bonus Computation
▪ Rewards increase loyalty
▪ ~40% of customer support communication
▪ ~4.5M bonus reward assessments per year
▪ Manual and periodic assessments
▪ Real-time decision on bonus eligibility and allocation
Architecture and technical overview
▪ Feature / prediction streaming
▪ Binary classification / MLlib gradient boosting
▪ MLflow
▪ Experiment tracking
▪ Model deployment
▪ Model registry
Future steps
▪ Real-time applications
▪ Feature store and reusability
▪ Cassandra
▪ MLflow Model Serving
▪ Use Redis for key-value lookup use cases
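As an illustration of the key-value lookup idea, per-customer features could be kept in a Redis hash and fetched by customer ID at prediction time. This is only a sketch of a planned direction: the key layout and helper names are hypothetical, and `client` is assumed to be a redis-py style client exposing `hset` and `hgetall`.

```python
def feature_key(customer_id):
    # Hypothetical key layout: one hash per customer.
    return f"features:{customer_id}"


def _to_text(value):
    # Clients created without decode_responses=True return bytes.
    return value.decode("utf-8") if isinstance(value, bytes) else value


def store_features(client, customer_id, features):
    """Write numeric features for one customer into a hash."""
    payload = {name: str(value) for name, value in features.items()}
    client.hset(feature_key(customer_id), mapping=payload)


def load_features(client, customer_id):
    """Fetch all features for one customer; empty dict if none are stored."""
    raw = client.hgetall(feature_key(customer_id))
    return {_to_text(name): float(_to_text(value)) for name, value in raw.items()}
```

A single `hgetall` per request keeps the lookup to one round trip, which suits the real-time applications listed above.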