[DSC Europe 23] Stefano Gemin & Domen Pozrl - Detecting Fraud in Sport.pdf


DataScienceConferenc1

Data & Analytics

Detecting fraud in sport
Domen Požrl, Stefano Gemin
23.11.2023

Agenda
- Sportradar’s key datasets
- The Fraud Detection System journey
- The Model and the Challenges
- From Day After to Real Time
- The Inference Streaming Pipeline

Sportradar’s key datasets

BET (source: MTS bookmakers)
Fields:
- Bookmaker
- Channel (online, retail)
- Bet status (accepted, rejected, …)
- Stake
- Profit / Loss
- Sport event(s)
- Market(s) / outcome(s)
- Odds
- User info
Volume:
- 200+ bookmakers with bet integration with Sportradar Managed Trading Services (MTS)
- ~5 billion bets per year

LIVE DATA (sources: computer vision, livescore crawlers, data journalists)
Fields:
- Event details (status, time, score)
- Timeline (e.g. shots, fouls, goals)
- Tracking data (players, ball, …)
Volume:
- 900,000+ sport events per year covered live
- 9,500+ data journalists globally
- 70+ sports

ODDS CHANGE (source: odds crawlers)
Fields:
- Bookmaker
- Sport event
- Event details (status, time, score)
- Market
- Odds
Volume:
- 600+ bookmakers crawled
- ~25 billion odds movements per year

SPORT EVENT ANALYSIS (source: integrity analysts)
Fields:
- Hotlisting data
- Escalation data
Volume:
- Over 9,000 matches escalated since 2009

ANALYTICS PLATFORM
- Data lake: model training, batch inferencing, performance monitoring
- Kafka: real-time inferencing
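As an illustration of how the BET and ODDS CHANGE records listed on this slide might be represented in code, here is a minimal sketch using Python dataclasses. The class names, field names, and types are assumptions made for the example; the slide only lists the fields and does not describe Sportradar's actual schemas.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EventDetails:
    """Hypothetical container for 'Event details (status, time, score)'."""
    status: str       # e.g. "live", "ended" (assumed values)
    match_time: str   # e.g. "45'"
    score: str        # e.g. "1:0"


@dataclass(frozen=True)
class Bet:
    """Hypothetical BET record; field names mirror the slide's bullet list."""
    bookmaker: str
    channel: str                       # "online" or "retail"
    status: str                        # "accepted", "rejected", ...
    stake: float
    sport_event_ids: Tuple[str, ...]   # a bet can cover several events
    outcomes: Tuple[str, ...]          # market/outcome per event
    odds: float
    profit_loss: Optional[float] = None  # unknown until the bet settles
    user_info: Optional[dict] = None


@dataclass(frozen=True)
class OddsChange:
    """Hypothetical ODDS CHANGE record produced by the odds crawlers."""
    bookmaker: str
    sport_event_id: str
    event_details: EventDetails
    market: str
    odds: float


# Example values are invented for illustration only.
example_bet = Bet(
    bookmaker="bookmaker-a",
    channel="online",
    status="accepted",
    stake=100.0,
    sport_event_ids=("match-1",),
    outcomes=("green-team-wins",),
    odds=90.0,
)
example_odds = OddsChange(
    bookmaker="bookmaker-a",
    sport_event_id="match-1",
    event_details=EventDetails(status="live", match_time="45'", score="0:1"),
    market="match-winner",
    odds=90.0,
)
```

Keeping the records immutable (`frozen=True`) is a design choice for the sketch: messages read from a stream are usually treated as facts that downstream feature code should not modify.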

The Fraud Detection System journey
(Years shown on the slide's timeline: 2018, 2020, 2022)

Alerting system
- SME-driven
- Looks at anomalous odds movements or values
- 125,000+ alerts per year
- Less than 4% of them are suspicious after manual review

Alert scoring model
- Binary classification model (on alerts)
- Features: alert data + match properties
- 70% of alerts automatically classified as not suspicious

Match Fixing Detection model, day after
- Binary classification model (on matches)
- Batch inference on matches of the previous day
- Features: alert scores + betting data + match properties

Match Fixing Detection model, real time
- Features: a subset of the previous model's features (some cannot be computed in real time)
- Inference window: from one day before kick-off until the match is over
- In 2023, up to Q4, 814 escalated matches were initially flagged by the model (160 of them uniquely)
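To put the alert figures above into perspective, the following back-of-the-envelope calculation turns the percentages into counts. It assumes, for the sake of the arithmetic, that the 4% and 70% shares both apply to the same yearly volume of 125,000 alerts; the slide gives "125,000+" and "less than 4%", so the results are rough bounds rather than reported numbers.

```python
# Figures taken from the slide; treating them as exact is an assumption.
ALERTS_PER_YEAR = 125_000
SUSPICIOUS_PERCENT_UPPER_BOUND = 4   # "less than 4%" after manual review
AUTO_DISMISSED_PERCENT = 70          # classified as not suspicious by the model

# Integer arithmetic keeps the counts exact.
max_suspicious = ALERTS_PER_YEAR * SUSPICIOUS_PERCENT_UPPER_BOUND // 100
auto_dismissed = ALERTS_PER_YEAR * AUTO_DISMISSED_PERCENT // 100
left_for_manual_review = ALERTS_PER_YEAR - auto_dismissed

print(f"At most ~{max_suspicious:,} suspicious alerts per year")
print(f"~{auto_dismissed:,} alerts dismissed automatically")
print(f"~{left_for_manual_review:,} alerts left for manual review")
```

Under these assumptions, the alert scoring model reduces the manual review load from 125,000 to roughly 37,500 alerts per year.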

The Model and the Challenges

Binary classification model
- Maximising the AUC on the validation set over multiple iterations

Extremely unbalanced dataset (9,325 escalations out of 4,522,387)
- Target variable: Hotlisted
- Weighted by: Escalated

Seasonality of sport events
- Random split on 3 years of historical data

Keeping up with evolving match-fixing trends and unexpected event outcomes
- Human in the loop
- Continuous retraining

Model explainability
- Crucial to provide evidence of fixing
- Feature contribution at inference time
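The imbalance figure and the "target: Hotlisted, weighted by: Escalated" setup can be sketched as follows. The escalation rate is computed directly from the slide's numbers. The weighting function is only one possible reading of the slide: it assumes the label is the hotlisted flag and that escalated matches receive a larger sample weight. The weight value of 5.0 and the function names are hypothetical.

```python
ESCALATIONS = 9_325
TOTAL = 4_522_387  # total count given on the slide


def positive_rate(positives: int, total: int) -> float:
    """Share of positive examples in the dataset."""
    return positives / total


def sample_weight(escalated: bool, escalated_weight: float = 5.0) -> float:
    """Assumed weighting scheme: escalated matches count more during training.

    The value 5.0 is a placeholder, not a figure from the talk.
    """
    return escalated_weight if escalated else 1.0


rate = positive_rate(ESCALATIONS, TOTAL)
print(f"Escalation rate: {rate:.3%} (about 1 in {round(TOTAL / ESCALATIONS)})")

# Tiny invented dataset: (hotlisted, escalated) flags per match.
matches = [(False, False), (True, False), (True, True)]
labels = [int(hotlisted) for hotlisted, _ in matches]
weights = [sample_weight(escalated) for _, escalated in matches]
```

Most gradient-boosting and linear-model libraries accept per-row weights of this kind at training time, which lets the broader hotlisted label provide more positives while the confirmed escalations dominate the loss.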

From Day After to Real Time

Example:
• A single bet placed at 45’
• Odds: 90
• Stake: 100€
• Outcome: Green Team wins
• Changes in the probability for the outcome to occur

Timeline (45', 60', End):
• Current time 45': potential payout 9,000€, est. probability 1%, expected profit 9€
• Current time 60': potential payout 9,000€, est. probability 25%, expected profit -2,175€
• Current time end of match: potential payout 9,000€, est. probability 100%, expected profit -9,000€
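The slide does not state how the expected profit is computed. One formula that reproduces all three figures, from the bookmaker's point of view, is: keep the stake with probability (1 - p), pay out stake × odds with probability p. The sketch below uses that inferred formula; the function name and the formula itself are assumptions consistent with the slide's numbers, not a confirmed description of Sportradar's calculation.

```python
def expected_profit(stake: float, odds: float, probability: float) -> float:
    """Bookmaker's expected profit on a single bet (inferred formula).

    With probability (1 - p) the bet loses and the bookmaker keeps the stake;
    with probability p the bet wins and the bookmaker pays stake * odds.
    """
    potential_payout = stake * odds
    return (1 - probability) * stake - probability * potential_payout


STAKE = 100.0
ODDS = 90.0

# Estimated probability of "Green Team wins" at each point in the match.
timeline = [("45'", 0.01), ("60'", 0.25), ("End of match", 1.00)]

for current_time, p in timeline:
    profit = expected_profit(STAKE, ODDS, p)
    print(f"{current_time}: expected profit {round(profit, 2):,} EUR")
```

The point of the example is that the same accepted bet looks harmless at 45' and very costly at 60', which is why scoring matches only the day after loses information that a real-time model can use.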

The Inference Streaming Pipeline
• Processes up to 2,000 messages per second
• Outputs around 3 predictions per second
• Score, time, alert-based predictions
• Feature construction
• Consumes 3 Kafka topics and outputs to 1
• Modelling resources fetched from S3 on startup
• Aggregated features periodically retrieved from Redshift
• Outputs to S3 via NiFi, with Athena on top for analytics
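One way to read "score, time, alert-based predictions" together with the gap between roughly 2,000 input messages and 3 predictions per second is that most messages only update per-match state, and a prediction is emitted when a trigger fires. The sketch below illustrates that reading in plain Python, without a Kafka client. The topic names, the 15-minute interval, and the trigger logic are all assumptions; the slide does not describe them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchState:
    """Per-match state kept by the (hypothetical) streaming job."""
    score: str = "0:0"
    last_prediction_minute: Optional[int] = None


def prediction_trigger(
    state: MatchState,
    topic: str,
    payload: dict,
    minute: int,
    interval: int = 15,
) -> Optional[str]:
    """Return which trigger fires for this message, or None.

    Topic names ("alerts", "live_data", anything else) are placeholders.
    """
    trigger = None
    if topic == "alerts":
        trigger = "alert"
    elif topic == "live_data" and payload.get("score", state.score) != state.score:
        state.score = payload["score"]
        trigger = "score"
    elif (
        state.last_prediction_minute is None
        or minute - state.last_prediction_minute >= interval
    ):
        trigger = "time"
    if trigger is not None:
        state.last_prediction_minute = minute
    return trigger


# Invented message sequence for one match: (topic, payload, minute).
messages = [
    ("odds_change", {}, 0),
    ("odds_change", {}, 5),
    ("live_data", {"score": "1:0"}, 7),
    ("live_data", {"score": "1:0"}, 8),
    ("alerts", {}, 9),
    ("odds_change", {}, 24),
]
state = MatchState()
triggers = [prediction_trigger(state, t, p, m) for t, p, m in messages]
```

In this toy run, four of the six messages lead to a prediction; in a real stream dominated by odds changes the ratio would be far lower, in line with the throughput figures on the slide.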
