Data Pipelines and Telephony Fraud Detection Using Machine Learning

Presented by
Eugene Shulga, Platform Engineer
Elana Woldenberg, Platform Engineer

Agenda
1. Data Pipelines
2. Fraud Detection

Data Pipelines

Massive amount of data
• CDRs (Call Detail Records): hundreds of millions
• SIP messages: billions
• LRN (Local Routing Number): hundreds of millions

Telnyx Recipe
• Message routing and reliable delivery (Kafka, RabbitMQ)
• Storage (Cassandra, Postgres)
• Real-time aggregation (Spark Streaming)
• Batch and ad-hoc analysis (Spark and Notebooks)
• Visualization (Kibana, Grafana)

Cloud Agnostic
Requirements
• Cannot use cloud-specific data solutions
• Flexible enough for high availability (HA)
• All services and servers are built with Docker
• A single deployment script for any cloud, using Docker, Swarm and Ansible
Challenges
• Every cloud is different: different APIs, hardware profiles, and performance
• How to handle data migration and replication?

FreeSWITCH Data Pipeline
[Pipeline diagram, including a Fraud Detection stage]
• All the data flows to Apache Kafka
• Spark Streaming for real-time processing
• Cassandra and Spark batch jobs for hourly, daily, and weekly analysis
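The slides do not include the batch jobs themselves. As a minimal sketch in plain Python (standing in for a Spark job over Cassandra), assuming each CDR carries a customer ID and a call start time under the hypothetical keys `customer` and `start`, the hourly, daily, and weekly analyses begin by assigning each call to a time bucket and aggregating per bucket:

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_start(ts: datetime, granularity: str) -> datetime:
    """Truncate a timestamp to the start of its hourly, daily, or weekly bucket."""
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "day":
        return day
    if granularity == "week":
        # Assumption: weeks start on Monday; datetime.weekday() returns 0 for Monday.
        return day - timedelta(days=day.weekday())
    raise ValueError(f"unknown granularity: {granularity!r}")


def rollup(cdrs, granularity: str) -> Counter:
    """Count calls per (customer, bucket start) for one batch granularity."""
    counts = Counter()
    for cdr in cdrs:
        counts[(cdr["customer"], bucket_start(cdr["start"], granularity))] += 1
    return counts
```

Calling `rollup(cdrs, "week")` returns a `Counter` keyed by `(customer, week start)`. A production job would also need to fix the time zone used for bucket boundaries, which the slides do not specify.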

Kafka
Pros
• High-throughput distributed messaging
• Automatic recovery from broker failures
• Decouples data pipelines
• Handles massive data loads
• Data distribution and partitioning across nodes
• Distributed log implementation
Cons
• Operational overhead: ZooKeeper, plus support/monitoring tools
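Two of the pros above, partitioning across nodes and the distributed log, can be illustrated with a toy in-memory model. This is not Kafka's implementation, which adds on-disk segments, replication, and consumer groups. The class names are hypothetical, and the hash is `zlib.crc32` rather than the murmur2 hash used by Kafka's Java client. The sketch shows that records with the same key land in the same partition in order, and that consumers read by offset:

```python
import zlib


class Partition:
    """An append-only log: every record gets the next sequential offset."""

    def __init__(self):
        self.records = []

    def append(self, key: str, value: dict) -> int:
        self.records.append((key, value))
        return len(self.records) - 1  # offset of the record just written

    def read_from(self, offset: int):
        """Consumers keep their own offset, so they can resume or replay."""
        return self.records[offset:]


class Topic:
    def __init__(self, num_partitions: int):
        self.partitions = [Partition() for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Stable hash: the same key always maps to the same partition,
        # which preserves ordering per key (e.g. per call ID).
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key: str, value: dict):
        """Append to the key's partition; return (partition, offset)."""
        p = self.partition_for(key)
        return p, self.partitions[p].append(key, value)
```

Keying SIP events by call ID, for example, keeps all events for one call in one partition, so a single consumer sees them in order.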

Apache Spark Programming Model
• RDD (Resilient Distributed Dataset): a collection of objects stored in memory or on disk across the cluster
• RDDs support actions and transformations
• All transformations are lazy; once an action is called, Spark creates a DAG (Directed Acyclic Graph) and submits it to the scheduler
• The task scheduler launches tasks via a cluster manager (Spark Standalone, YARN, Mesos)
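The lazy-evaluation point can be made concrete with a toy class (hypothetical, not part of Spark): transformations only record a step, and nothing runs until an action such as `collect()` is called. Spark itself compiles the recorded lineage into a DAG of stages and hands tasks to the scheduler; this sketch keeps only a linear chain and runs it locally:

```python
class LazyDataset:
    """Toy model of RDD laziness: transformations record steps, actions run them."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: returns a new dataset; nothing is computed yet.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also just recorded.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: only now is the recorded chain executed, item by item.
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out

    def count(self):
        # Another action; like an uncached RDD, it recomputes the chain.
        return len(self.collect())
```

In PySpark the same shape is `sc.parallelize(data).map(f).filter(p).collect()`, where only `collect()` triggers a job.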

Spark Cassandra Integration
[Diagram: an App and a Spark Master (JVM); Node 1 to Node N each run a Spark Worker (JVM) with Executors next to a Cassandra node]
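Co-locating a Spark Worker with each Cassandra node is about data locality: roughly, the Spark Cassandra Connector reports the Cassandra replica hosts of each Spark partition as its preferred locations, so tasks can read local data. The toy assignment below (hypothetical function, executor, and host names; Spark's real scheduler uses delay scheduling across locality levels) prefers an executor on a replica host and otherwise falls back to the least-loaded executor:

```python
def assign_tasks(partitions, executors):
    """Map each partition to an executor, preferring one on a replica host.

    partitions: dict of partition id -> list of hosts holding a replica
    executors:  dict of executor id -> host the executor runs on
    """
    load = {e: 0 for e in executors}
    plan = {}
    for pid, replicas in sorted(partitions.items()):
        local = [e for e, host in executors.items() if host in replicas]
        candidates = local or list(executors)  # no local executor: remote read
        # Least-loaded candidate; ties broken by executor id for determinism.
        chosen = min(candidates, key=lambda e: (load[e], e))
        plan[pid] = chosen
        load[chosen] += 1
    return plan
```

Partitions whose replicas live on a node without an executor still get scheduled, but their reads cross the network.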

Cassandra Data Modeling
CDR use cases
• Internal metrics/aggregates across all customers
• Historical and real-time analytics (per user, date)
• Metrics (ASR: answer-seizure ratio, ACD: average call duration, MOU: minutes of use, etc.) for customers and dashboards
• Customer Insights
• Access to raw FreeSWITCH CDRs for troubleshooting
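The slide lists ASR, ACD, and MOU without formulas. Here is a minimal sketch using common industry definitions: ASR is answered calls divided by call attempts, as a percentage; ACD is total talk time divided by answered calls; MOU is total talk time in minutes. The `CDR` record type and its fields are hypothetical. Whether MOU is computed from billed increments or raw seconds is not stated in the slides, so raw seconds are assumed:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CDR:
    customer: str
    answered: bool
    duration_s: int  # talk time in seconds; 0 for unanswered attempts


def call_metrics(cdrs):
    """ASR in percent, ACD and MOU in minutes, for one group of CDRs."""
    attempts = len(cdrs)
    answered = [c for c in cdrs if c.answered]
    mou = sum(c.duration_s for c in answered) / 60
    return {
        "ASR": 100 * len(answered) / attempts if attempts else 0.0,
        "ACD": mou / len(answered) if answered else 0.0,
        "MOU": mou,
    }


def metrics_by_customer(cdrs):
    """Group CDRs per customer, as a per-customer dashboard query might."""
    groups = defaultdict(list)
    for c in cdrs:
        groups[c.customer].append(c)
    return {customer: call_metrics(group) for customer, group in groups.items()}
```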

Distributed System Challenges
• Idempotency: processing the same message twice has the same effect as processing it once, so retries are safe; this helps with scale and greatly simplifies processing
• Partitioning: split data to handle scale and isolate failures
• Consistency model: a trade-off between throughput and consistency
• Denormalization/duplication: sometimes data redundancy is good
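Idempotency and denormalization work together in a CDR store. In Cassandra, an INSERT whose primary key already exists overwrites the row (an upsert), so replaying a message is harmless when the key is the call ID. The in-memory sketch below uses hypothetical class and field names, with `(customer, day)` standing in for a partition key matching the "per user, date" use case. It writes every CDR to two query-specific tables and shows that a duplicate delivery leaves a single row:

```python
class CdrStore:
    """In-memory stand-in for two query tables holding the same CDRs.

    Denormalization: each table is keyed for one read pattern.
    Idempotency: writes are upserts keyed by call ID, so a redelivered
    message overwrites the same row instead of adding a duplicate.
    """

    def __init__(self):
        self.by_customer_day = {}  # (customer, day) -> {call_id: cdr}
        self.by_call_id = {}       # call_id -> cdr, raw lookup for troubleshooting

    def write(self, cdr: dict) -> None:
        partition = self.by_customer_day.setdefault((cdr["customer"], cdr["day"]), {})
        partition[cdr["call_id"]] = cdr
        self.by_call_id[cdr["call_id"]] = cdr

    def calls_for(self, customer: str, day: str) -> list:
        return list(self.by_customer_day.get((customer, day), {}).values())
```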

Fraud Detection

What is fraud in telecom?
Hint: $$$$

• How does a carrier detect usage fraud?
• What does usage fraud look like?

Steps of Fraud Detection
1. Collect the data
   a. Time series
2. Process the data
   a. Asynchronously
   b. Scale horizontally
3. Detect anomalies
   a. Static
   b. Dynamic
4. Alert

Process the Data
How to handle huge datasets without sacrificing speed or quality?
• Golang + worker pools + asynchronous processing
• Telegraf + InfluxDB + Grafana
Open Source / Proprietary
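The slide names Go worker pools for asynchronous processing. The same pattern, a fixed number of workers reading jobs from a shared queue much like goroutines reading from a channel, is sketched here in Python with `queue` and `threading`. The function name is hypothetical, result order is not preserved, and exceptions raised by `handle` are not handled. Python threads suit I/O-bound work such as reading or writing time series; CPU-bound processing would need processes instead:

```python
import queue
import threading


def run_worker_pool(jobs, handle, num_workers=4):
    """Fan jobs out to a fixed pool of worker threads and gather the results."""
    job_q = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            item = job_q.get()
            if item is None:  # sentinel: no more work for this worker
                return
            results.put(handle(item))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        job_q.put(job)
    for _ in threads:
        job_q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # All workers have exited, so qsize() is stable here.
    return [results.get() for _ in range(results.qsize())]
```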

Detect Anomalies
Static
• Thresholds
Dynamic (predictive)
• Statistics
  - Mean / standard deviation
• Machine learning
  - K-means clustering
  - Multivariate Gaussian distribution
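The slide lists detection methods without formulas. The sketch below shows a static threshold, a mean/standard-deviation (z-score) rule, and a two-feature multivariate Gaussian density, using only the standard library. Function names, the default k = 3, and the example features (say, calls per minute and average call duration) are assumptions for illustration; K-means clustering is not shown:

```python
import math
from statistics import mean, pstdev


def static_alerts(values, threshold):
    """Static rule: indexes of intervals whose value exceeds a fixed threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


def zscore_alerts(history, current, k=3.0):
    """Dynamic rule: True if `current` is more than k std devs from the mean."""
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > k


def fit_gaussian_2d(points):
    """Fit the mean vector and covariance (sxx, sxy, syy) of 2-feature points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)


def gaussian_2d_density(x, mu, cov):
    """Multivariate Gaussian density p(x) for 2 features; low values are suspicious."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # must be > 0, i.e. points not all on a line
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Mahalanobis term d^T Sigma^-1 d via the closed-form 2x2 inverse.
    m = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * m) / (2 * math.pi * math.sqrt(det))
```

With the Gaussian model, a point x would be flagged when p(x) falls below a threshold epsilon; that value is not given in the slides and would have to be tuned, for example on past fraud cases.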

Alert
[Diagram: API and messaging layer; push and pull delivery]
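The alert slide mentions an API, a messaging layer, and push and pull delivery. As a hypothetical in-memory sketch (not Telnyx's design), one buffer can serve both: subscribers are pushed each alert as it fires, and API clients pull whatever is pending. The class name, alert fields, and buffer size are assumptions:

```python
from collections import deque


class AlertBus:
    """Toy messaging layer supporting both push and pull delivery of alerts."""

    def __init__(self, max_buffered=1000):
        self._subscribers = []
        self._pending = deque(maxlen=max_buffered)  # oldest alerts drop first

    def subscribe(self, callback):
        """Push model: callback(alert) is called for every new alert."""
        self._subscribers.append(callback)

    def publish(self, alert: dict) -> None:
        self._pending.append(alert)
        for callback in self._subscribers:
            callback(alert)

    def pull(self, max_items=100):
        """Pull model: an API handler drains up to max_items pending alerts."""
        items = []
        while self._pending and len(items) < max_items:
            items.append(self._pending.popleft())
        return items
```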

Q & A
