SKETCHY: A COMPLEX EVENT PROCESSING NETWORK FOR SPAM DETECTION

Matt Weiden / SoundCloud Ltd.

WHO?
My name is Matt Weiden. Nice to meet you.
• Backend Engineer, SoundCloud’s Trust, Safety & Security Team
• Previously: cognitive science and brain-computer interface (BCI) research

Contributors
• Rany Keddo
• Michael Brückner
• Astera Schneeweisz
• Others

WHAT?
INFERENCE FROM RELATED STREAMS OF DATA
The problem: How quickly and efficiently can we draw aggregate inferences from large streams of related events?
[Figure: streams of posts, views, and follows plotted over time]
What inferences could we make?
How quickly and efficiently can we make them?

DRINKING FROM A FIREHOSE. 
Performing this for a whole site might take a little more thought.
WHAT (MORE SPECIFICALLY)?
INFERENCE FROM RELATED STREAMS OF DATA
How quickly and efficiently can we draw aggregate inferences from large streams of related events?

HOW?
EVENT-DRIVEN ARCHITECTURE
Event-Driven Architecture (EDA)
• Near real-time
• Only process the data once*
• Operate on
    • incremental sub-goal results
    • ‘Complex Events’ built by adding ‘Context’
• Asynchronous, pipelined parallelism
• Broadcast reusable events and complex events

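To make the “near real-time”, “process the data once”, and “incremental sub-goal results” bullets concrete, here is a minimal Python sketch (not Sketchy code) that keeps a per-user event count over a sliding time window. Each event is appended once and evicted at most once, so the running count is updated incrementally instead of being recomputed from history. The class name `SlidingWindowCounter`, its `observe` method, and the assumption that timestamps arrive in order are all illustrative.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Counts events per key over the last `window_seconds`, incrementally.

    Each event is appended once and evicted at most once, so no past
    events are rescanned. Assumes timestamps arrive in non-decreasing order.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.timestamps = defaultdict(deque)  # key -> arrival times, oldest first

    def observe(self, key: str, ts: float) -> int:
        """Record one event for `key` at time `ts`; return the windowed count."""
        times = self.timestamps[key]
        times.append(ts)
        # Evict events that fall before the window [ts - window, ts].
        while times and times[0] < ts - self.window:
            times.popleft()
        return len(times)


counter = SlidingWindowCounter(window_seconds=60)
counts = [counter.observe("user-42", ts) for ts in (0, 10, 20, 75)]
print(counts)  # [1, 2, 3, 2]: at t=75 the events at t=0 and t=10 have expired
```

A count like this is the kind of intermediate result a downstream agent could consume, for example to rate-limit a user, without rescanning past events.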
HOW?
EVENT PROCESSING NETWORK
Event Processing Networks (EPNs) implement EDA.
• Represented as a directed acyclic graph of
    • Event producers
    • Event processing agents (EPAs), which
        • enrich events
        • transform events into complex events
        • detect patterns
    • Event consumers
    • Event channels

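Because an EPN is a directed acyclic graph, every node can be placed after all of its inputs, and a wiring mistake that introduces a cycle can be caught up front. The sketch below assumes a hypothetical network (the node names are made up, not Sketchy's agents) and uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to compute a valid order and to detect a cycle.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical EPN: each node maps to the nodes it receives events from.
network = {
    "producer": set(),
    "enrich_agent": {"producer"},
    "stats_agent": {"producer"},
    "pattern_agent": {"enrich_agent"},
    "consumer": {"pattern_agent", "stats_agent"},
}


def processing_order(graph: dict) -> list:
    """Order nodes so each appears after all of its inputs.

    Raises graphlib.CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(graph).static_order())


order = processing_order(network)
print(order)  # one valid order; producer first, consumer last

# Feeding the consumer's output back into the producer creates a cycle.
cyclic = dict(network, producer={"consumer"})
try:
    processing_order(cyclic)
except CycleError as err:
    print("not a DAG:", err.args[1])  # args[1] holds the detected cycle
```

Several orders can be valid (here `enrich_agent` and `stats_agent` may swap); the guarantee is only that inputs precede the nodes that read them.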
HOW?
EVENT PROCESSING NETWORK
Sketchy is an EPN that implements EDA.
• Prevents text and social-graph spam at SoundCloud
• Open source
• Modular
    • written as a flexible, adaptable library
    • many common components available out of the box
• Battle-tested
    • ingests many sensitive event types at SoundCloud

HOW?
EVENT PROCESSING NETWORK
Event producers introduce events into the network.
[Diagram: a Producer publishing events to Event Channels A, B, and C]

HOW?
EVENT PROCESSING NETWORK
Event channels route events through the network.
[Diagram: Producer or EPA 1 and Producer or EPA 2 connected through an Event Channel to Consumer or EPA 3 and Consumer or EPA 4]

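An event channel can be approximated in-process as a publish/subscribe object that hands every published event to every subscriber. This is a hypothetical, synchronous stand-in; Sketchy's real channels, and any broker behind them, may behave differently (for example, asynchronously or with buffering). `EventChannel`, `subscribe`, and `publish` are illustrative names.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


class EventChannel:
    """Hypothetical synchronous channel: fans each event out to all subscribers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self.subscribers.append(handler)

    def publish(self, event: Event) -> None:
        for handler in self.subscribers:
            handler(event)


received = defaultdict(list)  # consumer name -> ids of events it saw
channel = EventChannel("messages")
channel.subscribe(lambda e: received["consumer_or_epa_3"].append(e["id"]))
channel.subscribe(lambda e: received["consumer_or_epa_4"].append(e["id"]))

# Two producers (or upstream EPAs) publish onto the same channel.
channel.publish({"id": 1, "source": "producer_or_epa_1"})
channel.publish({"id": 2, "source": "producer_or_epa_2"})
print(dict(received))
# {'consumer_or_epa_3': [1, 2], 'consumer_or_epa_4': [1, 2]}
```

Producers never reference consumers directly; they only know the channel, which is what lets agents be added or removed without rewiring upstream code.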
HOW?
EVENT PROCESSING NETWORK
Event processing agents contain the business logic.
[Diagram: an Event Processing Agent with Event Channels A and B on its input and output sides, backed by DB 1 and a cache]

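An event processing agent typically reads an event, consults external state, applies business logic, and emits a new event. The sketch below assumes a cache-aside lookup, where one dict plays the cache and another plays “DB 1”, plus a made-up rule that flags accounts younger than one day. None of these names or rules come from Sketchy.

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]


class EnrichmentAgent:
    """Hypothetical EPA: adds user context (cache first, DB on miss) and a flag."""

    def __init__(self, db: Dict[int, dict], emit: Callable[[Event], None]) -> None:
        self.db = db                      # stands in for "DB 1"
        self.cache: Dict[int, dict] = {}  # stands in for the cache
        self.emit = emit                  # publishes to the output channel
        self.db_reads = 0

    def lookup(self, user_id: int) -> dict:
        """Cache-aside read: consult the DB only on a cache miss."""
        if user_id not in self.cache:
            self.db_reads += 1
            self.cache[user_id] = self.db.get(user_id, {})
        return self.cache[user_id]

    def handle(self, event: Event) -> None:
        context = self.lookup(event["user_id"])
        # Made-up business rule: accounts younger than one day are "new";
        # unknown users default to age 0 and are therefore flagged too.
        enriched = {**event, "user": context,
                    "new_account": context.get("age_days", 0) < 1}
        self.emit(enriched)


out = []
agent = EnrichmentAgent(db={7: {"age_days": 0}, 8: {"age_days": 400}},
                        emit=out.append)
for event in ({"user_id": 7, "text": "hi"},
              {"user_id": 7, "text": "hi again"},
              {"user_id": 8, "text": "nice track"}):
    agent.handle(event)
print([e["new_account"] for e in out], agent.db_reads)  # [True, True, False] 2
```

The second event for user 7 is served from the cache, so three events cost only two database reads.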
HOW?
EVENT PROCESSING NETWORK
Event consumers act on the results of processing in the network.
[Diagram: a Consumer receiving events from Event Channels A, B, and C]

HOW?
DO EPNs ACHIEVE EDA’s GOALS?
• Asynchronous, pipelined parallelism
[Diagram: Producer or EPA 1 → Event Channel → Consumer or EPA 3]
Node-to-node flow allows asynchronous, parallel computation.

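Pipelined parallelism falls out of connecting stages with queues: each stage runs independently and starts on the next event while later stages are still busy with earlier ones. This sketch uses Python threads and `queue.Queue`; the two stage functions are hypothetical. In CPython, threads overlap I/O-bound work, but CPU-bound stages would need processes (or separate hosts) to run truly in parallel.

```python
import queue
import threading

DONE = object()  # end-of-stream marker passed from stage to stage


def start_stage(fn, inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """Apply fn to each item from inbox in a dedicated thread, forwarding results."""
    def run() -> None:
        while True:
            item = inbox.get()
            if item is DONE:
                outbox.put(DONE)  # propagate shutdown downstream
                return
            outbox.put(fn(item))

    thread = threading.Thread(target=run)
    thread.start()
    return thread


# Hypothetical two-stage pipeline: normalize text, then count links.
q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    start_stage(lambda text: text.strip().lower(), q_in, q_mid),
    start_stage(lambda text: (text, text.count("http")), q_mid, q_out),
]

# The producer enqueues and moves on; it never waits for downstream stages.
for msg in ["  Hello ", "BUY NOW http://x http://y", "ok"]:
    q_in.put(msg)
q_in.put(DONE)

results = []
while (item := q_out.get()) is not DONE:
    results.append(item)
for thread in threads:
    thread.join()
print(results)  # [('hello', 0), ('buy now http://x http://y', 2), ('ok', 0)]
```

With one thread per stage and FIFO queues, output order matches input order; running several workers per stage would raise throughput but give up that ordering.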
HOW?
DO EPNs ACHIEVE EDA’s GOALS?
• Asynchronous, pipelined parallelism
[Figure: pipelined execution diagram]
Source: http://www2.engr.arizona.edu/~ece462/Lec03-pipe/

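The benefit of pipelining can be quantified with the standard idealized model (an assumption, not a measurement from Sketchy): k stages, each taking time τ per item, no stalls, and n items to process.

```latex
\begin{aligned}
T_{\text{sequential}} &= n\,k\,\tau \\
T_{\text{pipelined}}  &= (k + n - 1)\,\tau \\
S = \frac{T_{\text{sequential}}}{T_{\text{pipelined}}} &= \frac{n\,k}{k + n - 1}
  \;\longrightarrow\; k \quad (n \to \infty)
\end{aligned}
```

For example, with k = 4 stages and n = 100 events, S = 400/103 ≈ 3.88, close to the ideal of 4. When stage times differ, the slowest stage bounds throughput.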
HOW?
DO EPNs ACHIEVE EDA’s GOALS?
• Build ‘Complex Events’ by putting events into the context in which they occur
[Diagram: abstract example of a complex event being created. An Event Processing Agent receives EVENT5, combines it with context drawn from a cache (earlier events E1 to E4) and DB 1, and emits EVENT5 + context.]
This is possible by aggregating and/or summarizing with data from external sources.

HOW?
DO EPNs ACHIEVE EDA’s GOALS?
• Build ‘Complex Events’ by putting events into the context in which they occur
[Diagram: bulkStatisticsAgent stores fingerprints of messages M1 to M4 in memcached; for an incoming MSG, bulkDetectorAgent finds similar fingerprints (Jaccard distance) and emits a “Bulk!” complex event]
In Sketchy, the bulk agent stores a text-fingerprint context in memcached.

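Bulk detection can be sketched as: fingerprint each message, store the fingerprint as context, and compare each new message against the stored ones by Jaccard distance, 1 − |A ∩ B| / |A ∪ B|. In this sketch the fingerprint is a set of 3-word shingles, the thresholds are invented, and a dict stands in for memcached; Sketchy's actual fingerprinting and parameters may differ. A production version might hash or MinHash the shingles and rely on memcached expiry to bound the stored context.

```python
from typing import Optional


def fingerprint(text: str, k: int = 3) -> frozenset:
    """Fingerprint a message as its set of word k-shingles (illustrative scheme)."""
    words = text.lower().split()
    # Texts shorter than k words yield a single shingle containing all words.
    n = max(len(words) - k + 1, 1)
    return frozenset(" ".join(words[i:i + k]) for i in range(n))


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |A ∩ B| / |A ∪ B|; 0.0 means identical sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


class BulkDetector:
    """Hypothetical bulk agent; a dict stands in for memcached."""

    def __init__(self, max_distance: float = 0.5, min_similar: int = 3) -> None:
        self.context: dict = {}          # message id -> fingerprint
        self.max_distance = max_distance
        self.min_similar = min_similar   # includes the incoming message itself

    def handle(self, msg_id: str, text: str) -> Optional[dict]:
        fp = fingerprint(text)
        similar = [other_id for other_id, other_fp in self.context.items()
                   if jaccard_distance(fp, other_fp) <= self.max_distance]
        self.context[msg_id] = fp        # store this message's context
        if len(similar) + 1 >= self.min_similar:
            return {"type": "bulk", "trigger": msg_id, "similar": similar}
        return None


detector = BulkDetector()
spam = "check out my new track at example dot com now"
events = [
    detector.handle("M1", spam),
    detector.handle("M2", spam + " please"),
    detector.handle("M3", "totally unrelated comment about the mix"),
    detector.handle("M4", spam),
]
print(events[-1])
# {'type': 'bulk', 'trigger': 'M4', 'similar': ['M1', 'M2']}
```

M2 shares 8 of its 9 shingles with M1 (distance 1/9), so near-duplicates still match, while the unrelated M3 does not.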
HOW?
DO EPNs ACHIEVE EDA’s GOALS?
• Broadcast events and complex events wherever their reuse is possible
[Diagram: an Event Channel connecting Producer or EPA 1 and Producer or EPA 2 to Consumer or EPA 3 and Consumer or EPA 4. The event channel can send messages in this fashion.]
[Diagram: a common use case in a Sketchy network, with messageCreateIngester, junkStatisticsAgent, junkDetectorAgent, signalEmitterAgent, and rateLimiterAgent]

SKETCHY@SOUNDCLOUD
[Diagram: Sketchy's network at SoundCloud, with messageCreateIngester, junkStatisticsAgent, junkDetectorAgent, signalEmitterAgent, and rateLimiterAgent]

MOVE SKETCHY’S LOGIC TO TWITTER’S STORM?
Storm is a framework for building EPNs at scale.

Component            | Storm                              | Sketchy's network
Language             | Scala                              | Scala
Parallelism          | Multiple workers on multiple hosts | Multiple workers on a single host
Deployment           | ‘Nimbus’ & ZooKeeper               | Bazooka
Messaging guarantees | At-least-once, at-most-once        | Not yet
Hadoop integration   | Yes                                | No
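The “messaging guarantees” row contrasts two delivery semantics. The simulation below uses a hypothetical broker that keeps each message until it is acknowledged (it is not Storm's API). The only difference between the modes is whether the consumer acknowledges before or after processing, and a single crash shows the trade-off: at-most-once can lose a message, while at-least-once can process it twice.

```python
from collections import deque
from typing import List


def run_consumer(messages: List[str], mode: str, crash_on: str) -> List[str]:
    """Simulate a consumer that crashes once on `crash_on`, then restarts.

    Hypothetical broker model: a message stays in `pending` until acked and is
    redelivered after a crash. Returns every message the consumer processed.
    """
    pending = deque(messages)
    processed: List[str] = []
    crashed = False
    while pending:
        msg = pending[0]
        if mode == "at-most-once":
            pending.popleft()             # 1. ack: the broker forgets msg
            if msg == crash_on and not crashed:
                crashed = True            # crash before processing: msg is lost
                continue
            processed.append(msg)         # 2. process
        elif mode == "at-least-once":
            processed.append(msg)         # 1. process
            if msg == crash_on and not crashed:
                crashed = True            # crash before ack: msg is redelivered
                continue
            pending.popleft()             # 2. ack
        else:
            raise ValueError(f"unknown mode: {mode}")
    return processed


batch = ["m1", "m2", "m3"]
print(run_consumer(batch, "at-most-once", crash_on="m2"))   # ['m1', 'm3']
print(run_consumer(batch, "at-least-once", crash_on="m2"))  # ['m1', 'm2', 'm2', 'm3']
```

Because at-least-once delivery can repeat work, consumers under that guarantee are usually written to be idempotent.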

LEARN MORE
• Event Processing Networks
    • Sharon and Etzion, “Event Processing Network, A Conceptual Model,” VLDB, 2007
• Sketchy
    • https://github.com/soundcloud/sketchy-core
• Storm
    • Toshniwal et al., “Storm@Twitter,” SIGMOD, 2014
    • https://storm.incubator.apache.org

THANK YOU. QUESTIONS?
@mweiden, weiden@soundcloud.com

Data Days 2014 - Matt Weiden
