1. SKETCHY: A COMPLEX EVENT PROCESSING NETWORK FOR SPAM DETECTION
Matt Weiden / SoundCloud Ltd.
2. WHO?
My name is Matt Weiden. Nice to meet you.
• Backend Engineer, SoundCloud’s Trust, Safety & Security Team
• Previously: cognitive science and brain-computer interface (BCI) research
Contributors
• Rany Keddo
• Michael Brückner
• Astera Schneeweisz
• Others
3. WHAT?
INFERENCE FROM RELATED STREAMS OF DATA
The problem: How quickly and efficiently can we draw aggregate inferences from large streams of related events?
What inferences could we make?
How quickly and efficiently can we make them?
[Figure: streams of posts, views, and follows plotted over time]
4. DRINKING FROM A FIREHOSE.
Performing this for a whole site might take a little more thought.
5. WHAT (MORE SPECIFICALLY)?
INFERENCE FROM RELATED STREAMS OF DATA
How quickly and efficiently can we draw aggregate inferences from large streams of related events?
6. HOW?
EVENT-DRIVEN ARCHITECTURE (EDA)
• Near real-time processing
• Only process the data once*
• Operate on
  • incremental sub-goal results
  • ‘Complex Events’, built by adding ‘Context’
• Asynchronous, pipelined parallelism
• Broadcast reusable events and complex events
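To make “only process the data once” and “incremental sub-goal results” concrete, here is a minimal Scala sketch, not taken from Sketchy: each event is folded into a running per-user count over a sliding time window as it arrives, so later stages can read that sub-goal result without rescanning history. The `Event` shape, the `SlidingWindowCounter` name, and the window length are hypothetical.

```scala
import scala.collection.mutable

// Hypothetical event shape for illustration; not Sketchy's actual model.
final case class Event(userId: Long, timestampMs: Long)

// Keeps an incremental, per-user count of events inside a sliding window.
// Each event is touched once on arrival (and once more when it expires),
// so answering "how active is this user right now?" never rescans history.
final class SlidingWindowCounter(windowMs: Long) {
  private val recent = mutable.Map.empty[Long, mutable.Queue[Long]]

  /** Ingests one event and returns the user's event count within the window. */
  def add(event: Event): Int = {
    val times = recent.getOrElseUpdate(event.userId, mutable.Queue.empty[Long])
    times.enqueue(event.timestampMs)
    // Evict timestamps that fell out of the window (assumes events arrive in time order).
    while (times.nonEmpty && times.head <= event.timestampMs - windowMs) times.dequeue()
    times.size
  }
}

val counter = new SlidingWindowCounter(windowMs = 60000L)
val counts = Seq(Event(1L, 0L), Event(1L, 10000L), Event(2L, 20000L), Event(1L, 70000L)).map(counter.add)
println(counts) // List(1, 2, 1, 1)
```

By the fourth event, user 1's first two events have left the one-minute window, so the running count drops back to 1.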
8. HOW?
EVENT PROCESSING NETWORK
Sketchy is an event processing network (EPN) that implements EDA
• Prevents text and social-graph spam at SoundCloud
• Open source
• Modular
  • written as a flexible, adaptable library
  • many common components available out of the box
• Battle-tested
  • ingests many sensitive event types at SoundCloud
9. HOW?
EVENT PROCESSING NETWORK
Event producers introduce events into a network
Represented as a directed graph of
• Event producers
• Event channels
• Event processing agents (EPAs)
  • enrich events
  • transform events into complex events
  • detect patterns
• Event consumers
[Diagram: a producer emitting events onto event channels A, B, and C]
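The directed-graph view can be written down as plain data and checked before anything runs. In this sketch (the node names, the `Kind` labels, and `topologicalOrder` are hypothetical, not Sketchy's API), Kahn's algorithm orders producers, channels, agents, and consumers so that every node comes after its upstream neighbours, and returns `None` if the wiring contains a cycle.

```scala
import scala.collection.mutable

// Node kinds from the slide. They are informational here; the ordering only uses edges.
sealed trait Kind
case object Producer extends Kind
case object Channel extends Kind
case object Agent extends Kind
case object Consumer extends Kind

final case class Node(name: String, kind: Kind)

// Kahn's algorithm: returns the nodes in an order where every edge points forward,
// or None if the wiring contains a cycle.
def topologicalOrder(nodes: Seq[Node], edges: Seq[(String, String)]): Option[List[Node]] = {
  val byName = nodes.map(n => n.name -> n).toMap
  val inDegree = mutable.Map.from(nodes.map(n => n.name -> 0))
  val downstream = mutable.Map.empty[String, List[String]]
  for ((from, to) <- edges) {
    downstream(from) = to :: downstream.getOrElse(from, Nil)
    inDegree(to) += 1
  }
  // Start with nodes that have no upstream neighbours (typically producers).
  val ready = mutable.Queue.from(nodes.map(_.name).filter(name => inDegree(name) == 0))
  val order = mutable.ListBuffer.empty[Node]
  while (ready.nonEmpty) {
    val name = ready.dequeue()
    order += byName(name)
    for (next <- downstream.getOrElse(name, Nil).reverse) {
      inDegree(next) -= 1
      if (inDegree(next) == 0) ready.enqueue(next)
    }
  }
  if (order.size == nodes.size) Some(order.toList) else None
}

val network = Seq(
  Node("messageProducer", Producer),
  Node("channelA", Channel),
  Node("spamAgent", Agent),
  Node("channelB", Channel),
  Node("alertConsumer", Consumer)
)
val wiring = Seq(
  "messageProducer" -> "channelA",
  "channelA" -> "spamAgent",
  "spamAgent" -> "channelB",
  "channelB" -> "alertConsumer"
)
println(topologicalOrder(network, wiring).map(_.map(_.name)))
// Some(List(messageProducer, channelA, spamAgent, channelB, alertConsumer))
```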
10. HOW?
EVENT PROCESSING NETWORK
Event channels route events through the network
[Diagram: producers (or EPAs 1 and 2) send events into an event channel, which routes them to consumers (or EPAs 3 and 4)]
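A minimal in-memory sketch of a channel decoupling publishers from subscribers, assuming synchronous delivery and a hypothetical `EventChannel` class (Sketchy's real channels may work differently): any number of producers or EPAs publish into it, and it routes each event to every subscriber whose filter accepts it.

```scala
import scala.collection.mutable

// Hypothetical event type for illustration.
final case class Message(kind: String, userId: Long, body: String)

// In-memory channel: producers call publish, consumers register with subscribe.
// Each subscriber supplies a filter, so the channel routes only the events it wants.
final class EventChannel[E] {
  private val subscribers = mutable.ListBuffer.empty[(E => Boolean, E => Unit)]

  def subscribe(accepts: E => Boolean)(handler: E => Unit): Unit =
    subscribers += ((accepts, handler))

  // Delivers synchronously to every matching subscriber, in subscription order.
  def publish(event: E): Unit =
    for ((accepts, handler) <- subscribers if accepts(event)) handler(event)
}

val channel = new EventChannel[Message]
val seenByTextAgent = mutable.ListBuffer.empty[Message]
val seenByGraphAgent = mutable.ListBuffer.empty[Message]

// Two consumers (or downstream EPAs) interested in different event kinds.
channel.subscribe(_.kind == "message")(seenByTextAgent += _)
channel.subscribe(_.kind == "follow")(seenByGraphAgent += _)

// Two producers (or upstream EPAs) publishing into the same channel.
channel.publish(Message("message", 1L, "buy followers now"))
channel.publish(Message("follow", 2L, ""))
```

Producers never reference consumers directly, so either side can be added or replaced without touching the other.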
11. HOW?
EVENT PROCESSING NETWORK
Event processing agents contain business logic
[Diagram: an event processing agent, with access to DB 1 and a cache, reading events from event channels A and B and writing events to event channels A and B]
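The slide's agent sits next to DB 1 and a cache. One common way to wire that up is a read-through cache, sketched here in Scala under stated assumptions: `UserDb` is a hypothetical stand-in for DB 1, a `mutable.Map` stands in for the cache, and the enrichment (account age) is only an example of the kind of context an EPA might add.

```scala
import scala.collection.mutable

final case class Message(userId: Long, body: String)
final case class EnrichedMessage(message: Message, accountAgeDays: Int)

// Hypothetical stand-in for "DB 1": an expensive lookup we want to avoid repeating.
final class UserDb(ages: Map[Long, Int]) {
  var lookups = 0
  def accountAgeDays(userId: Long): Int = { lookups += 1; ages.getOrElse(userId, 0) }
}

// An EPA that enriches each message with the sender's account age,
// reading through a cache so repeated senders do not hit the database.
final class EnrichmentAgent(db: UserDb) {
  private val cache = mutable.Map.empty[Long, Int]

  def process(message: Message): EnrichedMessage = {
    // getOrElseUpdate evaluates the DB call only on a cache miss.
    val age = cache.getOrElseUpdate(message.userId, db.accountAgeDays(message.userId))
    EnrichedMessage(message, age)
  }
}

val db = new UserDb(Map(1L -> 400, 2L -> 0))
val agent = new EnrichmentAgent(db)
val out = Seq(Message(1L, "hi"), Message(2L, "spam link"), Message(2L, "spam link")).map(agent.process)
println(out.map(_.accountAgeDays)) // List(400, 0, 0)
println(db.lookups)                // 2
```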
12. HOW?
EVENT PROCESSING NETWORK
Event consumers act on the results of processing in the network
[Diagram: event channels A, B, and C feeding a consumer]
13. HOW?
DO EPNs ACHIEVE EDA’s GOALS?
• Asynchronous, pipelined parallelism
[Diagram: a producer (or EPA 1) sends events through an event channel to a consumer (or EPA 3)]
The node-to-node flow enables asynchronous, parallel computation.
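A sketch of pipelined parallelism using only the JDK, under illustrative assumptions (the stage logic, keyword list, and end-of-stream marker are hypothetical): two stages run on their own threads, connected by a bounded `LinkedBlockingQueue` that plays the role of the event channel, so stage 2 can classify one message while stage 1 is already forwarding the next. The stages are plain `Runnable` classes whose state is passed in explicitly.

```scala
import java.util.concurrent.{BlockingQueue, LinkedBlockingQueue}
import scala.collection.mutable

// Stage 1 (producer or EPA 1): normalizes raw messages and forwards them downstream.
// `eos` is an end-of-stream marker, an assumption of this sketch.
final class NormalizeStage(raw: Seq[String], out: BlockingQueue[String], eos: String) extends Runnable {
  def run(): Unit = {
    raw.foreach(msg => out.put(msg.toLowerCase))
    out.put(eos)
  }
}

// Stage 2 (consumer or EPA 3): runs on its own thread, classifying each message as it arrives.
final class ClassifyStage(in: BlockingQueue[String], results: mutable.Buffer[String], eos: String) extends Runnable {
  def run(): Unit = {
    var msg = in.take() // blocks until stage 1 has produced something
    while (msg != eos) {
      val spammy = msg.contains("buy") || msg.contains("cheap")
      results += s"$msg -> ${if (spammy) "spam" else "ok"}"
      msg = in.take()
    }
  }
}

val eos = "<eos>"
val channel = new LinkedBlockingQueue[String](16) // bounded event channel between the stages
val output = mutable.ListBuffer.empty[String]     // read only after join(), so the writes are visible
val stage1 = new Thread(new NormalizeStage(Seq("Buy FOLLOWERS", "nice track", "CHEAP plays"), channel, eos))
val stage2 = new Thread(new ClassifyStage(channel, output, eos))
stage1.start(); stage2.start()
stage1.join(); stage2.join()
println(output.mkString("\n"))
```

The bounded queue also gives back-pressure: if stage 2 falls behind, `put` blocks stage 1 instead of letting memory grow without limit.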
15. HOW?
DO EPNs ACHIEVE EDA’s GOALS?
• Build ‘Complex Events’ by putting events into the context in which they occur
[Diagram: abstract example of a complex event being created. An event processing agent receives EVENT5, combines it with the earlier events E1 to E4 held in a cache (alongside DB 1), and emits EVENT5 + context as a complex event.]
This is possible by aggregating and/or summarizing events with data from external sources.
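A minimal Scala sketch of the abstract example, assuming an in-memory per-user history in place of the cache and an illustrative rule (five posts within a minute): each incoming post is paired with the user's recent posts, and the pair is emitted as a complex event once the context crosses the threshold. `ContextAgent` and `RapidPosting` are hypothetical names.

```scala
import scala.collection.mutable

final case class Post(userId: Long, timestampMs: Long)
// A complex event: the triggering event plus the context it occurred in.
final case class RapidPosting(trigger: Post, context: Seq[Post])

// Hypothetical agent: remembers each user's recent posts (standing in for the cache)
// and emits a complex event when a post is at least the `threshold`-th within `windowMs`.
final class ContextAgent(windowMs: Long, threshold: Int) {
  private val history = mutable.Map.empty[Long, Vector[Post]]

  def process(post: Post): Option[RapidPosting] = {
    val recent = history.getOrElse(post.userId, Vector.empty)
      .filter(_.timestampMs > post.timestampMs - windowMs)
    history(post.userId) = recent :+ post
    if (recent.size + 1 >= threshold) Some(RapidPosting(post, recent)) else None
  }
}

val agent = new ContextAgent(windowMs = 60000L, threshold = 5)
val posts = (1 to 5).map(i => Post(userId = 7L, timestampMs = i * 1000L))
val emitted = posts.flatMap(agent.process)
```

Only the fifth post yields a complex event, and it carries the four earlier posts as its context, mirroring EVENT5 + E1 to E4.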
16. HOW?
DO EPNs ACHIEVE EDA’s GOALS?
• Build ‘Complex Events’ by putting events into the context in which they occur
[Diagram: a new message (MSG) reaches bulkStatisticsAgent, which stores its fingerprint in memcached next to those of earlier messages M1 to M4; bulkDetectorAgent finds similar fingerprints (Jaccard distance) and emits a ‘Bulk!’ complex event.]
In Sketchy, the bulk agent stores text fingerprints as context in memcached.
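The slide names Jaccard distance but not the fingerprinting scheme, so this sketch assumes character 3-gram shingles as the fingerprint; the thresholds are also illustrative, and a `Vector` stands in for memcached. The Jaccard similarity of two sets is |A ∩ B| / |A ∪ B|, and the Jaccard distance is 1 minus that. `BulkDetector` is a hypothetical name, not one of Sketchy's agents.

```scala
// Fingerprint: the set of lowercase character n-grams (an assumed scheme, n = 3 by default).
def fingerprint(text: String, n: Int = 3): Set[String] = {
  val normalized = text.toLowerCase.replaceAll("\\s+", " ").trim
  if (normalized.length < n) Set(normalized) else normalized.sliding(n).toSet
}

// Jaccard similarity |A ∩ B| / |A ∪ B|; the Jaccard distance is 1 minus this.
def jaccard(a: Set[String], b: Set[String]): Double =
  if (a.isEmpty && b.isEmpty) 1.0 else a.intersect(b).size.toDouble / a.union(b).size

// Hypothetical bulk detector: flags a message when at least `minMatches` stored
// fingerprints lie within `maxDistance` of it. A Vector stands in for memcached.
final class BulkDetector(maxDistance: Double, minMatches: Int) {
  private var stored = Vector.empty[Set[String]]

  def process(text: String): Boolean = {
    val fp = fingerprint(text)
    val matches = stored.count(other => 1.0 - jaccard(fp, other) <= maxDistance)
    stored = stored :+ fp
    matches >= minMatches
  }
}

val detector = new BulkDetector(maxDistance = 0.5, minMatches = 3)
val flags = Seq(
  "Check out my new mixtape at example.com",
  "check out my NEW mixtape at example.com!",
  "Check out my new mixtape at example.com !!",
  "Loved the bassline in this track",
  "check out my new mixtape at example.com"
).map(detector.process)
println(flags) // List(false, false, false, false, true)
```

Small edits such as case changes or extra punctuation barely move the shingle sets, so near-duplicates stay close in Jaccard distance while the unrelated comment shares no shingles with them.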
17. HOW?
DO EPNs ACHIEVE EDA’s GOALS?
• Broadcast events and complex events wherever their reuse is possible
[Diagram: producers (or EPAs 1 and 2) send events into an event channel, which broadcasts them to consumers (or EPAs 3 and 4). The event channel can send messages in this fashion.]
[Diagram: a common use case in a Sketchy network, involving messageCreateIngester, junkStatisticsAgent, junkDetectorAgent, signalEmitterAgent, and rateLimiterAgent.]
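To show why broadcasting pays off, this sketch (hypothetical names loosely echoing the slide's agents; Sketchy's actual wiring may differ) computes a derived statistics event once per message and hands the same object to three downstream handlers, counting how often the expensive step runs.

```scala
import scala.collection.mutable

final case class Message(userId: Long, body: String)
// Derived (complex) event, computed once and shared by several agents.
final case class MessageStats(message: Message, linkCount: Int)

var statsComputations = 0
def computeStats(m: Message): MessageStats = {
  statsComputations += 1 // counts how often the (potentially expensive) step runs
  MessageStats(m, "https?://".r.findAllIn(m.body).size)
}

// Hypothetical downstream agents that all consume the same broadcast event.
val detectorFlags = mutable.ListBuffer.empty[Boolean]
val rateLimiterSeen = mutable.ListBuffer.empty[Long]
val signalLog = mutable.ListBuffer.empty[String]
val subscribers: Seq[MessageStats => Unit] = Seq(
  s => detectorFlags += (s.linkCount >= 2),
  s => rateLimiterSeen += s.message.userId,
  s => signalLog += s"user ${s.message.userId}: ${s.linkCount} links"
)

def ingest(m: Message): Unit = {
  val stats = computeStats(m)                   // computed once...
  subscribers.foreach(handle => handle(stats))  // ...then broadcast to every subscriber
}

ingest(Message(1L, "see http://a.example and https://b.example"))
ingest(Message(2L, "great set last night"))
```

Two messages cost two statistics computations, even though three agents consume each result.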
19. MOVE SKETCHY’S LOGIC TO TWITTER’S STORM?
Storm is a framework for building EPNs at scale
Component            | Storm                              | Sketchy’s network
Language             | Scala                              | Scala
Parallelism          | Multiple workers on multiple hosts | Multiple workers on a single host
Deployment           | ‘Nimbus’ & ZooKeeper               | Bazooka
Messaging guarantees | At-least-once, at-most-once        | Not yet
Hadoop integration   | Yes                                | No
20. LEARN MORE
• Event Processing Networks
  • Sharon and Etzion, “Event Processing Network, A Conceptual Model,” VLDB, 2007
• Sketchy
  • https://github.com/soundcloud/sketchy-core
• Storm
  • Toshniwal et al., “Storm@Twitter,” SIGMOD, 2014
  • https://storm.incubator.apache.org