Risk Management in Retail with Stream Processing
Daniel Jagielski
Tech Lead & Manager at VirtusLab
Passionate about distributed systems
After hours:
Contact:
djagielski@virtuslab.com
kontakt@yagiel.pl
@danjagiel
Identity API
Authentication
Token management
Risk analysis
Clubcard verification
Authorization
Attack profitability
Cost vs. value
Risk engine
Analysis of authentication data streams
Trigger targeted actions against attackers
Risk-based authentication
Diagram labels: Clients, Identity API, Risk engine; Requests, Data feed; Orchestration, Knowledge
Risk Engine V1
State stored in memory
Locks and critical sections
Up to 3 DB calls per event
No horizontal scalability
Slow event processing times
No high availability
Session aggregate
#DDD
#EventSourcing
#aggregate
#CQRS
#projection
Session aggregator
High memory demand
CPU spikes
Long rebalancing
Data distribution and skew
Aggregates reaching up to 1 MB
Artificial size limits
Scaling up
Tuning RocksDB
Risk Engine V2 – new principles
Let’s not create a session aggregate!
Focus on the main goal
Single responsibility of each module
Independent, parallel pipelines
Risk Engine V2 – core design
Calculate IP statistics
Block malicious IP
Analyse IP statistics
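The three stages named on this slide can be read as a chain in which statistics are calculated first, analysed second, and blocking is triggered last. The sketch below is only an illustration of that chaining with plain Python generators, not the Kafka Streams topology from the talk. The event shape, the function names, and the thresholds (`min_events`, `max_failure_ratio`) are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple

# Hypothetical event shape: (ip_address, login_succeeded)
Event = Tuple[str, bool]
# Statistics emitted after each event: (ip_address, total_count, failure_count)
Stats = Tuple[str, int, int]


def calculate_ip_statistics(events: Iterable[Event]) -> Iterator[Stats]:
    """Stage 1: keep running counters per IP and emit them after every event."""
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for ip, succeeded in events:
        totals[ip] += 1
        if not succeeded:
            failures[ip] += 1
        yield ip, totals[ip], failures[ip]


def analyse_ip_statistics(
    stats: Iterable[Stats],
    min_events: int = 4,             # assumed threshold, not from the talk
    max_failure_ratio: float = 0.5,  # assumed threshold, not from the talk
) -> Iterator[str]:
    """Stage 2: emit an IP once its failure ratio exceeds the assumed limit."""
    for ip, total, failed in stats:
        if total >= min_events and failed / total > max_failure_ratio:
            yield ip


def block_malicious_ips(suspects: Iterable[str]) -> Set[str]:
    """Stage 3: collect IPs to block; a real system would call a blocking service."""
    return set(suspects)


events = [("10.0.0.1", False)] * 5 + [("10.0.0.2", True)] * 5
blocked = block_malicious_ips(analyse_ip_statistics(calculate_ip_statistics(events)))
print(blocked)
```

Each stage only knows the output of the previous one, which is one way to keep a single responsibility per module.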
Risk Engine V2 – architecture
{
  "ipAddress": "127.0.0.1",
  "totalCount": 6,
  "failureCount": 3,
  "creationTimestamp": 123456,
  "eventType": "login",
  "failureRatio": 0.5
}
{
  "ipAddress": "127.0.0.1",
  "creationTimestamp": 123459,
  "violatedRule": "XYZ"
}
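The two records above look like an IP statistics message followed by a rule violation message. As a hedged illustration of how such records could relate, the sketch below derives `failureRatio` as `failureCount / totalCount` (consistent with 3 / 6 = 0.5 in the sample) and emits a violation when a rule matches. The function names and the rule thresholds are assumptions; the slides do not say what rule "XYZ" actually checks.

```python
import json
from typing import Any, Dict, List, Optional


def build_statistics(ip: str, event_type: str, outcomes: List[bool],
                     timestamp: int) -> Dict[str, Any]:
    """Aggregate success/failure outcomes into a statistics record."""
    total = len(outcomes)
    failed = sum(1 for succeeded in outcomes if not succeeded)
    return {
        "ipAddress": ip,
        "totalCount": total,
        "failureCount": failed,
        "creationTimestamp": timestamp,
        "eventType": event_type,
        "failureRatio": failed / total if total else 0.0,
    }


def evaluate_rule(stats: Dict[str, Any], timestamp: int,
                  rule_name: str = "XYZ",
                  min_total: int = 5,              # assumed threshold
                  min_failure_ratio: float = 0.5,  # assumed threshold
                  ) -> Optional[Dict[str, Any]]:
    """Return a violation record when the assumed rule matches, else None."""
    if stats["totalCount"] >= min_total and stats["failureRatio"] >= min_failure_ratio:
        return {
            "ipAddress": stats["ipAddress"],
            "creationTimestamp": timestamp,
            "violatedRule": rule_name,
        }
    return None


stats = build_statistics("127.0.0.1", "login",
                         [True, False, True, False, True, False], 123456)
violation = evaluate_rule(stats, timestamp=123459)
print(json.dumps(stats, indent=2))
print(json.dumps(violation, indent=2))
```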
Risk Engine V2 – single pipeline
Risk Engine V2 – multiple pipelines
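One way to picture multiple pipelines is that every pipeline receives the same events but keeps its own small state under its own key, with nothing shared between them. The sketch below is an in-memory illustration of that idea only. The `Pipeline` class, the `success` field, and especially the `username` key are hypothetical; the slides only show IP-based statistics.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

Event = Dict[str, Any]


@dataclass
class Pipeline:
    """An independent pipeline with its own key extractor and its own state."""
    name: str
    key_of: Callable[[Event], str]
    failure_counts: Counter = field(default_factory=Counter)

    def process(self, event: Event) -> None:
        # Count failed attempts under this pipeline's key only.
        if not event["success"]:
            self.failure_counts[self.key_of(event)] += 1


pipelines = [
    Pipeline("ip", lambda e: e["ipAddress"]),
    Pipeline("username", lambda e: e["username"]),  # hypothetical second key
]

events = [
    {"ipAddress": "10.0.0.1", "username": "alice", "success": False},
    {"ipAddress": "10.0.0.1", "username": "bob", "success": False},
    {"ipAddress": "10.0.0.2", "username": "alice", "success": True},
]

# Fan every event out to all pipelines; no pipeline reads another's state.
for event in events:
    for pipeline in pipelines:
        pipeline.process(event)

for pipeline in pipelines:
    print(pipeline.name, dict(pipeline.failure_counts))
```

Because the pipelines share no state, one of them can be scaled, redeployed, or removed without touching the others.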
Risk Engine V2 – benefits
Data locality
Scalability
No single point of failure
Separation of concerns
Smaller memory footprint, less CPU usage
Faster startup, less state
Distributed platform
Significant performance improvement
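Data locality and scalability in a keyed stream usually come from partitioning: all events with the same key land on the same partition, so the instance that owns that partition can keep the key's state locally. The sketch below illustrates this with a stable hash. It uses `zlib.crc32` purely as a stand-in; Kafka's default partitioner uses a different hash function, and the partition count and IP addresses are made up for the example.

```python
import zlib
from collections import defaultdict
from typing import Dict, Set

NUM_PARTITIONS = 4  # assumed partition count for the example


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (stand-in for Kafka's partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# The same IP appears several times in the stream of keys.
keys = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.2", "10.0.0.1"]

seen_partitions: Dict[str, Set[int]] = defaultdict(set)
for key in keys:
    seen_partitions[key].add(partition_for(key))

# Every key maps to exactly one partition, so its state lives in one place.
for key, partitions in seen_partitions.items():
    print(key, "->", sorted(partitions))
```

Adding instances then spreads partitions, and with them the per-key state, across more machines.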
Deployment
POD
Observability
Kafka Connect
Integration with external systems
Summary
Identify bottlenecks
Know your data!
Minimum Value Aggregates
Distributed, independent pipelines
Thank you!
Q&A

Risk Management in Retail with Stream Processing (Daniel Jagielski, VirtusLab/Tesco) Kafka Summit 2020