Our experience with real-time analyses on a never-ending stream of user events. Discusses the Lambda architecture, the Kappa architecture, Apache Kafka, and our own approach.
1. How to analyze billions of events in real-time?
Jozo.Kovac@infinario.com
Co-Founder & Product Manager
Lambda architecture
for real-time streaming analytics
2. Agenda
• Goals & requirements
• Design patterns for streaming analytics
– General idea
– Lambda
– Kappa
• INFINARIO backend
• Discussion
4. Requirements
• VELOCITY
– Process a never-ending stream of “events” in real time
• VARIETY AT SPEED
– Analyses! Not just predefined reports
• VOLUME
– Be able to reprocess a stream; retain data
• RELIABILITY
– Never lose an event
• AVAILABILITY
– Avoid downtime
6. Real-Time Streaming Architecture
• Sources: syslog, machine data, external streams, other data
• Data collection: Flume / custom agents (Agent A … Agent N)
• Messaging system: Kafka (Topic A … Topic N)
• Real-time processing: Storm (Topology A … Topology N)
• Storage
– Search: Elasticsearch / Solr
– Low-latency NoSQL: HBase
– Historic: Hive / HDFS
• Access: web services (REST API), web apps, analytic tools (R / Python), BI tools, alerting systems
7. Apache Kafka
• publish-subscribe messaging for real-time feeds
• retains data for a configurable period of time
• immutable message queue (events)
• high-throughput, low-latency
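Retention is what makes Kafka usable for reprocessing. A minimal sketch of the relevant broker settings (example values, not the deck's actual configuration):

```properties
# server.properties – keep the event log long enough to replay it
log.retention.hours=720      # retain messages for 30 days
log.retention.bytes=-1       # no size-based limit
log.cleanup.policy=delete    # drop segments older than the retention window
```

Retention can also be overridden per topic with the topic-level `retention.ms` config, so the event stream can be kept longer than ordinary topics.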
8. Lambda Architecture
• New data arrives as a single data stream, dispatched to both layers:
– Batch layer: stores all data; pre-computes views
– Speed layer: stream processing; maintains real-time views
• Serving layer: holds the batch views
• Data access: queries merge the batch and real-time views
http://strataconf.com/big-data-conference-ca-2015/public/schedule/detail/38774
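The query-time merge of the two layers can be sketched as follows. This is a toy illustration with hypothetical counts, not a real serving layer (which would typically combine a batch store such as HBase with an in-memory real-time view):

```python
# Sketch: answering a query by merging batch and real-time views.
batch_view = {"signup": 10_000, "purchase": 1_200}   # pre-computed up to the last batch run
realtime_view = {"signup": 42, "purchase": 7}        # events seen since that run

def query(event_type):
    # The batch view covers everything up to the last batch job;
    # the speed layer compensates for its latency with recent data.
    return batch_view.get(event_type, 0) + realtime_view.get(event_type, 0)

print(query("signup"))    # 10042
```

When the next batch job finishes, its output replaces `batch_view` and the real-time view is reset, so the same query stays correct over time.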
10. Lambda pros & cons
• Pros
– Combines real-time & batch processing
– Retains input data unchanged
– Allows reprocessing the data
– Stores intermediate stages
• Cons
– 2 apps in 2 languages that do the same thing
– 2x implement, maintain & debug the code
– Say goodbye to system-specific features
http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html
11. Kappa Architecture
• Data source → data stream (retained log)
• Stream processing system: job version n and, during reprocessing, job version n + 1
• Serving DB: output table n and output table n + 1
• Data access: query the current output table
1. Use Kafka, which retains the full log of data for reprocessing and allows multiple subscribers.
2. Reprocessing: a new instance of the processing job processes the stream from the start and outputs to a new table.
3. When the second job has caught up, switch the application to read from the new table.
4. Stop the old version of the job, and delete the old output table.
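The four steps above can be simulated in a few lines. This is a toy model, with the retained Kafka log as a Python list and output tables as dicts, not a real stream processor:

```python
# Toy simulation of Kappa reprocessing (steps 1–4 above).
retained_log = [{"user": u, "event": "click"} for u in ("a", "b", "a")]

def run_job(log):
    """One job version: aggregates click counts per user from offset 0."""
    table = {}
    for msg in log:
        table[msg["user"]] = table.get(msg["user"], 0) + 1
    return table

# Step 2: the new job version replays the retained log into a new table.
output_table_n = run_job(retained_log)    # old job's results
output_table_n1 = run_job(retained_log)   # reprocessed results (new code would go here)

# Step 3: once caught up, the application switches to the new table.
serving_table = output_table_n1

# Step 4: stop the old job and drop its output table.
del output_table_n

print(serving_table)   # {'a': 2, 'b': 1}
```

In a real deployment the switch in step 3 is just repointing the application at a new table or index, which is why Kappa needs 2x storage only during the cutover.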
http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html
12. Kappa pros & cons
• Pros
– Allows people to develop, test, debug, and operate their systems on top of a single processing framework
• Cons
– Needs 2x total storage (2 versions of results)
– Requires a DB with high-volume writes
14. IMF™
• “In-Memory (event processing) Framework”
• Collect, store and analyze events and players
• Distributed & scalable
– Built on NodeJS and C++
– Nodes per CPU core & proportion of RAM
– Provides API for analyses
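IMF itself is proprietary, but the kind of analysis it serves, such as the funnel calculation benchmarked on the next slide, can be sketched over in-memory events. This is a hypothetical illustration, not Infinario's actual code:

```python
# Sketch: a funnel over in-memory events, counting how many players
# complete each step in order.
from collections import defaultdict

events = [  # (player_id, timestamp, event_type)
    ("p1", 1, "install"), ("p1", 2, "signup"), ("p1", 3, "purchase"),
    ("p2", 1, "install"), ("p2", 5, "signup"),
    ("p3", 2, "install"),
]

def funnel(events, steps):
    """Return, per funnel step, the number of players who reached it."""
    by_player = defaultdict(list)
    for player, ts, etype in sorted(events, key=lambda e: e[1]):
        by_player[player].append(etype)
    counts = [0] * len(steps)
    for seq in by_player.values():
        i = 0
        for etype in seq:
            if i < len(steps) and etype == steps[i]:
                counts[i] += 1
                i += 1
    return counts

print(funnel(events, ["install", "signup", "purchase"]))  # [3, 2, 1]
```

Keeping events in RAM, sharded per CPU core, is what lets this kind of scan finish in milliseconds instead of the seconds a disk-backed database needs.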
15. IMF Benchmarking
[Chart: time to calculate funnel(s), in seconds, vs. number of events in the database (100,000 / 1,000,000 / 10,000,000 / 100,000,000) for BlinkBytes, Mongo, TokuMX, Postgres, MySQL and IMF. The fastest engine stays between 0.004 s and 0.349 s at every size, while the slower engines grow to hundreds or thousands of seconds at 100,000,000 events (262.784 s, 284.803 s, 522.518 s, 1273.985 s).]
https://infinario.com/speedtest
16. Our experience
• Pros
– It’s lightning fast
– Cheap reprocessing; easy life
– Can process an already-processed stream (“streaming”)
• Cons
x No immediate results: a code change or a new node means reloading IMF
x Reloads can take too long
x A PB of RAM in 2015 is a joke
17. Reloads
• NoSQL consumes too many resources (CPU time)
• Can potentially lose some events
• Reload time (NoSQL to IMF) grows fast
• Analyses are unavailable during reload
18. INFINARIO is like this
• Sources: SDKs, BULK, frontend
• Data collection: custom API (Agent A … Agent N)
• Messaging system
• Real-time processing: IMF (Topology A … Topology N)
• Storage: historic NoSQL
• Access: web services (REST API), web apps, analytic tools (R / Python), BI tools, alerting systems
20. INFINARIO Architecture Updated
• Kafka: retains the raw event stream
• Reload: event stream replayed into in-memory processing
• In-memory processing: serves low-latency access
• Persistent storage: raw data, history views
• Access: ad hoc queries, DM, APP
21. AngularJS developer wanted
Our designers work much faster than our frontend team.
Could you help? Email us: jobs@infinario.com
Editor's Notes
All data is dispatched to both the batch layer and the speed layer.
The batch layer (i) manages the master dataset and (ii) pre-computes the batch views.
The serving layer indexes the batch views for low-latency, ad-hoc queries.
The speed layer compensates for the batch layer’s high latency and deals with recent data only.
An incoming query can be answered by merging the batch and real-time views.
Use Kafka or some other system that will let you retain the full log of the data you want to be able to reprocess and that allows for multiple subscribers. For example, if you want to reprocess up to 30 days of data, set your retention in Kafka to 30 days.
When you want to do the reprocessing, start a second instance of your stream processing job that starts processing from the beginning of the retained data, but direct this output data to a new output table.
When the second job has caught up, switch the application to read from the new table.
Stop the old version of the job, and delete the old output table.