Real-time serverless analytics at Shedd – OLX data summit, Mar 2018, Barcelona

Real-time serverless analytics at Shedd
Overview and hands-on workshop
Dobo Radichkov
OLX Data Summit, March 2018

2
What to expect…
▪ The goal is to give you a sweeping view of the Shedd serverless real-time analytics stack
▪ We will cover a lot of new tools and tech building blocks, though we will steer clear of the nitty-gritty details
▪ Expect technical content and hands-on exercises; for the non-technical folk in the audience, try to focus on a high-level understanding of the concepts
▪ We hope the presentation gives you inspiration and smooths the learning curve in case you decide to pursue a similar approach

3
Contents
▪ Introduction
▪ Enabling technology
▪ Putting it all together
▪ Future direction
▪ Q&A

5
Why real-time analytics?
Offline vs real-time
Real-time enables products that adapt and respond to changing user behaviour instantly and continuously

6
Example: Consider this insight regarding first-time Shedd users
Day 1 activity:
▪ Browser: does not view any ads
▪ Viewer: views 1 or more ads
▪ Buyer: makes 1 or more replies
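The three day-1 segments can be written down as a simple rule. This is a minimal sketch, not the production segmentation logic: the function name is made up for illustration, and it assumes that a user who replies counts as a Buyer regardless of how many ads they viewed.

```python
def day_one_segment(ad_views: int, replies: int) -> str:
    """Classify a first-time user by their day-1 activity."""
    if replies >= 1:
        return "Buyer"    # made 1 or more replies (assumed to take precedence)
    if ad_views >= 1:
        return "Viewer"   # viewed 1 or more ads, no replies
    return "Browser"      # did not view any ads
```

For example, a user with 5 ad views and no replies on day 1 is classified as a Viewer.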

8
Day 1 activity → Days 2-30 activity
▪ Does not view any ads → 2.9 ad views, 0.02 replies, 1.3 active days
▪ Views 1 or more ads → 150 ad views, 0.4 replies, 4.7 active days
▪ Makes 1 or more replies → 670 ad views, 6.7 replies, 11.2 active days
How can real-time analytics help?

9
Real-time analytics unlocks a number of capabilities
▪ Segmentation: Segment user behaviour and build a real-time single customer view
▪ Personalisation: Instantly personalise product experience based on up-to-date user preferences and behaviour
▪ Targeting: Target users with push notifications, in-app messaging and custom product flows based on real-time triggers and rules
▪ Reporting: Build mission-critical reports for real-time decision-making (e.g. during large live marketing campaigns or new product releases)
▪ A/B testing: Continuously optimise live A/B tests based on real-time results
▪ Data-driven products: Enable integration of data analytics & models within our products

10
Real-time analytics enables us to unlock the full value of data
The diminishing value of data
▪ Recent data is highly valuable, if you act on it in time: Perishable Insights (M. Gualtieri, F…)
▪ Old + recent data is more valuable, if you have the means to combine them

11
Today we will take a peek at Shedd’s real-time data stack
BATCH DATA STACK
▪ Sources: Tracking (Ninja / Hydra), Platform DB (Mongo), Adjust / Facebook / Google, …
▪ Raw data layer (data lake)
▪ Operational data layer (listings, replies, users, orders, etc.)
▪ Data warehouse
▪ Applications: BI, Segmentation, Performance marketing, CLM, Batch recommender, …
REAL-TIME DATA STACK
▪ Sources: Tracking (Ninja / Hydra), Platform DB (Mongo), …
▪ Raw data streams
▪ Real-time data processing (λ functions)
▪ Real-time database (Online customer view)
▪ API gateway
▪ Applications: Real-time recommender, Real-time segmentation, Other real-time applications

12
Contents
▪ Introduction
▪ Enabling technology
▪ Putting it all together
▪ Future direction
▪ Q&A

13
We leverage 3 AWS building blocks for real-time data analytics
▪ KINESIS: Stream data
▪ LAMBDA: Process data
▪ ELASTICACHE: Store data

15
Kinesis includes 3 flavours
▪ Amazon Kinesis Data Streams (Stream → Process): Build custom applications that process and analyze streaming data
▪ Amazon Kinesis Data Analytics (Stream → Analyse): Easily process and analyze streaming data with standard SQL
▪ Amazon Kinesis Data Firehose (Stream → Ingest): Easily load streaming data into AWS
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

16
Kinesis Data Stream architecture
[Diagram: a stream is made up of shards; events / data records (e.g. JSON objects) are written to and read from a stream shard]
Per shard:
▪ 1 MB / sec data input
▪ 2 MB / sec data output
▪ 1000 records / sec written
▪ 24 hours data retention (default)
▪ $0.015 / shard / hour ($10.80 / shard / month)
▪ $0.014 / 1M records ($14 / 1B records)
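As a rough sizing aid, the shard limits and prices quoted on this slide can be turned into a back-of-the-envelope calculation. This is a sketch only: the function names are made up for illustration, a month is taken as 30 days, every record is assumed to be billed as one unit, and the prices are the ones quoted here, which may have changed since.

```python
import math

# Figures quoted on the slide (assumptions; check current AWS limits and pricing).
SHARD_INPUT_MB_PER_SEC = 1.0
SHARD_RECORDS_PER_SEC = 1000
SHARD_PRICE_PER_HOUR = 0.015
PRICE_PER_MILLION_RECORDS = 0.014
HOURS_PER_MONTH = 24 * 30


def shards_needed(records_per_sec, avg_record_kb):
    """Shards required to absorb the write load, by record count and by volume."""
    by_count = records_per_sec / SHARD_RECORDS_PER_SEC
    by_volume = records_per_sec * avg_record_kb / 1024 / SHARD_INPUT_MB_PER_SEC
    return max(1, math.ceil(max(by_count, by_volume)))


def monthly_cost(records_per_sec, avg_record_kb):
    """Return (shards, estimated monthly cost in USD) for a steady write rate."""
    shards = shards_needed(records_per_sec, avg_record_kb)
    shard_cost = shards * SHARD_PRICE_PER_HOUR * HOURS_PER_MONTH
    records_per_month = records_per_sec * 3600 * HOURS_PER_MONTH
    record_cost = records_per_month / 1_000_000 * PRICE_PER_MILLION_RECORDS
    return shards, shard_cost + record_cost
```

Under these assumptions, a steady 2,500 records per second of 1 KB each needs 3 shards and costs roughly $123 per month, most of it from the per-record charge.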

17
Exercise: Create stream and feed with sample data
1. Create Kinesis data stream: https://us-west-2.console.aws.amazon.com/kinesis/home?region=us-west-2#/streams/create
2. Feed sample real-time data: https://awslabs.github.io/amazon-kinesis-data-generator/
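The exercise uses the console and the Kinesis Data Generator, but the same feed can be sketched in code. The example below is an illustration under assumptions: the event fields and helper names are hypothetical, and `send_events` relies on the boto3 `put_record` call, which needs AWS credentials and an existing stream to run.

```python
import json
import random
import time
import uuid


def make_event(user_id):
    """Build one hypothetical tracking event as a plain dict."""
    return {
        "event_id": str(uuid.uuid4()),
        "user_id": user_id,
        "event": random.choice(["ad_view", "reply"]),
        "ts": int(time.time()),
    }


def to_put_record_args(stream_name, event):
    """Arguments for Kinesis PutRecord: a bytes payload plus a partition key."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),
        # Records with the same partition key land on the same shard.
        "PartitionKey": str(event["user_id"]),
    }


def send_events(stream_name, n, region="us-west-2"):
    """Send n sample events to an existing stream (requires AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    kinesis = boto3.client("kinesis", region_name=region)
    for i in range(n):
        kinesis.put_record(**to_put_record_args(stream_name, make_event(i % 10)))
```

Using the user id as the partition key keeps each user's events in order within one shard, which is one reasonable choice for a per-user customer view.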

18
Kinesis Analytics enables real-time data analysis,
transformation, enrichment and visualisation

19
Exercise: Create Kinesis Analytics application and run some
real-time SQL analysis
1. Create Kinesis Analytics app
2. Run real-time SQL analysis
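Real-time analysis of a stream typically groups records into time windows before aggregating them. The sketch below shows that idea in plain Python rather than in Kinesis Analytics SQL; the `ts` and `event` field names are assumptions, and the function is only an illustration of a tumbling-window count.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Count events per type in fixed, non-overlapping time windows."""
    windows = defaultdict(Counter)
    for e in events:
        # Align the timestamp to the start of its window.
        window_start = e["ts"] - e["ts"] % window_seconds
        windows[window_start][e["event"]] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}
```

Each event falls into exactly one window, so the counts of all windows add up to the total number of events.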

21
Evolution of computing models
▪ ON-PREMISE: Physical servers
▪ SERVER as a service: Virtual server in the cloud (Amazon EC2)
▪ APP as a service: Virtual app container (Amazon ECS)
▪ FUNCTION as a service: Serverless computing (AWS Lambda)

22
Lambda is Amazon’s serverless event-driven compute service
▪ Write code in Python, Node.js, Java, and others and upload to Lambda
▪ Trigger code from other AWS services, HTTP endpoints or in-app activity
▪ Scale seamlessly and elastically with number of events, only using required compute resource
▪ Only pay for the compute time used (per 100ms execution time)
Forget about infrastructure, administration and scaling – focus 100% on your app logic

23
Exercise: Let’s create 2 simple Lambda functions
1. Create Hello World
2. Create stream processor
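The two functions in this exercise might look roughly like the following. This is a sketch, not the workshop code: the handler names and the `event` payload field are assumptions. What it does rely on is that Lambda passes Kinesis records under `Records`, with each payload base64-encoded in `kinesis.data`.

```python
import base64
import json
from collections import Counter


def hello_handler(event, context):
    """Hello World: return a greeting, using an optional 'name' field."""
    return {"message": f"Hello, {event.get('name', 'World')}!"}


def stream_handler(event, context):
    """Stream processor: decode each Kinesis record and count events by type."""
    counts = Counter()
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        counts[payload.get("event", "unknown")] += 1
    # Anything printed in a Lambda function ends up in CloudWatch Logs.
    print(f"Processed {sum(counts.values())} records: {dict(counts)}")
    return dict(counts)
```

Both handlers can be tested locally by calling them with a hand-built event dict and `None` as the context.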

24
Combining Lambda with API gateway empowers the data
professional to create serverless APIs

25
The Serverless Framework streamlines and automates deployment

26
Exercise: Create APIs with serverless + API gateway + Lambda
1. Create Hello World endpoint
2. Create mock API endpoint
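Behind API gateway, each endpoint is just a Lambda handler that returns an HTTP-style response. The sketch below assumes the Lambda proxy integration, where the handler returns `statusCode`, `headers` and a string `body`; the handler names, the `user_id` path parameter and the mock payload are all hypothetical.

```python
import json


def _json_response(status_code, payload):
    """Shape a response the way the API gateway proxy integration expects."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def hello_endpoint(event, context):
    """Hello World endpoint: greet the caller named in ?name=..."""
    params = event.get("queryStringParameters") or {}
    return _json_response(200, {"message": f"Hello, {params.get('name', 'World')}!"})


def mock_recommendations_endpoint(event, context):
    """Mock API endpoint: return fixed recommendations for the requested user."""
    user_id = (event.get("pathParameters") or {}).get("user_id", "anonymous")
    return _json_response(
        200, {"user_id": user_id, "recommendations": ["ad-1", "ad-2", "ad-3"]}
    )
```

A mock endpoint like this lets the frontend integrate against the API contract before any real-time logic exists behind it.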

28
ElastiCache is Amazon’s managed service for Redis:
an INSANELY fast in-memory key-value database
▪ In-memory
▪ Low latency
▪ Ridiculously fast
▪ NoSQL → key-value store
▪ Open source

29
Redis + Redshift =
Redshift
▪ Run few queries infrequently
▪ Process billions of records per query
▪ Standard SQL
▪ Batch
Redis
▪ Run millions of commands continuously
▪ Process few records per command
▪ 200 Redis commands + Lua scripting
▪ Real-time

30
Redis is a key-value store supporting 5 basic data types
Key => { Data Structures }
▪ String (strings / blobs / bitmaps): "I'm a Plain Text String!"
▪ Hash (hash tables, objects!): Key1 → Val1, Key2 → Val2
▪ List (linked lists): C B B A C
▪ Set (sets): A B C D
▪ Sorted set (sorted sets): A: 0.1, B: 0.3, C: 500, D: 500
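The five data types map onto a handful of Redis commands. The sketch below uses the redis-py client (assumed version 3.5 or later, for `hset(..., mapping=...)` and the dict form of `zadd`); the key names are made up for illustration, and the function expects an already-connected client.

```python
def redis_data_types_demo(r):
    """Write and read back one value of each basic Redis data type."""
    r.set("greeting", "I'm a Plain Text String!")                # String
    r.hset("user:1", mapping={"Key1": "Val1", "Key2": "Val2"})   # Hash
    r.rpush("recent", "C", "B", "B", "A", "C")                   # List (keeps order and duplicates)
    r.sadd("tags", "A", "B", "C", "D")                           # Set (unique members)
    r.zadd("scores", {"A": 0.1, "B": 0.3, "C": 500, "D": 500})   # Sorted set (member -> score)
    return {
        "string": r.get("greeting"),
        "hash": r.hgetall("user:1"),
        "list": r.lrange("recent", 0, -1),
        "set": r.smembers("tags"),
        "sorted_set": r.zrange("scores", 0, -1, withscores=True),
    }


# Usage against a local Redis server (assumption: one is running on the default port):
#   import redis
#   r = redis.Redis(host="localhost", port=6379, decode_responses=True)
#   print(redis_data_types_demo(r))
```

Passing the client in as an argument keeps the function easy to point at a local Redis during the exercise and at ElastiCache later.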

31
Exercise: Let’s have a look at Redis in action
1. Play with Redis commands
2. Test Redis speed

32
Recap: We covered the 3 AWS building blocks for real-time data
▪ KINESIS: Stream data
▪ LAMBDA: Process data
▪ ELASTICACHE: Store data

33
Contents
▪ Introduction
▪ Enabling technology
▪ Putting it all together
▪ Future direction
▪ Q&A

34
Real-time vs offline data stacks
Dimension | Offline stack | Real-time stack
Raw data | Files on S3 | Kinesis streams
Database | Redshift | Redis
Volume | High – processing millions / billions of records at the same time | Low – processing single records at a time
Velocity | Low – running few queries at a time | High – running thousands / millions of queries at the same time
Query language | SQL | Python + Redis commands
End-user | Humans, BI tools | Lambda, APIs, products

35
Shedd end-to-end data stack architecture
[Diagram: the BATCH DATA STACK and the REAL-TIME DATA STACK side by side, with the same components as the overview diagram in the introduction]

37
Example: Shedd real-time recommendations
FRONTEND
▪ Shedd app: Android / iOS SDK
BACKEND
▪ API endpoint(s): API gateway
▪ Recommendation service orchestrator: Lambda
▪ Segmentation API: Lambda
▪ Kingsman service
▪ Event stream: Kinesis
▪ Event processor: Lambda
▪ Online customer view: ElastiCache (Redis)
TRACKING
▪ Shedd app: Ninja
▪ Hydra tracker: EC2
▪ Platform DB: Mongo
38
Example: Shedd analytics APIs
FRONTEND
▪ Shedd app: Android / iOS SDK
BACKEND
▪ API endpoint(s): API gateway
▪ Analytics API handler: Lambda
▪ Data warehouse: Redshift
▪ Redis bulk loader: Lambda
▪ Online customer view: ElastiCache (Redis)
TRACKING
▪ Shedd app: Ninja
▪ Hydra tracker: EC2
▪ Platform DB: Mongo

39
Contents
▪ Introduction
▪ Enabling technology
▪ Putting it all together
▪ Future direction
▪ Q&A

Thank you
Questions? Feedback?
Dobo Radichkov
Analytics summit, Jan 2018
