Real-time serverless analytics at Shedd
Overview and hands-on workshop
Dobo Radichkov
OLX Data Summit, March 2018
2
What to expect…
▪ Goal is to give you a sweeping view of the Shedd serverless real-time analytics stack
▪ We will cover a lot of new tools and tech building blocks, though we will steer clear of the nitty-gritty details
▪ Expect technical content and hands-on exercises – for the non-technical folk in the audience, try to focus on the high-level understanding of the concepts
▪ We hope the presentation gives you inspiration and smooths the learning curve in case you decide to pursue a similar approach
3
Contents
▪ Introduction
▪ Enabling technology
▪ Putting it all together
▪ Future direction
▪ Q&A
5
Why real-time analytics?
Offline vs real-time
▪ Real-time enables products that adapt and respond to changing user behaviour instantly and continuously
8
Example: Consider this insight regarding first-time Shedd users
▪ Browser (day 1 activity: does not view any ads). Days 2-30 activity: 2.9 ad views, 0.02 replies, 1.3 active days
▪ Viewer (day 1 activity: views 1 or more ads). Days 2-30 activity: 150 ad views, 0.4 replies, 4.7 active days
▪ Buyer (day 1 activity: makes 1 or more replies). Days 2-30 activity: 670 ad views, 6.7 replies, 11.2 active days
How can real-time analytics help?
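The day-1 segments above can be expressed as a small rule. This is only a sketch: the function name is hypothetical, and it assumes that a reply takes precedence over ad views (a user who replies counts as a Buyer even with many views), which the slide does not state explicitly.

```python
def day1_segment(ad_views: int, replies: int) -> str:
    """Classify a first-time user by day-1 activity into the slide's three segments."""
    if replies >= 1:
        return "Buyer"    # makes 1 or more replies (assumed to take precedence)
    if ad_views >= 1:
        return "Viewer"   # views 1 or more ads
    return "Browser"      # does not view any ads


# Example usage with made-up day-1 counters
segments = [day1_segment(views, replies) for views, replies in [(0, 0), (5, 0), (12, 2)]]
```

In a real-time setting the same rule would run as soon as the day-1 counters change, instead of waiting for a nightly batch.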
9
Real-time analytics unlocks a number of capabilities
▪ Segmentation: Segment user behaviour and build real-time single customer view
▪ Personalisation: Instantly personalise product experience based on up-to-date user preferences and behaviour
▪ Targeting: Target users with push notifications, in-app messaging and custom product flows based on real-time triggers and rules
▪ Reporting: Build mission-critical reports for real-time decision-making (e.g. during large live marketing campaigns or new product releases)
▪ A/B testing: Continuously optimise live A/B tests based on real-time results
▪ Data-driven products: Enable integration of data analytics & models within our products
10
Real-time analytics enables us to unlock the full value of data
The diminishing value of data
▪ Recent data is highly valuable, if you act on it in time
▪ Old + recent data is more valuable, if you have the means to combine them
Source: Perishable Insights (M. Gualtieri, F…)
11
Today we will take a peek at Shedd’s real-time data stack
[Diagram: batch and real-time data stacks side by side]
BATCH DATA STACK
▪ Tracking (Ninja / Hydra), Platform DB (Mongo), Adjust / Facebook / Google, …
▪ Raw data layer (data lake)
▪ Operational data layer (listings, replies, users, orders, etc.)
▪ Data warehouse
▪ BI, Segmentation, Performance marketing, CLM, Batch recommender, …
REAL-TIME DATA STACK
▪ Tracking (Ninja / Hydra), Platform DB (Mongo), …
▪ Raw data streams
▪ Real-time data processing (λ functions)
▪ Real-time database (Online customer view)
▪ API gateway
▪ Real-time recommender, Real-time segmentation, Other real-time applications
12
Contents
▪ Introduction
▪ Enabling technology
▪ Putting it all together
▪ Future direction
▪ Q&A
13
We leverage 3 AWS building blocks for real-time data analytics
KINESIS
Stream data
LAMBDA
Process data
ELASTICACHE
Store data
15
Kinesis includes 3 flavours
▪ Amazon Kinesis Data Streams (Stream → Process): Build custom applications that process and analyze streaming data
▪ Amazon Kinesis Data Analytics (Stream → Analyse): Easily process and analyze streaming data with standard SQL
▪ Amazon Kinesis Data Firehose (Stream → Ingest): Easily load streaming data into AWS
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
16
Kinesis Data Stream architecture
▪ A stream is made up of shards
▪ An event / data record (e.g. JSON object) is written to a stream shard and later read from a stream shard
Per shard:
▪ 1 MB / sec data input
▪ 2 MB / sec data output
▪ 1000 records / sec data input
▪ 24 hours data retention (default)
▪ $0.015 / shard / hour ($10.80 / shard / month)
▪ $0.014 / 1M records ($14 / 1B records)
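The per-shard figures above translate into a simple sizing and cost estimate. The sketch below is illustrative only: the function names are hypothetical, the prices are the ones quoted on the slide (they vary by region and over time), and it assumes every record is small enough to be billed as a single record.

```python
import math

# Per-shard write limits and prices as quoted on the slide (a snapshot, not current pricing).
SHARD_WRITE_MB_PER_SEC = 1.0
SHARD_WRITE_RECORDS_PER_SEC = 1000
SHARD_PRICE_PER_HOUR = 0.015
PRICE_PER_MILLION_RECORDS = 0.014


def shards_needed(records_per_sec: float, avg_record_kb: float) -> int:
    """Shards required so that neither the record-count nor the byte limit is exceeded."""
    by_count = records_per_sec / SHARD_WRITE_RECORDS_PER_SEC
    by_bytes = (records_per_sec * avg_record_kb / 1024) / SHARD_WRITE_MB_PER_SEC
    return max(1, math.ceil(max(by_count, by_bytes)))


def monthly_cost(records_per_sec: float, avg_record_kb: float, days: int = 30) -> float:
    """Rough monthly cost in USD: shard-hours plus per-record charges."""
    shard_cost = shards_needed(records_per_sec, avg_record_kb) * SHARD_PRICE_PER_HOUR * 24 * days
    record_cost = records_per_sec * 86400 * days / 1_000_000 * PRICE_PER_MILLION_RECORDS
    return round(shard_cost + record_cost, 2)
```

For example, `shards_needed(2500, 0.5)` is driven by the record-count limit, while `shards_needed(500, 4)` is driven by the byte limit.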
17
Exercise: Create a stream and feed it with sample data
1. Create Kinesis data stream: https://us-west-2.console.aws.amazon.com/kinesis/home?region=us-west-2#/streams/create
2. Feed sample real-time data: https://awslabs.github.io/amazon-kinesis-data-generator/
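As an alternative to the data generator, events could also be fed from code. The following is a minimal sketch, not the workshop's actual script: the helper names, the `user_id` partition key and the stream name are assumptions. The `client` argument is expected to behave like a boto3 Kinesis client, whose `put_records` call accepts at most 500 records per request.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

MAX_RECORDS_PER_CALL = 500  # PutRecords limit per request


def to_entry(event: Dict[str, Any], key_field: str = "user_id") -> Dict[str, Any]:
    """Serialise one event as a PutRecords entry; the partition key decides the shard."""
    return {
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": str(event[key_field]),
    }


def batches(entries: List[dict], size: int = MAX_RECORDS_PER_CALL) -> Iterator[List[dict]]:
    """Split entries into chunks that fit into a single PutRecords call."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


def send(client, stream_name: str, events: Iterable[dict]) -> int:
    """Send all events and return the number of records the service reported as failed."""
    entries = [to_entry(event) for event in events]
    failed = 0
    for batch in batches(entries):
        response = client.put_records(StreamName=stream_name, Records=batch)
        failed += response.get("FailedRecordCount", 0)
    return failed


# Usage (requires AWS credentials and an existing stream; names are placeholders):
#   import boto3
#   send(boto3.client("kinesis", region_name="us-west-2"), "demo-stream", events)
```

Failed records are only counted here; a production producer would typically retry them.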
18
Kinesis Analytics enables real-time data analysis,
transformation, enrichment and visualisation
19
Exercise: Create Kinesis Analytics application and run some
real-time SQL analysis
1. Create Kinesis Analytics app
2. Run real-time SQL analysis
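A typical first query in this exercise is a count of events per time window. Kinesis Analytics expresses this in SQL; the sketch below is a plain-Python analogue of a tumbling-window count, meant only to illustrate the idea. The event shape (timestamp in seconds, event type) and the function name are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_counts(events: Iterable[Tuple[float, str]],
                    window_sec: int = 60) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, event type) using fixed, non-overlapping windows."""
    counts: Counter = Counter()
    for timestamp, event_type in events:
        # Every timestamp falls into exactly one window, identified by its start time.
        window_start = int(timestamp // window_sec) * window_sec
        counts[(window_start, event_type)] += 1
    return dict(counts)
```

Unlike a streaming engine, this version sees all events at once; a real-time implementation would emit each window's counts as soon as the window closes.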
20
We leverage 3 AWS building blocks for real-time data analytics
KINESIS
Stream data
LAMBDA
Process data
ELASTICACHE
Store data
21
Evolution of computing models
▪ ON-PREMISE: Physical servers
▪ SERVER as a service: Virtual server in the cloud (Amazon EC2)
▪ APP as a service: Virtual app container (Amazon ECS)
▪ FUNCTION as a service: Serverless computing (AWS Lambda)
22
Lambda is Amazon’s serverless event-driven compute service
▪ Write code in Python, Node.js, Java, and others and upload to Lambda
▪ Trigger code from other AWS services, HTTP endpoints or in-app activity
▪ Scale seamlessly and elastically with number of events, only using required compute resource
▪ Only pay for the compute time used (per 100ms execution time)
Forget about infrastructure, administration and scaling – focus 100% on your app logic
23
Exercise: Let’s create 2 simple Lambda functions
1. Create Hello World
2. Create stream processor
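The two functions in this exercise could look roughly like the sketch below. The handler names and the `event` field inside each payload are assumptions; what is standard is that a Kinesis-triggered Lambda receives a batch under `Records`, with each payload base64-encoded in `record["kinesis"]["data"]`.

```python
import base64
import json


def hello_handler(event, context):
    """Simplest possible Lambda: ignore the input and return a greeting."""
    return {"message": "Hello World"}


def stream_handler(event, context):
    """Kinesis-triggered Lambda: decode each record and count events by type."""
    counts = {}
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])  # Kinesis data arrives base64-encoded
        item = json.loads(payload)
        name = item.get("event", "unknown")
        counts[name] = counts.get(name, 0) + 1
    return counts
```

In the real stack the processor would write its results to the real-time database rather than return them.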
24
Combining Lambda with API gateway empowers the data
professional to create serverless APIs
25
The Serverless Framework streamlines and automates deployment
26
Exercise: Create APIs with serverless + API gateway + Lambda
1. Create Hello World endpoint
2. Create mock API endpoint
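Behind API gateway, each endpoint is an ordinary Lambda handler that returns an HTTP-style response. The sketch below assumes the Lambda proxy integration, where the handler returns `statusCode`, `headers` and a string `body`; the endpoint names, the `user_id` query parameter and the mock recommendations are hypothetical.

```python
import json


def _response(status: int, payload: dict) -> dict:
    """Build a response in the shape expected by the API Gateway Lambda proxy integration."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def hello_endpoint(event, context):
    """Hello World endpoint."""
    return _response(200, {"message": "Hello World"})


def mock_recommendations(event, context):
    """Mock API endpoint: returns fixed recommendations for the requested user."""
    params = event.get("queryStringParameters") or {}  # the field is null when no query string is sent
    user_id = params.get("user_id")
    if user_id is None:
        return _response(400, {"error": "user_id is required"})
    return _response(200, {"user_id": user_id, "recommendations": ["ad-1", "ad-2", "ad-3"]})
```

A mock endpoint like this lets the frontend integrate against the API before the real-time backend exists.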
27
We leverage 3 AWS building blocks for real-time data analytics
KINESIS
Stream data
LAMBDA
Process data
ELASTICACHE
Store data
28
ElastiCache is Amazon’s managed service for Redis:
an INSANELY fast in-memory key-value database
▪ In-memory
▪ Low latency
▪ Ridiculously fast
▪ NoSQL → key-value store
▪ Open source
29
Redis + Redshift =
Redshift:
▪ Run few queries infrequently
▪ Process billions of records per query
▪ Standard SQL
▪ Batch
Redis:
▪ Run millions of commands continuously
▪ Process few records per command
▪ 200 Redis commands + Lua scripting
▪ Real-time
30
Redis is a key-value store supporting 5 basic data types
Key => { Data Structures }
▪ String: Strings / Blobs / Bitmaps, e.g. "I'm a Plain Text String!"
▪ Hash: Hash tables (objects!), e.g. Key1 → Val1, Key2 → Val2
▪ List: Linked lists, e.g. C B B A C
▪ Set: Sets, e.g. A B C D
▪ Sorted set: Sorted sets, e.g. A: 0.1, B: 0.3, C: 500, D: 500
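For readers more familiar with Python than Redis, the five types map roughly onto built-in structures. This is an analogy rather than Redis itself: the key names are made up, and the comments name the Redis commands one would typically use for each type.

```python
from collections import deque

store = {}  # one namespace of keys, like a Redis database

# String: SET / GET
store["greeting"] = "I'm a Plain Text String!"

# Hash: HSET / HGETALL (field -> value, handy for objects)
store["user:1"] = {"Key1": "Val1", "Key2": "Val2"}

# List: LPUSH / LRANGE (ordered, duplicates allowed)
store["recent"] = deque(["C", "B", "B", "A", "C"])

# Set: SADD / SMEMBERS (unordered, unique members)
store["tags"] = {"A", "B", "C", "D"}

# Sorted set: ZADD / ZRANGE (unique members, each with a numeric score)
store["scores"] = {"A": 0.1, "B": 0.3, "C": 500, "D": 500}


def zrange(zset: dict) -> list:
    """Members ordered by score; equal scores are ordered by member, as in Redis."""
    return [member for member, _ in sorted(zset.items(), key=lambda kv: (kv[1], kv[0]))]
```

The practical difference is that Redis performs these operations server-side and atomically, so many Lambda invocations can update the same key safely.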
31
Exercise: Let’s have a look at Redis in action
1. Play with Redis commands
2. Test Redis speed
32
Recap: We covered the 3 AWS building blocks for real-time data
KINESIS
Stream data
LAMBDA
Process data
ELASTICACHE
Store data
33
Contents
▪ Introduction
▪ Enabling technology
▪ Putting it all together
▪ Future direction
▪ Q&A
34
Real-time vs offline data stacks
▪ Raw data: offline stack = Files on S3; real-time stack = Kinesis streams
▪ Database: offline stack = Redshift; real-time stack = Redis
▪ Volume: offline stack = High (processing millions / billions of records at the same time); real-time stack = Low (processing single records at a time)
▪ Velocity: offline stack = Low (running few queries at a time); real-time stack = High (running thousands / millions of queries at the same time)
▪ Query language: offline stack = SQL; real-time stack = Python + Redis commands
▪ End-user: offline stack = Humans, BI tools; real-time stack = Lambda, APIs, products
35
Shedd end-to-end data stack architecture
[Diagram: batch and real-time data stacks side by side]
BATCH DATA STACK
▪ Tracking (Ninja / Hydra), Platform DB (Mongo), Adjust / Facebook / Google, …
▪ Raw data layer (data lake)
▪ Operational data layer (listings, replies, users, orders, etc.)
▪ Data warehouse
▪ BI, Segmentation, Performance marketing, CLM, Batch recommender, …
REAL-TIME DATA STACK
▪ Tracking (Ninja / Hydra), Platform DB (Mongo), …
▪ Raw data streams
▪ Real-time data processing (λ functions)
▪ Real-time database (Online customer view)
▪ API gateway
▪ Real-time recommender, Real-time segmentation, Other real-time applications
37
Example: Shedd real-time recommendations
FRONTEND
▪ Shedd app (Android / iOS SDK)
TRACKING
▪ Shedd app, Ninja / Hydra tracker (EC2)
▪ Platform DB (Mongo)
BACKEND
▪ Event stream (Kinesis)
▪ Event processor (Lambda)
▪ Online customer view (ElastiCache / Redis)
▪ Recommendation service orchestrator (Lambda)
API
▪ Endpoint(s) (API gateway)
Also shown: Segmentation API (Lambda), Kingsman service
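The recommendation flow can be sketched end to end in a few lines. Everything here is hypothetical: an in-memory class stands in for the online customer view (Redis in the talk), the event fields `user_id` and `category` are assumed, and "recommend the user's most-viewed categories" is a placeholder for whatever logic the real orchestrator applies.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


class CustomerView:
    """In-memory stand-in for the online customer view."""

    def __init__(self) -> None:
        self._views: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def record_ad_view(self, user_id: str, category: str) -> None:
        self._views[user_id][category] += 1

    def top_categories(self, user_id: str, n: int = 3) -> List[str]:
        views = self._views.get(user_id, {})
        ranked = sorted(views.items(), key=lambda kv: (-kv[1], kv[0]))  # most viewed first
        return [category for category, _ in ranked[:n]]


def process_event(view: CustomerView, event: dict) -> None:
    """Event processor step: update the customer view for each tracked ad view."""
    if event.get("event") == "ad_view":
        view.record_ad_view(event["user_id"], event["category"])


def recommend(view: CustomerView, user_id: str, fallback: Sequence[str] = ("popular",)) -> List[str]:
    """Orchestrator step: read the customer view, fall back for unknown users."""
    return view.top_categories(user_id) or list(fallback)
```

Because the view is updated per event, a recommendation requested seconds after an ad view already reflects it.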
38
Example: Shedd analytics APIs
FRONTEND
▪ Shedd app (Android / iOS SDK)
TRACKING
▪ Shedd app, Ninja / Hydra tracker (EC2)
▪ Platform DB (Mongo)
BACKEND
▪ Data warehouse (Redshift)
▪ Redis bulk loader (Lambda)
▪ Online customer view (ElastiCache / Redis)
▪ Analytics API handler (Lambda)
API
▪ Endpoint(s) (API gateway)
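The bulk loader step, which copies warehouse results into Redis so the API can serve them with low latency, might look like the sketch below. The column names, the `user:` key prefix and the function names are assumptions; `pipeline` is expected to behave like a redis-py pipeline (`hset(name, mapping=...)` followed by `execute()`).

```python
from typing import Dict, Iterable, Iterator, Tuple


def rows_to_hashes(rows: Iterable[dict], key_column: str = "user_id",
                   prefix: str = "user:") -> Iterator[Tuple[str, Dict[str, str]]]:
    """Turn warehouse rows into (redis key, field mapping) pairs, skipping NULL values."""
    for row in rows:
        key = f"{prefix}{row[key_column]}"
        fields = {name: str(value) for name, value in row.items()
                  if name != key_column and value is not None}
        yield key, fields


def load(pipeline, rows: Iterable[dict]) -> int:
    """Queue one HSET per row and send everything in a single round trip."""
    written = 0
    for key, fields in rows_to_hashes(rows):
        if fields:  # HSET needs at least one field
            pipeline.hset(key, mapping=fields)
            written += 1
    pipeline.execute()
    return written


# Usage (requires a reachable Redis; host is a placeholder):
#   import redis
#   load(redis.Redis(host="...").pipeline(), rows_from_warehouse)
```

Batching the writes in a pipeline keeps the load fast even for large result sets, since commands are not acknowledged one by one.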
39
Contents
▪ Introduction
▪ Enabling technology
▪ Putting it all together
▪ Future direction
▪ Q&A
Thank you
Questions? Feedback?
Dobo Radichkov
Analytics summit, Jan 2018

Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 

Real-time serverless analytics at Shedd – OLX data summit, Mar 2018, Barcelona

  • 1. Real-time serverless analytics at Shedd: Overview and hands-on workshop. Dobo Radichkov, OLX Data Summit, March 2018
  • 2. What to expect… ▪ Goal is to give you a sweeping view of the Shedd serverless real-time analytics stack ▪ We will cover a lot of new tools and tech building blocks, though we will steer clear of the nitty-gritty details ▪ Expect technical content and hands-on exercises – for the non-technical folk in the audience, try to focus on the high-level understanding of the concepts ▪ We hope the presentation gives you inspiration and smoothens the learning curve in case you decide to pursue a similar approach
  • 3. Contents: ▪ Introduction ▪ Enabling technology ▪ Putting it all together ▪ Future direction ▪ Q&A
  • 5. Why real-time analytics? Offline vs real-time: real-time enables products that adapt and respond to changing user behaviour instantly and continuously
  • 6. Example: Consider this insight regarding first-time Shedd users. Day 1 activity: ▪ Browser: does not view any ads ▪ Viewer: views 1 or more ads ▪ Buyer: makes 1 or more replies
  • 7. Example: Consider this insight regarding first-time Shedd users. Day 1 activity → Days 2-30 activity: ▪ Browser (does not view any ads): 2.9 ad views, 0.02 replies, 1.3 active days ▪ Viewer (views 1 or more ads): 150 ad views, 0.4 replies, 4.7 active days ▪ Buyer (makes 1 or more replies): 670 ad views, 6.7 replies, 11.2 active days
  • 8. Example: Consider this insight regarding first-time Shedd users. Day 1 activity → Days 2-30 activity: ▪ Browser (does not view any ads): 2.9 ad views, 0.02 replies, 1.3 active days ▪ Viewer (views 1 or more ads): 150 ad views, 0.4 replies, 4.7 active days ▪ Buyer (makes 1 or more replies): 670 ad views, 6.7 replies, 11.2 active days. How can real-time analytics help?
  • 9. Real-time analytics unlocks a number of capabilities: ▪ Segmentation: segment user behaviour and build real-time single customer view ▪ Personalisation: instantly personalise product experience based on up-to-date user preferences and behaviour ▪ Targeting: target users with push notifications, in-app messaging and custom product flows based on real-time triggers and rules ▪ Reporting: build mission-critical reports for real-time decision-making (e.g. during large live marketing campaigns or new product releases) ▪ A/B testing: continuously optimise live A/B tests based on real-time results ▪ Data-driven products: enable integration of data analytics & models within our products
  • 10. Real-time analytics enables us to unlock the full value of data. The diminishing value of data: ▪ Recent data is highly valuable if you act on it in time: Perishable Insights (M. Gualtieri, F…) ▪ Old + recent data is more valuable if you have the means to combine them
  • 11. Today we will take a peek at Shedd’s real-time data stack. BATCH DATA STACK: ▪ Tracking (Ninja / Hydra), Platform DB (Mongo), Adjust / Facebook / Google, … ▪ Raw data layer (data lake) ▪ Operational data layer (listings, replies, users, orders, etc.) ▪ Data warehouse ▪ BI, Segmentation, Performance marketing, CLM, Batch recommender, … REAL-TIME DATA STACK: ▪ Tracking (Ninja / Hydra), Platform DB (Mongo), … ▪ Raw data streams ▪ Real-time data processing (λ functions) ▪ Real-time database (Online customer view) ▪ API gateway ▪ Real-time recommender, Real-time segmentation, Other real-time applications
  • 12. Contents: ▪ Introduction ▪ Enabling technology ▪ Putting it all together ▪ Future direction ▪ Q&A
  • 13. We leverage 3 AWS building blocks for real-time data analytics: ▪ Kinesis: stream data ▪ Lambda: process data ▪ ElastiCache: store data
  • 15. Kinesis includes 3 flavours: ▪ Amazon Kinesis Data Streams: build custom applications that process and analyze streaming data (Stream → Process) ▪ Amazon Kinesis Data Analytics: easily process and analyze streaming data with standard SQL (Stream → Analyse) ▪ Amazon Kinesis Data Firehose: easily load streaming data into AWS (Stream → Ingest)
  • 16. Kinesis Data Stream architecture: a stream is made up of shards; events / data records (e.g. JSON objects) are written to a stream shard and read from a stream shard. Per shard: ▪ 1 MB / sec data input ▪ 1 MB / sec data output ▪ 1000 records / sec ▪ 24 hours data retention ▪ $0.015 / shard / hour ($10.80 / shard / month) ▪ $0.014 / 1M records ($14 / 1B records)
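The per-shard limits and prices on this slide are enough for a back-of-the-envelope sizing. The sketch below is one possible way to do that arithmetic; the function names are hypothetical, the figures are the ones quoted on the slide (not fetched from AWS), and it assumes every record counts as a single billable unit.

```python
import math

# Per-shard figures as quoted on the slide (assumed, not looked up live).
SHARD_MB_PER_SEC_IN = 1.0
SHARD_RECORDS_PER_SEC = 1000
SHARD_USD_PER_HOUR = 0.015
USD_PER_MILLION_RECORDS = 0.014
HOURS_PER_MONTH = 24 * 30  # the slide's $10.80 / shard / month implies a 30-day month


def shards_needed(records_per_sec: float, avg_record_kb: float) -> int:
    """Smallest shard count that satisfies both the record-rate and byte-rate write limits."""
    by_count = records_per_sec / SHARD_RECORDS_PER_SEC
    by_bytes = (records_per_sec * avg_record_kb / 1024) / SHARD_MB_PER_SEC_IN
    return max(1, math.ceil(max(by_count, by_bytes)))


def monthly_cost_usd(records_per_sec: float, avg_record_kb: float) -> float:
    """Shard-hour cost plus per-record cost for a steady write rate over one month."""
    shards = shards_needed(records_per_sec, avg_record_kb)
    shard_cost = shards * SHARD_USD_PER_HOUR * HOURS_PER_MONTH
    records_per_month = records_per_sec * 3600 * HOURS_PER_MONTH
    record_cost = records_per_month / 1_000_000 * USD_PER_MILLION_RECORDS
    return round(shard_cost + record_cost, 2)
```

For example, a steady 2,500 records / sec at 0.5 KB each is limited by the record rate rather than by bytes, so `shards_needed(2500, 0.5)` comes out at 3 shards.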
  • 17. Exercise: Create stream and feed with sample data. 1. Create Kinesis data stream: https://us-west-2.console.aws.amazon.com/kinesis/home?region=us-west-2#/streams/create 2. Feed sample real-time data: https://awslabs.github.io/amazon-kinesis-data-generator/
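The exercise uses the console and the Kinesis Data Generator, but the same feed could also be scripted. The sketch below is an illustration only: the event fields (`user_id`, `event`, `ts`) and the helper names are hypothetical, and the `send` function is not run here because it needs `boto3` and AWS credentials.

```python
import json
import random
import time


def make_event(user_id: int) -> dict:
    """Build one hypothetical tracking event (field names are illustrative)."""
    return {
        "user_id": user_id,
        "event": random.choice(["ad_view", "reply"]),
        "ts": int(time.time()),
    }


def to_kinesis_records(events: list) -> list:
    """Shape events as PutRecords entries: a bytes payload plus a partition key."""
    return [
        {
            "Data": json.dumps(e).encode("utf-8"),
            # Events with the same partition key land on the same shard.
            "PartitionKey": str(e["user_id"]),
        }
        for e in events
    ]


def send(records: list, stream_name: str) -> None:
    """Send one batch with boto3 (requires AWS credentials; not executed in this sketch)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    client = boto3.client("kinesis", region_name="us-west-2")
    client.put_records(StreamName=stream_name, Records=records)


records = to_kinesis_records([make_event(uid) for uid in range(5)])
```

Keying on the user ID is a design choice for this sketch: it keeps each user's events in order within one shard, at the cost of uneven load if a few users are very active.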
  • 18. Kinesis Analytics enables real-time data analysis, transformation, enrichment and visualisation
  • 19. Exercise: Create Kinesis Analytics application and run some real-time SQL analysis. 1. Create Kinesis Analytics app 2. Run real-time SQL analysis
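The slides do not include the SQL used in the exercise. As a rough illustration of the kind of aggregation a streaming query performs, the sketch below counts events per fixed (tumbling) time window in plain Python; the event fields and the 60-second window are assumptions.

```python
from collections import Counter


def tumbling_window_counts(events, window_sec=60):
    """Count events per (window start, event type), like a GROUP BY over fixed time windows."""
    counts = Counter()
    for e in events:
        # Align each timestamp to the start of its window.
        window_start = e["ts"] - e["ts"] % window_sec
        counts[(window_start, e["event"])] += 1
    return dict(counts)


sample = [
    {"ts": 0, "event": "ad_view"},
    {"ts": 59, "event": "ad_view"},
    {"ts": 60, "event": "reply"},
    {"ts": 61, "event": "ad_view"},
]
result = tumbling_window_counts(sample)
```

Here the first two events fall into the window starting at 0 and the last two into the window starting at 60.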
  • 20. We leverage 3 AWS building blocks for real-time data analytics: ▪ Kinesis: stream data ▪ Lambda: process data ▪ ElastiCache: store data
  • 21. Evolution of computing models: ▪ On-premise: physical servers ▪ Server as a service: virtual server in the cloud (Amazon EC2) ▪ App as a service: virtual app container (Amazon ECS) ▪ Function as a service: serverless computing (AWS Lambda)
  • 22. Lambda is Amazon’s serverless event-driven compute service: ▪ Write code in Python, Node.js, Java, and others and upload to Lambda ▪ Trigger code from other AWS services, HTTP endpoints or in-app activity ▪ Scale seamlessly and elastically with number of events, only using required compute resource ▪ Only pay for the compute time used (per 100ms execution time) ▪ Forget about infrastructure, administration and scaling – focus 100% on your app logic
  • 23. Exercise: Let’s create 2 simple Lambda functions. 1. Create Hello World 2. Create stream processor
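The two functions from this exercise are not reproduced in the transcript. A minimal sketch of what they might look like in Python follows; the handler names and the `event` field inside each payload are assumptions, while the `Records` / `kinesis` / `data` nesting and the base64 encoding follow the event shape Lambda uses for Kinesis batches.

```python
import base64
import json


def hello_handler(event, context):
    """Minimal 'Hello World' Lambda handler."""
    return "Hello World"


def stream_handler(event, context):
    """Decode a batch of Kinesis records delivered to Lambda and count event types."""
    counts = {}
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded under record["kinesis"]["data"].
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        name = payload.get("event", "unknown")
        counts[name] = counts.get(name, 0) + 1
    return counts


def fake_kinesis_event(payloads):
    """Build a local test event with the same nesting Lambda uses for Kinesis batches."""
    return {
        "Records": [
            {"kinesis": {"data": base64.b64encode(json.dumps(p).encode()).decode()}}
            for p in payloads
        ]
    }
```

The `fake_kinesis_event` helper makes it possible to try the stream processor locally before wiring it to a real stream.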
  • 24. Combining Lambda with API gateway empowers the data professional to create serverless APIs
  • 25. The serverless framework streamlines and automates deployment
  • 26. Exercise: Create APIs with serverless + API gateway + Lambda. 1. Create Hello World endpoint 2. Create mock API endpoint
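As an illustration of the Hello World endpoint, here is a minimal handler sketch assuming API Gateway's Lambda proxy integration, where the handler receives the HTTP request as a dict and returns a dict with `statusCode`, `headers` and a string `body`. The handler name and the `name` query parameter are hypothetical; the deployment configuration is not shown.

```python
import json


def hello_endpoint(event, context):
    """Lambda handler behind an API Gateway proxy integration."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "World")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello {name}"}),
    }
```

A mock API endpoint can follow the same pattern, returning a fixed JSON body until a real data source is connected.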
  • 27. We leverage 3 AWS building blocks for real-time data analytics: ▪ Kinesis: stream data ▪ Lambda: process data ▪ ElastiCache: store data
  • 28. ElastiCache is Amazon’s managed service for Redis: an INSANELY fast in-memory key-value database ▪ In-memory ▪ Low latency ▪ Ridiculously fast ▪ NoSQL → key-value store ▪ Open source
  • 29. Redis + Redshift. Redshift: ▪ Run few queries infrequently ▪ Process billions of records per query ▪ Standard SQL ▪ Batch. Redis: ▪ Run millions of commands continuously ▪ Process few records per command ▪ 200 Redis commands + Lua scripting ▪ Real-time
  • 30. Redis is a key-value store supporting 5 basic data types (Key => { Data Structures }): ▪ String: strings / blobs / bitmaps (e.g. "I'm a Plain Text String!") ▪ Hash: hash tables (objects!) (e.g. Key1 → Val1, Key2 → Val2) ▪ List: linked lists ▪ Set: sets ▪ Sorted set: sorted sets (e.g. A: 0.1, B: 0.3, C: 500, D: 500)
  • 31. Exercise: Let’s have a look at Redis in action. 1. Play with Redis commands 2. Test Redis speed
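The commands used in the exercise are not listed in the transcript. As a rough analogy only, the sketch below mirrors the five Redis data types with plain Python structures, with a comparable Redis command noted in each comment. The key names are hypothetical and no Redis server is involved.

```python
# Plain-Python stand-ins for the five Redis types (an analogy, not a Redis client).
store = {}

# String: SET greeting "hello" / GET greeting
store["greeting"] = "hello"

# Hash: HSET user:1 ad_views 3 / HGET user:1 ad_views
store["user:1"] = {"ad_views": 3, "replies": 0}

# List: RPUSH recent:1 ad42 ad7 / LRANGE recent:1 0 -1
store["recent:1"] = ["ad42", "ad7"]

# Set: SADD seen:1 ad42 ad7 ad42 (the duplicate member collapses)
store["seen:1"] = {"ad42", "ad7", "ad42"}

# Sorted set: ZADD scores:1 0.1 A 0.3 B 500 C / ZREVRANGE scores:1 0 1
store["scores:1"] = {"A": 0.1, "B": 0.3, "C": 500}
top_two = sorted(store["scores:1"], key=store["scores:1"].get, reverse=True)[:2]
```

The analogy stops at the data shapes: Redis keeps a sorted set ordered as members are added, whereas this sketch re-sorts on every read.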
  • 32. Recap: We covered the 3 AWS building blocks for real-time data: ▪ Kinesis: stream data ▪ Lambda: process data ▪ ElastiCache: store data
  • 33. Contents: ▪ Introduction ▪ Enabling technology ▪ Putting it all together ▪ Future direction ▪ Q&A
  • 34. Real-time vs offline data stacks (offline stack vs real-time stack): ▪ Raw data: files on S3 vs Kinesis streams ▪ Database: Redshift vs Redis ▪ Volume: high (processing millions / billions of records at the same time) vs low (processing single records at a time) ▪ Velocity: low (running few queries at a time) vs high (running thousands / millions of queries at the same time) ▪ Query language: SQL vs Python + Redis commands ▪ End-user: humans, BI tools vs Lambda, APIs, products
  • 35. Shedd end-to-end data stack architecture. BATCH DATA STACK: ▪ Tracking (Ninja / Hydra), Platform DB (Mongo), Adjust / Facebook / Google, … ▪ Raw data layer (data lake) ▪ Operational data layer (listings, replies, users, orders, etc.) ▪ Data warehouse ▪ BI, Segmentation, Performance marketing, CLM, Batch recommender, … REAL-TIME DATA STACK: ▪ Tracking (Ninja / Hydra), Platform DB (Mongo), … ▪ Raw data streams ▪ Real-time data processing (λ functions) ▪ Real-time database (Online customer view) ▪ API gateway ▪ Real-time recommender, Real-time segmentation, Other real-time applications
  • 36. Example: Shedd real-time recommendations. FRONTEND: ▪ Shedd app (Android / iOS SDK). TRACKING: ▪ Shedd app ▪ Ninja ▪ Hydra tracker (EC2) ▪ Platform DB (Mongo). BACKEND: ▪ Event stream (Kinesis) ▪ Event processor (Lambda) ▪ Online customer view (ElastiCache (Redis)) ▪ Recommendation service orchestrator (Lambda) ▪ API endpoint(s) (API gateway)
  • 37. Example: Shedd real-time recommendations. FRONTEND: ▪ Shedd app (Android / iOS SDK). TRACKING: ▪ Shedd app ▪ Ninja ▪ Hydra tracker (EC2) ▪ Platform DB (Mongo). BACKEND: ▪ Event stream (Kinesis) ▪ Event processor (Lambda) ▪ Online customer view (ElastiCache (Redis)) ▪ Recommendation service orchestrator (Lambda) ▪ API endpoint(s) (API gateway) ▪ Segmentation API (Lambda) ▪ Kingsman service
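To make the event processor and segmentation pieces concrete, here is a minimal sketch that keeps per-user counters in an in-memory dict (standing in for the Redis-backed online customer view) and applies the Browser / Viewer / Buyer rule from the first-time-user slides earlier in the deck. The event fields and function names are hypothetical, and the deck defines the segments on Day 1 activity, whereas this sketch applies the rule to running counters.

```python
from collections import defaultdict

# In-memory stand-in for the online customer view (Redis in the deck's stack).
customer_view = defaultdict(lambda: {"ad_views": 0, "replies": 0})


def process_event(event: dict) -> None:
    """Event-processor step: bump the per-user counter for each incoming event."""
    field = {"ad_view": "ad_views", "reply": "replies"}.get(event["event"])
    if field:
        customer_view[event["user_id"]][field] += 1


def segment(user_id) -> str:
    """Segmentation rule: any reply makes a Buyer, else any ad view makes a Viewer."""
    profile = customer_view[user_id]
    if profile["replies"] >= 1:
        return "Buyer"
    if profile["ad_views"] >= 1:
        return "Viewer"
    return "Browser"


for e in [
    {"user_id": "u1", "event": "ad_view"},
    {"user_id": "u2", "event": "ad_view"},
    {"user_id": "u2", "event": "reply"},
]:
    process_event(e)
```

Because the counters are updated as each event arrives, a user's segment can change within the same session, which is the behaviour the real-time stack is meant to enable.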
  • 38. Example: Shedd analytics APIs. FRONTEND: ▪ Shedd app (Android / iOS SDK). TRACKING: ▪ Shedd app ▪ Ninja ▪ Hydra tracker (EC2) ▪ Platform DB (Mongo). BACKEND: ▪ Data warehouse (Redshift) ▪ Redis bulk loader (Lambda) ▪ Online customer view (ElastiCache (Redis)) ▪ Analytics API handler (Lambda) ▪ API endpoint(s) (API gateway)
  • 39. Contents: ▪ Introduction ▪ Enabling technology ▪ Putting it all together ▪ Future direction ▪ Q&A
  • 40. Thank you Questions? Feedback? Dobo Radichkov Analytics summit, Jan 2018