© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Roy Ben-Alta, Amazon AI and Analytics, BDM
Daniel Modlinger, VP R&D, Nutrino
June 21st, 2017
Real-Time Streaming Analytics
Overview Of Amazon Kinesis
What to expect from this session
• Real-Time Streaming Analytics – Why Now?
• Amazon Kinesis Overview
• New Features
• Design Patterns
• Nutrino Case Study
Most data is produced continuously
Mobile Apps Web Clickstream Application Logs
Metering Records IoT Sensors Smart Buildings
[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
What is your decision latency?
OR
The diminishing value of data
Recent data is highly valuable
• If you act on it in time
• Perishable Insights (M. Gualtieri, Forrester)
Old + Recent data is more valuable
• If you have the means to combine them
Processing real-time, streaming data
• Durable
• Continuous
• Fast
• Correct
• Reactive
• Reliable
What are the key requirements?
Ingest Transform Analyze React Persist
Streaming Data Scenarios Across Verticals
Scenarios/Verticals | Accelerated Ingest-Transform-Load | Continuous Metrics Generation | Machine Learning and Actionable Insights
Digital Ad Tech/Marketing | Publisher, bidder data aggregation | Advertising metrics like coverage, yield, and conversion | User engagement with ads, optimized bid/buy engines
IoT | Sensor, device telemetry data ingestion | Operational metrics and dashboards | Device operational intelligence and alerts
Gaming | Online data aggregation, e.g., top 10 players | Massively multiplayer online game (MMOG) live dashboard | Leader board generation, player-skill match
Consumer Online | Clickstream analytics | Metrics like impressions and page views | Recommendation engines, proactive care
Operation monitoring, Security | DevOps tools, ingesting VPC Flow Logs | Subscribe to CloudWatch Logs and analyze logs in real time | Anomaly detection
Amazon Kinesis
Platform for streaming data on AWS
• Kinesis Streams (Capture): For Technical Developers. Build your own custom applications that process or analyze streaming data
• Kinesis Analytics (Process): For All Developers, Data Scientists, Business Analysts. Easily analyze data streams using standard SQL queries
• Kinesis Firehose (Deliver): For All Developers. Easily load massive volumes of streaming data into S3, Redshift and Elasticsearch, clean and format using Lambda
Amazon Kinesis Customer Base Diversity
• 1 billion events/wk from connected devices | IoT
• 17 PB of game data per season | Entertainment
• 80 billion ad impressions/day, 30 ms response time | Ad Tech
• 100 GB/day click streams from 250+ sites | Enterprise
• 50 billion ad impressions/day, sub-50 ms responses | Ad Tech
• 10 million events/day | Retail
• Ingesting 2M+ network events every second
• Funnel all production events through Amazon Kinesis
Why are these customers choosing Amazon Kinesis?
Lower costs
Performant without heavy lifting
Scales elastically
Increased agility
Secure and visible
Plug and play
Amazon Kinesis Streams
Build your own data streaming applications
Easy administration: Simply create a new stream, and set the desired level of capacity with
shards. Scale to match your data throughput rate and volume.
Build real-time applications: Perform continual processing on streaming big data using
Kinesis Client Library (KCL), Kinesis Analytics, Apache Spark/Storm, AWS Lambda, and more.
Low cost: Cost-efficient for workloads of any scale.
• Streams are made of shards
• Each shard ingests up to 1 MB/sec and up to 1,000 records/sec (TPS)
• Each shard emits up to 2 MB/sec
• All data is stored for 24 hours to 7 days
• Scale Kinesis streams by splitting or merging shards
• Replay data within the 24-hour to 7-day retention window
Amazon Kinesis Stream
Managed ability to capture and store data
Time-based seek
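The per-shard limits above can be turned into a rough sizing calculation. The following is a minimal sketch, not an official sizing tool: the function name, its parameters, and the example workload are hypothetical, and it assumes every consuming application reads the full stream.

```python
import math

# Per-shard limits quoted on the slide above.
SHARD_WRITE_MB_PER_SEC = 1.0
SHARD_WRITE_RECORDS_PER_SEC = 1000
SHARD_READ_MB_PER_SEC = 2.0


def estimate_shards(records_per_sec, avg_record_kb, consumers=1):
    """Return the smallest shard count that satisfies all three per-shard limits."""
    write_mb_per_sec = records_per_sec * avg_record_kb / 1024.0
    # Assumption: each consuming application reads every record once.
    read_mb_per_sec = write_mb_per_sec * consumers
    by_write_bytes = math.ceil(write_mb_per_sec / SHARD_WRITE_MB_PER_SEC)
    by_write_count = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    by_read_bytes = math.ceil(read_mb_per_sec / SHARD_READ_MB_PER_SEC)
    return max(1, by_write_bytes, by_write_count, by_read_bytes)


if __name__ == "__main__":
    # Hypothetical workload: 5,000 records/sec of 2 KB each, read by 3 applications.
    print(estimate_shards(5000, 2, consumers=3))
```

Whichever limit is hit first determines the shard count, so small records tend to be bound by the record-count limit and large records by the byte limits.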
Amazon Kinesis Streams pricing
Simple, pay-as-you-go, and no up-front costs
Dimension | Value
Shard Hour | $0.015
PUT Payload (per 1,000,000 PUTs) | $0.014
Extended Data Retention (up to 7 days), per Shard Hour | $0.020
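As a worked example of the pricing dimensions above, the sketch below estimates a monthly bill from the 2017 prices on this slide. The function and its parameters are hypothetical, a 720-hour month is assumed, and the 25 KB PUT payload unit size is an assumption that is not stated on the slide. Current prices may differ.

```python
import math

# Prices taken from the slide above (2017).
SHARD_HOUR_USD = 0.015
PUT_PER_MILLION_USD = 0.014
EXTENDED_RETENTION_SHARD_HOUR_USD = 0.020
PUT_UNIT_KB = 25  # assumption: PUTs are billed in 25 KB payload chunks


def monthly_streams_cost(shards, records_per_sec, avg_record_kb,
                         extended_retention=False, hours=720):
    """Estimate the monthly Kinesis Streams cost in USD for a steady workload."""
    shard_cost = shards * hours * SHARD_HOUR_USD
    units_per_record = math.ceil(avg_record_kb / PUT_UNIT_KB)
    put_units = records_per_sec * units_per_record * hours * 3600
    put_cost = put_units / 1_000_000 * PUT_PER_MILLION_USD
    retention_cost = 0.0
    if extended_retention:
        retention_cost = shards * hours * EXTENDED_RETENTION_SHARD_HOUR_USD
    return round(shard_cost + put_cost + retention_cost, 2)


if __name__ == "__main__":
    # Hypothetical workload: 2 shards, 100 records/sec, 3 KB per record.
    print(monthly_streams_cost(2, 100, 3))
```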
Amazon Kinesis Firehose
Load massive volumes of streaming data into Amazon S3, Redshift
and Elasticsearch
Zero administration: Capture and deliver streaming data into Amazon S3, Amazon Redshift, and other
destinations without writing an application or managing infrastructure.
Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into data
destinations in as little as 60 secs using simple configurations.
Elastic: Scales to match data throughput without intervention
Serverless ETL using AWS Lambda - Firehose can invoke your Lambda function to transform incoming
source data.
Capture and submit
streaming data
Analyze streaming data using
your favorite BI tools
Firehose loads streaming data
continuously into Amazon S3, Redshift
and Elasticsearch
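The serverless ETL step mentioned above can be sketched as a Lambda function. Firehose passes a batch of base64-encoded records and expects each one back with the same recordId and a result of Ok, Dropped, or ProcessingFailed. This is a minimal sketch: the JSON payload format and the "processed" field are hypothetical choices for illustration.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform a batch of Firehose records and report a result for each one."""
    output = []
    for record in event["records"]:
        record_id = record["recordId"]
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Return the original data and flag the record as failed.
            output.append({"recordId": record_id,
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        if not payload:
            # Empty objects carry no information, so drop them from delivery.
            output.append({"recordId": record_id,
                           "result": "Dropped",
                           "data": record["data"]})
            continue
        payload["processed"] = True  # hypothetical enrichment step
        line = (json.dumps(payload) + "\n").encode("utf-8")
        output.append({"recordId": record_id,
                       "result": "Ok",
                       "data": base64.b64encode(line).decode("utf-8")})
    return {"records": output}
```

Appending a newline to each transformed record keeps the objects delivered to S3 line-delimited, which is convenient for downstream tools.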
Amazon Kinesis Firehose pricing
Simple, pay-as-you-go, and no up-front costs
Dimension | Value
Per 1 GB of data ingested | $0.029 down to $0.020 (tiered pricing)
Amazon Kinesis Firehose or Amazon Kinesis Streams?
Kinesis Firehose vs. Kinesis Streams
Amazon Kinesis Streams is for use cases that require custom
processing, per incoming record, with sub-1 second processing
latency, and a choice of stream processing frameworks.
Amazon Kinesis Firehose is for use cases that require zero
administration, ability to use existing analytics tools based on
Amazon S3, Amazon Redshift and Amazon Elasticsearch, and a
data latency of 60 seconds or higher.
Amazon Kinesis Analytics
Apply SQL on streams: Easily connect to a Kinesis Stream or Firehose Delivery
Stream and apply SQL skills.
Build real-time applications: Perform continual processing on streaming big data
with sub-second processing latencies.
Easy scalability: Elastically scales to match data throughput.
Connect to Kinesis streams,
Firehose delivery streams
Run standard SQL queries
against data streams
Kinesis Analytics can send processed data
to analytics tools so you can create alerts
and respond in real-time
Use SQL to build real-time applications
Easily write SQL code to process
streaming data
Connect to streaming source
Continuously deliver SQL results
Coming Soon
Amazon Kinesis Streams
Forthcoming Features
• Encryption at Rest
• VPCe
Amazon Kinesis Firehose
Forthcoming Features
• VPCe: Redshift, Elasticsearch
• Parquet Format
• AWS Regions
• Kafka Connector
Amazon Kinesis Analytics
Forthcoming Features
• AWS Lambda Integration
• Amazon QuickSight Destination
• Improved Autoscaling
Design Patterns - Serverless
Capture all of the value from your data with Amazon Kinesis
[Diagram: Ingest, Process, React, Persist stages on a latency scale of 0 ms, 200 ms, 1-2 s. Services shown: Amazon Kinesis-enabled app, Amazon Kinesis Streams, Amazon Kinesis Analytics, AWS Lambda, Amazon Kinesis Firehose, Amazon S3, Amazon Redshift, Amazon QuickSight]
Serverless Analytics – Enable your org with 0 operation overhead
[Diagram: Ingest, Streaming ETL, Persist, Analyze stages on a latency scale of 0 ms, seconds, < 5 minutes. Services shown: Amazon Kinesis Streams, Amazon Kinesis Firehose, AWS Lambda, Amazon Kinesis Analytics, Amazon S3, Amazon Redshift, Amazon Elasticsearch, Amazon Athena, Amazon Redshift Spectrum]
Amazon Kinesis Data Generator on GitHub
https://awslabs.github.io/amazon-kinesis-data-generator/
Putting Data in Amazon Kinesis
AWS SDK
• PutRecord()
• PutRecordBatch()
Kinesis Agent
• Monitors files and sends new data records to your delivery stream
• Handles file rotation, checkpointing, and retry upon failures
• Pre-processing capabilities such as format conversion and log parsing
• Emits Amazon CloudWatch metrics for monitoring and troubleshooting
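The PutRecordBatch() call listed above can be wrapped in a small helper that chunks records and retries only the entries that failed. This is a sketch under stated assumptions: put_all and chunk are hypothetical helper names, client is expected to be a boto3 Firehose client (for example boto3.client("firehose")), and the 500-record batch size reflects the per-call limit as I understand it. A production version would also add backoff between attempts and respect the per-call size limit.

```python
def chunk(items, size=500):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def put_all(client, stream_name, payloads, max_attempts=3):
    """Send byte payloads with PutRecordBatch; return payloads never accepted."""
    undelivered = []
    for batch in chunk([{"Data": payload} for payload in payloads]):
        pending = batch
        for _ in range(max_attempts):
            response = client.put_record_batch(
                DeliveryStreamName=stream_name, Records=pending)
            if response["FailedPutCount"] == 0:
                pending = []
                break
            # Responses are positional: keep only entries that carry an error code.
            pending = [record for record, result
                       in zip(pending, response["RequestResponses"])
                       if "ErrorCode" in result]
        undelivered.extend(record["Data"] for record in pending)
    return undelivered
```

A partially failed batch still returns a successful HTTP response, so checking FailedPutCount is what prevents silent data loss.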
Common Integration Pattern with Amazon EMR
Tumbling Window Reporting
[Diagram: Amazon Kinesis Streams (streaming input) -> Amazon EMR (tumbling/fixed window aggregation, periodic output) -> Amazon Redshift (COPY from Amazon EMR)]
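A tumbling (fixed) window assigns each event to exactly one non-overlapping time bucket. The sketch below illustrates the aggregation in plain Python rather than on Amazon EMR. The function name and the (timestamp, key) event shape are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Count events per key in fixed, non-overlapping windows.

    events: iterable of (epoch_seconds, key) pairs.
    Returns {window_start_seconds: {key: count}} ordered by window start.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


if __name__ == "__main__":
    sample = [(0, "click"), (59, "click"), (60, "view"), (125, "click")]
    print(tumbling_window_counts(sample))
```

In the pattern above, each completed window would be written out periodically and loaded into Amazon Redshift with COPY.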
Amazon Kinesis Streams with AWS Lambda
Build your Streaming Pipeline with Amazon Kinesis, AWS Lambda and Amazon EMR
Nutrino Case Study
Nutrino proprietary and confidential All rights reserved info@nutrino.co www.nutrino.co
Daniel Modlinger
VP R&D
AWS Summit Tel Aviv
Nutrition advice - conflicted and not personal
$435B
Food Analysis System
• Scientific Literature (10,000+ papers)
• Food (1,000,000+ foods)
• Menus (300,000+ restaurant locations)
FoodPrint™
• Correlations
• Insights
• Predictions
Personal & continually improving nutrition recommendations
Individual Inputs: Activity; Sleep; Continuous Glucose; Blood Tests; DNA - Genome; Taste, allergies etc.; Mood; Blood pressure; Temperature; Biome; Social media
Examples - Partners
Pregnancy
Diabetes
Consumer
Employer/payer
Retail
confidential
Hypertension
confidential
Data Streaming
FoodPrint™
Individual Inputs: Activity; Sleep; Continuous Glucose; Blood Tests; DNA - Genome; Taste, allergies etc.; Mood; Blood pressure; Temperature; Biome; Social media
Needs
• Volume - Millions of daily data points
• Scale - Non-linear growth
• Performance - Near real time, as low as 3rd-party producers of data allow
• Extract as much data as possible and store unknown data types and formats
• Process and store a wide variety of known data types
Goal: Process and store nutrition-related sensor data
First attempts
2015 2016 2017
Updated Needs (AKA lessons learned)
• Cut out the middleman
• Detailed monitoring is a necessity
• Be event driven
• Fast developer response to changes in APIs
• Reliability is key, otherwise data gets lost
Goal: Process and store nutrition-related sensor data
Current Solution - High Level Architecture
Why Kinesis & Lambda (1)
Process and store sensor data and all other nutrition-related data
• Volume - Millions of daily data points: S3 is limitless, Kinesis streams are practically limitless
• Scale - Non-linear growth: Scaling can be as easy as adding a shard; Lambda functions are executed as often as we need
• Performance - Near real time: With Lambda triggers on streams we can act quickly enough
• Store unknown data types and formats: All unknown data is saved to S3 using Kinesis Firehose and is later analyzed by our Spark cluster
• Wide variety of known data types: Lambda allows us to write simple and clean code that deals with each entity without needing to be aware of flow or context
Why Kinesis & Lambda (2)
Process and store sensor data and all other nutrition-related data
• Reliability is key, otherwise data gets lost: Kinesis streams are persistent (from 24 hours up to one week), Lambda functions automatically retry, and CloudWatch has detailed monitoring of everything
• Be event driven wherever possible because polling causes problems: Using the API Gateway and Lambda integration allows 3rd-party producers to activate the processing
• Allow fast developer response to changes in APIs: Changing one Lambda function does not affect the flow; the deployment granularity can be as small as a single function
Why Kinesis & Lambda (3)
Process and store sensor data and all other nutrition-related data
• Detailed monitoring is a necessity and not a luxury: Iterator age, put count, and Lambda monitoring are a good baseline to start from
• Cut out the middleman - avoid writing infrastructure to allow clients (iOS) to write sensor data: We are using the AWS iOS SDK to write the sensor data
Scale & Performance (1)
Best Practice
• The optimal number of shards is derived from the desired iterator age (i.e., latency)
• Optimal batch size - if your Lambda scales linearly, it doesn’t really matter
• A high number results in fewer calls and can create an internal queue for the consumer Lambda function (which increases the latency to the user)
• A low number can be expensive in terms of Lambda overhead and price
Scale & Performance (2)
Best Practice
• Lambda functions run sequentially on a shard’s data, but in parallel across different shards
• The Lambda consumer will not continue to the next batch until the function finishes successfully. Make sure to handle exceptions in your Lambda functions, or your shard can get stuck!
• If your functions are slow, consider using a “forward” Lambda: a function that only reads the records from Kinesis and then invokes another Lambda function (which can run in parallel)
• Understand the KCL; it will help you adjust the different parameters to your needs
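The advice about handling exceptions can be sketched as a Kinesis-triggered Lambda handler that isolates failures per record, so one bad record does not fail the invocation and block the shard. This is a minimal illustration, not Nutrino's implementation: process() and the user_id field are hypothetical, and a real consumer would persist failed records somewhere (for example S3) instead of only logging them.

```python
import base64
import json
import logging

logger = logging.getLogger(__name__)


def process(payload):
    """Hypothetical business logic; raises on malformed input."""
    if "user_id" not in payload:
        raise KeyError("user_id")
    return payload["user_id"]


def lambda_handler(event, context):
    """Process a batch of Kinesis records without letting one failure block it."""
    processed, failed = 0, 0
    for record in event["Records"]:
        try:
            # Kinesis record data arrives base64-encoded inside the Lambda event.
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            process(payload)
            processed += 1
        except Exception:
            # Log and continue so the whole batch is not retried indefinitely.
            logger.exception("Skipping record %s",
                             record["kinesis"].get("sequenceNumber"))
            failed += 1
    return {"processed": processed, "failed": failed}
```

Catching broadly is a deliberate trade-off here: it keeps the shard moving at the cost of needing a separate path for inspecting and replaying the skipped records.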
Monitoring
Best Practice
• Make sure to monitor:
• Iterator age
• Put records count
• Records distribution across shards
• Each Lambda on its own
• Your own required metrics
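Iterator age can be checked programmatically through CloudWatch. The sketch below reads the stream-level GetRecords.IteratorAgeMilliseconds metric. The helper names and the 60-second threshold are hypothetical, and cloudwatch is assumed to be a boto3 CloudWatch client (for example boto3.client("cloudwatch")).

```python
from datetime import datetime, timedelta, timezone


def max_iterator_age_ms(cloudwatch, stream_name, minutes=15):
    """Return the highest iterator age (ms) seen recently, or None if no data."""
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Kinesis",
        MetricName="GetRecords.IteratorAgeMilliseconds",
        Dimensions=[{"Name": "StreamName", "Value": stream_name}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,
        Statistics=["Maximum"],
    )
    datapoints = response.get("Datapoints", [])
    return max((point["Maximum"] for point in datapoints), default=None)


def is_falling_behind(cloudwatch, stream_name, threshold_ms=60_000):
    """True when consumers lag the stream by more than threshold_ms."""
    age = max_iterator_age_ms(cloudwatch, stream_name)
    return age is not None and age > threshold_ms
```

A steadily growing iterator age usually means consumers cannot keep up, which is the signal for adding shards or speeding up the function.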
Duplication handling & order considerations
Best Practice
• Duplicate records are almost inevitable
• It’s easier to just handle duplicates than to avoid the problem
• Watch out - updates are not duplicates!
• It is possible to preserve record order, but doing so is likely to cause a larger overhead. Try to avoid relying on record order as much as possible.
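One way to handle duplicates while still letting updates through is to key each record on both its identity and its version. The sketch below is an illustration only: the id and updated_at field names are hypothetical, and the in-memory set stands in for whatever durable store a real consumer would use.

```python
def deduplicate(records, seen=None):
    """Yield records whose (id, updated_at) pair has not been seen before.

    A redelivered record repeats both fields and is skipped. An update keeps
    the id but changes updated_at, so it passes through.
    """
    seen = set() if seen is None else seen
    for record in records:
        key = (record["id"], record["updated_at"])
        if key in seen:
            continue
        seen.add(key)
        yield record


if __name__ == "__main__":
    batch = [
        {"id": 1, "updated_at": 10, "glucose": 95},
        {"id": 1, "updated_at": 10, "glucose": 95},   # redelivery
        {"id": 1, "updated_at": 20, "glucose": 101},  # update, not a duplicate
    ]
    for item in deduplicate(batch):
        print(item)
```

Passing the same seen set across batches extends the check beyond a single invocation.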
Frameworks
Best Practice
• It’s recommended that the first setup be done manually (using the CLI)
• We found Serverless (by Serverless, Inc.) to be the most suitable for Kinesis with Lambda usage (for Web APIs that might not be the case)
Thank you!
Analyzing Real-time Streaming Data with Amazon Kinesis

More Related Content

What's hot

What's hot (20)

(SEC321) Implementing Policy, Governance & Security for Enterprises
(SEC321) Implementing Policy, Governance & Security for Enterprises(SEC321) Implementing Policy, Governance & Security for Enterprises
(SEC321) Implementing Policy, Governance & Security for Enterprises
 
Databases
DatabasesDatabases
Databases
 
Modern data architectures for real time analytics and engagement
Modern data architectures for real time analytics and engagementModern data architectures for real time analytics and engagement
Modern data architectures for real time analytics and engagement
 
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data WarehouseSoluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
 
Introduction to AWS X-Ray
Introduction to AWS X-RayIntroduction to AWS X-Ray
Introduction to AWS X-Ray
 
I servizi AWS per le applicazioni mobili: sviluppo, test e produzione
I servizi AWS per le applicazioni mobili: sviluppo, test e produzioneI servizi AWS per le applicazioni mobili: sviluppo, test e produzione
I servizi AWS per le applicazioni mobili: sviluppo, test e produzione
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
 
Lessons & Use-Cases at Scale - Dr. Pete Stanski
Lessons & Use-Cases at Scale - Dr. Pete StanskiLessons & Use-Cases at Scale - Dr. Pete Stanski
Lessons & Use-Cases at Scale - Dr. Pete Stanski
 
Architecting Cloud Apps
Architecting Cloud AppsArchitecting Cloud Apps
Architecting Cloud Apps
 
AWS re:Invent 2016 Day 2 Keynote re:Cap
AWS re:Invent 2016 Day 2 Keynote re:CapAWS re:Invent 2016 Day 2 Keynote re:Cap
AWS re:Invent 2016 Day 2 Keynote re:Cap
 
AWS Enterprise Summit Netherlands - WorkSpaces & WorkMail
AWS Enterprise Summit Netherlands - WorkSpaces & WorkMailAWS Enterprise Summit Netherlands - WorkSpaces & WorkMail
AWS Enterprise Summit Netherlands - WorkSpaces & WorkMail
 
Escalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuariosEscalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuarios
 
Keynote: Future of IT - future of enterprise it Canada
Keynote: Future of IT - future of enterprise it CanadaKeynote: Future of IT - future of enterprise it Canada
Keynote: Future of IT - future of enterprise it Canada
 
Enterprise summit – architecting microservices on aws final v2
Enterprise summit – architecting microservices on aws   final v2Enterprise summit – architecting microservices on aws   final v2
Enterprise summit – architecting microservices on aws final v2
 
AWS re:Invent 2016: IoT Blueprints: Optimizing Supply for Smart Agriculture f...
AWS re:Invent 2016: IoT Blueprints: Optimizing Supply for Smart Agriculture f...AWS re:Invent 2016: IoT Blueprints: Optimizing Supply for Smart Agriculture f...
AWS re:Invent 2016: IoT Blueprints: Optimizing Supply for Smart Agriculture f...
 
AWS reinvent 2019 recap - Riyadh - Containers and Serverless - Paul Maddox
AWS reinvent 2019 recap - Riyadh - Containers and Serverless - Paul MaddoxAWS reinvent 2019 recap - Riyadh - Containers and Serverless - Paul Maddox
AWS reinvent 2019 recap - Riyadh - Containers and Serverless - Paul Maddox
 
What's New & What's Next from AWS?
What's New & What's Next from AWS?What's New & What's Next from AWS?
What's New & What's Next from AWS?
 
Optimize Content Processing in the Cloud with GPU and Spot Instances
Optimize Content Processing in the Cloud with GPU and Spot InstancesOptimize Content Processing in the Cloud with GPU and Spot Instances
Optimize Content Processing in the Cloud with GPU and Spot Instances
 
Getting Started with AWS Lambda and the Serverless Cloud
Getting Started with AWS Lambda and the Serverless CloudGetting Started with AWS Lambda and the Serverless Cloud
Getting Started with AWS Lambda and the Serverless Cloud
 
Serverless Geospatial Mobile Apps with AWS
Serverless Geospatial Mobile Apps with AWSServerless Geospatial Mobile Apps with AWS
Serverless Geospatial Mobile Apps with AWS
 

Similar to Analyzing Real-time Streaming Data with Amazon Kinesis

Big Data and Analytics Innovation Summit
Big Data and Analytics Innovation SummitBig Data and Analytics Innovation Summit
Big Data and Analytics Innovation Summit
Martin Yan
 

Similar to Analyzing Real-time Streaming Data with Amazon Kinesis (20)

Path to the future #4 - Ingestão, processamento e análise de dados em tempo real
Path to the future #4 - Ingestão, processamento e análise de dados em tempo realPath to the future #4 - Ingestão, processamento e análise de dados em tempo real
Path to the future #4 - Ingestão, processamento e análise de dados em tempo real
 
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
 
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...
 
Real-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon KinesisReal-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon Kinesis
 
Getting Started with Amazon Kinesis
Getting Started with Amazon KinesisGetting Started with Amazon Kinesis
Getting Started with Amazon Kinesis
 
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...
 
Big Data and Analytics Innovation Summit
Big Data and Analytics Innovation SummitBig Data and Analytics Innovation Summit
Big Data and Analytics Innovation Summit
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
 
Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data ...Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data ...
 
Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics w...
Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics w...Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics w...
Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics w...
 
Getting started with Amazon Kinesis
Getting started with Amazon KinesisGetting started with Amazon Kinesis
Getting started with Amazon Kinesis
 
Getting started with amazon kinesis
Getting started with amazon kinesisGetting started with amazon kinesis
Getting started with amazon kinesis
 
Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...
Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...
Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...
 
Analysing Data in Real-time
Analysing Data in Real-timeAnalysing Data in Real-time
Analysing Data in Real-time
 
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017
 
MindSphere: The cloud-based, open IoT operating system. Damiano Manocchia
MindSphere: The cloud-based, open IoT operating system. Damiano ManocchiaMindSphere: The cloud-based, open IoT operating system. Damiano Manocchia
MindSphere: The cloud-based, open IoT operating system. Damiano Manocchia
 
From Batch to Streaming - How Amazon Flex Uses Real-time Analytics
From Batch to Streaming - How Amazon Flex Uses Real-time AnalyticsFrom Batch to Streaming - How Amazon Flex Uses Real-time Analytics
From Batch to Streaming - How Amazon Flex Uses Real-time Analytics
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
 
Real time analytics for streaming application v1.2
Real time analytics for streaming application v1.2Real time analytics for streaming application v1.2
Real time analytics for streaming application v1.2
 
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
 

More from Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Analyzing Real-time Streaming Data with Amazon Kinesis

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Roy Ben-Alta, Amazon AI and Analytics, BDM Daniel Modlinger, VP R&D, Nutrino June 21st, 2017 Real-Time Streaming Analytics Overview Of Amazon Kinesis
  • 2. What to expect from this session • Real-Time Streaming Analytics – Why Now? • Amazon Kinesis Overview • New Features • Design Patterns • Nutrino Case Study
  • 3. Most data is produced continuously Mobile Apps Web Clickstream Application Logs Metering Records IoT Sensors Smart Buildings [Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
  • 4. What is your decision latency? OR
  • 5. The diminishing value of data Recent data is highly valuable • If you act on it in time • Perishable Insights (M. Gualtieri, Forrester) Old + Recent data is more valuable • If you have the means to combine them
  • 6. Processing real-time, streaming data • Durable • Continuous • Fast • Correct • Reactive • Reliable What are the key requirements? Ingest Transform Analyze React Persist
  • 7. Streaming Data Scenarios Across Verticals
      Scenarios/Verticals | Accelerated Ingest-Transform-Load | Continuous Metrics Generation | Machine Learning and Actionable Insights
      Digital Ad Tech/Marketing | Publisher, bidder data aggregation | Advertising metrics like coverage, yield, and conversion | User engagement with ads, optimized bid/buy engines
      IoT | Sensor, device telemetry data ingestion | Operational metrics and dashboards | Device operational intelligence and alerts
      Gaming | Online data aggregation, e.g., top 10 players | Massively multiplayer online game (MMOG) live dashboard | Leader board generation, player-skill match
      Consumer Online | Clickstream analytics | Metrics like impressions and page views | Recommendation engines, proactive care
      Operation monitoring, Security | DevOps tools, ingesting VPC Flow Logs | Subscribe to CloudWatch Logs and analyze logs in real time | Anomaly detection
  • 8. Amazon Kinesis: Platform for streaming data on AWS. Kinesis Streams, Kinesis Analytics, Kinesis Firehose
  • 9. Amazon Kinesis: Kinesis Streams (Capture), Kinesis Analytics (Process), Kinesis Firehose (Deliver)
  • 10. Amazon Kinesis: Kinesis Streams, Kinesis Analytics (Process), Kinesis Firehose (Deliver). Kinesis Streams: For Technical Developers. Build your own custom applications that process or analyze streaming data
  • 11. Amazon Kinesis: Kinesis Streams, Kinesis Analytics, Kinesis Firehose (Deliver). Kinesis Streams: For Technical Developers. Build your own custom applications that process or analyze streaming data. Kinesis Analytics: For All Developers, Data Scientists, Business Analysts. Easily analyze data streams using standard SQL queries
  • 12. Amazon Kinesis: Kinesis Streams, Kinesis Analytics, Kinesis Firehose. Kinesis Streams: For Technical Developers. Build your own custom applications that process or analyze streaming data. Kinesis Analytics: For All Developers, Data Scientists, Business Analysts. Easily analyze data streams using standard SQL queries. Kinesis Firehose: For All Developers. Easily load massive volumes of streaming data into S3, Redshift and Elasticsearch, clean and format using Lambda
  • 13. Amazon Kinesis Customer Base Diversity: 1 billion events/wk from connected devices | IoT; 17 PB of game data per season | Entertainment; 80 billion ad impressions/day, 30 ms response time | Ad Tech; 100 GB/day click streams from 250+ sites | Enterprise; 50 billion ad impressions/day, sub-50 ms responses | Ad Tech; 10 million events/day | Retail; Ingesting 2M+ network events every second; Funnel all production events through Amazon Kinesis
  • 14. Why are these customers choosing Amazon Kinesis? Lower costs; Performant without heavy lifting; Scales elastically; Increased agility; Secure and visible; Plug and play
  • 15. Amazon Kinesis Streams Build your own data streaming applications Easy administration: Simply create a new stream, and set the desired level of capacity with shards. Scale to match your data throughput rate and volume. Build real-time applications: Perform continual processing on streaming big data using Kinesis Client Library (KCL), Kinesis Analytics, Apache Spark/Storm, AWS Lambda, and more. Low cost: Cost-efficient for workloads of any scale.
  • 16. Amazon Kinesis Stream: managed ability to capture and store data • Streams are made of Shards • Each Shard ingests data up to 1 MB/sec, and up to 1,000 TPS • Each Shard emits up to 2 MB/sec • All data is stored for 24 hours to 7 days • Scale Kinesis streams by splitting or merging Shards • Replay data inside the 24-hour to 7-day window (time-based seek)
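
The per-shard limits on slide 16 suggest a simple sizing rule: a stream needs enough shards to satisfy whichever of the three limits is tightest. The following is a minimal sketch of that arithmetic, not an official sizing tool. The function name `required_shards` and the decision to treat 1 MB as 1,000,000 bytes are assumptions made for illustration.

```python
import math

# Per-shard limits as quoted on the slide (1 MB treated as 1,000,000 bytes here,
# which is an assumption of this sketch).
SHARD_WRITE_BYTES_PER_SEC = 1_000_000
SHARD_WRITE_RECORDS_PER_SEC = 1_000
SHARD_READ_BYTES_PER_SEC = 2_000_000


def required_shards(write_bytes_per_sec, records_per_sec, read_bytes_per_sec):
    """Return the smallest shard count that satisfies all three per-shard limits."""
    by_write_bytes = math.ceil(write_bytes_per_sec / SHARD_WRITE_BYTES_PER_SEC)
    by_write_records = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    by_read_bytes = math.ceil(read_bytes_per_sec / SHARD_READ_BYTES_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


if __name__ == "__main__":
    # Many small records: the records/sec limit dominates, not the byte rate.
    print(required_shards(500_000, 4_500, 500_000))
```

Note that a workload of many small records can be bound by the 1,000 TPS limit long before it reaches 1 MB/sec, which is why all three limits are checked.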
  • 17. Amazon Kinesis Streams pricing: simple, pay-as-you-go, and no up-front costs
      Dimension | Value
      Shard Hour | $0.015
      PUT Payload (per 1,000,000 PUTs) | $0.014
      Extended Data Retention (up to 7 days), per Shard Hour | $0.020
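
To make the pricing table concrete, here is a rough estimate built only from the three 2017 prices on the slide. It is a sketch under stated assumptions: every PUT is counted as one billable unit (actual billing may count payload units rather than calls, so check the current pricing page), a month is taken as 730 hours, and the function name `monthly_cost_usd` is hypothetical.

```python
# Prices copied from the slide (June 2017); they are not current prices.
SHARD_HOUR_USD = 0.015
PUT_PER_MILLION_USD = 0.014
EXTENDED_RETENTION_SHARD_HOUR_USD = 0.020


def monthly_cost_usd(shards, puts_per_sec, extended_retention=False, hours=730):
    """Estimate stream cost for `hours` of operation (730 is roughly one month)."""
    shard_cost = shards * hours * SHARD_HOUR_USD
    total_puts = puts_per_sec * 3600 * hours
    # Assumption: each PUT counts as exactly one billable unit.
    put_cost = total_puts / 1_000_000 * PUT_PER_MILLION_USD
    retention_cost = (
        shards * hours * EXTENDED_RETENTION_SHARD_HOUR_USD if extended_retention else 0.0
    )
    return round(shard_cost + put_cost + retention_cost, 2)


if __name__ == "__main__":
    print(monthly_cost_usd(shards=2, puts_per_sec=100))
    print(monthly_cost_usd(shards=2, puts_per_sec=100, extended_retention=True))
```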
  • 19. Amazon Kinesis Firehose Load massive volumes of streaming data into Amazon S3, Redshift and Elasticsearch Zero administration: Capture and deliver streaming data into Amazon S3, Amazon Redshift, and other destinations without writing an application or managing infrastructure. Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 secs using simple configurations. Elastic: Scales to match data throughput w/o intervention Serverless ETL using AWS Lambda - Firehose can invoke your Lambda function to transform incoming source data. Capture and submit streaming data Analyze streaming data using your favorite BI tools Firehose loads streaming data continuously into Amazon S3, Redshift and Elasticsearch
  • 20. Amazon Kinesis Firehose pricing Simple, pay-as-you-go, and no up-front costs Dimension Value Per 1 GB of data ingested $0.029 - *$0.020 *Tier pricing
  • 21. Amazon Kinesis Firehose or Amazon Kinesis Streams?
  • 22. Kinesis Firehose vs. Kinesis Streams Amazon Kinesis Streams is for use cases that require custom processing, per incoming record, with sub-1 second processing latency, and a choice of stream processing frameworks. Amazon Kinesis Firehose is for use cases that require zero administration, ability to use existing analytics tools based on Amazon S3, Amazon Redshift and Amazon Elasticsearch, and a data latency of 60 seconds or higher.
  • 24. Amazon Kinesis Analytics Apply SQL on streams: Easily connect to a Kinesis Stream or Firehose Delivery Stream and apply SQL skills. Build real-time applications: Perform continual processing on streaming big data with sub-second processing latencies. Easy scalability: Elastically scales to match data throughput. Connect to Kinesis streams and Firehose delivery streams. Run standard SQL queries against data streams. Kinesis Analytics can send processed data to analytics tools so you can create alerts and respond in real time
  • 25. Use SQL to build real-time applications Easily write SQL code to process streaming data Connect to streaming source Continuously deliver SQL results
  • 27. Amazon Kinesis Streams Forthcoming Features Encryption at Rest VPCe
  • 28. Amazon Kinesis Firehose Forthcoming Features VPCe: Redshift, Elasticsearch; Parquet Format; AWS Regions; Kafka Connector
  • 29. Amazon Kinesis Analytics Forthcoming Features: AWS Lambda Integration; Amazon QuickSight Destination; Improved Autoscaling
  • 31. Capture all of the value from your data with Amazon Kinesis Amazon S3 Amazon Kinesis Analytics Amazon Kinesis-enabled app Amazon Kinesis Streams Ingest Process React Persist AWS Lambda 0 ms 200 ms 1-2 s Amazon QuickSight Amazon Kinesis Firehose Amazon Redshift
  • 32. Serverless Analytics: enable your org with zero operational overhead Amazon S3 Ingest Streaming ETL Persist Analyze AWS Lambda 0 ms seconds < 5 minutes Amazon Kinesis Firehose Amazon Redshift Amazon Elasticsearch Amazon Athena Amazon Kinesis Analytics Amazon Redshift Spectrum Amazon Kinesis Streams
  • 33. Amazon Kinesis Data Generator on GitHub https://awslabs.github.io/amazon-kinesis-data-generator/
  • 34. Putting Data in Amazon Kinesis. AWS SDK: • PutRecord() • PutRecordBatch(). Kinesis Agent: • Monitors files and sends new data records to your delivery stream • Handles file rotation, checkpointing, and retry upon failures • Pre-processing capabilities such as format conversion and log parsing • Emits Amazon CloudWatch metrics for monitoring and troubleshooting
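
As an illustration of the SDK path on slide 34, the sketch below sends payloads to a Firehose delivery stream with `PutRecordBatch` and retries only the records the service reports as failed. It assumes a boto3-style client is passed in (for example `boto3.client("firehose")`) and relies on the documented limit of 500 records per call. The helper names `chunked` and `send_all` and the stream name in the usage comment are hypothetical, and the per-call size limit is not enforced here.

```python
MAX_RECORDS_PER_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def chunked(items, size=MAX_RECORDS_PER_BATCH):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_all(client, stream_name, payloads, max_attempts=3):
    """Send byte payloads in batches; return the payloads that still failed.

    `client` is expected to expose put_record_batch() like a boto3 Firehose client.
    """
    pending = [{"Data": payload} for payload in payloads]
    for _ in range(max_attempts):
        failed = []
        for batch in chunked(pending):
            response = client.put_record_batch(
                DeliveryStreamName=stream_name, Records=batch
            )
            if response.get("FailedPutCount", 0):
                # RequestResponses is index-aligned with the submitted records;
                # failed entries carry an ErrorCode.
                for record, result in zip(batch, response["RequestResponses"]):
                    if "ErrorCode" in result:
                        failed.append(record)
        if not failed:
            return []
        pending = failed
    return [record["Data"] for record in pending]


# Usage (requires boto3 and AWS credentials; the stream name is a placeholder):
#   import boto3
#   leftover = send_all(boto3.client("firehose"), "my-delivery-stream", [b"line 1\n"])
```

A partial failure does not raise an exception, so checking `FailedPutCount` is what prevents silent data loss. A production version would also add backoff between attempts.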
  • 35. Common Integration Pattern with Amazon EMR: Tumbling Window Reporting. Streaming input from Amazon Kinesis Streams; tumbling/fixed window aggregation on Amazon EMR; periodic output to Amazon Redshift (COPY from Amazon EMR)
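
The tumbling (fixed) window aggregation named on slide 35 can be sketched in a few lines: every event falls into exactly one non-overlapping window determined by its timestamp. This is a plain in-memory illustration, not the EMR job from the talk. The function name and the `(epoch_seconds, key)` event shape are assumptions.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Count events per key in fixed, non-overlapping windows.

    `events` is an iterable of (epoch_seconds, key) pairs.
    Returns {window_start_seconds: {key: count}} ordered by window start.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        # Each timestamp maps to exactly one window: [start, start + window_seconds).
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


if __name__ == "__main__":
    clicks = [(0, "home"), (59, "home"), (60, "cart"), (61, "home"), (125, "cart")]
    for start, counts in tumbling_window_counts(clicks).items():
        print(start, counts)
```

In the pattern on the slide, each completed window would be written out periodically (for example, loaded into Amazon Redshift) rather than kept in memory.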
  • 36. Amazon Kinesis Streams with AWS Lambda
  • 37. Build your Streaming Pipeline with Amazon Kinesis, AWS Lambda and Amazon EMR
  • 39. Nutrino proprietary and confidential All rights reserved info@nutrino.co www.nutrino.co Daniel Modlinger VP R&D AWS Summit Tel Aviv
  • 40. Nutrition advice - conflicted and not personal $435B
  • 41. Food Analysis System: Scientific Literature (10,000+ papers), Food (1,000,000+ foods), Menus (300,000+ restaurant locations). Individual Inputs: Activity, Sleep, Continuous Glucose, Blood Tests, DNA - Genome, Taste, allergies etc., Mood, Blood pressure, Temperature, Biome, Social media. FoodPrint™: Correlations, Insights, Predictions. Personal & continually improving nutrition recommendations
  • 42. Examples - Partners Pregnancy Diabetes Consumer Employer/payer Retail confidential Hypertension confidential
  • 43. Data Streaming
  • 44. FoodPrint™ Individual Inputs: Activity, Sleep, Continuous Glucose, Blood Tests, DNA - Genome, Taste, allergies etc., Mood, Blood pressure, Temperature, Biome, Social media
  • 45. Needs. Goal: Process and store nutrition-related sensor data. Volume: millions of daily data points. Scale: non-linear growth. Performance: near real time, as low as 3rd-party producers of data allow. Extract as much data as possible and store unknown data types and formats. Process and store a wide variety of known data types
  • 46. First attempts 2015 2016 2017
  • 47. Updated Needs (AKA lessons learned). Goal: Process and store nutrition-related sensor data. Cut out the middleman. Detailed monitoring is a necessity. Be event driven. Fast developer response to changes in APIs. Reliability is key, otherwise data gets lost
  • 48. Current Solution - High Level Architecture
  • 49. Why Kinesis & Lambda (1). Goal: Process and store sensor data and all other nutrition-related data. Scale (non-linear growth): scaling can be as easy as adding a shard, and Lambda functions are executed as much as we need. Performance (near real time): with Lambda triggers on streams we can act quickly enough. Store unknown data types and formats: all unknown data is saved to S3 using Kinesis Firehose and is later analyzed by our Spark cluster. Wide variety of known data types: Lambda allows us to write simple and clean code that deals with each entity without needing to be aware of flow or context
  • 50. Why Kinesis & Lambda (2). Goal: Process and store sensor data and all other nutrition-related data. Volume (millions of daily data points): S3 is limitless, and Kinesis streams are practically limitless. Reliability is key, otherwise data gets lost: Kinesis streams are persistent (from 24 hours up to one week), Lambda functions retry automatically, and CloudWatch provides detailed monitoring of everything. Allow fast developer response to changes in APIs: changing one Lambda function does not affect the flow, and the deploy resolution can be as low as a single function. Be event driven wherever possible, because polling causes problems: using the API Gateway and Lambda integration allows 3rd-party producers to activate the processing
  • 51. Why Kinesis & Lambda (3). Goal: Process and store sensor data and all other nutrition-related data. Detailed monitoring is a necessity and not a luxury: iterator age, put count, and Lambda monitoring are a good baseline to start from. Cut out the middleman (avoid writing infrastructure to allow iOS clients to write sensor data): we are using the AWS iOS SDK to write the sensor data
  • 52. Best Practice: Scale & Performance (1) • The optimal number of shards is a function of the desired iterator age (i.e., latency) • Optimal batch size: if your Lambda function scales linearly, it doesn't really matter • A high batch size results in fewer calls but can create an internal queue for the consumer Lambda function (which increases latency to the user) • A low batch size can be expensive in terms of Lambda overhead and price
  • 53. Best Practice: Scale & Performance (2) • Lambda functions run sequentially on a shard's data, but in parallel across different shards • The Lambda consumer will not continue to the next batch until the function finishes successfully. Make sure to handle exceptions in your Lambda functions, or your shard can get stuck! • If your functions are slow, consider using a "forward" Lambda: a function that only reads the records from Kinesis and then invokes another Lambda function (which can run in parallel) • Understand the KCL; it will help you adjust the different parameters to your needs
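
The warning about a stuck shard can be illustrated with a handler that catches exceptions per record instead of letting one bad record fail the whole batch. This is a minimal sketch and not Nutrino's code: the `process` function and its `sensor` field are hypothetical, and in practice the failed records would be written somewhere durable (for example an S3 prefix or a queue) rather than only logged. The event layout, with base64-encoded `data` under each record's `kinesis` key, is the standard shape Lambda passes for Kinesis triggers.

```python
import base64
import json
import logging

logger = logging.getLogger(__name__)


def process(payload):
    """Hypothetical business logic; raises on payloads it cannot handle."""
    if not isinstance(payload, dict) or "sensor" not in payload:
        raise ValueError("missing 'sensor' field")
    # ... store or forward the reading here ...


def handler(event, context):
    """Kinesis-triggered Lambda entry point that never raises on a bad record."""
    processed, failed = 0, []
    for record in event.get("Records", []):
        kinesis = record["kinesis"]
        try:
            # Kinesis record data arrives base64-encoded.
            payload = json.loads(base64.b64decode(kinesis["data"]))
            process(payload)
            processed += 1
        except Exception:
            # Swallowing the error lets the batch succeed so the shard keeps moving;
            # a real handler should persist the failed record for later inspection.
            logger.exception("failed record %s", kinesis["sequenceNumber"])
            failed.append(kinesis["sequenceNumber"])
    return {"processed": processed, "failed": failed}
```

The trade-off is deliberate: raising would make Lambda retry the same batch and block the shard, while catching keeps the stream flowing at the cost of having to handle failed records out of band.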
  • 54. Best Practice: Monitoring • Make sure to monitor: • Iterator age • Put records count • Records distribution across shards • Each Lambda on its own • Your own required metrics
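
For the "records distribution across shards" item, it can help to check offline how a set of partition keys would spread. Kinesis Streams maps a partition key to a shard by taking the MD5 hash of the key as a 128-bit integer and finding the shard whose hash key range contains it. The sketch below assumes the shards split the hash space evenly, which is only true for a stream that has not been unevenly split or merged; real ranges come from the stream description. The function names are hypothetical.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def shard_index(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


def distribution(partition_keys, shard_count):
    """Return the number of records that would land on each shard."""
    counts = Counter(shard_index(key, shard_count) for key in partition_keys)
    return [counts.get(index, 0) for index in range(shard_count)]


if __name__ == "__main__":
    keys = [f"user-{n}" for n in range(10_000)]
    print(distribution(keys, 4))
    # A single hot key always lands on one shard, however many shards exist.
    print(distribution(["user-1"] * 10_000, 4))
```

A skewed result here points to a partition key choice that will throttle one shard while others sit idle, which adding shards alone will not fix.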
  • 55. Best Practice: Duplication handling & order considerations • Duplicate records are almost inevitable • It's easier to just handle duplicates than to avoid the problem • Watch out: updates are not duplicates! • It is possible to keep record order, but doing so will likely cause a larger overhead. Try to avoid relying on record order as much as possible
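
One common way to "just handle duplicates" is to make the consumer idempotent: remember which events have already been applied and skip repeats. The sketch below keys the check on a unique event identifier rather than on the entity, so that a later update to the same entity is still applied. The field names `event_id`, `entity_id`, and `value` are hypothetical, and the in-memory set and dict stand in for whatever durable store a real consumer would use.

```python
def apply_once(store, seen_event_ids, record):
    """Apply `record` to `store` unless this exact event was already applied.

    Returns True if the record was applied, False if it was a duplicate delivery.
    """
    if record["event_id"] in seen_event_ids:
        return False  # same event delivered again: safe to ignore
    seen_event_ids.add(record["event_id"])
    # A new event for an existing entity is an update, not a duplicate.
    store[record["entity_id"]] = record["value"]
    return True


if __name__ == "__main__":
    store, seen = {}, set()
    deliveries = [
        {"event_id": "e1", "entity_id": "glucose:user-7", "value": 5.4},
        {"event_id": "e1", "entity_id": "glucose:user-7", "value": 5.4},  # redelivery
        {"event_id": "e2", "entity_id": "glucose:user-7", "value": 6.1},  # update
    ]
    print([apply_once(store, seen, record) for record in deliveries])
    print(store)
```

Deduplicating on the entity key alone would wrongly drop the third record, which is the "updates are not duplicates" pitfall from the slide.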
  • 56. Best Practice: Frameworks • We recommend doing the first setup manually (using the CLI) • We found Serverless (by Serverless, Inc.) to be the most suitable for Kinesis with Lambda usage (for Web APIs that might not be the case)
  • 57. Thank you!