Real-time Streaming Data on AWS
Deep Dive & Best Practices Using Amazon Kinesis,
Spark Streaming, AWS Lambda, and Amazon EMR
Roy Ben-Alta, Sr. Business Development Manager, AWS
Orit Alul, Data and Analytics R&D Director, Sizmek
June 16, 2016
Agenda
Real-time streaming overview
Use cases and design patterns
Amazon Kinesis deep dive
Streaming data ingestion & Stream processing
Sizmek Case Study
Q&A
It’s All About the Pace
Batch Processing
Hourly server logs
Weekly or monthly bills
Daily website clickstream
Daily fraud reports
Stream Processing
Real-time metrics
Real-time spending alerts/caps
Real-time clickstream analysis
Real-time fraud detection
Streaming Data Scenarios Across Verticals
Verticals vs. scenarios: Accelerated Ingest-Transform-Load | Continuous Metrics Generation | Responsive Data Analysis
Digital Ad Tech/Marketing: Publisher, bidder data aggregation | Advertising metrics like coverage, yield, and conversion | User engagement with ads, optimized bid/buy engines
IoT: Sensor, device telemetry data ingestion | Operational metrics and dashboards | Device operational intelligence and alerts
Gaming: Online data aggregation, e.g., top 10 players | Massively multiplayer online game (MMOG) live dashboard | Leaderboard generation, player-skill match
Consumer Online: Clickstream analytics | Metrics like impressions and page views | Recommendation engines, proactive care
Customer Use Cases
Sonos runs near-real-time streaming analytics on device data logs from their connected hi-fi audio equipment.
One of the biggest online brokerages for real estate in the US built its Hot Homes feature.
Glu Mobile collects billions of gaming event data points from millions of user devices in real time every single day.
The Nordstrom recommendation team built an online stylist using Amazon Kinesis Streams and AWS Lambda.
Metering record:
{
  "payerId": "Joe",
  "productCode": "AmazonS3",
  "clientProductCode": "AmazonS3",
  "usageType": "Bandwidth",
  "operation": "PUT",
  "value": "22490",
  "timestamp": "1216674828"
}
Common log entry:
127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
MQTT record:
"SeattlePublicWater/Kinesis/123/Realtime" - 412309129140
Syslog entry:
<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"][examplePriority@32473 class="high"]
Streaming Data Challenges: Variety & Velocity
• Streaming data comes in different types and formats
  − Metering records, logs, and sensor data
  − JSON, CSV, TSV
• Can vary in size from a few bytes to kilobytes or megabytes
• High velocity and continuous processing
Two Main Processing Patterns
Stream processing (real time)
• Real-time response to events in data streams
Examples:
• Proactively detect hardware errors in device logs
• Notify when inventory drops below a threshold
• Fraud detection
Micro-batching (near real time)
• Near real-time operations on small batches of events in data streams
Examples:
• Aggregate and archive events
• Monitor performance SLAs
Amazon Kinesis Deep Dive
Amazon Kinesis Streams
• For technical developers
• Build your own custom applications that process or analyze streaming data
Amazon Kinesis Firehose
• For all developers and data scientists
• Easily load massive volumes of streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service
Amazon Kinesis Analytics
• For all developers and data scientists
• Easily analyze data streams using standard SQL queries
• In preview
Amazon Kinesis: Streaming Data Made Easy
Services make it easy to capture, deliver, and process streams on AWS
Amazon Kinesis Firehose
Load massive volumes of streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service
Zero administration: Capture and deliver streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service without writing an application or managing infrastructure.
Direct-to-data-store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 seconds using simple configurations.
Seamless elasticity: Seamlessly scales to match data throughput without intervention.
How it works: capture and submit streaming data to Firehose; Firehose loads the streaming data continuously into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service; analyze the streaming data using your favorite BI tools.
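To make the "no application to manage" point concrete on the producer side, here is a minimal boto3 sketch of pushing a record into a Firehose delivery stream; the delivery stream name, region, and record shape are hypothetical, and the stream is assumed to already exist (for example, configured to land in Amazon S3).

import json
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

record = {"payerId": "Joe", "usageType": "Bandwidth", "value": "22490"}

# Firehose buffers, optionally compresses/encrypts, and delivers to the
# configured destination (S3, Redshift, or Elasticsearch) in as little as 60 s.
firehose.put_record(
    DeliveryStreamName="my-delivery-stream",   # hypothetical; must already exist
    Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
)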
Amazon Kinesis Streams
Managed service for real-time streaming
[Architecture diagram: data sources put records through an AWS endpoint into a stream made of shards (Shard 1 … Shard N) replicated across three Availability Zones; consuming applications such as App.1 (aggregate & de-duplicate), App.2 (metric extraction), App.3 (sliding-window analysis), App.4 (machine learning), AWS Lambda, and Amazon EMR read from the stream and deliver results to Amazon S3 and Amazon Redshift.]
Amazon Kinesis Streams
Managed ability to capture and store data
• Streams are made of shards
• Each shard ingests up to 1 MB/sec and 1,000 records/sec
• Each shard emits up to 2 MB/sec
• All data is stored for 24 hours by default; retention can be extended to up to 7 days
• Scale Kinesis streams using the scaling utility
• Replay data
Amazon Kinesis Firehose vs. Amazon Kinesis Streams
Amazon Kinesis Streams is for use cases that require custom processing of each incoming record, with sub-second processing latency and a choice of stream processing frameworks.
Amazon Kinesis Firehose is for use cases that require zero administration, the ability to use existing analytics tools based on Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service, and a data latency of 60 seconds or higher.
Streaming Data Ingestion and Stream Processing
Putting Data into Amazon Kinesis Streams
Determine your partition key strategy
• Managed buffer or streaming MapReduce job
• Ensure high cardinality for your shards
Provision adequate shards (see the sizing sketch below)
• For ingress needs
• For egress needs across all consuming applications, especially if more than two applications read simultaneously
• Include headroom for catching up with data in the stream
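A back-of-the-envelope sizing sketch using the per-shard limits quoted above (1 MB/s or 1,000 records/s in, 2 MB/s out); all workload numbers below are hypothetical.

# Hypothetical workload figures
ingress_mb_per_sec = 30.0
ingress_records_per_sec = 25_000
consuming_apps = 3                 # each consumer re-reads the full stream
headroom = 1.25                    # ~25% spare capacity for catch-up

# Per-shard limits: 1 MB/s or 1,000 records/s in, 2 MB/s out
shards_for_ingress = max(ingress_mb_per_sec / 1.0, ingress_records_per_sec / 1000.0)
shards_for_egress = (ingress_mb_per_sec * consuming_apps) / 2.0

shards = int(round(headroom * max(shards_for_ingress, shards_for_egress))) + 1
print("Provision roughly", shards, "shards")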
Putting Data into Amazon Kinesis
Amazon Kinesis Agent (supports pre-processing)
• http://docs.aws.amazon.com/firehose/latest/dev/writing-with-agents.html
Pre-batch before puts for better efficiency
• Consider Flume or Fluentd as collectors/agents
• See https://github.com/awslabs/aws-fluent-plugin-kinesis
Make a tweak to your existing logging
• log4j appender option
• See https://github.com/awslabs/kinesis-log4j-appender
Amazon Kinesis Producer Library
• Writes to one or more Amazon Kinesis streams with an automatic, configurable retry mechanism
• Collects records and uses PutRecords to write multiple records to multiple shards per request (approximated in the sketch below)
• Aggregates user records to increase payload size and improve throughput
• Integrates seamlessly with the KCL to de-aggregate batched records
• Use the Amazon Kinesis Producer Library with AWS Lambda (new!)
• Submits Amazon CloudWatch metrics on your behalf to provide visibility into producer performance
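The KPL itself is a Java/C++ library, so the following is not the KPL API; it is only a rough boto3 sketch of the batching idea behind it, writing up to 500 records per PutRecords call. The stream name and event shape are hypothetical, and partial failures are ignored here (see the retry sketch in the Sizmek tips later).

import json
import uuid
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def put_batched(stream_name, events, batch_size=500):
    # PutRecords accepts at most 500 records (and 5 MB) per request.
    for i in range(0, len(events), batch_size):
        records = [
            {"Data": json.dumps(e).encode("utf-8"),
             "PartitionKey": str(uuid.uuid4())}
            for e in events[i:i + batch_size]
        ]
        kinesis.put_records(StreamName=stream_name, Records=records)

put_batched("my-stream", [{"event_id": n} for n in range(2000)])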
Record Order and Multiple Shards
Unordered processing
• Randomize the partition key to distribute events over many shards and use multiple workers
Exact-order processing
• Control the partition key to ensure events are grouped into the same shard and read by the same worker (see the sketch below)
Need both? Use a global sequence number
[Diagram: the producer obtains a global sequence number, writes to an unordered stream, and downstream consumers fetch event metadata to feed a campaign-centric stream and a fraud-inspection stream.]
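A small boto3 sketch of the two partition-key strategies with put_record; the stream names and payloads are hypothetical.

import uuid
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Unordered: a random partition key spreads events across all shards.
kinesis.put_record(StreamName="unordered-stream",
                   Data=b'{"type": "impression"}',
                   PartitionKey=str(uuid.uuid4()))

# Exact order per campaign: a fixed key per campaign keeps that campaign's
# events in one shard, so one worker reads them in order.
kinesis.put_record(StreamName="campaign-centric-stream",
                   Data=b'{"type": "impression", "campaign_id": "1234"}',
                   PartitionKey="campaign-1234")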
Sample Code for Scaling Shards
java -cp KinesisScalingUtils.jar-complete.jar -Dstream-name=MyStream -Dscaling-action=scaleUp -Dcount=10 -Dregion=eu-west-1 ScalingClient
Options:
• stream-name - the name of the stream to be scaled
• scaling-action - the action to be taken to scale; must be one of "scaleUp", "scaleDown", or "resize"
• count - number of shards by which to absolutely scale up or down, or resize to
See https://github.com/awslabs/amazon-kinesis-scaling-utils
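Under the hood the utility drives shard splits and merges through the Kinesis API; purely as an illustration, here is a minimal boto3 sketch of one manual split at the midpoint of a shard's hash key range (stream name and region are hypothetical).

import boto3

kinesis = boto3.client("kinesis", region_name="eu-west-1")

desc = kinesis.describe_stream(StreamName="MyStream")
shard = desc["StreamDescription"]["Shards"][0]
lo = int(shard["HashKeyRange"]["StartingHashKey"])
hi = int(shard["HashKeyRange"]["EndingHashKey"])

# Split the shard at the midpoint of its hash key range.
kinesis.split_shard(StreamName="MyStream",
                    ShardToSplit=shard["ShardId"],
                    NewStartingHashKey=str((lo + hi) // 2))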
Amazon Kinesis Client Library
• Build Kinesis applications with the Kinesis Client Library (KCL)
• Open-source client libraries are available for Java, Ruby, Python, and Node.js development
• Deploy on your EC2 instances
• A KCL application includes three components (see the sketch below):
  1. Record processor factory - creates the record processor
  2. Record processor - processing unit that processes data from a shard in Amazon Kinesis Streams
  3. Worker - processing unit that maps to each application instance
State Management with the Kinesis Client Library
• One record processor maps to one shard and processes data records from that shard
• One worker maps to one or more record processors
• Balances shard-worker associations when worker/instance counts change
• Balances shard-worker associations when shards split or merge
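A rough sketch of the record-processor shape using the Python KCL bindings (amazon_kclpy); exact method signatures differ between library versions, so treat this as the v1-style interface rather than a drop-in implementation. The processing logic is a placeholder.

import base64

from amazon_kclpy import kcl

class RecordProcessor(kcl.RecordProcessorBase):
    def initialize(self, shard_id):
        # Called once when this processor is assigned a shard.
        self.shard_id = shard_id

    def process_records(self, records, checkpointer):
        for record in records:
            payload = base64.b64decode(record.get("data"))  # payloads arrive base64-encoded
            # ... application-specific processing goes here ...
        checkpointer.checkpoint()                           # record progress in DynamoDB

    def shutdown(self, checkpointer, reason):
        if reason == "TERMINATE":    # shard is being split/merged; checkpoint first
            checkpointer.checkpoint()

if __name__ == "__main__":
    kcl.KCLProcess(RecordProcessor()).run()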
Other Options
• Third-party connectors (for example, Apache Storm, Splunk, and more)
• AWS IoT platform
• Amazon EMR with Apache Spark, Pig, or Hive
• AWS Lambda
Apache Spark and Amazon Kinesis Streams
Apache Spark is an in-memory analytics cluster computing framework that uses RDDs for fast processing.
Spark Streaming can read directly from an Amazon Kinesis stream.
Amazon Software License linking: add the ASL dependency to your SBT/Maven project, artifactId = spark-streaming-kinesis-asl_2.10
Example: counting tweets on a sliding window (abbreviated)
KinesisUtils.createStream("twitter-stream")
  .filter(_.getText.contains("Open-Source"))
  .countByWindow(Seconds(5))
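For completeness, an equivalent sketch in Python: the full createStream signature also needs the streaming context, application name, endpoint, region, initial position, and checkpoint interval. The names, endpoint, and checkpoint directory below are hypothetical.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import InitialPositionInStream, KinesisUtils

sc = SparkContext(appName="tweet-counter")
ssc = StreamingContext(sc, 1)                    # 1-second batches
ssc.checkpoint("/tmp/tweet-counter-checkpoint")  # required for window operations

tweets = KinesisUtils.createStream(
    ssc, "tweet-counter", "twitter-stream",
    "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
    InitialPositionInStream.LATEST, 5)           # checkpoint to DynamoDB every 5 s

tweets.filter(lambda text: "Open-Source" in text) \
      .countByWindow(windowDuration=5, slideDuration=5) \
      .pprint()

ssc.start()
ssc.awaitTermination()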
Common Integration Pattern with Amazon EMR: Tumbling-Window Reporting
[Diagram: streaming input from Amazon Kinesis Streams is aggregated on Amazon EMR over a tumbling/fixed window, and the periodic output is loaded into Amazon Redshift via COPY from Amazon EMR.]
Using Spark Streaming with Amazon Kinesis Streams
1. Use Spark 1.6+ with the EMRFS consistent view option if you use Amazon S3 as storage for Spark checkpoints
2. Amazon DynamoDB table name (the Kinesis application name) - make sure only one instance of the application is running with Spark Streaming
3. Enable Spark-based checkpoints (see the sketch below)
4. Make the number of Amazon Kinesis receivers a multiple of the number of executors so they are load-balanced
5. Keep total processing time less than the batch interval
6. Set the number of executors to match the number of cores per executor
7. Spark Streaming uses a default of 1 second with the KCL
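A minimal Python sketch of item 3, Spark-based checkpoints: rebuild the streaming context from a checkpoint directory after a driver restart. The checkpoint path is hypothetical; pair it with the EMRFS consistent view option when it points at Amazon S3.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

CHECKPOINT_DIR = "s3://my-bucket/spark/checkpoints"   # hypothetical location

def create_context():
    sc = SparkContext(appName="kinesis-app")
    ssc = StreamingContext(sc, 1)                     # 1-second batch interval
    ssc.checkpoint(CHECKPOINT_DIR)
    # ... build the Kinesis DStream and transformations here ...
    return ssc

# Reuse the checkpointed context if one exists, otherwise build a fresh one.
ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
ssc.start()
ssc.awaitTermination()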
Demo
Data source: sensors simulated using EC2, ~20K events/sec, scalable
Collect/store: Amazon Kinesis Streams - managed, millisecond latency, highly available, scalable
Consume: streaming process - managed, micro-batch, checkpoints in Amazon DynamoDB
Sample records (Device_Id, temperature, timestamp):
1,65,2016-03-25 01:01:20
2,68,2016-03-25 01:01:20
3,67,2016-03-25 01:01:20
4,85,2016-03-25 01:01:20
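The demo's producer code is not shown in the deck; as a hedged sketch, a simulator emitting CSV rows in the shape above might look like the following. The stream name is hypothetical, and reaching ~20K events/sec would in practice mean batching with PutRecords across many processes.

import datetime
import random
import time

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

while True:
    device_id = random.randint(1, 4)
    row = "{},{},{}".format(
        device_id,
        random.randint(60, 90),
        datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S"))
    # One record per reading, keyed by device so each device's readings stay in order.
    kinesis.put_record(StreamName="sensor-stream",
                       Data=row.encode("utf-8"),
                       PartitionKey=str(device_id))
    time.sleep(0.05)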
Amazon Kinesis Streams with AWS Lambda
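With the Lambda integration, Kinesis invokes the function with batches of records whose payloads arrive base64-encoded inside the event; a minimal Python handler sketch (the processing step is a placeholder and JSON payloads are assumed):

import base64
import json

def lambda_handler(event, context):
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        item = json.loads(payload)      # assumes JSON payloads
        print(item)                     # replace with real processing
    return "Processed {} records".format(len(event["Records"]))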
Hearst’s wonderful world of streaming data
• Digital media
• Publication brands
• Mobile apps
• Re-targeting platforms
Buzzing@Hearst Demo
The Business Value of Buzzing@Hearst
Real-time reactions: instant feedback on articles from our audiences
Promoting popular content cross-channel: incremental re-syndication of popular articles across properties (e.g., trending newspaper articles can be adopted by magazines)
Authentic influence: inform Hearst editors to write articles that are more relevant to our audiences
Understanding engagement: inform editors which channels our audiences are leveraging to read Hearst articles
Incremental revenue: 25% more page views, 15% more visitors
Final Hearst Data Pipeline
[Diagram: users on Hearst properties send clickstream data through a Node.JS proxy app into Amazon Kinesis (later Amazon Kinesis Firehose); ETL on EMR and Amazon Redshift feed a data science application whose models and aggregated data on S3 back the Buzzing API's API-ready data. Per-stage latency: milliseconds, 30 seconds, 100 seconds, 5 seconds; per-stage throughput: 100 GB/day, 5 GB/day, 1 GB/day, 1 GB/day.]
Real-time streaming @sizmek
Orit Alul, R&D Director
Sizmek – Who? What? How?
Who are we?
• The 2nd-largest ad management company in the world, operating globally in 50+ countries
• Open ad management platform
Who are our customers?
• Advertising agencies and advertisers
What is our value?
• We enable our clients to manage their cross-channel campaigns and find their relevant audience seamlessly
• We built a fast, high-throughput infrastructure that enables real-time analytics and decision making
Sizmek – Our customers
• Over 13,000 brands use our platform
• Over 5,000 media agencies use our platform
Data processing at Sizmek
• 15B events per day
• 4 TB of data per day
• 150K events per second
• Volatile data traffic load
  − During the day/week/year
  − During special events (e.g., Black Friday)
• Increasing demand for real-time insights
NXT generation of analytics at Sizmek: flexible, faster, simple
Sizmek Data Platform on AWS
[Diagram: ad servers and users feed an incoming Kinesis stream; Amazon EMR enriches the data into a second, enriched Kinesis stream, which feeds the users store and the data lake.]
Our Kinesis Journey
Why Amazon Kinesis?
• Apache Kafka -> Amazon Kinesis = managed service = no ops effort
• Integration with Apache Storm and AWS Lambda
• Integration with Amazon S3, Amazon Redshift, and other AWS services
• Meets our throughput needs
Tip 1: Amazon Kinesis Apache Storm connector
If a bolt has tried to process a message and reached the maximum number of retries, insert the unprocessed message back into the stream as a new message.
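The connector itself is a Storm (Java) component; purely to illustrate the requeue idea, here is a hedged Python sketch of the same pattern with boto3. The process() function and the stream name are hypothetical.

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")
MAX_RETRIES = 3

def process(data):
    """Hypothetical per-event processing; raises on failure."""
    ...

def handle(data, partition_key):
    for attempt in range(MAX_RETRIES):
        try:
            process(data)
            return
        except Exception:
            continue
    # Max retries exhausted: put the event back on the stream as a new record.
    kinesis.put_record(StreamName="incoming-stream",
                       Data=data,
                       PartitionKey=partition_key)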
Tip 2: putRecord = more shards!
Requirements:
• Apache Storm requires processing event by event
• Average event size is 1 KB
• We must use the putRecord API
Testing results:
• We reached only ~75% of the Amazon Kinesis maximum throughput limit
• 300-500 shards
Conclusion:
• Provision ~25% more shards
• Always test your workload
Tip 3: Error handling at the record level
A putRecords (bulk insert) call can partially fail; inspect the per-record results in the response:
{
  "FailedRecordCount": 2,
  "Records": [
    {
      "SequenceNumber": "49543463076548007577105092703039560359975228518395012686",
      "ShardId": "shardId-000000000000"
    },
    {
      "ErrorCode": "ProvisionedThroughputExceededException",
      "ErrorMessage": "Rate exceeded for shard shardId-000000000001 in stream <StreamName> under account 111111111111."
    },
    {
      "ErrorCode": "InternalFailure",
      "ErrorMessage": "Internal service failure."
    }
  ]
}
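A minimal boto3 sketch of record-level error handling: resend only the entries that come back with an ErrorCode (the response's Records list is parallel to the request). The stream name and back-off policy are application choices.

import time
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def put_records_with_retry(stream_name, entries, max_attempts=3):
    for attempt in range(max_attempts):
        response = kinesis.put_records(StreamName=stream_name, Records=entries)
        if response["FailedRecordCount"] == 0:
            return
        # Keep only the entries whose per-record result carries an ErrorCode.
        entries = [entry for entry, result in zip(entries, response["Records"])
                   if "ErrorCode" in result]
        time.sleep(2 ** attempt)        # simple exponential back-off
    raise RuntimeError("{} records still failing".format(len(entries)))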
Tip 4: Shard-level metrics are your friend
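Shard-level metrics are opt-in; a minimal boto3 sketch of turning them on for a hypothetical stream so hot shards become visible in CloudWatch:

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Emit per-shard CloudWatch metrics in addition to the stream-level ones.
kinesis.enable_enhanced_monitoring(
    StreamName="my-stream",
    ShardLevelMetrics=["IncomingBytes", "IncomingRecords",
                       "WriteProvisionedThroughputExceeded"],
)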
Thank You! And yes, we’re hiring :)
Conclusion
• Amazon Kinesis offers a managed service for building applications, streaming data ingestion, and continuous processing
• Ingest and aggregate data using the Amazon Kinesis Producer Library
• Process data using the Amazon Kinesis Client Library and open-source connectors
• Determine your partition key strategy
• Try out Amazon Kinesis at http://aws.amazon.com/kinesis/
Reference
Technical documentation
• Amazon Kinesis Agent
• Amazon Kinesis Streams and Spark Streaming
• Amazon Kinesis Producer Library Best Practices
• Amazon Kinesis Firehose and AWS Lambda
• Building a Near Real-Time Discovery Platform with Amazon Kinesis
Public case studies
• Glu Mobile - Real-Time Analytics
• Hearst Publishing - Clickstream Analytics
• How Sonos Leverages Amazon Kinesis
• Nordstrom Online Stylist
benaltar@amazon.com
  • 5. Customer Use Cases Sonos runs near real-time streaming analytics on device data logs from their connected hi-fi audio equipment. One of the biggest online brokerages for real estate in the US built its Hot Homes feature. Glu Mobile collects billions of gaming event data points from millions of user devices in real time every single day. The Nordstrom recommendation team built an online stylist using Amazon Kinesis Streams and AWS Lambda.
  • 6. Metering Record Common Log Entry MQTT RecordSyslog Entry { "payerId": "Joe", "productCode": "AmazonS3", "clientProductCode": "AmazonS3", "usageType": "Bandwidth", "operation": "PUT", "value": "22490", "timestamp": "1216674828" } { 127.0.0.1 user- identifier frank [10/Oct/2000:13:5 5:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 } { “SeattlePublicWa ter/Kinesis/123/ Realtime” – 412309129140 } { <165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"][examplePriority@ 32473 class="high"] } Streaming Data Challenges: Variety & Velocity • Streaming data comes in different types and formats − Metering records, logs and sensor data − JSON, CSV, TSV • Can vary in size from a few bytes to kilobytes or megabytes • High velocity and continuous processing
  • 7. Two Main Processing Patterns Stream processing (real time) • Real-time response to events in data streams Examples: • Proactively detect hardware errors in device logs • Notify when inventory drops below a threshold • Fraud detection Micro-batching (near real time) • Near real-time operations on small batches of events in data streams Examples: • Aggregate and archive events • Monitor performance SLAs
  • 9. Amazon Kinesis Streams • For Technical Developers • Build your own custom applications that process or analyze streaming data Amazon Kinesis Firehose • For all developers, data scientists • Easily load massive volumes of streaming data into S3, Amazon Redshift and Amazon Elasticsearch Amazon Kinesis Analytics • For all developers, data scientists • Easily analyze data streams using standard SQL queries • Preview Amazon Kinesis: Streaming Data Made Easy Services make it easy to capture, deliver and process streams on AWS
  • 10. Amazon Kinesis Firehose Load massive volumes of streaming data into Amazon S3, Amazon Redshift and Amazon Elasticsearch Zero administration: Capture and deliver streaming data into Amazon S3, Amazon Redshift and Amazon Elasticsearch without writing an application or managing infrastructure. Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 secs using simple configurations. Seamless elasticity: Seamlessly scales to match data throughput w/o intervention Capture and submit streaming data to Firehose Analyze streaming data using your favorite BI tools Firehose loads streaming data continuously into S3, Amazon Redshift and Amazon Elasticsearch
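A minimal sketch of writing one record to a Firehose delivery stream with the AWS SDK for Java (assuming a recent 1.11.x SDK; the delivery stream name and JSON payload are illustrative, and the stream's configured destination handles delivery to S3, Amazon Redshift, or Amazon Elasticsearch):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehose;
import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehoseClientBuilder;
import com.amazonaws.services.kinesisfirehose.model.PutRecordRequest;
import com.amazonaws.services.kinesisfirehose.model.Record;

public class FirehosePutExample {
    public static void main(String[] args) {
        AmazonKinesisFirehose firehose = AmazonKinesisFirehoseClientBuilder.defaultClient();

        // Newline-delimited JSON works well when the destination is S3.
        String event = "{\"deviceId\":4,\"temperature\":85}\n";

        firehose.putRecord(new PutRecordRequest()
                .withDeliveryStreamName("my-firehose-stream")   // hypothetical delivery stream
                .withRecord(new Record().withData(
                        ByteBuffer.wrap(event.getBytes(StandardCharsets.UTF_8)))));
    }
}

Firehose buffers the record and delivers it in batches according to the delivery stream's buffer size and interval settings.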
  • 11. Data Sources App.4 [Machine Learning] AWSEndpoint App.1 [Aggregate & De-Duplicate] Data Sources Data Sources Data Sources App.2 [Metric Extraction] Amazon S3 Amazon Redshift App.3 [Sliding Window Analysis] Availability Zone Shard 1 Shard 2 Shard N Availability Zone Availability Zone Amazon Kinesis Streams Managed service for real-time streaming AWS Lambda Amazon EMR
  • 12. Amazon Kinesis Streams Managed ability to capture and store data
  • 13. • Streams are made of shards • Each shard ingests up to 1MB/sec, and 1000 records/sec • Each shard emits up to 2 MB/sec • All data is stored for 24 hours by default; storage can be extended for up to 7 days • Scale Kinesis streams using scaling util • Replay data Amazon Kinesis Streams Managed ability to capture and store data
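Since the slide notes that retention can be extended up to 7 days, here is a minimal sketch of doing so with the AWS SDK for Java; the stream name is illustrative and 168 hours is the 7-day maximum:

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.IncreaseStreamRetentionPeriodRequest;

public class ExtendRetention {
    public static void main(String[] args) {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

        // Extend retention from the 24-hour default to 7 days (168 hours).
        kinesis.increaseStreamRetentionPeriod(new IncreaseStreamRetentionPeriodRequest()
                .withStreamName("incoming-events")      // hypothetical stream name
                .withRetentionPeriodHours(168));
    }
}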
  • 14. Amazon Kinesis Firehose vs. Amazon Kinesis Streams Amazon Kinesis Streams is for use cases that require custom processing, per incoming record, with sub-1 second processing latency, and a choice of stream processing frameworks. Amazon Kinesis Firehose is for use cases that require zero administration, ability to use existing analytics tools based on Amazon S3, Amazon Redshift and Amazon Elasticsearch, and a data latency of 60 seconds or higher.
  • 15. Streaming Data Ingestion and Stream Processing
  • 16. Putting Data into Amazon Kinesis Streams Determine your partition key strategy • Managed buffer or streaming MapReduce job • Ensure high cardinality for your shards Provision adequate shards • For ingress needs • Egress needs for all consuming applications: if more than two simultaneous applications • Include headroom for catching up with data in stream
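As a rough illustration of the math behind "provision adequate shards", the sketch below derives a shard count from the per-shard limits quoted earlier (1 MB/sec or 1,000 records/sec in, 2 MB/sec out); the traffic numbers are made up:

public class ShardSizing {
    public static void main(String[] args) {
        double ingressMBps = 30.0;      // expected peak write throughput
        double recordsPerSec = 20000;   // expected peak records per second
        double egressMBps = 45.0;       // summed across all consuming applications

        int byBytesIn  = (int) Math.ceil(ingressMBps / 1.0);     // 1 MB/sec in per shard
        int byRecords  = (int) Math.ceil(recordsPerSec / 1000);  // 1,000 records/sec per shard
        int byBytesOut = (int) Math.ceil(egressMBps / 2.0);      // 2 MB/sec out per shard

        int shards = Math.max(byBytesIn, Math.max(byRecords, byBytesOut));
        System.out.println("Provision at least " + shards + " shards, plus headroom for catch-up.");
    }
}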
  • 17. Putting Data into Amazon Kinesis Amazon Kinesis Agent – (supports pre-processing) • http://docs.aws.amazon.com/firehose/latest/dev/writing-with-agents.html Pre-batch before Puts for better efficiency • Consider Flume, Fluentd as collectors/agents • See https://github.com/awslabs/aws-fluent-plugin-kinesis Make a tweak to your existing logging • log4j appender option • See https://github.com/awslabs/kinesis-log4j-appender
  • 18. Amazon Kinesis Producer Library • Writes to one or more Amazon Kinesis streams with automatic, configurable retry mechanism • Collects records and uses PutRecords to write multiple records to multiple shards per request • Aggregates user records to increase payload size and improve throughput • Integrates seamlessly with KCL to de-aggregate batched records • Use Amazon Kinesis Producer Library with AWS Lambda (New!) • Submits Amazon CloudWatch metrics on your behalf to provide visibility into producer performance
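A minimal Kinesis Producer Library sketch in Java; the stream name, region, and partition-key choice are illustrative:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;

public class KplExample {
    public static void main(String[] args) {
        KinesisProducerConfiguration config = new KinesisProducerConfiguration()
                .setRegion("us-east-1")
                .setRecordMaxBufferedTime(100);   // buffer up to 100 ms before each PutRecords call

        KinesisProducer producer = new KinesisProducer(config);

        for (int deviceId = 0; deviceId < 100; deviceId++) {
            String payload = "{\"deviceId\":" + deviceId + ",\"temperature\":65}";
            // The KPL aggregates these user records into larger Kinesis records per shard.
            producer.addUserRecord("sensor-stream", String.valueOf(deviceId),
                    ByteBuffer.wrap(payload.getBytes(StandardCharsets.UTF_8)));
        }

        producer.flushSync();   // block until all buffered records have been sent
        producer.destroy();
    }
}

On the consuming side, the KCL de-aggregates these batched records transparently.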
  • 19. Record Order and Multiple Shards Unordered processing • Randomize partition key to distribute events over many shards and use multiple workers Exact order processing • Control partition key to ensure events are grouped into the same shard and read by the same worker Need both? Use a global sequence number (diagram: the producer gets a global sequence and event metadata, then writes to an unordered stream, a campaign-centric stream, and a fraud-inspection stream)
  • 20. Sample Code for Scaling Shards java -cp KinesisScalingUtils.jar-complete.jar -Dstream-name=MyStream -Dscaling-action=scaleUp -Dcount=10 -Dregion=eu-west-1 ScalingClient Options: • stream-name - The name of the stream to be scaled • scaling-action - The action to be taken to scale. Must be one of "scaleUp”, "scaleDown" or “resize” • count - Number of shards by which to absolutely scale up or down, or resize See https://github.com/awslabs/amazon-kinesis-scaling-utils
  • 21. Amazon Kinesis Client Library • Build Kinesis applications with the Kinesis Client Library (KCL) • Open source client library available for Java, Ruby, Python, and Node.JS development • Deploy on your EC2 instances • A KCL application includes three components: 1. Record Processor Factory – creates the record processor 2. Record Processor – processing unit that processes data from a shard in Amazon Kinesis Streams 3. Worker – processing unit that maps to each application instance
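A minimal record-processor sketch against the KCL v1 Java interfaces (the processing logic is left as a placeholder and exception handling is simplified):

import java.util.List;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessor;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessorCheckpointer;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShutdownReason;
import com.amazonaws.services.kinesis.model.Record;

// The KCL worker creates one instance of this processor per shard it owns.
public class SampleRecordProcessor implements IRecordProcessor {

    public void initialize(String shardId) {
        System.out.println("Starting record processor for shard " + shardId);
    }

    public void processRecords(List<Record> records, IRecordProcessorCheckpointer checkpointer) {
        for (Record record : records) {
            // Application-specific logic; record.getData() returns a ByteBuffer.
        }
        try {
            checkpointer.checkpoint();   // persist progress in the KCL's DynamoDB lease table
        } catch (Exception e) {
            // Handle throttling or shutdown exceptions as appropriate for your application.
        }
    }

    public void shutdown(IRecordProcessorCheckpointer checkpointer, ShutdownReason reason) {
        if (reason == ShutdownReason.TERMINATE) {
            try {
                checkpointer.checkpoint();   // checkpoint at the end of a shard that has been closed
            } catch (Exception e) { /* log and continue */ }
        }
    }
}

A corresponding record processor factory returns new SampleRecordProcessor instances, and a Worker built from that factory is what runs on each EC2 instance.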
  • 22. State Management with Kinesis Client Library • One record processor maps to one shard and processes data records from that shard • One worker maps to one or more record processors • Balances shard-worker associations when worker / instance counts change • Balances shard-worker associations when shards split or merge
  • 23. Other Options • Third-party connectors(for example, Apache Storm, Splunk and more) • AWS IoT platform • Amazon EMR with Apache Spark, Pig or Hive • AWS Lambda
  • 24. Apache Spark and Amazon Kinesis Streams Apache Spark is an in-memory analytics cluster using RDD for fast processing Spark Streaming can read directly from an Amazon Kinesis stream Amazon software license linking – Add ASL dependency to SBT/MAVEN project, artifactId = spark-streaming-kinesis-asl_2.10 KinesisUtils.createStream("twitter-stream") .filter(_.getText.contains("Open-Source")) .countByWindow(Seconds(5)) Example: Counting tweets on a sliding window
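The slide's snippet is Scala; for consistency with the other examples, here is a minimal Java sketch of the same idea against the Spark 1.6 spark-streaming-kinesis-asl API (stream name, endpoint, and region are illustrative, and a single receiver is shown for brevity):

import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kinesis.KinesisUtils;

public class KinesisSparkExample {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("tweet-counter");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(5000));

        JavaReceiverInputDStream<byte[]> stream = KinesisUtils.createStream(
                jssc, "tweet-counter", "twitter-stream",
                "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
                InitialPositionInStream.LATEST, new Duration(1000),
                StorageLevel.MEMORY_AND_DISK_2());

        JavaDStream<String> tweets = stream.map(new Function<byte[], String>() {
            public String call(byte[] bytes) { return new String(bytes); }
        });

        // Count matching tweets over a 5-second sliding window, as in the slide.
        tweets.filter(new Function<String, Boolean>() {
            public Boolean call(String tweet) { return tweet.contains("Open-Source"); }
        }).countByWindow(new Duration(5000), new Duration(5000)).print();

        jssc.start();
        jssc.awaitTermination();
    }
}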
  • 25. Amazon EMR Amazon Kinesis StreamsStreaming Input Tumbling/Fixed Window Aggregation Periodic Output Amazon Redshift COPY from Amazon EMR Common Integration Pattern with Amazon EMR Tumbling Window Reporting
  • 26. Using Spark Streaming with Amazon Kinesis Streams 1. Use Spark 1.6+ with the EMRFS consistent view option if you use Amazon S3 as storage for Spark checkpoints 2. Amazon DynamoDB table name – make sure there is only one instance of the application running with Spark Streaming 3. Enable Spark-based checkpoints 4. Make the number of Amazon Kinesis receivers a multiple of the number of executors so they are load-balanced 5. Keep total processing time less than the batch interval 6. Use the number of executors and the number of cores per executor to optimize parallelism 7. Spark Streaming uses a default of 1 sec with the KCL
  • 27. Demo (architecture slide): sensors (data source, simulated using EC2, 20K events/sec) feed Amazon Kinesis Streams (collect/store: scalable, managed, millisecond latency, highly available), which is consumed by a streaming process (scalable, managed, micro-batch, checkpointing in Amazon DynamoDB). Sample records (Device_Id, temperature, timestamp): 1,65,2016-03-25 01:01:20 2,68,2016-03-25 01:01:20 3,67,2016-03-25 01:01:20 4,85,2016-03-25 01:01:20
  • 28. Amazon Kinesis Streams with AWS Lambda
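As a sketch of the Lambda consumption pattern: configure the stream as an event source for a function, and Lambda invokes a handler like the one below with batches of records (the logging is illustrative):

import java.nio.charset.StandardCharsets;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.KinesisEvent;

// Lambda polls the stream on your behalf and invokes this handler per batch of records.
public class KinesisEventHandler implements RequestHandler<KinesisEvent, Void> {

    public Void handleRequest(KinesisEvent event, Context context) {
        for (KinesisEvent.KinesisEventRecord record : event.getRecords()) {
            String payload = new String(record.getKinesis().getData().array(),
                                        StandardCharsets.UTF_8);
            context.getLogger().log("partitionKey=" + record.getKinesis().getPartitionKey()
                    + " payload=" + payload);
        }
        return null;   // a successful return lets Lambda advance past this batch
    }
}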
  • 29. • Digital Media • Publication brands • Mobile Apps • Re-targeting platforms Hearst’s wonderful world of streaming data
  • 31. The Business Value of Buzzing@Hearst Real-Time Reactions Instant feedback on articles from our audiences Promoting Popular Content Cross-Channel Incremental re-syndication of popular articles across properties (e.g., trending newspaper articles can be adopted by magazines) Authentic Influence Inform Hearst editors to write articles that are more relevant to our audiences Understanding Engagement Inform editors which channels our audiences are leveraging to read Hearst articles INCREMENTAL REVENUE 25% more page views 15% more visitors
  • 32. Final Hearst Data Pipeline (architecture slide): clickstream from users on Hearst properties flows through a Node.JS app/proxy into Amazon Kinesis and Firehose, then through ETL on EMR and a data science application with Amazon Redshift to produce models and aggregated data on S3, exposed as API-ready data via the Buzzing API. Stage latencies range from milliseconds to roughly 100 seconds, and throughput from roughly 1 GB/day to 100 GB/day.
  • 33. Real-time streaming @sizmek Orit Alul – R&D Director
  • 34. Sizmek – Who? What? How?  Who are we?  2nd largest ad management company in the world, operating globally in 50+ countries  Open Ad Management Platform  Who are our customers?  Advertising agencies and advertisers  What is our value?  We enable our clients to manage their cross-channel campaigns and find their relevant audience seamlessly  We built a fast, high-throughput infrastructure that enables real-time analytics and decision making.
  • 35. Over 13,000 brands use our platform Over 5,000 media agencies use our platform Sizmek – Our customers
  • 36. Data processing at Sizmek  15B events per day  4 TB of data per day  150K events per second  Volatile data traffic load  During the day/week/year  During special events (e.g., Black Friday)  Increasing demand for real-time insights
  • 37. NXT generation of Analytics at Sizmek Flexible Faster Simple
  • 38. Sizmek Data Platform on AWS (architecture slide): ad servers and users send events into an incoming Kinesis stream, EMR processes them into an enriched Kinesis stream, and the data lands in a data lake store.
  • 40. Why Amazon Kinesis?  Apache Kafka -> Amazon Kinesis = managed service = no ops effort  Integration with Apache Storm and AWS Lambda  Integration with Amazon S3, Amazon Redshift and other AWS services  Meets our throughput needs
  • 41. Tip 1: Amazon Kinesis Apache Storm connector. When a bolt has tried to process a message the maximum number of retries, insert the unprocessed message back into the stream as a new message.
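Sizmek's exact connector changes are not shown in the deck; the sketch below is one way to express that re-queue pattern in a Storm bolt (package names assume Storm 1.x, and the stream name, tuple field names, and retry threshold are all hypothetical):

import java.nio.ByteBuffer;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordRequest;

public class RequeueOnFailureBolt extends BaseRichBolt {
    private static final int MAX_RETRIES = 3;          // illustrative threshold
    private transient OutputCollector collector;
    private transient AmazonKinesis kinesis;

    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.kinesis = AmazonKinesisClientBuilder.defaultClient();
    }

    public void execute(Tuple tuple) {
        int attempts = tuple.getIntegerByField("attempts");   // hypothetical field set by the spout
        try {
            process(tuple);                                    // application-specific logic
            collector.ack(tuple);
        } catch (Exception e) {
            if (attempts >= MAX_RETRIES) {
                // Put the raw payload back on the stream as a brand-new record.
                kinesis.putRecord(new PutRecordRequest()
                        .withStreamName("incoming-events")     // hypothetical stream name
                        .withPartitionKey(tuple.getStringByField("partitionKey"))
                        .withData(ByteBuffer.wrap(tuple.getBinaryByField("payload"))));
                collector.ack(tuple);                          // treat the tuple as handled
            } else {
                collector.fail(tuple);                         // let the spout replay it
            }
        }
    }

    private void process(Tuple tuple) { /* ... */ }

    public void declareOutputFields(OutputFieldsDeclarer declarer) { }
}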
  • 42. Tip 2: putRecord = more shards!  Requirements:  Apache Storm requires processing event by event  Average event size is 1 KB  We must use the putRecord API  Testing results:  We reached about 75% of the Amazon Kinesis maximum throughput limit  300–500 shards  Conclusion:  Provision roughly 25% more shards than the limits suggest  Always test your workload
  • 43. Tip 3: Error handling on record level Date: <Date> { "FailedRecordCount": 2, "Records": [ { "SequenceNumber": "49543463076548007577105092703039560359975228518395012686", "ShardId": "shardId-000000000000" }, { "ErrorCode": "ProvisionedThroughputExceededException", "ErrorMessage": "Rate exceeded for shard shardId-000000000001 in stream <StreamName> under account 111111111111." }, { "ErrorCode": "InternalFailure", "ErrorMessage": "Internal service failure." } ] } Kinesis putRecords (bulk insert) response
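To act on that per-record error information, a common pattern is to retry only the entries that failed; a minimal sketch with the AWS SDK for Java (backoff between attempts is omitted for brevity):

import java.util.ArrayList;
import java.util.List;
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.model.PutRecordsRequest;
import com.amazonaws.services.kinesis.model.PutRecordsRequestEntry;
import com.amazonaws.services.kinesis.model.PutRecordsResult;
import com.amazonaws.services.kinesis.model.PutRecordsResultEntry;

public class PutRecordsWithRetry {

    // Retry only the entries that failed, as reported per record in the PutRecords response.
    public static void putWithRetry(AmazonKinesis kinesis, String streamName,
                                    List<PutRecordsRequestEntry> entries, int maxAttempts) {
        List<PutRecordsRequestEntry> pending = entries;
        for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
            PutRecordsResult result = kinesis.putRecords(new PutRecordsRequest()
                    .withStreamName(streamName)
                    .withRecords(pending));
            if (result.getFailedRecordCount() == 0) {
                return;
            }
            List<PutRecordsRequestEntry> failed = new ArrayList<PutRecordsRequestEntry>();
            List<PutRecordsResultEntry> responses = result.getRecords();
            for (int i = 0; i < responses.size(); i++) {
                // e.g. ProvisionedThroughputExceededException or InternalFailure
                if (responses.get(i).getErrorCode() != null) {
                    failed.add(pending.get(i));
                }
            }
            pending = failed;   // consider backing off before the next attempt
        }
    }
}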
  • 44. Tip 4: Shard level metrics is your friend
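Shard-level metrics are opt-in per stream; a minimal sketch of enabling a few of them with the AWS SDK for Java (the stream name is illustrative, and enhanced metrics carry additional CloudWatch cost):

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.EnableEnhancedMonitoringRequest;

public class EnableShardMetrics {
    public static void main(String[] args) {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

        // Publish per-shard CloudWatch metrics for the stream.
        kinesis.enableEnhancedMonitoring(new EnableEnhancedMonitoringRequest()
                .withStreamName("incoming-events")          // hypothetical stream name
                .withShardLevelMetrics("IncomingBytes",
                                       "WriteProvisionedThroughputExceeded",
                                       "IteratorAgeMilliseconds"));
    }
}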
  • 45. Thank You! & Yes we’re hiring 
  • 46. Conclusion • Amazon Kinesis offers: a managed service to build applications, streaming data ingestion, and continuous processing • Ingest and aggregate data using the Amazon Kinesis Producer Library • Process data using the Amazon Kinesis Client Library, the Kinesis Connector Library, and open-source connectors • Determine your partition key strategy • Try out Amazon Kinesis at http://aws.amazon.com/kinesis/
  • 47. • Technical documentation • Amazon Kinesis Agent • Amazon Kinesis Streams and Spark Streaming • Amazon Kinesis Producer Library Best Practices • Amazon Kinesis Firehose and AWS Lambda • Building a Near Real-Time Discovery Platform with Amazon Kinesis • Public case studies • Glu Mobile – Real-Time Analytics • Hearst Publishing – Clickstream Analytics • How Sonos Leverages Amazon Kinesis • Nordstrom Online Stylist Reference
  • 48.

Editor's Notes

  1. Speed matters in business! To capture “Perishable Insights”: Insights that can provide incredible value but the value expires and evaporates once the moment is gone. Mike Gualtieri, Principal Analyst at Forrester Research
  2. The benefits of real-time processing apply to pretty much every scenario where new data is being generated on a continual basis. We see it pervasively across all the big data areas we’ve looked at, and in pretty much every industry segment whose customers we’ve spoken to. Customers tend to start with mundane, unglamorous applications, such as system log ingestion and processing. They then progress over time to ever-more sophisticated forms of real-time processing. Initially they may simply process their data streams to produce simple metrics and reports, and perform simple actions in response, such as emitting alarms over metrics that have breached their warning thresholds. Eventually they start to do more sophisticated forms of data analysis in order to obtain deeper understanding of their data, such as applying machine learning algorithms. In the long run they can be expected to apply a multiplicity of complex stream and event processing algorithms to their data. We then looked at specific use cases. Alert when public sentiment changes: customers want to respond quickly to social media feeds (Twitter firehose: ~450 million events per day, or ~10.1 MB/sec). Respond to changes in the stock market: customers want to compute value-at-risk or rebalance portfolios based on stock prices (NASDAQ Ultra Feed: 230 GB/day, or ~8.1 MB/sec). Aggregate data before loading into an external data store: customers have billions of clickstream records arriving continuously from servers, and large, aggregated PUTs to S3 are cost effective (Internet marketing company: 20 MB/sec). Recommendation and personalization engines.
  3. Challenges with Streaming Data 1. Variety: logs, clickstream, sensors, phones, transactions. Different types of data often arrive in different formats: JSON, CSV, TSV, and more. Streaming data also comes in a variety of sizes, from bytes to KB or MB. 2. Streaming data is not about volume, it is about velocity. You have rapid data arriving continuously and simultaneously from thousands of data sources. The challenge is to handle such variety and velocity: you need a fleet of servers to capture the data, buffer it, and batch it reliably and efficiently. And you need to monitor and maintain these servers. Just imagine if one of these servers goes down. Data is still streaming in and needs to be captured and buffered. And when your servers are back up again, you need a mechanism for checkpointing and catching up from where you left off.
  4. There are two ways to process streaming data: 1. Stream processing (also called real time): this gives you the ability to respond in real time to events in data streams. Examples: proactively detect hardware errors in device logs; notify when inventory drops below a threshold. 2. Micro-batching (near real time): operations on small batches of events in data streams. Examples: identify fraud from activity logs; monitor performance SLAs.
  5. Kinesis is the platform for Streaming Data on AWS and is usually the central technology component in streaming data and real-time applications.
  6. Since the launch of Amazon Kinesis in 2013, the ecosystem has evolved: we introduced Kinesis Firehose, and Kinesis Analytics will be launched later this year.
  7. [2 minutes] Streaming data processing has two layers: a storage layer and a processing layer. The storage layer needs to support specialized ordering and consistency semantics that enable fast, inexpensive, and replayable reads and writes of large streams of data. Amazon Kinesis Streams is the storage layer. The processing layer is responsible for reading data from the storage layer, processing that data, and notifying the storage layer to delete data that is no longer needed. Kinesis supports the processing layer: customers compile the Kinesis Client Library into their data processing application, and Kinesis notifies the application (the Kinesis worker) when there is new data to process. The Kinesis control plane works with Kinesis workers to solve scalability and fault tolerance problems in the processing layer.
  8. A moving time-window buffer; the oldest data expires after 24 hours.
  9. A moving time-window buffer; the oldest data expires after 24 hours.
  10. Some types of analysis do not require events to be processed in the exact order in which they were generated. For example, a component that calculates the total number of sales made on an e-Commerce site can process a sale made 5 minutes ago after one made 1 minute ago; the net effect is that 2 sales will be counted. If analysis does not rely on strict ordering of events, the processing can be easily distributed. Different workers on different servers could capture the sales from 5 minutes ago and 1 minute ago. Combining the workers’ counts will give the correct overall picture of 2 sales. This type of analysis can easily be implemented using the Amazon Kinesis Client Library. Events sent to Amazon Kinesis are distributed over many shards. A worker is instantiated for each shard and the results from each worker are combined to produce the final metric. Other analysis does require strict ordering of events. For example, providing an up-sell offer to a customer may require knowledge of the exact order in which items were added to their basket. This type of analysis can still be distributed across multiple workers, but it is essential that all the events relating to a specific basket be sent to the same worker. The worker will have a complete view of all changes made to the basket during the session and be able to make recommendations. In Amazon Kinesis, the partition key can be used to ensure events are grouped onto the same shard and read by the same worker.
  11. The KCL automatically creates a DynamoDB table for each Kinesis application to track and maintain state information. The KCL uses the name of the Amazon Kinesis application as the name of the DynamoDB table. Each application name must be unique within a region. Your account is charged for this DynamoDB usage. Each row in the DynamoDB table represents a shard in the Kinesis stream. Application state data: Shard ID: hash key for the shard that is being processed by your application. Checkpoint: the most recent checkpoint sequence number for the shard. Worker ID: the ID of the worker that is processing the shard. Lease/Heartbeat: a count that is periodically incremented by the record processor. The read/write throughput for the DynamoDB table depends on the number of shards and how often your application checkpoints. Once a shard is completely processed, the record processor that was processing it should receive a call to its shutdown method with the reason "TERMINATE"; clients should call checkpointer.checkpoint() when the shutdown reason is "TERMINATE" so that the end of the shard is checkpointed. If cleanupLeasesUponShardCompletion is set to true in your KinesisClientLibConfiguration, this will allow the shard sync task to clean up the lease for that shard. You can check your lease table in DynamoDB; it will be named the same as the application name provided to your Workers. In that table you can confirm whether the leases for the closed shards have been checkpointed at SHARD_END. If the Amazon DynamoDB table for your Amazon Kinesis application does not exist when the application starts up, one of the workers creates the table and calls the describeStream method to populate the table. Throughput: if your Amazon Kinesis application receives provisioned-throughput exceptions, you should increase the provisioned throughput for the DynamoDB table. The KCL creates the table with a provisioned throughput of 10 reads per second and 10 writes per second, but this might not be sufficient for your application. For example, if your Amazon Kinesis application does frequent checkpointing or operates on a stream that is composed of many shards, you might need more throughput.
  12. These frameworks make it easy to implement real-time analytics, but understanding how they work together helps you optimize performance. 
  13. Amazon Kinesis integration with Apache Spark is via Spark Streaming. Spark Streaming is an extension of the core Spark framework that enables scalable, high-throughput, fault-tolerant stream processing of data streams such as Amazon Kinesis Streams. Spark Streaming provides a high-level abstraction called a Discretized Stream or DStream, which represents a continuous sequence of RDDs. Spark Streaming uses the Amazon Kinesis Client Library (KCL) to consume data from an Amazon Kinesis stream. The KCL takes care of many of the complex tasks associated with distributed computing, such as load balancing, failure recovery, and check-pointing. Think of Spark Streaming as two main components: Fetching data from the streaming sources into DStreams Processing data in these DStreams as batches
  14. Ensure that the number of Amazon Kinesis receivers created are a multiple of executors so that they are load balanced evenly across all the executors. Ensure that the total processing time is less than the batch interval. Use the number of executors and number of cores per executor parameters to optimize parallelism and use the available resources efficiently. Be aware that Spark Streaming uses the default of 1 sec with KCL to read data from Amazon Kinesis. For reliable at-least-once semantics, enable the Spark-based checkpoints and ensure that your processing is idempotent (recommended) or you have transactional semantics around your processing. Ensure that you’re using Spark version 1.6 or later with the EMRFS consistent view option, when using Amazon S3 as the storage for Spark checkpoints. Ensure that there is only one instance of the application running with Spark Streaming, and that multiple applications are not using the same DynamoDB table (via the KCL).
  15. AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you, making it easy to build applications that respond quickly to new information. AWS Lambda starts running your code within milliseconds of an event such as an image upload, in-app activity, website click, or output from a connected device. You can also use AWS Lambda to create new back-end services where compute resources are automatically triggered based on custom requests. With AWS Lambda you pay only for the requests served and the compute time required to run your code. Billing is metered in increments of 100 milliseconds, making it cost-effective and easy to scale automatically from a few requests per day to thousands per second.
  16. Sizmek – Who? What? How?; Our challenges; Our technology stack; Sizmek data platform high-level architecture; High-level requirements of ESP; Apache Storm on top of AWS; Lessons learned
  17. Our journey in data analytics.
  18. Faster time to insight.
  19. Kinesis carries the raw data; a second Kinesis stream carries enriched data and insights. Next to be added: Kinesis Analytics and Spark Streaming. Clean data lands in an S3 data lake.
  20. Apache Storm – stream processing technology. We chose Storm for the following reasons: familiarity with Storm from our on-premises deployment with Kafka; there was a ready-to-use Storm connector for Kinesis; Spark Streaming wasn’t ready.
  21. Apache Storm – stream processing technology. We chose Storm for the following reasons: familiarity with Storm from our on-premises deployment with Kafka; there was a ready-to-use Storm connector for Kinesis; Spark Streaming wasn’t ready. We added the capability to return a row of data back into the stream after a failure (only due to resource unavailability). Ask Roy – will this code be pushed to the publicly available spout in open source?
  22. The limits are per shard but are not exposed through APIs. Because Storm processes event by event, we have to work per record and use the putRecord API. We eventually had to allocate more shards than the published limits alone would suggest. Amazon Kinesis Streams write limits: up to 1,000 records per second for writes per shard, and up to a maximum total data write rate of 1 MB per second per shard.