Amazon Kinesis provides services for you to work with streaming data on AWS. Learn how to load streaming data continuously and cost-effectively to Amazon S3 and Amazon Redshift using Amazon Kinesis Firehose without writing custom stream processing code. Get an introduction to building custom stream processing applications with Amazon Kinesis Streams for specialized needs.
2. What to expect from this session
Amazon Kinesis: Getting started with streaming data on AWS
• Streaming scenarios
• Amazon Kinesis Streams overview
• Amazon Kinesis Firehose overview
• Firehose experience for Amazon S3 and Amazon Redshift
3. Amazon Kinesis: Streaming data made easy
Services make it easy to capture, deliver, and process streams on AWS
• Amazon Kinesis Streams: Build your own custom applications that process or analyze streaming data
• Amazon Kinesis Firehose: Easily load massive volumes of streaming data into Amazon S3 and Amazon Redshift
• Amazon Kinesis Analytics (in preview): Easily analyze data streams using standard SQL queries
4. What to expect from this session
Amazon Kinesis streaming data in the AWS cloud
• Amazon Kinesis Streams
• Amazon Kinesis Firehose (focus of this session)
• Amazon Kinesis Analytics (in preview)
5. Streaming data scenarios across segments
Scenarios: (1) Accelerated Ingest-Transform-Load, (2) Continual Metrics Generation, (3) Responsive Data Analysis
Data types: IT logs, application logs, social media/clickstreams, sensor or device data, market data
• Ad/marketing tech: publisher and bidder data aggregation → advertising metrics like coverage, yield, and conversion → analytics on user engagement with ads, optimized bid/buy engines
• IoT: sensor and device telemetry data ingestion → IT operational metrics dashboards → sensor operational intelligence, alerts, and notifications
• Gaming: online customer engagement data aggregation → consumer engagement metrics for level success, transition rates, CTR → clickstream analytics, leaderboard generation, player-skill match engines
• Consumer engagement: online customer engagement data aggregation → consumer engagement metrics like page views and CTR → clickstream analytics, recommendation engines
6. Amazon Kinesis: Streaming data done the AWS way
Makes it easy to capture, deliver, and process real-time data streams
Pay as you go, no up-front costs
Elastically scalable
Right services for your specific use cases
Real-time latencies
Easy to provision, deploy, and manage
7. Amazon Kinesis Streams
Build your own data streaming applications
Easy administration: Simply create a new stream and set the desired level of capacity
with shards. Scale to match your data throughput rate and volume.
Build real-time applications: Perform continual processing on streaming big data using
Amazon Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more.
Low cost: Cost-efficient for workloads of any scale.
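The shard model above can be made concrete. As a rough sketch (the function name and the equal-sized ranges are illustrative assumptions; real shards carry explicit hash-key ranges), Kinesis routes each record by MD5-hashing its partition key into a 128-bit integer and finding the shard whose hash-key range contains it:

```python
import hashlib

def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Illustrative sketch: MD5-hash the partition key to a 128-bit
    integer, then map it into one of num_shards equal-sized hash-key
    ranges (real shards expose StartingHashKey/EndingHashKey)."""
    hash_int = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2 ** 128 // num_shards
    return min(hash_int // range_size, num_shards - 1)
```

Because the hash is deterministic, records with the same partition key always land on the same shard, which is what preserves per-key ordering for consumers.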
8. Sending and reading data from Streams
Sending: AWS SDK, LOG4J, Flume, Fluentd, AWS Mobile SDK, Amazon Kinesis Producer Library
Consuming: Get* APIs, Amazon Kinesis Client Library + Connector Library, Apache Storm, Apache Spark, AWS Lambda, Amazon Elastic MapReduce
9. Real-time streaming data ingestion
Amazon Kinesis Streams: managed service for real-time processing by custom-built streaming applications
Inexpensive: $0.014 per 1,000,000 PUT payload units
11. Amazon Kinesis Streams: select new features
• Amazon Kinesis Producer Library
• PutRecords API: up to 500 records or 5 MB of payload per request
• Amazon Kinesis Client Library in Python, Node.js, Ruby, and more
• Server-side time stamps
• Increased individual max record payload from 50 KB to 1 MB
• Reduced end-to-end propagation delay
• Extended stream retention from 24 hours to 7 days
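The PutRecords limits (500 records or 5 MB per request) mean producers typically batch client-side before calling the API. A minimal sketch of such a batcher (the helper name is ours, not part of the Kinesis API):

```python
def batch_records(records, max_count=500, max_bytes=5 * 1024 * 1024):
    """Split an iterable of bytes payloads into batches that respect
    the PutRecords limits: at most 500 records or 5 MB per request."""
    batches, current, current_bytes = [], [], 0
    for rec in records:
        # Flush the current batch if adding this record would exceed a limit.
        if current and (len(current) == max_count
                        or current_bytes + len(rec) > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(rec)
        current_bytes += len(rec)
    if current:
        batches.append(current)
    return batches
```

Each resulting batch can then be sent as one PutRecords call; the Amazon Kinesis Producer Library does this kind of aggregation (and more) for you automatically.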
13. Amazon Kinesis Firehose
Load massive volumes of streaming data into Amazon S3 and Amazon Redshift
Zero administration: Capture and deliver streaming data into Amazon S3, Amazon
Redshift, and other destinations without writing an application or managing infrastructure.
Direct-to-data store integration: Batch, compress, and encrypt streaming data for
delivery into data destinations in as little as 60 seconds using simple configurations.
Seamless elasticity: Seamlessly scales to match data throughput without intervention.
Capture and submit streaming data to Firehose → Firehose loads streaming data continuously into Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service → Analyze streaming data using your favorite BI tools
14. Amazon Kinesis Firehose
Sources: AWS platform SDKs, mobile SDKs, Amazon Kinesis Agent, AWS IoT
Destinations: Amazon S3, Amazon Redshift, Amazon Elasticsearch Service
• Send data from IT infrastructure, mobile devices, and sensors
• Integrated with the AWS SDK, agents, and AWS IoT
• Fully managed service to capture streaming data
• Elastic, with no resource provisioning
• Pay as you go: 3.5 cents/GB transferred
• Batches, compresses, and encrypts data before loads
• Loads data into Amazon Redshift tables by using the COPY command
Capture IT and app logs, device and sensor data, and more; enable near-real-time analytics using existing tools
15. Streaming data scenarios across segments
Scenarios: (1) Accelerated Ingest-Transform-Load, (2) Continual Metrics Generation, (3) Responsive Data Analysis
Data types: IT logs, application logs, social media/clickstreams, sensor or device data, market data
• Marketing tech: publisher and bidder data aggregation → advertising metrics like coverage, yield, and conversion → analytics on user engagement with ads, optimized bid/buy engines
• IoT: sensor and device telemetry data ingestion → IT operational metrics dashboards → sensor operational intelligence, alerts, and notifications
• Gaming: online customer engagement data aggregation → consumer engagement metrics for level success, transition rates, CTR → clickstream analytics, leaderboard generation, player-skill match engines
• Consumer online: online customer engagement data aggregation → consumer engagement metrics like page views and CTR → clickstream analytics, recommendation engines
16. 1. Delivery stream: The underlying entity of Firehose. Use Firehose by
creating a delivery stream to a specified destination and sending data to it.
• You do not have to create a stream or provision shards.
• You do not have to specify partition keys.
2. Records: The data producer sends data blobs as large as 1,000 KB to a
delivery stream. That data blob is called a record.
3. Data producers: Producers send records to a delivery stream. For
example, a web server that sends log data to a delivery stream is a data
producer.
Amazon Kinesis Firehose
Three simple concepts
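Putting the three concepts together, a producer's call reduces to naming the delivery stream and handing over a blob. A hedged sketch in Python (the helper below is illustrative; `put_record` taking `DeliveryStreamName` and `Record` is the Firehose API shape exposed by SDKs such as boto3):

```python
def build_put_record_request(delivery_stream: str, blob: bytes) -> dict:
    """Illustrative helper: enforce the 1,000 KB record limit and build
    the parameters for a Firehose PutRecord call."""
    if len(blob) > 1000 * 1024:
        raise ValueError("record exceeds the 1,000 KB Firehose limit")
    return {"DeliveryStreamName": delivery_stream, "Record": {"Data": blob}}
```

With boto3 this would be used as `boto3.client("firehose").put_record(**build_put_record_request("my-delivery-stream", b"log line\n"))` (stream name hypothetical). Note what is absent compared to Streams: no shards to provision and no partition keys to choose.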
17. Amazon Kinesis Firehose console experience
Unified console experience for Firehose and Streams
18. Amazon Kinesis Firehose console (S3)
Create fully managed resources for delivery without building an app
19. Amazon Kinesis Firehose console (S3)
Configure data delivery options simply using the console
20. Amazon Kinesis Firehose console (Amazon Redshift)
Configure data delivery to Amazon Redshift simply using the console
22. Amazon Kinesis Agent
Software agent makes submitting data to Firehose easy
• Monitors files and sends new data records to your delivery stream
• Handles file rotation, checkpointing, and retries upon failure
• Preprocessing capabilities such as format conversion and log parsing
• Delivers all data in a reliable, timely, and simple manner
• Emits Amazon CloudWatch metrics to help you better monitor and
troubleshoot the streaming process
• Supported on Amazon Linux AMI with version 2015.09 or later, or Red Hat
Enterprise Linux version 7 or later; install on Linux-based server
environments such as web servers, front ends, log servers, and more
• Also enabled for Streams
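In practice the agent is driven by a JSON configuration file (typically /etc/aws-kinesis/agent.json) that maps monitored files to streams. A minimal sketch, with placeholder file patterns and stream names:

```json
{
  "flows": [
    {
      "filePattern": "/var/log/app/*.log",
      "deliveryStream": "my-firehose-stream"
    },
    {
      "filePattern": "/var/log/other/*.log",
      "kinesisStream": "my-kinesis-stream"
    }
  ]
}
```

Each entry in `flows` tails the matching files and ships new records to the named Firehose delivery stream (`deliveryStream`) or Kinesis stream (`kinesisStream`).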
23. Amazon Kinesis Firehose pricing
Simple, pay-as-you-go, and no up-front costs
Per 1 GB of data ingested: $0.035
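At that rate the math is easy to sketch (the function below is illustrative, using the $0.035/GB price from the slide):

```python
def firehose_monthly_cost(gb_ingested_per_day: float,
                          price_per_gb: float = 0.035) -> float:
    """Estimate a 30-day Firehose ingestion bill at $0.035 per GB."""
    return round(gb_ingested_per_day * 30 * price_per_gb, 2)
```

For example, ingesting 100 GB/day works out to about $105 a month, with no up-front commitment.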
25. Amazon Kinesis Streams is a service for workloads that require
custom processing of each incoming record, with sub-second
processing latency, and a choice of stream processing frameworks.
Amazon Kinesis Firehose is a service for workloads that require
zero administration, the ability to use existing analytics tools based
on Amazon S3, Amazon Redshift, or Amazon Elasticsearch
Service, and a data latency of 60 seconds or higher.
27. Amazon Kinesis Analytics
Analyze data streams continuously with standard SQL
Apply SQL on streams: Easily connect to data streams and apply existing SQL
skills.
Build real-time applications: Perform continual processing on streaming big
data with sub-second processing latencies.
Scale elastically: Elastically scales to match data throughput without any
operator intervention.
Preview!
Connect to Amazon Kinesis streams and Firehose delivery streams → Run standard SQL queries against data streams → Amazon Kinesis Analytics can send processed data to analytics tools so you can create alerts and respond in real time
At re:Invent in October 2015, we introduced Amazon Kinesis Firehose, a fully managed service that delivers streaming data to S3, Redshift, and other destinations for downstream processing and analytics.
We will cover:
Key concepts
Experience for S3 and Redshift
Putting data using the Amazon Kinesis Agent and key APIs
Pricing
Data delivery patterns
Understanding key metrics
Troubleshooting
At re:Invent in November 2013, we introduced Amazon Kinesis to the world:
a platform of services for working with streaming data.
Easy to use: Focus on quickly launching data streaming applications instead of managing infrastructure.
Flexible: Choose the service, or combination of services, for your specific data streaming use cases.
Add to the beginning – and then deep dive.
The benefits of real-time processing apply to pretty much every scenario where new data is being generated on a continual basis.
We see it pervasively across all the big data areas we’ve looked at, and in pretty much every industry segment whose customers we’ve spoken to.
Customers tend to start with mundane, unglamorous applications, such as system log ingestion and processing.
They then progress over time to ever-more sophisticated forms of real-time processing.
Initially they may simply process their data streams to produce simple metrics and reports, and perform simple actions in response, such as emitting alarms over metrics that have breached their warning thresholds.
Eventually they start to do more sophisticated forms of data analysis in order to obtain deeper understanding of their data, such as applying machine learning algorithms.
In the long run they can be expected to apply a multiplicity of complex stream and event processing algorithms to their data.
We then looked outside the company for specific use cases:
Alert when public sentiment changes
Customers want to respond quickly to social media feeds
Twitter Firehose: ~450 million events per day, or 10.1 MB/sec
Respond to changes in the stock market
Customers want to compute value-at-risk or rebalance portfolios based on stock prices
NASDAQ Ultra Feed: 230 GB/day, or 8.1 MB/sec
Aggregate data before loading in external data store
Customers have billions of click stream records arriving continuously from servers
Large, aggregated PUTs to S3 are cost effective
Internet marketing company: 20 MB/sec
Recommendation and Personalization engines
Easy to use: Focus on quickly launching data streaming applications instead of managing infrastructure.
Real-Time: Collect real-time data streams and promptly respond to key business events and operational triggers.
Flexible: Choose the service, or combination of services, for your specific data streaming use cases.
Take the example of clickstream analytics. Data from websites is sent to Kinesis Streams, which stores the data and makes it available for processing. Multiple consumers can read and process this data, e.g., Spark Streaming, Storm, and KCL applications. The data is then used by algorithms and models to personalize content and improve reader engagement.
The benefits of using Kinesis Streams:
Easy administration: Simply create a new stream, and set the desired level of capacity with shards. Scale to match your data throughput rate and volume.
Build real-time applications: Perform continual processing on streaming big data using Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more.
Low cost: Cost-efficient for workloads of any scale.
Lots of ways to process streaming data, just like there are lots of ways to process batch data with Hadoop
Open source from AWS
Amazon Kinesis Streams enables you to build custom applications that process or analyze streaming data for specialized needs. It collects and stores streaming data reliably. Your applications can then consume this data.
Based on these requests, we are excited to add a new storage class: Standard-Infrequent Access (Standard-IA).
Standard-IA is a new, lower-cost Amazon Simple Storage Service (Amazon S3) storage class for data that is accessed less frequently and does not require the highest levels of availability.
Standard-IA joins Standard and Glacier as one of three storage classes that customers can tier data between, all designed for 99.999999999% durability.
Standard will be used for low-latency, high-throughput, highly available workloads that frequently access data, such as business applications, dynamic websites, content distribution, and big data analytics.
Standard-IA offers the high durability, low latency, and throughput of Amazon S3 Standard, and is ideal for data that is accessed less frequently, is long lived, and does not require the highest levels of availability, such as backup/archive, consumer file storage, and long-retained data.
For customers whose data does not need to be retrieved instantaneously and can tolerate a 4-hour retrieval time, Amazon Glacier is still the lowest-cost Amazon S3 storage option.
How we think about storage classes:
Customers should be able to keep the same object name and bucket regardless of the storage class, so that they don't have to move objects to different buckets or rename them to optimize for cost.
Therefore, storage class is a property of an object, and customers should be able to switch it at any time.
We also enable lifecycle policies.
Key talking points
This is under NDA
From early success in ad tech, mobile, and social gaming to broader content analytics, including traditional publishers like The Guardian, news digital media, POPSUGAR, and IoT use cases
Customers are now in production worldwide (Australia, UK, and more), which is remarkable for streaming data systems and speaks to the overall ease of use compared to prior methods
Customers in black borders are going to be talking about Kinesis in different tracks at re:Invent – please attend those breakouts.
They are gathering sensor data from kaiten sushi restaurants directly into Kinesis to improve shop operations.