Amazon Kinesis Analytics is the easiest way to process streaming data in real time with standard SQL, without having to learn new programming languages or processing frameworks. Amazon Kinesis Analytics enables you to create and run SQL queries on streaming data so that you can gain actionable insights and respond to your business and customer needs promptly. In this session, we will provide an overview of the capabilities of Amazon Kinesis Analytics. We will show you how you can build an entire stream processing pipeline to collect, ingest, process, and emit streaming data using Amazon Kinesis Analytics, Amazon Kinesis Firehose, and Amazon Kinesis Streams.
2. Most data is produced continuously
Mobile apps, web clickstreams, application logs, metering records, IoT sensors, smart buildings
[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
3. The diminishing value of data
Recent data is highly valuable
If you act on it in time
Perishable Insights (M. Gualtieri, Forrester)
Old + Recent data is more valuable
If you have the means to combine them
4. Processing real-time, streaming data
What are the key requirements?
• Durable
• Continuous
• Fast
• Correct
• Reactive
• Reliable
Ingest → Transform → Analyze → React → Persist
6. Amazon Kinesis Streams
Easy administration: Create a stream, set capacity level with shards. Scale to match your data throughput rate and volume.
Build real-time applications: Process streaming data with the Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more.
Low cost: Cost-efficient for workloads of any scale.
7. Amazon Kinesis Firehose
Zero administration: Capture and deliver streaming data to Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service without writing an app or managing infrastructure.
Direct-to-data-store integration: Batch, compress, and encrypt streaming data for delivery in as little as 60 seconds.
Seamless elasticity: Seamlessly scales to match data throughput without intervention.
Capture and submit streaming data to Firehose → Firehose loads it continuously into S3, Amazon Redshift, and Amazon ES → analyze streaming data using your favorite BI tools
8. Amazon Kinesis Analytics
Apply SQL on streams: Easily connect to a Kinesis stream or Firehose delivery stream and apply SQL skills.
Build real-time applications: Perform continual processing on streaming big data with sub-second processing latencies.
Easy scalability: Elastically scales to match data throughput.
Connect to Kinesis streams or Firehose delivery streams → run standard SQL queries against data streams → send processed data to analytics tools so you can create alerts and respond in real time
9. Amazon Kinesis: streaming data made easy
Services make it easy to capture, deliver, and process streams on AWS.
Kinesis Analytics (for all developers, data scientists): Easily analyze data streams using standard SQL queries.
Kinesis Firehose (for all developers, data scientists): Easily load massive volumes of streaming data into S3, Amazon Redshift, or Amazon ES.
Kinesis Streams (for technical developers): Collect and stream data for ordered, replayable, real-time processing.
11. Kinesis Analytics
Pay for only what you use
Automatic elasticity
Standard SQL for analytics
Real-time processing
Easy to use
12. Use SQL to build real-time applications
Connect to a streaming source
Easily write SQL code to process streaming data
Continuously deliver SQL results
13. Connect to streaming source
• Streaming data sources include Firehose or Streams
• Input formats include JSON, .csv, variable column, unstructured text
• Each input has a schema; the schema is inferred, but you can edit it
• Reference data sources (S3) for data enrichment
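The reference-data bullet can be sketched as a stream-to-table join, reusing the Tweets stream from this deck; the S3-backed AuthorInfo table and its columns are hypothetical names:

```sql
-- Sketch: enrich each tweet with author metadata from a reference table.
-- In Kinesis Analytics, a reference table is populated from an S3 object
-- configured on the application; AuthorInfo here is an assumed name.
SELECT STREAM t.ROWTIME, t.author, t.text, a.followers
FROM Tweets AS t
JOIN AuthorInfo AS a
  ON t.author = a.author;
```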
14. Write SQL code
• Build streaming applications with one-to-many SQL statements
• Robust SQL support and advanced analytic functions
• Extensions to the SQL standard to work seamlessly with streaming data
• Support for at-least-once processing semantics
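A minimal sketch of the one-to-many pattern, in the deck's Tweets vocabulary: a pump continuously inserts one statement's results into an in-application stream that later statements or application outputs can read (stream and pump names are illustrative).

```sql
-- Stage 1: an in-application stream to hold filtered rows.
CREATE OR REPLACE STREAM "SUMMIT_TWEETS" (author VARCHAR(20), text VARCHAR(140));

-- Stage 2: a pump that continuously feeds it from the source stream.
CREATE OR REPLACE PUMP "SUMMIT_PUMP" AS
  INSERT INTO "SUMMIT_TWEETS"
  SELECT STREAM author, text
  FROM Tweets
  WHERE text LIKE '%#AWSNYCSummit%';
```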
15. Continuously deliver SQL results
• Send processed data to multiple destinations
• S3, Amazon Redshift, Amazon ES (through Firehose)
• Streams (with AWS Lambda integration for custom destinations)
• End-to-end processing speed as low as sub-second
• Separation of processing and data delivery
16. Generate time series analytics
• Compute key performance indicators over time windows
• Combine with historical data in S3 or Amazon Redshift
[Diagram: Streams / Firehose → Analytics → Firehose → S3, Amazon Redshift; Streams → custom real-time destinations]
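A time-series sketch in the deck's Tweets vocabulary: a tumbling one-minute window built by grouping on FLOOR(ROWTIME TO MINUTE), a common Kinesis Analytics pattern (stream and column names are illustrative).

```sql
-- Emit one row per author per minute with that minute's tweet count.
SELECT STREAM FLOOR(Tweets.ROWTIME TO MINUTE) AS minute_of,
              author,
              COUNT(*) AS tweets_per_minute
FROM Tweets
GROUP BY FLOOR(Tweets.ROWTIME TO MINUTE), author;
```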
17. Feed real-time dashboards
• Validate and transform raw data, and then process to calculate meaningful statistics
• Send processed data downstream for visualization in BI and visualization services
[Diagram: Streams / Firehose → Analytics → Amazon ES, Amazon Redshift, Amazon RDS → Amazon QuickSight]
18. Create real-time alarms and notifications
• Build sequences of events from the stream, like user sessions in a clickstream or app behavior through logs
• Identify events (or a series of events) of interest, and react to the data through alarms and notifications
[Diagram: Streams / Firehose → Analytics → Streams → Amazon SNS, Amazon CloudWatch, Lambda]
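The alarm pattern can be sketched by chaining two pumps: compute a sliding one-minute count per author (as on slide 22), then keep only rows over a threshold. The threshold and all names are illustrative; in practice the alarm stream would be wired to an application output, such as a Lambda function that publishes to SNS.

```sql
-- Stage 1: sliding one-minute counts per author.
CREATE OR REPLACE STREAM "COUNT_STREAM" (author VARCHAR(20), tweet_count INTEGER);
CREATE OR REPLACE PUMP "COUNT_PUMP" AS
  INSERT INTO "COUNT_STREAM"
  SELECT STREAM author,
         COUNT(author) OVER ONE_MINUTE
  FROM Tweets
  WINDOW ONE_MINUTE AS
    (PARTITION BY author RANGE INTERVAL '1' MINUTE PRECEDING);

-- Stage 2: raise a row whenever an author exceeds 10 tweets in a minute.
CREATE OR REPLACE STREAM "ALARM_STREAM" (author VARCHAR(20), tweet_count INTEGER);
CREATE OR REPLACE PUMP "ALARM_PUMP" AS
  INSERT INTO "ALARM_STREAM"
  SELECT STREAM author, tweet_count
  FROM "COUNT_STREAM"
  WHERE tweet_count > 10;
```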
19. SQL on streaming data
• SQL is an API to your data
• Ask for what you want, and the system decides how to get it
• For all data, not just “flat” data in a database
• Opportunity for novel data organization and algorithms
• A standard (SQL:2008, SQL:2011) and the most commonly used data manipulation language
20. A simple streaming query
• Tweets about the AWS NYC Summit
• Selecting from a STREAM of tweets, an in-application stream
• Each row has a corresponding ROWTIME
SELECT STREAM ROWTIME, author, text
FROM Tweets
WHERE text LIKE '%#AWSNYCSummit%'
21. A streaming table is a STREAM
• In relational databases, you work with SQL tables
• With Analytics, you work with STREAMS
• SELECT, INSERT, and CREATE can be used with STREAMs
CREATE STREAM Tweets
(author VARCHAR(20),
text VARCHAR(140));
INSERT INTO Tweets
SELECT …
22. Writing queries on unbounded data sets
• Streams are unbounded data sets
• Need continuous queries, row-by-row or across rows
• WINDOWs define a start and end to the query
SELECT STREAM author,
count(author) OVER ONE_MINUTE
FROM Tweets
WINDOW ONE_MINUTE AS
(PARTITION BY author
RANGE INTERVAL '1' MINUTE PRECEDING);
24. Real-time analytical patterns
• Pre-processing: filtering, transformations
• Basic analytics: simple counts, aggregates over windows
• Advanced analytics: detecting anomalies, event
correlation
• Post-processing: alerting, triggering, final filters
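For the advanced-analytics bullet, Kinesis Analytics includes a RANDOM_CUT_FOREST function for machine-learning-based anomaly scoring. A sketch, assuming a hypothetical in-application stream "TWEET_COUNTS" with a numeric tweet_count column:

```sql
-- Each input row comes back with an ANOMALY_SCORE column appended;
-- unusually high or low counts score higher.
SELECT STREAM tweet_count, ANOMALY_SCORE
FROM TABLE(RANDOM_CUT_FOREST(
  CURSOR(SELECT STREAM tweet_count FROM "TWEET_COUNTS")));
```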
25. Simple pricing model
• Analytics elastically scales based upon your data throughput and query complexity
• Automatically provisions Kinesis Processing Units (KPUs); each KPU represents 1 vCPU and 4 GB of memory
• A KPU-hour is $0.11 in us-east-1, so one KPU running continuously costs roughly $0.11 × 730 hours ≈ $80/month
• Approximate examples
• Filtering events to different destinations for a 1 MB/sec stream is ~$80/month (about one KPU)
• Aggregations using a 1-minute window for a 5 MB/sec stream is ~$150/month