This document provides an overview of AWS Kinesis and its components for streaming data. It discusses Kinesis Streams for ingesting and processing streaming data at scale. Kinesis Streams uses shards to provide throughput capacity: each shard supports up to 1,000 records or 1 MB per second of ingest. Ingesting 10,000 records per second of 512 bytes each would therefore require a stream configured with 10 shards, since the record-rate limit (10,000 / 1,000 = 10 shards) dominates the bandwidth limit (5 MB/s / 1 MB/s = 5 shards). Kinesis Firehose delivers streaming data to destinations such as S3 or Redshift. Kinesis Analytics runs SQL queries on streaming data and processes it in real time.
10. AWS KINESIS STREAMS
TERMINOLOGY
▸ Streams - ordered sequence of data records
▸ Data record - Sequence Number, Partition Key, Data Blob
▸ Data blob: 1 MB max
▸ Retention period: 24 hours (default), extendable to 7 days
▸ Producers, Consumers
▸ Shards
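The partition key is what ties records to shards: Kinesis takes the MD5 hash of the key and maps the resulting 128-bit integer into one shard's hash-key range. A minimal sketch of that routing, assuming shards divide the hash space evenly (the function name and sample key are illustrative):

```python
import hashlib

def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index the way Kinesis does:
    MD5-hash the key, treat the digest as a 128-bit integer, and
    place it in one of shard_count equal hash-key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, byteorder="big")
    range_size = 2**128 // shard_count
    return min(hash_key // range_size, shard_count - 1)

# The same partition key always lands on the same shard, which is
# what preserves per-key ordering for consumers.
print(shard_for_key("ticker-AMZN", 10))
```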
26. AWS KINESIS ANALYTICS
STREAMING SQL
▸ Tumbling Window
[...] GROUP BY
FLOOR(("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP
'1970-01-01 00:00:00') SECOND / 10 TO SECOND)
▸ Sliding Window
SELECT AVG(change) OVER W1 as avg_change
FROM "SOURCE_SQL_STREAM_001"
WINDOW W1 AS (PARTITION BY ticker_symbol RANGE INTERVAL
'10' SECOND PRECEDING)
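Unlike a tumbling window, the RANGE INTERVAL '10' SECOND PRECEDING window is evaluated per row over the trailing 10 seconds, so windows overlap and every row produces an output. A Python sketch of that semantics, with hypothetical ticker data:

```python
def sliding_avg(rows, window_sec=10):
    """For each (ts, symbol, change) row, average `change` over rows of
    the same symbol within the trailing window -- mirroring
    AVG(change) OVER (PARTITION BY ticker_symbol
                      RANGE INTERVAL '10' SECOND PRECEDING)."""
    out = []
    for ts, symbol, change in rows:
        # Rows of the same partition whose timestamp is within the
        # preceding window_sec seconds (inclusive of the current row).
        window = [c for t, s, c in rows
                  if s == symbol and ts - window_sec <= t <= ts]
        out.append((ts, symbol, sum(window) / len(window)))
    return out

rows = [(0, "AMZN", 1.0), (5, "AMZN", 3.0), (20, "AMZN", 5.0)]
print(sliding_avg(rows))
# → [(0, 'AMZN', 1.0), (5, 'AMZN', 2.0), (20, 'AMZN', 5.0)]
```

The row at t=5 averages over both preceding rows, while the row at t=20 stands alone because nothing else falls in its trailing 10 seconds.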