Processing streaming data is becoming increasingly important to many organisations that need to analyse incoming data both in near real time and in batch. In this session we will look at best practices and patterns for analysing streaming data with Amazon Kinesis Streams, Amazon Kinesis Firehose and Amazon Kinesis Analytics.
Speaker: Johnathon Meichtry, Principal Solutions Architect, Amazon Web Services
2. Most Data is Produced Continuously
• Mobile apps
• Web clickstream
• Application logs
• Metering records
• IoT sensors
• Smart buildings
Example log record:
[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
3. Modern Data Architecture
[Diagram components: Sources; Ingest; Speed (Real-time); Scale (Batch); Serving. Consumers shown: data analysts, data scientists, business users, engagement platforms, automation / events.]
4. Real-time Pipeline
[Diagram components: Sources (machines, devices, mobile, clickstream); Amazon Kinesis; Ingest; Speed (Real-time); Scale (Batch); Serving. Consumers shown: data analysts, data scientists, business users, engagement platforms, automation / events.]
6. Amazon Kinesis Streams
• Ingest and store streaming data at low cost
• Build custom real-time applications to process streaming data
7. Amazon Kinesis Firehose
• Reliably ingest and deliver batched, compressed, and encrypted data to
S3, Amazon Redshift, and Amazon Elasticsearch Service (Amazon ES)
• Point-and-click setup with zero administration and seamless elasticity
8. Amazon Kinesis Analytics
• Interact with streaming data in real time using SQL
• Build fully managed and elastic stream processing applications that
process data for real-time visualisations and alarms
11. Amazon Kinesis Stream
Shards   Write capacity         Read capacity
1        1000 TPS or 1 MB/s     5 TPS or 2 MB/s
2        2000 TPS or 2 MB/s     10 TPS or 4 MB/s
3        3000 TPS or 3 MB/s     15 TPS or 6 MB/s
Retention: 24 hours to 7 days
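The per-shard figures on this slide can be turned into a rough sizing calculation. The sketch below reads the first pair of figures as the write limits (1000 records or 1 MB per second) and the second pair as the read limits (2 MB per second), and assumes every consumer application reads the whole stream and shares the read limit. The function name `shards_needed` and its parameters are illustrative, not part of any AWS API.

```python
import math

# Per-shard limits as listed on the slide.
WRITE_RECORDS_PER_SEC = 1000   # records (TPS) one shard accepts per second
WRITE_MB_PER_SEC = 1.0         # MB one shard accepts per second
READ_MB_PER_SEC = 2.0          # MB one shard serves per second


def shards_needed(records_per_sec, avg_record_kb, consumers=1):
    """Return the smallest shard count that satisfies all three limits."""
    ingest_mb_per_sec = records_per_sec * avg_record_kb / 1024
    by_record_rate = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC)
    by_write_bytes = math.ceil(ingest_mb_per_sec / WRITE_MB_PER_SEC)
    # Assumption: each consumer application reads the full stream and
    # all consumers share the per-shard read limit.
    by_read_bytes = math.ceil(ingest_mb_per_sec * consumers / READ_MB_PER_SEC)
    return max(1, by_record_rate, by_write_bytes, by_read_bytes)


if __name__ == "__main__":
    # 2500 records/s of 2 KB each, read by two consumer applications.
    print(shards_needed(2500, 2, consumers=2))
```

Whichever limit is hit first determines the shard count, so small records tend to be bound by the record rate and large records by the byte rate.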
14. Kinesis Producers use a PUT call to store data in
a Stream. Each record <= 1 MB
PutRecord {Data,StreamName,PartitionKey}
PutRecords {Records{Data,PartitionKey}, StreamName}
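As a minimal sketch of how a producer might prepare PutRecords calls, the code below groups entries shaped like the `Records` parameter above into batches. It assumes the documented PutRecords limits of 500 entries and 5 MB per request; counting the partition key against the 1 MB per-record limit is a conservative assumption. The helper names `entry_size` and `batch_records` are hypothetical, and no AWS call is made here.

```python
MAX_RECORDS_PER_CALL = 500            # PutRecords limit on entries per request
MAX_BYTES_PER_CALL = 5 * 1024 * 1024  # PutRecords limit on request payload
MAX_BYTES_PER_RECORD = 1024 * 1024    # "Each record <= 1 MB"


def entry_size(entry):
    """Bytes counted against the limits: data blob plus partition key."""
    return len(entry["Data"]) + len(entry["PartitionKey"].encode("utf-8"))


def batch_records(entries):
    """Yield lists of entries, each small enough for one PutRecords call."""
    batch, batch_bytes = [], 0
    for entry in entries:
        size = entry_size(entry)
        if size > MAX_BYTES_PER_RECORD:
            raise ValueError("record exceeds 1 MB and cannot be PUT")
        # Start a new batch when either the count or the byte limit is reached.
        if batch and (len(batch) == MAX_RECORDS_PER_CALL
                      or batch_bytes + size > MAX_BYTES_PER_CALL):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(entry)
        batch_bytes += size
    if batch:
        yield batch


if __name__ == "__main__":
    entries = [{"Data": b"x" * 100, "PartitionKey": f"user-{i}"}
               for i in range(1200)]
    for batch in batch_records(entries):
        # Each batch could be passed as Records=batch to a PutRecords call.
        print(len(batch))
```

A real producer would also need to inspect the response of each call, since PutRecords can succeed for some entries and fail for others.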
A Partition Key is supplied by the producer and used to distribute the PUTs across Shards.
A unique Sequence # is returned to the Producer upon a successful PUT call.
The Kinesis Producer Library aggregates records before they are PUT.
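To illustrate how a partition key distributes PUTs across shards: Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch below reproduces that mapping locally. It assumes the hash key space is split into equal ranges, which is an illustration only; the function names are hypothetical.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 digests are 128-bit integers


def hash_key(partition_key):
    """Map a partition key to its 128-bit hash key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_ranges(shard_count):
    """Split the hash key space into contiguous, inclusive (start, end) ranges.

    Assumption: equal-sized ranges, used here purely for illustration.
    """
    step = HASH_KEY_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_KEY_SPACE - 1 if i == shard_count - 1 else start + step - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the index of the range that contains the key's hash."""
    key = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= key <= end:
            return index
    raise ValueError("hash key outside every range")


if __name__ == "__main__":
    ranges = shard_ranges(3)
    for device in ("device-1", "device-2", "device-3", "device-4"):
        print(device, "-> shard index", shard_for(device, ranges))
```

Because the mapping is deterministic, all records with the same partition key land on the same shard, so a small number of very busy keys can overload one shard while others sit idle.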
15. Kinesis Agent
• Monitors files and sends new data records to your delivery stream
• Monitors multiple directories
• Writes to multiple streams
• Handles file rotation, checkpointing, and retry upon failures
• Delivers all data in a reliable, timely, and simple manner
• Pre-processes data
• Emits Amazon CloudWatch metrics
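The agent is driven by a JSON configuration file. As a rough sketch, the script below builds such a configuration with two flows, one writing to a Kinesis stream and one to a Firehose delivery stream with log-to-JSON pre-processing, and prints it as JSON. The file patterns and stream names are hypothetical placeholders; the key names follow the agent's documented settings as best recalled and should be checked against the agent documentation before use.

```python
import json

agent_config = {
    # Emit agent metrics to CloudWatch.
    "cloudwatch.emitMetrics": True,
    "flows": [
        {
            # Hypothetical directory and stream name.
            "filePattern": "/var/log/app/*.log",
            "kinesisStream": "example-stream",
        },
        {
            # Hypothetical access-log location and delivery stream name.
            "filePattern": "/var/log/httpd/access_log*",
            "deliveryStream": "example-delivery-stream",
            # Pre-process each Apache common-log line into a JSON record.
            "dataProcessingOptions": [
                {"optionName": "LOGTOJSON", "logFormat": "COMMONAPACHELOG"}
            ],
        },
    ],
}

config_text = json.dumps(agent_config, indent=2)
print(config_text)
```

Each entry under `flows` pairs one file pattern with one destination, which is how a single agent can monitor several directories and write to several streams.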