
AWS Kinesis Streams

Real-time processing with AWS Kinesis Streams



  1. Amazon Kinesis. Fernando Rodriguez Olivera, @frodriguez. AWS User Group Argentina, Buenos Aires, Argentina, Dec 2015.
  2. Fernando Rodriguez Olivera. Twitter: @frodriguez. Professor at Universidad Austral (Distributed Systems, Compiler Design, Operating Systems, …). Creator of mvnrepository.com. Organizer at Buenos Aires High Scalability Group.
  3. Amazon Kinesis Streams: a high-throughput, low-latency service for real-time data processing over large, distributed data streams.
  4. Kinesis Streams: producers write records to a Kinesis stream, and consumer applications (App #1, App #2) read from it. Data retention is configurable between 24 and 168 hrs; the service is designed for < 1 sec latency.
  5. Shards: a stream is made up of shards (Shard 1, Shard 2, Shard 3) behind the Kinesis endpoints. Records annotated with the same partition key (PK) are stored in the same shard (e.g. all PK1 records land in one shard, all PK9 records in another).
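Under the hood, Kinesis routes a record by taking the MD5 hash of its partition key, treating the digest as an unsigned 128-bit integer, and storing the record in the shard whose hash key range contains that value. A minimal sketch of the mapping, assuming the shards split the hash space evenly; `ShardMapper` and its method names are illustrative, not part of the SDK:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ShardMapper {
    // The stream's hash key space: [0, 2^128 - 1]
    static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    // Kinesis hashes the partition key with MD5 and treats the
    // digest as an unsigned 128-bit integer
    static BigInteger hashKey(String partitionKey) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, md5);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 ships with every JDK
        }
    }

    // Assuming numShards shards that split the hash space evenly,
    // find the shard whose range contains the hashed key
    static int shardFor(String partitionKey, int numShards) {
        BigInteger rangeSize = HASH_SPACE.divide(BigInteger.valueOf(numShards));
        int idx = hashKey(partitionKey).divide(rangeSize).intValue();
        return Math.min(idx, numShards - 1); // guard rounding at the top of the range
    }

    public static void main(String[] args) {
        // Records with the same partition key always land in the same shard
        System.out.println("PK1 -> shard " + shardFor("PK1", 3));
        System.out.println("PK9 -> shard " + shardFor("PK9", 3));
    }
}
```

This is why a skewed partition key distribution produces hot shards: the key, not the record volume, decides placement.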
  6. Shard Capacity.
     New records (writes): 1 MB/s and 1 K puts/s per shard (3.6 GB/h, 3.6 M puts/h; 86.4 GB/d, 86.4 M puts/d).
     Get Records (reads): 2 MB/s and 5 tx/s per shard (7.2 GB/h, 18 K tx/h; 172.8 GB/d, 432 K tx/d).
     Data retained per shard: max 86.4 GB with 24 h retention, max 604.8 GB with 168 h retention.
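The per-hour and per-day figures on this slide follow directly from the per-second limits. A quick arithmetic check (the class and method names are just for illustration):

```java
public class ShardCapacity {
    static final long SECONDS_PER_DAY = 24 * 3600; // 86,400

    // Throughput in MB/s scaled to GB per day
    static double gbPerDay(double mbPerSec) {
        return mbPerSec * SECONDS_PER_DAY / 1000.0;
    }

    // Transactions per second scaled to transactions per day
    static long txPerDay(long txPerSec) {
        return txPerSec * SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.println("Write: " + gbPerDay(1.0) + " GB/d");  // 86.4
        System.out.println("Read:  " + gbPerDay(2.0) + " GB/d");  // 172.8
        System.out.println("Read:  " + txPerDay(5) + " tx/d");    // 432000
    }
}
```

The 86.4 GB/d write ceiling is also why 24 h retention can never hold more than 86.4 GB per shard.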
  7. Shard Pricing (prices for us-east).
     24 h retention: $0.015/hr (≈ $11/month).
     Extended retention: $0.020/hr (≈ $14.6/month).
     Up to 168 h retention: $0.035/hr (≈ $25.6/month).
     Plus $0.014 per 1,000,000 PUT Payload Units (1 unit = 25 KB). Max record size = 1 MB.
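The PUT Payload Unit charge rounds each record up to a whole number of 25 KB units. A sketch of that billing arithmetic, assuming 1 KB = 1,024 bytes (the class and helper names are illustrative):

```java
public class PutPayloadUnits {
    // 1 PUT Payload Unit = 25 KB (assuming 1 KB = 1,024 bytes)
    static final long UNIT_BYTES = 25 * 1024;

    // Each record is billed in 25 KB chunks, rounded up
    static long unitsFor(long recordBytes) {
        return (recordBytes + UNIT_BYTES - 1) / UNIT_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(unitsFor(100));         // 1: even a tiny record costs a full unit
        System.out.println(unitsFor(25 * 1024));   // 1: exactly one unit
        System.out.println(unitsFor(1024 * 1024)); // 41: the 1 MB max record size
    }
}
```

The rounding is the reason the KPL (slide 11) aggregates many small user records into one Kinesis record: it amortizes the 25 KB granularity.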
  8. Kinesis from the AWS CLI:
     aws kinesis create-stream --stream-name myStream --shard-count 1
     aws kinesis list-streams
     { "StreamNames": [ "myStream" ] }
     aws kinesis put-record --stream-name myStream --partition-key 123 --data "my data"
  9. Collecting Records from the SDK:
     kinesis = new AmazonKinesisClient(…)

     result = kinesis.putRecord(new PutRecordRequest()
         .withStreamName("myStream")
         .withPartitionKey("partitionKey")
         .withData(bytes))

     or, asynchronously:

     kinesis = new AmazonKinesisAsyncClient(…)

     future = kinesis.putRecordAsync(new PutRecordRequest()
         .withStreamName("myStream")
         .withPartitionKey("partitionKey")
         .withData(bytes))
  10. Collecting Records (Batch):
      kinesis = new AmazonKinesisClient(…)
      ...
      records.add(new PutRecordsRequestEntry()
          .withPartitionKey("partitionKey")
          .withData(bytes))
      records.add(…)

      results = kinesis.putRecords(new PutRecordsRequest()
          .withStreamName("myStream")
          .withRecords(records))
  11. KPL (Kinesis Producer Library): user records flow through buffering, aggregation, and collection into PutRecords requests.
  12. Collecting with KPL:
      config = new KinesisProducerConfiguration()
          .setRecordMaxBufferedTime(200) // millis
          .setMaxConnections(4)
          .setRequestTimeout(60000)
          .setRegion("us-east-1")

      producer = new KinesisProducer(config);
      producer.addUserRecord("myStream", "partitionKey1", bytes1);
      producer.addUserRecord("myStream", "partitionKey2", bytes2);
  13. Consumer APIs: a high-level API (KCL, the Kinesis Client Library) and a low-level API (with shard iterators).
  14. Low-Level API with Shard Iterators. Iterator types: TRIM_HORIZON (all records retained in the last 24 hs), LATEST (new records only), AT_SEQUENCE_NUMBER, AFTER_SEQUENCE_NUMBER. Get Records is limited to a max of 5 read transactions per second per shard.
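The four iterator types differ only in where reading starts within a shard's ordered record list. A toy in-memory model of those semantics (purely an illustration of the behavior, not the real API; a real consumer receives an opaque iterator id from get-shard-iterator):

```java
import java.util.List;

public class IteratorTypes {
    // Toy model: a shard is a list of records with increasing sequence numbers.
    // Returns the index of the first record an iterator of the given type yields.
    static int startIndex(List<Integer> seqs, String type, int seq) {
        switch (type) {
            case "TRIM_HORIZON":          return 0;                     // oldest retained record
            case "LATEST":                return seqs.size();           // only records added from now on
            case "AT_SEQUENCE_NUMBER":    return seqs.indexOf(seq);     // that record, inclusive
            case "AFTER_SEQUENCE_NUMBER": return seqs.indexOf(seq) + 1; // the record right after it
            default: throw new IllegalArgumentException(type);
        }
    }

    public static void main(String[] args) {
        List<Integer> shard = List.of(10, 20, 30);
        System.out.println(startIndex(shard, "TRIM_HORIZON", 0));           // 0
        System.out.println(startIndex(shard, "AFTER_SEQUENCE_NUMBER", 20)); // 2
    }
}
```

AT/AFTER_SEQUENCE_NUMBER are what a checkpointing consumer uses to resume exactly where it left off.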
  15. Kinesis from the AWS CLI:
      aws kinesis describe-stream --stream-name myStream
      {
        "StreamDescription": {
          "StreamStatus": "ACTIVE",
          "StreamName": "myStream",
          "StreamARN": "arn:aws:kinesis:…:stream/myStream",
          "Shards": [
            {
              "ShardId": "shardId-000000000000",
              "HashKeyRange": { "EndingHashKey": "…", "StartingHashKey": "…" },
              "SequenceNumberRange": { "StartingSequenceNumber": "…" }
            }
          ]
        }
      }
  16. Kinesis from the AWS CLI:
      aws kinesis get-shard-iterator --stream-name myStream --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON
      { "ShardIterator": "… iterator id …" }

      aws kinesis get-records --shard-iterator "… iterator id …"
      {
        "Records": [
          { "Data": "...", "PartitionKey": "...", "SequenceNumber": "..." }
        ],
        "MillisBehindLatest": 1000,
        "NextShardIterator": "… new iterator id …"
      }
  17. Splitting/Merging Shards. After a split, the parent shard becomes CLOSED and its old records remain there; new records are added to the two OPEN children. After 24 hs (the retention period), the parent's state changes from CLOSED to EXPIRED. GetRecords consumes from the parent with one shard iterator until the split is detected; then two iterators are required to consume from the children.
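SplitShard takes a new starting hash key that divides the parent's hash key range [start, end] into [start, splitPoint - 1] for one child and [splitPoint, end] for the other. A sketch of that range arithmetic (the class is illustrative; the real API passes the hash keys as decimal strings, as shown in the describe-stream output on slide 15):

```java
import java.math.BigInteger;

public class ShardSplit {
    // Highest hash key in a stream's hash space: 2^128 - 1
    static final BigInteger MAX_HASH =
            BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);

    // The split point becomes the starting hash key of the second child;
    // the first child keeps everything below it
    static BigInteger[][] childRanges(BigInteger start, BigInteger end,
                                      BigInteger splitPoint) {
        return new BigInteger[][] {
            { start, splitPoint.subtract(BigInteger.ONE) },
            { splitPoint, end }
        };
    }

    // A split point that divides the parent's range roughly in half
    static BigInteger midpoint(BigInteger start, BigInteger end) {
        return start.add(end).shiftRight(1);
    }

    public static void main(String[] args) {
        BigInteger mid = midpoint(BigInteger.ZERO, MAX_HASH);
        BigInteger[][] c = childRanges(BigInteger.ZERO, MAX_HASH, mid);
        System.out.println("child 1: [" + c[0][0] + ", " + c[0][1] + "]");
        System.out.println("child 2: [" + c[1][0] + ", " + c[1][1] + "]");
    }
}
```

The children's ranges are contiguous and cover the parent's range exactly, which is what lets a consumer switch from one parent iterator to two child iterators without missing records.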
  18. Consuming Records with KCL (Kinesis Client Library). Example: a stream with 3 shards consumed by an app with 2 nodes (machine01, machine02), each running the KCL with its record processors. Shard processing is balanced across nodes; if a node fails, its shards are re-assigned to the remaining nodes.
  19. KCL Coordination with DynamoDB. An app with 2 consumer nodes (machine01, machine02) coordinates through a DynamoDB table (app id used as the table name) with one row per shard, written with DynamoDB conditional updates. The lease counter is continuously incremented by the owner as a heart-beat.

      lease key | checkpoint | lease counter | lease owner
      shard01   | …          | 123           | machine01
      shard02   | …          | 234           | machine01
      shard03   | …          | 345           | machine02
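The conditional-update trick on this slide fits in a few lines: a worker only steals a shard's lease if the lease counter still holds the value it last observed, which means the current owner has stopped heart-beating. A toy in-memory version (the real KCL does this against DynamoDB with conditional writes; all names here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class LeaseTable {
    static class Lease {
        String owner;
        long counter;
        Lease(String owner, long counter) { this.owner = owner; this.counter = counter; }
    }

    final Map<String, Lease> leases = new HashMap<>();

    // Heartbeat: the current owner bumps the lease counter on every renewal
    synchronized void renew(String shardId) {
        leases.get(shardId).counter++;
    }

    // Conditional update: succeed only if the counter still has the value
    // we last observed, i.e. the owner has stopped heart-beating
    synchronized boolean takeLease(String shardId, String newOwner, long observedCounter) {
        Lease l = leases.get(shardId);
        if (l == null || l.counter != observedCounter) return false; // owner renewed; back off
        l.owner = newOwner;
        l.counter++;
        return true;
    }

    public static void main(String[] args) {
        LeaseTable table = new LeaseTable();
        table.leases.put("shard01", new Lease("machine01", 123));
        table.renew("shard01"); // machine01 is alive: counter -> 124
        System.out.println(table.takeLease("shard01", "machine02", 123)); // false: stale observation
        System.out.println(table.takeLease("shard01", "machine02", 124)); // true: no renewal since observed
    }
}
```

Using the counter as the condition (rather than a timestamp) avoids any dependence on synchronized clocks between the worker machines.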
  20. Consuming Records (KCL):
      class MyProcessor implements IRecordProcessor {
          void processRecords(List<Record> records,
                              IRecordProcessorCheckpointer checkpointer) {
              for (Record record : records) {
                  // Process record …
              }
              checkpointer.checkpoint()
          }
      }
      * KCL available for: Java, Node.js, .NET, Python, Ruby
  21. Thanks! Fernando Rodriguez Olivera, @frodriguez, frodriguez <at> gmail.com
