AWS re:Invent 2016: Metering Big Data at AWS: From 0 to 100 Million Records in 1 Second (ARC308)

Learn how AWS processes millions of records per second to support accurate metering across AWS and our customers. This session shows how we migrated from traditional frameworks to AWS managed services to support a large processing pipeline. You will gain insights on how we used AWS services to build a reliable, scalable, and fast processing system using Amazon Kinesis, Amazon S3, and Amazon EMR. Along the way we dive deep into use cases that deal with scaling and accuracy constraints. Attend this session to see AWS’s end-to-end solution that supports metering at AWS.

Published in: Technology


  1. 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Diego Macadar - Michael Fort December 1, 2016 From 0 to 100M Records in 1 Second AWS Metering ARC308
  2. 2. What to expect from the session Tools and techniques to deal with exponential growth of data.
  3. 3. Three principles • 100% accurate: once and only once guarantee; idempotent processing • Horizontally scalable: loosely coupled components; elasticity (automated scaling) • Focus on the business: operationally excellent; use managed frameworks
  4. 4. Architecture Global Data Global State Transform Analyze Aggregate Deliver Collect Audit
  5. 5. Streaming components Global Data Global State Transform Analyze Aggregate Delivery Collect Audit
  6. 6. Batch components Global Data Global State Transform Analyze Aggregate Delivery Collect Audit
  7. 7. Three logical entities Compute State Data
  8. 8. Data Compute State Data
  9. 9. Global data Global Data Global State Transform Analyze Aggregate Deliver Collect Audit
  10. 10. Global data Unstructured data (Amazon S3) vs Structured data (Amazon DynamoDB, Amazon RDS) • Must be immutable • Avoid performance bottlenecks by using storage best practices • Monitoring with Amazon CloudWatch • Secure data using versioning and encryption
  11. 11. Global data example { "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6", "timestamp": 1476477276000, "eventType": "energyUsage", "socketIdentifier": "dac06b790cb5b0856437b3efa92bd891", "lightIdentifier": "000000000001", "value": "23" } { "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6", "timestamp": 1476477276000, "eventType": "lumens", "socketIdentifier": "dac06b790cb5b0856437b3efa92bd891", "lightIdentifier": "000000000001", "value": "300" } { "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6", "timestamp": 1476477276000, "eventType": "outage", "socketIdentifier": "dac06b790cb5b0856437b3efa92bd8f3", "lightIdentifier": "000000000001", "value": "1" } (the slide repeats these energyUsage, lumens, and outage records to fill the page)
  12. 12. Architecture Deliver Collect Audit Aggregate Transform Analyze Global State Amazon S3 Global Store
  13. 13. Local data Server AWS Cloud Amazon S3 Amazon DynamoDB Amazon RDS Local store • Can be mutable • Cache data locally to speed up processing • Invalidate local data once processed • Persist all long-term data in globally accessible cloud store
  14. 14. Local data example { "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6", "timestamp": 1476477276000, "eventType": "energyUsage", "socketIdentifier": "dac06b790cb5b0856437b3efa92bd891", "lightIdentifier": "000000000001", "value": "23" } { "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6", "timestamp": 1476477276000, "eventType": "lumens", "socketIdentifier": "dac06b790cb5b0856437b3efa92bd891", "lightIdentifier": "000000000001", "value": "300" } { "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6", "timestamp": 1476477276000, "eventType": "outage", "socketIdentifier": "dac06b790cb5b0856437b3efa92bd8f3", "lightIdentifier": "000000000001", "value": "1" } Transform { "clientId": "bestHotel", "timestamp": 10/14/2016, "eventType": "energyUsage", "socketIdentifier": "dac06b790cb5b0856437b3efa92bd891", "lightIdentifier": "LED_A1", "value": "23" } { "clientId": "bestHotel", "timestamp": 10/14/2016, "eventType": "lumens", "socketIdentifier": "dac06b790cb5b0856437b3efa92bd891", "lightIdentifier": "LED_A1", "value": "300" } { "clientId": "bestHotel", "timestamp": 10/14/2016, "eventType": "outage", "lightIdentifier": "LED_A1", "value": "1" }
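The transform step in the local data example can be sketched roughly as below. This is only an illustration: the lookup tables `CLIENT_NAMES` and `LIGHT_NAMES` are hypothetical (the slides do not say where the mappings come from), the date is rendered in UTC as an assumption, and dropping `socketIdentifier` from outage records simply mirrors what the slide's output shows.

```python
from datetime import datetime, timezone

# Hypothetical lookup tables; the slides do not show where these mappings live.
CLIENT_NAMES = {"a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6": "bestHotel"}
LIGHT_NAMES = {"000000000001": "LED_A1"}


def transform(record: dict) -> dict:
    """Return a new, human-readable record; the input record is not modified."""
    # Timestamps in the example are epoch milliseconds.
    moment = datetime.fromtimestamp(record["timestamp"] / 1000, tz=timezone.utc)
    out = dict(record)
    out["clientId"] = CLIENT_NAMES[record["clientId"]]
    out["timestamp"] = moment.strftime("%m/%d/%Y")  # assumption: UTC calendar date
    out["lightIdentifier"] = LIGHT_NAMES[record["lightIdentifier"]]
    if record["eventType"] == "outage":
        # The slide's transformed outage record has no socketIdentifier.
        out.pop("socketIdentifier", None)
    return out


raw_outage = {
    "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
    "timestamp": 1476477276000,
    "eventType": "outage",
    "socketIdentifier": "dac06b790cb5b0856437b3efa92bd8f3",
    "lightIdentifier": "000000000001",
    "value": "1",
}
print(transform(raw_outage))
```

Because `transform` copies the record instead of editing it in place, the raw input can stay immutable in the global store while the mutable copy lives only in the local store.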
  15. 15. State Compute State Data
  16. 16. Architecture Deliver Collect Audit Aggregate Transform Analyze Global State Amazon S3 Global Store
  17. 17. Global state Source Sink Sink AWS Cloud Amazon S3 Amazon DynamoDB Amazon RDS
  18. 18. Global state examples Failed Created Completed Transformed
  19. 19. Architecture Delivery Collect Amazon DynamoDB Audit Aggregate Transform Analyze Global State Amazon S3 Global Store
  20. 20. Mutually shared state Source Sink Channel State
  21. 21. Channel selection Channel attributes • Order: Amazon SQS not ordered, Amazon Kinesis ordered • Locality: Amazon SQS not localized, Amazon Kinesis localized • Delivery: Amazon SQS at-least-once, Amazon Kinesis at-least-once
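Both channels deliver at-least-once, so a consumer may see the same record more than once. One common way to reconcile that with the "once and only once" principle is an idempotent consumer. The sketch below is an assumption, not something shown in the slides: `record_id` is a hypothetical unique identifier, and the in-memory set stands in for whatever durable store would hold processed IDs in practice.

```python
class IdempotentConsumer:
    """Applies each record at most once, even if the channel redelivers it."""

    def __init__(self) -> None:
        self._seen: set[str] = set()  # stand-in for a durable store of processed IDs
        self.total = 0

    def handle(self, record_id: str, value: int) -> bool:
        """Return True if the record was applied, False if it was a duplicate."""
        if record_id in self._seen:
            return False
        self._seen.add(record_id)
        self.total += value
        return True


consumer = IdempotentConsumer()
consumer.handle("r-1", 23)
consumer.handle("r-1", 23)  # redelivery of the same record is ignored
consumer.handle("r-2", 300)
print(consumer.total)
```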
  22. 22. Hot partition example H1 H2 H1 H1 H1 H1 H1 H2 H1 H2
  23. 23. Amazon Kinesis hotspot management Amazon Kinesis Stream Producer Consumer PutRecord GetRecords Partition Key = Hotel ID + Entropy, where Entropy = MD5 % Partition Key Size CONSTRAINTS: • Hash function calculation is idempotent • Partition Key Size changes are time versioned • Partition Key Size is selected based on time information of the entity
  24. 24. Hot partition example H1 H2 H1 H1 H1 H1 H1 H2 H1 + 0 H1 + 1 H2 + 0
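The salting scheme from the hotspot slides (Entropy = MD5 % Partition Key Size, giving keys such as H1 + 0 and H1 + 1) might look something like the sketch below. Several details are assumptions: the slides do not say what the MD5 is computed over (a hypothetical `record_id` is used here), and the `KEY_SIZES` table with its effective-from timestamps is invented to illustrate the "time versioned" constraint.

```python
import hashlib
from bisect import bisect_right

# Hypothetical time-versioned partition key sizes per hotel:
# sorted (effective_from_epoch_ms, size) pairs. H1 is hot, so it was widened to 2.
KEY_SIZES = {
    "H1": [(0, 1), (1476400000000, 2)],
    "H2": [(0, 1)],
}


def key_size(hotel_id: str, timestamp_ms: int) -> int:
    """Pick the size that was in effect at the record's own (non-negative) timestamp."""
    versions = KEY_SIZES[hotel_id]
    starts = [start for start, _ in versions]
    return versions[bisect_right(starts, timestamp_ms) - 1][1]


def partition_key(hotel_id: str, record_id: str, timestamp_ms: int) -> str:
    """Deterministic salted key: the same record always maps to the same key."""
    digest = hashlib.md5(record_id.encode("utf-8")).hexdigest()
    entropy = int(digest, 16) % key_size(hotel_id, timestamp_ms)
    return f"{hotel_id}+{entropy}"


print(partition_key("H1", "record-42", 1476477276000))  # one of H1+0 or H1+1
print(partition_key("H2", "record-43", 1476477276000))  # H2+0
```

Choosing the size from the record's timestamp rather than the current time keeps the calculation idempotent: reprocessing an old record after the size has changed still yields the key it was originally given.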
  25. 25. Amazon Kinesis hotspot management Amazon Kinesis Stream Producer Consumer PutRecord GetRecords AWS Cloud Amazon S3 Amazon DynamoDB Amazon RDS Capture Stream IO Statistics Hotspot Manager Reads Stream IO Statistics DescribeStream SplitShard MergeShards Read Partition Information Update Partition Information
  26. 26. Architecture Delivery Collect Amazon DynamoDB Audit Aggregate Transform Analyze Global State Amazon S3 Global Store
  27. 27. Local state Server AWS Cloud Amazon S3 Amazon DynamoDB Amazon RDS Local cache • Cache state locally when it has Write-Once-Read-Many (WORM) characteristics • Validate state cache against global store as often as possible • Read state that changes often directly from the global store
  28. 28. Architecture Delivery Collect Amazon DynamoDB Audit Aggregate Transform Analyze Global State Amazon S3 Global Store
  29. 29. Compute Compute State Data
  30. 30. Compute Server-based compute (Amazon EC2) • Fine grained controls • Time-sensitive response • Co-location of resources • Clustering vs Serverless compute (AWS Lambda) • No server management • Out-of-the-box scaling • Out-of-the-box metrics • Out-of-the-box logging
  31. 31. Architecture Delivery Collect Amazon DynamoDB AWS Lambda Amazon EC2 Audit Aggregate Global State Amazon S3 Global Store
  32. 32. Amazon EC2 Auto Scaling Amazon EC2 w/ Auto Scaling Amazon CloudWatch Auto Scaling Monitors CloudWatch Alarms EC2 emits metrics to CloudWatch
  33. 33. Architecture Delivery Collect Amazon DynamoDB AWS Lambda Amazon EC2 w/ Auto Scaling Audit Aggregate Global State Amazon S3 Global Store
  34. 34. Map Reduce workflow Lock input dataset for idempotent execution Amazon DynamoDB List of Manifests List of Batches Map and Reduce Records Amazon S3
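Locking the input dataset so that a batch runs only once can be sketched as a conditional write. The slides do not say how the lock is implemented, so everything here is an assumption: `LockTable` is an in-memory stand-in, and with Amazon DynamoDB the same idea would typically be a put guarded by a condition expression such as `attribute_not_exists`.

```python
from typing import Callable, Optional


class LockTable:
    """In-memory stand-in for a table that supports conditional writes."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def put_if_absent(self, key: str, owner: str) -> bool:
        """Write the lock only if no lock exists for this key."""
        if key in self._items:
            return False
        self._items[key] = owner
        return True


def run_batch(table: LockTable, manifest_id: str, job_id: str,
              work: Callable[[], int]) -> Optional[int]:
    """Run `work` only if this job wins the lock on the input manifest."""
    if not table.put_if_absent(manifest_id, job_id):
        return None  # another run already owns this input; skip to stay idempotent
    return work()


table = LockTable()
print(run_batch(table, "manifest-001", "job-a", lambda: 42))  # 42
print(run_batch(table, "manifest-001", "job-b", lambda: 42))  # None
```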
  35. 35. Architecture Delivery Collect Amazon DynamoDB AWS Lambda Amazon EC2 w/ Auto Scaling AWS Lambda Amazon EMR Audit Global State Amazon S3 Global Store
  36. 36. Cluster management Amazon EMR Controller Cluster Manager Amazon DynamoDB Amazon EMR Gather backlog Information Find and Lease Cluster Spin-up / Tear Down Clusters Enqueue Step
  37. 37. Architecture Delivery Collect Amazon DynamoDB AWS Lambda Amazon EC2 w/ Auto Scaling AWS Lambda Amazon EMR Audit Global State Amazon S3 Global Store
  38. 38. External-facing API Elastic Load Balancing Amazon CloudFront Amazon Route 53 Amazon API Gateway • Authorization • Version control • Authentication • DDoS prevention • Caching • Throttling • Scale
  39. 39. Audit Architecture Delivery Collect Amazon DynamoDB AWS Lambda Amazon EC2 w/ Auto Scaling AWS Lambda Amazon EMR AWS Lambda AWS Lambda Amazon API Gateway Amazon API Gateway Global State Amazon S3 Global Store
  40. 40. Incremental auditing Transitive property of equality If A = B and B = C, then A = C Color() Unique() Audit() Audit()
  41. 41. Checksum auditing Fixed – Static through the end of processing Checksum = HF(Fixed + Transformed) * Aggregating Value { "clientId": "bestHotel", "timestamp": 10/14/2016, "eventType": "outage", "lightIdentifier": "LED_A1", "value": "2" } Result { "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6", "timestamp": 1476477276000, "eventType": "outage", "socketIdentifier": "dac06b790cb5b0856437b3efa92bd8f3", "lightIdentifier": "000000000001", "value": "1" } { "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6", "timestamp": 1476477276000, "eventType": "outage", "socketIdentifier": "dac06b790cb5b0856437b3efa92bd8f3", "lightIdentifier": "000000000001", "value": "1" } Source
  42. 42. Checksum auditing Checksum = HF(Fixed + Transformed) * Aggregating Value { "clientId": "bestHotel", "timestamp": 10/14/2016, "eventType": "outage", "lightIdentifier": "LED_A1", "value": "2" } Result { "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6", "timestamp": 1476477276000, "eventType": "outage", "socketIdentifier": "dac06b790cb5b0856437b3efa92bd8f3", "lightIdentifier": "000000000001", "value": "1" } { "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6", "timestamp": 1476477276000, "eventType": "outage", "socketIdentifier": "dac06b790cb5b0856437b3efa92bd8f3", "lightIdentifier": "000000000001", "value": "1" } Source Transformed – Changed in the lifetime of processing
  43. 43. Checksum auditing Checksum = HF(Fixed + Transformed) * Aggregating Value { "clientId": "bestHotel", "timestamp": 10/14/2016, "eventType": "outage", "lightIdentifier": "LED_A1", "value": "2" } Result { "clientId": "bestHotel", "timestamp": 10/14/2016, "eventType": "outage", "socketIdentifier": "dac06b790cb5b0856437b3efa92bd8f3", "lightIdentifier": "LED_A1", "value": "1" } { "clientId": "bestHotel", "timestamp": 10/14/2016, "eventType": "outage", "socketIdentifier": "dac06b790cb5b0856437b3efa92bd8f3", "lightIdentifier": "LED_A1", "value": "1" } Source Perform transformations and filters on source data
  44. 44. Checksum auditing Checksum = HF(Fixed + Transformed) * Aggregating Value { "clientId": "bestHotel", "timestamp": 10/14/2016, "eventType": "outage", "lightIdentifier": "LED_A1", "value": "2" } Result { "clientId": "bestHotel", "timestamp": 10/14/2016, "eventType": "outage", "lightIdentifier": "LED_A1", "value": "1" } { "clientId": "bestHotel", "timestamp": 10/14/2016, "eventType": "outage", "lightIdentifier": "LED_A1", "value": "1" } Source Run a hashing function over the fixed and transformed fields 1ae035081ed6c9a40f1c6eb1177350a9
  45. 45. Checksum auditing Checksum = HF(Fixed + Transformed) * Aggregating Value 1ae035081ed6c9a40f1c6eb1177350a9 { "value": "2" } Result 1ae035081ed6c9a40f1c6eb1177350a9 { "value": "1" } 1ae035081ed6c9a40f1c6eb1177350a9 { "value": "1" } Source Aggregating Value – Field used for aggregation during processing
  46. 46. Checksum auditing Checksum = HF(Fixed + Transformed) * Aggregating Value 1ae035081ed6c9a40f1c6eb1177350a9 { "value": "2" } Result 1ae035081ed6c9a40f1c6eb1177350a9 { "value": "1" } 1ae035081ed6c9a40f1c6eb1177350a9 { "value": "1" } Source Multiply hash * aggregating value 1AE035081ED6C9A40F1C6EB1177350A9 35C06A103DAD93481E38DD622EE6A152 1AE035081ED6C9A40F1C6EB1177350A9
  47. 47. Checksum auditing Assert(sum(sourceChecksums) = sum(resultChecksums)) Result Source Sum results, compare source vs results 1AE035081ED6C9A40F1C6EB1177350A9 35C06A103DAD93481E38DD622EE6A152 1AE035081ED6C9A40F1C6EB1177350A9 35C06A103DAD93481E38DD622EE6A152 35C06A103DAD93481E38DD622EE6A152
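The checksum audit walked through on the preceding slides can be sketched as follows. Some choices are assumptions rather than anything the slides confirm: MD5 stands in for the unnamed hash function HF (the slides' 32-hex-digit values are merely consistent with it), the fields are joined with a hypothetical `name=value|...` encoding, and `KEY_FIELDS` lists the fixed and transformed fields left on the slide's final records. Source records are expected to have been transformed and filtered already, as slide 43 describes.

```python
import hashlib

# Fixed and transformed fields that identify an aggregate (assumed from the slides).
KEY_FIELDS = ("clientId", "timestamp", "eventType", "lightIdentifier")


def checksum(record: dict, value_field: str = "value") -> int:
    """Checksum = HF(fixed + transformed fields) * aggregating value."""
    material = "|".join(f"{name}={record[name]}" for name in KEY_FIELDS)
    digest = int(hashlib.md5(material.encode("utf-8")).hexdigest(), 16)
    return digest * int(record[value_field])


def audit(source_records: list, result_records: list) -> bool:
    """True when the summed source checksums equal the summed result checksums."""
    source_total = sum(checksum(r) for r in source_records)
    result_total = sum(checksum(r) for r in result_records)
    return source_total == result_total


outage = {"clientId": "bestHotel", "timestamp": "10/14/2016",
          "eventType": "outage", "lightIdentifier": "LED_A1"}
source = [dict(outage, value="1"), dict(outage, value="1")]
result = [dict(outage, value="2")]

print(audit(source, result))      # True: two outages of 1 aggregate to 2
print(audit(source[:1], result))  # False: a source record went missing
```

Multiplying the hash by the aggregating value is what lets a single aggregated result balance against many source records: h * 1 + h * 1 equals h * 2, which matches the doubling of the hash shown on slides 46 and 47.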
  48. 48. Architecture Delivery Collect Amazon DynamoDB AWS Lambda Amazon EC2 w/ Auto Scaling AWS Lambda Amazon EMR AWS Lambda AWS Lambda Amazon API Gateway Amazon API Gateway Global State Audit Audit Amazon S3 Global Store
  49. 49. Thank you!
  50. 50. Remember to complete your evaluations!
