Analyzing Real-time Streaming Data with Amazon Kinesis

Amazon Kinesis is a platform for streaming data on AWS, offering powerful services to make it easy to load and analyze streaming data. In this session, you'll learn about how AWS customers are transitioning from batch to real-time processing using Amazon Kinesis, and how to get started. We will provide an overview of streaming data applications and introduce the Amazon Kinesis platform and its services. We will walk through a production use case to demonstrate how to ingest streaming data, prepare it, and analyze it to gain actionable insights in real time using Amazon Kinesis. We will also provide pointers to tutorials and other resources so you can quickly get started with your streaming data application.

  1. 1. Real-Time Streaming Analytics: Overview of Amazon Kinesis. Roy Ben-Alta, Amazon AI and Analytics, BDM; Daniel Modlinger, VP R&D, Nutrino. June 21st, 2017. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. 2. What to expect from this session • Real-Time Streaming Analytics – Why Now? • Amazon Kinesis Overview • New Features • Design Patterns • Nutrino Case Study
  3. 3. Most data is produced continuously: Mobile Apps • Web Clickstream • Application Logs • Metering Records • IoT Sensors • Smart Buildings. Example log line: [Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
  4. 4. What is your decision latency? OR
  5. 5. The diminishing value of data Recent data is highly valuable • If you act on it in time • Perishable Insights (M. Gualtieri, Forrester) Old + Recent data is more valuable • If you have the means to combine them
  6. 6. Processing real-time, streaming data. What are the key requirements? • Durable • Continuous • Fast • Correct • Reactive • Reliable. Stages: Ingest, Transform, Analyze, React, Persist
  7. 7. Streaming Data Scenarios Across Verticals

     | Scenarios/Verticals | Accelerated Ingest-Transform-Load | Continuous Metrics Generation | Machine Learning and Actionable Insights |
     | --- | --- | --- | --- |
     | Digital Ad Tech/Marketing | Publisher, bidder data aggregation | Advertising metrics like coverage, yield, and conversion | User engagement with ads, optimized bid/buy engines |
     | IoT | Sensor, device telemetry data ingestion | Operational metrics and dashboards | Device operational intelligence and alerts |
     | Gaming | Online data aggregation, e.g., top 10 players | Massively multiplayer online game (MMOG) live dashboard | Leader board generation, player-skill match |
     | Consumer Online | Clickstream analytics | Metrics like impressions and page views | Recommendation engines, proactive care |
     | Operation monitoring, Security | DevOps tools, ingesting VPC Flow Logs | Subscribe to CloudWatch Logs and analyze logs in real time | Anomaly detection |
  8. 8. Amazon Kinesis: Platform for streaming data on AWS. Kinesis Streams • Kinesis Analytics • Kinesis Firehose
  9. 9. Amazon Kinesis: Kinesis Streams (Capture) • Kinesis Analytics (Process) • Kinesis Firehose (Deliver)
  10. 10. Amazon Kinesis: Kinesis Streams, for technical developers: build your own custom applications that process or analyze streaming data • Kinesis Analytics (Process) • Kinesis Firehose (Deliver)
  11. 11. Amazon Kinesis: Kinesis Streams, for technical developers: build your own custom applications that process or analyze streaming data • Kinesis Analytics, for all developers, data scientists, and business analysts: easily analyze data streams using standard SQL queries • Kinesis Firehose (Deliver)
  12. 12. Amazon Kinesis: Kinesis Streams, for technical developers: build your own custom applications that process or analyze streaming data • Kinesis Analytics, for all developers, data scientists, and business analysts: easily analyze data streams using standard SQL queries • Kinesis Firehose, for all developers: easily load massive volumes of streaming data into S3, Redshift, and Elasticsearch; clean and format using Lambda
  13. 13. Amazon Kinesis Customer Base Diversity • 1 billion events/wk from connected devices | IoT • 17 PB of game data per season | Entertainment • 80 billion ad impressions/day, 30 ms response time | Ad Tech • 100 GB/day click streams from 250+ sites | Enterprise • 50 billion ad impressions/day, sub-50 ms responses | Ad Tech • 10 million events/day | Retail • Ingesting 2M+ network events every second • Funnel all production events through Amazon Kinesis
  14. 14. Why are these customers choosing Amazon Kinesis? • Lower costs • Performant without heavy lifting • Scales elastically • Increased agility • Secure and visible • Plug and play
  15. 15. Amazon Kinesis Streams Build your own data streaming applications Easy administration: Simply create a new stream, and set the desired level of capacity with shards. Scale to match your data throughput rate and volume. Build real-time applications: Perform continual processing on streaming big data using Kinesis Client Library (KCL), Kinesis Analytics, Apache Spark/Storm, AWS Lambda, and more. Low cost: Cost-efficient for workloads of any scale.
  16. 16. Amazon Kinesis Stream: managed ability to capture and store data • Streams are made of shards • Each shard ingests up to 1 MB/sec and up to 1,000 records/sec (TPS) • Each shard emits up to 2 MB/sec • All data is stored for 24 hours by default, extendable up to 7 days • Scale Kinesis streams by splitting or merging shards • Replay data from anywhere inside the 24-hour to 7-day retention window (time-based seek)
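
The per-shard limits on this slide translate directly into a sizing calculation. The sketch below is illustrative only: the function name and the sample workload are assumptions, and it uses just the three limits quoted above (1 MB/sec in, 1,000 records/sec in, 2 MB/sec out).

```python
import math

# Per-shard limits as quoted on the slide.
INGEST_MB_PER_SEC = 1.0
INGEST_RECORDS_PER_SEC = 1000
EGRESS_MB_PER_SEC = 2.0


def shards_needed(write_mb_per_sec, write_records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies all three limits."""
    by_write_bytes = math.ceil(write_mb_per_sec / INGEST_MB_PER_SEC)
    by_write_count = math.ceil(write_records_per_sec / INGEST_RECORDS_PER_SEC)
    by_read_bytes = math.ceil(read_mb_per_sec / EGRESS_MB_PER_SEC)
    return max(1, by_write_bytes, by_write_count, by_read_bytes)


# Hypothetical workload: 3.5 MB/sec in, 5,000 records/sec,
# and two consumers that each read the full stream (7 MB/sec out).
print(shards_needed(3.5, 5000, 2 * 3.5))
```

In this made-up workload the record rate, not the byte rate, is the binding limit, which is common for streams of many small events.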
  17. 17. Amazon Kinesis Streams pricing: simple, pay-as-you-go, and no up-front costs • Shard Hour: $0.015 • PUT Payload (per 1,000,000 PUTs): $0.014 • Extended Data Retention (up to 7 days), per Shard Hour: $0.020
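
As a rough worked example of how these three price dimensions combine, the sketch below estimates a monthly bill. It assumes a 30-day month and that every PUT counts as exactly one billable unit; the prices are the 2017 figures from the slide and may not match current pricing.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month

# Prices as listed on the slide (2017).
SHARD_HOUR_USD = 0.015
PUT_PER_MILLION_USD = 0.014
EXTENDED_RETENTION_SHARD_HOUR_USD = 0.020


def monthly_cost_usd(shards, puts_per_sec, extended_retention=False):
    """Estimate the monthly stream cost for a steady workload."""
    shard_cost = shards * HOURS_PER_MONTH * SHARD_HOUR_USD
    puts = puts_per_sec * 3600 * HOURS_PER_MONTH
    put_cost = puts / 1_000_000 * PUT_PER_MILLION_USD
    retention_cost = 0.0
    if extended_retention:
        retention_cost = shards * HOURS_PER_MONTH * EXTENDED_RETENTION_SHARD_HOUR_USD
    return round(shard_cost + put_cost + retention_cost, 2)


# Hypothetical workload: 5 shards receiving 1,000 PUTs per second.
print(monthly_cost_usd(5, 1000))
print(monthly_cost_usd(5, 1000, extended_retention=True))
```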
  18. 18. Streaming Data Scenarios Across Verticals (repeat of the table on slide 7)
  19. 19. Amazon Kinesis Firehose: Load massive volumes of streaming data into Amazon S3, Redshift, and Elasticsearch. Zero administration: Capture and deliver streaming data into Amazon S3, Amazon Redshift, and other destinations without writing an application or managing infrastructure. Direct-to-data-store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 seconds using simple configurations. Elastic: Scales to match data throughput without intervention. Serverless ETL using AWS Lambda: Firehose can invoke your Lambda function to transform incoming source data. Flow: Capture and submit streaming data • Firehose loads streaming data continuously into Amazon S3, Redshift, and Elasticsearch • Analyze streaming data using your favorite BI tools
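
To make the "Serverless ETL using AWS Lambda" point concrete, here is a minimal sketch of a Firehose transformation function. It follows the record contract Firehose uses for data transformation (base64 `data` in, and `recordId`, `result`, `data` out); the `device` field and the upper-casing rule are invented for illustration and are not from the talk.

```python
import base64
import json


def handler(event, context):
    """Firehose transformation: normalize a hypothetical 'device' field."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["device"] = str(payload.get("device", "unknown")).upper()
            # Newline-delimit the output so downstream tools can split records.
            line = (json.dumps(payload) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line).decode("ascii"),
            })
        except (ValueError, TypeError, AttributeError):
            # Return unparseable input untouched and flag it as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Every incoming `recordId` must appear in the response, which is why the failure branch still returns the record instead of dropping it silently.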
  20. 20. Amazon Kinesis Firehose pricing: simple, pay-as-you-go, and no up-front costs • Per 1 GB of data ingested: $0.029, decreasing to $0.020 with tiered pricing
  21. 21. Amazon Kinesis Firehose or Amazon Kinesis Streams?
  22. 22. Kinesis Firehose vs. Kinesis Streams Amazon Kinesis Streams is for use cases that require custom processing, per incoming record, with sub-1 second processing latency, and a choice of stream processing frameworks. Amazon Kinesis Firehose is for use cases that require zero administration, ability to use existing analytics tools based on Amazon S3, Amazon Redshift and Amazon Elasticsearch, and a data latency of 60 seconds or higher.
  23. 23. Streaming Data Scenarios Across Verticals (repeat of the table on slide 7)
  24. 24. Amazon Kinesis Analytics. Apply SQL on streams: Easily connect to a Kinesis Stream or Firehose Delivery Stream and apply SQL skills. Build real-time applications: Perform continual processing on streaming big data with sub-second processing latencies. Easy scalability: Elastically scales to match data throughput. Flow: Connect to Kinesis streams or Firehose delivery streams • Run standard SQL queries against data streams • Kinesis Analytics can send processed data to analytics tools so you can create alerts and respond in real time
  25. 25. Use SQL to build real-time applications Easily write SQL code to process streaming data Connect to streaming source Continuously deliver SQL results
  26. 26. Coming Soon
  27. 27. Amazon Kinesis Streams Forthcoming Features • Encryption at Rest • VPCe
  28. 28. Amazon Kinesis Firehose Forthcoming Features • VPCe: Redshift, Elasticsearch • Parquet Format • AWS Regions • Kafka Connector
  29. 29. Amazon Kinesis Analytics Forthcoming Features • AWS Lambda Integration • Amazon QuickSight Destination • Improved Autoscaling
  30. 30. Design Patterns - Serverless
  31. 31. Capture all of the value from your data with Amazon Kinesis [Diagram. Stages: Ingest, Process, React, Persist. Components: Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon Kinesis Analytics, Amazon Kinesis-enabled app, AWS Lambda, Amazon S3, Amazon Redshift, Amazon QuickSight. Latency axis: 0 ms, 200 ms, 1-2 s]
  32. 32. Serverless Analytics – Enable your org with zero operational overhead [Diagram. Stages: Ingest, Streaming ETL, Persist, Analyze. Components: Amazon Kinesis Streams, Amazon Kinesis Firehose, AWS Lambda, Amazon Kinesis Analytics, Amazon S3, Amazon Redshift, Amazon Elasticsearch, Amazon Athena, Amazon Redshift Spectrum. Latency axis: 0 msec, seconds, < 5 minutes]
  33. 33. Amazon Kinesis Data Generator on GitHub https://awslabs.github.io/amazon-kinesis-data-generator/
  34. 34. Putting Data in Amazon Kinesis. AWS SDK: • PutRecord() • PutRecordBatch(). Kinesis Agent: • Monitors files and sends new data records to your delivery stream • Handles file rotation, checkpointing, and retry upon failures • Pre-processing capabilities such as format conversion and log parsing • Emits Amazon CloudWatch metrics for monitoring and troubleshooting
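
A short sketch of how `PutRecordBatch()` might be driven from application code. The client is passed in so the logic can be exercised without AWS access; with boto3 it would be `boto3.client("firehose")`. The helper names and the stream name are hypothetical, retries of failed records are omitted, and only the 500-records-per-call limit is enforced (the per-call size limit is not checked).

```python
import json

MAX_RECORDS_PER_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def to_batches(events, batch_size=MAX_RECORDS_PER_BATCH):
    """Serialize events as newline-delimited JSON and group them into batches."""
    records = [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in events]
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def send(firehose_client, stream_name, events):
    """Send all events; return how many records Firehose reported as failed."""
    failed = 0
    for batch in to_batches(events):
        response = firehose_client.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        failed += response.get("FailedPutCount", 0)
    return failed
```

A non-zero return value means some records were rejected (for example by throttling) and should be resent; the per-record outcome is available in the response's `RequestResponses` list.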
  35. 35. Common Integration Pattern with Amazon EMR: Tumbling Window Reporting. Streaming Input (Amazon Kinesis Streams) • Tumbling/Fixed Window Aggregation (Amazon EMR) • Periodic Output • Amazon Redshift COPY from Amazon EMR
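
The tumbling (fixed) window aggregation named on this slide can be illustrated in a few lines. This is a plain-Python sketch of the windowing idea only, not the EMR job from the talk; the event shape `(timestamp_seconds, key)` is an assumption.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) for fixed, non-overlapping windows."""
    counts = Counter()
    for timestamp, key in events:
        # Every timestamp belongs to exactly one window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "a"), (30, "a"), (59, "b"), (60, "a"), (125, "b")]
print(tumbling_window_counts(events, 60))
```

Because windows never overlap, each window's result can be emitted once the window closes, which matches the "periodic output" step in the pattern.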
  36. 36. Amazon Kinesis Streams with AWS Lambda
  37. 37. Build your Streaming Pipeline with Amazon Kinesis, AWS Lambda and Amazon EMR
  38. 38. Nutrino Case Study
  39. 39. Nutrino proprietary and confidential. All rights reserved. info@nutrino.co www.nutrino.co Daniel Modlinger, VP R&D. AWS Summit Tel Aviv
  40. 40. Nutrition advice - conflicted and not personal $435B
  41. 41. Food Analysis System: Scientific Literature (10,000+ papers) • Food (1,000,000+ foods) • Menus (300,000+ restaurant locations). FoodPrint™: Correlations • Insights • Predictions. Individual Inputs: Activity • Sleep • Continuous Glucose • Blood Tests • DNA - Genome • Taste, allergies etc. • Mood • Blood pressure • Temperature • Biome • Social media. Personal & continually improving nutrition recommendations
  42. 42. Examples - Partners: Pregnancy • Diabetes • Consumer • Employer/payer • Retail (confidential) • Hypertension (confidential)
  43. 43. Data Streaming
  44. 44. FoodPrint™ Individual Inputs: Activity • Sleep • Continuous Glucose • Blood Tests • DNA - Genome • Taste, allergies etc. • Mood • Blood pressure • Temperature • Biome • Social media
  45. 45. Needs. Goal: Process and store nutrition-related sensor data • Volume - Millions of daily data points • Scale - Non-linear growth • Performance - Near real time, as low as 3rd-party producers of data allow • Process and store a wide variety of known data types • Extract as much data as possible and store unknown data types and formats
  46. 46. First attempts: 2015, 2016, 2017
  47. 47. Updated Needs (AKA lessons learned). Goal: Process and store nutrition-related sensor data • Cut out the middle man • Detailed monitoring is a necessity • Be event driven • Fast developer response to changes in APIs • Reliability is key, otherwise data gets lost
  48. 48. Current Solution - High Level Architecture
  49. 49. Why Kinesis & Lambda (1). Goal: Process and store sensor data and all other nutrition-related data • Scale - Non-linear growth: scaling can be as easy as adding a shard, and Lambda functions are executed as often as we need • Performance - Near real time: with Lambda triggers on streams we can act quickly enough • Store unknown data types and formats: all unknown data is saved to S3 using Kinesis Firehose and later analyzed by our Spark cluster • Wide variety of known data types: Lambda allows us to write simple and clean code that deals with each entity without needing to be aware of flow or context
  50. 50. Why Kinesis & Lambda (2). Goal: Process and store sensor data and all other nutrition-related data • Volume - Millions of daily data points: S3 is limitless, and Kinesis streams are practically limitless • Reliability is key, otherwise data gets lost: Kinesis streams are persistent (from 24 hours up to one week), Lambda functions automatically retry, and CloudWatch has detailed monitoring of everything • Be event driven wherever possible, because polling causes problems: API Gateway and Lambda integration allows 3rd-party producers to activate the processing • Allow fast developer response to changes in APIs: changing one Lambda function does not affect the flow, and the deploy resolution can be as low as a single function
  51. 51. Why Kinesis & Lambda (3). Goal: Process and store sensor data and all other nutrition-related data • Detailed monitoring is a necessity and not a luxury: iterator age, put count, and Lambda monitoring are a good baseline to start from • Cut out the middle man, i.e., avoid writing infrastructure just to let clients (iOS) write sensor data: we are using the AWS iOS SDK to write the sensor data
  52. 52. Scale & Performance (1) Best Practice • The optimal number of shards is derived from the desired iterator age (i.e., latency) • Optimal batch size: if your Lambda scales linearly, it doesn't really matter • A high batch size causes fewer calls and can create an internal queue for the consumer Lambda function (which increases the latency to the user) • A low batch size can be expensive in terms of Lambda overhead and price
  53. 53. Scale & Performance (2) Best Practice • Lambda functions run sequentially on a shard's data, but in parallel across different shards • The Lambda consumer will not continue to the next batch until the function finishes successfully. Make sure to handle exceptions in your Lambda functions, or your shard can get stuck! • If your functions are slow, consider using a "forward" Lambda: a function that only reads the records from Kinesis and then invokes another Lambda function (which can run in parallel) • Understand the KCL; it will help you adjust the different parameters to your needs
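
The advice about handling exceptions can be sketched as a Kinesis-triggered Lambda handler that isolates failures per record, so one bad record does not fail the whole batch and block the shard. This is an illustrative sketch, not Nutrino's code: `process` and the `user_id` field are hypothetical, and a real function would typically park failed records somewhere durable (for example S3 or a queue) rather than only logging them.

```python
import base64
import json
import logging

logger = logging.getLogger(__name__)


def process(payload):
    """Placeholder for business logic (hypothetical)."""
    if "user_id" not in payload:
        raise ValueError("missing user_id")
    return payload["user_id"]


def handler(event, context):
    """Process a batch of Kinesis records, isolating failures per record."""
    processed, failed = [], []
    for record in event["Records"]:
        sequence_number = record["kinesis"]["sequenceNumber"]
        try:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            processed.append(process(payload))
        except Exception:
            # Swallow the error so the batch still succeeds and the shard
            # keeps moving; the failed record is logged, not retried.
            logger.exception("failed to process record %s", sequence_number)
            failed.append(sequence_number)
    return {"processed": len(processed), "failed": failed}
```

The trade-off is deliberate: the handler gives up automatic retries for the failed records in exchange for never stalling the records behind them.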
  54. 54. Monitoring Best Practice • Make sure to monitor: • Iterator age • Put records count • Records distribution across shards • Each Lambda on its own • Your own required metrics
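
"Records distribution across shards" depends on the partition keys you choose: Kinesis hashes each partition key with MD5 into a 128-bit hash key, and each shard owns a range of that space. The sketch below estimates the distribution offline, assuming the shards split the hash space evenly (true for a freshly created stream, not necessarily after splits and merges); the function names are illustrative.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 of the partition key yields a 128-bit hash key


def shard_for_key(partition_key, shard_count):
    """Index of the shard that owns this key, assuming evenly split hash ranges."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hash_key * shard_count // HASH_SPACE


def distribution(partition_keys, shard_count):
    """Number of records that land on each shard, in shard order."""
    counts = Counter(shard_for_key(k, shard_count) for k in partition_keys)
    return [counts.get(i, 0) for i in range(shard_count)]


# Many distinct keys spread out; a single hot key lands on one shard.
print(distribution([f"user-{i}" for i in range(10000)], 4))
print(distribution(["same-key"] * 100, 4))
```

Running this against a sample of real partition keys is a cheap way to spot a hot shard before it shows up as a rising iterator age.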
  55. 55. Best Practice: Duplication handling & order considerations • Duplicate records are almost inevitable • It's easier to just handle duplicates than to avoid the problem • Watch out - updates are not duplicates! • It is possible to preserve record order, but doing so will likely cause a larger overhead. Avoid relying on record order as much as possible.
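
One common way to "just handle duplicates" is to make processing idempotent by keying on a unique event identifier. The sketch below uses an in-memory store as a stand-in for a real database, and the field names (`event_id`, `entity_id`, `value`) are assumptions. It also illustrates the "updates are not duplicates" warning: a new event for the same entity carries a new `event_id` and is applied.

```python
class IdempotentStore:
    """In-memory stand-in for a real datastore (hypothetical)."""

    def __init__(self):
        self.seen_event_ids = set()
        self.latest = {}

    def apply(self, event):
        """Apply an event once; return False if it was already applied."""
        if event["event_id"] in self.seen_event_ids:
            return False  # a redelivered copy of an event we already handled
        self.seen_event_ids.add(event["event_id"])
        self.latest[event["entity_id"]] = event["value"]
        return True


store = IdempotentStore()
reading = {"event_id": "e1", "entity_id": "glucose:42", "value": 100}
print(store.apply(reading))  # first delivery
print(store.apply(reading))  # duplicate delivery of the same event
print(store.apply({"event_id": "e2", "entity_id": "glucose:42", "value": 110}))  # an update
```

Deduplicating on `entity_id` instead would silently discard legitimate updates, which is the mistake the slide warns about.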
  56. 56. Frameworks Best Practice • We recommend doing the first setup manually (using the CLI) • We found Serverless (by Serverless, Inc.) to be the most suitable for Kinesis with Lambda usage (for Web APIs that might not be the case)
  57. 57. Thank you!
