
Get Started with Real-Time Streaming Data in Under 5 Minutes - AWS Online Tech Talks

Learning Objectives:
- Identify common problems that streaming data can help solve
- Understand the AWS services that are used to solve these problems, including Amazon Kinesis
- Try out one of 5+ different solutions powered by Amazon Kinesis through AWS CloudFormation templates

  1. Get Started with Real-Time Streaming Data in Under 5 Minutes. Ryan Nienhuis, Senior Product Manager, AWS. June 2018. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Really, 5 minutes?
  3. Real-Time Insights on AWS Account Activity: https://aws.amazon.com/answers/account-management/real-time-insights-account-activity/
  4. What to Expect from the Session
     • Overview of Real-Time Streaming Analytics
     • Key use cases
     • Kinesis Data Analytics – Solution Accelerators
     • Call to Action
  5. Overview of Real-Time Streaming Analytics
  6. Stream New Data in Seconds: get actionable insights quickly
     • Streaming: ingest video and data as it's generated
     • Process data on the fly
     • Real-time analytics/ML, alerts, actions
  7. Timely Decisions Require New Data in Minutes: data loses value quickly over time. [Figure: "Information half-life in decision-making". Vertical axis: value of data to decision-making. Horizontal axis: real time, seconds, minutes, hours, days, months. Labels: preventive/predictive, actionable, reactive, historical; time-critical decisions versus traditional "batch" business intelligence.] Source: Perishable Insights, Mike Gualtieri, Forrester.
  8. Most Common Uses of Streaming: industrial automation, smart home, smart city, data lakes, IoT analytics, log analytics
  9. Overview: Amazon Kinesis
  10. Streaming with Amazon Kinesis: easily collect, process, and analyze video and data streams in real time
     • Kinesis Video Streams: capture, process, and store video streams
     • Kinesis Data Streams: capture, process, and store data streams
     • Kinesis Data Firehose: load data streams into data stores
     • Kinesis Data Analytics: analyze data streams with SQL
  11. Amazon Kinesis is a Foundational Service Used Across Amazon: Amazon Go video analytics, Amazon.com online catalog, Amazon CloudWatch logs, Amazon S3 events, AWS metering
  12. Amazon Kinesis Data Streaming: collect, process, and analyze data streams in real time
     • Kinesis Data Streams: ingest and store data streams
     • Kinesis Data Analytics: aggregate, filter, and enrich data
     • Kinesis Data Firehose: egress data streams
     • Services shown around the pipeline: AWS Lambda, Amazon Elasticsearch Service, EMR/Spark, Amazon SageMaker, custom code on EC2, Amazon S3, Amazon Redshift, Splunk
     • Real-time, fully managed, scalable, secure, cost-effective
  13. Amazon Kinesis Data Streams Overview
  14. Kinesis Data Firehose: How It Works
     • Stages: ingest, transform, deliver
     • Sources: AWS IoT, Amazon Kinesis Agent, Amazon Kinesis Streams, Amazon CloudWatch Logs, Amazon CloudWatch Events, Apache Kafka
     • Destinations: Amazon S3, Amazon Redshift, Amazon Elasticsearch Service
  15. Amazon Kinesis Data Analytics – How It Works
  16. Customer Examples
     • 50 billion daily ad impressions, sub-50 ms responses
     • Online stylist processing 10 million events/day
     • Facilitate communications between 100+ microservices
     • IoT predictive analytics
     • Analyze billions of network flows in real time
     • Near-real-time home valuation (Zestimates)
     • Live clickstream dashboards refreshed in under 10 seconds
     • 1 billion events per week from connected devices
     • 100 GB/day clickstreams from 250+ sites
     • Real-time game events analytics
  17. Yieldmo: Ad Metrics in Milliseconds. Yieldmo products help marketers and publishers build deeper engagements with their customers.
     • Understand user behavior in real time for billions of ad impressions
     • Use Amazon Kinesis to capture, process, and stream ad-impression data for analytics
     • Analyze ad interactions in milliseconds
     • Real-time metrics on ad performance to advertisers
     • Optimize ad placements
     “Amazon Kinesis makes it simple to scale our solution end to end, including the capture, processing, and delivery of actionable insights. This empowers our customers to better understand their user base.” – Indu Narayan, Director of Data
  18. Yieldmo Architecture
     • Stages: ingest, process, deliver, store, analyze
     • Components: website visitors, Kinesis Data Streams, Kinesis Data Analytics (SQL), Kinesis Data Firehose, Amazon S3, data warehouse
     • Outputs: real-time ad metrics, user profiles, recommendations
  19. Amazon Kinesis Solution Accelerators
  22. Real-Time Insights on AWS Account Activity: https://aws.amazon.com/answers/account-management/real-time-insights-account-activity/
  23. Real-Time IoT Device Monitoring with Kinesis Data Analytics: https://aws.amazon.com/answers/iot/real-time-iot-device-monitoring-with-kinesis/
  24. Real-Time Web Analytics with Kinesis Data Analytics: https://aws.amazon.com/answers/web-applications/real-time-web-analytics-with-kinesis/
  25. Monitor and Analyze VPC Network Traffic (coming soon to the AWS Big Data Blog)
     • Components: Amazon VPC flow logs, Amazon CloudWatch Logs subscription, Amazon Kinesis Data Analytics, Amazon Kinesis Data Firehose, Amazon S3 bucket for data in Parquet/ORC format, Amazon Athena, Amazon CloudWatch dashboards, metrics, and alarms
  26. https://aws.amazon.com/blogs/big-data/analyzing-apache-parquet-optimized-data-using-amazon-kinesis-data-firehose-amazon-athena-and-amazon-redshift/
  28. Getting Started: https://aws.amazon.com/kinesis/
     Links covered in presentation:
     • Real-Time Insights on AWS Account Activity - https://aws.amazon.com/answers/account-management/real-time-insights-account-activity/
     • Real-Time IoT Device Monitoring with Kinesis Data Analytics - https://aws.amazon.com/answers/iot/real-time-iot-device-monitoring-with-kinesis/
     • Real-Time Web Analytics with Kinesis Data Analytics - https://aws.amazon.com/answers/web-applications/real-time-web-analytics-with-kinesis/
     • Monitor and Analyze VPC Network Traffic - https://aws.amazon.com/blogs/big-data/
     • Streaming ingestion and conversion to Parquet - https://aws.amazon.com/blogs/big-data/analyzing-apache-parquet-optimized-data-using-amazon-kinesis-data-firehose-amazon-athena-and-amazon-redshift/
  29. Thank You! Amazon Kinesis: https://aws.amazon.com/kinesis/
  30. APPENDIX
  31. Kinesis Data Streams 3rd Party Connectors
  32. Kinesis Data Streams Producers and Consumers
     • Producers: Kinesis Agent, Apache Kafka, AWS SDK, LOG4J, Flume, Fluentd, AWS Mobile SDK, Kinesis Producer Library
     • Consumers: Get* APIs, Kinesis Client Library + Connector Library, Apache Storm, Amazon EMR, AWS Lambda, Apache Spark
  33. Managed Ability to Capture & Store Data
     • Data streams are made of shards
     • Each shard ingests up to 1 MB/sec and up to 1,000 TPS
     • Each shard emits up to 2 MB/sec
     • All data is stored for 24 hours to 7 days
     • Scale Kinesis data streams by splitting or merging shards
     • Replay data inside the 24-hour to 7-day window
     [Diagram: a Kinesis stream over the last 24 hours (1:00–7:00, 7:00–13:00, 13:00–19:00, 19:00–1:00) with time-based seek back from "now". The stream is split, merged, and split again, so it runs with two shards, then three, then two, then three.]
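
The per-shard limits on this slide imply a simple sizing rule: a stream needs enough shards to satisfy the ingest bandwidth, the ingest record rate, and the egress bandwidth at the same time. The sketch below is only an illustration of that arithmetic; the function name `shards_needed` and its parameters are hypothetical, and the limits are the 2018 figures quoted on the slide.

```python
import math

# Per-shard limits as stated on the slide (2018 figures).
INGEST_MB_PER_SEC = 1.0
INGEST_RECORDS_PER_SEC = 1000
EGRESS_MB_PER_SEC = 2.0


def shards_needed(ingest_mb_per_sec, records_per_sec, egress_mb_per_sec):
    """Return the smallest shard count that satisfies all three limits.

    Hypothetical helper for illustration; not part of any AWS SDK.
    """
    by_ingest = math.ceil(ingest_mb_per_sec / INGEST_MB_PER_SEC)
    by_records = math.ceil(records_per_sec / INGEST_RECORDS_PER_SEC)
    by_egress = math.ceil(egress_mb_per_sec / EGRESS_MB_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_ingest, by_records, by_egress)
```

For example, `shards_needed(3.5, 2500, 7.0)` evaluates to 4, because ingest bandwidth and egress bandwidth each call for four shards while the record rate calls for only three.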
  34. Security and Compliance
     • Supports VPC endpoints powered by AWS PrivateLink
     • Supports server-side encryption and client-side encryption
     • Uses SSL and HTTPS
     • Integrated with AWS Identity and Access Management (IAM)
     • FedRAMP, HIPAA, SOC
  35. Cost-Effective
     • Pay-as-you-go pricing
     • No upfront cost and no minimum fees
     • Based on two dimensions:
       Shard-hour: $0.015
       PUT payload units (25 KB), per million units: $0.014
     • Extended data retention, per shard-hour: $0.020
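
The two pricing dimensions on this slide combine into a rough monthly estimate, sketched below. The prices are the 2018 figures from the slide and may not match current pricing. The function name `monthly_cost`, the 720-hour month, and the omission of extended retention are assumptions made for illustration; the sketch also assumes that each record is billed in 25 KB chunks, rounded up.

```python
import math

# Prices quoted on the slide (2018); treat them as illustrative only.
SHARD_HOUR_USD = 0.015
PUT_UNITS_PER_MILLION_USD = 0.014
PAYLOAD_UNIT_KB = 25


def monthly_cost(shards, records_per_sec, record_kb, hours=720):
    """Estimate the monthly cost in USD for a steady workload.

    Hypothetical helper: assumes a 30-day month by default and ignores
    extended data retention.
    """
    # Each record consumes one PUT payload unit per 25 KB chunk, rounded up.
    units_per_record = math.ceil(record_kb / PAYLOAD_UNIT_KB)
    total_units = records_per_sec * units_per_record * hours * 3600
    shard_cost = shards * hours * SHARD_HOUR_USD
    put_cost = total_units / 1_000_000 * PUT_UNITS_PER_MILLION_USD
    return shard_cost + put_cost
```

Under these assumptions, four shards receiving 1,000 records/sec of 3 KB each come to roughly $79.49 for a 720-hour month: $43.20 in shard-hours plus about $36.29 in PUT payload units.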
  36. Kinesis Data Streams vs. Apache Kafka
     Attribute | Kafka | Kinesis Streams
     Cost | $$ | $ (pay for what you use)
     Ease of use | Advanced setup required | Get started in minutes
     Management overhead | High | Low
     Scalability | Difficult to scale | Scale in seconds with one click
     Throughput | Infinite | Scales with shards, supports up to 1 MB payloads
     Durability | Configurable | 3x by default
     Infrastructure | You manage | AWS manages
     Write-to-read latency | <100 ms is achievable | 100–200 ms
     Open sourced? | Yes | No
  37. Using Kafka with Kinesis Data Streams: download the Kafka-Kinesis Connector Library
     • Components: Kafka cluster, Kafka-Kinesis connector, Kinesis Data Firehose (transformation of incoming data), S3 bucket (archived/original data), S3 bucket (transformed data), Redshift (data warehousing), Elasticsearch, Athena, EC2 (custom app), EMR (Spark, Pig, Hive, etc.)
  38. Amazon Kinesis Data Analytics – How It Works
  39. Kinesis Data Analytics Applications
     • Connect to streaming source
     • Easily write SQL code to process streaming data
     • Continuously deliver SQL results
  40. Connect to Streaming Data Sources
     • Easily connect to Kinesis data streams and Kinesis Data Firehose delivery streams
     • Automatic schema discovery, which works for CSV and JSON data
     • Supports multiple event types, arbitrary object nesting, and a single level of array nesting
  41. Pre-process Data Streams Using Schema Editor: the schema editor provides fine-grained control of mapping to SQL columns
  42. Pre-process Data Streams Using AWS Lambda
     [Diagram: inside the Amazon Kinesis Data Analytics application, raw data from the source passes through an AWS Lambda function, and the transformed data flows to the SQL code and on to the destination.]
     Built-in AWS Lambda integration provides flexible pre-processing ahead of SQL code for:
     • Normalizing 10s to 100s of different event types
     • Converting other data formats (Avro, Protobuf, ZIP) to JSON and CSV
     • Custom enrichment from database tables or API calls
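
A pre-processing function of the kind this slide describes could look roughly like the sketch below. It assumes the record contract used for Kinesis Data Analytics Lambda pre-processing (each input record carries a `recordId` and base64-encoded `data`, and each output record returns the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and base64-encoded `data`); check the service documentation before relying on it. The input shape (a nested `device` object and a `temperature` field) is entirely hypothetical.

```python
import base64
import json


def handler(event, context):
    """Normalize each incoming record to flat JSON before the SQL code sees it."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical normalization: flatten a nested "device" object
            # and coerce the temperature to a number.
            flat = {
                "device_id": payload["device"]["id"],
                "temperature": float(payload["temperature"]),
            }
            data = base64.b64encode(json.dumps(flat).encode("utf-8")).decode("utf-8")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (KeyError, TypeError, ValueError):
            # Malformed input: hand the original bytes back, flagged as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Returning every `recordId` exactly once, whether the record succeeded or failed, keeps the function's output aligned with its input batch.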
  43. Easily Write SQL Code to Process Data Streams
     • Sub-second end-to-end processing latencies
     • SQL steps can be chained together in serial or parallel steps
     • Build applications with one or hundreds of queries
     • Pre-built functions include everything from sum and count distinct to machine learning algorithms
     • Aggregations run continuously using window operators
  44. Interactive SQL Editor: fast, iterative development with SQL templates in the console to get started
  45. Writing Streaming SQL: Streams (in-memory tables)
         CREATE STREAM calls_per_ip_stream(
             eventTimeStamp  TIMESTAMP,
             computationType VARCHAR(256),
             category        VARCHAR(1024),
             subCategory     VARCHAR(1024),
             unit            VARCHAR(256),
             unitValue       BIGINT
         );
  46. Writing Streaming SQL: Pumps (continuous query)
         CREATE OR REPLACE PUMP calls_per_ip_pump AS
             INSERT INTO calls_per_ip_stream
             SELECT STREAM "eventTimestamp", COUNT(*), "sourceIPAddress"
             FROM source_sql_stream_001 ctrail
             GROUP BY "sourceIPAddress",
                 STEP(ctrail.ROWTIME BY INTERVAL '1' MINUTE),
                 STEP(ctrail."eventTimestamp" BY INTERVAL '1' MINUTE);
  47. Aggregating Streaming Data?
     • Aggregations (count, sum, min,…) take granular real-time data and turn it into insights
     • Data is continuously processed, so you need to tell the application when you want results
     • Windows!
  48. Window Types
     • Sliding, tumbling, and custom windows
     • Tumbling windows are fixed size and grouped keys do not overlap
     [Diagram: source time axis from t0 to t15.]
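
To make the tumbling-window idea concrete, the sketch below counts events per key in fixed, non-overlapping windows. It is a plain Python simulation over an in-memory list, not Kinesis Data Analytics code; the function name `tumbling_counts` and the `(epoch_seconds, key)` event shape are assumptions. Rounding each timestamp down to a multiple of the window size mirrors what `STEP(... BY INTERVAL '1' MINUTE)` does in the streaming SQL.

```python
from collections import Counter


def tumbling_counts(events, window_seconds=60):
    """Count events per (window start, key) in fixed, non-overlapping windows.

    `events` is an iterable of (epoch_seconds, key) pairs.
    """
    counts = Counter()
    for ts, key in events:
        # Round the timestamp down to the start of its window.
        window_start = ts - (ts % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)
```

Because each timestamp maps to exactly one window start, no event is counted in two windows, which is the property that distinguishes tumbling windows from sliding ones.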
  49. Enrich Your Data Stream Using Amazon S3 Data
     • Add a SQL table to your streaming application from Amazon S3
     • Periodically update the table by calling the update application API
     [Diagram: inside the Amazon Kinesis Data Analytics application, SQL code joins an in-application table loaded from Amazon S3 with the in-application stream from the streaming source and writes to the destination.]
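
The join between the in-application table and the stream behaves much like a lookup against a small reference dataset. The sketch below imitates that in plain Python with a dictionary standing in for the table loaded from Amazon S3; the function name `enrich` and the record fields are hypothetical, and in the service itself the join is written in SQL.

```python
def enrich(stream, reference, key):
    """Yield each stream record merged with its matching reference row, if any.

    `reference` maps a key value to a dict of extra columns. Records with no
    match pass through unchanged, like a left outer join.
    """
    for record in stream:
        row = reference.get(record.get(key), {})
        # Stream fields win if a column name appears on both sides.
        yield {**row, **record}
```

Refreshing `reference` between batches plays the role of the periodic table update mentioned on the slide.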
  50. Fully Managed and Elastic
     • Once your input, code, and output are set up, you call a run application API
     • The service automatically scales the application, without servers, based on throughput and query complexity
     • For customers with >10 MB/sec throughput, fine-grained parallelism control is provided
  51. Automated Machine Learning Capabilities
     • Anomaly detection
     • Anomaly detection with explanations
     • Hotspot detection
     • Unsupervised, auto, online, real-time, adaptive
  52. Example Usage Pattern 1: Web Analytics and Leaderboards
     • Steps: ingest web app data, compute top 10 users, persist to feed live apps
     • Components: Amazon Cognito, lightweight JS client code, web server on Amazon EC2 or Amazon DynamoDB table, Kinesis Data Streams, Kinesis Data Analytics (SQL), AWS Lambda
  53. Example Usage Pattern 2: Monitoring IoT Devices
     • Steps: ingest sensor data, compute average temperature every 10 sec, persist time-series data aggregations
     • Components: IoT sensors, AWS IoT, Kinesis Data Streams, Kinesis Data Analytics (SQL), AWS Lambda, Amazon RDS MySQL DB instance, Amazon CloudWatch
  54. Example Usage Pattern 3: Analyzing AWS CloudTrail Event Logs
     • Steps: ingest raw log data, compute operational metrics, deliver to a real-time dashboard and archival
     • Components: AWS CloudTrail, Amazon CloudWatch Events trigger, Kinesis Data Firehose, Amazon S3 bucket for raw data, Kinesis Data Analytics (SQL), AWS Lambda, Amazon DynamoDB table(s), Chart.JS dashboard
  55. Appendix
  56. Zillow's Near-Real-Time Home-Value Estimates. Zillow provides online home information to tens of millions of buyers and sellers every day.
     • Needed to provide timely home valuations for all new homes
     • Runs Zestimate, its machine learning-based home-valuation tool, on AWS
     • Performs machine-learning jobs in hours instead of a day
     • Gives customers more accurate data on more than 100 million homes
     • Scales storage and compute capacity on demand
     “We can compute Zestimates in seconds, as opposed to hours, by using Amazon Kinesis Streams and Spark on Amazon EMR.” – Jasjeet Thind, Vice President of Data Science and Engineering
  57. Amazon Kinesis Ecosystem: Connectors
     • Kafka, Log4J, Flume, Fluentd, Attunity, Informatica, IoT platforms, Kinesis Agent, MemSQL, Qubole, Anodot, Spark, Flink, Storm
