AWS Real-Time Event Processing

Real-time event processing monitors an incoming data stream and initiates action when it detects events such as fraud, errors, or performance degradation. These events are often used to issue alerts and notifications, trigger a response, or populate a monitoring dashboard. In this session, we walk through different use cases for event processing and demonstrate how to build a scalable pipeline for tracking IoT device status. AWS services covered include AWS Lambda and the Kinesis Client Library (KCL).

Published in: Technology

  1. April 21, 2015 Seattle AWS Big Data Platform
  2. Agenda Overview: 10:00 AM Registration; 10:30 AM Introduction to Big Data @ AWS; 12:00 PM Lunch + Registration for Technical Sessions; 12:30 PM Use Case Technical Deep Dive Sessions • Data Collection and Storage • Real-time Event Processing • Analytics
  3. Collect Process Analyze Store Data Collection and Storage Data Processing Data Analysis Event Processing Primitive Patterns S3 Kinesis DynamoDB RDS (Aurora) MySQL AWS Lambda KCL Apps EMR Redshift Machine Learning
  4. Real-Time Event Processing • Examples:
  5. Processing framework
  6. Two main processing patterns
  7. Real-time event processing frameworks Kinesis Client Library AWS Lambda
  8. Amazon KCL
  9. Shard 1 Shard 2 Shard 3 Shard n Shard 4 KCL Worker 1 KCL Worker 2 EC2 Instance KCL Worker 3 KCL Worker 4 EC2 Instance KCL Worker n EC2 Instance Kinesis Kinesis Client Library (KCL)
  10. KCL Design Components: KCL restarts the processing of the shard at the last known processed record if a worker fails
  11. Processing with Kinesis Client Library • Connects to the stream and enumerates the shards • Instantiates a record processor for every shard it manages • Checkpoints processed records in Amazon DynamoDB • Balances shard-worker associations when the worker instance count changes • Balances shard-worker associations when shards are split or merged
  12. Best practices for KCL applications
  13. Amazon Kinesis Connector S3 DynamoDB Redshift Kinesis
  14. Amazon Kinesis connector application Connector Pipeline Transformed Filtered Buffered Emitted Incoming Records Outgoing to Endpoints
  15. Real-time Monitoring dashboard with KCL Amazon Kinesis Kinesis-enabled Application Producer on Amazon EC2 Amazon DynamoDB Dashboard on Amazon EC2 2 sec sliding-window analysis over streaming clickstream data
  16. Monitoring Demo Kinesis Client Library
  17. AWS Lambda
  18. Event-Driven Compute in the Cloud
  19. No Infrastructure to Manage
  20. Automatic Scaling
  21. Bring your own code
  22. Fine-grained pricing Free Tier 1M requests and 400,000 GB-s of compute. Every month, every customer. Never pay for idle.
  23. Data Triggers: Amazon S3 Amazon S3 Bucket Events AWS Lambda Original image Thumbnailed image 1 2 3
  24. Data Triggers: Amazon DynamoDB AWS Lambda Amazon DynamoDB Table and Stream Send Amazon SNS notifications Update another table
  25. Calling Lambda Functions
  26. Writing Lambda Functions
  27. How can you use these features? “I want to send customized messages to different users” SNS + Lambda “I want to send an offer when a user runs out of lives in my game” Amazon Cognito + Lambda + SNS “I want to transform the records in a click stream or an IoT data stream” Amazon Kinesis + Lambda
  28. Real-Time Alerting Demo AWS Lambda
  29. Stream Processing Apache Spark Apache Storm Amazon EMR
  30. Read Data Directly into Hive, Pig, Streaming and Cascading Real time sources into Batch Oriented Systems Multi-Application Support & Check-pointing Amazon EMR integration
  31. Amazon EMR integration: Hive

      CREATE TABLE call_data_records (
        start_time bigint,
        end_time bigint,
        phone_number STRING,
        carrier STRING,
        recorded_duration bigint,
        calculated_duration bigint,
        lat double,
        long double
      )
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ","
      STORED BY
      'com.amazon.emr.kinesis.hive.KinesisStorageHandler'
      TBLPROPERTIES("kinesis.stream.name"="MyTestStream");
  32. DStream RDD@T1 RDD@T2 Messages Receiver Spark Streaming – Basic concepts http://spark.apache.org/docs/latest/streaming-kinesis-integration.html
  33. Spark Streaming
  34. Processing Amazon Kinesis streams Amazon Kinesis Spark Streaming
  35. Weblog Demo Kinesis + Spark Streaming
  36. Storm
  37. Apache Storm: Basic Concepts https://github.com/awslabs/kinesis-storm-spout
  38. Launches Workers Storm architecture Master Node Cluster Coordination Worker Processes Worker Nimbus Zookeeper Zookeeper Zookeeper Supervisor Supervisor Supervisor Supervisor Worker Worker Worker
  39. Real-time: Event-based processing Kinesis Storm Spout Producer Amazon Kinesis Apache Storm ElastiCache (Redis) Node.js Client (D3) http://blogs.aws.amazon.com/bigdata/post/Tx36LYSCY2R0A9B/Implement-a-Real-time-Sliding-Window-Application-Using-Amazon-Kinesis-and-Apache
  40. Thank You
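
Slides 10 and 11 say that KCL runs one record processor per shard, checkpoints processed records in Amazon DynamoDB, and restarts a shard at the last known processed record when a worker fails. The following is a minimal Python sketch of that checkpoint-and-resume idea only. It does not use the real KCL API: `CheckpointStore` (an in-memory stand-in for the DynamoDB table), `RecordProcessor`, `checkpoint_every`, and `fail_after` are hypothetical names chosen for illustration, and list positions stand in for Kinesis sequence numbers.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointStore:
    """In-memory stand-in (assumption) for the DynamoDB checkpoint table."""
    checkpoints: dict = field(default_factory=dict)

    def save(self, shard_id: str, position: int) -> None:
        self.checkpoints[shard_id] = position

    def load(self, shard_id: str) -> int:
        # -1 means no record of this shard has been checkpointed yet.
        return self.checkpoints.get(shard_id, -1)


class RecordProcessor:
    """Hypothetical per-shard processor: one instance per managed shard."""

    def __init__(self, shard_id: str, store: CheckpointStore, checkpoint_every: int = 2):
        self.shard_id = shard_id
        self.store = store
        self.checkpoint_every = checkpoint_every
        self.processed = []

    def run(self, shard_records, fail_after=None):
        """Process records after the last checkpoint; optionally simulate a crash."""
        start = self.store.load(self.shard_id) + 1
        for count, position in enumerate(range(start, len(shard_records)), start=1):
            if fail_after is not None and count > fail_after:
                raise RuntimeError("simulated worker failure")
            self.processed.append(shard_records[position])
            # Checkpoint periodically rather than after every record.
            if count % self.checkpoint_every == 0:
                self.store.save(self.shard_id, position)
        # Checkpoint the end of the batch if anything was processed.
        if len(shard_records) > start:
            self.store.save(self.shard_id, len(shard_records) - 1)
```

If a first processor fails after handling three of five records with `checkpoint_every=2`, a replacement processor built on the same store resumes from the record after the last checkpoint, so the third record is handled twice. Under this sketch's assumptions, the downstream logic therefore has to tolerate duplicate records.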
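
Slide 14 shows the connector pipeline as four stages: incoming records are transformed, filtered, buffered, and emitted to an endpoint. The sketch below illustrates that staging in plain Python. It is not the Amazon Kinesis Connector Library API: the function names, the JSON record shape with a `status` field, and the `max_buffered` threshold are all assumptions made for the example.

```python
import json


def transform(raw: bytes) -> dict:
    """Stage 1: turn a raw record into a structured one (JSON is an assumption)."""
    return json.loads(raw)


def keep(record: dict) -> bool:
    """Stage 2: drop records that need no action (hypothetical 'status' field)."""
    return record.get("status") != "ok"


def run_pipeline(raw_records, emit, max_buffered: int = 2) -> None:
    """Stages 3 and 4: buffer surviving records and emit them in batches."""
    buffer = []
    for raw in raw_records:
        record = transform(raw)
        if not keep(record):
            continue
        buffer.append(record)
        if len(buffer) >= max_buffered:
            emit(list(buffer))  # hand a copy to the endpoint
            buffer.clear()
    if buffer:
        emit(list(buffer))  # flush whatever is left at the end
```

Here `emit` can be any callable, for example a function that writes a batch to S3, DynamoDB, or Redshift. Buffering before emitting trades a little latency for fewer, larger writes to the endpoint.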
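
Slide 15 mentions a 2-second sliding-window analysis over streaming clickstream data. A minimal sketch of one way to keep such a window is shown below. The class name `SlidingWindowCounter` is hypothetical, timestamps are plain floats in seconds, and the sketch assumes events arrive in timestamp order, which a real stream does not guarantee.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events whose timestamps fall within the last `window_seconds`."""

    def __init__(self, window_seconds: float = 2.0):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def add(self, timestamp: float) -> int:
        """Record one event and return the number of events now in the window."""
        self.timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Evict events that are at or before the cutoff.
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)
```

A dashboard could poll the returned count per page or per referrer; keeping one counter per key is left out to keep the example short.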
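
Slide 27 pairs Amazon Kinesis with AWS Lambda for transforming records in a click stream or an IoT data stream. The sketch below shows what a Python handler for that case might look like. It relies on the event shape Lambda passes for Kinesis sources, where each entry in `Records` carries a base64-encoded payload under `kinesis.data`. The JSON payload, the `temperature` field, the threshold of 80, and the `alert` flag are assumptions for illustration and do not come from the slides.

```python
import base64
import json

TEMPERATURE_LIMIT = 80  # hypothetical alert threshold


def transform(payload: dict) -> dict:
    """Add an 'alert' flag to a device reading (field names are assumptions)."""
    enriched = dict(payload)
    enriched["alert"] = payload.get("temperature", 0) > TEMPERATURE_LIMIT
    return enriched


def handler(event, context):
    """Lambda entry point: decode each Kinesis record and transform it."""
    results = []
    for record in event["Records"]:
        raw = base64.b64decode(record["kinesis"]["data"])
        results.append(transform(json.loads(raw)))
    return results
```

In a real function the transformed records would typically be written to another stream or table, or used to publish a notification, rather than returned. The handler can be exercised locally by building an event dictionary with base64-encoded JSON payloads and passing `None` as the context.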