Streaming Data Processing with Amazon Kinesis

Realtor.com enables realtors to connect with home buyers. With AWS services, Realtor.com built a new advertising solution that allows realtors to launch marketing campaigns in real-time and scales to hundreds of millions of ad impressions every day. In this session, learn how Realtor.com architected their solution using Amazon Kinesis Streams, Amazon Kinesis Firehose, AWS Lambda, and Amazon Redshift to track native ad impressions on their site and mobile app. Realtor.com will share lessons learned and tips for getting the most out of streaming data services on AWS. We will also provide an overview of how to get started with real-time, streaming data using Amazon Kinesis services.

Published in: Technology


  1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. July 13, 2016. Streaming Data Processing with Amazon Kinesis. Alan Lewis, Principal Architect, Realtor.com; Ray Zhu, Sr. Product Manager, AWS
  2. What to expect from this session. Amazon Kinesis: getting started with streaming data on AWS • Streaming scenarios • Amazon Kinesis Streams overview • Amazon Kinesis Firehose overview • Firehose getting started experience • Amazon Kinesis at Realtor.com
  3. Need to go a bit faster
  4. Streaming data scenarios across segments
     Scenarios: (1) Accelerated Ingest-Transform-Load, (2) Continual Metrics Generation, (3) Responsive Data Analysis
     Data types: IT logs, application logs, social media / clickstreams, sensor or device data, market data
     • Ad/Marketing Tech: (1) publisher, bidder data aggregation; (2) advertising metrics like coverage, yield, conversion; (3) analytics on user engagement with ads, optimized bid / buy engines
     • IoT: (1) sensor, device telemetry data ingestion; (2) IT operational metrics dashboards; (3) sensor operational intelligence, alerts, and notifications
     • Gaming: (1) online customer engagement data aggregation; (2) consumer engagement metrics for level success, transition rates, CTR; (3) clickstream analytics, leaderboard generation, player-skill match engines
     • Consumer Engagement: (1) online customer engagement data aggregation; (2) consumer engagement metrics like page views, CTR; (3) clickstream analytics, recommendation engines
  5. Amazon Kinesis services make it easy to capture, deliver, and process streams on AWS • Amazon Kinesis Streams: stores data as a continuous, replayable stream for custom applications • Amazon Kinesis Firehose: loads streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service • Amazon Kinesis Analytics (in preview): analyzes data streams using standard SQL queries
  6. Amazon Kinesis Streams
  7. Amazon Kinesis Streams: store data as a continuous stream • Easy administration: Create a new stream and set the desired level of capacity with shards. Scale to match your data throughput rate and volume. • Build real-time applications: Perform continual processing on streaming big data using the Amazon Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more. • Low cost: Cost-efficient for workloads of any scale.
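As a hedged illustration of writing to a stream, the sketch below sends one JSON event with the `put_record` call of the AWS SDK for Python (boto3) Kinesis client. The helper name `put_event`, the stream name `impressions`, and the stand-in `RecordingClient` are assumptions for illustration and do not come from the talk. The client is passed in, so the function can be tried without AWS credentials.

```python
import json


def put_event(kinesis_client, stream_name, event, partition_key):
    """Send one JSON event to a Kinesis stream.

    kinesis_client is expected to behave like boto3.client("kinesis").
    The partition key determines which shard receives the record.
    """
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,
    )


class RecordingClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def put_record(self, **kwargs):
        self.calls.append(kwargs)
        return {"ShardId": "shardId-000000000000", "SequenceNumber": "0"}


if __name__ == "__main__":
    client = RecordingClient()  # swap for boto3.client("kinesis") to hit a real stream
    put_event(client, "impressions", {"event_type": "example"}, "example-key")
    print(client.calls[0]["StreamName"])
```

Records that share a partition key land on the same shard, so a key with many distinct values spreads load more evenly across shards.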
  8. Amazon Kinesis Firehose
  9. Amazon Kinesis Firehose: load massive volumes of streaming data into destinations • Zero administration: Capture and deliver streaming data into Amazon S3, Amazon Redshift, and other destinations without writing an application or managing infrastructure. • Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 seconds using simple configurations. • Seamless elasticity: Scale to match data throughput without intervention. Flow: (1) capture and submit streaming data to Firehose; (2) Firehose loads streaming data continuously into Amazon S3 and Amazon Redshift; (3) analyze streaming data using your favorite BI tools
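A minimal sketch of the "capture and submit" step, assuming the boto3 Firehose client and its `put_record` call. The helper names and the delivery stream name are hypothetical. Firehose delivers record bytes as received, so this sketch appends a newline to keep one JSON object per line in the destination.

```python
import json


def to_firehose_record(event):
    """Serialize an event as one newline-terminated JSON line."""
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}


def send_to_firehose(firehose_client, delivery_stream_name, event):
    """Submit one event to a delivery stream.

    firehose_client is expected to behave like boto3.client("firehose").
    """
    return firehose_client.put_record(
        DeliveryStreamName=delivery_stream_name,
        Record=to_firehose_record(event),
    )
```

Passing the client in keeps the helper easy to exercise with a stub before pointing it at a real delivery stream.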
  10. Amazon Kinesis Firehose customer experience
  11. Amazon Kinesis Firehose console experience: unified console experience for Firehose and Streams
  12. Amazon Kinesis Firehose console (Amazon S3): create fully managed resources for delivery without building an app
  13. Amazon Kinesis Firehose console (Amazon S3): configure data delivery options simply using the console
  14. Amazon Kinesis Firehose console (Amazon Redshift): configure data delivery to Amazon Redshift simply using the console
  15. Amazon Kinesis Firehose console (Amazon ES): configure data delivery to Amazon ES simply using the console
  16. Amazon Kinesis Firehose monitoring: visibility into and transparency of data delivery
  17. Amazon Kinesis Firehose monitoring: error logging for troubleshooting delivery failures
  18. Amazon Kinesis Firehose pricing: simple, pay-as-you-go, and no upfront costs. Dimension: per 1 GB of data ingested. Value: $0.035
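To make the per-GB figure concrete, here is a rough arithmetic sketch. The price is the one shown on the slide (2016) and may not match current pricing. The daily volume and the 30-day month are hypothetical, and the estimate ignores any billing rounding rules and destination costs such as Amazon S3 or Amazon Redshift.

```python
PRICE_PER_GB_USD = 0.035  # figure from the slide; check current pricing before relying on it


def monthly_ingest_cost(gb_per_day, days=30, price_per_gb=PRICE_PER_GB_USD):
    """Estimate the monthly ingestion charge in USD, rounded to cents."""
    return round(gb_per_day * days * price_per_gb, 2)


if __name__ == "__main__":
    # Hypothetical workload: 100 GB ingested per day for 30 days.
    print(monthly_ingest_cost(100))
```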
  19. Kinesis at Realtor.com
  20. What I’d like you to take away. Amazon Kinesis is: • Simple, reliable, and offers high performance • A transformative building block with broad applicability • An enabler for “real time everywhere”
  21. About Realtor.com • First national US real estate search site • Most accurate real estate content • Gets data from 99% of MLSs • 55 million unique users in April
  22. Realtor.com cloud strategy • Going “all in” on cloud, most on AWS • About ½ done: BI, search, geo services, and photos are all in AWS now • Strong bias towards AWS managed services
  23. Customer problem • My listings get lots of traffic at the start, but less over time • I only want people searching for relevant listings • I want to get more brand exposure in search
  24. Solution: “Turbo listings” product • Native ad product that gives customers more exposure in search • Placements are 100% relevant and look like any other listing • Shows the agent profile photo in search
  25. Turbo technical requirements • Extreme availability and throughput • Multiple systems, both inside and outside VPCs (and inside/outside AWS) • Auditable, secure billing database
  26. Why Kinesis? • Great performance • Multiproducer, multiconsumer queues • Worry-free managed service
  27. Turbo architecture (diagram). Components: AWS; mobile native apps; Campaign Manager with Create Campaign, Update Campaign, Delete Campaign, and Decrement impressions APIs. Decision checks: Campaign expired? Count reached zero? (branches labeled True/False)
  28. Impression data
      {
        "campaign_id": "01d329aa-9eb2-426c-9b7b-4877a32fb176",
        "id": "a34f271f-058d-47ba-9d45-8140261742a0",
        "listing_id": 593893632,
        "property_id": 1258201259,
        "advertiser_id": "8675309",
        "event_type": "turbo_search_impression",
        "producer": "fesl",
        "client_source": "rdc_web",
        "client_version": "8.0",
        "page_variation": "list_view",
        "timestamp": "2016-03-02T00:47:25+00:00",
        "user_agent": "..."
      }
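A hedged sketch of how an event shaped like the one on this slide might be checked before further processing. The field names come from the slide. Which fields are required, the UUID check, and the function name `validate_impression` are assumptions, not the rules Realtor.com applies.

```python
import json
import uuid
from datetime import datetime

# Assumed set of mandatory fields; the talk does not say which ones are required.
REQUIRED_FIELDS = ("campaign_id", "id", "listing_id", "event_type", "timestamp")


def validate_impression(raw):
    """Return a list of problems found in a JSON impression event (empty if none)."""
    try:
        event = json.loads(raw)
    except ValueError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(event, dict):
        return ["event is not a JSON object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in event]

    # campaign_id and id look like UUIDs in the sample event.
    for name in ("campaign_id", "id"):
        if name in event:
            try:
                uuid.UUID(str(event[name]))
            except ValueError:
                problems.append(f"{name} is not a UUID")

    # The sample timestamp is ISO 8601 with a UTC offset.
    if "timestamp" in event:
        try:
            datetime.fromisoformat(str(event["timestamp"]))
        except ValueError:
            problems.append("timestamp is not ISO 8601")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log everything wrong with a rejected event in a single pass.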
  29. Impression tracking flow (diagram). Components: Amazon EC2, Amazon Kinesis Streams, AWS Lambda, Campaign manager, Amazon RDS. Step labels: Pull events; Post to web service; Decrement in DB
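The "pull events" and "post to web service" steps can be sketched as a Lambda handler that decodes a batch of Kinesis records. The event layout (`Records[].kinesis.data`, base64-encoded) is the standard shape Lambda passes for Kinesis event sources. Aggregating counts per campaign and the `post_decrement` callback are assumptions for illustration, not the implementation described in the talk.

```python
import base64
import json


def decode_kinesis_records(event):
    """Yield the JSON payload of each record in a Lambda Kinesis event."""
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        yield json.loads(payload)


def count_impressions_by_campaign(event):
    """Tally impressions per campaign_id across the batch."""
    counts = {}
    for impression in decode_kinesis_records(event):
        campaign_id = impression["campaign_id"]
        counts[campaign_id] = counts.get(campaign_id, 0) + 1
    return counts


def handler(event, context, post_decrement=print):
    """Lambda entry point.

    post_decrement is a hypothetical stand-in for the call to the campaign
    manager web service; by default it only prints the values.
    """
    counts = count_impressions_by_campaign(event)
    for campaign_id, n in counts.items():
        post_decrement(campaign_id, n)
    return counts
```

Sending one decrement per campaign per batch, rather than one per impression, is one way to keep the number of web service calls low.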
  30. Billing flow (diagram). Components: Amazon Kinesis Streams, AWS Lambda (three functions), Amazon Kinesis Firehose, Amazon S3, AWS KMS, Amazon Redshift, Amazon DynamoDB, private subnet. Labels: Event source; Event data in JSON; Validate event; Firehose PutRecord; Firehose destination; SSE-KMS encryption on Amazon S3; Amazon S3 notification; COPY command; KMS encryption on Amazon Redshift; Data transfer in JSON; Status tracking
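The "Amazon S3 notification" and "COPY command" labels suggest a function that reacts to a new S3 object and loads it into Amazon Redshift. The sketch below only builds the SQL text. The table name, the use of `IAM_ROLE` for authorization, and `FORMAT AS JSON 'auto'` are assumptions; the talk does not show the actual statement. Inputs are interpolated directly, so they are assumed to come from trusted configuration and S3 events.

```python
from urllib.parse import unquote_plus


def copy_targets_from_s3_event(s3_event):
    """Extract (bucket, key) pairs from an S3 notification event.

    Object keys arrive URL-encoded in S3 notifications, so they are decoded here.
    """
    return [
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in s3_event.get("Records", [])
    ]


def build_copy_statement(table, bucket, key, iam_role_arn):
    """Build a Redshift COPY statement that loads JSON data from one S3 object."""
    return (
        f"COPY {table} "
        f"FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS JSON 'auto';"
    )
```

Executing the statement needs a database connection from inside the network where the cluster is reachable, which is left out here.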
  31. Redshift: 15-minute batches
  32. Outcomes: huge scale • Serving millions of impressions per day on 2 Kinesis shards • Tested up to 20x current site traffic • Basically, we couldn’t break it
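A back-of-the-envelope check on why two shards can go a long way, using the documented per-shard write limits of Kinesis Streams (1,000 records per second and 1 MB per second). The traffic numbers in the example are hypothetical, and the estimate assumes records are spread evenly across shards.

```python
import math

RECORDS_PER_SHARD_PER_SEC = 1000
BYTES_PER_SHARD_PER_SEC = 1_000_000  # 1 MB treated as 10**6 bytes for a rough estimate
SECONDS_PER_DAY = 86_400


def shards_needed(records_per_sec, avg_record_bytes):
    """Estimate the shard count needed to absorb a steady write rate."""
    by_count = records_per_sec / RECORDS_PER_SHARD_PER_SEC
    by_bytes = records_per_sec * avg_record_bytes / BYTES_PER_SHARD_PER_SEC
    return max(1, math.ceil(max(by_count, by_bytes)))


def daily_record_ceiling(shards):
    """Upper bound on records per day if every shard is written at its record limit."""
    return shards * RECORDS_PER_SHARD_PER_SEC * SECONDS_PER_DAY


if __name__ == "__main__":
    print(daily_record_ceiling(2))   # 172800000
    print(shards_needed(1500, 500))  # 2
```

Real traffic is bursty, so sizing against peak per-second rates rather than daily averages is the safer reading of these numbers.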
  33. Outcomes: great performance • Latencies in single or low double-digit milliseconds • Events are processed in small batches for efficiency • For our purposes, Kinesis gives us real-time data streaming
  34. Lessons learned • Complexity with Amazon Redshift and private subnets • Must consider what dedupe behavior you need • A simple key-value JSON data structure pays dividends
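On the dedupe point: a stream consumer can see the same record more than once, for example after a producer or consumer retry. One possible approach, sketched below, is to filter on the event `id` field. The class name, the in-memory store, and the size cap are assumptions for illustration; a production consumer would likely need shared, durable state, and the talk does not say which approach Realtor.com chose.

```python
from collections import OrderedDict


class RecentIdFilter:
    """Remember the most recently seen event ids and report repeats."""

    def __init__(self, max_ids=100_000):
        self.max_ids = max_ids
        self._seen = OrderedDict()

    def is_new(self, event_id):
        """Return True the first time an id is seen, False for a recent repeat."""
        if event_id in self._seen:
            self._seen.move_to_end(event_id)  # mark as recently seen
            return False
        self._seen[event_id] = True
        if len(self._seen) > self.max_ids:
            self._seen.popitem(last=False)  # evict the oldest id
        return True


def drop_duplicates(events, id_filter):
    """Keep only events whose id has not been seen recently."""
    return [e for e in events if id_filter.is_new(e["id"])]
```

Because old ids are evicted, a duplicate that arrives after its id has left the window would pass through, so the cap has to be sized against how late a retry can arrive.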
  35. Future: real-time pipeline • Real time is the pinnacle • Collect data on page 1, and act on page 2 • What we’ve built on Kinesis with the turbo feature is the starting point for us. Photo by @snordq on Flickr. Creative Commons License
  36. What I’d like you to take away. Amazon Kinesis is: • Simple, reliable, and offers high performance • A transformative building block with broad applicability • An enabler for “real time everywhere”
  37. One final thing… Hiring! • Search for “realtor.com careers” (careers.move.com) • Software engineers, QA engineers, data scientists, product managers, and project managers • In Santa Clara; Ventura County; Vancouver, Canada; and Morgantown, WV • Thank you: Eddy Luten, Viren Nagtode, and Sonal Shirke
  38. Thank you!
