
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS


What if you were told that within three months, you had to scale your existing platform from 1,000 req/sec (requests per second) to handle 300,000 req/sec with an average latency of 25 milliseconds? And that you had to accomplish this with a tight budget, expand globally, and keep the project confidential until officially announced by well-known global mobile device manufacturers? That’s exactly what happened to us. This session explains how The Weather Company partnered with AWS to scale our data distribution platform to prepare for unpredictable global demand. We cover the many challenges that we faced as we worked on architecture design, technology and tools selection, load testing, deployment and monitoring, and how we solved these challenges using AWS.
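As a quick sanity check on the figures above, the following sketch relates the 25 billion requests per day in the title to the 300,000 req/sec target. It assumes traffic spread evenly over a day, which real traffic is not; the function and variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate(requests_per_day: float) -> float:
    """Average requests per second implied by a daily request count."""
    return requests_per_day / SECONDS_PER_DAY


daily_target = 25e9    # 25 billion requests per day (session title)
peak_target = 300_000  # target req/sec (abstract)
baseline = 1_000       # starting req/sec (abstract)

avg = average_rate(daily_target)
print(f"average rate: {avg:,.0f} req/sec")                    # average rate: 289,352 req/sec
print(f"scale-up factor: {peak_target / baseline:.0f}x")      # scale-up factor: 300x
print(f"300k sustained: {peak_target * SECONDS_PER_DAY / 1e9:.2f} billion/day")  # 25.92 billion/day
```

Under that even-load assumption, 25 billion requests per day averages just under 290,000 req/sec, so the 300,000 req/sec target corresponds to roughly the stated daily volume and a 300x increase over the starting point.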

Published in: Technology


  1. 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Jagmeet Chawla, Chief Architect, The Weather Channel Raul Frias, Solutions Architect, AWS October 2015 Scaling to 25 Billion Daily Requests Within 3 Months Building a Global Big Data Distribution Platform ARC346
  2. 2. What to Expect from the Session Building a Big Data Distribution Platform: - Goals - Architecture - Logical and Physical Components - Data Supply Chain, from Ingest to Distribution - Journey - Building, Tuning and Scaling the Platform - AWS Insights - Evolution of the Architecture Audience: - Engineering Leaders - Architects
  3. 3. Video Introduction
  4. 4. Video Conclusion
  5. 5. Background: The Weather Company We power weather for Apple, Facebook, Google, Microsoft, Twitter, Yahoo and many more Our B2B Division, WSI, has 4,600+ B2B clients in 60 countries. WHERE THE WORLD GETS ITS WEATHER #1 MOST DISTRIBUTED Cable Network 170M+ App Downloads 47.2M Unduplicated Monthly Uniques 124M+ Monthly Unique 72% visit 2x or more Daily
  6. 6. Background: A Data Company Data Network of 100K+ weather sensors Global Lightning Detection Network Global Radar & Location Data Largest Collection of Weather Data State-of-the-Science Forecasts Technologies Industry Best Forecast Modeling Proprietary Radar Algorithms Proprietary Weather Analytics 220+ Fulltime Meteorologists TWC Content (Video, Images, Articles) Weather APIs Content APIs 20+ TB Data Daily 800+ Sources of Ingest 40+ Billion API Requests Daily
  7. 7. Background: About Data Weather Data - Observations - Forecasts - Radar - Alerts - Notices - Emergency Bulletins - Health & Life Style Content - Articles - Images - Slide Shows - Videos - Maps Domain Specific - Aviation - Energy - Insurance
  8. 8. Background: Big Data - Push/Pull, every 5 minutes - Real Time Alerts & Notification - World’s most volatile atmospheric data - 15-20 sec. to prepare and serve - 800+ Partners - 50+ GB Raw compressed data - Several Billion Requests / day Big Data Variety Volume Velocity Textual data, structured, unstructured, binary data, pictures, images, videos
  9. 9. Background: About Distribution Digital - Weather.com, Wunderground.com - Mobile Apps on all Major Mobile OS Platforms Partnerships - Major Mobile Phone Company - Major Search Engine - Many Others … B2B - Major Airlines - Energy Trading Desks - Many Others … 40+ Billion API Requests / day Expect 60 Billion / day by EOY 2015 We power weather for Apple, Facebook, Google, Microsoft, Twitter, Yahoo and many more Our B2B Division, WSI, has 4,600+ B2B clients in 60 countries. 124M+ Monthly Unique 72% visit 2x or more Daily 170M+ App Downloads 47.2M Unduplicated Monthly Uniques
  10. 10. The Dark Ages: Before The Cloud - Run From TWC Data Centers - Slow Time To Market - Product - Content - Limited Distributed Scaling - Limits of our existing Data Centers - Batch Based Forecast Systems - Java Based Monolithic Applications - Big Web, Mobile Web - Data Services - Homegrown CMS
  11. 11. Business - Build a Low Latency Global On Demand Forecasting System - Build a Highly Scalable Global Data Distribution Platform - Reboot Digital Properties (weather.com, Mobile Apps, CMS) - Reduce time to deploy new data sets - Data Distribution APIs as Product - Secured/Metered access to APIs - Consolidate Data Centers Reboot & Reimagine: Goals Technical - 100% cloud based - Capable of handling billions of requests a day - Capable of ingesting & processing Terabytes of data a day - Low latency APIs (25-100 ms) - Highly Scalable - Highly Available (99.99) - Generic Data Processing Engine (DPE) - Developer Friendly APIs - Authentication, metering, and throttling
  12. 12. How we did it: Architecture Blueprint
  13. 13. Architecture: Component Layers - Large Undertaking – Divide & Conquer - Loosely Coupled Layered Architecture - Focus on your Core Competency - Best Tool/Technology for the job - Independent Delivery Timelines - DATA PLATFORM: Weather Data Distribution As A Service - Eat your own dog food! Data Processing Engine Data Services Storage Systems of Record Gateway CDN
  14. 14. Architecture: Data Processing Engine (DPE) - Generic DPE - API Driven - Data Agnostic - Extensible - Always on, Always flowing - Asynchronous, Non Blocking - High availability - Low latency - Horizontal scalability Data Processing Engine Data Services Storage Systems of Record Gateway CDN
  15. 15. Architecture: Data Processing Engine (DPE) Push/Pull Data Providers IAPI RabbitMQ DPE Redis Riak S3 RabbitMQ System Of Record (e.g. Forecast On Demand) DPE Core Plugin 1 Plugin 2 Plugin 3 - DPE Architecture - DPE Core - Custom Plugins for Process, Download, Store, Archive - Technical Stack - Java 1.7 - Storage (Redis) - Archive (Riak, S3) - Distribution – RabbitMQ - OS: Amazon Linux (CentOS 6 variant) - Ingestion API - RESTful Web Service - Messaging Queue - RabbitMQ Cluster - Workers - DPE
  16. 16. Architecture: Data Flow (DPE) Private Subnet RabbitMQ Cluster IAPI Endpoint AZ A AZ B Public Subnet Public Subnet Private Subnet Data Processing Engine Private Subnet Data Publisher Private Subnet
  17. 17. Architecture: Storage - Polyglot Architecture - Best Store for the Job - Most Cost Effective Storage for the Job - BYOS: Bring Your Own Store - Cache Rich! Data Processing Engine Data Services Storage Systems of Record Gateway CDN
  18. 18. Architecture: Storage Polyglot - Archive - Images - Videos Bucket Key/Value Master Slaves - Real-time Data and Caching Key/Value Node Node Node Node Key/Value - Historical Weather Archive - Data Migration - Gateway Data - Analytics Node Node Node Node Columnar - Analytics Parquet Columnar Storage Repositories MySQL SQL Server - Informatica - Drupal
  19. 19. Architecture: Cache is your friend! CDN Master Slaves - App Cache Key/Value (with data types for values) - Origin Cache - Edge Caching - Edge Compute - Make Sure All Data Elements are TTL Driven - Always Respect Cache Control Headers Varnish EC2 EC2 App Instances EC2 EC2 - And Keep It Simple!
  20. 20. Architecture: Systems Of Record - Let the system designers focus on the problem they are trying to solve - Let them pick the best technology - Just make sure they interface using standard protocols - Let DPE handle Ingest - Let Services Layer handle Distribution - Support both Push/Pull model for publication to distribution engine Data Processing Engine Data Services Storage Systems of Record Gateway CDN
  21. 21. Architecture: Systems of Record Forecast On Demand CMS GET Model Post Model Forecast On Demand Data Services Data Services Content Management System Get: On Cache Miss Post: On Publish RESTful Endpoint Currents On Demand GET Model Currents On Demand Data Services Get: On Cache Miss
  22. 22. Architecture: Data Services Data Processing Engine Data Services Storage Systems of Record - RESTful API Design - Stateless - Decoupled - Atomic / Aggregation Services - Support both Push/Pull Model - API Key driven Auth/Metering - Horizontally Scalable - Capable of serving billions of requests / day - Data lends well to caching Gateway CDN
  23. 23. Architecture: Distribution – Weather Data Redis Riak OAPI API Gateway CDN API Users FOD Dispatcher COD Dispatcher Aggregate Engine COD Cache FOD Cache Outbound API (OAPI) - Fine grained RESTful API - Intelligent Cache Management - Accesses datastores, system of records and other services Aggregate Engine - Aggregates fine grained APIs - Aggregates at Edge through CDN ESI
  24. 24. Architecture: Request Flow AZ A AZ B Public Subnet Public Subnet Private Subnet Internet Private Subnet OAPI FOD Cache COD Cache FOD COD OAPI
  25. 25. Distribution Services Architecture: Distribution – Content (Articles, Images, Video) D R U P A L C M S Metadata Store Images Videos Asset Metadata Image Cut Service Video Distribution Services Generic Asset Service mRSS Feeds Metadata Metadata Static Asset Pools S3
  26. 26. Architecture: Gateway Data Processing Engine Data Services Storage Systems of Record Gateway CDN - Authentication - Routing - Metering - Throttling - CDN Aware, CDN Driven - Remember 25ms latency target! - We rolled our own
  27. 27. Architecture: Gateway API Users CDN Authentication, metering, Throttling Quick Response Caching routing Origin routing Source of Authentication Truth - User makes API request - CDN checks authorization - Look Aside - If authorized, check cache - If cache-miss, hit origin caching/routing - If origin cache-miss, pass through to backend servers
  28. 28. Architecture: The Other Side – Events & Analytics! Data Lake Operational Analytics Business Analytics Executive Dashboards Data Discovery Data Science 3rd Party System Integration Stream Processing Long Term Raw Storage Short Term Storage and Big Data Processing Consumers Amazon SQS Streaming Custom Ingestion Pipeline Events 3rd Party Other DBs S3 Batch Sources Streaming Sources ETL Data Access SQL
  29. 29. Architecture: Putting it all together Data Processing Engine Data Services Storage Systems of Record Gateway CDN
  30. 30. Architecture: Implementation Global Region 2 Global Region 3 Global Region 4 Global Region 1 Global Traffic Management and CDN Remote Ingestion Remote Ingestion FOD FOD FOD Global Region 2 Monitoring Configuration Mgmt Automation Partner Data Sources: (Weather, Alerts, Traffic, etc) Distribution Engine Distribution Engine Distribution Engine FOD Distribution Engine
  31. 31. And while we were building it …
  32. 32. A curve ball ! Challenge: • New deal struck with a MAJOR mobile phone company • Ship new API • Time to Market = 3 months • Scale to 25+ billion requests per day
  33. 33. Some findings Architecture Already Decoupled - Focus on Scaling Distribution Layer Findings in Cycle: - Load Testing / Tuning - VPC NAT Saturation - DNS Servers Sizing - Instance Types and Characteristics - OS Kernel Limits - Destructive Testing / Fixing - Brought Down instances, AZs, Regions - Corrupted caches, databases Load Test Tune Destructive Test Fix
  34. 34. KEY TAKEAWAY It takes time to figure all this out … so please budget time and resources for both load and destructive testing
  35. 35. AWS Insights
  36. 36. Leverage AWS Managed Services • Amazon Route 53 – DNS • Amazon RDS – Relational DBs • Amazon DynamoDB – NoSQL DBs • Amazon ElastiCache – Redis or Memcached • Amazon SQS - Queuing • Amazon Redshift – Data Warehouse • Amazon Kinesis – Stream Storage • AWS Lambda – “Code as a Service” Data Processing Engine Data Services Storage Systems of Record Gateway CDN
  37. 37. Leverage AWS Managed Services • Amazon Route 53 – DNS • Amazon RDS – Relational DBs • Amazon DynamoDB – NoSQL DBs • Amazon ElastiCache – Redis or Memcached • Amazon SQS - Queuing • Amazon Redshift – Data Warehouse • Amazon Kinesis – Stream Storage • Lambda – “Code as a Service” Data Processing Engine Data Services Storage Systems of Record Gateway CDN
  38. 38. Why RDS vs. EC2-based RDBMS Independent of RDBMS • Licensing • Replication engine: • Backups • Updates Limits by engine (MySQL, Oracle, Postgres / MS SQL / Amazon Aurora): Max. IOPS 20,000 / 10,000 / 100,000s; Max. TBs 6 / 4 / 64 Storage
  39. 39. Which NoSQL? + Write performance more critical than durability + Native multi-X replication + Ecosystem – Repartitioning – Operational burden – Data transfer cost + “Zero downtime” + Cross-region replication – Repartitioning – Operational burden – Data transfer cost + Managed solution + Easy to scale + Constantly Evolving – Item size – Cross-region replication Storage DynamoDB
  40. 40. Stream Storage Building a DPE – AWS Style Decouple producers & consumers Temporary buffer Preserve client ordering Streaming MapReduce [Diagram: Producer 1, Producer 2, Producer 3, Producer N; Key = Red, Key = Green; Shard 1, Shard 2; Consumer 1: Count of Red = 4, Count of Violet = 4; Consumer 2: Count of Blue = 4, Count of Green = 4] Data Processing Engine
  41. 41. Which Stream Store Should I Use? Amazon Kinesis and Apache Kafka have many similarities • Multiple consumers • Ordering of records • Streaming MapReduce • Low latency • Highly durable, available, and scalable Differences • Record lifetime: 24 hours in Amazon Kinesis, configurable in Kafka • Record size: 1MB/record in Amazon Kinesis, configurable in Kafka • Amazon Kinesis is a fully managed service • Easier to provision, manage, and scale Data Processing Engine
  42. 42. Server-less Approach to DPE Data Input Amazon Kinesis Action AWS Lambda Data Output IT application activity Capture the stream Audit Process the stream SNS Metering records Condense Redshift Change logs Backup S3 IoT Device Data Store RDS Transaction orders Process SQS Server health metrics Monitor EC2 Data Processing Engine
  43. 43. Evolution
  44. 44. Architectural Evolution: Micro-services Approach GTM/CDN User Forecast Aggregation Location Varnish Varnish Varnish Common Services Layer – Router & Controller Auth & Metering Lifestyle Varnish Storage Polyglot Micro DPE
  45. 45. Architectural Evolution: Technical Stack Ingest - Queue: - Amazon SQS - Stream - Kafka - Micro DPE - Avro - Thrift - Protocol Buffers - Micro-Services Type of Model For Ingest Distribution - Micro Services - Language Polyglot - Service Discovery Storage - Amazon Aurora - BYOS Analytics - Parquet + Amazon S3 - Spark - Amazon EMR
  46. 46. Wrapping Up! - Have an Architectural Blueprint - Keep Decoupled or Loosely Coupled Layers - Communication via Standard Protocols - Keep Architectural Plan “Technology Agnostic” - Storage Polyglot - Language Polyglot - Be Aware of the Monoliths! - Keep Caching Architecture Simple – TTL Driven - Always Budget for - Load Testing - Destructive Testing
  47. 47. Related Sessions ARC309 - From Monolithic to Microservices: Evolving Architecture Patterns in the Cloud - Thursday ARC301 - Scaling Up to Your First 10 Million Users - Thursday BDT310 - Big Data Architectural Patterns and Best Practices on AWS – Today 2:45 PM BDT403 - Best Practices for Building Real-time Streaming Applications with Amazon Kinesis - Thursday
  48. 48. Remember to complete your evaluations!
  49. 49. Thank you!
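
The gateway request flow on slide 27 (check authorization, then the cache, then the origin cache, then the backend servers), combined with the "TTL Driven" caching advice on slide 19, can be sketched roughly as below. This is a minimal in-memory illustration, not The Weather Company's gateway, which the slides only describe as "we rolled our own". `TTLCache`, `handle_request`, the API key, the path, and the TTL values are all hypothetical names and numbers chosen for the example.

```python
import time
from typing import Callable, Dict, List, Optional, Set, Tuple


class TTLCache:
    """Tiny in-memory key/value cache in which every entry expires after a TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:  # expired: drop the entry and report a miss
            del self._entries[key]
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def handle_request(
    api_key: str,
    path: str,
    authorized_keys: Set[str],
    edge_cache: TTLCache,
    origin_cache: TTLCache,
    backend: Callable[[str], str],
) -> Tuple[int, str, str]:
    """Return (status, body, served_from), following the slide 27 order of checks."""
    if api_key not in authorized_keys:   # authorization is checked first
        return 403, "", "gateway"
    body = edge_cache.get(path)          # if authorized, check the cache
    if body is not None:
        return 200, body, "edge"
    body = origin_cache.get(path)        # on a miss, try the origin cache
    if body is not None:
        edge_cache.put(path, body)
        return 200, body, "origin-cache"
    body = backend(path)                 # on an origin miss, go to the backend
    origin_cache.put(path, body)
    edge_cache.put(path, body)
    return 200, body, "backend"


# Demo with a fake clock so that expiry is deterministic.
now = [0.0]
edge = TTLCache(60, clock=lambda: now[0])     # hypothetical 60 s edge TTL
origin = TTLCache(300, clock=lambda: now[0])  # hypothetical 300 s origin TTL
backend_calls: List[str] = []


def fake_backend(path: str) -> str:
    backend_calls.append(path)
    return f"forecast for {path}"


keys = {"demo-key"}
r1 = handle_request("demo-key", "/forecast/30339", keys, edge, origin, fake_backend)
r2 = handle_request("demo-key", "/forecast/30339", keys, edge, origin, fake_backend)
now[0] = 120.0  # the edge entry has expired, the origin entry has not
r3 = handle_request("demo-key", "/forecast/30339", keys, edge, origin, fake_backend)
r4 = handle_request("bad-key", "/forecast/30339", keys, edge, origin, fake_backend)
print(r1[2], r2[2], r3[2], r4[0])  # backend edge origin-cache 403
```

In the demo the backend is called only once across three authorized requests: the second is served from the edge cache, and the third falls back to the origin cache after the edge entry's TTL has passed.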
