DAT324_Expedia Flies with DynamoDB Lightning Fast Stream Processing for Travel Analytics

Building rich, high-performance streaming data systems requires fast, on-demand access to reference data sets to implement complex business logic. In this talk, Expedia will discuss the architectural challenges the company faced and how DAX + DynamoDB fit into the overall architecture and met their design requirements. Additionally, you will hear how DAX enabled Expedia to add caching to their existing applications in hours, which previously took much longer. Session attendees will walk away with three key outputs: 1) Expedia’s overall architectural patterns for streaming data; 2) how they uniquely leverage DynamoDB, DAX, Apache Spark, and Apache Kafka to solve these problems; and 3) the value that DAX provides and how it enabled them to improve performance and throughput and reduce costs, all without having to write any new code.


  1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Expedia Flies with DynamoDB: Lightning Fast Stream Processing for Travel Analytics. Brandon O’Brien, Principal Engineer, Expedia, Inc. (@hakczar). November 28, 2017. DAT324
  2. Topics
     1. Context: Real-Time Travel Analytics at Expedia
     2. Patterns for Reference Data in Streaming Systems
     3. Streaming Data Systems Architecture and Performance with DynamoDB + DAX
  3. Real-Time Travel Analytics @ Expedia, Inc.
     Expedia, Inc.: two-sided, multi-product, multi-brand global travel marketplace
     • Brands: Expedia, Hotels.com, trivago, HomeAway, ebookers, Wotif, Egencia, Orbitz, …
     • Products: Hotels, Flights, Cars, Cruises, VR, Activities, Insurance, …
  4. Real-Time Travel Analytics @ Expedia, Inc.: Expedia, Inc. scale
     • Over 500K lodging properties on Core OTA platforms
     • Nearly 1.5M online bookable HomeAway listings
     • 500+ airlines
     • 25K+ activities
     • 150+ car rental companies
     • 600M+ monthly site visits to EI sites (2016)
     • 15B+ annual air searches (2016)
     • 75M+ monthly air shoppers (2016)
  5. Real-Time Travel Analytics @ Expedia, Inc.
     • Real-time user clickstream:
       • Live view of demand patterns for suppliers
       • Live pricing statistics for shoppers
       • Live view of market for internal business & operations
     • Example use case: Real-time updated price statistics
  7. Real-Time Travel Analytics @ Expedia, Inc.
     • Reference Data:
       • On-demand reference data needed to unlock richer functional capability
       • Conceptually simple: a key-value lookup and a hash-join
       • In practice, at scale: several constraints & trade-offs to consider
  8. Real-Time Travel Analytics @ Expedia, Inc.
     • Reference Data:
       • On-demand reference data needed to unlock richer functional capability
       • Conceptually simple: a key-value lookup and a hash-join
       • In practice, at scale: several constraints & trade-offs to consider
     • Reference Domain Layer: allow reuse of data in-situ across multiple apps in a streaming data ecosystem
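The "key-value lookup and a hash-join" mentioned on the slide can be sketched in a few lines. This is a minimal illustration, not Expedia's implementation: the `HOTELS` table, the `hotel_id` field, and the `enrich` helper are all hypothetical names, and an in-memory dict stands in for whatever store holds the reference data.

```python
from typing import Iterable, Iterator

# Hypothetical reference data: hotel ID -> attributes.
HOTELS = {
    "h1": {"city": "Seattle", "stars": 4},
    "h2": {"city": "Las Vegas", "stars": 5},
}


def enrich(events: Iterable[dict], reference: dict) -> Iterator[dict]:
    """Join each event to its reference record by key (a hash join)."""
    for event in events:
        ref = reference.get(event["hotel_id"])  # key-value lookup
        if ref is None:
            continue  # inner-join semantics: drop events with no match
        yield {**event, **ref}


# Hypothetical clickstream events.
clicks = [
    {"hotel_id": "h1", "price": 180.0},
    {"hotel_id": "h9", "price": 95.0},  # no reference record
    {"hotel_id": "h2", "price": 240.0},
]
enriched = list(enrich(clicks, HOTELS))
```

At scale the dict is the part that stops being simple, which is what the remaining slides compare options for.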
  9. Reference Data: Pareto Distribution (Power Law)
  10. Reference Data: Pareto Distribution (Power Law)
      Organic data distribution
  11. Reference Data: Pareto Distribution (Power Law)
      Organic data distribution
      • Cities
      • Travel Dates
      • Hotels
      • Flight Routes
  12. Reference Data: Pareto Distribution (Power Law)
      Organic data distribution
      • Cities
      • Travel Dates
      • Hotels
      • Flight Routes
      High cache hit %, but large data set
  13. Reference Data: Pareto Distribution (Power Law)
      Organic data distribution
      • Cities
      • Travel Dates
      • Hotels
      • Flight Routes
      High cache hit %, but large data set
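A rough way to see why a power-law distribution gives a high cache hit rate despite a large data set is to compute how much traffic the most popular keys absorb. The sketch below assumes a Zipf-like popularity curve with exponent 1 over 100,000 keys; both numbers are illustrative assumptions, not measurements from the talk.

```python
def top_k_coverage(num_keys: int, k: int, exponent: float = 1.0) -> float:
    """Fraction of all accesses that hit the k most popular keys,
    assuming access frequency is proportional to 1 / rank**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_keys + 1)]
    return sum(weights[:k]) / sum(weights)


# Share of traffic served by the hottest 1% of keys under this assumption.
coverage = top_k_coverage(num_keys=100_000, k=1_000)
print(f"top 1% of keys serve {coverage:.0%} of accesses")
```

Under these assumptions, caching 1% of the keys would serve well over half of all lookups, while the long tail still has to live somewhere durable and scalable.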
  14. Reference Data Access Patterns: Requirements
      • Performance: Fast, random access; Scalable throughput; Availability
      • Data Management: Durability; Mutability; Scalable storage
      • Wider System: Reusability; Simplicity & Flexibility; Operational overhead
  15. Reference Data Access Patterns
      Requirements: ☐ Fast, random access; ☐ Scalable throughput; ☐ Availability; ☐ Durability; ☐ Mutability; ☐ Scalable storage; ☐ Reusability; ☐ Simplicity & Flexibility; ☐ Operational overhead
  16. Reference Data Access Patterns
      Patterns:
      1. In-Memory/In-Heap
      2. Streaming processor disk (ex: RocksDB)
      3. Direct Service Call
      4. Redis/Amazon ElastiCache
      5. Cassandra
      6. DB (ex: MySQL)
      7. DB + Cache
      8. ???
      Requirements: ☐ Fast, random access; ☐ Scalable throughput; ☐ Availability; ☐ Durability; ☐ Mutability; ☐ Scalable storage; ☐ Reusability; ☐ Simplicity & Flexibility; ☐ Operational overhead
  18. Reference Data Access Patterns: 1. In-Memory/In-Heap
      • Very fast, but storage not scalable
      • Slows down streaming processor startup time
      • Updating ref data may be difficult
      Requirements: ☑ Fast, random access; ☑ Scalable throughput; ☑ Availability; ☐ Durability; ☐ Mutability; ☐ Scalable storage; ☐ Reusability; ☐ Simplicity & Flexibility; ☐ Operational overhead
  20. Reference Data Access Patterns: 2. Streaming processor disk (ex: RocksDB)
      • Fast, scalable storage
      • May make streaming processor node deployment/management complex
      • Updating ref data may be difficult
      Requirements: ☑ Fast, random access; ☑ Scalable throughput; ☑ Availability; ☑ Durability; ☐ Mutability; ☑ Scalable storage; ☐ Reusability; ☐ Simplicity & Flexibility; ☐ Operational overhead
  22. Reference Data Access Patterns: 3. Direct Service Call
      • Goes right to the source: up-to-date data
      • Durability & scalable storage
      • May or may not be able to handle read-load
      Requirements: ☐ Fast, random access; ☐ Scalable throughput; ☐ Availability; ☑ Durability; ☑ Mutability; ☑ Scalable storage; ☑ Reusability; ☑ Simplicity & Flexibility; ☑ Operational overhead
  23. Reference Data Access Patterns: 3. Direct Service Call
      • Goes right to the source: up-to-date data
      • Durability & scalable storage
      • May or may not be able to handle read-load
      • Have to explain to the team you DDOS’d your own website
      Requirements: ☐ Fast, random access; ☐ Scalable throughput; ☐ Availability; ☑ Durability; ☑ Mutability; ☑ Scalable storage; ☑ Reusability; ☑ Simplicity & Flexibility; ☑ Operational overhead; ☐ Won’t DDOS your site
  25. Reference Data Access Patterns: 4. Redis/Amazon ElastiCache
      • Fast & easily updateable
      • Shared data source across ecosystem of streaming apps
      • Availability & durability story not great. Default settings on 2.8.x allow for ElastiCache crash & data loss.
      Requirements: ☑ Fast, random access; ☑ Scalable throughput; ☐ Availability*; ☐ Durability; ☑ Mutability; ☐ Scalable storage*; ☑ Reusability; ☑ Simplicity & Flexibility; ☐ Operational overhead
      * As of 2.8.x. 3.2.4 now offers sharding to address scalability + availability
  27. Reference Data Access Patterns: 5. Cassandra
      • Fast & highly available (no SPOF)
      • Linear scalability for storage & throughput
      • Operational overhead can be high (0.5 dev in our case)
      Requirements: ☑ Fast, random access; ☑ Scalable throughput; ☑ Availability; ☑ Durability; ☑ Mutability; ☑ Scalable storage; ☑ Reusability; ☑ Simplicity & Flexibility; ☐ Operational overhead
  29. Reference Data Access Patterns: 6. DB (ex: MySQL)
      • Reduced operational overhead
      • Great durability, reasonable scalability
      • Shared data source
      Requirements: ☐ Fast, random access; ☐ Scalable throughput; ☑ Availability; ☑ Durability; ☑ Mutability; ☑ Scalable storage; ☑ Reusability; ☑ Simplicity & Flexibility; ☑ Operational overhead
  31. Reference Data Access Patterns: 7. DB + Cache
      • Increased read performance & throughput
      • Shared source: amortize data access cost across nodes
      • Streaming processor needs to talk to multiple data sources, more moving pieces to manage
      Requirements: ☑ Fast, random access; ☑ Scalable throughput; ☑ Availability; ☑ Durability; ☑ Mutability; ☑ Scalable storage; ☑ Reusability; ☑ Simplicity & Flexibility; ☐ Operational overhead
  33. Reference Data Access Patterns
      Patterns:
      1. In-Memory/In-Heap
      2. Streaming processor disk (ex: RocksDB)
      3. Direct Service Call
      4. Redis/Amazon ElastiCache
      5. Cassandra
      6. DB (ex: MySQL)
      7. DB + Cache
      8. DynamoDB + DAX
      Requirements: ☐ Fast, random access; ☐ Scalable throughput; ☐ Availability; ☐ Durability; ☐ Mutability; ☐ Scalable storage; ☐ Reusability; ☐ Simplicity & Flexibility; ☐ Operational overhead
  34. Reference Data Access Patterns: 8. DynamoDB + DAX
      • Fast, available, scalable
      • Low maintenance & easy to update
      • Shared data source
      Requirements: ☑ Fast, random access; ☑ Scalable throughput; ☑ Availability; ☑ Durability; ☑ Mutability; ☑ Scalable storage; ☑ Reusability; ☑ Simplicity & Flexibility; ☑ Operational overhead
  35. Reference Data Access Patterns: 8. DynamoDB + DAX + In-Heap
      • Fast, available, scalable
      • Low maintenance & easy to update
      • Shared data source
      Requirements: ☑ Fast, random access; ☑ Scalable throughput; ☑ Availability; ☑ Durability; ☑ Mutability; ☑ Scalable storage; ☑ Reusability; ☑ Simplicity & Flexibility; ☑ Operational overhead
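The "+ In-Heap" layer in the last pattern can be pictured as a small read-through LRU cache sitting in front of the remote store. The sketch below is an assumption-laden illustration: `LRUCache` and `fetch_remote` are hypothetical names, the remote call is faked with a local function rather than a real DAX or DynamoDB client, and there is no expiry, which a real in-heap layer would likely need so that updated reference data is eventually picked up.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUCache:
    """Read-through cache that evicts the least recently used entry."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._items: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = loader(key)  # fall through to the remote tier
        self._items[key] = value
        if len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict least recently used
        return value


remote_calls = []


def fetch_remote(key: str) -> dict:
    """Hypothetical stand-in for a DAX/DynamoDB read."""
    remote_calls.append(key)
    return {"id": key}


cache = LRUCache(max_size=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key, fetch_remote)
```

Only misses reach `fetch_remote`, so with a skewed access pattern most lookups never leave the process.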
  36. Streaming Architecture w/ DynamoDB + DAX
  37. DynamoDB + DAX: Performance
      • System & Data Setup
        • DAX: 1 x dax.r3.large (13G); TTL: 5 min; eventually consistent reads
        • DynamoDB: dataset cardinality ~1M; small items
  38. DynamoDB + DAX: Performance
      • System & Data Setup
        • DAX: 1 x dax.r3.large (13G); TTL: 5 min; eventually consistent reads
        • DynamoDB: dataset cardinality ~1M; small items
      • Per item access time
        • DAX cache hit: <1ms
        • DAX cache miss: 11ms
        • DynamoDB: 10ms
  39. DynamoDB + DAX: Performance
      • System & Data Setup
        • DAX: 1 x dax.r3.large (13G); TTL: 5 min; eventually consistent reads
        • DynamoDB: dataset cardinality ~1M; small items
      • Per item access time
        • DAX cache hit: <1ms
        • DAX cache miss: 11ms
        • DynamoDB: 10ms
      • Low volume
        • ~80k items/sec (~3.5K req/s)
        • Cache hit rate ~80%
        • CPU: 5%
        • DynamoDB RCU: 100
  40. DynamoDB + DAX: Performance
      • System & Data Setup
        • DAX: 1 x dax.r3.large (13G); TTL: 5 min; eventually consistent reads
        • DynamoDB: dataset cardinality ~1M; small items
      • Per item access time
        • DAX cache hit: <1ms
        • DAX cache miss: 11ms
        • DynamoDB: 10ms
      • Low volume
        • ~80k items/sec (~3.5K req/s)
        • Cache hit rate ~80%
        • CPU: 5%
        • DynamoDB RCU: 100
      • High volume
        • ~300k items/sec (~15k req/s)
        • Cache hit rate >99%
        • CPU: 80%
        • DynamoDB RCU: 500
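The hit-rate and latency figures above combine into an average per-item access time via a simple weighted sum. The sketch below treats the "<1ms" hit time as an upper bound of 1 ms, so the results are rough upper bounds rather than reported measurements; the function name is hypothetical.

```python
def expected_latency_ms(hit_rate: float,
                        hit_ms: float = 1.0,
                        miss_ms: float = 11.0) -> float:
    """Average per-item latency: hit_rate * hit + (1 - hit_rate) * miss."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


low_volume = expected_latency_ms(0.80)   # slide's low-volume hit rate
high_volume = expected_latency_ms(0.99)  # lower bound of the ">99%" hit rate
print(f"low volume:  <= {low_volume:.2f} ms per item")
print(f"high volume: <= {high_volume:.2f} ms per item")
```

Under these assumptions the average drops from about 3 ms to about 1.1 ms per item as the hit rate rises from 80% to 99%, which is why the skew of the access pattern matters so much.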
  41. Key Design Benefits
      • Balances Performance/Scalability/Availability/Durability/Mutability
      • In-situ reusability across streaming systems
      • Low operational overhead
      • Extremely cost effective (and performant) if the data access pattern is sufficiently non-uniform
      • Access to reference data should be simple, so keep it simple
  42. Tips for Performance & Availability
      • BatchGetItem on DAX: 10x-20x speedup with batch size 25 vs. 1
      • DynamoDB client (not DAX): no consistent performance improvement observed with keep-alive, but a huge perf improvement with BatchGetItem
      • Save bandwidth: for large objects, use a projection to return only the subset of fields you need
      • Partition dilution: choose high-cardinality partition keys and avoid unnecessary throughput scaling
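The batching tip amounts to grouping keys into chunks of 25 and issuing one request per chunk instead of one per key. The sketch below shows only that grouping logic: `batch_get` is a hypothetical callable standing in for a BatchGetItem request (no AWS client is used here), and a real caller would also need to retry any keys the service reports as unprocessed.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def chunked(keys: Sequence[str], size: int = 25) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def fetch_all(keys: Sequence[str],
              batch_get: Callable[[Sequence[str]], Dict[str, dict]],
              size: int = 25) -> Dict[str, dict]:
    """Fetch every key using one batch_get call per chunk."""
    results: Dict[str, dict] = {}
    for batch in chunked(keys, size):
        results.update(batch_get(batch))
    return results


batch_sizes: List[int] = []


def fake_batch_get(batch: Sequence[str]) -> Dict[str, dict]:
    """Hypothetical stand-in for a BatchGetItem round trip."""
    batch_sizes.append(len(batch))
    return {key: {"id": key} for key in batch}


hotel_keys = [f"hotel-{i}" for i in range(60)]
items = fetch_all(hotel_keys, fake_batch_get)
```

Sixty keys become three round trips instead of sixty, which is where the speedup reported on the slide would come from.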
  43. Tips for Performance & Availability
      • DynamoDB/DAX client is threadsafe, but instantiation time is only ~20ms.
      • Spark: catch ProvisionedThroughputExceededException, use exponential backoff, pause the streaming processor, don’t update Kafka offsets, and rely on Kafka consumer lag alerting.
      • Spark: keep a singleton LRU cache (ex: Guava) in each Executor JVM to store reference data and reuse it across tasks and RDD partitions.
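The throttling tip can be sketched as a retry loop with exponentially growing delays. This is an illustration under assumptions, not the talk's code: `ThroughputExceeded` is a local stand-in for ProvisionedThroughputExceededException, the base delay and cap are arbitrary, and the `sleep` parameter is injected so the example runs without actually waiting.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class ThroughputExceeded(Exception):
    """Local stand-in for ProvisionedThroughputExceededException."""


def backoff_delays(base: float = 0.1, cap: float = 5.0,
                   attempts: int = 5) -> List[float]:
    """Delays that double each attempt, limited to `cap` seconds."""
    return [min(cap, base * (2 ** n)) for n in range(attempts)]


def call_with_backoff(operation: Callable[[], T],
                      delays: List[float],
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `operation` after each throttling error, waiting longer each time."""
    for delay in delays:
        try:
            return operation()
        except ThroughputExceeded:
            sleep(delay)
    # Final attempt: a failure here propagates, so the caller can pause
    # processing and leave its consumer offsets uncommitted.
    return operation()


attempts_made = 0


def flaky_read() -> str:
    """Hypothetical read that is throttled twice before succeeding."""
    global attempts_made
    attempts_made += 1
    if attempts_made <= 2:
        raise ThroughputExceeded()
    return "ok"


slept: List[float] = []
result = call_with_backoff(flaky_read, backoff_delays(), sleep=slept.append)
```

Letting the last failure propagate, rather than swallowing it, is what allows the batch to be replayed from the same offsets later.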
  44. Thank you! Brandon O’Brien, Principal Engineer, Expedia, Inc. (@hakczar). DAT324: Expedia Flies with DynamoDB: Lightning Fast Stream Processing for Travel Analytics
