
Scalable Streaming Data Pipelines with Redis - Avram Lyon, Scopely

Scopely’s portfolio of social and mid-core games generates billions of events each day, covering everything from in-game actions to advertising to game engine performance. As this portfolio grew over the past two years, Scopely moved all event analysis from third-party hosted solutions to a new event analytics pipeline built on top of Redis and Kinesis, dramatically reducing operating costs and enabling new real-time analysis and more efficient warehousing. Our solution receives events over HTTP and SQS and provides real-time aggregation using a custom Redis-backed application, as well as prompt loads into HDFS for batch analyses. Recently, we migrated our real-time layer from a pure Redis datastore to a hybrid datastore with recent data in Redis and older data in DynamoDB, retaining performance while further reducing costs. In this session we will describe our experience building, tuning and monitoring this pipeline, and the role of Redis in handling Kinesis worker failover, deployment, and idempotence, in addition to its more visible role in data aggregation. This session is intended to be helpful for those building streaming data systems and those looking for solutions for aggregation and idempotence.


  1. Scalable Streaming Data Pipelines with Redis. Avram Lyon, @ajlyon / github.com/avram. redisconf / May 10, 2016
  2. MOBILE GAMES: PUBLISHER AND DEVELOPER
  3. What kind of data? • App opened • Killed a walker • Bought something • Heartbeat • Memory usage report • App error • Declined a review prompt • Finished the tutorial • Clicked on that button • Lost a battle • Found a treasure chest • Received a push message • Finished a turn • Sent an invite • Scored a Yahtzee • Spent 100 silver coins • Anything else any game designer or developer wants to learn about
  4. How much? Recently, at peak: 2.8 million events/minute, 2.4 billion events/day
  5. Primary Data Stream (diagram: Public API → Collection → Kinesis → Enrichment → Kinesis → Warehousing / Realtime Monitoring)
  6. Collection (diagram: Studios A, B, and C send events over HTTP and SQS to the collection tier, which publishes to Kinesis with SQS failover; Redis caches app and system configurations)
  7. (diagram: Kinesis, with SQS failover and S3, feeds the Enricher; enriched events flow on to the Data Warehouse, the Forwarder, Ariel (realtime), and possibly Elasticsearch, with idempotence and aggregation at the consuming stages)
  8. What's in the box?
  9. Where does this flow? Ariel / Real-Time: operational monitoring, business alerts, dashboarding. Data Warehouse: funnel analysis, ad-hoc batch analysis, reporting, behavior analysis. Elasticsearch: ad-hoc realtime analysis, fraud detection, top-K summaries, exploration. Ad-Hoc Forwarding: data integration with partners, game-specific systems.
  10. Kinesis: a short aside
  11. Kinesis • Distributed, sharded streams, akin to Kafka • Get an iterator over the stream, and checkpoint the current stream position occasionally • Workers coordinate shard leases and checkpoints in DynamoDB (via the KCL) (diagram: Shard 0, Shard 1, Shard 2; a minimal consumer sketch follows)
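A minimal single-shard consumer sketch in Python with boto3, to make the iterator-and-checkpoint loop concrete. The stream name and handler are hypothetical stand-ins; in the real pipeline the KCL manages shard leases and checkpointing in DynamoDB:

    import time
    import boto3

    kinesis = boto3.client("kinesis")

    def process(data: bytes):
        print(data)  # stand-in for real event handling

    # Read one shard of a hypothetical stream, oldest record first.
    iterator = kinesis.get_shard_iterator(
        StreamName="events",
        ShardId="shardId-000000000000",
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]

    while iterator:
        resp = kinesis.get_records(ShardIterator=iterator, Limit=500)
        for record in resp["Records"]:
            process(record["Data"])
            # The KCL checkpoints record["SequenceNumber"] to DynamoDB
            # periodically; on failover another worker resumes from there.
        iterator = resp.get("NextShardIterator")
        time.sleep(1)  # stay under the per-shard read limits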
  12. Checkpointing (diagram: Worker A reads records 1-22 from Shard 0, checkpointing every 5 records; it fails 🔥 with the checkpoint for Shard 0 at 10, so Worker B takes over from record 11 and re-reads everything Worker A had processed past the checkpoint). That replay after failover is what motivates the idempotence machinery on the next slides.
  13. Auxiliary Idempotence • Idempotence keys at each stage • Redis sets of idempotence keys by time window • Gives resilience against various types of failures (see the sketch below)
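A sketch of that pattern with redis-py; the key scheme and window size here are illustrative, not the production values:

    import redis

    r = redis.Redis()
    WINDOW = 3600  # seconds; illustrative window size

    def already_processed(stage: str, event_id: str, event_ts: int) -> bool:
        """Record the event's idempotence key in a time-windowed Redis set;
        return True if this stage has already seen the event."""
        window_start = event_ts - (event_ts % WINDOW)
        key = f"idem:{stage}:{window_start}"
        added = r.sadd(key, event_id)
        r.expire(key, WINDOW * 2)  # let old windows age out
        return added == 0          # SADD returns 0 for an existing member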
  14. Auxiliary Idempotence (diagram)
  15. Auxiliary Idempotence • Gotcha: expiring a set deletes it, which is O(N) in its cardinality • Broke the data up into small sets, partitioned by the first 2 bytes of the md5 of the idempotence key (sketched below)
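The partitioning fix, sketched; the exact key layout is an assumption, but the idea follows the slide directly:

    import hashlib

    def idem_key(stage: str, window_start: int, event_id: str) -> str:
        # Partition each window's set by the first two bytes of the md5
        # of the idempotence key, so every set stays small and expiring
        # any one of them never blocks Redis for long.
        part = hashlib.md5(event_id.encode()).digest()[:2].hex()
        return f"idem:{stage}:{window_start}:{part}"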
  16. (diagram, as on slide 5: Public API → Collection → Kinesis → Enrichment → Kinesis → Warehousing / Realtime Monitoring)
  17. Enrichment (Collection → Kinesis → Enrichment): 1. Deserialize the event batch 2. Apply changes to application properties 3. Get current device and application properties 4. Get known facts about the sending device 5. Emit each enriched event to Kinesis (one possible shape of these steps is sketched below)
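One way those five steps might look in Python; every helper object, field name, and the stream name here is a hypothetical stand-in for the real services:

    import json

    def enrich_batch(batch, app_props, device_facts, kinesis,
                     stream="events-enriched"):
        for raw in batch:
            event = json.loads(raw)                        # 1. deserialize
            app_props.apply_changes(event)                 # 2. apply property changes
            event.update(app_props.current(event["app_id"]))       # 3. current props
            event.update(device_facts.get(event["device_token"]))  # 4. device facts
            kinesis.put_record(                            # 5. emit enriched event
                StreamName=stream,
                Data=json.dumps(event).encode(),
                PartitionKey=event["device_token"],
            )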
  18. (diagram, as on slide 7, now showing S3 backups to HDFS: Kinesis with SQS failover → Enricher → Kinesis → Data Warehouse / Forwarder / Ariel realtime / Elasticsearch?, with idempotence at each consumer and aggregation in Ariel)
  19. Now we have a stream of well-described, denormalized event facts.
  20. Pipeline to HDFS • Partitioned by event name and game, buffered in-memory and written to S3 • Picked up every hour by a Spark job • Converted to Parquet and loaded into HDFS (see the sketch below)
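A sketch of the hourly job in PySpark; the bucket and paths are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("events-to-parquet").getOrCreate()

    # Read one hour of buffered events from S3 and rewrite them as
    # Parquet in HDFS, partitioned by game and event name.
    events = spark.read.json("s3://events-buffer/2016/05/10/10/")
    (events.write
           .partitionBy("game", "name")
           .parquet("hdfs:///warehouse/events/2016-05-10-10/"))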
  21. A closer look at Ariel
  22. (screenshots: dashboards and alarms)
  23. Ariel Goals • Low time-to-visibility • Easy configuration • Low cost per configured metric
  24. Configuration (screenshot)
  25. Live Metrics (Ariel). Enriched event data:
      name: game_end, time: 2015-07-15 10:00:00.000 UTC, _devices_per_turn: 1.0, event_id: 12345, device_token: AAAA, user_id: 100
      name: game_end, time: 2015-07-15 10:01:00.000 UTC, _devices_per_turn: 14.1, event_id: 12346, device_token: BBBB, user_id: 100
      name: game_end, time: 2015-07-15 10:01:00.000 UTC, _devices_per_turn: 14.1, event_id: 12347, device_token: BBBB, user_id: 100
      Configured metrics:
      id: 1, name: Cheating Games, predicate: _devices_per_turn > 1.5, target: event_id, type: DISTINCT
      id: 2, name: Cheating Players, predicate: _devices_per_turn > 1.5, target: user_id, type: DISTINCT
      The collector issues:
      PFADD /m/1/2015-07-15-10-00 12346
      PFADD /m/1/2015-07-15-10-00 12347
      PFADD /m/2/2015-07-15-10-00 BBBB
      PFADD /m/2/2015-07-15-10-00 BBBB
      PFCOUNT /m/1/2015-07-15-10-00 → 2
      PFCOUNT /m/2/2015-07-15-10-00 → 1
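A sketch of how a collector might apply one configured metric with redis-py; the predicate evaluation is simplified to a lambda, and the key format follows the slide:

    import redis

    r = redis.Redis()

    # "Cheating Games" metric from the slide: distinct event_ids where
    # _devices_per_turn > 1.5.
    METRIC = {
        "id": 1,
        "target": "event_id",
        "predicate": lambda e: e.get("_devices_per_turn", 0) > 1.5,
    }

    def collect(event: dict, bucket: str):   # bucket like "2015-07-15-10-00"
        if METRIC["predicate"](event):
            r.pfadd(f"/m/{METRIC['id']}/{bucket}", event[METRIC["target"]])

    def report(metric_id: int, bucket: str) -> int:
        # Approximate distinct count for one time bucket.
        return r.pfcount(f"/m/{metric_id}/{bucket}")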
  26. HyperLogLog • High-level algorithm (four-bullet version stolen from my colleague, Cristian) • The first b bits of the hashed value are used as a register index (Redis uses b = 14, i.e. m = 16384 registers) • The rest of the hash is inspected for the longest run of zeroes we can encounter (N) • The register pointed to by the index is replaced with max(currentValue, N + 1) • An estimator function over the registers yields the approximate cardinality http://content.research.neustar.biz/blog/hll.html
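A toy version of the register update in Python, just to ground those bullets; the estimator is omitted, and Redis's real implementation adds sparse encoding and bias correction:

    import hashlib

    B = 14                  # index bits, as in Redis
    M = 1 << B              # 16384 registers
    registers = [0] * M

    def hll_add(value: str):
        h = int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")
        index = h & (M - 1)   # low b bits choose a register
        rest = h >> B         # inspect the remaining bits
        zeros = 0             # length of the run of zeroes (N)
        while rest & 1 == 0 and zeros < 64 - B:
            zeros += 1
            rest >>= 1
        registers[index] = max(registers[index], zeros + 1)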
  27. (the same Live Metrics example as slide 25, with the takeaway:) We can count different things
  28. (diagram: Kinesis → Collector, with idempotence, PFADDing into the Ariel aggregation store; web workers issue PFCOUNTs to answer questions like "Are installs anomalous?")
  29. Pipeline Delay • Pipelines back up • Dashboards get outdated • Alarms fire!
  30. Alarm Clocks • Push the timestamp of current events to a per-game pub/sub channel • A worker takes the 99th-percentile age of the last N events per title as the delay • Use that time for alarm calculations • Overlay delays on dashboards (sketched below)
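A sketch of the clock with redis-py pub/sub; the channel name and N are illustrative:

    import time
    import redis

    r = redis.Redis()
    N = 1000  # sample size per title; illustrative

    def publish_event_time(game: str, event_ts: float):
        # Each consumer publishes the timestamp of the event it just handled.
        r.publish(f"clock:{game}", event_ts)

    def pipeline_delay(last_n_timestamps) -> float:
        # Delay = 99th-percentile age of the last N events for a title.
        ages = sorted(time.time() - ts for ts in last_n_timestamps)
        return ages[int(0.99 * (len(ages) - 1))]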
  31. Ariel, now with clocks (diagram: as slide 28, plus an Event Clock fed from Kinesis)
  32. Ariel 1.0 • ~30K metrics configured • Aggregation into 30-minute buckets • 12 kilobytes per HLL set (plus overhead)
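Rough arithmetic on those numbers (assuming every metric writes a dense HLL in every bucket): 30,000 metrics × 12 KB ≈ 360 MB per 30-minute bucket, so around 17 GB per day of retained buckets, which is why the size limits on the next slide started to bite.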
  33. Challenges • Dataset size: RedisLabs non-cluster max = 100 GB • Packets/s limits: 250K in EC2-Classic • Alarm granularity
  34. Hybrid Datastore: Requirements • Need to keep the HLL sets to count distinct • Redis memory is relatively finite • HLL outside of Redis is messy
  35. Hybrid Datastore: Plan • Move older HLL sets to DynamoDB (they're just strings!) • Cache reports aggressively • Fetch backing HLL data from DynamoDB as needed on the web layer, merging with on-instance Redis (sketched below)
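Because HLLs are ordinary Redis string values, archiving and merging can be sketched as below; the table name, hosts, and key schema are hypothetical, and this handles only already-archived buckets:

    import boto3
    import redis

    hot = redis.Redis(host="ariel-redis")   # recent buckets
    scratch = redis.Redis()                 # on-instance merge scratchpad
    table = boto3.resource("dynamodb").Table("ariel-hll")

    def archive(key: str):
        # An HLL set is just a string value; move it to DynamoDB as-is.
        blob = hot.get(key)
        if blob is not None:
            table.put_item(Item={"key": key, "hll": blob})
            hot.delete(key)

    def count_buckets(keys) -> int:
        # Fetch archived HLLs into local Redis, merge, and count.
        for key in keys:
            item = table.get_item(Key={"key": key}).get("Item")
            if item:
                scratch.set(key, item["hll"].value)
        scratch.pfmerge("merge:tmp", *keys)
        count = scratch.pfcount("merge:tmp")
        scratch.delete("merge:tmp")
        return count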
  36. Ariel, now with hybrid datastore (diagram: as slide 31, plus migration of old data to DynamoDB, report caches, and a merge scratchpad for the web workers)
  37. Much less memory…
  38. Redis Roles • Idempotence • Configuration caching • Aggregation • Clock • Scratchpad for merges • Cache of reports • Staging of DWH extracts
  39. Other Considerations • Multitenancy: we run parallel stacks and give games an assigned affinity, to insulate them from pipeline delays • Backfill: the system is forward-looking only; we can replay Kinesis backups to backfill, or backfill from the warehouse
  40. Why Not _____? • Druid • Flink • InfluxDB • RethinkDB
  41. Thanks! Questions? scopely.com/jobs @ajlyon avram@scopely.com github.com/avram
