Air traffic controller - Streams Processing meetup

1. Air Traffic Controller
   Using Samza to manage communications with members
   By: Cameron Lee and Shubhanshu Nagar
2. Outline
   ● Problem statement
   ● How ATC solves it
   ● Implementation
   ● Interesting features
3. What problem are we trying to solve?
   ● In the past, LinkedIn provided a poor communications experience to some of its members: too much email, low-quality email, fired on multiple channels at once.
   ● Our goal was to build a system that applies common functionality across many different communication types and use cases in order to improve the member experience.
   ● Handle thousands of communications per second.
   ● Maintain a good understanding of the state of members on the site in near real time.
4. How does ATC think about creating a delightful member experience?
5. 5 Rights
   ● Right member
   ● Right message (useful to the member; shouldn't have been seen before)
   ● Right frequency
   ● Right channel
6. Filtering
   ● Don't send stale messages
   ● Don't send spammy messages
   ● Don't send duplicate messages
7. Aggregation and Capping
   "Don't flood me. Consolidate if you have too much to say."
8. Channel Selection
   "Don't blast all channels at the same time."
9. Delivery-time Optimization
   ● Hold on to a message and deliver it at the right moment.
   ● Ex: Don't buzz my phone at 2 AM.
   ● I like to read my daily digests every day after work.
10. How did we build this thing?
11. Requirements for ATC
   ● Highly scalable
   ● Nearline (but close to real time!)
   ● Ingest data from many sources
   ● Persist some data, though most of it has a short TTL
12. What's ATC built on?
13. Ecosystem
   [diagram: offline and online apps feed relevance scores and user action data into ATC, which hands completed messages to the Message Delivery Service]
14. Persistence: RocksDB
   ● Out-of-the-box storage layer in Samza
   ● Write-optimized for high performance on SSDs
   ● Changelogs provide fault tolerance and bootstrapping capabilities
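As a minimal sketch of how a Samza task reads and writes such a store (the store name "member-state" and the String key/value types are assumptions, not ATC's actual schema):

```java
import org.apache.samza.config.Config;
import org.apache.samza.storage.kv.KeyValueStore;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.task.InitableTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskContext;
import org.apache.samza.task.TaskCoordinator;

public class MemberStateTask implements StreamTask, InitableTask {
  // Backed by RocksDB (and a Kafka changelog) via the job's store config.
  private KeyValueStore<String, String> store;

  @Override
  @SuppressWarnings("unchecked")
  public void init(Config config, TaskContext context) {
    store = (KeyValueStore<String, String>) context.getStore("member-state");
  }

  @Override
  public void process(IncomingMessageEnvelope envelope,
                      MessageCollector collector,
                      TaskCoordinator coordinator) {
    // Record the latest event we saw for this member.
    String memberId = (String) envelope.getKey();
    store.put(memberId, (String) envelope.getMessage());
  }
}
```

The store itself is declared in job config, e.g. with stores.member-state.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory.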
15. ATC Pipeline
   [diagram: the ATC Repartitioner re-partitions incoming events and fans them out to ATC Pipeline instances 1..n, which also call external services]
16. ATC task ("Hipster Stream Processing")
   [diagram: external requests flow through message data tree generation, filtering, aggregation & capping, and channel selection, then through the scheduler to the Message Delivery Service]
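A hypothetical end-to-end sketch of the per-message flow those boxes describe; every type and method name below is illustrative, not ATC's actual code:

```java
public class AtcTaskSketch {
  record Message(String memberId, String payload) {}
  enum Channel { EMAIL, PUSH, IN_APP }

  boolean shouldDrop(Message m) { return false; }          // filtering: stale / spammy / duplicate
  Message maybeConsolidate(Message m) { return m; }        // aggregation & capping
  Channel pickChannel(Message m) { return Channel.EMAIL; } // channel selection
  void schedule(Message m, Channel c) { /* hold until the right delivery time */ }

  public void process(Message incoming) {
    if (shouldDrop(incoming)) return;                      // drop early, before any work
    Message msg = maybeConsolidate(incoming);
    schedule(msg, pickChannel(msg));                       // scheduler hands off to delivery
  }
}
```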
17. Implementation Details
18. Streaming Technologies
   ● Kafka: publish-subscribe messaging system. Used to send input to ATC to trigger communications. Many actions and signals in the LinkedIn ecosystem are tracked as Kafka events, and we consume these signals to better understand the state of the ecosystem.
   ● Databus: change-capture system for databases. Produces an event whenever an entry in a database changes.
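Wiring a Kafka input into a Samza job looks roughly like this; the job, class, and topic names here are made up for illustration:

```properties
# Sketch of a Samza job config consuming one Kafka input stream.
job.name=atc-pipeline
task.class=com.example.atc.MemberStateTask
task.inputs=kafka.member-action-events

systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.consumer.zookeeper.connect=localhost:2181
systems.kafka.producer.bootstrap.servers=localhost:9092
```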
19. Host affinity
   ● By default, whenever a Samza app is deployed, its task instances can be moved to any host in the cluster, regardless of where they were previously deployed.
   ● If there was any state saved (e.g. RocksDB), the new instances have to rebuild that state from the changelog. This bootstrapping can take some time depending on the amount of data to reload, and task instances can't process new input until bootstrapping is complete.
   ● We have some use cases which can't be delayed for the amount of time it takes to bootstrap.
20. Host affinity (continued)
   ● Host affinity is a Samza feature that lets us deploy task instances back to the same hosts as the previous deployment, so state does not need to be reloaded.
   ● If an individual instance fails, Samza can fall back to moving the instance elsewhere and bootstrapping from the changelog.
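The relevant Samza settings, roughly (the store and changelog names are illustrative):

```properties
# Pin task instances to the hosts they ran on in the previous deployment.
job.host-affinity.enabled=true

# Kafka changelog behind the store: the fallback bootstrap path when an
# instance does have to move to a new host.
stores.member-state.changelog=kafka.atc-member-state-changelog
```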
21. Multiple datacenters
   ● Samza does not currently support replicating persistent application state (e.g. RocksDB) across multiple clusters running the same app.
   ● We need ATC to run in multiple datacenters for redundancy, with state in each datacenter, so that if we have to move processing between datacenters we can continue to handle input properly.
22. Multiple datacenters (continued)
   ● We rely on the input streams to replicate the main input, so we can do processing and build up state in all datacenters.
   ● Side effects (e.g. triggering the actual email send) are emitted by only one of the datacenters, and we can dynamically choose where they are triggered.
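A hypothetical sketch of that gating; the primary-datacenter lookup and all names are illustrative, not ATC's actual mechanism:

```java
public class SideEffectGate {
  private final String localDatacenter;

  public SideEffectGate(String localDatacenter) {
    this.localDatacenter = localDatacenter;
  }

  /** Every datacenter processes the input and builds state; only the one
      currently designated primary for this member emits the side effect. */
  public void maybeSend(String memberId, String primaryDc, Runnable send) {
    updateLocalState(memberId);            // runs in all datacenters
    if (localDatacenter.equals(primaryDc)) {
      send.run();                          // runs in exactly one datacenter
    }
  }

  private void updateLocalState(String memberId) {
    // e.g. a RocksDB put, so this DC can take over processing at any time
  }
}
```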
23. Multiple datacenters (continued)
24. Deployments
   ● When we deploy changes to ATC, we can deploy to a single datacenter at a time in order to test new versions on only a fraction of traffic.
   ● In some cases, we shift all side effects out of a datacenter to do an upgrade. Since we still process all input there, we can validate almost all of our functionality and ensure performance doesn't take an unexpected hit.
25. Store migrations
   ● In some cases, we need to migrate our system to use a new instance of a store. For example, when support was added for RocksDB TTL, we needed to migrate some of our stores.
   ● Since we only needed the last X days of data, we could use the following strategy: write to both the old and new stores for X days, but continue to read from the old store. After X days, read from the new store, but continue writing to both stores so we could fall back if anything went wrong.
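A minimal sketch of that dual-write pattern, with in-memory maps standing in for the old and new RocksDB stores:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MigratingStore {
  private final Map<String, String> oldStore = new ConcurrentHashMap<>();
  private final Map<String, String> newStore = new ConcurrentHashMap<>();
  private volatile boolean readFromNewStore = false; // flipped after X days

  public void put(String key, String value) {
    oldStore.put(key, value); // keep writing both stores during the migration
    newStore.put(key, value); // so either one can serve reads
  }

  public String get(String key) {
    return readFromNewStore ? newStore.get(key) : oldStore.get(key);
  }

  /** After X days the new store holds a full window of data; cut reads over
      while still writing both, so the old store remains a valid fallback. */
  public void cutOverReads() { readFromNewStore = true; }
}
```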
26. Personalization through relevance
   ● We work closely with a relevance team in order to make better decisions about the communications we send out, e.g. channel selection, delivery time, aggregation thresholds.
   ● Every day, scores for different decisions are computed offline (Hadoop) by the relevance team. Those scores are pushed to ATC through Kafka, and ATC stores them in RocksDB.
   ● Scores are generated for each member, so we can personalize the experience.
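The Kafka-to-RocksDB absorption step might look like this sketch; the store name and the member-id/Double schema are assumptions:

```java
import org.apache.samza.config.Config;
import org.apache.samza.storage.kv.KeyValueStore;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.task.InitableTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskContext;
import org.apache.samza.task.TaskCoordinator;

public class RelevanceScoreTask implements StreamTask, InitableTask {
  private KeyValueStore<String, Double> scores;

  @Override
  @SuppressWarnings("unchecked")
  public void init(Config config, TaskContext context) {
    scores = (KeyValueStore<String, Double>) context.getStore("relevance-scores");
  }

  @Override
  public void process(IncomingMessageEnvelope envelope,
                      MessageCollector collector, TaskCoordinator coordinator) {
    // Key: member id; value: the offline-computed score for that member.
    scores.put((String) envelope.getKey(), (Double) envelope.getMessage());
  }
}
```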
27. Interesting features
28. Remote calls
   ● Some data is not practically available on a Kafka stream, so we make REST requests to fetch it.
   ● Done at the beginning of the pipeline: extract the event, make remote calls to decorate it, then process the decorated event.
29. Remote calls, efficiently
   ● We use ParSeq, an open-sourced LinkedIn framework for writing asynchronous code in Java.
   ● ParSeq uses a thread pool for making remote calls; the rest of the processing happens serially.
   ● Checkpointing is handled by the application.
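A minimal ParSeq sketch of fanning out two remote calls in parallel and then decorating the event; the fetch methods stand in for real REST requests and are assumptions:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

import com.linkedin.parseq.Engine;
import com.linkedin.parseq.EngineBuilder;
import com.linkedin.parseq.Task;

public class DecorateExample {
  static Task<String> fetchProfile(String memberId) {
    return Task.callable("fetchProfile", () -> "profile:" + memberId);
  }

  static Task<String> fetchSettings(String memberId) {
    return Task.callable("fetchSettings", () -> "settings:" + memberId);
  }

  public static void main(String[] args) throws InterruptedException {
    ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);
    Engine engine = new EngineBuilder()
        .setTaskExecutor(scheduler)   // thread pool that runs the tasks
        .setTimerScheduler(scheduler) // used for timeouts and delays
        .build();

    // Run both fetches in parallel, then combine the results.
    Task<String> decorated = Task.par(fetchProfile("42"), fetchSettings("42"))
        .map((profile, settings) -> profile + "|" + settings);

    engine.run(decorated);
    decorated.await();
    System.out.println(decorated.get());

    engine.shutdown();
    scheduler.shutdown();
  }
}
```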
30. Real-time Processing
   ● Some messages require real-time latency.
   ● We tuned Kafka's batching configuration to achieve sub-second pre-ATC latency; it can be tuned even more aggressively.
   ● ATC/Samza processes most events in 2-3 ms (no remote calls for these messages).
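The slide doesn't name the exact settings, but these are the standard Kafka producer batching knobs involved; the values here are illustrative:

```properties
# Send as soon as the sender thread is ready instead of waiting to
# accumulate a larger batch.
linger.ms=0
# Wait only for the partition leader's ack, trading some durability
# for lower produce latency.
acks=1
```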
31. Scheduler
   [diagram: scheduled requests (from aggregation, follow-up, etc.) are written to RocksDB; a periodic window task reads back the requests that are due and hands them to the Message Delivery Service, alongside the task's other processing]
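A sketch of that pattern on Samza's WindowableTask, assuming keys are prefixed with zero-padded due-time millis so the store's sorted iteration order is also due-time order; the store name, key layout, and handoff are all assumptions:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.samza.config.Config;
import org.apache.samza.storage.kv.Entry;
import org.apache.samza.storage.kv.KeyValueIterator;
import org.apache.samza.storage.kv.KeyValueStore;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.task.InitableTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskContext;
import org.apache.samza.task.TaskCoordinator;
import org.apache.samza.task.WindowableTask;

public class SchedulerTask implements StreamTask, WindowableTask, InitableTask {
  // Keys: zero-padded due-time millis + "|" + request id.
  private KeyValueStore<String, String> scheduled;

  @Override
  @SuppressWarnings("unchecked")
  public void init(Config config, TaskContext context) {
    scheduled = (KeyValueStore<String, String>) context.getStore("scheduled-requests");
  }

  @Override
  public void process(IncomingMessageEnvelope envelope,
                      MessageCollector collector, TaskCoordinator coordinator) {
    // A scheduled request arrives (e.g. from aggregation or a follow-up).
    scheduled.put((String) envelope.getKey(), (String) envelope.getMessage());
  }

  // Runs every task.window.ms; drains every request whose due time has passed.
  @Override
  public void window(MessageCollector collector, TaskCoordinator coordinator) {
    String nowKey = String.format("%020d", System.currentTimeMillis());
    List<String> delivered = new ArrayList<>();
    KeyValueIterator<String, String> it = scheduled.all();
    try {
      while (it.hasNext()) {
        Entry<String, String> entry = it.next();
        if (entry.getKey().compareTo(nowKey) > 0) break; // rest are not yet due
        deliver(entry.getValue(), collector);
        delivered.add(entry.getKey());
      }
    } finally {
      it.close();
    }
    delivered.forEach(scheduled::delete); // delete after closing the iterator
  }

  private void deliver(String request, MessageCollector collector) {
    // Hand off to the Message Delivery Service, e.g. via an output stream.
  }
}
```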
32. Questions?
