Air traffic controller - Streams Processing meetup

Published in: Engineering
  1. Air Traffic Controller: Using Samza to manage communications with members. By: Cameron Lee and Shubhanshu Nagar
  2. Outline
     ● Problem Statement
     ● How ATC Solves It
     ● Implementation
     ● Interesting Features
  3. What problem are we trying to solve?
     ● In the past, LinkedIn provided a poor communications experience to some of its members: too much email, low-quality email, and messages fired on multiple channels at once.
     ● Our goal was to build a system which could apply some common functionality across many different communication types and use cases in order to improve the member experience.
     ● Handle thousands of communications per second.
     ● Maintain a good understanding of the state of members on the site in near-real-time.
  4. How does ATC think about creating a delightful member experience?
  5. 5 Rights
     ● Right member
     ● Right message
     ● Useful to member
     ● Shouldn’t have seen it before
     ● Right frequency
     ● Right channel
  6. Filtering
     ● Don’t send stale messages
     ● Don’t send spammy messages
     ● Don’t send duplicate messages
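The staleness and duplicate rules on the Filtering slide can be sketched as a small in-memory check. This is a minimal illustration, not ATC's implementation: the class name `MessageFilter`, the `maxAge` threshold, and the member/message key format are all assumptions, and spam detection is left out because the slides do not say how it is decided.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashSet;
import java.util.Set;

// Hypothetical filter: all names and thresholds here are illustrative assumptions.
class MessageFilter {
    private final Duration maxAge;                    // assumed staleness threshold
    private final Set<String> seen = new HashSet<>(); // keys of messages already let through

    MessageFilter(Duration maxAge) {
        this.maxAge = maxAge;
    }

    // Returns true if the message should be sent, false if it is stale or a duplicate.
    boolean shouldSend(String memberId, String messageId, Instant createdAt, Instant now) {
        if (Duration.between(createdAt, now).compareTo(maxAge) > 0) {
            return false; // older than the threshold: stale
        }
        // Set.add returns false when the key was already present, i.e. a duplicate.
        return seen.add(memberId + ":" + messageId);
    }
}
```

In a real deployment the `seen` set would need to live in persistent, bounded storage rather than in memory.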
  7. Aggregation and Capping
     ● Don’t flood me. Consolidate if you have too much to say.
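One way to picture capping plus aggregation is a per-member counter: messages under the cap go out individually, and anything over the cap is held and later consolidated into one digest. The sketch below assumes a simple fixed cap per period; `AggregatingCapper`, `offer`, and `flushDigest` are hypothetical names, and the slides do not describe ATC's actual policy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical capper: the cap and the digest behavior are illustrative assumptions.
class AggregatingCapper {
    private final int cap; // assumed limit on individual sends per member per period
    private final Map<String, Integer> sent = new HashMap<>();
    private final Map<String, List<String>> held = new HashMap<>();

    AggregatingCapper(int cap) {
        this.cap = cap;
    }

    // Returns true if the message is sent individually, false if it is held for a digest.
    boolean offer(String memberId, String message) {
        int count = sent.getOrDefault(memberId, 0);
        if (count < cap) {
            sent.put(memberId, count + 1);
            return true;
        }
        held.computeIfAbsent(memberId, k -> new ArrayList<>()).add(message);
        return false;
    }

    // Drains the held messages into one consolidated list and starts a new period.
    List<String> flushDigest(String memberId) {
        sent.remove(memberId);
        List<String> digest = held.remove(memberId);
        return digest == null ? new ArrayList<>() : digest;
    }
}
```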
  8. Channel Selection
     ● “Don’t blast all channels at the same time”
  9. Delivery-time Optimization
     ● Hold on to a message and deliver it at the right moment.
     ● Ex: Don’t buzz my phone at 2 AM.
     ● Ex: I like to read my daily digests every day after work.
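The "don’t buzz my phone at 2 AM" rule can be sketched as a quiet-hours window that pushes a candidate delivery time to the end of the window. This is only an illustration under assumed inputs: `QuietHours` and its fixed start and end times are hypothetical, times are taken to be in the member's local time zone, and a real system would presumably personalize the window per member.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical quiet-hours rule; the window boundaries are illustrative assumptions.
class QuietHours {
    private final LocalTime start; // e.g. 22:00
    private final LocalTime end;   // e.g. 08:00

    QuietHours(LocalTime start, LocalTime end) {
        this.start = start;
        this.end = end;
    }

    // Returns the earliest allowed delivery time at or after the candidate (member-local time).
    LocalDateTime nextAllowed(LocalDateTime candidate) {
        LocalTime t = candidate.toLocalTime();
        boolean wraps = start.isAfter(end); // window crosses midnight
        boolean quiet = wraps
                ? (!t.isBefore(start) || t.isBefore(end))
                : (!t.isBefore(start) && t.isBefore(end));
        if (!quiet) {
            return candidate;
        }
        LocalDateTime release = candidate.toLocalDate().atTime(end);
        // In a window that crosses midnight, evening candidates release on the following day.
        if (wraps && !t.isBefore(start)) {
            release = release.plusDays(1);
        }
        return release;
    }
}
```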
  10. How did we build this thing?
  11. Requirements for ATC
      ● Highly scalable
      ● Nearline (but close to real-time!)
      ● Ingest data from many sources
      ● Persist some data, but most needs are low TTL
  12. What’s ATC built on?
  13. Ecosystem (diagram labels): ATC; Message Delivery Service; Offline apps; Online apps; Relevance scores; User action data
  14. Persistence: RocksDB
      ● Out-of-the-box storage layer
      ● Write-optimized for high performance on SSDs
      ● Changelogs provide fault tolerance and bootstrapping capabilities
  15. (Diagram labels) ATC Repartitioner: re-partitioning of events; ATC Pipeline instance 1 … ATC Pipeline instance n; External services
  16. ATC task (diagram labels): External Requests; Filtering; Aggregation & Capping; Channel Selection; Scheduler; Message Data Tree Generation; Message Delivery Service; "Hipster Stream Processing"
  17. Implementation Details
  18. Streaming Technologies
      ● Kafka: publish-subscribe messaging system
        ● Used to send input to ATC to trigger communications.
        ● Many actions and signals in the LinkedIn ecosystem are tracked in Kafka events. We can consume these signals to better understand the state of the ecosystem.
      ● Databus: change capture system for databases
        ● Produces an event whenever an entry in a database changes.
  19. Host affinity
      ● By default, whenever a Samza app is deployed, the task instances can be moved to any host in the cluster, regardless of where the instances were previously deployed.
      ● If any state was saved (e.g. in RocksDB), the new instances have to rebuild that state from the changelog.
      ● This bootstrapping can take some time, depending on the amount of data to reload. Task instances can’t process new input until bootstrapping is complete.
      ● We have some use cases which can’t be delayed for the amount of time it …
  20. Host affinity (continued)
      ● Host affinity is a Samza feature which allows us to deploy task instances back to the same hosts as in the previous deployment, so state does not need to be reloaded.
      ● If an individual instance fails, Samza can fall back to moving the instance elsewhere and bootstrapping from the changelog.
  21. Multiple datacenters
      ● Samza does not currently support replicating persistent application state (e.g. RocksDB) across multiple clusters which are running the same app.
      ● We need ATC to run in multiple datacenters for redundancy.
      ● We need state in each datacenter so that, if we have to move processing between datacenters, we can continue to handle input properly.
  22. Multiple datacenters
      ● We rely on the input streams to replicate the main input, so that we can do processing and build up state in all datacenters.
      ● The side effects (e.g. triggering the actual email send) are then emitted by only one of the datacenters.
      ● We can dynamically choose where side effects are triggered.
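The multi-datacenter approach described above (every datacenter processes all input and builds state, but only one emits side effects) can be sketched with a switchable flag. Everything here is an assumption for illustration: `DatacenterPipeline` is a hypothetical class, a `HashMap` stands in for the local RocksDB store, and the slides do not say how the active datacenter is actually selected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical per-datacenter pipeline; names and state layout are illustrative assumptions.
class DatacenterPipeline {
    private final Map<String, Integer> state = new HashMap<>(); // stands in for local RocksDB state
    private final AtomicBoolean sideEffectsEnabled;
    final List<String> emitted = new ArrayList<>();             // stands in for real sends

    DatacenterPipeline(boolean sideEffectsEnabled) {
        this.sideEffectsEnabled = new AtomicBoolean(sideEffectsEnabled);
    }

    // Can be flipped at runtime to move side effects to another datacenter.
    void setSideEffectsEnabled(boolean enabled) {
        sideEffectsEnabled.set(enabled);
    }

    void process(String memberId) {
        state.merge(memberId, 1, Integer::sum); // state is always built, in every datacenter
        if (sideEffectsEnabled.get()) {
            emitted.add("send:" + memberId);    // only the active datacenter emits
        }
    }

    int eventsSeen(String memberId) {
        return state.getOrDefault(memberId, 0);
    }
}
```

Because the passive datacenter keeps its state current, flipping the flag is enough to take over sending without a bootstrap step.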
  23. Multiple datacenters (continued)
  24. Deployments
      ● When we deploy changes to ATC, we can deploy to a single datacenter at a time in order to test new versions on only a fraction of traffic.
      ● In some cases, we shift all side effects out of a datacenter to do an upgrade. Since we still process all input, we can validate almost all of our functionality and ensure performance doesn’t take an unexpected hit.
  25. Store migrations
      ● In some cases, we need to migrate our system to use a new instance of a store.
      ● For example, when support was added to use RocksDB TTL, we needed to migrate some of our stores.
      ● Since we only needed the last X days of data, we could use the following strategy for the migration:
        ● Write to both the old and new store for X days, but continue to read from the old store.
        ● After X days, read from the new store, but continue writing to both stores so we could fall back.
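The dual-write migration strategy above can be sketched as a thin wrapper around the two stores. This is a simplified illustration: `MigratingStore` is a hypothetical name, plain maps stand in for the old and new RocksDB stores, and the switch after X days is modeled as an explicit method call rather than a timer.

```java
import java.util.Map;

// Hypothetical wrapper illustrating dual writes with a switchable read path.
class MigratingStore {
    private final Map<String, String> oldStore;
    private final Map<String, String> newStore;
    private boolean readFromNew = false; // phase 1: reads stay on the old store

    MigratingStore(Map<String, String> oldStore, Map<String, String> newStore) {
        this.oldStore = oldStore;
        this.newStore = newStore;
    }

    // Writes always go to both stores, in both phases, so falling back stays possible.
    void put(String key, String value) {
        oldStore.put(key, value);
        newStore.put(key, value);
    }

    String get(String key) {
        return (readFromNew ? newStore : oldStore).get(key);
    }

    // Phase 2: after X days the new store holds everything that is still needed.
    void switchReadsToNew() {
        readFromNew = true;
    }

    void fallBackToOld() {
        readFromNew = false;
    }
}
```

The approach only works because data older than X days is no longer needed; otherwise the new store would have to be backfilled before reads could switch.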
  26. Personalization through relevance
      ● We work closely with a relevance team in order to make better decisions about the communications we send out, e.g. channel selection, delivery time, aggregation thresholds.
      ● Every day, scores for different decisions are computed offline (Hadoop) by the relevance team.
      ● Those scores are pushed to ATC through Kafka, and then ATC stores the scores in RocksDB.
      ● Scores are generated for each member, so we can personalize the experience.
  27. Interesting features
  28. Remote calls
      ● Some data is not available on a Kafka stream in a pragmatic way.
      ● We make REST requests to fetch that data.
      ● Done at the beginning of the pipeline:
        ● Extract event
        ● Make remote calls and decorate event
        ● Process decorated event
  29. Remote calls - Efficiently
      ● Use ParSeq
        ● Framework to write asynchronous code in Java
        ● Open source
      ● ParSeq uses a thread pool for making remote calls.
      ● Rest of processing happens serially.
      ● Checkpointing handled by application.
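The extract, decorate, process flow from the two remote-call slides can be sketched with the JDK's `CompletableFuture` in place of ParSeq, so the code below does not show ParSeq's API. It assumes a hypothetical `remoteLookup` function standing in for the REST request: lookups run concurrently on a thread pool, and the decorated events are then processed serially in input order. Checkpointing, which the slides say is handled by the application, is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical pipeline; CompletableFuture is a stand-in for ParSeq, not its API.
class DecoratingPipeline {
    // Decorates each event via remoteLookup on a pool, then processes serially in input order.
    static List<String> run(List<String> memberIds,
                            Function<String, String> remoteLookup,
                            int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Step 1 and 2: start all remote calls without blocking on any of them.
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (String id : memberIds) {
                pending.add(CompletableFuture.supplyAsync(
                        () -> id + "|" + remoteLookup.apply(id), pool));
            }
            // Step 3: process decorated events one at a time, on the calling thread.
            List<String> processed = new ArrayList<>();
            for (CompletableFuture<String> decorated : pending) {
                processed.add("processed(" + decorated.join() + ")");
            }
            return processed;
        } finally {
            pool.shutdown();
        }
    }
}
```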
  30. Real-time Processing
      ● Some messages require real-time latency.
      ● Tuned Kafka’s batching configuration to achieve sub-second pre-ATC latency.
        ● Can be tuned even more aggressively!
      ● ATC/Samza processes most events in 2-3 ms.
      ● No remote calls for these messages.
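The slides do not say which batching settings were tuned or to what values. As a rough illustration only, the standard Kafka producer settings that govern batching are `linger.ms` (how long the producer waits to fill a batch) and `batch.size` (the maximum batch size in bytes). The class name and the values below are placeholders, not the settings ATC used.

```java
import java.util.Properties;

// Hypothetical low-latency producer settings; the values are placeholders, not ATC's.
class LowLatencyProducerConfig {
    static Properties build() {
        Properties props = new Properties();
        // Send as soon as possible instead of waiting to fill a batch.
        props.setProperty("linger.ms", "0");
        // Upper bound on a batch in bytes; smaller batches trade throughput for latency.
        props.setProperty("batch.size", "16384");
        return props;
    }
}
```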
  31. Scheduler (diagram labels): Scheduled requests (from aggregation, follow-up, etc.); Scheduler RocksDB; Window task (periodic); Other processing; Message Delivery Service
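The scheduler diagram suggests that requests are stored keyed by due time and that a periodic window task picks up whatever has come due. A minimal sketch of that idea follows, with a `TreeMap` standing in for an ordered key-value store such as RocksDB. The `Scheduler` class and its method names are hypothetical, and the slides do not describe the actual key layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical scheduler; a TreeMap stands in for an ordered store such as RocksDB.
class Scheduler {
    private final TreeMap<Long, List<String>> byDueTime = new TreeMap<>();

    void schedule(long dueAtMillis, String request) {
        byDueTime.computeIfAbsent(dueAtMillis, k -> new ArrayList<>()).add(request);
    }

    // Called periodically; removes and returns every request due at or before nowMillis.
    List<String> window(long nowMillis) {
        List<String> due = new ArrayList<>();
        Map<Long, List<String>> ready = byDueTime.headMap(nowMillis, true);
        for (List<String> batch : ready.values()) {
            due.addAll(batch);
        }
        ready.clear(); // clearing the view removes the entries from the backing map
        return due;
    }
}
```

Keeping entries sorted by due time means each periodic pass only touches the requests that are ready, rather than scanning everything that is scheduled.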
  32. Questions?
