Dive into Streams with Brooklin
Celia Kung
LinkedIn
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
linkedin-streams-brooklin/
Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Outline
Background
Scenarios
Application Use Cases
Architecture
Current and Future
Background
Nearline Applications
• Require near real-time response
• Thousands of applications at LinkedIn
○ E.g. live search indices, notifications
• Require continuous, low-latency access to data
○ Data could be spread across multiple database systems
• Need an easy way to move data to applications
○ App devs should focus on event processing, not on data access
Heterogeneous Data Systems
[Diagram: many different data systems, e.g. Espresso (LinkedIn's document store) and Microsoft EventHubs]
Building the Right Infrastructure
• Build separate, specialized solutions to stream data from and to each different system?
○ Slows down development
○ Hard to manage!
[Diagram: a tangle of point-to-point streaming systems (A, B, C, D, ...) connecting sources such as Microsoft EventHubs to nearline applications]
Need a centralized, managed, and extensible service
to continuously deliver data in near real-time
Brooklin
• Streaming data pipeline service
• Propagates data from many source types to many destination types
• Multitenant: Can run several thousand streams simultaneously
• Streams are dynamically provisioned and individually configured
• Extensible: Plug-in support for additional sources/destinations
Pluggable Sources & Destinations
[Diagram: sources (databases such as Espresso; messaging systems such as Kafka, EventHubs, Kinesis) flow through Brooklin to destinations (Kafka, EventHubs, Kinesis), which applications consume from]
Scenarios
Scenario 1: Change Data Capture

Capturing Live Updates
1. Member updates her profile to reflect her recent job change
2. LinkedIn wants to inform her colleagues of this change
[Diagram, built up across slides: Member DB receives the updates; the News Feed, Search Indices, Notifications, and Standardization services (and more) each query it directly]
Change Data Capture (CDC)
• Brooklin can stream database updates to a change stream
• Data processing applications consume from change streams
• Isolation: Applications are decoupled from the sources and don’t compete for resources with online queries
• Applications can be at different points in change timelines
[Diagram: Member DB updates flow into a messaging system; the News Feed, Notifications, Standardization, and Search Indices services each consume from it]
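The decoupling described above can be sketched with a toy change stream: each application tracks its own offset into the stream, so consumers can sit at different points in the change timeline without competing with online queries. All names here are illustrative, not Brooklin's API:

```python
from dataclasses import dataclass, field

@dataclass
class ChangeStream:
    """Toy append-only change stream (stand-in for a messaging-system topic)."""
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)

@dataclass
class AppConsumer:
    """Each application keeps its own offset into the change stream."""
    name: str
    offset: int = 0

    def poll(self, stream):
        new = stream.events[self.offset:]  # everything since last poll
        self.offset = len(stream.events)
        return new

stream = ChangeStream()
stream.append({"member": 42, "field": "title", "new": "Staff Engineer"})

news_feed = AppConsumer("news-feed")
notifications = AppConsumer("notifications")

first = news_feed.poll(stream)        # sees the title change
stream.append({"member": 42, "field": "company", "new": "ExampleCorp"})
second = news_feed.poll(stream)       # only the new event
both = notifications.poll(stream)     # independent offset: sees both events
print(len(first), len(second), len(both))  # 1 1 2
```

The point of the sketch is the independent offsets: a slow consumer replays history without holding anyone else back.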
Scenario 2: Streaming Bridge
● Stream data from X to Y: across cloud services, clusters, and data centers
● Data pipe to move data between different environments
● Enforce policy: encryption, obfuscation, data formats
Mirroring Kafka Data
● Aggregating data from all data centers into a centralized place
● Moving data between LinkedIn and external cloud services (e.g. Azure)
● Brooklin has replaced Kafka MirrorMaker (KMM) at LinkedIn
○ Issues with KMM: didn’t scale well, difficult to operate and manage, poor failure isolation
Use Brooklin to Mirror Kafka Data
[Diagram: sources (messaging systems such as Microsoft EventHubs; databases) mirrored through Brooklin to destinations (messaging systems; databases)]
Kafka MirrorMaker Topology
[Diagram: in each datacenter (A, B, C, ...), many separate KMM clusters mirror the tracking and metrics topics into aggregate topics — one KMM cluster per source/destination pair]
Brooklin Kafka Mirroring Topology
[Diagram: a single Brooklin cluster per datacenter (A, B, C, ...) mirrors the tracking and metrics topics into their aggregates]
Brooklin Kafka Mirroring
● Optimized for stability and operability
● Manually pause and resume mirroring at every level
○ Entire pipeline, topic, topic-partition
● Can auto-pause partitions facing mirroring issues
○ Auto-resumes the partitions after a configurable duration
● Flow of messages from other partitions is unaffected
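The auto-pause/auto-resume behavior above can be sketched as a small per-partition flow controller: a partition that hits a mirroring error is paused, resumes automatically after a configurable duration, and other partitions are unaffected. This is a toy model, not Brooklin's implementation:

```python
import time

class PartitionFlowControl:
    """Toy auto-pause: a partition that errors is paused, then
    auto-resumed after a configurable duration; other partitions
    keep flowing (illustrative only)."""

    def __init__(self, auto_resume_secs=300.0):
        self.auto_resume_secs = auto_resume_secs
        self.paused_at = {}  # partition -> timestamp of pause

    def on_error(self, partition, now=None):
        self.paused_at[partition] = now if now is not None else time.time()

    def is_paused(self, partition, now=None):
        ts = self.paused_at.get(partition)
        if ts is None:
            return False
        now = now if now is not None else time.time()
        if now - ts >= self.auto_resume_secs:
            del self.paused_at[partition]  # auto-resume
            return False
        return True

flow = PartitionFlowControl(auto_resume_secs=300)
flow.on_error(("tracking", 3), now=1000.0)
print(flow.is_paused(("tracking", 3), now=1010.0))  # True: still paused
print(flow.is_paused(("tracking", 0), now=1010.0))  # False: unaffected
print(flow.is_paused(("tracking", 3), now=1400.0))  # False: auto-resumed
```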
Application Use Cases
• Security
• Cache
• Search Indices
• ETL or Data warehouse
• Materialized Views or Replication
• Repartitioning
• Adjunct Data
• Bridge
• Serde, Encryption, Policy
• Standardization, Notifications, …
Architecture
Example: Stream updates made to Member Profile
● Scenario: Stream Espresso Member Profile updates into Kafka
○ Source Database: Espresso (Member DB, Profile table)
○ Destination: Kafka
○ Application: News Feed service
Datastream
• Describes the data pipeline
• Mapping between source and destination
• Holds the configuration for the pipeline

Name: MemberProfileChangeStream
Source: MemberDB/ProfileTable
  Type: Espresso
  Partitions: 8
Destination: ProfileTopic
  Type: Kafka
  Partitions: 8
Metadata:
  Application: News Feed service
  Owner: newsfeed@linkedin.com
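The datastream above could be encoded as a JSON body for the create call. The field names here mirror the slide, but the exact wire format is an assumption — Brooklin's actual REST schema may differ:

```python
import json

# Hypothetical JSON encoding of the datastream from the slide;
# key names and nesting are illustrative, not Brooklin's real schema.
datastream = {
    "name": "MemberProfileChangeStream",
    "source": {"connectionString": "MemberDB/ProfileTable",
               "type": "Espresso", "partitions": 8},
    "destination": {"connectionString": "ProfileTopic",
                    "type": "Kafka", "partitions": 8},
    "metadata": {"application": "News Feed service",
                 "owner": "newsfeed@linkedin.com"},
}

# This body would be sent as e.g. POST /datastream to the DMS.
body = json.dumps(datastream)
print(json.loads(body)["source"]["type"])  # Espresso
```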
Creating and running a datastream
[Each step's slide showed the same architecture diagram: a Brooklin client behind a load balancer; three Brooklin instances, each with a Datastream Management Service (DMS), a Coordinator (one elected leader), an Espresso Consumer, and a Kafka Producer; all coordinating through ZooKeeper, reading from Member DB and feeding the News Feed service]

1. Client makes a REST call (POST /datastream) to create the datastream
2. The create request goes to any Brooklin instance
3. The datastream is written to ZooKeeper
4. The leader coordinator is notified of the new datastream
5. The leader coordinator calculates the work distribution
6. The leader coordinator writes the assignments to ZooKeeper
7. ZooKeeper is used to communicate the assignments
8. Coordinators hand task assignments to consumers
9. Consumers start streaming data from the source
10. Consumers propagate data to producers
11. Producers write data to the destination
12. The app consumes from Kafka
13. Destinations can be shared by multiple apps
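The leader's work-distribution step can be sketched as a simple round-robin assignment of datastream partitions to live instances. Brooklin's actual assignment strategy is not specified here, so this is only illustrative:

```python
def assign_partitions(partitions, instances):
    """Round-robin assignment of datastream partitions to Brooklin
    instances (illustrative only; the real strategy may differ)."""
    assignment = {inst: [] for inst in instances}
    for i, part in enumerate(partitions):
        assignment[instances[i % len(instances)]].append(part)
    return assignment

parts = [f"ProfileTable-{p}" for p in range(8)]
instances = ["instance-1", "instance-2", "instance-3"]
print(assign_partitions(parts, instances))
# instance-1 gets partitions 0, 3, 6; instance-2 gets 1, 4, 7; instance-3 gets 2, 5
```

The leader would write an assignment like this to ZooKeeper; each instance's coordinator then picks up its own slice and hands the tasks to consumers.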
Brooklin Architecture
[Diagram: multiple Brooklin instances, each with a Datastream Management Service (DMS), a Coordinator (one elected leader), and pluggable consumers (ConsumerA, ConsumerB) and producers (ProducerX, ProducerY, ProducerZ), all coordinating through ZooKeeper]
Current & Future
Current

Sources & Destinations
• Consumers: Espresso, Oracle, Kafka, EventHubs, Kinesis
• Producers: Kafka, EventHubs
• APIs are standardized to support additional sources and destinations

Features
• Multitenant: Can power thousands of datastreams across several source and destination types
• Guarantees: At-least-once delivery; order is maintained at the partition level
• Kafka mirroring improvements: finer control of pipelines (pause/auto-pause partitions), improved latency with flushless-produce mode
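The at-least-once guarantee can be illustrated with a tiny checkpointing loop: the offset is committed only after the write to the destination succeeds, so a crash and restart from a stale checkpoint may re-deliver events but never skips one. This is a generic sketch of the pattern, not Brooklin's code:

```python
def replay(events, checkpoint, produce):
    """At-least-once delivery sketch: commit the checkpoint only after
    a successful produce, so restarts may duplicate but never lose
    events, and per-partition order is preserved."""
    for offset in range(checkpoint, len(events)):
        produce(events[offset])
        checkpoint = offset + 1  # advance only after the write succeeds
    return checkpoint

delivered = []
ckpt = replay(["e0", "e1", "e2"], 0, delivered.append)
# Simulate a restart from a stale checkpoint: "e2" is delivered again,
# but nothing is lost and the order within the partition is unchanged.
ckpt = replay(["e0", "e1", "e2"], 2, delivered.append)
print(delivered)  # ['e0', 'e1', 'e2', 'e2']
```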
Brooklin in Production
Brooklin streams with Espresso, Oracle, or EventHubs as the source:
• 38B messages/day
• 2K+ datastreams
• 1K+ unique sources
• 200+ applications
Brooklin streams mirroring Kafka data:
• 2T+ messages/day
• 200+ datastreams
• 10K+ topics
Future

Sources & Destinations
• Consumers: MySQL, Cosmos DB, Azure SQL
• Producers: Azure Blob storage, Kinesis, Cosmos DB, Azure SQL, Couchbase

Open Source
• Plan to open source Brooklin in 2019 (soon!)

Optimizations
• Brooklin auto-scaling
• Passthrough compression
• Read optimizations: Read once, write multiple
Thank you
Questions?
Celia Kung
Email: ckung@linkedin.com
LinkedIn: /in/celiakkung/