Modern Requirements for Messaging and Queuing
Evaluating Streaming Data Solutions
Lots of technology options
How do you choose the right solution?
Apache RocketMQ
Google Cloud Pub/Sub
Fundamental requirements
• Easy application development: make architects and developers productive
• Simple to deploy and manage: eliminate overhead and complexity in production
• Adaptable for future needs: handle evolving usage and demand without complexity
Easy development
Obstacles
• Limited, inconsistent language support
• Different solutions or add-ons for different models and scenarios
• Require heavyweight client applications
Enablers
• Broad, mature language support
• Unified solution handling multiple models and scenarios
• Smart broker that reduces burdens on applications
Simple deployment & operation
• Performance: low latency and high throughput for producers and consumers
• Resiliency: automatic regulation and recovery from failures
• Data protection: no data loss or impairment during failures and maintenance
• Manageability: simple deployment, management, monitoring, and troubleshooting

Meeting future needs
• Scalability: fast, nondisruptive scaling to continue meeting SLAs
• Multi-tenancy: provide isolation and resource management to support additional workloads
• Expansion: multi-datacenter, multi-geo replication and coordination
• Compatibility: connectivity and stable interfaces to continue supporting applications
Apache Pulsar
• Pub-sub messaging and queuing
• Designed for low latency, high throughput, ease of use
• Developed and deployed in production at Yahoo!
• Open source (incubating at Apache)
Pulsar architecture
Multi-layer, scalable architecture
• Independent layers for serving (brokers) and storage (bookies)
• Storage layer built on Apache BookKeeper
[Diagram: Producer and Consumer connect to Apache Pulsar brokers (three shown), backed by bookies (five shown) in the Apache BookKeeper storage layer]
Building applications with Pulsar
Flexibility
• Pub-sub messaging + message queuing
• Java, C++, Python, WebSocket API clients
• At-least-once + exactly-once delivery
• Apache Kafka compatibility
Simple development
• Intelligent client libraries handle service discovery, reconnection, batching, and more
• Automated cursor management
Easy iteration
• Easily test in production environment
• Deploy standalone, bare metal, or cloud
Unified model
Messaging + queuing in one system
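The unified model can be pictured with a toy in-memory topic. The sketch below is an illustration only, not Pulsar's implementation or client API: `Topic`, `Subscription`, and `dispatch_all` are hypothetical names, and round-robin dispatch is an assumption. It models each subscription as keeping its own cursor over the topic's log, so separate subscriptions each see every message (pub-sub), while consumers sharing one subscription split the messages between them (queuing).

```python
class Subscription:
    """Toy subscription: one cursor shared by all of its consumers (hypothetical)."""

    def __init__(self, consumers):
        self.consumers = list(consumers)
        self.cursor = 0  # index of the next log entry to dispatch

    def dispatch(self, log):
        """Hand out every undelivered entry, round-robin across consumers."""
        delivered = []
        while self.cursor < len(log):
            consumer = self.consumers[self.cursor % len(self.consumers)]
            delivered.append((consumer, log[self.cursor]))
            self.cursor += 1  # the cursor advances automatically
        return delivered


class Topic:
    """Toy topic: an append-only log plus named subscriptions (hypothetical)."""

    def __init__(self):
        self.log = []
        self.subscriptions = {}

    def subscribe(self, name, consumers):
        self.subscriptions[name] = Subscription(consumers)

    def publish(self, message):
        self.log.append(message)

    def dispatch_all(self):
        return {name: sub.dispatch(self.log)
                for name, sub in self.subscriptions.items()}


topic = Topic()
topic.subscribe("audit", ["a1"])          # pub-sub style: sees everything
topic.subscribe("workers", ["w1", "w2"])  # queue style: work is shared
for i in range(4):
    topic.publish(f"m{i}")
deliveries = topic.dispatch_all()
print(deliveries["audit"])
print(deliveries["workers"])
```

In this toy run the `audit` subscription receives all four messages, while `w1` and `w2` each receive two of them.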
Performance
• Scale-out architecture
• Add new brokers & bookies at any time
• No data redistribution needed
• Performance isolation
• I/O isolation between writes and reads
• Soft and hard isolation
[Diagram: Producer and Consumer connect to Apache Pulsar brokers (three shown), backed by bookies (five shown) in the Apache BookKeeper storage layer]
Storage in Pulsar
Legacy Architectures
● Storage co-resident with processing
● Partition-centric
● Cumbersome to scale: data redistribution, performance impact
[Diagram: logical view (Partition 1) versus physical storage (Partition 1 and its copies held on brokers)]
Pulsar Architecture
● Storage decoupled from processing
● Partitions stored as segments
● Flexible, easy scalability
[Diagram: three brokers serve a logical partition; its segments are spread across four bookies holding Segments {1, 2, 3}, {1, 3}, {2, 3}, and {2, 1}]
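The difference between partition-centric and segment-centric storage can be sketched in a few lines. This is a toy model under stated assumptions, not BookKeeper's actual placement policy: `place_segments` is a hypothetical helper, and round-robin placement with two copies per segment is chosen only for illustration. The point it shows is that segments already written stay where they are when a bookie is added, and only new segments land on the new node.

```python
def place_segments(placement, new_segment_ids, bookies, replicas=2):
    """Return a copy of `placement` with new segments assigned round-robin.

    `placement` maps segment id -> list of bookies holding a copy.
    Existing entries are never modified, so no data is redistributed.
    """
    if replicas > len(bookies):
        raise ValueError("need at least as many bookies as replicas")
    updated = dict(placement)
    start = len(updated)
    for offset, segment_id in enumerate(new_segment_ids):
        first = (start + offset) % len(bookies)
        updated[segment_id] = [bookies[(first + r) % len(bookies)]
                               for r in range(replicas)]
    return updated


bookies = ["b1", "b2", "b3"]
before = place_segments({}, ["s1", "s2", "s3"], bookies)

# Scale out: add a fourth bookie, then keep writing new segments.
after = place_segments(before, ["s4", "s5"], bookies + ["b4"])

for segment_id, holders in after.items():
    print(segment_id, holders)
```

In a partition-centric design, by contrast, the new node would hold nothing until whole partitions were copied onto it.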
Scaling
Scaling Clusters
• Scale processing and storage independently
• New bookies ramp up on traffic quickly
• No data rebalancing
[Diagram: three brokers over five bookies]
Scaling Datacenters
• Multi-datacenter replication
• Active-active replicas
• Unified logical deployment
[Diagram: clusters Pulsar 1, Pulsar 2, and Pulsar 3]
Scaling Workloads
• Multi-tenancy
• Resource management
• Storage quotas, flow control, back pressure, and rate limiting
• Millions of topics per cluster
[Diagram: one broker serving many topics]
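Rate limiting is one of the resource-management controls listed above. The slides do not say how Pulsar implements it, so the following is a generic token-bucket sketch rather than Pulsar's mechanism; `TokenBucket` and its parameters are hypothetical names. Time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Generic token-bucket limiter (illustrative; not Pulsar's implementation)."""

    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec   # tokens added per second
        self.capacity = capacity   # maximum burst size
        self.tokens = capacity     # start with a full bucket
        self.last = 0.0            # timestamp of the previous call

    def allow(self, now, cost=1):
        """Refill for the elapsed time, then admit the request if tokens remain."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per tenant keeps a noisy tenant from starving the others.
limits = {"tenant-a": TokenBucket(rate_per_sec=2, capacity=2)}
for t in (0.0, 0.0, 0.0, 0.5):
    print(t, limits["tenant-a"].allow(now=t))
```

A rejected publish here simply returns `False`; a real system would more likely delay the producer (back pressure) than drop the message.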
Resiliency and data protection
Resiliency
• Built-in flow control, back pressure, and rate limiting
• Non-disruptive recovery
Data protection
• Data automatically replicated
• Data committed to storage before acknowledgement (configurable)
• End-to-end encryption
[Diagram: Producer and Consumer, three brokers, five bookies]
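"Committed to storage before acknowledgement" can be illustrated with a small model. This is a sketch under assumptions, not the broker's real write path: `Bookie`, `publish`, and the `ack_quorum` parameter are hypothetical names standing in for the configurable durability setting the slide mentions. The producer is acknowledged only once enough storage nodes confirm the write.

```python
class Bookie:
    """Hypothetical storage node that either persists an entry or fails."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.entries = []

    def write(self, entry):
        if not self.healthy:
            return False
        self.entries.append(entry)
        return True


def publish(entry, bookies, ack_quorum):
    """Write to every bookie; acknowledge only if enough writes are confirmed."""
    confirmed = sum(1 for bookie in bookies if bookie.write(entry))
    return confirmed >= ack_quorum


bookies = [Bookie("b1"), Bookie("b2"), Bookie("b3", healthy=False)]
print(publish(b"order-42", bookies, ack_quorum=2))  # two of three copies suffice
print(publish(b"order-43", bookies, ack_quorum=3))  # stricter setting is not met
```

Raising `ack_quorum` trades publish availability for stronger durability, which is the kind of choice the "(configurable)" note points at.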
Pulsar example: Yahoo! Usage
• In production for 3+ years at Yahoo
• Powering critical products like Yahoo Mail, Yahoo Finance, Gemini Ads, Flickr, and Sherpa (NoSQL database)
• 80+ tenants
• 2.3 million topics
• 100 billion messages / day
• Full-mesh replication in 8 data centers
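To put these figures in perspective, the short calculation below converts them into averages. It uses only the numbers on the slide; the uniform-load assumption is a simplification (real traffic is bursty and unevenly spread across topics), and counting one replication link per direction between each pair of data centers is an assumption about what "full-mesh" means here.

```python
MESSAGES_PER_DAY = 100_000_000_000  # 100 billion, from the slide
TOPICS = 2_300_000                  # 2.3 million, from the slide
DATA_CENTERS = 8                    # from the slide
SECONDS_PER_DAY = 24 * 60 * 60

avg_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY
avg_per_topic_per_day = MESSAGES_PER_DAY / TOPICS
# Full mesh: every data center replicates to every other one.
replication_links = DATA_CENTERS * (DATA_CENTERS - 1)

print(f"{avg_per_second:,.0f} messages/second on average")        # 1,157,407
print(f"{avg_per_topic_per_day:,.0f} messages/topic/day on average")  # 43,478
print(f"{replication_links} directed replication links")          # 56
```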
Part of the Streamlio unified architecture
• Interactive Querying
• Metadata Management
• Operational Monitoring
• Chargeback
• Security
• Authentication
• Quota Management
• Rules Engine
Learn more
• Website: https://streaml.io
• Twitter: @streamlio
• Slack: https://learn-streamlio.slack.com
• Try Streamlio: https://streaml.io/docs/getting-started