© 2019 SPLUNK INC.
Why Splunk Chose Pulsar
June 2019
Karthik Ramasamy
Splunk
Karthik Ramasamy
Senior Director of Engineering
@karthikz
streaming @splunk | ex-CEO of @streamlio | co-creator of @heronstreaming | ex @Twitter | Ph.D
During the course of this presentation, we may make forward-looking statements
regarding future events or plans of the company. We caution you that such statements
reflect our current expectations and estimates based on factors currently known to us
and that actual events or results may differ materially. The forward-looking statements
made in this presentation are being made as of the time and date of its live
presentation. If reviewed after its live presentation, it may not contain current or
accurate information. We do not assume any obligation to update any forward-looking
statements made herein.
In addition, any information about our roadmap outlines our general product direction
and is subject to change at any time without notice. It is for informational purposes only,
and shall not be incorporated into any contract or other commitment. Splunk undertakes
no obligation either to develop the features or functionalities described or to include any
such feature or functionality in a future release.
Splunk, Splunk>, Data-to-Everything, D2E and Turn Data Into Doing are trademarks and registered trademarks of Splunk Inc. in the
United States and other countries. All other brand names, product names or trademarks belong to their respective owners. © 2020
Splunk Inc. All rights reserved.
Forward-Looking Statements
Agenda
1) Introduction to Splunk
2) Streaming system requirements
3) How Pulsar satisfies the requirements
4) Apache Pulsar at Splunk
5) Questions?
New technologies are enabling and fueling digitization:
Cloud, 5G, IoT, AI, Mobility, Virtualization, Robotic Process Automation, Blockchain, VR, Platforms
Data is Transforming Everything
The way we work, live and play
Figure: Data Lakes, Master Data Management, ETL, Point Data Management Solutions, Data Silos, Business Processes; The Data-to-Everything Platform; IT, Security, DevOps
Core of Emerging Use Cases
Messaging / Streaming Systems are at the core of:
✦ Streaming data transformation
✦ Data distribution
✦ Real-time analytics
✦ Real-time monitoring and notifications
✦ IoT analytics
✦ Event-driven workflows
✦ Interactive applications
✦ Log processing and analytics
Streaming System Requirements
✦ Scalability
✦ Durability
✦ Fault Tolerance
✦ High Availability
✦ Sharing & Isolation
✦ Client Languages
✦ Messaging Models
✦ Persistence
✦ Type Safety
✦ Deployment in k8s
Streaming System Requirements
✦ Operability
✦ Disaster Recovery
✦ TCO
✦ Observability
✦ Ecosystem
✦ Adoption
✦ Community
✦ Licensing
Requirement #1 - Scalability
✦ Traffic can vary wildly while the system is in production
✦ The system needs to scale up with no effect on publish/consume throughput and latency
✦ Publish/consume throughput should increase/decrease linearly as nodes are added or removed
✦ Load should spread out automatically to new machines as nodes are added
✦ Scalability across different dimensions: serving and storage
Scalability
Figure: producers and consumers connect to the Messaging layer (three brokers); a Function Processing layer (workers); an Event storage layer (five bookies)
✦ Independent layers for processing, serving and storage
✦ Messaging and processing built on Apache Pulsar
✦ Storage built on Apache BookKeeper
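To make the "independent layers" point concrete, here is a minimal Python sketch, assuming a simplified model in which each broker owns whole topics and stores no message data. The `Broker` class, the `assign` and `rebalance` functions, and the least-loaded policy are all hypothetical; Pulsar's real load manager moves bundles of topics using its own load signals. The point it illustrates is that scaling the serving layer only moves ownership, never data.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """A stateless serving node: it owns topics but holds no message data."""
    name: str
    topics: set = field(default_factory=set)


def assign(topic: str, brokers: list) -> str:
    """Place a new topic on the least-loaded broker (ties broken by name)."""
    target = min(brokers, key=lambda b: (len(b.topics), b.name))
    target.topics.add(topic)
    return target.name


def rebalance(brokers: list) -> int:
    """Shift topic ownership until broker loads differ by at most one.

    Only ownership moves; the data stays on the bookies, so nothing is
    copied. Returns the number of topics moved.
    """
    moved = 0
    while True:
        busiest = max(brokers, key=lambda b: (len(b.topics), b.name))
        idlest = min(brokers, key=lambda b: (len(b.topics), b.name))
        if len(busiest.topics) - len(idlest.topics) <= 1:
            return moved
        topic = min(busiest.topics)
        busiest.topics.remove(topic)
        idlest.topics.add(topic)
        moved += 1


brokers = [Broker("broker-a"), Broker("broker-b"), Broker("broker-c")]
for i in range(9):
    assign(f"topic-{i}", brokers)

brokers.append(Broker("broker-d"))  # scale out the serving layer
moved = rebalance(brokers)
print(moved, {b.name: len(b.topics) for b in brokers})
# prints 2 {'broker-a': 3, 'broker-b': 2, 'broker-c': 2, 'broker-d': 2}
```

Because storage is a separate layer, adding bookies would be modeled independently of this function.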
Requirement #2 - Durability
✦ Splunk applications require different levels of durability
  ✦ Persistent durability: no data loss in the presence of node failures or an entire cluster failure, e.g., security & compliance
  ✦ Replicated durability: no data loss in the presence of a limited number of node failures, e.g., machine logs
  ✦ Transient durability: data loss is tolerated in the presence of failures, e.g., metrics data
Durability
Figure: Producer → Broker → three Bookies; each bookie writes the entry to its journal and fsyncs it to disk
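The write path above can be sketched as a quorum write. BookKeeper ledgers really are parameterized by an ensemble size, a write quorum and an ack quorum, and Pulsar exposes them as broker defaults and per-namespace persistence policies; the `write_entry` function below is only a hypothetical simulation of that idea, assuming round-robin striping and treating "alive" bookies as ones whose journal fsync succeeds.

```python
def write_entry(entry_id, ensemble, write_quorum, ack_quorum, alive):
    """Simulate writing one entry to a BookKeeper-style ledger.

    Entries are striped round-robin: entry i goes to `write_quorum`
    consecutive bookies starting at position i mod len(ensemble).
    A bookie in `alive` journals and fsyncs the entry; the producer is
    acknowledged once `ack_quorum` bookies have done so.
    """
    if not ack_quorum <= write_quorum <= len(ensemble):
        raise ValueError("need ack_quorum <= write_quorum <= ensemble size")
    start = entry_id % len(ensemble)
    targets = [ensemble[(start + k) % len(ensemble)] for k in range(write_quorum)]
    durable_on = [b for b in targets if b in alive]
    return durable_on, len(durable_on) >= ack_quorum


ensemble = ["bookie-1", "bookie-2", "bookie-3"]
print(write_entry(0, ensemble, 3, 2, alive={"bookie-1", "bookie-2", "bookie-3"}))
# (['bookie-1', 'bookie-2', 'bookie-3'], True)
print(write_entry(1, ensemble, 3, 2, alive={"bookie-1", "bookie-3"}))
# (['bookie-3', 'bookie-1'], True): one bookie down, still acknowledged
print(write_entry(2, ensemble, 3, 2, alive={"bookie-2"}))
# (['bookie-2'], False): below the ack quorum, the write is not acknowledged
```

Raising the ack quorum toward the write quorum trades latency for stronger guarantees, which is one way the durability levels listed above could be mapped onto configuration.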
Requirement #3 - Fault Tolerance
✦ The system must keep functioning despite component failures
✦ Ideally without any manual intervention, up to a certain degree
Pulsar Fault Tolerance
Figure: a serving layer of three brokers above a storage layer of bookies; each bookie holds a different subset of segments (Segment 1 … Segment n), so every segment is stored on more than one bookie
✦ Broker failure
  ✦ The topic is reassigned to an available broker based on load
  ✦ The new broker can reconstruct the previous state consistently
  ✦ No data needs to be copied
✦ Bookie failure
  ✦ Immediate switch to a new bookie
  ✦ A background process copies segments to other bookies to maintain the replication factor
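The bookie-failure bullets can be illustrated with a small re-replication sketch. It assumes a hypothetical placement map from segment to the bookies holding a replica, and restores each lost replica on the least-loaded live bookie that does not already hold that segment. This is not BookKeeper's auto-recovery code, only the same idea in miniature.

```python
def rereplicate(placement, failed, bookies):
    """Restore the replication factor after the `failed` bookie is lost.

    placement maps segment id -> list of bookies holding a replica.
    Returns a new placement dict; the input is not modified.
    """
    live = [b for b in bookies if b != failed]
    load = {b: 0 for b in live}
    for replicas in placement.values():
        for b in replicas:
            if b in load:
                load[b] += 1

    repaired = {}
    for segment in sorted(placement):
        replicas = [b for b in placement[segment] if b != failed]
        if len(replicas) < len(placement[segment]):
            candidates = [b for b in live if b not in replicas]
            if not candidates:
                raise RuntimeError(f"no bookie available for {segment}")
            target = min(candidates, key=lambda b: (load[b], b))
            load[target] += 1
            replicas.append(target)  # the copy itself happens in the background
        repaired[segment] = replicas
    return repaired


placement = {
    "segment-1": ["bookie-1", "bookie-2"],
    "segment-2": ["bookie-2", "bookie-3"],
    "segment-3": ["bookie-3", "bookie-1"],
}
bookies = ["bookie-1", "bookie-2", "bookie-3", "bookie-4"]
print(rereplicate(placement, "bookie-1", bookies))
# {'segment-1': ['bookie-2', 'bookie-4'], 'segment-2': ['bookie-2', 'bookie-3'],
#  'segment-3': ['bookie-3', 'bookie-4']}
```

Brokers play no part here, which matches the slide: a bookie failure is repaired entirely within the storage layer.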
Requirement #4 - High Availability
✦ The system should continue to function, in the cloud or on-prem, under the following conditions where applicable:
  ✦ When two nodes/instances fail
  ✦ When an availability zone or a rack fails
Pulsar High Availability
Figure: brokers (serving) and bookies (storage) spread across Zone A, Zone B and Zone C, with each segment replicated to bookies in different zones
✦ Node failures
  ✦ Broker failures
  ✦ Bookie failures
  ✦ Handled the same way as the respective component failures
✦ Zone/rack failures
  ✦ Bookies provide rack awareness
  ✦ Brokers replicate data to bookies in different racks/zones
  ✦ If a zone or rack fails, the data is still available in the other zones
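Zone-aware placement can be sketched as picking bookies round-robin across zones so that replicas land in as many distinct zones as possible. BookKeeper ships rack-aware and region-aware ensemble placement policies; the `choose_ensemble` and `survives_zone_loss` functions below are hypothetical illustrations, not their implementation, and the zone and bookie names are made up.

```python
def choose_ensemble(bookies_by_zone, size):
    """Pick `size` bookies, one per zone in turn, spreading replicas
    across as many distinct zones as possible."""
    pools = {zone: sorted(names) for zone, names in bookies_by_zone.items()}
    zones = sorted(pools)
    ensemble = []
    while len(ensemble) < size:
        progressed = False
        for zone in zones:
            if pools[zone] and len(ensemble) < size:
                ensemble.append((zone, pools[zone].pop(0)))
                progressed = True
        if not progressed:
            raise ValueError("not enough bookies for the requested ensemble")
    return ensemble


def survives_zone_loss(ensemble, lost_zone):
    """True if at least one replica lives outside the failed zone."""
    return any(zone != lost_zone for zone, _ in ensemble)


cluster = {
    "zone-a": ["bookie-a1", "bookie-a2"],
    "zone-b": ["bookie-b1"],
    "zone-c": ["bookie-c1", "bookie-c2"],
}
ensemble = choose_ensemble(cluster, 3)
print(ensemble)
# [('zone-a', 'bookie-a1'), ('zone-b', 'bookie-b1'), ('zone-c', 'bookie-c1')]
print(all(survives_zone_loss(ensemble, z) for z in cluster))  # True
```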
Requirement #5 - Sharing and Isolation
✦ The system should be able to:
  ✦ Share many applications on the same cluster, for cost and manageability
  ✦ Isolate applications on their own machines within the same cluster when needed
Sharing and Isolation
Figure: a single Apache Pulsar cluster shared by the Product Safety, Marketing Campaigns and Data Serving tenants; their namespaces (e.g., ETL, Fraud Detection, Microservice) hold topics such as Account History, User Clustering, Risk Classification, Budgeted Spend, Demographic Classification, Location Resolution and Customer Authentication; storage quotas shown: 10 TB, 7 TB, 5 TB
✦ Software isolation: storage quotas, flow control, back pressure, rate limiting
✦ Hardware isolation: constrain some tenants to a subset of brokers/bookies
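Rate limiting is one of the software-isolation tools listed above. The sketch below uses a classic token bucket, one per tenant, to show how a noisy tenant is throttled without affecting its neighbors. Pulsar's own publish and dispatch rate limits are configured through namespace and topic policies; this `TokenBucket` class and the tenant names are hypothetical and are not Pulsar's implementation.

```python
class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens/s, holds at most `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now, cost=1.0):
        # Refill for the time elapsed since the last call, capped at burst.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


limits = {
    "product-safety": TokenBucket(rate=10, burst=5),
    "marketing-campaigns": TokenBucket(rate=10, burst=5),
}

# A noisy tenant bursts 8 publishes at t=0: only 5 get through.
noisy = [limits["marketing-campaigns"].allow(now=0.0) for _ in range(8)]
# The other tenant is unaffected because each tenant has its own bucket.
quiet = limits["product-safety"].allow(now=0.0)
# Half a second later the noisy tenant has refilled 5 tokens.
later = limits["marketing-campaigns"].allow(now=0.5)
print(noisy.count(True), quiet, later)  # 5 True True
```

Passing `now` explicitly keeps the example deterministic; a real limiter would read a monotonic clock.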
Requirement #6 - Client Languages
✦ Officially supported by the project: Java, Python, Go, C++, C
Requirement #7 - Multiple Messaging Models
✦ Splunk applications require different consumption models
  ✦ Collect once and deliver once, e.g., process an S3 file and ingest it into an index
  ✦ Receive data once and deliver it many times, e.g., multiple pipelines sharing the same data for different types of processing
✦ Avoid running two systems, if possible, from a cost and operations perspective
✦ Avoid any additional infrastructure-level code, if possible, that emulates one semantics on top of another system
Pulsar Messaging Models
Messaging
• Exclusive Subscription
• Failover Subscription
Queuing
• Shared Subscription
• Key Shared Subscription
Native support for both avoids running two systems and maintaining extra infrastructure code
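The four subscription types differ mainly in how messages are dispatched to the consumers attached to one subscription. The hypothetical `dispatch` function below simulates that under simplifying assumptions: failover always picks the first consumer as active, shared uses strict round-robin, and key_shared hashes keys with CRC32 for determinism (Pulsar uses its own hashing and permit-based dispatch). The mode strings mirror the Pulsar subscription names but are not a real API.

```python
import zlib
from itertools import cycle


def dispatch(messages, consumers, mode):
    """Return {consumer: [values]} for a list of (key, value) messages."""
    out = {c: [] for c in consumers}
    if mode == "exclusive":
        if len(consumers) != 1:
            raise ValueError("exclusive subscription allows one consumer")
        active = [consumers[0]] * len(messages)
    elif mode == "failover":
        # One active consumer, the rest are standbys (simplified: the first).
        active = [consumers[0]] * len(messages)
    elif mode == "shared":
        # Round-robin across all consumers; per-key ordering is not kept.
        rr = cycle(consumers)
        active = [next(rr) for _ in messages]
    elif mode == "key_shared":
        # Same key always goes to the same consumer, preserving per-key order.
        active = [consumers[zlib.crc32(k.encode()) % len(consumers)] for k, _ in messages]
    else:
        raise ValueError(f"unknown mode {mode!r}")
    for consumer, (_, value) in zip(active, messages):
        out[consumer].append(value)
    return out


msgs = [("host-1", "e1"), ("host-2", "e2"), ("host-1", "e3"), ("host-2", "e4")]
print(dispatch(msgs, ["c1", "c2"], "shared"))    # {'c1': ['e1', 'e3'], 'c2': ['e2', 'e4']}
print(dispatch(msgs, ["c1", "c2"], "failover"))  # {'c1': ['e1', 'e2', 'e3', 'e4'], 'c2': []}
```

Exclusive and failover give streaming (ordered, single reader) semantics, while shared and key_shared give queue semantics, all on the same topic.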
Requirement #8 - Persistence
Figure: three producers and two consumers attached to a topic whose data spans hot storage and cold storage
✦ Offload cold data to lower-cost storage (e.g., cloud storage, HDFS)
✦ Manual or automatic (configurable threshold)
✦ Transparent to publishers and consumers
✦ Allows near-infinite event storage at low cost, e.g., for compliance and security
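Threshold-based offload can be sketched as: while the hot tier holds more than the threshold, move the oldest closed segment to cold storage. The `offload` function and the segment dictionaries are hypothetical; Pulsar's tiered storage works on ledgers and offloader plugins, but the idea that the open (newest) segment always stays hot and readers address segments the same way regardless of tier is the part this illustrates.

```python
def offload(segments, threshold_bytes):
    """Move the oldest closed segments to cold storage until the hot tier
    is at or below `threshold_bytes`.

    segments: list of dicts ordered oldest -> newest, each with keys
    "id", "size" and "tier" ("hot" or "cold"). The last segment is still
    being written, so it is never offloaded.
    Returns the number of bytes left in the hot tier.
    """
    hot = sum(s["size"] for s in segments if s["tier"] == "hot")
    for seg in segments[:-1]:
        if hot <= threshold_bytes:
            break
        if seg["tier"] == "hot":
            seg["tier"] = "cold"  # copy to object storage, then trim locally
            hot -= seg["size"]
    return hot


topic = [{"id": i, "size": size, "tier": "hot"} for i, size in enumerate([100, 100, 100, 50])]
hot_left = offload(topic, threshold_bytes=150)
print(hot_left, [s["tier"] for s in topic])  # 150 ['cold', 'cold', 'hot', 'hot']
```

Calling the same function with a threshold of zero models a manual "offload everything closed" request.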
Requirement #9 - Type Safety
✦ Splunk applications are varied
  ✦ One class requires a fixed schema
  ✦ Another class requires a fixed schema with evolution
  ✦ Other classes need the flexibility of no schema, or handle schemas at the application level
✦ Avoid bringing in another system for schema management
✦ Support for multiple different types
Pulsar Schema Registry
✦ Provides type safety to applications built on top of Pulsar
✦ Server side: the system enforces type safety and ensures that producers and consumers remain in sync
✦ The schema registry enables clients to upload data schemas on a per-topic basis
✦ Schemas dictate which data types are recognized as valid for that topic
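Schema evolution in a registry hinges on a compatibility check run at upload time. Pulsar's registry supports strategies such as BACKWARD, FORWARD and FULL; the `SchemaRegistry` class below is a hypothetical miniature of a BACKWARD-style check, modeling a schema as `{field: (type, has_default)}` and ignoring type promotion and nested records.

```python
class SchemaRegistry:
    """Per-topic schema history with a BACKWARD-style compatibility check.

    A schema is modeled as {field_name: (type_name, has_default)}.
    """

    def __init__(self):
        self.versions = {}

    @staticmethod
    def backward_compatible(old, new):
        # Readers using `new` must decode data written with `old`: every field
        # in `new` must exist in `old` with the same type, or carry a default.
        for name, (ftype, has_default) in new.items():
            if name in old:
                if old[name][0] != ftype:
                    return False
            elif not has_default:
                return False
        return True

    def upload(self, topic, schema):
        history = self.versions.setdefault(topic, [])
        if history and not self.backward_compatible(history[-1], schema):
            raise ValueError(f"schema rejected for {topic}")
        history.append(schema)
        return len(history) - 1  # version number


registry = SchemaRegistry()
v1 = {"host": ("string", False), "bytes": ("long", False)}
v2 = {**v1, "region": ("string", True)}   # adds a field with a default
v3 = {**v2, "owner": ("string", False)}   # adds a required field
print(registry.upload("logs", v1), registry.upload("logs", v2))  # 0 1
try:
    registry.upload("logs", v3)
except ValueError as err:
    print(err)  # schema rejected for logs
```

Rejecting `v3` at the registry is what keeps producers and consumers in sync on the server side, instead of failing later at decode time.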
Requirement #10 - Ease of Deployment in k8s
✦ Splunk uses k8s for orchestration
✦ The system should be easy to deploy in k8s
✦ The surface area of the system exposed outside k8s should be minimal - one single endpoint backed by
✦ The system should be able to segregate the nodes receiving external traffic
✦ It should be flexible enough to deploy from CI/CD pipelines for testing and development
Pulsar Deployment in k8s
Figure: Aggregated Deployment and Segregated Deployment. In both, external traffic enters through a load balancer (LB) and three proxies in front of three brokers, which are backed by bookies holding replicated segments
Requirement #11 - Operability
✦ The system should stay online and continue to serve production traffic during:
  ✦ OS upgrades
  ✦ Security patches
  ✦ Disk swapping
  ✦ Upgrades
✦ Self-adjusting components
  ✦ Bookies turn themselves read-only when their disk is 90% full
  ✦ A load manager balances traffic across brokers
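The read-only behavior can be sketched with a bookie that flips to read-only once disk usage reaches a threshold (0.90, as on the slide) and becomes writable again after space is reclaimed. The `Bookie` class, `pick_writable` and the recovery-after-cleanup behavior are assumptions for illustration; BookKeeper's actual thresholds and recovery rules are configurable and may differ.

```python
class Bookie:
    """Tracks disk usage and reports read-only at a fullness threshold."""

    def __init__(self, name, capacity, threshold=0.90):
        self.name = name
        self.capacity = capacity
        self.threshold = threshold
        self.used = 0

    @property
    def read_only(self):
        return self.used / self.capacity >= self.threshold

    def write(self, nbytes):
        # The check happens before the write, so a write may cross the threshold.
        if self.read_only:
            raise OSError(f"{self.name} is read-only")
        self.used += nbytes

    def free(self, nbytes):
        # e.g. garbage collection after retention deletes old data (assumed)
        self.used = max(0, self.used - nbytes)


def pick_writable(bookies):
    """New writes skip read-only bookies; they can still serve reads."""
    return [b.name for b in bookies if not b.read_only]


bookies = [Bookie("bookie-1", capacity=100), Bookie("bookie-2", capacity=100)]
bookies[0].write(90)                     # reaches 90% -> read-only
writable_when_full = pick_writable(bookies)
bookies[0].free(20)                      # space reclaimed
writable_after_gc = pick_writable(bookies)
print(writable_when_full, writable_after_gc)  # ['bookie-2'] ['bookie-1', 'bookie-2']
```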
Requirement #12 - Disaster Recovery
✦ Critical enterprise data flows through Splunk products
✦ Customers expect continuous availability in the cloud and on-premise
✦ Data center failures must be handled seamlessly
✦ Pulsar provides both:
  ✦ Asynchronous replication
  ✦ Synchronous replication
Disaster Recovery - Async Replication
✦ Two independent clusters in a primary/standby or primary/primary configuration
✦ Configured tenants and namespaces replicate to the standby
✦ Data published to the primary is asynchronously replicated to the standby
✦ On primary failure, producers and consumers are restarted in the second datacenter
✦ With replicated subscriptions, consumers restart close to where they left off
Figure: Datacenter A runs active producers and consumers against the primary Pulsar cluster; Datacenter B runs standby producers and consumers against the standby Pulsar cluster; Pulsar replication copies data from A to B; each cluster has its own ZooKeeper
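The trade-offs of asynchronous replication can be shown with a toy model of one topic and one subscription. Publishing never waits for datacenter B, so messages not yet shipped are lost on failover, and because the subscription position is mirrored only periodically, consumers restart "close to" where they left off and may see duplicates. The `AsyncGeoReplication` class is hypothetical; Pulsar's replicated subscriptions use periodic snapshots to map positions between clusters, while this toy simply copies an index.

```python
class AsyncGeoReplication:
    """Toy primary -> standby replication for one topic and one subscription."""

    def __init__(self):
        self.primary = []      # messages stored in datacenter A
        self.standby = []      # messages copied to datacenter B
        self.repl_cursor = 0   # next primary index to ship to B
        self.sub_primary = 0   # consumer's next read position in A
        self.sub_standby = 0   # last subscription position mirrored to B

    def publish(self, msg):
        self.primary.append(msg)  # acknowledged locally, replicated later

    def replicate(self, max_msgs):
        batch = self.primary[self.repl_cursor:self.repl_cursor + max_msgs]
        self.standby.extend(batch)
        self.repl_cursor += len(batch)

    def consume(self, n):
        msgs = self.primary[self.sub_primary:self.sub_primary + n]
        self.sub_primary += len(msgs)
        return msgs

    def snapshot_subscription(self):
        # Mirror the consumer position, but never past what B actually holds.
        self.sub_standby = min(self.sub_primary, len(self.standby))

    def failover_consume(self):
        # After A fails, consumers restart in B from the mirrored position.
        return self.standby[self.sub_standby:]


geo = AsyncGeoReplication()
for i in range(10):
    geo.publish(f"m{i}")
geo.replicate(6)             # B now holds m0..m5
geo.consume(4)               # consumer reads m0..m3 in A
geo.snapshot_subscription()  # position 4 mirrored to B
geo.consume(2)               # consumer reads m4, m5; no new snapshot yet
resumed = geo.failover_consume()
print(resumed)  # ['m4', 'm5'] are redelivered; m6..m9 never reached B
```

Synchronous replication avoids the lost tail at the cost of cross-datacenter latency on every publish.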
Requirement #13 - Performance & TCO
✦ Splunk application requirements vary widely:
  ✦ Real-time (< 10 ms)
  ✦ Near real-time (< a few minutes)
  ✦ High throughput (multiple PB/day in a single cluster)
✦ We conducted a detailed performance study comparing Pulsar with Kafka
Performance
✦ Pulsar consistently provides 5x-50x lower latency
✦ Pulsar uses 20-30% fewer brokers + bookies, as it efficiently exploits the available disk bandwidth
✦ Pulsar uses 50-60% fewer CPU cores, with complete control of memory
✦ Pulsar single-partition throughput is 5x higher, with 5x-50x lower latency
Pulsar is 1.5-2x lower in capex cost, with a 5-50x improvement in latency and 2-3x lower opex, due to its layered architecture
Requirement #14 - Observability
✦ In production, we need visibility into the overall health of the system and its components
✦ The system should expose detailed, relevant metrics
✦ It should be easy to debug and troubleshoot
Pulsar Observability
✦ System overview metrics
✦ Messaging metrics
✦ Topic metrics
✦ Function metrics
✦ Broker metrics
✦ Bookie metrics
✦ Proxy metrics
✦ JVM metrics
✦ Log metrics
✦ ZooKeeper metrics
✦ Container metrics
✦ Host metrics
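Pulsar brokers publish metrics in the Prometheus text exposition format (commonly scraped from the broker's `/metrics` endpoint). As a minimal sketch of turning topic metrics into an alert, the code below parses labeled sample lines and flags topics whose backlog exceeds a limit. The sample lines, tenant and topic names are hypothetical and only modeled on Pulsar's topic-level metric naming; the parser deliberately ignores unlabeled lines and lines with timestamps.

```python
import re
from collections import defaultdict

LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\{([^}]*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse(text):
    """Yield (metric, labels, value) from labeled exposition lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        m = LINE.match(line)
        if m:
            name, labels, value = m.groups()
            yield name, dict(LABEL.findall(labels)), float(value)


def backlog_alerts(text, limit):
    """Topics whose summed backlog metric exceeds `limit`."""
    backlog = defaultdict(float)
    for name, labels, value in parse(text):
        if name == "pulsar_msg_backlog":
            backlog[labels.get("topic", "?")] += value
    return sorted(t for t, v in backlog.items() if v > limit)


SAMPLE = """
# TYPE pulsar_msg_backlog gauge
pulsar_msg_backlog{namespace="splunk/ingest",topic="persistent://splunk/ingest/logs"} 125000
pulsar_msg_backlog{namespace="splunk/ingest",topic="persistent://splunk/ingest/metrics"} 40
pulsar_rate_in{namespace="splunk/ingest",topic="persistent://splunk/ingest/logs"} 880.5
"""
print(backlog_alerts(SAMPLE, limit=10000))  # ['persistent://splunk/ingest/logs']
```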
Requirement #15 - Ecosystem
It is growing!
Requirement #16 - Adoption
Over 600 companies and growing!
Requirement #17 - Community
✦ 280 contributors
✦ 30 committers
✦ 600+ companies
✦ 5.7K GitHub stars
Requirement #18 - Licensing
✦ Apache License 2.0
✦ Affiliated with vendor-neutral institutions (Apache/CNCF)
✦ Avoid vendor-controlled components where possible, since a vendor could change the license later
Apache Pulsar vs Apache Kafka
✦ Multi-tenancy: a single cluster can support many tenants and use cases
✦ Seamless cluster expansion: expand the cluster without any downtime
✦ High throughput & low latency: can reach 1.8M messages/s in a single partition, with a publish latency of 5 ms at the 99th percentile
✦ Durability: data is replicated and synced to disk
✦ Geo-replication: out-of-the-box support for geographically distributed applications
✦ Unified messaging model: supports both topic & queue semantics in a single model
✦ Tiered storage: hot/warm data for real-time access and cold event data in cheaper storage
✦ Pulsar Functions: flexible, lightweight compute
✦ Highly scalable: can support millions of topics, which makes data modeling easier
✦ Licensing: Apache 2.0, no vendor-specific licensing
✦ Multiprotocol handlers: support for AMQP, MQTT and Kafka
✦ OSS: several core features of Pulsar are in Apache, as compared to Kafka
Apache Pulsar at Splunk
✦ Apache Pulsar runs as a service in production, processing several billion messages/day
✦ Apache Pulsar is integrated as the message bus in Splunk DSP 1.1.0, Splunk's core streaming product
✦ Apache Pulsar is being introduced in other initiatives as well
Splunk DSP
A real-time stream processing solution that collects, processes and delivers data to Splunk and other destinations in milliseconds
Figure: Splunk Data Stream Processor
✦ Turn raw data into high-value information
✦ Protect sensitive data
✦ Distribute data to Splunk or other destinations (data warehouse, public cloud, message bus)
Capabilities shown: detect data patterns or conditions, mask sensitive data, aggregate, format, normalize, transform, filter, enhance
DSP solves the challenges of data visibility, control and insights:
✦ Data-driven decision making is challenged by multiple instances, subsidiaries, and on-premise + cloud/multi-cloud environments
✦ Massive amounts of data make it hard to collect, protect and deliver the right data to the right users and systems
✦ Organizations must generate business-critical insights faster to remain competitive in a data-driven environment
DSP Architecture
Figure: data sources (forwarders, REST clients) send data through HEC, S2S and batch ingestion into Apache Pulsar; the stream processing engine delivers the results to the Splunk indexer and external systems
Apache Pulsar is at the core of DSP
Closing Remarks
Future Work
✦ Auto-partitioning
✦ Pluggable metadata store
✦ Enhancing the state store
Current Work
✦ Improved Go client
✦ Support for batch connectors
✦ Pulsar k8s operator
✦ Critical bug fixes
Splunk is committed to advancing Apache Pulsar, as it is used by our core products and cloud services
Thank You
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
StreamNative82 views
Welcome and Opening Remarks - Pulsar Summit SF 2022 by StreamNative
Welcome and Opening Remarks - Pulsar Summit SF 2022Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022
StreamNative23 views
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa... by StreamNative
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
StreamNative1.4K views
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi... by StreamNative
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
StreamNative317 views
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021 by StreamNative
Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
StreamNative79 views

Recently uploaded

[DSC Europe 23] Aleksandar Tomcic - Adversarial Attacks by
[DSC Europe 23] Aleksandar Tomcic - Adversarial Attacks[DSC Europe 23] Aleksandar Tomcic - Adversarial Attacks
[DSC Europe 23] Aleksandar Tomcic - Adversarial AttacksDataScienceConferenc1
5 views20 slides
PRIVACY AWRE PERSONAL DATA STORAGE by
PRIVACY AWRE PERSONAL DATA STORAGEPRIVACY AWRE PERSONAL DATA STORAGE
PRIVACY AWRE PERSONAL DATA STORAGEantony420421
5 views56 slides
MOSORE_BRESCIA by
MOSORE_BRESCIAMOSORE_BRESCIA
MOSORE_BRESCIAFederico Karagulian
5 views8 slides
[DSC Europe 23] Stefan Mrsic_Goran Savic - Evolving Technology Excellence.pptx by
[DSC Europe 23] Stefan Mrsic_Goran Savic - Evolving Technology Excellence.pptx[DSC Europe 23] Stefan Mrsic_Goran Savic - Evolving Technology Excellence.pptx
[DSC Europe 23] Stefan Mrsic_Goran Savic - Evolving Technology Excellence.pptxDataScienceConferenc1
5 views16 slides
Data about the sector workshop by
Data about the sector workshopData about the sector workshop
Data about the sector workshopinfo828217
12 views27 slides
LIVE OAK MEMORIAL PARK.pptx by
LIVE OAK MEMORIAL PARK.pptxLIVE OAK MEMORIAL PARK.pptx
LIVE OAK MEMORIAL PARK.pptxms2332always
7 views6 slides

Recently uploaded(20)

PRIVACY AWRE PERSONAL DATA STORAGE by antony420421
PRIVACY AWRE PERSONAL DATA STORAGEPRIVACY AWRE PERSONAL DATA STORAGE
PRIVACY AWRE PERSONAL DATA STORAGE
antony4204215 views
[DSC Europe 23] Stefan Mrsic_Goran Savic - Evolving Technology Excellence.pptx by DataScienceConferenc1
[DSC Europe 23] Stefan Mrsic_Goran Savic - Evolving Technology Excellence.pptx[DSC Europe 23] Stefan Mrsic_Goran Savic - Evolving Technology Excellence.pptx
[DSC Europe 23] Stefan Mrsic_Goran Savic - Evolving Technology Excellence.pptx
Data about the sector workshop by info828217
Data about the sector workshopData about the sector workshop
Data about the sector workshop
info82821712 views
LIVE OAK MEMORIAL PARK.pptx by ms2332always
LIVE OAK MEMORIAL PARK.pptxLIVE OAK MEMORIAL PARK.pptx
LIVE OAK MEMORIAL PARK.pptx
ms2332always7 views
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an... by StatsCommunications
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ... by DataScienceConferenc1
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ... by DataScienceConferenc1
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...
Advanced_Recommendation_Systems_Presentation.pptx by neeharikasingh29
Advanced_Recommendation_Systems_Presentation.pptxAdvanced_Recommendation_Systems_Presentation.pptx
Advanced_Recommendation_Systems_Presentation.pptx
3196 The Case of The East River by ErickANDRADE90
3196 The Case of The East River3196 The Case of The East River
3196 The Case of The East River
ErickANDRADE9016 views
Short Story Assignment by Kelly Nguyen by kellynguyen01
Short Story Assignment by Kelly NguyenShort Story Assignment by Kelly Nguyen
Short Story Assignment by Kelly Nguyen
kellynguyen0119 views
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P... by DataScienceConferenc1
[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P...
Data Journeys Hard Talk workshop final.pptx by info828217
Data Journeys Hard Talk workshop final.pptxData Journeys Hard Talk workshop final.pptx
Data Journeys Hard Talk workshop final.pptx
info82821710 views
Chapter 3b- Process Communication (1) (1)(1) (1).pptx by ayeshabaig2004
Chapter 3b- Process Communication (1) (1)(1) (1).pptxChapter 3b- Process Communication (1) (1)(1) (1).pptx
Chapter 3b- Process Communication (1) (1)(1) (1).pptx
ayeshabaig20047 views
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation by DataScienceConferenc1
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
CRIJ4385_Death Penalty_F23.pptx by yvettemm100
CRIJ4385_Death Penalty_F23.pptxCRIJ4385_Death Penalty_F23.pptx
CRIJ4385_Death Penalty_F23.pptx
yvettemm1006 views
Cross-network in Google Analytics 4.pdf by GA4 Tutorials
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdf
GA4 Tutorials6 views

Why Splunk Chose Pulsar_Karthik Ramasamy

  • 1. © 2019 SPLUNK INC. Why Splunk Chose Pulsar June 2019 Karthik Ramasamy Splunk
  • 2. Karthik Ramasamy, Senior Director of Engineering, @karthikz: streaming @splunk | ex-CEO of @streamlio | co-creator of @heronstreaming | ex @Twitter | Ph.D.
  • 3. Forward-Looking Statements. During the course of this presentation, we may make forward-looking statements regarding future events or plans of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results may differ materially. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, it may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements made herein. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionalities described or to include any such feature or functionality in a future release. Splunk, Splunk>, Data-to-Everything, D2E and Turn Data Into Doing are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names or trademarks belong to their respective owners. © 2020 Splunk Inc. All rights reserved.
  • 4. Agenda: 1) Introduction to Splunk 2) Streaming system requirements 3) How Pulsar satisfies the requirements 4) Apache Pulsar at Splunk 5) Questions
  • 5. New technologies are enabling and fueling digitization: Cloud, 5G, IoT, AI, Mobility, Virtualization, Robotic Process Automation, Blockchain, VR, Platforms
  • 6. Data is Transforming Everything: the way we work, live and play
  • 7. The Data-to-Everything Platform for IT, Security, DevOps and Business Processes [diagram: contrasted with Data Silos, Data Lakes, Master Data Management, ETL and Point Data Management Solutions]
  • 8. Messaging / Streaming Systems: Core of Emerging Use Cases ✦ Streaming data transformation ✦ Data distribution ✦ Real-time analytics ✦ Real-time monitoring and notifications ✦ IoT analytics ✦ Event-driven workflows ✦ Interactive applications ✦ Log processing and analytics
  • 9. Streaming System Requirements: Scalability, Durability, Fault Tolerance, High Availability, Sharing & Isolation, Messaging Models, Client Languages, Persistence, Type Safety, Deployment in k8s
  • 10. Streaming System Requirements (continued): Operability, Disaster Recovery, TCO, Observability, Ecosystem, Adoption, Community, Licensing
  • 11. Requirement #1 - Scalability ✦ Traffic can vary wildly while the system is in production ✦ The system needs to scale up with no effect on publish/consume throughput and latency ✦ Publish/consume throughput should increase or decrease linearly as nodes are added or removed ✦ Load should spread automatically to new machines as nodes are added ✦ Scalability across different dimensions: serving and storage
  • 12. Scalability [diagram: producers and consumers connect to a messaging layer of brokers, backed by an event storage layer of bookies; function workers provide processing] ✦ Independent layers for processing, serving and storage ✦ Messaging and processing built on Apache Pulsar ✦ Storage built on Apache BookKeeper
  • 13. Requirement #2 - Durability ✦ Splunk applications need different types of durability ✦ Persistent durability: no data loss in the presence of node failures or an entire cluster failure (e.g., security and compliance) ✦ Replicated durability: no data loss in the presence of a limited number of node failures (e.g., machine logs) ✦ Transient durability: data may be lost when failures occur (e.g., metrics data)
  • 14. Durability [diagram: a producer writes to a broker, which writes each entry to three bookies; each bookie appends the entry to its journal and fsyncs it]
  • 15. Requirement #3 - Fault Tolerance ✦ Ability of the system to keep functioning when components fail ✦ Ideally without any manual intervention, up to a certain degree
  • 16. Pulsar Fault Tolerance [diagram: a serving layer of brokers over a storage layer of bookies, each bookie holding copies of segments 1 to n] ✦ Broker failure ✦ The topic is reassigned to an available broker based on load ✦ The new broker can reconstruct the previous state consistently ✦ No data needs to be copied ✦ Bookie failure ✦ Immediate switch to a new node ✦ A background process copies segments to other bookies to maintain the replication factor
  • 17. Requirement #4 - High Availability ✦ The system should continue to function, in the cloud or on-prem, under the following conditions where applicable ✦ When two nodes/instances fail ✦ When an availability zone or a rack fails
  • 18. Pulsar High Availability [diagram: brokers (serving) and bookies (storage) spread across Zone A, Zone B and Zone C, with copies of segments 1 to n on bookies in different zones] ✦ Node failures ✦ Broker failures ✦ Bookie failures ✦ Handled in the same way as the respective component failures ✦ Zone/rack failures ✦ Bookies provide rack awareness ✦ Brokers replicate data to different racks/zones ✦ In the presence of a zone/rack failure, data is still available in the other zones
  • 19. Requirement #5 - Sharing and Isolation ✦ The system should be able to ✦ Share a cluster among many applications, for cost and manageability purposes ✦ Isolate different applications on their own machines in the same cluster when needed
  • 20. Sharing and Isolation [diagram: one Apache Pulsar cluster shared by several tenants (e.g., Product Safety, Marketing Campaigns, a Data Serving Microservice), each with its own topics for workloads such as ETL, Fraud Detection, Account History, User Clustering, Risk Classification, Budgeted Spend, Demographic Classification, Location Resolution and Customer Authentication, and storage allocations of 10 TB, 7 TB and 5 TB] ✦ Software isolation: storage quotas, flow control, back pressure, rate limiting ✦ Hardware isolation: constrain some tenants to a subset of brokers/bookies
  • 21. Requirement #6 - Client Languages ✦ Java, Python, Go, C++ and C clients for an Apache Pulsar cluster, officially supported by the project
  • 22. Requirement #7 - Multiple Messaging Models ✦ Splunk applications require different consumption models ✦ Collect once and deliver once (e.g., process an S3 file and ingest it into an index) ✦ Receive data once and deliver it many times (e.g., multiple pipelines sharing the same data for different types of processing) ✦ Avoid running two systems, if possible, from a cost and operations perspective ✦ Avoid, if possible, additional infrastructure-level code that emulates one set of semantics on top of another system
  • 23. Pulsar Messaging Models ✦ Messaging: Exclusive Subscription, Failover Subscription ✦ Queuing: Shared Subscription, Key Shared Subscription ✦ Native support for both avoids running two systems and maintaining extra infrastructure code
  • 24. Requirement #8 - Persistence [diagram: producers and consumers use a topic whose recent data sits in hot storage and older data in cold storage] ✦ Offload cold data to lower-cost storage (e.g., cloud storage, HDFS) ✦ Manual or automatic (configurable threshold) ✦ Transparent to publishers and consumers ✦ Allows near-infinite event storage at low cost (e.g., for compliance and security)
  • 25. Requirement #9 - Type Safety ✦ Splunk applications are varied ✦ One class requires a fixed schema ✦ Another class requires a fixed schema with evolution ✦ A third class requires the flexibility of having no schema, or of handling the schema at the application level ✦ Avoid bringing in another system for schema management ✦ Support for multiple different types -
  • 26. Pulsar Schema Registry ✦ Provides type safety to applications built on top of Pulsar ✦ Server side: the system enforces type safety and ensures that producers and consumers remain in sync ✦ The schema registry enables clients to upload data schemas on a per-topic basis ✦ Schemas dictate which data types are recognized as valid for that topic
  • 27. Requirement #10 - Ease of Deployment in k8s ✦ Splunk uses k8s for orchestration ✦ The system should be easily deployable in k8s ✦ The surface area of the system exposed outside k8s should be minimal: one single endpoint backed by ✦ It should be possible to segregate the nodes receiving external traffic ✦ It should be flexible enough to deploy from CI/CD pipelines for testing and development
  • 28. Pulsar Deployment in k8s [diagram: two layouts, Aggregated Deployment and Segregated Deployment; in each, a load balancer (LB) fronts a set of Pulsar proxies that route traffic to brokers, which store segments 1 to n on bookies]
  • 29. Requirement #11 - Operability ✦ The system should stay online and continue to serve production traffic during ✦ OS upgrades ✦ Security patches ✦ Disk swapping ✦ Upgrades ✦ Self-adjusting components ✦ Bookies switch themselves to read-only mode when the disk is 90% full ✦ A load manager balances traffic across brokers
  • 30. Requirement #12 - Disaster Recovery ✦ Critical enterprise data flows through Splunk products ✦ Customers expect continuous availability in the cloud and on-premises ✦ Data center failures must be handled seamlessly ✦ Pulsar provides both ✦ Asynchronous replication ✦ Synchronous replication
  • 31. Disaster Recovery - Async Replication ✦ Two independent clusters, in a primary/standby or primary/primary configuration ✦ Configured tenants and namespaces replicate to the standby ✦ Data published to the primary is asynchronously replicated to the standby ✦ Producers and consumers are restarted in the second datacenter upon primary failure ✦ With replicated subscriptions, consumers restart close to where they left off [diagram: Datacenter A runs the active producers and consumers and the primary Pulsar cluster with its ZooKeeper; Datacenter B runs standby producers and consumers and the standby Pulsar cluster with its ZooKeeper; Pulsar replication links the two clusters]
  • 32. Requirement #13 - Performance & TCO ✦ Splunk application requirements vary widely ✦ Real-time (< 10 ms) ✦ Near real-time (< a few minutes) ✦ High throughput (able to handle multiple PB/day in a single cluster) ✦ We conducted a detailed performance study comparing Pulsar with Kafka
  • 33. Performance ✦ Pulsar provides consistently 5x-50x lower latency ✦ Pulsar uses 20-30% fewer brokers + bookies, as it efficiently exploits the available disk bandwidth ✦ Pulsar uses 50-60% fewer CPU cores, with complete control of memory ✦ Pulsar single-partition throughput is 5x higher, with 5x-50x lower latency
  • 34. Pulsar has 1.5-2x lower capex, a 5-50x improvement in latency, and 2-3x lower opex due to its layered architecture
  • 35. Requirement #14 - Observability ✦ In production, we need visibility into the overall health of the system and its components ✦ The system should expose detailed, relevant metrics ✦ It should be easy to debug and troubleshoot
  • 36. Pulsar Observability ✦ System overview metrics ✦ Messaging metrics ✦ Topic metrics ✦ Function metrics ✦ Broker metrics ✦ Bookie metrics ✦ Proxy metrics ✦ JVM metrics ✦ Log metrics ✦ ZooKeeper metrics ✦ Container metrics ✦ Host metrics
  • 37. Requirement #15 - Ecosystem: it is growing!
  • 38. Requirement #16 - Adoption: over 600 companies and growing!
  • 39. Requirement #17 - Community: 280 contributors, 30 committers, 600+ companies, 5.7K GitHub stars
  • 40. Requirement #18 - Licensing ✦ Apache License 2.0 ✦ Affiliated with vendor-neutral institutions (Apache/CNCF) ✦ Avoid vendor-controlled components where possible ✦ A vendor could change the license later
  • 41. Apache Pulsar vs Apache Kafka ✦ Multi-tenancy: a single cluster can support many tenants and use cases ✦ Seamless cluster expansion: expand the cluster without any downtime ✦ High throughput & low latency: can reach 1.8 M messages/s in a single partition, with a publish latency of 5 ms at the 99th percentile ✦ Durability: data is replicated and synced to disk ✦ Geo-replication: out-of-the-box support for geographically distributed applications ✦ Unified messaging model: supports both topic and queue semantics in a single model ✦ Tiered storage: hot/warm data for real-time access and cold event data in cheaper storage ✦ Pulsar Functions: flexible, lightweight compute ✦ Highly scalable: can support millions of topics, which makes data modeling easier ✦ Licensing: Apache 2.0, no vendor-specific licensing ✦ Multiprotocol handlers: support for AMQP, MQTT and Kafka ✦ OSS: several core features of Pulsar are part of the Apache project, as compared to Kafka
  • 42. Apache Pulsar at Splunk ✦ Apache Pulsar runs as a service in production, processing several billion messages/day ✦ Apache Pulsar is integrated as the message bus in Splunk DSP 1.1.0, Splunk's core streaming product ✦ Apache Pulsar is being introduced in other initiatives as well
  • 43. Splunk DSP: a real-time stream processing solution that collects, processes and delivers data to Splunk and other destinations in milliseconds [diagram: Splunk Data Stream Processor] ✦ Turn raw data into high-value information: aggregate, format, normalize, transform, filter, enhance, detect data patterns or conditions ✦ Protect sensitive data: mask sensitive data ✦ Distribute data to Splunk or other destinations: data warehouse, public cloud, message bus
  • 44. Data-driven decision making is challenged by multiple instances, subsidiaries, and on-premises plus cloud/multi-cloud environments [diagram: Data, Visibility, Control, Insights] ✦ Massive amounts of data make it hard to collect, protect and deliver the right data to the right users and systems ✦ Generate business-critical insights faster to remain competitive in a data-driven environment ✦ DSP solves these challenges
  • 45. DSP Architecture [diagram: data sources such as REST clients, forwarders and batch sources send data over HEC, S2S and batch ingestion into Apache Pulsar, which feeds the stream processing engine; results go to the Splunk indexer and external systems] ✦ Apache Pulsar is at the core of DSP
  • 46. Closing Remarks ✦ Current work: improved Go client, support for batch connectors, Pulsar k8s operator, critical bug fixes ✦ Future work: auto-partitioning, pluggable metadata store, enhancements to the state store ✦ Splunk is committed to advancing Apache Pulsar, as it is used by our core products and cloud services
  • 47. Thank You
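Slides 12 and 16 depend on brokers forming a stateless serving layer. Any broker can serve a topic, new brokers pick up load, and a failed broker's topics move elsewhere without any data being copied. The toy model below illustrates that idea under one simplifying assumption: placement is least-loaded, measured in topics. Pulsar's actual load manager assigns namespace bundles using broker load reports. `Broker`, `assign_topic` and `fail_broker` are hypothetical names, not Pulsar APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    name: str
    topics: set = field(default_factory=set)


def assign_topic(brokers, topic):
    """Place `topic` on the broker serving the fewest topics (ties broken by name)."""
    target = min(brokers, key=lambda b: (len(b.topics), b.name))
    target.topics.add(topic)
    return target.name


def fail_broker(brokers, failed_name):
    """Drop a failed broker and hand its topics to the survivors.

    Only ownership moves: the data lives in the storage layer, so nothing is copied.
    """
    failed = next(b for b in brokers if b.name == failed_name)
    survivors = [b for b in brokers if b.name != failed_name]
    moved = {topic: assign_topic(survivors, topic) for topic in sorted(failed.topics)}
    return survivors, moved


brokers = [Broker("b1"), Broker("b2")]
for i in range(4):
    assign_topic(brokers, f"t{i}")        # b1 gets t0, t2; b2 gets t1, t3

brokers.append(Broker("b3"))              # scale out
new_owner = assign_topic(brokers, "t4")   # the empty broker b3 is picked first

brokers, moved = fail_broker(brokers, "b1")
print(new_owner, moved)  # b3 {'t0': 'b3', 't2': 'b2'}
```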
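Slide 14 shows each entry being journaled and fsynced on several bookies. Apache BookKeeper governs this with three settings: an ensemble size (E), a write quorum (Qw) and an ack quorum (Qa). Each entry is written to Qw bookies, striped round-robin across the ensemble, and is acknowledged once Qa of them have persisted it. The sketch below models only that acknowledgement rule. The function names are illustrative, and it deliberately leaves out the ensemble change BookKeeper performs when a bookie fails.

```python
def write_set(entry_id, ensemble, write_quorum):
    """Bookies that store an entry: Qw consecutive ensemble members, starting at entry_id mod E."""
    e = len(ensemble)
    return [ensemble[(entry_id + k) % e] for k in range(write_quorum)]


def is_acknowledged(entry_id, ensemble, write_quorum, ack_quorum, failed=frozenset()):
    """True if at least Qa bookies in the entry's write set have journaled and fsynced it.

    Failed bookies never respond. Real BookKeeper would instead swap a failed
    bookie out of the ensemble and retry; that step is not modeled here.
    """
    persisted = [b for b in write_set(entry_id, ensemble, write_quorum) if b not in failed]
    return len(persisted) >= ack_quorum


ensemble = ["bk1", "bk2", "bk3"]  # E = 3
# Qw = 3, Qa = 2: every entry goes to all three bookies and is acked after two fsyncs
one_down = is_acknowledged(0, ensemble, 3, 2, failed={"bk2"})
two_down = is_acknowledged(0, ensemble, 3, 2, failed={"bk2", "bk3"})
# E = 3, Qw = 2: entries are striped across the ensemble
stripes = [write_set(i, ensemble, 2) for i in range(3)]
print(one_down, two_down, stripes)
# True False [['bk1', 'bk2'], ['bk2', 'bk3'], ['bk3', 'bk1']]
```

With Qw = 3 and Qa = 2, an acknowledged entry is on at least two disks. A single bookie failure therefore cannot lose acknowledged data.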
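Slide 16 says that after a bookie failure, a background process copies segments to other bookies to restore the replication factor. The following sketch covers only the planning half of that step. It assumes a `{segment: [bookies]}` placement map and picks the least-loaded bookie as each replacement. `rereplicate` is a hypothetical helper; BookKeeper's own auto-recovery relies on its placement policy rather than this rule.

```python
from collections import Counter


def rereplicate(placement, failed, bookies):
    """Restore each segment's replica count after bookie `failed` is lost.

    placement: {segment: [bookies holding a copy]}. Returns a new mapping; for each
    under-replicated segment, the least-loaded live bookie without a copy is chosen.
    """
    live = [b for b in bookies if b != failed]
    result = {seg: [b for b in holders if b != failed] for seg, holders in placement.items()}
    load = Counter(b for holders in result.values() for b in holders)
    for seg in sorted(result):
        holders = result[seg]
        while len(holders) < len(placement[seg]):
            candidates = [b for b in live if b not in holders]
            if not candidates:
                raise RuntimeError(f"not enough live bookies to re-replicate {seg}")
            target = min(candidates, key=lambda b: (load[b], b))
            holders.append(target)  # a real system would copy the segment's data here
            load[target] += 1
    return result


bookies = ["bk1", "bk2", "bk3", "bk4"]
placement = {"seg1": ["bk1", "bk2"], "seg2": ["bk2", "bk3"], "seg3": ["bk3", "bk1"]}
after = rereplicate(placement, "bk2", bookies)
print(after)
# {'seg1': ['bk1', 'bk4'], 'seg2': ['bk3', 'bk4'], 'seg3': ['bk3', 'bk1']}
```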
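Slide 18 relies on bookie rack awareness to place the copies of a segment in different racks or zones. A toy selection with that property is sketched below. It assumes every bookie is tagged with exactly one zone and that there are at least as many zones as replicas. BookKeeper's rack-aware and region-aware placement policies are more elaborate, and these function names are hypothetical.

```python
def pick_ensemble(bookie_zones, size):
    """Choose `size` bookies so that no two share a zone (toy rack-aware policy)."""
    chosen, used_zones = [], set()
    for bookie, zone in sorted(bookie_zones.items()):
        if zone not in used_zones:
            chosen.append(bookie)
            used_zones.add(zone)
            if len(chosen) == size:
                return chosen
    raise ValueError(f"need {size} distinct zones, found {len(used_zones)}")


def survives_zone_loss(ensemble, bookie_zones, lost_zone):
    """True if some copy is stored outside the zone that went down."""
    return any(bookie_zones[b] != lost_zone for b in ensemble)


zones = {"bk1": "A", "bk2": "A", "bk3": "B", "bk4": "C", "bk5": "B"}
ensemble = pick_ensemble(zones, 3)
print(ensemble)  # ['bk1', 'bk3', 'bk4']
print(all(survives_zone_loss(ensemble, zones, z) for z in "ABC"))  # True
```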
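Slide 20 lists rate limiting among the software isolation mechanisms. The talk does not say how Pulsar enforces its limits internally, so the example below uses a generic token bucket, a standard rate-limiting technique chosen here for illustration only. The tenant names and limits are hypothetical.

```python
class TokenBucket:
    """Average `rate` operations per second, with bursts of up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full
        self.last = 0.0                # timestamp of the previous call, in seconds

    def allow(self, now, cost=1):
        """Spend `cost` tokens if available; `now` is e.g. time.monotonic()."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical per-tenant publish limits (messages/second)
limits = {
    "marketing": TokenBucket(rate=2, capacity=2),
    "security": TokenBucket(rate=100, capacity=100),
}
burst = [limits["marketing"].allow(now=0.0) for _ in range(3)]
after_one_second = limits["marketing"].allow(now=1.0)
unaffected = limits["security"].allow(now=0.0)
print(burst, after_one_second, unaffected)  # [True, True, False] True True
```

Each tenant has its own bucket, so the marketing tenant exhausting its burst has no effect on the security tenant.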
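Slides 22 and 23 describe serving both streaming and queuing from one system through subscription types. The in-memory model below shows how delivery differs between Exclusive/Failover, Shared and Key_Shared for a single subscription. It is a simplification that ignores acknowledgements, flow-control permits and redelivery. It also uses CRC32 as a stand-in key hash rather than Pulsar's own key-hashing scheme.

```python
import zlib


def dispatch(messages, consumers, sub_type):
    """Spread (key, value) messages over consumers for one subscription type.

    Returns {consumer: [values in delivery order]}.
    """
    if sub_type == "exclusive" and len(consumers) > 1:
        raise ValueError("an exclusive subscription admits only one consumer")
    out = {c: [] for c in consumers}
    for i, (key, value) in enumerate(messages):
        if sub_type in ("exclusive", "failover"):
            # A single active consumer receives everything in order; in failover,
            # the next consumer would take over only if the active one disconnects.
            target = consumers[0]
        elif sub_type == "shared":
            # Round-robin across consumers: queue semantics, no ordering guarantee.
            target = consumers[i % len(consumers)]
        elif sub_type == "key_shared":
            # Same key -> same consumer, so per-key order is preserved.
            target = consumers[zlib.crc32(key.encode()) % len(consumers)]
        else:
            raise ValueError(f"unknown subscription type: {sub_type}")
        out[target].append(value)
    return out


msgs = [("host-a", 1), ("host-b", 2), ("host-a", 3), ("host-b", 4)]
print(dispatch(msgs, ["c1", "c2"], "failover"))  # {'c1': [1, 2, 3, 4], 'c2': []}
print(dispatch(msgs, ["c1", "c2"], "shared"))    # {'c1': [1, 3], 'c2': [2, 4]}
print(dispatch(msgs, ["c1", "c2"], "key_shared"))
```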
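Slide 24 describes offloading cold data once a configurable threshold is crossed. In Pulsar's tiered storage, whole closed segments (ledgers) are offloaded, oldest first, while the segment currently being written stays in BookKeeper. Under that assumption the decision reduces to the small function below. `segments_to_offload` is hypothetical and does not move any data.

```python
MB = 1024 * 1024


def segments_to_offload(segment_sizes, threshold_bytes):
    """Indices of the oldest closed segments to offload so that at most
    `threshold_bytes` stay in hot storage.

    segment_sizes: sizes in bytes, oldest first; the last entry is the open
    segment, which is never offloaded.
    """
    hot = sum(segment_sizes)
    plan = []
    for idx, size in enumerate(segment_sizes[:-1]):
        if hot <= threshold_bytes:
            break
        plan.append(idx)
        hot -= size
    return plan


sizes = [40 * MB, 40 * MB, 40 * MB, 10 * MB]  # three closed segments, one open
print(segments_to_offload(sizes, 60 * MB))    # [0, 1]: 50 MB stays hot
print(segments_to_offload(sizes, 0))          # [0, 1, 2]: only the open segment stays
```

Reads of offloaded segments are served from cold storage. That is why the slide can describe offloading as transparent to publishers and consumers.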
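Slide 26 says the registry enforces type safety per topic. Pulsar validates each new schema version against the topic's compatibility strategy, such as BACKWARD, FORWARD or FULL. Below is a reduced model of a backward-compatibility check for record schemas. Here, backward means a reader using the new schema can decode data written with the old one. Schemas are represented as hypothetical `{field: (type, has_default)}` dictionaries, and real Avro rules such as type promotion are not modeled.

```python
def backward_compatible(old, new):
    """Can a reader using schema `new` decode records written with schema `old`?

    Simplified rule: every field of `new` must exist in `old` with the same type,
    or have a default value. Fields dropped from `new` are simply ignored.
    """
    for name, (ftype, has_default) in new.items():
        if name in old:
            if old[name][0] != ftype:
                return False  # any type change is rejected in this model
        elif not has_default:
            return False      # the reader would have no value for this field
    return True


v1 = {"host": ("string", False), "bytes": ("long", False)}
v2 = {**v1, "region": ("string", True)}                     # adds a field with a default
v3 = {"host": ("string", False), "status": ("int", False)}  # adds a required field

print(backward_compatible(v1, v2))  # True
print(backward_compatible(v1, v3))  # False
print(backward_compatible(v2, v1))  # True: dropping a field is allowed
```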
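Slide 29 mentions bookies switching themselves to read-only mode when the disk reaches 90% usage. The sketch below models that behavior as a small state machine. The 90% figure comes from the slide. The 80% low-water mark for returning to writable mode is an assumption, added to keep the mode from flapping. BookKeeper has its own disk-usage settings (for example `diskUsageThreshold`), so treat these constants as illustrative.

```python
READ_ONLY_AT = 0.90    # from the slide: go read-only at 90% disk usage
WRITABLE_BELOW = 0.80  # assumed low-water mark, so the mode does not flap


def next_mode(mode, used_fraction):
    """Return the bookie's mode ('writable' or 'read-only') after a disk check."""
    if mode == "writable" and used_fraction >= READ_ONLY_AT:
        return "read-only"
    if mode == "read-only" and used_fraction < WRITABLE_BELOW:
        return "writable"
    return mode


def writable_bookies(modes):
    """Bookies that may receive new segments; read-only bookies still serve reads."""
    return sorted(b for b, m in modes.items() if m == "writable")


mode, history = "writable", []
for usage in [0.70, 0.91, 0.85, 0.79]:
    mode = next_mode(mode, usage)
    history.append(mode)
print(history)  # ['writable', 'read-only', 'read-only', 'writable']
print(writable_bookies({"bk1": "writable", "bk2": "read-only"}))  # ['bk1']
```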
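Slide 32 sets a target of multiple PB/day in a single cluster. Converting that target into a sustained rate helps when sizing such a cluster. The sketch assumes decimal units (1 PB = 10^15 bytes) and a perfectly even load over the day. Real peak capacity would need to be higher, and the replication factor multiplies the bytes the storage layer must write.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s
PB = 10**15                     # decimal petabyte, in bytes


def sustained_rate(bytes_per_day):
    """Average bytes per second needed to absorb `bytes_per_day` evenly over 24 hours."""
    return bytes_per_day / SECONDS_PER_DAY


rate = sustained_rate(1 * PB)
print(f"{rate / 1e9:.2f} GB/s, {rate * 8 / 1e9:.1f} Gbit/s")  # 11.57 GB/s, 92.6 Gbit/s
```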
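Slide 41 credits Pulsar with multi-tenancy and support for millions of topics. Every Pulsar topic name includes its tenant and namespace (`persistent://tenant/namespace/topic`), and many policies are set at the namespace level. The parser below splits such names and groups them by tenant. It handles only this three-part form, and the tenant and namespace names in the example are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    local_name: str


def parse_topic(name):
    """Split '{domain}://{tenant}/{namespace}/{topic}' into its parts."""
    domain, sep, rest = name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unsupported topic name: {name!r}")
    parts = rest.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    return TopicName(domain, *parts)


def topics_by_tenant(names):
    """Group topic names by tenant, the level at which a shared cluster is divided up."""
    grouped = defaultdict(list)
    for name in names:
        grouped[parse_topic(name).tenant].append(name)
    return dict(grouped)


names = [
    "persistent://security/audit/logins",
    "persistent://marketing/campaigns/budgeted-spend",
    "non-persistent://security/metrics/cpu",
]
print(parse_topic(names[0]))
# TopicName(domain='persistent', tenant='security', namespace='audit', local_name='logins')
print(topics_by_tenant(names))
```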