● Senior Developer at
Nutanix, responsible for all
things Pulsar
● Love spending time with
data (stores, streams,
analytics etc)
● Ex-MySQL - started out
with 3 great years building
MySQL Replication
● Contributions to pulsar &
MySQL
Who am I?
https://www.linkedin.com/in/shivjijha/
https://twitter.com/ShivjiJha
● Helping customers
manage cost and security
for hybrid cloud.
● Crunch (& stream) data to
find insights about cost
and security
● Needed pub/sub to store
events and replay when
required
What do we do?
https://www.nutanix.com/products/beam
Platforms We Use
How do we
choose
a platform?
Avoid bias
towards
familiar
technology
The First Steps
Summarising the GitHub comment
1. Kafka alternative: incubating Apache project Pulsar
2. Open sourced by Yahoo
3. Hundreds of billions of messages per day in Pulsar at Yahoo
4. Solves annoying problems in Kafka, like:
a. Topic management
b. Disruptive rebalances
5. Same raw power (throughput, latencies, etc.)
6. Stateless brokers
7. Apache BookKeeper for storage
8. Stream + queue
Wow, that is
a lot of
Promise!!
First principles - Requirements?
1. Coordination
2. Persistence
3. Scale compute and storage independently
4. Reliability
5. High Availability
6. Fault tolerance
7. Client ecosystem
Requirement # 1
✓ Coordination
Requirement # 2
✓ Persistence
Requirement # 3
✓ Scale compute and storage independently
Bookies => store
Brokers => serve msg
Requirement # 4
✓ High Availability
Replicated brokers
Replicated bookies
Requirement # 5
✓ Fault tolerance
✓ Replicated compute (brokers)
✓ Replicated store (BookKeeper / bookies)
✓ Tunable fault tolerance (BookKeeper)
✓ Ensemble size
✓ Write quorum size
✓ Ack quorum size
--ensemble 2 --writeQuorum 2 --ackQuorum 2
When scaling the bookie cluster,
fine-tune quorum sizes
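As a rough sketch of how these knobs interact (assuming the usual BookKeeper semantics: an entry is acknowledged once ackQuorum of the writeQuorum bookies confirm it, so an acked entry lives on at least ackQuorum bookies), a hypothetical helper:

```python
# Hypothetical helper illustrating the BookKeeper quorum trade-off.
# An acked entry is guaranteed on at least ack_quorum bookies, so it
# survives the permanent loss of ack_quorum - 1 of them.
def tolerated_failures(ensemble: int, write_quorum: int, ack_quorum: int) -> int:
    assert ensemble >= write_quorum >= ack_quorum >= 1
    return ack_quorum - 1

# With --ensemble 2 --writeQuorum 2 --ackQuorum 2:
print(tolerated_failures(2, 2, 2))  # 1 bookie may be lost without losing acked data
```

Raising ackQuorum buys durability at the cost of write latency; raising ensemble spreads entries over more bookies for read/write bandwidth.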
Requirement # 6
✓ Client ecosystem
✓ Work in progress
✓ Compensating factors:
✓ Clients are easier to change, just a library after all!
✓ Very active community (Slack)
✓ Quick turnaround (and quick fixes) for critical issues
✓ Bonus features
✓ Load balancer auto-balances topics among brokers
✓ Tiered storage
✓ Unified platform (stream + queue)
✓ Tiered topic structure
Tuning Configurations
✓ Default configurations may be optimized for backward compatibility
✓ Not necessarily for performance
✓ Not necessarily for the latest features
✓ Perf-test for your use cases and tune!
Performance Testing
Pulsar: test sync message sends
Pulsar: test async message sends
Tuning Configurations
✓ Durability vs throughput (bookkeeper.conf)
# Maximum latency to impose on a journal write to achieve grouping
journalMaxGroupWaitMSec=2
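The journalMaxGroupWaitMSec knob trades a little write latency for fewer journal fsyncs by group-committing writes that arrive close together. A toy simulation of the idea (not BookKeeper code):

```python
# Toy illustration of group commit: writes arriving within max_wait_ms
# of the first pending write share a single journal fsync.
def count_fsyncs(arrival_times_ms, max_wait_ms):
    fsyncs = 0
    window_start = None
    for t in sorted(arrival_times_ms):
        if window_start is None or t - window_start > max_wait_ms:
            fsyncs += 1          # flush the previous group, start a new one
            window_start = t
        # else: the write joins the current group, no extra fsync
    return fsyncs

writes = [0, 1, 2, 10, 11, 30]
print(count_fsyncs(writes, max_wait_ms=2))  # 3 fsyncs: bursts share a flush
print(count_fsyncs(writes, max_wait_ms=0))  # 6 fsyncs: every write flushes alone
```

Larger wait windows mean fewer fsyncs (throughput) but each write can sit in the window that long (latency).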
Tuning Configurations
✓ Disable auto-recovery in BookKeeper when a bookie is out for maintenance!
bookkeeper shell autorecovery -disable
STOP / MAINTENANCE / START
bookkeeper shell autorecovery -enable
Tuning Configurations
✓ Auto recovery vs throughput (bookkeeper.conf)
✓ If you have a small number of bookies and a bookie goes down, auto
recovery may overwhelm the remaining bookies
✓ Number of entries that auto recovery will re-replicate in parallel:
maxPendingReadRequestsPerThread=2500
rereplicationEntryBatchSize=100
Contribute to stay in sync
1. Development is fast, in fact very fast
a. Don’t maintain forks, easier to contribute
2. We do the same!
https://github.com/apache/pulsar/graphs/contributors
Pulsar Use Cases in Beam
&
Event Sourcing
1. Persisting your application's state by storing the history that
determines the current state of your application.
State of application at
any point in time
State of application at
this instant of time
https://docs.microsoft.com/en-us/previous-versions/msp-n-p/jj591559(v=pandp.10)
● History of events
● Past Tense verbs
● Immutable
● Ordered
● Restore state at any
point in time
● Use: CQRS, Audit trail etc
Event Sourcing
https://docs.microsoft.com/en-us/azure/architecture/patterns/event-sourcing
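Event sourcing in a nutshell: the current state is a fold over the ordered event history, and replaying a prefix gives the state at any earlier point in time. A minimal sketch (the account events here are hypothetical, not Beam's actual model):

```python
# Minimal event-sourcing sketch: state = fold(apply, initial, events).
# Event names are hypothetical past-tense verbs, as the pattern prescribes.
EVENTS = [
    {"type": "AccountOpened", "balance": 0},
    {"type": "MoneyDeposited", "amount": 100},
    {"type": "MoneyWithdrawn", "amount": 30},
]

def apply(state, event):
    if event["type"] == "AccountOpened":
        return {"balance": event["balance"]}
    if event["type"] == "MoneyDeposited":
        return {"balance": state["balance"] + event["amount"]}
    if event["type"] == "MoneyWithdrawn":
        return {"balance": state["balance"] - event["amount"]}
    return state  # unknown events are ignored

def state_at(events, upto):
    """Replay the first `upto` events to get the state at that point in time."""
    state = None
    for event in events[:upto]:
        state = apply(state, event)
    return state

print(state_at(EVENTS, len(EVENTS)))  # {'balance': 70}  -> state at this instant
print(state_at(EVENTS, 2))            # {'balance': 100} -> state at an earlier point
```

Because events are immutable and ordered, a pub/sub system that can persist and replay them (the requirement above) is a natural event store.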
Representing Events (Schema)
1. Pulsar supports bytes, string, Avro, Protobuf, JSON, etc.
2. Schemaless?
a. Any code that manipulates the data needs to make some
assumptions about its structure
b. All producers and consumers must know the hidden implicit schema
3. Opinion: use a schema as far as possible.
a. Pulsar supports a schema registry out of the box
Representing Events (Schema)
1. Of course, schemalessness offers a pragmatic alternative at times.
https://martinfowler.com/articles/schemaless/#non-uniform-types
Add custom
fields for UI, etc.
Different attributes
depending on the kind
of event
Obviously easy for
schemaless;
still needs care!
What to put on ONE topic?
1. Two choices:
a. Topic == collection of events of same type
b. Topic == events that need relative ordering guarantee.
2. Winner: choice (b)
https://martin.kleppmann.com/2018/01/18/event-types-in-kafka-topic.html
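Choice (b) means grouping by ordering needs rather than by event type: events that must be observed in relative order go to the same topic. A toy sketch of that routing rule (the tenant/namespace and event names are made up for illustration):

```python
# Toy router: events that need relative ordering share a topic.
# Here the ordering boundary is the customer, not the event type,
# so OrderPlaced and OrderShipped for one customer stay in order.
def topic_for(event):
    # "tenant/ns" is a placeholder namespace, not a real deployment path.
    return f"persistent://tenant/ns/customer-{event['customer_id']}"

events = [
    {"customer_id": 1, "type": "OrderPlaced"},
    {"customer_id": 2, "type": "OrderPlaced"},
    {"customer_id": 1, "type": "OrderShipped"},  # must follow OrderPlaced for customer 1
]

for e in events:
    print(topic_for(e), e["type"])
```

Grouping by event type instead would scatter customer 1's events across topics and lose their relative order.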
Avro / Proto (Struct) Schema
1. Language-agnostic schema. Being stuck with one language sucks!
2. JSON seems the first pick if you use REST, but
a. it is slow and
b. too verbose.
c. The complete schema is shipped with every message
3. Avro and Proto are good.
4. We like Avro for its wide adoption.
a. And use Pulsar’s built-in schema registry
5. Consider keeping the schema flat and fat (denormalize)!
https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
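A flat-and-fat (denormalized) record might look like the following Avro-style schema, sketched here as a plain Python dict with a trivial required-field check; the field names are invented for illustration, and this is not real Avro validation:

```python
import json

# Hypothetical, denormalized ("flat and fat") Avro-style record schema:
# the account name is embedded alongside the ID instead of requiring a join.
COST_EVENT_SCHEMA = {
    "type": "record",
    "name": "CostComputed",
    "fields": [
        {"name": "account_id",   "type": "string"},
        {"name": "account_name", "type": "string"},  # denormalized on purpose
        {"name": "cloud",        "type": "string"},
        {"name": "cost_usd",     "type": "double"},
        {"name": "computed_at",  "type": "long"},
    ],
}

def missing_fields(record, schema):
    """Names of schema fields absent from the record (toy check, not Avro)."""
    return [f["name"] for f in schema["fields"] if f["name"] not in record]

record = {"account_id": "a-1", "account_name": "prod", "cloud": "aws",
          "cost_usd": 12.5, "computed_at": 1700000000}
print(json.dumps(missing_fields(record, COST_EVENT_SCHEMA)))  # []
```

Flat records keep consumers simple: no cross-topic lookups to reassemble an event.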
Schema Evolution
1. Choose a schema-auto-update strategy that suits your use case.
a. We keep it forward compatible (add fields, delete optional fields)
b. Data produced with the new schema can be read by consumers using the last schema
c. Update the producer first, then consumers when they have time / need.
2. Each Avro message carries a schema ID & version.
3. Decode with the exact writer schema.
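A toy illustration of the forward-compatible rule above (producer adds a field; a reader on the last schema simply never sees it). The "schemas" here are plain dicts of field defaults, not real Avro, and the field names are illustrative:

```python
# Toy forward-compatibility sketch: a reader on the old schema projects
# away fields the new producer added; defaults fill anything it expects
# that the message lacks.
OLD_SCHEMA = {"account_id": "", "cost_usd": 0.0}                       # reader's (last) schema
NEW_SCHEMA = {"account_id": "", "cost_usd": 0.0, "region": "unknown"}  # producer added 'region'

def read_with(schema, message):
    """Keep only the fields the reader's schema knows, applying defaults."""
    return {name: message.get(name, default) for name, default in schema.items()}

msg = {"account_id": "a-1", "cost_usd": 9.9, "region": "us-east-1"}  # written with NEW_SCHEMA
print(read_with(OLD_SCHEMA, msg))  # {'account_id': 'a-1', 'cost_usd': 9.9} -- old reader still works
```

Real Avro resolution works per-field against the exact writer schema (point 3 above), but the projection idea is the same.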
Summarizing Lessons
✓ Avoid bias towards the “known” when choosing a platform.
✓ Tune re-replication (ensemble, write quorum, ack quorum) when
scaling out bookies horizontally.
✓ Use a schema, as far as possible!
✓ Tune configuration for size, resources, throughput, durability, etc.
Defaults may be optimized for backward compatibility.
✓ Disable auto-recovery of a bookie before taking it down.
✓ Balance recovery with incoming user traffic.
✓ Put events that require ordering on the same topic.
Q & A time

Lessons from managing a Pulsar cluster (Nutanix)
