SlideShare a Scribd company logo
1 of 19
Download to read offline
How it’s similar to the databases you know and love, and how
it’s not.
What is Apache Kafka?
Kenny Gorman
Founder and CEO
www.eventador.io
www.kennygorman.com
@kennygorman
I have done database foo for my whole career, going on 25
years.
Sybase, Oracle DBA, PostgreSQL DBA, MySQL aficionado,
MongoDB early adopter, founded two companies based on
data technologies
Broke lots of stuff, lost data before, recovered said data,
stayed up many nights, on-call shift horror stories
Apache Kafka is really cool, as fellow database nerds you
will appreciate it.
I am a database nerd
‘02 had hair ^
Now… lol
Kafka
Comparison with the databases you are familiar with
Apache Kafka is an open-source stream processing platform pub/sub message
platform developed by the Apache Software Foundation written in Scala and Java.
The project aims blah blah blah pub/sub message queue architected as a
distributed transaction log,"[3]
Blah blah blah to process streaming data. Blah blah
blah.
The design is heavily influenced by transaction logs.[4]
Kafka
High Performance Streaming Data
Persistent
Distributed
Fault Tolerant
K.I.S.S.
Many Modern Use Cases
Why Kafka?
- It’s a stream of data. A boundless stream of data.
Pub/Sub Messaging Attributes
Image: https://kafka.apache.org
{“temperature”: 29}
{“temperature”: 29}
{“temperature”: 30}
{“temperature”: 29}
{“temperature”: 29}
{“temperature”: 30}
{“temperature”: 29}
{“temperature”: 29}
Logical Data Organization
PostgreSQL MongoDB Kafka
Database Database Topic Files
Fixed Schema Non Fixed Schema Key/Value Message
Table Collection Topic
Row Document Message
Column Name/Value Pairs
Shard Partition
Storage Architecture
PostgreSQL MongoDB Kafka
Stores data in files on disk Stores data in files on disk Stores data in files on disk
Has journal for recovery (WAL) Has journal for recovery (Oplog) Is a commit log
FS + Buffer Cache FS for caching * FS for caching
Random Access, Indexing Random Access, Indexing Sequential access
- Core to design of Kafka
- Partitioning
- Consumers and Consumer Groups
- Offsets ~= High Water Mark
Topics
Image: https://kafka.apache.org
- Kafka topics are glorified distributed write ahead logs
- Append only
- k/v pairs where the key decides the partition it lives in
- Sendfile system call optimization
- Client controlled routing
Performance
- Topics are replicated among any number of servers (brokers)
- Topics can be configured individually
- Topic partitions are the unit of replication
The unit of replication is the topic partition. Under non-failure conditions, each partition in Kafka has a single
leader and zero or more followers.
Availability and Fault Tolerance
MongoDB Majority Consensus (Raft-like in 3.2)
Kafka ISR set vote, stored in ZK
Application Programming Interfaces
PostgreSQL MongoDB Kafka
Insert sql = “insert into mytable ..”
db.execute(sql)
db.commit()
db.mytable.save({“baz”:1}) producer.send(“mytopic”, “{‘baz’:1}”)
Query sql = “select * from …”
cursor = db.execute(sql)
for record in cursor:
print record
db.mytable.find({“baz”:1}) consumer = get_from_topic(“mytopic”)
for message in consumer:
print message
Update sql = “update mytable set ..”
db.execute(sql)
db.commit()
db.mytable.update({“baz”:1,
“baz”:2})
Delete sql = “delete from mytable ..”
db.execute(sql)
db.commit()
db.mytable.remove({“baz”:1})
conn = database_connect()
cur = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
cur.execute(
"""
SELECT a.lastname, a.firstname, a.email,
a.userid, a.password, a.username, b.orgname
FROM users a, orgs b
WHERE a.orgid = b.orgid
AND a.orgid = %(orgid)s
""", {"orgid": orgid}
)
results = cur.fetchall()
for result in results:
print result
Typical RDBMS
from kafka import KafkaProducer
producer = KafkaProducer(bootstrap_servers='localhost:1234')
for _ in range(100):
producer.send('foobar', b'some_message_bytes')
Publishing
- Flush frequency/batch
- Partition keys
Subscribing (Consume)
from kafka import KafkaConsumer
consumer = KafkaConsumer(bootstrap_servers='localhost:9092')
consumer.subscribe('my-topic')
for msg in consumer:
print (msg)
try:
msg_count = 0
while running:
msg = consumer.poll(timeout=1.0)
if msg is None: continue
msg_process(msg) # application-specific processing
msg_count += 1
if msg_count % MIN_COMMIT_COUNT == 0:
consumer.commit(async=False)
finally:
# Shut down consumer
consumer.close()
Subscribing (Consume)
- Continuous ‘cursor’
- Offset management
- Partition assignment
- No simple command console like psql or mongo shell
- BOFJCiS
- Kafkacat, jq
- Shell scripts, mirrormaker, etc.
- PrestoDB
Tooling
PostgreSQL:
- Shared Buffers
- WAL/recovery
MongoDB (mmapv2)
- directoryPerDB
- FStuning
Settings and Tunables
Kafka:
- Xmx ~ 90% memory
- log.retention.hours
https://kafka.apache.org/documentation
We are hiring!
www.eventador.io
@kennygorman
Contact

More Related Content

What's hot

Kafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformKafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformJean-Paul Azar
 
Kafka Connect - debezium
Kafka Connect - debeziumKafka Connect - debezium
Kafka Connect - debeziumKasun Don
 
Apache Drill (ver. 0.1, check ver. 0.2)
Apache Drill (ver. 0.1, check ver. 0.2)Apache Drill (ver. 0.1, check ver. 0.2)
Apache Drill (ver. 0.1, check ver. 0.2)Camuel Gilyadov
 
Couchdb + Membase = Couchbase
Couchdb + Membase = CouchbaseCouchdb + Membase = Couchbase
Couchdb + Membase = Couchbaseiammutex
 
Node.js and couchbase Full Stack JSON - Munich NoSQL
Node.js and couchbase   Full Stack JSON - Munich NoSQLNode.js and couchbase   Full Stack JSON - Munich NoSQL
Node.js and couchbase Full Stack JSON - Munich NoSQLPhilipp Fehre
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Data Con LA
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Fieldconfluent
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectDatabricks
 
Real time dashboards with Kafka and Druid
Real time dashboards with Kafka and DruidReal time dashboards with Kafka and Druid
Real time dashboards with Kafka and DruidVenu Ryali
 
Building a derived data store using Kafka
Building a derived data store using KafkaBuilding a derived data store using Kafka
Building a derived data store using KafkaVenu Ryali
 
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...Michael Stack
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperRahul Jain
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesPhil Peace
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Erik Onnen
 
Kafka meetup - kafka connect
Kafka meetup -  kafka connectKafka meetup -  kafka connect
Kafka meetup - kafka connectYi Zhang
 

What's hot (20)

Kafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformKafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platform
 
Kafka Connect - debezium
Kafka Connect - debeziumKafka Connect - debezium
Kafka Connect - debezium
 
Apache Drill (ver. 0.1, check ver. 0.2)
Apache Drill (ver. 0.1, check ver. 0.2)Apache Drill (ver. 0.1, check ver. 0.2)
Apache Drill (ver. 0.1, check ver. 0.2)
 
Couchdb + Membase = Couchbase
Couchdb + Membase = CouchbaseCouchdb + Membase = Couchbase
Couchdb + Membase = Couchbase
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Node.js and couchbase Full Stack JSON - Munich NoSQL
Node.js and couchbase   Full Stack JSON - Munich NoSQLNode.js and couchbase   Full Stack JSON - Munich NoSQL
Node.js and couchbase Full Stack JSON - Munich NoSQL
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
 
Real time dashboards with Kafka and Druid
Real time dashboards with Kafka and DruidReal time dashboards with Kafka and Druid
Real time dashboards with Kafka and Druid
 
Building a derived data store using Kafka
Building a derived data store using KafkaBuilding a derived data store using Kafka
Building a derived data store using Kafka
 
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
 
Presto+MySQLで分散SQL
Presto+MySQLで分散SQLPresto+MySQLで分散SQL
Presto+MySQLで分散SQL
 
Kafka meetup - kafka connect
Kafka meetup -  kafka connectKafka meetup -  kafka connect
Kafka meetup - kafka connect
 

Similar to What is apache Kafka?

Apache Kafka DC Meetup: Replicating DB Binary Logs to Kafka
Apache Kafka DC Meetup: Replicating DB Binary Logs to KafkaApache Kafka DC Meetup: Replicating DB Binary Logs to Kafka
Apache Kafka DC Meetup: Replicating DB Binary Logs to KafkaMark Bittmann
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...DevOps_Fest
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Helena Edelson
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?confluent
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsSlim Baltagi
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBrian Ritchie
 
Akka Streams And Kafka Streams: Where Microservices Meet Fast Data
Akka Streams And Kafka Streams: Where Microservices Meet Fast DataAkka Streams And Kafka Streams: Where Microservices Meet Fast Data
Akka Streams And Kafka Streams: Where Microservices Meet Fast DataLightbend
 
Streaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka StreamsStreaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka StreamsLightbend
 
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Lucidworks
 
Python Kafka Integration: Developers Guide
Python Kafka Integration: Developers GuidePython Kafka Integration: Developers Guide
Python Kafka Integration: Developers GuideInexture Solutions
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...Athens Big Data
 
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkRahul Jain
 
SSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine LearningSSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine Learningfelixcss
 
SSR: Structured Streaming on R for Machine Learning with Felix Cheung
SSR: Structured Streaming on R for Machine Learning with Felix CheungSSR: Structured Streaming on R for Machine Learning with Felix Cheung
SSR: Structured Streaming on R for Machine Learning with Felix CheungDatabricks
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Guido Schmutz
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Helena Edelson
 
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...Shirshanka Das
 

Similar to What is apache Kafka? (20)

Apache Kafka DC Meetup: Replicating DB Binary Logs to Kafka
Apache Kafka DC Meetup: Replicating DB Binary Logs to KafkaApache Kafka DC Meetup: Replicating DB Binary Logs to Kafka
Apache Kafka DC Meetup: Replicating DB Binary Logs to Kafka
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache Kafka
 
Akka Streams And Kafka Streams: Where Microservices Meet Fast Data
Akka Streams And Kafka Streams: Where Microservices Meet Fast DataAkka Streams And Kafka Streams: Where Microservices Meet Fast Data
Akka Streams And Kafka Streams: Where Microservices Meet Fast Data
 
Streaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka StreamsStreaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka Streams
 
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
 
Python Kafka Integration: Developers Guide
Python Kafka Integration: Developers GuidePython Kafka Integration: Developers Guide
Python Kafka Integration: Developers Guide
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
SSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine LearningSSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine Learning
 
SSR: Structured Streaming on R for Machine Learning with Felix Cheung
SSR: Structured Streaming on R for Machine Learning with Felix CheungSSR: Structured Streaming on R for Machine Learning with Felix Cheung
SSR: Structured Streaming on R for Machine Learning with Felix Cheung
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
 
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
 

Recently uploaded

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Recently uploaded (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

What is apache Kafka?

  • 1. How it’s similar to the databases you know and love, and how it’s not. What is Apache Kafka? Kenny Gorman Founder and CEO www.eventador.io www.kennygorman.com @kennygorman
  • 2. I have done database foo for my whole career, going on 25 years. Sybase, Oracle DBA, PostgreSQL DBA, MySQL aficionado, MongoDB early adopter, founded two companies based on data technologies Broke lots of stuff, lost data before, recovered said data, stayed up many nights, on-call shift horror stories Apache Kafka is really cool, as fellow database nerds you will appreciate it. I am a database nerd ‘02 had hair ^ Now… lol
  • 3. Kafka Comparison with the databases you are familiar with
  • 4. Apache Kafka is an open-source stream processing platform pub/sub message platform developed by the Apache Software Foundation written in Scala and Java. The project aims blah blah blah pub/sub message queue architected as a distributed transaction log,"[3] Blah blah blah to process streaming data. Blah blah blah. The design is heavily influenced by transaction logs.[4] Kafka
  • 5. High Performance Streaming Data Persistent Distributed Fault Tolerant K.I.S.S. Many Modern Use Cases Why Kafka?
  • 6. - It’s a stream of data. A boundless stream of data. Pub/Sub Messaging Attributes Image: https://kafka.apache.org {“temperature”: 29} {“temperature”: 29} {“temperature”: 30} {“temperature”: 29} {“temperature”: 29} {“temperature”: 30} {“temperature”: 29} {“temperature”: 29}
  • 7. Logical Data Organization PostgreSQL MongoDB Kafka Database Database Topic Files Fixed Schema Non Fixed Schema Key/Value Message Table Collection Topic Row Document Message Column Name/Value Pairs Shard Partition
  • 8. Storage Architecture PostgreSQL MongoDB Kafka Stores data in files on disk Stores data in files on disk Stores data in files on disk Has journal for recovery (WAL) Has journal for recovery (Oplog) Is a commit log FS + Buffer Cache FS for caching * FS for caching Random Access, Indexing Random Access, Indexing Sequential access
  • 9. - Core to design of Kafka - Partitioning - Consumers and Consumer Groups - Offsets ~= High Water Mark Topics Image: https://kafka.apache.org
  • 10. - Kafka topics are glorified distributed write ahead logs - Append only - k/v pairs where the key decides the partition it lives in - Sendfile system call optimization - Client controlled routing Performance
  • 11. - Topics are replicated among any number of servers (brokers) - Topics can be configured individually - Topic partitions are the unit of replication The unit of replication is the topic partition. Under non-failure conditions, each partition in Kafka has a single leader and zero or more followers. Availability and Fault Tolerance MongoDB Majority Consensus (Raft-like in 3.2) Kafka ISR set vote, stored in ZK
  • 12. Application Programming Interfaces PostgreSQL MongoDB Kafka Insert sql = “insert into mytable ..” db.execute(sql) db.commit() db.mytable.save({“baz”:1}) producer.send(“mytopic”, “{‘baz’:1}”) Query sql = “select * from …” cursor = db.execute(sql) for record in cursor: print record db.mytable.find({“baz”:1}) consumer = get_from_topic(“mytopic”) for message in consumer: print message Update sql = “update mytable set ..” db.execute(sql) db.commit() db.mytable.update({“baz”:1, “baz”:2}) Delete sql = “delete from mytable ..” db.execute(sql) db.commit() db.mytable.remove({“baz”:1})
  • 13. conn = database_connect() cur = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) cur.execute( """ SELECT a.lastname, a.firstname, a.email, a.userid, a.password, a.username, b.orgname FROM users a, orgs b WHERE a.orgid = b.orgid AND a.orgid = %(orgid)s """, {"orgid": orgid} ) results = cur.fetchall() for result in results: print result Typical RDBMS
  • 14. from kafka import KafkaProducer producer = KafkaProducer(bootstrap_servers='localhost:1234') for _ in range(100): producer.send('foobar', b'some_message_bytes') Publishing - Flush frequency/batch - Partition keys
  • 15. Subscribing (Consume) from kafka import KafkaConsumer consumer = KafkaConsumer(bootstrap_servers='localhost:9092') consumer.subscribe('my-topic') for msg in consumer: print (msg)
  • 16. try: msg_count = 0 while running: msg = consumer.poll(timeout=1.0) if msg is None: continue msg_process(msg) # application-specific processing msg_count += 1 if msg_count % MIN_COMMIT_COUNT == 0: consumer.commit(async=False) finally: # Shut down consumer consumer.close() Subscribing (Consume) - Continuous ‘cursor’ - Offset management - Partition assignment
  • 17. - No simple command console like psql or mongo shell - BOFJCiS - Kafkacat, jq - Shell scripts, mirrormaker, etc. - PrestoDB Tooling
  • 18. PostgreSQL: - Shared Buffers - WAL/recovery MongoDB (mmapv2) - directoryPerDB - FStuning Settings and Tunables Kafka: - Xmx ~ 90% memory - log.retention.hours