SlideShare a Scribd company logo
Kafka
A little introduction
Pub-Sub Messaging System
Distributed
Performance
Disk/Memory Performance
                     1000M

                       100M

                        10M

                         1M
Read values/second




                     100,000

                      10,000

                       1,000

                        100

                         10

                          1           Disk         SSD                   Memory


                               Random access
                               Sequential Access         Source: http://queue.acm.org/detail.cfm?id=1563874
Disk/Memory Performance
                     1000M

                       100M

                        10M

                         1M
Read values/second




                     100,000

                      10,000

                       1,000

                        100

                         10

                          1           Disk         SSD                   Memory


                               Random access
                               Sequential Access         Source: http://queue.acm.org/detail.cfm?id=1563874
Disk/Memory Performance
                     1000M

                       100M

                        10M

                         1M
Read values/second




                     100,000

                      10,000

                       1,000

                        100

                         10

                          1           Disk         SSD                   Memory


                               Random access
                               Sequential Access         Source: http://queue.acm.org/detail.cfm?id=1563874
Disk/Memory Performance
                     1000M

                       100M

                        10M

                         1M
Read values/second




                     100,000
                                             Sequential disk read
                      10,000
                                             faster than random
                       1,000

                        100
                                                memory read
                         10

                          1           Disk          SSD                   Memory


                               Random access
                               Sequential Access          Source: http://queue.acm.org/detail.cfm?id=1563874
Persistent
Length    Magic Value Checksum   Payload


4 bytes     1 byte     4 bytes   n bytes
Token
Offset: 0             Input
Broker: kafka.local
Topic: Testing


                                       MR Job
                        Output                  Output


                      Offset: 130098
                      Broker: kafka.local
                      Topic: Testing

                                                 Sequence File
Token
Offset: 0             Input
Broker: kafka.local
Topic: Testing


                                       MR Job
                        Output                  Output


                      Offset: 130098
                      Broker: kafka.local
                      Topic: Testing

                                                 Sequence File
Useful Things


• http://incubator.apache.org/kafka/
• https://github.com/pingles/clj-kafka

More Related Content

What's hot

Instal vnc in cent os
Instal vnc in cent osInstal vnc in cent os
Instal vnc in cent os
Manusia Tenan
 
Iscsi
IscsiIscsi
Iscsi
Md Shihab
 
Scaling IO-bound microservices
Scaling IO-bound microservicesScaling IO-bound microservices
Scaling IO-bound microservices
Salo Shp
 
ubunturef
ubunturefubunturef
ubunturef
wensheng wei
 
Container security: seccomp, network e namespaces
Container security: seccomp, network e namespacesContainer security: seccomp, network e namespaces
Container security: seccomp, network e namespaces
Kiratech
 
JavaScript is the new black - Why Node.js is going to rock your world - Web 2...
JavaScript is the new black - Why Node.js is going to rock your world - Web 2...JavaScript is the new black - Why Node.js is going to rock your world - Web 2...
JavaScript is the new black - Why Node.js is going to rock your world - Web 2...
Tom Croucher
 
Disk suit 4 setup and installation
Disk suit 4 setup and installationDisk suit 4 setup and installation
Disk suit 4 setup and installation
ppratish
 
FreeBSD under DigitalOcean VPS
FreeBSD under DigitalOcean VPSFreeBSD under DigitalOcean VPS
FreeBSD under DigitalOcean VPS
Ryo ONODERA
 
Disruptor 2015-12-22 @ java.il
Disruptor 2015-12-22 @ java.ilDisruptor 2015-12-22 @ java.il
Disruptor 2015-12-22 @ java.il
Amir Langer
 
Unixtoolbox
UnixtoolboxUnixtoolbox
Unixtoolbox
LILIANA FERNANDEZ
 

What's hot (10)

Instal vnc in cent os
Instal vnc in cent osInstal vnc in cent os
Instal vnc in cent os
 
Iscsi
IscsiIscsi
Iscsi
 
Scaling IO-bound microservices
Scaling IO-bound microservicesScaling IO-bound microservices
Scaling IO-bound microservices
 
ubunturef
ubunturefubunturef
ubunturef
 
Container security: seccomp, network e namespaces
Container security: seccomp, network e namespacesContainer security: seccomp, network e namespaces
Container security: seccomp, network e namespaces
 
JavaScript is the new black - Why Node.js is going to rock your world - Web 2...
JavaScript is the new black - Why Node.js is going to rock your world - Web 2...JavaScript is the new black - Why Node.js is going to rock your world - Web 2...
JavaScript is the new black - Why Node.js is going to rock your world - Web 2...
 
Disk suit 4 setup and installation
Disk suit 4 setup and installationDisk suit 4 setup and installation
Disk suit 4 setup and installation
 
FreeBSD under DigitalOcean VPS
FreeBSD under DigitalOcean VPSFreeBSD under DigitalOcean VPS
FreeBSD under DigitalOcean VPS
 
Disruptor 2015-12-22 @ java.il
Disruptor 2015-12-22 @ java.ilDisruptor 2015-12-22 @ java.il
Disruptor 2015-12-22 @ java.il
 
Unixtoolbox
UnixtoolboxUnixtoolbox
Unixtoolbox
 

Recently uploaded

Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Zilliz
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 

Recently uploaded (20)

Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 

Kafka - A little introduction

  • 2.
  • 4.
  • 5.
  • 6.
  • 7.
  • 9.
  • 11. Disk/Memory Performance 1000M 100M 10M 1M Read values/second 100,000 10,000 1,000 100 10 1 Disk SSD Memory Random access Sequential Access Source: http://queue.acm.org/detail.cfm?id=1563874
  • 12. Disk/Memory Performance 1000M 100M 10M 1M Read values/second 100,000 10,000 1,000 100 10 1 Disk SSD Memory Random access Sequential Access Source: http://queue.acm.org/detail.cfm?id=1563874
  • 13. Disk/Memory Performance 1000M 100M 10M 1M Read values/second 100,000 10,000 1,000 100 10 1 Disk SSD Memory Random access Sequential Access Source: http://queue.acm.org/detail.cfm?id=1563874
  • 14. Disk/Memory Performance 1000M 100M 10M 1M Read values/second 100,000 Sequential disk read 10,000 faster than random 1,000 100 memory read 10 1 Disk SSD Memory Random access Sequential Access Source: http://queue.acm.org/detail.cfm?id=1563874
  • 16.
  • 17.
  • 18.
  • 19. Length Magic Value Checksum Payload 4 bytes 1 byte 4 bytes n bytes
  • 20.
  • 21.
  • 22.
  • 23. Token Offset: 0 Input Broker: kafka.local Topic: Testing MR Job Output Output Offset: 130098 Broker: kafka.local Topic: Testing Sequence File
  • 24. Token Offset: 0 Input Broker: kafka.local Topic: Testing MR Job Output Output Offset: 130098 Broker: kafka.local Topic: Testing Sequence File
  • 25.
  • 26. Useful Things • http://incubator.apache.org/kafka/ • https://github.com/pingles/clj-kafka

Editor's Notes

  1. \n
  2. built by linkedin to process + store high-volume activity stream data, but its really a general use messaging system...\n\n
  3. at it’s heart, its a pub-sub messaging system...\n
  4. It starts with a broker\n
  5. Publishers connect to the broker\n
  6. and send their messages, \n
  7. So we connect some consumers and they can pull messages.\n\nnote when they connect, we’ll receive all messages for a topic, not just since they’ve connected more on that later...\n
  8. but its also distributed, which is to say...\n
  9. we can have multiple brokers in multiple places and aggregate together...\n\ninternally we can also partition within topics to allow parallel consumption, but thats for another talk...\n
  10. before we get into what makes it particularly different (persistence), its useful to understand some of the engineering decisions behind how it works.\n\nperformance is interesting because the behaviour of disks / memory has informed the way kafka has been built to embrace disk persistence\n
  11. research from an ACM paper\n\nvalues/sec is the number of 4-byte integer values read per second from a 1-billion-long array on disk and in memory\n\nnumber of four-byte integer values read per second from a 1-billion-long (4 GB) array on disk or in memory\n\nuses the OS’s default page caching, rather than using custom in-memory stores\ngiven all disk writes/reads will be cached\nmeans we can avoid paying the caching overhead of objects within the JVM\n\nrather than maintaining everything in memory and flush when necessary\neverything is written immediately\n\nconfigurable flushing determines how much data is at risk\n\nsimilar to varnish\n
  12. research from an ACM paper\n\nvalues/sec is the number of 4-byte integer values read per second from a 1-billion-long array on disk and in memory\n\nnumber of four-byte integer values read per second from a 1-billion-long (4 GB) array on disk or in memory\n\nuses the OS’s default page caching, rather than using custom in-memory stores\ngiven all disk writes/reads will be cached\nmeans we can avoid paying the caching overhead of objects within the JVM\n\nrather than maintaining everything in memory and flush when necessary\neverything is written immediately\n\nconfigurable flushing determines how much data is at risk\n\nsimilar to varnish\n
  13. research from an ACM paper\n\nvalues/sec is the number of 4-byte integer values read per second from a 1-billion-long array on disk and in memory\n\nnumber of four-byte integer values read per second from a 1-billion-long (4 GB) array on disk or in memory\n\nuses the OS’s default page caching, rather than using custom in-memory stores\ngiven all disk writes/reads will be cached\nmeans we can avoid paying the caching overhead of objects within the JVM\n\nrather than maintaining everything in memory and flush when necessary\neverything is written immediately\n\nconfigurable flushing determines how much data is at risk\n\nsimilar to varnish\n
  14. research from an ACM paper\n\nvalues/sec is the number of 4-byte integer values read per second from a 1-billion-long array on disk and in memory\n\nnumber of four-byte integer values read per second from a 1-billion-long (4 GB) array on disk or in memory\n\nuses the OS’s default page caching, rather than using custom in-memory stores\ngiven all disk writes/reads will be cached\nmeans we can avoid paying the caching overhead of objects within the JVM\n\nrather than maintaining everything in memory and flush when necessary\neverything is written immediately\n\nconfigurable flushing determines how much data is at risk\n\nsimilar to varnish\n
  15. \n
  16. it starts with a topic, a text description for the messages contained within. we use it to describe how to deserialize the message bytes\n
  17. so we send a message to the topic, what happens?\n
  18. kafka creates a file\nand it persists the message, which is to say it hands it off to the O/S to write\n\nfiles are just sets of bytes, nothing clever\n\ninternally it abstracts the collection of message bytes into a messageset, which is then backed by a file\n\nso what does each message look like...\n
  19. so, our message length is n - 9 bytes\n\nwith a 91 byte payload we have a 100 byte message.\n\nwhich means our next message would start at offset 100\n
  20. and we can see our offsets at the bottom...\n
  21. so we have the offsets which lets us send all messages to consumers, not just those that were sent after they connected... \n
  22. up to the consumer to remember what they’ve consumed, but means you can re-consume an entire set of messages easily, which is very useful when integrating with long-term storage like HDFS...\n\nquick look at the way it works\n
  23. \nour input to the hadoop job is a token file that specifies the offset to read from, the topic etc.\n\nhaving read the token, the mapper connects, and consumes messages from a given offset\n\nthe mapper outputs 2 sets of data- the mapped output, such as the message payloads, and an updated token file with the last read offset.\n\nthis is the key, successful completion of the job results in new metadata for the next run and the output data\n\nmeans that if the job fails we can re-run and it’ll consume from the last consumed offset\n
  24. the newly created output becomes the next input\n
  25. and this is why kafka is an interesting messaging system\n\nsuitable for batch and realtime\n
  26. \n