The document discusses Apache Kafka, a distributed publish-subscribe messaging system developed at LinkedIn. It describes how LinkedIn uses Kafka to integrate large amounts of user activity and other data across its products. Key design choices, including a log-structured storage layer and data partitioning for parallelism, allow Kafka to scale to LinkedIn's high throughput requirements. LinkedIn relies on Kafka to transport over 500 billion messages per day between systems and for real-time analytics.
#21 Now you may be wondering why it works so well. For example, how can it be both highly durable, persisting data to disk, while still maintaining high throughput?
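One part of the answer is that Kafka writes messages sequentially to an append-only log on disk, so durability comes from sequential I/O (which disks handle fast) rather than random in-place updates. A minimal sketch of the idea in plain Python — this is an illustration of the append-only-log pattern, not Kafka's actual storage code:

```python
import os
import struct
import tempfile

class AppendOnlyLog:
    """Toy append-only log: each record is length-prefixed and written
    sequentially, so writes never seek backwards into the file."""

    def __init__(self, path):
        self.f = open(path, "ab+")

    def append(self, payload: bytes) -> int:
        self.f.seek(0, os.SEEK_END)       # writes always go to the end
        offset = self.f.tell()            # logical position = byte offset
        self.f.write(struct.pack(">I", len(payload)))
        self.f.write(payload)
        self.f.flush()
        os.fsync(self.f.fileno())         # force to disk for durability
        return offset

    def read(self, offset: int) -> bytes:
        self.f.seek(offset)
        (size,) = struct.unpack(">I", self.f.read(4))
        return self.f.read(size)

path = os.path.join(tempfile.mkdtemp(), "00000000.log")
log = AppendOnlyLog(path)
o1 = log.append(b"page_view user=42")
o2 = log.append(b"click user=42")
print(log.read(o1))   # b'page_view user=42'
```

Because consumers read by offset and the broker never rewrites old records, reads are also sequential scans, which is what lets persistence and high throughput coexist.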
#24 Topic = message stream
Topic has partitions, partitions are distributed to brokers
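Because partitions live on different brokers, producers and consumers can work on them in parallel. A hedged sketch of how a keyed message could be routed to a partition — the stable-hash-modulo scheme below mirrors a common default, but the names and partition count are illustrative, not Kafka's code:

```python
import zlib

NUM_PARTITIONS = 4  # assumed partition count for the example topic

def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash, so every
    message with the same key lands on the same partition (and thus
    stays ordered relative to other messages for that key)."""
    return zlib.crc32(key) % num_partitions

# All events for one user hash to the same partition:
p1 = partition_for(b"user-42")
p2 = partition_for(b"user-42")
assert p1 == p2
```

Ordering in Kafka is guaranteed only within a partition, so choosing the key (here, a user id) is what decides which messages stay ordered relative to each other.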
#46 Non-Java / Scala
C / C++ / .NET
Go
Clojure
Ruby
Node.js
PHP
Python
Erlang
HTTP REST
Command line
etc ..
https://cwiki.apache.org/confluence/display/KAFKA/Clients
Python - Pure Python implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported.
C - High performance C library with full protocol support
C++ - Native C++ library with protocol support for Metadata, Produce, Fetch, and Offset.
Go (aka golang) Pure Go implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported.
Ruby - Pure Ruby, Consumer and Producer implementations included, GZIP and Snappy compression supported. Ruby 1.9.3 and up (CI runs MRI 2.
Clojure - Clojure DSL for the Kafka API
JavaScript (NodeJS) - NodeJS client in a pure JavaScript implementation
stdin & stdout
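The "command line" client refers to the console tools shipped in Kafka's bin/ directory, which speak plain stdin/stdout. A typical session against a local broker might look like this (flag names vary by Kafka version; `--broker-list` and `--zookeeper` are the forms used in older releases, and the topic name is an example):

```shell
# Pipe stdin into a topic:
echo "hello kafka" | bin/kafka-console-producer.sh \
    --broker-list localhost:9092 --topic test

# Dump the topic to stdout from the beginning:
bin/kafka-console-consumer.sh \
    --zookeeper localhost:2181 --topic test --from-beginning
```

Because the tools read and write plain text streams, they compose with ordinary Unix pipelines for quick inspection and testing.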