Distributed Logs
an introduction to Apache Kafka
Pedro Arthur P. R. Duarte
pedroarthur.jedi@gmail.com
Basic Concepts
2
Origins and Goals
Project created by LinkedIn's architecture team. The project's goal
is to provide a high-throughput, easily scalable messaging
infrastructure.
3
But... How?
The message servers (aka brokers) concern themselves only with
maintaining the state of the Kafka cluster;
To the outside world, brokers expose only a view of topics and
offsets, so that clients can maintain their own state.
4
"Apache Kafka is
publish-subscribe
messaging rethought as a
distributed commit log"
Apache Kafka’s home page
http://kafka.apache.org
5
Log? A log as in log files?
$ sudo tail -n 5 -f /var/log/syslog
Jun 4 19:37:07 inception systemd[1]: Time has been ch ...
Jun 4 19:37:07 inception systemd[1]: apt-daily.timer: ...
Jun 4 19:39:13 inception systemd[1]: Started Cleanup ...
Jun 4 19:40:01 inception CRON[5892]: (root) CMD (test ...
Jun 4 19:44:51 inception org.kde.kaccessibleapp[6056] ...
Jun 4 19:49:02 inception ntpd[711]: receive: Unexpect ...
Jun 4 19:55:31 inception kernel: [11996.667253] hrtim ...
6
What is a log?
[Figure: an append-only, time-ordered sequence of entries]
First entry ... entry (t), entry (t+1), entry (t+2), ..., entry (t+n) ... next entry
7
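The figure can be modeled in a few lines: an append-only sequence where each entry gets the next position, its "offset", and readers track their own position. This is an illustrative sketch, not Kafka's API; the names are made up.

```scala
import scala.collection.mutable.ArrayBuffer

// Minimal append-only log: entries are only ever added at the tail,
// and each append returns the entry's position (its "offset").
class AppendOnlyLog[A] {
  private val entries = ArrayBuffer.empty[A]

  def append(entry: A): Long = {
    entries += entry
    entries.length - 1 // offset of the entry just written
  }

  // Readers keep their own position and read forward from it,
  // much like a Kafka consumer tracking its offset.
  def readFrom(offset: Long): Seq[A] =
    entries.drop(offset.toInt).toSeq
}

val eventLog = new AppendOnlyLog[String]
val first = eventLog.append("first entry") // offset 0
val second = eventLog.append("entry (t)")  // offset 1
val third = eventLog.append("entry (t+1)") // offset 2
```

Note that the broker-side state is just the entries; everything about "where each reader is" lives with the reader, which is exactly the division of labor described earlier.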
Where are logs used?
Whenever we need to record what happened and when it
happened...
8
Where are logs used?
Database systems
– PostgreSQL's Write-Ahead Logging (WAL)
– Oracle 10g/11g Redo Log
Version control systems
– Git, The Stupid Content Tracker
– Subversion
9
Git's log: a log, but not quite
$ git log --oneline
498d410 Fixes message format and adds some logging
e09c955 Enhances the ContainerStub with a StoredObject stub
18fe603 Puts topic name in a configuration companion object
89d9c5d Separates consumer from producer configuration
a9f1a76 Adds a provisory container stub
d800dfe Creates consumer configuration
fa4da8e Removes trash
4808450 Adds kafka producer
333b14f Let there be light
10
Git's reflog: a true log
$ git reflog
498d410 HEAD@0: commit (amend): Fixes message format and adds some ...
9147167 HEAD@1: commit (amend): Fixes message format and adds some ...
97d8661 HEAD@2: commit: Fixes message format and adds some logging
e09c955 HEAD@3: commit: Enhances the ContainerStub with a StoredOb ...
18fe603 HEAD@4: rebase finished: returning to refs/heads/master
18fe603 HEAD@5: rebase: checkout refs/remotes/origin/master
d800dfe HEAD@6: rebase finished: returning to refs/heads/master
d800dfe HEAD@7: rebase: Creates consumer configuration
fa4da8e HEAD@8: rebase: checkout refs/remotes/origin/master
701b3e6 HEAD@9: commit: Creates consumer configuration
4808450 HEAD@10: commit: Adds kafka producer
333b14f HEAD@11: clone: from https://pedroarthur@bitbucket.org/ped ...
11
Data Integration
12
More systems, more data, more problems...
13
Using a log as a perspective broker
14
State Machine Replication
15
Active/Active Replication and Primary Backup
16
A Single Source of Truth
17
Kafka in Detail
18
Kafka overview
[Diagram: producers write to the Kafka broker cluster, which persists
messages on block storage and is coordinated by a Zookeeper cluster;
Consumer Groups A, B, and C each read from the cluster]
19
"Apache Zookeeper is an ...
server which enables
highly reliable distributed
coordination"
Apache Zookeeper’s home page
20
Zookeeper
From the point of view of its programming interfaces, Zookeeper is a
"file system with guarantees"; for example:
– If several clients try to create the same file, only one of them
receives a positive acknowledgement;
– If several clients try to modify the same file, only one of them
receives the write acknowledgement and the others must retry the
operation.
21
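The "only one client wins" behavior is essentially a compare-and-set: a write succeeds only if you saw the latest state, and the loser must re-read and retry. As a local analogy (this uses java.util.concurrent, not the Zookeeper client API):

```scala
import java.util.concurrent.atomic.AtomicReference

// A "file" whose content can only be replaced by a writer that has
// seen the current value -- the losing writer must re-read and retry,
// just like a Zookeeper client whose write hits a stale version.
val znode = new AtomicReference[String]("v0")

val aWins = znode.compareAndSet("v0", "written by A") // sees v0: wins
val bWins = znode.compareAndSet("v0", "written by B") // stale view: fails, must retry
```

In real Zookeeper the same effect comes from znode version numbers: a conditional write carries the version the client last read, and the server rejects it if the znode has moved on.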
Anatomy of a Topic
[Diagram: a topic split into Partition 0 @ broker_a, Partition 1 @
broker_b, and Partition 2 @ broker_c; producers write to the
partitions, and Consumers 0, 1, and 2 of Consumer Group A each read
from one partition]
22
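A record's partition is typically derived from its key, so every record with the same key lands in the same partition and stays ordered relative to its siblings. A minimal sketch of that idea (not Kafka's actual default partitioner, which uses a murmur2 hash):

```scala
// Map a record key onto one of N partitions. Same key -> same
// partition, so per-key ordering is preserved within a partition.
// Math.floorMod keeps the result non-negative even for negative hashes.
def partitionFor(key: String, numPartitions: Int): Int =
  Math.floorMod(key.hashCode, numPartitions)

val p1 = partitionFor("user-42", 3)
val p2 = partitionFor("user-42", 3) // always the same partition as p1
```

This is also why a consumer group scales: each partition is assigned to exactly one consumer in the group, so the per-partition order the partitioner established is preserved end to end.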
Use case
23
Anonymized xD
[Architecture diagram with: Data Sources A, B1, and B2; Source A and
Source B Data Transformations; a Flume Master Node; A and B Topics;
an A/B data aggregation step; an A/B Aggregation Topic and A/B
Aggregation Database; Source A and Source B Databases; a Data
Enrichment Pipeline; a Data Analysis Engine; an Analysis Database; a
Query API; and a Publish/Subscribe Integration API]
24
Programming for Kafka
25
Producer in Scala (>= 0.9.0.0)

val properties = new Properties() {
  put("bootstrap.servers", "broker1:9092,broker2:9092")
  put("key.serializer",
    "org.apache.kafka.common.serialization.StringSerializer")
  put("value.serializer",
    "org.apache.kafka.common.serialization.StringSerializer")
  put("acks", "1")
  put("retries", "0")
  /* any colour you like */
}

val producer = new KafkaProducer[String, String](properties)
26
Producer in Scala (>= 0.9.0.0)

val message = new ProducerRecord[String, String](
  "the_destination_topic", "entry key", "your data")

/* just try to send the data */
val future: Future[RecordMetadata] = producer.send(message)

/* try to send the data and call me back afterwards */
val futureAndCallback: Future[RecordMetadata] =
  producer.send(message,
    new Callback() {
      def onCompletion(
          metadata: RecordMetadata, exception: Exception) {
        /* exactly one of (metadata, exception) is non-null :( */
      }
    })

producer.close() /* release resources */
27
Consumer in Scala (>= 0.9.0.0)

val properties = new Properties() {
  put("bootstrap.servers", "broker1:9092,broker2:9092")
  put("group.id", "the_group_id")
  put("enable.auto.commit", "true")
  put("auto.commit.interval.ms", "1000")
  put("key.deserializer",
    "org.apache.kafka.common.serialization.StringDeserializer")
  put("value.deserializer",
    "org.apache.kafka.common.serialization.StringDeserializer")
  /* any colour you like */
}

val consumer = new KafkaConsumer[String, String](properties)
28
Consumer in Scala (>= 0.9.0.0)

/* subscribe to as many topics as you like */
consumer.subscribe(Arrays.asList("the_destination_topic"))

while (true) {
  val records: /* argument is the poll timeout in millis */
    ConsumerRecords[String, String] = consumer.poll(100)

  records foreach {
    record: ConsumerRecord[String, String] =>
      log.info(s"${record.topic()} is at ${record.offset()}")
  }
}
29
Basic Reference
30
Jay Kreps' I Heart Logs
"Why a book about logs? That's easy: the humble log is an
abstraction that lies at the heart of many systems, from NoSQL
databases to cryptocurrencies. Even though most engineers don't
think much about them, this short book shows you why logs are
worthy of your attention ..."
Release Date: October 2014
31
Demonstration
32
Kafkabox: file synchronization
A small proof-of-concept project using Kafka, OpenStack
Swift, and inotify; clone it with the following command:
$ git clone http://bitbucket.com/pedroarthur/kafkabox/
33
Implementation
– Inotify: listen for every change in the directory;
– For each change:
– Send the change to Swift over HTTP;
– Write the change to the Kafka topic.
34
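The steps above can be sketched as a per-change pipeline. The names below are illustrative placeholders, not Kafkabox's real code; the point is the ordering: ship the bytes to Swift first, then record the change in the topic, so the topic never references an object that is not in storage yet.

```scala
// Actions produced for each filesystem change (hypothetical names).
sealed trait Action
final case class SwiftUpload(path: String) extends Action
final case class KafkaPublish(path: String) extends Action

// For every inotify event: upload to Swift first, publish the change
// to the Kafka topic second -- in that order.
def handleChange(path: String): List[Action] =
  List(SwiftUpload(path), KafkaPublish(path))

val actions = List("notes.txt", "todo.txt").flatMap(handleChange)
```

A consumer replaying the topic from offset zero can then rebuild the directory, because every published change points at an object that was already uploaded when the message was written.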
Distributed Logs
an introduction to Apache Kafka
Pedro Arthur P. R. Duarte
pedroarthur.jedi@gmail.com

TDC2016POA | Architecture Track - Apache Kafka: an introduction to distributed logs
