Tutorial Kafka-Storm

Scalable Distributed Systems
Tutorial
Miguel Cárcamo Vásquez
Daniel Wladdimiro Cottet
Professors: Erika Rosas Olivos
Nicolás Hidalgo Castillo
Departamento de Ingeniería Informática
Universidad de Santiago de Chile
November, 2014
M. Cárcamo & D. Wladdimiro (USACH), Kafka & Storm, November 2014

Kafka
What is Kafka?
Apache Kafka is publish-subscribe messaging rethought as a distributed
commit log.
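• Fast
    • A single broker handles hundreds of megabytes of reads and writes per second
• Scalable
    • Expands elastically
    • Expands transparently, without downtime
• Durable
    • Messages are persisted on disk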

Kafka
Architecture
Kafka is a distributed, partitioned, replicated commit log service. It provides
the functionality of a messaging system, but with a unique design.

Kafka
Architecture
A two server Kafka cluster hosting four partitions (P0-P3) with two
consumer groups. Consumer group A has two consumer instances and
group B has four.
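The arrangement in the figure can be modelled in a few lines. The sketch below is only an illustration: it assumes a simple round-robin split, and the names `assign_partitions`, `A1`, `B1` and so on are hypothetical. It is not Kafka's own rebalancing code, but it shows the rule that matters: within one consumer group each partition is read by exactly one consumer, while every group sees all four partitions.

```python
def assign_partitions(partitions, consumers):
    """Split partitions among the consumers of ONE group (illustrative only)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        # Round-robin: partition i goes to consumer i mod N.
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = ["P0", "P1", "P2", "P3"]
group_a = assign_partitions(partitions, ["A1", "A2"])
group_b = assign_partitions(partitions, ["B1", "B2", "B3", "B4"])
print(group_a)
print(group_b)
```

With two instances, each consumer in group A handles two partitions; with four instances, each consumer in group B handles one.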

Kafka
ZooKeeper
zookeeperServer.sh
bin/zookeeper-server-start.sh ../config/zookeeper.properties
Configuration
• dataDir
• clientPort
• maxClientCnxns

Kafka
Kafka Server
kafkaServer.sh
bin/kafka-server-start.sh ../config/server.properties
Mandatory configuration
• broker.id
• log.dirs
• zookeeper.connect
Optional configuration
• Log basics
• num.partitions
• Log Retention Policy
• log.retention.hours
• log.flush.interval.messages
• log.flush.interval.ms

Kafka
Create Topics
createTopics.sh
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic $1
Parameters
• replication-factor
• partitions
• topic
Configuration (--config)
• max.message.bytes
• index.interval.bytes
• flush.messages
• flush.ms
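As a rough sketch of how the parameters and the per-topic overrides fit together, the helper below assembles the argument list for `kafka-topics.sh`. The function name `create_topic_command` and the override values (64000 and 1) are made up for illustration; the only thing taken from the tool itself is that each override is passed as a separate `--config key=value` pair.

```python
def create_topic_command(topic, partitions=1, replication_factor=1,
                         zookeeper="localhost:2181", config=None):
    """Build the argument list for kafka-topics.sh --create."""
    cmd = ["bin/kafka-topics.sh", "--create",
           "--zookeeper", zookeeper,
           "--replication-factor", str(replication_factor),
           "--partitions", str(partitions),
           "--topic", topic]
    # Each per-topic override is passed as its own --config key=value pair.
    for key, value in sorted((config or {}).items()):
        cmd += ["--config", f"{key}={value}"]
    return cmd

# Hypothetical override values, chosen only for the example.
cmd = create_topic_command("voteLog",
                           config={"max.message.bytes": 64000,
                                   "flush.messages": 1})
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run` from the Kafka installation directory; the sketch only builds and prints it.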

Kafka
Check Topics
checkTopics.sh
bin/kafka-topics.sh --list --zookeeper localhost:2181

Kafka
Producer
createProducer.sh
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic $1
• metadata.broker.list
• request.required.acks
• producer.type
• serializer.class
• compression.codec
• request.timeout.ms

Kafka
Consumer
createConsumer.sh
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic $1 --from-beginning
• group.id
• zookeeper.connect
• fetch.message.max.bytes
• consumer.id

Kafka
Clients
• Producer Daemon
• Python
• Go (AKA golang)
• C
• C++
• .NET
• Ruby
• Storm
• Scala DSL
• HTTP REST
• JRuby
• Perl
• Clojure
• Node.js

Kafka
Multi-Broker
createMultiBroker.sh
cp config/server.properties config/server-1.properties
cp config/server.properties config/server-2.properties
config/server-1.properties:
broker.id=1
port=9093
log.dir=/tmp/kafka-logs-1
config/server-2.properties:
broker.id=2
port=9094
log.dir=/tmp/kafka-logs-2
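The copy-and-edit step above can also be scripted. The sketch below is one possible way to do it, not part of the tutorial's scripts: `derive_broker_config` is a hypothetical helper, and `base` is a shortened stand-in for the real `config/server.properties`. Because all brokers run on the same machine, each one needs its own id, port and log directory.

```python
def derive_broker_config(base_text, broker_id, port, log_dir):
    """Return a copy of server.properties text with three keys overridden."""
    overrides = {"broker.id": str(broker_id), "port": str(port),
                 "log.dir": log_dir}
    out, seen = [], set()
    for line in base_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in overrides:
            out.append(f"{key}={overrides[key]}")
            seen.add(key)
        else:
            out.append(line)  # comments and other settings pass through
    # Append any override that the base file did not define.
    out += [f"{k}={v}" for k, v in overrides.items() if k not in seen]
    return "\n".join(out) + "\n"

# Assumed, shortened contents of config/server.properties.
base = ("broker.id=0\nport=9092\nlog.dir=/tmp/kafka-logs\n"
        "zookeeper.connect=localhost:2181\n")
for n in (1, 2):
    print(f"# config/server-{n}.properties")
    print(derive_broker_config(base, n, 9092 + n, f"/tmp/kafka-logs-{n}"))
```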

Kafka
Create Kafka Server
Kafka Server 1
../bin/kafka-server-start.sh config/server-1.properties &
Kafka Server 2
../bin/kafka-server-start.sh config/server-2.properties &

Topic with replication
Create new topic
../bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
Show topic
../bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic

Fault Tolerance
Kill a replica broker
ps -ef | grep server-1.properties
kill -9 <pid>
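After the broker is killed, describing the topic again should show a different leader and a smaller in-sync replica set. The toy model below illustrates that expectation. It is not Kafka's controller logic: `fail_broker` is a hypothetical helper, the starting state is an assumed example, and picking the first surviving in-sync replica as leader is a simplification.

```python
def fail_broker(partition, dead_broker):
    """Return the partition state after one broker dies (toy model)."""
    isr = [b for b in partition["isr"] if b != dead_broker]
    leader = partition["leader"]
    if leader == dead_broker:
        # Promote the first surviving in-sync replica, if there is one.
        leader = isr[0] if isr else None
    return {"leader": leader, "replicas": partition["replicas"], "isr": isr}

# Assumed state of my-replicated-topic before the kill.
before = {"leader": 1, "replicas": [1, 2, 0], "isr": [1, 2, 0]}
after = fail_broker(before, dead_broker=1)
print(after)
```

The replica list is unchanged because it records where copies are assigned; only the leader and the in-sync set react to the failure.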

Storm
What is Storm?
• Computation platform for stream data processing
• Fault Tolerant
• Scalable
• Distributed
• Reliable
• Learn, code and run

Architecture
Fig. 1: Storm Cluster

Spouts & Bolts
Fig. 2: Spouts & Bolts

Physical & Logical
Fig. 3: Physical & Logical Architecture

Before coding
• Install Maven or Gradle
• Install Eclipse (only if you want to)

Coding a Spout
Structure
• import libraries
• public class SpoutName extends BaseRichSpout
• class variables
• public void open(Map conf, TopologyContext topologyContext, SpoutOutputCollector collector)
• public void nextTuple()
• public void declareOutputFields(OutputFieldsDeclarer declarer)
• Your methods

Coding a Bolt
Structure
• public class BoltName extends BaseRichBolt
• class variables
• public BoltName() (constructor)
• public void prepare(Map map, TopologyContext topologyContext, OutputCollector collector)
• public void execute(Tuple input)
• public void declareOutputFields(OutputFieldsDeclarer declarer)
• Your methods

Coding a Topology
Structure
• public class Topology
• class variables
• public static void main(String[] args)
• Config config = new Config()
• TopologyBuilder b = new TopologyBuilder()
• b.setSpout("SpoutName", new SpoutName())
• b.setBolt("BoltName", new BoltName()).shuffleGrouping("SpoutName")
• final LocalCluster cluster = new LocalCluster()
• cluster.submitTopology("TopologyName", config, b.createTopology())
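To see how the three pieces cooperate without a Storm installation, here is a toy model in plain Python. It only mirrors the lifecycle listed on the slides (open, then nextTuple for the spout; prepare, then execute for the bolt) and is not Storm's API: `SentenceSpout`, `WordCountBolt` and `run_local` are hypothetical names, and a list stands in for the collector.

```python
from collections import Counter, deque

class SentenceSpout:
    """Toy spout: emits one sentence per next_tuple() call."""
    def open(self, collector):
        self.collector = collector
        self.pending = deque(["storm reads kafka", "kafka feeds storm"])

    def next_tuple(self):
        if self.pending:
            self.collector.append(self.pending.popleft())

class WordCountBolt:
    """Toy bolt: counts the words of every tuple it receives."""
    def prepare(self):
        self.counts = Counter()

    def execute(self, sentence):
        self.counts.update(sentence.split())

def run_local(spout, bolt, steps):
    """Stand-in for a local cluster: wire spout to bolt and pump tuples."""
    queue = []
    spout.open(queue)
    bolt.prepare()
    for _ in range(steps):
        spout.next_tuple()              # the spout emits into the queue
        while queue:
            bolt.execute(queue.pop(0))  # the bolt consumes what was emitted
    return bolt.counts

counts = run_local(SentenceSpout(), WordCountBolt(), steps=5)
print(dict(counts))
```

In a real topology the framework calls these methods, and the spout and bolt may run in different processes.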

Compile & Run
• Download a Storm release, unpack it, and put the unpacked bin/ directory on your PATH.
• cd myapp
• mvn package
• storm jar target/my-app-1.0-SNAPSHOT.jar com.mycompany.app.App

Grouping
Fig. 4: Groupings

Grouping
• Shuffle grouping: tuples are randomly distributed across the bolt's tasks such that each task is guaranteed to get an equal number of tuples.
• Fields grouping: the stream is partitioned by the fields specified in the grouping, so tuples with equal values in those fields always go to the same task.
• All grouping: the stream is replicated across all of the bolt's tasks.
• Global grouping: the entire stream goes to a single one of the bolt's tasks.
• Direct grouping: the producer of the tuple decides which task of the consumer will receive it.
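The routing rules can be sketched as small functions that map a tuple to the task ids that receive it. This is a simplified model with hypothetical names, not Storm's implementation: shuffle is approximated by a uniform random choice, which balances load only on average, and fields grouping uses a CRC32 hash chosen for the example. Direct grouping is left out because the emitting code names the target task itself.

```python
import random
import zlib

def shuffle_grouping(tup, n_tasks, rng):
    """Pick one task at random."""
    return [rng.randrange(n_tasks)]

def fields_grouping(tup, n_tasks, fields):
    """Hash the selected field values so equal keys share a task."""
    key = "|".join(str(tup[f]) for f in fields).encode()
    return [zlib.crc32(key) % n_tasks]

def all_grouping(tup, n_tasks):
    """Replicate the tuple to every task."""
    return list(range(n_tasks))

def global_grouping(tup, n_tasks):
    """Send the whole stream to one task (here: the lowest id)."""
    return [0]

# Made-up vote tuples for the demonstration.
votes = [{"user": "ana", "vote": 1},
         {"user": "bob", "vote": 0},
         {"user": "ana", "vote": 0}]
rng = random.Random(0)
for v in votes:
    print(v["user"],
          shuffle_grouping(v, 4, rng),
          fields_grouping(v, 4, ["user"]),
          all_grouping(v, 4),
          global_grouping(v, 4))
```

Both tuples for user "ana" get the same task under fields grouping, which is what makes per-key state such as counters possible.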

Project Topology
Fig. 5: Project Topology

Web Services
Node.js
Install Node.js
https://github.com/joyent/node/archive/master.zip
./configure
make
make install
Run web services
node server.js

Kafka
Server Start
Stages
1. zookeeperServer.sh
2. kafkaServer.sh
3. createTopics.sh voteLog

Web Services
Connection Kafka
Install API Kafka-Python
pip install ./kafka-python
runKafkaLogs.sh
./tail2kafka/tail2kafka -l ../logs/vote-info.log -t voteLog -s localhost -p 9092 -d 5
Final stage
createProducer.sh voteLog
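The idea behind the log-forwarding step can be sketched as follows, assuming the tool's job is to follow the vote log and publish each new line to the topic. This is not the tail2kafka source: `forward_new_lines` is a hypothetical helper, and `send` is a placeholder callback that in a real setup would be a Kafka producer call for the voteLog topic.

```python
import os
import tempfile

def forward_new_lines(path, offset, send):
    """Send every complete line after byte `offset`; return the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # end of file or partial line: retry on the next poll
            send(line.rstrip(b"\r\n"))
            offset = f.tell()
    return offset

# Demo: a temporary file plays the log, a list plays the Kafka producer.
sent = []
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as log:
    log.write(b"vote=1\nvote=2\n")
log_path = log.name
pos = forward_new_lines(log_path, 0, sent.append)
with open(log_path, "ab") as log:
    log.write(b"vote=3\n")  # the web service appends a new vote
pos = forward_new_lines(log_path, pos, sent.append)
os.remove(log_path)
print(sent)
```

Calling the function in a loop with a sleep between polls gives tail-like behaviour; remembering the offset prevents lines from being sent twice.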

Questions?
