Streaming Twitter data using Kafka
In this blog we will discuss how to stream Twitter data using Kafka. Before going
through this blog, make sure that Kafka and ZooKeeper are already installed on your system.
Please go through this blog to install Kafka.
Please go through this blog to install ZooKeeper.
Twitter provides the Hosebird Client (hbc), a robust Java HTTP library for consuming Twitter's Streaming
API.
Hosebird is the server implementation of the Twitter Streaming API. The Streaming API allows clients to
receive Tweets in near real time. Various resources allow filtered, sampled, or full access to some or all
Tweets. Every Twitter account has access to the Streaming API, and any developer can build
applications on it. Hosebird also powers the recently announced User Streams feature, which streams all
events related to a given user to drive desktop Twitter clients.
First, we will start the ZooKeeper and Kafka services.
Start the ZooKeeper server by moving into the bin folder of the ZooKeeper installation directory and
running the command
./zkServer.sh start
Start the Kafka server by moving into the bin folder of the Kafka installation directory and running the command
./kafka-server-start.sh ../config/server.properties
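Before running any client code, it can help to confirm that both services are actually accepting connections. The following is a minimal sketch of such a check, not part of the original project: the class name ServiceCheck is hypothetical, and the ports 2181 and 9092 are assumed from the connection strings used later in this blog.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class ServiceCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed default ports; change them if your configuration differs.
        System.out.println("ZooKeeper (2181) listening: " + isListening("localhost", 2181, 1000));
        System.out.println("Kafka (9092) listening: " + isListening("localhost", 9092, 1000));
    }
}
```

A successful connection only shows that something is listening on the port; it does not prove that the broker is healthy.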
In Kafka we work with two kinds of clients, called producers and consumers.
A producer creates new messages for a specific topic and, optionally, a specific partition.
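To make the idea of an optional partition more concrete: when a message carries a key, a producer usually picks the partition by hashing that key, so messages with the same key land in the same partition. The sketch below only illustrates this hash-and-modulo idea. It is not Kafka's own partitioner code, and the names PartitionSketch and partitionFor are hypothetical.

```java
final class PartitionSketch {

    // Maps a key to a partition index in the range [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Clear the sign bit so that negative hash codes still give a valid index.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4; // assumed partition count, for illustration only
        for (String key : new String[] {"user-1", "user-2", "user-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

The producer in this blog sends messages without a key, so it does not control which partition each tweet goes to.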
Twitter Producer Class
package kafka;

import java.util.Properties;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

import com.google.common.collect.Lists;
import com.twitter.hbc.ClientBuilder;
import com.twitter.hbc.core.Client;
import com.twitter.hbc.core.Constants;
import com.twitter.hbc.core.endpoint.StatusesFilterEndpoint;
import com.twitter.hbc.core.processor.StringDelimitedProcessor;
import com.twitter.hbc.httpclient.auth.Authentication;
import com.twitter.hbc.httpclient.auth.OAuth1;

public class TwitterKafkaProducer {

    // Kafka topic that the tweets are published to
    private static final String topic = "hadoop";

    public static void run() throws InterruptedException {
        // Kafka producer configuration
        Properties properties = new Properties();
        properties.put("metadata.broker.list", "localhost:9092");
        properties.put("serializer.class", "kafka.serializer.StringEncoder");
        properties.put("client.id", "camus");
        ProducerConfig producerConfig = new ProducerConfig(properties);
        kafka.javaapi.producer.Producer<String, String> producer =
                new kafka.javaapi.producer.Producer<String, String>(producerConfig);

        // Buffer between the Hosebird client and the Kafka producer
        BlockingQueue<String> queue = new LinkedBlockingQueue<String>(100000);

        // Terms used to filter the Twitter stream
        StatusesFilterEndpoint endpoint = new StatusesFilterEndpoint();
        endpoint.trackTerms(Lists.newArrayList("twitterapi", "#AAPSweep"));

        String consumerKey = TwitterSourceConstant.CONSUMER_KEY_KEY;
        String consumerSecret = TwitterSourceConstant.CONSUMER_SECRET_KEY;
        String accessToken = TwitterSourceConstant.ACCESS_TOKEN_KEY;
        String accessTokenSecret = TwitterSourceConstant.ACCESS_TOKEN_SECRET_KEY;
        Authentication auth = new OAuth1(consumerKey, consumerSecret,
                accessToken, accessTokenSecret);

        // The Hosebird client writes each incoming message into the queue
        Client client = new ClientBuilder().hosts(Constants.STREAM_HOST)
                .endpoint(endpoint).authentication(auth)
                .processor(new StringDelimitedProcessor(queue)).build();
        client.connect();

        // Forward up to 1000 messages from the queue to Kafka
        for (int msgRead = 0; msgRead < 1000; msgRead++) {
            KeyedMessage<String, String> message;
            try {
                message = new KeyedMessage<String, String>(topic, queue.take());
            } catch (InterruptedException e) {
                System.out.println("Stream ended");
                break; // stop instead of sending a null message
            }
            producer.send(message);
        }
        producer.close();
        client.stop();
    }

    public static void main(String[] args) {
        try {
            TwitterKafkaProducer.run();
        } catch (InterruptedException e) {
            System.out.println(e);
        }
    }
}
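The producer class relies on a bounded BlockingQueue to hand tweets from the Hosebird client thread to the loop that sends them to Kafka. The sketch below isolates that hand-off using only the Java standard library. It is an illustration under assumptions: a plain thread stands in for the Hosebird client, a list stands in for the Kafka producer, and the names QueueHandOffSketch and drain are hypothetical. Unlike the class above, it uses poll with a timeout instead of take, so the loop can stop when the stream goes quiet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class QueueHandOffSketch {

    // Takes up to maxMessages from the queue and stops early
    // if no message arrives within timeoutMs.
    static List<String> drain(BlockingQueue<String> queue, int maxMessages, long timeoutMs)
            throws InterruptedException {
        List<String> forwarded = new ArrayList<>();
        while (forwarded.size() < maxMessages) {
            String message = queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (message == null) {
                break; // nothing arrived in time, treat the stream as idle
            }
            forwarded.add(message); // a real loop would call producer.send(...) here
        }
        return forwarded;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(100);

        // Stand-in for the Hosebird client thread, which fills the queue.
        Thread source = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    queue.put("{\"text\":\"tweet " + i + "\"}");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        source.start();

        List<String> forwarded = drain(queue, 1000, 500);
        source.join();
        System.out.println("Forwarded " + forwarded.size() + " messages");
    }
}
```

Because the queue is bounded, a slow consumer side eventually makes the writing side wait or drop messages, which keeps memory use limited when tweets arrive faster than they can be sent.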
Twitter authorization is done through the consumerKey, consumerSecret, accessToken, and
accessTokenSecret values.
We pass them in through a class called TwitterSourceConstant:
public class TwitterSourceConstant {
    public static final String CONSUMER_KEY_KEY = "xxxxxxxxxxxxxxxxxxxxxxxx";
    public static final String CONSUMER_SECRET_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxx";
    public static final String ACCESS_TOKEN_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxx";
    public static final String ACCESS_TOKEN_SECRET_KEY = "xxxxxxxxxxxxxxxxxxxxxx";
}
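Hardcoding the four credentials in source code makes it easy to commit them by accident. As an alternative, here is a minimal sketch that reads them from environment variables instead. This is an assumption on our side, not what the project in this blog does: the class name TwitterCredentials and the variable names such as TWITTER_CONSUMER_KEY are hypothetical.

```java
import java.util.Map;

final class TwitterCredentials {

    // Returns the value of the named variable, or fails with a clear message if it is missing.
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        try {
            // Hypothetical variable names; pick whatever fits your setup.
            String consumerKey = require(env, "TWITTER_CONSUMER_KEY");
            String consumerSecret = require(env, "TWITTER_CONSUMER_SECRET");
            String accessToken = require(env, "TWITTER_ACCESS_TOKEN");
            String accessTokenSecret = require(env, "TWITTER_ACCESS_TOKEN_SECRET");
            System.out.println("All four credentials are set");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The returned strings can then be passed to the OAuth1 constructor in place of the TwitterSourceConstant fields.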
The line private static final String topic = "hadoop"; in the producer class sets the Kafka topic
to which the tweets are published. The terms passed to endpoint.trackTerms(...) decide which tweets
are streamed from Twitter.
We first need to start this producer class to begin streaming data from Twitter.
Now we will write a consumer class to print the streamed tweets. The consumer class is as follows:
package kafka;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class KafkaConsumer {

    private ConsumerConnector consumerConnector = null;
    // Must match the topic used by the producer
    private final String topic = "hadoop";

    public void initialize() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "testgroup");
        props.put("zookeeper.session.timeout.ms", "400");
        props.put("zookeeper.sync.time.ms", "300");
        props.put("auto.commit.interval.ms", "100");
        ConsumerConfig conConfig = new ConsumerConfig(props);
        consumerConnector = Consumer.createJavaConsumerConnector(conConfig);
    }

    public void consume() {
        // Key = topic name, Value = number of threads for the topic
        Map<String, Integer> topicCount = new HashMap<String, Integer>();
        topicCount.put(topic, 1);

        // ConsumerConnector creates the message streams for each topic
        Map<String, List<KafkaStream<byte[], byte[]>>> consumerStreams =
                consumerConnector.createMessageStreams(topicCount);

        // Get the Kafka streams for our topic
        List<KafkaStream<byte[], byte[]>> kStreamList = consumerStreams.get(topic);

        // Iterate over each stream using ConsumerIterator
        for (final KafkaStream<byte[], byte[]> kStreams : kStreamList) {
            ConsumerIterator<byte[], byte[]> consumerIte = kStreams.iterator();
            while (consumerIte.hasNext()) {
                System.out.println("Message consumed from topic[" + topic + "] : "
                        + new String(consumerIte.next().message()));
            }
        }

        // Shut down the consumer connector
        if (consumerConnector != null) {
            consumerConnector.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        KafkaConsumer kafkaConsumer = new KafkaConsumer();
        // Configure the Kafka consumer
        kafkaConsumer.initialize();
        // Start consumption
        kafkaConsumer.consume();
    }
}
When we run the above consumer class, it prints the tweets as they are collected.
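One detail worth noting in the consumer: it turns each message into text with new String(bytes), which uses the platform's default charset. Tweets often contain non-ASCII characters, so decoding with an explicit charset is safer. The sketch below assumes the messages are UTF-8 encoded JSON; the class name MessageDecoding is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class MessageDecoding {

    // Decodes a raw message payload as UTF-8, independent of the platform default charset.
    static String decode(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulated payload containing non-ASCII characters.
        byte[] raw = "{\"text\":\"caf\u00e9 \u2615\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(raw));
    }
}
```

In the consumer class, the same idea would apply to the byte array returned by consumerIte.next().message().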
We have built this project with Maven, and the pom.xml file is as follows:
pom.xml
<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.twitter</groupId>
  <artifactId>hbc-example</artifactId>
  <version>2.2.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>Hosebird Client Examples</name>
  <properties>
    <git.dir>${project.basedir}/../.git</git.dir>
    <!-- this makes maven-tools not bump us to snapshot versions -->
    <stabilized>true</stabilized>
    <!-- Fill these in via https://dev.twitter.com/apps -->
    <consumer.key>TODO</consumer.key>
    <consumer.secret>TODO</consumer.secret>
    <access.token>TODO</access.token>
    <access.token.secret>TODO</access.token.secret>
  </properties>
  <dependencies>
    <dependency>
      <groupId>com.twitter</groupId>
      <artifactId>hbc-twitter4j</artifactId>
      <version>2.2.0</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-simple</artifactId>
      <version>1.7.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_2.10</artifactId>
      <version>0.8.2.1</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-deploy-plugin</artifactId>
        <version>2.7</version>
        <configuration>
          <skip>true</skip>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>1.2.1</version>
      </plugin>
    </plugins>
  </build>
</project>
We need to run the producer and consumer programs in Eclipse. First, run the producer
to stream the tweets from Twitter.
The Eclipse console of the producer is as shown in the screenshot.
Now let's run the Kafka consumer class. The console of the consumer with the collected tweets is as
shown in the screenshot below.
Here we have collected the tweets published to the hadoop topic that we set in the producer class.
We can also list the topics that currently exist in Kafka by using the command
./kafka-topics.sh --zookeeper localhost:2181 --list
We can also watch the tweets being collected in real time in a separate console consumer
by using the command below:
./kafka-console-consumer.sh --zookeeper localhost:2181 --topic "hadoop" --from-beginning
Below is the screenshot of the console consumer with the tweets.
So this is how we collect streaming data from Twitter using Kafka.
We hope this blog helped you understand how to collect streaming data from Twitter using
Kafka. Keep visiting our site www.acadgild.com for more updates on Big Data and other
technologies.

More Related Content

What's hot

Hands-on Lab: Comparing Redis with Relational
Hands-on Lab: Comparing Redis with RelationalHands-on Lab: Comparing Redis with Relational
Hands-on Lab: Comparing Redis with Relational
Amazon Web Services
 
PowerShell Technical Overview
PowerShell Technical OverviewPowerShell Technical Overview
PowerShell Technical Overview
allandcp
 
Small pieces loosely joined
Small pieces loosely joinedSmall pieces loosely joined
Small pieces loosely joined
ennui2342
 
Moving Drupal to the Cloud
Moving Drupal to the CloudMoving Drupal to the Cloud
Moving Drupal to the Cloud
Ari Davidow
 
Automated testing web services - part 1
Automated testing web services - part 1Automated testing web services - part 1
Automated testing web services - part 1
Aleh Struneuski
 
Hands-on Lab - Combaring Redis with Relational
Hands-on Lab - Combaring Redis with RelationalHands-on Lab - Combaring Redis with Relational
Hands-on Lab - Combaring Redis with Relational
Amazon Web Services
 
Paul Lammertsma: Account manager & sync
Paul Lammertsma: Account manager & syncPaul Lammertsma: Account manager & sync
Paul Lammertsma: Account manager & sync
mdevtalk
 
Антипаттерны модульного тестирования
Антипаттерны модульного тестированияАнтипаттерны модульного тестирования
Антипаттерны модульного тестирования
MitinPavel
 
Csphtp1 22
Csphtp1 22Csphtp1 22
Csphtp1 22
HUST
 
Alert Logic
Alert LogicAlert Logic
Alert Logic
Amazon Web Services
 
Apache cassandra - future without boundaries (part3)
Apache cassandra - future without boundaries (part3)Apache cassandra - future without boundaries (part3)
Apache cassandra - future without boundaries (part3)
Return on Intelligence
 
MySQL Monitoring using Prometheus & Grafana
MySQL Monitoring using Prometheus & GrafanaMySQL Monitoring using Prometheus & Grafana
MySQL Monitoring using Prometheus & Grafana
YoungHeon (Roy) Kim
 
AWS IoT 핸즈온 워크샵 - 실습 3. AWS IoT Thing Shadow (김무현 솔루션즈 아키텍트)
AWS IoT 핸즈온 워크샵 - 실습 3. AWS IoT Thing Shadow (김무현 솔루션즈 아키텍트)AWS IoT 핸즈온 워크샵 - 실습 3. AWS IoT Thing Shadow (김무현 솔루션즈 아키텍트)
AWS IoT 핸즈온 워크샵 - 실습 3. AWS IoT Thing Shadow (김무현 솔루션즈 아키텍트)
Amazon Web Services Korea
 
Next generation message driven systems with Akka
Next generation message driven systems with AkkaNext generation message driven systems with Akka
Next generation message driven systems with Akka
Johan Andrén
 
How to send gzipped requests with boto3
How to send gzipped requests with boto3How to send gzipped requests with boto3
How to send gzipped requests with boto3
Luciano Mammino
 
Windows PowerShell
Windows PowerShellWindows PowerShell
Windows PowerShell
Sandun Perera
 
String.fromCharCode(60)script>alert("XSS")String.fromCharCode(60)/script>
String.fromCharCode(60)script>alert("XSS")String.fromCharCode(60)/script>String.fromCharCode(60)script>alert("XSS")String.fromCharCode(60)/script>
String.fromCharCode(60)script>alert("XSS")String.fromCharCode(60)/script>
Muhammad Sohail
 
Open sourcing the store
Open sourcing the storeOpen sourcing the store
Open sourcing the store
Mike Nakhimovich
 
AllegroCache with Web Servers
AllegroCache with Web ServersAllegroCache with Web Servers
AllegroCache with Web Servers
webhostingguy
 
Programming Sideways: Asynchronous Techniques for Android
Programming Sideways: Asynchronous Techniques for AndroidProgramming Sideways: Asynchronous Techniques for Android
Programming Sideways: Asynchronous Techniques for Android
Emanuele Di Saverio
 

What's hot (20)

Hands-on Lab: Comparing Redis with Relational
Hands-on Lab: Comparing Redis with RelationalHands-on Lab: Comparing Redis with Relational
Hands-on Lab: Comparing Redis with Relational
 
PowerShell Technical Overview
PowerShell Technical OverviewPowerShell Technical Overview
PowerShell Technical Overview
 
Small pieces loosely joined
Small pieces loosely joinedSmall pieces loosely joined
Small pieces loosely joined
 
Moving Drupal to the Cloud
Moving Drupal to the CloudMoving Drupal to the Cloud
Moving Drupal to the Cloud
 
Automated testing web services - part 1
Automated testing web services - part 1Automated testing web services - part 1
Automated testing web services - part 1
 
Hands-on Lab - Combaring Redis with Relational
Hands-on Lab - Combaring Redis with RelationalHands-on Lab - Combaring Redis with Relational
Hands-on Lab - Combaring Redis with Relational
 
Paul Lammertsma: Account manager & sync
Paul Lammertsma: Account manager & syncPaul Lammertsma: Account manager & sync
Paul Lammertsma: Account manager & sync
 
Антипаттерны модульного тестирования
Антипаттерны модульного тестированияАнтипаттерны модульного тестирования
Антипаттерны модульного тестирования
 
Csphtp1 22
Csphtp1 22Csphtp1 22
Csphtp1 22
 
Alert Logic
Alert LogicAlert Logic
Alert Logic
 
Apache cassandra - future without boundaries (part3)
Apache cassandra - future without boundaries (part3)Apache cassandra - future without boundaries (part3)
Apache cassandra - future without boundaries (part3)
 
MySQL Monitoring using Prometheus & Grafana
MySQL Monitoring using Prometheus & GrafanaMySQL Monitoring using Prometheus & Grafana
MySQL Monitoring using Prometheus & Grafana
 
AWS IoT 핸즈온 워크샵 - 실습 3. AWS IoT Thing Shadow (김무현 솔루션즈 아키텍트)
AWS IoT 핸즈온 워크샵 - 실습 3. AWS IoT Thing Shadow (김무현 솔루션즈 아키텍트)AWS IoT 핸즈온 워크샵 - 실습 3. AWS IoT Thing Shadow (김무현 솔루션즈 아키텍트)
AWS IoT 핸즈온 워크샵 - 실습 3. AWS IoT Thing Shadow (김무현 솔루션즈 아키텍트)
 
Next generation message driven systems with Akka
Next generation message driven systems with AkkaNext generation message driven systems with Akka
Next generation message driven systems with Akka
 
How to send gzipped requests with boto3
How to send gzipped requests with boto3How to send gzipped requests with boto3
How to send gzipped requests with boto3
 
Windows PowerShell
Windows PowerShellWindows PowerShell
Windows PowerShell
 
String.fromCharCode(60)script>alert("XSS")String.fromCharCode(60)/script>
String.fromCharCode(60)script>alert("XSS")String.fromCharCode(60)/script>String.fromCharCode(60)script>alert("XSS")String.fromCharCode(60)/script>
String.fromCharCode(60)script>alert("XSS")String.fromCharCode(60)/script>
 
Open sourcing the store
Open sourcing the storeOpen sourcing the store
Open sourcing the store
 
AllegroCache with Web Servers
AllegroCache with Web ServersAllegroCache with Web Servers
AllegroCache with Web Servers
 
Programming Sideways: Asynchronous Techniques for Android
Programming Sideways: Asynchronous Techniques for AndroidProgramming Sideways: Asynchronous Techniques for Android
Programming Sideways: Asynchronous Techniques for Android
 

Viewers also liked

Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
Steven Wu
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
Allen (Xiaozhong) Wang
 
Balancing Discovery and Continuation in Recommendations
Balancing Discovery and Continuation in RecommendationsBalancing Discovery and Continuation in Recommendations
Balancing Discovery and Continuation in Recommendations
Mohammad Hossein Taghavi
 
(Some) pitfalls of distributed learning
(Some) pitfalls of distributed learning(Some) pitfalls of distributed learning
(Some) pitfalls of distributed learning
Yves Raimond
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
Justin Basilico
 
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...
Dawen Liang
 

Viewers also liked (6)

Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Balancing Discovery and Continuation in Recommendations
Balancing Discovery and Continuation in RecommendationsBalancing Discovery and Continuation in Recommendations
Balancing Discovery and Continuation in Recommendations
 
(Some) pitfalls of distributed learning
(Some) pitfalls of distributed learning(Some) pitfalls of distributed learning
(Some) pitfalls of distributed learning
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
 
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...
 

Similar to Streaming twitter data using kafka

Securing your Pulsar Cluster with Vault_Chris Kellogg
Securing your Pulsar Cluster with Vault_Chris KelloggSecuring your Pulsar Cluster with Vault_Chris Kellogg
Securing your Pulsar Cluster with Vault_Chris Kellogg
StreamNative
 
Servlets
ServletsServlets
Servlets
Geethu Mohan
 
Multi Client Development with Spring
Multi Client Development with SpringMulti Client Development with Spring
Multi Client Development with Spring
Joshua Long
 
Windows 8 metro applications
Windows 8 metro applicationsWindows 8 metro applications
Windows 8 metro applications
Alex Golesh
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!
Guido Schmutz
 
Bot builder v4 HOL
Bot builder v4 HOLBot builder v4 HOL
Bot builder v4 HOL
Cheah Eng Soon
 
Efficient HTTP Apis
Efficient HTTP ApisEfficient HTTP Apis
Efficient HTTP Apis
Adrian Cole
 
Spark Streaming Info
Spark Streaming InfoSpark Streaming Info
Spark Streaming Info
Doug Chang
 
Finatra v2
Finatra v2Finatra v2
Finatra v2
Steve Cosenza
 
Networked APIs with swift
Networked APIs with swiftNetworked APIs with swift
Networked APIs with swift
Tim Burks
 
Kafka Spark Realtime stream processing and analytics in 6 steps
Kafka Spark Realtime stream processing and analytics in 6 stepsKafka Spark Realtime stream processing and analytics in 6 steps
Kafka Spark Realtime stream processing and analytics in 6 steps
Azmath Mohamad
 
Ibm mq with c# sending and receiving messages
Ibm mq with c# sending and receiving messagesIbm mq with c# sending and receiving messages
Ibm mq with c# sending and receiving messages
Shreesha Rao
 
#3 (Multi Threads With TCP)
#3 (Multi Threads With TCP)#3 (Multi Threads With TCP)
#3 (Multi Threads With TCP)
Ghadeer AlHasan
 
High Performance RPC with Finagle
High Performance RPC with FinagleHigh Performance RPC with Finagle
High Performance RPC with Finagle
Samir Bessalah
 
Vert.x for Microservices Architecture
Vert.x for Microservices ArchitectureVert.x for Microservices Architecture
Vert.x for Microservices Architecture
Idan Fridman
 
10 Excellent Ways to Secure Spring Boot Applications - Okta Webinar 2020
10 Excellent Ways to Secure Spring Boot Applications - Okta Webinar 202010 Excellent Ways to Secure Spring Boot Applications - Okta Webinar 2020
10 Excellent Ways to Secure Spring Boot Applications - Okta Webinar 2020
Matt Raible
 
[NDC 2019] Enterprise-Grade Serverless
[NDC 2019] Enterprise-Grade Serverless[NDC 2019] Enterprise-Grade Serverless
[NDC 2019] Enterprise-Grade Serverless
KatyShimizu
 
[NDC 2019] Functions 2.0: Enterprise-Grade Serverless
[NDC 2019] Functions 2.0: Enterprise-Grade Serverless[NDC 2019] Functions 2.0: Enterprise-Grade Serverless
[NDC 2019] Functions 2.0: Enterprise-Grade Serverless
KatyShimizu
 
以Device Shadows與Rules Engine串聯實體世界
以Device Shadows與Rules Engine串聯實體世界以Device Shadows與Rules Engine串聯實體世界
以Device Shadows與Rules Engine串聯實體世界
Amazon Web Services
 
SignalR
SignalRSignalR
SignalR
LearningTech
 

Similar to Streaming twitter data using kafka (20)

Securing your Pulsar Cluster with Vault_Chris Kellogg
Securing your Pulsar Cluster with Vault_Chris KelloggSecuring your Pulsar Cluster with Vault_Chris Kellogg
Securing your Pulsar Cluster with Vault_Chris Kellogg
 
Servlets
ServletsServlets
Servlets
 
Multi Client Development with Spring
Multi Client Development with SpringMulti Client Development with Spring
Multi Client Development with Spring
 
Windows 8 metro applications
Windows 8 metro applicationsWindows 8 metro applications
Windows 8 metro applications
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!
 
Bot builder v4 HOL
Bot builder v4 HOLBot builder v4 HOL
Bot builder v4 HOL
 
Efficient HTTP Apis
Efficient HTTP ApisEfficient HTTP Apis
Efficient HTTP Apis
 
Spark Streaming Info
Spark Streaming InfoSpark Streaming Info
Spark Streaming Info
 
Finatra v2
Finatra v2Finatra v2
Finatra v2
 
Networked APIs with swift
Networked APIs with swiftNetworked APIs with swift
Networked APIs with swift
 
Kafka Spark Realtime stream processing and analytics in 6 steps
Kafka Spark Realtime stream processing and analytics in 6 stepsKafka Spark Realtime stream processing and analytics in 6 steps
Kafka Spark Realtime stream processing and analytics in 6 steps
 
Ibm mq with c# sending and receiving messages
Ibm mq with c# sending and receiving messagesIbm mq with c# sending and receiving messages
Ibm mq with c# sending and receiving messages
 
#3 (Multi Threads With TCP)
#3 (Multi Threads With TCP)#3 (Multi Threads With TCP)
#3 (Multi Threads With TCP)
 
High Performance RPC with Finagle
High Performance RPC with FinagleHigh Performance RPC with Finagle
High Performance RPC with Finagle
 
Vert.x for Microservices Architecture
Vert.x for Microservices ArchitectureVert.x for Microservices Architecture
Vert.x for Microservices Architecture
 
10 Excellent Ways to Secure Spring Boot Applications - Okta Webinar 2020
10 Excellent Ways to Secure Spring Boot Applications - Okta Webinar 202010 Excellent Ways to Secure Spring Boot Applications - Okta Webinar 2020
10 Excellent Ways to Secure Spring Boot Applications - Okta Webinar 2020
 
[NDC 2019] Enterprise-Grade Serverless
[NDC 2019] Enterprise-Grade Serverless[NDC 2019] Enterprise-Grade Serverless
[NDC 2019] Enterprise-Grade Serverless
 
[NDC 2019] Functions 2.0: Enterprise-Grade Serverless
[NDC 2019] Functions 2.0: Enterprise-Grade Serverless[NDC 2019] Functions 2.0: Enterprise-Grade Serverless
[NDC 2019] Functions 2.0: Enterprise-Grade Serverless
 
以Device Shadows與Rules Engine串聯實體世界
以Device Shadows與Rules Engine串聯實體世界以Device Shadows與Rules Engine串聯實體世界
以Device Shadows與Rules Engine串聯實體世界
 
SignalR
SignalRSignalR
SignalR
 

Recently uploaded

Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
Celine George
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
heathfieldcps1
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
Dr. Mulla Adam Ali
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
Priyankaranawat4
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 
DRUGS AND ITS classification slide share
DRUGS AND ITS classification slide shareDRUGS AND ITS classification slide share
DRUGS AND ITS classification slide share
taiba qazi
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
Nguyen Thanh Tu Collection
 
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
IreneSebastianRueco1
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
AyyanKhan40
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
Nicholas Montgomery
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
Celine George
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
Celine George
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
adhitya5119
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Excellence Foundation for South Sudan
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
Colégio Santa Teresinha
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
National Information Standards Organization (NISO)
 

Recently uploaded (20)

Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf