
Big Data Lambda Architecture - Streaming Layer Hands-On



This presentation is a hands-on guide to building a Big Data streaming pipeline on the AWS cloud platform using Apache Kafka, Apache Hadoop, Apache Spark, and Apache Cassandra.



  1. Big Data Pipeline Lambda Architecture - Streaming (Real-Time) Layer, with Apache Kafka, Apache Hadoop, Apache Spark, and Apache Cassandra on the Amazon Web Services Cloud Platform
  2. BIG Data Pipeline (diagram): Ingest → Store → Process → Visualize
  3. BIG Data Streaming (Real-Time) Layer Pipeline (diagram): ClickStream data, Apache web logs, log/data files, and TCP sockets are ingested through Apache Kafka, stream-processed with Spark Streaming and Spark SQL on a Spark cluster, stored in S3, HDFS, and Apache Cassandra, and visualized through an AngularJS web app with interactive queries.
  4. Install Kafka - 3-Node Cluster on AWS
  5. 3 EC2 instances for the Kafka cluster
  6. Repeat the following commands on all 3 EC2 instances for the Kafka cluster:

     cat /etc/*-release
     sudo add-apt-repository ppa:webupd8team/java
     sudo apt-get update
     sudo apt-get install oracle-java8-installer
     java -version
     mkdir kafka
     cd kafka
     wget http://download.nextag.com/apache/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz
     tar -zxvf kafka_2.11-0.10.0.0.tgz
     cd kafka_2.11-0.10.0.0

     Node addresses (private / public):
     ZooKeeper       ==> 172.31.48.208 / 52.91.1.93
     Kafka-datanode1 ==> 172.31.63.203 / 54.173.215.211
     Kafka-datanode2 ==> 172.31.9.25   / 54.226.29.194
  7. Modify config/server.properties for Kafka-datanode1 & Kafka-datanode2.

     Node addresses (private / public):
     ZooKeeper       ==> 172.31.48.208 / 52.91.1.93
     Kafka-datanode1 ==> 172.31.63.203 / 54.173.215.211
     Kafka-datanode2 ==> 172.31.9.25   / 54.226.29.194

     Kafka-datanode1 (set the following properties in config/server.properties):
     ubuntu@ip-172-31-63-203:~/kafka/kafka_2.11-0.10.0.0$ vi config/server.properties
     broker.id=1
     listeners=PLAINTEXT://172.31.63.203:9092
     advertised.listeners=PLAINTEXT://54.173.215.211:9092
     zookeeper.connect=52.91.1.93:2181

     Kafka-datanode2 (set the following properties in config/server.properties):
     ubuntu@ip-172-31-9-25:~/kafka/kafka_2.11-0.10.0.0$ vi config/server.properties
     broker.id=2
     listeners=PLAINTEXT://172.31.9.25:9092
     advertised.listeners=PLAINTEXT://54.226.29.194:9092
     zookeeper.connect=52.91.1.93:2181
  8. Launch ZooKeeper / datanode1 / datanode2.

     Node addresses (private / public):
     ZooKeeper       ==> 172.31.48.208 / 52.91.1.93
     Kafka-datanode1 ==> 172.31.63.203 / 54.173.215.211
     Kafka-datanode2 ==> 172.31.9.25   / 54.226.29.194

     1) Start ZooKeeper:
        bin/zookeeper-server-start.sh config/zookeeper.properties
     2) Start the server on Kafka-datanode1:
        bin/kafka-server-start.sh config/server.properties
     3) Start the server on Kafka-datanode2:
        bin/kafka-server-start.sh config/server.properties
     4) Create a topic & start a consumer (a quick Python smoke test is sketched below):
        bin/kafka-topics.sh --zookeeper 52.91.1.93:2181 --create --topic data --partitions 1 --replication-factor 2
        bin/kafka-console-consumer.sh --zookeeper 52.91.1.93:2181 --topic data --from-beginning
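     As an optional sanity check, the cluster can also be exercised from Python. This is a minimal sketch, assuming the kafka-python package is installed (pip install kafka-python); it publishes one message to the 'data' topic and then reads the topic back from the beginning:

     # Smoke test for the 2-broker cluster, assuming kafka-python.
     from kafka import KafkaProducer, KafkaConsumer

     brokers = '54.173.215.211:9092,54.226.29.194:9092'

     # Publish one test message to the 'data' topic created above
     producer = KafkaProducer(bootstrap_servers=brokers)
     producer.send('data', b'test-message')
     producer.flush()

     # Read the topic from the beginning; give up after 5 s of silence
     consumer = KafkaConsumer('data',
                              bootstrap_servers=brokers,
                              auto_offset_reset='earliest',
                              consumer_timeout_ms=5000)
     for msg in consumer:
         print(msg.value)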
  9.-11. Java - Kafka Producer Sample Application:

     package com.himanshu;

     import java.io.BufferedReader;
     import java.io.FileNotFoundException;
     import java.io.FileReader;
     import java.io.IOException;
     import java.util.Properties;

     import org.apache.kafka.clients.producer.Producer;
     import org.apache.kafka.clients.producer.KafkaProducer;
     import org.apache.kafka.clients.producer.ProducerRecord;

     public class DataProducer {
         public static void main(String[] args) {
             // Topic to publish to
             String topicName = "data";

             // Producer configuration
             Properties props = new Properties();
             // Public addresses of the two brokers
             props.put("bootstrap.servers", "54.173.215.211:9092,54.226.29.194:9092");
             // Wait for acknowledgement from all in-sync replicas
             props.put("acks", "all");
             // Do not retry automatically if a request fails
             props.put("retries", 0);
             // Batch size in bytes
             props.put("batch.size", 16384);
             // Wait up to 1 ms so records are batched into fewer requests
             props.put("linger.ms", 1);
             // Total memory (bytes) available to the producer for buffering
             props.put("buffer.memory", 33554432);
             props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
             props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

             Producer<String, String> producer = new KafkaProducer<String, String>(props);

             String csvFile = "/Users/himanshu/Documents/workspace/KafkaProducer/src/com/himanshu/invoice.txt";
             BufferedReader br = null;
             String lineInvoice;
             try {
                 br = new BufferedReader(new FileReader(csvFile));
                 // Send each CSV line as one Kafka message
                 while ((lineInvoice = br.readLine()) != null) {
                     producer.send(new ProducerRecord<String, String>(topicName, lineInvoice));
                     System.out.println("Message sent successfully....");
                 }
             } catch (FileNotFoundException e) {
                 e.printStackTrace();
             } catch (IOException e) {
                 e.printStackTrace();
             } finally {
                 producer.close();
                 if (br != null) {
                     try {
                         br.close();
                     } catch (IOException e) {
                         e.printStackTrace();
                     }
                 }
             }
         }
     }
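     A note on running it: the producer only needs the Kafka clients jar on the classpath; for this setup that is kafka-clients-0.10.0.0.jar, which ships in the libs/ directory of the kafka_2.11-0.10.0.0 download. The exact compile/run invocation depends on the local project setup.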
  12. Sample data to be sent to the Kafka server from the Java Kafka producer (CSV file)
  13. Message received on Kafka datanode1
  14. Real-Time Streaming with Apache Kafka, Apache Spark, and Apache Cassandra
  15. Launch the Kafka Cluster (ZooKeeper / Kafka datanode1 / Kafka datanode2)
  16. Execute the Python Kafka/Spark Job
  17. Sample data to be sent to the Kafka server from the Java Kafka producer (CSV file)
  18. Python Spark Job Processing Data from the AWS Kafka Cluster
  19. Python Spark Job Processing Data from the AWS Kafka Cluster, with Processed Data Stored in the AWS Cassandra Cluster (a schema sketch for the Cassandra side follows below)
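     The slides show the processed output landing in Cassandra but not the target schema. Here is a minimal sketch of what the keyspace and table setup could look like, using the DataStax cassandra-driver (pip install cassandra-driver); the keyspace, table, and column names are illustrative assumptions, not taken from the slides:

     # Illustrative Cassandra schema setup; keyspace, table, and
     # column names are hypothetical.
     from cassandra.cluster import Cluster

     cluster = Cluster(['<cassandra-node-ip>'])  # replace with a cluster node
     session = cluster.connect()

     session.execute("""
         CREATE KEYSPACE IF NOT EXISTS invoices
         WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}
     """)
     session.execute("""
         CREATE TABLE IF NOT EXISTS invoices.totals (
             invoice_id text PRIMARY KEY,
             amount double
         )
     """)
     cluster.shutdown()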
  20. Sample data to be sent to the Kafka server from the Java Kafka producer
  21. Python Spark Job Processing Data from the AWS Kafka Cluster, with Processed Data Stored in the AWS Cassandra Cluster
  22. Apache Spark UI
  23. Python Spark Streaming Application (a minimal sketch of such a job follows below)
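     The streaming application itself appears only as a screenshot. Below is a minimal sketch of a PySpark job of this shape, assuming the spark-streaming-kafka-0-8 integration (Kafka 0.10 brokers still speak the 0.8 protocol) and the DataStax cassandra-driver on the executors; the CSV column positions and the Cassandra keyspace/table reuse the illustrative names from the schema sketch above:

     # Sketch of a Kafka -> Spark Streaming -> Cassandra job.
     # Assumes spark-streaming-kafka-0-8 and cassandra-driver; the CSV
     # layout (invoice id in column 0, amount in column 2) is hypothetical.
     from pyspark import SparkContext
     from pyspark.streaming import StreamingContext
     from pyspark.streaming.kafka import KafkaUtils

     def save_to_cassandra(partition):
         # One Cassandra session per partition, not per record
         from cassandra.cluster import Cluster
         cluster = Cluster(['<cassandra-node-ip>'])  # hypothetical address
         session = cluster.connect('invoices')       # illustrative keyspace
         for invoice_id, amount in partition:
             session.execute(
                 "INSERT INTO totals (invoice_id, amount) VALUES (%s, %s)",
                 (invoice_id, amount))
         cluster.shutdown()

     sc = SparkContext(appName="KafkaSparkCassandra")
     ssc = StreamingContext(sc, 10)  # 10-second micro-batches

     # Read directly from the two brokers configured on the earlier slides
     stream = KafkaUtils.createDirectStream(
         ssc, ['data'],
         {"metadata.broker.list": "54.173.215.211:9092,54.226.29.194:9092"})

     # Each message is a (key, value) pair; the value is one CSV invoice line
     rows = stream.map(lambda kv: kv[1].split(','))
     totals = (rows.map(lambda cols: (cols[0], float(cols[2])))
                   .reduceByKey(lambda a, b: a + b))

     totals.foreachRDD(lambda rdd: rdd.foreachPartition(save_to_cassandra))

     ssc.start()
     ssc.awaitTermination()

     Such a job would be submitted with spark-submit, adding the matching spark-streaming-kafka artifact via the --packages option.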
  24. Thank You hkbhadraa@gmail.com
