Big Data Pipeline
Lambda Architecture - Streaming (Real-Time) Layer
with Apache Kafka, Apache Hadoop, Apache Spark, and Apache Cassandra
on Amazon Web Services Cloud Platform
Big Data Pipeline: Ingest, Store, Process, Visualize
[Diagram: Data Pipeline. Stage labels: Ingest, Stream, Process, Store, Visualize. Components shown: ClickStream data, Apache web logs, log/data files, TCP sockets, Apache Kafka, Spark Streaming, Spark SQL, Spark cluster (interactive queries), S3, HDFS, Apache Cassandra, AngularJS web app.]
Big Data Streaming (Real-Time) Layer Pipeline
Install Kafka - 3-Node Cluster on AWS
Use 3 EC2 instances for the Kafka cluster.
Repeat the following commands on each of the 3 EC2 instances:
cat /etc/*-release
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
java -version
mkdir kafka
cd kafka
wget http://download.nextag.com/apache/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz
tar -zxvf kafka_2.11-0.10.0.0.tgz
cd kafka_2.11-0.10.0.0
Nodes (private IP / public IP):
ZooKeeper ==> 172.31.48.208 / 52.91.1.93
Kafka-datanode1 ==> 172.31.63.203 / 54.173.215.211
Kafka-datanode2 ==> 172.31.9.25 / 54.226.29.194
Modify config/server.properties on Kafka-datanode1 and Kafka-datanode2
Kafka-datanode1 (set the following properties in config/server.properties)
ubuntu@ip-172-31-63-203:~/kafka/kafka_2.11-0.10.0.0$ vi config/server.properties
broker.id=1
listeners=PLAINTEXT://172.31.63.203:9092
advertised.listeners=PLAINTEXT://54.173.215.211:9092
zookeeper.connect=52.91.1.93:2181
Kafka-datanode2 (set the following properties in config/server.properties)
ubuntu@ip-172-31-9-25:~/kafka/kafka_2.11-0.10.0.0$ vi config/server.properties
broker.id=2
listeners=PLAINTEXT://172.31.9.25:9092
advertised.listeners=PLAINTEXT://54.226.29.194:9092
zookeeper.connect=52.91.1.93:2181
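The two broker files above differ only in broker.id and in the private and public addresses. As a rough sketch (the class and method names are hypothetical and not part of Kafka), the same four settings can be rendered from those inputs. This makes the role of each address explicit: listeners is the address the broker binds to (here the instance's private IP), while advertised.listeners is the address the broker publishes for clients to connect to (here the public IP).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: renders the four per-broker settings shown above.
class BrokerConfigSketch {

    // Builds the settings in the order they appear in config/server.properties.
    static Map<String, String> brokerSettings(int brokerId, String privateIp,
                                              String publicIp, String zookeeperAddr) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("broker.id", Integer.toString(brokerId));
        // Address the broker binds to on the EC2 instance.
        settings.put("listeners", "PLAINTEXT://" + privateIp + ":9092");
        // Address the broker publishes for clients to connect to.
        settings.put("advertised.listeners", "PLAINTEXT://" + publicIp + ":9092");
        settings.put("zookeeper.connect", zookeeperAddr);
        return settings;
    }

    // Formats the settings as key=value lines.
    static String render(Map<String, String> settings) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            out.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Values taken from the node list in this deck.
        System.out.print(render(brokerSettings(1, "172.31.63.203", "54.173.215.211", "52.91.1.93:2181")));
        System.out.println();
        System.out.print(render(brokerSettings(2, "172.31.9.25", "54.226.29.194", "52.91.1.93:2181")));
    }
}
```

The output is only the four lines edited in this walkthrough; the rest of server.properties is assumed to keep its shipped defaults.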
Launch ZooKeeper / datanode1 / datanode2
1) Start ZooKeeper on the ZooKeeper node
bin/zookeeper-server-start.sh config/zookeeper.properties
2) Start server on Kafka-datanode1
bin/kafka-server-start.sh config/server.properties
3) Start server on Kafka-datanode2
bin/kafka-server-start.sh config/server.properties
4) Create the topic and start a console consumer
bin/kafka-topics.sh --zookeeper 52.91.1.93:2181 --create --topic data --partitions 1 --replication-factor 2
bin/kafka-console-consumer.sh --zookeeper 52.91.1.93:2181 --topic data --from-beginning
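Before running a producer against the cluster, it can help to confirm that ZooKeeper (port 2181) and both brokers (port 9092) accept TCP connections on the public addresses listed earlier. The following is a minimal sketch using only the JDK; the class and method names are hypothetical, and a successful connection only shows that the port is open, not that Kafka itself is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks that a host:port accepts TCP connections.
class PortCheckSketch {

    // Returns true if a TCP connection succeeds within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        // Public addresses from the node list in this deck.
        String[][] endpoints = {
            {"ZooKeeper", "52.91.1.93", "2181"},
            {"Kafka-datanode1", "54.173.215.211", "9092"},
            {"Kafka-datanode2", "54.226.29.194", "9092"},
        };
        for (String[] e : endpoints) {
            boolean up = canConnect(e[1], Integer.parseInt(e[2]), 3000);
            System.out.println(e[0] + " " + e[1] + ":" + e[2] + " -> " + (up ? "open" : "not reachable"));
        }
    }
}
```

If a port shows as not reachable from outside AWS, the EC2 security group rules for 2181 and 9092 are a likely place to look, though that is an assumption about this setup rather than something the deck states.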
Java - Kafka Producer Sample Application
package com.himanshu;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DataProducer {
    public static void main(String[] args) {
        // Topic to publish to. It is hard-coded here; it could be read from args[0] instead.
        String topicName = "data";

        // Producer configuration.
        Properties props = new Properties();
        // Brokers used for the initial connection (public IPs of the two Kafka nodes).
        props.put("bootstrap.servers", "54.173.215.211:9092,54.226.29.194:9092");
        // Wait until all in-sync replicas have acknowledged each record.
        props.put("acks", "all");
        // Number of automatic retries for a failed request; 0 disables retries.
        props.put("retries", 0);
        // Maximum size, in bytes, of a batch of records for one partition.
        props.put("batch.size", 16384);
        // Wait up to 1 ms for more records so they can be batched into fewer requests.
        props.put("linger.ms", 1);
        // Total memory, in bytes, available to the producer for buffering.
        props.put("buffer.memory", 33554432);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<String, String>(props);
        String csvFile = "/Users/himanshu/Documents/workspace/KafkaProducer/src/com/himanshu/invoice.txt";
        BufferedReader br = null;
        String lineInvoice = "";
        try {
            br = new BufferedReader(new FileReader(csvFile));
            // Send each CSV line as the value of one message (no key).
            while ((lineInvoice = br.readLine()) != null) {
                // send() is asynchronous: the record is queued and delivered in the background.
                producer.send(new ProducerRecord<String, String>(topicName, lineInvoice));
                System.out.println("Message sent successfully....");
            }
        }
        catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        finally {
            // close() flushes any records still buffered before releasing resources.
            producer.close();
            if (br != null) {
                try {
                    br.close();
                }
                catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
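The read loop in the producer can also be sketched separately from Kafka, which makes it possible to try it without a running cluster. In the sketch below the class name, the method name, and the sample rows are all hypothetical (the real columns of invoice.txt are not shown in this deck). Unlike the producer above, it skips blank lines, which is a design choice of the sketch rather than something the original does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: the producer's read loop, decoupled from Kafka.
class LineForwarderSketch {

    // Reads the source line by line, hands each non-blank line to the sink,
    // and returns how many lines were forwarded.
    static int forwardLines(Reader source, Consumer<String> sink) {
        int forwarded = 0;
        // try-with-resources closes the reader even if the sink throws.
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines rather than sending empty messages
                }
                sink.accept(line);
                forwarded++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return forwarded;
    }

    public static void main(String[] args) {
        // Made-up CSV rows standing in for invoice.txt.
        String sample = "1001,widget,2\n\n1002,gadget,5\n";
        List<String> sent = new ArrayList<>();
        int count = forwardLines(new StringReader(sample), sent::add);
        System.out.println(count + " lines forwarded: " + sent);
    }
}
```

With the producer above, the sink would be something like line -> producer.send(new ProducerRecord<String, String>(topicName, line)), and the source a FileReader on the CSV file.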
Sample data (CSV file) that the Java Kafka producer sends to the Kafka server
Messages received on Kafka-datanode1
Real-Time Streaming with Kafka, Apache Spark, and Apache Cassandra
Launch Kafka Cluster (ZooKeeper / Kafka-datanode1 / Kafka-datanode2)
Execute Python / Kafka Spark Job
Python Spark Job Processing Data from AWS Kafka Cluster
Python Spark Job Processing Data from AWS Kafka Cluster & Processed Data Stored in AWS Cassandra Cluster
Apache Spark UI
Python Spark Streaming Application
Thank You
hkbhadraa@gmail.com

More Related Content

What's hot

Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 PeopleKafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
confluent
 
Fully Fault tolerant Streaming Workflows at Scale using Apache Mesos & Spark ...
Fully Fault tolerant Streaming Workflows at Scale using Apache Mesos & Spark ...Fully Fault tolerant Streaming Workflows at Scale using Apache Mesos & Spark ...
Fully Fault tolerant Streaming Workflows at Scale using Apache Mesos & Spark ...Akhil Das
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
Amazon Web Services
 
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014
Amazon Web Services
 
Lab Manual Managed Database Basics
Lab Manual Managed Database BasicsLab Manual Managed Database Basics
Lab Manual Managed Database Basics
Amazon Web Services
 
Getting Started with ElastiCache for Redis
Getting Started with ElastiCache for RedisGetting Started with ElastiCache for Redis
Getting Started with ElastiCache for Redis
Amazon Web Services
 
Heat optimization
Heat optimizationHeat optimization
Heat optimization
Rico Lin
 
DataStax: Backup and Restore in Cassandra and OpsCenter
DataStax: Backup and Restore in Cassandra and OpsCenterDataStax: Backup and Restore in Cassandra and OpsCenter
DataStax: Backup and Restore in Cassandra and OpsCenter
DataStax Academy
 
Ansible with AWS
Ansible with AWSAnsible with AWS
Ansible with AWS
Allan Denot
 
Developing Terraform Modules at Scale - HashiTalks 2021
Developing Terraform Modules at Scale - HashiTalks 2021Developing Terraform Modules at Scale - HashiTalks 2021
Developing Terraform Modules at Scale - HashiTalks 2021
TomStraub5
 
Fraud Detection for Israel BigThings Meetup
Fraud Detection  for Israel BigThings MeetupFraud Detection  for Israel BigThings Meetup
Fraud Detection for Israel BigThings Meetup
Gwen (Chen) Shapira
 
Sparkstreaming
SparkstreamingSparkstreaming
Sparkstreaming
Marilyn Waldman
 
Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...
Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...
Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...
Spark Summit
 
Lab Manual Combaring Redis with Relational
Lab Manual Combaring Redis with RelationalLab Manual Combaring Redis with Relational
Lab Manual Combaring Redis with Relational
Amazon Web Services
 
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Chris Fregly
 
Cron in der Cloud - Die Top 10 Hitparade
Cron in der Cloud - Die Top 10 HitparadeCron in der Cloud - Die Top 10 Hitparade
Cron in der Cloud - Die Top 10 Hitparade
QAware GmbH
 
Docker and Maestro for fun, development and profit
Docker and Maestro for fun, development and profitDocker and Maestro for fun, development and profit
Docker and Maestro for fun, development and profit
Maxime Petazzoni
 
Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React...
Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React...Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React...
Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React...
Lightbend
 
Digdagによる大規模データ処理の自動化とエラー処理
Digdagによる大規模データ処理の自動化とエラー処理Digdagによる大規模データ処理の自動化とエラー処理
Digdagによる大規模データ処理の自動化とエラー処理
Sadayuki Furuhashi
 
Deploying Docker Containers at Scale with Mesos and Marathon
Deploying Docker Containers at Scale with Mesos and MarathonDeploying Docker Containers at Scale with Mesos and Marathon
Deploying Docker Containers at Scale with Mesos and Marathon
Discover Pinterest
 

What's hot (20)

Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 PeopleKafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
 
Fully Fault tolerant Streaming Workflows at Scale using Apache Mesos & Spark ...
Fully Fault tolerant Streaming Workflows at Scale using Apache Mesos & Spark ...Fully Fault tolerant Streaming Workflows at Scale using Apache Mesos & Spark ...
Fully Fault tolerant Streaming Workflows at Scale using Apache Mesos & Spark ...
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
 
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014
 
Lab Manual Managed Database Basics
Lab Manual Managed Database BasicsLab Manual Managed Database Basics
Lab Manual Managed Database Basics
 
Getting Started with ElastiCache for Redis
Getting Started with ElastiCache for RedisGetting Started with ElastiCache for Redis
Getting Started with ElastiCache for Redis
 
Heat optimization
Heat optimizationHeat optimization
Heat optimization
 
DataStax: Backup and Restore in Cassandra and OpsCenter
DataStax: Backup and Restore in Cassandra and OpsCenterDataStax: Backup and Restore in Cassandra and OpsCenter
DataStax: Backup and Restore in Cassandra and OpsCenter
 
Ansible with AWS
Ansible with AWSAnsible with AWS
Ansible with AWS
 
Developing Terraform Modules at Scale - HashiTalks 2021
Developing Terraform Modules at Scale - HashiTalks 2021Developing Terraform Modules at Scale - HashiTalks 2021
Developing Terraform Modules at Scale - HashiTalks 2021
 
Fraud Detection for Israel BigThings Meetup
Fraud Detection  for Israel BigThings MeetupFraud Detection  for Israel BigThings Meetup
Fraud Detection for Israel BigThings Meetup
 
Sparkstreaming
SparkstreamingSparkstreaming
Sparkstreaming
 
Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...
Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...
Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...
 
Lab Manual Combaring Redis with Relational
Lab Manual Combaring Redis with RelationalLab Manual Combaring Redis with Relational
Lab Manual Combaring Redis with Relational
 
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
 
Cron in der Cloud - Die Top 10 Hitparade
Cron in der Cloud - Die Top 10 HitparadeCron in der Cloud - Die Top 10 Hitparade
Cron in der Cloud - Die Top 10 Hitparade
 
Docker and Maestro for fun, development and profit
Docker and Maestro for fun, development and profitDocker and Maestro for fun, development and profit
Docker and Maestro for fun, development and profit
 
Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React...
Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React...Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React...
Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React...
 
Digdagによる大規模データ処理の自動化とエラー処理
Digdagによる大規模データ処理の自動化とエラー処理Digdagによる大規模データ処理の自動化とエラー処理
Digdagによる大規模データ処理の自動化とエラー処理
 
Deploying Docker Containers at Scale with Mesos and Marathon
Deploying Docker Containers at Scale with Mesos and MarathonDeploying Docker Containers at Scale with Mesos and Marathon
Deploying Docker Containers at Scale with Mesos and Marathon
 

Viewers also liked

Setup 3 Node Kafka Cluster on AWS - Hands On
Setup 3 Node Kafka Cluster on AWS - Hands OnSetup 3 Node Kafka Cluster on AWS - Hands On
Setup 3 Node Kafka Cluster on AWS - Hands On
hkbhadraa
 
Hadoop BIG Data - Fraud Detection with Real-Time Analytics
Hadoop BIG Data - Fraud Detection with Real-Time AnalyticsHadoop BIG Data - Fraud Detection with Real-Time Analytics
Hadoop BIG Data - Fraud Detection with Real-Time Analytics
hkbhadraa
 
Gamification
GamificationGamification
Gamification
hkbhadraa
 
Internet of things
Internet of thingsInternet of things
Internet of things
hkbhadraa
 
Kafka aws
Kafka awsKafka aws
Kafka aws
Ariel Moskovich
 
IBM Bluemix Cloud Platform Application Development with Eclipse IDE
IBM Bluemix Cloud Platform Application Development with Eclipse IDEIBM Bluemix Cloud Platform Application Development with Eclipse IDE
IBM Bluemix Cloud Platform Application Development with Eclipse IDE
hkbhadraa
 
Hadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersHadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, Providers
Mrigendra Sharma
 
Project management part 2
Project management part 2Project management part 2
Project management part 2
hkbhadraa
 
Project management part 5
Project management part 5Project management part 5
Project management part 5
hkbhadraa
 
Project management part 4
Project management part 4Project management part 4
Project management part 4
hkbhadraa
 
Решения Oracle для Big Data
Решения Oracle для Big DataРешения Oracle для Big Data
Решения Oracle для Big Data
Andrey Akulov
 
Project management part 1
Project management part 1Project management part 1
Project management part 1
hkbhadraa
 
Project management part 3
Project management part 3Project management part 3
Project management part 3
hkbhadraa
 
ACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics PatternsACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics PatternsSrinath Perera
 
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisDay 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Amazon Web Services
 
Anomaly Detection at Scale
Anomaly Detection at ScaleAnomaly Detection at Scale
Anomaly Detection at Scale
Jeff Henrikson
 
Real Time Data Infrastructure team overview
Real Time Data Infrastructure team overviewReal Time Data Infrastructure team overview
Real Time Data Infrastructure team overview
Monal Daxini
 
Unbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniUnbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxini
Monal Daxini
 
Airstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At AirbnbAirstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At Airbnb
Jen Aman
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Jen Aman
 

Viewers also liked (20)

Setup 3 Node Kafka Cluster on AWS - Hands On
Setup 3 Node Kafka Cluster on AWS - Hands OnSetup 3 Node Kafka Cluster on AWS - Hands On
Setup 3 Node Kafka Cluster on AWS - Hands On
 
Hadoop BIG Data - Fraud Detection with Real-Time Analytics
Hadoop BIG Data - Fraud Detection with Real-Time AnalyticsHadoop BIG Data - Fraud Detection with Real-Time Analytics
Hadoop BIG Data - Fraud Detection with Real-Time Analytics
 
Gamification
GamificationGamification
Gamification
 
Internet of things
Internet of thingsInternet of things
Internet of things
 
Kafka aws
Kafka awsKafka aws
Kafka aws
 
IBM Bluemix Cloud Platform Application Development with Eclipse IDE
IBM Bluemix Cloud Platform Application Development with Eclipse IDEIBM Bluemix Cloud Platform Application Development with Eclipse IDE
IBM Bluemix Cloud Platform Application Development with Eclipse IDE
 
Hadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersHadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, Providers
 
Project management part 2
Project management part 2Project management part 2
Project management part 2
 
Project management part 5
Project management part 5Project management part 5
Project management part 5
 
Project management part 4
Project management part 4Project management part 4
Project management part 4
 
Решения Oracle для Big Data
Решения Oracle для Big DataРешения Oracle для Big Data
Решения Oracle для Big Data
 
Project management part 1
Project management part 1Project management part 1
Project management part 1
 
Project management part 3
Project management part 3Project management part 3
Project management part 3
 
ACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics PatternsACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics Patterns
 
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisDay 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
 
Anomaly Detection at Scale
Anomaly Detection at ScaleAnomaly Detection at Scale
Anomaly Detection at Scale
 
Real Time Data Infrastructure team overview
Real Time Data Infrastructure team overviewReal Time Data Infrastructure team overview
Real Time Data Infrastructure team overview
 
Unbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniUnbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxini
 
Airstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At AirbnbAirstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At Airbnb
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
 

Similar to Big data lambda architecture - Streaming Layer Hands On

Continuous Delivery: The Next Frontier
Continuous Delivery: The Next FrontierContinuous Delivery: The Next Frontier
Continuous Delivery: The Next Frontier
Carlos Sanchez
 
Bare Metal to OpenStack with Razor and Chef
Bare Metal to OpenStack with Razor and ChefBare Metal to OpenStack with Razor and Chef
Bare Metal to OpenStack with Razor and Chef
Matt Ray
 
Salesforce at Stacki Atlanta Meetup February 2016
Salesforce at Stacki Atlanta Meetup February 2016Salesforce at Stacki Atlanta Meetup February 2016
Salesforce at Stacki Atlanta Meetup February 2016
StackIQ
 
DevOps Enabling Your Team
DevOps Enabling Your TeamDevOps Enabling Your Team
DevOps Enabling Your Team
GR8Conf
 
Python Deployment with Fabric
Python Deployment with FabricPython Deployment with Fabric
Python Deployment with Fabricandymccurdy
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
Joe Stein
 
Docker for Ruby Developers
Docker for Ruby DevelopersDocker for Ruby Developers
Docker for Ruby Developers
Aptible
 
Continuous Delivery with Maven, Puppet and Tomcat - ApacheCon NA 2013
Continuous Delivery with Maven, Puppet and Tomcat - ApacheCon NA 2013Continuous Delivery with Maven, Puppet and Tomcat - ApacheCon NA 2013
Continuous Delivery with Maven, Puppet and Tomcat - ApacheCon NA 2013
Carlos Sanchez
 
Automating everything with PowerShell, Terraform, and AWS
Automating everything with PowerShell, Terraform, and AWSAutomating everything with PowerShell, Terraform, and AWS
Automating everything with PowerShell, Terraform, and AWS
Chris Brown
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NET
confluent
 
KSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache Kafka
Kai Wähner
 
Lessons from running potentially malicious code inside containers
Lessons from running potentially malicious code inside containersLessons from running potentially malicious code inside containers
Lessons from running potentially malicious code inside containers
Ben Hall
 
Training
TrainingTraining
Training
HemantDunga1
 
Journey to Microservice architecture via Amazon Lambda
Journey to Microservice architecture via Amazon LambdaJourney to Microservice architecture via Amazon Lambda
Journey to Microservice architecture via Amazon Lambda
Axilis
 
Postgres the hardway
Postgres the hardwayPostgres the hardway
Postgres the hardway
Dave Pitts
 
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
HostedbyConfluent
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
Joe Stein
 
My First Big Data Application
My First Big Data ApplicationMy First Big Data Application
My First Big Data Application
Amazon Web Services
 

Similar to Big data lambda architecture - Streaming Layer Hands On (20)

Continuous Delivery: The Next Frontier
Continuous Delivery: The Next FrontierContinuous Delivery: The Next Frontier
Continuous Delivery: The Next Frontier
 
infra-as-code
infra-as-codeinfra-as-code
infra-as-code
 
Bare Metal to OpenStack with Razor and Chef
Bare Metal to OpenStack with Razor and ChefBare Metal to OpenStack with Razor and Chef
Bare Metal to OpenStack with Razor and Chef
 
Salesforce at Stacki Atlanta Meetup February 2016
Salesforce at Stacki Atlanta Meetup February 2016Salesforce at Stacki Atlanta Meetup February 2016
Salesforce at Stacki Atlanta Meetup February 2016
 
DevOps Enabling Your Team
DevOps Enabling Your TeamDevOps Enabling Your Team
DevOps Enabling Your Team
 
Python Deployment with Fabric
Python Deployment with FabricPython Deployment with Fabric
Python Deployment with Fabric
 
Network Manual
Network ManualNetwork Manual
Network Manual
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
 
Docker for Ruby Developers
Docker for Ruby DevelopersDocker for Ruby Developers
Docker for Ruby Developers
 
Continuous Delivery with Maven, Puppet and Tomcat - ApacheCon NA 2013
Continuous Delivery with Maven, Puppet and Tomcat - ApacheCon NA 2013Continuous Delivery with Maven, Puppet and Tomcat - ApacheCon NA 2013
Continuous Delivery with Maven, Puppet and Tomcat - ApacheCon NA 2013
 
Automating everything with PowerShell, Terraform, and AWS
Automating everything with PowerShell, Terraform, and AWSAutomating everything with PowerShell, Terraform, and AWS
Automating everything with PowerShell, Terraform, and AWS
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NET
 
KSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache Kafka
 
Lessons from running potentially malicious code inside containers
Lessons from running potentially malicious code inside containersLessons from running potentially malicious code inside containers
Lessons from running potentially malicious code inside containers
 
Training
TrainingTraining
Training
 
Journey to Microservice architecture via Amazon Lambda
Journey to Microservice architecture via Amazon LambdaJourney to Microservice architecture via Amazon Lambda
Journey to Microservice architecture via Amazon Lambda
 
Postgres the hardway
Postgres the hardwayPostgres the hardway
Postgres the hardway
 
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
My First Big Data Application
My First Big Data ApplicationMy First Big Data Application
My First Big Data Application
 

Recently uploaded

Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
Big data lambda architecture - Streaming Layer Hands On

  • 1. Big Data Pipeline: Lambda Architecture, Streaming (Real-Time) Layer, with Apache Kafka, Apache Hadoop, Apache Spark, and Apache Cassandra on the Amazon Web Services cloud platform
  • 2. Big Data Pipeline: Ingest, Store, Process, Visualize
  • 3. [Diagram: Big Data Streaming (Real-Time) Layer Pipeline. Stages: Ingest, Stream, Process, Store, Visualize. Components shown: clickstream data, Apache web logs, log/data files, TCP sockets, Apache Kafka, Spark Streaming and Spark SQL (Spark cluster), interactive queries, S3, HDFS, Apache Cassandra, AngularJS web app.]
  • 4. Install Kafka - 3 Node Cluster on AWS
  • 5. 3 EC2 instances for the Kafka cluster
  • 6. Repeat commands on all 3 EC2 instances for the Kafka cluster
        cat /etc/*-release
        sudo add-apt-repository ppa:webupd8team/java
        sudo apt-get update
        sudo apt-get install oracle-java8-installer
        java -version
        mkdir kafka
        cd kafka
        wget http://download.nextag.com/apache/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz
        tar -zxvf kafka_2.11-0.10.0.0.tgz
        cd kafka_2.11-0.10.0.0
      Nodes:
        ZooKeeper       ==> 172.31.48.208 / 52.91.1.93
        Kafka-datanode1 ==> 172.31.63.203 / 54.173.215.211
        Kafka-datanode2 ==> 172.31.9.25 / 54.226.29.194
  • 7. Modify config/server.properties for kafka-datanode1 & kafka-datanode2
        ZooKeeper       ==> 172.31.48.208 / 52.91.1.93
        Kafka-datanode1 ==> 172.31.63.203 / 54.173.215.211
        Kafka-datanode2 ==> 172.31.9.25 / 54.226.29.194
      Kafka-datanode1 (set the following properties in config/server.properties)
        ubuntu@ip-172-31-63-203:~/kafka/kafka_2.11-0.10.0.0$ vi config/server.properties
        broker.id=1
        listeners=PLAINTEXT://172.31.63.203:9092
        advertised.listeners=PLAINTEXT://54.173.215.211:9092
        zookeeper.connect=52.91.1.93:2181
      Kafka-datanode2 (set the following properties in config/server.properties)
        ubuntu@ip-172-31-9-25:~/kafka/kafka_2.11-0.10.0.0$ vi config/server.properties
        broker.id=2
        listeners=PLAINTEXT://172.31.9.25:9092
        advertised.listeners=PLAINTEXT://54.226.29.194:9092
        zookeeper.connect=52.91.1.93:2181
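The two broker configurations on slide 7 differ only in broker.id and the two IP addresses. As a rough sketch that is not part of the original slides, the Java helper below generates those four overrides for one broker. The class and method names are hypothetical; port 9092 and the PLAINTEXT protocol are copied from the slides; treating the 172.31.x.x address as the private one and the other as the public one is an assumption based on how the slides use them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not from the slides: builds the per-broker overrides
// that slide 7 sets by hand in config/server.properties.
final class BrokerConfigSketch {

    /**
     * Returns the four overrides for one broker.
     * privateIp is assumed to be the address the broker listens on;
     * publicIp is assumed to be the address advertised to clients.
     */
    static Map<String, String> brokerOverrides(int brokerId, String privateIp,
                                               String publicIp, String zookeeper) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("broker.id", Integer.toString(brokerId));
        p.put("listeners", "PLAINTEXT://" + privateIp + ":9092");
        p.put("advertised.listeners", "PLAINTEXT://" + publicIp + ":9092");
        p.put("zookeeper.connect", zookeeper);
        return p;
    }

    /** Renders the overrides as key=value lines, in insertion order. */
    static String render(Map<String, String> overrides) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : overrides.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String zk = "52.91.1.93:2181"; // ZooKeeper address from the slides
        System.out.print(render(brokerOverrides(1, "172.31.63.203", "54.173.215.211", zk)));
        System.out.println();
        System.out.print(render(brokerOverrides(2, "172.31.9.25", "54.226.29.194", zk)));
    }
}
```

Each broker needs a unique broker.id, while zookeeper.connect is the same on both nodes, so only the id and the address pair change from one call to the next.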
  • 8. Launch zookeeper / datanode1 / datanode2
        ZooKeeper       ==> 172.31.48.208 / 52.91.1.93
        Kafka-datanode1 ==> 172.31.63.203 / 54.173.215.211
        Kafka-datanode2 ==> 172.31.9.25 / 54.226.29.194
      1) Start zookeeper
        bin/zookeeper-server-start.sh config/zookeeper.properties
      2) Start server on Kafka-datanode1
        bin/kafka-server-start.sh config/server.properties
      3) Start server on Kafka-datanode2
        bin/kafka-server-start.sh config/server.properties
      4) Create Topic & Start consumer
        bin/kafka-topics.sh --zookeeper 52.91.1.93:2181 --create --topic data --partitions 1 --replication-factor 2
        bin/kafka-console-consumer.sh --zookeeper 52.91.1.93:2181 --topic data --from-beginning
  • 9. Java - Kafka Producer Sample Application
      package com.himanshu;

      import java.io.BufferedReader;
      import java.io.FileNotFoundException;
      import java.io.FileReader;
      import java.io.IOException;
      import java.util.Properties;

      import org.apache.kafka.clients.producer.KafkaProducer;
      import org.apache.kafka.clients.producer.Producer;
      import org.apache.kafka.clients.producer.ProducerRecord;

      public class DataProducer {

          public static void main(String[] args) {
              // Check arguments length value
              /*
              if (args.length == 0) {
                  System.out.println("Enter topic name");
                  return;
              }
              */

              // Topic name (hard-coded; could be read from args[0] instead)
              String topicName = "data"; // args[0].toString();

              // Properties object holding the producer configuration
              Properties props = new Properties();

              // Broker addresses of Kafka-datanode1 and Kafka-datanode2
              props.put("bootstrap.servers", "54.173.215.211:9092,54.226.29.194:9092");
              //props.put("metadata.broker.list", "172.31.63.203:9092,172.31.9.25:9092");
  • 10. Java - Kafka Producer Sample Application (continued)
              // Set acknowledgements for producer requests
              props.put("acks", "all");
              // Number of retries for a failed request; 0 disables automatic retries
              props.put("retries", 0);
              // Batch size in bytes
              props.put("batch.size", 16384);
              // Wait up to 1 ms so records can be batched, reducing the number of requests
              props.put("linger.ms", 1);
              // buffer.memory controls the total memory available to the producer for buffering
              props.put("buffer.memory", 33554432);
              props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
              props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

              Producer<String, String> producer = new KafkaProducer<String, String>(props);

              String csvFile = "/Users/himanshu/Documents/workspace/KafkaProducer/src/com/himanshu/invoice.txt";
              String csvSplitBy = ",";
              BufferedReader br = null;
              String lineInvoice = "";

              try {
                  br = new BufferedReader(new FileReader(csvFile));
                  while ((lineInvoice = br.readLine()) != null) {
                      // Fields are split here but not used; the whole line is sent as the message value
                      String[] invoice = lineInvoice.split(csvSplitBy);
                      // send() is asynchronous, so this only confirms the record was queued
                      producer.send(new ProducerRecord<String, String>(topicName, lineInvoice));
                      System.out.println("Message sent successfully....");
                  }
              } catch (FileNotFoundException e) {
                  e.printStackTrace();
              }
  • 11. Java - Kafka Producer Sample Application (continued)
              catch (IOException e) {
                  e.printStackTrace();
              } finally {
                  // Flush pending records and release the producer, then close the file
                  producer.close();
                  if (br != null) {
                      try {
                          br.close();
                      } catch (IOException e) {
                          e.printStackTrace();
                      }
                  }
              }
          }
      }
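The read loop in the producer on slides 9 to 11 closes its BufferedReader by hand in a finally block. A minimal alternative sketch, assuming Java 8 as installed on slide 6, is shown below: try-with-resources closes the reader automatically, and the send step is passed in as a Consumer<String> so the loop can be exercised without a running Kafka cluster. The class name, method name, and sample rows are hypothetical, and skipping blank lines is an addition that the original code does not make.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not from the slides: the file-reading loop of the
// producer, with the "send" step abstracted behind a Consumer<String>.
final class LineForwarderSketch {

    /**
     * Reads the source line by line and hands each non-blank line to the sink.
     * Returns the number of lines forwarded. The reader is closed by
     * try-with-resources even if readLine() or the sink throws.
     */
    static int forwardLines(Reader source, Consumer<String> sink) {
        int sent = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // addition: skip blank lines instead of sending empty messages
                }
                sink.accept(line);
                sent++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sent;
    }

    public static void main(String[] args) {
        // Made-up sample rows; the real invoice.txt contents are not shown in the transcript.
        String sample = "1001,widget,2\n\n1002,gadget,5\n";
        List<String> received = new ArrayList<>();
        int sent = forwardLines(new StringReader(sample), received::add);
        System.out.println(sent + " lines forwarded: " + received);
    }
}
```

In the real application the sink would presumably wrap the existing call, for example line -> producer.send(new ProducerRecord<String, String>(topicName, line)), with a FileReader on the CSV file as the source.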
  • 12. Sample data which we will be sending to Kafka Server from Java Kafka Producer (csv file)
  • 13. Message received on kafka datanode1
  • 14. RealTime Streaming with Kafka Apache Spark Apache Cassandra
  • 15. Launch Kafka Cluster (Zookeeper/kafka datanode1/ kafka datanode2)
  • 16. Execute Python / Kafka Spark Job
  • 17. Sample data which we will be sending to Kafka Server from Java Kafka Producer (csv file)
  • 18. Python Spark Job Processing Data from AWS Kafka Cluster
  • 19. Python Spark Job Processing Data from AWS Kafka Cluster & Processed Data stored in AWS Cassandra Cluster
  • 20. Sample data which we will be sending to Kafka Server from Java Kafka Producer
  • 21. Python Spark Job Processing Data from AWS Kafka Cluster & Processed Data stored in AWS Cassandra Cluster
  • 23. Python Spark Streaming Application