Hadoop Training
FLUME
Agenda
• Flume Overview
• Flume Agent
• Sinks
• Flume Installation
• What is Netcat & Telnet?
Flume Overview
Apache Flume is a tool used to collect streaming data, such as log files and
events, from various sources.
The data to be collected is produced by sources such as application servers,
social networking sites and various others, and it arrives in the form of log
files and events.
Log file: In general, a log file is a file that lists the events/actions that
occur in an operating system. For example, web servers list every request made
to the server in their log files.
Processing the log files produces information such as:
• Application performance and various software and hardware failures.
• User behaviour, from which better business insights can be derived.
An event is the basic unit of the data transported inside Flume.
When the rate of incoming data exceeds the rate at which data can be written
to the destination, Flume acts as a mediator between the data producers and
the centralized stores and provides a steady flow of data between them.
Flume Agent
Flume deploys as one or more agents, each contained within its own instance
of the Java Virtual Machine (JVM).
Agents consist of three components: sources, sinks, and channels. An agent
must have at least one of each in order to run. Sources collect incoming data
as events, sinks write events out, and channels provide a queue connecting the
source and the sink. A minimal configuration sketch follows below.
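As a minimal sketch of how these three components are wired together in a Flume
properties file (the agent and component names a1, s1, ch1 and k1 are
hypothetical placeholders; the installation slides later build a full working
example), an agent definition looks roughly like this:
# Declare one source, one channel and one sink for an agent named a1
a1.sources = s1
a1.channels = ch1
a1.sinks = k1
# Each component has a type; the type dictates its remaining parameters
a1.sources.s1.type = netcat
a1.sources.s1.bind = localhost
a1.sources.s1.port = 44444
a1.channels.ch1.type = memory
a1.sinks.k1.type = logger
# Bind the source and the sink to the channel
a1.sources.s1.channels = ch1
a1.sinks.k1.channel = ch1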
Flume Agent
Sources
Flume agents may have more than one source, but must have at least one. Sources
require a name and a type; the type then dictates additional configuration
parameters.
On consuming an event, Flume sources write the event to a channel. Importantly,
sources write to their channels as transactions. By dealing in events and
transactions, Flume agents maintain end-to-end flow reliability. Events are not
dropped inside a Flume agent unless the channel is explicitly allowed to discard
them due to a full queue.
Channels
Channels are the mechanism by which Flume agents transfer events from their
sources to their sinks. Events written to the channel by a source are not removed
from the channel until a sink removes that event in a transaction. This allows Flume
sinks to retry writes in the event of a failure in the external repository (such as
HDFS or an outgoing network connection). For example, if the network between a
Flume agent and a Hadoop cluster goes down, the channel will keep all events
queued until the sink can correctly write to the cluster and close its transactions
with the channel.
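As an aside, the memory channel used later in these slides keeps its queue in
RAM, so queued events are lost if the agent process dies. A minimal sketch of a
durable, file-backed channel (the directory paths here are hypothetical) would
look like this:
# A file channel persists queued events to disk, so they survive an agent restart
agent.channels.c.type = file
agent.channels.c.checkpointDir = /usr/local/work/flume/checkpoint
agent.channels.c.dataDirs = /usr/local/work/flume/data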
Sinks
Sinks provide Flume agents with output capability: if you need to write to a new
type of storage, you can write a Java class that implements the necessary interfaces.
Like sources, sinks correspond to a type of output: writes to HDFS or HBase,
remote procedure calls to other agents, or any number of other external
repositories. Sinks remove events from the channel in transactions and write
them to the output. A transaction closes only when the event is successfully
written, ensuring that all events are committed to their final destination.
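For example, the "remote procedure calls to other agents" case mentioned above
is covered by Flume's Avro sink, which forwards events to the Avro source of
another agent. A minimal sketch (the sink name, hostname and port below are
hypothetical) might look like this:
# Forward events from channel c to another Flume agent's Avro source over RPC
agent.sinks.fwd.type = avro
agent.sinks.fwd.hostname = collector.example.com
agent.sinks.fwd.port = 4545
agent.sinks.fwd.channel = c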
Flume Installation
Follow the steps below to install and configure Flume on a Linux box. The Flume
agent requires the Hadoop configuration to be available on the same node.
• Download the latest version of Flume from the Apache Flume downloads page.
• Change directory to /usr/local/work
Command: $ cd /usr/local/work
• Untar apache-flume-<version>-bin.tar.gz
Command: $ sudo tar -xzvf apache-flume-1.5.0-bin.tar.gz
• Rename the extracted directory to flume
Command: $ sudo mv /usr/local/work/apache-flume-1.5.0-bin /usr/local/work/flume
Flume Installation
• Add Flume to the PATH in the user's bash profile
Command: $ sudo nano ~/.bashrc
export FLUME_HOME="/usr/local/work/flume"
export PATH=$PATH:$FLUME_HOME/bin
• Copy the template config files in the Flume conf folder so they can be changed for custom agents
Command: $ cd /usr/local/work/flume/conf
Command: $ sudo cp flume-conf.properties.template flume.conf
Command: $ sudo cp flume-env.sh.template flume-env.sh
• Open flume-env.sh and configure Java
Command: $ sudo nano flume-env.sh
JAVA_HOME=/usr/local/work/java
• Modify flume.conf in the conf directory: add the required properties to it and
comment out the existing ones. (A quick installation check is sketched after
these steps.)
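Before or after editing flume.conf, a quick check that the installation itself
works (assuming the steps above completed without errors) is to reload the
profile and ask Flume to print its version:
Command: $ source ~/.bashrc
Command: $ flume-ng version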
Flume Installation
Go to /usr/local/work/flume/conf and open the file flume.conf. The following
configuration lets a user generate events and subsequently logs them to the
console.
mishra@mishra-VirtualBox:/usr/local/work/flume/conf$ sudo nano flume.conf
Comment out all the existing lines and paste the following:
anand.sources = mis
anand.sinks = his
anand.channels = c
# Describe/configure the source
anand.sources.mis.type = netcat
anand.sources.mis.bind = localhost
anand.sources.mis.port = 44444
Flume Installation
# Describe the sink
anand.sinks.his.type = logger
# Use a channel which buffers events in memory
anand.channels.c.type = memory
anand.channels.c.capacity = 1000
anand.channels.c.transactionCapacity = 100
# Bind the source and sink to the channel
anand.sources.mis.channels = c
anand.sinks.his.channel = c
Flume Installation
This configuration defines a single agent named anand. anand has a source that
listens for data on port 44444, a channel that buffers event data in memory,
and a sink that logs event data to the console.
mishra@mishra-VirtualBox:/usr/local/work/flume$ bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name anand -Dflume.root.logger=INFO,console
or
mishra@mishra-VirtualBox:/usr/local/work/flume/conf$ flume-ng agent --conf conf --conf-file flume.conf --name anand -Dflume.root.logger=INFO,console
From a separate terminal, we can then telnet to port 44444 and send Flume an event:
mishra@mishra-VirtualBox:~$ telnet localhost 44444
You will see the following on your screen:
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Now type anything you want as your streaming data over the telnet connection,
for example: hello andy... how are you? Then check the terminal on which Flume
is running; you will find the same output there.
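If telnet is not installed on the machine, netcat (described on the next slide)
can be used the same way to send test events; a minimal sketch:
Command: $ nc localhost 44444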
What is Netcat & Telnet?
What is netcat?
Netcat is a computer networking utility for reading from and writing to network
connections using TCP or UDP. Beyond that, it can do various other things, such
as creating socket servers to listen for incoming connections on ports, or
transferring files from the terminal.
More technically speaking, netcat can act as a socket server or client and
interact with other programs, sending and receiving data through the network at
the same time.
Ncat is a feature-packed networking utility which reads and writes data across
networks from the command line.
What is telnet?
Telnet is a network protocol that allows a user on one computer to log into
another computer that is part of the same network. It is both a user command
and an underlying TCP/IP protocol for accessing remote computers. Telnet is
most likely to be used by program developers and anyone who needs to use
specific applications or data located on a particular host computer.
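As an illustration of typical netcat usage (the port, hostname and file names
below are hypothetical, and some netcat variants expect the listener to be
written as nc -l -p 12345 rather than nc -l 12345):
# Listen on port 12345 (server side)
$ nc -l 12345
# Connect to that listener from another terminal (client side)
$ nc localhost 12345
# Simple file transfer: receiver listens and redirects to a file, sender pipes the file in
$ nc -l 12345 > received.txt
$ nc receiver.example.com 12345 < tosend.txt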
Flume
In case you want your data to land in HDFS, open the flume.conf file:
mishra@mishra-VirtualBox:/usr/local/work/flume/conf$ sudo nano flume.conf
Comment out all existing lines with # and paste the following:
agent.sources = mis
agent.sinks = his
agent.channels = c
# Describe/configure the source
agent.sources.mis.type = netcat
agent.sources.mis.bind = localhost
agent.sources.mis.port = 44444
# Define a sink that writes to HDFS
agent.sinks.his.type = hdfs
agent.sinks.his.hdfs.path = hdfs://localhost:8020/flumedata/
agent.sinks.his.hdfs.fileType = DataStream
agent.sinks.his.hdfs.writeFormat = Text
# Use a channel which buffers events in memory
agent.channels.c.type = memory
agent.channels.c.capacity = 1000
agent.channels.c.transactionCapacity = 100
# Bind the source and sink to the channel
agent.sources.mis.channels = c
agent.sinks.his.channel = c
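Optionally (these lines are an addition, not part of the slide's configuration),
the HDFS sink's file-rolling behaviour can be tuned so it does not create a new
small file for every few events; a sketch:
# Roll a new HDFS file every 60 seconds or after ~128 MB, and never by event count
agent.sinks.his.hdfs.rollInterval = 60
agent.sinks.his.hdfs.rollSize = 134217728
agent.sinks.his.hdfs.rollCount = 0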
Flume
Now run the command:
mishra@mishra-VirtualBox:/usr/local/work/flume$ bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name agent -Dflume.root.logger=INFO,console
or
mishra@mishra-VirtualBox:/usr/local/work/flume/conf$ flume-ng agent --conf conf --conf-file flume.conf --name agent -Dflume.root.logger=INFO,console
From another terminal, connect and type some test data:
telnet localhost 44444
Then read the data back from HDFS:
mishra@mishra-VirtualBox:~$ hadoop fs -cat /flumedata/FlumeData.1467740902549
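The FlumeData file name above contains a timestamp, so it will differ on every
run; a simple way to find the files the sink actually produced (a sketch,
assuming the /flumedata path from the configuration) is to list the directory
first:
mishra@mishra-VirtualBox:~$ hadoop fs -ls /flumedata/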
Thank You!
