Hadoop Training
FLUME
Classification: Restricted
Agenda
• Flume Overview
• Flume Agent
• Sinks
• Flume Installation
• What is Netcat & Telnet?
Flume Overview
Apache Flume is a tool for collecting streaming data, such as log files and events, from various sources.
The data to be collected is produced by sources such as application servers, social networking sites, and others. It arrives in the form of log files and events.
Log file − In general, a log file is a file that records the events or actions that occur in an operating system or application. For example, web servers record every request made to the server in their log files.
Processing log files yields information about:
• Application performance and various software and hardware failures.
• User behavior, from which better business insights can be derived.
An event is the basic unit of data transported inside Flume.
When the rate of incoming data exceeds the rate at which data can be written to the destination, Flume acts as a mediator between the data producers and the centralized stores and provides a steady flow of data between them.
Flume Agent
Flume is deployed as one or more agents, each running in its own instance of the Java Virtual Machine (JVM).
An agent consists of three kinds of components: sources, channels, and sinks. An agent must have at least one of each in order to run. Sources collect incoming data as events, sinks write events out, and channels provide a queue that connects a source to a sink.
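The rule that an agent needs at least one source, one channel, and one sink can be checked mechanically on a configuration file. The following is a minimal sketch, not part of Flume: the function name has_all_components and the agent and component names (a1, r1, c1, k1) are hypothetical, and the sample file only illustrates the "agent.kind = name" property layout.

```shell
#!/bin/sh
# Hypothetical helper (not a Flume command): succeeds only if the config file
# declares at least one source, one channel, and one sink for the given agent.
has_all_components() {
    conf_file=$1
    agent=$2
    for kind in sources channels sinks; do
        # Expect a line such as "a1.sources = r1" with a non-empty value.
        grep -Eq "^[[:space:]]*${agent}\.${kind}[[:space:]]*=[[:space:]]*[^[:space:]]" "$conf_file" || {
            echo "agent ${agent}: no ${kind} declared" >&2
            return 1
        }
    done
}

# Illustrative config with made-up names, written to a temporary file.
sample_conf=$(mktemp)
cat > "$sample_conf" <<'EOF'
a1.sources = r1
a1.channels = c1
a1.sinks = k1
EOF

if has_all_components "$sample_conf" a1; then
    echo "a1 declares sources, channels, and sinks"
fi
```

A check like this only confirms that the three declarations exist; it does not validate the type-specific parameters of each component.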
Sources
Flume agents may have more than one source, but must have at least one. Each source requires a name and a type; the type then dictates additional configuration parameters.
On consuming an event, a Flume source writes the event to a channel. Importantly, sources write to their channels in transactions. By dealing in events and transactions, Flume agents maintain end-to-end flow reliability. Events are not dropped inside a Flume agent unless the channel is explicitly allowed to discard them when its queue is full.
Channels
Channels are the mechanism by which Flume agents transfer events from their sources to their sinks. An event written to a channel by a source is not removed from the channel until a sink removes it in a transaction. This allows Flume sinks to retry writes if the external repository (such as HDFS or an outgoing network connection) fails. For example, if the network between a Flume agent and a Hadoop cluster goes down, the channel keeps all events queued until the sink can write to the cluster successfully and close its transactions with the channel.
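The take-then-commit behaviour described above can be illustrated with a toy model. This sketch is not how Flume is implemented: a plain text file stands in for the channel, and the names drain_channel, failing_sink, and working_sink are made up for illustration. It only shows the idea that an event leaves the queue after the sink succeeds, and stays queued otherwise.

```shell
#!/bin/sh
# Toy model only: a text file stands in for a channel, one event per line.
channel=$(mktemp)
printf 'event-1\nevent-2\n' > "$channel"

# drain_channel SINK_COMMAND: hand the oldest event to the sink command and
# remove it from the channel only if the sink reports success ("commit").
# On failure the event stays queued ("rollback") and draining stops.
drain_channel() {
    while [ -s "$channel" ]; do
        event=$(head -n 1 "$channel")
        if "$@" "$event"; then
            tail -n +2 "$channel" > "$channel.tmp" && mv "$channel.tmp" "$channel"
        else
            echo "sink failed, keeping '$event' queued" >&2
            return 1
        fi
    done
}

failing_sink() { return 1; }            # stands for an unreachable destination
working_sink() { echo "written: $1"; }  # stands for a reachable destination

drain_channel failing_sink || true   # nothing is removed from the channel
drain_channel working_sink           # the retry succeeds and empties the channel
```

The first call leaves both events in the file; the second call writes them in their original order and leaves the file empty.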
Sinks
Sinks provide Flume agents with output capability. Like sources, sinks correspond to a type of output: writes to HDFS or HBase, remote procedure calls to other agents, or any number of other external repositories. If you need to write to a new type of storage, you can write a Java class that implements the necessary interfaces.
Sinks remove events from the channel in transactions and write them to the output. A transaction closes when the event is successfully written, ensuring that all events are committed to their final destination.
Flume Installation
Follow the steps below to install and configure Flume on a Linux box. The Flume agent requires the Hadoop configuration to be available on the same node.
• Download the latest version of Flume from here.
• Change directory to /usr/local/work
Command: $ cd /usr/local/work
• Untar apache-flume-<version>-bin.tar.gz
Command: $ sudo tar -xzvf apache-flume-1.5.0-bin.tar.gz
• Rename the extracted directory to flume
Command: $ sudo mv /usr/local/work/apache-flume-1.5.0-bin /usr/local/work/flume
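The unpack and rename steps above can be sketched as one shell function. This is an illustration under stated assumptions, not an official installer: the function name install_flume is hypothetical, the tarball is assumed to be downloaded already, and the top-level folder inside the archive is assumed to match the tarball name without ".tar.gz" (as in apache-flume-1.5.0-bin). sudo is left out; run the function with sufficient privileges if the work directory is not writable by your user.

```shell
#!/bin/sh
# Hypothetical helper: unpack a Flume binary tarball into a work directory
# and rename the extracted folder to "flume".
# Usage: install_flume /path/to/apache-flume-<version>-bin.tar.gz /usr/local/work
install_flume() {
    tarball=$1
    work_dir=$2
    # Assumed folder name inside the archive, e.g. apache-flume-1.5.0-bin
    extracted=$(basename "$tarball" .tar.gz)

    [ -f "$tarball" ] || { echo "tarball not found: $tarball" >&2; return 1; }
    [ ! -e "$work_dir/flume" ] || { echo "$work_dir/flume already exists" >&2; return 1; }

    mkdir -p "$work_dir" &&
    tar -xzf "$tarball" -C "$work_dir" &&
    mv "$work_dir/$extracted" "$work_dir/flume"
}
```

Refusing to continue when the flume directory already exists avoids moving a second extraction inside the first one.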
• Add Flume to the PATH in the user's bash profile
Command: $ nano ~/.bashrc
export FLUME_HOME="/usr/local/work/flume"
export PATH=$PATH:$FLUME_HOME/bin
• Copy the template config files in the Flume conf folder so they can be changed for custom agents
Command: $ cd /usr/local/work/flume/conf
Command: $ sudo cp flume-conf.properties.template flume.conf
Command: $ sudo cp flume-env.sh.template flume-env.sh
• Open flume-env.sh
Command: $ sudo nano flume-env.sh
• Configure Java in flume-env.sh
export JAVA_HOME=/usr/local/work/java
• Modify flume.conf in the conf directory and add the required properties to it. Also comment out the existing properties.
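The copy and JAVA_HOME steps above can be sketched as a repeatable function. The name prepare_flume_conf is hypothetical, and the example paths in the usage comment are the ones used in this deck; the sketch assumes the two template files exist in the conf directory and that the directory is writable by the current user.

```shell
#!/bin/sh
# Hypothetical helper: create flume.conf and flume-env.sh from the shipped
# templates (without overwriting existing files) and record JAVA_HOME.
# Usage: prepare_flume_conf /usr/local/work/flume/conf /usr/local/work/java
prepare_flume_conf() {
    conf_dir=$1
    java_home=$2

    [ -f "$conf_dir/flume.conf" ] ||
        cp "$conf_dir/flume-conf.properties.template" "$conf_dir/flume.conf" || return 1
    [ -f "$conf_dir/flume-env.sh" ] ||
        cp "$conf_dir/flume-env.sh.template" "$conf_dir/flume-env.sh" || return 1

    # Append the JAVA_HOME setting only once, so reruns do not duplicate it.
    grep -q '^export JAVA_HOME=' "$conf_dir/flume-env.sh" ||
        echo "export JAVA_HOME=$java_home" >> "$conf_dir/flume-env.sh"
}
```

Because existing files are kept and the export line is added only once, the function can be run again after a partial setup without changing the result.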
Go to /usr/local/work/flume/conf and open the file flume.conf:
mishra@mishra-VirtualBox:/usr/local/work/flume/conf$ sudo nano flume.conf
Comment out all the existing lines and paste the following. This configuration lets a user generate events and then logs them to the console.
# Name the components on this agent
anand.sources = mis
anand.sinks = his
anand.channels = c
# Describe/configure the source
anand.sources.mis.type = netcat
anand.sources.mis.bind = localhost
anand.sources.mis.port = 44444
# Describe the sink
anand.sinks.his.type = logger
# Use a channel which buffers events in memory
anand.channels.c.type = memory
anand.channels.c.capacity = 1000
anand.channels.c.transactionCapacity = 100
# Bind the source and sink to the channel
anand.sources.mis.channels = c
anand.sinks.his.channel = c
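In the memory channel settings, capacity is the maximum number of events held in the channel and transactionCapacity is the maximum number of events handled in a single transaction. The sketch below checks that transactionCapacity does not exceed capacity. The helper names prop_value and check_memory_channel are hypothetical, and the rule itself should be treated as an assumption to confirm against the Flume User Guide for your version.

```shell
#!/bin/sh
# prop_value FILE KEY: print the value of "KEY = value" from a properties file
# (last occurrence wins; commented-out lines do not match the key).
prop_value() {
    awk -F= -v key="$2" '
        { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }
        k == key { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); print v }
    ' "$1" | tail -n 1
}

# check_memory_channel FILE PREFIX: PREFIX is e.g. "anand.channels.c".
check_memory_channel() {
    conf=$1
    prefix=$2
    capacity=$(prop_value "$conf" "$prefix.capacity")
    tx=$(prop_value "$conf" "$prefix.transactionCapacity")

    [ -n "$capacity" ] && [ -n "$tx" ] || {
        echo "capacity settings missing for $prefix" >&2
        return 1
    }
    [ "$tx" -le "$capacity" ] || {
        echo "transactionCapacity ($tx) exceeds capacity ($capacity)" >&2
        return 1
    }
}

# Usage with the file from this deck:
#   check_memory_channel /usr/local/work/flume/conf/flume.conf anand.channels.c
```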
This configuration defines a single agent named anand. anand has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sink that logs event data to the console.
Start the agent from the Flume directory:
mishra@mishra-VirtualBox:/usr/local/work/flume$ bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name anand -Dflume.root.logger=INFO,console
or from the conf directory:
mishra@mishra-VirtualBox:/usr/local/work/flume/conf$ flume-ng agent --conf . --conf-file flume.conf --name anand -Dflume.root.logger=INFO,console
From a separate terminal, telnet to port 44444 and send Flume an event:
mishra@mishra-VirtualBox:~$ telnet localhost 44444
You will see the following on your screen:
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Now type anything you want to send as streaming data through the telnet connection, for example:
hello andy....how r u???
Then check the terminal in which Flume is running. You will find the same text in its output.
What is Netcat & Telnet?
What is netcat?
Netcat is a computer networking utility for reading from and writing to network connections using TCP or UDP.
More technically, netcat can act as a socket server or client and interact with other programs while sending and receiving data over the network. It can also create socket servers that listen for incoming connections on ports, transfer files from the terminal, and so on.
Ncat is a feature-packed networking utility that reads and writes data across networks from the command line.
What is telnet?
Telnet is a network protocol that allows a user on one computer to log into another computer over a network. Telnet is both a user command and an underlying TCP/IP protocol for accessing remote computers. It is most likely to be used by program developers and anyone who needs to use specific applications or data located on a particular host computer.
Flume
If you want your data in HDFS instead, open flume.conf:
mishra@mishra-VirtualBox:/usr/local/work/flume/conf$ sudo nano flume.conf
Comment out all existing lines with # and paste the following:
agent.sources = mis
agent.sinks = his
agent.channels = c
# Describe/configure the source
agent.sources.mis.type = netcat
agent.sources.mis.bind = localhost
agent.sources.mis.port = 44444
# Define a sink that writes to HDFS
agent.sinks.his.type = hdfs
agent.sinks.his.hdfs.path = hdfs://localhost:8020/flumedata/
agent.sinks.his.hdfs.fileType = DataStream
agent.sinks.his.hdfs.writeFormat = Text
# Use a channel which buffers events in memory
agent.channels.c.type = memory
agent.channels.c.capacity = 1000
agent.channels.c.transactionCapacity = 100
# Bind the source and sink to the channel
agent.sources.mis.channels = c
agent.sinks.his.channel = c
Now run the agent from the Flume directory:
mishra@mishra-VirtualBox:/usr/local/work/flume$ bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name agent -Dflume.root.logger=INFO,console
or from the conf directory:
mishra@mishra-VirtualBox:/usr/local/work/flume/conf$ flume-ng agent --conf . --conf-file flume.conf --name agent -Dflume.root.logger=INFO,console
In another terminal, connect with telnet and type anything:
telnet localhost 44444
Then read the resulting file from HDFS (the numeric suffix in the file name will differ on each run):
mishra@mishra-VirtualBox:~$ hadoop fs -cat /flumedata/FlumeData.1467740902549
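Because the numeric suffix of the FlumeData file changes from run to run, it can help to pick the newest file from a directory listing instead of typing the name by hand. The sketch below is one way to do that; the function name newest_flume_file is hypothetical, and it assumes the usual "hadoop fs -ls" layout in which the path is the last field on each line.

```shell
#!/bin/sh
# Reads `hadoop fs -ls` output on stdin and prints the path of the newest
# FlumeData file. Ordering relies on a plain text sort of the numeric suffix,
# which assumes all suffixes have the same number of digits.
newest_flume_file() {
    awk '{ print $NF }' | grep 'FlumeData\.[0-9]' | sort | tail -n 1
}

# Usage on a machine where the hadoop command is available:
#   hadoop fs -cat "$(hadoop fs -ls /flumedata | newest_flume_file)"
```

The summary line that the listing starts with ("Found N items") is filtered out because it does not contain a FlumeData path.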
Thank You!

Session 09 - Flume

  • 2. Page 1Classification: Restricted Agenda • Flume Overview • Flume Agent • Sinks • Flume Installation • What is Netcat & Telnet?
  • 3. Page 2Classification: Restricted Apache Flume is a tool used to collect streaming data such as log files, events from various sources. Data which is to be collected will be produced by various sources like applications servers, social networking sites and various others. This data will be in the form of log files and events. Log file − In general, a log file is a file that lists events/actions that occur in an operating system. For example, web servers list every request made to the server in the log files. Processing the log files produces info:: − Understanding the application performance and various software and hardware failures. The user behavior and derive better business insights. An event is the basic unit of the data transported inside Flume. When the rate of incoming data exceeds the rate at which data can be written to the destination, Flume acts as a mediator between data producers and the centralized stores and provides a steady flow of data between them Flume Overview
  • 4. Page 3Classification: Restricted Flume deploys as one or more agents, each contained within its own instance of the Java Virtual Machine (JVM). Agents consist of three components: sources, sinks, and channels. An agent must have at least one of each in order to run. Sources collect incoming data as events. Sinks write events out, and channels provide a queue to connect the source and sink Flume Agent
  • 5. Page 4Classification: Restricted Sources Flume agents may have more than one source, but must have at least one. Sources require a name and a type; the type then dictates additional configuration parameters. On consuming an event, Flume sources write the event to a channel. Importantly, sources write to their channels as transactions. By dealing in events and transactions, Flume agents maintain end-to-end flow reliability. Events are not dropped inside a Flume agent unless the channel is explicitly allowed to discard them due to a full queue. Channels Channels are the mechanism by which Flume agents transfer events from their sources to their sinks. Events written to the channel by a source are not removed from the channel until a sink removes that event in a transaction. This allows Flume sinks to retry writes in the event of a failure in the external repository (such as HDFS or an outgoing network connection). For example, if the network between a Flume agent and a Hadoop cluster goes down, the channel will keep all events queued until the sink can correctly write to the cluster and close its transactions with the channel. Flume Agent
  • 6. Sinks
    Sinks provide Flume agents with output capability. If you need to write to a new type of storage, just write a Java class that implements the necessary interfaces. Like sources, sinks correspond to a type of output: writes to HDFS or HBase, remote procedure calls to other agents, or any number of other external repositories. Sinks remove events from the channel in transactions and write them to output. A transaction closes when the event is successfully written, ensuring that all events are committed to their final destination.
  • 7. Flume Installation
    Follow the steps below to install and configure Flume on a Linux box. The Flume agent requires the Hadoop configuration to be available on the same node.
    • Download the latest version of Flume.
    • Change directory to /usr/local/work
      Command: $ cd /usr/local/work
    • Untar apache-flume-<version>-bin.tar.gz
      Command: $ sudo tar -xzvf apache-flume-1.5.0-bin.tar.gz
    • Rename the extracted directory to flume
      Command: $ sudo mv /usr/local/work/apache-flume-1.5.0-bin /usr/local/work/flume
  • 8. Flume Installation
    • Add Flume to the PATH in the user's bash profile
      Command: $ sudo nano ~/.bashrc
      export FLUME_HOME="/usr/local/work/flume"
      export PATH=$PATH:$FLUME_HOME/bin
    • Copy the template config files in the Flume conf folder so they can be changed for custom agents
      Command: $ cd /usr/local/work/flume/conf
      Command: $ sudo cp flume-conf.properties.template flume.conf
      Command: $ sudo cp flume-env.sh.template flume-env.sh
    • Open flume-env.sh
      Command: $ sudo nano flume-env.sh
    • 6.1 Configure Java
      JAVA_HOME=/usr/local/work/java
    • Modify flume.conf in the conf directory: comment out the existing properties and add the required ones.
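    Before moving on, it can help to confirm that the exports above are visible in the current shell. The following is a minimal sketch, not part of the original slides; the function name `check_flume_env` is hypothetical, and it assumes the layout used above, with the launcher at `$FLUME_HOME/bin/flume-ng`.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper: report whether the Flume environment set in
    # ~/.bashrc is visible in this shell.
    check_flume_env() {
      if [ -z "${FLUME_HOME:-}" ]; then
        echo "FLUME_HOME is not set"
        return 1
      fi
      if [ ! -x "$FLUME_HOME/bin/flume-ng" ]; then
        echo "flume-ng not found under $FLUME_HOME/bin"
        return 1
      fi
      echo "Flume found at $FLUME_HOME"
    }

    # Usage, after reloading the profile with `source ~/.bashrc`:
    #   check_flume_env && flume-ng version
    ```

    If the first message appears, the profile has probably not been reloaded in the current terminal.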
  • 9. Flume Installation
    Go to /usr/local/work/flume/conf and open the file flume.conf:
    mishra@mishra-VirtualBox:/usr/local/work/flume/conf$ sudo nano flume.conf
    Comment out all the existing lines and paste the following. This configuration lets a user generate events and subsequently logs them to the console.
    anand.sources = mis
    anand.sinks = his
    anand.channels = c
    # Describe/configure the source
    anand.sources.mis.type = netcat
    anand.sources.mis.bind = localhost
    anand.sources.mis.port = 44444
  • 10. Flume Installation
    # Describe the sink
    anand.sinks.his.type = logger
    # Use a channel which buffers events in memory
    anand.channels.c.type = memory
    anand.channels.c.capacity = 1000
    anand.channels.c.transactionCapacity = 100
    # Bind the source and sink to the channel
    anand.sources.mis.channels = c
    anand.sinks.his.channel = c
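    A mistyped agent or component name is an easy mistake in a file like this, because every property repeats the names. As a rough aid, the sketch below prints the component names declared for an agent so they can be compared with the names used in the remaining properties. The helper `flume_components` is hypothetical and only reads the declaration lines (`<agent>.sources = ...` and so on); it does not validate the file.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper: print the names declared for one component kind.
    #   $1 = config file, $2 = agent name, $3 = sources | sinks | channels
    flume_components() {
      # Match only the declaration line, e.g. "anand.sources = mis",
      # and strip everything up to and including the "=".
      sed -n "s/^[[:space:]]*$2\.$3[[:space:]]*=[[:space:]]*//p" "$1"
    }

    # Usage, from the conf directory:
    #   flume_components flume.conf anand sources
    #   flume_components flume.conf anand channels
    ```

    The name passed as the second argument is the same one later given to `--name` when the agent is started.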
  • 11. Flume Installation
    This configuration defines a single agent named anand. anand has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sink that logs event data to the console.
    Start the agent from the Flume directory:
    mishra@mishra-VirtualBox:/usr/local/work/flume$ bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name anand -Dflume.root.logger=INFO,console
    or from the conf directory (here --conf points at the current directory):
    mishra@mishra-VirtualBox:/usr/local/work/flume/conf$ flume-ng agent --conf . --conf-file flume.conf --name anand -Dflume.root.logger=INFO,console
    Now open a separate terminal, telnet to port 44444, and send Flume an event:
    mishra@mishra-VirtualBox:~$ telnet localhost 44444
    You will see the following on your screen:
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    Now type anything you want to send as streaming data over the telnet connection, for example:
    hello andy....how r u???
    Now check the terminal in which Flume is running. You will find the same text in its output.
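    Since the source type is netcat, the same test can also be scripted with the `nc` client instead of an interactive telnet session. This is a sketch under assumptions: the function name `send_event` and the `FLUME_HOST` / `FLUME_PORT` variables are hypothetical, `nc` must be installed, and the `-w 2` timeout is used so the client does not wait indefinitely.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper: send one line of text to the netcat source.
    #   $1 = event text
    # Host and port default to the values used in flume.conf above.
    send_event() {
      local host="${FLUME_HOST:-localhost}"
      local port="${FLUME_PORT:-44444}"
      # One line is one event; -w 2 gives up after 2 seconds of inactivity.
      printf '%s\n' "$1" | nc -w 2 "$host" "$port"
    }

    # Usage, with the agent already running in another terminal:
    #   send_event "hello from nc"
    ```

    With the default netcat source settings the agent should acknowledge each line, and the event text should then appear in the terminal where Flume is running.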
  • 12. What is Netcat & Telnet?
    What is netcat? Netcat is a computer networking utility for reading from and writing to network connections using TCP or UDP. More technically speaking, netcat can act as a socket server or client and interact with other programs while sending and receiving data through the network. It can also do various other things, such as creating socket servers that listen for incoming connections on ports and transferring files from the terminal. Ncat is a feature-packed networking utility that reads and writes data across networks from the command line.
    What is telnet? Telnet is a network protocol that allows a user on one computer to log into another computer that is part of the same network. Telnet is both a user command and an underlying TCP/IP protocol for accessing remote computers. It is most likely to be used by program developers and anyone who needs to use specific applications or data located on a particular host computer.
  • 13. Flume
    If you want your data in HDFS, open the flume.conf file:
    mishra@mishra-VirtualBox:/usr/local/work/flume/conf$ sudo nano flume.conf
    Comment out all lines with # and paste the following:
    agent.sources = mis
    agent.sinks = his
    agent.channels = c
    # Describe/configure the source
    agent.sources.mis.type = netcat
    agent.sources.mis.bind = localhost
    agent.sources.mis.port = 44444
    # Define a sink that writes to HDFS
    agent.sinks.his.type = hdfs
    agent.sinks.his.hdfs.path = hdfs://localhost:8020/flumedata/
    agent.sinks.his.hdfs.fileType = DataStream
    agent.sinks.his.hdfs.writeFormat = Text
    # Use a channel which buffers events in memory
    agent.channels.c.type = memory
    agent.channels.c.capacity = 1000
    agent.channels.c.transactionCapacity = 100
    # Bind the source and sink to the channel
    agent.sources.mis.channels = c
    agent.sinks.his.channel = c
  • 14. Flume
    Now run the command from the Flume directory:
    mishra@mishra-VirtualBox:/usr/local/work/flume$ bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name agent -Dflume.root.logger=INFO,console
    or from the conf directory (here --conf points at the current directory):
    mishra@mishra-VirtualBox:/usr/local/work/flume/conf$ flume-ng agent --conf . --conf-file flume.conf --name agent -Dflume.root.logger=INFO,console
    Then, from a separate terminal, connect and type anything:
    telnet localhost 44444
    write anything here
    Finally, read the data back from HDFS:
    mishra@mishra-VirtualBox:~$ hadoop fs -cat /flumedata/FlumeData.1467740902549
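    The numeric suffix in the file name above comes from one particular run, so on another machine the file will most likely be named differently. A small sketch for finding and reading whatever the sink wrote is shown below; the function name `show_flume_data` is hypothetical, and it assumes the default `FlumeData` file prefix seen in the command above.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper: list the sink's output directory, then print
    # every file whose name starts with the FlumeData prefix.
    #   $1 = HDFS directory (defaults to the path from flume.conf)
    show_flume_data() {
      local dir="${1:-/flumedata}"
      # The glob is quoted so that HDFS, not the local shell, expands it.
      hadoop fs -ls "$dir" && hadoop fs -cat "$dir/FlumeData.*"
    }

    # Usage:
    #   show_flume_data /flumedata
    ```

    Files that the sink is still writing may show up in the listing with a temporary suffix until they are rolled.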