SlideShare a Scribd company logo
1 of 18
Download to read offline
APACHE FLUME NG
         Kai Voigt, Cloudera Inc
         London, Hadoop User Group, 10 Oct 2012




Donnerstag, 11. Oktober 12
“          FLUME IS A DISTRIBUTED,
              RELIABLE, AND AVAILABLE
              SERVICE FOR
              EFFICIENTLY COLLECTING,
              AGGREGATING, AND
              MOVING LARGE AMOUNTS
              OF LOG DATA
                                        ”
Donnerstag, 11. Oktober 12
“          FLUME IS A DISTRIBUTED,
              RELIABLE, AND AVAILABLE
              SERVICE FOR
              EFFICIENTLY COLLECTING,
              AGGREGATING, AND
              MOVING LARGE AMOUNTS
              OF LOG DATA
                                        ”
Donnerstag, 11. Oktober 12
httpd
                             /var/log/htaccess



                                 Flume


                                  HDFS



Donnerstag, 11. Oktober 12
5


Donnerstag, 11. Oktober 12
mysource           mysink




                                    mychannel




      myagent.sources = mysource
      myagent.sinks = mysink
      myagent.channels = mychannel




               6


Donnerstag, 11. Oktober 12
mysource           mysink




                                    mychannel




      myagent.sources.mysource.type = exec
      myagent.sources.mysource.command = tail -F /var/log/htaccess
      myagent.sources.mysource.channels = mychannel




               7


Donnerstag, 11. Oktober 12
mysource           mysink




                                    mychannel




      myagent.sinks.mysink.type = hdfs
      myagent.sinks.mysink.hdfs.path = /user/cloudera/htaccess
      myagent.sinks.mysink.hdfs.fileType = DataStream
      myagent.sinks.mysink.channel = mychannel




               8


Donnerstag, 11. Oktober 12
mysource           mysink




                                    mychannel




      myagent.channels.mychannel.type = memory
      myagent.channels.mychannel.capacity = 1000
      myagent.channels.mychannel.transactionCapactiy = 100




               9


Donnerstag, 11. Oktober 12
$ flume-ng agent --conf-file simple.conf --name myagent
      $ hadoop fs -ls htaccess
      -rw-r--r--   1 cloudera cloudera       1001 2012-09-30 05:58
      htaccess/FlumeData.1348999108529
      -rw-r--r--   1 cloudera cloudera        993 2012-09-30 05:58
      htaccess/FlumeData.1348999108530
      -rw-r--r--   1 cloudera cloudera        997 2012-09-30 05:59
      htaccess/FlumeData.1348999108531
      -rw-r--r--   1 cloudera cloudera       1009 2012-09-30 05:59
      htaccess/FlumeData.1348999108532
      ...




              10


Donnerstag, 11. Oktober 12
“          FLUME IS A DISTRIBUTED,
              RELIABLE, AND AVAILABLE
              SERVICE FOR
              EFFICIENTLY COLLECTING,
              AGGREGATING, AND
              MOVING LARGE AMOUNTS
              OF LOG DATA
                                        ”
Donnerstag, 11. Oktober 12
MULTI HOP




              12


Donnerstag, 11. Oktober 12
myagent1.sinks = mysink
      myagent1.sinks.mysink.type = avro
      myagent1.sinks.mysink.bind = 10.10.10.20
      myagent1.sinks.mysink.port = 4141

      myagent2.sources = mysource
      myagent2.sources.mysource.type = avro
      myagent2.sources.mysource.bind = 10.10.10.10
      myagent2.sources.mysource.port = 4141




              13


Donnerstag, 11. Oktober 12
CONSOLIDATION




              14


Donnerstag, 11. Oktober 12
MULTIPLEXING




              15


Donnerstag, 11. Oktober 12
Sources              Sinks    Channels

           Avro                 Avro     Memory

           Exec                 Logger   JDBC

           NetCat               IRC      File

           Sequence Generator   File

           Syslog               HBase

           Scribe




              16


Donnerstag, 11. Oktober 12
DEMO
        DEMO
        DEMO
        DEMO
        DEMO


Donnerstag, 11. Oktober 12
Thank you!
        kai@cloudera.com
        http://www.cloudera.com/




Donnerstag, 11. Oktober 12

More Related Content

Viewers also liked

Apache AVRO (Boston HUG, Jan 19, 2010)
Apache AVRO (Boston HUG, Jan 19, 2010)Apache AVRO (Boston HUG, Jan 19, 2010)
Apache AVRO (Boston HUG, Jan 19, 2010)Cloudera, Inc.
 
Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem GetInData
 
Apache Flume and its use case in Manufacturing
Apache Flume and its use case in ManufacturingApache Flume and its use case in Manufacturing
Apache Flume and its use case in ManufacturingRapheephan Thongkham-Uan
 
Quick Introduction to Apache Tez
Quick Introduction to Apache TezQuick Introduction to Apache Tez
Quick Introduction to Apache TezGetInData
 
Simplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in HadoopSimplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in HadoopGetInData
 
Building Hadoop Data Applications with Kite by Tom White
Building Hadoop Data Applications with Kite by Tom WhiteBuilding Hadoop Data Applications with Kite by Tom White
Building Hadoop Data Applications with Kite by Tom WhiteThe Hive
 
Apache Flume
Apache FlumeApache Flume
Apache FlumeGetInData
 
Streaming analytics better than batch when and why - (Big Data Tech 2017)
Streaming analytics better than batch   when and why - (Big Data Tech 2017)Streaming analytics better than batch   when and why - (Big Data Tech 2017)
Streaming analytics better than batch when and why - (Big Data Tech 2017)GetInData
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonIgor Anishchenko
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...StampedeCon
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakHakka Labs
 
Solr, c'est simple et Big Data ready - prez au Lyon jug Fév 2014
Solr, c'est simple et Big Data ready - prez au Lyon jug Fév 2014Solr, c'est simple et Big Data ready - prez au Lyon jug Fév 2014
Solr, c'est simple et Big Data ready - prez au Lyon jug Fév 2014francelabs
 

Viewers also liked (17)

Avro intro
Avro introAvro intro
Avro intro
 
Apache AVRO (Boston HUG, Jan 19, 2010)
Apache AVRO (Boston HUG, Jan 19, 2010)Apache AVRO (Boston HUG, Jan 19, 2010)
Apache AVRO (Boston HUG, Jan 19, 2010)
 
Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem
 
Apache Flume and its use case in Manufacturing
Apache Flume and its use case in ManufacturingApache Flume and its use case in Manufacturing
Apache Flume and its use case in Manufacturing
 
HCatalog
HCatalogHCatalog
HCatalog
 
Quick Introduction to Apache Tez
Quick Introduction to Apache TezQuick Introduction to Apache Tez
Quick Introduction to Apache Tez
 
Simplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in HadoopSimplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in Hadoop
 
Building Hadoop Data Applications with Kite by Tom White
Building Hadoop Data Applications with Kite by Tom WhiteBuilding Hadoop Data Applications with Kite by Tom White
Building Hadoop Data Applications with Kite by Tom White
 
3 apache-avro
3 apache-avro3 apache-avro
3 apache-avro
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Streaming analytics better than batch when and why - (Big Data Tech 2017)
Streaming analytics better than batch   when and why - (Big Data Tech 2017)Streaming analytics better than batch   when and why - (Big Data Tech 2017)
Streaming analytics better than batch when and why - (Big Data Tech 2017)
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased Comparison
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
 
Solr, c'est simple et Big Data ready - prez au Lyon jug Fév 2014
Solr, c'est simple et Big Data ready - prez au Lyon jug Fév 2014Solr, c'est simple et Big Data ready - prez au Lyon jug Fév 2014
Solr, c'est simple et Big Data ready - prez au Lyon jug Fév 2014
 

Similar to Apache Flume NG

Modern HTML & CSS Coding: Speed, Semantics & Structure
Modern HTML & CSS Coding: Speed, Semantics & StructureModern HTML & CSS Coding: Speed, Semantics & Structure
Modern HTML & CSS Coding: Speed, Semantics & StructureRaven Tools
 
Optimizing your site for contextual ads: SEO, Design and Content
Optimizing your site for contextual ads: SEO, Design and ContentOptimizing your site for contextual ads: SEO, Design and Content
Optimizing your site for contextual ads: SEO, Design and ContentRaven Tools
 
Acceptance & Integration Testing With Behat (PHPNw2011)
Acceptance & Integration Testing With Behat (PHPNw2011)Acceptance & Integration Testing With Behat (PHPNw2011)
Acceptance & Integration Testing With Behat (PHPNw2011)benwaine
 
Macruby - RubyConf Presentation 2010
Macruby - RubyConf Presentation 2010Macruby - RubyConf Presentation 2010
Macruby - RubyConf Presentation 2010Matt Aimonetti
 
Big Data & Cloud | Cloud Storage Simplified | Adrian Cole
Big Data & Cloud | Cloud Storage Simplified | Adrian ColeBig Data & Cloud | Cloud Storage Simplified | Adrian Cole
Big Data & Cloud | Cloud Storage Simplified | Adrian ColeJAX London
 
Rcos presentation
Rcos presentationRcos presentation
Rcos presentationmskmoorthy
 
soft-shake.ch - Data grids and Data Caching
soft-shake.ch - Data grids and Data Cachingsoft-shake.ch - Data grids and Data Caching
soft-shake.ch - Data grids and Data Cachingsoft-shake.ch
 
Lightning talks percona live mysql_2012
Lightning talks percona live mysql_2012Lightning talks percona live mysql_2012
Lightning talks percona live mysql_2012Giuseppe Maxia
 
Your first rails app - 2
 Your first rails app - 2 Your first rails app - 2
Your first rails app - 2Blazing Cloud
 
Adapt and respond: keeping responsive into the future
Adapt and respond: keeping responsive into the futureAdapt and respond: keeping responsive into the future
Adapt and respond: keeping responsive into the futureChris Mills
 
Riak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsRiak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsAndy Gross
 
Building A Scalable Open Source Storage Solution
Building A Scalable Open Source Storage SolutionBuilding A Scalable Open Source Storage Solution
Building A Scalable Open Source Storage SolutionPhil Cryer
 
How To Build A Web Service
How To Build A Web ServiceHow To Build A Web Service
How To Build A Web ServiceMoses Ngone
 
Interop 2011 - Scaling Platform As A Service
Interop 2011 - Scaling Platform As A ServiceInterop 2011 - Scaling Platform As A Service
Interop 2011 - Scaling Platform As A ServicePatrick Chanezon
 
TripCase Unit Testing with Jasmine
TripCase Unit Testing with JasmineTripCase Unit Testing with Jasmine
TripCase Unit Testing with JasmineStephen Pond
 
Coding, Scaling, and Deploys... Oh My!
Coding, Scaling, and Deploys... Oh My!Coding, Scaling, and Deploys... Oh My!
Coding, Scaling, and Deploys... Oh My!Mark Jaquith
 
Drupal and the rise of the documents
Drupal and the rise of the documentsDrupal and the rise of the documents
Drupal and the rise of the documentsClaudio Beatrice
 
Big app design for Node.js
Big app design for Node.jsBig app design for Node.js
Big app design for Node.jsSergi Mansilla
 

Similar to Apache Flume NG (20)

Modern HTML & CSS Coding: Speed, Semantics & Structure
Modern HTML & CSS Coding: Speed, Semantics & StructureModern HTML & CSS Coding: Speed, Semantics & Structure
Modern HTML & CSS Coding: Speed, Semantics & Structure
 
Optimizing your site for contextual ads: SEO, Design and Content
Optimizing your site for contextual ads: SEO, Design and ContentOptimizing your site for contextual ads: SEO, Design and Content
Optimizing your site for contextual ads: SEO, Design and Content
 
Fluentd and WebHDFS
Fluentd and WebHDFSFluentd and WebHDFS
Fluentd and WebHDFS
 
Acceptance & Integration Testing With Behat (PHPNw2011)
Acceptance & Integration Testing With Behat (PHPNw2011)Acceptance & Integration Testing With Behat (PHPNw2011)
Acceptance & Integration Testing With Behat (PHPNw2011)
 
Macruby - RubyConf Presentation 2010
Macruby - RubyConf Presentation 2010Macruby - RubyConf Presentation 2010
Macruby - RubyConf Presentation 2010
 
Big Data & Cloud | Cloud Storage Simplified | Adrian Cole
Big Data & Cloud | Cloud Storage Simplified | Adrian ColeBig Data & Cloud | Cloud Storage Simplified | Adrian Cole
Big Data & Cloud | Cloud Storage Simplified | Adrian Cole
 
Rcos presentation
Rcos presentationRcos presentation
Rcos presentation
 
soft-shake.ch - Data grids and Data Caching
soft-shake.ch - Data grids and Data Cachingsoft-shake.ch - Data grids and Data Caching
soft-shake.ch - Data grids and Data Caching
 
Lightning talks percona live mysql_2012
Lightning talks percona live mysql_2012Lightning talks percona live mysql_2012
Lightning talks percona live mysql_2012
 
Your first rails app - 2
 Your first rails app - 2 Your first rails app - 2
Your first rails app - 2
 
Adapt and respond: keeping responsive into the future
Adapt and respond: keeping responsive into the futureAdapt and respond: keeping responsive into the future
Adapt and respond: keeping responsive into the future
 
Riak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsRiak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard Problems
 
Building A Scalable Open Source Storage Solution
Building A Scalable Open Source Storage SolutionBuilding A Scalable Open Source Storage Solution
Building A Scalable Open Source Storage Solution
 
How To Build A Web Service
How To Build A Web ServiceHow To Build A Web Service
How To Build A Web Service
 
Interop 2011 - Scaling Platform As A Service
Interop 2011 - Scaling Platform As A ServiceInterop 2011 - Scaling Platform As A Service
Interop 2011 - Scaling Platform As A Service
 
Cors michael
Cors michaelCors michael
Cors michael
 
TripCase Unit Testing with Jasmine
TripCase Unit Testing with JasmineTripCase Unit Testing with Jasmine
TripCase Unit Testing with Jasmine
 
Coding, Scaling, and Deploys... Oh My!
Coding, Scaling, and Deploys... Oh My!Coding, Scaling, and Deploys... Oh My!
Coding, Scaling, and Deploys... Oh My!
 
Drupal and the rise of the documents
Drupal and the rise of the documentsDrupal and the rise of the documents
Drupal and the rise of the documents
 
Big app design for Node.js
Big app design for Node.jsBig app design for Node.js
Big app design for Node.js
 

More from huguk

Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifactahuguk
 
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introhuguk
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoophuguk
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...huguk
 
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy.  Jason ...Extracting maximum value from data while protecting consumer privacy.  Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...huguk
 
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM WatsonIntelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watsonhuguk
 
Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink huguk
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLhuguk
 
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...huguk
 
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & PitchingJonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitchinghuguk
 
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News MonitoringSignal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoringhuguk
 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startuphuguk
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapulthuguk
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysishuguk
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analyticshuguk
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Socialhuguk
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligencehuguk
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive huguk
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...huguk
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthyhuguk
 

More from huguk (20)

Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
 
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp intro
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
 
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy.  Jason ...Extracting maximum value from data while protecting consumer privacy.  Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...
 
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM WatsonIntelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
 
Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
 
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & PitchingJonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitching
 
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News MonitoringSignal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoring
 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startup
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapult
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysis
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analytics
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Social
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligence
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 

Recently uploaded

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dashnarutouzumaki53779
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dash
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

Apache Flume NG

  • 1. APACHE FLUME NG Kai Voigt, Cloudera Inc London, Hadoop User Group, 10 Oct 2012 Donnerstag, 11. Oktober 12
  • 2. FLUME IS A DISTRIBUTED, RELIABLE, AND AVAILABLE SERVICE FOR EFFICIENTLY COLLECTING, AGGREGATING, AND MOVING LARGE AMOUNTS OF LOG DATA ” Donnerstag, 11. Oktober 12
  • 3. FLUME IS A DISTRIBUTED, RELIABLE, AND AVAILABLE SERVICE FOR EFFICIENTLY COLLECTING, AGGREGATING, AND MOVING LARGE AMOUNTS OF LOG DATA ” Donnerstag, 11. Oktober 12
  • 4. httpd /var/log/htaccess Flume HDFS Donnerstag, 11. Oktober 12
  • 6. mysource mysink mychannel myagent.sources = mysource myagent.sinks = mysink myagent.channels = mychannel 6 Donnerstag, 11. Oktober 12
  • 7. mysource mysink mychannel myagent.sources.mysource.type = exec myagent.sources.mysource.command = tail -F /var/log/htaccess myagent.sources.mysource.channels = mychannel 7 Donnerstag, 11. Oktober 12
  • 8. mysource mysink mychannel myagent.sinks.mysink.type = hdfs myagent.sinks.mysink.hdfs.path = /user/cloudera/htaccess myagent.sinks.mysink.hdfs.fileType = DataStream myagent.sinks.mysink.channel = mychannel 8 Donnerstag, 11. Oktober 12
  • 9. mysource mysink mychannel myagent.channels.mychannel.type = memory myagent.channels.mychannel.capacity = 1000 myagent.channels.mychannel.transactionCapactiy = 100 9 Donnerstag, 11. Oktober 12
  • 10. $ flume-ng agent --conf-file simple.conf --name myagent $ hadoop fs -ls htaccess -rw-r--r-- 1 cloudera cloudera 1001 2012-09-30 05:58 htaccess/FlumeData.1348999108529 -rw-r--r-- 1 cloudera cloudera 993 2012-09-30 05:58 htaccess/FlumeData.1348999108530 -rw-r--r-- 1 cloudera cloudera 997 2012-09-30 05:59 htaccess/FlumeData.1348999108531 -rw-r--r-- 1 cloudera cloudera 1009 2012-09-30 05:59 htaccess/FlumeData.1348999108532 ... 10 Donnerstag, 11. Oktober 12
  • 11. FLUME IS A DISTRIBUTED, RELIABLE, AND AVAILABLE SERVICE FOR EFFICIENTLY COLLECTING, AGGREGATING, AND MOVING LARGE AMOUNTS OF LOG DATA ” Donnerstag, 11. Oktober 12
  • 12. MULTI HOP 12 Donnerstag, 11. Oktober 12
  • 13. myagent1.sinks = mysink myagent1.sinks.mysink.type = avro myagent1.sinks.mysink.bind = 10.10.10.20 myagent1.sinks.mysink.port = 4141 myagent2.sources = mysource myagent2.sources.mysource.type = avro myagent2.sources.mysource.bind = 10.10.10.10 myagent2.sources.mysource.port = 4141 13 Donnerstag, 11. Oktober 12
  • 14. CONSOLIDATION 14 Donnerstag, 11. Oktober 12
  • 15. MULTIPLEXING 15 Donnerstag, 11. Oktober 12
  • 16. Sources Sinks Channels Avro Avro Memory Exec Logger JDBC NetCat IRC File Sequence Generator File Syslog HBase Scribe 16 Donnerstag, 11. Oktober 12
  • 17. DEMO DEMO DEMO DEMO DEMO Donnerstag, 11. Oktober 12
  • 18. Thank you! kai@cloudera.com http://www.cloudera.com/ Donnerstag, 11. Oktober 12