SlideShare a Scribd company logo
March 2012
Apache Flume (NG)
Alexander Lorenz | Customer Operations Engineer
Overview

    • Stream data (events, not files) from clients
      to sinks
    • Clients: files, syslog, avro, …
    • Sinks: HDFS files, HBase, …
    • Configurable reliability levels
      – Best effort: “Fast and loose”
      – Guaranteed delivery: “Deliver no matter what”
    • Configurable routing / topology

2                                    ©2012
                       Cloudera, Inc. All Rights Reserved.
Architecture
    Component   Function

    Agent       The JVM running Flume. One per machine. Runs
                many sources and sinks.
    Client      Produces data in the form of events. Runs in a
                separate thread.
    Sink        Receives events from a channel. Runs in a separate
                thread.
    Channel     Connects sources to sinks (like a queue).
                Implements the reliability semantics.
    Event       A single datum; a log record, an avro object, etc.
                Normally around ~4KB.




3                                 ©2012
                    Cloudera, Inc. All Rights Reserved.
Agent

    • Runs many clients and sinks
    • Java properties-based configuration
    • Low overhead (-Xmx20m)
      – But adding RAM increases performance




4                                   ©2012
                      Cloudera, Inc. All Rights Reserved.
Sources

    • Plugin interface
    • Managed by a SourceRunner that controls
      threading and execution model (e.g. polling
      vs. event-based)
    • Included: exec, avro, syslog, …




5                                   ©2012
                      Cloudera, Inc. All Rights Reserved.
Sources
    public class MySource implements PollableSource {
      public Status process() {
        // Do something to create an Event..
        Event e = EventBuilder.withBody(…).build();
        // A channel instance is injected by Flume.
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
          channel.put(e);
          tx.commit();
        } catch (ChannelException ex) {
          tx.rollback();
          return Status.BACKOFF;
        } finally {
          tx.close();
        }
        return Status.READY;
      }
    }




6                                          ©2012
                             Cloudera, Inc. All Rights Reserved.
Channel

    •   Plugin interface
    •   Transactional
    •   Provide queuing between source / sink
    •   Provide reliability semantics
        – MemoryChannel: Basically a Java
          BlockingQueue.
        – JDBC
        – WAL


7                                     ©2012
                        Cloudera, Inc. All Rights Reserved.
Sinks

    • Plugin interface
    • Managed by a SinkRunner that controls
      threading and execution model
    • Included: HDFS files (various formats)




8                                  ©2012
                     Cloudera, Inc. All Rights Reserved.
Sinks Code
    public class MySink implements PollableSink {
      public Status process() {
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
          Event e = channel.take();
          if (e != null) {
            // …
            tx.commit();
          } else {
            return Status.BACKOFF;
          }
        } catch (ChannelException ex) {
          tx.rollback();
          return Status.BACKOFF;
        } finally {
          tx.close();
        }
        return Status.READY;
      }
    }




9                                          ©2012
                             Cloudera, Inc. All Rights Reserved.
Tiered collection

 • Send events from agents to another tier of
   agents to aggregate
 • Use an Avro sink (really just a client) to
   send events to an Avro source (really just a
   server) in another machine
 • Failover supported
 • Load balancing (soon)
 • Transactions guarantee handoff

10                               ©2012
                   Cloudera, Inc. All Rights Reserved.
Tiered collection – The handoff

 •   Agent 1: Tx begin
 •   Agent 1: Channel take event
 •   Agent 1: Sink send
 •   Agent 2: Tx begin
 •   Agent 2: Channel put
 •   Agent 2: Tx commit, respond OK
 •   Agent 1: Tx commit (or rollback)


11                                 ©2012
                     Cloudera, Inc. All Rights Reserved.
Configuration

 • done in a single file
 • identifier for flows
 • <identifier>.type.subtype.parameter.config,
   where <identifier> is the name of the agent
 • flow has a client, type, channel, sink
 • can have multiple channels in one flow




12                                ©2012
                    Cloudera, Inc. All Rights Reserved.
Simple Configuration Example
   syslog-agent.sources = Syslog
syslog-agent.channels = MemoryChannel-1
syslog-agent.sinks = Console

syslog-agent.sources.Syslog.type = syslogTcp
syslog-agent.sources.Syslog.port = 5140

syslog-agent.sources.Syslog.channels =
MemoryChannel-1
syslog-agent.sinks.Console.channel = MemoryChannel-1

syslog-agent.sinks.Console.type = logger
syslog-agent.channels.MemoryChannel-1.type = memory



13                                 ©2012
                     Cloudera, Inc. All Rights Reserved.
HDFS Configuration Example
   syslog-agent.sources = Syslog
syslog-agent.channels = MemoryChannel-1
syslog-agent.sinks = HDFS-LAB

syslog-agent.sources.Syslog.type = syslogTcp
syslog-agent.sources.Syslog.port = 5140

syslog-agent.sources.Syslog.channels = MemoryChannel-1
syslog-agent.sinks.HDFS-LAB.channel = MemoryChannel-1

syslog-agent.sinks.HDFS-LAB.type = hdfs

syslog-agent.sinks.HDFS-LAB.hdfs.path = hdfs://NN.URI:PORT/
flumetest/'%{host}''
syslog-agent.sinks.HDFS-LAB.hdfs.file.Prefix = syslogfiles
syslog-agent.sinks.HDFS-LAB.hdfs.file.rollInterval = 60
syslog-agent.sinks.HDFS-LAB.hdfs.file.Type = SequenceFile
syslog-agent.channels.MemoryChannel-1.type = memory


 14                                     ©2012
                          Cloudera, Inc. All Rights Reserved.
Features

 •   Fan out: One source, many channels
 •   Fan in: Many sources, one channel
 •   Processors (aka: decorators)
 •   Auto-batching of events in RPCs…
 •   Multiplexing channels for datamining
 •   Avro implementation in both ways



15                                 ©2012
                     Cloudera, Inc. All Rights Reserved.
Thank You

 • Web: https://cwiki.apache.org/FLUME/
   getting-started.html
 • ML: flume-user@incubator.apache.org

 • Mail: alexander@cloudera.com
 • Blog: mapredit.blogspot.com
 • Twitter: @mapredit


16                              ©2012
                  Cloudera, Inc. All Rights Reserved.

More Related Content

What's hot

Flume in 10minutes
Flume in 10minutesFlume in 10minutes
Flume in 10minutes
dwmclary
 
Apache flume
Apache flumeApache flume
Apache flume
Ramakrishna kapa
 
Filesystems, RPC and HDFS
Filesystems, RPC and HDFSFilesystems, RPC and HDFS
Filesystems, RPC and HDFS
Alexander Alten
 
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
DataWorks Summit
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
Arinto Murdopo
 
Apache Flume and its use case in Manufacturing
Apache Flume and its use case in ManufacturingApache Flume and its use case in Manufacturing
Apache Flume and its use case in Manufacturing
Rapheephan Thongkham-Uan
 
Apache flume by Swapnil Dubey
Apache flume by Swapnil DubeyApache flume by Swapnil Dubey
Apache flume by Swapnil DubeySwapnil Dubey
 
Centralized logging with Flume
Centralized logging with FlumeCentralized logging with Flume
Centralized logging with Flume
Ratnakar Pawar
 
Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexas
Arvind Prabhakar
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
GetInData
 
Flume intro-100715
Flume intro-100715Flume intro-100715
Flume intro-100715
Cloudera, Inc.
 
ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016Jayesh Thakrar
 
Large scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudLarge scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloud
DataWorks Summit
 
Query Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkQuery Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache Flink
StreamNative
 
Flume vs. kafka
Flume vs. kafkaFlume vs. kafka
Flume vs. kafka
Omid Vahdaty
 
Big data: Loading your data with flume and sqoop
Big data:  Loading your data with flume and sqoopBig data:  Loading your data with flume and sqoop
Big data: Loading your data with flume and sqoop
Christophe Marchal
 
Highlights Of Sqoop2
Highlights Of Sqoop2Highlights Of Sqoop2
Highlights Of Sqoop2
Alexander Alten
 
How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...
JinfengHuang3
 
A Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and ProcessingA Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and Processing
StreamNative
 

What's hot (20)

Flume in 10minutes
Flume in 10minutesFlume in 10minutes
Flume in 10minutes
 
Apache flume
Apache flumeApache flume
Apache flume
 
Flume and HBase
Flume and HBase Flume and HBase
Flume and HBase
 
Filesystems, RPC and HDFS
Filesystems, RPC and HDFSFilesystems, RPC and HDFS
Filesystems, RPC and HDFS
 
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Apache Flume and its use case in Manufacturing
Apache Flume and its use case in ManufacturingApache Flume and its use case in Manufacturing
Apache Flume and its use case in Manufacturing
 
Apache flume by Swapnil Dubey
Apache flume by Swapnil DubeyApache flume by Swapnil Dubey
Apache flume by Swapnil Dubey
 
Centralized logging with Flume
Centralized logging with FlumeCentralized logging with Flume
Centralized logging with Flume
 
Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexas
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Flume intro-100715
Flume intro-100715Flume intro-100715
Flume intro-100715
 
ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016
 
Large scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudLarge scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloud
 
Query Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkQuery Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache Flink
 
Flume vs. kafka
Flume vs. kafkaFlume vs. kafka
Flume vs. kafka
 
Big data: Loading your data with flume and sqoop
Big data:  Loading your data with flume and sqoopBig data:  Loading your data with flume and sqoop
Big data: Loading your data with flume and sqoop
 
Highlights Of Sqoop2
Highlights Of Sqoop2Highlights Of Sqoop2
Highlights Of Sqoop2
 
How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...
 
A Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and ProcessingA Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and Processing
 

Similar to Apache Flume (NG)

Spark+flume seattle
Spark+flume seattleSpark+flume seattle
Spark+flume seattle
Hari Shreedharan
 
Play Framework
Play FrameworkPlay Framework
Play Framework
Harinath Krishnamoorthy
 
Tornado Web Server Internals
Tornado Web Server InternalsTornado Web Server Internals
Tornado Web Server InternalsPraveen Gollakota
 
Oracle Enterprise Manager - EM12c R5 Hybrid Cloud Management
Oracle Enterprise Manager - EM12c R5 Hybrid Cloud ManagementOracle Enterprise Manager - EM12c R5 Hybrid Cloud Management
Oracle Enterprise Manager - EM12c R5 Hybrid Cloud Management
MarketingArrowECS_CZ
 
How we scale DroneCi on demand
How we scale DroneCi on demandHow we scale DroneCi on demand
How we scale DroneCi on demand
Patrick Jahns
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analytics
DataWorks Summit
 
Lessons learned in reaching multi-host container networking
Lessons learned in reaching multi-host container networkingLessons learned in reaching multi-host container networking
Lessons learned in reaching multi-host container networking
Tony Georgiev
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
Lucidworks (Archived)
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFi
Bryan Bende
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
DataWorks Summit
 
Automating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps ApproachAutomating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps ApproachAkshaya Mahapatra
 
Hands on with CoAP and Californium
Hands on with CoAP and CaliforniumHands on with CoAP and Californium
Hands on with CoAP and Californium
Julien Vermillard
 
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
Amazon Web Services
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and Compute
Globus
 
How to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experienceHow to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experience
Docker, Inc.
 
Play Framework: async I/O with Java and Scala
Play Framework: async I/O with Java and ScalaPlay Framework: async I/O with Java and Scala
Play Framework: async I/O with Java and Scala
Yevgeniy Brikman
 
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
Docker, Inc.
 
Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)
Eran Harel
 

Similar to Apache Flume (NG) (20)

Spark+flume seattle
Spark+flume seattleSpark+flume seattle
Spark+flume seattle
 
Play Framework
Play FrameworkPlay Framework
Play Framework
 
Tornado Web Server Internals
Tornado Web Server InternalsTornado Web Server Internals
Tornado Web Server Internals
 
Oracle Enterprise Manager - EM12c R5 Hybrid Cloud Management
Oracle Enterprise Manager - EM12c R5 Hybrid Cloud ManagementOracle Enterprise Manager - EM12c R5 Hybrid Cloud Management
Oracle Enterprise Manager - EM12c R5 Hybrid Cloud Management
 
How we scale DroneCi on demand
How we scale DroneCi on demandHow we scale DroneCi on demand
How we scale DroneCi on demand
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analytics
 
Lessons learned in reaching multi-host container networking
Lessons learned in reaching multi-host container networkingLessons learned in reaching multi-host container networking
Lessons learned in reaching multi-host container networking
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFi
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
 
signalr
signalrsignalr
signalr
 
Automating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps ApproachAutomating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps Approach
 
Hands on with CoAP and Californium
Hands on with CoAP and CaliforniumHands on with CoAP and Californium
Hands on with CoAP and Californium
 
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and Compute
 
How to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experienceHow to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experience
 
Weblogic
WeblogicWeblogic
Weblogic
 
Play Framework: async I/O with Java and Scala
Play Framework: async I/O with Java and ScalaPlay Framework: async I/O with Java and Scala
Play Framework: async I/O with Java and Scala
 
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
 
Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)
 

More from Alexander Alten

Is big data dead?
Is big data dead?Is big data dead?
Is big data dead?
Alexander Alten
 
Creating a value chain with IoT
Creating a value chain with IoTCreating a value chain with IoT
Creating a value chain with IoT
Alexander Alten
 
Big Data in an modern Enterprise
Big Data in an modern EnterpriseBig Data in an modern Enterprise
Big Data in an modern Enterprise
Alexander Alten
 
The Future of Energy
The Future of EnergyThe Future of Energy
The Future of Energy
Alexander Alten
 
Beyond Hadoop and MapReduce
Beyond Hadoop and MapReduceBeyond Hadoop and MapReduce
Beyond Hadoop and MapReduce
Alexander Alten
 
Sentry - An Introduction
Sentry - An Introduction Sentry - An Introduction
Sentry - An Introduction
Alexander Alten
 
Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013
Alexander Alten
 
Bi with apache hadoop(en)
Bi with apache hadoop(en)Bi with apache hadoop(en)
Bi with apache hadoop(en)
Alexander Alten
 
BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)Alexander Alten
 
Big Data mit Apache Hadoop
Big Data mit Apache HadoopBig Data mit Apache Hadoop
Big Data mit Apache HadoopAlexander Alten
 

More from Alexander Alten (10)

Is big data dead?
Is big data dead?Is big data dead?
Is big data dead?
 
Creating a value chain with IoT
Creating a value chain with IoTCreating a value chain with IoT
Creating a value chain with IoT
 
Big Data in an modern Enterprise
Big Data in an modern EnterpriseBig Data in an modern Enterprise
Big Data in an modern Enterprise
 
The Future of Energy
The Future of EnergyThe Future of Energy
The Future of Energy
 
Beyond Hadoop and MapReduce
Beyond Hadoop and MapReduceBeyond Hadoop and MapReduce
Beyond Hadoop and MapReduce
 
Sentry - An Introduction
Sentry - An Introduction Sentry - An Introduction
Sentry - An Introduction
 
Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013
 
Bi with apache hadoop(en)
Bi with apache hadoop(en)Bi with apache hadoop(en)
Bi with apache hadoop(en)
 
BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)
 
Big Data mit Apache Hadoop
Big Data mit Apache HadoopBig Data mit Apache Hadoop
Big Data mit Apache Hadoop
 

Recently uploaded

De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 

Recently uploaded (20)

De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 

Apache Flume (NG)

  • 1. March 2012 Apache Flume (NG) Alexander Lorenz | Customer Operations Engineer
  • 2. Overview • Stream data (events, not files) from clients to sinks • Clients: files, syslog, avro, … • Sinks: HDFS files, HBase, … • Configurable reliability levels – Best effort: “Fast and loose” – Guaranteed delivery: “Deliver no matter what” • Configurable routing / topology 2 ©2012 Cloudera, Inc. All Rights Reserved.
  • 3. Architecture Component Function Agent The JVM running Flume. One per machine. Runs many sources and sinks. Client Produces data in the form of events. Runs in a separate thread. Sink Receives events from a channel. Runs in a separate thread. Channel Connects sources to sinks (like a queue). Implements the reliability semantics. Event A single datum; a log record, an avro object, etc. Normally around ~4KB. 3 ©2012 Cloudera, Inc. All Rights Reserved.
  • 4. Agent • Runs many clients and sinks • Java properties-based configuration • Low overhead (-Xmx20m) – But adding RAM increases performance 4 ©2012 Cloudera, Inc. All Rights Reserved.
  • 5. Sources • Plugin interface • Managed by a SourceRunner that controls threading and execution model (e.g. polling vs. event-based) • Included: exec, avro, syslog, … 5 ©2012 Cloudera, Inc. All Rights Reserved.
  • 6. Sources public class MySource implements PollableSource { public Status process() { // Do something to create an Event.. Event e = EventBuilder.withBody(…).build(); // A channel instance is injected by Flume. Transaction tx = channel.getTransaction(); tx.begin(); try { channel.put(e); tx.commit(); } catch (ChannelException ex) { tx.rollback(); return Status.BACKOFF; } finally { tx.close(); } return Status.READY; } } 6 ©2012 Cloudera, Inc. All Rights Reserved.
  • 7. Channel • Plugin interface • Transactional • Provide queuing between source / sink • Provide reliability semantics – MemoryChannel: Basically a Java BlockingQueue. – JDBC – WAL 7 ©2012 Cloudera, Inc. All Rights Reserved.
  • 8. Sinks • Plugin interface • Managed by a SinkRunner that controls threading and execution model • Included: HDFS files (various formats) 8 ©2012 Cloudera, Inc. All Rights Reserved.
  • 9. Sinks Code public class MySink implements PollableSink { public Status process() { Transaction tx = channel.getTransaction(); tx.begin(); try { Event e = channel.take(); if (e != null) { // … tx.commit(); } else { return Status.BACKOFF; } } catch (ChannelException ex) { tx.rollback(); return Status.BACKOFF; } finally { tx.close(); } return Status.READY; } } 9 ©2012 Cloudera, Inc. All Rights Reserved.
  • 10. Tiered collection • Send events from agents to another tier of agents to aggregate • Use an Avro sink (really just a client) to send events to an Avro source (really just a server) in another machine • Failover supported • Load balancing (soon) • Transactions guarantee handoff 10 ©2012 Cloudera, Inc. All Rights Reserved.
  • 11. Tiered collection – The handoff • Agent 1: Tx begin • Agent 1: Channel take event • Agent 1: Sink send • Agent 2: Tx begin • Agent 2: Channel put • Agent 2: Tx commit, respond OK • Agent 1: Tx commit (or rollback) 11 ©2012 Cloudera, Inc. All Rights Reserved.
  • 12. Configuration • done in a single file • identifier for flows • <identifier>.type.subtype.parameter.config, where <identifier> is the name of the agent • flow has a client, type, channel, sink • can have multiple channels in one flow 12 ©2012 Cloudera, Inc. All Rights Reserved.
  • 13. Simple Configuration Example syslog-agent.sources = Syslog syslog-agent.channels = MemoryChannel-1 syslog-agent.sinks = Console syslog-agent.sources.Syslog.type = syslogTcp syslog-agent.sources.Syslog.port = 5140 syslog-agent.sources.Syslog.channels = MemoryChannel-1 syslog-agent.sinks.Console.channel = MemoryChannel-1 syslog-agent.sinks.Console.type = logger syslog-agent.channels.MemoryChannel-1.type = memory 13 ©2012 Cloudera, Inc. All Rights Reserved.
  • 14. HDFS Configuration Example syslog-agent.sources = Syslog syslog-agent.channels = MemoryChannel-1 syslog-agent.sinks = HDFS-LAB syslog-agent.sources.Syslog.type = syslogTcp syslog-agent.sources.Syslog.port = 5140 syslog-agent.sources.Syslog.channels = MemoryChannel-1 syslog-agent.sinks.HDFS-LAB.channel = MemoryChannel-1 syslog-agent.sinks.HDFS-LAB.type = hdfs syslog-agent.sinks.HDFS-LAB.hdfs.path = hdfs://NN.URI:PORT/ flumetest/'%{host}'' syslog-agent.sinks.HDFS-LAB.hdfs.file.Prefix = syslogfiles syslog-agent.sinks.HDFS-LAB.hdfs.file.rollInterval = 60 syslog-agent.sinks.HDFS-LAB.hdfs.file.Type = SequenceFile syslog-agent.channels.MemoryChannel-1.type = memory 14 ©2012 Cloudera, Inc. All Rights Reserved.
  • 15. Features • Fan out: One source, many channels • Fan in: Many sources, one channel • Processors (aka: decorators) • Auto-batching of events in RPCs… • Multiplexing channels for datamining • Avro implementation in both ways 15 ©2012 Cloudera, Inc. All Rights Reserved.
  • 16. Thank You • Web: https://cwiki.apache.org/FLUME/ getting-started.html • ML: flume-user@incubator.apache.org • Mail: alexander@cloudera.com • Blog: mapredit.blogspot.com • Twitter: @mapredit 16 ©2012 Cloudera, Inc. All Rights Reserved.