• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Apache Flume (NG)
 

Apache Flume (NG)

on

  • 10,550 views

Apache FlumeNG presentation, held in Stuttgart. Includes coding and configuration examples

Apache FlumeNG presentation, held in Stuttgart. Includes coding and configuration examples

Statistics

Views

Total Views
10,550
Views on SlideShare
10,191
Embed Views
359

Actions

Likes
13
Downloads
252
Comments
0

25 Embeds 359

http://tedwon.com 112
http://mapredit.blogspot.de 74
http://mapredit.blogspot.com 70
http://mapredit.blogspot.in 38
http://mapredit.blogspot.fr 12
http://cluster1.cafe.daum.net 9
http://www.docseek.net 5
http://mapredit.blogspot.kr 5
http://mapredit.blogspot.co.uk 5
http://cafe.daum.net 4
http://www.linkedin.com 4
http://mapredit.blogspot.tw 3
http://mapredit.blogspot.co.il 2
http://mapredit.blogspot.com.au 2
http://mapredit.blogspot.ca 2
http://mapredit.blogspot.ru 2
http://mapredit.blogspot.nl 2
http://mapredit.blogspot.it 1
https://twitter.com 1
http://www.slashdocs.com 1
http://mapredit.blogspot.hk 1
http://mapredit.blogspot.gr 1
http://mapredit.blogspot.co.at 1
http://mapredit.blogspot.se 1
http://mapredit.blogspot.ch 1
More...

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Apache Flume (NG) Apache Flume (NG) Presentation Transcript

    • March 2012Apache Flume (NG)Alexander Lorenz | Customer Operations Engineer
    • Overview • Stream data (events, not files) from clients to sinks • Clients: files, syslog, avro, … • Sinks: HDFS files, HBase, … • Configurable reliability levels – Best effort: “Fast and loose” – Guaranteed delivery: “Deliver no matter what” • Configurable routing / topology2 ©2012 Cloudera, Inc. All Rights Reserved.
    • Architecture Component Function Agent The JVM running Flume. One per machine. Runs many sources and sinks. Client Produces data in the form of events. Runs in a separate thread. Sink Receives events from a channel. Runs in a separate thread. Channel Connects sources to sinks (like a queue). Implements the reliability semantics. Event A single datum; a log record, an avro object, etc. Normally around ~4KB.3 ©2012 Cloudera, Inc. All Rights Reserved.
    • Agent • Runs many clients and sinks • Java properties-based configuration • Low overhead (-Xmx20m) – But adding RAM increases performance4 ©2012 Cloudera, Inc. All Rights Reserved.
    • Sources • Plugin interface • Managed by a SourceRunner that controls threading and execution model (e.g. polling vs. event-based) • Included: exec, avro, syslog, …5 ©2012 Cloudera, Inc. All Rights Reserved.
    • Sources public class MySource implements PollableSource { public Status process() { // Do something to create an Event.. Event e = EventBuilder.withBody(…).build(); // A channel instance is injected by Flume. Transaction tx = channel.getTransaction(); tx.begin(); try { channel.put(e); tx.commit(); } catch (ChannelException ex) { tx.rollback(); return Status.BACKOFF; } finally { tx.close(); } return Status.READY; } }6 ©2012 Cloudera, Inc. All Rights Reserved.
    • Channel • Plugin interface • Transactional • Provide queuing between source / sink • Provide reliability semantics – MemoryChannel: Basically a Java BlockingQueue. – JDBC – WAL7 ©2012 Cloudera, Inc. All Rights Reserved.
    • Sinks • Plugin interface • Managed by a SinkRunner that controls threading and execution model • Included: HDFS files (various formats)8 ©2012 Cloudera, Inc. All Rights Reserved.
    • Sinks Code public class MySink implements PollableSink { public Status process() { Transaction tx = channel.getTransaction(); tx.begin(); try { Event e = channel.take(); if (e != null) { // … tx.commit(); } else { return Status.BACKOFF; } } catch (ChannelException ex) { tx.rollback(); return Status.BACKOFF; } finally { tx.close(); } return Status.READY; } }9 ©2012 Cloudera, Inc. All Rights Reserved.
    • Tiered collection • Send events from agents to another tier of agents to aggregate • Use an Avro sink (really just a client) to send events to an Avro source (really just a server) in another machine • Failover supported • Load balancing (soon) • Transactions guarantee handoff10 ©2012 Cloudera, Inc. All Rights Reserved.
    • Tiered collection – The handoff • Agent 1: Tx begin • Agent 1: Channel take event • Agent 1: Sink send • Agent 2: Tx begin • Agent 2: Channel put • Agent 2: Tx commit, respond OK • Agent 1: Tx commit (or rollback)11 ©2012 Cloudera, Inc. All Rights Reserved.
    • Configuration • done in a single file • identifier for flows • <identifier>.type.subtype.parameter.config, where <identifier> is the name of the agent • flow has a client, type, channel, sink • can have multiple channels in one flow12 ©2012 Cloudera, Inc. All Rights Reserved.
    • Simple Configuration Example syslog-agent.sources = Syslogsyslog-agent.channels = MemoryChannel-1syslog-agent.sinks = Consolesyslog-agent.sources.Syslog.type = syslogTcpsyslog-agent.sources.Syslog.port = 5140syslog-agent.sources.Syslog.channels =MemoryChannel-1syslog-agent.sinks.Console.channel = MemoryChannel-1syslog-agent.sinks.Console.type = loggersyslog-agent.channels.MemoryChannel-1.type = memory13 ©2012 Cloudera, Inc. All Rights Reserved.
    • HDFS Configuration Example syslog-agent.sources = Syslogsyslog-agent.channels = MemoryChannel-1syslog-agent.sinks = HDFS-LABsyslog-agent.sources.Syslog.type = syslogTcpsyslog-agent.sources.Syslog.port = 5140syslog-agent.sources.Syslog.channels = MemoryChannel-1syslog-agent.sinks.HDFS-LAB.channel = MemoryChannel-1syslog-agent.sinks.HDFS-LAB.type = hdfssyslog-agent.sinks.HDFS-LAB.hdfs.path = hdfs://NN.URI:PORT/flumetest/%{host}syslog-agent.sinks.HDFS-LAB.hdfs.file.Prefix = syslogfilessyslog-agent.sinks.HDFS-LAB.hdfs.file.rollInterval = 60syslog-agent.sinks.HDFS-LAB.hdfs.file.Type = SequenceFilesyslog-agent.channels.MemoryChannel-1.type = memory 14 ©2012 Cloudera, Inc. All Rights Reserved.
    • Features • Fan out: One source, many channels • Fan in: Many sources, one channel • Processors (aka: decorators) • Auto-batching of events in RPCs… • Multiplexing channels for datamining • Avro implementation in both ways15 ©2012 Cloudera, Inc. All Rights Reserved.
    • Thank You • Web: https://cwiki.apache.org/FLUME/ getting-started.html • ML: flume-user@incubator.apache.org • Mail: alexander@cloudera.com • Blog: mapredit.blogspot.com • Twitter: @mapredit16 ©2012 Cloudera, Inc. All Rights Reserved.