Apache Flume
Brief description of Apache Flume 0.9.x for EEDC assignment


Usage Rights

© All Rights Reserved

Apache Flume: Presentation Transcript

  • Arinto Murdopo, Josep Subirats (Group 4), EEDC 2012
  • Outline
    ● Current problem
    ● What is Apache Flume?
    ● The Flume Model
      ○ Flows and Nodes
      ○ Agent, Processor and Collector Nodes
      ○ Data and Control Path
    ● Flume goals
      ○ Reliability
      ○ Scalability
      ○ Extensibility
      ○ Manageability
    ● Use case: Near Realtime Aggregator
  • Current Problem
    ● Situation: You have hundreds of services running on different servers that produce lots of large logs, which should be analyzed together. You have Hadoop to process them.
    ● Problem: How do I send all my logs to a place that has Hadoop? I need a reliable, scalable, extensible and manageable way to do it!
  • What is Apache Flume?
    ● It is a distributed data collection service that gets flows of data (like logs) from their source and aggregates them to where they have to be processed.
    ● Goals: reliability, scalability, extensibility, manageability. Exactly what I needed!
  • The Flume Model: Flows and Nodes
    ● A flow corresponds to a type of data source (server logs, machine monitoring metrics...).
    ● Flows are composed of nodes chained together (see slide 7).
  • The Flume Model: Flows and Nodes
    ● In a Node, data come in through a source...
      Examples: Console, Exec, Syslog, IRC, Twitter, other nodes...
    ● ...are optionally processed by one or more decorators...
      Examples: wire batching, compression, sampling, projection, extraction...
    ● ...and then are transmitted out via a sink.
      Examples: Console, local files, HDFS, S3, other nodes...
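The source, decorator and sink chain described on this slide can be illustrated with a small sketch. This is not Flume code (Flume itself is written in Java); the names `list_source`, `sampling_decorator`, `tag_decorator`, `ListSink` and `run_node` are hypothetical and exist only to show how events might pass from a source, through optional decorators, into a sink.

```python
from typing import Callable, Iterable, Iterator, List

Event = str  # assumption: one event is one log line
Decorator = Callable[[Iterator[Event]], Iterator[Event]]


def list_source(lines: Iterable[Event]) -> Iterator[Event]:
    """Hypothetical source: yields events from an in-memory collection."""
    for line in lines:
        yield line


def sampling_decorator(every: int) -> Decorator:
    """Hypothetical decorator: keeps one event out of every `every`."""
    def apply(events: Iterator[Event]) -> Iterator[Event]:
        for index, event in enumerate(events):
            if index % every == 0:
                yield event
    return apply


def tag_decorator(tag: str) -> Decorator:
    """Hypothetical decorator: prefixes each event with a tag."""
    def apply(events: Iterator[Event]) -> Iterator[Event]:
        for event in events:
            yield f"{tag}:{event}"
    return apply


class ListSink:
    """Hypothetical sink: stores delivered events in a list."""
    def __init__(self) -> None:
        self.received: List[Event] = []

    def append(self, event: Event) -> None:
        self.received.append(event)


def run_node(source: Iterator[Event], decorators: List[Decorator], sink: ListSink) -> None:
    """Chain source -> decorators (in order) -> sink."""
    stream = source
    for decorator in decorators:
        stream = decorator(stream)
    for event in stream:
        sink.append(event)
```

Calling `run_node(list_source(lines), [sampling_decorator(2), tag_decorator("web")], sink)` would sample every second line, tag it, and hand it to the sink; passing an empty decorator list corresponds to the "optionally processed" case.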
  • The Flume Model: Agent, Processor and Collector Nodes
    ● Agent: receives data from an application.
    ● Processor (optional): performs intermediate processing.
    ● Collector: writes data to permanent storage.
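A minimal sketch of the three node roles, assuming a simple push model in which each node forwards events to the next one. The classes `Agent`, `Processor` and `Collector` below are hypothetical stand-ins rather than Flume classes, and "permanent storage" is simulated with a list.

```python
from typing import Callable, List, Tuple, Union

Event = Tuple[str, str]  # assumption: (host name, log line)


class Collector:
    """Writes every received event to (simulated) permanent storage."""
    def __init__(self) -> None:
        self.storage: List[Event] = []

    def receive(self, event: Event) -> None:
        self.storage.append(event)


class Processor:
    """Optional intermediate step: forwards only the events that pass `keep`."""
    def __init__(self, keep: Callable[[Event], bool], downstream: Collector) -> None:
        self.keep = keep
        self.downstream = downstream

    def receive(self, event: Event) -> None:
        if self.keep(event):
            self.downstream.receive(event)


class Agent:
    """Receives data from an application and pushes it downstream."""
    def __init__(self, host: str, downstream: Union[Processor, Collector]) -> None:
        self.host = host
        self.downstream = downstream

    def log(self, line: str) -> None:
        self.downstream.receive((self.host, line))
```

Because the processor is optional, an agent can be wired either to a `Processor` or directly to a `Collector`; several agents sharing one collector gives the aggregation described on the earlier slides.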
  • The Flume Model: Data and Control Path (1/2)
    Nodes are in the data path.
  • The Flume Model: Data and Control Path (2/2)
    Masters are in the control path.
    ● Centralized point of configuration; multiple masters coordinate via ZooKeeper (ZK).
    ● They specify sources and sinks and control data flows.
  • Flume Goals: Reliability
    Tunable Failure Recovery Modes
    ● Best Effort
    ● Store on Failure and Retry
    ● End to End Reliability
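The difference between the first two recovery modes can be sketched as follows. This is an illustration under simplifying assumptions, not Flume's implementation: `FlakyDownstream` is a hypothetical node that is unreachable for its first few sends, the "store" is an in-memory queue rather than local disk, and the end-to-end mode (which would additionally wait for an acknowledgement from the final destination) is left out.

```python
from collections import deque
from typing import Deque, Iterable, List


class FlakyDownstream:
    """Hypothetical downstream node that fails its first `down_for` sends."""
    def __init__(self, down_for: int) -> None:
        self.down_for = down_for
        self.delivered: List[str] = []

    def send(self, event: str) -> None:
        if self.down_for > 0:
            self.down_for -= 1
            raise ConnectionError("downstream unavailable")
        self.delivered.append(event)


def best_effort(events: Iterable[str], downstream: FlakyDownstream) -> None:
    """Send each event once; events that fail are dropped."""
    for event in events:
        try:
            downstream.send(event)
        except ConnectionError:
            pass  # best effort: the event is lost


def store_on_failure_and_retry(
    events: Iterable[str], downstream: FlakyDownstream, max_rounds: int = 10
) -> List[str]:
    """Buffer failed events and retry them; return anything still undelivered."""
    pending: Deque[str] = deque()
    for event in events:
        try:
            downstream.send(event)
        except ConnectionError:
            pending.append(event)  # stored locally instead of dropped
    rounds = 0
    while pending and rounds < max_rounds:
        rounds += 1
        for _ in range(len(pending)):
            event = pending.popleft()
            try:
                downstream.send(event)
            except ConnectionError:
                pending.append(event)
    return list(pending)
```

In this sketch, retried events arrive after the events that succeeded on the first attempt, so delivery order is not preserved; that is the price paid for not losing data.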
  • Flume Goals: Scalability
    ● Horizontally Scalable Data Path
    ● Load Balancing
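One simple way to picture load balancing on the data path is to spread agents across collectors in round-robin order, so that adding a collector reduces the share each one handles. The slide does not say which strategy Flume uses; the function `assign_agents` and the node names below are hypothetical.

```python
from itertools import cycle
from typing import Dict, List


def assign_agents(agents: List[str], collectors: List[str]) -> Dict[str, str]:
    """Assign each agent to a collector in round-robin order."""
    if not collectors:
        raise ValueError("at least one collector is required")
    ring = cycle(collectors)
    return {agent: next(ring) for agent in agents}
```

With five agents and two collectors, the first collector would receive three agents and the second two; adding a third collector and recomputing the assignment evens out the load further.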
  • Flume Goals: Scalability
    ● Horizontally Scalable Control Path
  • Flume Goals: Extensibility
    ● Simple Source and Sink API
      ○ Event streaming and composition of simple operations
    ● Plug-in Architecture
      ○ Add your own sources, sinks, decorators
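The plug-in idea can be sketched as a registry that maps a name to a factory, so that new sinks can be added without touching the code that builds a node. Flume's real extension API is in Java and is not reproduced here; `SINK_REGISTRY`, `register_sink`, `memory_sink` and `build_sink` are hypothetical names used only for this Python analogy.

```python
from typing import Callable, Dict, List

Sink = Callable[[str], None]

# Hypothetical registry: sink name -> factory that builds a sink.
SINK_REGISTRY: Dict[str, Callable[..., Sink]] = {}


def register_sink(name: str) -> Callable[[Callable[..., Sink]], Callable[..., Sink]]:
    """Decorator that registers a sink factory under `name`."""
    def wrap(factory: Callable[..., Sink]) -> Callable[..., Sink]:
        SINK_REGISTRY[name] = factory
        return factory
    return wrap


@register_sink("memory")
def memory_sink(buffer: List[str]) -> Sink:
    """Example plug-in: a sink that appends events to a caller-supplied list."""
    def sink(event: str) -> None:
        buffer.append(event)
    return sink


def build_sink(name: str, *args: object) -> Sink:
    """Look up a registered factory by name and build the sink."""
    return SINK_REGISTRY[name](*args)
```

A configuration that refers to a sink by name, such as `build_sink("memory", buffer)`, then works for any plug-in registered the same way.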
  • Flume Goals: Manageability
    Centralized Data Flow Management Interface
  • Flume Goals: Manageability
    Configuring Flume
      Node: tail("file") | filter [ console, roll (1000) { dfs("hdfs://namenode/user/flume") } ] ;
    Output Bucketing
      /logs/web/2010/0715/1200/data-xxx.txt
      /logs/web/2010/0715/1200/data-xxy.txt
      /logs/web/2010/0715/1300/data-xxx.txt
      /logs/web/2010/0715/1300/data-xxy.txt
      /logs/web/2010/0715/1400/data-xxx.txt
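The output bucketing paths on this slide appear to follow a year / month-day / hour layout. The sketch below reproduces that layout from a timestamp, assuming the pattern `/logs/web/YYYY/MMDD/HH00/data-<suffix>.txt` inferred from the examples; the function `bucket_path` and its parameters are hypothetical and this is not Flume's own path-escaping syntax.

```python
from datetime import datetime


def bucket_path(ts: datetime, suffix: str, root: str = "/logs/web") -> str:
    """Build an hourly bucket path such as /logs/web/2010/0715/1200/data-xxx.txt."""
    # %H is the zero-padded hour; the literal "00" rounds the bucket down to the hour.
    return f"{root}/{ts.strftime('%Y/%m%d/%H00')}/data-{suffix}.txt"
```

Events timestamped anywhere between 12:00 and 12:59 on 15 July 2010 would land in the same `1200` directory, which keeps each hour's data together for later batch processing.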
  • Use Case: Near Realtime Aggregator
  • Conclusion
    Flume is:
    ● A distributed data collection service
    ● Suitable for enterprise settings
    ● Suited to large amounts of log data to process
  • Q&A: Questions to be unveiled?
  • References
    ● http://www.cloudera.com/resource/chicago_data_summit_flume_an_introduction_jonathan_hsieh_hadoop_log_processing/
    ● http://www.slideshare.net/cloudera/inside-flume
    ● http://www.slideshare.net/cloudera/flume-intro100715
    ● http://www.slideshare.net/cloudera/flume-austin-hug-21711