Flume office-hours-110228
 


This is the agenda for the Flume office hours held on 2/28/2011 at Cloudera HQ.


    • Flume Office Hours
      Community planning
      Jonathan Hsieh
      Cloudera HQ, 2/28/2011
    • Outline
      State of the world
      What’s new?
      Stories (Chime in!)
      What needs work?
      Prioritizing what is next.
      Q+A
      Flume Office Hours, 2/28/2011
    • State of the world
    • Growing user and developer community
      Github stats:
      Currently 295 watchers, 51 forks
      New Committers:
      9/10: Eric Sammer (Cloudera)
      1/11: Bruce Mitchener (Independent)
      User characteristics
      Most potential users seem to use ad hoc scripts
      Most users are early adopters / startup devops
    • A short feature history
      6/10: v0.9.0
      Initial open source release
      8/10: v0.9.1
      Fixes for hangs
      Initial compression features
      10/10: v0.9.1+29 (CDH3b3, packages)
      Added kerberized HDFS support
      Flume cookbook
      Elastic Search / Cassandra Plugins
      Initial Voldemort plugins
      11/10: v0.9.2
      Support for other compression codecs
      Avro RPC
      Improvements to tail and exec
      Robustness improvements
      Initial HBase / MongoDB plugins
      2/11: v0.9.3 (CDH3b4, packages)
      Flume Node Windows support
      Initial JSON metrics support
      Multi-master functional
      Robustness improvements
      JRuby / AMQP Plugins
      S3/EC2 Blog Stories
      4/11: v0.9.3+xxx (CDH3 Stable, packages)
      Excessive Duplication fixes
      Compression fixes
      ?/11: v0.9.4
    • What’s new?
    • New features
      Flume node JSON metrics
      http://node:35862/node/reports
      Terser syntax
      { deco1 => { deco2 => sink } }
      deco1 deco2 sink
      Multiple collector sink support
      collector(30000) { [ escapedCustomDfs("hdfs://nn1/path", "prefix", "format"),
                           escapedCustomDfs("hdfs://nn2/path", "prefix", "format") ] }
      Limited Multi-master support
      Windows support
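
The two lines under "Terser syntax" read as the same decorator chain written first in the older nested-brace form and then in the newer space-separated form. The correspondence can be sketched in Python, assuming a plain linear chain of decorators ending in a single sink. The helper names `nested_form` and `terse_form` are hypothetical; they only build strings and are not part of Flume.

```python
def nested_form(decorators, sink):
    """Render the older brace form, e.g. { deco1 => { deco2 => sink } }."""
    spec = sink
    # Wrap from the innermost decorator outward.
    for deco in reversed(decorators):
        spec = "{ " + deco + " => " + spec + " }"
    return spec


def terse_form(decorators, sink):
    """Render the terser form, e.g. deco1 deco2 sink."""
    return " ".join(list(decorators) + [sink])


if __name__ == "__main__":
    chain = ["deco1", "deco2"]  # placeholder names taken from the slide
    print(nested_form(chain, "sink"))
    print(terse_form(chain, "sink"))
```

Both helpers take the decorators in outermost-first order, so the same list produces either spelling of the chain.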
    • Stories
    • The Standard Use Case
      [Diagram. Labels: Flume Master; an Agent on each server (agent tier); Collectors (collector tier); HDFS]
    • Multi Datacenter
      [Diagram. Labels: Agents on API servers ("api") and on processor servers ("proc"); Collectors (collector tier); HDFS]
    • Multi Datacenter
      [Diagram. Same labels as the previous slide, plus a Relay: Agents on API servers ("api") and on processor servers ("proc"); Collectors (collector tier); Relay; HDFS]
    • Near Realtime Aggregator
      [Diagram. Labels: Flume; an Agent on each ad server ("Ad svr"); Collector; Tracker; DB (quick reports); HDFS; Hive job (verify reports)]
    • An enterprise story
      [Diagram. Labels: Flume; Agents on API servers ("api"), marked Win and Linux; Collectors (collector tier); Kerberos HDFS; Active Directory / LDAP; six boxes labeled "D"]
    • An emerging community story
      [Diagram. Labels: Flume; Agents on servers ("svr"); Collector; Fanout (index, hbase, hdfs); HDFS; HBase; Incremental Search Idx; Hive query; Pig query; Key lookup; Range query; Search query; Faceted query]
    • What needs work? What comes next?
    • Known issues
      Excessive event duplication (due to tail or e2e agent)
      Configuration translation problem in some cases
      Multi-master limited: doesn’t work with translations
    • What’s next? (proposals)
      Fix excessive duplication issues.
      Apache Incubator (?)
      Log4j/Log4net/logback/etc…
      Fix Multi-master limitations.
      Security upgrades for node to node comms (TLS/SSL)
      Improved metrics / GUI / usability
      Integration with open source alerting/monitoring tools
      Integration with proprietary systems
      Version proofing RPCs / State storage
      Packaging friendly plug-in install
      Multi Datacenter Story
      Performance Increases
      Inline near-realtime analytics
      Puppet/Chef style config for nodes
      Lightweight Agent
      Masterless Agent
      Better S3 / AWS support
    • Q+A