Flume office-hours-110228
 


This is the agenda for the Flume office hours held on 2/28/2011 at Cloudera HQ.


Presentation Transcript

  • Flume Office Hours
    Community planning
    Jonathan Hsieh
    Cloudera HQ, 2/28/2011
  • Outline
    State of the world
    What’s new?
    Stories (Chime in!)
    What needs work?
    Prioritizing what is next.
    Q+A
    Flume Office Hours, 2/28/2011
  • State of the world
  • Growing user and developer community
    GitHub stats:
      Currently 295 watchers, 51 forks
    New committers:
      9/10: Eric Sammer (Cloudera)
      1/11: Bruce Mitchener (Independent)
    User characteristics:
      Most potential users seem to use ad hoc scripts
      Most users are early adopters / startup devops
  • A short feature history
    6/10: v0.9.0
      Initial open source release
    8/10: v0.9.1
      Fixes for hangs
      Initial compression features
    10/10: v0.9.1+29 (CDH3b3, packages)
      Added kerberized HDFS support
      Flume cookbook
      Elastic Search / Cassandra plugins
      Initial Voldemort plugins
    11/10: v0.9.2
      Support for other compression codecs
      Avro RPC
      Improvements to tail and exec
      Robustness improvements
      Initial HBase / MongoDB plugins
    2/11: v0.9.3 (CDH3b4, packages)
      Flume Node Windows support
      Initial JSON metrics support
      Multi-master functional
      Robustness improvements
      JRuby / AMQP plugins
      S3/EC2 blog stories
    4/11: v0.9.3+xxx (CDH3 Stable, packages)
      Excessive duplication fixes
      Compression fixes
    ?/11: v0.9.4
  • What’s new?
  • New features
    Flume node JSON metrics
      http://node:35862/node/reports
    Terser syntax
      Old: { deco1 => { deco2 => sink } }
      New: deco1 deco2 sink
    Multiple collector sink support
      collector(30000) { [
        escapedCustomDfs("hdfs://nn1/path","prefix","format"),
        escapedCustomDfs("hdfs://nn2/path","prefix","format")
      ] }
    Limited Multi-master support
    Windows support
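
The terser decorator syntax and the multiple-collector sink on this slide can be illustrated with a small spec generator. This is only a sketch: the helper names `nested_spec`, `terse_spec`, and `multi_collector_sink` are hypothetical, `"prefix"` and `"format"` are the slide's own placeholders, and the first argument to `collector(...)` is passed through as shown on the slide (assumed here to be a roll interval in milliseconds).

```python
from typing import List


def nested_spec(decorators: List[str], sink: str) -> str:
    """Render the nested form, e.g. { deco1 => { deco2 => sink } }."""
    spec = sink
    # Wrap the sink starting from the innermost decorator and working outwards.
    for deco in reversed(decorators):
        spec = "{ " + deco + " => " + spec + " }"
    return spec


def terse_spec(decorators: List[str], sink: str) -> str:
    """Render the terser form, e.g. deco1 deco2 sink."""
    return " ".join(decorators + [sink])


def multi_collector_sink(roll_arg: int, hdfs_dirs: List[str],
                         prefix: str = "prefix", fmt: str = "format") -> str:
    """Render a collector that fans out to one escapedCustomDfs per HDFS path.

    roll_arg is copied from the slide (30000); treating it as a roll
    interval in milliseconds is an assumption.
    """
    sinks = ", ".join(
        'escapedCustomDfs("{}","{}","{}")'.format(d, prefix, fmt)
        for d in hdfs_dirs
    )
    return "collector({}) {{ [ {} ] }}".format(roll_arg, sinks)


if __name__ == "__main__":
    print(nested_spec(["deco1", "deco2"], "sink"))
    print(terse_spec(["deco1", "deco2"], "sink"))
    print(multi_collector_sink(30000, ["hdfs://nn1/path", "hdfs://nn2/path"]))
```

Both renderings describe the same decorator chain; the terse form simply drops the braces and arrows.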
  • Stories
  • The Standard Use Case
    [Diagram: a Flume Master; an agent tier with an Agent on each server; a collector tier of Collectors; HDFS as the destination.]
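
The agent tier / collector tier layout can be sketched as generated node configurations. This is an illustration under stated assumptions, not a configuration from the talk: the node names, log path, and HDFS directory are hypothetical, the round-robin assignment of agents to collectors is just one possible policy, and the `tail`, `agentSink`, `collectorSource`, and `collectorSink` names and the port 35853 are as recalled from the Flume 0.9.x user guide and should be checked against your version.

```python
from typing import List


def tiered_config(agents: List[str], collectors: List[str],
                  log_path: str, hdfs_dir: str, port: int = 35853) -> List[str]:
    """Return one 'node : source | sink ;' line per agent and per collector."""
    lines = []
    for i, agent in enumerate(agents):
        # Spread agents across collectors round-robin (an assumed policy).
        collector = collectors[i % len(collectors)]
        lines.append('{} : tail("{}") | agentSink("{}", {}) ;'.format(
            agent, log_path, collector, port))
    for collector in collectors:
        # Each collector listens on the port and writes to HDFS with its own file prefix.
        lines.append('{} : collectorSource({}) | collectorSink("{}", "{}-") ;'.format(
            collector, port, hdfs_dir, collector))
    return lines


if __name__ == "__main__":
    for line in tiered_config(["agent1", "agent2", "agent3"],
                              ["collector1", "collector2"],
                              "/var/log/app.log", "hdfs://namenode/flume/"):
        print(line)
```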
  • Multi Datacenter
    [Diagram: Agents on API servers ("api") and processor servers ("proc"); a collector tier of Collectors; HDFS.]
  • Multi Datacenter
    [Diagram: Agents on API servers ("api") and processor servers ("proc"); a Relay; a collector tier of Collectors; HDFS.]
  • Near Realtime Aggregator
    [Diagram: Flume Agents on ad servers ("Ad svr"), a Collector, and a Tracker; destinations are HDFS and a DB. Other labels: quick reports, Hive job, verify reports.]
  • An enterprise story
    [Diagram: Flume Agents on Windows and Linux API servers; a collector tier of Collectors; Kerberos HDFS; Active Directory / LDAP.]
  • An emerging community story
    [Diagram: Flume Agents send to a Collector that fans out to an incremental search index, HBase, and HDFS. Query labels: Hive query, Pig query, key lookup, range query, search query, faceted query.]
  • What needs work? What comes next?
  • Known issues
    Excessive event duplication (due to tail or e2e agent)
    Configuration translation problem in some cases
    Multi-master limited: doesn’t work with translations
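
As background for the duplication issue, the sketch below shows in general terms why acknowledge-and-retry delivery can produce duplicates, and how a downstream consumer might filter them. It is not Flume's implementation: the `send_with_retry` and `dedupe` helpers and the per-event id are hypothetical and exist only for illustration.

```python
from typing import Iterable, List, Set, Tuple

# (event_id, body); the id field is an assumption made for this sketch.
Event = Tuple[str, str]


def send_with_retry(events: List[Event], ack_lost: Set[str]) -> List[Event]:
    """Simulate a sender that resends an event once when its ack is lost.

    The receiver stored the event the first time, so each lost ack
    results in one duplicate at the receiver.
    """
    delivered = []
    for event in events:
        delivered.append(event)        # first delivery reaches the receiver
        if event[0] in ack_lost:
            delivered.append(event)    # ack never arrived, so the sender retries
    return delivered


def dedupe(events: Iterable[Event]) -> List[Event]:
    """Keep the first copy of each event id, preserving arrival order."""
    seen: Set[str] = set()
    unique = []
    for event in events:
        if event[0] not in seen:
            seen.add(event[0])
            unique.append(event)
    return unique


if __name__ == "__main__":
    sent = [("e1", "a"), ("e2", "b"), ("e3", "c")]
    received = send_with_retry(sent, ack_lost={"e2"})
    print(len(received), len(dedupe(received)))
```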
  • What’s next? (proposals)
    Fix Excessive duplication issues.
    Apache Incubator (?)
    Log4j/Log4net/logback/etc…
    Fix Multi-master limitations.
    Security upgrades for node to node comms (TLS/SSL)
    Improved metrics / GUI / usability
    Integration with open source alerting/monitoring tools
    Integration with proprietary systems
    Version proofing RPCs / State storage
    Packaging friendly plug-in install
    Multi Datacenter Story
    Performance Increases
    Inline near-realtime analytics
    Puppet/Chef style config for nodes
    Lightweight Agent
    Masterless Agent
    Better S3 / AWS support
  • Q+A