Flume office-hours-110228

This is the agenda for the flume office hours held on 2/28/2011 at Cloudera HQ

Transcript of "Flume office-hours-110228"

  1. (no text extracted for this slide)
  2. Flume Office Hours: Community planning
     Jonathan Hsieh
     Cloudera HQ, 2/28/2011
  3. Outline
     - State of the world
     - What’s new?
     - Stories (Chime in!)
     - What needs work?
     - Prioritizing what is next
     - Q+A
  4. State of the world
  5. Growing user and developer community
     - GitHub stats: currently 295 watchers, 51 forks
     - New committers:
       - 9/10: Eric Sammer (Cloudera)
       - 1/11: Bruce Mitchener (Independent)
     - User characteristics:
       - Most potential users seem to use ad hoc scripts
       - Most users are early adopters / startup devops
  6. A short feature history
     - 6/10: v0.9.0
       - Initial open source release
     - 8/10: v0.9.1
       - Fixes for hangs
       - Initial compression features
     - 10/10: v0.9.1+29 (CDH3b3, packages)
       - Added kerberized HDFS support
       - Flume cookbook
       - Elastic Search / Cassandra plugins
       - Initial Voldemort plugins
     - 11/10: v0.9.2
       - Support for other compression codecs
       - Avro RPC
       - Improvements to tail and exec
       - Robustness improvements
       - Initial HBase / MongoDB plugins
     - 2/11: v0.9.3 (CDH3b4, packages)
       - Flume Node Windows support
       - Initial JSON metrics support
       - Multi-master functional
       - Robustness improvements
       - JRuby / AMQP plugins
       - S3/EC2 blog stories
     - 4/11: v0.9.3+xxx (CDH3 Stable, packages)
       - Excessive duplication fixes
       - Compression fixes
     - ?/11: v0.9.4
  7. What’s new?
  8. New features
     - Flume node JSON metrics: http://node:35862/node/reports
     - Terser syntax:
       { deco1 => { deco2 => sink } }
       deco1 deco2 sink
     - Multiple collector sink support:
       collector(30000) { [ escapedCustomDfs("hdfs://nn1/path","prefix","format"), escapedCustomDfs("hdfs://nn2/path","prefix","format") ] }
     - Limited multi-master support
     - Windows support
  9. Stories
  10. The Standard Use Case
     [Diagram labels: HDFS; Flume Master; agent tier (Agent/server pairs); collector tier (three Collectors)]
  11. Multi Datacenter
     [Diagram labels: HDFS; Collector tier (Collectors); API server (Agent/api pairs); Processor server (Agent/proc pairs)]
  12. Multi Datacenter
     [Diagram labels: HDFS; Collector tier (Collectors); Relay; API server (Agent/api pairs); Processor server (Agent/proc pairs)]
  13. Near Realtime Aggregator
     [Diagram labels: HDFS; DB; Flume; Agent/Ad svr pairs; Collector; Tracker; quick reports; Hive job; verify reports]
  14. An enterprise story
     [Diagram labels: Kerberos HDFS; Flume; Collector tier (Collectors); API server (Agent/api pairs); Win; Linux; D (six instances); Active Directory / LDAP]
  15. An emerging community story
     [Diagram labels: HDFS; HBase; Incremental Search Idx; Flume; Agents; svr; Collector; Fanout (index, hbase, hdfs); Hive query; Pig query; Key lookup; Range query; Search query; Faceted query]
  16. What needs work? What comes next?
  17. Known issues
     - Excessive event duplication (due to tail or e2e agent)
     - Configuration translation problem in some cases
     - Multi-master limited: doesn’t work with translations
  18. What’s next? (proposals)
     - Fix excessive duplication issues
     - Apache Incubator (?)
     - Log4j/Log4net/logback/etc…
     - Fix multi-master limitations
     - Security upgrades for node to node comms (TLS/SSL)
     - Improved metrics / GUI / usability
     - Integration with open source alerting/monitoring tools
     - Integration with proprietary systems
     - Version proofing RPCs / state storage
     - Packaging friendly plug-in install
     - Multi Datacenter story
     - Performance increases
     - Inline near-realtime analytics
     - Puppet/Chef style config for nodes
     - Lightweight agent
     - Masterless agent
     - Better S3 / AWS support
  19. Q+A
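
The "Terser syntax" item on slide 8 lists a nested decorator form next to a flat form. As a rough illustration of how the two appear to relate, here is a minimal Python sketch that flattens the nested form into the flat one. It assumes the two forms are equivalent, which the slide implies but does not state, and it is not part of Flume: the function name `to_terse` is hypothetical, and the parsing is deliberately naive (it would break on decorator arguments that themselves contain `=>` or braces).

```python
def to_terse(spec: str) -> str:
    """Flatten '{ deco1 => { deco2 => sink } }' into 'deco1 deco2 sink'.

    Hypothetical helper for illustration only; not a Flume API.
    """
    parts = []
    rest = spec.strip()
    # Peel off one '{ decorator => ... }' layer per iteration.
    while rest.startswith("{"):
        if not rest.endswith("}"):
            raise ValueError(f"unbalanced braces in {spec!r}")
        inner = rest[1:-1].strip()
        decorator, sep, rest = inner.partition("=>")
        if not sep:
            raise ValueError(f"missing '=>' in {spec!r}")
        parts.append(decorator.strip())
        rest = rest.strip()
    # Whatever remains after the last layer is the innermost sink.
    parts.append(rest)
    return " ".join(parts)


print(to_terse("{ deco1 => { deco2 => sink } }"))
```

A spec with no decorators, such as `sink`, passes through unchanged, so the helper can be applied to either form.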