Buzzwords Berlin HBase Hackathon, June 2012Apache Flume and HBaseAlexander Alten-Lorenz | Customer Operations Engineer    ...
About Me    •   COPS Engineer @ Cloudera    •   Apache Flume Contributor    •   Working with hadoop since 2009    •   Blog...
Flume 1.x    • Mass event collector    • Stream data (events, not files) from clients      to sinks    • Clients: files, s...
Architecture    Component   Function    Agent       The JVM running Flume. One per machine. Runs                many sourc...
Agent    • Runs many clients and sinks    • Java properties-based configuration    • Low overhead (-Xmx20m)      – adding ...
Sources    • Plugin interface    • Managed by a SourceRunner that controls      threading and execution model (e.g. pollin...
HBase sink    ls -la flume-ng-sinks/flume-ng-hbase-sink/    src/main/java/org/apache/flume/sink/hbase/    HBaseSink.java  ...
HBaseSink.java•   Control flush()•   Using serializer•   Control the transaction•   Control rollbacks (in case of events c...
Configuration    •   Source Seq interface    •   Listening on a defined port @localhost    •   Serializer need some parame...
Configuration Examplehost1.sources = src1host1.sinks = sink1host1.channels = ch1host1.sources.src1.type = seqhost1.sources...
Take Away •   Flume collects events •   Source - Channel - Sink concept •   HBase sink needs a serializer interface •   Co...
Thank You • Web: https://cwiki.apache.org/FLUME/   getting-started.html • ML: flume-user@incubator.apache.org • Mail: alex...
Upcoming SlideShare
Loading in...5
×

Flume and HBase

4,615

Published on

Published in: Technology, Education
1 Comment
12 Likes
Statistics
Notes
  • Created a simple easy to use flume pipe that streams rss data into hbase. Have a look https://github.com/dgkris/RSSPipe
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total Views
4,615
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
122
Comments
1
Likes
12
Embeds 0
No embeds

No notes for slide

Flume and HBase

  1. 1. Buzzwords Berlin HBase Hackathon, June 2012Apache Flume and HBaseAlexander Alten-Lorenz | Customer Operations Engineer 1
  2. 2. About Me • COPS Engineer @ Cloudera • Apache Flume Contributor • Working with hadoop since 2009 • Blogger (mapredit.blogspot.com) • Speaker at Conferences / Meetups / Tooling Events2 ©2012 Cloudera, Inc. All Rights Reserved. 2
  3. 3. Flume 1.x • Mass event collector • Stream data (events, not files) from clients to sinks • Clients: files, syslog, avro, seq, exec • Sinks: HDFS files, HBase, … • Configurable routing / topology3 ©2012 Cloudera, Inc. All Rights Reserved. 3
  4. 4. Architecture Component Function Agent The JVM running Flume. One per machine. Runs many sources and sinks. Client Produces data in the form of events. Runs in a separate thread. Sink Receives events from a channel. Runs in a separate thread. Channel Connects sources to sinks (like a queue). Implements the reliability semantics. Event A single datum; a log record, an avro object, etc. Normally around ~4KB.4 ©2012 Cloudera, Inc. All Rights Reserved. 4
  5. 5. Agent • Runs many clients and sinks • Java properties-based configuration • Low overhead (-Xmx20m) – adding RAM increases performance – setting Xms prevent in time memory allocation – Batching increase performance dramatically5 ©2012 Cloudera, Inc. All Rights Reserved. 5
  6. 6. Sources • Plugin interface • Managed by a SourceRunner that controls threading and execution model (e.g. polling vs. event-based) • Included: exec, avro, syslog, seq6 ©2012 Cloudera, Inc. All Rights Reserved. 6
  7. 7. HBase sink ls -la flume-ng-sinks/flume-ng-hbase-sink/ src/main/java/org/apache/flume/sink/hbase/ HBaseSink.java HbaseEventSerializer.java SimpleHbaseEventSerializer.java SimpleRowKeyGenerator.java7 ©2012 Cloudera, Inc. All Rights Reserved. 7
  8. 8. HBaseSink.java• Control flush()• Using serializer• Control the transaction• Control rollbacks (in case of events couldn’t written)8 ©2012 Cloudera, Inc. All Rights Reserved. 8
  9. 9. Configuration • Source Seq interface • Listening on a defined port @localhost • Serializer need some parameters • Column family and column must be known • Valid hbase-site.xml in $CLASSPATH9 ©2012 Cloudera, Inc. All Rights Reserved. 9
  10. 10. Configuration Examplehost1.sources = src1host1.sinks = sink1host1.channels = ch1host1.sources.src1.type = seqhost1.sources.src1.port = 25001host1.sources.src1.bind = localhosthost1.sources.src1.channels = ch1host1.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSinkhost1.sinks.sink1.channel = ch1host1.sinks.sink1.table = test3host1.sinks.sink1.columnFamily = testinghost1.sinks.sink1.column = foohost1.sinks.sink1.serializer =org.apache.flume.sink.hbase.SimpleHbaseEventSerializerhost1.sinks.sink1.serializer.payloadColumn = pcolhost1.sinks.sink1.serializer.incrementColumn = icolhost1.channels.ch1.type=memory10 ©2012 Cloudera, Inc. All Rights Reserved. 10
  11. 11. Take Away • Flume collects events • Source - Channel - Sink concept • HBase sink needs a serializer interface • Column family and column must be known11 ©2012 Cloudera, Inc. All Rights Reserved. 11
  12. 12. Thank You • Web: https://cwiki.apache.org/FLUME/ getting-started.html • ML: flume-user@incubator.apache.org • Mail: alexander@cloudera.com • Blog: mapredit.blogspot.com • Twitter: @mapredit12 ©2012 Cloudera, Inc. All Rights Reserved. 12
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×