Flume and HBase
Upcoming SlideShare
Loading in...5
×

Like this? Share it with your network

Share
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
  • Created a simple easy to use flume pipe that streams rss data into hbase. Have a look https://github.com/dgkris/RSSPipe
    Are you sure you want to
    Your message goes here
No Downloads

Views

Total Views
4,828
On Slideshare
4,811
From Embeds
17
Number of Embeds
3

Actions

Shares
Downloads
106
Comments
1
Likes
10

Embeds 17

http://www.linkedin.com 8
https://www.linkedin.com 8
https://si0.twimg.com 1

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Buzzwords Berlin HBase Hackathon, June 2012Apache Flume and HBaseAlexander Alten-Lorenz | Customer Operations Engineer 1
  • 2. About Me • COPS Engineer @ Cloudera • Apache Flume Contributor • Working with hadoop since 2009 • Blogger (mapredit.blogspot.com) • Speaker at Conferences / Meetups / Tooling Events2 ©2012 Cloudera, Inc. All Rights Reserved. 2
  • 3. Flume 1.x • Mass event collector • Stream data (events, not files) from clients to sinks • Clients: files, syslog, avro, seq, exec • Sinks: HDFS files, HBase, … • Configurable routing / topology3 ©2012 Cloudera, Inc. All Rights Reserved. 3
  • 4. Architecture Component Function Agent The JVM running Flume. One per machine. Runs many sources and sinks. Client Produces data in the form of events. Runs in a separate thread. Sink Receives events from a channel. Runs in a separate thread. Channel Connects sources to sinks (like a queue). Implements the reliability semantics. Event A single datum; a log record, an avro object, etc. Normally around ~4KB.4 ©2012 Cloudera, Inc. All Rights Reserved. 4
  • 5. Agent • Runs many clients and sinks • Java properties-based configuration • Low overhead (-Xmx20m) – adding RAM increases performance – setting Xms prevent in time memory allocation – Batching increase performance dramatically5 ©2012 Cloudera, Inc. All Rights Reserved. 5
  • 6. Sources • Plugin interface • Managed by a SourceRunner that controls threading and execution model (e.g. polling vs. event-based) • Included: exec, avro, syslog, seq6 ©2012 Cloudera, Inc. All Rights Reserved. 6
  • 7. HBase sink ls -la flume-ng-sinks/flume-ng-hbase-sink/ src/main/java/org/apache/flume/sink/hbase/ HBaseSink.java HbaseEventSerializer.java SimpleHbaseEventSerializer.java SimpleRowKeyGenerator.java7 ©2012 Cloudera, Inc. All Rights Reserved. 7
  • 8. HBaseSink.java• Control flush()• Using serializer• Control the transaction• Control rollbacks (in case of events couldn’t written)8 ©2012 Cloudera, Inc. All Rights Reserved. 8
  • 9. Configuration • Source Seq interface • Listening on a defined port @localhost • Serializer need some parameters • Column family and column must be known • Valid hbase-site.xml in $CLASSPATH9 ©2012 Cloudera, Inc. All Rights Reserved. 9
  • 10. Configuration Examplehost1.sources = src1host1.sinks = sink1host1.channels = ch1host1.sources.src1.type = seqhost1.sources.src1.port = 25001host1.sources.src1.bind = localhosthost1.sources.src1.channels = ch1host1.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSinkhost1.sinks.sink1.channel = ch1host1.sinks.sink1.table = test3host1.sinks.sink1.columnFamily = testinghost1.sinks.sink1.column = foohost1.sinks.sink1.serializer =org.apache.flume.sink.hbase.SimpleHbaseEventSerializerhost1.sinks.sink1.serializer.payloadColumn = pcolhost1.sinks.sink1.serializer.incrementColumn = icolhost1.channels.ch1.type=memory10 ©2012 Cloudera, Inc. All Rights Reserved. 10
  • 11. Take Away • Flume collects events • Source - Channel - Sink concept • HBase sink needs a serializer interface • Column family and column must be known11 ©2012 Cloudera, Inc. All Rights Reserved. 11
  • 12. Thank You • Web: https://cwiki.apache.org/FLUME/ getting-started.html • ML: flume-user@incubator.apache.org • Mail: alexander@cloudera.com • Blog: mapredit.blogspot.com • Twitter: @mapredit12 ©2012 Cloudera, Inc. All Rights Reserved. 12