Flume and HBase

Published in: Technology, Education

  1. Buzzwords Berlin HBase Hackathon, June 2012
     Apache Flume and HBase
     Alexander Alten-Lorenz | Customer Operations Engineer
  2. About Me
     • COPS Engineer @ Cloudera
     • Apache Flume Contributor
     • Working with Hadoop since 2009
     • Blogger (mapredit.blogspot.com)
     • Speaker at Conferences / Meetups / Tooling Events
     ©2012 Cloudera, Inc. All Rights Reserved.
  3. Flume 1.x
     • Mass event collector
     • Stream data (events, not files) from clients to sinks
     • Clients: files, syslog, avro, seq, exec
     • Sinks: HDFS files, HBase, …
     • Configurable routing / topology
  4. Architecture
     Component | Function
     Agent     | The JVM running Flume. One per machine. Runs many sources and sinks.
     Client    | Produces data in the form of events. Runs in a separate thread.
     Sink      | Receives events from a channel. Runs in a separate thread.
     Channel   | Connects sources to sinks (like a queue). Implements the reliability semantics.
     Event     | A single datum: a log record, an avro object, etc. Normally around 4 KB.
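The component table above can be illustrated with a small model. This is a minimal sketch in plain Python, not Flume code: `Event`, `MemoryChannel`, `run_source`, and `run_sink` are hypothetical names chosen for illustration, and the list `store` stands in for a real destination such as HDFS or HBase.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """A single datum, e.g. one log record."""
    body: bytes
    headers: dict = field(default_factory=dict)


class MemoryChannel:
    """Hypothetical stand-in for a channel: a bounded FIFO queue."""

    def __init__(self, capacity: int = 100):
        self.capacity = capacity
        self._queue = deque()

    def put(self, event: Event) -> None:
        if len(self._queue) >= self.capacity:
            raise OverflowError("channel is full")
        self._queue.append(event)

    def take(self):
        # Returns None when the channel is empty.
        return self._queue.popleft() if self._queue else None


def run_source(lines, channel):
    """Source side: wrap each input line in an Event and put it on the channel."""
    for line in lines:
        channel.put(Event(body=line.encode("utf-8")))


def run_sink(channel, store):
    """Sink side: drain the channel and hand each event body to the store."""
    delivered = 0
    while (event := channel.take()) is not None:
        store.append(event.body)
        delivered += 1
    return delivered


channel = MemoryChannel(capacity=10)
store = []  # assumption: stands in for HDFS or HBase
run_source(["login alice", "logout alice"], channel)
delivered = run_sink(channel, store)
print(delivered, store)
```

The source and sink never call each other directly; the channel is the only thing they share, which is what lets an agent wire them together from configuration.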
  5. Agent
     • Runs many clients and sinks
     • Java properties-based configuration
     • Low overhead (-Xmx20m)
       – Adding RAM increases performance
       – Setting -Xms prevents just-in-time memory allocation
       – Batching increases performance dramatically
  6. Sources
     • Plugin interface
     • Managed by a SourceRunner that controls threading and execution model (e.g. polling vs. event-based)
     • Included: exec, avro, syslog, seq
  7. HBase sink
     ls -la flume-ng-sinks/flume-ng-hbase-sink/src/main/java/org/apache/flume/sink/hbase/
     HBaseSink.java
     HbaseEventSerializer.java
     SimpleHbaseEventSerializer.java
     SimpleRowKeyGenerator.java
  8. HBaseSink.java
     • Controls flush()
     • Uses the serializer
     • Controls the transaction
     • Controls rollbacks (in case events could not be written)
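The transaction and rollback behaviour listed for the sink can be sketched as follows. This is a simplified model in plain Python under stated assumptions, not the HBaseSink source: `Transaction`, `process_batch`, and `failing_write` are hypothetical names, and a Python list stands in for the channel.

```python
class Transaction:
    """Hypothetical channel transaction: takes are provisional until commit."""

    def __init__(self, queue):
        self._queue = queue
        self._taken = []

    def take(self):
        if not self._queue:
            return None
        event = self._queue.pop(0)
        self._taken.append(event)
        return event

    def commit(self):
        # The taken events are now gone from the channel for good.
        self._taken.clear()

    def rollback(self):
        # Put the taken events back at the head, in their original order.
        self._queue[:0] = self._taken
        self._taken.clear()


def process_batch(queue, write, batch_size=100):
    """Take up to batch_size events, write them, then commit or roll back."""
    txn = Transaction(queue)
    batch = []
    try:
        while len(batch) < batch_size:
            event = txn.take()
            if event is None:
                break
            batch.append(event)
        if batch:
            write(batch)  # assumption: one flush to the table per batch
        txn.commit()
        return len(batch)
    except Exception:
        txn.rollback()
        return 0


def failing_write(batch):
    raise IOError("table unavailable")


queue = [b"e1", b"e2", b"e3"]
table = []

failed = process_batch(queue, failing_write)  # write fails, events are restored
after_failure = list(queue)
ok = process_batch(queue, table.extend)       # retry succeeds
```

Because the events are only removed from the channel on commit, a failed write costs a retry rather than data.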
  9. Configuration
     • Source Seq interface
     • Listening on a defined port @localhost
     • Serializer needs some parameters
     • Column family and column must be known
     • Valid hbase-site.xml in $CLASSPATH
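Why the serializer needs a column family and column up front can be seen in a toy model. The sketch below is plain Python with hypothetical names (`SerializerSketch`, `puts`, `increments`, the `"counter"` row); it is only loosely inspired by the serializer files named in the talk and does not reproduce the Flume or HBase API.

```python
import itertools


class SerializerSketch:
    """Hypothetical model of an event serializer for an HBase-style sink."""

    def __init__(self, column_family, payload_column=None,
                 increment_column=None, row_prefix="row"):
        self.column_family = column_family
        self.payload_column = payload_column
        self.increment_column = increment_column
        self.row_prefix = row_prefix
        self._seq = itertools.count(1)

    def next_row_key(self):
        # Assumption: a counter keeps the sketch deterministic; a real
        # generator could use a timestamp or a UUID instead.
        return f"{self.row_prefix}{next(self._seq)}"

    def puts(self, body):
        """One put per event body, written to the payload column."""
        if self.payload_column is None:
            return []
        return [(self.next_row_key(), self.column_family, self.payload_column, body)]

    def increments(self):
        """One counter increment per event, if an increment column is set."""
        if self.increment_column is None:
            return []
        return [("counter", self.column_family, self.increment_column, 1)]


serializer = SerializerSketch("testing", payload_column="pcol", increment_column="icol")
first = serializer.puts(b"login alice")
second = serializer.puts(b"logout alice")
counts = serializer.increments()
```

Every cell the sketch produces is addressed by row key, column family, and column, so a serializer configured without them has nowhere to write the event body.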
  10. Configuration Example
      host1.sources = src1
      host1.sinks = sink1
      host1.channels = ch1
      host1.sources.src1.type = seq
      host1.sources.src1.port = 25001
      host1.sources.src1.bind = localhost
      host1.sources.src1.channels = ch1
      host1.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
      host1.sinks.sink1.channel = ch1
      host1.sinks.sink1.table = test3
      host1.sinks.sink1.columnFamily = testing
      host1.sinks.sink1.column = foo
      host1.sinks.sink1.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
      host1.sinks.sink1.serializer.payloadColumn = pcol
      host1.sinks.sink1.serializer.incrementColumn = icol
      host1.channels.ch1.type = memory
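A configuration like the one above only works if every source and sink points at a declared channel. The following is a minimal sketch of such a check in plain Python; `parse_properties` and `check_wiring` are hypothetical helpers written for this example, not part of Flume, and the embedded config repeats only the wiring-related keys from the slide.

```python
CONFIG = """
host1.sources = src1
host1.sinks = sink1
host1.channels = ch1
host1.sources.src1.type = seq
host1.sources.src1.channels = ch1
host1.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
host1.sinks.sink1.channel = ch1
host1.channels.ch1.type = memory
"""


def parse_properties(text):
    """Parse simple key = value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of problems where a source or sink names an undeclared channel."""
    channels = set(props.get(f"{agent}.channels", "").split())
    problems = []
    # Sources use the plural key "channels" (a source may feed several).
    for src in props.get(f"{agent}.sources", "").split():
        for ch in props.get(f"{agent}.sources.{src}.channels", "").split():
            if ch not in channels:
                problems.append(f"source {src}: unknown channel {ch}")
    # Sinks use the singular key "channel" (a sink reads from exactly one).
    for sink in props.get(f"{agent}.sinks", "").split():
        ch = props.get(f"{agent}.sinks.{sink}.channel")
        if ch not in channels:
            problems.append(f"sink {sink}: unknown channel {ch}")
    return problems


props = parse_properties(CONFIG)
problems = check_wiring(props, "host1")

broken = dict(props)
broken["host1.sinks.sink1.channel"] = "ch2"  # deliberately miswired
broken_problems = check_wiring(broken, "host1")
print(problems, broken_problems)
```

Note the asymmetry the check relies on: the slide's config uses `channels` for the source and `channel` for the sink.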
  11. Take Away
      • Flume collects events
      • Source - Channel - Sink concept
      • HBase sink needs a serializer interface
      • Column family and column must be known
  12. Thank You
      • Web: https://cwiki.apache.org/FLUME/getting-started.html
      • ML: flume-user@incubator.apache.org
      • Mail: alexander@cloudera.com
      • Blog: mapredit.blogspot.com
      • Twitter: @mapredit
