Flume HBase


How to connect Flume and HBase

  • ----- Meeting Notes (8/17/11 16:51) ----- Good evening, gentlemen. I'm Dani. ----- Meeting Notes (8/17/11 17:01) ----- Let's see how to hook up these guys, Flume and HBase.
  • ----- Meeting Notes (8/17/11 16:51) ----- Just a brief background. Several patches to Flume: 1. Flogger 2. A few things in the HBase sink 3. The recently contributed OpenTSDB sink
  • ----- Meeting Notes (8/17/11 16:51) ----- My assumption is that folks here know what Flume does and what HBase does, so focusing on
  • ----- Meeting Notes (8/17/11 17:08) ----- If anyone hasn't used Flume or HBase, let me know.
  • ----- Meeting Notes (8/17/11 17:08) ----- I can take more questions at the end of the presentation.
  • ----- Meeting Notes (8/17/11 17:11) ----- Single row, a million column names
  • ----- Meeting Notes (8/17/11 17:11) ----- Check out the Flume User Guide.
  • ----- Meeting Notes (8/17/11 17:14) ----- HBase is integrated with Hive and MapReduce.
  • ----- Meeting Notes (8/17/11 17:14) ----- For those who haven't used it: just think of it as "which of the overloaded functions" Flume has to use. You can change the parameters at run time.
  • ----- Meeting Notes (8/17/11 17:15) ----- In daemon mode: flume-env.sh
  • Just put LAHUG in the subject line.
  • WE WOULD LOVE TO HOST THE NEXT HADOOP MEETUP. OpenTSDB … goes a step further and gives you awesome graphs … for your data.

    1. Hooking up Flume with HBase (LA-HUG, Aug '11), Dani Abel Rayan
    2. Who am I?
       • Big Data Ninja at Riot Games
       • Flume contributor
       • Cloudera intern alum
       • Graduated with a Master's in CS from Georgia Tech
    3. What am I presenting here?
       • Flume event model
       • HBase data model
       • Compelling reasons to hook 'em up
       • Configuration examples
       • What are the new upcoming sinks?
       • How to write a new Flume sink
    4. What is needed before we start:
       • Understanding of Flume's architecture
       • Usage of Flume's abstractions such as plugins, events, sources, sinks, escape sequences and decorators*
       • Understanding of HBase and Hadoop
       • Regex
       • That's it!
       *http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html
    5. A Quick Glance …
    6. Flume Event Model
       • A Flume event has six main fields: a Unix timestamp, a nanosecond timestamp, a priority, the source host, the body, and a metadata table with an arbitrary number of attribute-value pairs.
       • The body is the raw log entry. By default the body is truncated to a maximum of 32 KB per event; this limit is configurable.
       • Attributes can be used for custom bucketing with the help of escape sequences.
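To make the event model concrete, here is a minimal Python sketch (not Flume's Java implementation). It models the six fields, applies the default 32 KB body truncation, and expands `%{name}` escape sequences. The class and helper names are hypothetical; only `%{host}`, `%{nanos}` and plain attributes are handled, and expanding an unknown attribute to an empty string is an assumption.

```python
import re
from dataclasses import dataclass, field

MAX_BODY_BYTES = 32 * 1024  # default limit quoted on the slide; configurable in Flume

_ESCAPE = re.compile(r"%\{(\w+)\}")


@dataclass
class FlumeEvent:
    """Toy model of a Flume event's six main fields (illustrative only)."""
    timestamp: int   # Unix timestamp
    nanos: int       # nanosecond timestamp
    priority: str
    host: str        # source host
    body: bytes      # raw log entry
    attrs: dict = field(default_factory=dict)  # metadata: attribute -> value

    def __post_init__(self):
        # Truncate the raw body to the maximum size per event.
        self.body = self.body[:MAX_BODY_BYTES]


def escape(template, event):
    """Expand %{name} escape sequences.

    %{host} and %{nanos} come from the event's fields; any other name is
    looked up in the attribute table (assumption: unknown names become "").
    """
    builtins = {"host": event.host, "nanos": str(event.nanos)}

    def lookup(m):
        name = m.group(1)
        if name in builtins:
            return builtins[name]
        return str(event.attrs.get(name, ""))

    return _ESCAPE.sub(lookup, template)


event = FlumeEvent(1313244007, 42, "INFO", "web01", b"x" * 40000, {"colname": "pgpgin"})
print(len(event.body))                          # truncated body length
print(escape("%{host}/%{colname}", event))      # bucket-style path built from attributes
```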
    7. HBase Data Model
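The diagram for this slide did not survive extraction. As a rough sketch, HBase's logical model can be pictured as a sorted, nested map: row key, then column (family:qualifier), then timestamp, then value. Columns inside a family are created on write, which is what makes a single row with a million column names possible. The helpers below are hypothetical and use in-memory dicts, not the HBase client API.

```python
from collections import defaultdict


def new_table():
    """row key -> "family:qualifier" -> timestamp -> value (all in memory)."""
    return defaultdict(lambda: defaultdict(dict))


def put(table, row, family, qualifier, ts, value):
    # Columns are created on write; a row can hold any number of qualifiers.
    table[row][f"{family}:{qualifier}"][ts] = value


def get_latest(table, row, family, qualifier):
    """Return the newest version of a cell, or None if it does not exist."""
    versions = table.get(row, {}).get(f"{family}:{qualifier}")
    if not versions:
        return None
    return versions[max(versions)]


def scan(table):
    """Yield (row, column, newest value), rows and columns in sorted order.

    Real HBase sorts row keys by their bytes; plain string order is used here.
    """
    for row in sorted(table):
        for column in sorted(table[row]):
            versions = table[row][column]
            yield row, column, versions[max(versions)]
```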
    8. What is a Flume Sink?
    9. Reasons for an HBase Sink
       • Near real-time aggregation of streaming data
       • Low-latency access to the aggregated data
       • Offline big data analytics
    10. Types of Flume HBase Sink
        1. hbase(): highly expressive
           hbase("table", "rowkey", "cf1", "c1", "val1"[, "cf2", "c2", "val2", ...] {, writeBufferSize=int, writeToWal=true|false})
        2. attr2hbase(): flexible and powerful semantics, but could be confusing (at first glance)
           attr2hbase("table"[, "sysFamily"[, "writeBody"[, "attrPrefix"[, "writeBufferSize"[, "writeToWal"]]]]])
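The hbase() argument list is a table name and a row key followed by one or more (column family, column, value) triples. A hypothetical Python helper that groups positional arguments that way could look like this; it only illustrates the shape of the signature (it is not Flume's builder code) and ignores the optional writeBufferSize/writeToWal keyword arguments.

```python
def group_hbase_args(args):
    """Split positional hbase() arguments into (table, rowkey, [(cf, col, val), ...]).

    Hypothetical helper following the slide's signature:
        hbase("table", "rowkey", "cf1", "c1", "val1"[, "cf2", "c2", "val2", ...])
    """
    if len(args) < 5 or (len(args) - 2) % 3 != 0:
        raise ValueError("expected table, rowkey, then (family, column, value) triples")
    table, rowkey, rest = args[0], args[1], list(args[2:])
    triples = [tuple(rest[i:i + 3]) for i in range(0, len(rest), 3)]
    return table, rowkey, triples


print(group_hbase_args(["table", "%s", "stats", "%{colname}", "%{value}"]))
```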
    11. How to Use a Plugin?
        • Compile, then add the jar with the new plugin classes to Flume's classpath.
        • In flume-site.xml, add the class names of the new sources, sinks, and/or decorators to the flume.plugin.classes property.
        • Restart the Flume nodes (including the master).
        • To verify that your plugin is loaded, check that it is listed on http://flume-master:35871/masterext.jsp
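The flume.plugin.classes entry can be sketched by generating it with Python's standard xml.etree.ElementTree. This assumes the Hadoop-style `<configuration>`/`<property>` layout used by Flume 0.9.x configuration files, with several classes given as one comma-separated value; com.example.MySink and com.example.MyDecorator are placeholder class names.

```python
import xml.etree.ElementTree as ET


def plugin_site_xml(class_names):
    """Build a flume-site.xml body that registers plugin classes.

    Assumes the Hadoop-style <configuration>/<property> layout; the class
    names passed in are placeholders for your own plugin classes.
    """
    conf = ET.Element("configuration")
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = "flume.plugin.classes"
    # Several plugin classes share one comma-separated value.
    ET.SubElement(prop, "value").text = ",".join(class_names)
    return ET.tostring(conf, encoding="unicode")


print(plugin_site_xml(["com.example.MySink", "com.example.MyDecorator"]))
```

After restarting the nodes with this file in place, the master's extension page is where the new classes should show up.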
    12. hbase()
        Source: tail("/proc/vmstat")
            nr_free_pages 594693
            nr_inactive_anon 1392
            nr_active_anon 45259
            nr_inactive_file 107132
            nr_active_file 141458
        Decorator: regexAll("(\w+)\s+(\w+)", "colname", "value")
        Flume events:
            timestamp   colname            value
            24353457    nr_active_anon     45259
            24353456    nr_inactive_anon   1392
            24353455    nr_free_pages      594693
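What regexAll does to each tailed line can be modeled in a few lines of Python: the pattern's capture groups become the event attributes colname and value. This is a rough model, not the decorator's implementation; in particular, treating a non-matching line as adding no attributes is an assumption.

```python
import re

# The slide's pattern: a word, whitespace, then another word.
VMSTAT_LINE = re.compile(r"(\w+)\s+(\w+)")


def regex_all(line, pattern, *names):
    """Rough model of regexAll: assign each capture group of the first match
    to the attribute name in the same position.

    Assumption: a line that does not match adds no attributes.
    """
    m = pattern.search(line)
    if m is None:
        return {}
    return dict(zip(names, m.groups()))


sample = """nr_free_pages 594693
nr_inactive_anon 1392
nr_active_anon 45259"""

for line in sample.splitlines():
    print(regex_all(line, VMSTAT_LINE, "colname", "value"))
```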
    13. hbase()
        • hbase("tablename", "%s", "stats", "%{colname}", "%{value}")
          Use %{nanos} instead of %s if you want a nanosecond timestamp.
            Rowkey     Timestamp   Column Family: stats
            24353455   T1          nr_free_pages = 594693
            24353456   T2          nr_inactive_anon = 1392
            24353457   T3          nr_active_anon = 45259
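The mapping from escape sequences to an HBase cell can be sketched as follows: `%s` expands to the event's Unix time in seconds, `%{nanos}` to its nanosecond timestamp, and `%{name}` to an attribute set upstream (here by regexAll). The helpers expand and to_put are hypothetical and only cover the escapes used on this slide.

```python
import re

# Either %{name} (an attribute, or nanos) or %s (Unix time in seconds).
_ESCAPE = re.compile(r"%\{(\w+)\}|%s")


def expand(template, attrs, ts_seconds, nanos):
    """Expand only the escapes used on this slide: %s, %{nanos}, %{attr}."""
    def repl(m):
        if m.group(0) == "%s":
            return str(ts_seconds)
        name = m.group(1)
        if name == "nanos":
            return str(nanos)
        return attrs.get(name, "")
    return _ESCAPE.sub(repl, template)


def to_put(args, attrs, ts_seconds, nanos):
    """Hypothetical: map hbase("table", row, family, column, value) to one cell."""
    table, row, family, column, value = args
    return (table,) + tuple(expand(a, attrs, ts_seconds, nanos) for a in (row, family, column, value))


attrs = {"colname": "nr_free_pages", "value": "594693"}
print(to_put(("tablename", "%s", "stats", "%{colname}", "%{value}"), attrs, 24353455, 0))
```

Because the row key is the timestamp and the qualifier comes from the data, each vmstat line becomes one row in the single stats family, matching the table on the slide.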
    14. hbase()
        • Thus the FDL syntax would be:
          node: tail("/proc/vmstat") | regexAll("(\w+)\s+(\w+)", "colname", "value") collector(300000) { hbase("table", "%s", "stats", "%{colname}", "%{value}") }
    15. Demo
    16. attr2hbase()
        • You don't have to list every event attribute you want to store in HBase along with its destination column family and qualifier.
        • Sources and/or decorators can produce any (reasonable) number of attributes with dynamic names (e.g. depending on the values), and they will be written into HBase.
    17. attr2hbase
        • attr2hbase("table"[, "sysFamily"[, "writeBody"[, "attrPrefix"[, "writeBufferSize"[, "writeToWal"]]]]])
        • sysFamily holds the name of the column family used to store "system" data (event timestamp, host, priority).
        • If this parameter is absent or equals "", the sink doesn't write "system" data.
    18. attr2hbase
        • writeBody indicates whether the event body should be written along with the other "system" data. By default (when this parameter is absent or equals "") the body is not written.
        • This parameter should have the "column-family:qualifier" format so that the sink writes the body to that specific column-family:qualifier.
    19. attr2hbase
        • attrPrefix defines which attributes are written to HBase: every attribute whose name starts with the attrPrefix value is written. To be written into HBase properly, the attribute key should have the format "<attrPrefix><colfam>:<qual>".
        • The default value of attrPrefix is "2hb_". This means that all attributes named "2hb_<colfam>:<qual>" are written to HBase.
        • The attribute whose key is exactly "<attrPrefix>" must contain the row key for the Put. If no row key can be extracted, the event is skipped and no record is written to the HBase table.
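The attr2hbase routing rules from the last few slides can be condensed into one hypothetical Python function: the attribute named exactly attrPrefix supplies the row key, every `<attrPrefix><colfam>:<qual>` attribute becomes a cell, and the optional sysFamily and writeBody arguments add system data and the body. The qualifier names used for system data (timestamp, host, priority), skipping prefixed attributes that lack a colon, and writing the body even when sysFamily is empty are assumptions, not documented behavior.

```python
def attr2hbase_cells(event, sys_family="", write_body="", attr_prefix="2hb_"):
    """Sketch of attr2hbase's attribute routing as described on the slides.

    event: dict with "timestamp", "host", "priority", "body" and "attrs".
    Returns (row_key, {"family:qualifier": value}), or None if the event is skipped.
    """
    attrs = event["attrs"]
    row = attrs.get(attr_prefix)
    if not row:
        return None  # no row key: the event is skipped, nothing is written

    cells = {}
    for key, value in attrs.items():
        if key.startswith(attr_prefix) and key != attr_prefix:
            column = key[len(attr_prefix):]  # expected "<colfam>:<qual>"
            if ":" in column:                 # assumption: malformed names are ignored
                cells[column] = value

    if sys_family:
        # Assumed qualifier names for the "system" data.
        for name in ("timestamp", "host", "priority"):
            cells[f"{sys_family}:{name}"] = str(event[name])

    if write_body:
        cells[write_body] = event["body"]  # write_body is "family:qualifier"

    return row, cells
```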
    20. attr2hbase example
        • node: tail("/proc/vmstat") | regexAll("(\w+)\s+(\w+)", "colname", "value") value("2hb_", "%{colname}%s", escape=true) value("2hb_stat:value", "%{value}", escape=true) attr2hbase("table-attr2hbase", "system", "body:contents")
            Rowkey             Timestamp   Column Family: stat
            pgpgin1313244007   t1          value=985543
            pgpgin1313244008   t2          value=985543
            pgpgin1313244009   t3          value=985543
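To see where row keys such as pgpgin1313244007 come from, here is a Python sketch of the decorator chain for a single line: regexAll names the two groups, the first value() call builds the row-key attribute "2hb_" from `%{colname}%s`, and the second creates the cell attribute "2hb_stat:value". value_decorator and example_attrs are hypothetical helpers that only understand `%{attr}` and `%s`.

```python
import re

_ESCAPE = re.compile(r"%\{(\w+)\}|%s")


def value_decorator(attrs, name, template, ts_seconds):
    """Rough model of value("name", "template", escape=true): return a copy of
    attrs with `name` set to the template, expanding %{attr} and %s only."""
    def repl(m):
        if m.group(0) == "%s":
            return str(ts_seconds)
        return attrs.get(m.group(1), "")
    out = dict(attrs)
    out[name] = _ESCAPE.sub(repl, template)
    return out


def example_attrs(line, ts_seconds):
    """Apply the slide's chain to one /proc/vmstat line (None if it doesn't parse)."""
    m = re.search(r"(\w+)\s+(\w+)", line)  # regexAll step
    if m is None:
        return None
    attrs = {"colname": m.group(1), "value": m.group(2)}
    attrs = value_decorator(attrs, "2hb_", "%{colname}%s", ts_seconds)          # row key
    attrs = value_decorator(attrs, "2hb_stat:value", "%{value}", ts_seconds)    # stat:value cell
    return attrs


print(example_attrs("pgpgin 985543", 1313244007))
```

Appending the seconds to the counter name keeps one row per sample, which is why the example table shows consecutive row keys with the same family and qualifier.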
    21. Demo Time
    22. What are the New Plugins?
        • https://cwiki.apache.org/FLUME/flume-plugins.html
        • I pushed the OpenTSDB sink just a few weeks back.
    23. How to Contribute a New Plugin?
        • Extend EventSink.Base.
        • Override open(): set up your connections to the store.
        • Override append(): every new event is processed here, so this is where the "puts" into the store happen.
        • Override close(): yay! Clean up the connections, flush, etc.
        • Implement a SinkBuilder builder().
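The real plugin is a Java class extending EventSink.Base; the Python sketch below mirrors only its lifecycle so the role of each step is visible. InMemoryStore, StoreSink, build_sink and the buffer_size parameter are all hypothetical stand-ins (the buffer loosely echoes the writeBufferSize idea), not Flume or HBase APIs.

```python
class InMemoryStore:
    """Hypothetical stand-in for an HBase table connection."""

    def __init__(self):
        self.rows = {}
        self.connected = False

    def connect(self):
        self.connected = True

    def put(self, row, column, value):
        self.rows.setdefault(row, {})[column] = value

    def disconnect(self):
        self.connected = False


class StoreSink:
    """Mirrors the open/append/close lifecycle of a sink (illustrative only)."""

    def __init__(self, store, family, buffer_size=2):
        self.store, self.family, self.buffer_size = store, family, buffer_size
        self.pending = []

    def open(self):
        self.store.connect()  # set up connections to the store

    def append(self, row, qualifier, value):
        # Every event lands here; puts are buffered and flushed in batches.
        if not self.store.connected:
            raise RuntimeError("append() called before open()")
        self.pending.append((row, f"{self.family}:{qualifier}", value))
        if len(self.pending) >= self.buffer_size:
            self._flush()

    def _flush(self):
        for row, column, value in self.pending:
            self.store.put(row, column, value)
        self.pending.clear()

    def close(self):
        self._flush()  # write anything still buffered, then clean up
        self.store.disconnect()


def build_sink(args, store):
    """Hypothetical counterpart of SinkBuilder builder(): validate the
    FDL-style arguments and construct the sink."""
    if len(args) != 1:
        raise ValueError('usage: storeSink("family")')
    return StoreSink(store, args[0])
```

Buffering in append() and flushing in close() is the reason close() must not be skipped: without it, the last partial batch would never reach the store.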
    24. My Contacts
        • drayan@riotgames.com
        • dr@verticalengine.com
        • Twitter: rayanandi
        P.S. We are hiring!
    25. GOOD LUCK, HAVE FUN! Play free! http://www.leagueoflegends.com/