SlideShare a Scribd company logo
1 of 25
Hooking up Flume with HBase
      LA-HUG Aug’11

        -Dani Abel Rayan
Who am I ?
•   Big Data Ninja at Riot Games
•   Flume Contributor
•   Cloudera Intern Alum
•   Graduated with Masters CS
    from Georgia Tech.
What am I presenting here ?
•   Flume event model
•   HBase data model
•   Compelling reasons to hook ‘em up
•   Configuration examples
•   What are the new upcoming Sinks ?
•   How to write new Flume-Sink.
What is needed before we start ..
• Understanding of Flume’s architecture
• Usage of Flume’s abstractions such as
  Plugins, Events, Sources, Sinks, Escape Sequences
  and Decorators*
• Understanding of HBase and Hadoop
• Regex
• That’s it!
*http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html
A Quick Glance …
Flume Event Model
• A Flume event has these six main fields: Unix
  timestamp, Nanosecond timestamp, Priority,
  Source host, Body and a Metadata table with
  an arbitrary number of attribute value pairs.
• The body is the raw log entry body. The
  default is to truncate the body to a maximum
  of 32KB per event. This is a configurable.
• One can custom bucket attributes with help of
  escape sequences.
HBase Data Model
What is a Flume Sink ?
Reasons For HBase Sink
• Near Real-Time aggregation of Streaming Data
• Low Latency access to the aggregated data
• Offline Big Data Analytics
Types of Flume HBase Sink
1. hbase(): Highly expressive
hbase("table", "rowkey", "cf1", "c1", "val1"[,"cf2", "c2", "val2", ....] {,
writeBufferSize=int, writeToWal=true|false})


2. attr2hbase(): Flexible and powerful semantics
but could be confusing (at first glance)
attr2hbase("table"[,"sysFamily"[,"writeBody"[,"attrPrefix"[,"writeBufferSize"
[,"writeToWal"]]]]])
How to Use a Plugin ?
• Compile. Add the jar with the new plugin
  classes to flume’s classpath.
• In flume-site.xml, add the class names of the
  new sources, sinks, and/or decorators to the
  flume.plugin.classes property
• Restart the Flume nodes (Including Master)
• Verify that your plugin is loaded is to check if
  it is displayed on this page http://flume-
  master:35871/masterext.jsp
hbase()
Source: tail(“/proc/vmstat/”)

nr_free_pages 594693
nr_inactive_anon 1392
nr_active_anon 45259
nr_inactive_file 107132
nr_active_file 141458


Sink:
regexAll(“w+)s+(w+)”,”colname”,”value")
                            Flume Events

        timestamp                 24353457
                                  24353456
                                  24353455
        colname                   nr_active_anon
                                  nr_inactive_anon
                                  nr_free_pages
        value                     45259
                                  1392
                                  594693
hbase()
• hbase("tablename", ”%s", ”stats", ”%{colname}", ”%{value}")
use %{nanos} instead of %s if you want nano-second timestamp



  Rowkey    Timestamp    Column Family: stats

  24353455 T1            nr_free_pages = 594693

  24353456 T2             nr_inactive_anon = 1392

  24353457 T3            nr_active_anon = 45259
hbase()
• Thus the FDL syntax would be:

• node: tail(”/proc/vmstat") |
regexAll("(w+)s+(w+)", ”colname", ”value")
collector(300000) { hbase("table", ”%s", ”stats",
”%{colname}", "%{value}") }
Demo
attr2hbase()
• Don’t have to list all possible event attributes
  you want to store in HBase along with their
  destination column families and qualifiers

• Source and/or decorators can produce any
  (reasonable) number of attributes, with
  dynamic names (e.g. depending on the values)
  and they will be written into HBase
attr2hbase
• attr2hbase("table"[,"sysFamily"[,"writeBody"[,
  "attrPrefix"[,"writeBufferSize"
  [,"writeToWal"]]]]])
• sysFamily holds the name of the column
  family that is used to store “system” data
  (event timestamp, host, priority).
• In case this parameter is absent or equals “”,
  the sink doesn’t write “system” data
attr2hbase
• writeBody indicates whether event body
  should be written with other “system” data.
  By default, (when this parameter is absent or
  equals ””) the attribute body is not written.
• This parameter should have the “column-
  family:qualifier” format in order for the sink to
  write the body to the specific column-
  family:qualifier.
attr2hbase
• attrPrefix defines which attributes will be written to HBase:
  every attribute with the name prefixed with attrPrefix
  parameter’s value is written. The attribute key should be in
  the following format to be properly written into HBase:
  “<attrPrefix><colfam>:<qual>”
• The default value of attrPrefix is “2hb_”. This means that all
  attributes with names “2hb_<colfam>:<qual>” should be
  written to HBase.
• Attribute with key “<attrPrefix>” must contain row key for
  Put, otherwise, if no row can be extracted, the event is
  skipped and no record is written to the HBase table.
attr2hbase example
• node: tail("/proc/vmstat”) | regexAll("(w+)s+(w+)",
  "colname","value") value("2hb_","%{colname}%s", escape=true)
   value("2hb_stat:value", "%{value}", escape=true)
  attr2hbase("table-attr2hbase","system","body:contents")]



     Rowkey             Timestamp        Column Family:
                                         stat
     pgpgin1313244007   t1               value=985543
     pgpgin1313244008   t2               value=985543
     pgpgin1313244009   t3               value=985543
Demo Time
What are the New Plugins ?
• https://cwiki.apache.org/FLUME/flume-
  plugins.html

• I pushed OpenTSDB Sink just few weeks back
How to Contribute a new Plugin ?
• Extend EventSink.Base
• Override Open() : Have your connections
  setup to the Store
• Override Append(): Every new Event gets
  processed here. Doing the “Puts” into Store
• Override Close (): Yay! Cleanup the
  connections and flushing etc. to the Store.
• Implement a SinkBuilder builder()
My Contacts
• drayan@riotgames.com
• dr@verticalengine.com
• Twitter: rayanandi

             P.S. We are Hiring!
GOOD LUCK,
 HAVE FUN!
           Play Free!
http://www.leagueoflegends.com/

More Related Content

What's hot

Amazon Route53へのドメイン移管
Amazon Route53へのドメイン移管Amazon Route53へのドメイン移管
Amazon Route53へのドメイン移管Jin k
 
MongoFr : MongoDB as a log Collector
MongoFr : MongoDB as a log CollectorMongoFr : MongoDB as a log Collector
MongoFr : MongoDB as a log CollectorPierre Baillet
 
Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Prajal Kulkarni
 
Akka Streams and HTTP
Akka Streams and HTTPAkka Streams and HTTP
Akka Streams and HTTPRoland Kuhn
 
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLONAli Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLONOutlyer
 
Debugging PySpark: Spark Summit East talk by Holden Karau
Debugging PySpark: Spark Summit East talk by Holden KarauDebugging PySpark: Spark Summit East talk by Holden Karau
Debugging PySpark: Spark Summit East talk by Holden KarauSpark Summit
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Oleksiy Panchenko
 
Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...
Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...
Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...Big Data Spain
 
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.Airat Khisamov
 
Terraform 0.9 + good practices
Terraform 0.9 + good practicesTerraform 0.9 + good practices
Terraform 0.9 + good practicesRadek Simko
 
High Performance Ruby: Evented vs. Threaded
High Performance Ruby: Evented vs. ThreadedHigh Performance Ruby: Evented vs. Threaded
High Performance Ruby: Evented vs. ThreadedEngine Yard
 
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...DataStax
 
Observability with Consul Connect
Observability with Consul ConnectObservability with Consul Connect
Observability with Consul ConnectBram Vogelaar
 
Ground Control to Nomad Job Dispatch
Ground Control to Nomad Job DispatchGround Control to Nomad Job Dispatch
Ground Control to Nomad Job DispatchMichael Lange
 
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et KibanaJournée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et KibanaPublicis Sapient Engineering
 
2014 akka-streams-tokyo-japanese
2014 akka-streams-tokyo-japanese2014 akka-streams-tokyo-japanese
2014 akka-streams-tokyo-japaneseKonrad Malawski
 
Logs aggregation and analysis
Logs aggregation and analysisLogs aggregation and analysis
Logs aggregation and analysisDivante
 

What's hot (20)

Amazon Route53へのドメイン移管
Amazon Route53へのドメイン移管Amazon Route53へのドメイン移管
Amazon Route53へのドメイン移管
 
MongoFr : MongoDB as a log Collector
MongoFr : MongoDB as a log CollectorMongoFr : MongoDB as a log Collector
MongoFr : MongoDB as a log Collector
 
Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.
 
Akka Streams and HTTP
Akka Streams and HTTPAkka Streams and HTTP
Akka Streams and HTTP
 
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLONAli Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
 
Debugging PySpark: Spark Summit East talk by Holden Karau
Debugging PySpark: Spark Summit East talk by Holden KarauDebugging PySpark: Spark Summit East talk by Holden Karau
Debugging PySpark: Spark Summit East talk by Holden Karau
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
 
Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...
Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...
Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...
 
The tale of 100 cve's
The tale of 100 cve'sThe tale of 100 cve's
The tale of 100 cve's
 
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.
 
Terraform 0.9 + good practices
Terraform 0.9 + good practicesTerraform 0.9 + good practices
Terraform 0.9 + good practices
 
Logstash
LogstashLogstash
Logstash
 
Making KVS 10x Scalable
Making KVS 10x ScalableMaking KVS 10x Scalable
Making KVS 10x Scalable
 
High Performance Ruby: Evented vs. Threaded
High Performance Ruby: Evented vs. ThreadedHigh Performance Ruby: Evented vs. Threaded
High Performance Ruby: Evented vs. Threaded
 
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
 
Observability with Consul Connect
Observability with Consul ConnectObservability with Consul Connect
Observability with Consul Connect
 
Ground Control to Nomad Job Dispatch
Ground Control to Nomad Job DispatchGround Control to Nomad Job Dispatch
Ground Control to Nomad Job Dispatch
 
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et KibanaJournée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
 
2014 akka-streams-tokyo-japanese
2014 akka-streams-tokyo-japanese2014 akka-streams-tokyo-japanese
2014 akka-streams-tokyo-japanese
 
Logs aggregation and analysis
Logs aggregation and analysisLogs aggregation and analysis
Logs aggregation and analysis
 

Viewers also liked

Manual de programacion_con_robots_para_la_escuela
Manual de programacion_con_robots_para_la_escuelaManual de programacion_con_robots_para_la_escuela
Manual de programacion_con_robots_para_la_escuelaAngel De las Heras
 
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Spark Summit
 
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...Spark Summit
 
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...Cloudera, Inc.
 
Real time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsReal time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsBen Laird
 
Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelinprajods
 

Viewers also liked (6)

Manual de programacion_con_robots_para_la_escuela
Manual de programacion_con_robots_para_la_escuelaManual de programacion_con_robots_para_la_escuela
Manual de programacion_con_robots_para_la_escuela
 
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
 
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
 
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
 
Real time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsReal time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.js
 
Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelin
 

Similar to Flume HBase

Migrating To PostgreSQL
Migrating To PostgreSQLMigrating To PostgreSQL
Migrating To PostgreSQLGrant Fritchey
 
Data Processing with Cascading Java API on Apache Hadoop
Data Processing with Cascading Java API on Apache HadoopData Processing with Cascading Java API on Apache Hadoop
Data Processing with Cascading Java API on Apache HadoopHikmat Dhamee
 
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lightbend
 
Data science at the command line
Data science at the command lineData science at the command line
Data science at the command lineSharat Chikkerur
 
Creating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at ScaleCreating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at ScaleSean Chittenden
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
Big Data Solutions in Azure - David Giard
Big Data Solutions in Azure - David GiardBig Data Solutions in Azure - David Giard
Big Data Solutions in Azure - David GiardITCamp
 
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...Reactivesummit
 
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & KafkaBack-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & KafkaAkara Sucharitakul
 
Cascading introduction
Cascading introductionCascading introduction
Cascading introductionAlex Su
 
Advanced technic for OS upgrading in 3 minutes
Advanced technic for OS upgrading in 3 minutesAdvanced technic for OS upgrading in 3 minutes
Advanced technic for OS upgrading in 3 minutesHiroshi SHIBATA
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBaseCon
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopAvkash Chauhan
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!Guido Schmutz
 
Writing Asynchronous Programs with Scala & Akka
Writing Asynchronous Programs with Scala & AkkaWriting Asynchronous Programs with Scala & Akka
Writing Asynchronous Programs with Scala & AkkaYardena Meymann
 
Big Data on azure
Big Data on azureBig Data on azure
Big Data on azureDavid Giard
 
DjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling DisqusDjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling Disquszeeg
 
H base introduction & development
H base introduction & developmentH base introduction & development
H base introduction & developmentShashwat Shriparv
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Michael Renner
 

Similar to Flume HBase (20)

Migrating To PostgreSQL
Migrating To PostgreSQLMigrating To PostgreSQL
Migrating To PostgreSQL
 
Data Processing with Cascading Java API on Apache Hadoop
Data Processing with Cascading Java API on Apache HadoopData Processing with Cascading Java API on Apache Hadoop
Data Processing with Cascading Java API on Apache Hadoop
 
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
 
Into The Box 2018 - CBT
Into The Box 2018 - CBTInto The Box 2018 - CBT
Into The Box 2018 - CBT
 
Data science at the command line
Data science at the command lineData science at the command line
Data science at the command line
 
Creating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at ScaleCreating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at Scale
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Big Data Solutions in Azure - David Giard
Big Data Solutions in Azure - David GiardBig Data Solutions in Azure - David Giard
Big Data Solutions in Azure - David Giard
 
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
 
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & KafkaBack-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
 
Cascading introduction
Cascading introductionCascading introduction
Cascading introduction
 
Advanced technic for OS upgrading in 3 minutes
Advanced technic for OS upgrading in 3 minutesAdvanced technic for OS upgrading in 3 minutes
Advanced technic for OS upgrading in 3 minutes
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDK
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R Workshop
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!
 
Writing Asynchronous Programs with Scala & Akka
Writing Asynchronous Programs with Scala & AkkaWriting Asynchronous Programs with Scala & Akka
Writing Asynchronous Programs with Scala & Akka
 
Big Data on azure
Big Data on azureBig Data on azure
Big Data on azure
 
DjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling DisqusDjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling Disqus
 
H base introduction & development
H base introduction & developmentH base introduction & development
H base introduction & development
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 

Recently uploaded (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Flume HBase

  • 1. Hooking up Flume with HBase LA-HUG Aug’11 -Dani Abel Rayan
  • 2. Who am I ? • Big Data Ninja at Riot Games • Flume Contributor • Cloudera Intern Alum • Graduated with Masters CS from Georgia Tech.
  • 3. What am I presenting here ? • Flume event model • HBase data model • Compelling reasons to hook ‘em up • Configuration examples • What are the new upcoming Sinks ? • How to write new Flume-Sink.
  • 4. What is needed before we start .. • Understanding of Flume’s architecture • Usage of Flume’s abstractions such as Plugins, Events, Sources, Sinks, Escape Sequences and Decorators* • Understanding of HBase and Hadoop • Regex • That’s it! *http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html
  • 6. Flume Event Model • A Flume event has these six main fields: Unix timestamp, Nanosecond timestamp, Priority, Source host, Body and a Metadata table with an arbitrary number of attribute value pairs. • The body is the raw log entry body. The default is to truncate the body to a maximum of 32KB per event. This is a configurable. • One can custom bucket attributes with help of escape sequences.
  • 8. What is a Flume Sink ?
  • 9. Reasons For HBase Sink • Near Real-Time aggregation of Streaming Data • Low Latency access to the aggregated data • Offline Big Data Analytics
  • 10. Types of Flume HBase Sink 1. hbase(): Highly expressive hbase("table", "rowkey", "cf1", "c1", "val1"[,"cf2", "c2", "val2", ....] {, writeBufferSize=int, writeToWal=true|false}) 2. attr2hbase(): Flexible and powerful semantics but could be confusing (at first glance) attr2hbase("table"[,"sysFamily"[,"writeBody"[,"attrPrefix"[,"writeBufferSize" [,"writeToWal"]]]]])
  • 11. How to Use a Plugin ? • Compile. Add the jar with the new plugin classes to flume’s classpath. • In flume-site.xml, add the class names of the new sources, sinks, and/or decorators to the flume.plugin.classes property • Restart the Flume nodes (Including Master) • Verify that your plugin is loaded is to check if it is displayed on this page http://flume- master:35871/masterext.jsp
  • 12. hbase() Source: tail(“/proc/vmstat/”) nr_free_pages 594693 nr_inactive_anon 1392 nr_active_anon 45259 nr_inactive_file 107132 nr_active_file 141458 Sink: regexAll(“w+)s+(w+)”,”colname”,”value") Flume Events timestamp 24353457 24353456 24353455 colname nr_active_anon nr_inactive_anon nr_free_pages value 45259 1392 594693
  • 13. hbase() • hbase("tablename", ”%s", ”stats", ”%{colname}", ”%{value}") use %{nanos} instead of %s if you want nano-second timestamp Rowkey Timestamp Column Family: stats 24353455 T1 nr_free_pages = 594693 24353456 T2 nr_inactive_anon = 1392 24353457 T3 nr_active_anon = 45259
  • 14. hbase() • Thus the FDL syntax would be: • node: tail(”/proc/vmstat") | regexAll("(w+)s+(w+)", ”colname", ”value") collector(300000) { hbase("table", ”%s", ”stats", ”%{colname}", "%{value}") }
  • 15. Demo
  • 16. attr2hbase() • Don’t have to list all possible event attributes you want to store in HBase along with their destination column families and qualifiers • Source and/or decorators can produce any (reasonable) number of attributes, with dynamic names (e.g. depending on the values) and they will be written into HBase
  • 17. attr2hbase • attr2hbase("table"[,"sysFamily"[,"writeBody"[, "attrPrefix"[,"writeBufferSize" [,"writeToWal"]]]]]) • sysFamily holds the name of the column family that is used to store “system” data (event timestamp, host, priority). • In case this parameter is absent or equals “”, the sink doesn’t write “system” data
  • 18. attr2hbase • writeBody indicates whether event body should be written with other “system” data. By default, (when this parameter is absent or equals ””) the attribute body is not written. • This parameter should have the “column- family:qualifier” format in order for the sink to write the body to the specific column- family:qualifier.
  • 19. attr2hbase • attrPrefix defines which attributes will be written to HBase: every attribute with the name prefixed with attrPrefix parameter’s value is written. The attribute key should be in the following format to be properly written into HBase: “<attrPrefix><colfam>:<qual>” • The default value of attrPrefix is “2hb_”. This means that all attributes with names “2hb_<colfam>:<qual>” should be written to HBase. • Attribute with key “<attrPrefix>” must contain row key for Put, otherwise, if no row can be extracted, the event is skipped and no record is written to the HBase table.
  • 20. attr2hbase example • node: tail("/proc/vmstat”) | regexAll("(w+)s+(w+)", "colname","value") value("2hb_","%{colname}%s", escape=true) value("2hb_stat:value", "%{value}", escape=true) attr2hbase("table-attr2hbase","system","body:contents")] Rowkey Timestamp Column Family: stat pgpgin1313244007 t1 value=985543 pgpgin1313244008 t2 value=985543 pgpgin1313244009 t3 value=985543
  • 22. What are the New Plugins ? • https://cwiki.apache.org/FLUME/flume- plugins.html • I pushed OpenTSDB Sink just few weeks back
  • 23. How to Contribute a new Plugin ? • Extend EventSink.Base • Override Open() : Have your connections setup to the Store • Override Append(): Every new Event gets processed here. Doing the “Puts” into Store • Override Close (): Yay! Cleanup the connections and flushing etc. to the Store. • Implement a SinkBuilder builder()
  • 24. My Contacts • drayan@riotgames.com • dr@verticalengine.com • Twitter: rayanandi P.S. We are Hiring!
  • 25. GOOD LUCK, HAVE FUN! Play Free! http://www.leagueoflegends.com/

Editor's Notes

  1. ----- Meeting Notes (8/17/11 16:51) -----Good Evening GentlemenI&apos;m Dani----- Meeting Notes (8/17/11 17:01) -----Lets see how to hook up these guys Flume and HBase
  2. ----- Meeting Notes (8/17/11 16:51) -----Just a brief background Several Patches to Flume:1. Flogger2. Few things in HBase sink3. recently contributed OpenTSDB sink
  3. ----- Meeting Notes (8/17/11 16:51) -----My assumption is that folks here know what Flume does and HBase doesSo focusing on
  4. ----- Meeting Notes (8/17/11 17:08) -----If anyone haven&apos;t used Flume or HBase .. let me know.
  5. ----- Meeting Notes (8/17/11 17:08) -----I can take up more questions at end of presentation
  6. ----- Meeting Notes (8/17/11 17:11) -----Single ROWMillion Column names
  7. ----- Meeting Notes (8/17/11 17:11) -----Check out Flume User Guide
  8. ----- Meeting Notes (8/17/11 17:14) -----HBase is integrated with Hive and MR
  9. ----- Meeting Notes (8/17/11 17:14) -----Those who haven&apos;t used: Just think about it as &quot;which of the overloaded functions&quot; Flume has to use.You can change the parameters at run time.
  10. ----- Meeting Notes (8/17/11 17:15) -----In daemon mode - flume-env.sh
  11. Just put LAHUG in subject line
  12. WE WOULD LOVE TO HOST NEXT HADOOP MEETUPOpenTSDB …. It goes a step further and gives you awesome graphs …… for your data.