HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experience with High Speed Writes

Presented by: Hari Shreedharan, Cloudera

  • Did create a simple solution to stream directly into HBase from RSS feeds. https://github.com/dgkris/RSSPipe

Transcript

  • 1. Streaming data into HBase using Flume. Hari Shreedharan | Software Engineer, Cloudera
  • 2. Apache Flume Fundamentals • Scalable collection and aggregation of event data (e.g. logs) • The simplest “unit” of data is the “Event” • Event = {Map<String, String> headers, byte[] body} • Dynamic, contextual event routing • Low latency, high throughput • Declarative configuration • Productive out of the box, yet powerfully extensible • Open source software
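The event shape on slide 2 (a map of string headers plus an opaque byte array body) can be sketched in plain Java. This is a minimal illustration, not Flume's own `Event` class: the class name `SketchEvent`, the `ofLogLine` helper, and the `host` header key are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for a Flume event: string headers plus an opaque body.
// NOT Flume's Event interface; the name SketchEvent is hypothetical.
final class SketchEvent {
    private final Map<String, String> headers;
    private final byte[] body;

    SketchEvent(Map<String, String> headers, byte[] body) {
        // Defensive copies keep the event immutable once created.
        this.headers = Collections.unmodifiableMap(new HashMap<>(headers));
        this.body = body.clone();
    }

    Map<String, String> getHeaders() {
        return headers;
    }

    byte[] getBody() {
        return body.clone();
    }

    // Hypothetical helper: wrap one log line, tagging it with its source host.
    static SketchEvent ofLogLine(String host, String line) {
        Map<String, String> h = new HashMap<>();
        h.put("host", host);
        return new SketchEvent(h, line.getBytes(StandardCharsets.UTF_8));
    }
}
```

Headers are what routing decisions would typically key on, while the body stays an uninterpreted payload until a sink or serializer gives it meaning.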
  • 3. Inside a Flume NG agent
  • 4. Why Flume? • Real user issue: • HBase REST server did not scale • OOM errors, very high latency • High ops cost • Flume was a viable alternative • Schema changes would otherwise require app changes • In Flume, just change and deploy a plugin and restart Flume • HBase downtime/compaction/GC is isolated from the production app • More data? Just add more Flume agents, no app changes!
  • 5. Topology: Connecting agents together • [Client]+ → Agent [→ Agent]* → Destination (HBase)
  • 6. Flume writes to HBase – HBase Sinks • HBase Sink • Currently supports 0.90.x, 0.92.x, 0.94.x • Uses the “standard” HBase Client API • Supports security • Async HBase Sink • Uses Async HBase 1.4.1 • No security support • Faster
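As a rough illustration of how the two sinks on slide 6 are selected, a Flume agent configuration might look like the fragment below. The agent, sink, and channel names (`a1`, `k1`, `k2`, `c1`, `c2`) and the table and column family values are placeholders, and the property names are given as recalled from the Flume user guide, so they should be checked against the Flume version in use.

```properties
# Placeholder agent "a1" with one sink of each flavor.
a1.sinks = k1 k2

# HBase Sink: standard HBase client API, supports security.
a1.sinks.k1.type = hbase
a1.sinks.k1.table = events
a1.sinks.k1.columnFamily = d
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
a1.sinks.k1.channel = c1

# Async HBase Sink: Async HBase client, no security support, faster.
a1.sinks.k2.type = asynchbase
a1.sinks.k2.table = events
a1.sinks.k2.columnFamily = d
a1.sinks.k2.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
a1.sinks.k2.channel = c2
```

Sources and channels are omitted here; a working agent needs both defined as well.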
  • 7. Highly flexible sinks • Both sinks are extremely flexible. • HBase sink uses a “serializer” to convert Flume events to HBase-friendly format. • Plugin architecture – user can drop in their own serializer • Serializers implement a very simple interface.
  • 8. Serializers • public interface HbaseEventSerializer { void initialize(Event event, byte[] columnFamily); public List<Row> getActions(); public List<Increment> getIncrements(); public void close(); }
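To make the serializer idea on slide 8 concrete, here is a self-contained sketch of a serializer that turns one event into one put. It deliberately does not depend on Flume or HBase: `Ev` and `PutCell` are hypothetical stand-ins for Flume's `Event` and HBase's `Put`, the `Serializer` interface only mirrors the shape of the slide's interface (with `getIncrements()` left out for brevity), and the `host`/`timestamp` header names and the `payload` qualifier are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SerializerSketch {
    // Stand-in for a Flume event (headers + body); not the Flume class.
    static final class Ev {
        final Map<String, String> headers;
        final byte[] body;
        Ev(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    // Stand-in for a single-cell HBase Put; not the HBase class.
    static final class PutCell {
        final byte[] row, family, qualifier, value;
        PutCell(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Mirrors the shape of the slide's interface, minus getIncrements().
    interface Serializer {
        void initialize(Ev event, byte[] columnFamily);
        List<PutCell> getActions();
        void close();
    }

    // Row key is "<host>-<timestamp>" from the headers; the body goes in one column.
    static final class HeaderKeySerializer implements Serializer {
        private static final byte[] QUALIFIER =
                "payload".getBytes(StandardCharsets.UTF_8);
        private Ev event;
        private byte[] family;

        @Override
        public void initialize(Ev event, byte[] columnFamily) {
            this.event = event;
            this.family = columnFamily;
        }

        @Override
        public List<PutCell> getActions() {
            String key = event.headers.getOrDefault("host", "unknown")
                    + "-" + event.headers.getOrDefault("timestamp", "0");
            List<PutCell> actions = new ArrayList<>();
            actions.add(new PutCell(key.getBytes(StandardCharsets.UTF_8),
                    family, QUALIFIER, event.body));
            return actions;
        }

        @Override
        public void close() {
            // Nothing to release in this sketch.
        }
    }
}
```

The point of the plugin design is visible here: a schema change (a different row key or column layout) only touches the serializer class, not the application producing the events.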
  • 9. HBase Cluster performance • HBase cluster itself scaled really well • No one I know of has hit scaling issues writing from Flume • Sometimes read performance was affected • Primarily due to row locks held by writes/increments • Increments made this situation more problematic • When Flume was writing to the same rows that were being read, read latency could be visibly high. • Pre-split tables and uniform distribution of data also helped.
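Slide 9 does not say how uniform distribution was achieved. One common technique, shown here purely as an assumption and not as what the speaker did, is to prefix each row key with a one-byte hash bucket ("salting") and pre-split the table at the bucket boundaries. The class and method names are hypothetical, and the sketch assumes between 2 and 256 buckets.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SaltedKeySketch {
    // Prefix the natural key with a one-byte bucket derived from its hash,
    // so that sequential keys spread across all buckets (and thus regions).
    static byte[] saltedKey(String naturalKey, int buckets) {
        byte[] key = naturalKey.getBytes(StandardCharsets.UTF_8);
        int bucket = Math.floorMod(Arrays.hashCode(key), buckets);
        byte[] salted = new byte[key.length + 1];
        salted[0] = (byte) bucket;
        System.arraycopy(key, 0, salted, 1, key.length);
        return salted;
    }

    // Split points for pre-splitting a table into one region per bucket:
    // buckets - 1 boundaries, at prefix bytes 1, 2, ..., buckets - 1.
    static byte[][] splitPoints(int buckets) {
        byte[][] splits = new byte[buckets - 1][];
        for (int i = 1; i < buckets; i++) {
            splits[i - 1] = new byte[] {(byte) i};
        }
        return splits;
    }
}
```

The trade-off is that range scans over the natural key must then fan out across every bucket, so this suits write-heavy workloads better than scan-heavy ones.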
  • 10. Issues we faced – why two sinks? • Wrote the HBase Sink first using HBase client API • HBase Client API great at conserving resources • Several static maps hidden away in the API meant we could not open as many connections as wanted from the same JVM • Region Servers and Flume Agents were sitting idle while data was being sent over the wire! • More threads didn’t seem to help much.
  • 11. Async HBase to the rescue! • Async HBase – an easy way out • Maintains its own thread pools – callback based • Helped us get the full power of HBase • Scaled really well – allowing good HBase cluster utilization • Never seen a user complaining about Async HBase Sink performance!
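The callback-based style described on slide 11 can be illustrated with the JDK's `CompletableFuture`. This is only an analogy for the pattern (a small shared pool, many writes in flight, acknowledgements handled in callbacks); it is not Async HBase's API, and `writeAsync` is a hypothetical stand-in for a remote write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class AsyncWriteSketch {
    // Hypothetical stand-in for a non-blocking remote write.
    static CompletableFuture<Void> writeAsync(ExecutorService ioPool, byte[] row) {
        return CompletableFuture.runAsync(() -> {
            // A real client would put the row on the wire here.
        }, ioPool);
    }

    // Issue every write without waiting for the previous one, count
    // acknowledgements in callbacks, then wait once for the whole batch.
    static int writeAll(List<byte[]> rows) {
        ExecutorService ioPool = Executors.newFixedThreadPool(4);
        AtomicInteger acked = new AtomicInteger();
        try {
            List<CompletableFuture<Void>> inFlight = new ArrayList<>();
            for (byte[] row : rows) {
                inFlight.add(writeAsync(ioPool, row)
                        .thenRun(acked::incrementAndGet));
            }
            CompletableFuture.allOf(
                    inFlight.toArray(new CompletableFuture[0])).join();
            return acked.get();
        } finally {
            ioPool.shutdown();
        }
    }
}
```

The caller never blocks per write, which is the property that lets a single agent keep the network and the region servers busy instead of idling between round trips.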
  • 12. What happens now? • HBase 0.95+ no longer wire compatible with Async HBase • Hoping to see Async HBase support HBase 0.95+ (and willing to contribute!) • Hoping to see an HBase API which supports a “use all my resources” mode (and willing to contribute!)
  • 13. Read and contribute! • Apache Flume: http://flume.apache.org/ • https://blogs.apache.org/flume/entry/flume_ng_architecture • https://blogs.apache.org/flume/entry/streaming_data_into_apache_hbase • https://blogs.apache.org/flume/entry/flume_performance_tuning_part_1
  • 16. Hari Shreedharan, Software Engineer, Cloudera @harisr1234 Thank you!