HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experience with High Speed Writes
Presented by: Hari Shreedharan, Cloudera

Published in: Technology
1 Comment
  • Did create a simple solution to stream directly into HBase from RSS feeds. https://github.com/dgkris/RSSPipe
Transcript

  • 1. Streaming data into HBase using Flume. Hari Shreedharan | Software Engineer, Cloudera
  • 2. Apache Flume Fundamentals
      • Scalable collection and aggregation of event data (e.g. logs)
      • The simplest "unit" of data is the "Event"
      • Event = {Map<String, String>, byte[] body}
      • Dynamic, contextual event routing
      • Low latency, high throughput
      • Declarative configuration
      • Productive out of the box, yet powerfully extensible
      • Open source software
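The event shape on this slide (a map of string headers plus a byte[] body) and the idea of contextual routing can be sketched in plain Java. This is an illustration only: SketchEvent and HeaderRouter are hypothetical names, not Flume classes, and the header-to-channel mapping is an assumed example of how a header value might select a destination.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an event: string headers plus an opaque byte[] body.
final class SketchEvent {
    private final Map<String, String> headers;
    private final byte[] body;

    SketchEvent(Map<String, String> headers, byte[] body) {
        // Defensive copies keep the event immutable after construction.
        this.headers = Collections.unmodifiableMap(new HashMap<>(headers));
        this.body = body.clone();
    }

    Map<String, String> getHeaders() { return headers; }

    byte[] getBody() { return body.clone(); }
}

// Hypothetical router: looks up one header and maps its value to a channel name.
final class HeaderRouter {
    static String route(SketchEvent event, String headerName,
                        Map<String, String> valueToChannel, String defaultChannel) {
        String value = event.getHeaders().get(headerName);
        if (value == null) {
            return defaultChannel; // header missing: fall back
        }
        return valueToChannel.getOrDefault(value, defaultChannel);
    }
}
```

The body stays opaque to the router; only the headers are inspected, which is what lets routing decisions be made without parsing the payload.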
  • 3. Inside a Flume NG agent (diagram slide)
  • 4. Why Flume?
      • Real user issue:
      • HBase REST server did not scale
      • OOM, very high latency
      • High ops cost
      • Flume was a viable alternative
      • Schema changes require app changes
      • In Flume, just change and deploy a plugin, then restart Flume
      • HBase downtime, compaction, and GC are isolated from the production app
      • More data: just add more Flume agents, no app changes!
  • 5. Topology: Connecting agents together
      • [Client]+ → Agent → [Agent]* → Destination HBase
  • 6. Flume writes to HBase: HBase Sinks
      • HBase Sink
          • Currently supports 0.90.x, 0.92.x, 0.94.x
          • Uses the "standard" HBase Client API
          • Supports security
      • Async HBase Sink
          • Uses Async HBase (version 1.4.1)
          • No security support
          • Faster
  • 7. Highly flexible sinks
      • Both sinks are extremely flexible.
      • HBase sink uses a "serializer" to convert Flume events to an HBase-friendly format.
      • Plugin architecture: users can drop in their own serializer.
      • Serializers implement a very simple interface.
  • 8. Serializers
        public interface HbaseEventSerializer {
          void initialize(Event event, byte[] columnFamily);
          public List<Row> getActions();
          public List<Increment> getIncrements();
          public void close();
        }
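To make the serializer contract more concrete, here is a minimal, dependency-free sketch of the pattern: the sink hands the serializer an event, and the serializer returns the writes to apply. Everything in it is an assumption for illustration. CellWrite, SketchSerializer, and CsvSerializer are hypothetical stand-ins rather than Flume or HBase types, the interface is simplified (it takes the raw body and omits getIncrements()), and the "rowKey,value" body layout is invented.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an HBase put: one cell written to one row.
final class CellWrite {
    final byte[] row;
    final byte[] family;
    final byte[] qualifier;
    final byte[] value;

    CellWrite(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.value = value;
    }
}

// Simplified, hypothetical version of the serializer contract shown on the slide.
interface SketchSerializer {
    void initialize(byte[] eventBody, byte[] columnFamily);
    List<CellWrite> getActions();
    void close();
}

// Assumes the event body is UTF-8 text of the form "rowKey,value".
final class CsvSerializer implements SketchSerializer {
    private static final byte[] QUALIFIER = "payload".getBytes(StandardCharsets.UTF_8);
    private byte[] body = new byte[0];
    private byte[] family = new byte[0];

    @Override
    public void initialize(byte[] eventBody, byte[] columnFamily) {
        this.body = eventBody;
        this.family = columnFamily;
    }

    @Override
    public List<CellWrite> getActions() {
        List<CellWrite> actions = new ArrayList<>();
        String text = new String(body, StandardCharsets.UTF_8);
        int comma = text.indexOf(',');
        if (comma <= 0) {
            return actions; // malformed body: emit nothing rather than a bad row
        }
        byte[] row = text.substring(0, comma).getBytes(StandardCharsets.UTF_8);
        byte[] value = text.substring(comma + 1).getBytes(StandardCharsets.UTF_8);
        actions.add(new CellWrite(row, family, QUALIFIER, value));
        return actions;
    }

    @Override
    public void close() {
        // Nothing to release in this sketch.
    }
}
```

Because the mapping from event to row lives entirely in the serializer, a schema change means swapping this one class rather than touching the application that produces the events.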
  • 9. HBase Cluster performance
      • HBase cluster itself scaled really well
      • No one I know of has hit scaling issues writing from Flume
      • Sometimes read performance was affected
      • Primarily due to row locks held by writes/increments
      • Increments made this situation more problematic
      • When Flume was writing to the same rows that were being read, the read latency could be visibly high.
      • Pre-split tables and uniform distribution of data also helped.
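The slide does not say how uniform distribution was achieved. One common approach, shown here purely as an assumed example, is to "salt" row keys with a hash-derived bucket prefix and pre-split the table on those prefixes. SaltedKeys and its methods are hypothetical helpers; the split points they produce are the kind of byte[][] array that HBase accepts as split keys at table creation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for salting row keys and computing matching split points.
final class SaltedKeys {
    // Prefixes the key with a two-digit bucket derived from its hash, so that
    // sequential keys (timestamps, counters) spread across buckets.
    static String salt(String rowKey, int buckets) {
        checkBuckets(buckets);
        int bucket = Math.floorMod(rowKey.hashCode(), buckets);
        return String.format(Locale.ROOT, "%02d-%s", bucket, rowKey);
    }

    // Returns the split points "01-", "02-", ... so each bucket starts its own region.
    static byte[][] splitPoints(int buckets) {
        checkBuckets(buckets);
        byte[][] splits = new byte[buckets - 1][];
        for (int i = 1; i < buckets; i++) {
            splits[i - 1] = String.format(Locale.ROOT, "%02d-", i)
                    .getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }

    private static void checkBuckets(int buckets) {
        // Two digits of prefix limit this sketch to at most 100 buckets.
        if (buckets < 2 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be between 2 and 100");
        }
    }
}
```

The trade-off is on the read side: a scan over a logical key range has to be issued once per bucket, so salting suits write-heavy tables better than scan-heavy ones.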
  • 10. Issues we faced: why two sinks?
      • Wrote the HBase Sink first using the HBase client API
      • HBase Client API is great at conserving resources
      • Several static maps hidden away in the API meant we could not open as many connections as we wanted from the same JVM
      • Region Servers and Flume Agents were sitting idle while data was being sent over the wire!
      • More threads didn't seem to help much.
  • 11. Async HBase to the rescue!
      • Async HBase: an easy way out
      • Maintains its own thread pools and is callback-based
      • Helped us get the full power of HBase
      • Scaled really well, allowing good HBase cluster utilization
      • Never seen a user complaining about Async HBase Sink performance!
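The callback-based style described on this slide can be illustrated with the JDK's CompletableFuture. This is not the Async HBase API; AsyncWriteSketch and putAsync are hypothetical, and the "write" is simulated. The sketch only shows the shape of the technique: issue every write without blocking, record outcomes in callbacks, and wait once for the whole batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class AsyncWriteSketch {
    // Hypothetical non-blocking write; an empty row key simulates a failed write.
    static CompletableFuture<Void> putAsync(String row, ExecutorService pool) {
        return CompletableFuture.runAsync(() -> {
            if (row.isEmpty()) {
                throw new IllegalArgumentException("empty row key");
            }
            // A real client would send the write over the network here.
        }, pool);
    }

    // Returns {succeeded, failed} for the batch.
    static int[] writeBatch(List<String> rows, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger succeeded = new AtomicInteger();
        AtomicInteger failed = new AtomicInteger();
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (String row : rows) {
                // The callback runs when the write completes; the loop never blocks.
                CompletableFuture<Void> tracked = putAsync(row, pool).handle((ignored, error) -> {
                    if (error == null) {
                        succeeded.incrementAndGet();
                    } else {
                        failed.incrementAndGet();
                    }
                    return (Void) null;
                });
                pending.add(tracked);
            }
            // Single wait point for the whole batch.
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return new int[] { succeeded.get(), failed.get() };
    }
}
```

For example, writeBatch(List.of("row1", "row2", ""), 2) reports two successes and one failure. A sink built this way could use the failure count to decide whether the batch should be retried.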
  • 12. What happens now?
      • HBase 0.95+ is no longer wire compatible with Async HBase
      • Hoping to see Async HBase support HBase 0.95+ (and willing to contribute!)
      • Hoping to see an HBase API which supports a "use all my resources" mode (and willing to contribute!)
  • 13. Read and contribute!
      • Apache Flume: http://flume.apache.org/
      • https://blogs.apache.org/flume/entry/flume_ng_architecture
      • https://blogs.apache.org/flume/entry/streaming_data_into_apache_hbase
      • https://blogs.apache.org/flume/entry/flume_performance_tuning_part_1
  • 16. Thank you! Hari Shreedharan, Software Engineer, Cloudera (@harisr1234)
