feed aggregator powered by hbase & python

Andrei Savu
wurbe #25
Objectives

 Highly scalable feed aggregator
 Play with python & thrift
 Provide some sample code
 Provide detailed install instructions
 Learn new stuff
Table Structure

 3 tables: Feeds, Urls, UrlsIndex

 Feeds: all feeds
 Urls: data extracted from feeds
 UrlsIndex: index table
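
A minimal sketch of how the three tables could relate, using plain dictionaries as in-memory stand-ins for HBase tables. The column names, the helper functions and the `k1`/`k2` index keys are assumptions made for illustration; the slides do not show the real schema, which lives in the repository.

```python
# Hypothetical in-memory stand-ins for the three tables (row key -> columns).
feeds = {}       # Feeds: feed URL -> feed metadata
urls = {}        # Urls: entry URL -> data extracted from the feed
urls_index = {}  # UrlsIndex: index row key -> entry URL


def add_feed(feed_url, title):
    """Register a feed in the Feeds table."""
    feeds[feed_url] = {"title": title}


def add_url(feed_url, entry_url, title, index_key):
    """Store the entry, then write the matching index row by hand."""
    urls[entry_url] = {"feed": feed_url, "title": title}
    urls_index[index_key] = entry_url


def entries_in_index_order():
    """Walk the index in sorted key order and resolve each row in Urls."""
    return [urls[urls_index[key]]["title"] for key in sorted(urls_index)]


add_feed("http://example.com/rss", "Example Feed")
add_url("http://example.com/rss", "http://example.com/b", "Second", "k2")
add_url("http://example.com/rss", "http://example.com/a", "First", "k1")
```

Every write to Urls is paired with a write to UrlsIndex, so reads can follow the index order instead of scanning all entries.
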
Source code

 http://github.com/andreisavu/feedaggregator

 detailed install instructions
Lessons learned
Lesson #1: HBase Game Rules

  Not relational
  No joins
  No sophisticated query engine
  No column typing
  No transactions
  No secondary indices

... all done in application code
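
As an illustration of what "done in application code" can mean, here is a small sketch with dictionaries standing in for two tables. The row keys, the `meta:*` column names and the 8-byte big-endian counter encoding are assumptions, not the project's actual layout.

```python
import struct

# Hypothetical stand-ins for two tables: row key -> {column: raw bytes}.
feed_rows = {
    b"feed-1": {
        b"meta:title": b"Example Feed",
        b"meta:url_count": struct.pack(">q", 2),  # integer stored as raw bytes
    },
}
url_rows = {
    b"url-a": {b"meta:feed": b"feed-1", b"meta:title": b"First post"},
    b"url-b": {b"meta:feed": b"feed-1", b"meta:title": b"Second post"},
    b"url-c": {b"meta:feed": b"feed-2", b"meta:title": b"Other post"},
}


def url_count(feed_row):
    """No column typing: the application decodes the counter itself."""
    return struct.unpack(">q", feed_row[b"meta:url_count"])[0]


def urls_with_feed_title(feed_key):
    """No joins: look up the feed, then match url rows in application code."""
    feed_title = feed_rows[feed_key][b"meta:title"]
    return [
        (row[b"meta:title"], feed_title)
        for _, row in sorted(url_rows.items())
        if row[b"meta:feed"] == feed_key
    ]


joined = urls_with_feed_title(b"feed-1")
```

The filter over all url rows is only acceptable at toy scale; with real data the lookup would go through an index table maintained by the application.
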
Lesson #2: Design your index

 <cat>/<w3c_timestamp>

 time sorting = lexicographic sorting
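
The row key idea can be sketched in a few lines. This assumes the timestamp is always written as fixed-width UTC (`YYYY-MM-DDThh:mm:ssZ`); with mixed time zones or variable-width fields, string order and time order would no longer agree. The function names and sample categories are hypothetical, and the sorted list only emulates the sorted row order that a scan would see.

```python
from bisect import bisect_left
from datetime import datetime, timezone


def index_key(category, when):
    """Build a '<cat>/<w3c_timestamp>' row key with a fixed-width UTC stamp."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{category}/{stamp}"


def scan_prefix(sorted_keys, prefix):
    """Return all keys starting with prefix, emulating a sorted prefix scan."""
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        result.append(key)
    return result


events = [
    ("tech", datetime(2009, 6, 2, 9, 30, tzinfo=timezone.utc)),
    ("news", datetime(2009, 6, 1, 12, 0, tzinfo=timezone.utc)),
    ("tech", datetime(2009, 5, 30, 18, 15, tzinfo=timezone.utc)),
]
keys = sorted(index_key(cat, when) for cat, when in events)
tech_keys = scan_prefix(keys, "tech/")
```

Because the category comes first in the key, one prefix scan returns a single category already ordered by time.
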
Lesson #3: No charsets

  convert everything to bytes

... but store the original charset
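
A short sketch of the same idea: keep the payload as bytes exactly as received and record the charset next to it, so the text can be decoded later. The `content:raw` and `content:charset` column names and both helpers are assumptions for illustration.

```python
def to_cells(raw_bytes, declared_charset):
    """Return the column values to store: raw payload plus its charset label."""
    return {
        "content:raw": raw_bytes,
        "content:charset": declared_charset.encode("ascii"),
    }


def to_text(cells):
    """Decode the stored bytes using the charset recorded alongside them."""
    charset = cells["content:charset"].decode("ascii")
    return cells["content:raw"].decode(charset, errors="replace")


title = "Café Zürich"
cells = to_cells(title.encode("latin-1"), "latin-1")
```

Without the stored charset label, the same bytes would be misread by any reader that assumes UTF-8.
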
Questions?

http://www.andreisavu.ro

http://twitter.com/andreisavu

contact@andreisavu.ro
HBase Feed Aggregator Wurbe 25

Published in: Technology

  1. feed aggregator powered by hbase & python. Andrei Savu, wurbe #25
  2. Objectives: Highly scalable feed aggregator; Play with python & thrift; Provide some sample code; Provide detailed install instructions; Learn new stuff
  3. Table Structure: 3 tables: Feeds, Urls, UrlsIndex. Feeds: all feeds; Urls: data extracted from feeds; UrlsIndex: index table
  4. Source code: http://github.com/andreisavu/feedaggregator (detailed install instructions)
  5. Lessons learned
  6. Lesson #1: HBase Game Rules: Not relational; No joins; No sophisticated query engine; No column typing; No transactions; No secondary indices ... all done in application code
  7. Lesson #2: Design your index: <cat>/<w3c_timestamp>; time sorting = lexicographic sorting
  8. Lesson #3: No charsets: convert everything to bytes ... but store the original charset
  9. Questions? http://www.andreisavu.ro http://twitter.com/andreisavu contact@andreisavu.ro