feed aggregator powered by
hbase & python




                    Andrei Savu
                    wurbe #25
Objectives

 Highly scalable feed aggregator
 Play with Python & Thrift
 Provide some sample code
 Provide detailed install instructions
 Learn new stuff
Table Structure

 3 tables: Feeds, Urls, UrlsIndex

 Feeds: all feeds
 Urls: data extracted from feeds
 UrlsIndex: index table
Source code

 http://github.com/andreisavu/feedaggregator

 detailed install instructions
Lessons learned
Lesson #1: HBase Game Rules

  Not relational
  No joins
  No sophisticated query engine
  No column typing
  No transactions
  No secondary indices

... all done in application code
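
A minimal sketch of what "done in application code" can look like for a secondary index. Two plain dicts stand in for the Urls and UrlsIndex tables so the example runs without a cluster. The column names (content:title, meta:category, ref:url_id) and the helper functions are hypothetical and are not taken from the project source.

```python
# In-memory stand-ins for two HBase tables: row key -> {column: value}.
# Table names follow the slides; column names are hypothetical.
urls = {}
urls_index = {}

def put_url(url_id, category, timestamp, title):
    """Write the data row, then the index row. The store will not do the second write for us."""
    urls[url_id] = {"content:title": title, "meta:category": category}
    urls_index[f"{category}/{timestamp}"] = {"ref:url_id": url_id}

def list_category(category):
    """Application-side 'join': scan index rows by key prefix, then fetch each data row."""
    prefix = category + "/"
    keys = sorted(k for k in urls_index if k.startswith(prefix))
    return [urls[urls_index[k]["ref:url_id"]]["content:title"] for k in keys]

put_url("u2", "tech", "2009-11-03T08:30:00Z", "Second post")
put_url("u1", "tech", "2009-11-02T17:00:00Z", "First post")
put_url("u3", "news", "2009-11-02T18:00:00Z", "Other category")

print(list_category("tech"))
```

With no transactions, the two writes in put_url are not atomic across tables, so the application also has to tolerate an index row that is missing or stale.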
Lesson #2: Design your index

 <cat>/<w3c_timestamp>

 time sorting = lexicographic sorting
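
A small sketch of why this key layout works: HBase keeps rows sorted by the bytes of the row key, so a fixed-width, zero-padded UTC timestamp sorts the same way as the time it encodes. The index_key helper and the sample category are hypothetical, and the exact timestamp format used by the project is an assumption here.

```python
from datetime import datetime, timezone

def index_key(category, when):
    """Build '<cat>/<w3c_timestamp>' using a fixed-width UTC timestamp."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{category}/{stamp}"

times = [
    datetime(2009, 11, 3, 8, 30, tzinfo=timezone.utc),
    datetime(2009, 1, 15, 12, 0, tzinfo=timezone.utc),
    datetime(2009, 11, 2, 17, 0, tzinfo=timezone.utc),
]

keys = [index_key("tech", t) for t in times]

# Sorting by raw bytes (what the store does) ...
by_bytes = sorted(keys, key=lambda k: k.encode("ascii"))
# ... matches sorting by the actual time values.
by_time = [index_key("tech", t) for t in sorted(times)]

assert by_bytes == by_time
```

The property only holds when every timestamp has the same width and the same timezone; mixing offsets or dropping zero padding breaks the ordering.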
Lesson #3: No charsets

  convert everything to bytes

... but store the original charset
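
One way to read this lesson, as a sketch: keep the payload as the raw bytes that came off the wire and record the declared charset in a separate column, so the text can be decoded later. The column names (content:raw, meta:charset) and both helpers are hypothetical.

```python
def to_stored(raw, declared_charset):
    """Keep the payload as bytes and record its charset next to it."""
    return {
        "content:raw": raw,
        "meta:charset": declared_charset.encode("ascii"),
    }

def to_text(row):
    """Decode the stored bytes using the charset saved with them."""
    charset = row["meta:charset"].decode("ascii")
    return row["content:raw"].decode(charset, errors="replace")

# A feed that declares ISO-8859-1 rather than UTF-8.
original = "Café résumé"
row = to_stored(original.encode("iso-8859-1"), "iso-8859-1")

assert isinstance(row["content:raw"], bytes)
assert to_text(row) == original
```

Without the saved charset, the same bytes decoded as UTF-8 would not round-trip to the original text.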
Questions?

http://www.andreisavu.ro

http://twitter.com/andreisavu

contact@andreisavu.ro
