The Lily RowLog library

Presentation on the Lily RowLog library as presented to the HBase/Hadoop meetup on the eve of Hadoop World 2011

Published in: Technology, Business

  1. Lily: A SMART DATA PLATFORM, MAKING BIG DATA APPS EASY
     IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  2. the (lily) rowlog library
  3. Lily Architecture (components)
  4. Lily Architecture ? (components)
  5. Lily 101
     » data repository on top of HBase
     » records with fields
     » rich data types + schema
     » versioning
     » Java + REST api
     » indexes into Solr (et al)
     » a bunch more: smart data at scale, made easy
     » Apache license - www.lilyproject.org
     More info? Hadoop World, Tuesday 1:15PM, Met Ballroom
  6. use of rowlog inside lily
     » feed Solr index with (Lily|HBase) record updates
     » maintain secondary indices (i.e. linkindex)
     » shared concerns:
       » reliability
       » consistency
       » manageability
       » (scalability)
  7. UC1: message queue (mq)
     [diagram labels: record update; Indexer; update Solr index entry; possible failure]
  8. UC1: message queue (mq)
     [diagram labels: record update; ? MQ; Indexer; update Solr index entry]
  9. UC1: message queue (mq)
     [diagram labels: record update; MQ; Indexer (×3); update Solr index entry]
  10. MQ requirements
      » async (cope with Solr ‘lag’)
      » guaranteed execution
      » no concurrent processing of 2 msgs about the same record
      » no extra tech (HBase should be good enough)
        » management complexity
        » benefits from scalability, resilience, etc.
  11. UC2: write-ahead-log (WAL)
      » secondary actions
        » pushing messages onto MQ (!)
        » updating secondary indices (i.e. linkindex)
      » requirements
        » sec. actions eventually get executed, in predefined order
        » further updates to record denied until sec. actions succeeded
        » synchronous
        » pre-update: check WAL for outstanding actions + cleanup mechanism
  12. the rowlog library
      [diagram labels: VM; RowLog (×2); subscription (×4); listener (×3); Netty; global; row-local; queue; storage (HBase)]
  13. global queue
      » separate HBase table
      » 1 msg per record update per subscription
      » key = (shard id +) subscription ID + timestamp + (data table) rowkey + sequence nr
      » rowlog processor (single instance, managed by ZK)
      » data always appended/deleted from table end (boo!)
  14. row-local queue
      [diagram: RECORDS table (HBase); columns: row-local queue, DATA; rows: ROW 1 to ROW 4]
  15. row-local queue
      [diagram labels: CF1, CF2; data, payload, execution state; rows: ROW X, ROW Y, ROW Z; message ID, consumer id, state]
  16. why row-local queue?
      » predates Inbox-concept (Google Megastore)
      » msgs will appear on rowlog if and only if updates have really happened
        » rely on atomic row operation guarantee of HBase
        » msgs on global queue without local counterparts can be discarded
      » ‘msgs’ on global rowlog can be small
        » just point to msgs in row-local queue
        » actual payload sits there
      » optimized processing of msgs per row (i.e. combine)
  17. rowlog sharding
      » MQ and WAL tables tend to be smallish
        » MQ depends on performance of Solr indexing
        » WAL size = number of simultaneous operations
      » risk for contention (all data in one region)
      ➡ introduction of RowLog sharding (Lily 1.1)
      ➡ continuous puts/deletes on HBase table = not very efficient ➙ long-term need to replace this
  18. last words
      » RowLog library can be used independent from Lily (!)
        » part of the Lily source tree
        » Apache license
        » www.lilyproject.org
      » shameless plug: go and check out Lily, HBase+Solr-backed repository for content-centric apps
  19. Thank you! for your attention, for your questions
      » stevenn@outerthought.org
      » @stevenn
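
The key layout listed on slide 13, together with the shard prefix from slide 17, can be sketched roughly as follows. This is an illustration only, not Lily's actual encoding (Lily itself is Java): the field widths, the MD5-based shard choice, the `NUM_SHARDS` value and the function names are all assumptions made for this example. The point it shows is that a big-endian timestamp keeps keys in time order within one shard and subscription, while the leading shard byte spreads writes over several key ranges instead of one region.

```python
import hashlib
import struct

NUM_SHARDS = 4  # assumption: an arbitrary shard count for the example


def shard_id(row_key: bytes, num_shards: int = NUM_SHARDS) -> int:
    """Pick a stable shard from the data-table row key (hypothetical scheme)."""
    return hashlib.md5(row_key).digest()[0] % num_shards


def global_queue_key(subscription_id: str, timestamp_ms: int,
                     row_key: bytes, seq_nr: int,
                     num_shards: int = NUM_SHARDS) -> bytes:
    """Concatenate the key parts in the order given on slide 13."""
    sub = subscription_id.encode("utf-8")
    return b"".join([
        struct.pack(">B", shard_id(row_key, num_shards)),  # shard id prefix
        struct.pack(">H", len(sub)), sub,   # length-prefixed subscription ID
        struct.pack(">Q", timestamp_ms),    # big-endian: byte order == time order
        row_key,                            # (data table) rowkey
        struct.pack(">Q", seq_nr),          # sequence nr, fixed width at the end
    ])


if __name__ == "__main__":
    older = global_queue_key("indexer", 1000, b"row-1", 0)
    newer = global_queue_key("indexer", 2000, b"row-1", 0)
    print(older < newer)  # keys for the same shard and subscription sort by time
```

Because HBase stores rows sorted by key bytes, a processor scanning one shard and subscription prefix would see messages oldest first under this layout.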
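
The reasoning on slide 16 (small pointer messages on the global queue, payloads in the row-local queue, orphaned pointers discarded) can be modelled with a small in-memory sketch. Everything here is hypothetical: the class `RowLogSketch`, its methods and the dict-based "table" stand in for HBase and do not reflect the real RowLog API. In HBase the record data and the row-local message would be written in one atomic row operation, which the single method body below only imitates.

```python
from collections import deque


class RowLogSketch:
    """Toy model: row-local queues hold payloads, the global queue holds pointers."""

    def __init__(self):
        self.rows = {}               # row key -> {"data", "queue", "next_id"}
        self.global_queue = deque()  # (row_key, msg_id) pointers only

    def update_record(self, row_key, data, payload):
        row = self.rows.setdefault(
            row_key, {"data": None, "queue": {}, "next_id": 0})
        msg_id = row["next_id"]
        # Stand-in for one atomic HBase row put: data and message go together.
        row["data"] = data
        row["queue"][msg_id] = payload
        row["next_id"] = msg_id + 1
        # The global message is small: it just points at the row-local message.
        self.global_queue.append((row_key, msg_id))
        return msg_id

    def process(self, listener):
        """Drain the global queue, skipping pointers without a local counterpart."""
        delivered = 0
        while self.global_queue:
            row_key, msg_id = self.global_queue.popleft()
            queue = self.rows.get(row_key, {}).get("queue", {})
            if msg_id not in queue:
                continue  # orphan pointer: the row update never happened
            listener(row_key, queue[msg_id])
            del queue[msg_id]  # message handled, remove it from the row
            delivered += 1
        return delivered


if __name__ == "__main__":
    log = RowLogSketch()
    log.update_record("r1", {"title": "hello"}, "record created")
    log.global_queue.append(("r2", 0))  # simulate a pointer whose row put failed
    log.process(lambda key, payload: print(key, payload))
```

The orphan check is what lets the global queue be treated as a hint rather than the source of truth: a message counts only if the row itself carries it.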