Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

The Lily RowLog library

2,457 views

Published on

Presentation on the Lily RowLog library as presented to the HBase/Hadoop meetup on the eve of Hadoop World 2011

Published in: Technology, Business
  • Be the first to comment

The Lily RowLog library

  1. 1. LilyA SMART DATA PLATFORMMAKING BIG DATA APPS EASY IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  2. 2. the (lily)rowloglibrary IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  3. 3. Lily Architecture (components) IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 3
  4. 4. Lily Architecture ? (components) IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 4
  5. 5. Lily 101 Mo» data repository on top of HBase r Ha e inf do Tue op o? W Me sday orl» records with fields tB alr 1:15P d oo M m» rich data types + schema» versioning» Java + REST api» indexes into Solr (et al)» a bunch more: smart data at scale, made easy» Apache license - www.lilyproject.org IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 5
  6. 6. use of rowlog inside lily» feed Solr index with (Lily|HBase) record updates» maintain secondary indices (i.e. linkindex)» shared concerns: » reliability » consistency » manageability » (scalability) IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 6
  7. 7. UC1: message queue (mq) record update Indexer update Solr index entry possible failure IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 7
  8. 8. UC1: message queue (mq) record Indexer update Solr index entry update ? MQ IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 8
  9. 9. UC1: message queue (mq) Indexer Indexer record Indexer update Solr index entry update MQ IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 9
  10. 10. MQ requirements» async (cope with Solr ‘lag’)» guaranteed execution» no concurrent processing of 2 msg about the same record» no extra tech (HBase should be good enough) » management complexity » benefits from scalability, resilience, etc IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 10
  11. 11. UC2: write-ahead-log (WAL)» secondary actions » pushing messages onto MQ (!) » updating secondary indices (i.e. linkindex)» requirements » sec. actions eventually get executed, in predefined order » further updates to record denied until sec. actions succeeded » synchronous » pre-update: check WAL for outstanding actions + cleanup mechanism IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 11
  12. 12. the rowlog library VM listener subscription listener subscription RowLog RowLog subscription subscription Netty global row-local listener queue storage (HBase) IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 12
  13. 13. global queue» separate HBase table» 1 msg per record update per subscription» key = (shard id +) subscription ID + timestamp + (data table) rowkey + sequence nr» rowlog processor (single instance, managed by ZK)» data always appended/deleted from table end (boo!) IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 13
  14. 14. row-local queue RECORDS table (HBASE) Row-locaL queue DATA ROW 1 ROW 2 ROW 3 ROW 4 IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 14
  15. 15. row-local queue CF1 CF2 data payload execution state 1 2 1 2ROW X payload payload data dataROW YROW Z message ID consumer id state IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 15
  16. 16. why row-local queue?» predates Inbox-concept (Google Megastore)» msgs will appear on rowlog if and only if updates have really happened » rely on atomic row operation guarantee of HBase » msgs on global queue without local counterparts can be discarded» ‘msgs’ on global rowlog can be small » just point to msgs in row-local queue » actual payload sits there» optimized processing of msgs per row (i.e. combine) IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 16
  17. 17. rowlog sharding» MQ and WAL tables tend to be smallish » MQ depends on performance of Solr indexing » WAL size = number of simultaneous operations» risk for contention (all data in one region)➡ introduction of RowLog sharding (Lily 1.1)➡ continuous puts/deletes on HBase table = not very efficient ➙ long-term need to replace this IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 17
  18. 18. last words» RowLog library can be used independent from Lily (!) » part of the Lily source tree » Apache license » www.lilyproject.org» shameless plug: go and check out Lily, HBase+Solr- backed repository for content-centric apps IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 18
  19. 19. Thank you ! for your attention for your questions » stevenn@outerthought.org » @stevenn IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

×