The Lily RowLog library

Presentation on the Lily RowLog library, given at the HBase/Hadoop meetup on the eve of Hadoop World 2011

The Lily RowLog library: Presentation Transcript

  • Lily: A SMART DATA PLATFORM, MAKING BIG DATA APPS EASY
    IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • the (lily) rowlog library
  • Lily Architecture (components)
  • Lily Architecture ? (components)
  • Lily 101
    » data repository on top of HBase
    » records with fields
    » rich data types + schema
    » versioning
    » Java + REST API
    » indexes into Solr (et al)
    » a bunch more: smart data at scale, made easy
    » Apache license - www.lilyproject.org
    (More info? Hadoop World, Tuesday 1:15PM, Met Ballroom)
  • use of rowlog inside lily
    » feed Solr index with (Lily|HBase) record updates
    » maintain secondary indices (e.g. linkindex)
    » shared concerns:
      » reliability
      » consistency
      » manageability
      » (scalability)
  • UC1: message queue (mq)
    [diagram: record update ➙ Indexer ➙ update Solr index entry; possible failure]
  • UC1: message queue (mq)
    [diagram: record update ➙ MQ (?) ➙ Indexer ➙ update Solr index entry]
  • UC1: message queue (mq)
    [diagram: record update ➙ MQ ➙ several Indexers ➙ update Solr index entry]
  • MQ requirements
    » async (cope with Solr ‘lag’)
    » guaranteed execution
    » no concurrent processing of 2 msgs about the same record
    » no extra tech (HBase should be good enough)
      » management complexity
      » benefits from scalability, resilience, etc.
  • UC2: write-ahead-log (WAL)
    » secondary actions
      » pushing messages onto MQ (!)
      » updating secondary indices (e.g. linkindex)
    » requirements
      » sec. actions eventually get executed, in predefined order
      » further updates to record denied until sec. actions succeeded
      » synchronous
      » pre-update: check WAL for outstanding actions + cleanup mechanism
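The WAL requirements above can be sketched in a few lines of Python. This is an in-memory toy, not the Lily API: the class name `RowWal`, its methods, and the action names in the usage note are hypothetical, and a plain dict stands in for the HBase row that would hold the outstanding actions.

```python
from collections import defaultdict


class RowWal:
    """Toy per-record write-ahead log (hypothetical names, not the Lily API)."""

    def __init__(self, actions):
        # Ordered list of (name, callable); the order is the predefined execution order.
        self.actions = actions
        # record id -> names of secondary actions still outstanding, in order.
        self.pending = defaultdict(list)

    def replay(self, record_id):
        """Run outstanding actions in order; stop at the first failure."""
        by_name = dict(self.actions)
        queue = self.pending[record_id]
        while queue:
            try:
                by_name[queue[0]](record_id)
            except Exception:
                return False  # leave the failed action and its successors queued
            queue.pop(0)
        return True

    def update(self, record_id, apply_update):
        """Synchronous update: clean up leftovers first, deny if that fails."""
        if not self.replay(record_id):  # pre-update check + cleanup
            raise RuntimeError("outstanding secondary actions for " + record_id)
        apply_update()
        self.pending[record_id] = [name for name, _ in self.actions]
        self.replay(record_id)
```

A caller would register its secondary actions once, for example `RowWal([("linkindex", update_linkindex), ("mq", push_to_mq)])`, and route every record update through `update`. A failed action leaves the record blocked until a later call manages to replay it, which is the "further updates denied" rule from the slide.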
  • the rowlog library
    [diagram labels: VM, listener, subscription, RowLog, Netty, global queue, row-local queue, storage (HBase)]
  • global queue
    » separate HBase table
    » 1 msg per record update per subscription
    » key = (shard id +) subscription ID + timestamp + (data table) rowkey + sequence nr
    » rowlog processor (single instance, managed by ZK)
    » data always appended/deleted from table end (boo!)
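The key layout listed above can be illustrated with a small Python sketch. The slide gives only the order of the fields; the byte encoding used here (UTF-8 subscription ID, 8-byte big-endian timestamp and sequence number) and the function name `global_queue_key` are assumptions made for illustration, not Lily's actual format.

```python
import struct


def global_queue_key(subscription_id: str, timestamp_ms: int, row_key: bytes,
                     seq_nr: int, shard_id: bytes = b"") -> bytes:
    """Concatenate the key fields in the order given on the slide.

    Assumed encoding: big-endian integers, so byte-wise (lexicographic)
    ordering matches numeric ordering for non-negative values.
    This sketch does not guard against one subscription ID being a
    prefix of another.
    """
    return (
        shard_id
        + subscription_id.encode("utf-8")
        + struct.pack(">q", timestamp_ms)
        + row_key
        + struct.pack(">q", seq_nr)
    )
```

With this encoding, all keys of one subscription sort by timestamp, so a scan over the subscription prefix returns the oldest messages first. It also shows why new entries always land at the end of the table, the hot spot the slide complains about.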
  • row-local queue
    [diagram: RECORDS table (HBase); ROW 1 to ROW 4 each hold DATA plus a row-local queue]
  • row-local queue
    [diagram labels: CF1, CF2; data, payload, execution state; ROW X, ROW Y, ROW Z; message ID, consumer id, state]
  • why row-local queue?
    » predates Inbox-concept (Google Megastore)
    » msgs will appear on rowlog if and only if updates have really happened
      » rely on atomic row operation guarantee of HBase
      » msgs on global queue without local counterparts can be discarded
    » ‘msgs’ on global rowlog can be small
      » just point to msgs in row-local queue
      » actual payload sits there
    » optimized processing of msgs per row (e.g. combine)
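The interplay between the two queues can be sketched as follows. Everything here is a hypothetical in-memory model: `Row`, `put_with_message` and `process_global_queue` are illustrative names, a Python dict stands in for the records table, and a single function call stands in for HBase's atomic row operation.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    data: dict = field(default_factory=dict)     # record fields
    payload: dict = field(default_factory=dict)  # message id -> payload
    state: dict = field(default_factory=dict)    # (message id, consumer id) -> state


def put_with_message(table, row_key, fields, msg_id, payload, consumers):
    """Apply a record update and store its message in the same row."""
    row = table.setdefault(row_key, Row())
    row.data.update(fields)
    row.payload[msg_id] = payload
    for consumer in consumers:
        row.state[(msg_id, consumer)] = "pending"


def process_global_queue(table, global_queue, consumer, handler):
    """Follow each (row key, message id) pointer; return the number of orphans dropped."""
    discarded = 0
    for row_key, msg_id in global_queue:
        row = table.get(row_key)
        if row is None or msg_id not in row.payload:
            discarded += 1  # no local counterpart: the update never happened
            continue
        handler(row_key, row.payload[msg_id])
        row.state[(msg_id, consumer)] = "done"
    global_queue.clear()
    return discarded
```

The global queue only carries small pointers, while the payload and the per-consumer execution state live next to the data they describe. A pointer whose row-local message is missing is simply dropped.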
  • rowlog sharding
    » MQ and WAL tables tend to be smallish
      » MQ depends on performance of Solr indexing
      » WAL size = number of simultaneous operations
    » risk for contention (all data in one region)
    ➡ introduction of RowLog sharding (Lily 1.1)
    ➡ continuous puts/deletes on HBase table = not very efficient ➙ long-term need to replace this
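One common way to spread a small, hot table over several regions is to prefix each key with a shard byte. The slides do not say how Lily picks the shard id, so the hash-based choice below (CRC32 of the data-table row key) and the names `shard_id` and `sharded_key` are assumptions for illustration only.

```python
import zlib


def shard_id(row_key: bytes, shard_count: int) -> bytes:
    """Map a data-table row key to a one-byte shard prefix (assumed scheme)."""
    if not 1 <= shard_count <= 256:
        raise ValueError("shard_count must fit in one byte")
    return bytes([zlib.crc32(row_key) % shard_count])


def sharded_key(row_key: bytes, unsharded_key: bytes, shard_count: int) -> bytes:
    """Prepend the shard prefix so writes spread over several key ranges."""
    return shard_id(row_key, shard_count) + unsharded_key
```

Hashing the row key, rather than picking a shard at random, keeps all messages about one record in the same shard, so their relative order within that shard is preserved.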
  • last words
    » RowLog library can be used independently of Lily (!)
      » part of the Lily source tree
      » Apache license
      » www.lilyproject.org
    » shameless plug: go and check out Lily, HBase+Solr-backed repository for content-centric apps
  • Thank you!
    for your attention, for your questions
    » stevenn@outerthought.org
    » @stevenn