Lily for the Bay Area HBase UG - NYC edition

  1. Presenting Lily
     Bay Area HBase UG - NYC - 10/11/2010
     IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  2. Devoxx: Nov. 15-19, Antwerp, Belgium
     » NoSQL/Cloud track
  3. Outerthought
     » software product company
     » scalable content applications
     » open source product portfolio
     » Java, REST, internet
  4. Technology
     » Lily: NoSQL-based content repository (HBase + SOLR)
     » Kauri: REST-centric webapp dev framework
     » Daisy: techdoc / QDoc / publishing CMS
  5. Needs for Scalable Content
     » wire-speed capturing ➡ NoSQL & write-optimized storage
     » batch-oriented post-processing ➡ map/reduce
     » semantic lifting: extracting knowledge out of noise ➡ Natural Language Processing
     » data and inferred data become one ➡ smart content repositories
  6. The Lily Project [diagram]
     » roles: customers, partners, us
     » cloud-scale content applications
     » REST-centric content app UI framework
     » batch processing and content augmentation (enrichment)
     » alternative indexes
     » ins and outs
     » process coordination
     » content repository: store + search
  7. Lily essentials
     » www.lilyproject.org
     » Apache license for maximal flexibility
     » (lots of) documentation at docs.outerthought.org
  8. Lily content repository
     » Scalable store (HBase) and search (SOLR)
     » flexible content model
     » index maintenance
     » high-level API
     » base foundation
     [diagram labels: content, application, repository]
  9. HBase
     » a data model where some column families keep all versions and others do not, which fits our CMS document model very well
     » ordered tables that support range scans, which makes it possible to build scalable indexes on top of them
     » HDFS, a convenient place to store large blobs
     » Apache license and community, a familiar environment for us
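
The first point on slide 9 (some column families keep every version, others only the latest) can be illustrated with a small in-memory model. This is a sketch for illustration only, not HBase or Lily code: the `ColumnFamily` class and its methods are hypothetical names. In HBase itself the number of retained versions is a per-column-family setting.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family (hypothetical, not the HBase API).

    max_versions=None keeps every version of a cell; an integer keeps
    only that many of the newest versions.
    """

    def __init__(self, max_versions=None):
        self.max_versions = max_versions
        # (row, qualifier) -> list of (timestamp, value), newest first
        self.cells = defaultdict(list)

    def put(self, row, qualifier, ts, value):
        versions = self.cells[(row, qualifier)]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        if self.max_versions is not None:
            del versions[self.max_versions:]  # drop the oldest versions

    def get(self, row, qualifier, ts=None):
        """Return the newest value, or the newest value at or before ts."""
        for version_ts, value in self.cells.get((row, qualifier), []):
            if ts is None or version_ts <= ts:
                return value
        return None


# A versioned family for document fields, a latest-only family for the rest.
versioned = ColumnFamily(max_versions=None)
latest_only = ColumnFamily(max_versions=1)
for ts, title in [(1, "Draft"), (2, "Final")]:
    versioned.put("doc1", "title", ts, title)
    latest_only.put("doc1", "title", ts, title)
```

Reading `versioned` at timestamp 1 still returns the old title, while `latest_only` has nothing to return for that timestamp, which is the distinction a CMS with versioned and non-versioned fields relies on.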
  12. 1. Store, 2. Search...? Ouch.
     » CMS = two types of search
     » structured, 'logic' search
        » numbers, strings
        » based on logic (SQL, anyone?)
     » information retrieval (or: full-text search)
        » text
        » based on statistics
  13. Search ponderings
     » All of that, at scale
  14. Structured Search
     » HBase Indexing Library
     » idea from Google App Engine datastore indexes
     » http://code.google.com/appengine/articles/index_building.html

     content table              index table (in rowkey order)
     rowkey  col   col          rowkey  col
     A       val3  foo6         val2-B  B
     B       val2  foo7         val3-A  A
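
The index-table idea on slide 14 can be sketched with a sorted in-memory table standing in for an HBase table. This is not the HBase Indexing Library's API: `SortedTable`, `put_record` and `find_by_col1` are hypothetical names, and the `value-rowkey` key layout simply mirrors the example rows on the slide.

```python
import bisect


class SortedTable:
    """Toy stand-in for an HBase table: rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []
        self._rows = {}

    def put(self, key, row):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def scan(self, start, stop):
        """Yield (key, row) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


content = SortedTable()
index = SortedTable()


def put_record(rowkey, col1, col2):
    """Write the record, then an index row keyed on '<col1 value>-<rowkey>'."""
    content.put(rowkey, {"col1": col1, "col2": col2})
    index.put(f"{col1}-{rowkey}", {"rowkey": rowkey})


def find_by_col1(value):
    """Range-scan the index for every key that starts with '<value>-'."""
    start = f"{value}-"
    stop = f"{value}" + chr(ord("-") + 1)  # smallest string after the prefix
    return [row["rowkey"] for _, row in index.scan(start, stop)]


put_record("A", "val3", "foo6")
put_record("B", "val2", "foo7")
```

Here `find_by_col1("val2")` resolves to record `B` with one bounded scan of the index instead of a full scan of the content table. The sketch ignores updates (a changed value would leave a stale index row) and values that themselves contain the separator, both of which a real index library has to handle.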
  15. Full-text / IR search
     » Lucene?
        » no sharding (for scale)
        » no replication (for availability)
        » batched index updates (not real-time)
  16. Beyond Lucene
     » Katta
        » scalable architecture, however only search, no indexing
     » Elastic Search
        » very young (sorry)
     » hbasene et al.
        » stores inverted index in HBase, does not scale all features
     » SOLR
        » widely used, schema, facets, query syntax, cloud branch
  17. [graphic slide; recoverable text: "+", "=", "Easy", "Or", "?", "!"]
  18. ➙ Need for reliable queuing
  19. Connecting things
     » we needed a reliable bridge between our main storage (HBase) and our index/search server(s) (SOLR)
     » indexing, reindexing, mass reindexing (M/R)
     » we need a reliable method of updating HBase secondary indexes
     » all of that eventually to run distributed
     » distribution means coping with failure
  20. Solution
     » ... a QUEUE! (Meh)
     » ACMEMessageQueue? Bzzzzzt. We wanted fault-safe HBase persistence for the queues. Also for ease of administration.
     » ➙ WAL & Queue implemented on top of HBase tables
  21. WAL & Queue = RowLog Library
     » WAL
        » guaranteed execution of synchronous actions
        » call doesn't return before secondary action finishes
        » e.g. update secondary indexes
        » if all goes well, size = #concurrent ops
     » Queue
        » triggering of async actions
        » e.g. (re)index (updated) record with SOLR back-end
        » size depends on speed of back-end process
     » useful outside of Lily context as well!
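
The two uses of the log described on slide 21 can be sketched as follows. This is an assumption-laden illustration, not the RowLog library's API: `RowLog`, `wal_put` and `replay` are hypothetical names, and a Python dict stands in for the HBase table that would make the pending entries durable.

```python
class RowLog:
    """Toy log of pending messages (hypothetical, not the RowLog API)."""

    def __init__(self):
        self.pending = {}  # msg_id -> payload; stand-in for HBase rows
        self.next_id = 0

    def append(self, payload):
        msg_id = self.next_id
        self.next_id += 1
        self.pending[msg_id] = payload
        return msg_id

    def done(self, msg_id):
        self.pending.pop(msg_id, None)


def wal_put(wal, payload, action):
    """WAL use: log the intent, run the action synchronously, then clear it."""
    msg_id = wal.append(payload)
    action(payload)  # if this raises, the entry stays pending for replay
    wal.done(msg_id)


def replay(log, action):
    """Retry every pending entry, oldest first; return how many completed."""
    completed = 0
    for msg_id in sorted(log.pending):
        try:
            action(log.pending[msg_id])
        except Exception:
            continue  # leave the entry in place for the next attempt
        log.done(msg_id)
        completed += 1
    return completed


secondary_index, solr_docs = [], []
backend = {"up": False}  # simulated availability of the search back-end


def update_secondary_index(payload):
    secondary_index.append(payload)


def index_in_solr(payload):
    if not backend["up"]:
        raise ConnectionError("search back-end unavailable")
    solr_docs.append(payload)


wal, queue = RowLog(), RowLog()
wal_put(wal, "record-1", update_secondary_index)  # synchronous: WAL is empty afterwards
queue.append("record-1")                          # asynchronous: consumed later
first_attempt = replay(queue, index_in_solr)      # back-end down, entry is kept
backend["up"] = True
second_attempt = replay(queue, index_in_solr)     # entry is delivered and removed
```

The WAL stays near empty when actions succeed, since entries exist only while a call is in flight, whereas the queue grows whenever the back-end consumer falls behind, matching the size remarks on the slide.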
  22. The Sum
     » Lily model (records & fields)
     » mapped onto HBase (=storage)
     » indexed and searchable through SOLR
     » using a WAL/Queue mechanism implemented in HBase
     » runtime based on Kauri
     » with client/server comms via Avro (and a REST interface with JSON)
  23. Architecture
  24. Architecture
  25. Lily roadmap
     » development started Sept. 2009
     » development trunk opened Jul. 2010
     » end of Oct. 2010: milestone/beta release
        » fully distributable
        » spec-complete
     » Onwards:
        » 'business-level' 1.0 release (packaging, testing, performance)
        » user/auth management & access control
        » UI framework (Kauri)
        » ins and outs, semantic lifting
  26. Thanks for your hospitality and attention!
     » stevenn@outerthought.org
     » @stevenn
