Presenting Lily
Bay Area HBase UG - NYC - 10/11/2010




       IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
Devoxx: Nov. 15-19, Antwerp, Belgium
NoSQL/Cloud track




Outerthought


» software product company

» scalable content applications

» open source product portfolio

» Java, REST, internet




Technology


» Lily : NoSQL-based content
 repository (HBase + SOLR)
» Kauri : REST centric webapp dev framework
» Daisy : techdoc / QDoc / publishing CMS




Needs for Scalable Content
» wire-speed capturing ➡ NoSQL & write-optimized storage
» batch-oriented post-processing ➡ map/reduce
» semantic lifting: extracting knowledge out of noise ➡ Natural Language Processing
» data and inferred data become one ➡ smart content repositories

[Diagram: "The Lily Project: cloud-scale content applications". Labels: customers; REST-centric content app UI framework; content augmentation (enrichment); batch ins and outs; alternative indexes; processing and process coordination; content repository: store + search; side brackets marked "partners" and "us".]
                               content repository: store + search




Lily essentials


» www.lilyproject.org

» Apache license for maximal flexibility

» (lots of) documentation at
 docs.outerthought.org



Lily content repository

» scalable store (HBase) and
 search (SOLR)
» flexible content model
» index maintenance
[Diagram labels: content application, repository]

» high-level API

» base foundation



HBase
» a data model in which some column
 families keep all versions and others do
 not, which fits our CMS document model
 very well
» ordered tables with range scans, which
 allow us to build scalable indexes on
 top of them
» HDFS, a convenient place to store large blobs

» Apache license and community, a familiar
 environment for us
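
The first point above, column families that either keep all versions or only the latest one, can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `ColumnFamily`, `max_versions`, and the sample rows are hypothetical names chosen for illustration, and none of this is HBase or Lily API.

```python
class ColumnFamily:
    """Toy model of a column family with a per-family version limit.

    max_versions=None keeps every version; max_versions=1 keeps only the newest.
    (Hypothetical class for illustration, not an HBase API.)
    """

    def __init__(self, max_versions=None):
        self.max_versions = max_versions
        self._cells = {}  # (row, qualifier) -> list of (timestamp, value), newest first

    def put(self, row, qualifier, timestamp, value):
        versions = self._cells.setdefault((row, qualifier), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        if self.max_versions is not None:
            # Drop everything beyond the newest max_versions entries.
            del versions[self.max_versions:]

    def get(self, row, qualifier, versions=1):
        """Return up to `versions` values, newest first."""
        return [v for _, v in self._cells.get((row, qualifier), [])[:versions]]


# A versioned family for document content, a non-versioned one for bookkeeping.
versioned = ColumnFamily(max_versions=None)
non_versioned = ColumnFamily(max_versions=1)

for ts, title in [(1, "draft"), (2, "final")]:
    versioned.put("doc1", "title", ts, title)
    non_versioned.put("doc1", "last_title", ts, title)

print(versioned.get("doc1", "title", versions=10))
print(non_versioned.get("doc1", "last_title", versions=10))
```

The versioned family returns both titles, newest first, while the non-versioned family only retains the most recent write.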

1. Store, 2. Search...? Ouch.

» CMS = two types of search
 » structured, ‘logic’ search
  » numbers, strings
  » based on logic          (SQL, anyone?)

 » information retrieval (or: full-text search)
  » text
  » based on statistics



Search ponderings




» All of that, at scale




Structured Search
» HBase Indexing Library
 » idea from Google App Engine datastore indexes
 » http://code.google.com/appengine/articles/index_building.html

    content table                          index table A
    rowkey    col     col      (order)     rowkey    col
    A         val3    foo6        ➡        val2-B
    B         val2    foo7                 val3-A
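
The index-table idea shown on this slide can be sketched in a few lines: the indexed value is concatenated with the content row key to form the index row key, and lookups become range scans over the ordered index table. This is a minimal in-memory sketch, not the HBase Indexing Library itself; `OrderedTable`, `put_record`, and `find_by_a` are hypothetical names, and removing stale index entries on update is deliberately left out.

```python
import bisect


class OrderedTable:
    """Toy stand-in for an ordered table: rows sorted by key, with range scans."""

    def __init__(self):
        self._keys = []   # row keys, kept sorted
        self._rows = {}   # row key -> value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan(self, start, stop):
        """Yield (key, value) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


content = OrderedTable()
index_a = OrderedTable()


def put_record(rowkey, col_a, col_b):
    """Write the record, then an index row keyed by '<value>-<rowkey>'."""
    content.put(rowkey, {"a": col_a, "b": col_b})
    index_a.put(f"{col_a}-{rowkey}", rowkey)


def find_by_a(value):
    """Find content row keys whose first column equals `value` via a range scan."""
    # '.' is the ASCII character right after '-', so [value + '-', value + '.')
    # covers exactly the index keys that start with value + '-'.
    return [rowkey for _, rowkey in index_a.scan(f"{value}-", f"{value}.")]


put_record("A", "val3", "foo6")
put_record("B", "val2", "foo7")
print(find_by_a("val2"))
```

Because the index keys sort by value first, the index table holds `val2-B` before `val3-A`, matching the ordering on the slide.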


Full-text / IR search


» Lucene?
 » no sharding (for scale)
 » no replication (for availability)
 » batched index updates (not real-time)




Beyond Lucene
» Katta
  » scalable architecture, however only search, no indexing

» Elastic Search
  » very young (sorry)

» hbasene et al.
  » stores inverted index in HBase, does not scale all features

» SOLR
  » widely used, schema, facets, query syntax, cloud branch




[Slide graphic: "+", "=", "?" with the caption "Easy! Or?"]
➙ Need for reliable queuing




Connecting things
» we needed a reliable bridge between our
 main storage (HBase) and our index/search
 server(s) (SOLR)
 » indexing, reindexing, mass reindexing (M/R)

» we need a reliable method of updating
 HBase secondary indexes
» all of that eventually to run distributed

» distribution means coping with failure

Solution

» ... a QUEUE ! (Meh)

» ACMEMessageQueue ? Bzzzzzt.
 We wanted fault-safe HBase persistence for
 the queues.
 Also for ease of administration.
» ➙ WAL  & Queue implemented on top of
 HBase tables


WAL & Queue = RowLog Library
» WAL
 » guaranteed execution of synchronous actions
 » call doesn’t return before secondary action finishes
 » e.g. update secondary indexes
 » if all goes well, size = #concurrent ops
 » useful outside of Lily context as well!
» Queue
 » triggering of async actions
 » e.g. (re)index (updated) record with SOLR back-end
 » size depends on speed of back-end process
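
The difference between the two uses of a row log can be sketched as follows. This is a toy in-memory model under stated assumptions: the `RowLog` class, its methods, and `save_record` are hypothetical names for illustration and not the RowLog library's API, and the real implementation persists its messages in HBase tables rather than in process memory.

```python
from collections import OrderedDict


class RowLog:
    """Toy row log: a message stays pending until its listener has succeeded."""

    def __init__(self, listener):
        self._listener = listener
        self._pending = OrderedDict()  # sequence number -> message
        self._next_seq = 0

    def put(self, message):
        seq = self._next_seq
        self._next_seq += 1
        self._pending[seq] = message
        return seq

    def process(self, seq):
        """Run the listener; the message is removed only if it does not raise."""
        self._listener(self._pending[seq])
        del self._pending[seq]

    def process_all(self):
        for seq in list(self._pending):
            self.process(seq)

    def size(self):
        return len(self._pending)


secondary_index = {}  # stands in for a secondary index table
search_docs = {}      # stands in for the search back-end

wal = RowLog(lambda rec: secondary_index.__setitem__(rec["title"], rec["id"]))
queue = RowLog(lambda rec: search_docs.__setitem__(rec["id"], rec["title"]))


def save_record(record):
    # WAL use: log the intent, then run the secondary action before returning.
    seq = wal.put(record)
    wal.process(seq)
    # Queue use: only log the message; a background worker handles it later.
    queue.put(record)


save_record({"id": "A", "title": "hello"})
print(wal.size(), queue.size())  # WAL is drained, queue still holds one message
queue.process_all()              # what an asynchronous worker would do
print(search_docs)
```

In this sketch the WAL empties as soon as the call returns, while the queue grows until the back-end worker catches up, which mirrors the size remarks on the slide.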



The Sum
» Lily model (records & fields)

» mapped onto HBase (=storage)

» indexed and searchable through
 SOLR
» using a WAL/Queue mechanism
 implemented in HBase
» runtime based on Kauri

» with client/server comms via Avro
 (and a REST interface with JSON)

Architecture
Lily roadmap
» development started Sept. 2009
 » development trunk opened Jul. 2010

» end of Oct. 2010: milestone/beta release
 » fully distributable
 » spec-complete
» Onwards:
 » ‘business-level’ 1.0 release (packaging, testing, performance)
 » user/auth management & access control
 » UI framework (Kauri)
 » ins and outs, semantic lifting

Thanks for your hospitality and attention!

» stevenn@outerthought.org
» @stevenn

Lily for the Bay Area HBase UG - NYC edition

  • 1. Presenting Lily Bay Area HBase UG - NYC - 10/11/2010 IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 2. Devoxx: Nov. 15-19, Antwerp, Belgium » NoSQL/Cloud track
  • 3. Outerthought » software product company » scalable content applications » open source product portfolio » Java, REST, internet
  • 4. Technology » Lily : NoSQL-based content repository (HBase + SOLR) » Kauri : REST centric webapp dev framework » Daisy : techdoc / QDoc / publishing CMS
  • 5. Needs for Scalable Content » wire-speed capturing ➡ NoSQL & write-optimized storage » batch-oriented post-processing ➡ map/reduce » semantic lifting : extracting knowledge out of noise ➡ Natural Language Processing » data and inferred data become one ➡ smart content repositories
  • 6. The Lily Project: cloud-scale content applications » customers: REST-centric content app UI framework » partners: batch processing and content augmentation (enrichment), ins and outs, alternative indexes, process coordination » us: content repository: store + search
  • 7. Lily essentials » www.lilyproject.org » Apache license for maximal flexibility » (lots of) documentation at docs.outerthought.org
  • 8. Lily content repository » Scalable store (HBase) and search (SOLR) » flexible content model » index maintenance » high-level API » base foundation (diagram labels: content, application, repository)
  • 9. HBase » a data model where some column families keep all versions and others do not, which fits our CMS document model very well » ordered tables with the ability to do range scans on them, which allows building scalable indexes on top of them » HDFS, a convenient place to store large blobs » Apache license and community, a familiar environment for us
  • 10.
  • 11.
  • 12. 1. Store, 2. Search...? Ouch. » CMS = two types of search » structured, ‘logic’ search » numbers, strings » based on logic (SQL, anyone?) » information retrieval (or: full-text search) » text » based on statistics
  • 13. Search ponderings » All of that, at scale
  • 14. Structured Search » HBase Indexing Library » idea from Google App Engine datastore indexes » http://code.google.com/appengine/articles/index_building.html » content table (rowkey, col, col): A, val3, foo6; B, val2, foo7 » index table (rowkey, in col order): val2-B; val3-A
  • 15. Full-text / IR search » Lucene? » no sharding (for scale) » no replication (for availability) » batched index updates (not real-time)
  • 16. Beyond Lucene » Katta » scalable architecture, however only search, no indexing » Elastic Search » very young (sorry) » hbasene et al. » stores inverted index in HBase, does not scale all features » SOLR » widely used, schema, facets, query syntax, cloud branch
  • 17. ? + = Easy! Or?
  • 18. ➙ Need for reliable queuing
  • 19. Connecting things » we needed a reliable bridge between our main storage (HBase) and our index/search server(s) (SOLR) » indexing, reindexing, mass reindexing (M/R) » we need a reliable method of updating HBase secondary indexes » all of that eventually to run distributed » distribution means coping with failure
  • 20. Solution » ... a QUEUE ! (Meh) » ACMEMessageQueue ? Bzzzzzt. We wanted fault-safe HBase persistence for the queues. Also for ease of administration. » ➙ WAL & Queue implemented on top of HBase tables
  • 21. WAL & Queue = RowLog Library » WAL » guaranteed execution of synchronous actions » call doesn’t return before secondary action finishes » e.g. update secondary indexes » if all goes well, size = #concurrent ops » Queue » triggering of async actions » e.g. (re)index (updated) record with SOLR back-end » size depends on speed of back-end process » useful outside of Lily context as well!
  • 22. The Sum » Lily model (records & fields) » mapped onto HBase (=storage) » indexed and searchable through SOLR » using a WAL/Queue mechanism implemented in HBase » runtime based on Kauri » with client/server comms via Avro (and a REST interface with JSON)
  • 23. Architecture
  • 24. Architecture
  • 25. Lily roadmap » development started Sept. 2009 » development trunk opened Jul. 2010 » end of Oct. 2010: milestone/beta release » fully distributable » spec-complete » Onwards: » ‘business-level’ 1.0 release (packaging, testing, performance) » user/auth management & access control » UI framework (Kauri) » ins and outs, semantic lifting
  • 26. Thanks for your hospitality and attention ! » stevenn@outerthought.org » @stevenn
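
The index-table idea on slide 14 can be sketched in a few lines. This is only an in-memory illustration, not the HBase Indexing Library or the HBase client API: `OrderedTable`, `put_record` and `find_by_col1` are hypothetical names, and a sorted Python list stands in for a table that is ordered by rowkey and supports range scans. The index rowkey is assumed to be the indexed value joined to the content rowkey with `-`, as in the slide's `val2-B` and `val3-A`.

```python
from bisect import bisect_left, insort


class OrderedTable:
    """In-memory stand-in for a table whose rows are kept sorted by rowkey."""

    def __init__(self):
        self._keys = []   # rowkeys, always kept sorted
        self._rows = {}   # rowkey -> dict of columns

    def put(self, rowkey, columns=None):
        if rowkey not in self._rows:
            insort(self._keys, rowkey)
        self._rows[rowkey] = dict(columns or {})

    def scan(self, start, stop):
        """Yield (rowkey, columns) for start <= rowkey < stop, in key order."""
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1


content = OrderedTable()
index = OrderedTable()


def put_record(rowkey, col1, col2):
    # Write the record, then an index row whose key starts with the indexed value.
    content.put(rowkey, {"col1": col1, "col2": col2})
    index.put(f"{col1}-{rowkey}")


def find_by_col1(value):
    # A lookup is a range scan over every index key that starts with "<value>-".
    prefix = f"{value}-"
    stop = prefix[:-1] + chr(ord("-") + 1)  # first key after the prefix range
    return [key[len(prefix):] for key, _ in index.scan(prefix, stop)]


put_record("A", "val3", "foo6")
put_record("B", "val2", "foo7")

index_keys = [key for key, _ in index.scan("", "\uffff")]
print(index_keys)
print(find_by_col1("val3"))
```

The sketch leaves out what a real index needs beyond this: removing the stale index row when an indexed value changes, and an encoding that stays unambiguous when values themselves contain the separator.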
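
The difference between the WAL and the Queue on slide 21 can also be illustrated with a toy model. This is not the RowLog Library's API: `RowLog`, `wal_execute` and `drain` are hypothetical names, and a Python dict stands in for the HBase tables that hold the messages. The assumption is that an entry is written before its action runs and removed only after the action succeeds.

```python
class RowLog:
    """Toy message log: an entry stays until its action has succeeded."""

    def __init__(self):
        self._entries = {}   # message id -> payload, in insertion order
        self._next_id = 0

    def put(self, payload):
        msg_id = self._next_id
        self._next_id += 1
        self._entries[msg_id] = payload
        return msg_id

    def done(self, msg_id):
        self._entries.pop(msg_id, None)

    def pending(self):
        return list(self._entries.items())

    def __len__(self):
        return len(self._entries)


def wal_execute(wal, payload, action):
    """WAL style: log first, run the action before returning, then clear."""
    msg_id = wal.put(payload)
    action(payload)   # if this raises, the entry remains for a later retry
    wal.done(msg_id)


def drain(queue, action):
    """Queue style: a separate consumer processes entries some time later."""
    for msg_id, payload in queue.pending():
        action(payload)
        queue.done(msg_id)


wal, queue = RowLog(), RowLog()
secondary_index, search_index = [], []


def update_record(record_id):
    wal_execute(wal, record_id, secondary_index.append)  # synchronous action
    queue.put(record_id)                                  # asynchronous action


update_record("rec-1")
update_record("rec-2")
sizes_before = (len(wal), len(queue))   # WAL is empty again, queue has a backlog

drain(queue, search_index.append)       # stands in for the SOLR indexing back-end
sizes_after = (len(wal), len(queue))
```

This matches the sizing remarks on the slide: the WAL only holds entries for operations that are in flight or have failed, while the queue grows whenever the back-end consumer is slower than the writers.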