Presenting Lily
Bay Area HBase UG - NYC - 10/11/2010




       IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
Devoxx: Nov. 15-19, Antwerp, Belgium
NoSQL/Cloud track




Outerthought


» software product company

» scalable content applications

» open source product portfolio

» Java, REST, internet




Technology


» Lily : NoSQL-based content
 repository (HBase + SOLR)
» Kauri : REST centric webapp dev framework
» Daisy : techdoc / QDoc / publishing CMS




Needs for Scalable Content
» wire-speed capturing ➡ NoSQL & write-optimized storage
» batch-oriented post-processing ➡ map/reduce
» semantic lifting : extracting knowledge out of noise ➡ Natural Language Processing
» data and inferred data become one ➡ smart content repositories

The Lily Project
» cloud-scale content applications (customers)
» REST-centric content app UI framework, content augmentation (enrichment), ins and outs, alternative indexes, batch processing and process coordination (partners)
» content repository: store + search (us)




Lily essentials


» www.lilyproject.org

» Apache license for maximal flexibility

» (lots of) documentation at
 docs.outerthought.org



Lily content repository

» Scalable store (HBase) and
 search (SOLR)
» flexible content model
» index maintenance
[Diagram labels: content application, repository]

» high-level API

» base foundation



HBase
» a data model where some column families
 keep all versions and others do not,
 which fits our CMS document model very well
» ordered tables with the ability to do range
 scans on them, which allows building
 scalable indexes on top of them
» HDFS, a convenient place to store large blobs

» Apache license and community, a familiar
 environment for us
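
The first point above, column families that keep every version next to families that keep only the latest value, can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `ToyTable`, the family names `versioned` and `latest`, and the field names are hypothetical and are not the HBase client API or Lily's actual schema.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a per-family version limit (hypothetical)."""

    def __init__(self, max_versions):
        # max_versions maps family name -> int, or None to keep every version
        self.max_versions = max_versions
        # (row, family, qualifier) -> list of (timestamp, value), newest first
        self.cells = defaultdict(list)

    def put(self, row, family, qualifier, value, ts):
        versions = self.cells[(row, family, qualifier)]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        limit = self.max_versions[family]
        if limit is not None:
            del versions[limit:]  # drop versions beyond the family's limit

    def get(self, row, family, qualifier, all_versions=False):
        versions = self.cells.get((row, family, qualifier), [])
        if all_versions:
            return [value for _, value in versions]
        return versions[0][1] if versions else None


# Assumed layout: one family keeps full history, the other only the latest value.
table = ToyTable({"versioned": None, "latest": 1})
table.put("doc1", "versioned", "title", "Draft", ts=1)
table.put("doc1", "versioned", "title", "Final", ts=2)
table.put("doc1", "latest", "state", "new", ts=1)
table.put("doc1", "latest", "state", "published", ts=2)

title_history = table.get("doc1", "versioned", "title", all_versions=True)
state_history = table.get("doc1", "latest", "state", all_versions=True)
```

Here `title_history` holds both titles, newest first, while `state_history` holds only the most recent state.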

1. Store, 2. Search...? Ouch.

» CMS = two types of search
 » structured, ‘logic’ search
  » numbers, strings
  » based on logic (SQL, anyone?)

 » information retrieval (or: full-text search)
  » text
  » based on statistics



Search ponderings




» All of that, at scale




Structured Search
» HBase Indexing Library
 » idea from Google App Engine datastore indexes
 » http://code.google.com/appengine/articles/index_building.html

content table
  rowkey   col    col
  A        val3   foo6
  B        val2   foo7

index table A (rows kept in rowkey order)
  rowkey   col
  val2-B
  val3-A
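
The index table above can be sketched in a few lines: each index rowkey is the indexed value followed by the content rowkey, and because rows stay sorted, a lookup becomes a range scan. This is a minimal illustration, not the HBase Indexing Library's API. `ToyIndex` and its methods are hypothetical names, a sorted Python list stands in for the ordered table, and the plain `-` separator is a simplification (a real index would need an encoding that sorts correctly for every value type).

```python
import bisect


class ToyIndex:
    """Sorted list of 'value-rowkey' strings standing in for an ordered index table."""

    def __init__(self):
        self.rows = []

    def add(self, value, content_rowkey):
        # Keep index rowkeys in sorted order, as an ordered table would
        bisect.insort(self.rows, f"{value}-{content_rowkey}")

    def range_scan(self, start, stop):
        # Return index rowkeys in [start, stop)
        lo = bisect.bisect_left(self.rows, start)
        hi = bisect.bisect_left(self.rows, stop)
        return self.rows[lo:hi]

    def lookup(self, value):
        # An equality query is a range scan over the 'value-' prefix
        prefix = f"{value}-"
        return [key[len(prefix):] for key in self.range_scan(prefix, prefix + "\uffff")]


# Content table from the slide: rowkey -> columns (column names are assumed)
content = {
    "A": {"col1": "val3", "col2": "foo6"},
    "B": {"col1": "val2", "col2": "foo7"},
}

index = ToyIndex()
for rowkey, cols in content.items():
    index.add(cols["col1"], rowkey)

matches = index.lookup("val2")               # content rowkeys whose col1 is "val2"
in_range = index.range_scan("val2", "val4")  # all index rows from val2 up to val4
```

After the loop, `index.rows` is `["val2-B", "val3-A"]`, matching the index table on the slide.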


Full-text / IR search


» Lucene?
 » no sharding (for scale)
 » no replication (for availability)
 » batched index updates (not real-time)




Beyond Lucene
» Katta
  » scalable architecture, however only search, no indexing

» Elastic Search
  » very young (sorry)

» hbasene et al.
  » stores inverted index in HBase, does not scale all features

» SOLR
  » widely used, schema, facets, query syntax, cloud branch




[Image slide: ... + ... = ?]
Easy! Or?
➙ Need for reliable queuing




Connecting things
» we needed a reliable bridge between our
 main storage (HBase) and our index/search
 server(s) (SOLR)
 » indexing, reindexing, mass reindexing (M/R)

» we needed a reliable method of updating
 HBase secondary indexes
» all of that eventually has to run distributed

» distribution means coping with failure

Solution

» ... a QUEUE ! (Meh)

» ACMEMessageQueue ? Bzzzzzt.
 We wanted fault-safe HBase persistence for
 the queues.
 Also for ease of administration.
» ➙ WAL & Queue implemented on top of
 HBase tables


WAL & Queue = RowLog Library
» WAL
 » guaranteed execution of synchronous actions
 » call doesn’t return before secondary action finishes
 » e.g. update secondary indexes
 » if all goes well, size = #concurrent ops
 » useful outside of Lily context as well!
» Queue
 » triggering of async actions
 » e.g. (re)index (updated) record with SOLR back-end
 » size depends on speed of back-end process
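
The difference between the two usage styles can be sketched as follows. This is an illustration under stated assumptions, not the RowLog library's API: `ToyRowLog`, `wal_update` and `queue_update` are hypothetical names, and a Python dict stands in for the HBase table that persists the messages.

```python
class ToyRowLog:
    """Pending messages kept in a dict that stands in for an HBase table."""

    def __init__(self):
        self.pending = {}  # sequence number -> payload, kept until processed
        self.next_seq = 0

    def put(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = payload
        return seq

    def done(self, seq):
        self.pending.pop(seq, None)

    def process_pending(self, action):
        # Replay (WAL) or consume (queue) whatever is still pending, in order
        for seq in sorted(self.pending):
            action(self.pending[seq])
            self.done(seq)


def wal_update(wal, record, write_record, update_index):
    """WAL style: log first, finish the secondary action before returning."""
    seq = wal.put(record)
    write_record(record)
    update_index(record)  # if this raises, the entry stays in the WAL for replay
    wal.done(seq)


def queue_update(queue, record):
    """Queue style: only store the message; a consumer handles it later."""
    queue.put(record)


store, secondary_index, solr_docs = {}, {}, []
wal, queue = ToyRowLog(), ToyRowLog()
record = {"id": "A", "field": "val3"}


def write_record(rec):
    store[rec["id"]] = rec


def update_index(rec):
    secondary_index[rec["field"]] = rec["id"]


wal_update(wal, record, write_record, update_index)
queue_update(queue, record)
wal_size_after_call = len(wal.pending)      # 0: the synchronous action already ran
queue_size_before_consumer = len(queue.pending)  # 1: waiting for the back-end

queue.process_pending(solr_docs.append)     # stand-in for the asynchronous indexer
```

The WAL is empty again once the call returns, so its size tracks the number of concurrent operations, while the queue grows or shrinks with the speed of the back-end consumer.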



The Sum
» Lily model (records & fields)

» mapped onto HBase (=storage)

» indexed and searchable through
 SOLR
» using a WAL/Queue mechanism
 implemented in HBase
» runtime based on Kauri

» with client/server comms via Avro
 (and a REST interface with JSON)

Architecture
[Two diagram slides; the images are not part of this transcript.]
Lily roadmap
» development started Sept. 2009
 » development trunk opened Jul. 2010

» end of Oct. 2010: milestone/beta release
 » fully distributable
 » spec-complete
» Onwards:
 » ‘business-level’ 1.0 release (packaging, testing, performance)
 » user/auth management & access control
 » UI framework (Kauri)
 » ins and outs, semantic lifting

Thanks for your hospitality and attention!

» stevenn@outerthought.org
» @stevenn

React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 

Lily for the Bay Area HBase UG - NYC edition

  • 1. Presenting Lily Bay Area HBase UG - NYC - 10/11/2010 IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 2. Devoxx: Nov. 15-19, Antwerp, Belgium NoSQL/Cloud track
  • 3. Outerthought » software product company » scalable content applications » open source product portfolio » Java, REST, internet
  • 4. Technology » Lily: NoSQL-based content repository (HBase + SOLR) » Kauri: REST-centric webapp dev framework » Daisy: techdoc / QDoc / publishing CMS
  • 5. Needs for Scalable Content » wire-speed capturing ➡ NoSQL & write-optimized storage » batch-oriented post-processing ➡ map/reduce » semantic lifting: extracting knowledge out of noise ➡ Natural Language Processing » data and inferred data become one ➡ smart content repositories
  • 6. The Lily Project [diagram; box labels: cloud-scale content applications; REST-centric content app UI framework; batch processing and augmentation (enrichment); alternative indexes; ins and outs; process coordination; content repository: store + search; side labels: customers, partners, us]
  • 7. Lily essentials » www.lilyproject.org » Apache license for maximal flexibility » (lots of) documentation at docs.outerthought.org
  • 8. Lily content repository » scalable store (HBase) and search (SOLR) » flexible content model » index maintenance » high-level API [diagram labels: content application, repository, base foundation]
  • 9. HBase » a data model in which some column families keep all versions and others do not, which fits our CMS document model very well » ordered tables with the ability to do range scans on them, which allows building scalable indexes on top of them » HDFS, a convenient place to store large blobs » Apache license and community, a familiar environment for us
  • 12. 1. Store, 2. Search...? Ouch. » CMS = two types of search » structured, ‘logic’ search: numbers, strings; based on logic (SQL, anyone?) » information retrieval (or: full-text search): text; based on statistics
  • 13. Search ponderings » All of that, at scale
  • 14. Structured Search » HBase Indexing Library » idea from Google App Engine datastore indexes » http://code.google.com/appengine/articles/index_building.html » content table (rowkey, col, col): A, val3, foo6; B, val2, foo7 » index table (rowkey, kept in sorted order): val2-B; val3-A
  • 15. Full-text / IR search » Lucene? » no sharding (for scale) » no replication (for availability) » batched index updates (not real-time)
  • 16. Beyond Lucene » Katta: scalable architecture, however only search, no indexing » Elastic Search: very young (sorry) » hbasene et al.: stores inverted index in HBase, does not scale all features » SOLR: widely used, schema, facets, query syntax, cloud branch
  • 18. ➙ Need for reliable queuing
  • 19. Connecting things » we needed a reliable bridge between our main storage (HBase) and our index/search server(s) (SOLR) » indexing, reindexing, mass reindexing (M/R) » we needed a reliable method of updating HBase secondary indexes » all of that eventually to run distributed » distribution means coping with failure
  • 20. Solution » ... a QUEUE! (Meh) » ACMEMessageQueue? Bzzzzzt. We wanted fault-safe HBase persistence for the queues. Also for ease of administration. » ➙ WAL & Queue implemented on top of HBase tables
  • 21. WAL & Queue = RowLog Library » WAL: guaranteed execution of synchronous actions; call doesn’t return before secondary action finishes; e.g. update secondary indexes; if all goes well, size = #concurrent ops » Queue: triggering of async actions; e.g. (re)index (updated) record with SOLR back-end; size depends on speed of back-end process » useful outside of Lily context as well!
  • 22. The Sum » Lily model (records & fields) » mapped onto HBase (=storage) » indexed and searchable through SOLR » using a WAL/Queue mechanism implemented in HBase » runtime based on Kauri » with client/server comms via Avro (and a REST interface with JSON)
  • 23. Architecture
  • 24. Architecture
  • 25. Lily roadmap » development started Sept. 2009 » development trunk opened Jul. 2010 » end of Oct. 2010: milestone/beta release (fully distributable, spec-complete) » Onwards: ‘business-level’ 1.0 release (packaging, testing, performance); user/auth management & access control; UI framework (Kauri); ins and outs, semantic lifting
  • 26. Thanks for your hospitality and attention! » stevenn@outerthought.org » @stevenn
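
Slide 9 mentions column families that keep all versions next to families that do not. The following is a minimal in-memory sketch of that idea in Python, not HBase or Lily code: the `ColumnFamily` class, its `max_versions` parameter and the sample row and field names are all hypothetical, chosen only to illustrate the behaviour.

```python
class ColumnFamily:
    """Toy column family; max_versions=None keeps every version of a cell."""

    def __init__(self, max_versions=None):
        self.max_versions = max_versions
        # (rowkey, qualifier) -> list of (timestamp, value), newest first
        self._cells = {}

    def put(self, rowkey, qualifier, timestamp, value):
        versions = self._cells.setdefault((rowkey, qualifier), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        if self.max_versions is not None:
            # Drop everything older than the newest max_versions entries.
            del versions[self.max_versions:]

    def get(self, rowkey, qualifier, timestamp=None):
        """Return the newest value, or the newest one at or before `timestamp`."""
        for ts, value in self._cells.get((rowkey, qualifier), []):
            if timestamp is None or ts <= timestamp:
                return value
        return None


# Hypothetical usage: one family for versioned document fields,
# one for fields where only the current value matters.
versioned = ColumnFamily()                 # keeps all versions
non_versioned = ColumnFamily(max_versions=1)

for ts, title in [(1, "Draft"), (2, "Final")]:
    versioned.put("doc1", "title", ts, title)
    non_versioned.put("doc1", "status", ts, title.lower())
```

With this setup the versioned family can still answer "what was the title at timestamp 1", while the non-versioned family only knows the latest status.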
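
The index table on slide 14 relies on the property named on slide 9: rows are ordered by key, so a range scan over keys of the form value-rowkey finds all rows with a given value. Below is a rough Python sketch of that technique using a sorted in-memory list. It is not the HBase Indexing Library API; `OrderedTable`, `index_put`, `find_by_value` and the column name `col1` are hypothetical, and the sketch assumes indexed values and row keys are plain strings that sort below `"\uffff"`.

```python
import bisect


class OrderedTable:
    """Toy stand-in for an ordered table: row keys are kept sorted."""

    def __init__(self):
        self._keys = []   # sorted row keys
        self._rows = {}   # row key -> dict of columns

    def put(self, rowkey, columns=None):
        if rowkey not in self._rows:
            bisect.insort(self._keys, rowkey)
        self._rows[rowkey] = dict(columns or {})

    def get(self, rowkey):
        return self._rows[rowkey]

    def scan(self, start, stop):
        """Return row keys k with start <= k < stop, in sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return self._keys[lo:hi]


SEP = "-"  # separator between indexed value and content row key, as on the slide


def index_put(content, index, rowkey, columns, indexed_col):
    """Write the content row and the matching index row (value-rowkey)."""
    content.put(rowkey, columns)
    index.put(columns[indexed_col] + SEP + rowkey)


def find_by_value(index, value):
    """Range-scan the index for one value; return the content row keys."""
    prefix = value + SEP
    # Every key starting with `prefix` sorts inside [prefix, prefix + "\uffff").
    return [key[len(prefix):] for key in index.scan(prefix, prefix + "\uffff")]


# The two rows shown on slide 14.
content, index = OrderedTable(), OrderedTable()
index_put(content, index, "A", {"col1": "val3", "col2": "foo6"}, "col1")
index_put(content, index, "B", {"col1": "val2", "col2": "foo7"}, "col1")
```

The sketch leaves out what a real index has to handle, such as removing the old index row when an indexed value changes and encoding numbers so that byte order matches numeric order.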
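
The two roles that slide 21 assigns to the RowLog library can be sketched as follows. This is an in-memory illustration only, written under stated assumptions: the real library stores its state in HBase tables, and the class name `RowLogSketch`, its methods, and the recovery behaviour (re-running synchronous actions for entries left in the WAL) are hypothetical and not taken from the Lily code base.

```python
from collections import deque


class RowLogSketch:
    """In-memory illustration of the WAL and queue roles from slide 21."""

    def __init__(self):
        self.wal = {}               # seq -> message with unfinished sync actions
        self.queue = deque()        # messages waiting for asynchronous actions
        self.sync_listeners = []    # e.g. secondary index updaters
        self.async_listeners = []   # e.g. a full-text indexer
        self._seq = 0

    def write(self, message):
        """Log the message, run the synchronous actions, then enqueue it."""
        self._seq += 1
        seq = self._seq
        self.wal[seq] = message              # record intent before acting
        for action in self.sync_listeners:   # caller waits for these to finish
            action(message)
        del self.wal[seq]                    # all went well: entry is done
        self.queue.append(message)           # hand over to async consumers
        return seq

    def recover(self):
        """Re-run synchronous actions for entries left by a failed write.

        Assumes actions are idempotent, since one may run more than once.
        """
        for seq in sorted(self.wal):
            message = self.wal[seq]
            for action in self.sync_listeners:
                action(message)
            del self.wal[seq]
            self.queue.append(message)

    def drain(self, limit=None):
        """Run asynchronous actions for queued messages; return how many ran."""
        done = 0
        while self.queue and (limit is None or done < limit):
            message = self.queue[0]
            for action in self.async_listeners:
                action(message)
            self.queue.popleft()             # remove only after success
            done += 1
        return done
```

In this sketch the WAL only holds entries for writes that are in flight or failed, which matches the slide's "size = #concurrent ops", while the queue grows whenever `drain` is called less often than `write`, which matches "size depends on speed of back-end process".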