Lily
Smart data at scale



    IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
big data,
big problems

MOORE vs data
(chart: data vs. Moore's law)

» coping with volume + need for timeliness = parallel processing
» data becomes business-critical = resilience through distributed architectures
» Hadoop, MapReduce, HBase: the future data platform

the CHALLENGES


» process ALL data
» process data in REAL-TIME
» derive INSIGHTS
» provide INSTANT FEEDBACK




current thinking

data STORE --(ETL)--> data warehouse --> analytics

batched, off-line, overnight

1. store and manage all YOUR data
(diagram: DATA)

2. store user behaviour, nearby
(diagram: DATA, with USER behaviour stored alongside)

3. analyze usage patterns
(diagram: DATA and USER behaviour feed data processing)

4. add domain knowledge
(diagram: DATA and USER behaviour feed data processing, which draws on domain knowledge: patterns, rules, keywords, lists, ...)

5. process, in real-time
(diagram: data processing turns DATA, USER behaviour and domain knowledge (patterns, rules, keywords, lists, ...) into recommendations, semantic augmentation and analytics)

6. augment data
(diagram: DATA, USER behaviour, data processing (recommendations, semantic augmentation, analytics) and domain knowledge (patterns, rules, keywords, lists, ...))

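The steps above end with augmenting data using domain knowledge such as keyword lists. As a rough illustration only, here is a minimal Java sketch of keyword-list based augmentation. It assumes domain knowledge is a map from a category (for instance "brands" or "locations") to single-word keywords, and that augmenting a record means attaching the categories whose keywords occur in its text; `KeywordAugmenter` and `augment` are hypothetical names, not part of Lily.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical keyword-list augmenter; an illustration, not a Lily API.
final class KeywordAugmenter {
    // Domain knowledge: category name -> single-word keywords for that category.
    private final Map<String, List<String>> keywordLists;

    KeywordAugmenter(Map<String, List<String>> keywordLists) {
        this.keywordLists = keywordLists;
    }

    /** Returns, per category, the keywords found in the text (case-insensitive, whole words). */
    Map<String, Set<String>> augment(String text) {
        // Split on runs of characters that are neither letters nor digits.
        Set<String> tokens = new TreeSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        // Categories are kept sorted so the result is deterministic.
        Map<String, Set<String>> found = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : keywordLists.entrySet()) {
            for (String keyword : entry.getValue()) {
                if (tokens.contains(keyword.toLowerCase(Locale.ROOT))) {
                    found.computeIfAbsent(entry.getKey(), k -> new TreeSet<>()).add(keyword);
                }
            }
        }
        return found;
    }
}
```

For example, with `locations = [Ghent, London]` and `brands = [Acme]`, the text "Acme opens a store in Ghent." yields `{brands=[Acme], locations=[Ghent]}`. Multi-word names (e.g. "New York") would need phrase matching, which this sketch deliberately leaves out.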
data insights
(diagram: SMARTER DATA and relations, produced by data processing (recommendations, semantic augmentation, analytics) using domain knowledge: patterns, rules, keywords, lists, ...)

data insights
(diagram: SMARTER DATA and relations, produced by data processing (recommendations, semantic augmentation, analytics) using domain knowledge: patterns, rules, keywords, lists, ...)

SMART DATA, at SCALE
... and in real time

stories

HYPER-PERSONAL recommendations
(diagram: NEWS; TOGETHERNESS, interestingness; organisations, names, locations, brands)
news aggregator
scale

up-selling
CROSS-SELLING
(diagram: product CATALOG; recommendedness, relatedness; product families, related activities, social graph)
e-retail
real-time

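Relatedness between products can be approximated in many ways. One simple option, used here purely as an assumption, is to count how often two products appear in the same basket. The sketch below (a hypothetical `CoOccurrence` class) keeps those pair counts and returns the most frequently co-occurring products, a basic form of cross-selling suggestion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical co-occurrence based relatedness; one possible technique, not Lily's.
final class CoOccurrence {
    // counts.get(a).get(b) = number of baskets that contain both a and b
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    void addBasket(Set<String> products) {
        List<String> items = new ArrayList<>(new TreeSet<>(products));
        for (int i = 0; i < items.size(); i++) {
            for (int j = 0; j < items.size(); j++) {
                if (i != j) {
                    counts.computeIfAbsent(items.get(i), k -> new HashMap<>())
                          .merge(items.get(j), 1, Integer::sum);
                }
            }
        }
    }

    /** Products most often bought together with the given one; ties broken alphabetically. */
    List<String> related(String product, int limit) {
        Map<String, Integer> row = counts.getOrDefault(product, Map.of());
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(row.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

With baskets {tent, stove, lamp}, {tent, stove} and {tent, mat}, the two products most related to "tent" are "stove" (2 baskets) and "lamp" (1 basket, ahead of "mat" alphabetically).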
competitive
innovation
(diagram: patents; (dis)SIMILARITY; companies, people, materials, processes)
IP research
insights

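(Dis)similarity between patents could be measured in many ways. As one possible choice, not necessarily the one behind this story, the sketch computes the Jaccard similarity between the sets of entities (companies, people, materials, processes) extracted from two documents; `Similarity` is a hypothetical helper.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical similarity helper; Jaccard is one choice among many.
final class Similarity {
    /** Jaccard similarity: size of the intersection divided by size of the union. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // convention chosen here: two empty sets count as identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    /** Dissimilarity as the complement of the Jaccard similarity. */
    static double dissimilarity(Set<String> a, Set<String> b) {
        return 1.0 - jaccard(a, b);
    }
}
```

Two patents that share 2 of their 4 distinct entities score a similarity (and a dissimilarity) of 0.5.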
Outerthought
» software product company
» scalable content applications
» open source product portfolio
» Java, REST, internet

“The world is moving from content as a cost to data as an opportunity.”

Lily 1.0 (CR)

data STORE + data warehouse + analytics
(the three combined, in real time: Lily 2.0)

Lily (now)

» Large-scale content storage, indexing and search
» Current pilots: e-retail, mobile media, ISP, e-gov, IP research
» up to now: 4 man-years of investment (since Sept. 2009)

roadmap

» now: Lily 0.3
» April 2011: Lily 1.0
» Q3 2011
  » real-time statistics + analytics
» Q2 2012: Lily 2.0
  » real-time data processing engine
  » Data Insights
» along the road: Lily SaaS edition

open source




» www.lilyproject.org
» docs.outerthought.org/lily-docs-current/




Lily Core Concepts
» storage
 » HBase
 » repository model
 » versioning, varianting, mixins
» indexing
 » mapping
» search
 » SOLR


falling in love with HBase: phase 1

» automatic scaling to large data sets
» fault-tolerance
» flexible datamodel with sparse data
» commodity hardware
» efficient random access
» community-based open source
» Java if possible


falling in love with HBase: phase 2



» need for consistency
» atomic single-row updates
» M/R for index regeneration




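HBase applies all changes to a single row atomically, which is what makes consistent record updates possible on top of it. The following is an in-memory analogy only, not HBase code: it relies on `ConcurrentHashMap.compute`, which runs its update function atomically per key, to apply multi-column row updates and a conditional check-and-put. HBase's client API offers its own check-and-put style operation; `RowStore` and its methods here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// In-memory analogy of single-row atomicity; hypothetical, not HBase code.
final class RowStore {
    // row key -> immutable snapshot of the row's columns
    private final ConcurrentHashMap<String, Map<String, String>> rows = new ConcurrentHashMap<>();

    /** Applies all column updates to one row atomically; readers see all of them or none. */
    void putAll(String rowKey, Map<String, String> columns) {
        rows.compute(rowKey, (key, old) -> {
            Map<String, String> next = old == null ? new HashMap<>() : new HashMap<>(old);
            next.putAll(columns);
            return Map.copyOf(next); // publish a new immutable snapshot (values must be non-null)
        });
    }

    /** Updates one column only if it currently holds the expected value (null = absent). */
    boolean checkAndPut(String rowKey, String column, String expected, String newValue) {
        boolean[] applied = {false};
        rows.compute(rowKey, (key, old) -> {
            Map<String, String> current = old == null ? Map.of() : old;
            if (!Objects.equals(current.get(column), expected)) {
                return old; // condition failed: leave the row untouched
            }
            Map<String, String> next = new HashMap<>(current);
            next.put(column, newValue);
            applied[0] = true;
            return Map.copyOf(next);
        });
        return applied[0];
    }

    /** Returns the current snapshot of a row (empty if the row does not exist). */
    Map<String, String> get(String rowKey) {
        return rows.getOrDefault(rowKey, Map.of());
    }
}
```

Publishing immutable snapshots means a concurrent reader never observes a half-applied update, which mirrors the guarantee the slide relies on.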
falling in love with HBase: phase 3


 HBase
» datamodel with column families and cell versioning
» ordered tables with range scans
» HDFS for blob storage
» Apache



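HBase keeps rows sorted by row key, so a scan over a key range (or a key prefix) reads a contiguous slice of the table. The sketch below mimics that with a `java.util.TreeMap`; it is an analogy that assumes `String` keys, whereas HBase compares raw bytes. `OrderedTable`, `scan` and `prefixScan` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sorted in-memory table as an analogy for HBase row-key ordering; hypothetical.
final class OrderedTable {
    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    /** Row keys with startRow <= key < stopRow, in key order (startRow must not exceed stopRow). */
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    /** All row keys starting with the prefix (assumes keys never contain '\uffff'). */
    List<String> prefixScan(String prefix) {
        return new ArrayList<>(rows.subMap(prefix, true, prefix + Character.MAX_VALUE, false).keySet());
    }
}
```

Choosing row keys such as `user#001` so that related rows sort next to each other is what makes such range and prefix scans cheap.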
Lily Repository Model




Lily Datatypes




Mixins




Sample Lily Schema (excerpt)

{
  namespaces: {
    /* Declaration of namespace prefixes. */
    "org.lilyproject.bookssample": "b",
    "org.lilyproject.vtag": "vtag"
  },

  fieldTypes: [
    {
      name: "b$title",
      valueType: { primitive: "STRING" },
      scope: "versioned"
    },
    {
      name: "b$pages",
      valueType: { primitive: "INTEGER" },
      scope: "versioned"
    },
    {
      name: "b$language",
      valueType: { primitive: "STRING" },
      scope: "versioned"
    },
    {
      name: "b$authors",
      valueType: { primitive: "LINK", multiValue: true },
      scope: "versioned"
    },
    {
      name: "b$name",
      valueType: { primitive: "STRING" },
      scope: "versioned"
    },
    {
      name: "b$bio",
      valueType: { primitive: "STRING" },
      scope: "versioned"
    },
    {
      name: "vtag$last",
      valueType: { primitive: "LONG" },
      scope: "non_versioned"
    }
  ],

  recordTypes: [
    {
      name: "b$Book",
      fields: [
        { name: "b$title", mandatory: true },
        { name: "b$pages", mandatory: false },
        { name: "b$language", mandatory: false },
        { name: "b$authors", mandatory: false },
        { name: "vtag$last", mandatory: false }
      ]
    },
    ...

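A record type such as `b$Book` lists its fields and marks which ones are mandatory. The hypothetical Java check below validates a record, modelled as a plain map from field name to value, against that list. It is only a sketch: Lily's real Java API is not shown here, and reporting fields that are not part of the record type is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record-type check; not Lily's Java API.
final class RecordTypeCheck {
    /** Returns the problems found; an empty list means the record conforms. */
    static List<String> validate(Map<String, Boolean> fieldMandatory, Map<String, Object> record) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Boolean> field : fieldMandatory.entrySet()) {
            if (field.getValue() && !record.containsKey(field.getKey())) {
                problems.add("missing mandatory field " + field.getKey());
            }
        }
        // Assumption of this sketch: fields outside the record type are reported.
        for (String name : record.keySet()) {
            if (!fieldMandatory.containsKey(name)) {
                problems.add("field not in record type: " + name);
            }
        }
        return problems;
    }

    /** The b$Book record type from the schema excerpt: field name -> mandatory flag. */
    static Map<String, Boolean> bookType() {
        Map<String, Boolean> type = new LinkedHashMap<>();
        type.put("b$title", true);
        type.put("b$pages", false);
        type.put("b$language", false);
        type.put("b$authors", false);
        type.put("vtag$last", false);
        return type;
    }
}
```

A book with only a title and a page count passes; one without `b$title` is rejected because that field is marked `mandatory: true`.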
Lily Versioning




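The schema excerpt gives each field type a scope: `versioned` or `non_versioned`. A minimal sketch of what that distinction could mean, assuming a new version is created only when a versioned field actually changes and that non-versioned fields are simply overwritten. `VersionedRecord` is a hypothetical in-memory model, not Lily's storage layout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of field scopes; not Lily's storage layout.
final class VersionedRecord {
    private final Set<String> versionedFields;
    private final Map<String, Object> nonVersioned = new HashMap<>();
    // versions.get(i) holds all versioned field values as of version i + 1
    private final List<Map<String, Object>> versions = new ArrayList<>();

    VersionedRecord(Set<String> versionedFields) {
        this.versionedFields = versionedFields;
    }

    /** Applies an update (non-null values) and returns the current version number (0 = none yet). */
    long update(Map<String, Object> fields) {
        Map<String, Object> next = versions.isEmpty()
                ? new HashMap<>()
                : new HashMap<>(versions.get(versions.size() - 1));
        boolean versionedChanged = false;
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (versionedFields.contains(e.getKey())) {
                Object previous = next.put(e.getKey(), e.getValue());
                if (!e.getValue().equals(previous)) {
                    versionedChanged = true;
                }
            } else {
                nonVersioned.put(e.getKey(), e.getValue()); // overwritten in place, no history kept
            }
        }
        if (versionedChanged) {
            versions.add(next); // only a change to a versioned field creates a new version
        }
        return versions.size();
    }

    /** A versioned field as of the given version (1-based), or the current non-versioned value. */
    Object get(String field, long version) {
        if (!versionedFields.contains(field)) {
            return nonVersioned.get(field);
        }
        return versions.get((int) (version - 1)).get(field);
    }
}
```

For instance, updating `b$title` after the first save yields version 2, whereas writing only the non-versioned `vtag$last` leaves the version count unchanged.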
Flexible content model
» generic enough to accommodate many popular content schemas
 » HTML5, CMIS, RDF, NewsML, Dublin Core, ...
 » academically verified
 » not limited to ‘content applications’ only
» developer convenience
 » higher-level constructs
 » schema reuse
 » versioning, linking, ...

Lily Architecture (deployment)

Lily Architecture (components)

HBase RowLog Library


» need for sync/async operations
 » updating of secondary indexes (i.e. tables)
 » feeding of Indexer (= bridge to SOLR index maintenance)
» not: transactions
» need for distribution and durability




HBase RowLog Library
» WAL
 » guaranteed execution of synchronous actions
 » call doesn’t return before secondary action finishes
 » e.g. update secondary index tables
 » if all goes well, size = #concurrent ops
 » useful outside of Lily context as well!
» Queue
 » triggering of async actions
 » e.g. (re)index (updated) record with SOLR back-end
 » size depends on speed of back-end process

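The WAL/queue split above can be illustrated with a single-process sketch. It assumes an in-memory log (the real RowLog persists its messages in HBase so they survive crashes) and synchronous listeners that are safe to re-run. `MiniRowLog` and its methods are hypothetical, not the RowLog library's API: an entry is logged before the primary update, removed only after all synchronous actions succeed, and replayed by `recover()` otherwise; async work goes to a queue that a consumer such as an indexer drains later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical single-process sketch of the WAL + queue idea; not the RowLog API.
final class MiniRowLog {
    private final Map<Long, String> wal = new LinkedHashMap<>(); // message id -> row key, sync work pending
    private final Queue<String> queue = new ArrayDeque<>();     // row keys with async work pending
    private final List<Consumer<String>> syncListeners = new ArrayList<>();
    private long nextId = 0;

    void addSyncListener(Consumer<String> listener) {
        syncListeners.add(listener);
    }

    /** Logs the message, updates the row, and returns only after all sync actions ran. */
    void update(String rowKey, Runnable primaryUpdate) {
        long id = nextId++;
        wal.put(id, rowKey);   // 1. record the intent first
        primaryUpdate.run();   // 2. update the primary row
        runSync(rowKey);       // 3. e.g. update secondary index tables
        wal.remove(id);        // 4. every sync action succeeded: drop the entry
        queue.add(rowKey);     // 5. hand off async work, e.g. SOLR (re)indexing
    }

    /** Replays sync actions for entries that a failed update left behind. */
    void recover() {
        for (Long id : new ArrayList<>(wal.keySet())) {
            String rowKey = wal.get(id);
            runSync(rowKey);
            wal.remove(id);
            queue.add(rowKey);
        }
    }

    /** Drains the queue with the given consumer; returns the number of messages processed. */
    int processQueue(Consumer<String> asyncConsumer) {
        int processed = 0;
        while (!queue.isEmpty()) {
            asyncConsumer.accept(queue.poll());
            processed++;
        }
        return processed;
    }

    int walSize() { return wal.size(); }

    int queueSize() { return queue.size(); }

    private void runSync(String rowKey) {
        for (Consumer<String> listener : syncListeners) {
            listener.accept(rowKey); // an exception here leaves the WAL entry in place
        }
    }
}
```

Because an entry is only removed after every synchronous listener returns, a failure leaves it behind for replay; that is also why the listeners must be idempotent.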
The Lily Indexer
» denormalization
» indexing of multiple versions of a record
» incremental index updating
» batch index building
» blob content extraction
» sharding towards multiple SOLR instances

Indexing configuration (SOLR)
<schema name="example" version="1.2">

<types>
  [snipped: see SOLR example schema]
</types>

<fields>
  <!-- Fields which are required by Lily -->
  <field name="@@key" type="string" indexed="true" stored="true" required="true"/>
  <field name="@@id" type="string" indexed="true" stored="true" required="true"/>
  <field name="@@vtag" type="string" indexed="true" stored="true" required="true"/>
  <field name="@@versionless" type="string" indexed="true" stored="true" required="false"/>

  <!-- Your own fields -->
  <field name="title" type="text" indexed="true" stored="true" required="false"/>
  <field name="authors" type="text" indexed="true" stored="true" required="false"
         multiValued="true"/>
</fields>

<uniqueKey>@@key</uniqueKey>

<defaultSearchField>title</defaultSearchField>

<solrQueryParser defaultOperator="OR"/>

</schema>




Indexer configuration (Lily)
<?xml version="1.0"?>
<indexer xmlns:b="org.lilyproject.bookssample">
  <cases>
    <case recordType="b:Book" variant="*" vtags="last" indexVersionless="true"/>
  </cases>

  <indexFields>
    <indexField name="title">
      <value>
        <field name="b:title"/>
      </value>
    </indexField>

    <indexField name="authors">
      <value>
        <deref>
          <follow field="b:authors"/>
          <field name="b:name"/>
        </deref>
      </value>
    </indexField>
  </indexFields>

</indexer>




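The `<deref>` element in the indexer configuration denormalizes data: for each book, it follows the `b:authors` link field and copies each linked author's `b:name` into the book's index document. A minimal Java sketch of that step, assuming records are plain maps keyed by record id, link values are record ids, and dangling links are skipped; `Denormalizer` is a hypothetical helper, not the Lily indexer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical denormalization step mirroring <deref>; not the Lily indexer.
final class Denormalizer {
    // record id -> record fields (link fields hold lists of record ids)
    private final Map<String, Map<String, Object>> repository;

    Denormalizer(Map<String, Map<String, Object>> repository) {
        this.repository = repository;
    }

    /** Follows a (multi-value) link field and collects one field from each linked record. */
    List<Object> deref(Map<String, Object> record, String linkField, String targetField) {
        List<Object> values = new ArrayList<>();
        Object links = record.get(linkField);
        if (!(links instanceof List)) {
            return values;
        }
        for (Object id : (List<?>) links) {
            Map<String, Object> target = repository.get(id);
            if (target != null && target.get(targetField) != null) {
                values.add(target.get(targetField)); // dangling links are skipped
            }
        }
        return values;
    }

    /** Builds the index document for a book, following the two indexFields above. */
    Map<String, Object> toIndexDocument(Map<String, Object> book) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", book.get("b$title"));
        doc.put("authors", deref(book, "b$authors", "b$name"));
        return doc;
    }
}
```

A consequence of denormalizing this way is that when an author's `b:name` changes, every book linking to that author must be re-indexed as well.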
(opt.) Sharding configuration
{
  shardingKey: {
    value: {
      source: "variantProperty",
      property: "language"
    },
    type: "string"
  },

  mapping: {
    type: "list",
    entries: [
      { shard: "shard1", values: ["en", "it"] },
      { shard: "shard2", values: ["nl", "de", "es"] }
    ]
  }
}




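The sharding configuration routes each record to a SOLR shard based on its `language` variant property, using a list of values per shard. A hypothetical resolver for such a `list` mapping is sketched below. How Lily treats values that no entry lists is not stated in the configuration, so returning an empty `Optional` for them is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical resolver for a "list" sharding mapping; not Lily's implementation.
final class ListShardSelector {
    // sharding-key value -> shard name
    private final Map<String, String> valueToShard = new HashMap<>();

    /** entries: shard name -> the key values routed to that shard. */
    ListShardSelector(Map<String, List<String>> entries) {
        for (Map.Entry<String, List<String>> entry : entries.entrySet()) {
            for (String value : entry.getValue()) {
                String previous = valueToShard.putIfAbsent(value, entry.getKey());
                if (previous != null && !previous.equals(entry.getKey())) {
                    throw new IllegalArgumentException("value " + value + " is mapped to two shards");
                }
            }
        }
    }

    /** Shard for a record, based on one of its variant properties; empty if unmapped. */
    Optional<String> shardFor(Map<String, String> variantProperties, String property) {
        String key = variantProperties.get(property);
        return key == null ? Optional.empty() : Optional.ofNullable(valueToShard.get(key));
    }
}
```

With the mapping above, `nl` resolves to `shard2`, `en` to `shard1`, and `fr` to no shard at all.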
Lily API


» Java (using Avro)
  » http://docs.outerthought.org/lily-docs-current/g3/g1/390-lily.html

» REST (HTTP + JSON)
  » http://docs.outerthought.org/lily-docs-current/g3/g2/427-lily.html

» All docs
  » http://docs.outerthought.org/lily-docs-current/ext/toc/




Demo
» http://outerthought.blip.tv/file/4245615/




Lily and HBase

» adds high-level content model
 » data types
 » versioning
 » blob storage on HDFS
» focus on sparse (efficient) storage
» RowLog for synchronous cross-table updates and async message queues

Lily and SOLR

» provides flexible mapping between HBase content model and SOLR index fields
» interactive and batch (M/R) index maintenance
» sharding
» use(s) SOLR as-is: loose, flexible, extensible coupling
» search access via SOLR (HTTP) API



Lily and CDH

» we intend to rely on CDH-‘blessed’ versions of HBase/HDFS/ZK
 » 700 patches and testing
» next: adopting a similar distribution layout
» since we contribute patches to ASF HBase trunk, we would expect CDH to track it closely (until HBase 1.0)
» some Lily users could be interested in ‘CDH-level’ services


goodbye


» It’s open source !
» Content Repository: available now
  (Lily model + HBase + SOLR + RowLog)
» Lily 1.0 soon; it will mainly focus on differentiating the open source and enterprise editions
» “HBase is wa de max maat.” (Flemish slang, roughly: “HBase is one great buddy.”)



Thank you !
                               for your attention
                               for your questions

                               » stevenn@outerthought.org

                               » @stevenn

 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Recently uploaded (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Lily at HUG UK

  • 1. Lily Smart data at scale IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 2. big data, big problems
  • 3. MOORE vs data [chart: data growth vs. Moore's law] » coping with volume + need for timeliness = parallel processing » data becomes business-critical = resilience through distributed architectures » Hadoop, MapReduce, HBase: the future data platform
  • 4. the CHALLENGES » process ALL data » process data in REAL-TIME » derive INSIGHTS » provide INSTANT FEEDBACK
  • 5. current thinking: data STORE → ETL → data warehouse → analytics; batched, off-line, overnight
  • 6. 1. store and manage all YOUR data [diagram: DATA]
  • 7. 2. store user behaviour, nearby [diagram: DATA, USER Behavior]
  • 8. 3. analyze usage patterns [diagram: DATA, data processing, USER Behavior]
  • 9. 4. add domain knowledge [diagram: DATA, data processing, USER Behavior, domain knowledge: patterns, rules, keywords, lists, ...]
  • 10. 5. process, in real-time [diagram: DATA, data processing, recommendations, semantic augmentation, Analytics, USER Behavior, domain knowledge: patterns, rules, keywords, lists, ...]
  • 11. 6. augment data [diagram: DATA, data processing, recommendations, semantic augmentation, Analytics, USER Behavior, domain knowledge: patterns, rules, keywords, lists, ...]
  • 12. data insights [diagram: SMARTER DATA, data processing, relations, recommendations, semantic augmentation, Analytics, domain knowledge: patterns, rules, keywords, lists, ...]
  • 13. data insights [diagram: SMARTER DATA, data processing, relations, recommendations, semantic augmentation, Analytics, domain knowledge: patterns, rules, keywords, lists, ...] SMART DATA, at SCALE ... and in real time
  • 14. stories
  • 15. news aggregator [diagram: HYPER-PERSONAL recommendations, NEWS TOGETHERNESS, interestingness, organisations, names, locations, brands, scale]
  • 16. e-retail [diagram: up-selling, CROSS-SELLING, product CATALOG, recommendedness, relatedness, product families, related activities, social graph, real-time]
  • 17. IP research [diagram: competitive innovation, patents, (dis)SIMILARITY, companies, people, materials, processes, insights]
  • 18. Outerthought » software product company » scalable content applications » open source product portfolio » Java, REST, internet » “The world is moving from content as a cost to data as an opportunity.”
  • 19. Lily 1.0 (CR): data STORE + data warehouse » Lily 2.0: + analytics, real time
  • 20. Lily (now) » large-scale content storage, indexing and search » current pilots: e-retail, mobile media, ISP, e-gov, IP research » up to now: 4 man-years of investment (since Sept. 2009)
  • 21. roadmap » now: Lily 0.3 » along the road: » April 2011: Lily 1.0 » Lily SaaS edition » Q3 2011 » real-time statistics + analytics » Q2 2012: Lily 2.0 » real-time data processing engine » Data Insights
  • 22. open source » www.lilyproject.org » docs.outerthought.org/lily-docs-current/
  • 23. Lily Core Concepts » storage: HBase » repository model: versioning, varianting, mixins » indexing: mapping » search: SOLR
  • 24. falling in love with HBase, phase 1 » automatic scaling to large data sets » fault tolerance » flexible data model with sparse data » commodity hardware » efficient random access » community-based open source » Java if possible
  • 25. falling in love with HBase, phase 2 » need for consistency » atomic single-row updates » M/R (MapReduce) for index regeneration
  • 26. falling in love with HBase, phase 3 » data model with column families and cell versioning » ordered tables with range scans » HDFS for blob storage » Apache
  • 27. Lily Repository Model
  • 28. Lily Datatypes
  • 29. Mixins
  • 30. Sample Lily Schema (excerpt)
    {
      /* Declaration of namespace prefixes. */
      namespaces: {
        "org.lilyproject.bookssample": "b",
        "org.lilyproject.vtag": "vtag"
      },
      fieldTypes: [
        {
          name: "b$title",
          valueType: { primitive: "STRING" },
          scope: "versioned"
        },
        {
          name: "b$pages",
          valueType: { primitive: "INTEGER" },
          scope: "versioned"
        },
        {
          name: "b$language",
          valueType: { primitive: "STRING" },
          scope: "versioned"
        },
        {
          name: "b$authors",
          valueType: { primitive: "LINK", multiValue: true },
          scope: "versioned"
        },
        {
          name: "b$name",
          valueType: { primitive: "STRING" },
          scope: "versioned"
        },
        {
          name: "b$bio",
          valueType: { primitive: "STRING" },
          scope: "versioned"
        },
        {
          name: "vtag$last",
          valueType: { primitive: "LONG" },
          scope: "non_versioned"
        }
      ],
      recordTypes: [
        {
          name: "b$Book",
          fields: [
            { name: "b$title", mandatory: true },
            { name: "b$pages", mandatory: false },
            { name: "b$language", mandatory: false },
            { name: "b$authors", mandatory: false },
            { name: "vtag$last", mandatory: false }
          ]
        },
        ...
      ]
    }
  • 31. Lily Versioning
  • 32. Flexible content model » generic enough to accommodate many popular content schemas: HTML5, CMIS, RDF, NewsML, Dublin Core, ... » academically verified » not limited to ‘content applications’ only » developer convenience: higher-level constructs, schema reuse, versioning, linking, ...
  • 33. Lily Architecture (deployment)
  • 34. Lily Architecture (components)
  • 35. HBase RowLog Library » need for sync/async operations » updating of secondary indexes (i.e. tables) » feeding of the Indexer (= bridge to SOLR index maintenance) » not: transactions » need for distribution and durability
  • 36. HBase RowLog Library » WAL: guaranteed execution of synchronous actions; the call doesn’t return before the secondary action finishes; e.g. update secondary index tables; if all goes well, size = #concurrent ops » Queue: triggering of async actions; e.g. (re)index (updated) record with SOLR back-end; size depends on speed of back-end process » useful outside of Lily context as well!
  • 37. The Lily Indexer » sharding towards multiple SOLR instances » indexing of multiple versions of a record » incremental index updating » batch index building » blob content extraction » denormalization
  • 38. Indexing configuration (SOLR)
    <schema name="example" version="1.2">
      <types>
        [snipped: see SOLR example schema]
      </types>
      <fields>
        <!-- Fields which are required by Lily -->
        <field name="@@key" type="string" indexed="true" stored="true" required="true"/>
        <field name="@@id" type="string" indexed="true" stored="true" required="true"/>
        <field name="@@vtag" type="string" indexed="true" stored="true" required="true"/>
        <field name="@@versionless" type="string" indexed="true" stored="true" required="false"/>
        <!-- Your own fields -->
        <field name="title" type="text" indexed="true" stored="true" required="false"/>
        <field name="authors" type="text" indexed="true" stored="true" required="false" multiValued="true"/>
      </fields>
      <uniqueKey>@@key</uniqueKey>
      <defaultSearchField>title</defaultSearchField>
      <solrQueryParser defaultOperator="OR"/>
    </schema>
  • 39. Indexer configuration (Lily)
    <?xml version="1.0"?>
    <indexer xmlns:b="org.lilyproject.bookssample">
      <cases>
        <case recordType="b:Book" variant="*" vtags="last" indexVersionless="true"/>
      </cases>
      <indexFields>
        <indexField name="title">
          <value>
            <field name="b:title"/>
          </value>
        </indexField>
        <indexField name="authors">
          <value>
            <deref>
              <follow field="b:authors"/>
              <field name="b:name"/>
            </deref>
          </value>
        </indexField>
      </indexFields>
    </indexer>
  • 40. (opt.) Sharding configuration
    {
      shardingKey: {
        value: {
          source: "variantProperty",
          property: "language"
        },
        type: "string"
      },
      mapping: {
        type: "list",
        entries: [
          { shard: "shard1", values: ["en", "it"] },
          { shard: "shard2", values: ["nl", "de", "es"] }
        ]
      }
    }
  • 41. Lily API » Java (using Avro): http://docs.outerthought.org/lily-docs-current/g3/g1/390-lily.html » REST (HTTP + JSON): http://docs.outerthought.org/lily-docs-current/g3/g2/427-lily.html » all docs: http://docs.outerthought.org/lily-docs-current/ext/toc/
  • 42. Demo » http://outerthought.blip.tv/file/4245615/
  • 43. Lily and HBase » adds a high-level content model » data types » versioning » blob storage on HDFS » focus on sparse (efficient) storage » RowLog for synchronous cross-table updates and async message queues
  • 44. Lily and SOLR » provides flexible mapping between the HBase content model and SOLR index fields » interactive and batch (M/R) index maintenance » sharding » uses SOLR as-is: loose, flexible, extensible coupling » search access via the SOLR (HTTP) API
  • 45. Lily and CDH » we intend to rely on CDH-‘blessed’ versions of HBase/HDFS/ZK » 700 patches and testing » next: adopting a similar distribution layout » since we contribute patches to ASF HBase trunk, we would expect CDH to track it closely (until HBase 1.0) » some Lily users could be interested in ‘CDH-level’ services
  • 46. goodbye » It’s open source! » Content Repository: available now (Lily model + HBase + SOLR + RowLog) » Lily 1.0 soon; it will mainly focus on differentiating the open source and enterprise editions » “HBase is the max, mate.” (Flemish original: “HBase is wa de max maat.”)
  • 47. Thank you! » for your attention » for your questions » stevenn@outerthought.org » @stevenn
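Slide 26 lists "ordered tables with range scans" among the reasons for choosing HBase. The sketch below illustrates the idea with `java.util.TreeMap` standing in for a table; it is not the HBase client API, and the class `SortedTableSketch` and its row keys are hypothetical. As in an HBase scan, the start row is inclusive and the stop row exclusive, and rows come back in key order regardless of insertion order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for an HBase table (not the HBase client API):
// rows are kept sorted by row key, so a scan reads one contiguous key range.
// HBase orders row keys as unsigned bytes; for ASCII keys String order gives the same result.
class SortedTableSketch {
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Returns the row keys in [startRow, stopRow): start inclusive, stop exclusive, in key order.
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        SortedTableSketch table = new SortedTableSketch();
        // Insertion order does not matter; the table keeps keys sorted.
        table.put("book-003", "c");
        table.put("book-001", "a");
        table.put("book-004", "d");
        table.put("book-002", "b");
        System.out.println(table.scan("book-002", "book-004")); // [book-002, book-003]
    }
}
```

Designing row keys so that related rows sort next to each other is what makes such scans cheap.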
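The b$Book record type in the sample schema (slide 30) marks only b$title as mandatory and gives every field a value type. A minimal sketch of checking a record against those declarations might look like this, assuming plain Java classes as stand-ins for the STRING, INTEGER, LONG and multi-valued LINK types; `BookSchemaSketch` is hypothetical and is not how Lily itself validates records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical check of a record against the b$Book record type from the sample schema.
// Not the Lily API: Java classes stand in for the schema's primitive value types.
class BookSchemaSketch {
    static final Map<String, Class<?>> VALUE_TYPES = new LinkedHashMap<>();
    static final Map<String, Boolean> MANDATORY = new LinkedHashMap<>();

    static {
        declare("b$title", String.class, true);     // STRING, mandatory
        declare("b$pages", Integer.class, false);   // INTEGER
        declare("b$language", String.class, false); // STRING
        declare("b$authors", List.class, false);    // LINK, multiValue: list of linked record ids
        declare("vtag$last", Long.class, false);    // LONG
    }

    private static void declare(String name, Class<?> type, boolean mandatory) {
        VALUE_TYPES.put(name, type);
        MANDATORY.put(name, mandatory);
    }

    // Returns the problems found; an empty list means the record fits b$Book.
    // Fields not declared in b$Book are ignored by this sketch.
    static List<String> validate(Map<String, Object> fields) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : MANDATORY.entrySet()) {
            if (e.getValue() && !fields.containsKey(e.getKey())) {
                problems.add("missing mandatory field " + e.getKey());
            }
        }
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            Class<?> expected = VALUE_TYPES.get(e.getKey());
            if (expected != null && !expected.isInstance(e.getValue())) {
                problems.add("wrong value type for " + e.getKey());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Object> book = new LinkedHashMap<>();
        book.put("b$pages", 312);
        book.put("b$authors", Arrays.asList("author-1", "author-2"));
        System.out.println(validate(book)); // [missing mandatory field b$title]
    }
}
```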
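The schema on slide 30 also distinguishes "versioned" from "non_versioned" field scopes and declares vtag$last as a non-versioned LONG. The sketch below shows one way to read that, under stated assumptions: each change to versioned fields produces a new numbered version, non-versioned fields keep a single current value, and vtag$last holds the number of the latest version. `VersionedRecordSketch` is hypothetical, and it updates vtag$last by hand rather than relying on any Lily behaviour.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "versioned" vs "non_versioned" field scopes; not the Lily repository API.
class VersionedRecordSketch {
    // One snapshot of the versioned fields per version; index 0 holds version 1.
    private final List<Map<String, Object>> versions = new ArrayList<>();
    // Non-versioned fields keep a single current value that is overwritten in place.
    private final Map<String, Object> nonVersioned = new HashMap<>();

    // Applies changes to versioned fields as a new version copied from the previous one.
    long updateVersioned(Map<String, Object> changes) {
        Map<String, Object> next = new HashMap<>();
        if (!versions.isEmpty()) {
            next.putAll(versions.get(versions.size() - 1));
        }
        next.putAll(changes);
        versions.add(next);
        long versionNumber = versions.size();
        nonVersioned.put("vtag$last", versionNumber); // assumption: vtag$last tracks the latest version
        return versionNumber;
    }

    // Changing a non-versioned field never creates a new version.
    void updateNonVersioned(String field, Object value) {
        nonVersioned.put(field, value);
    }

    Object read(String field, long version) {
        return versions.get((int) (version - 1)).get(field);
    }

    Object readNonVersioned(String field) {
        return nonVersioned.get(field);
    }

    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        VersionedRecordSketch book = new VersionedRecordSketch();
        Map<String, Object> first = new HashMap<>();
        first.put("b$title", "Draft title");
        book.updateVersioned(first);
        Map<String, Object> second = new HashMap<>();
        second.put("b$title", "Final title");
        book.updateVersioned(second);
        System.out.println(book.read("b$title", 1));           // Draft title
        System.out.println(book.read("b$title", 2));           // Final title
        System.out.println(book.readNonVersioned("vtag$last")); // 2
    }
}
```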
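Slides 35 and 36 describe two roles for the RowLog: a WAL that guarantees synchronous secondary actions (the call returns only once, for example, the secondary index tables are updated) and a queue that triggers asynchronous actions such as (re)indexing a record in SOLR. The single-process sketch below models both roles in memory; `RowLogSketch`, its method names and the crash flag are hypothetical, and unlike the real library nothing here is stored durably in HBase.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical single-process sketch of the two RowLog roles on slide 36.
// NOT the Lily RowLog API: real entries are stored durably in HBase, not in memory.
class RowLogSketch {
    // WAL role: ids whose synchronous secondary action has not finished yet.
    // If all goes well, its size never exceeds the number of in-flight operations.
    private final Set<String> walPending = new LinkedHashSet<>();
    // Queue role: messages for asynchronous consumers, e.g. "(re)index record X".
    private final Deque<String> queue = new ArrayDeque<>();

    private final Map<String, String> records = new TreeMap<>();        // primary table: id -> value
    private final Map<String, String> secondaryIndex = new TreeMap<>(); // secondary table: value -> id

    // Synchronous path: returns only after the secondary index is updated.
    // crashAfterPrimaryWrite simulates a failure between the primary and secondary writes.
    void putRecord(String id, String value, boolean crashAfterPrimaryWrite) {
        walPending.add(id);      // 1. record intent before touching any table
        records.put(id, value);  // 2. primary write
        if (crashAfterPrimaryWrite) {
            return;              // the WAL entry stays behind for recover()
        }
        finish(id);
    }

    // Replays every pending WAL entry; safe to repeat because applySecondary is idempotent.
    void recover() {
        for (String id : new ArrayList<>(walPending)) {
            finish(id);
        }
    }

    // Asynchronous path: a consumer (e.g. an indexer) drains queued messages at its own pace.
    int drainQueue(Consumer<String> consumer) {
        int processed = 0;
        while (!queue.isEmpty()) {
            consumer.accept(queue.pollFirst());
            processed++;
        }
        return processed;
    }

    String lookupByValue(String value) {
        return secondaryIndex.get(value);
    }

    int walSize() {
        return walPending.size();
    }

    private void finish(String id) {
        applySecondary(id);      // 3. secondary action, derived from the primary table
        walPending.remove(id);   // 4. synchronous work done: drop the WAL entry
        queue.addLast(id);       // 5. trigger asynchronous follow-up work
    }

    private void applySecondary(String id) {
        secondaryIndex.values().removeIf(id::equals); // drop stale entries for this id
        secondaryIndex.put(records.get(id), id);
    }

    public static void main(String[] args) {
        RowLogSketch log = new RowLogSketch();
        log.putRecord("book1", "hbase", false);
        log.putRecord("book2", "solr", true);           // simulated crash
        System.out.println(log.walSize());              // 1
        log.recover();
        System.out.println(log.lookupByValue("solr"));  // book2
        log.drainQueue(id -> System.out.println("reindex " + id));
    }
}
```

Making the secondary action idempotent is what lets recovery simply replay whatever the WAL still holds.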
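The `<deref>` element in the indexer configuration (slide 39) denormalizes data: for each b:Book it follows the b:authors links and copies the linked records' b:name values into the `authors` index field. A minimal in-memory sketch of that mapping follows, assuming record fields are keyed by their b$ names and links are plain record ids; `DerefIndexSketch` is hypothetical and is not the Lily indexer. It also shows the cost of denormalization: when an author's name changes, every book linking to that author has to be re-indexed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the <deref> index mapping; not Lily's indexer implementation.
// Fields are keyed by their b$ names (the XML config refers to the same fields as b:title etc.).
class DerefIndexSketch {
    // record id -> field name -> value; LINK fields hold a List of linked record ids.
    final Map<String, Map<String, Object>> repository = new TreeMap<>();

    // Builds the SOLR fields "title" and "authors" for one b$Book record.
    Map<String, List<String>> buildIndexDocument(String bookId) {
        Map<String, Object> book = repository.get(bookId);
        Map<String, List<String>> doc = new LinkedHashMap<>();

        // <field name="b:title"/>: copy the value straight from the record.
        List<String> titles = new ArrayList<>();
        if (book.get("b$title") != null) {
            titles.add((String) book.get("b$title"));
        }
        doc.put("title", titles);

        // <deref><follow field="b:authors"/><field name="b:name"/></deref>:
        // follow each link, then read b$name from the linked record.
        List<String> authorNames = new ArrayList<>();
        for (String linkedId : links(book)) {
            Map<String, Object> author = repository.get(linkedId);
            if (author != null && author.get("b$name") != null) {
                authorNames.add((String) author.get("b$name"));
            }
        }
        doc.put("authors", authorNames);
        return doc;
    }

    // Denormalization cost: books whose index documents go stale when this author changes.
    List<String> booksLinkingTo(String authorId) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : repository.entrySet()) {
            if (links(e.getValue()).contains(authorId)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    private static List<String> links(Map<String, Object> record) {
        List<String> ids = new ArrayList<>();
        Object value = record.get("b$authors");
        if (value instanceof List) {
            for (Object id : (List<?>) value) {
                ids.add(String.valueOf(id));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        DerefIndexSketch sketch = new DerefIndexSketch();
        Map<String, Object> alice = new LinkedHashMap<>();
        alice.put("b$name", "Alice");
        Map<String, Object> book = new LinkedHashMap<>();
        book.put("b$title", "Example book");
        book.put("b$authors", Arrays.asList("author-1"));
        sketch.repository.put("author-1", alice);
        sketch.repository.put("book-1", book);
        System.out.println(sketch.buildIndexDocument("book-1")); // {title=[Example book], authors=[Alice]}
        System.out.println(sketch.booksLinkingTo("author-1"));   // [book-1]
    }
}
```

The scan in `booksLinkingTo` is only for illustration; at scale, finding the records that link to a changed record would need an index of its own.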
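The optional sharding configuration on slide 40 takes the `language` variant property as the sharding key and maps it to a shard with a list: en and it go to shard1; nl, de and es go to shard2. The sketch below mirrors that lookup; `ListShardingSketch` is hypothetical, and rejecting an unlisted language is an assumption of the sketch, not documented Lily behaviour.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "list" sharding mapping; not Lily's shard selection code.
class ListShardingSketch {
    private final Map<String, String> valueToShard = new HashMap<>();

    // One mapping entry, e.g. { shard: "shard1", values: ["en", "it"] }.
    void addEntry(String shard, List<String> values) {
        for (String value : values) {
            String previous = valueToShard.putIfAbsent(value, shard);
            if (previous != null && !previous.equals(shard)) {
                throw new IllegalArgumentException(value + " is mapped to both " + previous + " and " + shard);
            }
        }
    }

    // shardingKey: { source: "variantProperty", property: "language" }.
    String selectShard(Map<String, String> variantProperties) {
        String language = variantProperties.get("language");
        String shard = language == null ? null : valueToShard.get(language);
        if (shard == null) {
            // Assumption of this sketch: records with an unlisted key are rejected.
            throw new IllegalArgumentException("no shard for language=" + language);
        }
        return shard;
    }

    public static void main(String[] args) {
        ListShardingSketch sharding = new ListShardingSketch();
        sharding.addEntry("shard1", Arrays.asList("en", "it"));
        sharding.addEntry("shard2", Arrays.asList("nl", "de", "es"));
        Map<String, String> variant = new HashMap<>();
        variant.put("language", "nl");
        System.out.println(sharding.selectShard(variant)); // shard2
    }
}
```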
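Slide 44 notes that search access goes through the SOLR HTTP API. Assuming a SOLR instance that holds the index from slide 38, a query URL against the standard `select` handler could be built as below; the base URL is hypothetical (SOLR's example Jetty setup listens on port 8983), and only the `title` and `authors` fields come from the slides. The `URLEncoder.encode(String, Charset)` overload used here needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: build a request URL for SOLR's "select" handler.
// The base URL passed in main is hypothetical; field names come from the slide-38 schema.
class SolrQueryUrlSketch {
    static String selectUrl(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            // Percent-encode keys and values (URLEncoder uses form encoding: space becomes '+').
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "/select?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "authors:\"Alice\""); // field query on the denormalized authors field
        params.put("fl", "title,authors");    // stored fields to return
        params.put("wt", "json");             // response format
        System.out.println(selectUrl("http://localhost:8983/solr", params));
        // http://localhost:8983/solr/select?q=authors%3A%22Alice%22&fl=title%2Cauthors&wt=json
    }
}
```

Because the slide-38 schema sets `title` as the default search field with an OR default operator, a bare query such as `hbase solr` passed to the standard query parser would match titles containing either word.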