SlideShare a Scribd company logo
Smart data,
                 Lily                                                                        at scale
                                                                                             madE easy




                      from content storage
                      to scaling smart data


                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

maandag 6 juni 2011
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   2

maandag 6 juni 2011
the pain

                                                                    data


                                             need for
                                            distributed
                                            processing
                                                                             moore




                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   3

maandag 6 juni 2011
the pain

              » growth of data sets
              » smart businesses need
                to apply analytics to                                            Smart data,
                activities
                                                                                 at scale
              » doing business online
                means real-time
                                                                                 madE easy
              » talent shortage


                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   4

maandag 6 juni 2011
LILY


                  The Real-time Platform built for the Age of Data.

                  We manage, track and measure your data and users,
                  and do the mat(c)hmaking in-between:
                  » provide you with business intelligence and analytics
                  » harvest user profiles and learn their interests
                  » dynamically engage your users using quality recommendations



                         IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   5

maandag 6 juni 2011
where would you use lily?
        » large collections of data                              » large groups of users
             » content repositories                                 » e-commerce / retail
             » library catalogs                                     » news / media
             » (media) asset management
             » product catalogs
             » ‘live’ archives

                                                                 » ... if you want to use big
                                                                    data, but you need easy.
                        IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   6

maandag 6 juni 2011
ns
                                                                           pe
                                                                           ap
                                                                      gic h
                                                                   ma
                                                                  he
                                                                  t
                                                               re
                                                             he
                                                          sw
                                                          si
                                                     +
                                                       thi




                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   7

maandag 6 juni 2011
beyond content management

                                                                    marketing
                            broadcast




                                                                     revenue

                                                               product / service


                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   8

maandag 6 juni 2011
beyond content management: data + analytics

                                                              recommendations
                         call to action



                      personalised

                                                                     revenue

                                                               product / service
                                                         audience data
                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   9

maandag 6 juni 2011
LILY 2.0: smart data
                                      SMARTER DATA                 data processing
                                                  s
                                          relation
                                                                     recommendations
                                                                     semantic augmentation
                                                                     Analytics




                                                          usage
                                                         metrics              domain
                                                                            knowledge
                                                                              patterns
                                                                              rules
                                                                              keywords
                                                                              lists
                                                                              ...




                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   10

maandag 6 juni 2011
roadmap
              » now: highly-scalable data repository: store, index and search
              » next: with real-time usage stats gathering and analytics
              » later: and built-in context- and user-sensitive
                  recommendations

              » built on top of Google BigTable / HBase / Solr
                  » identical, robust technology in use at Facebook, Twitter,
                      StumbleUpon, Yahoo!
                  » scales widely over distributed (cloud) infrastructure

                           IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   11

maandag 6 juni 2011
Lily Repository Model




                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   12

maandag 6 juni 2011
Sample Lily Schema (excerpt)
                                                                      

{
     namespaces:
{
                                                                      



name:
"b$name",
     



/*
Declaration
of
namespace
prefixes.
*/
                                                                      



valueType:
{
primitive:
"STRING"
},
     



"org.lilyproject.bookssample":
"b",
                                                                      



scope:
"versioned"
     



"org.lilyproject.vtag":
"vtag"
                                                                      

},
     

},
                                                                      

{
     fieldTypes:
[
                                                                      



name:
"b$bio",
     

{
                                                                      



valueType:
{
primitive:
"STRING"
},
     



name:
"b$title",
                                                                      



scope:
"versioned"
     



valueType:
{
primitive:
"STRING"
},
                                                                      

},
     



scope:
"versioned"
                                                                      

{
     

},
                                                                      



name:
"vtag$last",
     

{
                                                                      



valueType:
{
primitive:
"LONG"
},
     



name:
"b$pages",
                                                                      



scope:
"non_versioned"
     



valueType:
{
primitive:
"INTEGER"
},
                                                                      

}
     



scope:
"versioned"
                                                                      

],
     

},
                                                                      recordTypes:
[
     

{
                                                                      

{
     



name:
"b$language",
                                                                      



name:
"b$Book",
     



valueType:
{
primitive:
"STRING"
},
                                                                      



fields:
[
     



scope:
"versioned"
                                                                      





{name:
"b$title",
mandatory:
true
},
     

},
                                                                      





{name:
"b$pages",
mandatory:
false
},
     

{
                                                                      





{name:
"b$language",
mandatory:
false
},
     



name:
"b$authors",
                                                                      





{name:
"b$authors",
mandatory:
false
},
     



valueType:
{
primitive:
"LINK",
multiValue:
true
},
                                                                      





{name:
"vtag$last",
mandatory:
false
}
     



scope:
"versioned"
                                                                      



]
     

},
                                                                      

},

                                                                      ...


                         IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org                     13

maandag 6 juni 2011
Lily Architecture
             (deployment)




                        IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   14

maandag 6 juni 2011
Lily Architecture
                           (components)




                                          IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   15

maandag 6 juni 2011
HBase indexing & RowLog Library
        » building and querying                                             » need for sync/async
            indexes, GAE-style                                                 operations
                                                                               » updating of secondary indexes
                          rowkey            col          col
            content
                              A             val3         foo6                      (e.g. link tables)
              table
                              B             val2         foo7
                                                                               » feeding of Indexer
                                                                                   (= indexes Lily-content into Solr)
                                   rowkey          col                      » not: transactions
                      order




              index
            table A                val2-B
                                   val3-A                                   » need for distribution and
                                                                               durability

                                   IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org            16

maandag 6 juni 2011
The Lily Indexer

                                                                                                          sharding towards
                         indexing of multiple   incremental index                          blob content
       denormalization                                              batch index building                   multiple SOLR
                         versions of a record        updating                               extraction
                                                                                                              instances




                            IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org                        17

maandag 6 juni 2011
status june 2011

              » Lily 1.0.1 released - developing since Q4/09
              » some customers - DIY retail / media / news
                  » e-commerce platform project
                  » Lily as the data (integration) tier
              » first contrib: FrogPond (annotated Java <> Lily mapper)
                  https://bitbucket.org/calmera/frogpond


                          IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   18

maandag 6 juni 2011
Next up: usage stats
              » sits in CRUD-path
              » tracks users ops against
                  records
                                                                                                interactions
                  » from both perspectives
                                                                             record                                  user
                  » arbitrary K/V properties: time,
                      location, ...




                                                                                rec
              » automatically builds user



                                                                                 om
                                                                                   me
                                                                                      nd
                                                                                      ati
                                                                                         o
                  profiles (as records)


                                                                                           ns
                                                                                                 indexes




                                                                                                                e
                                                                                                               tim
                  » tied to records ops
                  » indexed access
                  » time dimension: trending

                              IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org                     19

maandag 6 juni 2011
from usage stats to recommendations ‘light’

                                record                                                     user



              » grouping of users based on
                  » shared properties
                  » shared record access
              » grouping of records based on
                  » shared properties
                                                                                     {     connections


                  » shared user operations                                            recommendations

                         IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org       20

maandag 6 juni 2011
full-on recommendations

              » look at real-time-capable Mahout algorithms
              » pre-index or -calculate as much as possible
                  » save as secondary indexes
                  » present recommendations as part of record API
              » allow user to contribute ‘domain knowledge’ to
                  record processing pipeline
                  » pattern detection, keywords, ontologies, ...



                          IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   21

maandag 6 juni 2011
timeline


              » Lily + usage stats                                                                 10/2011
              » Lily + usage stats + light-weight analytics                                        12/2011
              » Lily + recommendations ‘light’                                                      3/2012
              » Lily 2.0 : full-on recommendations                                                  6/2012



                       IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org             22

maandag 6 juni 2011
lily enterprise


              » adds tools:
                  » yum/deb package repo
                  » cluster deploy scripts
                      (also EC2)
                  » Admin UI
              » + enterprise support



                            IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   23

maandag 6 juni 2011
demo (if time permits)


                       message                                                 part
                      ‣to                                             ‣content
                      ‣from                                           ‣mediaType
                      ‣parts                                          ‣message
                      ‣listId
                      ‣subject
                      ‣sender




                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   24

maandag 6 juni 2011
WHERE?



                                        www.lilyproject.org




                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   25

maandag 6 juni 2011
Thank you !
                                                   for your attention
                                                   for your questions

                                                   » stevenn@outerthought.org

                                                   »           @stevenn

                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

maandag 6 juni 2011

More Related Content

Similar to From Content Storage to Scaling Smart Data

Lily @ Work Webinar
Lily @ Work WebinarLily @ Work Webinar
Lily @ Work Webinar
NGDATA
 
Outerthought / Lily Partnerships
Outerthought / Lily PartnershipsOuterthought / Lily Partnerships
Outerthought / Lily Partnerships
NGDATA
 
Lily at HUG UK
Lily at HUG UKLily at HUG UK
Lily at HUG UK
NGDATA
 
Huguk lily
Huguk lilyHuguk lily
Huguk lily
Skills Matter
 
Hadoop World 2011: Lily: Smart Data at Scale, Made Easy
Hadoop World 2011: Lily: Smart Data at Scale, Made EasyHadoop World 2011: Lily: Smart Data at Scale, Made Easy
Hadoop World 2011: Lily: Smart Data at Scale, Made Easy
Cloudera, Inc.
 
Welcome to the Age of Data
Welcome to the Age of DataWelcome to the Age of Data
Welcome to the Age of Data
NGDATA
 
NoSQL with Hadoop and HBase
NoSQL with Hadoop and HBaseNoSQL with Hadoop and HBase
NoSQL with Hadoop and HBase
NGDATA
 
NoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG LuxembourgNoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG Luxembourg
NGDATA
 
ODI Overview 2013-04-09
ODI Overview 2013-04-09ODI Overview 2013-04-09
ODI Overview 2013-04-09
theODI
 
Lily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionLily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC edition
NGDATA
 
Gradiant - Technology Offer in Business Analytics
Gradiant - Technology Offer in Business AnalyticsGradiant - Technology Offer in Business Analytics
Gradiant - Technology Offer in Business Analytics
Marcos Álvarez-Díaz
 
Learning Lessons: Building a CMS on top of NoSQL technologies
Learning Lessons: Building a CMS on top of NoSQL technologiesLearning Lessons: Building a CMS on top of NoSQL technologies
Learning Lessons: Building a CMS on top of NoSQL technologies
NGDATA
 
Analytics and Data Mining Industry Overview
Analytics and Data Mining Industry OverviewAnalytics and Data Mining Industry Overview
Analytics and Data Mining Industry Overview
Gregory Piatetsky-Shapiro
 
KVIV / NoSQL : the new generation of database servers
KVIV / NoSQL : the new generation of database serversKVIV / NoSQL : the new generation of database servers
KVIV / NoSQL : the new generation of database servers
NGDATA
 
Beyond Online PDFs
Beyond Online PDFs Beyond Online PDFs
Beyond Online PDFs
Ocean Protocol
 
Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)
NGDATA
 
Predictive Analytics Innovation Summit
Predictive Analytics Innovation SummitPredictive Analytics Innovation Summit
Predictive Analytics Innovation Summit
Fractal Analytics
 
ODOO_klab
ODOO_klabODOO_klab
ODOO_klab
Rajiv Ranjan
 
W-JAX Keynote - Big Data and Corporate Evolution
W-JAX Keynote - Big Data and Corporate EvolutionW-JAX Keynote - Big Data and Corporate Evolution
W-JAX Keynote - Big Data and Corporate Evolution
jstogdill
 
Nfa workshop introductions_wdonnelly
Nfa workshop introductions_wdonnellyNfa workshop introductions_wdonnelly
Nfa workshop introductions_wdonnelly
Shane Dempsey
 

Similar to From Content Storage to Scaling Smart Data (20)

Lily @ Work Webinar
Lily @ Work WebinarLily @ Work Webinar
Lily @ Work Webinar
 
Outerthought / Lily Partnerships
Outerthought / Lily PartnershipsOuterthought / Lily Partnerships
Outerthought / Lily Partnerships
 
Lily at HUG UK
Lily at HUG UKLily at HUG UK
Lily at HUG UK
 
Huguk lily
Huguk lilyHuguk lily
Huguk lily
 
Hadoop World 2011: Lily: Smart Data at Scale, Made Easy
Hadoop World 2011: Lily: Smart Data at Scale, Made EasyHadoop World 2011: Lily: Smart Data at Scale, Made Easy
Hadoop World 2011: Lily: Smart Data at Scale, Made Easy
 
Welcome to the Age of Data
Welcome to the Age of DataWelcome to the Age of Data
Welcome to the Age of Data
 
NoSQL with Hadoop and HBase
NoSQL with Hadoop and HBaseNoSQL with Hadoop and HBase
NoSQL with Hadoop and HBase
 
NoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG LuxembourgNoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG Luxembourg
 
ODI Overview 2013-04-09
ODI Overview 2013-04-09ODI Overview 2013-04-09
ODI Overview 2013-04-09
 
Lily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionLily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC edition
 
Gradiant - Technology Offer in Business Analytics
Gradiant - Technology Offer in Business AnalyticsGradiant - Technology Offer in Business Analytics
Gradiant - Technology Offer in Business Analytics
 
Learning Lessons: Building a CMS on top of NoSQL technologies
Learning Lessons: Building a CMS on top of NoSQL technologiesLearning Lessons: Building a CMS on top of NoSQL technologies
Learning Lessons: Building a CMS on top of NoSQL technologies
 
Analytics and Data Mining Industry Overview
Analytics and Data Mining Industry OverviewAnalytics and Data Mining Industry Overview
Analytics and Data Mining Industry Overview
 
KVIV / NoSQL : the new generation of database servers
KVIV / NoSQL : the new generation of database serversKVIV / NoSQL : the new generation of database servers
KVIV / NoSQL : the new generation of database servers
 
Beyond Online PDFs
Beyond Online PDFs Beyond Online PDFs
Beyond Online PDFs
 
Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)
 
Predictive Analytics Innovation Summit
Predictive Analytics Innovation SummitPredictive Analytics Innovation Summit
Predictive Analytics Innovation Summit
 
ODOO_klab
ODOO_klabODOO_klab
ODOO_klab
 
W-JAX Keynote - Big Data and Corporate Evolution
W-JAX Keynote - Big Data and Corporate EvolutionW-JAX Keynote - Big Data and Corporate Evolution
W-JAX Keynote - Big Data and Corporate Evolution
 
Nfa workshop introductions_wdonnelly
Nfa workshop introductions_wdonnellyNfa workshop introductions_wdonnelly
Nfa workshop introductions_wdonnelly
 

More from NGDATA

NGDATA Corporate Presentation
NGDATA Corporate PresentationNGDATA Corporate Presentation
NGDATA Corporate Presentation
NGDATA
 
The Lily RowLog library
The Lily RowLog libraryThe Lily RowLog library
The Lily RowLog library
NGDATA
 
20110514 appsforghent
20110514 appsforghent20110514 appsforghent
20110514 appsforghent
NGDATA
 
Big Data
Big DataBig Data
Big Data
NGDATA
 
Devoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyDevoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and Lily
NGDATA
 
Devoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyDevoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and Lily
NGDATA
 
Devoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in JavaDevoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in Java
NGDATA
 
N-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the riseN-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the rise
NGDATA
 
NoSQL BOF at Devoxx
NoSQL BOF at DevoxxNoSQL BOF at Devoxx
NoSQL BOF at Devoxx
NGDATA
 
NoSQL "Tools in Action" talk at Devoxx
NoSQL "Tools in Action" talk at DevoxxNoSQL "Tools in Action" talk at Devoxx
NoSQL "Tools in Action" talk at Devoxx
NGDATA
 

More from NGDATA (10)

NGDATA Corporate Presentation
NGDATA Corporate PresentationNGDATA Corporate Presentation
NGDATA Corporate Presentation
 
The Lily RowLog library
The Lily RowLog libraryThe Lily RowLog library
The Lily RowLog library
 
20110514 appsforghent
20110514 appsforghent20110514 appsforghent
20110514 appsforghent
 
Big Data
Big DataBig Data
Big Data
 
Devoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyDevoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and Lily
 
Devoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyDevoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and Lily
 
Devoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in JavaDevoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in Java
 
N-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the riseN-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the rise
 
NoSQL BOF at Devoxx
NoSQL BOF at DevoxxNoSQL BOF at Devoxx
NoSQL BOF at Devoxx
 
NoSQL "Tools in Action" talk at Devoxx
NoSQL "Tools in Action" talk at DevoxxNoSQL "Tools in Action" talk at Devoxx
NoSQL "Tools in Action" talk at Devoxx
 

Recently uploaded

GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
Pixlogix Infotech
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
TIPNGVN2
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 

Recently uploaded (20)

GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 

From Content Storage to Scaling Smart Data

  • 1. Smart data, Lily at scale madE easy from content storage to scaling smart data IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org maandag 6 juni 2011
  • 2. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 2 maandag 6 juni 2011
  • 3. the pain data need for distributed processing moore IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 3 maandag 6 juni 2011
  • 4. the pain » growth of data sets » smart businesses need to apply analytics to Smart data, activities at scale » doing business online means real-time madE easy » talent shortage IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 4 maandag 6 juni 2011
  • 5. LILY The Real-time Platform built for the Age of Data. We manage, track and measure your data and users, and do the mat(c)hmaking in-between: » provide you with business intelligence and analytics » harvest user profiles and learn their interests » dynamically engage your users using quality recommendations IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 5 maandag 6 juni 2011
  • 6. where would you use lily? » large collections of data » large groups of users » content repositories » e-commerce / retail » library catalogs » news / media » (media) asset management » product catalogs » ‘live’ archives » ... if you want to use big data, but you need easy. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 6 maandag 6 juni 2011
  • 7. ns pe ap gic h ma he t re he sw si + thi IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 7 maandag 6 juni 2011
  • 8. beyond content management marketing broadcast revenue product / service IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 8 maandag 6 juni 2011
  • 9. beyond content management: data + analytics recommendations call to action personalised revenue product / service audience data IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 9 maandag 6 juni 2011
  • 10. LILY 2.0: smart data SMARTER DATA data processing s relation recommendations semantic augmentation Analytics usage metrics domain knowledge patterns rules keywords lists ... IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 10 maandag 6 juni 2011
  • 11. roadmap » now: highly-scalable data repository: store, index and search » next: with real-time usage stats gathering and analytics » later: and built-in context- and user-sensitive recommendations » built on top of Google BigTable / HBase / Solr » identical, robust technology in use at Facebook, Twitter, StumbleUpon, Yahoo! » scales widely over distributed (cloud) infrastructure IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 11 maandag 6 juni 2011
  • 12. Lily Repository Model IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 12 maandag 6 juni 2011
  • 13. Sample Lily Schema (excerpt) 

{ namespaces:
{ 



name:
"b$name", 



/*
Declaration
of
namespace
prefixes.
*/ 



valueType:
{
primitive:
"STRING"
}, 



"org.lilyproject.bookssample":
"b", 



scope:
"versioned" 



"org.lilyproject.vtag":
"vtag" 

}, 

}, 

{ fieldTypes:
[ 



name:
"b$bio", 

{ 



valueType:
{
primitive:
"STRING"
}, 



name:
"b$title", 



scope:
"versioned" 



valueType:
{
primitive:
"STRING"
}, 

}, 



scope:
"versioned" 

{ 

}, 



name:
"vtag$last", 

{ 



valueType:
{
primitive:
"LONG"
}, 



name:
"b$pages", 



scope:
"non_versioned" 



valueType:
{
primitive:
"INTEGER"
}, 

} 



scope:
"versioned" 

], 

}, recordTypes:
[ 

{ 

{ 



name:
"b$language", 



name:
"b$Book", 



valueType:
{
primitive:
"STRING"
}, 



fields:
[ 



scope:
"versioned" 





{name:
"b$title",
mandatory:
true
}, 

}, 





{name:
"b$pages",
mandatory:
false
}, 

{ 





{name:
"b$language",
mandatory:
false
}, 



name:
"b$authors", 





{name:
"b$authors",
mandatory:
false
}, 



valueType:
{
primitive:
"LINK",
multiValue:
true
}, 





{name:
"vtag$last",
mandatory:
false
} 



scope:
"versioned" 



] 

}, 

}, ... IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 13 maandag 6 juni 2011
  • 14. Lily Architecture (deployment) IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 14 maandag 6 juni 2011
  • 15. Lily Architecture (components) IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 15 maandag 6 juni 2011
  • 16. HBase indexing & RowLog Library » building and querying » need for sync/async indexes, GAE-style operations » updating of secondary indexes rowkey col col content A val3 foo6 (e.g. link tables) table B val2 foo7 » feeding of Indexer (= indexes Lily-content into Solr) rowkey col » not: transactions order index table A val2-B val3-A » need for distribution and durability IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 16 maandag 6 juni 2011
  • 17. The Lily Indexer sharding towards indexing of multiple incremental index blob content denormalization batch index building multiple SOLR versions of a record updating extraction instances IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 17 maandag 6 juni 2011
  • 18. status june 2011 » Lily 1.0.1 released - developing since Q4/09 » some customers - DIY retail / media / news » e-commerce platform project » Lily as the data (integration) tier » first contrib: FrogPond (annotated Java <> Lily mapper) https://bitbucket.org/calmera/frogpond IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 18 maandag 6 juni 2011
  • 19. Next up: usage stats » sits in CRUD-path » tracks users ops against records interactions » from both perspectives record user » arbitrary K/V properties: time, location, ... rec » automatically builds user om me nd ati o profiles (as records) ns indexes e tim » tied to records ops » indexed access » time dimension: trending IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 19 maandag 6 juni 2011
  • 20. from usage stats to recommendations ‘light’ record user » grouping of users based on » shared properties » shared record access » grouping of records based on » shared properties { connections » shared user operations recommendations IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 20 maandag 6 juni 2011
  • 21. full-on recommendations » look at real-time-capable Mahout algorithms » pre-index or -calculate as much as possible » save as secondary indexes » present recommendations as part of record API » allow user to contribute ‘domain knowledge’ to record processing pipeline » pattern detection, keywords, ontologies, ... IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 21 maandag 6 juni 2011
  • 22. timeline » Lily + usage stats 10/2011 » Lily + usage stats + light-weight analytics 12/2011 » Lily + recommendations ‘light’ 3/2012 » Lily 2.0 : full-on recommendations 6/2012 IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 22 maandag 6 juni 2011
  • 23. lily enterprise » adds tools: » yum/deb package repo » cluster deploy scripts (also EC2) » Admin UI » + enterprise support IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 23 maandag 6 juni 2011
  • 24. demo (if time permits) message part ‣to ‣content ‣from ‣mediaType ‣parts ‣message ‣listId ‣subject ‣sender IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 24 maandag 6 juni 2011
  • 25. WHERE? www.lilyproject.org IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 25 maandag 6 juni 2011
  • 26. Thank you ! for your attention for your questions » stevenn@outerthought.org » @stevenn IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org maandag 6 juni 2011