A content repository for your PHP application or CMS?

  August 20, 2011   Sankt Augustin

  Paul Borgermans & Henri Bergius
About me
●   Active in open source / PHP community for a while
           –   PHP based CMS solutions (mostly eZ Publish)
           –              board member
●   Fancying:
     –   Apache family of projects (mainly Solr)
     –   NoSQL (Not only SQL) and scalable architectures
     –   eZ Publish & CMS systems in general
     –   Semantic aspects
●   Contact
               paul.borgermans@gmail.com
               @paulborgermans
The Pitch


For many web-based applications, a content repository is
a better alternative for managing and serving content

   (as opposed to lower-level SQL stores)
Architecture

[Diagram: two stacks side by side, labels listed top to bottom]
●   Traditional integrated approach: Content Management System, Web Application, Abstraction?, Database
●   Decoupled approach: Framework, CR, CR
OK, but what is a content repository?
A content repository is a provider of ...
●   Storage
       –   Flexibility in content modeling
       –   Durability
       –   Scalability / Performance
●   Services
       –   Read/Write of content, versioning
       –   Access control
       –   Information retrieval
       –   (Analytics)
       –   (Semantics)
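
The storage and service roles listed above can be sketched as one small interface. This is a toy in-memory illustration, not any real product's API: the class `ContentRepository`, its methods and the `can` callback are all hypothetical names chosen for this sketch.

```python
from typing import Any, Callable, Dict, List, Optional


class ContentRepository:
    """Toy in-memory repository; every name here is hypothetical."""

    def __init__(self, can: Optional[Callable[[str, str, str], bool]] = None) -> None:
        # Access-control hook: (user, action, path) -> allowed?
        self.can = can or (lambda user, action, path: True)
        # Storage: path -> list of versions, oldest first.
        self._versions: Dict[str, List[Dict[str, Any]]] = {}

    def write(self, user: str, path: str, content: Dict[str, Any]) -> int:
        """Store a new version at path and return its 1-based version number."""
        if not self.can(user, "write", path):
            raise PermissionError(f"{user} may not write {path}")
        history = self._versions.setdefault(path, [])
        history.append(dict(content))  # copy, so later caller edits do not leak in
        return len(history)

    def read(self, user: str, path: str, version: Optional[int] = None) -> Dict[str, Any]:
        """Return the latest version, or a specific 1-based version."""
        if not self.can(user, "read", path):
            raise PermissionError(f"{user} may not read {path}")
        history = self._versions[path]
        return dict(history[-1] if version is None else history[version - 1])

    def search(self, user: str, term: str) -> List[str]:
        """Naive retrieval: readable paths whose latest version mentions term."""
        term = term.lower()
        return sorted(
            path
            for path, history in self._versions.items()
            if self.can(user, "read", path)
            and any(term in str(value).lower() for value in history[-1].values())
        )
```

A real repository would delegate storage to an engine and retrieval to a search index; the point here is only that the application sees read/write, versions, access control and search behind one facade.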
Flexibility
●   You want it to swallow anything
●   The fewer data design implications, the better
●   Run-time, scriptable schemas

●   Let it map to your data model effortlessly
        –   Mixing structured and unstructured data/blobs
●   From SQL to object/document-oriented access
        –   Much more natural for most application domains
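
A minimal sketch of what "run-time, scriptable schemas" and "mixing structured and unstructured data" could look like, assuming documents are plain dictionaries. The `schemas` registry and the `define_type` and `validate` helpers are hypothetical and exist only for this illustration.

```python
from typing import Any, Dict, List

# Hypothetical run-time schema registry: content types are plain data,
# so a script can create or change them without a database migration.
schemas: Dict[str, Dict[str, type]] = {}


def define_type(type_name: str, **fields: type) -> None:
    """Register or redefine a content type at run time."""
    schemas[type_name] = dict(fields)


def validate(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems; fields outside the schema are allowed."""
    expected = schemas.get(doc.get("type", ""), {})
    problems = []
    for field_name, field_type in expected.items():
        if field_name not in doc:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(doc[field_name], field_type):
            problems.append(f"wrong type for field: {field_name}")
    return problems


define_type("article", title=str, body=str)

article = {
    "type": "article",
    "title": "Content repositories",
    "body": "Free-form text ...",
    "attachment": b"\x89PNG...",  # unstructured blob next to structured fields
    "tags": ["php", "cms"],       # extra field the schema never mentioned
}
```

Redefining the type later, for example `define_type("article", title=str, body=str, author=str)`, changes what `validate` reports without touching the stored document.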
Durability


●   ACID, damn you!
         Of course you want it to be safe,
         but it might be a trade-off against performance
●   Implicit / Explicit versioning (when desired)
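
The difference between implicit and explicit versioning can be sketched as follows. `VersionedDocument` is a hypothetical in-memory class for illustration, not the behaviour of any particular engine.

```python
from typing import Any, Dict, List


class VersionedDocument:
    """Hypothetical document that keeps its version history in memory."""

    def __init__(self, implicit: bool = True) -> None:
        self.implicit = implicit
        self.current: Dict[str, Any] = {}
        self.history: List[Dict[str, Any]] = []

    def save(self, **changes: Any) -> None:
        """Apply changes; with implicit versioning every save is a new version."""
        self.current.update(changes)
        if self.implicit:
            self.checkpoint()

    def checkpoint(self) -> int:
        """Explicitly freeze the current state as a version; return its number."""
        self.history.append(dict(self.current))  # shallow snapshot
        return len(self.history)
```

With `implicit=False` nothing is recorded until the application calls `checkpoint()`, which is the "when desired" case from the slide.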
Performance / Scalability
●   Maybe not always a concern
●   But should not be your concern beyond checking that it is scalable!
Services
●   Versioning
●   Information retrieval
        –   Rich, complex queries/fetches
        –   Full-text search
        –   References / Relations
●   Access control
        –   Plug-in mechanisms desired
        –   Mapping of domain specific rules (to the CR)
●   Analytics / Semantics
        –   Plugins / Tools
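
One way to picture a plug-in mechanism that maps domain-specific rules onto the repository's access control is a list of rule callbacks. The `AccessControl` class and the example rule below are hypothetical, and the deny-by-default policy is an assumption of this sketch.

```python
from typing import Callable, Dict, List

# A rule sees the user, the action and the content item, and votes True or False.
Rule = Callable[[Dict[str, str], str, Dict[str, str]], bool]


class AccessControl:
    """Hypothetical plug-in point: the repository asks, registered rules answer."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []

    def plug_in(self, rule: Rule) -> Rule:
        """Register a rule; returning it lets plug_in double as a decorator."""
        self.rules.append(rule)
        return rule

    def allowed(self, user: Dict[str, str], action: str, item: Dict[str, str]) -> bool:
        # Deny by default: at least one rule must exist and all must agree.
        return bool(self.rules) and all(rule(user, action, item) for rule in self.rules)


acl = AccessControl()


@acl.plug_in
def editors_write_own_section(user, action, item):
    """Domain rule: anyone may read; writing needs a matching section."""
    return action == "read" or user.get("section") == item.get("section")
```

The application keeps its own vocabulary (sections, editors) and the repository only ever calls `allowed`.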
Challenges
●   Standardisation of APIs
        –   Main API is very specific to the underlying system
        –   CMIS
        –   PHPCR
●   Mobile
        –   Content optimisation
        –   Extra analytics (location, context)
        –   Off-line use
A selection of possible engines to drive a CR
               (in NoSQL land)
CouchDB
●   Content modeling: document oriented, schema free
●   API: RESTful (PHP wrapper @koredn)
●   Scalability: distributed, master-master
●   Robustness: ACID compliant
●   Built-in full text search: no
●   Extra
        –   Off-line use cases
        –   Map / Reduce
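
The map/reduce idea behind such views can be illustrated in plain Python. This is not CouchDB's view API (CouchDB views are normally written in JavaScript and served over HTTP); `run_view` and `map_by_type` are hypothetical helpers that only show the shape of the computation.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple


def map_by_type(doc: Dict[str, Any]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (key, value) pairs for one document."""
    yield doc.get("type", "unknown"), 1


def run_view(
    docs: Iterable[Dict[str, Any]],
    map_fn: Callable[[Dict[str, Any]], Iterable[Tuple[str, int]]],
    reduce_fn: Callable[[List[int]], int],
) -> Dict[str, int]:
    """Group the emitted values by key, then reduce each group to one value."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


docs = [
    {"type": "article", "title": "PHPCR"},
    {"type": "image", "title": "Logo"},
    {"type": "article", "title": "Solr"},
]
counts = run_view(docs, map_by_type, sum)  # number of documents per type
```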
MongoDB
●   Content modeling: document oriented, schema free
●   API: binary protocol, PHP extension available
●   Scalability: Master-Slave, Sharding
●   Robustness: at a cost
●   Built-in full text search: no
●   Extra
        –   Updates in place for fields
        –   Rich (ugly?) query syntax
HBase
●   Content modeling: Google Bigtable clone, column oriented
●   API: Thrift, HTTP
●   Scalability: excellent
●   Robustness: not entirely ACID, but still very good
●   Built-in full text search: no
●   Extra
         –   Built-in versioning
         –   Swallows large blobs easily
Apache Solr
●   Not so much storage (but can be a caching storage layer too)
●   Very rich and powerful information retrieval engine
●   API: HTTP, Java, several PHP wrappers
●   Scalability: very good, and getting even better*
●   Robustness
        –   SolrCloud*
●   Extra: join-like queries*

              * Solr 4.0
Lily

http://www.lilyproject.org/
Lily, “big data” content repository
●   Provides a very rich feature set
●   RESTful, Java API
●   Building on




●   Apache license
Over to Henri …
http://www.slideshare.net/bergie/phpcr-standard-content-repository-for-php
