Implementing a JSR-283 Content Repository in PHP


Session on implementing a JSR-283 Content Repository in PHP presented at the PHP Conference in Québec, Canada in March 2008.

  1. Implementing a JSR-283 Content Repository in PHP
  2. Introduction: Your flight plan
      • About me
      • Some words about the project’s background
      • What is a Content Repository?
      • Why should I use a CR?
      • Why code it ourselves?
      • Inside the TYPO3 CR
      • Where do we stand?
      • Our plans for the future...
      Inspiring people to share
  3. Introduction: About me
      • Born 1977, living (mostly) in Germany
      • Started out with BASIC on a Commodore 128
      • Now a PHP addict, open to other languages as well
      • Active member of the TYPO3 Association
      • Developer with the TYPO3 project
  4. Introduction: Project background
  5. Introduction: About TYPO3
      • One of the leading open-source CMSs
      • Invented by Kasper Skårhøj in 1997
      • Written in PHP, released under the GPL in 2000
      • Now used by small and large companies around the world
      • Hundreds of thousands of websites built with TYPO3
      • Backed by a huge community and the TYPO3 Association
  6. Introduction: The future of TYPO3
      • The current architecture of TYPO3 is becoming outdated
      • We decided to write TYPO3 5.0; it soon became clear that we'd do more than "just write a new CMS"
      • We started with some groundwork, resulting in the FLOW3 framework (more on that in a minute)
      • We decided to use a CR for the new version
      • And of course we still have the ultimate goal to come up with a new TYPO3 CMS...
  7. Introduction: TYPO3 5.0 CMS
      • Successor to TYPO3 v4, which is the result of 10 years of development
      • Start from scratch, but keep the soul of TYPO3
      • Shall provide lower complexity, make use of advanced PHP features, be more (and more easily) extensible, ...
  8. What happened so far: FLOW3
      • Provides an advanced programming framework with support for Dependency Injection / Inversion of Control, Aspect Oriented Programming, Component and Package Management, enhanced Reflection, Caching, MVC and more
  9. What happened so far: FLOW3
      • “Best of breed”: inspired by the most popular frameworks and toolkits from Smalltalk, Python, Ruby and Java
      • Picking the best concepts, skipping the annoyances
      • Not tied to TYPO3 CMS, can be used for any PHP-based project
      • Important to you!? Have a look at the website at
  10. Introduction: About the TYPO3 Association
      • Founded in November 2004 by a group around Kasper Skårhøj
      • Its goals: support TYPO3 development on a more steady basis; improve the transparency and efficiency of various aspects of the TYPO3 project
      • Is funded by members and sponsors
      • Has financed the development of TYPO3 v5 and related projects until now
  11. What is a Content Repository?
  12. What is a CR? Jackrabbit says:
      • A content repository is a hierarchical content store with support for structured and unstructured content
      • In addition to a hierarchically structured storage system, common services of a content repository are versioning, access control, full-text searching, and event monitoring
      • Typical applications that use content repositories include content management, document management, and records management systems
  13. What is a CR? But it is for Java, no?
      • The Java Community Process (JCP) is very efficient, not only when compared to other standardization bodies
      • Java Specification Request (JSR) 170 led to the specification: the Content Repository for Java technology API (JCR) is the result
      • First JSR with a real open source license (Apache-style)
      • The API is defined in Java, but can be ported to other languages
      • No, it’s not only for Java!
  14. What is a CR? Nodes and properties
      • A Content Repository (CR) allows the storage and retrieval of arbitrary content as nodes and properties in a tree structure
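To make the node/property model concrete, here is a minimal in-memory sketch in PHP (8.0 or later assumed). The method names addNode(), setProperty(), getProperty(), getNode() and getPath() echo the JCR vocabulary, but the class is a hypothetical toy: it is not the TYPO3CR code and does not implement the JSR-283 interfaces.

```php
<?php
// Hypothetical toy node tree, loosely modelled on JCR's Node/Property concepts.
final class Node
{
    /** @var array<string, Node> child nodes keyed by name */
    private array $children = [];
    /** @var array<string, mixed> properties keyed by name */
    private array $properties = [];

    public function __construct(private string $name, private ?Node $parent = null)
    {
    }

    public function addNode(string $name): Node
    {
        if (isset($this->children[$name])) {
            throw new InvalidArgumentException("Node '$name' already exists");
        }
        return $this->children[$name] = new Node($name, $this);
    }

    public function setProperty(string $name, mixed $value): void
    {
        $this->properties[$name] = $value;
    }

    public function getProperty(string $name): mixed
    {
        if (!array_key_exists($name, $this->properties)) {
            throw new OutOfBoundsException("No property '$name'");
        }
        return $this->properties[$name];
    }

    /** Resolve a relative path such as "blog/post1" below this node. */
    public function getNode(string $relPath): Node
    {
        $node = $this;
        foreach (explode('/', trim($relPath, '/')) as $segment) {
            if (!isset($node->children[$segment])) {
                throw new OutOfBoundsException("No node at '$relPath'");
            }
            $node = $node->children[$segment];
        }
        return $node;
    }

    /** Absolute path, built by walking up to the root ("/"). */
    public function getPath(): string
    {
        if ($this->parent === null) {
            return '/';
        }
        $parentPath = $this->parent->getPath();
        return ($parentPath === '/' ? '' : $parentPath) . '/' . $this->name;
    }
}

$root = new Node('');                               // root node: empty name, path "/"
$post = $root->addNode('blog')->addNode('post1');
$post->setProperty('title', 'Hello CR');
$post->setProperty('tags', ['php', 'jcr']);        // a multi-valued property

echo $root->getNode('blog/post1')->getProperty('title'), "\n"; // Hello CR
echo $post->getPath(), "\n";                                   // /blog/post1
```

Properties here are untyped PHP values; a real CR would attach a property type and enforce node type definitions.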
  15. What is a CR? Workspaces
      • A repository can contain multiple independent workspaces whose nodes can correspond to each other, allowing comparison
  16. What is a CR? The basics
      • The tree structure can be freely defined by the user of the CR
      • Nodes may be typed with a rigid structure, or free-form
      • The API abstracts the actual data storage used (RDBMS, ODBMS, files, ...)
      • Binary content can be stored and queried as effectively as textual content
      • Export to and import from XML are possible
      • Versioning, locking, transactions, event listeners, ...
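XML export can be illustrated with PHP's DOM extension. The sketch below follows the idea behind JCR's document view (nodes become elements, properties become attributes) but leaves out namespaces, name escaping, node types and multi-valued properties; the array layout of the tree and the function names are assumptions made for this example.

```php
<?php
// Hypothetical document-view-style export: element per node, attribute per property.

/**
 * @param array{name: string, properties?: array<string, string>, children?: list<array>} $node
 */
function appendNode(DOMDocument $doc, DOMNode $parent, array $node): void
{
    $element = $doc->createElement($node['name']);
    foreach ($node['properties'] ?? [] as $name => $value) {
        $element->setAttribute($name, (string) $value);
    }
    foreach ($node['children'] ?? [] as $child) {
        appendNode($doc, $element, $child);   // recurse in document order
    }
    $parent->appendChild($element);
}

function exportDocumentView(array $tree): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    appendNode($doc, $doc, $tree);
    // Serialize only the root element, without the XML declaration.
    return $doc->saveXML($doc->documentElement);
}

$tree = [
    'name' => 'blog',
    'properties' => ['title' => 'My Blog'],
    'children' => [
        ['name' => 'post1', 'properties' => ['title' => 'Hello CR', 'published' => 'true']],
    ],
];

echo exportDocumentView($tree), "\n";
// <blog title="My Blog"><post1 title="Hello CR" published="true"/></blog>
```

Import would be the reverse walk over the DOM; the point is that a tree of nodes and properties maps onto XML with no extra schema.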
  17. Why use a CR?
  18. Why use a CR? Best of both^Wthree worlds...
  19. Why use a CR? Isn’t that convincing?
  20. Why use a CR? From a coder’s perspective
      • One well-designed API instead of different ones
      • Common language and concepts
      • Properties instead of fields give flexibility
      • Learn once, use everywhere
      • Portable code allows easier reuse of existing solutions
      • Rich set of tools
      • No more SQL!
  21. Why use a CR? Summary
      • A content repository provides robust storage for your content, be it text, images or code, structured or unstructured
      • Knowledge and tools can be reused at will
      • A Content Repository (CR) promises to solve a lot of problems
      • A stable standard with a fresh version in the making
      • SQL has been around for 35+ years, CR has “just started”
  22. Why code a CR in PHP?
  23. Why code a CR in PHP?
  24. Why code a CR in PHP? No, really...
      • There are better reasons, of course!
  25. Why code a CR in PHP? Existing implementations
      • Jackrabbit is the reference implementation, available as open source from the Apache Software Foundation
      • Day CRX is the commercial CR implementation from Day Software, the "inventor" of JSR-170
      • Other implementations are eXo JCR and Jeceira (the latter now dead), and others
      • JSR-170 connectors exist for Alfresco, BEA Portal Server, IBM Domino and others
  26. Why code a CR in PHP? PHP ports of the JSR-170/283 API
      • What about PHP?
      • Travis Swicegood ported the JSR-170 API to PHP in 2005; the project is dead
      • A port of the JSR-170 API is available in the Jackrabbit sources, added in 2005, with no relevant changes since then
      • No full port of the JSR-283 API is available today
  27. Why code a CR in PHP? What about using what’s there?
      • We tried to integrate Jackrabbit using the PHP-Java-Bridge
      • (Almost) every call to Jackrabbit needs to be wrapped for type conversion, exception mapping, ...
      • We ran into massive memory issues
      • More complex to set up and maintain
      • A dependency on Java is a no-go (not only) for our PHP-based project
  28. Why code a CR in PHP? Summary
      • Various implementations exist, mostly in Java
      • A CR offers a truckload of advantages, and we want to leverage them
      • No PHP implementation of a CR exists
      • Using existing non-PHP implementations isn’t an alternative
      • We need to build our own CR
  29. TYPO3 Content Repository
  30. The TYPO3 CR: Three truths about the TYPO3 CR
      • The goal is a pure PHP implementation of JSR-283, although functionality needed for TYPO3 CMS has priority over specification compliance for now
      • Will take advantage of the FLOW3 framework, but will not be tied to the TYPO3 CMS
      • Could eventually become the standard CR for the PHP community?!
  31. The TYPO3 CR: Porting the JSR-283 API
      • Issues: typing (some Java types simply do not exist in PHP); constructor overloading is impossible in PHP; binary data (might be FLOW3 Resource Manager handles instead of streams)
      • Interfaces will not be ported up-front, but as we need them
      • A useful by-product of our development process
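The overloading issue is easy to show. Java's JCR API leans on overloading, e.g. `ValueFactory.createValue()` accepts a `String`, `long`, `double`, `boolean` and more, while a PHP class has exactly one constructor. A common PHP workaround, sketched here under the assumption of PHP 8.0+, is a private constructor plus named static factories, optionally fronted by one factory that dispatches on the runtime type. The `Value` class is hypothetical; only its type codes mirror `javax.jcr.PropertyType`.

```php
<?php
// Hypothetical Value type: named factories replace Java's overloaded createValue().
final class Value
{
    // Type codes mirror javax.jcr.PropertyType for the types used here.
    public const STRING = 1;
    public const LONG = 3;
    public const DOUBLE = 4;
    public const BOOLEAN = 6;

    private function __construct(private mixed $value, private int $type)
    {
    }

    public static function fromString(string $value): self
    {
        return new self($value, self::STRING);
    }

    public static function fromLong(int $value): self
    {
        return new self($value, self::LONG);
    }

    public static function fromDouble(float $value): self
    {
        return new self($value, self::DOUBLE);
    }

    public static function fromBoolean(bool $value): self
    {
        return new self($value, self::BOOLEAN);
    }

    /** Single entry point that dispatches on the PHP runtime type. */
    public static function create(mixed $value): self
    {
        return match (true) {
            is_string($value) => self::fromString($value),
            is_int($value) => self::fromLong($value),
            is_float($value) => self::fromDouble($value),
            is_bool($value) => self::fromBoolean($value),
            default => throw new InvalidArgumentException('Unsupported type: ' . get_debug_type($value)),
        };
    }

    public function getType(): int
    {
        return $this->type;
    }

    public function getString(): string
    {
        return match ($this->type) {
            self::BOOLEAN => $this->value ? 'true' : 'false',
            default => (string) $this->value,
        };
    }
}

echo Value::create(42)->getType(), "\n";     // 3
echo Value::create(true)->getString(), "\n"; // true
```

The named factories keep call sites explicit (`Value::fromLong($n)`), while `create()` offers Java-like convenience at the cost of guessing: a PHP `int` is always treated as LONG, never as, say, a DATE timestamp.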
  32. The TYPO3 CR: Development model
      • Based on the FLOW3 framework
      • Domain-Driven Design (will be) used
      • Use of AOP planned to avoid tight internal coupling
      • Test-Driven Development with Continuous Integration
      • Automatic checks against coding guidelines
  33. The TYPO3 CR: Aspect Oriented Programming
      • AOP is a programming paradigm
      • Not a new concept, but still new to PHP
      • Complements OOP by separating concerns to improve modularization
      • OOP modularizes concerns: methods, classes, packages
      • AOP addresses cross-cutting concerns
  34. Aspect Oriented Programming: Cross-cutting concerns
      • Content Repository
      • Domain Model
  35. Aspect Oriented Programming: Cross-cutting concerns
      • Content Repository
      • Domain Model
  36. Aspect Oriented Programming: Cross-cutting concerns
      • Content Repository
      • Domain Model
      • Security
      • Logging
  37. Aspect Oriented Programming: How AOP sounds (some language first)
      • Aspects contain advices that you want to add to your software
      • Pointcuts, expressed by pointcut expressions, define where to add advices to your code
      • Join points are events in the flow of a program, such as calling a method or throwing an exception
      • Targets are the classes and methods being advised by aspects
  38. Aspect Oriented Programming: How AOP works
      • Three steps to AOP use: write the code for the cross-cutting concern; define a pointcut expression telling the framework where to add that code; get some coffee
      • The (hard) work is to identify the cross-cutting concerns and to define the simplest possible pointcut expression
  39. Aspect Oriented Programming: Logging example
      • It might be good to know who deleted the mail archive of the last four years
      • Logging could solve this
      • A logging aspect added at the right places solves this easily
      • Using AOP makes changing the logging a snap and keeps the code clean
  40. Aspect Oriented Programming: Security example
      • It would have been even better to not allow deletion of the mail archive of the last four years...
      • Security is a complex issue; solving this “right, now” seems impossible
      • Using AOP makes changing the security code easier, allows adding security everywhere, anytime, and keeps the code clean
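The AOP vocabulary and the logging and security examples can be imitated in plain PHP with a proxy object. In this hypothetical sketch (not how FLOW3 weaves aspects) the pointcut is a regular expression on method names, each advice is an "around" closure that receives a `$proceed` callback, and every method call through the proxy is a join point. `AopProxy`, `MailArchive` and the user names are invented for illustration; PHP 8.0+ is assumed.

```php
<?php
// Hypothetical proxy-based AOP sketch (not FLOW3's weaving mechanism):
// pointcut = regex on method names, advice = closure, join point = method call.
final class AopProxy
{
    /** @var list<array{0: string, 1: Closure}> pointcut/advice pairs */
    private array $advices = [];

    public function __construct(private object $target)
    {
    }

    public function addAroundAdvice(string $pointcut, Closure $advice): void
    {
        $this->advices[] = [$pointcut, $advice];
    }

    public function __call(string $method, array $args): mixed
    {
        // Innermost step: the original method of the target.
        $proceed = fn (): mixed => $this->target->$method(...$args);
        // Wrap every matching advice around it; the last one registered runs first.
        foreach ($this->advices as [$pointcut, $advice]) {
            if (preg_match($pointcut, $method) === 1) {
                $next = $proceed;
                $proceed = fn (): mixed => $advice($method, $args, $next);
            }
        }
        return $proceed();
    }
}

// Target class: contains no logging or security code at all.
final class MailArchive
{
    public function delete(int $years): string
    {
        return "deleted $years years of mail";
    }
}

$log = [];
$currentUser = 'intern';
$archive = new AopProxy(new MailArchive());

// Security aspect: only "admin" may call methods matching /^delete/.
$archive->addAroundAdvice('/^delete/', function (string $method, array $args, Closure $proceed) use (&$currentUser): mixed {
    if ($currentUser !== 'admin') {
        throw new RuntimeException("$currentUser may not call $method");
    }
    return $proceed();
});
// Logging aspect, registered last so it wraps everything and also records denied calls.
$archive->addAroundAdvice('/.*/', function (string $method, array $args, Closure $proceed) use (&$log, &$currentUser): mixed {
    $log[] = "$currentUser called $method";
    return $proceed();
});

try {
    $archive->delete(4);
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n"; // intern may not call delete
}
$currentUser = 'admin';
echo $archive->delete(4), "\n";  // deleted 4 years of mail
echo implode('; ', $log), "\n";  // intern called delete; admin called delete
```

Changing who may delete, or what gets logged, touches only the two closures; `MailArchive` stays clean, which is the point both slides make.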
  41. The TYPO3 CR: Actual data storage
      • The underlying storage of the TYPO3CR will be an RDBMS in most cases
      • Currently PDO is used to access SQLite: it is easy to use for development and unit testing, and the use of PDO already enables any PDO-supported database
      • Specialized DB connectors will follow, using optimized queries, stored procedures, ...
  42. Actual data storage: Data storage techniques
      • Basically we need to store a simple tree
      • Read access must be fast, as the majority of requests are read requests; write access should be fast too
      • The traditional approach used in TYPO3 today is to store a triplet (uid, pid, sorting), resulting in an adjacency list
      • Alternative and sometimes faster methods: materialized path, nested sets, nested intervals
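The adjacency list used in TYPO3 v4 and the PDO/SQLite setup mentioned above can be combined into a small sketch. It assumes PHP 8 with the pdo_sqlite extension; the table and column names are illustrative, not the TYPO3CR schema. Reading direct children is one query, but collecting a whole subtree needs one query per visited node, which is the read cost the alternative encodings try to avoid.

```php
<?php
// Sketch: TYPO3 v4-style adjacency list (uid, pid, sorting) in an in-memory
// SQLite database via PDO. Requires pdo_sqlite; the schema is illustrative.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE nodes (uid INTEGER PRIMARY KEY, pid INTEGER NOT NULL, sorting INTEGER NOT NULL, name TEXT NOT NULL)');

$insert = $pdo->prepare('INSERT INTO nodes (uid, pid, sorting, name) VALUES (?, ?, ?, ?)');
foreach ([
    [1, 0, 0, 'root'],   // pid 0: no parent
    [2, 1, 1, 'blog'],
    [3, 1, 2, 'about'],
    [4, 2, 1, 'post1'],
    [5, 2, 2, 'post2'],
] as $row) {
    $insert->execute($row);
}

/** Direct children: a single query on pid. */
function childNames(PDO $pdo, int $pid): array
{
    $stmt = $pdo->prepare('SELECT name FROM nodes WHERE pid = ? ORDER BY sorting');
    $stmt->execute([$pid]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

/** Whole subtree in document order: one query per visited node
 *  (a recursive CTE could do it in one, where the database supports it). */
function subtreeNames(PDO $pdo, int $uid): array
{
    $stmt = $pdo->prepare('SELECT uid, name FROM nodes WHERE pid = ? ORDER BY sorting');
    $stmt->execute([$uid]);
    $names = [];
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $child) {
        $names[] = $child['name'];
        array_push($names, ...subtreeNames($pdo, (int) $child['uid']));
    }
    return $names;
}

echo implode(', ', childNames($pdo, 1)), "\n";   // blog, about
echo implode(', ', subtreeNames($pdo, 1)), "\n"; // blog, post1, post2, about
```

Writes, on the other hand, are cheap: inserting a node is a single INSERT with its parent's uid and a sorting value.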
  43. Actual data storage: Nested sets
      • Better suited to how RDBMSs work internally
      • Stores numbers determined by preorder tree traversal
      • Very fast read access, problematic write access: concurrency demands locking, and on average half of all nodes need to be updated on insertion of a new node
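Nested-set numbering and its write cost can be checked on a toy tree. The sketch below assigns left/right values by preorder traversal, answers "all descendants" with one range test, and counts how many rows the usual insert-as-last-child update touches. The array-based tree and all function names are assumptions for illustration; a real implementation would run the commented SQL against a table.

```php
<?php
// Hypothetical nested-set sketch on an in-memory tree (name => list of children).

/** Preorder traversal: left value on the way down, right value on the way up. */
function numberNestedSet(array $tree, string $name, int &$counter, array &$sets): void
{
    $left = $counter++;
    foreach ($tree[$name] ?? [] as $child) {
        numberNestedSet($tree, $child, $counter, $sets);
    }
    $sets[$name] = [$left, $counter++];
}

/** SQL equivalent: SELECT name FROM nodes WHERE lft > :lft AND rgt < :rgt */
function descendants(array $sets, string $name): array
{
    [$left, $right] = $sets[$name];
    $result = array_keys(array_filter(
        $sets,
        fn (array $lr): bool => $lr[0] > $left && $lr[1] < $right
    ));
    sort($result);
    return $result;
}

/**
 * Rows changed when inserting a new last child under $parent (rgt = :r):
 *   UPDATE nodes SET rgt = rgt + 2 WHERE rgt >= :r;
 *   UPDATE nodes SET lft = lft + 2 WHERE lft > :r;
 * Every row with lft > :r also has rgt > :r, so counting rgt >= :r suffices.
 */
function rowsTouchedByInsert(array $sets, string $parent): int
{
    $r = $sets[$parent][1];
    return count(array_filter($sets, fn (array $lr): bool => $lr[1] >= $r));
}

$tree = ['root' => ['a', 'b'], 'a' => ['a1', 'a2'], 'b' => ['b1']];
$counter = 1;
$sets = [];
numberNestedSet($tree, 'root', $counter, $sets);

echo implode(', ', descendants($sets, 'a')), "\n";                 // a1, a2
echo rowsTouchedByInsert($sets, 'a'), ' of ', count($sets), "\n";  // 4 of 6
```

Every ancestor of the new node and everything to its right in preorder gets renumbered, which is why concurrent writers need a lock on the whole table.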
  44. Actual data storage: Speeding up nested sets!?
      • Write access can be sped up by various approaches, like spacing and variable-length indices for the pre/post numbers, or by partitioning the data over more tables
      • Materialized path works like an adjacency list and stores the full path to the node
      • Nested intervals, sometimes considered OMPM: “Obfuscated Materialized Path Method”
      • All methods have their (dis-)advantages
      • Finally: DB-specific tricks change the problem!
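The materialized-path idea fits in a few lines: because each node stores its full path, a subtree is a prefix match, while moving a subtree means rewriting the path of every descendant. The paths and function names below are illustrative assumptions (PHP 8.0+ for `str_starts_with()`), and a real `LIKE` query would also need to escape `%` and `_` occurring in paths.

```php
<?php
// Hypothetical materialized-path sketch: each node is identified by its full path.
$paths = ['/', '/blog', '/blog/post1', '/blog/post2', '/blogroll', '/about'];

/** SQL equivalent: SELECT path FROM nodes WHERE path LIKE '/blog/%' */
function subtree(array $paths, string $path): array
{
    // Append "/" so that "/blog" does not also match "/blogroll".
    $prefix = rtrim($path, '/') . '/';
    return array_values(array_filter(
        $paths,
        fn (string $p): bool => $p !== $path && str_starts_with($p, $prefix)
    ));
}

/** Moving a subtree rewrites the stored path of the node and all its descendants. */
function moveSubtree(array $paths, string $from, string $to): array
{
    $prefix = rtrim($from, '/') . '/';
    return array_map(
        fn (string $p): string => ($p === $from || str_starts_with($p, $prefix))
            ? $to . substr($p, strlen($from))
            : $p,
        $paths
    );
}

echo implode(', ', subtree($paths, '/blog')), "\n"; // /blog/post1, /blog/post2
echo implode(', ', moveSubtree($paths, '/blog', '/about/blog')), "\n";
```

Reads and inserts are cheap here; the cost moves to renames and moves, whose price grows with the size of the subtree.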
  45. The TYPO3 CR: Querying the TYPO3 CR
      • Level 1 methods: using getRootNode() and friends from the API; using XPath queries
      • Optional methods: using SQL queries
      • With JSR-283, XPath will be dropped
  46. Querying the TYPO3 CR: XPath support for TYPO3CR
      • With JSR-283, XPath will be dropped
      • To enable XPath we need an XPath parser and an efficient way to transform an XPath query into SQL for the low-level data structure used
      • The latter is a lot easier when storing the tree as a nested set
      • The problems caused by this have been mentioned already...
  47. XPath support for TYPO3CR: Pre/Post Plane Encoding
      • Stores numbers determined by preorder and postorder tree traversal
      • Allows partitioning the nodes into four regions, as shown for node ƒ
      • Very fast read access, e.g. a single SELECT to query all ancestors of a node ƒ: SELECT * FROM nodes WHERE pre < ƒ.pre AND post > ƒ.post
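The pre/post plane can also be checked on a toy tree. The sketch below computes both ranks in one traversal and then answers the four XPath axes around a context node with nothing but comparisons; the ancestor case is the SELECT shown above, with `f` standing in for ƒ. Tree shape and function names are assumptions for illustration (PHP 8.0+ for `match`).

```php
<?php
// Hypothetical pre/post plane sketch on an in-memory tree (name => list of children).

/** One traversal assigns the preorder rank on entry and the postorder rank on exit. */
function rankPrePost(array $tree, string $name, int &$pre, int &$post, array &$ranks): void
{
    $myPre = $pre++;
    foreach ($tree[$name] ?? [] as $child) {
        rankPrePost($tree, $child, $pre, $post, $ranks);
    }
    $ranks[$name] = ['pre' => $myPre, 'post' => $post++];
}

/** The four regions around a context node; results are in document (preorder) order. */
function axis(array $ranks, string $context, string $axis): array
{
    $c = $ranks[$context];
    $test = match ($axis) {
        // SELECT * FROM nodes WHERE pre < f.pre AND post > f.post
        'ancestor' => fn (array $n): bool => $n['pre'] < $c['pre'] && $n['post'] > $c['post'],
        'descendant' => fn (array $n): bool => $n['pre'] > $c['pre'] && $n['post'] < $c['post'],
        'preceding' => fn (array $n): bool => $n['pre'] < $c['pre'] && $n['post'] < $c['post'],
        'following' => fn (array $n): bool => $n['pre'] > $c['pre'] && $n['post'] > $c['post'],
    };
    $hits = array_filter($ranks, $test);
    uasort($hits, fn (array $a, array $b): int => $a['pre'] <=> $b['pre']);
    return array_keys($hits);
}

$tree = ['root' => ['a', 'f', 'k'], 'a' => ['b'], 'f' => ['g', 'h']];
$pre = 0;
$post = 0;
$ranks = [];
rankPrePost($tree, 'root', $pre, $post, $ranks);

foreach (['ancestor', 'descendant', 'preceding', 'following'] as $name) {
    echo $name, ': ', implode(', ', axis($ranks, 'f', $name)), "\n";
}
// ancestor: root
// descendant: g, h
// preceding: a, b
// following: k
```

Together the four regions cover every node except `f` itself, which is what makes each XPath axis a single indexed range query.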
  48. Querying the TYPO3 CR: SQL support for TYPO3CR
      • Using SQL we need a (simple) SQL parser and an efficient way to transform that SQL into equivalent SQL for the low-level data structure used
      • This still needs to be investigated; possible approaches: storing a reference to the parent node, or using the pre/post plane only as a cache for XPath read queries while optimizing the native storage for SQL read queries
  49. The TYPO3 CR: Extensions to JSR-283
      • A vendor may choose to offer additional features in their CR implementation
      • The TYPO3CR will offer support for persistence through code annotations, automatic node type generation based on class members, and rules for setting up virtual root nodes based on node types
  50. Extensions to JSR-283: Persistence to the CR
      • Annotations define objects and their properties to be persistable
      • Properties are stored in the CR according to reflection results and hints from annotations
      • The FLOW3 persistence manager is transparently enhanced by the CR persistence mechanism
      • An object-to-object mapper does the hard work
  51. Extensions to JSR-283: Automatic node type generation
      • Persistence stores properties in the CR according to reflection results and hints from annotations
      • Node types can be generated automatically if wanted: manually adding content cannot break the needed structure, browsing the repository reveals a clear structure, and using content from other applications is less error-prone
      • Maybe this is utter nonsense; it depends on whom you ask :)
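Annotation-driven persistence and automatic node type generation can both be built on PHP's Reflection API. In this hypothetical sketch a `@persistable` tag in doc comments (the tag name is an assumption; FLOW3's actual annotations are not reproduced here) marks the class and the properties to store. The node type is derived from the declared property types, and an object is then mapped onto node properties. PHP 8.0+ is assumed.

```php
<?php
// Hypothetical sketch: "@persistable" doc-comment tags (an assumed name) drive
// node type generation and object-to-node mapping via PHP reflection.

/** @persistable */
final class BlogPost
{
    /** @persistable */
    public string $title;
    /** @persistable */
    public int $views;
    public string $cachedHtml = ''; // no tag: never written to the repository

    public function __construct(string $title, int $views)
    {
        $this->title = $title;
        $this->views = $views;
    }
}

function isPersistable(ReflectionClass|ReflectionProperty $reflection): bool
{
    // getDocComment() returns false when there is no doc comment.
    return str_contains((string) $reflection->getDocComment(), '@persistable');
}

/** Node type = class short name plus the declared types of tagged properties. */
function generateNodeType(string $class): array
{
    $reflection = new ReflectionClass($class);
    if (!isPersistable($reflection)) {
        throw new InvalidArgumentException("$class is not persistable");
    }
    $properties = [];
    foreach ($reflection->getProperties() as $property) {
        if (isPersistable($property)) {
            $properties[$property->getName()] = (string) $property->getType();
        }
    }
    return ['name' => $reflection->getShortName(), 'properties' => $properties];
}

/** Object-to-node mapping: copy the tagged property values into a property array. */
function toNodeProperties(object $object): array
{
    $values = [];
    foreach (array_keys(generateNodeType($object::class)['properties']) as $name) {
        $values[$name] = (new ReflectionProperty($object, $name))->getValue($object);
    }
    return $values;
}

var_export(generateNodeType(BlogPost::class));
echo "\n";
var_export(toNodeProperties(new BlogPost('Hello CR', 3)));
echo "\n";
```

Because the node type is derived from the class, content added by hand or by another application can be checked against the same definition the PHP code uses.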
  52. Extensions to JSR-283: Virtual root nodes
      • The repository has one root node; added nodes must be placed somewhere
      • It might be useful to find all nodes under a common node, depending on type or other attributes
      • Such a virtual root node is like a smart folder or playlist, or like a view in an RDBMS
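A virtual root node behaves like a stored query: its "children" are whatever nodes currently satisfy a criterion. The class below is a hypothetical sketch with nodes represented as plain arrays (PHP 8.0+ assumed); it only illustrates the smart-folder/view idea, not how the TYPO3CR will define such rules.

```php
<?php
// Hypothetical sketch: a virtual root node is a named, stored criterion that is
// evaluated on demand, like a smart folder, a playlist or an SQL view.
final class VirtualRootNode
{
    public function __construct(private string $name, private Closure $criterion)
    {
    }

    public function getName(): string
    {
        return $this->name;
    }

    /** The "children" are whatever nodes match right now; nothing is copied or moved. */
    public function getNodes(array $allNodes): array
    {
        return array_values(array_filter($allNodes, $this->criterion));
    }
}

// Nodes as plain arrays; in a real CR they would live anywhere in the tree.
$nodes = [
    ['path' => '/blog/post1', 'type' => 'BlogPost', 'author' => 'alice'],
    ['path' => '/blog/post2', 'type' => 'BlogPost', 'author' => 'bob'],
    ['path' => '/pages/about', 'type' => 'Page', 'author' => 'alice'],
];

$allPosts = new VirtualRootNode('allPosts', fn (array $n): bool => $n['type'] === 'BlogPost');
$byAlice = new VirtualRootNode('byAlice', fn (array $n): bool => $n['author'] === 'alice');

echo implode(', ', array_column($allPosts->getNodes($nodes), 'path')), "\n"; // /blog/post1, /blog/post2
echo implode(', ', array_column($byAlice->getNodes($nodes), 'path')), "\n";  // /blog/post1, /pages/about
```

A node can appear under several virtual roots at once, while it still has exactly one place in the real tree.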
  53. The TYPO3 CR: Current status
      • Currently the code supports a subset of the required features of levels 1 & 2 and the optional parts of the JSR-283 specification: basic read & write access, namespace registration, node type discovery and registration
      • Data storage uses the naive approach known from TYPO3 v4
      • Have a look at the Subversion repository for up-to-date information
  54. The TYPO3 CR: Future plans
      • Write test, code, test; write test, code, test; ...
  55. The TYPO3 CR: Summary
      • Implementing the specification is not an easy task, but doable
      • For the various parts a lot of research has already been done
      • 2008 will see full-time development on the TYPO3 CR
      • The repository is a major improvement over currently widespread ways of storing data
      • The whole PHP community could^Wwill benefit!
  56. So long and thanks for the fish: Links
      • TYPO3 Website
      • TYPO3 Development Website
      • FLOW3 Website
      • TYPO3 5.0 Subsite
  57. So long and thanks for the fish: Questions?
      Inspiring people to share beer