File system on steroids: an introduction to JCR
Jukka Zitting, Apache Jackrabbit

Agenda
• Big picture
• Content repository
• Repository features
• Apache Jackrabbit

The Big Picture
• User interface
• Processing
• Storage

Our Focus: Storage
• Main requirements: persistence, consistency, scalability, performance
• Main alternatives: file system, database, network

Introducing the Content Repository
[Diagram: file system, database, and content repository, with the features read, write, transactions, structured, integrity, query, hierarchical, streams, access control, locking, observation, versioning, full text, unstructured]

JCR, JSR 170, JSR 283
• Content Repository for Java Technology API
• Not just the Java API, but also the content repository semantics
  • Just as POSIX defines file system semantics through a C API
• Accessible from other environments
  • JVM: Groovy, JRuby, Scala, etc.
  • Network: WebDAV, Ajax (JSON)
  • Ports planned: .NET, PHP

Why Something New?
• Goal: a single API for all storage
  • Universal access
  • No content silos
• Existing systems don't cover all needs
  • Reiser: “Storage layers above the FS: A sure symptom the FS developer has failed”
• Solution: the content repository

Content Repository Semantics
• Everything is content
  • Hierarchy of named and typed nodes
  • Content in named and typed properties
• Superset of file system semantics
  • Can be used to store files and folders, and more
  • Can be mounted as a file system
• With many database semantics

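To make the node and property model concrete, here is a minimal stdlib-only Java sketch of a tree of named, typed nodes carrying named properties. It is not the javax.jcr API: the classes `ContentTree` and `ContentTree.Node` and the type labels "folder" and "document" are hypothetical. The method names only echo the JCR style (`Node.addNode`, `Node.setProperty`, `Node.getNode`), and real JCR node types such as nt:folder and nt:file enforce rules this sketch ignores.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical in-memory model of repository content: a tree of named,
 * typed nodes, each holding named properties. Not the javax.jcr API.
 */
final class ContentTree {

    static final class Node {
        final String name;
        final String type; // only a label in this sketch
        final Map<String, Node> children = new LinkedHashMap<>();
        final Map<String, Object> properties = new LinkedHashMap<>();

        Node(String name, String type) {
            this.name = name;
            this.type = type;
        }

        /** Adds a named, typed child node, rejecting duplicate names. */
        Node addNode(String childName, String childType) {
            if (children.containsKey(childName)) {
                throw new IllegalArgumentException("Node exists: " + childName);
            }
            Node child = new Node(childName, childType);
            children.put(childName, child);
            return child;
        }

        /** Sets a named property; the Java type of the value stands in for the property type. */
        void setProperty(String propName, Object value) {
            properties.put(propName, value);
        }

        /** Resolves a relative path such as "docs/report.txt". */
        Node getNode(String relPath) {
            Node current = this;
            for (String segment : relPath.split("/")) {
                current = current.children.get(segment);
                if (current == null) {
                    throw new IllegalArgumentException("No such node: " + relPath);
                }
            }
            return current;
        }
    }

    public static void main(String[] args) {
        Node root = new Node("", "root");
        Node docs = root.addNode("docs", "folder");
        Node report = docs.addNode("report.txt", "document");
        report.setProperty("title", "Quarterly report"); // string property
        report.setProperty("pages", 12L);                // long property
        System.out.println(root.getNode("docs/report.txt").properties);
        // prints {title=Quarterly report, pages=12}
    }
}
```

The same tree holds file-like content (a "folder" containing a "document") and record-like content (typed properties on a node), which is the superset the slide describes.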
Granularity of Content
Granularity of Content, 1/2
• File systems typically work best with coarse-grained content
  • Small files in ReiserFS, NTFS, etc.
  • Extended properties in many systems
• XML & co. for fine-grained content
  • DJB: “Don't parse”

Granularity of Content, 2/2
• Databases work best with fine-grained content
  • Blobs are becoming better supported
  • Often with special limitations for search, access, etc.
• Content repository: a uniform interface for both stream and scalar properties

Structure vs. Flexibility
Structure vs. Flexibility
• File systems have no constraints
  • Any file or directory can go anywhere
  • Naming conventions and access control
• Databases have nothing but constraints
  • Structure of content is predefined
• Content repository: both structured and unstructured content

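The contrast between unconstrained and predefined content can be sketched as a node type that either lists its allowed properties or accepts anything. This is a hypothetical model (`TypedContent`, `NodeType`, and the "person" type are illustrative names), not the JCR node type system; in JCR, a type such as nt:unstructured plays the permissive role, while other node types declare property and child node definitions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: a node type either constrains which properties a node
 * may have (structured) or accepts anything (unstructured). Not the javax.jcr API.
 */
final class TypedContent {

    /** A node type; a null allowedProperties set means "unstructured". */
    record NodeType(String name, Set<String> allowedProperties) {
        boolean allows(String property) {
            return allowedProperties == null || allowedProperties.contains(property);
        }
    }

    static final class Node {
        final NodeType type;
        final Map<String, Object> properties = new HashMap<>();

        Node(NodeType type) {
            this.type = type;
        }

        /** Stores the property only if the node's type permits it. */
        void setProperty(String name, Object value) {
            if (!type.allows(name)) {
                throw new IllegalArgumentException(
                    type.name() + " does not allow property " + name);
            }
            properties.put(name, value);
        }
    }

    static final NodeType UNSTRUCTURED = new NodeType("unstructured", null);
    static final NodeType PERSON = new NodeType("person", Set.of("name", "email"));

    public static void main(String[] args) {
        Node free = new Node(UNSTRUCTURED);
        free.setProperty("anything", 42); // accepted: no constraints

        Node person = new Node(PERSON);
        person.setProperty("name", "Ada");
        try {
            person.setProperty("shoeSize", 38); // rejected by the type
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // person does not allow property shoeSize
        }
    }
}
```

Both kinds of node live side by side in one tree, so an application can start unstructured and tighten the rules later by switching types.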
Search
Search
• Traditionally no search in file systems
• Custom indexers and search APIs
  • Google Desktop Search
  • Mac OS X Spotlight
  • Lucene in many applications
• Content repository: built-in search with full-text indexing

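Built-in full-text indexing essentially means the repository maintains an inverted index as content is written, instead of relying on an external indexer. A minimal sketch under that assumption, using a hypothetical `FullTextIndex` class that maps lowercase words to node paths and answers queries that must match all words. Jackrabbit builds its search index on Lucene, which adds analyzers and ranking; through the JCR API such searches go through the `QueryManager`, for example with the XPath function `jcr:contains`.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of an inverted index from words to node paths.
 * Not the javax.jcr API and not Jackrabbit's Lucene-based index.
 */
final class FullTextIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Tokenizes the text of a property stored at the given node path. */
    void index(String path, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(path);
            }
        }
    }

    /** Returns the paths whose indexed text contains every query word. */
    Set<String> search(String query) {
        Set<String> result = null;
        for (String token : query.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            Set<String> hits = postings.getOrDefault(token, Set.of());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits); // AND semantics across words
            }
        }
        return result == null ? Set.of() : result;
    }

    public static void main(String[] args) {
        FullTextIndex idx = new FullTextIndex();
        idx.index("/docs/a", "Content repository semantics");
        idx.index("/docs/b", "File system semantics");
        System.out.println(idx.search("semantics"));            // [/docs/a, /docs/b]
        System.out.println(idx.search("repository semantics")); // [/docs/a]
    }
}
```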
Transactions
Transactions
• File systems have limited support for atomic updates
  • The copy-and-move trick
  • No transactions that cover multiple changes
  • Journaling is internal to the system
• Content repository: change sets, distributed transactions

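A change set can be pictured as a buffer of pending edits that are applied together. The sketch below assumes a single-threaded caller and a plain `Map` as the persistent store; `ChangeSetSession` is hypothetical, loosely modeled on JCR's `Session.save()` (persist all pending changes) and `Session.refresh(false)` (discard them). A real repository also makes the save atomic and durable and can take part in distributed transactions, none of which this sketch attempts.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical single-threaded sketch of change sets: edits are buffered in a
 * session and reach the shared store only when save() applies them together.
 * Not the javax.jcr API.
 */
final class ChangeSetSession {
    private final Map<String, String> store;                           // persistent state
    private final Map<String, String> pending = new LinkedHashMap<>(); // transient changes

    ChangeSetSession(Map<String, String> store) {
        this.store = store;
    }

    /** Records a change to a property path; nothing is persisted yet. */
    void setProperty(String path, String value) {
        pending.put(path, value);
    }

    /** Reads through the session: pending changes win over stored values. */
    String getProperty(String path) {
        return pending.containsKey(path) ? pending.get(path) : store.get(path);
    }

    /** Applies the whole change set to the store, then clears it. */
    void save() {
        store.putAll(pending);
        pending.clear();
    }

    /** Discards all pending changes. */
    void refresh() {
        pending.clear();
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        ChangeSetSession session = new ChangeSetSession(store);
        session.setProperty("/docs/a/title", "Draft");
        session.setProperty("/docs/a/status", "review");
        System.out.println(store.isEmpty()); // true: nothing saved yet
        session.save();
        System.out.println(store.size());    // 2: both changes applied together
    }
}
```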
Versioning
Versioning
• Typically no tracking of previous versions of content
  • Snapshots in ZFS & co.
  • Version control systems
  • Backups for archival vs. restore purposes
  • Mac OS X Time Machine
• Content repository: built-in versioning

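Built-in versioning can be reduced to two operations: check in an immutable snapshot, and check out to allow further edits. A minimal sketch with a hypothetical `VersionedNode` class and a linear history numbered from 1; in JCR 1.0 the corresponding features are the mix:versionable mixin together with `Node.checkin()`, `Node.checkout()`, and `Node.restore(...)`, which this sketch only imitates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical sketch of built-in versioning: checkin() freezes a copy of the
 * node's properties into a history and makes the node read-only until
 * checkout(). Not the javax.jcr API.
 */
final class VersionedNode {
    private final Map<String, String> properties = new HashMap<>();
    private final List<Map<String, String>> history = new ArrayList<>();
    private boolean checkedOut = true;

    void setProperty(String name, String value) {
        if (!checkedOut) {
            throw new IllegalStateException("Node is checked in; call checkout() first");
        }
        properties.put(name, Objects.requireNonNull(value));
    }

    String getProperty(String name) {
        return properties.get(name);
    }

    /** Stores an immutable snapshot and returns its 1-based version number. */
    int checkin() {
        history.add(Map.copyOf(properties));
        checkedOut = false;
        return history.size();
    }

    /** Makes the node editable again. */
    void checkout() {
        checkedOut = true;
    }

    /** Replaces the current properties with an earlier snapshot; the node stays checked in. */
    void restore(int version) {
        properties.clear();
        properties.putAll(history.get(version - 1));
        checkedOut = false;
    }

    public static void main(String[] args) {
        VersionedNode doc = new VersionedNode();
        doc.setProperty("title", "First draft");
        int v1 = doc.checkin();
        doc.checkout();
        doc.setProperty("title", "Second draft");
        doc.checkin();
        doc.restore(v1);
        System.out.println(doc.getProperty("title")); // First draft
    }
}
```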
Observation
Observation
• File system change monitoring
  • File Alteration Monitor
  • Polling
  • Event APIs
• Triggers in databases
• Content repository: standard observation API

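A standard observation API replaces polling with callbacks registered for part of the content tree. The sketch below is hypothetical (`ObservedStore` and `addListener` are illustrative names); in JCR the counterpart is `ObservationManager.addEventListener(...)`, which takes, among other arguments, an absolute path and an `isDeep` flag. For simplicity this sketch notifies listeners synchronously on every write, whereas a repository typically reports changes once they are saved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of observation: listeners subscribe to a node path,
 * optionally including its whole subtree, and receive the path of every
 * matching property change. Not the javax.jcr API.
 */
final class ObservedStore {

    /** A listener registered for a node path; deep means "include descendants". */
    private record Subscription(String nodePath, boolean deep, Consumer<String> listener) {
        boolean matches(String propertyPath) {
            int cut = propertyPath.lastIndexOf('/');
            String parent = cut <= 0 ? "/" : propertyPath.substring(0, cut);
            if (!deep) {
                return parent.equals(nodePath); // only properties of this node
            }
            String prefix = nodePath.endsWith("/") ? nodePath : nodePath + "/";
            return propertyPath.startsWith(prefix);
        }
    }

    private final Map<String, String> properties = new HashMap<>();
    private final List<Subscription> subscriptions = new ArrayList<>();

    void addListener(String nodePath, boolean deep, Consumer<String> listener) {
        subscriptions.add(new Subscription(nodePath, deep, listener));
    }

    /** Changes a property and notifies every matching listener with its path. */
    void setProperty(String propertyPath, String value) {
        properties.put(propertyPath, value);
        for (Subscription s : subscriptions) {
            if (s.matches(propertyPath)) {
                s.listener().accept(propertyPath);
            }
        }
    }

    public static void main(String[] args) {
        ObservedStore store = new ObservedStore();
        store.addListener("/docs", true, p -> System.out.println("changed: " + p));
        store.setProperty("/docs/a/title", "Hello"); // prints "changed: /docs/a/title"
        store.setProperty("/other/x", "ignored");    // no listener matches
    }
}
```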
Apache Jackrabbit
Apache Jackrabbit
• Fully featured JCR content repository
• Releases
  • 1.0 in 2006
  • 1.4 available since January 2008
  • 1.5 (with explorer) planned for Q2
  • 2.0 (with JCR 2.0) planned for 2008
• Focus on conformance and flexibility

Image credits
Images from the morgueFile archive, used as licensed:
• http://morguefile.com/archive/?display=96733, Infographe_Elle
• http://morguefile.com/archive/?display=81906, msxo
• http://morguefile.com/archive/?display=132988, imelenchon
• http://morguefile.com/archive/?display=95446, ronnieb
• http://morguefile.com/archive/?display=175657, seriousfun
• http://morguefile.com/archive/?display=135511, rollingroscoe
• http://morguefile.com/archive/?display=134540, cohdra
• http://morguefile.com/archive/?display=196920, penywise
• http://morguefile.com/archive/?display=48096, bluekdesign
• http://morguefile.com/archive/?display=128133, gracey

Thank you! Questions / Comments?