
File System On Steroids



Presentation at ApacheCon EU 2008 in Amsterdam



  1. File system on steroids: an introduction to JCR <ul><ul><li>Jukka Zitting </li></ul></ul><ul><ul><li>Apache Jackrabbit </li></ul></ul>
  2. Agenda <ul><li>Big Picture </li></ul><ul><li>Content Repository </li></ul><ul><li>Repository Features </li></ul><ul><li>Apache Jackrabbit </li></ul>
  3. The Big Picture (diagram: User Interface, Processing, Storage)
  4. Our Focus: Storage <ul><li>Main requirements </li></ul><ul><ul><li>Persistence </li></ul></ul><ul><ul><li>Consistency </li></ul></ul><ul><ul><li>Scalability </li></ul></ul><ul><ul><li>Performance </li></ul></ul><ul><li>Main alternatives </li></ul><ul><ul><li>File system </li></ul></ul><ul><ul><li>Database </li></ul></ul><ul><ul><li>Network </li></ul></ul>
  5. Introducing The Content Repository (diagram comparing File system, Database and Content Repository; features shown: read, write, transactions, structured, integrity, query, hierarchical, streams, access control, locking, observation, versioning, full text, unstructured)
  6. JCR, JSR 170, JSR 283 <ul><li>Content Repository for Java Technology API </li></ul><ul><ul><li>Not just the Java API, but also the content repository semantics </li></ul></ul><ul><ul><li>Compare: POSIX defines file system semantics as a C API </li></ul></ul><ul><li>Accessible from other environments </li></ul><ul><ul><li>JVM: Groovy, JRuby, Scala, etc. </li></ul></ul><ul><ul><li>Network: WebDAV, Ajax (JSON) </li></ul></ul><ul><ul><li>Ports planned: .NET, PHP </li></ul></ul>
  7. Why Something New? <ul><li>Goal: Single API for all storage </li></ul><ul><ul><li>Universal access </li></ul></ul><ul><ul><li>No content silos </li></ul></ul><ul><li>Existing systems don't cover all needs </li></ul><ul><ul><li>Reiser: “Storage layers above the FS: A sure symptom the FS developer has failed” </li></ul></ul><ul><li>Solution: Content repository </li></ul>
  8. Content Repository Semantics <ul><li>Everything is content </li></ul><ul><ul><li>Hierarchy of named and typed nodes </li></ul></ul><ul><ul><li>Content in named and typed properties </li></ul></ul><ul><li>Superset of file system semantics </li></ul><ul><ul><li>Can be used to store files and folders, and more </li></ul></ul><ul><ul><li>Can be mounted as a file system </li></ul></ul><ul><li>With many database semantics </li></ul>
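To make the "hierarchy of named and typed nodes, content in named and typed properties" model concrete, here is a minimal in-memory sketch. The class `ContentNode` and its methods are hypothetical and only loosely echo the shape of JCR; the real API lives in `javax.jcr` and is not used here. The Java class of each property value stands in for the property type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified content model (not the javax.jcr API):
// every node has a name, a type, named child nodes and named properties.
class ContentNode {
    private final String name;
    private final String type;  // e.g. "folder" or "file"; free-form in this sketch
    private final Map<String, ContentNode> children = new LinkedHashMap<>();
    private final Map<String, Object> properties = new LinkedHashMap<>();

    ContentNode(String name, String type) {
        this.name = name;
        this.type = type;
    }

    String getName() {
        return name;
    }

    String getType() {
        return type;
    }

    // Adds a named, typed child node and returns it; names are unique per parent.
    ContentNode addNode(String childName, String childType) {
        if (children.containsKey(childName)) {
            throw new IllegalArgumentException("Node already exists: " + childName);
        }
        ContentNode child = new ContentNode(childName, childType);
        children.put(childName, child);
        return child;
    }

    // The Java class of the value acts as the property type here.
    void setProperty(String propertyName, Object value) {
        properties.put(propertyName, value);
    }

    Object getProperty(String propertyName) {
        return properties.get(propertyName);
    }

    // Resolves a relative path such as "docs/report.txt" against this node.
    ContentNode getNode(String relativePath) {
        ContentNode current = this;
        for (String segment : relativePath.split("/")) {
            if (segment.isEmpty()) {
                continue;
            }
            current = current.children.get(segment);
            if (current == null) {
                throw new IllegalArgumentException("No such node: " + relativePath);
            }
        }
        return current;
    }
}
```

A folder is then just a node with child nodes, and a file is a node whose content sits in properties, which is how the model stays a superset of file system semantics.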
  9. Granularity of Content
  10. Granularity of Content, 1/2 <ul><li>File systems are typically best with coarse-grained content </li></ul><ul><ul><li>Small files in ReiserFS, NTFS, etc. </li></ul></ul><ul><ul><li>Extended properties in many systems </li></ul></ul><ul><li>XML & co. for fine-grained content </li></ul><ul><ul><li>DJB: “Don't parse” </li></ul></ul>
  11. Granularity of Content, 2/2 <ul><li>Databases are best with fine-grained content </li></ul><ul><ul><li>Blobs are becoming better supported </li></ul></ul><ul><ul><li>Often with special limitations on search, access, etc. </li></ul></ul><ul><li>Content repository: Uniform interface for both stream and scalar properties </li></ul>
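A "uniform interface for both stream and scalar properties" can be sketched as a single value type that answers both `getString()` and `getStream()`, whatever it holds. This is a hypothetical illustration (the `PropertyValue` class is invented here); JCR's own `Value` interface is similar in spirit, but this code does not use it. UTF-8 is assumed as the conversion between text and bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical property value: holds either a scalar (kept as a string here)
// or a binary stream, and exposes both through the same two accessors.
class PropertyValue {
    private final String scalar;                 // null for binary values
    private final Supplier<InputStream> binary;  // null for scalar values

    private PropertyValue(String scalar, Supplier<InputStream> binary) {
        this.scalar = scalar;
        this.binary = binary;
    }

    static PropertyValue ofString(String value) {
        return new PropertyValue(Objects.requireNonNull(value), null);
    }

    // The supplier opens a fresh stream on each call, so large content is
    // not held in memory by the value itself.
    static PropertyValue ofBinary(Supplier<InputStream> source) {
        return new PropertyValue(null, Objects.requireNonNull(source));
    }

    boolean isBinary() {
        return binary != null;
    }

    // Any value can be read as a stream; scalars are encoded as UTF-8 bytes.
    InputStream getStream() {
        if (binary != null) {
            return binary.get();
        }
        return new ByteArrayInputStream(scalar.getBytes(StandardCharsets.UTF_8));
    }

    // Any value can be read as a string; binaries are decoded as UTF-8.
    String getString() throws IOException {
        if (scalar != null) {
            return scalar;
        }
        try (InputStream in = binary.get()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Callers never need to know whether a property is a small scalar or a large blob; only the cost of reading it differs.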
  12. Structure vs. Flexibility
  13. Structure vs. Flexibility <ul><li>File systems have no constraints </li></ul><ul><ul><li>Any file or directory can go anywhere </li></ul></ul><ul><ul><li>Naming conventions and access control </li></ul></ul><ul><li>Databases have nothing but constraints </li></ul><ul><ul><li>Structure of content is predefined </li></ul></ul><ul><li>Content repository: Both structured and unstructured content </li></ul>
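The "both structured and unstructured content" point can be illustrated with a hypothetical node type check: a structured type declares which properties must and may appear, while an unstructured type also accepts any other name. JCR expresses this through node type definitions (its `nt:unstructured` type allows arbitrary properties via catch-all "residual" definitions), but the `NodeTypeDef` class below is an invented simplification, not that mechanism.

```java
import java.util.Set;

// Hypothetical node type definition. A structured type lists its required and
// optional properties; an unstructured type additionally accepts any other name.
class NodeTypeDef {
    private final String name;
    private final Set<String> requiredProperties;
    private final Set<String> optionalProperties;
    private final boolean allowsOtherProperties;

    NodeTypeDef(String name, Set<String> required, Set<String> optional,
                boolean allowsOtherProperties) {
        this.name = name;
        this.requiredProperties = Set.copyOf(required);
        this.optionalProperties = Set.copyOf(optional);
        this.allowsOtherProperties = allowsOtherProperties;
    }

    String getName() {
        return name;
    }

    // Checks a node's property names against this type's constraints.
    boolean accepts(Set<String> propertyNames) {
        // Every required property must be present.
        if (!propertyNames.containsAll(requiredProperties)) {
            return false;
        }
        // Unstructured types accept anything beyond the required set.
        if (allowsOtherProperties) {
            return true;
        }
        // Structured types reject names they do not declare.
        for (String propertyName : propertyNames) {
            if (!requiredProperties.contains(propertyName)
                    && !optionalProperties.contains(propertyName)) {
                return false;
            }
        }
        return true;
    }
}
```

The same repository can then hold database-like structured nodes next to file-system-like free-form ones, with the choice made per node type.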
  14. Search
  15. Search <ul><li>Traditionally no search in file systems </li></ul><ul><li>Custom indexers and search APIs </li></ul><ul><ul><li>Google Desktop Search </li></ul></ul><ul><ul><li>Mac OS X Spotlight </li></ul></ul><ul><ul><li>Lucene in many applications </li></ul></ul><ul><li>Content repository: Built-in search with full-text indexing </li></ul>
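At its core, full-text indexing maps each word to the items that contain it (an inverted index). The sketch below is a deliberately tiny, hypothetical `FullTextIndex`; engines such as Lucene add analyzers, ranking and persistent index files, none of which is modeled here. Tokenizing on non-letter/digit characters and AND-ing query terms are assumptions of this sketch, and re-indexing a path does not remove its old words.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Minimal inverted index: each lower-cased word maps to the set of paths
// whose text contains it.
class FullTextIndex {
    private static final String SEPARATORS = "[^\\p{L}\\p{N}]+";
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Splits the text into words and records the path under each word.
    void index(String path, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split(SEPARATORS)) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(path);
            }
        }
    }

    // Returns the paths that contain every word of the query (AND semantics).
    Set<String> search(String query) {
        Set<String> result = null;
        for (String token : query.toLowerCase(Locale.ROOT).split(SEPARATORS)) {
            if (token.isEmpty()) {
                continue;
            }
            Set<String> hits = postings.getOrDefault(token, Set.of());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? Set.of() : result;
    }
}
```

Keeping an index like this in sync is exactly the work that custom desktop indexers do from the outside; a repository with built-in search can update it as part of each write.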
  16. Transactions
  17. Transactions <ul><li>File systems have limited support for atomic updates </li></ul><ul><ul><li>The copy-and-move trick </li></ul></ul><ul><li>No transactions that cover multiple changes </li></ul><ul><ul><li>Journaling is internal to the system </li></ul></ul><ul><li>Content repository: Change sets, distributed transactions </li></ul>
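The copy-and-move trick mentioned above writes new content to a temporary file and then renames it over the original in one step, so readers see either the old or the complete new file. A minimal sketch with `java.nio.file` follows; the class `AtomicWrite` is hypothetical. Note the limits that motivate real transactions: it covers a single file only, and it does not force data to disk (no fsync).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// The copy-and-move trick for atomically replacing one file's content.
class AtomicWrite {
    static void replaceContent(Path target, String content) throws IOException {
        // Create the temp file next to the target: an atomic rename only
        // works within a single file system.
        Path dir = target.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, ".tmp-", null);
        try {
            Files.write(temp, content.getBytes(StandardCharsets.UTF_8));
            // ATOMIC_MOVE throws AtomicMoveNotSupportedException rather than
            // silently falling back to copy-and-delete. Whether an existing
            // target is replaced is implementation specific per the Javadoc;
            // common platforms (POSIX rename, Windows MoveFileEx) replace it.
            Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            Files.deleteIfExists(temp);  // do not leave half-written temp files behind
            throw e;
        }
    }
}
```

A second file updated the same way is a separate, independent step; if the process dies between the two moves, the files are left mutually inconsistent, which is the gap that multi-change transactions close.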
  18. Versioning
  19. Versioning <ul><li>Typically no tracking of previous versions of content </li></ul><ul><ul><li>Snapshots in ZFS & co. </li></ul></ul><ul><ul><li>Version control systems </li></ul></ul><ul><li>Backups for archival vs. restore purposes </li></ul><ul><ul><li>Mac OS X Time Machine </li></ul></ul><ul><li>Content repository: Built-in versioning </li></ul>
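Built-in versioning boils down to freezing a copy of an item's state on demand and being able to bring it back later. The hypothetical `VersionedItem` below borrows the check-in/restore vocabulary that JCR uses, but it is an invented in-memory sketch, not the JCR versioning API; string-valued properties and 1-based version numbers are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical versionable item: check-in freezes a snapshot of the current
// properties into the history; restore copies a snapshot back.
class VersionedItem {
    private final Map<String, String> current = new LinkedHashMap<>();
    private final List<Map<String, String>> history = new ArrayList<>();

    void set(String name, String value) {
        current.put(name, Objects.requireNonNull(value));
    }

    String get(String name) {
        return current.get(name);
    }

    // Stores an immutable copy of the current state and returns its
    // version number (1-based).
    int checkin() {
        history.add(Map.copyOf(current));
        return history.size();
    }

    // Replaces the current state with an earlier version; history is kept.
    void restore(int version) {
        if (version < 1 || version > history.size()) {
            throw new IllegalArgumentException("No such version: " + version);
        }
        current.clear();
        current.putAll(history.get(version - 1));
    }

    int versionCount() {
        return history.size();
    }
}
```

Unlike a backup, the history lives next to the content and is addressed per item, so restoring one document does not roll back anything else.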
  20. Observation
  21. Observation <ul><li>File system change monitoring </li></ul><ul><ul><li>File Alteration Monitor </li></ul></ul><ul><ul><li>Polling </li></ul></ul><ul><ul><li>Event APIs </li></ul></ul><ul><li>Triggers in databases </li></ul><ul><li>Content repository: Standard observation API </li></ul>
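Of the change-monitoring options listed above, polling is the simplest: snapshot a directory's entries with their modification times, then compare two snapshots. The sketch below (hypothetical class `DirectoryPoller`) shows the idea and its weakness: every check rescans the directory, and changes within the timestamp resolution can be missed. Event APIs such as Java's `WatchService` avoid the rescanning; a content repository's observation API delivers such events directly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

// Polling-based change detection for the direct entries of one directory.
class DirectoryPoller {
    // Records each entry's file name and last-modified time.
    static Map<Path, FileTime> snapshot(Path dir) throws IOException {
        Map<Path, FileTime> result = new HashMap<>();
        try (Stream<Path> entries = Files.list(dir)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                result.put(p.getFileName(), Files.getLastModifiedTime(p));
            }
        }
        return result;
    }

    // Describes each difference as "created x", "modified x" or "deleted x".
    static Set<String> diff(Map<Path, FileTime> before, Map<Path, FileTime> after) {
        Set<String> changes = new TreeSet<>();
        for (Map.Entry<Path, FileTime> e : after.entrySet()) {
            FileTime old = before.get(e.getKey());
            if (old == null) {
                changes.add("created " + e.getKey());
            } else if (!old.equals(e.getValue())) {
                changes.add("modified " + e.getKey());
            }
        }
        for (Path p : before.keySet()) {
            if (!after.containsKey(p)) {
                changes.add("deleted " + p);
            }
        }
        return changes;
    }
}
```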
  22. Apache Jackrabbit
  23. Apache Jackrabbit <ul><li>Fully featured JCR content repository </li></ul><ul><li>Releases </li></ul><ul><ul><li>1.0 in 2006 </li></ul></ul><ul><ul><li>1.4 available since January 2008 </li></ul></ul><ul><ul><li>1.5 (with explorer) planned for Q2 </li></ul></ul><ul><ul><li>2.0 (with JCR 2.0) planned for 2008 </li></ul></ul><ul><li>Focus on conformance and flexibility </li></ul>
  24. Image credits <ul><li>Images from the morgueFile archive, used as licensed </li></ul><ul><ul><li> , Infographe_Elle </li></ul></ul><ul><ul><li> , msxo </li></ul></ul><ul><ul><li> , imelenchon </li></ul></ul><ul><ul><li> , ronnieb </li></ul></ul><ul><ul><li> , seriousfun </li></ul></ul><ul><ul><li> , rollingroscoe </li></ul></ul><ul><ul><li> , cohdra </li></ul></ul><ul><ul><li> , penywise </li></ul></ul><ul><ul><li> , bluekdesign </li></ul></ul><ul><ul><li> , gracey </li></ul></ul>
  25. Thank you! <ul><li>Questions / Comments? </li></ul>