Relational Won't Cut It: Architecting Content Centric Apps


In the past, developers have built their own content-centric apps from scratch or on top of low-level libraries. A content repository like Alfresco can save time and cost. Even if you don't choose Alfresco, you should still consider leveraging a standard API like CMIS as much as possible.

Published in: Technology

  • "We're drowning in documents (or videos or images). We don't know what we have and none of it is organized. We waste so much time and money recreating stuff that probably already exists, if we could just find it." "We've got serious business risk caused by people using the first thing they find instead of the right thing." "We have a process for sending stuff around to the rest of the team for review and approval, but we have no idea what's in flight or who we're waiting on or why." "We have teams of people from both inside and outside the organization that need to be able to work together efficiently. They need to share files, of course, but really, it's more than that." "We've got business systems that generate, store, and process things like reports and images at an alarming rate."
  • May start out simple, but the system tends to morph over time. Let’s look at three “levels” of content-centric app complexity.
  • Process, Security, & Search. Some open source search engines that are out there. Workflow engines: JBoss jBPM, Activiti, Intalio, BonitaSoft. Some open source libraries that may be helpful in extraction/conversion: Tika, FOP, POI, ImageMagick, JAI.
  • You’ve built a system that’s pretty bad-ass, and it is customized to your specific needs, but at what cost?
  • Is it easy to extend? Does it get out of the way?
  • Alfresco's CMIS extensions support features that are out of scope for CMIS 1.0, such as aspects and datalists.
  • Founded in 2005. John Newton: founding developer of Ingres, co-founded Documentum. John Powell: COO of Business Objects, President of Oracle UK. Lots of engineers from Documentum, Interwoven, Vignette. Assembled from open source components.
  • Pick your stack. Linux / Windows OS servers: RHEL, Solaris, Ubuntu, Windows Server, … DBMS: MySQL, MS SQL, Oracle, PostgreSQL, DB2, … Application servers: Tomcat, JBoss, WebLogic, WebSphere, … Web browsers: Firefox, MSIE, Safari. Identity management systems: LDAP, AD, Kerberos, …
  • Activiti first appeared in the Alfresco 3.4 E preview release and will be production ready with 4.0.
  • Relational Won't Cut It: Architecting Content Centric Apps

    1. Relational Won't Cut It: Architecting Content-Centric Applications for Java
       Jeff Potts
       Chief Community Officer
    2. Agenda
       • What is a content-centric application?
       • Do-it-yourself approaches
       • A better way: The Platform Approach
       • Content Management Interoperability Services (CMIS) Standard
       • Alfresco technical overview
       • Repository services
       • APIs
    3. What is a Content-Centric Application?
       • Web application with a mix of structured and unstructured data
       • Unstructured data is typically file-based
       • Office documents
       • Images
       • Audio/Video
       • Reports
       • Usually collaborative
       • May also include business processes
    4. A Few Examples
       • Expense report review & approval
       • Contract negotiation, creation, & review
       • Press request/fulfillment
       • Research study authoring
       • Sales/Marketing collateral creation & communication
       • Course guide ("student packet") authoring/publishing
    5. Or the business is saying
       • I’ve got a ton of files
       • I’ve got people that produce them, sometimes collaboratively, and people that consume them.
       • I want to somehow make it easier to deal with all of this.
       Source: eqqman
    6. Pains
       • Inability to find important content
       • Black hole process
       • Re-creating the wheel
       • Productivity loss
       • Higher costs
       • Using outdated content
       • Legal/business risk
       • Loss-of-life/injury
       Source: khainomore
    7. Components of content-centric systems
       • User Interface
       • Persistence/Data Model/Metadata
       • Business Processes/Workflow
       • Library Services (Upload/Download, Versioning, & Check-in/Check-out)
       • Security
       • Search
       • Transforms/Renditions/Thumbs
       • Tagging/Categorization
       • Authoring tool integration
       • Remote API
       • Scheduler
       • Comments/Ratings/Activity Streams
    8. Let’s Build it Ourselves!
    9. DIY approach seems simple
       • “What’s so special about content-centric apps?”
       • Standard web app toolkit
       • Favorite front-end/presentation framework
       • Relational Database
       • Data Model/Metadata
       • Comments/Ratings
       • Tagging/Categorization
       • Files? Generally, a Bad Idea
    10. Files: Relational may not cut it
       • Relational is good at text and numbers. Binary data, YMMV
       • Size limits
       • Random seek (streaming)
       • Search: Some relational databases can index into blobs, but not all
    11. File storage options
       • On disk
       • Amazon S3 or an internal CAS filer
       • Source code control repository
       • XML database
       • NoSQL document store
       • Content repository
       • Apache Jackrabbit
       • Alfresco
       • Other open source and proprietary repositories
    12. Content repository
       • Content = a file + metadata
       • File system
       • Content binaries
       • Search indexes
       • Database
       • Relations (associations)
       • Metadata
       • Repository
       • Abstraction layer
    20. Once files are figured out…
       • Security framework
       • Search
       • Business Process/Workflow Engine
       • Transforms/Extractions/Renditions
       • Scheduled jobs
       • WebDAV, CIFS, FTP or other authoring integrations
       • Versioning
       • Check-in/Check-out
       • Remote API
       • Replication
       • Social features
       • Mobile access
       • Custom code to integrate all of these subsystems
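The "content = a file + metadata" split on the content repository slide, together with the earlier point about random seek, can be illustrated with a minimal sketch. It assumes binaries live on the file system and metadata lives in a relational database behind one small abstraction layer. `TinyRepository`, its table layout, and its method names are hypothetical and invented for illustration; they are not the API of Alfresco, Jackrabbit, or any other repository.

```python
import os
import sqlite3
import tempfile
import uuid


class TinyRepository:
    """Hypothetical abstraction layer: binaries on disk, metadata in SQLite."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE node (id TEXT PRIMARY KEY, name TEXT, mimetype TEXT, path TEXT)"
        )

    def create(self, name, mimetype, data):
        # The binary goes to the file system under a generated id.
        node_id = uuid.uuid4().hex
        path = os.path.join(self.root, node_id + ".bin")
        with open(path, "wb") as f:
            f.write(data)
        # The metadata (and the pointer to the binary) goes to the database.
        self.db.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?)", (node_id, name, mimetype, path)
        )
        return node_id

    def read_range(self, node_id, offset, length):
        # Random seek into the binary without loading the whole file.
        (path,) = self.db.execute(
            "SELECT path FROM node WHERE id = ?", (node_id,)
        ).fetchone()
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(length)

    def find_by_mimetype(self, mimetype):
        # Metadata queries never touch the binaries.
        rows = self.db.execute(
            "SELECT name FROM node WHERE mimetype = ? ORDER BY name", (mimetype,)
        )
        return [name for (name,) in rows]


repo = TinyRepository(tempfile.mkdtemp())
doc_id = repo.create("contract.txt", "text/plain", b"0123456789 effective 2011")
chunk = repo.read_range(doc_id, 11, 9)
```

Even this toy version needs glue code between two stores, which is the point of the "Once files are figured out…" list: every additional service has to be integrated the same way.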
    21. “What have we done?”
       Source: gobucks2
    22. Factors that affect DIY reasonableness
       • Number and size of documents
       • Number and concurrency of users
       • Number and nature of integration points
       • Business process volatility & complexity
       • Time and cost of
       • Integrating all of these services/sub-systems
       • Maintaining all of that code…forever
    23. The Platform Approach
    24. Platform approach
       • Much of this has already been solved
       • Content Platform = Repository + Services
       • Find a platform that meets your needs
       • Extend the platform with your own business logic
       • Write your own front-end using whatever language or framework makes sense
       • Or, customize the UI that the platform provides
    25. What makes a great content platform?
       • Agility
       • Applicable to a broad set of solutions
       • Scale up, scale down
       • Fast/Friendly Development Model
       • Open Source
       • Troubleshooting
       • Bug tracking
       • Community
       • Standards compliance
       • Lower switching costs
       • Easier integration
    26. Big picture
       Diagram labels: Web Applications, Knowledge Portals, Web Services, Business Process Engine, App Server, CRM, Portal Server, Virtual File System, High Availability, FTP, CIFS, WebDAV
    28. What is CMIS?
       • Content Management Interoperability Services
       • Language-independent, vendor-neutral API for content management
       • CRUD functions for nodes
       • Check-in/check-out
       • Associations
       • Permissions (Access Control Lists)
       • Policies
       • Queries
       • Repository traversal
    29. The Beauty of
       Diagram labels: Presentation Tier, REST, SOAP, Content Services Tier, ?, ?, Enterprise Apps Tier
    30. CMIS
       • CMIS API via
       • REST / Atom
       • Web Services
       • Use cases
       • Repository to repository
       • Application to repository
       • Federated repositories
       • CMIS Alfresco extensions
    37. About the CMIS Spec
       • OASIS standard
       • Alfresco, IBM, Microsoft, Oracle, FileNet support
       • Alfresco was first to production with CMIS
       • Two parts
       • Interoperability through standard SOAP and Atom Pub bindings
       • SQL-based query language for rich content repositories
       • New JSON binding coming soon
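To make the "SQL-based query language" point concrete, here is a small sketch that only builds a CMIS query string; it does not talk to a server. `cmis:document`, `cmis:name`, `cmis:objectId`, `CONTAINS()` and `IN_FOLDER()` come from the CMIS 1.0 query language. The helper names `cmis_quote` and `build_query` are hypothetical, and the backslash escaping shown is a simplifying assumption that should be checked against the spec and the target repository (full-text expressions inside `CONTAINS()` have additional escaping rules).

```python
def cmis_quote(value):
    """Quote a string literal, backslash-escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_query(name_like=None, full_text=None, folder_id=None):
    """Build a SELECT over cmis:document from optional filters (hypothetical helper)."""
    clauses = []
    if name_like is not None:
        clauses.append("cmis:name LIKE " + cmis_quote(name_like))
    if full_text is not None:
        clauses.append("CONTAINS(" + cmis_quote(full_text) + ")")
    if folder_id is not None:
        clauses.append("IN_FOLDER(" + cmis_quote(folder_id) + ")")
    query = "SELECT cmis:objectId, cmis:name FROM cmis:document"
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return query


query = build_query(name_like="%report%", full_text="expense")
print(query)
```

Because the query language is part of the standard, a string like this could in principle be handed to any CMIS client library's query call, whichever repository sits behind it.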
    38. Implementations Already Available…
       • Providers
       • Consumers
       • Developed by 30+ ECM Vendors
    39. Open Source implementations of CMIS
       • Apache Chemistry is the umbrella project for all CMIS related projects within the ASF
       • OpenCMIS (Java, client and server)
       • cmislib (Python, client)
       • phpclient (PHP, client)
       • DotCMIS (.NET, client)
    41. Alfresco Overview
       • Alfresco is an open source Enterprise Content Management platform
       • Can manage any kind of file, any size
       • Stores the file and metadata
       • All content and metadata is searchable
       • Files can be secured to specific users and groups
       • CMIS-compliant
    42. Alfresco Overview (Cont’d)
       • Provides versioning and check-in/check-out
       • Has a built-in workflow engine
       • Can be accessed through a browser or from desktop applications via CIFS, WebDAV, FTP, IMAP, SMTP, SharePoint
       • Three editions
       • Community
       • Team
       • Enterprise
    43. High-level Architecture
       • Plus:
       • IMAP
       • SharePoint
    44. High-level Custom Front-End
       • Drupal
    45. Repository Services
    46. Repository Services
       • Services allow the content items within the repository to be managed:
       • Content lifecycle: creation, modification, deletion, …
       • Control over the objects: permissions, locks
       • Content models: properties, associations
       • Workflows
       • Search
       • Rules and automatic actions
       • etc. …
    55. Rules and actions
       • Actions (ActionService):
          • Trigger actions over content items
          • Secured and transactional
          • Scheduled or on-demand
          • Can be leveraged by workflows
          • Ex: Send an email, copy or move a content item, …
       • Rules (RuleService):
          • Similar to mail client filters
          • Program event-based automatic tasks and actions
          • Run one or several actions
          • Reusable rules
          • Sortable rules
          • Easy to configure, easy to activate
    64. Transformations
       • Transformations to different file formats
       • Ex: Word => PDF, Word => Flash, …
       • Automatic extraction of common file metadata
       • Built on open source libraries: Apache Tika, POI, FOP, PDFBox, pdf2swf, …
       • Can be leveraged by actions and content rules
       • Ex: When an MS Word document is uploaded, make a PDF copy of it, and send it by email to “admin”
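The "mail client filter" idea behind rules and actions can be sketched in a few lines. This is a toy model under stated assumptions, not Alfresco's RuleService or ActionService: `Rule`, `Space`, the `"inbound"` event name, and both actions are hypothetical, and the actions only record what a real transformation or mail action would do. It mirrors the Word-to-PDF example from the Transformations slide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

WORD_MIMETYPE = "application/msword"


@dataclass
class Rule:
    """A rule: a triggering event, filtering conditions, and actions to run."""
    event: str
    conditions: List[Callable[[Dict], bool]]
    actions: List[Callable[[Dict, List[str]], None]]


@dataclass
class Space:
    """A folder-like container whose rules fire when content events occur."""
    rules: List[Rule] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def fire(self, event, item):
        for rule in self.rules:
            if rule.event == event and all(cond(item) for cond in rule.conditions):
                for action in rule.actions:
                    action(item, self.log)


def is_word(item):
    return item["mimetype"] == WORD_MIMETYPE


def make_pdf_copy(item, log):
    # Stand-in for a real transformation; it only records the intent.
    pdf_name = item["name"].rsplit(".", 1)[0] + ".pdf"
    log.append("transform " + item["name"] + " -> " + pdf_name)


def email_admin(item, log):
    # Stand-in for a real mail action.
    log.append("email admin about " + item["name"])


space = Space(rules=[Rule("inbound", [is_word], [make_pdf_copy, email_admin])])
space.fire("inbound", {"name": "offer.doc", "mimetype": WORD_MIMETYPE})
space.fire("inbound", {"name": "logo.png", "mimetype": "image/png"})
```

Only the Word document matches the condition, so `space.log` ends up with two entries for `offer.doc` and none for `logo.png`.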
    65. Workflows
       • Full BPM capabilities with jBPM/Activiti
       • Rich features:
       • Parallel or serial workflows
       • Joins, forks, conditions …
       • Group or individual assignees
       • Actions and complex behaviors
       • Implement your custom lifecycle model through workflows
       • Extensible: build your own business processes
    66. Security - Authentication
       • Alfresco can handle it or pass it off to others
       • Active Directory
       • LDAP
       • Kerberos
       • NTLM
       • SSO
       • Custom
       Source: rooreynolds
    67. Security - Authorization
       • Spring Security Framework (ACEGI) under the covers
       • Users & Groups
       • Access Control Lists
       • Permissions
       • Hierarchical
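What "hierarchical" access control lists mean in practice can be shown with a minimal sketch: a permission granted on a folder applies to everything beneath it. This is an illustration under simplifying assumptions (grants only, no deny entries, no way to switch inheritance off), not Alfresco's permission model. The class `Node`, the authority names, and the permission names are all hypothetical.

```python
class Node:
    """A node that inherits permission grants from its ancestors."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.acl = {}  # authority (user or group name) -> set of permissions

    def grant(self, authority, permission):
        self.acl.setdefault(authority, set()).add(permission)

    def has_permission(self, user, groups, permission):
        authorities = {user} | set(groups)
        node = self
        # Walk up the hierarchy until this node or an ancestor grants the permission.
        while node is not None:
            for authority in authorities:
                if permission in node.acl.get(authority, set()):
                    return True
            node = node.parent
        return False


root = Node("Company Home")
contracts = Node("Contracts", parent=root)
draft = Node("draft.doc", parent=contracts)

root.grant("EVERYONE", "read")
contracts.grant("legal", "write")

can_read = draft.has_permission("alice", ["EVERYONE"], "read")
can_write_legal = draft.has_permission("bob", ["EVERYONE", "legal"], "write")
can_write_other = draft.has_permission("alice", ["EVERYONE"], "write")
```

Here the read grant on the root and the write grant on the folder both reach `draft.doc`, while a user outside the `legal` group cannot write to it.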
    68. Other Alfresco services
       • Search
       • Checkin/Checkout
       • Locking
       • Versioning
       • Tags & categories
       • Authentication
       • LDAP sync
       • Groups & users
       • Data dictionary
       • Browsing
       • Lifecycle
       • Rating
       • Invitations
       • Sites
       • User quotas
       • Copy – move
       • Transfer/Replication
    84. APIs
    85. Java & JavaScript
       • Alfresco’s “foundation” API is Java
       • Server-side JavaScript is also an option
       • Remote APIs
       • Web Services SOAP
       • HTTP REST Webscripts - Java or JavaScript
       • CMIS - Atom REST or SOAP
       Source: 96dpi
    86. Web Script Framework
       • Model-View-Controller pattern
       • Declare a URL, bind it to logic, provide one or more views
       • Controller implemented in JavaScript or Java
       • Views implemented in FreeMarker
       • Deployed to the repository or the classpath
       • Part of the Spring Surf Project
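The "declare a URL, bind it to logic, provide one or more views" pattern can be sketched as a tiny dispatcher. This is a Python analogy only: real web scripts use a descriptor, a JavaScript or Java controller, and FreeMarker views, as the slide says. The `webscript` decorator, `REGISTRY`, `dispatch`, and the `/hello/{name}` URL are hypothetical names made up for this example.

```python
import re
from string import Template

# Registry of compiled URL pattern -> (controller, view). All names are hypothetical.
REGISTRY = {}


def webscript(url_template, view):
    """Bind a URL template such as /hello/{name} to a controller and a view."""
    pattern = re.compile(
        "^" + re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", url_template) + "$"
    )

    def register(controller):
        REGISTRY[pattern] = (controller, Template(view))
        return controller

    return register


@webscript("/hello/{name}", view="<p>Hello, $greeting_target!</p>")
def hello_controller(args):
    # Controller: builds the model that the view renders.
    return {"greeting_target": args["name"].title()}


def dispatch(url):
    """Find the script bound to the URL, run its controller, render its view."""
    for pattern, (controller, view) in REGISTRY.items():
        match = pattern.match(url)
        if match:
            model = controller(match.groupdict())
            return view.substitute(model)
    return "404"


html = dispatch("/hello/jeff")
```

Keeping the controller free of markup is what lets one URL offer several views (for example HTML and JSON) over the same model.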
    87. Summary
       • Platform = Repository + Services
       • CMIS is an important standard
       • Alfresco is a great CMIS server
       • Even if you don’t pick Alfresco, try to leverage CMIS
       • Alfresco provides the repository plus services pre-integrated and ready for your custom content-centric apps
    88. For More Information…
       • Alfresco Community
       • Alfresco Forums
       • Alfresco Wiki
       • Alfresco Blogroll
       • ECM Architect Blog
    89. Email: jpotts@alfresco.com
       Twitter: @jeffpotts01
       Blog:
    90. Extra/Unused Slides
    91. Data Modeling
       • Repository is a collection of nodes
       • Everything is a node, nodes are typed
       • Content Model is expressed in XML
       • Cold-deploy most common, hot deploy possible
       • Types, aspects, properties, associations, constraints
       • Hierarchical
       • Types inherit from super types
    92. Types, Aspects, Properties, & Associations
       • Content Types
       • Type “report” (metadata: subject, abstract, …)
       • Aspects
       • Aspect “client” (metadata: name, reference, contact, …)
       • Properties
       • Property “customer id” [integer]
       • Associations
       • Association “related documents”
    93. Example: Aspects Useful for Cross-Cutting
       • Type = Report. Type attributes: Subject, Abstract
       • Type = Contract. Type attributes: Effectivity start date, Effectivity end date
       • Type = Email. Type attributes: Subject, Sender, Recipients
       • Type = Case. Type attributes: Format
       • Aspect = Client. Aspect attributes: Client name, Client Id, Contact, -> Related docs
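The relationship between types and aspects in the example can be sketched in code: types inherit properties from their super types, while an aspect attaches the same extra properties to nodes of unrelated types. In Alfresco the content model is expressed in XML; the Python classes below (`TypeDef`, `AspectDef`, `Node`), the base type `content`, and the property names are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class TypeDef:
    name: str
    properties: Set[str]
    parent: Optional["TypeDef"] = None

    def all_properties(self):
        # Types inherit properties from their super types.
        inherited = self.parent.all_properties() if self.parent else set()
        return inherited | self.properties


@dataclass
class AspectDef:
    name: str
    properties: Set[str]


@dataclass
class Node:
    type: TypeDef
    aspects: Dict[str, AspectDef] = field(default_factory=dict)

    def add_aspect(self, aspect):
        self.aspects[aspect.name] = aspect

    def allowed_properties(self):
        # A node may carry its type's properties plus those of every applied aspect.
        props = self.type.all_properties()
        for aspect in self.aspects.values():
            props = props | aspect.properties
        return props


content = TypeDef("content", {"name"})
report = TypeDef("report", {"subject", "abstract"}, parent=content)
contract = TypeDef("contract", {"effectivity_start", "effectivity_end"}, parent=content)
client = AspectDef("client", {"client_name", "client_id", "contact"})

report_node = Node(report)
contract_node = Node(contract)
report_node.add_aspect(client)  # the same aspect cuts across unrelated types
contract_node.add_aspect(client)
```

Without the aspect, the client properties would have to be pushed up into a common super type or duplicated on each type, which is the cross-cutting problem the slide refers to.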
    94. Spring Framework
       • Alfresco repository services are built on top of the Spring framework:
       • Public Services through APIs
       • Services implemented through flexible components
       • XML driven configuration
       • Secured and transactional
       • Extensible
    98. Rules and actions
       • A content rule is defined at the space level:
       • 1/ a triggering event (inbound / outbound / update)
       • 2/ a set of filtering conditions (object name, mimetype, …)
       • 3/ a set of actions to run (move the content item, add an aspect, send a notification email …)
       • Rules help you create smart spaces
       • Example: Drafts, Approved, Published
    99. Transformations
       • Transformations and metadata extractions are used by the Share web interface:
       • PNG thumbnail
       • Flash preview
       • Metadata extraction