Lessons learned Building Nuxeo EP - Component-based, open source ECM platform

Stefane Fermigier shares lessons learned over the last ten years of building Nuxeo EP. Presented at the ICSSEA 2010 software engineering conference.

  1. Lessons Learned Building Nuxeo EP
     Stefane Fermigier, PhD - Nuxeo
     Presented at ICSSEA 2010, Dec. 8, 2010
  2. History and Context
  3. Who we are
     • Company started in 2000
     • 2002-2005: Zope-based CPS project
     • 2005: First Eclipse RCP-based project
     • 2006-now: Full switch to Java (Java EE 5 and OSGi)
     • 2009-now: Business model migration from a service company to an OSS software vendor
  4. What is ECM?
     ECM, a concept that emerged in the early 2000s, represents the integrated enterprise-wide management of all forms of non-structured (and sometimes semi-structured) content, including their metadata, across their whole lifecycle, supported by appropriate technologies and administrative infrastructure.
  5. Diagram: 1 Capture & Create, 2 Share & Collaborate, 3 Process & Review, 4 Publish & Archive, 5 Search & Find
  6. What are CEVA?
     • 4LA (four-letter acronym) invented by Gartner in 2006: “Content Enabled Vertical Applications”
     • “CEVAs typically help to automate complex processes that previously required workers to manually sort through paper documents and other forms of content (in effect, a way to manage down costs of exception handling) and optimize the remainder of the work.”
  7. Business Goals
     • First, create an MVP (minimum viable product) to ensure company sustainability
     • Base it on a clean, extensible architecture
     • With the end goal of enabling the creation of a rich ecosystem of extensions and application profiles
  8. Nuxeo CPS
     • Content management and portal platform
     • Developed from 2002 to 2005
     • Built on top of the Zope and CMF (Content Management Framework) open source frameworks
     • Architecture: pluggable components (“Products”) and events
  9. Switch to Java: Why?
     • Technical reasons:
       • ZODB doesn’t scale well in terms of data volume
       • Dynamic languages don’t scale well in terms of managing complexity (> 100 KLOC)
     • Business reasons:
       • Java makes it much easier to work with mainstream systems integrators
  10. A Few Numbers
     • Nuxeo EP+DM is a 400 KLOC Java project
     • Comprises ~190 independent modules (JARs)
     • Developed over the last 4 1/2 years by a core team of 20 developers and 50 community contributors
     • Has generated ~20 MEUR of revenue for Nuxeo, ~50 MEUR for partners
  11. Business Constraints and Requirements
  12. Business Vision
     • Address the full ECM scope
       • Initial focus on Document Management
       • Architecture must be extensible and modular
     • Enable and sustain the ecosystem
       • Easy to work with, designed for participation
  13. Business Vision
     • Low barrier of entry for:
       • End-users (e.g. pleasant UI)
       • Developers (e.g. clean model and API, leverage existing knowledge)
       • Sysadmins / operations
     • “Enterprise-class” software
       • Tens of thousands of users, millions of documents
  14. Our Original Roadmap
     • Don’t reinvent the wheel
       • Leverage existing standards, work on new ones (e.g. JCR 2, CMIS)
       • Build on proven open source libraries (JBoss, Apache, Sun, Eclipse)
     • Use a robust software engineering process
       • Make it transparent for our community
  15. Core ECM
     • Document types definition and management
     • Storage of the documents and associated metadata
     • Document life cycle and versioning
     • Access control
     • Indexing + query language, which must enable complex queries on both full-text and metadata
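The life cycle and versioning items above can be pictured as a small state machine attached to each document. The sketch below is a hypothetical illustration, not Nuxeo's API: the state names (`project`, `approved`, `obsolete`), the transition names and the `major.minor` numbering scheme are assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Nuxeo's API): a document life cycle modelled as an
// explicit state machine, plus a simple major.minor version counter.
class LifecycleSketch {

    // Allowed transitions: current state -> (transition name -> next state).
    // State and transition names are illustrative assumptions.
    static final Map<String, Map<String, String>> TRANSITIONS = new HashMap<>();

    static {
        TRANSITIONS.put("project", Map.of("approve", "approved", "obsolete", "obsolete"));
        TRANSITIONS.put("approved", Map.of("obsolete", "obsolete", "backToProject", "project"));
        TRANSITIONS.put("obsolete", Map.of("backToProject", "project"));
    }

    static final class Document {
        private String state = "project"; // initial state
        private int major = 0;
        private int minor = 0;

        String state() { return state; }

        String version() { return major + "." + minor; }

        // Follow a named transition, rejecting it if the current state does not allow it.
        void followTransition(String transition) {
            String next = TRANSITIONS.getOrDefault(state, Map.of()).get(transition);
            if (next == null) {
                throw new IllegalStateException(
                        "Transition '" + transition + "' not allowed from state '" + state + "'");
            }
            state = next;
        }

        // Check in a new version; a major increment resets the minor number.
        void checkIn(boolean majorIncrement) {
            if (majorIncrement) {
                major++;
                minor = 0;
            } else {
                minor++;
            }
        }
    }

    public static void main(String[] args) {
        Document doc = new Document();
        doc.checkIn(false);              // version 0.1
        doc.followTransition("approve"); // project -> approved
        doc.checkIn(true);               // version 1.0
        System.out.println(doc.state() + " " + doc.version()); // prints "approved 1.0"
    }
}
```

Because the transitions are plain data rather than branching logic, they could just as well be loaded from configuration.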
  16. Higher-Level ECM Services
     • Workflow
     • Transformation and rendering
     • User management
     • User interface
     • A rich set of HTTP-based APIs exposed to third-party developers and integrators (WS-* and REST)
  17. 2.0 and 3.0 (Ongoing)
     • Tagging and folksonomies
     • Lightweight collaboration (wikis) and publishing (blogs)
     • Social networking (“friending” or “following” colleagues or business partners, user timelines)
     • Collaborative filtering
     • Mobile and disconnected access
     • Semantic content categorization and named entity extraction
  18. Products and Applications
  19. Nuxeo ECM - Our Approach (diagram)
     • Applications: Correspondence Management, Contracts Management, Invoice Processing, Marketing Asset Management
     • Business Solutions: Construction, Media, Government, Life Sciences
     • Horizontal Packages: Digital Asset Management, Case Management Framework, Document Management, Records Management, Content Aggregator
     • Platform: Nuxeo Enterprise Platform, a complete set of components covering all aspects of ECM
     • Content Infrastructure: Nuxeo Core, a lightweight, scalable, embeddable content repository
  20. Document Management
  21. DAM
  22. Case Management
  23. Web Sites
  24. The Strongest Requirement
     • Applications (horizontal, vertical or custom) must be buildable just by assembling components (packaged as Java JARs)
     • Architecture must allow behavior modification at the repository level (e.g. new document type), at the UI level (e.g. new actions), and at the service level (e.g. adding new services) without recompilation
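One way to read the "no recompilation" requirement at the repository level: document types are declared in XML contributions and registered when the platform starts. This is a minimal sketch; the XML vocabulary (`<doctype name="..." extends="...">` with `<schema>` children) and the `DocTypeRegistry` class are hypothetical, loosely inspired by the "pure XML contribution" idea mentioned later in the deck, and do not reproduce Nuxeo's actual format.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical sketch: document types declared in XML contributions and
// registered at startup, so adding a type means shipping a new XML file
// rather than recompiling the platform. The XML vocabulary is an assumption.
class DocTypeSketch {

    static final class DocType {
        final String name;
        final String superType; // empty string when no "extends" attribute is given
        final List<String> schemas;

        DocType(String name, String superType, List<String> schemas) {
            this.name = name;
            this.superType = superType;
            this.schemas = List.copyOf(schemas);
        }
    }

    static final class DocTypeRegistry {
        private final Map<String, DocType> types = new LinkedHashMap<>();

        // Parse one XML contribution and register every <doctype> element it declares.
        void contribute(String xml) {
            try {
                org.w3c.dom.Document dom = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
                NodeList doctypes = dom.getElementsByTagName("doctype");
                for (int i = 0; i < doctypes.getLength(); i++) {
                    Element e = (Element) doctypes.item(i);
                    List<String> schemas = new ArrayList<>();
                    NodeList schemaNodes = e.getElementsByTagName("schema");
                    for (int j = 0; j < schemaNodes.getLength(); j++) {
                        schemas.add(((Element) schemaNodes.item(j)).getAttribute("name"));
                    }
                    String name = e.getAttribute("name");
                    types.put(name, new DocType(name, e.getAttribute("extends"), schemas));
                }
            } catch (ParserConfigurationException | SAXException | IOException ex) {
                throw new IllegalArgumentException("Invalid contribution", ex);
            }
        }

        DocType get(String name) { return types.get(name); }
    }

    public static void main(String[] args) {
        DocTypeRegistry registry = new DocTypeRegistry();
        // Assumed setup: in practice such XML would ship inside a plugin JAR.
        registry.contribute(
                "<component>"
                + "<doctype name=\"Invoice\" extends=\"Document\">"
                + "<schema name=\"dublincore\"/>"
                + "<schema name=\"invoice\"/>"
                + "</doctype>"
                + "</component>");
        DocType t = registry.get("Invoice");
        System.out.println(t.name + " extends " + t.superType + " " + t.schemas);
        // prints "Invoice extends Document [dublincore, invoice]"
    }
}
```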
  25. Technical Challenges
  26. Standards Choice
     • Switch to Java was motivated by the desire to be more “standards-compliant”
     • But the problem with standards is that there are too many to choose from!
     • Old vs. new or emerging
     • Open standards vs. de facto standards
     • Overlapping standards (hardest issue!)
  27. Initial Standards
     • Java EE 5, as the structuring general framework for the server-based application (but not for the core services)
     • OSGi, as a packaging model for components
     • The JCR (Java Content Repository), as the model API to manage content and metadata at the most basic level
  28. Initial Standards
     • JSF as the presentation layer (part of Java EE 5)
     • JBoss Seam, a web presentation framework that extends JSF, because we felt it would provide a much improved developer experience over the “pure Java EE 5” model
  29. Notes
     • Java EE 5 was really new and still “wet” at the time
     • Seam was not a standard, but its concepts eventually merged into one (JCDI)
     • In 2006 OSGi had credibility in the embedded and rich client spaces, not yet on the server
     • We dropped JCR support in 2010
  30. Open Source Libraries
     • The open source Java ecosystem started to grow in the late 90s (Apache) and had a huge boost in the early 00s (Eclipse, JBoss, OW2, etc.)
     • Like with standards, there are usually many OSS implementations to choose from
     • FYI: Nuxeo EP now embeds more than 200 external open source libraries!
  31. Choosing an OSS Library
     • License compatibility with the LGPL (this excludes proprietary and GPL licenses)
     • Compliance with a chosen standard
     • Quality, as assessed by visual inspection of the source code
     • Confidence in the development process (e.g. are there unit tests?) and the community behind the project
  32. Benefits and Challenges of Using OSS Libraries
     • With OSS, it’s easier to evaluate options
     • Forking a library is sometimes the only way to fix a bug or add missing functionality
     • But it comes at a tremendous price, because now you have to maintain your own branch
     • Becoming a contributor is also sometimes needed, but comes at a price too
     • Risk of “JAR hell” (conflicting library requirements)
  33. Lessons Learned
     • Allow users of our platform to extend it without touching its source code
     • Or, even better, without writing code at all!
     • Keep your options open, but don’t over-engineer flexibility
  34. Architectural Solutions
  35. Architectural Solutions
     • Layered architecture
     • High-level APIs
     • Component system
     • Extension points
     • Event bus
  36. Layer Cake: Nuxeo EP Architecture
     • Nuxeo UI Frameworks: flexible choice of interfaces
     • Nuxeo ECM Services: modular set of content services
     • Nuxeo Core: advanced content repository
     • Nuxeo Runtime: component and service model
  37. APIs
  38. Everything Pluggable
  39. OSGi in Theory
     • OSGi is a component system developed initially for the embedded systems industry
     • Adopted by Eclipse for Eclipse 3.0 (2004)
     • Both a module system and service platform (but we’re currently only using the former)
     • Modules, or “bundles”, are just JARs with a special MANIFEST
  40. OSGi in Theory
     • An OSGi “container” takes care of component activation
     • Bundles describe their own imports (dependencies) and exports (exposed API)
     • Container can also take care of provisioning
     • Class loader isolation can take care of “JAR hell”
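Since a bundle is an ordinary JAR whose `META-INF/MANIFEST.MF` carries extra headers, its declared imports and exports can be read with the standard `java.util.jar.Manifest` class. The header names (`Bundle-SymbolicName`, `Bundle-Version`, `Import-Package`, `Export-Package`) come from the OSGi specification; the bundle and package names are made up, and the naive comma split is an assumption that only holds when no quoted value contains a comma.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Sketch: reading the OSGi headers that turn a plain JAR into a bundle,
// using only java.util.jar. Bundle and package names are made up.
class BundleManifestSketch {

    static final String MANIFEST =
            "Manifest-Version: 1.0\n"
            + "Bundle-ManifestVersion: 2\n"
            + "Bundle-SymbolicName: org.example.invoice\n"
            + "Bundle-Version: 1.0.0\n"
            + "Import-Package: org.example.core.api\n"
            + "Export-Package: org.example.invoice.api\n";

    // Parse manifest text (as found in META-INF/MANIFEST.MF) into its main attributes.
    static Attributes readHeaders(String text) {
        try {
            return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)))
                    .getMainAttributes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Naive parse of a package-list header: split clauses on commas and drop
    // ";attribute=value" parameters. Assumes no quoted value contains a comma
    // (real headers may, e.g. version ranges such as "[1.0,2.0)").
    static List<String> packages(Attributes headers, String header) {
        List<String> result = new ArrayList<>();
        String value = headers.getValue(header);
        if (value == null) {
            return result;
        }
        for (String clause : value.split(",")) {
            result.add(clause.split(";")[0].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        Attributes headers = readHeaders(MANIFEST);
        System.out.println("bundle:  " + headers.getValue("Bundle-SymbolicName"));
        System.out.println("imports: " + packages(headers, "Import-Package"));
        System.out.println("exports: " + packages(headers, "Export-Package"));
    }
}
```

An OSGi container does this resolution itself and wires each import to a matching export; the sketch only shows that the dependency information lives in plain manifest headers.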
  41. OSGi at Nuxeo
     • We package our components as OSGi bundles
     • We have our own “OSGi-like” adapter for app servers (JBoss, Jetty, Tomcat, GlassFish)
     • Most of our components can also run on Eclipse Equinox (for RCP apps)
     • We have our own service registry, but it’s currently not based on OSGi
     • We don’t provide class loader isolation
  42. OSGi at Nuxeo
     • Goal is to be able to run everything on a “real” OSGi container in 2011
     • ... and to fully leverage the OSGi service stack at the same time
       • Including service registry, hot-reload, class isolation, etc.
     • Biggest conceptual issue: overlap with Java EE
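The slides do not show how Nuxeo's own service registry works. As a rough picture of what any such registry does (and hence where it overlaps with the OSGi service registry and Java EE injection), here is a minimal type-keyed sketch; `ServiceRegistry` and `TransformationService` are hypothetical names, and the sketch has none of the lifecycle, ranking or dynamic-unregistration features of a real OSGi registry.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a minimal service registry: components publish an
// implementation under its interface, and clients look it up by type.
class ServiceRegistrySketch {

    static final class ServiceRegistry {
        private final Map<Class<?>, Object> services = new ConcurrentHashMap<>();

        // Register (or replace) the implementation published under an interface.
        <T> void register(Class<T> type, T implementation) {
            services.put(type, implementation);
        }

        // Type-safe lookup; empty if no component has registered the service.
        <T> Optional<T> lookup(Class<T> type) {
            return Optional.ofNullable(type.cast(services.get(type)));
        }
    }

    // Example service interface a component might expose (hypothetical).
    interface TransformationService {
        String convert(String content, String targetMimeType);
    }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        // A "component" contributes its implementation when it is activated.
        registry.register(TransformationService.class,
                (content, mime) -> "[" + mime + "] " + content);
        // A client depends only on the interface, never on the implementation class.
        String out = registry.lookup(TransformationService.class)
                .map(s -> s.convert("hello", "text/plain"))
                .orElse("service missing");
        System.out.println(out); // prints "[text/plain] hello"
    }
}
```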
  43. Plugins and Extension Points
     • Inspired by the Eclipse architecture
     • Eclipse = a core runtime engine + a set of plugins
     • Plugin: the smallest extensible unit to contribute additional functions to the system
     • Extension points: boundaries between plugins
     • A plugin (bundle) can contribute either configuration (pure XML contribution) or code (XML + Java)
  44. Plugins and Extension Points
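The plugin/extension-point pattern can be sketched in a few lines: a host component declares a named extension point together with the code that accepts contributions, and other plugins contribute to that point by name without knowing the host's implementation. The names below (`ExtensionRegistry`, `declareExtensionPoint`, `contribute`, the `actions` point) are hypothetical, and contributions are plain Java maps here, whereas in the pattern described above they would usually be XML descriptors.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of extension points: a host component declares a named
// point, and other plugins contribute to it by name only.
class ExtensionPointSketch {

    static final class ExtensionRegistry {
        private final Map<String, Consumer<Map<String, String>>> points = new HashMap<>();
        // Contributions that arrive before their extension point is declared.
        private final Map<String, List<Map<String, String>>> pending = new HashMap<>();

        void declareExtensionPoint(String point, Consumer<Map<String, String>> handler) {
            points.put(point, handler);
            // Replay earlier contributions, since plugin start order is not guaranteed.
            List<Map<String, String>> queued = pending.remove(point);
            if (queued != null) {
                queued.forEach(handler);
            }
        }

        void contribute(String point, Map<String, String> contribution) {
            Consumer<Map<String, String>> handler = points.get(point);
            if (handler != null) {
                handler.accept(contribution);
            } else {
                pending.computeIfAbsent(point, k -> new ArrayList<>()).add(contribution);
            }
        }
    }

    public static void main(String[] args) {
        ExtensionRegistry registry = new ExtensionRegistry();
        List<String> actions = new ArrayList<>();

        // A UI plugin contributes an action before the host has started...
        registry.contribute("actions", Map.of("id", "exportPdf", "label", "Export as PDF"));
        // ...then the host declares the "actions" point and receives it.
        registry.declareExtensionPoint("actions", c -> actions.add(c.get("id")));
        registry.contribute("actions", Map.of("id", "sendByMail", "label", "Send by mail"));

        System.out.println(actions); // prints "[exportPdf, sendByMail]"
    }
}
```

Queuing early contributions is a design choice of this sketch: it keeps plugins independent of activation order, which is the property that lets an application be assembled just by dropping JARs together.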
  45. Note
     • This “core + extensions” pattern is very common in successful open source projects
       • Linux kernel + drivers (modules)
       • Firefox + plugins
       • Emacs + Emacs Lisp packages
     • It’s a key to enabling an architecture of participation
  46. Event Bus
     • EventHandlers, a.k.a. listeners
       • Synchronous / PostCommit / Asynchronous
       • Easily contributed (Java / script / MDB)
     • Great solution for
       • Gluing together independent components
       • Enforcing business rules (synchronous inline)
       • Pushing/getting data to/from external systems
  47. Event Bus
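The three listener flavours on slide 46 differ mainly in when they run relative to the transaction. A minimal sketch, assuming a single-threaded "transaction" committed or rolled back explicitly: synchronous listeners run inline (so they can enforce business rules by throwing), post-commit listeners run only once the transaction commits, and asynchronous ones are handed to an `ExecutorService` after commit. The `EventBus` class, its methods and the plain-string events are hypothetical; this is not Nuxeo's event service API.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch (not Nuxeo's event service API) of an event bus with
// synchronous, post-commit and asynchronous listeners around an explicit transaction.
class EventBusSketch {

    enum Mode { SYNC, POST_COMMIT, ASYNC }

    static final class EventBus {
        private final Map<Mode, List<Consumer<String>>> listeners = new EnumMap<>(Mode.class);
        private final List<String> uncommitted = new ArrayList<>();
        private final ExecutorService executor = Executors.newSingleThreadExecutor();

        EventBus() {
            for (Mode mode : Mode.values()) {
                listeners.put(mode, new ArrayList<>());
            }
        }

        void addListener(Mode mode, Consumer<String> listener) {
            listeners.get(mode).add(listener);
        }

        // Synchronous listeners run inline: an exception from one of them aborts the caller.
        void fire(String event) {
            for (Consumer<String> l : listeners.get(Mode.SYNC)) {
                l.accept(event);
            }
            uncommitted.add(event);
        }

        // On commit, post-commit listeners run on this thread; async ones are queued.
        void commit() {
            for (String event : uncommitted) {
                for (Consumer<String> l : listeners.get(Mode.POST_COMMIT)) {
                    l.accept(event);
                }
                for (Consumer<String> l : listeners.get(Mode.ASYNC)) {
                    executor.submit(() -> l.accept(event));
                }
            }
            uncommitted.clear();
        }

        // On rollback, events are discarded: only synchronous listeners ever saw them.
        void rollback() {
            uncommitted.clear();
        }

        // Stop accepting async work and wait for queued listeners to finish.
        boolean shutdown() {
            executor.shutdown();
            try {
                return executor.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus();
        List<String> log = new CopyOnWriteArrayList<>(); // written from two threads
        bus.addListener(Mode.SYNC, e -> log.add("sync:" + e));
        bus.addListener(Mode.POST_COMMIT, e -> log.add("postCommit:" + e));
        bus.addListener(Mode.ASYNC, e -> log.add("async:" + e));

        bus.fire("documentCreated");
        bus.commit();
        bus.fire("documentModified");
        bus.rollback();
        bus.shutdown();
        System.out.println(log); // async entry position may vary between runs
    }
}
```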
  48. Process and Community Engagement
  49. Goals
     • Must enable the participation of third-party contributors (partners, community)
     • Must improve synchronization between custom developments and OSS projects
     • Agile development practices (XP, TDD) already used at Nuxeo since about 2001
     • Must complement them with simple, efficient and scalable project management practices
  50. Process: Scrum & Kanban
  51. Community Engagement: PRIM
  52. P = Portal
  53. R = Repository
  54. I = Issue Tracker
  55. M = Mailing List (+ foruM)
  56. “Every successful open source project I know uses PRIM. Every closed source project I know, doesn’t. People wonder how open source projects manage to create high-quality products without managers or accountability. The answer: we’re accountable to our infrastructure. PRIM is the open source secret sauce.”
     Ted Husted
  57. Development Tools
  58. Tools
     • Mercurial (distributed SCM)
     • Maven (dependency management, build, packaging, releasing)
     • Hudson (continuous integration)
     • Jira (bug / task tracking, Scrum iteration backlogs)
  59. TDD and CI
  60. More Tools
     • IDEs (Eclipse mostly)
     • Testing (JUnit, Selenium, WebDriver)
     • Static code analysis (FindBugs, IDEA inspections, Checkstyle, Enerjy)
     • Various profilers and debuggers
  61. Outstanding Issues
     • CI at our level is very resource-intensive (a 10-server farm)
     • It’s hard (read: impossible) to test the UI without a browser
     • On the other hand, “plain” Selenium tests are hard to maintain
     • Some pieces (Maven repository, Selenium testing) are fragile and introduce heisenbugs
  62. Related Work
  63. Open Source, Java-Based
     • Some Java-based open source WCM or E2.0 platforms (XWiki, Jahia...) have developed ad hoc component systems similar to ours
     • Alfresco is an ECM solution with a static architecture based on Spring, which makes it harder to customize and extend
     • Apache Sling is a framework based on JCR and OSGi, but doesn’t come with complete solutions and seems more focused on WCM
  64. Open Source, Scripting-Language-Based
     • Drupal, Joomla, WordPress (PHP) and Plone (Python) are examples of extensible content management platforms based on scripting languages, with large ecosystems
     • Plugins usually rely on callback functions / methods instead of extension points and events
     • Plugins can break due to API changes and the lack of static verification of compatibility
  65. Proprietary
     • Proprietary software is, by definition, harder to study and tear apart
     • But the general view is that big-name vendors’ products (Documentum, Open Text, FileNet) are based on mixes of old technologies patched together after various acquisitions, and are harder for modern developers to evolve and to program against
  66. Perspectives
  67. Ongoing Work
     • Simplifying the developer experience: faster code/test turnaround, simpler web front-end development using the more recent JAX-RS standard or Google Web Toolkit (GWT)
     • Development of business-oriented RESTful APIs to allow high-level interaction with the content, and eventually business application development by non-technical users (cf. Nuxeo Studio)
  68. Nuxeo Studio
  69. Ongoing Work
     • Replication, in both LAN (for scalability and fault tolerance) and WAN (for replication between remote data centers, or between a server and a desktop or mobile client) contexts
     • Social integration (using OpenSocial)
     • Further work on semantic technologies
  70. Ongoing Work
     • Porting the platform to the Cloud, and exposing it as a PaaS service
     • Work on mobile ECM, with clients for platforms such as the iPhone/iPad, Android and BlackBerry operating systems
     • Bringing Nuxeo EP to Java EE 6 and full OSGi compliance
  71. Conclusion
  72. Key Findings
     • The Nuxeo EP architecture fits both the OSS “architecture of participation” vision and our business model and goals
     • The main effort has now moved from the platform to its periphery (extensions, applications, development and operation tools), as enabled by the architecture
     • There is still work to do on some key standards-compliance aspects (OSGi, Java EE 6, CMIS...)
  73. More Information