John Newton, co-founder of Documentum and Alfresco, discusses how open source is changing enterprise content management (ECM). He argues that ECM traditionally costs too much, is difficult to use, and does not scale well. However, open source is now driving innovation in ECM by simplifying use, improving scalability, and standardizing functionality. Alfresco provides an open source ECM platform that addresses these issues through modularity, rules-based automation, and distributed architectures.
2. Agenda
Intro
– John Newton
– Co-founder of Documentum and Alfresco
A Brief History of ECM
Why Open Source ECM
Alfresco Open Source ECM Architecture
Alfresco as a Scalable Enterprise Platform
ECM and Open Source: What’s Next
June 17, 2012 2
3. Alfresco is…
Open – Open source, open standards
Source – Best-of-breed open source components
Enterprise – Enterprise-scale, enterprise infrastructure, enterprise control
Content – Documents, records, XML, web pages, images, rich media, code …
Management – Most experienced team in content management in the world, from Documentum and Interwoven
4. A Brief History of ECM
1985: Image mgmt and first collaboration – Filenet, ViewStar, Lotus
1990: Electronic document mgmt – Documentum, Saros, PCDocs
1995: Web content mgmt – Netscape, Vignette, Interwoven
Late 1990s: Advanced content mgmt concepts – DRM, DAM, Lifecycle Mgmt
2000: Consolidation of ECM; open source web content mgmt – OpenCMS, Mambo, Drupal, Plone; CMS standards
2006: Open source enterprise content mgmt
5. What is Enterprise Content Management?
Collaboration
Records Management
Web Content Management
Electronic Publishing
Document Management
Forms Management
Search
Metadata
Information: Object / File / Data (XML)
Source: AIIM Enterprise Content Management Association
6. Applications of Enterprise Content Mgmt
Web and Portal Content Management
Collaborative Development
On Demand Publishing
Compliance
Records Management
Document Management
Digital Asset Management
Image Management
7. What is Wrong with ECM?
1. ECM costs too much
2. ECM is too hard and too cumbersome
3. ECM doesn’t scale for enterprise requirements
4. ECM is isolated in departmental islands
5. ECM hasn’t changed much
Source: Information Architecture Institute – Jan 2003
http://iainstitute.org/pg/the_problems_with_cms.php#000064
15 years of hard knocks
8. Predictions for ECM
1. ECM will standardize and commoditize, and the business model will change
2. ECM will become simpler, lighter-weight, and much easier to use
3. ECM will deploy new technologies to scale dynamically and serve the enterprise and beyond
4. ECM will decentralize, federate and integrate with the rest of the enterprise
5. Open source will become a powerful force for change in ECM
9. Commoditization of ECM
Applications: Knowledge, Web, Portals, CRM
Web Services, App Server, Portal Server, Business Process Engine
Content Repository: Virtual File System (FTP, CIFS, WebDAV), Full-text Indexes, Metadata & Categories
DBMS, Storage
High Availability: Hot Standby
10. Standardization of Content Mgmt Functionality
Library Services
Content Services
Data Modelling
Search
Business Process and Lifecycle Management
Security and Organizational Structure
Application Integration
11. Standardization Efforts in Content Mgmt
WebDAV – IETF Web-based Distributed Authoring and Versioning
JSR-170 – Java Content Repository (JCR) API
JSR-283 – Next generation of JCR
iECM – AIIM Interoperable Enterprise Content Mgmt
Where is the SQL of Enterprise Content Management?
12. Open Source as a Force in Content Mgmt
Open source is now acceptable in the F1000
ECM is a fast-growing “must have”
ECM vendors are alienating customers & channels
Enterprise software and its business model evolve slowly
Open source evolves faster
“After Linux and MySQL, enterprises are now looking for open source alternatives for the rest of their stack” – Marten Mickos, MySQL
North American ECM Market Revenue (chart): 2003 $1.6, 2004 $1.8, 2005 $2.1, 2006 $2.8, 2007 $3.4, 2008 $3.9
Source: Forrester Research, June 2005
13. Fostering ECM Innovation
Closed model: Customer, Sales, Reception, Product Mgmt, Development, Development Mgmt, Shipping, QA, Support, Accounts, Marketing, Partners
Open model: Customers, Developers (code, bugs), Testers, Bloggers, Media, Support Engineers, Marketers and Partners, connected through the Internet
14. Open Source is Changing Enterprise Software
All categories of enterprise software affected
– OS, DBMS, BI, Test & Build, System Admin, CRM, ERP, CMS
Users sell themselves through try-and-buy
Direct connection between customers and developers
Architecture is guided by the community
The community and developers collaborate to eliminate bugs
Faster propagation, faster innovation
15. Simplifying Enterprise Content Management
Address the paradigms that users know best
1. File System Emulation
2. Rules Engine
3. Out-of-the-box portal integration
– Replace shared file drives with a virtual file system
– Email-like plug-in rules automate manual processing & enhance compliance
– Google-like search, Yahoo-like browsing
– Templates encourage reuse, simplify use and provide web access
– Simple data model supports end-user administration
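The “email-like plug-in rules” idea can be pictured as a folder that carries a list of condition/action pairs, evaluated each time a document is added, much like an inbox filter. The Python sketch below is illustrative only: the `Rule`, `Folder` and `Document` classes and the two example rules are hypothetical and are not Alfresco’s actual rule API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    name: str
    mimetype: str
    tags: List[str] = field(default_factory=list)


@dataclass
class Rule:
    # Hypothetical rule: a predicate on the incoming document plus an action
    description: str
    condition: Callable[[Document], bool]
    action: Callable[[Document], None]


@dataclass
class Folder:
    name: str
    rules: List[Rule] = field(default_factory=list)
    documents: List[Document] = field(default_factory=list)

    def add(self, doc: Document) -> List[str]:
        """Store the document, then fire every matching rule in order, like an inbox filter."""
        self.documents.append(doc)
        fired = []
        for rule in self.rules:
            if rule.condition(doc):
                rule.action(doc)
                fired.append(rule.description)
        return fired


# Hypothetical rules: tag PDFs and flag anything named like a contract for review
inbox = Folder("Contracts Inbox", rules=[
    Rule("tag PDFs", lambda d: d.mimetype == "application/pdf",
         lambda d: d.tags.append("pdf")),
    Rule("flag contracts", lambda d: "contract" in d.name.lower(),
         lambda d: d.tags.append("needs-review")),
])

doc = Document("Supplier-Contract.pdf", "application/pdf")
print(inbox.add(doc))  # ['tag PDFs', 'flag contracts']
print(doc.tags)        # ['pdf', 'needs-review']
```

Attaching rules to folders rather than to application code is what lets end users set up this kind of automation themselves.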
16. Scale Requirements for an Enterprise Platform
Scale in Information
– Complex search, structure & classification of information
Scale in Activity
– Complex information per activity with dynamic views with full object-level security
Scale in People
– Up to 100,000s of readers and writers of gigabytes and terabytes
Scale in Geography
– Sharing of information across continents in real time
17. Strategies for Scaling Information and Activity
Modularity and AOP
Flexible Data Modelling
Object-Relational Mapping and Optimization
Service Oriented Architecture
Federation vs. Centralization
Caching and Clustering
Web Caching
High Availability
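To illustrate the “Caching and Clustering” strategy, the sketch below places a small read-through LRU cache in front of a slower metadata store and invalidates an entry whenever it is written. It runs in a single process only. `MetadataStore` is a hypothetical stand-in for the database. A clustered deployment would also need to replicate or invalidate entries across nodes, which is not shown here.

```python
from collections import OrderedDict
from typing import Dict, Optional


class MetadataStore:
    """Hypothetical stand-in for the database that holds node metadata."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}
        self.reads = 0  # how many times the "database" was actually queried

    def load(self, node_id: str) -> Optional[dict]:
        self.reads += 1
        row = self.rows.get(node_id)
        return dict(row) if row is not None else None

    def save(self, node_id: str, props: dict) -> None:
        self.rows[node_id] = dict(props)


class CachedStore:
    """Read-through LRU cache: hits skip the store, writes invalidate the cached entry."""

    def __init__(self, store: MetadataStore, capacity: int = 2) -> None:
        self.store = store
        self.capacity = capacity
        self.cache: "OrderedDict[str, Optional[dict]]" = OrderedDict()

    def get(self, node_id: str) -> Optional[dict]:
        if node_id in self.cache:
            self.cache.move_to_end(node_id)  # mark as most recently used
            return self.cache[node_id]
        value = self.store.load(node_id)     # miss: read through to the store
        self.cache[node_id] = value          # missing nodes are cached as None too
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used entry
        return value

    def put(self, node_id: str, props: dict) -> None:
        self.store.save(node_id, props)
        self.cache.pop(node_id, None)        # invalidate so the next read is fresh


db = MetadataStore()
cached = CachedStore(db, capacity=2)
cached.put("doc-1", {"title": "Budget"})
cached.get("doc-1")
cached.get("doc-1")                          # served from the cache
print(db.reads)                              # 1
cached.put("doc-1", {"title": "Budget v2"})
print(cached.get("doc-1")["title"], db.reads)  # Budget v2 2
```

Invalidating on write rather than updating the cached copy in place keeps the store as the single source of truth.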
18. New ECM Architectures for Scalability and Adaptability
Enterprise-scale, high-integrity repository
Best-of-breed open source components
Modular, lightweight architecture
Distributed architecture
5X faster
Interfaces: Java App Servers and Portals, Spring Framework, CIFS, FTP, WebDAV & Web Svcs, JSR-170, JSR-168 Portlets
Content Mgmt Repository: Object Persistence, BPM, Control, Distributed Store, Aspect (Module) Interface
Modules: Templating, Workflow, Database, Indexing, High Availability, Security, Storage, Admin, DRM
Components: MySQL, Oracle, PHP, FreeMarker, Lucene, jBPM, OpenDRM, JMX
19. Scaling People and Geography
Federation: R&D, Mktg and Sales each run a repository with its own DBMS and index
Web Caching: virtual workspaces on a WCM repository that publishes to multiple web sites
Clustering: an ECM cluster of repositories sharing a distributed cache over a database cluster
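Federation keeps a separate index per department and merges results at query time (the notes describe this as “aggregate search results from multiple departments”). The sketch below shows one simple way to do that merge. Each hypothetical departmental index returns scored hits, and the federator keeps the global top-k with a heap. Two assumptions apply: scores are comparable across departments (real systems usually have to normalize them first), and the toy term-count scoring stands in for a real full-text engine such as Lucene.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str, str]  # (score, department, document name)


class DepartmentIndex:
    """Hypothetical per-department index with toy term-frequency scoring."""

    def __init__(self, department: str, docs: Dict[str, str]) -> None:
        self.department = department
        self.docs = docs  # document name -> text

    def search(self, query: str) -> List[Hit]:
        terms = query.lower().split()
        hits = []
        for name, text in self.docs.items():
            words = text.lower().split()
            score = float(sum(words.count(t) for t in terms))
            if score > 0:
                hits.append((score, self.department, name))
        return hits


def federated_search(indexes: List[DepartmentIndex], query: str, k: int = 3) -> List[Hit]:
    """Send the query to every department, then keep the k best hits overall."""
    all_hits = (hit for index in indexes for hit in index.search(query))
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[0])


rnd = DepartmentIndex("R&D", {"spec.txt": "pricing engine spec", "design.txt": "engine design"})
mktg = DepartmentIndex("Mktg", {"launch.txt": "pricing pricing pricing plan"})
sales = DepartmentIndex("Sales", {"quote.txt": "pricing quote"})

for score, dept, name in federated_search([rnd, mktg, sales], "pricing engine", k=2):
    print(f"{score:.0f} {dept}/{name}")
# 3 Mktg/launch.txt
# 2 R&D/spec.txt
```

Each department keeps local control of its own index, and only the small top-k result lists cross departmental boundaries.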
20. Gartner Hype Curve for Content Mgmt
Source: Gartner June 2005
21. Future of ECM
Enterprise Content Management should be bigger
– $2.5B ECM vs. $10B RDBMS
Standardization will fuel growth
– JSR-170, JSR-283, iECM -> SQL for Content
Commoditization will drive global adoption
– China, India, Russia, Latin America
Innovation will drive scalability and simplicity
Open Source will drive innovation
– Next generation: Wikis, Blogs, XML Composition
– New distributed models: RSS, Web Services, Message based
22. Summary
The time is ripe for open source in enterprise content management
Open source brings the community into the development, support and service process
Open source changes the sales and price dynamics of the industry
Open source brings innovation back into the industry
23. For More Information
See us at Booth #1162
Try Alfresco
– http://www.alfresco.com
Case Studies
– http://www.alfresco.com/customers
Downloads
– http://www.alfresco.org/downloads
john.newton@alfresco.com
Editor's Notes
Mid to late 1980s: image management and first collaboration – Filenet, ViewStar, Lotus
Early 1990s: electronic document management – Documentum (EMC), Saros (Filenet), PCDocs (Hummingbird)
Mid-1990s: initial web content management – Netscape, Vignette, Interwoven
Late 1990s: advanced content management concepts – Digital Rights Mgmt, Digital Asset Mgmt, Lifecycle Mgmt
Early 2000s: consolidation of “enterprise content mgmt” and open source web content mgmt – OpenCMS, Mambo, Drupal, Plone
Mid-2000s: open source enterprise content mgmt
Web and Portal Content Management – Corporate web sites, Extranets
Collaborative Development – Sales, Product Development, Issue Resolution
On Demand Publishing – Marketing Materials, Journals, Annual Reports
Compliance – Standard Operating Procedures, Regulatory Submissions
Records Management – Email Archive, Invoices, Certificates, Public Documents
Document Management – Specifications, Proposals, Contracts, Manuals
Digital Asset Management – Photos, Videos, GIS, Engineering
Image Management – Accounts Receivable, Contracts, Citations
A content repository is a server or a set of services used to store, search, access and control content. The content repository provides these services to specialist content applications such as document management, web content management systems, image storage and retrieval systems, records management, or other applications that require the storage and retrieval of large amounts of content. The repository provides content services to these applications, such as content storage or import, content classification, security on content objects, control through content check-in and check-out, and content query services. Major vendors of content repositories include Documentum, IBM, Filenet, Interwoven, Vignette, OpenText and Microsoft, through its SharePoint system. Content repositories have been around for at least 15 years and are generally built on relational databases. Metadata about the content, such as descriptive information, process information, security, classification and relationships, is stored in the database for quick and flexible retrieval and to simplify the many ways in which this information might be used. The actual content stored in these repositories can be as simple as HTML and pictures for a website, but in an enterprise it is more often office documents, scanned images, XML and streaming media. Large content such as complex office documents, images or rich media may be stored in the database as binary large objects, or in files, which simplifies storage management, the streaming of rich media and content transformations. Databases provide the transaction control and recoverability required when adding, updating or deleting this information. What distinguishes content management from other typical database applications is the level of control exercised over individual content objects and the ability to search content.
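The “control through content check-in and check-out” described above can be sketched as an exclusive lock combined with an append-only version history. This in-memory Python sketch is illustrative only. The `ContentObject` class, its method names and the integer version numbers are assumptions, not any vendor’s API. A real repository would persist this state transactionally in its database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class CheckoutError(Exception):
    pass


@dataclass
class ContentObject:
    # Hypothetical content object: version history plus a single-writer lock
    name: str
    versions: List[bytes] = field(default_factory=list)  # oldest first
    locked_by: Optional[str] = None                       # user holding the checkout

    def check_out(self, user: str) -> bytes:
        """Lock the object for one user and return the latest content to edit."""
        if self.locked_by is not None and self.locked_by != user:
            raise CheckoutError(f"{self.name} is checked out by {self.locked_by}")
        self.locked_by = user
        return self.versions[-1] if self.versions else b""

    def check_in(self, user: str, content: bytes) -> int:
        """Append a new version, release the lock and return the new version number."""
        if self.locked_by != user:
            raise CheckoutError(f"{user} does not hold the lock on {self.name}")
        self.versions.append(content)
        self.locked_by = None
        return len(self.versions)


doc = ContentObject("contract.docx", versions=[b"draft"])
doc.check_out("alice")
try:
    doc.check_out("bob")  # a second writer is refused while alice holds the lock
except CheckoutError as err:
    print(err)  # contract.docx is checked out by alice
print(doc.check_in("alice", b"signed"))  # 2
```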
Access to these services requires wrapping the calls in security to prevent unauthorized access to, or changes to, content or its metadata. The finer granularity of this security, and its complex relationship to other objects such as people and folders, requires a more sophisticated mechanism than SQL security provides. The hierarchical nature of how content is used, found, controlled and accessed, such as working in folder structures and hierarchical classifications, requires a different type of API than even SQL:2003 currently provides. The paradigm for content search introduced by internet search engines such as Google means that the highly structured search of the SQL SELECT statement is not adequate for end-user requirements, so most repositories add an external full-text search engine to supplement database queries of structured metadata. Because these services have such complex requirements, the business logic of a content repository can be as large as, or larger than, the database itself. Almost all content repository vendors provide proprietary service interfaces and APIs to encapsulate the breadth of functionality required. Despite attempts over the last 10 years to standardize these APIs, progress has been made only in the last two years. In 2005, the Java community adopted the JSR-170 standard interface, although only a couple of the major vendors have adopted it. The AIIM iECM effort, which has widespread participation from all the major vendors, has only just begun.
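The point that content security is finer-grained than SQL security, and is tied to folders and people, can be illustrated with permissions inherited down a folder hierarchy unless a node sets its own. In this Python sketch, the `Node` class, the permission names and the “nearest explicit ACL wins” rule are assumptions chosen for illustration. Real repositories differ in how they combine inherited and local entries.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Node:
    # Hypothetical node in a folder tree
    name: str
    parent: Optional["Node"] = None
    # Explicit ACL (user -> permissions); None means "inherit from the parent folder"
    acl: Optional[Dict[str, Set[str]]] = None

    def effective_acl(self) -> Dict[str, Set[str]]:
        """Walk up the folder chain until a node with an explicit ACL is found."""
        node: Optional[Node] = self
        while node is not None:
            if node.acl is not None:
                return node.acl
            node = node.parent
        return {}  # no ACL anywhere on the path: deny everything


def check(user: str, permission: str, node: Node) -> bool:
    """The kind of check every repository service call would be wrapped in."""
    return permission in node.effective_acl().get(user, set())


root = Node("Company Home", acl={"alice": {"read", "write"}, "bob": {"read"}})
hr = Node("HR", parent=root, acl={"alice": {"read"}})  # HR overrides the root ACL
policy = Node("policy.pdf", parent=hr)                 # inherits from HR
memo = Node("memo.txt", parent=root)                   # inherits from Company Home

print(check("bob", "read", memo))       # True
print(check("bob", "read", policy))     # False
print(check("alice", "write", policy))  # False
```

Resolving permissions through the folder path, rather than storing a row per user per object, is one reason this logic tends to live in the repository’s service layer instead of in SQL grants.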
Library Services – Locking, Versioning, Metadata, Classification
Content Services – Transformation, Streaming, Translation, Publishing
Data Modelling – Data Management by Non-experts
Search
Business Process and Lifecycle Management
Security and Organizational Structure
Application Integration – Authoring, Publishing, Capture, BPM, CRM, ERP
WebDAV – Web-based Distributed Authoring and Versioning
– IETF standards for locking, browsing and authoring
– Extensions for version control, security and searching
– Most vendors support basic capabilities
JSR-170 – Java Content Repository (JCR) API
– Java API for accessing any repository
– An object-database model independent of data model
– Most implementations are open source, few commercial
JSR-283 – Next generation of JCR
– A lot of vendor participation
– Covers a lot of weaknesses of JSR-170
iECM – Interoperable Enterprise Content Mgmt
– Active participation of Documentum, IBM, Filenet, Microsoft
– Currently moving from use cases to specification
Where is the SQL of Enterprise Content Management?
Open source is now acceptable in the F1000
– 73% of the F1000 are using or will use open source
– Costs of distribution and support are dropping dramatically
ECM is recognized as a “must have”, but expensive
– Regulation and information explosion
– Fastest-growing category of enterprise software
ECM vendors are alienating customers & channels
– Enterprise software model is broken
– Pressure on customers to upgrade
– Vendors are taking margin away from partners by selling their own services
New technologies available via open source
– Open source provides better infrastructure for free
A unique window of opportunity
– Next in the stack after OS (Linux), DBMS (MySQL) and App Server (JBoss)
All categories of enterprise software affected
– OS, DBMS, BI, Test & Build, System Admin, CRM, ERP, CMS
Users sell themselves through try-and-buy
– Faster, less risky adoption
– Lower costs by reducing unnecessary sales and marketing
Direct connection between customers and developers
– Product management is done by users
Architecture is guided by the community
– Release often, constant peer review
The community and developers collaborate to eliminate bugs
– No more “send me a dump of your data”
Faster propagation, faster innovation
Ease of Use
Use Editor, Application, Portal of Choice
Use Search of Choice: Google
CIFS, FTP, WebDAV
Secure
Developer Productivity
One Model for all Clients
ECM: Document Lifecycle, Version, Audit, Compliance
User Independent of Client
No Way to Bypass
Scale in Information
– Complex search, structure & classification of information
– Change control and dependency can tax RDBMS
Scale in Activity
– Complex information per activity with dynamic views
– Updates with full object-level security
Scale in People
– Writing in gigabytes and terabytes
– Up to 100,000s of readers
Scale in Geography
– Sharing of information across continents
– Collaboration must be real-time
Customization Model
– JSR-168 Portal
– JSR-170 API
– Web Services API
– BPEL
– JavaServer Faces
– XML
Complement the Client/Application
Local management – within global framework
– Departmental server controlled by users
– Scale to department requirements
– Avoids mess of consolidating entire enterprise
Global discovery – with federated search
– Lucene developed by architect of Excite
– Aggregate search results from multiple departments
– Transaction control with MySQL and Hibernate
– Hierarchical classification search
– Google and XPath syntaxes
Alfresco Enterprise Advantage
– Global standardization
– Local control, with global access
– Consistent search paradigm
– Multiple language support
Transactions are handled by aspects
– Uses JTA for transaction coordination
– Uses Hibernate binding
– Interfaces to Lucene and File Store
– Works with / without application server
All services participate in transactions
– Nodes, Content, Search, Dictionary, Versions...
– Repository is completely transaction safe
– Can roll back any operation
Clustering achieved through XAct Cache
– Distributed EHCache provides replication of cache
– Coordinated distributed control
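The notes state that transactions are “handled by aspects” and that nodes, content and search all participate, so any operation can be rolled back. The Python sketch below imitates that idea by analogy. A decorator plays the role of the transactional aspect around a service method: it snapshots a hypothetical node store and search index, then restores both if the method raises. This is only an analogy. The notes describe JTA coordination with Hibernate and Lucene, and snapshot-and-restore is a deliberate simplification that stands in for a real transaction manager.

```python
import copy
import functools
from typing import Callable, Dict, Set


class Repository:
    """Hypothetical repository whose node store and search index must stay consistent."""

    def __init__(self) -> None:
        self.nodes: Dict[str, dict] = {}
        self.index: Dict[str, Set[str]] = {}  # term -> node ids


def transactional(method: Callable) -> Callable:
    """Aspect-style wrapper: snapshot all state, run the method, restore on failure."""
    @functools.wraps(method)
    def wrapper(repo: Repository, *args, **kwargs):
        saved_nodes = copy.deepcopy(repo.nodes)
        saved_index = copy.deepcopy(repo.index)
        try:
            return method(repo, *args, **kwargs)
        except Exception:
            # Roll back both participants together, then re-raise to the caller
            repo.nodes, repo.index = saved_nodes, saved_index
            raise
    return wrapper


@transactional
def create_document(repo: Repository, node_id: str, title: str) -> None:
    repo.nodes[node_id] = {"title": title}   # step 1: write the node
    for term in title.lower().split():       # step 2: update the search index
        if not term.isalnum():
            raise ValueError(f"cannot index term {term!r}")
        repo.index.setdefault(term, set()).add(node_id)


repo = Repository()
create_document(repo, "n1", "Quarterly Report")
try:
    create_document(repo, "n2", "Bad ###")   # fails halfway through indexing
except ValueError:
    pass
print(sorted(repo.nodes))            # ['n1']
print(sorted(repo.index["report"]))  # ['n1']
print("bad" in repo.index)           # False
```

Keeping the transaction logic in the decorator means service methods such as `create_document` stay free of commit and rollback code. The deep copy makes each call cost proportional to the repository size, which is acceptable for an illustration but not for production.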