Gilbane 2009 -- How Can Content Management Software Keep Pace?

The amount of data stored is growing at a phenomenal rate. This paper documents the growth and suggests that a new standard, CMIS, may be useful in getting better control over data and data repositories.

  1. How Can Content Management Software Keep Pace?
     San Francisco Gilbane Conference 2009
     Content Integration Strategies
     Dick Weisinger
     June 4, 2009
  2. Dick Weisinger
     • Vice President and Chief Technologist, Formtek, Inc.
     • 20+ years of experience in Content, Document and Image Management
     • Regular blogger at http://www.formtek.com/blog
  3. Formtek
     • An ECM software and services company
       – 25-year history
     • Experts in general ECM and CM space
     • Depth of experience in engineering data management
     • Formtek Orion ECM Software
     • Alfresco Gold Integration Partner
  4. Drowning in Digital Data
     • Hand-held devices
     • High-resolution video
     • High-End Video Games
     • High-Resolution Graphics and Images
     • Scientific Data
     • E-Discovery / Records Management
     • Digitized Business Data
     • Financial and Health Records
     • Business Continuity Backups
     Analysts at Gartner Group, Forrester Research, IDC and The 451 Group all predict massive growth in digital data.
  5. Size of the Digital Universe
     • 2003 – 20 exabytes
     • 2006 – 161 exabytes
     • 2007 – 281 exabytes
     • 2008 – 486 exabytes
     • 2010 – 988 exabytes of data
     • 2011 – 1800 exabytes of data
     • 2012 – 2500 exabytes of data
     (30% of data is created by enterprises)
     Source: IDC
     One exabyte = 1 billion gigabytes, or 1000 petabytes (about 250 million DVDs).
     161 exabytes is the equivalent of 12 stacks of books, each extending 93 million miles from the Earth to the Sun.
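
The growth implied by the IDC figures on this slide can be worked out as a compound annual rate. The sketch below is illustrative only: the function names are made up for this example, and the table simply copies the slide's numbers, which in a 2009 talk are forecasts for 2010 onward.

```python
# Digital-universe sizes in exabytes, copied from the slide (source: IDC).
SIZES_EB = {2003: 20, 2006: 161, 2007: 281, 2008: 486,
            2010: 988, 2011: 1800, 2012: 2500}


def cagr(start_year, end_year, sizes=SIZES_EB):
    """Compound annual growth rate between two years present in the table."""
    years = end_year - start_year
    return (sizes[end_year] / sizes[start_year]) ** (1 / years) - 1


def exabytes_to_gigabytes(eb):
    """One exabyte is one billion gigabytes (decimal units, as on the slide)."""
    return eb * 1e9


if __name__ == "__main__":
    print(f"2003-2012 growth per year: {cagr(2003, 2012):.0%}")
    print(f"486 EB in gigabytes: {exabytes_to_gigabytes(486):.3g}")
```

Over the full 2003 to 2012 span the slide's figures work out to a growth rate of roughly 70 percent per year.
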
  6. Data in Business and Science
     • Walmart adds a billion rows of data to its 600 terabyte database every hour
     • Chevron’s gas and oil exploration collects 2 terabytes of data daily
     • Large Hadron Collider in Switzerland to collect 300 exabytes per year
     • Department of Energy has increased its data by a factor of 10 every four years since 1990
  7. Hardware’s Shrinking Cost
     Year    Cost/MB
     1986    $51.30
     1991    $13.00
     1994    $1.00
     1997    $0.09
     2000    $0.07
     2003    $0.02
     2009    $0.0002
     Storage costs are plummeting, but not as fast as the amount of data is growing.
     Cheap storage costs also encourage applications to store ever more data.
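
To make the table concrete, the per-megabyte prices can be scaled up to a terabyte. This is a minimal sketch: the helper names are hypothetical, and the use of decimal units (one million megabytes per terabyte) is an assumption, since the slide does not say which convention it uses.

```python
# Cost per megabyte in US dollars, copied from the slide's table.
COST_PER_MB = {1986: 51.30, 1991: 13.00, 1994: 1.00, 1997: 0.09,
               2000: 0.07, 2003: 0.02, 2009: 0.0002}

MB_PER_TB = 1_000_000  # decimal units; an assumption, not stated on the slide


def cost_per_terabyte(year):
    """Dollar cost of one terabyte at the given year's per-megabyte price."""
    return COST_PER_MB[year] * MB_PER_TB


def price_drop_factor(start_year, end_year):
    """How many times cheaper a megabyte is in end_year than in start_year."""
    return COST_PER_MB[start_year] / COST_PER_MB[end_year]


if __name__ == "__main__":
    for year in sorted(COST_PER_MB):
        print(year, f"${cost_per_terabyte(year):,.0f} per TB")
```

At the slide's prices a terabyte falls from about $51 million in 1986 to about $200 in 2009.
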
  8. Can Software Keep Pace? How Can We Find Anything?
     • Search algorithms have evolved and improved, but…
     • Internet Search is only Fair to Good
       – Google Page-Rank
         • 8+ billion web pages, hundreds of thousands of servers
     • Enterprise Search is Poor
       – Usage patterns are hard to model
  9. The Problem of Search
     • 49 percent of business users say that finding data is difficult and time consuming.
       -- AIIM 2008 Market Study
     • Users have a 50 percent success rate at search
       -- Recommind Survey March 2009
  10. Scattered Data Repositories
      • Corporate Applications
        – ERP
        – PLM/PDM
        – Business Intelligence / Knowledge Management
        – Content and Document Management
      • Relational Databases
      • Local and Shared File Systems
      • Internet/Intranet HTTP servers
      • Email Servers
      • Disk Appliances (digital cameras, cell phone…)
  11. Multiple Repository Challenge
      Problem
      • How to access and search data to achieve:
        – Compliance
        – eDiscovery
        – Business Intelligence
      Challenge
      • Many organizations have multiple repositories from multiple vendors
      • Lack of standards around API and query language
      • Each system is different and has very little common reuse
  12. Unstructured Data Search is Hard
      • 80 percent of enterprise data is unstructured
        – E.g., emails, PDF, Word and Office docs
      • No underlying data model or schema
        – Emails and IM often lack context and use shorthand and abbreviations that increase the search challenge
  13. Huge Data Sets Bring Huge Problems
      • Search gets harder as data sets grow
        – Longer to index and search
        – Harder to determine context
      • The more systems, the harder to secure
      • The more systems, the harder to consolidate search
      • Conflicting or Inconsistent Data
        – Which is the system of reference?
  14. Getting Data Under Control
      • Ultimate goal: Content Intelligence
        – Knowledge extraction
        – Ability to distill, condense and summarize data
      How?
      • Apply more Structure and Reuse
        – XML Tags
      • Allow greater access across data sources
        – Consolidation of Systems
        – Integration of Systems
  15. Creating Structure
      Semi-Structured Data
      • Use a structured native data format
        – XML Authoring/Publishing applications
          • DITA publishing XML
        – Microsoft Office 2007 docx, etc. (Office Open XML)
          • Complex: 29 namespaces and 89 schema models
      • Add Structure
        – Append Headers and Embedded Properties
          • E.g., TIFF and JPEG images
          • PDF and embedded Microsoft Office files
      • Associate tags and metadata with unstructured data
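
One way to picture "associate tags and metadata with unstructured data" is a small XML record kept alongside each file. The sketch below uses Python's standard `xml.etree.ElementTree`; the element names (`document`, `property`) and both helper functions are hypothetical choices for illustration, not part of any standard named in the talk.

```python
import xml.etree.ElementTree as ET


def tag_document(path, properties):
    """Build a small XML metadata record describing an unstructured file."""
    record = ET.Element("document", attrib={"href": path})
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(record, "property", attrib={"name": name})
        prop.text = str(value)
    return ET.tostring(record, encoding="unicode")


def find_by_property(records, name, value):
    """Return the href of every record whose named property equals value."""
    hits = []
    for xml_text in records:
        root = ET.fromstring(xml_text)
        for prop in root.findall("property"):
            if prop.get("name") == name and prop.text == value:
                hits.append(root.get("href"))
    return hits


if __name__ == "__main__":
    records = [
        tag_document("scan001.tif", {"author": "jdoe", "type": "invoice"}),
        tag_document("memo.pdf", {"author": "asmith", "type": "memo"}),
    ]
    print(find_by_property(records, "type", "invoice"))
```

Once the properties exist, a search by author or document type becomes a structured lookup rather than a full-text guess.
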
  16. Centralized Repository Efficiency
      • Management efficiencies of scale
      • More efficient search
        – No need to consolidate search results
      • Available to users via a single interface
  17. Integration of Repositories
      • Content-Intelligence Platforms can integrate/unite multiple repositories
      • XML is the pipeline for integration
      • Integration via APIs or XML Web services
        – REST Web Services have momentum
        – Integration with SOA
  18. CMIS -- ECM Integration
      • ECM vendors have united to create a new interoperability standard: Content Management Interoperability Services (CMIS)
        – Web services for sharing information between different content repositories
        – “SQL for Document Management”
  19. What is CMIS?
      • Content Management Interoperability Services
        – Defines a lowest-common-denominator CM capability set
        – CM content is accessed as SOAP or AtomPub (REST) web services
        – A single application works identically with content from any CMIS vendor
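
Because the AtomPub binding returns Atom feeds, a client can read a listing with an ordinary XML parser. The sketch below is an assumption-laden illustration: only the Atom namespace URI is real, the sample feed is made up, and an actual CMIS feed carries additional CMIS-specific elements that are not shown here.

```python
import xml.etree.ElementTree as ET

# Namespace of the Atom Syndication Format, in ElementTree's {uri}tag notation.
ATOM = "{http://www.w3.org/2005/Atom}"

# Made-up sample standing in for a repository's folder listing.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Folder children (made-up sample)</title>
  <entry><title>contract.pdf</title><id>urn:example:doc-1</id></entry>
  <entry><title>drawing.tif</title><id>urn:example:doc-2</id></entry>
</feed>"""


def entry_titles(feed_xml):
    """Return the title of every Atom entry in the feed, in document order."""
    root = ET.fromstring(feed_xml)
    return [entry.findtext(ATOM + "title") for entry in root.findall(ATOM + "entry")]


if __name__ == "__main__":
    for title in entry_titles(SAMPLE_FEED):
        print(title)
```

The same function would work against any repository that returns Atom, which is the practical meaning of a single application working with content from any vendor.
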
  20. CMIS Timeline
      • 1993 – ODMA (Open Document Management API)
      • 1996 – DMA (AIIM Document Management Alliance)
      • 1996 – WebDAV (Web-based Distributed Authoring and Versioning)
      • 2002 – JSR-170 / Java Content Repository (Day Software)
      • 2005 – iECM (AIIM Interoperable ECM)
      • October 2006 – CMIS started
      • August 2008 – Contributing members invited
      • September 2008 – Draft Specification submitted to OASIS
      • Possible completion and acceptance in late 2009 or early 2010
  21. JCR versus CMIS
      JCR                                   CMIS
      Session-based API                     Services Based
      Java Only                             Language Agnostic
      “Complete” ECM Infrastructure         Core ECM functions Interoperability
      Targets DM, RM, DAM, WCM…             Intended specifically for DM
      Complex                               Simple
      Prescriptive                          Little or No Change
      Connectors by Day                     Vendor Connectors
      Version 2.0                           Version .61
      Design spearheaded by Day Software    Design Led by Top Tier ECM Vendors
  22. CMIS: Creators and Participants
      • Founding Companies for the Original Standard
        – EMC/Documentum
        – IBM/FileNet
        – Microsoft
      • Contributing Members (after August 7, 2008)
        – Alfresco
        – Open Text
        – Oracle
        – SAP
        – More …
  23. CMIS – The Model
      • Documents
        – E.g., Office document or image
        – Content, Metadata and Version History
      • Folders
        – Defines Organization and Hierarchy
        – Container, Metadata and Hierarchy/Organization
      • Object Links and Relations
        – Reference between two folders or documents
        – Requires a source and target
      • Policies
        – Set of rules that can be applied to control other objects, e.g., ACLs or retention policy
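
The four object kinds on this slide map naturally onto a handful of record types. The following is a rough sketch of that shape using Python dataclasses; every class and field name is an illustrative assumption and none is taken from the CMIS specification itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Policy:
    """A rule applied to other objects, e.g. an ACL or a retention policy."""
    name: str
    rule: str


@dataclass
class RepositoryObject:
    """Fields shared by documents and folders in this sketch."""
    object_id: str
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    policies: List[Policy] = field(default_factory=list)


@dataclass
class Document(RepositoryObject):
    """Content plus metadata and a version history."""
    content: bytes = b""
    versions: List[str] = field(default_factory=list)


@dataclass
class Folder(RepositoryObject):
    """A container that gives the repository its hierarchy."""
    children: List[RepositoryObject] = field(default_factory=list)


@dataclass
class Relationship:
    """A reference between two objects; both ends are required."""
    source: RepositoryObject
    target: RepositoryObject


if __name__ == "__main__":
    spec = Document("d1", "spec.docx", versions=["1.0", "1.1"])
    spec.policies.append(Policy("retention", "keep 7 years"))
    projects = Folder("f1", "Projects", children=[spec])
    link = Relationship(source=projects, target=spec)
    print(link.source.name, "->", link.target.name)
```

Making `source` and `target` required constructor arguments mirrors the slide's point that a relation needs both ends.
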
  24. Benefits of CMIS
      • Standardized Core ECM functions
      • Enables Interoperability between repositories
      • Encourages Flexible Application Development
      • Encourages ‘mash-up’ composite applications
      • A single application can consolidate and aggregate content from multiple CMIS repositories
      • Business Processes/Workflow can span and touch all enterprise content
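
The idea of one application aggregating content from several repositories can be sketched as a loop over objects that share a search interface. Everything here is hypothetical: `Repository`, `InMemoryRepository` and `federated_search` are stand-ins invented for this example, with an in-memory list of titles playing the role of a real CMIS connection.

```python
from typing import Iterable, List, Protocol, Tuple


class Repository(Protocol):
    """The common interface each repository is assumed to expose."""
    name: str

    def search(self, term: str) -> Iterable[str]: ...


class InMemoryRepository:
    """Stand-in for a real repository; holds document titles in a list."""

    def __init__(self, name: str, titles: Iterable[str]) -> None:
        self.name = name
        self._titles = list(titles)

    def search(self, term: str) -> List[str]:
        needle = term.lower()
        return [title for title in self._titles if needle in title.lower()]


def federated_search(repos: Iterable[Repository], term: str) -> List[Tuple[str, str]]:
    """Query every repository and merge the hits into one list sorted by title."""
    results = []
    for repo in repos:
        for title in repo.search(term):
            results.append((repo.name, title))
    return sorted(results, key=lambda pair: pair[1].lower())


if __name__ == "__main__":
    repos = [
        InMemoryRepository("ecm-a", ["Pump Drawing", "Invoice 12"]),
        InMemoryRepository("ecm-b", ["Drawing index", "Memo"]),
    ]
    for repo_name, title in federated_search(repos, "drawing"):
        print(repo_name, title)
```

The caller never needs to know which vendor sits behind each repository, only that it answers the same `search` call.
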
  25. CMIS Weak Points
      • Only Basic Content Functions Available
      • Does not cover Admin/Management
      • Does not cover User Authentication
      • Does not handle Security/Authorization
  26. Applications
      • Workflow/Business Processes
        – Connect work packages from any repository
      • Portals and Mash-ups
        – Aggregated Content from multiple sources
      • E-Discovery and Compliance
  27. Summary
      • Massive Growth in Content Creation
      • Advances in hardware technology are fueling content creation and storage
      • Search and Retrieval of content grows in complexity with its volume
      • Content Intelligence is needed to bring understanding to data
      • Standards like XML and CMIS provide consistent classification and handling of data