conceptClassifier For SharePoint Driving Business Value


  • conceptClassifier For SharePoint Driving Business Value

    1. 1. Leveraging Metadata to Improve Findability, Records Management, and Compliance in SharePoint <br />Martin Garland, President<br /><br />
    2. 2. Agenda<br /><ul><li>Who We Are
    3. 3. Challenges in Knowledge Intensive </li></ul>Organizations<br /><ul><li>Metadata Drives
    4. 4. Search
    5. 5. Compliance & Records Management
    6. 6. Privacy and Security Exposures
    7. 7. Collaboration
    8. 8. Manual Metadata Approach will fail
    9. 9. The ROI of Managing Unstructured Content Assets
    10. 10. Benefits of metadata consistency, automated classification and taxonomy management
    11. 11. Technology Suite
    12. 12. Roadmap to SharePoint 2010
    13. 13. Case Study – US Air Force Medical Service</li></li></ul><li>Concept Searching, Inc.<br /><ul><li>Company founded in 2002
    14. 14. Product launched in 2003
    15. 15. Focus on management of structured and unstructured information
    16. 16. Privately held and profitable – no funding
    17. 17. Growth rate of 35% in 2008 and in excess of 100% for 2009
    18. 18. Founders and management team with company since inception
    19. 19. Technology
    20. 20. Automatic concept identification, content tagging, auto-classification, taxonomy management
    21. 21. Only statistical vendor that can extract conceptual metadata
    22. 22. 2009 and 2010 ‘100 Companies that Matter in KM’ (KM World Magazine)
    23. 23. KMWorld ‘Trend Setting Product’ of 2009
    24. 24. Locations: US, UK, & South Africa
    25. 25. Client base: Fortune 500/1000 organizations
    26. 26. Managed Partner under Microsoft global ISV Program - “go to partner” for Microsoft for auto-classification and taxonomy management
    27. 27. Microsoft Enterprise Search ISV, FAST Partner
    28. 28. Product Suite: conceptSearch, conceptTaxonomyManager, conceptClassifier</li></li></ul><li>Challenges in Knowledge Intensive Companies<br /><ul><li>Metadata Generation
    29. 29. Inconsistent, subjective, costly
    30. 30. Inability to harvest and reuse intellectual capital
    31. 31. Lack of organizational memory – most organizations don’t know what they already know
    32. 32. Semantic identification of concepts
    33. 33. Vocabulary Normalization
    34. 34. Inability to share expertise and knowledge across geographic and cultural barriers due to inconsistent nomenclature
    35. 35. Inability to deliver consistency across global boundaries and even within practice areas
    36. 36. Folksonomy dilutes metadata value
    37. 37. Search alone does not deliver ‘findability’
    38. 38. Inability to manage knowledge assets so that relevant information can be found and used
    39. 39. Retrieval of information based on keywords or proximity does not deliver relevance as to what concepts the document contains
    40. 40. Continuously replacing your search engine will not fix search
    41. 41. Inability to control and manage content to improve back end processes such as records management</li></li></ul><li>The Dilemma Between Precision and Recall<br /><ul><li> Two key performance measures for information retrieval
    42. 42. Most information retrieval technologies are less than 22% accurate for both precision and recall
    43. 43. Ideal is to have them balanced
    44. 44. Recall is the retrieval of all items that are relevant to the query
    45. 45. Precision is the retrieval of only those items that are relevant to the query
    46. 46. Higher precision leads to missing items that may be relevant but use a different vocabulary
    47. 47. Higher recall leads to the retrieval of too many items that may be unrelated to the query </li></ul>Automatic Concept Identification has the ability to increase precision with no loss of recall<br />
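The two measures defined on this slide can be made concrete with a small calculation. The following is a minimal sketch, assuming each document is identified by a string ID; the function name `precision_recall` and the sample document IDs are illustrative and do not come from the presentation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical corpus: four documents are truly relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}

# A narrow query returns only relevant items but misses half of them.
narrow = ["d1", "d2"]
# A broad query finds every relevant item but also returns unrelated ones.
broad = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]

print(precision_recall(narrow, relevant))  # (1.0, 0.5)
print(precision_recall(broad, relevant))   # (0.5, 1.0)
```

The narrow query illustrates high precision with lost recall, and the broad query illustrates the reverse, which is the trade-off the slide describes.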
    48. 48. Metadata Consistency Drives Business Agility <br />Findability & Enterprise Content Management<br /><ul><li>Findability every time and at a lower cost per click
    49. 49. Deliver a robust content management approach maximizing SharePoint technologies
    50. 50. Guided navigation, related topics based on concepts</li></ul>Identification of Assets for Data Security, eDiscovery, and Litigation Preparedness<br /><ul><li>Reduced litigation for ediscovery
    51. 51. Costs associated with data breaches</li></ul>Records Management & Compliance <br /><ul><li>Eliminate inconsistent meta-tagging
    52. 52. Preserve record integrity
    53. 53. Lower costs for managing records and information</li></ul>Collaboration and Enterprise 2.0<br /><ul><li>Increase collaboration and productivity through integration of diverse repositories
    54. 54. Improve information sharing and expert identification reducing rework and recreation
    55. 55. Integration with external repositories, web sites delivering a single search</li></ul>interface<br />
    56. 56. Metadata Drives Actionable Search<br /><ul><li>Keyword search captures only 33% of relevant information. Consistent, meaningful metadata ensures all relevant information related to key words will be returned.
    57. 57. Users can’t navigate to information. Taxonomies provide consistent guided navigation for end users to extract relevant information even in external content. Taxonomy navigation is 36%-48% faster and more efficient than lists.
    58. 58. Vocabulary normalization across diverse geographies and cultures causes issues and inhibits sharing of knowledge and expertise due to nomenclature.
    59. 59. Case Study: A Fortune 500 firm realized that search alone did not solve findability issues. It implemented conceptClassifier to secure and manage content in a policy-compliant manner, eliminated end user tagging, gained the ability to rapidly build and deploy taxonomies, and normalized vocabulary across global boundaries.</li></ul>KNOWLEDGE WORKERS CHALLENGES<br />~ 15% of their time is spent duplicating information.<br />~ 25% of their time is spent searching.<br />~ 40% cannot easily find the information they require to do their job.<br />The cost to a 500-employee company is<br />$2.4 million per year in inefficiencies <br />and lost productivity.<br /> Gartner Group<br />
    60. 60. Metadata Ensures Compliance & Records Management<br /><ul><li>Protects the organization by eliminating end user adoption issues
    61. 61. Ensures adoption to any enterprise regulation for external agencies or where compliance is mandatory
    62. 62. Easy Integration with Microsoft Records Center. Ensures the long term usefulness of the records and enforcement of life cycle management.
    63. 63. Automatic assignment of Records Retention codes
    64. 64. Optional updating of Content Type based on the metadata contained within the documents
    65. 65. Case Study: US Air Force Medical Service eliminated all manual metadata tagging and uses conceptClassifier to automatically generate semantic metadata, assign record retention codes based on the metadata within the content, automatically change the Content Type and migrate documents to the RM</li></ul>COMPLIANCE & RECORDS MANAGEMENT CHALLENGES<br />~ The average cost of manually tagging one item is estimated at $4.00 - $7.04<br />~ IDC estimates that only 50% of content is correctly meta-tagged <br />~ It costs an organization $180 per document to recreate it when it is not tagged correctly and cannot be found<br />~ Poor information quality costs organizations 10% to 20% of operating revenues<br />
    66. 66. Metadata Helps Avoid Data Privacy & Security Issues<br /><ul><li> conceptClassifier for SharePoint ensures compliance by automatically identifying Personally Identifiable Information (PII), Protected Health Information (PHI) or any metadata that is considered by the organization to be confidential.
    67. 67. Migrates content to a secure location where Windows Rights Management Services is applied to the file in the new location
    68. 68. Optionally can change the Content Type during the classification process
    69. 69. The taxonomy standardizes the process of identifying all possible privacy data exposures at the time of content creation and modification (digital and handwritten).
    70. 70. Case Study: conceptClassifier for SharePoint extracted 2,000+ documents with sensitive information, from a redacted sample pilot data set. By human error, these 2,000 records contained real social security numbers and real employee information. These documents were identified in the proof of concept to the client in front of executive management.</li></ul>DATA BREACHES & EXPOSURES CHALLENGES<br />~ Average cost of a data breach is $6.3 million and ranges from $225K to $35 million.<br />~ Average cost per exposed record is $197 and ranges from $90-$305 per record.<br />~ 70% of breaches were due to a mistake or malicious intent by an organization’s own staff.<br />~ Healthcare provider - $7 million, TJX Companies - $256 million, ValueClick - $2.9 million.<br />
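As a rough illustration of the kind of check this slide describes, the sketch below flags documents that contain strings shaped like US social security numbers and estimates an exposure cost from the per-record average quoted above. It is a simplified assumption for illustration only: the pattern, the function names, and the sample documents are hypothetical, and the presentation does not describe how conceptClassifier itself detects PII.

```python
import re

# Assumption: a plain NNN-NN-NNNN pattern stands in for real PII detection.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Average cost per exposed record in USD, as quoted on the slide.
COST_PER_EXPOSED_RECORD = 197


def find_ssn_like(text):
    """Return every SSN-shaped string found in the text."""
    return SSN_PATTERN.findall(text)


def flag_documents(docs):
    """Return the sorted names of documents containing SSN-shaped strings.

    docs maps a document name to its text content.
    """
    return sorted(name for name, text in docs.items() if find_ssn_like(text))


def estimated_exposure_cost(record_count, cost_per_record=COST_PER_EXPOSED_RECORD):
    """Multiply the number of exposed records by the average cost per record."""
    return record_count * cost_per_record


# Hypothetical sample documents.
docs = {
    "hr_note.txt": "Employee SSN 123-45-6789 on file.",
    "agenda.txt": "Meeting at 10:30, call 555-0100.",
}

print(flag_documents(docs))           # ['hr_note.txt']
print(estimated_exposure_cost(2000))  # 394000
```

At the quoted average of $197 per record, 2,000 exposed records would correspond to roughly $394,000, which gives a sense of scale for the case study on this slide.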
    71. 71. Metadata Drives Collaboration<br /><ul><li>Ability to add structure to chaos
    72. 72. Generate weighted results from diverse repositories such as human resources records, time and billing, project documentation, content authorship, team structures, user profiles
    73. 73. Results are generated based on the most experienced and knowledgeable individual for that specific topic or skill set aggregated from diverse repositories
    74. 74. Single logical view for expertise search across diverse information stores
    75. 75. Removes geographical boundaries through vocabulary normalization
    76. 76. Case Study: Professional Services firm with over 39K employees across 36 countries uses conceptClassifier for expert identification to identify and utilize in-house consultants for projects as opposed to outsourcing – increasing utilization of staff by 5% to 10%</li></ul>Collaboration and Enterprise 2.0<br />~ Nearly 80% of executives believe collaboration is important but needs to be managed<br />~Email storage costs $500GB per year – a Fortune 100 manufacturing company saved $2.6 million per year by implementing collaboration solutions<br />~ Up to 90% of content from premium paid publication database services is available for free on-line<br />~ Only 25% of executives describe their organization as effective at sharing knowledge across boundaries<br />
    77. 77. A Manual Metadata Approach Will Fail 95% Of The Time<br />
    78. 78. An Automated Metadata Approach Drives Business Value<br /><ul><li>Create enterprise metadata framework/model
    79. 79. Average return on investment minimum of 38% and runs as high as 600% (IDC)
    80. 80. Apply consistent meaningful metadata to enterprise content
    81. 81. Incorrect meta tags cost an organization $2,500 per user per year – in addition to potential costs for non-compliance (IDC)
    82. 82. Guide users to relevant content with taxonomy navigation
    83. 83. Savings of $8,965 per year per user based on an $80K salary (Chen & Dumais)
    84. 84. Use automatic conceptual metadata generation to improve Records Management
    85. 85. Eliminate inconsistent end user tagging at $4-$7 per record (Hoovers)
    86. 86. Improve compliance processes, eliminate potential privacy exposures</li></li></ul><li>Concept Based Metadata Generation<br /><ul><li>Compound Term Processing – the ability to extract ‘concepts in context’
    87. 87. Only statistical metadata generation and classification company that can extract concepts from content as it is created or ingested</li></ul>triple heart bypass<br /><ul><li>conceptClassifier will generate conceptual metadata by extracting multi-word terms, identifying ‘triple heart bypass’ as a concept as opposed to single keywords
    88. 88. Search will return results based on the concept even if the exact terms are not contained in the document (i.e. ‘coronary artery surgery’, ‘heart surgery’)
    89. 89. Metadata can be used by any search engine index or any application/process that uses metadata</li></ul>Triple<br />Baseball<br />Three<br />Heart<br />Organ<br />Center<br />Bypass<br />Highway<br />Avoid<br />
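The contrast between single keywords and a multi-word concept can be sketched with a simple frequency-based n-gram extractor. This is not Concept Searching's compound term processing, whose internals the presentation does not describe; it is a generic stand-in technique, and the stopword list, function names, and sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny stopword list, enough for the sample sentences below.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "after",
             "is", "was", "to", "in", "from"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_terms(tokens, max_len=3):
    """Yield contiguous 2..max_len word n-grams that contain no stopwords."""
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(word in STOPWORDS for word in gram):
                yield " ".join(gram)


def extract_compound_terms(docs, min_count=2, max_len=3):
    """Return multi-word terms that occur at least min_count times overall."""
    counts = Counter()
    for doc in docs:
        counts.update(candidate_terms(tokenize(doc), max_len))
    return {term: count for term, count in counts.items() if count >= min_count}


# Hypothetical sample corpus.
docs = [
    "The patient recovered after a triple heart bypass.",
    "Recovery from triple heart bypass is slow.",
    "A highway bypass was built to avoid the city center.",
]

terms = extract_compound_terms(docs)
print(terms)
```

In this sample, 'triple heart bypass' surfaces as a repeated unit, while 'highway bypass' appears only once and is filtered out, so the word 'bypass' is kept apart by its context. The repeated sub-phrases 'triple heart' and 'heart bypass' also pass the frequency filter; a fuller approach would need to rank or prune them.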
    90. 90. conceptClassifier for SharePoint<br /><ul><li>conceptClassifier for SharePoint
    91. 91. Automatic identification, extraction, and tagging of content with concepts
    92. 92. Intelligent auto-classification based on concepts not keywords
    93. 93. Enterprise class Taxonomy Management, uniquely based on concept identification
    94. 94. Integration
    95. 95. A technology platform that runs natively in SharePoint 2007 and 2010
    96. 96. Microsoft Enterprise Search, FAST ESP
    97. 97. Microsoft Office
    98. 98. Windows Server 2008 R2 FCI
    99. 99. Component – Automatic Conceptual Metadata Generation
    100. 100. Automatically generates and extracts metadata including keywords, acronyms, and multi-word terms that form concepts and convey meaning
    101. 101. Component – Automated Classification
    102. 102. Rules based categorization module
    103. 103. Real-time classification of individual pieces of content aligned to defined business structure(s)
    104. 104. Automatically classifies documents to multiple nodes in multiple taxonomies
    105. 105. Highly scalable, fast real-time classification
    106. 106. Classifier may be called via web services, or by other related applications (e.g., a FAST pipeline stage)
    107. 107. Based upon identified and extracted concepts proven to be more effective than keyword classifiers</li></li></ul><li>conceptClassifier for SharePoint<br /><ul><li>Component – Taxonomy Management
    108. 108. Hierarchical taxonomy structure with ontological features and ability to import standard structures such as OWL and RDF
    109. 109. Rapid taxonomy creation and maintenance
    110. 110. Automatic concept identification extracted from the client’s own content to populate taxonomy(s)
    111. 111. Class clues generated from client’s unique document corpus
    112. 112. Dynamic movement feedback
    113. 113. Maintenance via clue suggestion, integrated search, and instant feedback
    114. 114. Class weighting influenced by parent, child, sibling
    115. 115. AJAX based user interface
    116. 116. Taxonomy structure and classification results held in SQL database
    117. 117. Component - contentTypeUpdater
    118. 118. Based on organizationally defined metadata provides the ability to</li></ul>automatically change the SharePoint Content Type based on the presence<br />of the metadata within documents (if a document contains Protected Health Information (PHI) it will automatically change the Content Type to ‘PHI’)<br /><ul><li>Technology
    119. 119. SOA compliant and delivered as Web Parts
    120. 120. API is based entirely on Web Services and all information is exchanged in XML
    121. 121. Taxonomy formats are based on Web Ontology Language (OWL). Since the server is stateless it also works with all failover and load balancing hardware and software. </li></ul> <br />
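The ideas of class clues, clue weighting, and classification to multiple nodes can be sketched in a few lines. This is a deliberately naive illustration and not the product's scoring method: the `TaxonomyNode` class, the integer weights, the threshold, and the substring matching are all assumptions, and parent, child, and sibling influence on weighting is left out.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """A hypothetical taxonomy node with weighted class clues."""
    name: str
    clues: dict = field(default_factory=dict)  # clue phrase -> integer weight
    threshold: int = 10                        # minimum score to classify


def score(node, text):
    """Sum the weights of every clue that appears in the text.

    Assumption: plain case-insensitive substring matching.
    """
    lowered = text.lower()
    return sum(weight for clue, weight in node.clues.items() if clue in lowered)


def classify(text, nodes):
    """Return (node name, score) for every node that meets its threshold.

    A document may be classified to several nodes; results are sorted
    with the highest score first.
    """
    matches = []
    for node in nodes:
        node_score = score(node, text)
        if node_score >= node.threshold:
            matches.append((node.name, node_score))
    return sorted(matches, key=lambda match: match[1], reverse=True)


# Hypothetical nodes and clues.
nodes = [
    TaxonomyNode("Cardiology",
                 {"triple heart bypass": 10, "coronary artery": 8, "heart": 3}),
    TaxonomyNode("Protected Health Information",
                 {"patient record": 10, "diagnosis": 5}),
    TaxonomyNode("Transport",
                 {"highway bypass": 10, "traffic": 4}),
]

text = "Patient record: diagnosis confirmed, triple heart bypass scheduled."
print(classify(text, nodes))
# [('Protected Health Information', 15), ('Cardiology', 13)]
```

A result such as 'Protected Health Information' is the kind of classification that could then drive a Content Type change, as described for the contentTypeUpdater component, though that step is not modeled here.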
    122. 122. Roadmap from SharePoint 2007 to 2010<br /><ul><li>There is no auto classification of metadata (i.e. no way to auto apply term set values); it is a manual process
    123. 123. There is no way to automatically generate metadata when it is created or ingested
    124. 124. Same problem with end users adding inconsistent metadata
    125. 125. Taxonomy management is a manual process
    126. 126. Taxonomy maintenance requires significant resources to maintain and change as business changes
    127. 127. Enterprise Metadata Management
    128. 128. Properties (current flat lists) become hierarchical “Term Sets”
    129. 129. Term Sets provide capability for faceted search and hierarchical navigation: Regions Country/State, Business Unit/Departments, Band Names/Album Names, TV Show Titles/Characters
    130. 130. conceptClassifier fully supports SharePoint 2010 EMM as the primary location for taxonomy definitions with no need to Import/Export
    131. 131. Changes to the taxonomy structure using Microsoft tools will be immediately visible in conceptClassifier and vice versa</li></li></ul><li>conceptClassifier for FAST Search<br />Delivers all functionality in conceptClassifier for SharePoint including:<br /><ul><li>Improves faceted search results as facets are based on concepts aligned within the taxonomy
    132. 132. Provides taxonomy browse capabilities based on the nodes within the corporate taxonomy(s)
    133. 133. Provides accurate metadata filters such as numeric range searching and wildcard alphanumeric matching
    134. 134. Improves search outcomes by placing conceptual metadata in the FAST Search index to increase relevancy of search results
    135. 135. Enables import of FAST Entities into the conceptClassifier taxonomy manager to fine-tune them with metadata generated from your own content and nomenclature
    136. 136. Runs natively as a FAST Pipeline Stage eliminating integration and customization issues
    137. 137. Eliminates vocabulary normalization issues across global boundaries through controlled vocabularies</li></li></ul><li>Case Study – US Air Force Medical Service<br /><ul><li>US Air Force Medical Service
    138. 138. Initially deployed conceptClassifier to power Knowledge Portal with over 65K users
    139. 139. Controlled vocabulary consists of over 27K unique keywords, metadata, and multi-word fragments generated by conceptClassifier
    140. 140. Have expanded use of technology to provide the following:
    141. 141. Automatically organize data assets prior to migration to SharePoint - identifies duplicates, PII, PHI before migration
    142. 142. Automatically extract metadata and classify to the various taxonomies
    143. 143. Provide Microsoft Enterprise Search and FAST indexes with the conceptual metadata to improve findability
    144. 144. Identifies and locates sensitive information (PII, PHI, Confidential) and migrates content to a new location where Windows Rights Management Services are applied
    145. 145. Automatically tags and classifies content based on semantics contained within the actual document of record and optionally updates the Content Type
    146. 146. Eliminates end user tagging issues
    147. 147. Records are declared in a consistent manner
    148. 148. Only real records are migrated to the RM system
    149. 149. Only real records are managed for disposition
    150. 150. All other content stays in the collaborative portal
    151. 151. Freedom of Information Act (FOIA) compliance
    152. 152. eDiscovery</li></li></ul><li>USAF Human Performance Clearinghouse<br />GOAL: Leverage Existing USAF, AFDW, and AFMS License Agreements to Enable IM, RM, & Privacy & Security Compliance<br />Requirements<br /><ul><li>DoDD 8320 (Data Sharing in a Net-Centric DoD)
    153. 153. DoDD 5015 (Records Management)
    154. 154. USAF Privacy Act Program & HIPAA
    155. 155. Freedom of Information Act (FOIA)</li></ul>Distribution Statement A: Approved for public release; distribution is unlimited. <br />311 ABG/PA No. 09-488, 16 Oct 2009<br />
    156. 156. Screen Shots<br />
    157. 157. Navigation – Auto Complete, Taxonomy & Faceted<br /><ul><li>Microsoft Enterprise Search/FAST ESP can utilize highly relevant compound term metadata
    158. 158. Browsable taxonomy navigation via Concept Searching Web Part
    159. 159. Faceted navigation (integrated with Microsoft CodePlex)</li></li></ul><li>FAST ESP Search<br /><ul><li>Cross taxonomy navigation filter
    160. 160. Taxonomy Browse
    161. 161. Conceptual metadata supplied to FAST search index</li></li></ul><li>Automatic Classification & Metadata Tagging<br /><ul><li>Content is automatically tagged with semantic metadata and uploaded to SharePoint
    162. 162. Content is automatically classified to one or more nodes in one or more taxonomies
    163. 163. Documents are automatically classified to multiple categories
    164. 164. Editable from within SharePoint & the Concept Searching Taxonomy Manager</li></li></ul><li>Full Support for Content Types<br /><ul><li>Eliminates time consuming manual metadata definition
    165. 165. Enforces governance, policies, and drives workflows in line with business processes
    166. 166. Enables different taxonomies to be assigned to different Content Types
    167. 167. Authorized users have complete control over automatically generated metadata</li></li></ul><li>Automatic Update of Content Types/Workflow<br />Event Handler<br />Based on a pre-defined Event Handler, the Content Type can be automatically changed when classified.<br /><ul><li>When organizationally defined metadata is identified within content the Content Type Updater will automatically change the Content Type</li></li></ul><li>Office Integration<br /><ul><li>Fully integrated with Microsoft Office & Exchange
    168. 168. Content automatically tagged with semantic metadata stored in custom properties
    169. 169. Content automatically classified to corporate or departmental taxonomies
    170. 170. Delivers governance at the desktop, improves ECM
    171. 171. Automatic metadata generation or optionally authorized users can change the classification</li></li></ul><li>Aggregates Multiple Content Sources<br /><ul><li>Ability to specify multiple file sources including:
    172. 172. SharePoint
    173. 173. Web Sites
    174. 174. Exchange Public folders
    175. 175. File stores
    176. 176. Can also include RSS feeds
    177. 177. Automatically classify and place semantic metadata in search engine index</li></li></ul><li>Taxonomy & Concept Based Metadata Generation<br /><ul><li>Conceptual metadata automatically generated </li></ul>from the organization’s own content and used as clues to build out the taxonomy<br /><ul><li>Hierarchical view of content
    178. 178. Content will be automatically classified to one or more nodes based on concepts within the content
    179. 179. Reduces time to develop, build, and maintain a taxonomy by as much as 80%
    180. 180. Can import industry standard taxonomies</li></li></ul><li>Automatic Clue Suggestion<br /><ul><li>Manual entry of class clues is available
    181. 181. Suggest Clues for Class
    182. 182. Search the document corpus and identify documents that are about the new node to be included or excluded
    183. 183. Clues can be single words, multi-word terms (concepts) or acronyms </li></li></ul><li>Classification and Automatic Clue Feedback<br /><ul><li>Clicking on a clue will display a document summary and the extract where the clue occurs in the context of the document.
    184. 184. The end user can either keep the clue, remove it, or modify the weighting</li></li></ul><li>Leveraging Metadata to Improve Findability, Records Management, and Compliance in SharePoint <br />Martin Garland, President<br /><br />