conceptClassifier For SharePoint Driving Business Value

© All Rights Reserved


Presentation Transcript

  • Leveraging Metadata to Improve Findability, Records Management, and Compliance in SharePoint
    Martin Garland, President
    marting@conceptsearching.com
  • Agenda
    • Who We Are
    • Challenges in Knowledge Intensive Organizations
    • Metadata Drives
    • Search
    • Compliance & Records Management
    • Privacy and Security Exposures
    • Collaboration
    • Manual Metadata Approach will fail
    • The ROI of Managing Unstructured Content Assets
    • Benefits of metadata consistency, automated classification and taxonomy management
    • Technology Suite
    • Roadmap to SharePoint 2010
    • Case Study – US Air Force Medical Service
  • Concept Searching, Inc.
    • Company founded in 2002
    • Product launched in 2003
    • Focus on management of structured and unstructured information
    • Privately held and profitable – no funding
    • Growth rate of 35% in 2008 and in excess of 100% for 2009
    • Founders and management team with company since inception
    • Technology
    • Automatic concept identification, content tagging, auto-classification, taxonomy management
    • Only statistical vendor that can extract conceptual metadata
    • 2009 and 2010 ‘100 Companies that Matter in KM’ (KM World Magazine)
    • KMWorld ‘Trend Setting Product’ of 2009
    • Locations: US, UK, & South Africa
    • Client base: Fortune 500/1000 organizations
    • Managed Partner under Microsoft global ISV Program - “go to partner” for Microsoft for auto-classification and taxonomy management
    • Microsoft Enterprise Search ISV, FAST Partner
    • Product Suite: conceptSearch, conceptTaxonomyManager, conceptClassifier
  • Challenges in Knowledge Intensive Companies
    • Metadata Generation
    • Inconsistent, subjective, costly
    • Inability to harvest and reuse intellectual capital
    • Lack of organizational memory – most organizations don’t know what they already know
    • Semantic identification of concepts
    • Vocabulary Normalization
    • Inability to share expertise and knowledge across geographic and cultural barriers due to inconsistent nomenclature
    • Inability to deliver consistency across global boundaries and even within practice areas
    • Folksonomy dilutes metadata value
    • Search alone does not deliver ‘findability’
    • Inability to manage knowledge assets so that relevant information can be found and used
    • Retrieval of information based on keywords or proximity does not deliver relevance as to what concepts the document contains
    • Continuously replacing your search engine will not fix search
    • Inability to control and manage content to improve back end processes such as records management
  • The Dilemma Between Precision and Recall
    • Two key performance measures for information retrieval
    • Most information retrieval technologies are less than 22% accurate for both precision and recall
    • Ideal is to have them balanced
    • Recall is the retrieval of all items that are relevant to the query
    • Precision is the retrieval of only those items that are relevant to the query
    • Higher precision leads to missing items that may be relevant but use a different vocabulary
    • Higher recall leads to the retrieval of too many items that may be unrelated to the query
    Automatic Concept Identification has the ability to increase precision with no loss of recall
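The two measures defined above can be computed directly from a result set. This is a minimal sketch using the standard definitions; the document IDs and the helper name are hypothetical and chosen only to illustrate the trade-off.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four documents are truly relevant.
relevant = {"d1", "d2", "d3", "d4"}
# A keyword search finds two of them plus one unrelated document.
keyword_results = {"d1", "d2", "d9"}

p, r = precision_recall(keyword_results, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this example the search misses d3 and d4 (lower recall) and also returns d9 (lower precision), which is the tension the slide describes.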
  • Metadata Consistency Drives Business Agility
    Findability & Enterprise Content Management
    • Findability every time and at a lower cost per click
    • Deliver a robust content management approach maximizing SharePoint technologies
    • Guided navigation, related topics based on concepts
    Identification of Assets for Data Security, eDiscovery, and Litigation Preparedness
    • Reduced litigation for eDiscovery
    • Costs associated with data breaches
    Records Management & Compliance
    • Eliminate inconsistent meta-tagging
    • Preserve record integrity
    • Lower costs for managing records and information
    Collaboration and Enterprise 2.0
    • Increase collaboration and productivity through integration of diverse repositories
    • Improve information sharing and expert identification reducing rework and recreation
    • Integration with external repositories, web sites delivering a single search interface
  • Metadata Drives Actionable Search
    • Keyword search captures only 33% of relevant information. Consistent, meaningful metadata ensures all relevant information related to key words will be returned.
    • Users can’t navigate to information. Taxonomies provide consistent guided navigation for end users to extract relevant information even in external content. Taxonomy navigation is 36%-48% faster and more efficient than lists.
    • Lack of vocabulary normalization across diverse geographies and cultures inhibits sharing of knowledge and expertise because nomenclature is inconsistent.
    • Case Study: Fortune 500 firm realized that search alone did not solve findability issues. Implemented conceptClassifier to secure and manage content in a policy-compliant manner, eliminated end user tagging, delivered the ability to rapidly build and deploy taxonomies, and to normalize vocabulary across global boundaries.
    KNOWLEDGE WORKERS CHALLENGES
    ~ 15% of their time is spent duplicating information.
    ~ 25% of their time is spent searching.
    ~ 40% cannot easily find the information they require to do their job.
    The cost to a 500-employee company is $2.4 million per year in inefficiencies and lost productivity. (Gartner Group)
  • Metadata Ensures Compliance & Records Management
    • Protects the organization by eliminating end user adoption issues
    • Ensures adoption to any enterprise regulation for external agencies or where compliance is mandatory
    • Easy Integration with Microsoft Records Center. Ensures the long term usefulness of the records and enforcement of life cycle management.
    • Automatic assignment of Records Retention codes
    • Optional updating of Content Type based on the metadata contained within the documents
    • Case Study: US Air Force Medical Service eliminated all manual metadata tagging and uses conceptClassifier to automatically generate semantic metadata, assign record retention codes based on the metadata within the content, automatically change the Content Type and migrate documents to the RM
    COMPLIANCE & RECORDS MANAGEMENT CHALLENGES
    ~ The average cost of manually tagging one item is estimated at $4.00 - $7.04
    ~ IDC estimates that only 50% of content is correctly meta-tagged
    ~ It costs an organization $180 per document to recreate it when it is not tagged correctly and cannot be found
    ~ Poor information quality costs organizations 10% to 20% of operating revenues
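The per-item tagging cost quoted above can be turned into a rough budget estimate. The sketch below only multiplies the slide's $4.00 to $7.04 range by a document count; the 100,000-item volume and the function name are illustrative assumptions, not figures from the presentation.

```python
def manual_tagging_cost(num_items, low_per_item=4.00, high_per_item=7.04):
    """Return the (low, high) estimated cost of manually tagging num_items.

    The default per-item costs are the range quoted on the slide.
    """
    return num_items * low_per_item, num_items * high_per_item


# Hypothetical volume, chosen only for illustration.
low, high = manual_tagging_cost(100_000)
print(f"Estimated manual tagging cost: ${low:,.0f} to ${high:,.0f}")
```

The same function can be rerun with an organization's own item counts to see how quickly manual tagging costs scale.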
  • Metadata Helps Avoid Data Privacy & Security Issues
    • conceptClassifier for SharePoint ensures compliance by automatically identifying Personally Identifiable Information (PII), Protected Health Information (PHI) or any metadata that is considered by the organization to be confidential.
    • Migrates content to a secure location where Windows Rights Management Services is applied to the file in the new location
    • Optionally can change the Content Type during the classification process
    • The taxonomy standardizes the process of identifying all possible privacy data exposures at the time of content creation and modification (digital and handwritten).
    • Case Study: conceptClassifier for SharePoint extracted 2,000+ documents with sensitive information, from a redacted sample pilot data set. By human error, these 2,000 records contained real social security numbers and real employee information. These documents were identified in the proof of concept to the client in front of executive management.
    DATA BREACHES & EXPOSURES CHALLENGES
    ~ Average cost of a data breach is $6.3 million and ranges from $225K to $35 million.
    ~ Average cost per exposed record is $197 and ranges from $90-$305 per record.
    ~ 70% of breaches were due to a mistake or malicious intent by an organization’s own staff.
    ~ Healthcare provider - $7 million, TJX Companies - $256 million, ValueClick - $2.9 million.
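As a rough illustration of flagging documents that contain sensitive identifiers, the sketch below scans text for strings shaped like US social security numbers. It is a simplified assumption, not how conceptClassifier detects PII or PHI; the pattern, function names, and sample documents are all hypothetical.

```python
import re

# Simplified assumption: an SSN-like string is 3-2-4 digits separated by hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_like(text):
    """Return all SSN-shaped strings found in text."""
    return SSN_PATTERN.findall(text)


def flag_documents(docs):
    """Return the names of documents containing at least one SSN-shaped string."""
    return [name for name, text in docs.items() if find_ssn_like(text)]


# Hypothetical sample data.
docs = {
    "memo.txt": "Quarterly staffing plan, no personal data.",
    "hr_record.txt": "Employee SSN: 123-45-6789, start date 2009-10-16.",
}
flagged = flag_documents(docs)
print(flagged)
```

Flagged documents could then be routed to a secure location, as the slide describes. A real deployment would need far more than one pattern to cover PII and PHI.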
  • Metadata Drives Collaboration
    • Ability to add structure to chaos
    • Generate weighted results from diverse repositories such as human resources records, time and billing, project documentation, content authorship, team structures, user profiles
    • Results are generated based on the most experienced and knowledgeable individual for that specific topic or skill set aggregated from diverse repositories
    • Single logical view for expertise search across diverse information stores
    • Removes geographical boundaries through vocabulary normalization
    • Case Study: Professional Services firm with over 39K employees across 36 countries uses conceptClassifier for expert identification to identify and utilize in-house consultants for projects as opposed to outsourcing – increasing utilization of staff by 5% to 10%
    Collaboration and Enterprise 2.0
    ~ Nearly 80% of executives believe collaboration is important but needs to be managed
    ~ Email storage costs $500 per GB per year – a Fortune 100 manufacturing company saved $2.6 million per year by implementing collaboration solutions
    ~ Up to 90% of content from premium paid publication database services is available for free on-line
    ~ Only 25% of executives describe their organization as effective at sharing knowledge across boundaries
  • A Manual Metadata Approach Will Fail 95% Of The Time
  • An Automated Metadata Approach Drives Business Value
    • Create enterprise metadata framework/model
    • Average return on investment minimum of 38% and runs as high as 600% (IDC)
    • Apply consistent meaningful metadata to enterprise content
    • Incorrect meta tags cost an organization $2,500 per user per year, in addition to potential costs for non-compliance (IDC)
    • Guide users to relevant content with taxonomy navigation
    • Savings of $8,965 per year per user based on an $80K salary (Chen & Dumais)
    • Use automatic conceptual metadata generation to improve Records Management
    • Eliminate inconsistent end user tagging at $4-$7 per record (Hoovers)
    • Improve compliance processes, eliminate potential privacy exposures
  • Concept Based Metadata Generation
    • Compound Term Processing – the ability to extract ‘concepts in context’
    • Only statistical metadata generation and classification company that can extract concepts from content as it is created or ingested
    Example: ‘triple heart bypass’
    • conceptClassifier will generate conceptual metadata by extracting multi-word terms, identifying ‘triple heart bypass’ as a concept as opposed to single keywords
    • Search will return results based on the concept even if the exact terms are not contained in the document (e.g. ‘coronary artery surgery’, ‘heart surgery’)
    • Metadata can be used by any search engine index or any application/process that uses metadata
    [Diagram: each single keyword is ambiguous on its own. Triple: Baseball, Three. Heart: Organ, Center. Bypass: Highway, Avoid.]
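The difference between single keywords and a multi-word term can be sketched with plain n-gram counting. This is an illustrative stand-in, not Concept Searching's compound term processing; the sample sentences and function names are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(text, n):
    """Return all contiguous n-word terms in text."""
    words = tokenize(text)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def repeated_terms(docs, n, min_count=2):
    """Return n-word terms that occur at least min_count times across docs."""
    counts = Counter(term for doc in docs for term in ngrams(doc, n))
    return {term for term, count in counts.items() if count >= min_count}


def docs_containing(docs, term):
    """Return indexes of documents whose normalized text contains term."""
    return [i for i, doc in enumerate(docs) if term in " ".join(tokenize(doc))]


# Hypothetical corpus: the third document shares keywords but not the concept.
docs = [
    "The patient is scheduled for a triple heart bypass next week.",
    "Recovery after triple heart bypass usually takes several weeks.",
    "The highway bypass opened after a triple play at the baseball game.",
]

print(repeated_terms(docs, 3))
print(docs_containing(docs, "bypass"))
print(docs_containing(docs, "triple heart bypass"))
```

Matching on the keyword ‘bypass’ alone pulls in the unrelated highway document, while the three-word term narrows the match to the two medical documents.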
  • conceptClassifier for SharePoint
    • conceptClassifier for SharePoint
    • Automatic identification, extraction, and tagging of content with concepts
    • Intelligent auto-classification based on concepts not keywords
    • Enterprise class Taxonomy Management, uniquely based on concept identification
    • Integration
    • A technology platform that runs natively in SharePoint 2007 and 2010
    • Microsoft Enterprise Search, FAST ESP
    • Microsoft Office
    • Windows Server 2008 R2 FCI
    • Component – Automatic Conceptual Metadata Generation
    • Automatically generates and extracts metadata including keywords, acronyms, and multi-word terms that form concepts and convey meaning
    • Component – Automated Classification
    • Rules based categorization module
    • Real-time classification of individual pieces of content aligned to defined business structure(s)
    • Automatically classifies documents to multiple nodes in multiple taxonomies
    • Highly scalable, fast real-time classification
    • Classifier may be called via web services, or by other related applications (e.g.: FAST pipeline stage)
    • Based upon identified and extracted concepts proven to be more effective than keyword classifiers
  • conceptClassifier for SharePoint
    • Component – Taxonomy Management
    • Hierarchical taxonomy structure with ontological features and ability to import standard structures such as OWL and RDF
    • Rapid taxonomy creation and maintenance
    • Automatic concept identification extracted from client’s own content to populate taxonomy(s)
    • Class clues generated from client’s unique document corpus
    • Dynamic movement feedback
    • Maintenance via clue suggestion, integrated search, and instant feedback
    • Class weighting influenced by parent, child, sibling
    • AJAX based user interface
    • Taxonomy structure and classification results held in SQL database
    • Component - contentTypeUpdater
    • Based on organizationally defined metadata, provides the ability to automatically change the SharePoint Content Type based on the presence of the metadata within documents (e.g. if a document contains Protected Health Information (PHI), it will automatically change the Content Type to ‘PHI’)
    • Technology
    • SOA compliant and delivered as Web Parts
    • API is based entirely on Web Services and all information is exchanged in XML
    • Taxonomy formats are based on Web Ontology Language (OWL). Since the server is stateless it also works with all failover and load balancing hardware and software.
     
  • Roadmap from SharePoint 2007 to 2010
    • There is no auto classification of metadata (i.e. no way to auto apply term set values); it is a manual process
    • There is no way to automatically generate metadata when it is created or ingested
    • Same problem with end users adding inconsistent metadata
    • Taxonomy management is a manual process
    • Taxonomy maintenance requires significant resources to maintain and change as business changes
    • Enterprise Metadata Management
    • Properties (current flat lists) become hierarchical “Term Sets”
    • Term Sets provide capability for faceted search and hierarchical navigation: Regions Country/State, Business Unit/Departments, Band Names/Album Names, TV Show Titles/Characters
    • conceptClassifier fully supports SharePoint 2010 EMM as the primary location for taxonomy definitions with no need to Import/Export
    • Changes to the taxonomy structure using Microsoft tools will be immediately visible in conceptClassifier and vice versa
  • conceptClassifier for FAST Search
    Delivers all functionality in conceptClassifier for SharePoint including:
    • Improves faceted search results as facets are based on concepts aligned within the taxonomy
    • Provides taxonomy browse capabilities based on the nodes within the corporate taxonomy(s)
    • Provides accurate metadata filters such as numeric range searching and wildcard alphanumeric matching
    • Improves search outcomes by placing conceptual metadata in the FAST Search index to increase relevancy of search results
    • Enables import of FAST Entities into the conceptClassifier taxonomy manager to fine-tune them with metadata generated from your own content and nomenclature
    • Runs natively as a FAST Pipeline Stage eliminating integration and customization issues
    • Eliminates vocabulary normalization issues across global boundaries through controlled vocabularies
  • Case Study – US Air Force Medical Service
    • US Air Force Medical Service
    • Initially deployed conceptClassifier to power Knowledge Portal with over 65K users
    • Controlled vocabulary consists of over 27K unique keywords, metadata, and multi-word fragments generated by conceptClassifier
    • Have expanded use of technology to provide the following:
    • Automatically organize data assets prior to migration to SharePoint - identifies duplicates, PII, PHI before migration
    • Automatically extract metadata and classify to the various taxonomies
    • Provide the conceptual metadata to Microsoft Enterprise Search and FAST indexes to improve findability
    • Identify and locate sensitive information (PII, PHI, Confidential) and migrate content to a new location where Windows Rights Management Services are applied
    • Automatically tag and classify content based on semantics contained within the actual document of record and optionally update the Content Type
    • Eliminates end user tagging issues
    • Records are declared in a consistent manner
    • Only real records are migrated to the RM system
    • Only real records are managed for disposition
    • All other content stays in the collaborative portal
    • Freedom of Information Act (FOIA) compliance
    • eDiscovery
  • USAF Human Performance Clearinghouse
    GOAL: Leverage Existing USAF, AFDW, and AFMS License Agreements to Enable IM, RM, & Privacy & Security Compliance
    Requirements
    • DoDD 8320 (Data Sharing in a Net-Centric DoD)
    • DoDD 5015 (Records Management)
    • USAF Privacy Act Program & HIPAA
    • Freedom of Information Act (FOIA)
    Distribution Statement A: Approved for public release; distribution is unlimited.
    311 ABG/PA No. 09-488, 16 Oct 2009
  • Screen Shots
  • Navigation – Auto Complete, Taxonomy & Faceted
    • Microsoft Enterprise Search/FAST ESP can utilize highly relevant compound term metadata
    • Browsable taxonomy navigation via Concept Searching Web Part
    • Faceted navigation (integrated with Microsoft CodePlex)
  • FAST ESP Search
    • Cross taxonomy navigation filter
    • Taxonomy Browse
    • Conceptual metadata supplied to FAST search index
  • Automatic Classification & Metadata Tagging
    • Content is automatically tagged with semantic metadata and uploaded to SharePoint
    • Content is automatically classified to one or more nodes in one or more taxonomies
    • Documents are automatically classified to multiple categories
    • Editable from within SharePoint & the Concept Searching Taxonomy Manager
  • Full Support for Content Types
    • Eliminates time consuming manual metadata definition
    • Enforces governance, policies, and drives workflows in line with business processes
    • Enables different taxonomies to be assigned to different Content Types
    • Authorized users have complete control over automatically generated metadata
  • Automatic Update of Content Types/Workflow
    Event Handler
    Based on a pre-defined Event Handler, the Content Type can be automatically changed when classified.
    • When organizationally defined metadata is identified within content the Content Type Updater will automatically change the Content Type
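The behavior described above, changing the Content Type when defined metadata is found, can be sketched as a small rule table. This is a hypothetical model in plain Python, not the SharePoint event handler API or the Content Type Updater itself; the `Document` class, the rule list, and the tag names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a classified SharePoint item."""
    name: str
    content_type: str = "Document"
    metadata: set = field(default_factory=set)


# Hypothetical rules: (metadata tag, new Content Type). First match wins.
RULES = [("PHI", "PHI"), ("PII", "PII")]


def update_content_type(doc, rules=RULES):
    """Change doc.content_type if any rule's tag is present; return True if changed."""
    for tag, content_type in rules:
        if tag in doc.metadata:
            doc.content_type = content_type
            return True
    return False


# Simulate the handler firing after classification has tagged two items.
tagged = Document("discharge_summary.docx", metadata={"PHI", "cardiology"})
untagged = Document("lunch_menu.docx", metadata={"cafeteria"})
for doc in (tagged, untagged):
    update_content_type(doc)
    print(doc.name, "->", doc.content_type)
```

Keeping the rules as ordered data rather than code mirrors the slide's point that the triggering metadata is defined by the organization.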
  • Office Integration
    • Fully integrated with Microsoft Office & Exchange
    • Content automatically tagged with semantic metadata stored in custom properties
    • Content automatically classified to corporate or departmental taxonomies
    • Delivers governance at the desktop, improves ECM
    • Automatic metadata generation or optionally authorized users can change the classification
  • Aggregates Multiple Content Sources
    • Ability to specify multiple file sources including:
    • SharePoint
    • Web Sites
    • Exchange Public folders
    • File stores
    • Can also include RSS feeds
    • Automatically classify and place semantic metadata in search engine index
  • Taxonomy & Concept Based Metadata Generation
    • Conceptual metadata automatically generated from the organization’s own content and used as clues to build out the taxonomy
    • Hierarchical view of content
    • Content will be automatically classified to one or more nodes based on concepts within the content
    • Reduces time to develop, build, and maintain a taxonomy by as much as 80%
    • Can import industry standard taxonomies
  • Automatic Clue Suggestion
    • Manual entry of class clues is available
    • Suggest Clues for Class
    • Search the document corpus and identify documents that are about the new node to be included or excluded
    • Clues can be single words, multi-word terms (concepts) or acronyms 
  • Classification and Automatic Clue Feedback
    • Clicking on a clue will display a document summary and the extract where the clue occurs in the context of the document.
    • The end user can either keep the clue, remove it, or modify the weighting
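Weighted clues can be pictured as a simple scoring scheme: each taxonomy node holds clues with weights, and a document is assigned to every node whose matched clues reach a threshold. The sketch below is an assumption about the general idea, not the product's scoring algorithm; the taxonomy, weights, and threshold are hypothetical.

```python
def classify(text, taxonomy, threshold=0.5):
    """Return {node: score} for every node whose matched clue weights reach threshold."""
    text = text.lower()
    scores = {}
    for node, clues in taxonomy.items():
        # Sum the weights of all clues (words or multi-word terms) found in the text.
        score = sum(weight for clue, weight in clues.items() if clue in text)
        if score >= threshold:
            scores[node] = score
    return scores


# Hypothetical taxonomy: node -> {clue: weight}.
taxonomy = {
    "Cardiac Surgery": {"triple heart bypass": 1.0, "heart surgery": 0.8, "bypass": 0.2},
    "Road Construction": {"highway": 0.6, "bypass": 0.4},
}

print(classify("Patient recovering from triple heart bypass.", taxonomy))
print(classify("The highway bypass reopened.", taxonomy))
print(classify("Heart surgery unit relocated near the highway bypass.", taxonomy))
```

Giving the ambiguous single word ‘bypass’ a low weight keeps it from classifying a document on its own, and the third example shows one document landing in two nodes. Removing a clue or changing its weight, as the slide describes, corresponds to editing the dictionary entries.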