Concept Searching Webinar P
Using classification to improve SharePoint search

    No notes for slide
  • The key points on this slide are:
    • In business since 2002, with the first customers in 2003
    • Major enterprises with up to 66,000 users have deployed successfully to manage unstructured data
    • Owned by the founders, with no external investment
    • Profitable, with 35% growth in 2008 and already trading for similar growth in 2009
    • An increasing number of specialized partners in this space are buying into our value proposition
    Concept Searching was founded in 2002 with the goal of developing statistical search and classification products that deliver critical functionality unavailable elsewhere in the marketplace. The products were launched in 2003, and Concept Searching has experienced growth and profitability every year since.
    Concept Searching is the only statistical classification software company in the world that uses concept extraction and compound term processing to achieve the highest precision without loss of recall. Our products are the only solutions that are fully integrated with MOSS and Microsoft Search. In side-by-side comparisons against industry leaders, Concept Searching has been able to dramatically illustrate the strength of the technology.
    Concept Searching counts an ever-growing number of global, Fortune 500, and Fortune 1000 clients, and we have built a strong partnership channel with Microsoft Partners. By continuing to invest in product development, Concept Searching is defining new standards for the search and classification industry and is committed to delivering quantifiable business benefits to organizations around the world.
  • Traditional search assumes that end users know what they are looking for and requires them to enter the ‘right’ combination of words to get the ‘right’ result.
  • FINANCIAL RISK – MAJOR LEGAL EXPOSURE – LOST PRODUCTIVITY
    When an ECM solution is implemented with content that is inappropriately or insufficiently tagged with metadata, or with inappropriate content types, capture becomes ineffective within the five-phase ECM process. Put simply, ineffective capture means a corporation cannot manage, store, preserve, and deliver content effectively enough to meet the goals of the organization. Furthermore, inconsistent tagging causes content to be mismanaged, which invariably leads to data and security breaches and, in turn, to non-compliance, increased risk, litigation, and fines.
    The traditional answer is to implement applications (point solutions, if you will) that address individual elements of the problem. The true answer is to enable content within the enterprise with appropriate metadata and tagging in order to drive process, increase business productivity, and reduce risk.
  • Concept Searching’s conceptClassifier for SharePoint is the enabler of Enterprise Content. Through the automatic tagging of content with semantic metadata, content can be tagged in a consistent manner. The metadata can then be used by any application or process that utilizes metadata.
  • The Issues:
    • Unstructured content is doubling every 3 months, and yet 80% of business decisions are made using unstructured data
    • The failure rate of Enterprise Content Management (ECM) initiatives is 50% in large organizations
    • Keyword search captures only 33% of relevant information
    • Inability to find information across disparate content stores
    • Only 50% of content is correctly indexed, meta-tagged, or efficiently searchable
    The Technology:
    • Highly scalable: thousands of users, millions of documents
    • Taxonomy development savings of 3-6 months and $150K-$300K
    • Able to classify terabytes of documents
    • Unique technology: compound term processing and semantic metadata generation
    Benefits:
    • Identifies new relationships within content and uncovers new insights
    • Intuitive, requires no training
    • Enables the retrieval of relevant information and the identification of highly correlated content that normally would not be found
    • Reduces the time and cost associated with managing content
    • Reduces the time spent finding information
    • Enables re-use and repurposing of existing content
    • Expedites access to real-time information
    • Optimizes existing investments in technology
    • Rapidly installed and implemented
    • Delivers the ability to make better-informed decisions
  • The Issues:
    The public sector, hospitals, financial and educational institutions, and private businesses face continuing pressure and government regulation to protect information from unauthorized access, use, and disclosure. Both the public and private sectors routinely collect confidential information about their employees, customers, products, research, and financial status. The inability to protect confidential information can cause irreparable harm to individuals as well as the organization, and the consequences can include loss of business, litigation, and both criminal and civil penalties.
    • A Seattle-based healthcare company paid $100,000 for HIPAA violations, in addition to the $7-$9 million spent on the breach itself
    • TJX compromised 94 million accounts at a cost of $256 million
    • ValueClick paid the U.S. Federal Trade Commission to settle a charge that consumers’ data was not secured
    • The average cost of a data breach is $6.3 million, with a range from $225K to $35 million
    • 70% of all breaches are due to a mistake or malicious intent by the organization’s own staff
    PIIdiscovery enables organizations to define unknown Personally Identifiable Information (PII) according to their specific requirements and needs. Types of PII can include social security numbers, credit card numbers, dates of birth, bank account numbers, passports, driver’s licenses, or any unique organizational or mandatory descriptors (for example, HIPAA). PII can be identified in diverse repositories, including email servers, fax servers, forms and scanned documents, Microsoft Office applications, websites, servers, and PCs. Once identified, it can be automatically aggregated into a central location for review and disposition.
    Benefits:
    • Protects the organization from the costs associated with a data breach, civil and criminal penalties, sanctions, and loss of business and reputation
    • Automatic identification of unknown PII mitigates the risks associated with PII exposure
    • Standardizes and improves organizational processes associated with the identification and segregation of PII
    • Reduces organizational costs and effort in protecting and identifying PII
    • Reduces costs and risk exposure through automatic identification of PII from disparate content repositories
    • Eliminates risk associated with end user non-compliance issues
    • Reduces the portability and transmissibility of protected data assets
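As a rough illustration of what identifying PII types such as social security numbers and credit card numbers can involve, here is a minimal pattern-matching sketch. It is not PIIdiscovery's method: the patterns, the function names (`luhn_valid`, `find_pii`), and the choice of a Luhn checksum to filter card-like numbers are all assumptions made for this example.

```python
import re

# Hypothetical patterns for illustration only; real PII rules are
# organization-specific and considerably more involved.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def find_pii(text: str) -> list:
    """Return (pii_type, matched_text) pairs found in `text`."""
    hits = [("ssn", match.group()) for match in SSN_PATTERN.finditer(text)]
    for match in CARD_PATTERN.finditer(text):
        if luhn_valid(match.group()):
            hits.append(("credit_card", match.group()))
    return hits
```

Calling `find_pii` on the text of each document in a repository would give a list of candidate hits that could then be collected in one place for review, in the spirit of the aggregation step described above.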
  • The Issues:
    • End user adoption is cited as the single most critical barrier to success in Records Management
    • Enforcing governance at the end user level is rarely successful and requires management and time to enforce policies
    • Non-compliance results when documents are never subjected to enterprise policies
    • Metadata is often non-descriptive because it does not capture the essence of the record, making it less useful to the end user and the organization
    • Lack of automated tools that can categorize content without user intervention so that retention policies can be assigned
    • Inability to ensure that all content is identified and correctly processed within the organization
    Benefits:
    • Automated classification and integration with Microsoft Office and Exchange eliminate end user adoption issues
    • Automated records collection, classification, and organization reduce the costs of implementation and ongoing management
    • Protects the integrity of records and the native security model
    • Fully integrated with SharePoint
    • A custom router or workflow can be configured to automatically send uploaded documents to the Records Center
    • As documents are uploaded to the Libraries, they can automatically be declared records
  • Concept Searching offers the only statistical metadata, classification, and taxonomy software that uses concept extraction through our compound term processing technology.
    Concepts in Context: Compound Term Processing
    • Triple Heart Bypass (Baseball or three? Organ or center? Road or avoid?)
    • Life Sciences vs. Life or Sciences
    • Michigan State University vs. Michigan or State or University
    • Respiratory & Inflammation vs. Respiratory or/& Inflammation
    “At last a tool set that enables enterprise content to be the driver for business productivity”
    Concept Searching provides a comprehensive suite of tools for automatic semantic metadata generation, automated classification, and taxonomy management of enterprise content. The ability to identify ‘concepts in context’ generates far richer metadata, improving precision and relevancy in the information retrieval process.
    Metadata generation is a growing concern in large enterprises. A comprehensive approach requires more than syntactic metadata (e.g. date, author, title), and requiring end users to add rich metadata is haphazard and subjective at best. Since Concept Searching’s technology is not restricted to keyword identification, compound term metadata can be generated automatically either when the content is created or when it is ingested. Concept-based metadata generation extracts the compound terms and keywords in a document or corpus of documents that are highly correlated to a particular concept. By identifying the most significant patterns in any text, these compound terms can then be used to generate non-subjective metadata based on an understanding of conceptual meaning.
    Meta-tags are automatically added to the properties field of each document. This makes the document more valuable to the organization because it is easier to retrieve with Microsoft Search products, which use keywords and metadata to retrieve information. conceptClassifier for SharePoint is fully integrated with SharePoint, Microsoft Office, Exchange, FAST, and Microsoft Enterprise Search. The automatic extraction of compound terms enables the Subject Matter Expert (SME) to use the terms within the taxonomy generation process, reducing the time to build out and maintain taxonomies by 80%.
    Compound Term Processing performs matching on the basis of compound terms as opposed to keywords. Compound terms are built by combining two (or more) simple terms; for example, ‘triple’ is a single-word term but ‘triple heart bypass’ is a compound term. By identifying and forming compound (multi-word) terms and placing these in the search engine’s index, the search can be performed with a greater degree of accuracy because the ambiguity inherent in single words is no longer a problem. A search for ‘survival rate after triple bypass surgery’ will locate documents about this topic even if the precise phrase is not contained in any of the documents. A traditional search would return all documents that contain the word ‘triple’, all documents that contain ‘heart’, and all documents that contain ‘bypass’.
    Features:
    • Downloadable in 30 minutes, no programming required
    • Automatic classification and compound term metadata extraction
    • Classification technology uses concept extraction and compound term processing
    • Taxonomy-based and faceted navigation
    • Robust suite of tools to build and maintain taxonomies
    • Fully integrated with Content Types
    • Automatic classification from MS Office and Outlook
    • Taxonomy browse, faceted navigation, and preview functionality from the search interface
    • Can automatically classify from SharePoint, folders, and web sites, providing a single interface to all permissible content
    • Simple, intuitive interface designed for the SME
    • Fully SOA compliant, delivered as Web Parts, based on open standards
    • Integrates with Microsoft Office, Microsoft Records Center, and the Microsoft Business Data Catalog
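The difference between matching on single keywords and matching on a compound term can be shown with a small sketch. This is only an illustration of the ambiguity problem described in the note, not Concept Searching's algorithm: it matches exact multi-word terms and does not attempt the statistical concept matching that finds documents lacking the precise phrase. The sample documents and all function names are invented for the example.

```python
import re

# Invented sample documents, one for each sense of the ambiguous words.
DOCS = {
    "d1": "Survival rates after a triple heart bypass operation",
    "d2": "He hit a triple in the ninth inning",
    "d3": "Take the bypass to avoid the city center, the heart of the traffic problem",
}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_match(query: str, docs: dict) -> list:
    """Return ids of documents that contain ANY single word of the query."""
    words = set(tokenize(query))
    return sorted(d for d, text in docs.items() if words & set(tokenize(text)))


def compound_terms(tokens: list, max_len: int = 3) -> set:
    """Build all multi-word terms of 2..max_len consecutive tokens."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(2, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def compound_match(term: str, docs: dict) -> list:
    """Return ids of documents that contain the whole multi-word term."""
    wanted = " ".join(tokenize(term))
    return sorted(d for d, text in docs.items() if wanted in compound_terms(tokenize(text)))
```

With these documents, `keyword_match("triple heart bypass", DOCS)` returns all three ids, including the baseball and road-traffic documents, while `compound_match("triple heart bypass", DOCS)` returns only `d1`.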
  • Concept Searching’s conceptClassifier for SharePoint enables enterprise content to drive business productivity. Fully integrated with SharePoint, conceptClassifier for SharePoint delivers robust taxonomy management, semantic metadata tagging, and auto-classification. Based on the content classified and the tags applied, it can automatically update the document Content Types that drive process, compliance management, storage, and preservation. It should be noted that, to round out the full ECM solution, other third-party Microsoft and Concept Searching partners may be required for such things as scanning and paper capture, physical records management, business process workflow, etc.
  • A taxonomy is a classification structure represented by a hierarchical view of topics that have been grouped together because they share the same quality or characteristic. A taxonomy provides a unified view of, and access to, relevant information across often dispersed silos of information. Concept Searching supports multiple taxonomies within an organization.
    Taxonomy development is traditionally a very time-consuming and costly activity. Our Taxonomy Manager has been proven to reduce taxonomy development time by 80%, generating a time savings of 6-12 months and a cost savings of $150K - $300K. Concept Searching also has a robust and frequently expanding library of off-the-shelf taxonomies covering a wide variety of domains, which helps jumpstart a classification project in nearly any industry. Subject Matter Experts (SMEs) can use the taxonomy (or multiple taxonomies) to easily build taxonomies and classify documents into predefined categories based on a small number of descriptors or clues. Once classified, the documents can be applied to a corporate taxonomy and made available to the organization.
    The taxonomy management features include:
    • Ability to change the node weighting (score)
    • Auto clue suggestion: automatic generation of node clues from compound terms found in the document corpus, eliminating training sets and complex Boolean rules
    • Dynamic screen updating: the user interface is fully AJAX enabled, so changes to the taxonomy are immediately available for further refinement
    • Document movement feedback: this feature enables the SME to see the cause and effect on the taxonomy without re-indexing
    Metadata generation is a growing concern in large enterprises. A comprehensive approach requires more than syntactic metadata (e.g. date, author, title), and requiring end users to add rich metadata is haphazard and subjective at best. Since Concept Searching’s technology is not restricted to keyword identification, compound term metadata can be generated automatically either when the content is created or when it is ingested. Concept-based metadata generation extracts the compound terms and keywords in a document or corpus of documents that are highly correlated to a particular concept. By identifying the most significant patterns in any text, these compound terms can then be used to generate non-subjective metadata based on an understanding of conceptual meaning.
    Compound term processing is a new approach to an old problem. Instead of identifying single keywords, compound term processing identifies multi-word terms that form a complex entity and treats them as a concept. By deriving these compound terms from the client’s own document corpus, we can tag content with meaningful semantic metadata and enable Microsoft’s Enterprise Search to filter across that metadata at retrieval. This delivers a higher degree of accuracy because the ambiguity inherent in searching against single words in isolation is no longer a problem. As a result, a search for “survival rates following a triple heart bypass” will locate documents about this topic even if this precise phrase is not contained in any document.
    Compound term processing can address many challenges facing large enterprises and provide many benefits. Identifying concepts within a large corpus of information removes the ambiguity in search and eliminates inconsistent meta-tagging, and automatic classification and taxonomy management based on concept identification simplify development and ongoing maintenance.
    The unique compound term processing identifies compound terms (not keywords) in highly relevant content, and these can be used to trigger the automatic meta-tagging and auto-classification processes. This conceptual metadata is added to the original metadata for the category/folder. The more semantic metadata that can be linked to a document or record, the more useful the information becomes to the organization. Meta-tags are automatically added to the properties field of each document. This makes the document more valuable to the organization because it is easier to retrieve with Microsoft Search products, which use keywords and metadata to retrieve information.
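To make the idea of node clues and node weighting more concrete, the sketch below scores a document against taxonomy nodes using weighted clue phrases and assigns it to every node whose score reaches a threshold. The scoring scheme, the threshold, the class and function names, and the sample nodes are all assumptions for illustration; the notes do not describe how Taxonomy Manager actually computes scores.

```python
from dataclasses import dataclass


@dataclass
class TaxonomyNode:
    """A hypothetical taxonomy node with weighted clue phrases."""
    name: str
    clues: dict            # clue phrase -> weight (assumed scoring scheme)
    threshold: float = 1.0


def score(node: TaxonomyNode, text: str) -> float:
    """Sum the weights of all clue phrases that occur in the text."""
    lowered = text.lower()
    return sum(weight for clue, weight in node.clues.items() if clue in lowered)


def classify(text: str, nodes: list) -> list:
    """Return (node name, score) for every node at or above its threshold."""
    results = [(node.name, score(node, text)) for node in nodes]
    matched = [(name, s) for (name, s), node in zip(results, nodes) if s >= node.threshold]
    return sorted(matched, key=lambda pair: pair[1], reverse=True)


NODES = [
    TaxonomyNode("Cardiology", {"triple heart bypass": 1.0, "coronary artery": 0.5}),
    TaxonomyNode("Baseball", {"triple": 0.3, "inning": 0.8}),
]
```

In this toy setup, a document about recovery after a triple heart bypass is assigned to Cardiology only: the single word ‘triple’ contributes too little weight on its own to reach the Baseball threshold. Raising or lowering a clue's weight changes which documents land on the node, which is the kind of adjustment the node weighting feature refers to.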
  • Following the automatic generation (tagging) of compound terms and semantic metadata, the documents in the document libraries are automatically classified to multiple categories within the taxonomy. The generated terms can be edited from within SharePoint or from within the Taxonomy Manager tool. The content remains in, and can be accessed from, its original location, but can be linked to multiple categories/nodes.
  • Enterprises increasingly understand the value of, and critical need for, Content Types to structure their content and identify the type of a document regardless of its physical site or library storage location. Content Types can be used to enforce metadata governance, adhere to policies, and drive workflows in line with business processes. Included in the new release is the ability to assign taxonomies to specific Content Types. Documents that correspond to the selected Content Types will be classified; documents that do not correspond to a Content Type, or that lack metadata elements a specific Content Type requires, will not be classified. This essential functionality allows different taxonomies to be assigned to different Content Types. For example, assign the HR taxonomy to all Content Types of type “HR”, including any Content Types derived from “HR”, and assign the Finance taxonomy to all Content Types of type “Finance”, including any Content Types derived from “Finance”.
    The configuration can be performed using a wizard that runs inside SharePoint. The taxonomies will be available for these documents regardless of their location. conceptClassifier’s site columns and Event Handlers are associated with the Content Types. This delivers the ability to automatically add classification functionality to new sites when they are created.
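The rule that a taxonomy assigned to a Content Type also applies to the Content Types derived from it can be sketched as a walk up a parent chain. The content type names, the parent table, and the `taxonomy_for` helper below are hypothetical and stand in for whatever the SharePoint wizard configures; the sketch only illustrates the inheritance logic described in the note.

```python
from typing import Optional

# Hypothetical content type hierarchy: child -> parent (None for a root type).
PARENT_TYPE = {
    "HR": None,
    "HR Policy": "HR",
    "Finance": None,
    "Invoice": "Finance",
    "Memo": None,
}

# Hypothetical assignments, mirroring the HR / Finance example in the note.
TAXONOMY_BY_TYPE = {
    "HR": "HR taxonomy",
    "Finance": "Finance taxonomy",
}


def taxonomy_for(content_type: Optional[str]) -> Optional[str]:
    """Return the taxonomy assigned to a content type or its nearest ancestor.

    Returns None when no taxonomy applies, in which case the document
    would not be classified.
    """
    current = content_type
    while current is not None:
        if current in TAXONOMY_BY_TYPE:
            return TAXONOMY_BY_TYPE[current]
        current = PARENT_TYPE.get(current)
    return None
```

Here `taxonomy_for("HR Policy")` resolves to the HR taxonomy through its parent type, while `taxonomy_for("Memo")` returns `None`, so a memo would be left unclassified.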
  • conceptClassifier for SharePoint fully supports Content Types. An add-on feature includes the ability to update Content Types based on the identification of content during the classification process. This is particularly useful in records management and in data privacy and security. It provides the ability to develop a series of actions that can occur when content contains specific metadata as defined by the organization.
  • Knowledge workers need to identify content in the context of what they are seeking. The fundamental problem with most enterprise search solutions, and all statistical search solutions, is that they are based on an index of single words. Yet most queries are expressed as short patterns of words, not as single words in isolation, which are highly ambiguous.
    A concept search engine can isolate the key meaning that is normally expressed as proper nouns, noun phrases, and verb phrases. Although linguistic products can do this, their performance is highly variable depending on the vocabulary and language in use. A statistically based, language-independent concept search can accept queries in natural language, with the user typing words, phrases, or whole sentences. The system then analyzes the natural language query, extracting the keywords and phrases to identify the main concepts and retrieve content that is highly relevant.
    Precision and recall are the two key performance measures for information retrieval. Precision is the retrieval of only those items that are relevant to the query. Recall is the retrieval of all items that are relevant to the query. Yet most information retrieval technologies are less than 22% accurate for both precision and recall. The ideal goal is to have them balanced. Compound Term Processing has the ability to increase precision with no loss of recall.
    Documents that have been auto-classified are now accessible by searching for all the content within a folder and by using Microsoft Enterprise Search, which can now filter on highly relevant metadata that has been created with Taxonomy Manager. Search results are clustered into categories or facets, enabling an end user to rapidly drill into a result set based on organizational, functional, product line, and geographic metadata that has been generated using Taxonomy Manager and automatically tagged to relevant documents and records within document libraries. As the end user refines the search and the query changes, new facets are generated.
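Precision and recall, as defined in the note above, are usually computed as ratios: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The short sketch below works through one example; the function name and the document ids are made up for illustration.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for one query.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# A search returns four documents, two of which are relevant,
# while one relevant document (d5) is missed.
retrieved_docs = {"d1", "d2", "d3", "d4"}
relevant_docs = {"d1", "d2", "d5"}
precision, recall = precision_recall(retrieved_docs, relevant_docs)
```

In this example precision is 2 out of 4 and recall is 2 out of 3. Removing the two irrelevant results without losing d1 or d2 would raise precision to 1.0 while leaving recall unchanged, which is the sense in which precision can increase with no loss of recall.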
  • conceptClassifier for SharePoint integration with Microsoft Office and Microsoft Exchange enables automatic metadata generation and classification without end user participation. Alternatively, the Subject Matter Expert (SME) or knowledge worker can be granted the authority to modify the results from within the traditional Microsoft Office interface. The knowledge worker is the person most qualified to anticipate how the asset will be searched for and how to make it easy to find. The automatic classification returns not only single words but also identifies concepts within the document to assist the knowledge worker in the classification process. This guided approach enables the knowledge worker to classify the document precisely and accurately for reuse and retrieval. Placing the ability to classify documents in the hands of knowledge workers results in rich and comprehensive metadata, significantly improving the organization’s ability to leverage its information capital.
    • Gives business experts the ability to classify critical business information with highly relevant metadata
    • Greatly improves the search and retrieval process by ensuring accurate and complete metadata
    • Expedites organizational access to real-time information
    • Provides a consistent content management approach
    • Delivers metadata-rich information retrieval, thereby maximizing productivity and organizational agility
  • U.S. Air Force Medical Service
    The U.S. Air Force Medical Service rolled out Concept Searching to over 66,000 users. In their analysis of vendors, Concept Searching was selected on the strength of its technologies. In evaluating Taxonomy Manager against other vendors, they estimated that using Concept Searching technologies could reduce taxonomy development time by 80%, saving them considerable man-hours, resources, and costs. Cost savings were estimated at $150K - $300K. The U.S. Air Force wrote a paper about the solution and was subsequently selected to present the paper and findings at the International Institute for Advanced Studies in Systems Research and Cybernetics in Baden, Germany in the fall of 2008.
    U.S. Defense Center for Excellence for Psychological Health and Traumatic Brain Injury
    The client initially purchased the solution for their 24/7 Customer Service Center. This was fully deployed within 3 weeks. During the deployment engagement they saw the other uses for the technology and immediately upgraded to an Enterprise License, both to use Concept Searching as their classification standard and to identify ‘personally identifiable’ and potentially unknown data exposures. This was not a MOSS environment and was included in the solution.
  • Let’s take a look at how the Air Force Medical Service (AFMS) is using their existing Concept Searching and Microsoft Enterprise license agreements to address enterprise-wide capability gaps relating to:
    1. Inadequate search precision across every search platform in the federal sector used by AFMS members;
    2. Increasing numbers of unauthorized data releases involving PII, PHI, Classified Message Incidents, and Sensitive Information;
    3. Non-compliance with data storage and data preservation requirements set forth in federal records management programs; and
    4. An inability to use data analytics to make leadership aware of sensitive information data breaches and other events such as upcoming records destruction schedules.
    Just about every organization has some type of migration plan, and the USAF is no different, with over 74 organizations faced with having to migrate content to a SharePoint environment. Organizations looking to migrate their content to a SharePoint environment only have to copy it, or use migration scripts, to place this new content into SharePoint. Documents, messaging/chat logs, e-mail, and other content in SharePoint are then automatically tagged and classified in accordance with the organizational enterprise metadata environment model that is managed and maintained in Concept Searching’s Taxonomy Manager.
    After the tagging process, an event handler identifies documents whose metadata requires the update of a Content Type. This step is very important, since Content Types drive the activities associated with every document. The manual or blanket application of a Content Type is no different from the manual application of metadata: it is subjective, inefficient, and costly to do one record at a time. By automatically updating Content Types in SharePoint to reflect the actual content of a particular data asset, the organization makes its information actionable within SharePoint. What does this mean?
    • RMS templates can automatically be applied to documents containing sensitive information, without anyone having to read each and every document to decide whether it contains sensitive information.
    • Records Retention Codes can be automatically applied as metadata and then updated as their own Content Type to drive appropriate data storage location and preservation.
    • To dramatically increase search precision, Concept Searching then applies different taxonomies and their associated metadata to records based on their unique Content Type. Organizations using Search Server Express, SharePoint Search, or FAST will all experience increased search precision as a result of conceptClassifier for SharePoint automatically tagging documents and records with highly correlated metadata.
    • Organizations that have deployed PerformancePoint can use their declared Content Types to report daily on prevented data exposure events, identify which members are consistently putting the organization at risk of fines and litigation, and identify how many and which documents and records are coming due for destruction.
    The AFMS is using Concept Searching to automatically generate PII metadata from the content sources that are being migrated. This metadata is then placed into Concept Searching’s Taxonomy Manager in order to ascertain the location of sensitive information during the classification process. Since PII, PHI, and other types of sensitive information are also collected on forms that contain handwriting, Taxonomy Manager is also used to create a metadata environment around how the organization collects sensitive information. During the classification process, Concept Searching automatically identifies sensitive information and then migrates that information to a “staging” location on the network where Information Rights Management templates are applied.
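The sequence described above (tag the document, let an event handler update its Content Type from the tags, then let the Content Type drive follow-up actions) can be sketched as a small rule table. Everything in the sketch is hypothetical: the rule and action tables, the tag and Content Type names, and the function names are invented for illustration and do not reflect the SharePoint event handler API or the AFMS configuration.

```python
# Hypothetical rules: if a document carries any of the trigger tags,
# it is given the corresponding content type.
CONTENT_TYPE_RULES = [
    ({"PII", "PHI"}, "Sensitive Record"),
    ({"Retention Code"}, "Retained Record"),
]

# Hypothetical follow-up actions driven by the content type.
ACTIONS_BY_CONTENT_TYPE = {
    "Sensitive Record": ["move to staging location", "apply rights management template"],
    "Retained Record": ["apply retention schedule"],
}


def update_content_type(document: dict) -> dict:
    """Return a copy of the document with its content type set by the first matching rule."""
    tags = set(document.get("tags", []))
    for trigger_tags, content_type in CONTENT_TYPE_RULES:
        if tags & trigger_tags:
            return {**document, "content_type": content_type}
    return dict(document)


def actions_for(document: dict) -> list:
    """Look up the actions that the document's content type should trigger."""
    return ACTIONS_BY_CONTENT_TYPE.get(document.get("content_type"), [])
```

For instance, a document tagged `PHI` with the default content type `Document` would come back from `update_content_type` as a `Sensitive Record`, and `actions_for` would then list the staging and rights-management steps, with no one having to open the document to make that decision.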

Transcript

  • 1. conceptClassifier for SharePoint: Unlocking Enterprise Content To Drive Business Agility. Paul Billingham, Sales Director, Concept Searching, +44 7866476691, [email_address]. Carla Mulley, VP Marketing, Concept Searching, +1 (412) 567-4948, [email_address]
  • 2.
    • Introductions
    • Who We Are
    • The Problems
    • The Solutions
    • Concept Searching Solution
    • conceptClassifier for SharePoint
    • Use Cases
    • Driving Business Agility
    Agenda Concept Searching • Martin Garland • (703) 531-8567 • marting@conceptsearching.com
  • 3. Who We Are
    • Company founded in 2002
      • Product launched in 2003
      • Focus on management of structured and unstructured information
    • Locations: UK, US, & South Africa
    • Client base: Fortune 500/1000 organizations
    • Microsoft Enterprise Search ISV, FAST Partner
    • 2009 ‘100 Companies that Matter in KM’ (KM World Magazine)
    • conceptClassifier for SharePoint
      • Compound Term Processing
      • Semantic metadata generation
      • Automated classification
      • Taxonomy Tools
  • 4.
    • Compound Term Processing
      • Compound Term Processing is done with both Concept Searching’s Preferred Vocabulary Index and the Related Topics Index
      • Life Sciences vs. Life or Sciences
      • Michigan State University vs. Michigan or State or University
      • Respiratory & Inflammation vs Respiratory or & or inflammation
    Compound Term Processing triple heart bypass
    • conceptClassifier will generate semantic metadata using compound terms that identify ‘triple heart bypass’ as a concept
      • Search will return results based on the concept even if the exact terms are not contained in the document (e.g. ‘coronary artery surgery’, ‘heart surgery’)
    Single-word ambiguity: Triple (Baseball, Three); Heart (Organ, Center); Bypass (Highway, Avoid)
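The difference between single-word matching and compound-term matching on this slide can be illustrated with a minimal sketch. This is not Concept Searching's algorithm: the function names and sample documents are hypothetical, and the sketch only checks for an exact multi-word phrase. It does not attempt the concept matching that would also find ‘coronary artery surgery’.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_any_word(query, document):
    """Single-word matching: true if ANY query word occurs in the document."""
    doc_words = set(tokenize(document))
    return any(word in doc_words for word in tokenize(query))

def matches_compound_term(query, document):
    """Compound-term matching: true only if the query words occur
    adjacently and in order, i.e. as one multi-word term."""
    q = tokenize(query)
    d = tokenize(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

# Hypothetical sample documents, one per sense of the ambiguous words.
documents = {
    "cardiology": "The patient recovered well after a triple heart bypass.",
    "baseball": "He hit a triple in the ninth inning.",
    "traffic": "The new highway bypass opens next month.",
}

query = "triple heart bypass"
single_word_hits = [n for n, t in documents.items() if matches_any_word(query, t)]
compound_hits = [n for n, t in documents.items() if matches_compound_term(query, t)]

print(single_word_hits)  # all three documents match on at least one word
print(compound_hits)     # only the cardiology document matches the full term
```

Treating the query as one term removes the baseball and traffic documents, which is the ambiguity the slide attributes to single-word keywords.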
  • 5. The Problem – “Inconsistency” Insufficient Metadata and Inappropriate Content Types Applied to “The Enterprise”
    • Causes
      • End-users do not tag every data asset created - Incomplete
      • Metadata often applied from a subjective frame of reference - Inconsistent
      • Metadata application most often not in line with corporate governance (records retention schedules) – Non Compliant
      • Limited use of templates to populate metadata - Inconsistent
      • End-users rarely declare appropriate content type for each data asset - Unmanaged
    • Results
      • Limited data transparency due to lack of semantic metadata for use by search engines - inability to utilize enterprise content assets to improve business outcomes
      • Inappropriate Content Types applied – limit ability to drive business processes directly from the content
      • Records not managed in accordance with Data Privacy and Security guidelines – potential fines, criminal penalties, litigation costs
      • Records not managed in accordance with organizational Records Management policies – increased organizational risk and non-compliance
      • Records not stored in the right location or preserved for the appropriate period of time – inability to effectively manage content assets
    Ineffective Capture of Metadata: Manage (x), Store (x), Preserve (x), Deliver (x)
  • 6. Solution – “Consistency” Leverage Internal Metadata Environment to Drive Information Worker Productivity
    • Objectives
      • Automatically tag all content with appropriate metadata - Consistent
      • Secure documents/records based on content at data asset level vs. global application of access rights – Complete & Compliant
      • Apply records retention schedule metadata to every data asset - Compliance
      • Automatically update Content Types to drive the automatic application of Rights Management templates and workflow based upon corporate governance – Compliance and data security
    • Results
      • Increased Data Transparency due to presence of semantic metadata for use by search engines – improves organizational performance
      • Automatic Content Types assignment based on content - drives business processes
      • Records are managed in accordance with Data Privacy and Security guidelines – reduces organizational risk
      • Records are managed in accordance with organizational Records Management policies – ability to manage content as an asset and protects records integrity
      • Records are stored in the right location or preserved for the appropriate period of time – improves compliance
    Effective Capture of Metadata: Manage, Store, Preserve, Deliver
  • 7.
    • Failure rate of Enterprise Content Management initiatives is 50%
    • Keyword search captures only 33% of relevant information
    • Inability to find information across disparate internal and external content stores
    • Malicious meta tags
      • 40% of end users select first item in a drop down metadata pick list
    • Insufficient meta tags
      • Over 80% of documents do not have all of the metadata values that should be applied to the document from a corporate controlled vocabulary
    • Ambiguous meta tags
      • Single word meta tags
      • Michigan State University vs Michigan or State or University
    • Traditional taxonomy tools are:
      • Costly and time consuming
      • Complex and require significant effort & resources to maintain
    Enterprise Content Management Issues. KNOWLEDGE WORKERS CHALLENGES: ~ 15% of their time is spent duplicating information. ~ 25% of their time is spent searching. ~ 40% cannot easily find the information they require to do their job. The cost to a 500 employee company is $2.4 million per year in inefficiencies and lost productivity. Source: Gartner Group
  • 8.
    • Enterprise Content Management
    • A controlled vocabulary provides enterprise consistency
    • Automatic metadata generation and classification as content is created or ingested
    • Single view of content from heterogeneous repositories (both internal and external)
    • Faceted and taxonomy navigation
      • Taxonomy navigation is 36%-38% faster than traditional search
    • Enterprise metadata framework that is consistent, scalable, and manageable
    • conceptClassifier Benefits
    • Compound term processing eliminates ambiguity inherent in single word keywords
    • Enables retrieval of relevant information and highly correlated content that normally would not be found
    • Single interface to SharePoint, file stores, web sites removes complexity from search
    • Enhanced search features to identify relevant content
    Enterprise Content Management Solutions
  • 9. Data Privacy & Security Issues DATA BREACHES & EXPOSURES CHALLENGES ~ Average cost of a data breach is $6.3 million and ranges from $225K to $35 million. ~ Average cost per exposed record is $197 and ranges from $90-$305 per record. ~ 70% of breaches were due to a mistake or malicious intent by an organization’s own staff. ~ Healthcare provider - $7 million, TJX Companies - $256 million, ValueClick - $2.9 million.
    • Lack of end user compliance to segregate content from the network and ensure that uploaded privacy data is not available for general access and protected accordingly
    • Lack of tools to standardize the process of identifying all possible privacy data exposures at the time of content creation and modification (digital and handwritten)
    • Lack of governance to enforce document meta-tagging based on content by end users
    • Inability to identify privacy data from diverse repositories, email and fax servers, scanned documents and aggregate them into a central repository for review and compliance assurance
  • 10. Data Privacy & Security Solutions
    • Preventing Unknown Data Exposures
    • Can be used by any enterprise regulated by external agencies or where compliance is mandatory
    • Identifies unknown Personally Identifiable Information (PII) or Protected Health Information (PHI) residing in SharePoint, file stores, web sites
    • Easily customized to identify unique organizational requirements
    • Automatically changes the content type and routes to secure server for disposition
    • Augments current security solutions and processes
    • conceptClassifier Benefits
    • Reduces organizational costs associated with data exposures, remediation, litigation, fines and sanctions
    • Eliminates risk typically associated with end user non-compliance issues
    • Protects the organization by securing PXX content and preventing the portability and electronic transmission of secured assets
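The identify-then-route step described on this slide can be sketched roughly as follows. Everything here is an assumption for illustration: the `Document` class, the content type name "Sensitive Record", and the location names are hypothetical, and a single regular expression for US Social Security number formatting stands in for whatever detection the product actually performs.

```python
import re
from dataclasses import dataclass

# Assumption: one simple pattern (NNN-NN-NNNN) stands in for PII detection.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

@dataclass
class Document:
    name: str
    text: str
    content_type: str = "Document"      # hypothetical default content type
    location: str = "general-library"   # hypothetical default location

def route_if_sensitive(doc):
    """If the text contains an SSN-like value, change the content type
    and move the document to a secure staging location."""
    if SSN_PATTERN.search(doc.text):
        doc.content_type = "Sensitive Record"
        doc.location = "secure-staging"
    return doc

memo = route_if_sensitive(Document("memo.docx", "Lunch is at noon on Friday."))
intake = route_if_sensitive(Document("intake.docx", "Patient SSN: 123-45-6789"))

print(memo.content_type, memo.location)
print(intake.content_type, intake.location)
```

A real deployment would need many more patterns (PHI terms, account numbers, handwritten form fields) and review of false positives; the point of the sketch is only that routing is driven by document content, not by an end user's decision.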
  • 11. Compliance & Records Management Issues
    • End user adoption is cited as the single most critical barrier to success in Records Management
    • Enforcing governance at the end user level is rarely successful and requires management and time to enforce policies
    • Non-compliance results when documents are never subjected to enterprise policies
    • Metadata is often non-descriptive as it does not capture the essence of the record making it less useful to end user and the organization
    • Lack of automated tools that can categorize content without user intervention so retention policies can be assigned
    • Inability to ensure that all content is identified and correctly processed within the organization
    COMPLIANCE & RECORDS MANAGEMENT CHALLENGES ~ End user adoption is cited as the single most critical barrier to success. ~ Enforcing governance at the desktop requires time and money. ~ Non-compliance results when documents are never subjected to enterprise policies. ~ Poor metadata makes it less useful to the organization and end user.
  • 12. Compliance & Records Management Solutions
    • Compliance & Records Management
    • Automatic generation of highly descriptive metadata
    • Ability to create virtual centralization of content from multiple repositories
    • Utilized in conjunction with the Records Center and custom workflows or routers
    • Automates declaration of records based on organizational requirements
    • conceptClassifier Benefits
    • Automatic metadata generation from Microsoft Office & Exchange eliminates end user adoption issues
    • Provides transparent governance & eliminates end user non-compliance
    • Retain integrity and authenticity of records
    • Improves the value of records as they become self-explanatory and meaningful to the end user
    • Reduces the costs and time to manage the process
  • 13. Concept Searching’s Approach: Begins with highly accurate automatic semantic metadata capture to enable content to become a business driver to improve organizational performance, compliance, and data security
  • 14. conceptClassifier for SharePoint
    • Automatic Semantic Metadata Generation
      • Unique compound term processing technology
    • Automated Classification
      • From within MS Office, Outlook
    • Taxonomy Tools
      • Proven to reduce taxonomy development by 80%
    • Microsoft Integration
      • Fully integrated into SharePoint – not an add-on
      • Fully integrated with Content Types
      • Content Type Updater
    • Technology
      • Downloadable in 30 minutes – no programming required
      • Fully SOA compliant, delivered as Web Parts, based on open standards
      • Highly scalable
    • Microsoft Search Enhancement
      • Fully integrated with Microsoft Enterprise Search, SharePoint search, and FAST ESP
      • Provides taxonomy browse and enhances faceted search
      • Text preview capabilities from search interface
      • Provides a single search interface to end users from within SharePoint to multiple repositories (SharePoint, file stores, web sites)
    Full Integration with Content Types; Taxonomy Management; Faceted & Taxonomy Navigation Plus Text Preview; Single Interface to SharePoint, File Stores, & Websites; MS Office Integration; MOSS Record Center; Workflow Automation; Automatic Classification; Integration with MS Search Products & FAST
  • 15. Semantic Metadata Generation & Content Tagging to Deliver Transparency & Improve ECM, Records Management, Compliance, Search, & Data Privacy in a SharePoint Environment (Source: Mission Critical Symposium 2009 – AFMS Presentation)
    • Pre-Capture: Defining Business Rules; Identifying Types of Information for Capture; Taxonomy Development; Creating a Metadata Environment (MDE) Based upon Org. Mission
    • Options: Use Existing Guidelines (File Plans, Records Retention Schedules, etc.) and Automatic Metadata Generation; Use Enterprise Content to Create MDE
    • Manual (Subjective, Inaccurate, Time Consuming, Expensive) versus Automatic (Objective, Precise, Rapid, Cost Effective)
    • Capture: Generating, Capturing, Preparing & Processing Information; Metadata Tagging & Content Type Definition; Metadata Drives Update of Content Types Using MOSS Feature
    • Manage: Management, Processing & Use of Information (Document Mgmt, Collaboration, Web Content Mgmt, Records Mgmt, Workflow/BP Mgmt); Admin/Retrieval Databases & Access Authorization System
    • Store (temporary): Repositories; Library Services; Storage Technologies (File Systems, CMS, Databases, Data Warehouses; Online, Nearline, & Offline Storage; RAID, SAN, NAS; Magnetic Tape; CD/DVD/MO)
    • Preserve: Long Term Storage Media; Long Term Preservation (WORM, Optical Disk, Tape, Hard Disk, Storage Networks, Microfilm, Paper; Migration, Emulation); Location, Administration & Media Selection
    • Deliver: Output Management; Transformation (XML, PDFs); Security (PKI, Digital Rights Management); Distribution (Internet, Extranet, Intranet, Portals, RSS Feeds)
  • 16. Screen Shots
  • 17. Taxonomy & Compound Term Processing
    • Compound Term Processing
    • Semantic metadata automatically generated
    • from the organization’s own content and used as clues to build out the taxonomy
    • Hierarchical view of content
    • Content will be automatically classified to one or more nodes based on concepts within the content
    • Reduces time to develop, build, and maintain a taxonomy by as much as 80%
    • Can import industry standard taxonomies
  • 18. Automatic Classification & Metadata Tagging
    • Content is automatically tagged with semantic metadata and uploaded to SharePoint
    • Content is automatically classified to one or more nodes in one or more taxonomies
    • Documents are automatically classified to multiple categories
    • Editable from within SharePoint & the Concept Searching Taxonomy Manager
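Classifying one document to several nodes across more than one taxonomy might look something like the sketch below. The taxonomies, node names, and clue phrases are invented for illustration, and plain substring matching is used in place of the statistical classification the slides describe.

```python
# Hypothetical taxonomies: taxonomy name -> node name -> clue phrases.
TAXONOMIES = {
    "Clinical": {
        "Cardiology": ["heart surgery", "triple heart bypass"],
        "Respiratory": ["respiratory", "inflammation"],
    },
    "Records": {
        "Patient Records": ["patient", "medical history"],
    },
}

def classify(text):
    """Return every (taxonomy, node) pair whose clue phrases appear in the text.
    A document can land on several nodes in several taxonomies."""
    lowered = text.lower()
    assigned = []
    for taxonomy, nodes in TAXONOMIES.items():
        for node, clues in nodes.items():
            if any(clue in lowered for clue in clues):
                assigned.append((taxonomy, node))
    return assigned

result = classify("Patient scheduled for heart surgery after respiratory inflammation.")
print(result)
```

Substring matching is deliberately crude (for example, "patient" would also match "outpatient"); it is used only to show the one-document, many-nodes outcome.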
  • 19. Full Support for Content Types
    • Eliminates time consuming manual metadata definition
    • Enforces governance, policies, and drives workflows in line with business processes
    • Enables different taxonomies to be assigned to different content types
    • Authorized users have complete control over automatically generated metadata
  • 20. Automatic Update of Content Types
    • When specific organizationally defined metadata is identified within content the Content Type Updater will automatically change the Content Type
    Event Handler: Based on a pre-defined Event Handler, the Content Type can be automatically changed when classified.
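The rule-driven update described on this slide can be approximated with a small sketch. It does not use the SharePoint object model or the actual Content Type Updater: the rule table, the metadata values, the content type names, and the `on_classified` handler are all hypothetical.

```python
# Hypothetical rules: if this metadata value is present, switch to this content type.
# Rules are checked in order, so the first match wins.
CONTENT_TYPE_RULES = [
    ("PII", "Privacy Record"),
    ("Retention: 7 years", "Financial Record"),
]

def on_classified(item):
    """Hypothetical event handler, called after an item has been tagged.
    Updates the item's content type based on its generated metadata."""
    for tag, content_type in CONTENT_TYPE_RULES:
        if tag in item["metadata"]:
            item["content_type"] = content_type
            break
    return item

tagged = on_classified({"metadata": ["PII", "Cardiology"], "content_type": "Document"})
untouched = on_classified({"metadata": ["Cardiology"], "content_type": "Document"})

print(tagged["content_type"])
print(untouched["content_type"])
```

Keeping the rules in a table rather than in code reflects the slide's point that the triggering metadata is organizationally defined.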
  • 21. Navigation
    • Microsoft Enterprise Search/FAST ESP can utilize highly relevant compound term metadata
    • Faceted navigation (integrated with Microsoft CodePlex)
    • Browsable taxonomy navigation via Concept Searching Web Part
    • Text preview capability from search interface
  • 22. Office Integration
    • Fully integrated with Microsoft Office & Exchange
    • Content automatically tagged with semantic metadata stored in custom properties
    • Content automatically classified to corporate or departmental taxonomies
    • Delivers governance at the desktop, improves ECM
  • 23.
      • Government, Healthcare, Life Sciences, Military
        • $6.9 billion HMO
          • Runs 75 hospitals and clinics providing care to over 2.6 million beneficiaries
        • Knowledge Portal - Over 27,000 unique terms, metadata, and compound terms generated
        • 66K+ users
        • Identification of unknown privacy data exposures
        • Medical Research
      • Energy, Oil, & Gas
        • 3rd largest global energy company
        • Integration with SharePoint Records Management
        • Identification of unknown privacy data exposures
        • Metadata tagging of legacy content
      • Government, Healthcare, CRM
        • Global collaborative network coordinates existing medical, academic, research, and advocacy assets
        • Used to power their 24/7 Customer Support Center
        • Enterprise classification standard
        • Identification of unknown privacy data exposures
    Use Cases
    • Legal
      • International law firm with over 1,500 users and 4 million live matters
      • Brokered search and classification across internal/external repositories
      • ‘Know How’ and ‘Know Who’ portal applications
      • Won International KM award for solution
    • Professional Services
      • Integrated IT global solution provider with over 4K staff
      • Developed a comprehensive global proposal response application
  • 24. Source: Air Force Medical Service InterSymp 2010 Presentation Using Microsoft EA & Concept Searching to Address Enterprise Capability Gaps - Increasing Data Exposure Events - Poor Search Result Precision - Inappropriate Data Storage & Preservation - Lack of Detection using Data Analytics
  • 25. Consistency Drives Business Agility
    • Enterprise Content Management & Search
    • Findability first time every time
    • Deliver a robust content management approach maximizing SharePoint technologies
    • Identification of Unknown Privacy Data Exposures
    • Reduced litigation, costs associated with data breaches
    • Compliance & Records Management
    • Eliminate inconsistent meta-tagging
    • Preserve record integrity
    • Unlocking Enterprise Content To Drive Business Agility
  • 26. conceptClassifier for SharePoint: Unlocking Enterprise Content To Drive Business Agility. Paul Billingham, Sales Director, Concept Searching, +44 7866476691, [email_address]. Carla Mulley, VP Marketing, Concept Searching, +1 (412) 567-4948, [email_address]