FAST, Aeturnum, and Concept Searching webinar presentation


In this webinar, Nate Treloar, Principal Search Technology Evangelist in the Microsoft Enterprise Search Group, shared the new Microsoft Search strategy and focused on FAST for SharePoint 2010 and what it means to your organization.

Learn how Concept Searching's award-winning conceptClassifier eliminates manual metadata tagging through automatic conceptual metadata generation and provides the framework to rapidly build and deploy taxonomies to improve the search experience.

Recipient of the FAST Innovative Solution Award for their Search Solutions Framework at the SharePoint 2009 Conference, Aeturnum will share their expertise and best practices in deploying FAST and conceptClassifier as an enterprise search solution.


  1. 1. Solving Enterprise Search Challenges Sponsored by: Microsoft, Aeturnum, Concept Searching
  2. 2. Welcome - Agenda: Nate Treloar - Microsoft, Principal Search Technology Evangelist in the Microsoft Enterprise Search Group, responsible for the group’s technology innovation and evangelism programs; Don Miller - Concept Searching, Vice President Business Development; John Challis - Concept Searching, CTO/CEO; Mike Knuts - Aeturnum, Vice President Business Development; Sashika Dias - Aeturnum, Knowledge Management & Information Access Practice
  3. 3. Search is the key to engaging information experiences
  4. 4. Connecting people to information, driving better outcomes: Search helps your customers get what they want, increasing revenue. Search helps your employees get their jobs done, cutting costs.
  5. 5. Solutions for Internet Sites; Solutions for Business Productivity
  6. 6. Search
  7. 7. OR
  8. 8. Best of High-end Best of SharePoint Best of Microsoft
  9. 9. Products for Every Customer Need: Complete OOB search; high-end search delivered through SharePoint. Common across the product line: • Common UI Framework • Common Connector Framework (BDC) • Social search features and integration • APIs and developer experience • SharePoint platform integration • Admin & deployment capabilities • End user and site administrator enablement • Operations advantages (SCOM, scripting)
  10. 10. Deep Refinement; Sorting; People Search; Thumbnails; Similar Results; Federation; Previews
  11. 11. and streamline how you find and collaborate with others: Filter by title, expertise & other attributes; Phonetic name lookup; Expertise matching; Real-time presence; Org browsing; Find recent content
  12. 12. A systematic approach to interpreting your content. Pipeline stages: Format Conversion: extracts plain text from multiple file formats. Language and Encoding Detection: identifies the native language and encoding so that the proper dictionaries can be used by the tokenization and lemmatization stages. Entity Extraction: finds terms in the content and maps them to predefined categories; out-of-the-box support for People, Companies and Locations, but can be extended to any category. Date and Time Normalization: converts dates and times written in various representations to a standard representation; for example, knows that 14-Mar-10 is equivalent to March 14, 2010. Custom Stage: insert custom stages to perform specialized enrichment and other processing. Map Crawled Properties: maps all of the metadata discovered by the pipeline stages.
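
The date and time normalization idea on slide 12 (treating 14-Mar-10 and March 14, 2010 as the same date) can be illustrated with a minimal Python sketch. This is not the FAST pipeline stage itself; the function name `normalize_date` and the short list of input formats are assumptions chosen for illustration, and the sketch assumes English month names.

```python
from datetime import datetime

# Hypothetical list of accepted input formats; a real pipeline stage
# would recognize far more representations than these three.
KNOWN_FORMATS = ("%d-%b-%y", "%B %d, %Y", "%Y-%m-%d")


def normalize_date(text):
    """Return the date as an ISO 8601 string, or None if no format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None


if __name__ == "__main__":
    for raw in ("14-Mar-10", "March 14, 2010", "not a date"):
        print(raw, "->", normalize_date(raw))
```

Both written forms map to one standard representation, so a search index can sort and filter on the date regardless of how the author typed it.
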
  13. 13. Enterprise Search from Microsoft. UX: Go beyond the search box; DX: Do more with search; IT: Eliminate compromise
  14. 14. / Enterprise Search © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
  15. 15. Leveraging Metadata to Improve Findability, Records Management, and Compliance in SharePoint Donald T. Miller, Vice President Business Development
  16. 16. Concept Searching, Inc. » Company founded in 2002 » Product launched in 2003 » Focus on management of structured and unstructured information » Technology: automatic concept identification, content tagging, auto-classification, taxonomy management » Only statistical vendor that can extract conceptual metadata » 2009 and 2010 ‘100 Companies that Matter in KM’ (KMWorld Magazine) » KMWorld ‘Trend Setting Product’ of 2009 » Locations: US, UK, & South Africa » Client base: Fortune 500/1000 organizations » Managed Partner under Microsoft global ISV Program - “go to partner” for Microsoft for auto-classification and taxonomy management » Microsoft Enterprise Search ISV, FAST Partner
  17. 17. Taxonomies and Metadata Drive Business Value » Taxonomy » Classification hierarchy » Provides a manageable information infrastructure » Groups unstructured information together based on an understanding of the concepts and ideas that share mutual attributes » Search » Foundation for improving search outcomes (keyword search alone provides 33% of results) » Consistency of indexing and tagging, resulting in the ability to guide the end user to the ‘right’ information » Finds concepts, eliminating ambiguity in single words » Taxonomy browse and faceted search (guided navigation increases access to content by over 35%) » Solves the problem when people don’t know what they are looking for, or even whether what they are looking for exists » Identification and protection of sensitive information (PII, PHI, etc.) » Only solution that combines pattern matching with associated vocabulary » Documents can be tagged and locked down and rendered unavailable in search » Enable more effective Records Management » Identify and declare documents of record, tag them with the appropriate retention code, and route them to the Records Center
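
Slide 17 mentions combining pattern matching with associated vocabulary to identify sensitive information. A minimal Python sketch of that general idea follows; it is not conceptClassifier's implementation. The pattern (a US Social Security number layout), the vocabulary list, and the function name `looks_sensitive` are all assumptions made for illustration.

```python
import re

# Assumed pattern: digits laid out like a US Social Security number.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical vocabulary that should appear near the pattern before the
# match is treated as sensitive; a real deployment would tune this list.
VOCABULARY = ("social security", "ssn", "taxpayer")


def looks_sensitive(text, window=60):
    """Flag text only when the pattern AND nearby vocabulary both occur."""
    lowered = text.lower()
    for match in SSN_PATTERN.finditer(lowered):
        start = max(0, match.start() - window)
        context = lowered[start:match.end() + window]
        # Simple substring check; word-boundary matching would be stricter.
        if any(term in context for term in VOCABULARY):
            return True
    return False


if __name__ == "__main__":
    print(looks_sensitive("Employee SSN: 123-45-6789"))
    print(looks_sensitive("Part number 123-45-6789 is in stock"))
```

Requiring both signals is what reduces false positives: a part number with the same digit layout is not flagged because no supporting vocabulary appears near it.
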
  18. 18. A manual metadata approach will fail 95% of the time. Issue / Organizational Impact: » Inconsistent: Less than 50% of content is correctly indexed, meta-tagged or efficiently searchable, rendering it unusable to the organization (IDC) » Subjective: Highly trained Information Specialists will agree on meta tags about 33% of the time (C. Cleverdon) » Cumbersome: Average cost of manually tagging one item runs from $4 - $7 per document and does not factor in the accuracy of the meta tags nor the repercussions from mis-tagged content (Hoovers) » Malicious Compliance: End users select first value in list (Perspectives on Metadata, Sarah Courier) » No perceived value for end user: What’s in it for me? End user creates document, does not see value for organization nor risks associated with litigation and non-conformance to policies. » What have you seen? » Metadata will continue to be a problem due to inconsistent human behavior. » The answer to consistent metadata is an automated approach that can extract the meaning from content, eliminating manual metadata generation yet still providing the ability to manage knowledge assets in alignment with the unique corporate knowledge infrastructure.
  19. 19. An Automated Metadata Approach Drives Business Value » Create enterprise metadata framework/model » Average return on investment minimum of 38% and runs as high as 600% (IDC) » Apply consistent meaningful metadata to enterprise content » Incorrect meta tags cost an organization $2,500 per user per year, in addition to potential costs for non-compliance (IDC) » Guide users to relevant content with taxonomy navigation » Savings of $8,965 per year per user based on an $80K salary (Chen & Dumais) » Use automatic conceptual metadata generation to improve Records Management and PII » Eliminate inconsistent end user tagging at $4-$7 per record (Hoovers) » Improve compliance processes, eliminate potential privacy exposures [Cycle diagram: 1. Model and Validate; 2. Automate Tagging; 3. Findability; 4. Business Processes; 5. Records Management; 6. Life Cycle Management]
  20. 20. Concept Based Metadata Generation » Compound Term Processing: the ability to extract ‘concepts in context’ • Only statistical metadata generation and classification company that can extract concepts from content as it is created or ingested [Diagram: ‘triple heart bypass’ read as single keywords is ambiguous: Triple (baseball, three), Heart (organ, center), Bypass (highway, avoid)] » conceptClassifier will generate conceptual metadata by extracting multi-word terms, identifying ‘triple heart bypass’ as a concept as opposed to single keywords • Search will return results based on the concept even if the exact terms are not contained in the document (e.g. ‘coronary artery surgery’, ‘heart surgery’) • Metadata can be used by any search engine index or any application/process that uses metadata
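
To make the difference between single keywords and multi-word terms on slide 20 concrete, here is a minimal Python sketch that collects repeated word sequences as candidate compound terms. It is a plain n-gram count, not Concept Searching's compound term processing, whose statistical method the slides do not describe. The stopword list, thresholds, and the name `candidate_compound_terms` are assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the sketch.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "after"}


def candidate_compound_terms(text, max_len=3, min_count=2):
    """Return multi-word sequences (2..max_len words) seen at least min_count times.

    Sequences containing a stopword are skipped. Sentence boundaries are
    ignored, which a more careful implementation would respect.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if any(word in STOPWORDS for word in gram):
                continue
            counts[" ".join(gram)] += 1
    return {term: count for term, count in counts.items() if count >= min_count}


if __name__ == "__main__":
    sample = ("The patient had a triple heart bypass. "
              "Recovery after a triple heart bypass takes weeks.")
    print(candidate_compound_terms(sample))
```

Indexing ‘triple heart bypass’ as one unit keeps the surgical meaning intact, whereas the individual words ‘triple’, ‘heart’, and ‘bypass’ each match unrelated content.
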
  21. 21. conceptClassifier and TaxonomyManager: We Make Metadata Work For You » Automatic Conceptual Metadata Generation » Automated Classification » Taxonomy Development & Management • Proven to reduce taxonomy development by 80% » Microsoft Integration • Runs natively in SharePoint 2007 and SharePoint 2010, Microsoft Office Applications, SharePoint Search and FAST, Windows Server 2008 R2 FCI • Fully integrated with SharePoint Content Types » Content Type Updater • Automatically changes the Content Type based on presence of organizationally defined metadata found within the document • Identification of confidential/privacy data • Ability to identify records based on the records retention schedule and route to the records center » Technology • Downloadable in 30 minutes, no programming required • Fully SOA compliant, delivered as Web Parts, based on open standards • Highly scalable
  22. 22. conceptClassifier for FAST Search » Improves search outcomes by placing conceptual metadata in the FAST Search index to increase relevancy of search results » Enables import of FAST Entities into the conceptClassifier taxonomy manager to fine-tune them with metadata generated from your own content and nomenclature » Runs natively as a FAST Pipeline Stage, eliminating integration and customization issues » Eliminates vocabulary normalization issues across global boundaries through controlled vocabularies » Improves faceted search results as facets are based on concepts aligned with the taxonomy » Provides taxonomy browse capabilities based on the nodes within the corporate taxonomy(s) » Provides accurate metadata filters such as numeric range searching and wildcard alphanumeric matching » Removes documents from search results that are confidential/sensitive through automatic Content Type updating and routing to secure server » Automatically tags content with both vocabulary and retention codes and respects SharePoint security that could prevent access to the document once it has been declared a record
  23. 23. Roadmap from SharePoint 2007 to 2010 » Enterprise Metadata Management (EMM) » Properties (current flat lists) become hierarchical “Term Sets” » Term Sets provide capability for faceted search and hierarchical navigation: Regions Country/State, Business Unit/Departments, Band Names/Album Names, TV Show Titles/Characters » Ability to automatically extract all meaningful concepts from content when it is created or ingested to be used by the Term Sets » Augments EMM through auto-classification to automatically apply all semantic (conceptual) metadata to the Term Sets » Automates the management, validation, and testing of the Term Sets in EMM from conceptClassifier’s Taxonomy Manager » Facilitates the ongoing taxonomy and Term Set maintenance through easy-to-use taxonomy features designed for Subject Matter Experts. conceptClassifier fully supports SharePoint 2010 EMM as the primary location for taxonomy definitions with no need to Import/Export. Changes to the taxonomy structure using Microsoft tools will be immediately visible in conceptClassifier and vice versa.
  24. 24. Solving Enterprise Search Challenges
  25. 25. Contents » Why we need metadata » Architectural considerations » Why conceptSearching?
  26. 26. Why we need metadata (a search practitioner’s view) » Improve search relevancy • Create personas based on information needs (e.g., a bank loan officer vs. a branch manager) • Attach different content sources, relevancy models and user experiences to these personas based on metadata » Better post-search navigation • Selectively expose metadata as navigators » Enable other features • Metadata can be used as input for workflows and alerting features (e.g., notify the risk management department when documents with Social Security numbers are shared on the company intranet)
  27. 27. Architectural considerations: Should metadata reside in a separate metadata repository, within the content repository, or within a search engine? » Metadata or content repository • Metadata is actionable beyond search (drive ingest workflows, alerts, etc.) • More complex implementation as no. of repositories increase » Search engine • Simple to configure (connect search engine process to classifier) • Some content stores can’t hold metadata (e.g., shared drives) • Best suited for multiple repositories [Diagrams, one per option: conceptClassifier, Search Engine, and three Content Stores]
  28. 28. Why use conceptSearching? » Iterative taxonomy development cycle avoids false positives and surprises » Pass taxonomy control into the hands of Business & KM users » Allow taxonomy to change with the business » Compound term classification, statistical ‘clues’ suggestions, etc. [Cycle diagram: Define / Refine taxonomy; Find clues; Test results]