
Using metadata repositories with search

Pre-conference workshop at the 2007 Enterprise Search Summit

Published in: Technology, Education

  • In this presentation, we’ll talk about the best of two worlds: how a metadata repository and an A – Z index complement a search engine, and how search complements browse.

    1. Using metadata repositories with search
       Enterprise Search Summit, 5/14/2007
       Jean Graef, The Montague Institute
       Jean.graef at montague.com, (413) 367-0245
    2. Topics
       • Role of metadata in search & discovery
       • Where is metadata stored?
       • What is a metadata repository?
       • How to create a metadata repository
       • Using metadata within search
       • Commercial products
       • How to sell metadata repositories
    3. 1. Metadata in search & discovery
       • Smarter full text search
       • Search by attribute
         • Faceted navigation
         • Fielded search
       • Legacy navigation tools
         • Card catalogs & subject guides
         • A – Z indexes, tables of contents, glossaries, lists
         • Bibliographic databases
       • Content networks
    4. Smarter full text search
       • Results summaries
       • “Best Bets”
       • Synonyms
       • Topic browse
       • “See also” references
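
As a rough illustration of how stored metadata can make full text search "smarter", the sketch below expands a query with synonyms and looks up an editor-picked "Best Bet". The term lists, the URL, and the function names are hypothetical; they are not taken from the slides or from any product.

```python
# Hypothetical synonym lists and best bets; a real repository would supply these.
SYNONYMS = {
    "car": ["automobile", "auto"],
    "laptop": ["notebook"],
}
BEST_BETS = {
    "vacation policy": "http://intranet.example.com/hr/vacation.html",
}


def expand_query(query):
    """Return the query words plus any synonyms, in order, without repeats."""
    expanded = []
    for word in query.lower().split():
        for term in [word] + SYNONYMS.get(word, []):
            if term not in expanded:
                expanded.append(term)
    return expanded


def best_bet(query):
    """Return the hand-picked URL for a query, or None if there is none."""
    return BEST_BETS.get(query.strip().lower())


if __name__ == "__main__":
    print(expand_query("used car"))
    print(best_bet("Vacation Policy"))
```

The expanded word list would typically be OR-ed together before it is passed to the search engine, while a best bet is shown above the ordinary results.
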
    5. Out of the box Results summaries
    6. Topics, thesaurus
    7. Search by attribute
    8. Legacy retrieval tools: Before
    9. Legacy retrieval tools: After
    10. Content networks
    11. 2. Where is metadata stored?
        • Embedded in document
        • Embedded in application
        • In XML file
        • In external database
    12. Embedded in document
    13. Embedded in application
    14. Embedded in application
    15. In spreadsheet or database
    16. Why isolate metadata?
        • Easier to standardize & localize
        • Easier to update
        • Easier to share with multiple applications
          • Full text search
          • ERP/transaction applications
          • CMS and DMS applications
          • Legacy retrieval tools
        • Insurance against technological change
    17. Who uses metadata?
        • Programs
          • Search engines
          • Other applications
        • Humans
          • Authors
          • Site administrators
          • Indexers
          • Readers/visitors
    18. 3. A metadata repository is…
        • A data storage structure that describes the characteristics of information objects as an aid in identification, discovery, assessment, and management.
        • Able to be read and updated by both humans and computers.
        • Example: <title>Web site makeover</title>
    19. Two familiar examples
        • Library card catalog
          • Data = book, journal title
          • Metadata = author, title, subject, call number
        • Bibliographic database
          • Data = journal article
          • Metadata = author, title, pub date, publisher, keywords
    20. Library catalog (Endeca)
    21. Bibliographic database
    22. Two kinds
        • Data management tools
          • Catalog of business definitions, data processing systems, & application components
          • ER diagrams
          • Rochade, Informatica
        • Classification (semantic) tools
          • Reference for names, terms, topics, and other data used to classify content objects
          • Collaboration, display of terms & relationships
          • Products we discuss here
    23. Data management tool
    24. Classification tool
    25. Metadata repository for search
        • Controlled vocabulary
        • Thesaurus
        • Link to content object (e.g. URL, file path)
        • Other attributes
          • Language
          • Geographic region
          • Industry
        • Search is more than a search engine!
    26. 4. Creating a metadata repository (diagram of tables and their fields)
        • Search terms: URL, term
        • Thesaurus: Term, BT/NT, RT, Use
        • Controlled Vocabulary: term
        • Card catalog: Author, Title, Subject
        • Contact database: Name, Password, Address
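
The thesaurus and search-term tables named on this slide could be modeled roughly as follows. This is a minimal sketch: the class names, field names, sample terms, and paths are assumptions made for illustration, and the slide itself gives only the field labels (Term, BT/NT, RT, Use, URL).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ThesaurusTerm:
    term: str
    broader: List[str] = field(default_factory=list)   # BT
    narrower: List[str] = field(default_factory=list)  # NT
    related: List[str] = field(default_factory=list)   # RT
    use: Optional[str] = None  # preferred term, if this entry is non-preferred


@dataclass
class SearchTerm:
    url: str   # link to the content object
    term: str  # term assigned to that object


def preferred(term: str, thesaurus: Dict[str, ThesaurusTerm]) -> str:
    """Follow a Use reference to the preferred term, if there is one."""
    entry = thesaurus.get(term)
    return entry.use if entry and entry.use else term


def urls_for(term, thesaurus, search_terms):
    """URLs tagged with the preferred term or one of its narrower terms."""
    pref = preferred(term, thesaurus)
    wanted = {pref}
    entry = thesaurus.get(pref)
    if entry:
        wanted.update(entry.narrower)
    return sorted({s.url for s in search_terms if s.term in wanted})


# Hypothetical sample data
thesaurus = {
    "Vehicles": ThesaurusTerm("Vehicles", narrower=["Cars"]),
    "Cars": ThesaurusTerm("Cars", broader=["Vehicles"], related=["Roads"]),
    "Automobiles": ThesaurusTerm("Automobiles", use="Cars"),
}
search_terms = [
    SearchTerm("/docs/a.html", "Cars"),
    SearchTerm("/docs/b.html", "Vehicles"),
]

if __name__ == "__main__":
    print(urls_for("Automobiles", thesaurus, search_terms))
    print(urls_for("Vehicles", thesaurus, search_terms))
```

Keeping the Use reference on the term record means a query on a non-preferred term can be resolved before any content lookup happens.
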
    27. Search terms
    28. Controlled vocabulary
    29. Thesaurus
    30. Metadata repository segment
    31. Where does data come from?
        • System (unique ID, date saved, user ID)
        • User input (free text)
        • User input (selected from list)
        • Program generated (assigned from rules)
        • Database lookup (e.g. employee directory)
        • Licensed from creator or vendor
    32. Metadata sources & uses
    33. 5. Using metadata in search
        • Export as XML
        • Access via ODBC
        • Vendor API
        • Web services
    34. Export XML (diagram)
        • Components: metadata repository, XML file, XSL style sheet, search engine
        • Exported data: related terms, use terms, preferred names, use names
    35. XML Style sheet
    36. XML thesaurus data
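
The XML export path could look something like the sketch below, which writes use terms and related terms to XML with Python's standard library. The element and attribute names are invented for illustration, and because the standard library has no XSLT processor, a small function stands in for the XSL style sheet step that reshapes the data for a search engine.

```python
import xml.etree.ElementTree as ET


def thesaurus_to_xml(terms):
    """Serialize {term: {"use": [...], "related": [...]}} to an XML string.

    The <thesaurus>/<term>/<use>/<related> layout is a hypothetical schema.
    """
    root = ET.Element("thesaurus")
    for name, rels in terms.items():
        term_el = ET.SubElement(root, "term", {"name": name})
        for rel_type in ("use", "related"):
            for target in rels.get(rel_type, []):
                ET.SubElement(term_el, rel_type).text = target
    return ET.tostring(root, encoding="unicode")


def synonym_lines(xml_text):
    """Stand-in for the XSL step: flatten use terms to 'entry => preferred'."""
    root = ET.fromstring(xml_text)
    lines = []
    for term_el in root.findall("term"):
        for use_el in term_el.findall("use"):
            lines.append(f"{term_el.get('name')} => {use_el.text}")
    return lines


if __name__ == "__main__":
    sample = {
        "Automobiles": {"use": ["Cars"]},
        "Cars": {"related": ["Roads"]},
    }
    xml_text = thesaurus_to_xml(sample)
    print(xml_text)
    print(synonym_lines(xml_text))
```

The target format of the flattening step depends entirely on the search engine; the `entry => preferred` lines here are only one plausible shape for a synonym file.
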
    37. Access via ODBC (diagram)
        • Components: metadata repository, indexes, search engine index, search engine, search results list
    38. Access via ODBC
    39. Access via ODBC
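
The direct-access path can be sketched as an ordinary SQL lookup. To keep the example self-contained, Python's built-in `sqlite3` module stands in for an ODBC connection; an ODBC driver module such as pyodbc follows the same DB-API pattern. The table names, column names, and sample rows are assumptions.

```python
import sqlite3


def connect_demo_repository():
    """Build a tiny in-memory repository (stand-in for an ODBC data source)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE use_terms (entry TEXT PRIMARY KEY, preferred TEXT)")
    conn.execute("CREATE TABLE term_urls (term TEXT, url TEXT)")
    conn.executemany(
        "INSERT INTO use_terms VALUES (?, ?)",
        [("automobiles", "cars")],
    )
    conn.executemany(
        "INSERT INTO term_urls VALUES (?, ?)",
        [("cars", "/docs/buying-guide.html"), ("cars", "/docs/recalls.html")],
    )
    return conn


def lookup(conn, query):
    """Map the query to its preferred term, then fetch the URLs indexed under it."""
    q = query.strip().lower()
    row = conn.execute(
        "SELECT preferred FROM use_terms WHERE entry = ?", (q,)
    ).fetchone()
    term = row[0] if row else q
    rows = conn.execute(
        "SELECT url FROM term_urls WHERE term = ? ORDER BY url", (term,)
    ).fetchall()
    return term, [r[0] for r in rows]


if __name__ == "__main__":
    conn = connect_demo_repository()
    print(lookup(conn, "Automobiles"))
```

Unlike the XML export, this approach reads the repository at query time, so vocabulary changes show up in search without a separate export step.
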
    40. Product checklist
        • Easy to use for both indexers & laymen
        • Basic thesaurus fields & relationships
          • BT, NT, Use/Use For, RT
          • Definitions, scope notes, source
        • Multiple vocabularies
        • Polyhierarchy
        • Error checking
          • Duplicates
          • No x-refs to dead-end terms
    41. Product checklist
        • Workflow features
          • Candidate terms, approvals
        • Import/export formats
        • Statistics/reports
          • Terms used in queries
        • Add new fields & relationships
        • Robust database search features
          • Boolean, truncated, & phrase search
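
The two error checks named in the checklist (duplicates and dead-end cross-references) might be implemented along these lines. This is a sketch under one assumption about meaning: a "dead-end" cross-reference is taken to be one whose target is not a defined term. The function names and sample data are hypothetical.

```python
from collections import Counter


def find_duplicates(terms):
    """Terms that appear more than once, ignoring case and surrounding spaces."""
    counts = Counter(t.strip().lower() for t in terms)
    return sorted(t for t, n in counts.items() if n > 1)


def find_dead_end_xrefs(terms, xrefs):
    """Cross-references (source, relation, target) whose target is undefined."""
    known = {t.strip().lower() for t in terms}
    return [x for x in xrefs if x[2].strip().lower() not in known]


if __name__ == "__main__":
    terms = ["Cars", "cars ", "Roads"]
    xrefs = [("Cars", "RT", "Roads"), ("Cars", "BT", "Vehicles")]
    print(find_duplicates(terms))
    print(find_dead_end_xrefs(terms, xrefs))
```
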
    42. 6. Enterprise-grade products
        • Data Harmony
        • Schemalogic
        • Factiva Synaptica
        • Wordmap
    43. Data Harmony
        • Thesaurus tool + rules-based indexer
          • Create & manage thesaurus terms
          • Call indexer to assign terms to documents
        • Has interfaced with:
          • Documentum
          • SharePoint
            • Index document from within SharePoint
          • MarkLogic
          • Verity & Ultraseek
    44. Data Harmony
        • Multi-lingual capabilities
        • Time-limited trial download
        • $100,000+ for both thesaurus & indexer
          • Pricing based on the number of servers
        • Customer base: government agencies, corporations
    45. Data Harmony file formats
    46. Schemalogic
        • Thesaurus, vocabularies, schemas
          • Manage terms & vocabularies
          • Interfaces for both indexers & business people
        • Has interfaced with:
          • Auto-categorization tools (Teragram, Nstein)
          • Search engines: OmniFind, FAST, Verity, Autonomy
          • SharePoint
    47. Schemalogic
        • Multi-lingual capabilities
        • $50,000 - $750,000 for software + services
          • Pricing based on the number of seats & servers
        • Customers: commercial publishers, corporations
    48. Schemalogic: Indexer’s view
    49. Schemalogic: Business person’s view
    50. Factiva Synaptica
        • Thesaurus, vocabularies, “warehouse”
          • Manage terms & vocabularies
          • Index Management Service (classification)
        • Multi-lingual capabilities
        • $100,000+
        • Customers: commercial publishers, corporations
    51. Factiva Synaptica
    52. Wordmap
        • Thesaurus manager, tagger, auto-classifier, topic browse (navigator)
          • Manage terms & vocabularies
          • Assign terms
          • Clusters documents into categories
          • Yellow-pages style directory with x-refs
        • Multi-lingual capability
    53. Wordmap: Indexer’s view
    54. Wordmap: User’s view
    55. 7. Selling metadata repositories
        • Time saved by users in finding information
        • Staff time saved by self service
        • Staff time saved in preparing & publishing content
        • Increased revenues from information products & services
    56. Metadata repository lab
        • Enter your own data
          • Thesaurus
            • Map terms in two different thesauri
          • Digital assets (documents, images, etc.)
          • Names: people, products, organizations
          • Relationships
            • Authored by
            • Subject of
            • Made by
            • Acquired by
            • Used by
    57. Metadata repository lab
        • Manual data entry or import
        • Navigation tools
          • Fielded (faceted) search
          • Bibliography
          • Glossary
          • Back-of-the-book style A – Z index
          • Subject hierarchy (table of contents)
        • Custom lists & export formats (XML)
    58. A – Z index
    59. More info
        Montague Institute Review: http://www.montague.com/review/review.html
