Using metadata repositories with search
Pre-conference workshop at the 2007 Enterprise Search Summit

Published in: Technology, Education

  • In this presentation, we’ll talk about the best of two worlds: how a metadata repository and an A – Z index complement a search engine, and how search complements browse.
  • Transcript

    • 1. Using metadata repositories with search
      • Enterprise Search Summit, 5/14/2007
      • Jean Graef, The Montague Institute
      • Jean.graef at montague.com
      • (413) 367-0245
    • 2. Topics
      • Role of metadata in search & discovery
      • Where is metadata stored?
      • What is a metadata repository?
      • How to create a metadata repository
      • Using metadata within search
      • Commercial products
      • How to sell metadata repositories
    • 3. 1. Metadata in search & discovery
      • Smarter full text search
      • Search by attribute
        • Faceted navigation
        • Fielded search
      • Legacy navigation tools
        • Card catalogs & subject guides
        • A – Z indexes, tables of contents, glossaries, lists
        • Bibliographic databases
      • Content networks
    • 4. Smarter full text search
      • Results summaries
      • “Best Bets”
      • Synonyms
      • Topic browse
      • “See also” references
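The synonym and “See also” items above can be sketched as a small query-expansion step that runs before the full text search. This is only an illustration under stated assumptions: the `THESAURUS` dictionary, its entries, and the `preferred_term` and `expand_query` functions are hypothetical names invented for this example, not part of any search engine's API.

```python
# Hypothetical thesaurus: each preferred term maps to its synonyms
# ("use for" terms) and its "see also" (related) terms.
THESAURUS = {
    "automobile": {"synonyms": ["car", "auto"], "see_also": ["truck"]},
    "truck": {"synonyms": ["lorry"], "see_also": ["automobile"]},
}


def preferred_term(word):
    """Return the preferred term for a word, or None if the word is unknown."""
    word = word.lower()
    for term, entry in THESAURUS.items():
        if word == term or word in entry["synonyms"]:
            return term
    return None


def expand_query(word):
    """Return (search_terms, see_also) for a single-word query."""
    term = preferred_term(word)
    if term is None:
        # Unknown word: fall back to plain full text search.
        return [word.lower()], []
    entry = THESAURUS[term]
    return [term] + entry["synonyms"], entry["see_also"]


terms, see_also = expand_query("Car")
print(" OR ".join(terms))      # query string handed to the search engine
print("See also:", see_also)   # suggestions shown next to the results
```

Keeping the synonym list outside the search engine means the same vocabulary could also drive topic browse and “Best Bets”.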
    • 5. Out of the box Results summaries
    • 6. Topics, thesaurus
    • 8. Search by attribute
    • 9. Legacy retrieval tools: Before
    • 10. Legacy retrieval tools: After
    • 11. Content networks
    • 12. 2. Where is metadata stored?
      • Embedded in document
      • Embedded in application
      • In XML file
      • In external database
    • 13. Embedded in document
    • 14. Embedded in application
    • 15. Embedded in application
    • 16. In spreadsheet or database
    • 17. Why isolate metadata?
      • Easier to standardize & localize
      • Easier to update
      • Easier to share with multiple applications
        • Full text search
        • ERP/transaction applications
        • CMS and DMS applications
        • Legacy retrieval tools
      • Insurance against technological change
    • 18. Who uses metadata?
      • Programs
        • Search engines
        • Other applications
      • Humans
        • Authors
        • Site administrators
        • Indexers
        • Readers/visitors
    • 19. 3. A metadata repository is…
      • A data storage structure that describes the characteristics of information objects as an aid in identification, discovery, assessment, and management.
      • Able to be read and updated by both humans and computers.
      • Example: <title>Web site makeover</title>
    • 20. Two familiar examples
      • Library card catalog
        • Data = book, journal title
        • Metadata = author, title, subject, call number
      • Bibliographic database
        • Data = journal article
        • Metadata = author, title, pub date, publisher, keywords
    • 21. Library catalog (Endeca)
    • 22. Bibliographic database
    • 23. Two kinds
      • Data management tools
        • Catalog of business definitions, data processing systems, & application components.
        • ER diagrams
        • Rochade, Informatica
      • Classification (semantic) tools
        • Reference for names, terms, topics, and other data used to classify content objects
        • Collaboration, display of terms & relationships
        • Products we discuss here
    • 24. Data management tool
    • 25. Classification tool
    • 26. Metadata repository for search
      • Controlled vocabulary
      • Thesaurus
      • Link to content object (e.g. URL, file path)
      • Other attributes
        • Language
        • Geographic region
        • Industry
      • Search is more than a search engine!
    • 27. 4. Creating a metadata repository
      • Search terms: URL, term
      • Thesaurus: term, BT/NT, RT, Use
      • Controlled vocabulary: term
      • Card catalog: author, title, subject
      • Contact database: name, password, address
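One possible way to picture the first three structures on this slide is as relational tables. The sketch below uses Python's built-in `sqlite3` module; the table names, column names, sample rows, and the `urls_for` helper are assumptions adapted from the slide labels, not the schema used in the workshop. The card catalog and contact database are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE controlled_vocabulary (
    term TEXT PRIMARY KEY
);
CREATE TABLE thesaurus (
    term     TEXT NOT NULL,
    relation TEXT NOT NULL CHECK (relation IN ('BT', 'NT', 'RT', 'USE')),
    target   TEXT NOT NULL
);
CREATE TABLE search_terms (
    url  TEXT NOT NULL,
    term TEXT NOT NULL
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO controlled_vocabulary VALUES (?)",
                 [("vehicles",), ("automobiles",)])
conn.executemany("INSERT INTO thesaurus VALUES (?, ?, ?)",
                 [("vehicles", "NT", "automobiles"),
                  ("automobiles", "BT", "vehicles"),
                  ("cars", "USE", "automobiles")])
conn.executemany("INSERT INTO search_terms VALUES (?, ?)",
                 [("http://example.com/fleet.html", "vehicles"),
                  ("http://example.com/sedans.html", "automobiles")])


def urls_for(term):
    """URLs indexed under a term or under any of its direct narrower terms."""
    rows = conn.execute("""
        SELECT url FROM search_terms
        WHERE term = :t
           OR term IN (SELECT target FROM thesaurus
                       WHERE term = :t AND relation = 'NT')
        ORDER BY url
    """, {"t": term})
    return [url for (url,) in rows]


print(urls_for("vehicles"))
```

Storing the thesaurus relationships as rows, rather than as columns on the term, keeps it easy to add new relationship types later.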
    • 28. Search terms
    • 29. Controlled vocabulary
    • 30. Thesaurus
    • 31. Metadata repository segment
    • 32. Where does data come from?
      • System (unique ID, date saved, user ID)
      • User input (free text)
      • User input (selected from list)
      • Program generated (assigned from rules)
      • Database lookup (e.g. employee directory)
      • Licensed from creator or vendor
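Of the sources above, “program generated (assigned from rules)” is the one that lends itself to a short sketch. The example below is a deliberately simple stand-in for a rules-based indexer: the `RULES` table, its patterns, and the `assign_terms` function are hypothetical, and commercial indexers use far richer rule languages.

```python
import re

# Hypothetical rules: a controlled-vocabulary term is assigned to a document
# when the term's pattern matches anywhere in the document text.
RULES = {
    "automobiles": re.compile(r"\b(cars?|automobiles?|sedans?)\b", re.I),
    "thesauri": re.compile(r"\b(thesaurus|thesauri)\b", re.I),
}


def assign_terms(text):
    """Return the sorted list of terms whose rule matches the text."""
    return sorted(term for term, pattern in RULES.items() if pattern.search(text))


print(assign_terms("New sedans and a car thesaurus"))
```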
    • 33. Metadata sources & uses
    • 34. 5. Using metadata in search
      • Export as XML
      • Access via ODBC
      • Vendor API
      • Web services
    • 35. Export XML
      • Components: metadata repository, XML file, XSL style sheet, search engine
      • Exported data: related terms, use terms, preferred names, use names
    • 36. XML Style sheet
    • 37. XML thesaurus data
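The export path on the preceding slides (repository, then XML file, then a transformation for the search engine) can be sketched roughly as follows. The slides use an XSL style sheet for the transformation; Python's standard library has no XSLT processor, so this sketch substitutes a plain Python function for that step. The element names (`thesaurus`, `entry`, `term`, `use`), the sample rows, and the output line format are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository rows: (non-preferred term, preferred "use" term).
USE_TERMS = [("car", "automobile"), ("auto", "automobile"), ("lorry", "truck")]


def export_xml(rows):
    """Serialize use-term pairs as an XML string."""
    root = ET.Element("thesaurus")
    for variant, preferred in rows:
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "term").text = variant
        ET.SubElement(entry, "use").text = preferred
    return ET.tostring(root, encoding="unicode")


def to_synonym_lines(xml_text):
    """Stand-in for the style sheet step: one 'preferred: variants' line each."""
    groups = {}
    for entry in ET.fromstring(xml_text).iter("entry"):
        groups.setdefault(entry.findtext("use"), []).append(entry.findtext("term"))
    return [f"{preferred}: {', '.join(variants)}"
            for preferred, variants in sorted(groups.items())]


xml_text = export_xml(USE_TERMS)
for line in to_synonym_lines(xml_text):
    print(line)
```

The actual synonym file format depends on the search engine, so the last step is the part most likely to change per product.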
    • 39. Access via ODBC
      • Diagram labels: search engine, indexes, metadata repository, search engine index, search results list
    • 40. Access via ODBC
    • 41. Access via ODBC
    • 42. Product checklist
      • Easy to use for both indexers & laymen
      • Basic thesaurus fields & relationships
        • BT, NT, Use/Use For, RT
        • Definitions, scope notes, source
      • Multiple vocabularies
      • Polyhierarchy
      • Error checking
        • Duplicates
        • No x-refs to dead-end terms
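The error-checking and polyhierarchy items in this checklist can be illustrated with a few lines of code. This is a sketch of the kind of checks a thesaurus tool might run, not how any listed product works; the `ROWS` data, the `PREFERRED` set, and the three function names are hypothetical.

```python
from collections import Counter

# Hypothetical thesaurus rows: (term, relation, target).
ROWS = [
    ("vehicles", "NT", "automobiles"),
    ("automobiles", "BT", "vehicles"),
    ("automobiles", "BT", "consumer goods"),  # polyhierarchy: second parent
    ("cars", "USE", "automobiles"),
    ("lorries", "USE", "trucks"),             # dead end: "trucks" is undefined
    ("cars", "USE", "automobiles"),           # duplicate row
]
PREFERRED = {"vehicles", "automobiles", "consumer goods"}


def find_duplicates(rows):
    """Rows that appear more than once."""
    return sorted(row for row, count in Counter(rows).items() if count > 1)


def find_dead_ends(rows, preferred):
    """Cross-references whose target is not a preferred term."""
    return sorted({row for row in rows if row[2] not in preferred})


def broader_terms(rows, term):
    """All parents of a term; more than one parent means polyhierarchy."""
    return sorted({target for t, rel, target in rows if t == term and rel == "BT"})


print(find_duplicates(ROWS))
print(find_dead_ends(ROWS, PREFERRED))
print(broader_terms(ROWS, "automobiles"))
```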
    • 43. Product checklist
      • Workflow features
        • Candidate terms, approvals
      • Import/export formats
      • Statistics/reports
        • Terms used in queries
      • Add new fields & relationships
      • Robust database search features
        • Boolean, truncated, & phrase search
    • 44. 6. Enterprise-grade products
      • Data Harmony
      • Schemalogic
      • Factiva Synaptica
      • Wordmap
    • 45. Data Harmony
      • Thesaurus tool + rules-based indexer
        • Create & manage thesaurus terms
        • Call indexer to assign terms to documents
      • Has interfaced with:
        • Documentum
        • Sharepoint
          • Index document from within Sharepoint
        • MarkLogic
        • Verity & Ultraseek
    • 46. Data Harmony
      • Multi-lingual capabilities
      • Time-limited trial download
      • $100,000 + for both thesaurus & indexer
        • pricing based on the number of servers
      • Customer base: government agencies, corporations
    • 47. Data Harmony file formats
    • 48. Schemalogic
      • Thesaurus, vocabularies, schemas
        • Manage terms & vocabularies
        • Interfaces for both indexers & business people
      • Has interfaced with:
        • Auto-categorization tools (Teragram, Nstein)
        • Search engines: OmniFind, FAST, Verity, Autonomy
        • Sharepoint
    • 49. Schemalogic
      • Multi-lingual capabilities
      • $50,000 - $750,000 for software + services
        • pricing based on the number of seats & servers
      • Customers: Commercial publishers, corporations
    • 50. Schemalogic: Indexer’s view
    • 51. Schemalogic: Business person’s view
    • 52. Factiva Synaptica
      • Thesaurus, vocabularies, “warehouse”
        • Manage terms & vocabularies
        • Index Management Service (classification)
      • Multi-lingual capabilities
      • $100,000 +
      • Customers: Commercial publishers, corporations
    • 53. Factiva Synaptica
    • 54. Wordmap
      • Thesaurus manager, tagger, auto-classifier, topic browse (navigator)
        • Manage terms & vocabularies
        • Assign terms
        • Clusters documents into categories
        • Yellow-pages style directory with x-refs
      • Multi-lingual capability
    • 55. Wordmap: Indexer’s view
    • 56. Wordmap: User’s view
    • 57. 7. Selling metadata repositories
      • Time saved by users in finding information
      • Staff time saved by self service
      • Staff time saved in preparing & publishing content
      • Increased revenues from information products & services
    • 58. Metadata repository lab
      • Enter your own data
        • Thesaurus
          • Map terms in two different thesauri
        • Digital assets (documents, images, etc)
        • Names: people, products, organizations
        • Relationships
          • Authored by
          • Subject of
          • Made by
          • Acquired by
          • Used by
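The relationships in this lab exercise can be modeled as simple subject, relationship, object triples. The sketch below is one assumed way to do that; the sample names, the `TRIPLES` list, and the `related` function are invented for illustration and do not come from the lab materials.

```python
# Hypothetical triples: (subject, relationship, object).
TRIPLES = [
    ("Annual report", "authored by", "Pat Lee"),
    ("Widget X", "subject of", "Annual report"),
    ("Widget X", "made by", "Acme Corp"),
    ("Acme Corp", "acquired by", "Example Holdings"),
]


def related(name, triples=TRIPLES):
    """Return (relationship, other) pairs touching a name, in either direction."""
    links = []
    for subject, relation, obj in triples:
        if subject == name:
            links.append((relation, obj))
        elif obj == name:
            links.append((f"inverse of {relation}", subject))
    return links


for relation, other in related("Widget X"):
    print(f"Widget X -- {relation} --> {other}")
```

Following links in both directions is what turns a flat list of records into the kind of content network described earlier in the deck.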
    • 59. Metadata repository Lab
      • Manual data entry or import
      • Navigation tools
        • Fielded (faceted) search
        • Bibliography
        • Glossary
        • Back-of-the-book style A – Z index
        • Subject hierarchy (table of contents)
      • Custom lists & export formats (XML)
    • 60. A – Z index
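A back-of-the-book style A – Z index can be generated from repository entries with very little code. The sketch below assumes each entry is a heading plus either a link or a "see" cross-reference; the `ENTRIES` list and the `build_index` function are hypothetical.

```python
from itertools import groupby

# Hypothetical index entries: (heading, target). A target that starts with
# "see " is a cross-reference; anything else is a link to a content object.
ENTRIES = [
    ("automobiles", "http://example.com/sedans.html"),
    ("cars", "see automobiles"),
    ("Thesaurus", "http://example.com/thesaurus.html"),
    ("taxonomy", "http://example.com/taxonomy.html"),
]


def build_index(entries):
    """Group entries under their initial letter, sorted case-insensitively."""
    ordered = sorted(entries, key=lambda e: e[0].lower())
    return {
        letter: list(group)
        for letter, group in groupby(ordered, key=lambda e: e[0][0].upper())
    }


for letter, items in build_index(ENTRIES).items():
    print(letter)
    for heading, target in items:
        print(f"  {heading}: {target}")
```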
    • 61. More info
      • Montague Institute Review: http://www.montague.com/review/review.html