Using metadata repositories with search

    Notes on slide 1

    In this presentation, we’ll talk about the best of two worlds: how a metadata repository and an A – Z index complement a search engine, and how search complements browse.

    Using metadata repositories with search - Presentation Transcript

    1. Using metadata repositories with search
      • Enterprise Search Summit, 5/14/2007
      • Jean Graef, The Montague Institute
      • Jean.graef at montague.com
      • (413) 367-0245
    2. Topics
      • Role of metadata in search & discovery
      • Where is metadata stored?
      • What is a metadata repository?
      • How to create a metadata repository
      • Using metadata within search
      • Commercial products
      • How to sell metadata repositories
    3. 1. Metadata in search & discovery
      • Smarter full text search
      • Search by attribute
        • Faceted navigation
        • Fielded search
      • Legacy navigation tools
        • Card catalogs & subject guides
        • A – Z indexes, tables of contents, glossaries, lists
        • Bibliographic databases
      • Content networks
    4. Smarter full text search
      • Results summaries
      • “Best Bets”
      • Synonyms
      • Topic browse
      • “See also” references
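As a rough illustration of the "Synonyms" and "Best Bets" items on this slide, the sketch below expands a query with synonyms and looks up a hand-picked result. The vocabulary entries, the example URL, and the function names `expand_query` and `best_bet` are all hypothetical; a real search engine would supply its own mechanism for both features.

```python
from typing import List, Optional

# Hypothetical data; in practice this would come from a metadata repository.
SYNONYMS = {
    "car": ["automobile", "auto"],
    "automobile": ["car", "auto"],
}
BEST_BETS = {
    "vacation policy": "https://intranet.example.com/hr/vacation",
}


def expand_query(query: str) -> List[str]:
    """Return the query terms plus their synonyms, in order, without duplicates."""
    expanded: List[str] = []
    for term in query.lower().split():
        for candidate in [term, *SYNONYMS.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def best_bet(query: str) -> Optional[str]:
    """Return the editor-selected URL for an exact query match, if one exists."""
    return BEST_BETS.get(query.strip().lower())


print(expand_query("car rental"))
print(best_bet("Vacation Policy"))
```

Keeping synonyms and best bets in plain lookup tables is only one possible design; it is used here because it keeps the vocabulary editable outside the search engine.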
    5. Out of the box Results summaries
    6. Topics, thesaurus
    7.  
    8. Search by attribute
    9. Legacy retrieval tools: Before
    10. Legacy retrieval tools: After
    11. Content networks
    12. 2. Where is metadata stored?
      • Embedded in document
      • Embedded in application
      • In XML file
      • In external database
    13. Embedded in document
    14. Embedded in application
    15. Embedded in application
    16. In spreadsheet or database
    17. Why isolate metadata?
      • Easier to standardize & localize
      • Easier to update
      • Easier to share with multiple applications
        • Full text search
        • ERP/transaction applications
        • CMS and DMS applications
        • Legacy retrieval tools
      • Insurance against technological change
    18. Who uses metadata?
      • Programs
        • Search engines
        • Other applications
      • Humans
        • Authors
        • Site administrators
        • Indexers
        • Readers/visitors
    19. 3. A metadata repository is…
      • A data storage structure that describes the characteristics of information objects as an aid in identification, discovery, assessment, and management.
      • Able to be read and updated by both humans and computers.
      • Example: <title>Web site makeover</title>
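To illustrate "read and updated by both humans and computers", here is a minimal sketch that writes a metadata record as XML and reads it back with Python's standard library. Only the title value comes from the slide; the `record` element, the other field names, and their values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_record(title, author, subject, url):
    """Build a small XML metadata record (element names are illustrative)."""
    record = ET.Element("record")
    for tag, value in (("title", title), ("author", author),
                       ("subject", subject), ("url", url)):
        ET.SubElement(record, tag).text = value
    return record


def read_record(xml_text):
    """Parse a record back into a plain dictionary of field name -> value."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


record = build_record(
    "Web site makeover",                  # title from the slide
    "Example Author",                     # hypothetical value
    "Web design",                         # hypothetical value
    "https://example.com/makeover.html",  # hypothetical value
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
print(read_record(xml_text)["title"])
```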
    20. Two familiar examples
      • Library card catalog
        • Data = book, journal title
        • Metadata = author, title, subject, call number
      • Bibliographic database
        • Data = journal article
        • Metadata = author, title, pub date, publisher, keywords
    21. Library catalog (Endeca)
    22. Bibliographic database
    23. Two kinds
      • Data management tools
        • Catalog of business definitions, data processing systems, & application components.
        • ER diagrams
        • Rochade, Informatica
      • Classification (semantic) tools
        • Reference for names, terms, topics, and other data used to classify content objects
        • Collaboration, display of terms & relationships
        • Products we discuss here
    24. Data management tool
    25. Classification tool
    26. Metadata repository for search
      • Controlled vocabulary
      • Thesaurus
      • Link to content object (e.g. URL, file path)
      • Other attributes
        • Language
        • Geographic region
        • Industry
      • Search is more than a search engine!
    27. 4. Creating a metadata repository
      • Search terms
        • URL
        • Term
      • Thesaurus
        • Term
        • BT/NT
        • RT
        • Use
      • Controlled vocabulary
        • Term
      • Card catalog
        • Author
        • Title
        • Subject
      • Contact database
        • Name
        • Password
        • Address
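The search terms, thesaurus, and controlled vocabulary structures named on this slide could be sketched as three related tables. The version below uses an in-memory SQLite database; the table layout, column names, sample terms, and the helper `urls_for` are assumptions for illustration, not the schema used in the workshop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE controlled_vocabulary (term TEXT PRIMARY KEY);
CREATE TABLE thesaurus (
    term     TEXT NOT NULL,
    relation TEXT NOT NULL CHECK (relation IN ('BT', 'NT', 'RT', 'USE')),
    target   TEXT NOT NULL REFERENCES controlled_vocabulary(term)
);
CREATE TABLE search_terms (
    url  TEXT NOT NULL,
    term TEXT NOT NULL REFERENCES controlled_vocabulary(term)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO controlled_vocabulary VALUES (?)",
                 [("Taxonomies",), ("Metadata",)])
conn.executemany("INSERT INTO thesaurus VALUES (?, ?, ?)", [
    ("Classification schemes", "USE", "Taxonomies"),  # non-preferred -> preferred
    ("Taxonomies", "BT", "Metadata"),                 # broader term
])
conn.executemany("INSERT INTO search_terms VALUES (?, ?)", [
    ("https://example.com/taxonomy-guide.html", "Taxonomies"),
])


def urls_for(term):
    """Follow a USE reference if one exists, then list URLs indexed under the term."""
    row = conn.execute(
        "SELECT target FROM thesaurus WHERE term = ? AND relation = 'USE'",
        (term,),
    ).fetchone()
    preferred = row[0] if row else term
    rows = conn.execute(
        "SELECT url FROM search_terms WHERE term = ? ORDER BY url", (preferred,)
    )
    return [r[0] for r in rows]


print(urls_for("Classification schemes"))
```

Because content is linked only to preferred terms, a searcher who types a non-preferred term still reaches the same documents.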
    28. Search terms
    29. Controlled vocabulary
    30. Thesaurus
    31. Metadata repository segment
    32. Where does data come from?
      • System (unique ID, date saved, user ID)
      • User input (free text)
      • User input (selected from list)
      • Program generated (assigned from rules)
      • Database lookup (e.g. employee directory)
      • Licensed from creator or vendor
    33. Metadata sources & uses
    34. 5. Using metadata in search
      • Export as XML
      • Access via ODBC
      • Vendor API
      • Web services
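Of the four options above, "Export as XML" is the easiest to sketch without vendor-specific tools. The example below serializes a small thesaurus to XML that a style sheet or search engine could then consume. The element and attribute names (`thesaurus`, `term`, `relation`, `type`) and the sample terms are hypothetical; each search engine expects its own format.

```python
import xml.etree.ElementTree as ET

# Hypothetical thesaurus: preferred term -> relationship type -> related terms.
THESAURUS = {
    "Taxonomies": {
        "UF": ["Classification schemes"],   # "use for" (non-preferred terms)
        "RT": ["Controlled vocabularies"],  # related terms
    },
}


def export_thesaurus(thesaurus):
    """Serialize the thesaurus to an XML string with a stable element order."""
    root = ET.Element("thesaurus")
    for preferred in sorted(thesaurus):
        entry = ET.SubElement(root, "term", name=preferred)
        for relation in sorted(thesaurus[preferred]):
            for target in thesaurus[preferred][relation]:
                ET.SubElement(entry, "relation", type=relation).text = target
    return ET.tostring(root, encoding="unicode")


xml_out = export_thesaurus(THESAURUS)
print(xml_out)
```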
    35. Export XML
      • Diagram labels: Metadata repository; XSL style sheet; Search engine; Related terms; Use terms; Preferred names; Use names; XML file
    36. XML Style sheet
    37. XML thesaurus data
    38.  
    39. Access via ODBC
      • Diagram labels: indexes; Metadata Repository; Search Engine Index; Search Results List; Search Engine
    40. Access via ODBC
    41. Access via ODBC
    42. Product checklist
      • Easy to use for both indexers & laymen
      • Basic thesaurus fields & relationships
        • BT, NT, Use/Use For, RT
        • Definitions, scope notes, source
      • Multiple vocabularies
      • Polyhierarchy
      • Error checking
        • Duplicates
        • No x-refs to dead-end terms
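The two error checks on this slide are simple to express in code. The sketch below flags duplicate terms (ignoring case and surrounding spaces) and cross-references that point to terms missing from the vocabulary. The function names and sample data are hypothetical, and commercial tools may define "duplicate" differently.

```python
from collections import Counter


def find_duplicates(terms):
    """Return normalized terms that appear more than once."""
    counts = Counter(t.strip().lower() for t in terms)
    return sorted(t for t, n in counts.items() if n > 1)


def find_dead_ends(terms, cross_refs):
    """Return (source, target) cross-references whose target is not a known term."""
    known = {t.strip().lower() for t in terms}
    return [(s, t) for s, t in cross_refs if t.strip().lower() not in known]


terms = ["Metadata", "metadata ", "Thesaurus"]
cross_refs = [("Metadata", "Thesaurus"), ("Thesaurus", "Taxonomy")]
print(find_duplicates(terms))
print(find_dead_ends(terms, cross_refs))
```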
    43. Product checklist
      • Workflow features
        • Candidate terms, approvals
      • Import/export formats
      • Statistics/reports
        • Terms used in queries
      • Add new fields & relationships
      • Robust database search features
        • Boolean, truncated, & phrase search
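A "terms used in queries" report, as listed under statistics above, could look something like the following sketch. It counts how often single-word vocabulary terms appear in a query log; the function name, the log, and the vocabulary are hypothetical, and multi-word terms would need phrase matching that is not shown here.

```python
from collections import Counter


def term_usage(query_log, vocabulary):
    """Count occurrences of single-word vocabulary terms across logged queries."""
    vocab = {t.lower() for t in vocabulary}
    counts = Counter()
    for query in query_log:
        for word in query.lower().split():
            if word in vocab:
                counts[word] += 1
    return counts.most_common()  # most frequent first


log = ["metadata search", "Metadata repository", "thesaurus"]
print(term_usage(log, ["metadata", "thesaurus"]))
```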
    44. 6. Enterprise-grade products
      • Data Harmony
      • Schemalogic
      • Factiva Synaptica
      • Wordmap
    45. Data Harmony
      • Thesaurus tool + rules-based indexer
        • Create & manage thesaurus terms
        • Call indexer to assign terms to documents
      • Has interfaced with:
        • Documentum
        • Sharepoint
          • Index document from within Sharepoint
        • MarkLogic
        • Verity & Ultraseek
    46. Data Harmony
      • Multi-lingual capabilities
      • Time-limited trial download
      • $100,000 + for both thesaurus & indexer
        • pricing based on the number of servers
      • Customer base: government agencies, corporations
    47. Data Harmony file formats
    48. Schemalogic
      • Thesaurus, vocabularies, schemas
        • Manage terms & vocabularies
        • Interfaces for both indexers & business people
      • Has interfaced with:
        • Auto-categorization tools (Teragram, Nstein)
        • Search engines: OmniFind, FAST, Verity, Autonomy
        • Sharepoint
    49. Schemalogic
      • Multi-lingual capabilities
      • $50,000 - $750,000 for software + services
        • pricing based on the number of seats & servers
      • Customers: Commercial publishers, corporations
    50. Schemalogic: Indexer’s view
    51. Schemalogic: Business person’s view
    52. Factiva Synaptica
      • Thesaurus, vocabularies, “warehouse”
        • Manage terms & vocabularies
        • Index Management Service (classification)
      • Multi-lingual capabilities
      • $100,000 +
      • Customers: Commercial publishers, corporations
    53. Factiva Synaptica
    54. Wordmap
      • Thesaurus manager, tagger, auto-classifier, topic browse (navigator)
        • Manage terms & vocabularies
        • Assign terms
        • Cluster documents into categories
        • Yellow-pages style directory with x-refs
      • Multi-lingual capability
    55. Wordmap: Indexer’s view
    56. Wordmap: User’s view
    57. 7. Selling metadata repositories
      • Time saved by users in finding information
      • Staff time saved by self service
      • Staff time saved in preparing & publishing content
      • Increased revenues from information products & services
    58. Metadata repository lab
      • Enter your own data
        • Thesaurus
          • Map terms in two different thesauri
        • Digital assets (documents, images, etc)
        • Names: people, products, organizations
        • Relationships
          • Authored by
          • Subject of
          • Made by
          • Acquired by
          • Used by
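The relationships listed for the lab ("Authored by", "Subject of", and so on) can be thought of as labeled links between named items. The sketch below stores them as simple subject/relation/object entries; the class `RelationshipStore` and the sample names are hypothetical and only show the shape of the data, not how the lab software stores it.

```python
from collections import defaultdict


class RelationshipStore:
    """Minimal store of labeled relationships between named items."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def add(self, subject, relation, obj):
        """Record that `subject` stands in `relation` to `obj`."""
        self._by_subject[subject].append((relation, obj))

    def related(self, subject, relation):
        """List every item linked to `subject` by the given relation."""
        return [o for r, o in self._by_subject.get(subject, []) if r == relation]


store = RelationshipStore()
store.add("Annual report", "Authored by", "Example Author")   # hypothetical data
store.add("Annual report", "Subject of", "Example Company")   # hypothetical data
store.add("Example Product", "Made by", "Example Company")    # hypothetical data
print(store.related("Annual report", "Authored by"))
```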
    59. Metadata repository Lab
      • Manual data entry or import
      • Navigation tools
        • Fielded (faceted) search
        • Bibliography
        • Glossary
        • Back-of-the-book style A – Z index
        • Subject hierarchy (table of contents)
      • Custom lists & export formats (XML)
    60. A – Z index
    61. More info
      • Montague Institute Review: http://www.montague.com/review/review.html
    62.  

    jgraef, 3 years ago

    Pre-conference workshop at the 2007 Enterprise Sear more

    © All Rights Reserved
