Taxonomies in Search
Presented by Marjorie Hlava, president of Access Innovations, Inc., on August 10, 2011. Part two of the Special Libraries Association's Leveraging Your Taxonomy series.

    Presentation Transcript

    • Taxonomies in Search: An SLA Webinar
      Aug 10, 1:00pm-2:00pm EST
      Marjorie Hlava, President
      mhlava@accessinn.com
      Access Innovations, Inc.
      www.accessinn.com
      Leveraging your content semantically
    • Agenda
      How search works
      Measuring accuracy in search
      Precision
      Recall
      Relevance
      Search theoretical basis
      Bayes, Boole and the rest of the guys
      The taxonomy effect
    • How does search work?
      Many parts
      Search software – of course
      Computer network
      Parsing of text
      Well formed or structured text
      CLEAN DATA
      Computer software – network
      Computer hardware
      Telecommunications connection
      Training sets for statistical systems
    • Technical parts of search
      Search technology
      Ranking algorithms
      Query language
      Federators
      Cache
      Inverted index
      Other enhancements
      Presentation Layer
    • My Main Frustration
      Select hardware
      Select software
      Design system
      Try to load the data
      Add the taxonomy
      That’s BACKWARDS
    • Data First!
      What are you building the system for?
      Assess the data
      Do the design
      Decide what else needs to be added
      Taxonomy terms
      Other controls
      Find a system that will work with your data
    • Access Innovations: Complex Farm with Perfect Search (diagram)
      Components: Source Data; Cache Builders; Repository XIS (cache); Index Builders; Cleanup, etc.; Deploy Hub; Query Servers; Query Federators; Search Harmony Presentation Layer
    • FAST Search example: Core Architectural Components (diagram)
      Connectors: Custom Connector, Email Connector, Database Connector, File Traverser, Web Crawler
      APIs: Management API, Query API, Content API, Data Harmony Governance API
      Servers: Search Server, Filter Server
      Other labels: Administrator’s Dashboard; Web Content; Vertical Applications; Pipeline; Query Pipeline; Files, Documents; Query Processor; Portals; Index DB; Databases; Document Processor; Results; Custom Front-Ends; Alerts; Email, Groupware; Search Harmony; Mobile Devices; Custom Applications; Content Push; MAIstro; Agent DB
    • Measuring accuracy in search
      Relevance
      Recall
      Precision
      Accuracy: hits, misses, noise
      Ranking
      Linguistics
      Query Processing
      Results Processing
      Display
      Search refinement
      Usability
      Business Rules
    • Relevance
      How well a set of returned documents answers the information need
      “Accuracy”
      Related to objective of search
      Different user communities
      Information resources
      Tension of user needs and context available
      A confidence “guesstimate”
    • The formulas
      $$\text{Recall} = \frac{\text{number of relevant items retrieved}}{\text{number of relevant items in the collection}}$$
      $$\text{Precision} = \frac{\text{number of relevant items retrieved}}{\text{number of items retrieved}}$$
      Relevance: Germane (Precision), Pertinent (Recall)
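The two formulas can be checked on a small result list. This Python sketch uses hypothetical document IDs and relevance judgments. It also splits the results into the "hits, miss, noise" from the accuracy slide, read here as follows: hits are relevant items retrieved, misses are relevant items not retrieved, and noise is retrieved items that are not relevant.

```python
def evaluate(retrieved, relevant):
    """Hits, misses and noise for one query, plus precision and recall."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant      # relevant items that were retrieved
    misses = relevant - retrieved    # relevant items the search failed to return
    noise = retrieved - relevant     # retrieved items that are not relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return {"hits": hits, "misses": misses, "noise": noise,
            "precision": precision, "recall": recall}

# Hypothetical query: 8 documents returned, 5 documents judged relevant.
result = evaluate(retrieved=[1, 2, 3, 4, 5, 6, 7, 8], relevant=[2, 4, 6, 9, 10])
print(result["precision"], result["recall"])  # 0.375 0.6
```

Recall needs the number of relevant items in the whole collection. That number comes from human relevance judgments, which is the subjectivity the next slide points out.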
    • Measuring Relevance
      Concepts
      Context
      Age of documents
      Completeness (recall)
      Quality
      Statistically determined?
      Nope, it is subjective
      Someone has to determine the rightness of the item
      A confidence factor = canard!
    • Kinds of search
      Bayesian –
      FAST
      Lucene
      Autonomy / Verity
      Boolean
      Dialog
      Endeca
      Perfect Search
      Ranking algorithms
      Google
    • Search Theoretical Basis: Those Famous Guys
      Boole
      Bayes
      Bayesian Techniques
      Turney
      Turney algorithm
      Enriched structured data
      Marco Dorigo
      Ant Colony
      This is only a sample
      of a large body of research
    • George Boole and Boolean algebra
      George Boole
      Mathematician
      1815-1864
      Boolean algebra
      An algebraic system of logic
      AND, OR, NOT, ANDNOT
      Dialog, BRS, Stairs
    • Boolean representation
      Venn diagram showing the intersection of sets A AND B (in violet), the union of sets A OR B (all the colored regions), and set A XOR B (all the colored regions except the violet).
      The "universe" is represented by the rectangular frame.
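The Boolean operators and the Venn diagram can be read as set operations over the documents that contain each term. This is a minimal Python sketch with hypothetical posting sets. `ALL_DOCS` plays the role of the rectangular "universe", and NOT is taken relative to it.

```python
# Hypothetical postings: the IDs of the documents that contain each term.
postings = {
    "taxonomy": {1, 2, 4},
    "search": {2, 3, 4},
}
ALL_DOCS = {1, 2, 3, 4, 5}  # the "universe" (rectangular frame in the Venn diagram)

A, B = postings["taxonomy"], postings["search"]
print("A AND B   :", sorted(A & B))         # intersection: [2, 4]
print("A OR B    :", sorted(A | B))         # union: [1, 2, 3, 4]
print("A XOR B   :", sorted(A ^ B))         # symmetric difference: [1, 3]
print("A ANDNOT B:", sorted(A - B))         # difference: [1]
print("NOT A     :", sorted(ALL_DOCS - A))  # complement within the universe: [3, 5]
```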
    • Bayes and Bayes’ Theorem
      Thomas Bayes
      Mathematician
      1702 - 1761
      Bayes' theorem
      Uses probability inductively
      Established a mathematical basis for probability inference
      WHAT?
      A means of calculating,
      from the number of times an event has occurred and failed to occur,
      the probability that it will occur in future trials
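Bayes' theorem in its usual modern form is P(H | E) = P(E | H) P(H) / P(E). The Python sketch below computes it, and the relevance numbers are hypothetical. The second function is Laplace's rule of succession. This is a later result, not Bayes' own wording, but it follows from Bayesian reasoning with a uniform prior. It answers the slide's question directly: how likely is the event on the next trial, given past occurrences and failures?

```python
from fractions import Fraction

def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E),
    with P(E) expanded by the law of total probability."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / evidence

def rule_of_succession(successes, trials):
    """Laplace's rule of succession: probability of the event on the next trial,
    given `successes` occurrences in `trials` trials and a uniform prior."""
    return Fraction(successes + 1, trials + 2)

# Hypothetical: 10% of documents are relevant; a query term appears in 60% of
# relevant documents and in 10% of non-relevant ones.
print(round(posterior(0.10, 0.60, 0.10), 3))  # 0.4
# An event seen 0 times in 10 trials still gets a non-zero chance next time.
print(rule_of_succession(0, 10))              # 1/12
```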
    • Bayesian methods - Cautions
      A user might wish to change the distribution of probabilities.
      A user will make a novel request for information in a previously unanticipated way.
      The computational difficulty of exploring a previously unknown network.
      The quality and extent of the prior beliefs used in Bayesian inference processing.
    • Bayesian cautions (cont.)
      A Bayesian network is only as useful as the prior knowledge is reliable.
      An optimistic or pessimistic expectation of the quality of these prior beliefs will distort the entire network and invalidate the results.
      Must carefully select the statistical distribution used in modeling the data.
      Must have the proper distribution model to describe the data.
      That is, you have to constantly train and retrain the system on the data.
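The warning about optimistic or pessimistic priors can be made concrete with a textbook Beta-binomial model. This model is swapped in for illustration and is not taken from the slides. The sketch estimates an unknown rate (say, how often a term marks a relevant document) under three hypothetical priors. With little data, the prior dominates the estimate. With much more data, the estimates converge, which is why retraining on fresh data matters.

```python
def beta_posterior_mean(a, b, successes, trials):
    """Posterior mean of an unknown rate p under a Beta(a, b) prior
    after observing `successes` in `trials` binomial trials."""
    return (a + successes) / (a + b + trials)

# Hypothetical prior beliefs about the rate, expressed as Beta(a, b) parameters.
priors = {"optimistic": (8, 2), "neutral": (1, 1), "pessimistic": (2, 8)}

# Same observed rate (30%), small sample versus large sample.
for successes, trials in [(3, 10), (300, 1000)]:
    estimates = {name: round(beta_posterior_mean(a, b, successes, trials), 3)
                 for name, (a, b) in priors.items()}
    print(trials, estimates)
```

With 10 observations, the three estimates range from 0.25 to 0.55. With 1,000 observations, they all land within about 0.01 of 0.30.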
    • Peter Turney and the Turney Algorithm
      Peter D. Turney, Canada, present
      Learning algorithms for keyphrase extraction
      Tree Induction Algorithm
      Lexical Semantics
      GenEx – with human input
      80% acceptable
      Extraction vs. generation and sentiment of words
      $$\mathrm{SO}(\text{word}) = \log_2 \frac{\mathrm{hits}(\text{word AND "excellent"}) \cdot \mathrm{hits}(\text{"poor"})}{\mathrm{hits}(\text{word AND "poor"}) \cdot \mathrm{hits}(\text{"excellent"})}$$
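Turney's semantic orientation (SO) needs only four hit counts from a search engine. This Python sketch computes the log ratio on the slide from hypothetical counts. It adds a small constant (0.01, an assumption here) to the two co-occurrence counts so that a zero count does not cause a division by zero or log of zero. A positive score means the word leans toward "excellent"; a negative score means it leans toward "poor".

```python
import math

def semantic_orientation(hits_word_excellent, hits_word_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """log2 of [hits(word AND "excellent") * hits("poor")] /
               [hits(word AND "poor") * hits("excellent")]."""
    numerator = (hits_word_excellent + smoothing) * hits_poor
    denominator = (hits_word_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)

# Hypothetical hit counts returned by some search engine.
print(round(semantic_orientation(120, 15, 50_000, 40_000), 2))
```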
    • Marco Dorigo and Ant Colony Optimization
      Marco Dorigo
      Research director for the Belgian Fonds de la Recherche Scientifique
      Research director of the IRIDIA lab at the Université Libre de Bruxelles
      Ant Colony Optimization
      metaheuristic for combinatorial optimization problems
      Swarm intelligence
      Value importance vs. heuristic importance
      Useful in search prediction
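Ant colony optimization is easiest to see on the classic travelling-salesman problem rather than on search itself. This Python sketch implements Dorigo's basic Ant System on a hypothetical four-city instance. `alpha` weights the pheromone trail laid down by earlier ants, and `beta` weights the heuristic desirability (1/distance). That trade-off appears to be what the slide calls "value importance vs. heuristic importance". All parameter values are illustrative assumptions.

```python
import math
import random

def tour_length(dist, tour):
    """Total length of a closed tour that returns to its starting city."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

def aco_tsp(dist, n_ants=10, n_iters=50, alpha=1.0, beta=2.0, rho=0.5, q=1.0, seed=0):
    """Basic Ant System: alpha weights pheromone, beta weights 1/distance,
    rho is the evaporation rate, q scales the pheromone each ant deposits."""
    rng = random.Random(seed)
    n = len(dist)
    pheromone = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, math.inf
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                candidates = sorted(unvisited)
                # Probability of each move is proportional to pheromone^alpha * (1/d)^beta.
                weights = [pheromone[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in candidates]
                nxt = rng.choices(candidates, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(dist, tour)
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # Evaporate all trails, then each ant deposits q / (its tour length) on its edges.
        pheromone = [[p * (1.0 - rho) for p in row] for row in pheromone]
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                pheromone[a][b] += q / length
                pheromone[b][a] += q / length
    return best_tour, best_len

# Hypothetical instance: four cities on the corners of a unit square.
points = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(p, r) for r in points] for p in points]
tour, length = aco_tsp(dist)
print(tour, round(length, 3))
```

On this tiny instance, the best tour found is the square's perimeter (length 4). Real applications tune the number of ants, `rho`, `alpha`, and `beta` for each problem.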
    • Natural Language Processing
      Syntactic
      Semantic
      Morphological
      Phraseological
      Lemmatization and stemming
      Statistical
      Grammatical
      Common Sense
    • Basic areas of Automatic Language Processing (ALP)
      Auto Translation
      Auto Indexing
      Auto Abstracting
      Artificial Intelligence
      Searching
      Spell Checking
      Semantic Web
      Natural Language Processing (NLP)
      Computational Linguistics
    • Statistical Search
      Cluster analysis
      Neural networks
      Co-occurrence
      Bayesian inference
      Latent Semantic
      Etc.
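Co-occurrence is the simplest of these statistical signals: terms that keep appearing in the same documents are probably related. This Python sketch counts, for each pair of index terms, how many hypothetical documents contain both. A real system would normalize these counts (for example against each term's own frequency) before treating a pair as related.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count how many documents each unordered pair of terms appears in together."""
    counts = Counter()
    for terms in docs:
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts

# Hypothetical documents, already reduced to their index terms.
docs = [
    ["taxonomy", "search", "thesaurus"],
    ["taxonomy", "search"],
    ["search", "bayes"],
]
for pair, n in cooccurrence(docs).most_common(2):
    print(pair, n)
```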
    • Inverted Files and Boolean are basic to all search
      Searchable Index
      Inverted File Index
      Taxonomy
      Thesaurus
      Hierarchical Display
    • Sample Slide for Inverted File Index Demonstration
      Outline of Presentation
      • Define key terminology
      • Thesaurus tools
      Features
      Functions
      • Costs
      Thesaurus construction
      Thesaurus tools
      • Why & when?
    • Simple Inverted File Index
      &, 1, 2, 3, 4, construction, costs, define, features, functions, key, of, outline, presentation, terminology, thesaurus, tools, when, why
    • Complex Inverted File Index
      Example 1 (L = line, P = word position in the line; T = title, H = heading, SH = subheading; Stop = stop word, not indexed)
      key - L2, P2, H
      of - Stop
      outline - L1, P1, T
      presentation - L1, P3, T
      terminology - L2, P3, H
      thesaurus - (1) L3, P1, H; (2) L7, P1, SH; (3) L8, P1, SH
      tools - (1) L3, P2, H; (2) L8, P2, SH
      when - L9, P3, H
      why - L9, P1, H
      & - Stop
      1 - Stop
      2 - Stop
      3 - Stop
      4 - Stop
      construction - L7, P2, SH
      costs - L6, P1, H
      define - L2, P1, H
      features - L4, P1, SH
      functions - L5, P1, SH
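The complex inverted file can be rebuilt mechanically from the sample slide. This Python sketch records (line, position) postings and skips the stop words listed in the example, while still counting them toward word positions. It then answers a Boolean AND by intersecting line numbers. It leaves out the T/H/SH heading-level codes, because those would need markup that plain text does not carry.

```python
import re
from collections import defaultdict

# Stop list taken from the example: "of", "&" and the digits 1-4.
STOP_WORDS = {"of", "&", "1", "2", "3", "4"}

SAMPLE_SLIDE = [
    "Outline of Presentation",
    "Define key terminology",
    "Thesaurus tools",
    "Features",
    "Functions",
    "Costs",
    "Thesaurus construction",
    "Thesaurus tools",
    "Why & when?",
]

def build_index(lines):
    """Map each non-stop word to its (line, position) postings."""
    index = defaultdict(list)
    for line_no, text in enumerate(lines, start=1):
        for pos, token in enumerate(re.findall(r"[\w&]+", text.lower()), start=1):
            if token not in STOP_WORDS:
                index[token].append((line_no, pos))
    return dict(index)

def lines_with_all(index, *terms):
    """Boolean AND over the postings: line numbers containing every term."""
    sets = [{line for line, _ in index.get(t.lower(), [])} for t in terms]
    return set.intersection(*sets) if sets else set()

index = build_index(SAMPLE_SLIDE)
print(index["thesaurus"])                                   # [(3, 1), (7, 1), (8, 1)]
print(sorted(lines_with_all(index, "thesaurus", "tools")))  # [3, 8]
```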
    • Word and Term Parsing
      Stemming
      -ing, -ed, -es, -’s, -s’, etc.
      Depluralization
      Truncation
      Left and right
      Wild cards
      Organi*ation
      Variant Spellings
      Centre, center
      Hyphens
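Several of these parsing steps fit in a few lines of Python. This sketch is illustrative only. The depluralization rules are deliberately crude, not a real stemmer such as Porter's. The variant-spelling table is hypothetical (in practice it would come from the thesaurus). Removing hyphens is one of several possible hyphen policies. Truncation and internal wildcards such as "Organi*ation" use the standard library's `fnmatch`.

```python
import fnmatch

# Hypothetical variant-spelling table; a real one would come from the thesaurus.
VARIANTS = {"centre": "center", "organisation": "organization"}

def depluralize(word):
    """Very crude English depluralization (not a full stemmer)."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"          # taxonomies -> taxonomy
    if w.endswith("sses"):
        return w[:-2]                # classes -> class
    if w.endswith(("ss", "us", "is")):
        return w                     # thesaurus, analysis are left alone
    if w.endswith("s") and len(w) > 3:
        return w[:-1]                # terms -> term
    return w

def normalize(term):
    """Drop hyphens, depluralize, then map variant spellings to one form."""
    w = depluralize(term.lower().replace("-", ""))
    return VARIANTS.get(w, w)

def wildcard(vocabulary, pattern):
    """Left/right truncation and internal wildcards, e.g. 'organi*ation'."""
    return sorted(t for t in vocabulary if fnmatch.fnmatchcase(t.lower(), pattern.lower()))

vocab = ["organ", "organisation", "organization", "orientation"]
print(wildcard(vocab, "Organi*ation"))            # ['organisation', 'organization']
print(normalize("Centres"), normalize("e-mail"))  # center email
```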
    • The taxonomy effect
      Where do the terms go?
      How are they used in search?
      What other ways can I use the taxonomy in search?
    • Site search
      Search of 53 crawled sites including journals, books, web site, conference sites, etc.
      Navigation
      Bookstore search
      Search database for Journals and pubs
      For search all publications
    • Navigate the full taxonomy “tree”
      BROWSE
      Auto-completion using the taxonomy
      Guide the user
      Taxonomy Driven Search Presentation
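Taxonomy-driven auto-completion can be as simple as a prefix lookup over the sorted term list. That steers the user toward terms that actually exist in the taxonomy. This is a Python sketch with a hypothetical term list. A production system would probably also show each suggestion's place in the tree (its broader terms).

```python
import bisect

def build_completer(terms):
    """Return a function that suggests taxonomy terms starting with a prefix."""
    entries = sorted((t.lower(), t) for t in terms)  # case-insensitive ordering
    keys = [k for k, _ in entries]

    def complete(prefix, limit=5):
        p = prefix.lower()
        i = bisect.bisect_left(keys, p)  # first key >= prefix
        out = []
        while i < len(keys) and keys[i].startswith(p) and len(out) < limit:
            out.append(entries[i][1])
            i += 1
        return out

    return complete

complete = build_completer(["Taxonomies", "Taxonomy construction", "Thesauri",
                            "Search", "Search refinement"])
print(complete("tax"))  # ['Taxonomies', 'Taxonomy construction']
```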
    • A quick look behind the scenes (diagram)
      Database Management System
      • Search thesaurus
      • Validate term entry
      • Block invalid terms
      • Record candidates
      • Establish rules for term use
      • Suggest indexing terms
      Thesaurus tool
      Indexing tool
      • Validate terms
      • Add terms and rules
      • Change terms and rules
      • Delete terms and rules
    • Thesaurus
      Term Record view
      Taxonomy view
    • Where does the subject metadata go?
      Apply to content itself
      Use meta name field in HTML header
      Connect search to the keywords in the SQL or other database tables
    • HTML Header
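A crawler or search loader can read subject metadata straight out of the HTML header. The screenshot for this slide did not survive, so the convention used here is an assumption: `<meta name="keywords" content="...">` with terms separated by semicolons. The sketch uses Python's standard `html.parser`.

```python
from html.parser import HTMLParser

class SubjectMetaParser(HTMLParser):
    """Collect taxonomy terms from <meta name="keywords" content="..."> tags."""

    def __init__(self, meta_names=("keywords",), separator=";"):
        super().__init__()
        self.meta_names = {name.lower() for name in meta_names}
        self.separator = separator  # assumed term separator
        self.terms = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute values may be None.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content") or ""
        if name in self.meta_names:
            self.terms.extend(t.strip() for t in content.split(self.separator) if t.strip())

page = """<html><head>
<meta name="keywords" content="Taxonomies; Search; Thesauri">
</head><body><p>Taxonomies in Search</p></body></html>"""
parser = SubjectMetaParser()
parser.feed(page)
print(parser.terms)  # ['Taxonomies', 'Search', 'Thesauri']
```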
    • RDBMS Connection
      Taxonomy term table
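Connecting search to a taxonomy term table usually means a many-to-many link between documents and terms. The schema, table names, and rows below are hypothetical; the slide's own table layout is not shown. This Python sketch uses the standard `sqlite3` module with an in-memory database.

```python
import sqlite3

def demo_db():
    """Build a small in-memory database with a hypothetical document/term schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE term (id INTEGER PRIMARY KEY, label TEXT UNIQUE NOT NULL);
        CREATE TABLE document_term (
            document_id INTEGER REFERENCES document(id),
            term_id INTEGER REFERENCES term(id),
            PRIMARY KEY (document_id, term_id)
        );
    """)
    conn.executemany("INSERT INTO document (id, title) VALUES (?, ?)",
                     [(1, "Doc A"), (2, "Doc B")])
    conn.executemany("INSERT INTO term (id, label) VALUES (?, ?)",
                     [(1, "Taxonomies"), (2, "Search"), (3, "Author disambiguation")])
    conn.executemany("INSERT INTO document_term VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 1), (2, 3)])
    return conn

def documents_for_term(conn, label):
    """Titles of all documents indexed with the given taxonomy term."""
    rows = conn.execute("""
        SELECT d.title FROM document d
        JOIN document_term dt ON dt.document_id = d.id
        JOIN term t ON t.id = dt.term_id
        WHERE t.label = ?
        ORDER BY d.id
    """, (label,)).fetchall()
    return [title for (title,) in rows]

conn = demo_db()
print(documents_for_term(conn, "Taxonomies"))  # ['Doc A', 'Doc B']
```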
    • Suggested taxonomy descriptors
    • Integrate taxonomy to enhance findability
      Browsable categories of a directory
      Browsable faceted navigation
      Smart search for term equivalents
      Taxonomy terms (original or modified) as labels
      Navigation aids incorporate taxonomy terms and relationships
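"Smart search for term equivalents" usually means expanding the user's words through the thesaurus before the query runs. This Python sketch uses the ANSI/NISO Z39.19 relationship labels: USE/UF for equivalents and NT for narrower terms. It works on a hypothetical two-term fragment and produces a Boolean OR query. The expansion policy (always add equivalents, optionally add narrower terms) is an assumption.

```python
# Hypothetical thesaurus fragment: preferred term -> UF (non-preferred equivalents)
# and NT (narrower terms).
THESAURUS = {
    "Taxonomies": {"UF": ["Taxonomy", "Classification schemes"], "NT": ["Faceted taxonomies"]},
    "Faceted taxonomies": {"UF": ["Facet schemes"], "NT": []},
}
# Entry vocabulary: each non-preferred term USE its preferred term.
USE = {uf.lower(): pref for pref, rec in THESAURUS.items() for uf in rec["UF"]}
PREFERRED = {pref.lower(): pref for pref in THESAURUS}

def expand(term, include_narrower=True):
    """Preferred term, its equivalents and (optionally) narrower terms.
    Assumes the NT hierarchy contains no cycles."""
    key = term.lower()
    preferred = USE.get(key) or PREFERRED.get(key)
    if preferred is None:
        return [term]                  # not in the thesaurus: search as typed
    record = THESAURUS[preferred]
    terms = [preferred, *record["UF"]]
    if include_narrower:
        for narrower in record["NT"]:
            terms.extend(expand(narrower))
    return list(dict.fromkeys(terms))  # drop duplicates, keep order

def to_boolean_or(terms):
    return " OR ".join(f'"{t}"' for t in terms)

print(to_boolean_or(expand("taxonomy")))
```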
    • More Taxonomy Enrichment
      Spelling alternatives and correction
      Related concepts
      Statistical information about the metadata
      Navigation or drill downs
      Search refinement
      Recursive sets
      Concept linking
      Dictionary lookup (in taxonomy glossary)
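Spelling correction can lean on fuzzy string matching against the taxonomy, so that only real taxonomy terms are offered as "did you mean" suggestions. This sketch uses Python's standard `difflib.get_close_matches`. The term list is hypothetical. The 0.6 similarity cutoff is difflib's default and would need tuning.

```python
import difflib

# Hypothetical taxonomy terms, lowercased for matching.
TAXONOMY_TERMS = ["taxonomy", "thesaurus", "ontology", "search refinement", "concept linking"]

def did_you_mean(query, terms=TAXONOMY_TERMS, n=3, cutoff=0.6):
    """Closest taxonomy terms to a possibly misspelled query, best match first."""
    return difflib.get_close_matches(query.lower(), terms, n=n, cutoff=cutoff)

print(did_you_mean("Taxonmy"))
```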
    • Brand is repeated in several spots and tied to search as well
    • Data Base Plus Search Workflow (diagram)
      Components: raw full-text data feeds; printed source materials; data crawls on 53+ sources; XIS creation; add metadata; MAI Concept Extractor; MAI Rule Base; Taxonomy Thesaurus Master; taxonomy terms; SQL for ecommerce; XIS repository; load to Perfect Search; Search Harmony Display Search
      Save data to search and repositories at the same time
    • Data Base Plus Search Workflow (diagram)
      Components: raw full-text data feeds; printed source materials; data crawls on data sources; XIS creation; add metadata; MAI Concept Extractor; MAI Rule Base; Taxonomy Thesaurus Master; SQL for ecommerce; XIS repository; load to search; Search Harmony Display Search
      Labels: source data; taxonomy terms; search data
      Clean and enhance data
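The idea of tagging each record once and saving it "to search and repositories at the same time" can be sketched as a toy rule-based tagger. This is not Access Innovations' MAI Rule Base, whose rule format is not shown here. The rules, term names, and the dictionary stand-ins for the repository and the search index are all hypothetical.

```python
import re

# Hypothetical rule base: taxonomy term -> patterns that suggest it.
RULES = {
    "Taxonomies": [r"\btaxonom(y|ies)\b", r"\bcontrolled vocabular(y|ies)\b"],
    "Search": [r"\bsearch(es|ing)?\b", r"\bretrieval\b"],
    "Bayesian inference": [r"\bbayes(ian)?\b"],
}
COMPILED = {term: [re.compile(p, re.IGNORECASE) for p in patterns]
            for term, patterns in RULES.items()}

def suggest_terms(text):
    """Taxonomy terms whose rules fire on the text, in rule-base order."""
    return [term for term, patterns in COMPILED.items()
            if any(p.search(text) for p in patterns)]

def ingest(doc_id, text, repository, search_index):
    """Tag a record once, then write it to the repository and the index together."""
    terms = suggest_terms(text)
    record = {"id": doc_id, "text": text, "subjects": terms}
    repository[doc_id] = record                     # stand-in for the repository
    for term in terms:
        search_index.setdefault(term, set()).add(doc_id)  # stand-in for the search index
    return record

repository, search_index = {}, {}
ingest(1, "Taxonomies improve search.", repository, search_index)
print(repository[1]["subjects"], search_index)
```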
    • Client Data (diagram)
      Client data: full text, HTML, PDF, data feeds, etc.
      Components: Thesaurus Master; client taxonomy; Machine Aided Indexer (M.A.I.™); Metadata and Entity Extractor; inline tagging; automatic summarization; repository; search software; taxonomy in SharePoint
      Search; Presentation: 90% accuracy; Browse by Subject; Auto-completion; Broader Terms; Narrower Terms; Related Terms
    • What we covered
      How search works
      Measuring accuracy in search
      Search theoretical basis
      Bayes, Boole and the rest of the guys
      The taxonomy effect
    • Do the data FIRST
      What do you have?
      What does it need?
      How would you LIKE to access it?
      Look at the data BEFORE you create the specifications
      DTD built without data is not going to work
      Then choose the system that will support your data
    • Next Month
      Same time, same station
      Solving the Challenge of Connecting People and Author Networks
      Jay Ven Eman, Ph.D.
      September 14
      As online digital publishing continues to grow, taxonomies can be increasingly useful in connecting people with author networks through directory creation with author disambiguation and subject metadata tagging to increase the usefulness of information for researchers and community-building.
    • About Access Innovations
      Access Innovations is expert in content creation, enrichment and conversion services. We provide services to semantically enrich and tag raw text, turning it into highly structured data. We deliver clean, well-formed, metadata-enriched data so our clients can reuse, repurpose, store, and find their knowledge assets. We go beyond the standards to build taxonomies and other data control structures as a solid foundation for data.
      Our services and software allow organizations to use and present their information to both internal and external constituents by leveraging search, presentation, and e-commerce. We change search to found!
      Quick Facts
      • Founded in 1978
      • Headquartered in Albuquerque
      • Privately held
      • Delivered more than 2000 engagements
    • Thank you for your attention!
      Slides will be available on SLA Taxonomy Division and Access Web sites tomorrow
      Taxonomies in Search: http://www.accessinn.com/library/presentations/sla-taxonomies-in-search-aug10-2011.pptx
      Marjorie M. K. Hlava
      Access Innovations / Data Harmony
      mhlava@accessinn.com
      +505.998.0800