Semantic Search with Topic Maps
 


A description of a possible approach to a true semantic search based on Ontopia and Topic Maps, presented at TMRA 2009.


Usage Rights

© All Rights Reserved

Presentation Transcript

  • Towards Semantic Search with Topic Maps. Lars Marius Garshol <larsga@bouvet.no>. TMRA 2009, November 12, Leipzig
  • What this talk is about
    • Basically, moving from full-text search to a more semantic form of search
      • if the user types “hotels Leipzig” can we do something more than look for documents containing these two words?
      • for example, can we turn this into “find hotels located in Leipzig”?
    • It describes some personal experiments with new approaches
      • what is described here needs more work
  • Two kinds of search
    • Web-wide search and site-wide search
      • these two are not the same kind of search
      • the former means searching everything
      • the latter means searching in a limited domain
    • This proposal only deals with site-wide search
      • to make it work for web-wide search is hard
      • so we don’t do that
  • Two (other) kinds of search
    • Natural language search
      • where users put questions to the machine, typically using something approaching complete sentences
      • users are assumed to be at least somewhat familiar with the domain
    • Web-site search
      • users behave unpredictably
      • users do not necessarily know the domain
      • users are unaware of what search technology is used
      • users cannot be trained
  • Algorithm
    • (1) Parse query into a list of tokens
      • categorize tokens as “instance”, “topic type”, “unknown”, ...
    • (2) Build an interpretation from the token list
      • the interpretation is a tolog query
      • if none found, fall back to full-text
    • (3) Verify interpretation against schema
      • if one is present, that is
    • (4) Run chosen interpretation, present results
      • also present interpretation, so the user knows what is happening
      • allow the user to override and fall back to normal full-text search
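
Steps (2) and (4) above can be sketched roughly as follows. This is a minimal illustration, not the script from the talk: it assumes tokens arrive as `(kind, topic_id)` pairs, that topic ids can be written directly in a tolog query, and that `related-to` is a made-up placeholder predicate standing in for whatever association pattern a real implementation would generate. The schema check of step (3) is left out here, and `run_tolog` and `run_fulltext` are hypothetical callables supplied by the application.

```python
def build_interpretation(tokens):
    """Turn (kind, topic_id) pairs into a tolog query string, or None.

    "T" tokens become instance-of clauses; "I" tokens become clauses on the
    placeholder predicate related-to (an assumption, not a tolog built-in).
    """
    types = [ref for kind, ref in tokens if kind == "T"]
    instances = [ref for kind, ref in tokens if kind == "I"]
    if not types and not instances:
        return None  # nothing recognized: no interpretation
    clauses = ["instance-of($RESULT, %s)" % t for t in types]
    clauses += ["related-to($RESULT, %s)" % i for i in instances]
    return "select $RESULT from " + ", ".join(clauses) + "?"


def search(raw_query, tokens, run_tolog, run_fulltext):
    """Run the interpretation if there is one, else fall back to full-text.

    The interpretation is returned with the results so that it can be shown
    to the user.
    """
    query = build_interpretation(tokens)
    if query is None:
        return {"interpretation": None, "results": run_fulltext(raw_query)}
    return {"interpretation": query, "results": run_tolog(query)}
```

Returning the query string alongside the results is what makes it possible to tell the user what the system thinks was asked, and to offer plain full-text search as an override.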
  • Tokens
    • The types of tokens are:
      • T topic type (e.g., “person”)
      • I instance topic (e.g., “Lars Marius Garshol”)
      • A association type (e.g., “employed by”)
      • ? unrecognized word (e.g., “TMRA”)
    • For example, the search “hotels Leipzig” would typically be parsed into the following list of tokens
      • T hotel, topic type
      • I Leipzig, instance of city
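
One way to produce such a token list is a greedy longest-match lookup against the names in the topic map. The sketch below is only an assumption about how this could be done: `NAME_INDEX`, the topic ids, and the `Token` fields are all hypothetical, and the plural “hotels” is listed by hand because no stemming is applied.

```python
from typing import NamedTuple


class Token(NamedTuple):
    kind: str  # "T", "I", "A" or "?"
    text: str  # the words as typed, lowercased
    ref: str   # hypothetical topic id, "" when unrecognized


# Hypothetical name index: lowercased name -> (token kind, topic id).
NAME_INDEX = {
    "hotel": ("T", "hotel"),
    "hotels": ("T", "hotel"),
    "person": ("T", "person"),
    "leipzig": ("I", "leipzig"),
    "lars marius garshol": ("I", "lmg"),
    "employed by": ("A", "employed-by"),
}


def tokenize(query, index=NAME_INDEX):
    """Split a query into tokens, preferring the longest matching name."""
    words = query.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest run of words first so multi-word names win.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in index:
                kind, ref = index[phrase]
                tokens.append(Token(kind, phrase, ref))
                i = j
                break
        else:
            # No name starts at this word: mark it as unrecognized.
            tokens.append(Token("?", words[i], ""))
            i += 1
    return tokens
```

With this index, `tokenize("hotels Leipzig")` yields a “T” token for hotel followed by an “I” token for Leipzig, and a word such as “TMRA” comes out as “?”.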
  • Example: a photo topic map
    • I use a topic map to organize my digital photos
      • it now holds ~13,000 photos
      • online at http://www.garshol.priv.no/tmphoto/
    • A web application is used for search and navigation
      • I’ve added the semantic search to this application for demonstration purposes
    • [Diagram: topic types in the photo topic map: Photo, Person, Event, Category, Location]
  • Hierarchies
    • In many cases, the generic “I” interpretation is too simplistic
      • none of the Sam Oh photos are marked as being taken in Canada; they are all marked as being taken in places that are contained in Canada
      • this is a very common case
    • Solved by using ontology annotation
      • Kal Ahmed has published a set of PSIs for indicating hierarchical association types
      • these are used by the Ontopia tools, at least
      • these can be used to pick up hierarchical association types and extend the interpretation of “I” terms to handle them
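
The effect of extending an “I” term along a hierarchical association can be illustrated with a small transitive-closure sketch. The data here is invented for the example (only “Hotel Europa is in Montreal” comes from the slides), and the plain dictionaries stand in for what would really be associations in the topic map.

```python
# Hypothetical "contained in" data: child place -> parent place.
CONTAINED_IN = {
    "hotel-europa": "montreal",
    "montreal": "quebec",
    "quebec": "canada",
    "oslo": "norway",
}

# Hypothetical photos: photo id -> place where it was taken.
PHOTOS = {"photo-1": "hotel-europa", "photo-2": "montreal", "photo-3": "oslo"}


def descendants(place, contained_in):
    """Return the place plus everything directly or indirectly contained in it."""
    children = {}
    for child, parent in contained_in.items():
        children.setdefault(parent, []).append(child)
    found, stack = {place}, [place]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def photos_taken_in(place, photos=PHOTOS, contained_in=CONTAINED_IN):
    """Photos taken in the place itself or anywhere inside it."""
    places = descendants(place, contained_in)
    return sorted(p for p, loc in photos.items() if loc in places)
```

A search for “canada” then finds the photos marked with Hotel Europa and Montreal even though neither photo is marked with Canada itself.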
  • Hotel Europa is in Montreal. Ste Brigitte des Saults is on the road between Montreal and Quebec City.
  • Verifying the interpretation
    • Not all interpretations can actually produce results
      • for example, “puccini tenor” does not work, because no topics are related to both
    • We can actually work this out, based on the schema, because
      • there is no topic type to which both composers and voice types can be related
      • studying the schema will tell us this
    • Studying the schema also helps us explain the interpretation to the user
    • [Diagram labels: Sam Oh, Montréal; person, photo, location]
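
The schema check can be sketched as a set intersection: for each token type, collect the types it may be associated with, and keep only the types common to all of them. The schema below is a hypothetical toy (in particular the opera part is invented so that composers and voice types share no related type, as the slide describes), and a real implementation would read these pairs from the topic map's schema.

```python
# Hypothetical schema: pairs of topic types that may be associated.
SCHEMA = {
    ("photo", "person"),
    ("photo", "location"),
    ("opera", "composer"),
    ("role", "voice-type"),
    ("role", "opera"),
}


def related_types(topic_type, schema):
    """Types that the schema allows to be associated with topic_type."""
    out = set()
    for a, b in schema:
        if a == topic_type:
            out.add(b)
        if b == topic_type:
            out.add(a)
    return out


def possible_result_types(token_types, schema=SCHEMA):
    """Types that can be related to every token type in the query.

    An empty set means the interpretation cannot produce results.
    """
    candidates = None
    for t in token_types:
        rel = related_types(t, schema)
        candidates = rel if candidates is None else candidates & rel
    return candidates or set()
```

For a person plus a location the only surviving type is photo, which is also what lets the system explain the interpretation as “photos of this person taken in this place”. For a composer plus a voice type the set is empty, so the interpretation is dropped.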
  • How to use this with your topic map
    • Install the component, then search
    • No configuration is necessary!
    • However, for better results you may want to
      • add more names for some topics
      • mark hierarchical association types as such (should be done already)
      • mark topic types with large instance sets as such
  • Current implementation
    • Just a Jython script using Ontopia
      • 541 lines
      • builds a set of token objects, then a set of constraint objects
      • then introspects the schema to remove hopeless constraints
    • Stemming is still missing!
      • need to modify Ontopia full-text search to do this
    • Run from a JSP file by means of the Jython API
      • just 10-15 lines of glue code
    • Longer-term this may turn into a proper Ontopia component
      • time horizon not at all clear
  • Weaknesses
    • No relevance ranking
      • given “beer Oslo”, all found photos are equally closely tied to “beer” and “Oslo”
      • there is nothing to rank their relevance by
      • on the other hand, all hits are definitely relevant to the query as given
    • Homonym support too simplistic
      • it’s not clear that it will actually handle all cases in practice
      • a better approach would be to construct multiple interpretations and then choose between them
      • ideally the user should be allowed to override the choice
    • Very closely tied to topic map structure
      • if the user uses the wrong terms, the approach does not work
      • only allows structured searches along the dimensions actually in the topic map
      • how much of an issue this is will likely depend on the application
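
The suggestion above of constructing multiple interpretations for homonyms and then choosing between them could look something like the sketch below. It is purely illustrative: the candidate table is invented, and `is_possible` is a hypothetical caller-supplied check (for instance a schema check) rather than anything the current script provides.

```python
from itertools import product

# Hypothetical homonym table: typed word -> candidate (topic id, topic type) readings.
CANDIDATES = {
    "washington": [("washington-dc", "location"), ("george-washington", "person")],
    "beer": [("beer", "category")],
}


def interpretations(words, candidates=CANDIDATES):
    """Return every combination of readings, one reading per recognized word."""
    options = [candidates[w] for w in words if w in candidates]
    if not options:
        return []
    return [list(combo) for combo in product(*options)]


def viable(interps, is_possible):
    """Keep the interpretations that pass the check, in their original order."""
    return [i for i in interps if is_possible(i)]
```

Keeping all viable interpretations in a list, rather than only the first, leaves room for presenting the alternatives so that the user can override the system's choice.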
  • Do users actually query this way?
    • Literature studies and log mining indicate that:
      • nearly all queries are just 1 or 2 words
      • 2-word queries tend to be either
        • the name of an entity (New York), or
        • qualified searches (Montréal city)
    • Conclusion
      • this feature has to be used with caution
      • it may work best when users can be told about it
      • site feedback may encourage users to use it more
    • More work is needed on this
  • Taking this further
    • Limitations
      • so far all queries use a single variable
      • no understanding of association types
      • no understanding of occurrence types
      • no notion of ordering (first, last, biggest, smallest, ...)
    • This can be implemented
      • an earlier prototype could interpret queries such as “operas based on works written by Shakespeare”
      • other elements also implementable
    • However, this takes the system further away from normal user searches
      • more thinking needed on how to handle this
      • make it a semi-formal language?
      • turn it into a full natural language search component?
  • Conclusion
    • The system really does have a kind of semantic understanding
      • you type “beer Oslo”, and it says “I think you want photos of beer taken in Oslo”
    • Easy to implement
      • no configuration necessary
      • component can be plugged into any web application based on Ontopia
      • (also easy to implement on top of other Topic Maps engines)
    • Does not match current user behaviour
      • more work necessary on this
    • Not as advanced as it could be
      • single-variable queries only
      • no understanding of association types
      • more work to be done on this, too