Semantic Tag Recommendation


  1. Exploiting Ontological Relations for Automatic Semantic Tag Recommendation
     Panos Alexopoulos, John Pavlopoulos, Manolis Wallace, Konstantinos Kafentzis
     7th International Conference on Semantic Systems, September 7th, Graz, Austria
     1 l June 5, 2012
  2. Presentation Outline
     • Background, Focus & Approach
     • Tag Recommendation Framework
     • Example & Experimental Evaluation
     • Conclusions & Future Work
  3. Background
     Tagging is a textual annotation technique that assigns to a document terms and phrases that are representative of its semantic content. The term “representative” may be interpreted differently depending on why tagging is employed:
     • Classifying a document by means of concepts that represent meaningful categories for the document's user.
     • Summarizing a document's content by means of keywords that constitute a representative description of what the document “talks specifically about”.
     • Characterizing a document by means of adjectives that denote some kind of judgment (e.g. “positive”, “negative”).
     In all cases, the automatic generation of tags remains an open issue.
  4. Focus
     We are interested in tagging not so much as a classification process but rather as a summarization one. This means, for example, that we do not wish to determine whether a given document is about sports or politics, but which specific sports events or politicians it is about.
     The challenge in this kind of identification is to distinguish between the keywords that play a central role in the document's meaning and those that are merely complementary to it. For example, a news article might refer to many politicians even when its primary subject is only one of them.
  5. Approach & Assumptions
     We follow a “detective-like” approach: we try to find evidence within the document that points towards the correct tag(s).
     We assume that a comprehensive ontology describing the domain of the documents we wish to tag is available. Since we wish to tag the documents with specific entities (e.g. films, directors, actors), we consider as candidate tags the instances of the ontology's concepts.
     We then consider two fundamental premises:
     1. An instance is more likely to represent the text's meaning when the text contains many other instances that are ontologically related to it.
     2. Not all relations in a domain ontology are equally important in the above process.
  6. Approach & Assumptions
     For example, in a historical document, a given event is a likely tag when the text also contains persons, locations and other events related to this event. Similarly, in a document about cinema, a given film is a likely tag when the text also contains persons involved in the film (directors, actors, characters etc.).
     However, the relation linking films and film characters is more important than the relation linking films and directors when it comes to determining whether the text actually refers to a given film. The reason is that a text about a specific film is less likely to mention a character who does not appear in that film than a director who was not involved in it.
  7. Proposed Tagging Framework
     Two components:
     • A tagging context model that models the relative importance of ontological relations to the tag identification process.
     • A tag recommendation process that determines, for a given text, the ontology instances that are potential tags for it and recommends to the user the ones with the highest confidence score.
  8. Tagging Context
     Based on the first premise, we say that an instance A is supported by another, related instance B when the presence of B in a text is an indication that A is a potential tag.
     The Tagging Context defines, for each ontology concept, which relations should be used, and to what extent, for determining the level of support that the concept's instances “enjoy” from other instances in a given text.
     For example, for the concept Film a potential tagging context would be:
     • hasDirector: 0.7
     • hasActor: 0.5
     • hasCharacter: 0.9
     That would mean that films are generally supported first by characters, then by directors and lastly by actors that are related to them.
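As a rough illustration, a tagging context like the one above could be held in a plain mapping from concept to weighted relations. The names `TAGGING_CONTEXT` and `relations_by_importance` are hypothetical and not part of the presented framework; only the concept, relation names and weights come from the slide.

```python
# Hypothetical representation of a tagging context:
# concept -> {relation name: relative importance}. Weights are the slide's example.
TAGGING_CONTEXT = {
    "Film": {"hasDirector": 0.7, "hasActor": 0.5, "hasCharacter": 0.9},
}


def relations_by_importance(concept, context=TAGGING_CONTEXT):
    """Return the concept's relations ordered from most to least important."""
    weights = context.get(concept, {})
    return sorted(weights, key=weights.get, reverse=True)


print(relations_by_importance("Film"))
# ['hasCharacter', 'hasDirector', 'hasActor']
```

The ordering matches the reading given on the slide: characters first, then directors, then actors.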
  9. Tag Recommendation Process
     1. We define the Tagging Context Model for the given scenario:
        • We select the ontology concepts whose instances are going to be used as candidate tags (tag concepts) in the given scenario.
        • For each of these concepts we determine the ontological relations that have this concept within their domain, and we define their relative importance in the tagging process.
     2. We extract from the text the terms that match instances of the tag concepts, as well as those that match instances of the concepts that fall within the range of the tag concepts' ontological relations.
        • E.g. in the film example we would extract films, directors, actors and characters.
  10. Tag Recommendation Process
     3. Using the ontological relations of the context, we derive from the above set of terms the tag-concept instances that are related to them. These instances comprise the candidate tags for the text.
        • E.g. in the film example, candidate tags would be the extracted films as well as the films that are related to the extracted actors, directors and characters.
     4. For each candidate tag we compute the set of text-found instances that support the candidacy of the tag, each to a degree derived from the tagging context.
        • E.g. for the candidate tag “Annie Hall” a possible set could be:
          • Woody Allen: 0.7
          • Annie Hall: 1
          • Alvy Singer: 0.9
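The derivation of candidate tags and their support sets can be sketched as follows. This is a minimal sketch under stated assumptions: the toy `TRIPLES` ontology, the function name `support_sets`, and the rule that an instance linked to a film by several relations contributes the highest applicable weight are all hypothetical. The weights and the convention that a film mentioned in the text supports itself with degree 1 follow the slides' example.

```python
from collections import defaultdict

# Relation weights from the example tagging context for the concept Film.
CONTEXT = {"hasDirector": 0.7, "hasActor": 0.5, "hasCharacter": 0.9}

# Toy ontology (assumption): (film, relation, related instance) triples.
TRIPLES = [
    ("Annie Hall", "hasDirector", "Woody Allen"),
    ("Annie Hall", "hasActor", "Woody Allen"),
    ("Annie Hall", "hasCharacter", "Alvy Singer"),
    ("Manhattan", "hasDirector", "Woody Allen"),
]


def support_sets(found_instances, films, triples=TRIPLES, context=CONTEXT):
    """Map each candidate film to the text-found instances that support it."""
    found = set(found_instances)
    supports = defaultdict(dict)
    # A film mentioned in the text supports its own candidacy with degree 1.
    for film in films:
        if film in found:
            supports[film][film] = 1.0
    # Any found instance related to a film makes that film a candidate.
    for film, relation, instance in triples:
        if instance in found:
            weight = context.get(relation, 0.0)
            # Assumption: keep the strongest relation when several apply.
            previous = supports[film].get(instance, 0.0)
            supports[film][instance] = max(weight, previous)
    return dict(supports)


found = {"Woody Allen", "Annie Hall", "Alvy Singer"}
print(support_sets(found, films={"Annie Hall", "Manhattan"}))
```

With this toy data, “Annie Hall” receives the support set shown on the slide, while “Manhattan” becomes a weaker candidate supported only by its director.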
  11. Tag Recommendation Process
     5. For each candidate tag we calculate three scores:
        • Its Ontological Support, namely the degree to which the instances found within the text imply that the candidate tag actually characterizes the text.
        • Its Ontological Ambiguity, namely the degree to which the ontology entities that are found within the text and support the tag support other tags as well.
        • Its Tag Confidence, namely the relative confidence that the tag actually characterizes the text.
  12. Ontological Support
     Intuitively, the ontological support of a candidate tag grows with the number and the support degrees of the text instances that support the tag.
     Thus we calculate the overall support score for a given candidate tag as the sum of its partial supports, weighted by the relative number of the instances that support it.
     This weighting is important because our aim is to compute for the tag a support that is relative to the supports of the other tags.
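One possible reading of this description is sketched below. The exact formula is not given in the slides, so the weighting used here (the share of all supporting instances found in the text that support the given tag) is an assumption, as is the function name `ontological_support`.

```python
def ontological_support(support_sets):
    """Score each candidate tag from its support set.

    Assumed formula: (sum of partial supports) * (number of instances
    supporting the tag / number of distinct supporting instances overall).
    """
    all_instances = set()
    for supporters in support_sets.values():
        all_instances.update(supporters)
    total = len(all_instances)
    if total == 0:
        return {}
    return {
        tag: sum(supporters.values()) * len(supporters) / total
        for tag, supporters in support_sets.items()
    }


example = {
    "Annie Hall": {"Annie Hall": 1.0, "Woody Allen": 0.7, "Alvy Singer": 0.9},
    "Manhattan": {"Woody Allen": 0.7},
}
scores = ontological_support(example)
print(sorted(scores, key=scores.get, reverse=True))
# ['Annie Hall', 'Manhattan']
```

Under this assumption a tag backed by many of the found instances is favoured over one backed by a single shared instance, which is the relative behaviour the slide asks for.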
  13. Ontological Ambiguity
     It is the degree to which the ontology entities that are found within the text and support the tag support other tags as well. Intuitively, for a given tag, the more additional tags its support set also supports, the higher the tag's ambiguity.
     The ambiguity between two tags is calculated as follows:
     • First we find the set of text-extracted instances that support both tags.
     • Then we calculate the degree to which this common set contributes to each tag's whole support set, based on the following intuition:
       • If the two contributions are high and comparable, then the ambiguity is high.
       • If they are low and comparable, then the ambiguity is low.
       • If they are non-comparable (i.e. one relatively high and one relatively low), then again the ambiguity is low.
  14. Ontological Ambiguity
     [The formula on this slide was an image and is not preserved in the transcript.]
  15. Tag Confidence
     To derive an overall score for the likelihood that a given tag actually characterizes the text, we use its ambiguity score to “adjust” its initial ontological support. The intuition here is that the more ambiguous the tag is, the less “reliable” its support is.
     Thus, the overall tag confidence is calculated as follows: [formula not preserved in the transcript]
     The tuning variable w is a weight that adjusts the influence of the ambiguity score on the total confidence.
     The above is not an absolute measure of tag confidence but rather a relative one that enables the ranking of the candidate tags.
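Since the confidence formula itself does not survive in this transcript, the following is purely a hypothetical sketch of how support could be discounted by ambiguity. It assumes confidence = support × (1 − w × ambiguity), with both w and the ambiguity in [0, 1]; the authors' actual formula may differ. The function names are illustrative only.

```python
def tag_confidence(support, ambiguity, w=0.5):
    """Assumed form: discount the support by the w-weighted ambiguity."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w is assumed to lie in [0, 1]")
    return support * (1.0 - w * ambiguity)


def rank_tags(scores, w=0.5):
    """Rank tags by confidence; scores maps tag -> (support, ambiguity)."""
    return sorted(
        scores,
        key=lambda tag: tag_confidence(scores[tag][0], scores[tag][1], w=w),
        reverse=True,
    )


# Hypothetical scores: a well-supported but ambiguous tag versus a clean one.
scores = {"Film A": (2.0, 0.9), "Film B": (1.5, 0.0)}
print(rank_tags(scores, w=0.0))  # ambiguity ignored: ['Film A', 'Film B']
print(rank_tags(scores, w=1.0))  # ambiguity fully applied: ['Film B', 'Film A']
```

The two rankings show the role of w: at 0 the ranking follows the support alone, and as w grows an ambiguous tag can drop below a less supported but unambiguous one.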
  16. Example
     Tagging a film review about the film “Steel”:
     “How's this for diminishing returns? In BATMAN AND ROBIN, George Clooney battled Arnold Schwarzenegger. In SPAWN, it was Michael Jai White versus John Leguizamo. In STEEL, the third and presumably final superhero stretch of the summer, Shaquille O'Neal dons a high-tech, hand-crafted suit of armor to combat the earth-shaking, world-shattering, super-duper-ultra evil menace of... Judd Nelson? ...”
  17. Experimental Evaluation
     Dataset of 1000 film reviews randomly selected from IMDb.
     Film ontology derived from Freebase:
     • ~148000 films, ~36000 directors, ~145000 actors, ~63000 characters
     For each review we generated two sets of recommended tags:
     • One using a basic key phrase extraction methodology in which we assumed that the most frequent film found within the text is the one the text is talking about.
     • One using our proposed method with the tagging context of the example.
     We then measured the success of each method as the proportion of cases in which the tag with the highest confidence was the correct one.
     Our method achieved a score of 89%, while the basic method achieved only 45%.
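The baseline and the success measure described above can be sketched in a few lines. The function names and the toy data are hypothetical; the sketch only illustrates the “most frequent film” baseline and the top-1 accuracy, not the authors' evaluation code.

```python
from collections import Counter


def most_frequent_film(film_mentions):
    """Baseline: pick the film mentioned most often in the review, if any."""
    if not film_mentions:
        return None
    return Counter(film_mentions).most_common(1)[0][0]


def top1_accuracy(predicted, gold):
    """Fraction of reviews whose top-ranked tag equals the correct film."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equally long")
    hits = sum(1 for p, g in zip(predicted, gold) if p == g)
    return hits / len(gold)


# Toy data (assumption): film mentions extracted from three reviews.
reviews = [
    ["Steel", "Spawn", "Steel"],
    ["Annie Hall"],
    ["Manhattan", "Manhattan", "Annie Hall"],
]
gold = ["Steel", "Annie Hall", "Annie Hall"]
predictions = [most_frequent_film(mentions) for mentions in reviews]
print(top1_accuracy(predictions, gold))  # 2 of 3 correct
```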
  18. Summary
     • We proposed a novel method for automatically generating and recommending semantic tags for text documents, in an effort to summarize the intended meaning of their content.
     • Our approach is based on the customized use of domain-specific ontological relations for extracting and evaluating “evidence” from within the text that may identify the correct tag(s) in the given tagging scenario.
     • An experimental evaluation on 1000 film reviews showed the method to be effective for the tag recommendation task (89% correct top-ranked tags, versus 45% for the baseline).
  19. Future Work
     • Establishing the effectiveness of our method in more complex and ambiguous domains and with larger datasets.
     • Extending and enhancing the method in order to:
       • Cope with knowledge incompleteness and/or inconsistency.
       • Determine the values of the tagging context in an automated way.
       • Determine the exact number of appropriate tags for the document.
  20. Thank you…
     Panos Alexopoulos
     Semantic Solutions Architect - Researcher
     IMC Technologies SA
     2 Marathonos Str. & 360A Kifissias Av., 15233, Athens, Greece
     T: +30 210 6927 378
     F: +30 210 6926 813
     E: palexopoulos@imc.com.gr
     W: www.imc.com.gr
