Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Klink-2: integrating multiple web sources to generate semantic topic networks


Published on

ISWC 2015 research paper:

The amount of scholarly data available on the web is steadily increasing, enabling different types of analytics which can provide important insights into the research activity. In order to make sense of and explore this large-scale body of knowledge we need an accurate, comprehensive and up-to-date ontology of research topics. Unfortunately, human crafted classifications do not satisfy these criteria, as they evolve too slowly and tend to be too coarse-grained. Current automated methods for generating ontologies of research areas also present a number of limitations, such as: i) they do not consider the rich amount of indirect statistical and semantic relationships, which can help to understand the relation between two topics – e.g., the fact that two research areas are associated with a similar set of venues or technologies; ii) they do not distinguish between different kinds of hierarchical relationships; and iii) they are not able to handle effectively ambiguous topics characterized by a noisy set of relationships. In this paper we present Klink-2, a novel approach which improves on our earlier work on automatic generation of semantic topic networks and addresses the aforementioned limitations by taking advantage of a variety of knowledge sources available on the web. In particular, Klink-2 analyses networks of research entities (including papers, authors, venues, and technologies) to infer three kinds of semantic relationships between topics. It also identifies ambiguous keywords (e.g., “ontology”) and separates them into the appropriate distinct topics – e.g., “ontology/philosophy” vs. “ontology/semantic web”. Our experimental evaluation shows that the ability of Klink-2 to integrate a high number of data sources and to generate topics with accurate contextual meaning yields significant improvements over other algorithms in terms of both precision and recall.

Published in: Software
  • Be the first to comment

Klink-2: integrating multiple web sources to generate semantic topic networks

  1. 1. Francesco Osborne, Enrico Motta KMi, The Open University, United Kingdom November 2015 Klink&2:)Integra0ng)Mul0ple)Web)Sources) to)Generate)Seman0c)Topic)Networks)
  2. 2. Seman&cs)vs)keywords) •  Many)systems)for)the)explora&on)of)research) •  A)good)number)of)LD)corpus)describing)scholarly)data) –  Nature)LD,)Bio2RDF,)AGRIS)LOD,)RDK,)DBLP++,)SW)Dog)Food,)Seman&c)Web) Journal,)Springer)LOD,)Aminer)FOAF,)Dataset)Scholarometer)) 2
  3. 3. From)keywords)to)research)topics) For)making)sense)of)academic)data)is)very)useful)to)have)an) comprehensive)and)upNtoNdate)ontology)of)research)topics.) ) Unfortunately:) •  human)craCed)classifica&ons)evolve)too)slowly)and)tend)to)be)too) coarse&grained.) •  Current)automated)methods)for)genera&ng)ontologies)of)research) topics:) –  ignore)many)indirect)sta&s&cal)and)seman&c)rela&onships) –  do)not)support)different)kinds)of)hierarchical)rela&onships) –  are)not)able)to)handle)effec&vely)ambiguous)topics)characterized)by)a)noisy) set)of)rela&onships.)) 3
  4. 4. Our)first)solu&on:)Klink) Osborne,)F.)and)Mo/a,)E.)(2012))Mining) Seman:c)Rela:ons)between)Research) Areas.)Interna:onal)Seman:c)Web) Conference,)Boston,)MA)
  5. 5. Some)examples:)Seman&c)Network)of)Topics) Osborne,)F.,)Mo/a,)E.)and)Mulholland,)P.)(2013))Exploring)Scholarly) Data)with)Rexplore,)Interna:onal)Seman:c)Web)Conference,)Sydney,) Australia)
  6. 6. Main SW Communities (2000 – 2010) Some)examples:)TopicNbased)Community)detec&on) Osborne,)F.,)Scavo,)G.)and)Mo/a,)E.)(2014))A)Hybrid)Seman:c) Approach)to)Building)Dynamic)Maps)of)Research)Communi:es,) EKAW)2014,)Linkoping,)Sweden)
  7. 7. KlinkN2) Klink&2)is)more)scalable)and)introduces)a)number)of)new) features,)and)is)able:)) •  to)scale)up)to)large)interdisciplinary)ontologies) –  )It)is)able)to)generate)the)topic)ontology)incrementally) •  to)handle)ambiguous)keywords) –  e.g.,)“java)(programming)”,)“java)(Indonesia)”,)“java)(Coffee)”) •  to)take)as)input)any)kind)of)sta0s0cal)or)seman0c) rela0onship) –  )e.g.,)involving)authors,)organiza0ons,)venues…)
  8. 8. K1) K2) K) K) K) K) K) K)K) K)K) K) K) K) K) K)K) K) K) K) K) K) K) K) K)K) K)K) K) K) K) A) A) A) A) A) A) O) O) O) O) O) V) V) V) V) V K) K) K) Klink) Klink&2) K1) K2) Venues) Authors)Organiza0ons) Keywords)Keywords) Rela&onships)used)in)Klink)and)KlinkN2.))
  9. 9. KlinkN2)data)model) •  skos:broaderGeneric.)We)reuse)this)property)from)the)SKOS) model,)to)indicate)the)intui&ve)no&on)that)an)area)is)a)sub& area)of)another)one.) •  contributesTo.)This)is)defined)as)a)subNproperty)of)skos:related) and)indicates)that)R1)research)outputs)are)relevant)to)R2.) •  relatedEquivalent.)Defined)as)a)subNproperty)of)skos:related,) which)indicates)that)two)topics)can)be)treated)as)equivalent) for)the)purpose)of)exploring)research.) 9
  10. 10. 10 Statistical Inferences skos:relatedEquivalent skos:broaderGeneric contributesTo Filtering Triples generation K) K) K) K) K) K)K) K)K) K) K) K) A) A) A) A) A) A) O) O) O) O) O) V) V) V) V) V) K) K) K) K1) K2) Venues) Authors) Organiza0ons) Keywords) Linked)Data)Cloud) Clusterization Disambiguation Input keywords Klink-2
  11. 11. Sta&s&cal)indicators) Hierarchical)rela0onship)(skos:broaderGeneric,)contributesTo))) 11 RelatedEquivalent)rela0onship)
  12. 12. Handling)ambiguous)keywords) KlinkN2)address)mainly)three)categories)of)ambiguous)keywords:) •  Terms)which)actually)have)two)or)more)different)meanings) –  )e.g.,)“owl”,)the)ontology)web)language,)and)“owl”,)the)bird.)) •  Vague)terms,)with)meaning)that)can)change)according)to)the) paper)they)are)associated)to) –  )e.g.,)“mapping”,)“indexing”,)“performance”.) •  Terms)that)used)to)have)a)unique)meaning,)but)are)now)used) in)specialized)ways)by)different)research)communi0es) –  e.g.)“ontology”.)) 12 1 2
  13. 13. An)Example:)Java)(Programming)Language)) 13Klink-2 approach
  14. 14. An)Example:)Java)(Programming)Language)) 14Klink-2 approach HOW? 1.  Klink-2 runs a hierarchical bottom-up clustering algorithm on the set of associates keywords. 2.  If the algorithm yields more than one cluster, Klink-2 run a slower and more accurate clusterization algorithm which considering only the entities associated with disambiguator keywords. 3.  If the process yields more than one cluster, the original keyword is used to produce as many disambiguated topics as the resulting number of clusters.)
  15. 15. Evalua&on) 15 We)tested)four)different)methods:)) •  the)classic)subsump0on)method)(labelled)S);) •  the)original)Klink)algorithm)(labelled)K);) •  a)first)version)of)Klink&2,)with)the)ability)of)integra&ng) mul&ple)rela&onships,)but)not)addressing)ambiguous) keywords)(labelled)KR);) •  the)final)version)of)Klink&2,)with)also)the)ability)to)detect)and) split)ambiguous)keywords)in)contextual)mode)(labelled)K2);)
  16. 16. Evalua&on) 16
  17. 17. Evalua&on) 17
  18. 18. Current)situa&on) •  We)are)collabora&ng)with)major)academic)publishers,) such)as)Elsevier)and)Springer.) •  We)run)KlinkN2)on)a)por&on)of)Scopus)data)about) Computer)Science.)We)obtained)a)large&scale)ontology) consist)of)about))15)000)topics)linked)by)about)70)000) seman&c)rela&onships.)) •  We)are)developing)a)new)version)of)Rexplore) ( full)advantage)of)KlinkN2)
  19. 19. Future)Direc&ons) •  Diachronic)analysis)of)topic)meanings.) •  Allowing)KlinkN2)to)analyze)paradigms,)technologies,) datasets,)tools)and)so)on.) •  Exploi&ng)KlinkN2)ontology)in)a)variety)of)ways)to)produce) smart)analy0cs)of)research)data)