
JIST 2012


  1. 1. Utilizing Linked Open Data (LOD) Resources for Semantic Enhancement of User-Generated Content
     Dong-Po Deng (1,2), Guan-Shuo Mai (3), Cheng-Hsin Hsu (3), Chin-Lung Chang (1,4), Tyng-Ruey Chuang (1), and Kwang-Tsao Shao (3)
     (1) ITC, University of Twente, Enschede, the Netherlands
     (2) Institute of Information Science and (3) Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
     (4) Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
     Thursday, February 7, 2013
  2. 2. Outline
     • Background
     • Motivation
     • Data Collection
     • LOD resources: LODE and LOD TGN
     • An approach for processing UGC
       • Information Extraction
       • Information Formalization
       • Information Reuse
     • Concluding remarks
     JIST2012, 2012/12/3
  4. 4. Background
     • Web 2.0 technologies enable people to contribute their own content to the web, e.g. wikis, blogs, tagging
     • Social media use Web 2.0 technologies to support social interaction on the web, e.g. Twitter, Flickr, Facebook
     • Content contributed by people on the web and/or social media is called "User-Generated Content" (UGC)
     • UGC is mainly multimedia or textual data
     • UGC is considered a potential resource for scientific projects, e.g. citizen science
  5. 5. Background (cont.)
     • Harvesting UGC for scientific purposes raises several problems
       • Unstructured UGC is difficult to handle
       • The semantics of UGC are often ambiguous and/or poor
       • Social media are not designed for scientific purposes
     Image courtesy of http://www.datenform.de/mapeng.html
  7. 7. Motivation
     • LOD datasets as resources
       • LOD aims to make data available on the Web and to interconnect data in order to increase its value for users
       • The LOD projects comprise about 300 datasets with over 31 billion RDF triples
       • Each entry representing a fact in an LOD dataset has a Uniform Resource Identifier (URI) that is referenceable and linkable on the Web
       • The high interconnectivity between entries potentially increases the discoverability, reusability, and utility of information
  8. 8. Motivation (cont.)
     • Therefore, if named entities in UGC can be identified and connected to LOD entries, their semantics can be disambiguated, which makes the UGC easier to process
  10. 10. Data collection
     • Two Facebook interest groups for ecological observations in Taiwan
       • http://www.facebook.com/groups/roadkilled/
       • http://www.facebook.com/groups/enjoymoths/
  11. 11. Ecological Observations on Facebook
  13. 13. LOD Ecology
     • Linked Open Data of Ecology (LODE) is a validated dataset from an LOD project
     • LODE integrates 5 previously distributed databases
     TFRI: Taiwan Forestry Research Institute
  14. 14. LODE in Linked Open Data Cloud
  16. 16. LOD Taiwan Geographic Name (TGN)
     • LOD TGN is mainly converted from the Taiwan Gazetteer following LOD principles
     • LOD TGN has 159,241 geographic name entries, of which 17,442 are linked to geonames.org
  18. 18. An approach for processing UGC
     • Information Extraction
     • Information Formalization
     • Information Reuse
  20. 20. Problems with Chinese species names in Facebook ecological observations
     (1) Names (adjective + noun) are shortened in posts:
       • 曙鳳蝶 (Atrophaneura horishana) → 曙鳳
       • 玉帶鳳蝶 (Papilio polytes) → 玉帶
       • 琉璃紋鳳蝶 (Papilio hermosanus) → 琉璃
     (2) Many names share the same prefix: 細紋 (pronounced Si-Wen, meaning "fine veined")
       • 細紋黃鉤蛾
       • 細紋蠍蛉
       • 細紋新蠍蛉
       • ... 15 species names have the prefix 細紋
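The ambiguity described on this slide can be illustrated with a simple prefix lookup. This is only a sketch, assuming the shortened form is a prefix of the full name; the function name `candidates` and the tiny name list (taken from the slide) are illustrative and are not the authors' implementation.

```python
# Species names copied from the slide; a real lookup would use the full LODE name list.
SPECIES_NAMES = [
    "曙鳳蝶", "玉帶鳳蝶", "琉璃紋鳳蝶",
    "細紋黃鉤蛾", "細紋蠍蛉", "細紋新蠍蛉",
]

def candidates(short_name, names=SPECIES_NAMES):
    """Return every full species name that starts with the shortened form."""
    return [name for name in names if name.startswith(short_name)]

# A shortened name with one candidate is unambiguous; several candidates need disambiguation.
unique = candidates("曙鳳")      # one candidate
ambiguous = candidates("細紋")   # several candidates share this prefix
```

A shortened name such as 細紋 maps to several full names, which is why a confidence value is needed before a match is accepted.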
  21. 21. Identifying shortened species names
     Confidence value = (formula shown as an image on the slide)
  22. 22. Determining a species name for a thread
     • What if several species names are mentioned in one thread? We used three criteria:
       • How many Likes does the post or comment get?
       • How prestigious are the people who post or comment?
       • How many times does a species name occur in the thread?
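One way to combine the three criteria is a weighted score per species name. The sketch below is an assumption for illustration only: the slide does not say how the criteria are weighted or combined, and the `Mention` class, the `prestige` score in the range 0 to 1, and the weight parameters are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Mention:
    species: str     # species name found in a post or comment
    likes: int       # Likes received by that post or comment
    prestige: float  # hypothetical 0..1 reputation score of its author

def pick_species(mentions, w_likes=1.0, w_prestige=1.0, w_count=1.0):
    """Return the species name with the highest combined score, or None."""
    scores = {}
    for m in mentions:
        # Each mention adds its Likes, its author's prestige, and one occurrence.
        gain = w_likes * m.likes + w_prestige * m.prestige + w_count
        scores[m.species] = scores.get(m.species, 0.0) + gain
    return max(scores, key=scores.get) if scores else None

thread = [
    Mention("曙鳳蝶", likes=0, prestige=0.1),
    Mention("玉帶鳳蝶", likes=5, prestige=0.9),
    Mention("曙鳳蝶", likes=1, prestige=0.2),
]
best = pick_species(thread)
```

With equal weights, Likes dominate because they are unbounded counts; in practice the three signals would probably need normalizing before they are summed.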
  24. 24. The problems of geographic names in Facebook ecological observations
     • An example: The Endemic Species Research Institute, 特有生物研究保育中心 (Te-You-Sheng-Wu-Yan-Jiou-Bao-Yu-Jhong-Sin), is shortened to 特生中心 (Te-Sheng-Jhong-Sin)
     • There are no rules for shortening long geographic names
  25. 25. Identifying shortened geographic names
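The slide's own method is not recoverable from the transcript. As an illustrative stand-in, the sketch below treats a shortened name as a candidate when its characters appear in the same order inside a full gazetteer name (a subsequence test), which holds for the 特生中心 example. The function names and the two-entry sample gazetteer are assumptions.

```python
def is_abbreviation(short, full):
    """True if the characters of `short` occur in `full` in the same order."""
    remaining = iter(full)
    # `ch in remaining` advances the iterator, so order is enforced.
    return all(ch in remaining for ch in short)

def match_gazetteer(short, gazetteer):
    """Return all gazetteer names that the shortened form could stand for."""
    return [name for name in gazetteer if is_abbreviation(short, name)]

# Sample entries for illustration; LOD TGN itself has 159,241 entries.
SAMPLE_GAZETTEER = ["特有生物研究保育中心", "中央研究院"]
matches = match_gazetteer("特生中心", SAMPLE_GAZETTEER)
```

A subsequence test is permissive and returns many false candidates on a large gazetteer, so it would only serve as a first filter before ranking.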
  26. 26. The ontology...
     • is based on the Facebook thread, an entity comprising social media content that involves people, places, time periods, photos, and links to other content
     • uses standard vocabularies:
       • Semantically-Interlinked Online Communities (SIOC) to represent the structure of Facebook posts, comments, and threads
       • Friend of a Friend (FOAF) to describe content creators
       • Dublin Core for the interlinked content they created
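As a rough illustration of how the three vocabularies fit together, the sketch below emits N-Triples for one post in a thread using plain string formatting. The `http://example.org/` base URI, the function names, and the choice of properties (`sioc:has_container`, `sioc:has_creator`, `sioc:account_of`, `foaf:name`, `dcterms:created`) are assumptions; the authors' actual ontology is shown only as a figure.

```python
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical base URI, not the authors' server

def triple(s, p, o, literal=False):
    """Format one N-Triples statement; `o` is a URI unless literal=True."""
    if literal:
        obj = '"%s"' % o.replace("\\", "\\\\").replace('"', '\\"')
    else:
        obj = "<%s>" % o
    return "<%s> <%s> %s ." % (s, p, obj)

def describe_post(thread_id, post_id, user_id, user_name, created):
    """Describe a post, its thread, and its creator with SIOC, FOAF, Dublin Core."""
    thread = BASE + "thread/" + thread_id
    post = BASE + "post/" + post_id
    account = BASE + "account/" + user_id
    person = BASE + "person/" + user_id
    return [
        triple(thread, RDF_TYPE, SIOC + "Thread"),
        triple(post, RDF_TYPE, SIOC + "Post"),
        triple(post, SIOC + "has_container", thread),   # post belongs to thread
        triple(post, SIOC + "has_creator", account),    # who wrote the post
        triple(account, SIOC + "account_of", person),   # account to person
        triple(person, RDF_TYPE, FOAF + "Person"),
        triple(person, FOAF + "name", user_name, literal=True),
        triple(post, DCT + "created", created, literal=True),
    ]

lines = describe_post("t1", "p1", "u1", "Alice", "2012-12-03")
```

Comments could be modelled the same way as posts and attached with `sioc:has_reply`, though whether the authors did so is not stated on the slide.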
  27. 27. An ontology for formalizing the extracted information from Facebook threads
  29. 29. Transferring ecological observations on Facebook to RDF
     http://140.109.28.64:2020/page/thread/177883715557195_440860179259546
  31. 31. The extracted species name from the Facebook thread is linked to LOD resources
  36. 36. The extracted species name is the taxon Theretra nessus
     • This entry is connected to LODE via owl:sameAs
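An owl:sameAs link of this kind is a single RDF statement between the locally minted entry and the LOD entry. The sketch below formats such a statement as N-Triples; both URIs are hypothetical placeholders, since the real LODE and thread URIs appear only in the slide screenshots.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def same_as(local_uri, lod_uri):
    """Format an owl:sameAs statement linking a local entry to an LOD entry."""
    return "<%s> <%s> <%s> ." % (local_uri, OWL_SAME_AS, lod_uri)

# Hypothetical URIs for illustration only.
link = same_as(
    "http://example.org/taxon/theretra-nessus",
    "http://example.org/lode/taxon/12345",
)
```

The same statement shape applies to place names, where the target would be a geonames.org entry.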
  37. 37. The extracted place name from the Facebook thread is linked to LOD resources
  42. 42. The LOD TGN entry converted from the Taiwan Gazetteer
     • It is linked to geonames.org via owl:sameAs
  43. 43. Publish the processed Facebook ecological observations
  45. 45. A semantic annotation plug-in for entering geographic names in Facebook posts
  50. 50. Concluding remarks
     • This study reports our experience in converting Facebook ecological observations and interlinking them with LOD resources (LODE and LOD TGN)
     • With these information extraction tools and LOD resources, we developed a tool for semantic enhancement of user input
     • LOD TGN is an ongoing project
     • In the future, we will consolidate the feature types of the geographic names, and we plan to make LOD TGN a reference resource for geospatial semantics
  51. 51. Thank you for your attention
     Questions? deng@itc.nl
