JIST 2012
CC Attribution-ShareAlike License

    JIST 2012 Presentation Transcript

    • Utilizing Linked Open Data (LOD) Resources for Semantic Enhancement of User-Generated Content
      Dong-Po Deng (1,2), Guan-Shuo Mai (3), Cheng-Hsin Hsu (3), Chin-Lung Chang (1,4), Tyng-Ruey Chuang (1), and Kwang-Tsao Shao (3)
      (1) ITC, University of Twente, Enschede, the Netherlands
      (2) Institute of Information Science, Academia Sinica, Taipei, Taiwan
      (3) Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
      (4) Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
      Thursday, February 7, 2013
    • Outline
        • Background
        • Motivation
        • Data Collection
        • LOD resources: LODE and LOD TGN
        • An approach for processing UGC
            • Information Extraction
            • Information Formalization
            • Information Reuse
        • Concluding remarks
      JIST2012, 2012/12/3
    • Background
        • Web 2.0 technologies enable people to contribute their own content to the web, e.g. wikis, blogs, and tagging
        • Social media use Web 2.0 technologies to support social interaction on the web, e.g. Twitter, Flickr, and Facebook
        • Content contributed by people on the web and on social media is called "User-Generated Content" (UGC)
        • UGC is mainly multimedia or textual data
        • UGC is considered a potential resource for scientific projects, e.g. citizen science
    • Background (cont.)
        • Harvesting UGC for scientific purposes raises several problems:
            • Unstructured UGC is difficult to handle
            • The semantics of UGC are often ambiguous or poor
            • Social media are not designed for scientific purposes
      Image courtesy of http://www.datenform.de/mapeng.html
    • Motivation
        • LOD datasets as resources
            • LOD is about making data available on the Web and interconnecting it to increase its value for users
            • LOD projects comprise about 300 datasets with over 31 billion RDF triples
            • Each entry representing a fact in an LOD dataset has a Uniform Resource Identifier (URI) that is referenceable and linkable on the Web
            • The high interconnectivity between entries potentially increases the discoverability, reusability, and utility of information
    • Motivation (cont.)
        • Therefore, if the named entities in UGC can be identified and connected to LOD entries, their semantics can be disambiguated, making the UGC easier to process
    • Data collection
        • Two Facebook interest groups for ecological observations in Taiwan:
            • http://www.facebook.com/groups/roadkilled/
            • http://www.facebook.com/groups/enjoymoths/
    • Ecological Observations on Facebook
    • LOD Ecology
        • Linked Open Data of Ecology (LODE) is a validated dataset from an LOD project
        • LODE integrated 5 previously distributed databases
      TFRI: Taiwan Forestry Research Institute
    • LODE in the Linked Open Data Cloud
    • LOD Taiwan Geographic Name (TGN)
        • LOD TGN is mainly converted from the Taiwan Gazetteer following LOD principles
        • LOD TGN has 159,241 geographic name entries, of which 17,442 are linked to geonames.org
    • An approach for processing UGC
        • Information Extraction
        • Information Formalization
        • Information Reuse
    • Problems with Chinese species names in Facebook ecological observations
        (1) Species names (adjective + noun) are often shortened:
            • 曙鳳蝶 (Atrophaneura horishana) → 曙鳳
            • 玉帶鳳蝶 (Papilio polytes) → 玉帶
            • 琉璃紋鳳蝶 (Papilio hermosanus) → 琉璃
        (2) A shortened name can be ambiguous: 細紋 (pronounced Si-Wen, meaning "fine veined") is the prefix of 15 species names, e.g. 細紋黃鉤蛾, 細紋蠍蛉, 細紋新蠍蛉, ...
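The ambiguity described on this slide can be illustrated with a small prefix lookup. This is only a sketch under stated assumptions: the name list is a hypothetical mini-dictionary built from the names on the slide, `candidates_for` is an illustrative helper, and the slides do not say that the authors matched names this way.

```python
# Hypothetical mini-dictionary; every name is taken from the slide above.
SPECIES_NAMES = [
    "曙鳳蝶", "玉帶鳳蝶", "琉璃紋鳳蝶",
    "細紋黃鉤蛾", "細紋蠍蛉", "細紋新蠍蛉",
]


def candidates_for(short_name, names=SPECIES_NAMES):
    """Return every full name that starts with the shortened form."""
    return [n for n in names if n != short_name and n.startswith(short_name)]


if __name__ == "__main__":
    # An unambiguous short form yields one candidate ...
    print(candidates_for("曙鳳"))
    # ... while an ambiguous prefix yields several.
    print(candidates_for("細紋"))
```

A short form with a single candidate can be resolved directly; one with several candidates needs an extra ranking step, which this sketch deliberately leaves out.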
    • Identifying shortened species names
      Confidence value =
    • Determining a species name for a thread
        • What if several species names are mentioned in one thread? We used three criteria:
            • How many Likes do the post or the comments get?
            • How prestigious are the people who post or comment?
            • How many times does a species name occur in the thread?
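One way the three criteria could be combined is sketched below. The slide does not state how the criteria are weighted or how prestige is measured, so the unweighted sum, the numeric `prestige` field, and the function name `pick_species` are all assumptions made for illustration.

```python
from collections import defaultdict


def pick_species(mentions):
    """Pick one species name for a thread.

    `mentions` is a list of dicts with the hypothetical keys
    'species', 'likes' and 'prestige' (one dict per post or comment).
    """
    scores = defaultdict(float)
    for m in mentions:
        # Assumption: 1 point per occurrence, plus the Likes received,
        # plus a numeric prestige score of the author (unweighted sum).
        scores[m["species"]] += 1 + m["likes"] + m["prestige"]
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    thread = [
        {"species": "A", "likes": 2, "prestige": 0},
        {"species": "B", "likes": 0, "prestige": 1},
        {"species": "B", "likes": 1, "prestige": 1},
    ]
    print(pick_species(thread))
```

In practice the three signals would probably need separate weights or normalization; the plain sum only shows where each criterion enters the decision.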
    • Problems with geographic names in Facebook ecological observations
        • An example: the Endemic Species Research Institute, 特有生物研究保育中心 (Te-You-Sheng-Wu-Yan-Jiou-Bao-Yu-Jhong-Sin), is shortened to 特生中心 (Te-Sheng-Jhong-Sin)
        • There are no rules for shortening long geographic names
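In the example on this slide, the characters of the short form appear in the same order within the full name. A subsequence test can therefore serve as a rough candidate filter. This is a sketch of that single observation, not the authors' method: the slide itself says there are no shortening rules, and the gazetteer list and the second place name are hypothetical.

```python
def is_subsequence(short, full):
    """True if the characters of `short` occur in `full` in the same order."""
    remaining = iter(full)
    # `ch in remaining` advances the iterator until it finds `ch`.
    return all(ch in remaining for ch in short)


def gazetteer_candidates(short, gazetteer):
    """Return gazetteer entries that contain `short` as a subsequence."""
    return [name for name in gazetteer if is_subsequence(short, name)]


if __name__ == "__main__":
    # Hypothetical two-entry gazetteer; the first name is from the slide.
    gazetteer = ["特有生物研究保育中心", "台北車站"]
    print(gazetteer_candidates("特生中心", gazetteer))
```

Such a filter will return false positives on a full gazetteer, so it can only narrow the candidates that a later ranking step would have to resolve.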
    • Identifying shortened geographic names
    • The ontology...
        • is based on a Facebook thread, an entity comprising social media content that involves people, places, time periods, photos, and links to other content
        • uses standard vocabularies:
            • Semantically-Interlinked Online Communities (SIOC) represents the structure of Facebook posts, comments, and threads
            • Friend of a Friend (FOAF) describes content creators
            • Dublin Core describes the interlinked content they create
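How the three vocabularies could fit together is sketched below as plain N-Triples built with the Python standard library. The `http://example.org/` base URI, the identifiers, and the choice of properties (`sioc:has_container`, `dcterms:creator`, `dcterms:created`, `foaf:name`) are assumptions for illustration; the ontology actually used by the authors is shown only as a figure in the slides.

```python
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical base URI


def uri(value):
    """Wrap a URI in N-Triples angle brackets."""
    return f"<{value}>"


def lit(text):
    """Format a plain string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def thread_triples(thread_id, post_id, author_id, author_name, created):
    """Describe one post in one thread with SIOC, FOAF and Dublin Core."""
    thread = uri(BASE + "thread/" + thread_id)
    post = uri(BASE + "post/" + post_id)
    person = uri(BASE + "person/" + author_id)
    return [
        (thread, uri(RDF_TYPE), uri(SIOC + "Thread")),
        (post, uri(RDF_TYPE), uri(SIOC + "Post")),
        (post, uri(SIOC + "has_container"), thread),  # structure: SIOC
        (post, uri(DCT + "creator"), person),         # content: Dublin Core
        (post, uri(DCT + "created"), lit(created)),
        (person, uri(RDF_TYPE), uri(FOAF + "Person")),  # creator: FOAF
        (person, uri(FOAF + "name"), lit(author_name)),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(thread_triples("t1", "p1", "u1", "Example User", "2012-12-03")))
```

A real pipeline would more likely use an RDF library and typed literals; plain strings are used here only to keep the sketch dependency-free.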
    • An ontology for formalizing the information extracted from Facebook threads
    • Converting ecological observations on Facebook to RDF
      http://140.109.28.64:2020/page/thread/177883715557195_440860179259546
    • The species name extracted from the Facebook thread is linked to LOD resources
    • The extracted species name is the taxon Theretra nessus
        • This entry is connected to LODE via owl:sameAs
    • The place name extracted from the Facebook thread is linked to LOD resources
    • The LOD TGN entry converted from the Taiwan Gazetteer
        • It is linked to geonames.org via owl:sameAs
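The owl:sameAs links mentioned for the species and place entries can be sketched as follows. Every URI here is a hypothetical placeholder under `http://example.org/`, not a real LODE, LOD TGN, or geonames.org identifier, and the helper names are illustrative.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical links for illustration only; none of these URIs are real.
LINKS = [
    ("http://example.org/thread/t1/species",
     "http://example.org/lode/taxon/theretra-nessus"),
    ("http://example.org/tgn/place/123",
     "http://example.org/geonames/456"),
]


def same_as_triple(subject_uri, object_uri):
    """Format one owl:sameAs statement as an N-Triples line."""
    return f"<{subject_uri}> <{OWL_SAME_AS}> <{object_uri}> ."


def equivalents(uri, links=LINKS):
    """Return URIs directly linked to `uri`, treating owl:sameAs as symmetric."""
    found = set()
    for subject, obj in links:
        if subject == uri:
            found.add(obj)
        elif obj == uri:
            found.add(subject)
    return found


if __name__ == "__main__":
    for subject, obj in LINKS:
        print(same_as_triple(subject, obj))
    print(equivalents("http://example.org/geonames/456"))
```

Because owl:sameAs is symmetric, a consumer starting from either dataset can reach the other entry, which is what makes the extracted names reusable outside the original thread.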
    • Publishing the processed Facebook ecological observations
    • A semantic annotation plug-in for entering geographic names in Facebook posts
    • Concluding remarks
        • This study reports our experience in converting Facebook ecological observations and interlinking them with LOD resources (LODE and LOD TGN)
        • With these information extraction tools and LOD resources, we developed a tool for semantic enhancement of user input
        • The LOD TGN is an ongoing project
        • In the future, we will consolidate the feature types of the geographic names, and we plan to make the LOD TGN a reference resource for geospatial semantics
    • Thank you for your attention
      Questions? deng@itc.nl