Transcript of "Harvesting crowdsourcing biodiversity data from Facebook groups"
Harvesting crowdsourcing biodiversity data from Facebook groups

Jason Guan-Shuo Mai (1), Cheng-Hsin Hsu (1), Dong-Po Deng (2), De-En Lin (3), Hsu-Hong Lin (3), Kwang-Tsao Shao (1)
(1) Taiwan Biodiversity Information Facility (TaiBIF), Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
(2) Institute of Information Science, Academia Sinica, Taipei, Taiwan
(3) Taiwan Endemic Species Research Institute, Council of Agriculture, Nantou, Taiwan

The emergence of Web 2.0 enables people to contribute their biodiversity observations on the Web. These crowdsourced biodiversity data are increasingly valuable in scientific studies because they can cover broader spatial and temporal scales. However, data provided as plain text hinder retrieval and analysis. In this study, we propose a framework that automatically structures the loose-format text, so that volunteers can keep providing data in their own familiar ways while interested citizens, biodiversity researchers and managers benefit from the semantically structured information. We take two Facebook biodiversity interest groups, Reptile-Road-Mortality and Enjoy-Moths, as examples.

0. Crowdsourcing - participants provide unstructured data voluntarily in Facebook interest groups (Reptile-Road-Mortality, Enjoy-Moths).
1. Crawling data from Facebook via its API.
2. Using natural language processing techniques, with the Taiwan Geographic Name and Taiwan Catalogue of Life databases as knowledge bases, to extract species vernacular names and place names from a thread.
3. Introducing the content management system Drupal for easier data management (including error correction) and display.
4. Publishing linked open data via D2R server for open access and usage.
5. Developing browser plug-ins to give users digested feedback of structured data.
6. Improving source data quality without changing users' own familiar ways.

[Figure: What a typical discussion thread looks like. Thread: Post message, Post picture, Comment message, Comment message, Comment message, …]

Our algorithm picks the most related species name appearing in a thread based on social networking characteristics.

The semantic annotation tool disambiguates toponymic homonyms.

One click on a message to recognize species vernacular names and related information.

Our dataset is linked to other datasets on the linked open data cloud, such as DBPedia, GeoNames and LODE (Linked Open Data of Ecology), so it can benefit from the large amount of meta-information they provide.

Algorithms used to recognize abbreviations of vernacular names and place names

[Flowchart, drawn for the example name 細紋南蛇. Label: Main Database. For each vernacular name in TaiCOL do:
  Full-matched name (細紋南蛇) occurs in the message? Yes / No
  Prefix3 (細紋南) occurs in the message? Yes / No
  Postfix2 (南蛇) occurs in the thread? Yes / No
  Prefix2 (細紋) occurs in the message? Yes / No
  Postfix1 (蛇)
  Outcomes: "Name doesn't exist in the message"; "Matched abbreviation of this name"; "Calculate confidence score"]
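
The abbreviation checks named above (full name, Prefix3, Postfix2, Prefix2, Postfix1) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the function names `candidate_forms` and `match_abbreviation` are hypothetical, the order of the checks and the scope used for Postfix1 are guesses, and the confidence score is left out because the poster does not say how it is calculated.

```python
from typing import Iterator, Optional, Tuple


def candidate_forms(name: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (label, substring, scope) candidates for one vernacular name.

    The ordering (longest forms first) and the scope of each check are
    assumptions made for illustration.
    """
    yield ("full", name, "message")
    if len(name) > 3:
        yield ("prefix3", name[:3], "message")
    if len(name) > 2:
        yield ("postfix2", name[-2:], "thread")
        yield ("prefix2", name[:2], "message")
    if len(name) > 1:
        # Assumption: the poster does not show the scope of the Postfix1 check.
        yield ("postfix1", name[-1:], "thread")


def match_abbreviation(
    name: str, message: str, thread: str
) -> Optional[Tuple[str, str]]:
    """Return (label, matched_form) for the first candidate found, else None.

    `message` is the text of a single post or comment; `thread` is the text
    of the whole discussion thread that contains it.
    """
    texts = {"message": message, "thread": thread}
    for label, form, scope in candidate_forms(name):
        if form in texts[scope]:
            return label, form
    return None  # corresponds to "Name doesn't exist in the message"


if __name__ == "__main__":
    name = "細紋南蛇"
    comment = "這是南蛇嗎"
    print(match_abbreviation(name, comment, comment))
```

In practice this would be run once per vernacular name in TaiCOL for every message, and each match would then be passed to a scoring step. Very short forms such as a single trailing character match many names, which is presumably why the original workflow calculates a confidence score after matching.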