SlideShare a Scribd company logo
1 of 1
Download to read offline
Building and Using Geospatial Ontology
                                                                                                    in the BioCaster Surveillance System
                                                                                                   Son Doan1, Quoc-Hung Ngo2, Ai Kawazoe1, and Nigel Collier1
                                                                                                                              1 National
                                                                                                                                      Institute of Informatics, Tokyo, Japan
                                                                                                     2 University    of Information Technology, Vietnam National University (HCM), Vietnam

                                                                        Motivation                                                         Building a geospatial ontology from Wikipedia
                                                                                                                                                                                                                                            In addition
                                                                         Disease and location names are important information in                                                                                                            we extracted
                                                                        any reported news story.                                           Step 1 Extract names and ISO 3166-1 code of countries and dependent                              geo-
                                                                          There is currently a lack of available geospatial ontology       territories (level 1).                                                                           coordinates
                                                                        that covers all countries and major cities with the                For each country, extract names and ISO 3166-2 codes of country                                  of all
                                                                        longitude/latitude information for visualization purposes.         subdivisions and dependent areas (level 2).                                                      countries
                                                                                                                                           Step 2 Re-create the part-whole relations between level 1 and level 2 of                         and sub-
                                                                                                                                           geographical ontology.                                                                           countries for
                                                                        Key idea                                                                                                                                                            visualization
                                                                                                                                           Step 3 Verify results by manually checking local administrative websites.                        purpose.
                                                                        • Build a geospatial ontology from Wikipedia
                                                                        (http://www.wikipedia.org).
Nature Precedings : doi:10.1038/npre.2008.2110.1 : Posted 22 Jul 2008




                                                                                                                                         http://en.wikipedia.org/wiki/ISO_3166-1     http://en.wikipedia.org/wiki/ISO_3166-2:AF
                                                                        • Use advanced natural language processing techniques
                                                                        to detect events (including location names) within news
                                                                        stories.

                                                                        Results

                                                                        • A geospatial ontology with two administrative levels: 243
                                                                        names in level 1 and 4,025 names in level 2 with their
                                                                        part-whole relationship and longitudes/latitudes.

                                                                        • Geospatial ontology was integrated into BioCaster
                                                                        ontology and freely downloaded at
                                                                        http://biocaster.nii.ac.jp/index.php?page=ontology.             Algorithm for grounding disease/location into BioCaster ontology
                                                                        • Algorithms for detecting locations are implemented                                                                     Named Entity
                                                                        inside the Global Health Monitor, publicly available at            Preprocessing            Topic Classification                                          Grounding
                                                                                                                                                                                                  Recognition
                                                                        http://biocaster.nii.ac.jp.

                                                                        • We collected data from a 10-week period (Dec 20, 2007                                  Input: A set of news stories tagged with NEs.
                                                                        – Feb 20, 2008):                                                                         Output: A set of disease/location pairs.
                                                                           • 7,412 English news.
                                                                           • Covering 110 countries and 360 sub-countries.              Text/                                                                                     Google Map
                                                                           • News by continents: 58.00% Africa, 18.23% Asia,            Web                                                                                       visualization
                                                                                                                                                      Step 1: List LOCATION-DISEASE pairs in
                                                                           11.37% South America, 5.30 % North America, 3.40%
                                                                                                                                                      the news story.
                                                                           Middle East, 2.86% Europe and 0.34% Ocean.

                                                                        • The system successfully detected ebola in Uganda             Step 2: Calculate the frequency of LOCATION-DISEASE pairs in the news database for
                                                                        (Bundibugyo, Kampala, Mbarara), yellow fever in Brazil         the last 12 hours.
                                                                        (Goias, Sao Paulo), avian influenza in Indonesia (Jakarta,
                                                                        Banten), and cholera in Vietnam (Ha Noi, Ha Tay).              Step 3: Rank LOCATION – DISEASE pairs by the frequencies calculated in Step 2. Use a
                                                                                                                                       threshold to choose top LOCATION - DISEASE names.
                                                                        Challenges and Future work
                                                                        • Geo-coding: To disambiguate location names, e.g.,            Step 4: Map disease and location names:
                                                                        Camden can be a area in London (UK) or a town in New
                                                                        South Wales (Australia).
                                                                        • Extend to deeper administrative levels like districts and      • If sequences within DISEASE matches (regular expression matching) to a synonym in
                                                                        sub-districts (wards, towns, villages).                          BioCaster ontology then DISEASE was assigned to that disease name.
                                                                        • Evaluate and compare to other available resources, i.e.,       • If sequences within LOCATION matches (regular expression matching) to a location in
                                                                        GAZ, dbpedia will be considered.                                 geospatial ontology then LOCATION was assigned to that location name (countries and
                                                                                                                                         cities).
                                                                        Acknowledgements
                                                                        This work was supported by Grants-in-Aid from the Japan        Step 5: Re-map into news stories: Match detected diseases and locations within the first
                                                                        Society for the Promotion of Science (grant no. 18049071)
                                                                        and Trandisciplinary Integration Grant from ROIS.
                                                                                                                                       half of each news story. If both disease and location are matched then they are stored;
                                                                                                                                       otherwise, skip.

                                                                                                                                                                                   BioCaster ontology is freely downloaded from
NII – National Institute of Informatics - Japan                                                                                                                                                          http://biocaster.nii.ac.jp

More Related Content

Viewers also liked

Viewers also liked (9)

Pruebasdeingreso 2013doc
Pruebasdeingreso 2013docPruebasdeingreso 2013doc
Pruebasdeingreso 2013doc
 
Soal remidi uas matematika ganjil 11 12
Soal remidi uas matematika ganjil 11 12Soal remidi uas matematika ganjil 11 12
Soal remidi uas matematika ganjil 11 12
 
Cómo mejorar la escritura digital
Cómo mejorar la escritura digitalCómo mejorar la escritura digital
Cómo mejorar la escritura digital
 
Cuadro javier
Cuadro javierCuadro javier
Cuadro javier
 
Pepo y tonga poema
Pepo y tonga poemaPepo y tonga poema
Pepo y tonga poema
 
Campos del desarrllo
Campos del desarrlloCampos del desarrllo
Campos del desarrllo
 
Placepunch Introduction
Placepunch IntroductionPlacepunch Introduction
Placepunch Introduction
 
Horario2ºb
Horario2ºbHorario2ºb
Horario2ºb
 
About Partners
About PartnersAbout Partners
About Partners
 

Similar to Building and Using Geospatial Ontology in the BioCaster Surveillance System

Voiron-Canicio - Input2012
Voiron-Canicio - Input2012Voiron-Canicio - Input2012
Voiron-Canicio - Input2012INPUT 2012
 
Representing and Reasoning about Geographic Occurrences in the Sensor Web
Representing and Reasoning about Geographic Occurrences in the Sensor WebRepresenting and Reasoning about Geographic Occurrences in the Sensor Web
Representing and Reasoning about Geographic Occurrences in the Sensor WebAnusuriya Devaraju
 
Self-organizing maps - Tutorial
Self-organizing maps -  TutorialSelf-organizing maps -  Tutorial
Self-organizing maps - Tutorialaskroll
 
BIOINFORMATICS AND GEOINFORMATICS 🔸.pptx
BIOINFORMATICS AND GEOINFORMATICS 🔸.pptxBIOINFORMATICS AND GEOINFORMATICS 🔸.pptx
BIOINFORMATICS AND GEOINFORMATICS 🔸.pptxMeghaSVNaturalscienc
 
BIOINFORMATICS AND GEOINFORMATICS 🔸.pptx
BIOINFORMATICS AND GEOINFORMATICS 🔸.pptxBIOINFORMATICS AND GEOINFORMATICS 🔸.pptx
BIOINFORMATICS AND GEOINFORMATICS 🔸.pptxMeghaSVNaturalscienc
 
Conferencia 9 andreas_hueni
Conferencia 9 andreas_hueniConferencia 9 andreas_hueni
Conferencia 9 andreas_huenimontaval
 
SOIL MAPPING USING GIS
SOIL MAPPING USING GISSOIL MAPPING USING GIS
SOIL MAPPING USING GISIRJET Journal
 
Finding Fallopia: detection of Japanese Knotweed s.l. taxa in the UK using re...
Finding Fallopia: detection of Japanese Knotweed s.l. taxa in the UK using re...Finding Fallopia: detection of Japanese Knotweed s.l. taxa in the UK using re...
Finding Fallopia: detection of Japanese Knotweed s.l. taxa in the UK using re...Daniel Jones
 
APPLICATION OF GIS IN ENVIRONMENTAL MANAGEMENT
APPLICATION OF GIS IN ENVIRONMENTAL MANAGEMENTAPPLICATION OF GIS IN ENVIRONMENTAL MANAGEMENT
APPLICATION OF GIS IN ENVIRONMENTAL MANAGEMENTCarrie Cox
 
Spatial Information Systems yesterday, today and tomorrow
Spatial Information Systems yesterday, today and tomorrowSpatial Information Systems yesterday, today and tomorrow
Spatial Information Systems yesterday, today and tomorrowBeniamino Murgante
 

Similar to Building and Using Geospatial Ontology in the BioCaster Surveillance System (11)

Voiron-Canicio - Input2012
Voiron-Canicio - Input2012Voiron-Canicio - Input2012
Voiron-Canicio - Input2012
 
Representing and Reasoning about Geographic Occurrences in the Sensor Web
Representing and Reasoning about Geographic Occurrences in the Sensor WebRepresenting and Reasoning about Geographic Occurrences in the Sensor Web
Representing and Reasoning about Geographic Occurrences in the Sensor Web
 
Self-organizing maps - Tutorial
Self-organizing maps -  TutorialSelf-organizing maps -  Tutorial
Self-organizing maps - Tutorial
 
BIOINFORMATICS AND GEOINFORMATICS 🔸.pptx
BIOINFORMATICS AND GEOINFORMATICS 🔸.pptxBIOINFORMATICS AND GEOINFORMATICS 🔸.pptx
BIOINFORMATICS AND GEOINFORMATICS 🔸.pptx
 
BIOINFORMATICS AND GEOINFORMATICS 🔸.pptx
BIOINFORMATICS AND GEOINFORMATICS 🔸.pptxBIOINFORMATICS AND GEOINFORMATICS 🔸.pptx
BIOINFORMATICS AND GEOINFORMATICS 🔸.pptx
 
Conferencia 9 andreas_hueni
Conferencia 9 andreas_hueniConferencia 9 andreas_hueni
Conferencia 9 andreas_hueni
 
SOIL MAPPING USING GIS
SOIL MAPPING USING GISSOIL MAPPING USING GIS
SOIL MAPPING USING GIS
 
Finding Fallopia: detection of Japanese Knotweed s.l. taxa in the UK using re...
Finding Fallopia: detection of Japanese Knotweed s.l. taxa in the UK using re...Finding Fallopia: detection of Japanese Knotweed s.l. taxa in the UK using re...
Finding Fallopia: detection of Japanese Knotweed s.l. taxa in the UK using re...
 
APPLICATION OF GIS IN ENVIRONMENTAL MANAGEMENT
APPLICATION OF GIS IN ENVIRONMENTAL MANAGEMENTAPPLICATION OF GIS IN ENVIRONMENTAL MANAGEMENT
APPLICATION OF GIS IN ENVIRONMENTAL MANAGEMENT
 
Spatial Information Systems yesterday, today and tomorrow
Spatial Information Systems yesterday, today and tomorrowSpatial Information Systems yesterday, today and tomorrow
Spatial Information Systems yesterday, today and tomorrow
 
20090521 Dv Brief
20090521 Dv Brief20090521 Dv Brief
20090521 Dv Brief
 

Building and Using Geospatial Ontology in the BioCaster Surveillance System

  • 1. Building and Using Geospatial Ontology in the BioCaster Surveillance System Son Doan1, Quoc-Hung Ngo2, Ai Kawazoe1, and Nigel Collier1 1 National Institute of Informatics, Tokyo, Japan 2 University of Information Technology, Vietnam National University (HCM), Vietnam Motivation Building a geospatial ontology from Wikipedia In addition Disease and location names are important information in we extracted any reported news story. Step 1 Extract names and ISO 3166-1 code of countries and dependent geo- There is currently a lack of available geospatial ontology territories (level 1). coordinates that covers all countries and major cities with the For each country, extract names and ISO 3166-2 codes of country of all longitude/latitude information for visualization purposes. subdivisions and dependent areas (level 2). countries Step 2 Re-create the part-whole relations between level 1 and level 2 of and sub- geographical ontology. countries for Key idea visualization Step 3 Verify results by manually checking local administrative websites. purpose. • Build a geospatial ontology from Wikipedia (http://www.wikipedia.org). Nature Precedings : doi:10.1038/npre.2008.2110.1 : Posted 22 Jul 2008 http://en.wikipedia.org/wiki/ISO_3166-1 http://en.wikipedia.org/wiki/ISO_3166-2:AF • Use advanced natural language processing techniques to detect events (including location names) within news stories. Results • A geospatial ontology with two administrative levels: 243 names in level 1 and 4,025 names in level 2 with their part-whole relationship and longitudes/latitudes. • Geospatial ontology was integrated into BioCaster ontology and freely downloaded at http://biocaster.nii.ac.jp/index.php?page=ontology. Algorithm for grounding disease/location into BioCaster ontology • Algorithms for detecting locations are implemented Named Entity inside the Global Health Monitor, publicly available at Preprocessing Topic Classification Grounding Recognition http://biocaster.nii.ac.jp. • We collected data from a 10-week period (Dec 20, 2007 Input: A set of news stories tagged with NEs. – Feb 20, 2008): Output: A set of disease/location pairs. • 7,412 English news. • Covering 110 countries and 360 sub-countries. Text/ Google Map • News by continents: 58.00% Africa, 18.23% Asia, Web visualization Step 1: List LOCATION-DISEASE pairs in 11.37% South America, 5.30 % North America, 3.40% the news story. Middle East, 2.86% Europe and 0.34% Ocean. • The system successfully detected ebola in Uganda Step 2: Calculate the frequency of LOCATION-DISEASE pairs in the news database for (Bundibugyo, Kampala, Mbarara), yellow fever in Brazil the last 12 hours. (Goias, Sao Paulo), avian influenza in Indonesia (Jakarta, Banten), and cholera in Vietnam (Ha Noi, Ha Tay). Step 3: Rank LOCATION – DISEASE pairs by the frequencies calculated in Step 2. Use a threshold to choose top LOCATION - DISEASE names. Challenges and Future work • Geo-coding: To disambiguate location names, e.g., Step 4: Map disease and location names: Camden can be a area in London (UK) or a town in New South Wales (Australia). • Extend to deeper administrative levels like districts and • If sequences within DISEASE matches (regular expression matching) to a synonym in sub-districts (wards, towns, villages). BioCaster ontology then DISEASE was assigned to that disease name. • Evaluate and compare to other available resources, i.e., • If sequences within LOCATION matches (regular expression matching) to a location in GAZ, dbpedia will be considered. geospatial ontology then LOCATION was assigned to that location name (countries and cities). Acknowledgements This work was supported by Grants-in-Aid from the Japan Step 5: Re-map into news stories: Match detected diseases and locations within the first Society for the Promotion of Science (grant no. 18049071) and Trandisciplinary Integration Grant from ROIS. half of each news story. If both disease and location are matched then they are stored; otherwise, skip. BioCaster ontology is freely downloaded from NII – National Institute of Informatics - Japan http://biocaster.nii.ac.jp