Using the Semantic Web to Support Ecoinformatics - Presentation Transcript
Andriy Parafiynyk
University of Maryland, Baltimore County
http://ebiquity.umbc.edu/paper/html/id/319/Using-the-Semantic-Web-to-Support-Ecoinformatics
Joint work with Tim Finin, Joel Sachs, Cynthia Sims Parr, Rong Pan, Lushan Han, Li Ding (UMBC), Allan Hollander (UCD), David Wang (UMCP)
This research was supported by NSF ITR 0326460 and matching funds received from the USGS National Biological Information Infrastructure.
Invasive Species
Invasive species cost the U.S. economy over $138 billion per year [1].
By various estimates, these species contribute to the decline of 35 to 46 percent of U.S. endangered and threatened species [2].
The invasive species problem is growing, as the number of pathways of invasion increases.
[1] Pimentel et al. 2000. Environmental and economic costs associated with non-indigenous species in the United States. BioScience 50:53-65.
[2] Charles Groat, Director U.S. Geological Survey, http://www.usgs.gov/invasive_species/plw/usgsdirector01.html
Currently, the most common ways biologists deal with data:
Journal articles
Excel spreadsheets
Local databases
Some information is on-line in HTML/XML
Semantic Web can offer:
Ontologies to arrive at a common vocabulary and define exactly what is what across disciplines (multiple ontologies with mappings are possible)
Constant on-line data availability with convenient ways of data acquisition and processing
Data discovery (Swoogle)
Data integration from different sources, queries on data from multiple sources
Expanding the knowledge base by inferencing
Data can be easily updated or added, users notified
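To make the inferencing point concrete, here is a minimal sketch of how a knowledge base can grow from a few stated facts. It assumes two RDFS-style rules (subclass transitivity and type inheritance); the triples and the name infer are hypothetical and are not taken from the project's ontologies.

```python
# Hypothetical triples; class and property names are illustrative only.
triples = {
    ("NileTilapia", "subClassOf", "Cichlid"),
    ("Cichlid", "subClassOf", "Fish"),
    ("Fish", "subClassOf", "Animal"),
    ("specimen42", "type", "NileTilapia"),
}


def infer(triples):
    """Apply two RDFS-style rules repeatedly until no new triples appear."""
    closed = set(triples)
    while True:
        new = set()
        for (a, p1, b) in closed:
            for (c, p2, d) in closed:
                if b != c:
                    continue
                # Rule 1: subClassOf is transitive.
                if p1 == "subClassOf" and p2 == "subClassOf":
                    new.add((a, "subClassOf", d))
                # Rule 2: an instance of a class is an instance of its superclasses.
                if p1 == "type" and p2 == "subClassOf":
                    new.add((a, "type", d))
        if new <= closed:
            return closed
        closed |= new


inferred = infer(triples)
print(("specimen42", "type", "Fish") in inferred)
```

A query for all fish would now return specimen42 even though no source document states that fact directly.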
[Figure: OLD vs. NEW data workflow]
Legend: Green: data gathering; Pink: data integration and manipulation; White: data analysis; Blue: results dissemination
OLD
Collect data OR Find data tables in literature or data registry OR Email author of data
Massage data manually
Write up metadata record
Register dataset with data registry
Start over for next project
Run analyses
Publish paper
Post supplemental data file on web
Create local spreadsheet
NEW
Build automatically updating dynamic dataset
Develop intelligent query for semantic web data
Download to local spreadsheet
Run analyses
Publish paper
Reanalyze using latest dataset (Query and data already publicly available)
An NSF ITR collaborative project with
University of Maryland, Baltimore County
University of Maryland, College Park
University of California, Davis
Rocky Mountain Biological Laboratory
Food Webs
A food web models the trophic (feeding) relationships between organisms in an ecosystem
Food web simulators are used to explore the consequences of changes to the ecosystem, such as the introduction or removal of a species
A location's food web is usually constructed from studies of the frequencies of the species found there and the known trophic relations among them.
Goal: automatically construct a food web for a new location using existing data and knowledge
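As a rough illustration of the ideas on this slide, a food web can be held as a directed graph from each consumer to what it eats, and the removal of a species can be followed through that graph. The sketch below is a toy under stated assumptions, not the simulators the slide refers to: the species, the links, and the names remove_species and starved are all hypothetical, and a consumer counts as lost only when it has no prey left.

```python
# Hypothetical food web: each consumer maps to the set of things it eats.
food_web = {
    "algae": set(),          # basal resource, eats nothing
    "ostracod": {"algae"},
    "minnow": {"ostracod"},
    "heron": {"minnow"},
}


def remove_species(web, species):
    """Return a copy of the web without `species` and without links to it."""
    return {c: prey - {species} for c, prey in web.items() if c != species}


def starved(web, basal):
    """Return consumers left with no prey, following the cascade upward.

    Species listed in `basal` are never counted as starved.
    """
    web = dict(web)
    lost = set()
    changed = True
    while changed:
        changed = False
        for consumer, prey in list(web.items()):
            if consumer not in basal and not prey:
                lost.add(consumer)
                web = remove_species(web, consumer)
                changed = True
    return lost


after = remove_species(food_web, "ostracod")
print(sorted(starved(after, basal={"algae"})))
```

Removing the ostracod in this toy web leaves the minnow without prey, and then the heron in turn.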
ELVIS: Ecosystem Location Visualization and Information System
East River Valley Trophic Web http://www.foodwebs.org/
Species List Constructor
Click a county, get a species list
The problem
We know which species exist in the location and can further restrict and fill in with other ecological models
But we don’t know which of them might be eaten by a potential invasive, or which might eat the invasive
We can reason from taxonomic data (similar species) and known natural history data (size, mass, habitat, etc.) to fill in the gaps.
Food Web Constructor
Predict food web links using database and taxonomic reasoning.
In a new estuary, Nile Tilapia could compete with ostracods (green) to eat algae. Predators (red) and prey (blue) of ostracods may be affected.
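One way to picture taxonomic reasoning for link prediction: if a close relative of the invasive is known to eat something, propose the same link for the invasive and keep the relative as supporting evidence. The sketch below assumes this simple rule; it is not the Food Web Constructor's actual algorithm. The lineages are simplified, the feeding links are illustrative, and the names shared_ranks, predict_prey and min_shared are hypothetical.

```python
# Simplified lineages (broad to narrow); illustrative, not a full classification.
lineage = {
    "Nile tilapia": ["Animalia", "Chordata", "Cichlidae", "Oreochromis"],
    "Blue tilapia": ["Animalia", "Chordata", "Cichlidae", "Oreochromis"],
    "Bluegill": ["Animalia", "Chordata", "Centrarchidae", "Lepomis"],
    "Ostracod": ["Animalia", "Arthropoda", "Ostracoda", "Cypris"],
}

# Hypothetical known feeding links: consumer -> set of prey.
known_prey = {
    "Blue tilapia": {"algae"},
    "Bluegill": {"Ostracod"},
    "Ostracod": {"algae"},
}


def shared_ranks(a, b):
    """Count how many leading ranks two lineages have in common."""
    count = 0
    for x, y in zip(lineage[a], lineage[b]):
        if x != y:
            break
        count += 1
    return count


def predict_prey(species, min_shared=3):
    """Map each predicted prey to the relatives whose known diet supports it."""
    predictions = {}
    for relative, prey_set in known_prey.items():
        if relative == species:
            continue
        if shared_ranks(species, relative) >= min_shared:
            for prey in sorted(prey_set):
                predictions.setdefault(prey, []).append(relative)
    return predictions


print(predict_prey("Nile tilapia"))
```

Keeping the supporting relatives alongside each prediction is what lets a user later examine the evidence for a predicted link.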
Evidence Provider
Examine evidence for predicted links.
ELVIS
Final goal: ELVIS (Ecosystem Location Visualization and Information System) as an integrated set of web services for constructing food webs for a given location.
Background Ontologies
SpireEcoConcepts:
confirmed and potential food web links
bibliographic information of food web studies
ecosystem terms
taxonomic ranks
California Wildlife Habitat Relationships Ontology
life history
geographic range
management information
ETHAN (Evolutionary Trees and Natural History)
Concepts and properties for ‘natural history’ information on species, derived from data in the Animal Diversity Web and other taxonomic sources
Data representation: ETHAN Ontology
ethan_animals.owl: phylogenetic information about organisms
Ontology designers, vocabulary discovery, who’s using my ontologies or data?, use analysis, errors, statistics, etc.
Searching specialized collections
Spire: aggregating observations and data from biologists
InferenceWeb: searching over and enhancing proofs
SemNews: Text Meaning of news stories
Supporting Semantic Web tools
Triple shop: finding data for SPARQL queries
Search for ontologies that contain these terms
746 ontologies were found that had these two terms. By default, ontologies are ordered by their ‘popularity’, but they can also be ordered by date or size.
We can also search for any RDF documents containing these terms
5,378 documents were found that had these two terms
UMBC Triple Shop
http://sparql.cs.umbc.edu/tripleshop2/
Finding datasets in the absence of the FROM clause
Constraints by URI domain or namespace (more coming)
Reasoning (none/rdfs/owl)
Dataset persistence : queries and results can be saved, tagged, annotated, shared, searched for, etc.
What are the body masses of fishes that eat fishes?
. . . leaving out the FROM clause
Swoogle Triple Shop
specify dataset
RDF documents were found that might have useful data
We’ll select them all and add them to the current dataset.
We’ll run the query against this dataset to see if the results are as expected.
The results can be produced in any of several formats
Results http://sparql.cs.umbc.edu/tripleshop2/
Looks like a useful dataset. Let’s save it and also materialize it in the TS triple store.
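The walkthrough above can be sketched in a few lines: take a query with no FROM clause, find documents that use the query's terms, merge them into a dataset, and run the query against it. This is a minimal sketch in plain Python rather than SPARQL, and it does not use Swoogle's or Triple Shop's real interfaces. The document URLs, triples, body masses, and the names find_documents, match and run are all hypothetical.

```python
# Hypothetical RDF documents: URL -> list of (subject, predicate, object) triples.
documents = {
    "http://example.org/diets.rdf": [
        ("pike", "eats", "perch"),
        ("perch", "eats", "zooplankton"),
    ],
    "http://example.org/taxa.rdf": [
        ("pike", "type", "Fish"),
        ("perch", "type", "Fish"),
    ],
    "http://example.org/traits.rdf": [
        ("pike", "bodyMass", 3500),   # made-up masses, in grams
        ("perch", "bodyMass", 250),
    ],
    "http://example.org/news.rdf": [
        ("story1", "mentions", "election"),
    ],
}

# "Body masses of fishes that eat fishes" as triple patterns.
# Strings starting with "?" are variables; no source documents are named.
query = [
    ("?pred", "type", "Fish"),
    ("?prey", "type", "Fish"),
    ("?pred", "eats", "?prey"),
    ("?pred", "bodyMass", "?mass"),
]


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def find_documents(query, documents):
    """Select documents that use at least one constant term from the query."""
    terms = {t for pattern in query for t in pattern if not is_var(t)}
    return [url for url, triples in documents.items()
            if any(t in terms for triple in triples for t in triple)]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    binding = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding


def run(query, triples):
    """Evaluate the patterns one by one, joining on shared variables."""
    bindings = [{}]
    for pattern in query:
        extended = []
        for b in bindings:
            for t in triples:
                b2 = match(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings


selected = find_documents(query, documents)
dataset = [t for url in selected for t in documents[url]]
results = run(query, dataset)
print(selected)
print([(r["?pred"], r["?mass"]) for r in results])
```

Selecting documents by term overlap is deliberately loose: it can pull in documents that turn out to be irrelevant, which is why the walkthrough runs the query and checks the results before saving the dataset.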
Contributions
OWL ontologies for ecoinformatics domain
data representation
data sharing
inferencing
OWL data discovery
Ability to automatically construct datasets relevant to the query
We describe our on-going work in using the semantic web in support of ecological informatics, and demonstrate a distributed platform for constructing end-to-end use cases. Specifically, we describe ELVIS (the Ecosystem Location Visualization and Information System), a suite of tools for constructing food webs for a given location, and Triple Shop, a SPARQL query interface which allows scientists to semi-automatically construct distributed datasets relevant to the queries they want to ask. ELVIS functionality is exposed as a collection of web services, and all input and output data is expressed in OWL, thereby enabling its integration with Triple Shop and other semantic web resources.