Integrating Heterogeneous and Distributed Information about Marine Species through a Top Level Ontology


On the 21st of November 2013, Yannis Tzitzikas (FORTH) presented the paper "Integrating Heterogeneous and Distributed Information about Marine Species through a Top Level Ontology" at the 7th Metadata and Semantics Research Conference (MTSR) in Thessaloniki, Greece.



  1. Integrating Heterogeneous and Distributed Information about Marine Species through a Top Level Ontology
     Y. Tzitzikas (1,2), C. Alloca (1), C. Bekiari (1), Y. Marketakis (1), P. Fafalios (1,2), M. Doerr (1), N. Minadakis (1), T. Patkos (1), L. Candela (3)
     (1) Institute of Computer Science, FORTH-ICS
     (2) Computer Science Department, University of Crete, Greece
     (3) Consiglio Nazionale delle Ricerche, CNR-ISTI, Pisa, Italy
     7th Metadata and Semantics Research Conference (MTSR), Thessaloniki, Nov 19-22, 2013
  2. Outline
     • Context, Problem, Objectives
     • Main Approaches for Integration
     • The Followed Approach
       – The Ontology MarineTLO: Objectives, Benefits, Architecture
       – The MarineTLO-based Warehouse
     • Exploitation Scenarios
     • Concluding Remarks
     Context: iMarine
     • Id: an FP7 Research Infrastructure Project (2011-2014)
     • Final goal: launch an initiative aimed at establishing and operating an e-infrastructure supporting the principles of the Ecosystem Approach to fisheries management and conservation of marine living resources
     • Partners: [partner logos]
  3. Problem and objectives
     • The problem: there are several sources for the marine domain, but each of them stores complementary information structured according to its own needs.
     • Our objective: harmonize and integrate (link, connect) information of the marine domain.
       – Specific motivating scenarios and use cases are given at the end.
     Marine Information: in several sources
     • WoRMS: World Register of Marine Species; registers more than 200K species
     • ECOSCOPE: a knowledge base about marine ecosystems (IRD, France)
     • FLOD (Fisheries Linked Open Data) of the Food and Agriculture Organization (FAO) of the United Nations
     • FishBase: probably the largest and most extensively accessed online database of fish species
     • DBpedia
  4. Marine Information: in several sources, storing complementary information
     • WoRMS: taxonomic information
     • ECOSCOPE: ecosystem information (e.g. which fish eats which fish)
     • FLOD: commercial codes
     • FishBase: general information, occurrence data, including information from other sources
     • DBpedia: general information, figures
     Marine Information: in several sources, using and accessed through different technologies
     • WoRMS: Web services (SOAP/WSDL)
     • ECOSCOPE: RDF + OWL files
     • FLOD: SPARQL endpoint
     • FishBase: relational database
     • DBpedia: SPARQL endpoint
  5. Main approaches for Integration
     In general there are two main approaches for integration.
     • Warehouse approach (materialized integration)
       – Design phase: the underlying sources (and their parts) have to be selected
       – Creation phase: process for getting the data and creating the warehouse
       – Maintenance phase: ability to create the warehouse from scratch, and/or ability to update parts of it
       – Mappings are exploited to extract information from the data sources, to transform it to the target model, and then to store it at the central repository
     • Mediator approach (virtual integration)
       – The mediator receives a query formulated in terms of the unified model/schema. The mappings are used to enable query translation. The derived sub-queries are sent to the wrappers of the individual sources, which transform them into queries over the underlying sources. The results of these sub-queries are sent back to the mediator, where they are assembled to form the final answer.
     Main approaches for integration (cont.)
     • Warehouse
       – Benefit: flexibility in transformation logic (including the ability to curate and fix problems)
       – Benefit: decoupling of the release management of the integrated resource from the management cycles of the underlying sources
       – Benefit: decoupling of access load from the underlying sources
       – Benefit: faster responses (in query answering but also in other tasks, e.g. if one wants to use it for applying an entity matching technique)
       – Shortcomings: you have to pay the cost of hosting the warehouse, and you have to refresh the warehouse periodically
     • Mediator
       – Benefit: one advantage (but in some cases disadvantage) of virtual integration is the real-time reflection of source updates in integrated access
       – Comment: the higher complexity of the system (and the quality-of-service demands on the sources) is only justified if immediate access to updates is indeed required
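The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustrative toy, not the iMarine implementation: the source names, records, predicates (`hasCode`, `hasPredator`) and helper functions are all hypothetical, and the "mediator" here simply applies the mappings at query time instead of translating the query into source-specific sub-queries.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Source = Callable[[], List[dict]]        # returns the source's current native records
Mapping = Callable[[dict], List[Triple]]  # maps one native record to unified-model triples


def build_warehouse(sources: Dict[str, Source], mappings: Dict[str, Mapping]) -> List[Triple]:
    """Materialized integration: extract, transform and store everything up front."""
    store: List[Triple] = []
    for name, fetch in sources.items():
        for record in fetch():
            store.extend(mappings[name](record))
    return store


def query_store(store: List[Triple], predicate: str) -> List[Tuple[str, str]]:
    """Answer a query from the central repository only."""
    return sorted((s, o) for s, p, o in store if p == predicate)


def mediator_query(sources: Dict[str, Source], mappings: Dict[str, Mapping],
                   predicate: str) -> List[Tuple[str, str]]:
    """Virtual integration: consult every source at query time; nothing is stored."""
    answer = []
    for name, fetch in sources.items():
        for record in fetch():
            answer.extend((s, o) for s, p, o in mappings[name](record) if p == predicate)
    return sorted(answer)


# Hypothetical source contents and mappings (illustration only).
flod_rows = [{"code": "YFT", "sci": "Thunnus albacares"}]
ecoscope_rows = [{"prey": "species A", "predator": "species B"}]
sources = {"flod": lambda: list(flod_rows), "ecoscope": lambda: list(ecoscope_rows)}
mappings = {
    "flod": lambda r: [(r["sci"], "hasCode", r["code"])],
    "ecoscope": lambda r: [(r["prey"], "hasPredator", r["predator"])],
}

warehouse = build_warehouse(sources, mappings)
flod_rows.append({"code": "SKJ", "sci": "Katsuwonus pelamis"})  # a source is updated later

stale = query_store(warehouse, "hasCode")            # warehouse needs a refresh to see SKJ
live = mediator_query(sources, mappings, "hasCode")  # mediator reflects the update at once
print(stale)
print(live)
```

The warehouse answers from its own copy (fast, decoupled from the sources) but misses the later update until it is rebuilt, while the mediator pays the cost of contacting the sources on every query.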
  6. Main approaches for integration (cont.)
     In both cases we need a unified model/schema.
     The ontology MarineTLO (Marine Top Level Ontology)
  7. MarineTLO: Objectives
     • MarineTLO aims at being a global core model that
       – provides a common, agreed-upon understanding of the concepts and relationships holding in the marine domain, to enable knowledge sharing, information exchange and integration between heterogeneous sources,
       – covers the marine domain with suitable abstractions to enable the most fundamental queries,
       – can be extended to any level of detail on demand, and
       – allows data originating from distinct sources to be adequately mapped and integrated
     • MarineTLO is not supposed to be the single ontology covering the entirety of what exists
     MarineTLO: Benefits from a Top-Level Ontology
     • The adoption of a global core model has various benefits:
       – reduced effort for improving and evolving: the focus is on one model rather than many (the results are beneficial for the entire community)
       – reduced effort for constructing mappings: this approach avoids the inevitable combinatorial explosion and complexities that result from pair-wise mappings between individual metadata formats and/or ontologies
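To make the mapping-effort argument concrete, the following small Python sketch counts mappings under two simplifying assumptions that are not stated on the slides: pair-wise mappings are directed (one per ordered pair of formats), and the core-model approach needs exactly one mapping per source. The function names are hypothetical.

```python
def pairwise_mappings(n: int) -> int:
    """Directed mappings needed when every format is mapped to every other format."""
    return n * (n - 1)


def core_model_mappings(n: int) -> int:
    """Mappings needed when each source is mapped once to a shared core model."""
    return n


for n in (5, 10, 20):
    print(n, pairwise_mappings(n), core_model_mappings(n))
```

For the five sources discussed in this talk, that is 20 pair-wise mappings against 5 source-to-MarineTLO mappings; if mappings are counted as undirected pairs the first figure halves, but it still grows quadratically with the number of sources.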
  8. MarineTLO: Key Design Principles
     • Formulation
       – It is an object-oriented semantic model, expressed in a form comprehensible to both documentation experts and information scientists, while being readily convertible to machine-readable formats such as RDF Schema, OWL, etc.
     • Metaclasses
       – Certain types of inference about classes are supported, in the same way that classes support certain types of inference about instances
     • Monotonicity
       – It aims to be monotonic in the sense of Domain Theory: the existing constructs and the deductions made from them should remain valid and well-formed, even as new constructs are added to MarineTLO
     MarineTLO: Query capabilities
     It allows formulating complex queries, e.g.:
     1. Given the scientific name of a species, find its predators with the related taxon-rank classification and with the different codes that the organizations use to refer to them.
     2. Given the scientific name of a species, find the ecosystems, water areas and countries that this species is native to, and the common names that are used for this species in each of the countries.
  9. The notion of competence queries as driver
     For a scientific name of a species (e.g. Thunnus albacares or Poromitra crassiceps), find/give me:
     Q1 the biological environments (e.g. ecosystems) in which the species has been introduced, and more general descriptive information of it (such as the country)
     Q2 its common names and their complementary info (e.g. languages and countries where they are used)
     Q3 the water areas and their FAO codes in which the species is native
     Q4 the countries in which the species lives
     Q5 the water areas and the FAO portioning code associated with a country
     Q6 the presentation w.r.t. Country, Ecosystem, Water Area and Exclusive Economic Zone (of the water area)
     Q7 the projection w.r.t. Ecosystem and Competitor, providing for each competitor the identification information (e.g. several codes provided by different organizations)
     Q8 a map w.r.t. Country and Predator, providing for each predator both the identification information and the biological classification
     Q9 who discovered it, in which year, the biological classification, the identification information, and the common names, providing for each common name the language and the countries where it is used
     MarineTLO as Product
     • The "full" version of MarineTLO (Version 3.0.0)
       – aims at covering any part of the marine domain
       – contains 70 classes and 41 properties
     • The "operational" version, for the needs of iMarine (Version 3.0.0)
       – used for building the MarineTLO Warehouse (Version 3.0.0)
       – contains 92 classes and 41 properties
       – applied for integrating data mainly from the FLOD, ECOSCOPE, part of WoRMS and FishBase sources
     • URL: www.ics.forth.gr/isl/MarineTLO
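As a rough illustration of how a competence query such as Q3 (water areas and their FAO codes in which a species is native) can be answered once facts from different sources sit in one triple set, here is a minimal pure-Python sketch. The predicates (`isNativeTo`, `hasFAOCode`), the identifiers and the data are hypothetical placeholders, not actual MarineTLO properties or warehouse contents; the real warehouse is queried through SPARQL.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


def match(triples: List[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def native_water_areas_with_codes(triples: List[Triple], species: str) -> List[Tuple[str, str]]:
    """Q3-style query: join nativity facts with water-area code facts."""
    rows = []
    for _, _, place in match(triples, s=species, p="isNativeTo"):
        if not match(triples, s=place, p="rdf:type", o="WaterArea"):
            continue  # skip countries, ecosystems, etc.
        for _, _, code in match(triples, s=place, p="hasFAOCode"):
            rows.append((place, code))
    return sorted(rows)


# Hypothetical integrated triples: nativity from one source, codes from another.
triples = [
    ("Thunnus albacares", "isNativeTo", "area:A"),
    ("Thunnus albacares", "isNativeTo", "country:X"),
    ("area:A", "rdf:type", "WaterArea"),
    ("area:A", "hasFAOCode", "code-1"),
    ("country:X", "rdf:type", "Country"),
]

print(native_water_areas_with_codes(triples, "Thunnus albacares"))
```

Neither hypothetical source could answer the query alone: one knows where the species is native, the other knows the codes of the water areas. The join only works after both are expressed in the same model.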
  10. Class Level (excerpt)
     [Diagram: S-Class Level (Version 3.0.0)] Classes shown: TLO Entity, Temporal Phenomenon, Event, Human Activity, Attribute Assignment, Scientific Name Assignment, Common Name Assignment, Country Code Assignment, Ecosystem Code Assignment, Water Area Code Assignment, Persistent Item, Actor, Conceptual Object, Codification System, Identifier, EEZCode, FAOGearTypeIdentifier, FAOVesselTypeIdentifier, Man Made Thing, Physical Man Made Thing, Man Made Object, Vessel, Physical Thing, Country, Ecosystem, Exclusive Economic Zone, Area, Sub Area, Division, Sub Division, Water Area
     Meta Class Level (excerpt)
     [Diagram: Meta Class Level (Version 3.0.0)] Metaclasses shown: TLO Entity Type, Temporal Phenomenon Type, Event Type, Human Activity Type, Attribute Assignment Type, Persistent Item Type, Actor Type, Conceptual Object Type, Identifier Type, Digital Object Type, Physical Thing Type, Equipment Type, Gear Type, Vessel Type, Ecosystem Type, Marine Ecosystem Type, Biotic Element Type, Marine Animal Type, ECOSCOPE Marine Animal Type, FLOD Marine Animal Type, WoRMS Marine Animal Type, FishBase Marine Animal Type, DBpedia Marine Animal Type
  11. Example 1: ThunnusAlbacares
     [Figure not reproduced in the transcript]
     Example 2: Scientific name assignment
     [Diagram] Classes: Event, Attribute Assignment, Scientific Name Assignment, PersistentItem, MarineSpecies, Actor
     Properties: assignedName, assignedDate, assignedIdentifier, relatedIdentifierAssigment, relatedAuthorshipAssigment, name, reference
     Instances: Thunnus_albacares, blank_node_Thunnus_albacares, blank_node_Bonnaterre
     Literals: "Thunnus Albacares" (assignedName), "1788" (assignedDate), "Bonnaterre" (name)
  12. Example 3: Species Establishment
     [Diagram] Marine Species is linked through isAssociatedWith to Ecosystem, Country and Water Area; each link is specialized by usuallyIsBioticElementOf, native, Introduced and Endemic.
     Instance example: Poromitra crassiceps isAssociatedWith Antarctic, Elephant I, Atlantic Antarctic
     Exploiting MarineTLO
  13. Ways to use/exploit MarineTLO
     1. For constructing semantic warehouses which:
       – can answer queries which cannot be answered by the underlying sources individually
       – can aid the construction of mappings between instances
       – can be exploited for various other tasks
         • We shall see how they are exploited in the context of semantic post-processing of search results
     2. Various other uses
       – For publishing Linked Data
       – For mashing up facts
     [Diagram] Publishing Linked Data, mashups; constructing warehouses offering complex query answering; semantic post-processing of search results
  14. The MarineTLO-based Warehouse
     Warehouse construction and evolution process
     [Diagram] Steps, several of which use the MaTWare tool:
     • Define requirements in terms of competence queries (produces: queries)
     • Fetch the data from the selected sources (SPARQL endpoints, services, etc.) (produces: triples)
     • Transform and ingest to the warehouse
     • Inspect the connectivity of the warehouse
     • Formulate rules creating sameAs relationships (produces: rules for instance matching)
     • Apply the rules to the warehouse (creates: sameAs triples)
     • Ingest the sameAs relationships to the warehouse
     • Test and evaluate the warehouse (using competence queries)
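The "formulate rules creating sameAs relationships" and "apply the rules" steps can be illustrated with one very simple matching rule: two instances from different sources are considered the same species if their scientific names are equal after normalization. This is a hedged sketch only; the rule, the URIs and the helper names are assumptions made for illustration and do not reproduce MaTWare's actual rule language or the rules used for the warehouse.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    """Lower-case, treat underscores as spaces, and collapse whitespace."""
    return " ".join(name.lower().replace("_", " ").split())


def same_as_by_name(instances: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Create owl:sameAs triples for instances that share a normalized scientific name."""
    by_name: Dict[str, List[str]] = {}
    for uri, name in instances:
        by_name.setdefault(normalize(name), []).append(uri)
    triples = []
    for uris in by_name.values():
        for a, b in combinations(sorted(uris), 2):
            triples.append((a, "owl:sameAs", b))
    return sorted(triples)


# Hypothetical instance URIs with the scientific name each source records.
instances = [
    ("flod:Thunnus_albacares", "Thunnus_albacares"),
    ("ecoscope:thunnus_albacares", "Thunnus albacares"),
    ("dbpedia:Yellowfin_tuna", "Thunnus  Albacares"),
    ("worms:Poromitra_crassiceps", "Poromitra crassiceps"),
]

same_as = same_as_by_name(instances)
for triple in same_as:
    print(triple)
```

The resulting triples would then be ingested into the warehouse, which is what lets a query starting from one source's identifier reach facts contributed by the others. Inspecting how many such links exist is one way to read the "connectivity" check in the process.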
  15. The MarineTLO-based warehouse's contents: used sources
     [Diagram] ECOSCOPE, FLOD, WoRMS (part of), DBpedia (part of) and FishBase (part of) are replicated and loaded into an RDF triple store, together with MarineTLO and the mappings FLOD-to-TLO, ECOSCOPE-to-TLO, DBpedia-to-TLO, FishBase-to-TLO and WoRMS-to-TLO.
     The MarineTLO-based warehouse's contents: in numbers
     • Now contains information about 37,000 distinct marine species (including FishBase). Number of triples: 2,970,058
     • Species number per source: DBpedia 14,291; FLOD 10,849; WoRMS 1,124; Ecoscope 277; FishBase 31,277
     • Common species (size of intersections) between FLOD, DBpedia, WoRMS, Ecoscope and FishBase, in the order the values appear on the slide: 3,046; 731; 56; 9,833; 768; 73; 6,141; 53; 1,288; 53
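A table of common species like the one on this slide can be derived by intersecting the sets of species that each source contributes. The sketch below shows the computation on made-up data; the source contents and the function name are hypothetical, and in the real warehouse the species would first have to be matched across sources (for example through sameAs links) before they can be compared.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def pairwise_common(species_by_source: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Size of the intersection for every unordered pair of sources."""
    return {
        (a, b): len(species_by_source[a] & species_by_source[b])
        for a, b in combinations(sorted(species_by_source), 2)
    }


# Hypothetical, tiny species sets (illustration only).
species_by_source = {
    "FLOD": {"sp1", "sp2", "sp3"},
    "DBpedia": {"sp2", "sp3", "sp4"},
    "Ecoscope": {"sp3"},
}

common = pairwise_common(species_by_source)
distinct = len(set().union(*species_by_source.values()))
print(common)
print(distinct)
```

The same union also gives the number of distinct species in the warehouse, which is smaller than the sum of the per-source counts whenever sources overlap.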
  16. The MarineTLO-based warehouse's contents: concepts
     [Table: concepts covered per source; the per-cell marks are not present in the transcript]
     • Sources (columns): Ecoscope, FLOD, WoRMS, DBpedia, FishBase
     • Concepts (rows): Species, Scientific Names, Authorships, Common Names, Predators, Ecosystems, Countries, Water Areas, Vessels, Gears, EEZ
     Exploiting the MarineTLO-based Warehouse for Semantic Post-Processing of Search Results
  17. For Semantic Post-Processing: The process
     [Diagram] Elements shown:
     • Input: web browsing contents, or query terms and the (top-L) results (+ metadata)
     • Entity Mining over the entities / contents
     • Semantic Analysis using semantic data from the MarineTLO Warehouse: grouping, ranking, retrieving more properties
     • Visualization/Interaction (faceted search, entity exploration, annotation, top-k graphs, etc.)
     XSearch-Portlet Screenshot
     [Screenshot] Search results, result of entity mining, result of textual clustering; callouts mark where the warehouse is used.
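The post-processing pipeline (entity mining over the top results, then ranking and retrieving more properties from the warehouse) can be sketched as follows. This is a simplified illustration under stated assumptions: entity mining is reduced to case-insensitive matching of known species names, the warehouse is a plain dictionary with made-up properties, and none of the function names come from XSearch.

```python
from collections import Counter
from typing import Dict, Iterable, List


def mine_entities(snippets: Iterable[str], known_names: Iterable[str]) -> Counter:
    """Count in how many result snippets each known entity name occurs."""
    names = list(known_names)
    counts: Counter = Counter()
    for text in snippets:
        lowered = text.lower()
        for name in names:
            if name.lower() in lowered:
                counts[name] += 1
    return counts


def entity_cards(snippets: List[str], warehouse: Dict[str, dict]) -> List[dict]:
    """Rank mined entities by frequency and enrich them with warehouse properties."""
    counts = mine_entities(snippets, warehouse.keys())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"entity": name, "hits": hits, "properties": warehouse[name]}
            for name, hits in ranked]


# Hypothetical top-L result snippets and warehouse content.
snippets = [
    "Yellowfin tuna (Thunnus albacares) catch report",
    "Thunnus albacares stock assessment",
    "Notes on Poromitra crassiceps",
]
warehouse = {
    "Thunnus albacares": {"type": "Species", "code": "YFT"},
    "Poromitra crassiceps": {"type": "Species"},
}

cards = entity_cards(snippets, warehouse)
for card in cards:
    print(card)
```

Each resulting record plays the role of a minimal entity card: the mined entity, how prominent it is in the results, and the extra properties fetched from the warehouse for display or faceted exploration.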
  18. Example of an EntityCard of XSearch (if the entity's type is Species)
     [Screenshot] The warehouse is used; the card shows information from DBpedia, from FLOD, from Ecoscope and from WoRMS.
     XSearch as a bookmarklet
     [Screenshot] The warehouse is used; annotating entities over the original page; entity exploration.
  19. Concluding Remarks
     • To tackle the need for integrated sets of facts about marine species, and thus to assist research about species and biodiversity, we have described a top level ontology for that domain.
       – It provides a unified and coherent core model for schema mapping, which enables formulating and answering queries that cannot be answered by any individual source.
     • We detailed the process of constructing MarineTLO-based warehouses. The current warehouse contains information about more than 37K marine species.
     • We have identified and described particular use cases and applications that exploit this ontology and its warehouse.
  20. Future Work and Research
     • Next steps
       – Finalize and make accessible the next release of the warehouse (in 2013)
     • Current and Future Research
       – Focus on quality/connectivity issues
     Links
     • MarineTLO: http://www.ics.forth.gr/isl/MarineTLO/
     • TripleStores
       – MarineTLO-Warehouse: http://virtuoso.i-marine.d4science.org:8890/sparql
       – also browsable through http://virtuoso.i-marine.d4science.org:8890/fct
     • Systems: X-Search and gCube Search
       – Portlet: https://i-marine.d4science.org/ (in various VREs, e.g. FCPPS, iSearch)
       – Web applications:
         • http://62.217.127.118/x-search/ (over Bing and MarineTLO-Warehouse)
         • http://62.217.127.118/x-search-fao/ (over ECOSCOPE and MarineTLO-Warehouse)
  21. Thank you for your attention
     Visit and send us feedback: www.ics.forth.gr/isl/MarineTLO
