The document discusses the AquaRing project, which created a cross-border virtual knowledge space on aquatic environments and resources. It describes the challenges in integrating content from diverse scientific domains and the approach taken to semantically tag content using ontologies and metadata. Content was annotated using terms from seven ontologies as well as free tags, and relationships between concepts were generated to create the AquaRing ontology with over 75,000 concepts.
Scientific social tagging - background knowledge comes to surface (AquaRing Project)
1. Dublin Core Social Tagging Workshop 2009
Scientific social tagging:
background knowledge comes to surface
Stefano Bianchi
www.aquaringweb.eu
ECP 2005 CULT 038261
2. ECP 2005 CULT 038261
Background
Scientific institutions create, manage and store thousands of
digital contents (images, video, documents etc.) for their
institutional mission (research, edutainment etc.)
Classifying and aggregating such contents is a benefit
o for the scientific institutions, to ease management and reuse
o for the “community”, once contents are published online
HOW? ONTOLOGY
knowledge formalization
+
METADATA
enhanced content annotation
Milan, June 10th 2009 DC Social Tagging Workshop 2009 www.aquaringweb.eu 2
3. ECP 2005 CULT 038261
AquaRing project
eContentPlus EC funded programme – CALL 2005
o FOCUS: Cultural and scientific/scholarly content
Project www.aquaringweb.eu (Sept.06-Mar.09)
o a European cross-border virtual global knowledge and content
space on aquatic environment and resources
Consortium roles: Coordination, Dissemination, Evaluation, IT providers, Knowledge & content providers
4. ECP 2005 CULT 038261
Overview
[Diagram: distributed content space, open to additional content providers, with centralised access for users]
5. ECP 2005 CULT 038261
Challenge
integrated, semantic-based, cross-border digital collection
of cultural and scientific contents
on aquatic and marine sciences
=
complex knowledge domain
HETEROGENEOUS, DYNAMIC, HUGE
o Species, Land, Habitats, Environment, Fishing areas, Vessels, Leisure etc.
o Different languages, different audiences, different formats etc.
o E.g. FishBase (www.fishbase.org): 31,200 Species and 276,500 Common names …only fish!
6. ECP 2005 CULT 038261
Difficulties
Huge, heterogeneous and dynamic knowledge domain
o Marine Biology, Aquatic Sciences, Aquatic Environment, Aquatic and
marine activities and technology, Marine Culture and Leisure,
Education and Awareness etc. etc.
Several different large ontologies/thesauri exist
but
no one covers the whole domain
o Integration/merging to achieve adequate coverage
o Introduction of new knowledge as it appears
Mandatory interoperability with ongoing scientific initiatives
o Future extensions/collaborations/data exchange
o Considering inclusion in Europeana (www.europeana.eu)
7. ECP 2005 CULT 038261
Remedial actions
To adopt a standard metadata model (DC)
o To ensure future interoperability with other organisations / initiatives
To use state-of-the-art existing ontologies
o provided by reliable organisations in the field
To transform state-of-the-art existing thesauri/DB views into ontologies
o E.g. conversion of ASFA thesaurus into an ontology
o E.g. Habitats db view from European Environment Agency
To create a new ontology for uncovered sub-domains
o using reliable scientific data as sources (E.g. EDUcational ontology)
To support a mixed annotation approach (ontologies + hierarchical free tags)
o to use knowledge from ontologies and incorporate new knowledge
To implement an ontology learning approach
o to learn from content annotation creating a unified ontology
8. ECP 2005 CULT 038261
Approach
How can semantics support a cross-border virtual global knowledge
and content space on aquatic environment and resources?
Contents
o management
o annotation
• semantics (meaning) * CONTEXTUALIZATION
– Metadata
» data about data (e.g. “identity card” of a resource)
– Ontology
» “an ontology is an explicit specification [i.e. formalization] of a conceptualization”
[Gruber,1993]… concepts + relations
Knowledge
o generation
• from generic to specific: can annotation create new knowledge?
o exploitation
• content annotation *: semantics for domain-focused tagging
• content retrieval: semantics to refine search and guide navigation
9. ECP 2005 CULT 038261
Content management
How are contents managed?
o contents are locally collected and arranged
into collections (folders) and then simply
moved to a dedicated server (http/ftp)
o basic metadata for all digital contents
uploaded are automatically created and
ready for semantic annotation
10. ECP 2005 CULT 038261
Content annotation
How are contents annotated?
o once uploaded, each content/collection has a metadata record
associated, ready to be enriched with information specifying “what
the resource is about” (subject)
Example: Subject: Hippocampus spp, NE ATLANTIC
o tags for subject can then be a) selected from seven multidisciplinary ontologies
covering different aspects of the aquatic world or b) entered freely to fill
possible gaps in coverage of each ontology
• distinction is tracked for quality check!
o only once all mandatory fields are filled
the content is published online
• NOTE: annotation is inherited for collections!
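The tracked distinction between ontology terms and free tags can be sketched as follows. This is a minimal illustration, not the AquaRing data model: the class `SubjectTag`, its fields and the free tag "seagrass meadow" are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SubjectTag:
    label: str
    ontology: str        # sub-domain the tag belongs to (illustrative naming)
    free: bool = False   # True when typed freely instead of picked from the ontology

def free_tags(tags):
    """Return the free tags, i.e. the ones a quality check would review."""
    return [tag for tag in tags if tag.free]

subject = [
    SubjectTag("Hippocampus spp", "Biological Species"),
    SubjectTag("seagrass meadow", "Habitats", free=True),  # hypothetical free tag
]
print([tag.label for tag in free_tags(subject)])
```

Keeping the flag on each tag, rather than in a separate list, lets the same Subject field carry both kinds of term while still supporting a later review pass.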
11. ECP 2005 CULT 038261
Knowledge generation
How is knowledge generated?
o Semi-automatic ontology learning process
• Relations between concepts used for annotation are semi-automatically
created on the basis of the specific contextual content annotations
Example: a record annotated with Subject = Hippocampus spp (SPECIES) and NE ATLANTIC (HABITATS)
yields the relation: Hippocampus spp lives in NE ATLANTIC
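One way to picture the semi-automatic step is a rule table that proposes a relation whenever two concepts from given ontologies annotate the same record. The sketch below is an assumption about how such a step could work, not the project's algorithm; only the "lives in" relation between SPECIES and HABITATS comes from the slide, and `RELATION_RULES` and `candidate_relations` are hypothetical names.

```python
from itertools import permutations

# Hypothetical rule table: (source ontology, target ontology) -> relation name.
RELATION_RULES = {("SPECIES", "HABITATS"): "lives in"}

def candidate_relations(annotation):
    """annotation: list of (concept, ontology) pairs used as Subject of one record."""
    found = []
    for (src, src_onto), (dst, dst_onto) in permutations(annotation, 2):
        relation = RELATION_RULES.get((src_onto, dst_onto))
        if relation:
            found.append((src, relation, dst))
    # Candidates are only proposals: a manual check keeps the process semi-automatic.
    return found

record = [("Hippocampus spp", "SPECIES"), ("NE ATLANTIC", "HABITATS")]
print(candidate_relations(record))
```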
12. ECP 2005 CULT 038261
Knowledge-based content retrieval
How are contents retrieved?
o Services exploit annotations and generated knowledge
• To better focus search and navigation results
• To suggest refinements
• To ease content navigation and use
Example: SEARCH for Hippocampus spp
• Hippocampus spp is studied by Aquariology
• Hippocampus spp lives in NE Atlantic
• NE Atlantic is affected by Illegal fishing
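The refinement suggestions can be sketched as a lookup over relation triples. The triples below are a reading of the slide example (Hippocampus spp, Aquariology, NE Atlantic, Illegal fishing); the function name `suggest_refinements` and the triple layout are assumptions made for illustration.

```python
# Relations as (subject, relation, object) triples, as read from the slide example.
TRIPLES = [
    ("Hippocampus spp", "is studied by", "Aquariology"),
    ("Hippocampus spp", "lives in", "NE Atlantic"),
    ("NE Atlantic", "is affected by", "Illegal fishing"),
]

def suggest_refinements(term, triples=TRIPLES):
    """Return (relation, concept) pairs directly linked to the searched term."""
    return [(rel, obj) for subj, rel, obj in triples if subj == term]

for relation, concept in suggest_refinements("Hippocampus spp"):
    print(f"Hippocampus spp {relation} {concept}")
```

Calling the function again on a suggested concept (for instance "NE Atlantic") gives the next hop, which is how a user could navigate from a species to the threats affecting its area.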
13. ECP 2005 CULT 038261
Services
Back-office
o content management, knowledge formalization
• Metadata editor (http://metadata.aquaringweb.eu), RESTRICTED ACCESS!
• Ontology editor (http://ontology.aquaringweb.eu), RESTRICTED ACCESS!
Front-office (http://www.aquaringweb.eu)
o content provision, knowledge exploitation
• multilanguage dynamic site framework
• (semantic) search engine (5 customised + 1 general)
• (semantic) tag cloud
• virtual exhibitions
• GoogleMaps-based interactive map
14. ECP 2005 CULT 038261
Semantics
WHY?
o To improve semantic-based content annotation
o To support semantic-based retrieval and navigation of contents
HOW?
o Identification / adaptation of a suitable metadata formalism
• Based on assessed & interoperable metadata standards
o Research / definition of a suitable domain ontology
• Scientific / technical evaluation of existing ontologies
• Reliability / suitability / application domain / multilingual support
• Merging / combination of existing ontologies
o Mechanisms to allow multilingual user interaction
• Support for consortium languages + English
• Multilingual annotations
• Multilingual ontologies
15. ECP 2005 CULT 038261
Metadata
DublinCore Qualified
o 24 elements (DCMES’ 15 plus some refinements)
o 10 mandatory elements
– Title, Audience, Abstract, Publisher, Type, Language, Subject, Format, Identifier, Date
o 4 purpose-based element groupings
• Core Elements
– Title, Audience, Abstract, Publisher, Type, Language, Rights, Creator
• Semantic Annotation
– Subject (terms selected from 7 ontologies + free tags)
• Physical Resource
– Format, Identifier, Date of creation, Date of Availability, Date of Issuing, Date of Validity,
Format extent
• Additional Elements
– Contributor, Bibliographic Citations, Source, HasVersion, Replaces, Requires, HasPart,
References, HasFormat, SpatialCoverage, Temporal Coverage
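The rule that a content is published only once all mandatory fields are filled can be sketched as a simple check over the 10 mandatory elements listed above. The function names and the dictionary-based record are assumptions for illustration, not the Metadata editor's implementation.

```python
# The 10 mandatory elements named on the slide.
MANDATORY = ("Title", "Audience", "Abstract", "Publisher", "Type",
             "Language", "Subject", "Format", "Identifier", "Date")

def missing_mandatory(record):
    """Return the mandatory elements that are absent or empty in the record."""
    return [name for name in MANDATORY if not record.get(name)]

def can_publish(record):
    """A record is publishable only when no mandatory element is missing."""
    return not missing_mandatory(record)

draft = {"Title": "Seahorse close-up", "Language": "en"}  # hypothetical record
print(missing_mandatory(draft))
```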
16. ECP 2005 CULT 038261
Instances
Instances used to annotate a translation of the original resource
o Mandatory
• to create at least one instance for each metadata record
• to create at least one annotation of the resource in English
– the metadata record or an instance
• to create an instance in the annotator’s native language (configurable)
Title           Title in the instance language
Abstract        Abstract in the instance language
Date            Date of creation, modification, issuing, availability and/or validity
Format          MIME type
Format Extent   Size and/or duration
Identifier      Physical location of resource
Language        Language of the translation
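The three mandatory rules for instances can be expressed as a small validation routine. This is a sketch under assumptions: languages are represented as ISO 639-1 codes, and the function name and message texts are hypothetical.

```python
def instance_rule_violations(record_language, instance_languages, native_language):
    """Check the mandatory instance rules for one metadata record.

    record_language: language of the metadata record itself
    instance_languages: languages of the instances attached to the record
    native_language: the annotator's (configurable) native language
    """
    problems = []
    if not instance_languages:
        problems.append("at least one instance is required")
    if "en" not in {record_language, *instance_languages}:
        problems.append("an English annotation (record or instance) is required")
    if native_language not in instance_languages:
        problems.append("an instance in the annotator's native language is required")
    return problems

print(instance_rule_violations("en", ["it"], "it"))
```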
17. ECP 2005 CULT 038261
Ontology
… 7 ontologies selected for content annotation
1. Biological Species
2. Fishing Areas
3. Land Areas
4. Vessels
• 1-4 by FAO, developed by EC project NeOn (www.neon-project.org)
5. Habitats
• programmatically developed from EUNIS Habitat types classification
6. ASFA (AQUATIC SCIENCES AND FISHERIES ABSTRACTS)
• programmatically developed from ASFA thesaurus, provided by FAO
7. EDUcation
• derived from DCMI, LOM, DC-Ed AP, LRE, IMATI ITD
Hierarchical free tags annotation mechanism
AquaRing ontology generated by annotation (ontology learning)
o including relationships among the seven ontologies and merging free tags
o ontology editor for non-experts* developed for manual refinements
18. ECP 2005 CULT 038261
Domain coverage
[Diagram: the domain is covered by the seven ontologies (Species, ASFA (general), Habitats, Land, EDUcation, Fishing Areas, Vessels), with FREE TAGS filling the gaps around each of them]
“What if a species I need is missing?”
“But only checked names & correct classification, please… we are scientists!”
19. ECP 2005 CULT 038261
Free tags
Free tags allowed for each sub-domain (ontology)
o to fill in domain gaps not covered by the selected ontologies
o functionality available at annotation time
o results reusable by all annotators
Annotators advised to consult preferred thesauri
o GEMET, Environment field
o AGROVOC, Agricultural field (but partially covers Marine Biology)
o AQUATEXT, Online Aquaculture Dictionary
o MarineSpecies, Marine organisms names
o FishBase, contains most fish species known to science
o EUROVOC, EC activity multilingual thesaurus
“Guided” free tags (+ ontology editing!)
20. ECP 2005 CULT 038261
Ontology learning approach
[Diagram: ONTOLOGY A, ONTOLOGY B, ONTOLOGY C and HIERARCHICAL FREE TAGS provide the “concepts = keywords” used for annotation; semi-automatically generated relations link them]
21. ECP 2005 CULT 038261
Ontology editor
AquaRing ontology enrichment and improvement (for non-experts*)
o Terms and free tags translations (multilingual system!)
o Relationships management
o Free tags management
22. ECP 2005 CULT 038261
Results
AquaRing ontology: over 75,000 concepts! over 20,000 relations!*
* Over 160,000 with translations!

Ontology                          # of concepts
Biological Species (Free Tags)             6811
Biological Species                        22343
Educational (Free Tags)                      90
Educational                                 854
Fishing Areas (Free Tags)                  1297
Fishing Areas                              3583
Habitats (Free Tags)                       1470
Habitats                                   5764
Land Areas (Free Tags)                     1059
Land Areas                                 5049
Marine Biology (Free Tags)                 5066
Marine Biology                            24188
Vessels (Free Tags)                          28
Vessels                                      93

Relation                #
Affects              1876
Considers             332
Describes               2
Exploits               47
Lies in              3757
Occurs_in            4902
Owns                   43
Studies              6254
Uses                 1151
Includes family      1696
Includes order        110
Includes species      459
Related to            122
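The headline figures can be cross-checked by summing the per-ontology and per-relation counts from the table. The snippet below only re-adds the numbers shown on the slide; the variable names are illustrative.

```python
# Concept counts per ontology, copied from the table (free tags listed separately).
concepts = {
    "Biological Species (Free Tags)": 6811, "Biological Species": 22343,
    "Educational (Free Tags)": 90, "Educational": 854,
    "Fishing Areas (Free Tags)": 1297, "Fishing Areas": 3583,
    "Habitats (Free Tags)": 1470, "Habitats": 5764,
    "Land Areas (Free Tags)": 1059, "Land Areas": 5049,
    "Marine Biology (Free Tags)": 5066, "Marine Biology": 24188,
    "Vessels (Free Tags)": 28, "Vessels": 93,
}
# Relation counts per relation type, copied from the table.
relations = {
    "Affects": 1876, "Considers": 332, "Describes": 2, "Exploits": 47,
    "Lies in": 3757, "Occurs_in": 4902, "Owns": 43, "Studies": 6254,
    "Uses": 1151, "Includes family": 1696, "Includes order": 110,
    "Includes species": 459, "Related to": 122,
}

total_concepts = sum(concepts.values())
total_relations = sum(relations.values())
free_tag_concepts = sum(n for name, n in concepts.items() if "(Free Tags)" in name)
print(total_concepts, total_relations, free_tag_concepts)
```

The sums come to 77,695 concepts and 20,751 relations, consistent with the "over 75,000" and "over 20,000" headline figures.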
23. ECP 2005 CULT 038261
Benefits
Access to resources classified in AquaRing knowledge sub-
domains
Search performed according to resource meaning and user
preferences
o User’s preferences: audience type, language, format, etc.
• e.g. “videos for children in French”
Ontology terms suggested to refine/complete the search
o Knowledge exploration – domain-based guided serendipity!
Access to resources grouped by a representation of usage
degree of ontology terms in annotations (+ relations with other
terms)
• Tag cloud-based navigation + relations
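Searching according to user preferences, such as “videos for children in French”, amounts to filtering records on metadata elements. The sketch below assumes records are plain dictionaries keyed by Dublin Core element names; the sample records and the function name are hypothetical.

```python
# Hypothetical sample records, keyed by metadata element name.
records = [
    {"Title": "Les hippocampes", "Type": "video", "Audience": "children", "Language": "fr"},
    {"Title": "Seahorses", "Type": "video", "Audience": "children", "Language": "en"},
    {"Title": "Hippocampus survey", "Type": "document", "Audience": "researchers", "Language": "fr"},
]

def filter_by_preferences(records, **preferences):
    """Keep the records whose elements match every requested preference."""
    return [r for r in records
            if all(r.get(element) == value for element, value in preferences.items())]

# "videos for children in French"
result = filter_by_preferences(records, Type="video", Audience="children", Language="fr")
print([r["Title"] for r in result])
```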
24. ECP 2005 CULT 038261
Lessons learnt
Semantic technology (and Semantic web) still largely unknown in
domains where CMS and IS might benefit from such formalised approaches
o simple yet effective knowledge-based solutions are usually positively evaluated by
content providers
Semantic technology’s take-up hampered by difficulties related to the proper
formalization of complex scientific knowledge (ontology engineering) and
classification of contents (semantic annotation)
Semantic content annotation is a valuable source of information to generate new
domain knowledge
o From content contextualization to knowledge formalization
Semi-automatic generation is effective provided that, depending on the domain
(e.g. science, health), a manual check is allowed / enabled (QoS problem)
o “Freedom” is related to the application domain & objectives
o “Hierarchical” free tags more meaningful than “flat” free tags
o “Guided” free tag approach minimizes contextual noise
25. ECP 2005 CULT 038261
www.aquaringweb.eu
Contacts
Stefano Bianchi
Research Team Leader
stefano.bianchi@softeco.it
Tel. +39 010 6026 368
Fax. +39 010 6026 350
www.softeco.it
Thanks for your attention!
26. ECP 2005 CULT 038261
About contextualization…
Why is contextualization important?
NICE IMPRESSIONIST LANDSCAPE?
[Image: http://www.sccwrp.org/view.php?id=82]
eutrophic estuary, nutrient enrichment, low oxygen, macroalgal blooms, chlorophyll-a, aquatic vegetation
27. ECP 2005 CULT 038261
About contextualization…
How can contextualization help?
JUST A SHARK?
Scientific classification: Animalia, Chordata, Vertebrata, Chondrichthyes, Elasmobranchii, Euselachii, Carcharhiniformes, Carcharhinidae, Galeocerdo, Galeocerdo cuvier
Human impact: BYCATCH, LINE-FISHING, SHARK FISHERY, FINNING, FISHERY REGULATIONS, FISHERY MORTAL DATA, FISHERY DATA, HISTORICAL FISHING, ENDANGERED SPECIES
Geographical area: EC PACIFIC, WC PACIFIC, EC ATLANTIC, WC ATLANTIC, INDIAN OCEAN
28. ECP 2005 CULT 038261
Final demonstration prototype
www.aquaringweb.eu
29. ECP 2005 CULT 038261
Search engine
Hippocampus spp SEARCH
[Screenshot: CONTENTS and RELATIONS panels; relations extend and refine the search on the basis of the AquaRing ontology]
30. ECP 2005 CULT 038261
Tag cloud
TAGS (ontology concepts): “the larger the font, the more the contents”
[Screenshot: CONTENTS and RELATIONS panels; relations ease navigation on the basis of the AquaRing ontology]
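The rule “the larger the font, the more the contents” can be sketched as a linear mapping from the number of contents annotated with a tag to a font size. The point range, the function name and the sample counts are assumptions chosen for illustration; the slides do not say which scaling the portal uses.

```python
def font_sizes(usage, min_pt=10, max_pt=32):
    """Map each tag's content count linearly onto the [min_pt, max_pt] range."""
    low, high = min(usage.values()), max(usage.values())
    span = (high - low) or 1  # avoid division by zero when all counts are equal
    return {tag: round(min_pt + (count - low) * (max_pt - min_pt) / span)
            for tag, count in usage.items()}

usage = {"Habitats": 120, "Shark": 45, "Toxicity": 5}  # hypothetical content counts
sizes = font_sizes(usage)
print(sizes)
```

A logarithmic scale is a common alternative when a few tags dominate the counts; the linear form is used here only because it is the simplest to read.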
31. ECP 2005 CULT 038261
Virtual exhibitions
Build value-added learning paths on top of aggregated digital content collections
32. ECP 2005 CULT 038261
Virtual exhibitions
NAVIGATION
DESCRIPTION
ROOMS
CONTENT
33. ECP 2005 CULT 038261
Video
FLASH PREVIEW
DOWNLOAD
VIDEO BROWSER
TOPIC BROWSER
34. ECP 2005 CULT 038261
Map
Web 2.0 GoogleMaps API based content visualization
35. ECP 2005 CULT 038261
Architecture details
[Diagram: three parts connected through the INTERNET]
Central system
o Portal
o Semantic repository (metadata + ontologies)
o Content Management services, Administration services, User services
Global Content Space Node
o Server holding the Contents
o Easily reachable from inside (to move contents from local PCs to the server – FTP facility)
o Easily reachable from outside (to retrieve contents from the portal – HTTP facility)
Local PCs
o CONTENT MANAGEMENT
36. Local digital collections and objects interfacing
ECP 2005 CULT 038261
[Diagram: “One-shot” procedure from content provider’s desktop to AquaRing portal, in steps 1, 2a, 2b, 3, 4, 5, 6]
Parts shown: CENTRAL SYSTEM, GLOBAL CONTENT SPACE NODE, DMZ (twice), LAN, INTERNET
Components shown: MINIMAL FTP CLIENT (DESKTOP APP), SERVLET CONTAINER (twice), FTP SERVER (twice), METADATA EDITOR, HOST (SERVER), HTTP SERVER
37. Local digital collections and objects interfacing
ECP 2005 CULT 038261
[Diagram: 3 physical nodes, each an FTP + HTTP SERVER, fed by six AQR FTPCLIENT instances]
Node addresses shown:
ftp://81.93.5.231
http://www.nausicaa.fr/aquaring
ftp://81.208.74.210
http://81.208.74.210
Content URLs shown:
http://adg.contents.aquaringweb.eu
http://naus.contents.aquaringweb.eu
http://lsm.contents.aquaringweb.eu
http://rzoo.contents.aquaringweb.eu
http://rbins.contents.aquaringweb.eu
http://new.contents.aquaringweb.eu
38. ECP 2005 CULT 038261
Metadata annotation & collections
Metadata annotation = time consuming task
o Review report
• “Think about alternative methods and technologies to address the time consuming
multilingual metadata annotation”
• “The metadata annotation requires a lot of labour-intensive effort and specialized
expertise: this might become a risk for the sustainability of the project”
“Hierarchical collections”
o Dublin Core: “A collection is an aggregation of items. The term collection
means that the resource is described as a group; its parts may be separately
described and navigated.”
Many contents with similar “meaning” aggregated in a
collection (folder or even folder tree)
o Iterative automatic creation of metadata for inner content
Specific annotation can be specified at any level
o Necessary requirement for annotation quality!!!
39. ECP 2005 CULT 038261
Collections
COLLECTION = FOLDER (or folder tree!)
Faster annotation on large collections, detailed description on specific contents!!!
COLLECTION METADATA: General description, valid for all contents included in the collection
(e.g. photo campaign on the same area/species/individual etc.)
+
CONTENT METADATA: Specific description, valid for a specific content (e.g. peculiarity, location etc.)
=
Semantic description (inheritance)
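Annotation inheritance along a folder tree can be sketched as a merge from the root collection down to the single content. The merge policy used here (Subject tags accumulate, other elements are overridden by the more specific level) is an assumption, as are the function name and the tuple-keyed annotation store.

```python
def effective_metadata(path, annotations):
    """Merge annotations from the root collection down to the content itself.

    path: folder names ending with the file name, e.g. ["campaign", "img.jpg"]
    annotations: dict mapping a path prefix (as a tuple) to its own metadata dict
    """
    merged = {"Subject": []}
    for depth in range(1, len(path) + 1):
        own = annotations.get(tuple(path[:depth]), {})
        for element, value in own.items():
            if element == "Subject":
                # Assumed policy: general tags are kept, specific ones are added.
                merged["Subject"] += [tag for tag in value if tag not in merged["Subject"]]
            else:
                merged[element] = value  # assumed policy: the more specific level wins
    return merged

annotations = {  # hypothetical collection and content annotations
    ("campaign",): {"Subject": ["Hippocampus spp"], "Publisher": "Example Aquarium"},
    ("campaign", "img.jpg"): {"Subject": ["NE ATLANTIC"], "Title": "Close-up"},
}
result = effective_metadata(["campaign", "img.jpg"], annotations)
print(result)
```

With this shape, annotating the collection once covers every file in the folder tree, while a single file can still receive its own, more detailed description.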
40. Collections & content management
AquaRing FTP client
ECP 2005 CULT 038261
o simple Java standalone desktop application
- file/folder selection from local pc
- one shot ftp transfer
- add/remove file/folder on ftp server
o Fully integrated & synchronized with Metadata Editor functionalities
- trigger system for metadata creation/deletion
AUTOMATIC METADATA CREATION ON CONTENT/FILE FTP TRANSFER (LOCAL PC to FTP SERVER)
o All available information on files (name, size, format etc.) can be
automatically detected and used for the annotation
o Once created, metadata can be refined (E.g. subject)
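Automatic creation of basic metadata from file information can be sketched with the standard library alone. The original client is described as a Java desktop application; the Python below is only an illustration of the idea, and the function name and the mapping of file properties to metadata elements are assumptions.

```python
import mimetypes
import os
from datetime import datetime, timezone

def basic_metadata(path):
    """Build a basic metadata record from what the file system already knows."""
    stat = os.stat(path)
    mime, _ = mimetypes.guess_type(path)
    return {
        "Title": os.path.basename(path),               # file name as provisional title
        "Format": mime or "application/octet-stream",  # MIME type guessed from the extension
        "Format extent": f"{stat.st_size} bytes",
        "Date": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).date().isoformat(),
        "Subject": [],                                 # left empty for later semantic annotation
    }
```

The record produced this way is only a starting point: elements such as Subject still have to be refined by an annotator, as the slide notes.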
Editor's Notes
It’s now time to get a flavour of the AquaRing demonstration prototype, having a direct experience of the value added of the services provided…
TAG CLOUD: Habitats, Marine Habitats, Dangerous organisms, Traumatogenic, Red Sea, Education, Cartoons, Marine habitats, Shark, Research, MYSTICETI, Human food, Toxicity, Mali
SEARCH: User type: education; Bottlenose Dolphin + Sexual reproduction; Pollution + Mammalia + Bacterial disease; English x media; stranding (beached animal); image search; publisher rbins; English x none; accidents occur in Netherlands with +