A Controlled Crowdsourcing Approach for Practical
Ontology Extensions and Metadata Annotations
Yolanda Gil¹, Daniel Garijo¹, Varun Ratnakar¹,
Deborah Khider², Julien Emile-Geay² and Nicholas McKay³
¹Information Sciences Institute, University of Southern California,
²Department of Earth Sciences, University of Southern California,
³School of Earth Sciences and Environmental Sustainability,
Northern Arizona University
@yolandagil, @dgarijov
{gil,dgarijo}@isi.edu
ISWC In-Use Track, Vienna, 2017
Data reuse in paleoclimate and environmental sciences
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Metadata Annotations
(Gil et al., ISWC In-Use Track, Vienna, 2017)
• Data is collected by independent scientists using idiosyncratic notation and protocols.
• Hundreds of types of observations
  • Physical samples may come from ice, trees, coral, marine sediment, etc.
• Hundreds of types of measurements
  • Temperature, rainfall, pH, etc.
• Diversity is so great that no one dares to embark on standards.
• Typical situation for environmental sciences (water modeling, hydrology, etc.)
Challenges
• How can we leverage basic core agreements?
• How can scientists create the new properties they want to use to describe their data?
• How can we facilitate consensus on new extensions to core agreements?
• How can the scientific community benefit immediately from this continued expansion of core agreements?
• How can new extensions to core agreements be coordinated and maintained?
Approach: Controlled crowdsourcing
• A metadata crowdsourcing platform
• Controlled standardization process for new metadata properties
• Framework for updating metadata of previously annotated datasets
A Framework for Controlled Crowdsourcing
[Figure: framework diagram. Labels recovered from the figure:]
• Frameworks: Annotation Framework (data annotation), Revision Framework, Update Framework
• Roles: basic editor, advanced editor, Editorial Board
• Repositories: dataset metadata store, snapshot repository, ontology repository
• Ontology versions: Version 0, Version 1
• Artifacts: datasets, dataset metadata, core ontology, crowd vocabulary, extended crowd vocabulary, snapshot
• Flows: requests & issues, requests & issues (core ontology), changes to crowd vocabulary, crowd vocabulary revision, core ontology revision, load/reload, reload datasets, update
• Changes: monotonic and non-monotonic
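The distinction between monotonic and non-monotonic changes in the Update Framework can be illustrated with a minimal Python sketch. It assumes one plausible reading that the slides do not spell out: adding a term is monotonic (existing annotations stay valid), while renaming or removing a term is non-monotonic (existing annotations must be rewritten). The `Change` class, the `update_metadata` function, and the property names are hypothetical and are not part of the Linked Earth Platform.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    """One vocabulary change (hypothetical structure, for illustration only)."""
    kind: str                       # "add", "rename", or "remove"
    term: str                       # property affected by the change
    new_term: Optional[str] = None  # target name, used only by "rename"

    @property
    def monotonic(self) -> bool:
        # Assumption: only additions leave existing annotations untouched.
        return self.kind == "add"


def update_metadata(metadata: Dict[str, str], changes: List[Change]) -> Dict[str, str]:
    """Return a copy of one dataset's metadata with the changes applied."""
    updated = dict(metadata)
    for change in changes:
        if change.kind not in ("add", "rename", "remove"):
            raise ValueError(f"unknown change kind: {change.kind!r}")
        if change.monotonic or change.term not in updated:
            continue  # nothing to rewrite for this dataset
        value = updated.pop(change.term)
        if change.kind == "rename":
            if change.new_term is None:
                raise ValueError("a rename needs a new_term")
            updated[change.new_term] = value
        # For "remove", the popped value is simply dropped.
    return updated


old = {"archiveType": "coral", "temp": "24.1", "obsolete": "x"}
changes = [
    Change("add", "units"),
    Change("rename", "temp", "temperature"),
    Change("remove", "obsolete"),
]
new = update_metadata(old, changes)
```

Returning a new dictionary rather than editing in place keeps the previous metadata available, which loosely mirrors the idea of keeping a snapshot before an update.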
Specifying metadata for a dataset
[Screenshot: dataset page. Labeled regions: data download, completed properties, missing properties, crowd properties, category, category annotation]
Fostering standardization
• Suggestion of renames
• Autocompletion
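Rename suggestions and autocompletion can be approximated with the Python standard library. The sketch below is not the platform's implementation: it assumes case-insensitive prefix matching for autocompletion and `difflib` string similarity for spotting near-duplicate property names. The function names, the sample vocabulary, and the 0.8 cutoff are illustrative choices.

```python
import difflib
from typing import List, Optional


def autocomplete(prefix: str, vocabulary: List[str], limit: int = 5) -> List[str]:
    """Return up to `limit` existing terms starting with `prefix`, ignoring case."""
    p = prefix.lower()
    return sorted(t for t in vocabulary if t.lower().startswith(p))[:limit]


def suggest_rename(new_term: str, vocabulary: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Suggest an existing term when `new_term` looks like a near-duplicate.

    Returns None when the term already exists or nothing is similar enough.
    """
    if new_term in vocabulary:
        return None
    by_lower = {t.lower(): t for t in vocabulary}
    matches = difflib.get_close_matches(
        new_term.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


vocabulary = ["ProxyArchive", "ProxyObservation", "ProxySensor",
              "Instrument", "InferredVariable"]
completions = autocomplete("proxy", vocabulary)
suggestion = suggest_rename("ProxyArchve", vocabulary)
```

Offering the existing term before a contributor creates a misspelled or differently cased variant nudges the crowd vocabulary toward fewer near-duplicates, which is the standardization effect this slide describes.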
Implementation: The Linked Earth Platform
• Dynamic map-based visualizations
• Dataset annotation interface
• Author credit
• Polls for decision making
• Community discussions
The Linked Earth Ontology - Overview
• Modular design (Core modules + crowd extensions)
• Namespace: http://linked.earth/ontology#
[Figure: modular structure of the ontology. Labels recovered from the figure:]
• Linked Paleo Data Ontology (LiPD)
• Core ontology classes: ProxyArchive, ProxyObservation, ProxySensor, Instrument, InferredVariable
• Crowd vocabulary extensions: (Coral, Wood, Lake Sediment…), (Spectral, Chemical …), (Rock, Snow, Tree …), (Spectrometer, Spectroscope …), (Precipitation, time …)
• Reused vocabularies: Schema.org (Dataset), Wgs_84 (Position), GeoSPARQL (Position), SSN (Observation), FOAF (Person), PROV (Derivation), DC (Publication)
The Linked Earth Ontology - versioning
• Working groups discuss proposed changes to the ontology
• Once a new version is approved, the core vocabulary is released and versioned outside the wiki:
  • Naming scheme: http://linked.earth/ontology/module/version
  • Example: http://linked.earth/ontology/core/1.2.0
• The latest version keeps a stable URI (aggregating all modules):
  • http://linked.earth/ontology#
• Each version is documented and published in both machine-readable and human-readable form
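The naming scheme above can be expressed as a small Python helper. This is a sketch under stated assumptions: the slides give only the pattern and the single example 1.2.0, so the three-part numeric version format and the restriction on module names are assumptions, and the function names are hypothetical.

```python
import re
from typing import Tuple

BASE = "http://linked.earth/ontology"


def version_uri(module: str, version: str) -> str:
    """Build a versioned module URI of the form BASE/module/version."""
    # Assumption: module names are simple tokens without slashes.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", module):
        raise ValueError(f"unexpected module name: {module!r}")
    # Assumption: versions have three numeric parts, as in the 1.2.0 example.
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"expected a version like 1.2.0, got {version!r}")
    return f"{BASE}/{module}/{version}"


def latest_uri() -> str:
    """Return the stable namespace URI that always points at the latest version."""
    return f"{BASE}#"


def parse_version_uri(uri: str) -> Tuple[str, Tuple[int, int, int]]:
    """Split a versioned URI into (module, (major, minor, patch))."""
    pattern = re.escape(BASE) + r"/([A-Za-z0-9_-]+)/(\d+)\.(\d+)\.(\d+)"
    match = re.fullmatch(pattern, uri)
    if match is None:
        raise ValueError(f"not a versioned ontology URI: {uri!r}")
    module, major, minor, patch = match.groups()
    return module, (int(major), int(minor), int(patch))


core_uri = version_uri("core", "1.2.0")
parsed = parse_version_uri(core_uri)
```

Parsing versions into integer tuples lets releases be compared numerically, so that, for instance, (1, 10, 0) sorts after (1, 2, 0), which plain string comparison would get wrong.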
Organizing the community
• Basic editors
• Advanced editors
• Editorial board
• Working groups
• Periodic face-to-face events for community engagement
• Engagement through Twitter polls and online surveys
• Editorial board requests votes on candidate standard properties
Current Situation
Page distribution
Category            Pages
Datasets              699
ProxyArchive          207
ProxyObservation       76
ProxySensor            63
Instrument             45
InferredVariable     1207
MeasuredVariable     3348
Working Group          12
Location              659
Person                524
Publication           875
• More than 14,000 pages
• More than 150 registered users (50 active)
• One full iteration and revision of the ontology
• Identified leaders for working groups
Conclusions and Future Work
Approach for on-the-fly ontology extensions for scientific metadata annotations:
• Foster standardization through renaming, autocompletion, and voting
• Editorial process to review the core standard with new crowd terms
• Framework for updating dataset properties when a new standard is released
Ongoing work:
• Supporting the editorial process for core ontology revisions
• Automating ontology documentation updates
• Further automation of the update framework
