Abstract. The interest of “Citizen Scientists” in their local environment is
potentially of great value because they can assist in supplying essential “Environmental
Knowledge” in an efficient and cost-effective way. This is particularly
the case when “Volunteered Data” is registered in a standardized manner,
interoperable with the data created by official institutions. The present work
incorporates OpenStreetMap (OSM) and broadly accepted metadata standards
that are controlled by scientific communities, to include the use of standardized
interfaces for volunteered data contributions. An essential requirement for citizen
science to operate is the participation of the people. Spatial cognition is concerned
with the acquisition, organization, employment, and examination of
“knowledge about spatial environments”. By this means “knowledge about
spatial environments” is related to geographic proximity. Both OSM and metadata
standards explore recent technologies for “Semantic Web” (SW) and
“Linked Open Data” (LOD) enablement. The present study discusses the challenges
and effects of standardized community contributions.
DOI: https://doi.org/10.1007/978-3-319-60642-2_39
URL: https://www.springerprofessional.de/citizen-science-involving-collections-of-standardized-community-/12355456
Werner Leyh¹, Maria Fava², Narumi Abe², Sandra Cavalcante³,
Leandro Giatti³, Carolina Monteiro de Carvalho³,
Homero Fonseca Filho⁴, and Clemens Jacobs⁵
1 Department of Computer Science,
University of São Paulo (USP), São Paulo, Brazil
WernerLeyh@yahoo.com
2 São Carlos School of Engineering,
University of São Paulo (USP), São Paulo, Brazil
mfava7@gmail.com, mail.narumi@gmail.com
3 Department of Environmental Health,
University of São Paulo (USP), São Paulo, Brazil
sandracavalcante@uol.com.br, lgiatti@usp.br,
carvalhocm@gmail.com
4 Environmental Management, School of Arts, Sciences and Humanities,
University of São Paulo, USP, São Paulo, SP, Brazil
homeroff@usp.br
5 GIScience, Institute of Geography, Heidelberg University,
Heidelberg, Germany
c.jacobs@uni-heidelberg.de
Citizen Science Involving Collections of Standardized Community Data
1. Werner Leyh, Maria Fava, Narumi Abe, Sandra Cavalcante, Leandro Giatti, Carolina Monteiro de Carvalho, Homero Fonseca Filho and Clemens Jacobs, Brazil / Germany
Cognitive Computing and Internet of Things (CCIOT)
Information Management: Techniques and Applications (S192)
Friday, July 21, 8:00-10:00
Los Angeles, CA, USA
http://www.ahfe2017.org/program3.html
Related work and publication:
https://www.springerprofessional.de/citizen-science-involving-collections-of-standardized-community-/12355456
2. Overview: The big picture
Summarizing our contribution to IMPROVE THE IMPACT of CITIZEN SCIENCE DRIVEN DATA COLLECTIONS
3. Overview: The big picture –
What, Why, How, When and Where
What: Both OSM and METADATA STANDARDS explore recent
technologies for “LINKED OPEN DATA” (LOD) enablement.
The present study discusses the CHALLENGES and EFFECTS of
STANDARDIZED COMMUNITY CONTRIBUTIONS.
Why: For WIKIDATA AND OSM to support computing applications MOST
EFFECTIVELY, their structured data must have a HIGH DEGREE OF
STANDARDIZATION.
There is an INHERENT TENSION between the STANDARDIZATION
NEEDS of structured data and the ethos of CONTRIBUTOR FREEDOM.
How: CONTROLLED VOCABULARIES can be applied in SIMILAR WAYS
to DOMAIN ONTOLOGIES. Both may be “STANDARDIZED”, like the
Dublin Core metadata standard and the Darwin Core metadata standard.
This kind of STANDARDIZATION can lead to their (scientific, cultural,
commercial) DISSEMINATION.
When: 2017
Where: International.
4. Overview: Outline and Content
Overview: The big picture
What, Why, How, When and Where
Motivation
Citizen Science (CS) in data driven surveys – Opportunities and challenges
Introduction
Context and Former work
Part (1) - Research question - Where is the big difference between OSM and GIS?
Nodes versus Layers, Impact, Local spatial cognition
Part (1) – Results: Spatial cognition – Nodes versus Layers
Volunteers do not limit their geographical reasoning to just one thematic layer (class of data)
Challenges with Standards – it's all about sharing
Preparedness, Individualism, Overlapping Standards, Competition, Interoperability
Part (2) - Research question - Looking for interoperability
Why controlled vocabularies?
Part (2) - Approach: Local spatial cognition
Attributes that can be explored to characterize the neighborhoods
Part (2) - Results
Need of Controlled Vocabularies
Part (3) - Research question - Why are Controlled Vocabularies particularly important and difficult to apply in Citizen Science?
Significant new opportunities and challenges arise from the huge increase of data from varied devices
Part (3) - Results - Why are Controlled Vocabularies particularly important and difficult to apply in Citizen Science?
Inherent tension between the standardization needs and contributor freedom
Flat vocabularies composed of terms with taxonomic relations may represent so-called lightweight ontologies
Conclusion
Applying controlled vocabularies means
exploring an already developed, accumulated and maintained domain knowledge,
e.g. through highly sophisticated educational infrastructures.
Annex
5. Motivation: Citizen Science (CS) in data driven
surveys – Opportunities and challenges
[Figure: data sources: device sensor data, human sensor data, authoritative data]
6. Introduction: Context and Former work
IN THE ANNEX WE INTRODUCE:
➢ Citizen Science
➢ VGI and OpenStreetMap
➢ Spatial Cognition
➢ Controlled Vocabularies
➢ Lightweight ontologies
PLEASE CONSIDER ALSO OUR MAIN REFERENCES:
Comber et al. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0158329
Muki Haklay https://link.springer.com/chapter/10.1007/978-94-007-4587-2_7
Hall et al. http://dl.acm.org/citation.cfm?id=3025940&dl=ACM&coll=DL
Good et al. http://icbo.cgrb.oregonstate.edu/node/331
Mokhtar Henda http://www.benhenda.com/eng/node/64
Sachs and Finin http://ebiquity.umbc.edu/paper/html/id/572/Social-and-Semantic-Computing-in-Support-of-Citizen-Science
Eckle and Porto de Albuquerque https://www.semanticscholar.org/paper/Quality-Assessment-of-Remote-Mapping-in-OpenStreet-Eckle-Albuquerque/15be7e569d71085aaff38a5b37e2f54eadc8288c
7. Part (1) - Research question
Regarding the PROCESS OF SPATIAL COGNITION in VOLUNTEERED DATA COLLECTIONS:
Where is the big difference between OPENSTREETMAP (OSM) and typical GEOGRAPHICAL INFORMATION SYSTEMS (GIS)?
8. Results (1): Spatial cognition: Geographic Information
Systems (GIS) and OpenStreetMap (OSM)
The big difference for spatial cognition
# Geographic Information Systems (GIS) work with LAYERS
# OpenStreetMap (OSM) works with NODES
Impact:
# OSM'S DATABASE architecture ENCOURAGES VOLUNTEERS to recognize and
CONSIDER THE TOPOLOGICAL RELATIONS WITH NEIGHBORHOODS and
NOT TO LIMIT THEIR GEOGRAPHICAL REASONING
TO JUST ONE THEMATIC LAYER (class of data).
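The contrast between layers and nodes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class is a simplified stand-in for an OSM node (a point with free-form key/value tags), and the coordinates, layer names, and the `nearby` helper are hypothetical, not taken from the paper or from any OSM library.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Simplified OSM-style node: one point with free-form key/value tags."""
    node_id: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)


# Layer-based view (typical GIS): features are stored per thematic layer.
# All data below is hypothetical.
layers = {
    "water": [Node(1, -23.561, -46.731, {"natural": "spring"})],
    "waste": [Node(2, -23.562, -46.730, {"amenity": "waste_disposal"})],
}

# Node-based view (OSM-like): all features share one pool; themes are tags.
pool = [node for layer in layers.values() for node in layer]


def nearby(nodes, center, max_deg=0.005):
    """Return nodes inside a small lat/lon box around center, whatever their theme."""
    return [
        n for n in nodes
        if n is not center
        and abs(n.lat - center.lat) <= max_deg
        and abs(n.lon - center.lon) <= max_deg
    ]


spring = pool[0]
# Searching only the "water" layer misses the waste site next to the spring ...
same_layer = nearby(layers["water"], spring)
# ... while searching the shared pool finds neighbors across all themes.
neighbors = nearby(pool, spring)
themes = sorted(key for n in neighbors for key in n.tags)
print(len(same_layer), themes)
```

In the pooled view the neighborhood query crosses thematic boundaries by default, which is the effect on geographical reasoning that the slide attributes to OSM's architecture.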
9. Challenges: Standards – it's all about sharing
1. Not everyone is WILLING to link and share;
2. Not every standard FITS BEST everywhere;
3. COMPETITION between OVERLAPPING standards is frequent;
4. Many individuals (natural and legal persons) tend to create “their own
standard”: INDIVIDUALISM
5. The same thing may be described perfectly with different scientific
vocabularies – REPRESENTING DIFFERENT POINTS OF VIEW
10. Challenges: Standards – it's all about sharing
If we are ABLE AND WILLING to SHARE,
our CONTRIBUTION (of data / information / knowledge / services …)
must be somehow INTEROPERABLE with our TARGET (database, etc.).
11. Part (2) - Research question
Looking for interoperability:
Why controlled vocabularies?
12. Part (2) - Approach: Semantic interoperability
To achieve SEMANTIC INTEROPERABILITY
➢ our contribution must have a characteristic in common with our target;
➢ many people use the GEOGRAPHICAL LOCATION for this integration.
13. Part (2) - Approach: Local spatial cognition
What is special about GEOGRAPHIC COORDINATES?
1) Yes, the definition of a GEOGRAPHIC LOCATION may serve as an attribute
of a particular spatial feature, but, in reality, it is also much more:
2) Geographic coordinates also define “thousands” of GEO-RELATED
CONDITIONS in their local "neighborhood": natural, political, social, etc.
3) Geographic coordinates are therefore an excellent resource for
INTEGRATION OF LOCAL KNOWLEDGE contributed by “local” citizen
scientists.
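As a minimal sketch of how coordinates can tie a contribution to its local neighborhood, the following Python snippet selects all features within a given radius of an observation using the haversine great-circle distance. The feature names, coordinates, and the 1 km radius are hypothetical illustrations, not values from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def neighborhood(observation, features, radius_km=1.0):
    """Return the features lying within radius_km of the observation."""
    return [
        f for f in features
        if haversine_km(observation["lat"], observation["lon"],
                        f["lat"], f["lon"]) <= radius_km
    ]


# Hypothetical citizen observation and hypothetical reference features.
observation = {"lat": -23.5610, "lon": -46.7310}
features = [
    {"name": "river", "lat": -23.5620, "lon": -46.7300},
    {"name": "park", "lat": -23.6000, "lon": -46.7000},
]

local = neighborhood(observation, features)
print([f["name"] for f in local])
```

Every feature returned this way carries its own attributes (natural, political, social), so a single coordinate pair links the contribution to many geo-related conditions at once.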
14. Part (2) - Approach: Join by attribute-based
semantic mapping for data integration
But besides LOCATION, there are many OTHER ATTRIBUTES that can be
explored to CHARACTERIZE THE NEIGHBORHOODS of our (e.g. data)
contribution, e.g.,
➢ Time,
➢ Topology,
➢ Taxonomy,
➢ Provenance, and
➢ Knowledge definitions (ontological models), etc.
15. Part (2) - Results: Need of Controlled Vocabularies
Result:
When INTEGRATING OUR CONTRIBUTION (e.g. data)
with an ALREADY EXISTING DATABASE (target)
we generally have MANY OPTIONS for choosing
an appropriate ATTRIBUTE FOR JOINING.
But
in any case we need a CONTROLLED VOCABULARY!
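The role of a controlled vocabulary in an attribute join can be sketched as follows. This is an illustrative assumption, not the method of the paper: the vocabulary, the `feature_type` attribute, and all records are hypothetical. Free-form terms are first mapped to a preferred term, and only then joined against the target.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted variants.
CONTROLLED_VOCABULARY = {
    "river": {"river", "rio"},
    "landfill": {"landfill", "dump"},
}


def normalize(term):
    """Map a free-form term to its preferred term, or None if it is unknown."""
    cleaned = term.strip().lower()
    for preferred, variants in CONTROLLED_VOCABULARY.items():
        if cleaned in variants:
            return preferred
    return None


def join_by_attribute(contributions, target, key="feature_type"):
    """Join contributions to target rows on a vocabulary-normalized attribute."""
    index = {}
    for row in target:
        index.setdefault(row[key], []).append(row)
    joined, unmatched = [], []
    for contribution in contributions:
        preferred = normalize(contribution[key])
        if preferred is None or preferred not in index:
            unmatched.append(contribution)  # kept for review, not discarded
            continue
        for row in index[preferred]:
            joined.append((contribution, row))
    return joined, unmatched


# Hypothetical volunteered contributions and hypothetical target database rows.
contributions = [
    {"feature_type": " Rio ", "note": "water looks brown"},
    {"feature_type": "Dump", "note": "new waste pile"},
    {"feature_type": "creek", "note": "dry since May"},
]
target = [
    {"feature_type": "river", "id": "T1"},
    {"feature_type": "landfill", "id": "T2"},
]

joined, unmatched = join_by_attribute(contributions, target)
print(len(joined), [c["feature_type"] for c in unmatched])
```

Without the normalization step, " Rio " and "Dump" would not match any target row, whichever attribute was chosen for the join.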
16. Part (2) - Results: Use of Controlled Vocabularies
> If scientists are working in KNOWLEDGE DOMAINS
>> they discuss their ideas with CONCEPTS defined by TERMS
>>> they describe DOMAIN KNOWLEDGE with its
DOMAIN VOCABULARIES.
>>>> These terms must, of course, be well defined
>>>>> they are using a CONTROLLED VOCABULARY:
COMMUNICATION (between humans or machines) works only if we use such
controlled vocabularies.
17. Part (3) - Research question
Why are Controlled Vocabularies particularly
important and difficult to apply
in Citizen Science?
18. Part (3) - Results:
Huge increase in data production capacity
A significant NEW CHALLENGE arises from the
HUGE INCREASE in the data production capacity
of VARIED DEVICES (cellphones and many others),
which has an impact
on the data sharing challenge,
the PRACTICES FOR SCIENTIFIC RESEARCH,
and the modern, ontology-based
METHODS FOR THE DEFINITION OF KNOWLEDGE.
19. Part (3) - Results: Use of Flat Vocabularies to
represent Knowledge
Based on former work we argue in our paper (DOI 10.1007/978-3-319-60642-2_39) that:
➢ CONTROLLED DOMAIN VOCABULARIES (CV) may represent a DOMAIN
OF KNOWLEDGE, similar to a DOMAIN ONTOLOGY.
➢ The reason for this is that even FLAT VOCABULARIES, composed of terms
with TAXONOMIC RELATIONS, in reality, may represent so-called
LIGHTWEIGHT ONTOLOGIES.
➢ They can be successfully used to REPRESENT KNOWLEDGE in the domain
of interest.
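A flat vocabulary with taxonomic relations can be sketched as a simple mapping from each term to its broader term. The snippet below is a minimal illustration, assuming a small extract of the biological classification; the `BROADER` table and the helper names are hypothetical and not part of any standard.

```python
# Flat vocabulary with one taxonomic relation: term -> broader term.
# Small illustrative extract of a biological classification.
BROADER = {
    "Panthera onca": "Panthera",
    "Panthera": "Felidae",
    "Felidae": "Carnivora",
    "Carnivora": "Mammalia",
}


def ancestors(term, broader=BROADER):
    """Return all broader terms of term, nearest first."""
    chain = []
    seen = {term}
    while term in broader:
        term = broader[term]
        if term in seen:  # guard against accidental cycles in the vocabulary
            raise ValueError(f"cycle detected at {term!r}")
        seen.add(term)
        chain.append(term)
    return chain


def is_a(term, candidate):
    """True if candidate equals term or is one of its broader terms."""
    return term == candidate or candidate in ancestors(term)


print(ancestors("Panthera onca"))
print(is_a("Panthera onca", "Felidae"))
```

Even this single "broader than" relation supports simple inference, for example answering a query for Felidae with observations recorded at species level, which is what lets a flat vocabulary act as a lightweight ontology.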
20. Part (3) - Results: Why are there difficulties in using
Standards in Community-driven Data Collections?
Based on former work and on Wikidata's and OSM's talk pages we argue in our
paper (DOI 10.1007/978-3-319-60642-2_39) that:
➢ For WIKIDATA AND OSM to support computing applications most
effectively, their structured data must have a HIGH DEGREE OF
STANDARDIZATION.
➢ There is an INHERENT TENSION between the STANDARDIZATION
needs of structured data and the ethos of CONTRIBUTOR FREEDOM.
➢ Wikipedia tempers contributor freedom with a set of policies (such as “Neutral
Point of View”) that are strictly enforced by the community.
➢ OSM’s “Good Practice” says “Nobody is forced to obey (the OSM guidelines),
nor will OSM ever force any of its mappers to do anything.”
21. Conclusion: How Does the Use of Standardized Data
Affect the Contribution Made by CS?
Controlled vocabularies can be applied in similar ways to domain ontologies.
Applying controlled vocabularies means
➢ EXPLORING AN ALREADY DEVELOPED, ACCUMULATED AND
MAINTAINED DOMAIN KNOWLEDGE,
which, in many cases, is supported by a
➢ HIGHLY SOPHISTICATED EDUCATIONAL INFRASTRUCTURE,
as in the case of our two examples, “Dublin Core” and “Darwin Core”.
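How a standardized vocabulary meets a volunteered record can be sketched as follows. The term names are a small subset of Darwin Core; the record itself, the `smell` field, and the `split_record` helper are hypothetical. The sketch keeps non-standard fields instead of rejecting them, which is one possible way to respect contributor freedom, not a procedure prescribed by the paper.

```python
# Small subset of Darwin Core term names (the full standard defines many more).
DARWIN_CORE_SUBSET = {
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
    "recordedBy",
}


def split_record(record, vocabulary=DARWIN_CORE_SUBSET):
    """Separate fields named by the vocabulary from free-form contributor fields."""
    standard = {k: v for k, v in record.items() if k in vocabulary}
    free_form = {k: v for k, v in record.items() if k not in vocabulary}
    return standard, free_form


# Hypothetical volunteered observation.
record = {
    "occurrenceID": "example-0001",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Turdus rufiventris",
    "eventDate": "2017-07-21",
    "decimalLatitude": -23.561,
    "decimalLongitude": -46.731,
    "smell": "strong",  # free-form field outside the vocabulary
}

standard, free_form = split_record(record)
print(sorted(standard), free_form)
```

The standardized part can be exchanged with institutional databases that use the same terms, while the free-form part remains available for later mapping or community discussion.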
22. Werner Leyh
https://wiki.osgeo.org/wiki/User:WernerLeyh
Grupo de Pesquisa CNPq/USP
INFRAESTRUTURA DE DADOS ESPACIAIS (GEPIDE)
http://dgp.cnpq.br/buscaoperacional/detalhegrupo.jsp?grupo=0067107HRY8KT0
Questions?
Interested in linking Wikidata, OpenStreetMap and
scientific datasets?
Join us!
24. Annex: Citizen Science (CS): A Volunteered
Geographic Information (VGI) Subset
Haklay
➢ defines CS as the SCIENTIFIC ACTIVITIES IN WHICH
NON-PROFESSIONAL SCIENTISTS VOLUNTEER TO PARTICIPATE in data
collection, analysis and the dissemination of a scientific project and
➢ argues that NOT ALL CS IS GEOGRAPHIC, that is, not every CS project
gives a particular location on Earth a key role; but
➢ WITHIN VGI PRACTICES, THERE IS A SUBSET THAT
FALLS INTO THE CATEGORY OF CS.
25. Annex: Lightweight (domain) ontology
SCIENTIFIC VOCABULARIES composed of terms with TAXONOMIC
RELATIONS:
▪ biological life classification;
▪ family classification;
▪ astronomical classification;
such vocabularies may represent the KNOWLEDGE of a DOMAIN
such vocabularies may therefore be CONSIDERED and USED as a lightweight
(domain) ontology
… as an excellent REFERENCE for citizen science