2. Primary Biodiversity Data
Information directly collected in the field
What has been collected where
When is also important for multiple uses
Types of PBD - occurrences
Museum specimens
Field observations
Fossils, literature, germplasm…
DEFINITIONS - PBD
5. Precision
Closeness of repeated measurements to the same value
Accuracy
Closeness of a measurement to the true value
Accuracy depends on knowing the true value of the variable
Precision is an intrinsic property of the measurements
DEFINITIONS – PRECISION AND ACCURACY
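The two definitions above can be illustrated with a minimal sketch. It assumes precision is summarised as the sample standard deviation of repeated measurements and accuracy as the distance between their mean and the true value; the function names and the readings are hypothetical, not part of the original slides.

```python
from statistics import mean, stdev


def precision_spread(measurements):
    """Spread of repeated measurements (sample standard deviation).

    A smaller spread means higher precision. No true value is needed.
    """
    return stdev(measurements)


def accuracy_error(measurements, true_value):
    """Absolute distance between the mean measurement and the true value.

    A smaller error means higher accuracy. Requires knowing the true value.
    """
    return abs(mean(measurements) - true_value)


# Hypothetical repeated latitude readings of a point at latitude 45.3400
true_lat = 45.3400
tight_but_biased = [45.3501, 45.3502, 45.3500, 45.3501]  # precise, inaccurate
scattered_but_centred = [45.33, 45.35, 45.34, 45.34]     # less precise, accurate
```

Note that only `accuracy_error` takes the true value as an argument, which mirrors the point that precision is intrinsic while accuracy is not.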
6. Geospatial data best representation: coordinates
Standard way of reporting coordinates: decimal degree
Precision in geospatial data ~ precision in coordinates
Geospatial information in several formats
GEOSPATIAL DATA - PRECISION
Example: -1.2, 36.8 (low precision) vs. -1.2219, 36.8967 (higher precision)
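To make the link between decimal places and precision concrete, here is a rough sketch that estimates the ground distance represented by the last reported decimal. It assumes about 111.32 km per degree of latitude and that coordinates arrive as text (so trailing zeros have not been dropped); the function names are illustrative only.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude


def decimal_places(value: str) -> int:
    """Count the digits after the decimal point of a coordinate given as text."""
    _, _, fraction = value.strip().partition(".")
    return len(fraction)


def approx_uncertainty_km(lat: str, lon: str) -> float:
    """Rough size, in km, of one unit of the last reported decimal place."""
    places = min(decimal_places(lat), decimal_places(lon))
    step_deg = 10 ** -places
    lat_km = step_deg * KM_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude
    lon_km = lat_km * math.cos(math.radians(float(lat)))
    return max(lat_km, lon_km)
```

Under these assumptions, `approx_uncertainty_km("-1.2", "36.8")` comes out at roughly 11 km, while `approx_uncertainty_km("-1.2219", "36.8967")` is roughly 11 m.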
7. GEOSPATIAL DATA - PRECISION
55.932576, 13.132359
Anahuac NWR (UTC 049)
Grandville
POINT(-1.3223333 53.44958)
Marine Nature Study Area
78º 47’ 52” S; 35º 50’ 31” E
Stewart Park
POINT(-1.1735004 53.358746)
Backyard
My Habitat
55.932576, 13.132359
Wilderness Park, north of 14th St.
28054
Delaney Conservation Area
57.3, 11.9
…
8. Geospatial data best representation: coordinates
Low precision in the original data cannot be fixed a posteriori
GEOSPATIAL DATA - PRECISION
9. Sometimes, low precision data is encouraged
Endangered species
Commercially interesting species
…
Sensitive data should not be directly available in high-resolution (precise) format
Good practice: publish low-precision information but keep the original high-precision data
Level of imprecision depends on level of threat
GEOSPATIAL DATA - PRECISION
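The good practice above could be sketched as follows: the public copy is generalised by rounding, and the original record is never modified. The threat levels and the number of decimals kept for each are hypothetical values chosen for illustration, not a published standard.

```python
from dataclasses import dataclass

# Hypothetical mapping: higher threat level, fewer decimals in the public copy
PUBLIC_DECIMALS = {"low": 2, "medium": 1, "high": 0}


@dataclass(frozen=True)
class Occurrence:
    species: str
    lat: float
    lon: float


def public_view(record: Occurrence, threat: str) -> Occurrence:
    """Return a generalised copy for publication; the original stays untouched."""
    decimals = PUBLIC_DECIMALS[threat]
    return Occurrence(
        record.species,
        round(record.lat, decimals),
        round(record.lon, decimals),
    )


original = Occurrence("Example species", -1.2219, 36.8967)
published = public_view(original, "medium")
```

Because `Occurrence` is frozen, the high-precision record cannot be overwritten by accident; the generalised values live only in the copy.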
10. Low precision = reduced usability of the data
Low accuracy = wrong conclusions if used without caution
Causes:
Malfunction of devices
Wrong interpretation in transformations between systems
Issues in digitization
GEOSPATIAL DATA - ACCURACY
11. Transformations are prone to errors if not handled carefully
Wrong formula when converting DMS to DD
Wrong datum when converting UTM to DD
Issues at the time of digitization
Transposed coordinates: 45.34, -9.16 => -9.16, 45.34
Forgotten minus sign: 45.34, -9.16 => 45.34, 9.16
Comma used instead of period: 45.34, -9.16 => 45,34, -9,16
Coordinates turned into zeros: 45.34, -9.16 => 0, 0
…
Some methods could reduce precision to gain accuracy
GEOSPATIAL DATA - ACCURACY
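As a sketch of the points above, the following shows the usual DMS to DD formula together with a few basic sanity flags. The flag names are hypothetical. The checks are deliberately limited: a transposition such as 45.34, -9.16 => -9.16, 45.34 stays within valid ranges and cannot be caught this way.

```python
def dms_to_dd(degrees: float, minutes: float, seconds: float, hemisphere: str) -> float:
    """Convert degrees, minutes, seconds and a hemisphere letter to decimal degrees."""
    dd = abs(degrees) + minutes / 60 + seconds / 3600
    # South and West are negative in decimal degrees
    return -dd if hemisphere.upper() in ("S", "W") else dd


def coordinate_flags(lat: float, lon: float) -> list[str]:
    """Return simple warning flags for a latitude/longitude pair."""
    flags = []
    if lat == 0 and lon == 0:
        flags.append("zero_coordinates")
    if abs(lat) > 90 and abs(lon) <= 90:
        # Latitude is out of range but swapping the pair would be valid
        flags.append("possibly_transposed")
    elif abs(lat) > 90 or abs(lon) > 180:
        flags.append("out_of_range")
    return flags
```

For example, `dms_to_dd(78, 47, 52, "S")` gives approximately -78.7978. Records that pass these checks can still be wrong, so the flags are a first filter rather than a validation.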
12. Precision as completeness of the higher taxonomic levels
Depends on the lowest taxonomic rank that has information
Lowest level = genus: fairly precise, broader usability
Lowest level = class: imprecise, narrower usability
Threshold depends on several factors:
Scope of the analyses
Taxonomic group
TAXONOMIC DATA - PRECISION
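A minimal sketch of this idea: taxonomic precision is taken as the most specific rank that carries a value. The rank list and the dictionary keys are assumptions made for illustration; real datasets may use other field names.

```python
from typing import Optional

# Ranks from least to most specific (hypothetical field names)
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def lowest_rank(record: dict) -> Optional[str]:
    """Return the most specific rank with a non-empty value, or None."""
    for rank in reversed(RANKS):
        if (record.get(rank) or "").strip():
            return rank
    return None


def precise_enough(record: dict, required_rank: str) -> bool:
    """True if the record is identified at least down to required_rank."""
    found = lowest_rank(record)
    return found is not None and RANKS.index(found) >= RANKS.index(required_rank)
```

The `required_rank` argument stands in for the threshold mentioned above, which depends on the scope of the analysis and on the taxonomic group.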
13. Inaccuracy is mainly due to one of two causes:
Use of a wrong taxonomy
Inaccuracies in the identification
Taxonomic hierarchies are subjective and different taxonomies exist
With poor or incomplete data, how far can the identification be relied on?
“Taxonomic assessments” section
TAXONOMIC DATA - ACCURACY
14. Wrong identification of organism, due to:
Lack of expertise
Bad identification environment
Expert curation needed to improve reliability of identification
Annotations, flags, reviews… do not prevent the issue, but help resolve it
Different PBD types = different reliability
Museum specimens – reviewable, higher reliability
Field observations – reliability depends on expertise, non-reviewable
TAXONOMIC DATA - ACCURACY
15. Precision refers to the degree of completion of the date elements
Darwin Core standard recommends ISO 8601
Wide range of date formats
Canonical: YYYY-MM-DD
Reduced formats: YYYY-MM, YYYY, CC
Problems:
Usability of low-precision dates
Ambiguity of some formats: 19 = 1919 / XIX / 2019?
Solution relies on solid date parsers or human interpretation
Parsers: Hard to build
Humans: get overwhelmed too easily
TEMPORAL DATA - PRECISION
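The canonical and reduced formats listed above can be told apart by shape alone, as in this small sketch. It only checks the pattern, not whether the date is real, and it reads a bare two-digit value as a century, which is exactly the kind of assumption that makes "19" ambiguous. The labels are hypothetical.

```python
import re
from typing import Optional

# Patterns ordered from most to least precise
_PATTERNS = [
    ("day", re.compile(r"\d{4}-\d{2}-\d{2}")),   # YYYY-MM-DD
    ("month", re.compile(r"\d{4}-\d{2}")),       # YYYY-MM
    ("year", re.compile(r"\d{4}")),              # YYYY
    ("century", re.compile(r"\d{2}")),           # CC (assumed reading)
]


def date_precision(text: str) -> Optional[str]:
    """Return the precision label of a date string, or None if unrecognised."""
    candidate = text.strip()
    for label, pattern in _PATTERNS:
        if pattern.fullmatch(candidate):
            return label
    return None
```

Anything that returns `None`, such as "19/02/1996", would still need a more capable parser or a human decision.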
16. Element swapping
Information in the wrong field
Sometimes self-detectable – 2012-19-02
Best solution: go back to the original data
Misspellings
Date shrink: 1996 => 196
Date change: 1996 => 1986
Again, best solution: go back to the original data
TEMPORAL DATA - ACCURACY
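The self-detectable case above (2012-19-02) can be sketched with the standard library: if the date is invalid as written but valid once month and day are exchanged, a swap is likely. The return labels are hypothetical, and a swap where both values are 12 or less cannot be detected this way.

```python
from datetime import date


def check_iso_date(text: str) -> str:
    """Classify a YYYY-MM-DD string as 'valid', 'possible_day_month_swap' or 'invalid'."""
    try:
        year, month, day = (int(part) for part in text.strip().split("-"))
    except ValueError:
        # Wrong number of elements or non-numeric content
        return "invalid"
    try:
        date(year, month, day)
        return "valid"
    except ValueError:
        pass
    try:
        # Try again with month and day exchanged
        date(year, day, month)
        return "possible_day_month_swap"
    except ValueError:
        return "invalid"
```

A flag only points at a suspect record; as the slide says, the reliable fix is still to go back to the original data.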
17. Low precision – reduces the range of possible uses
Still usable for many applications
Geospatial – regional or national checklists
Taxonomic – large group assessments
Temporal – large-scale assessments, such as climate change
Still, a minimum precision is required
Low accuracy – can lead to wrong conclusions
Reducing precision can mask inaccuracies
Inaccuracies might be hard to spot
Collating accurate and inaccurate data helps error detection
GENERAL IMPLICATIONS
19. A little is better than nothing
Absence of data could be seen as better than wrong data
Vague and/or wrong data, together with good data, can help in detecting issues
CONCLUSION