2. ¡ What is flagging?
§ Adding a piece of information to a record or PBD
§ Giving extra information about something
§ Especially used to highlight records to inform the collector or user
¡ Aims of error flagging:
§ Providing a simple way of filtering records that might be problematic
§ Very useful for automated error processing
§ Reporting issues to the owner
¡ Difference between flagging and resolving:
§ Ownership
INTRODUCTION – ERROR FLAGGING
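A minimal Python sketch of the idea, assuming records are plain dictionaries with Darwin Core coordinate fields. The `flags` field and the `MISSING_COORDINATES` label are hypothetical. The check adds information to each record without touching the original values, and the new field then makes filtering trivial.

```python
from typing import Dict, List

Record = Dict[str, object]


def flag_missing_coordinates(records: List[Record]) -> List[Record]:
    """Return copies of the records with a hypothetical 'flags' list added.

    The original fields are left untouched; only new information is attached.
    """
    flagged = []
    for record in records:
        copy = dict(record)  # shallow copy so the input records are not mutated
        flags = []
        if copy.get("decimalLatitude") is None or copy.get("decimalLongitude") is None:
            flags.append("MISSING_COORDINATES")
        copy["flags"] = flags
        flagged.append(copy)
    return flagged


records = [
    {"id": 1, "decimalLatitude": 40.4, "decimalLongitude": -3.7},
    {"id": 2, "decimalLatitude": None, "decimalLongitude": -3.7},
]

annotated = flag_missing_coordinates(records)
# Filtering on the new field: keep only records without flags
clean = [r for r in annotated if not r["flags"]]
print([r["id"] for r in clean])  # prints [1]
```

The input list is never modified, so the flagged copy can be discarded or regenerated at any time without losing the original data.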
3. DATA IS OURS
¡ We are directly responsible for the quality
¡ We may share the master copy of the data
¡ We can directly improve the quality of the data and serve it
DATA IS NOT OURS
¡ We are not directly responsible for the quality
¡ We point to the original source
¡ We cannot directly improve the quality of the data and serve it
INTRODUCTION – OWNERSHIP
4. ¡ Why flag and not resolve? Attribution and persistence
INTRODUCTION – ERROR FLAGGING
5. ¡ Data from an aggregator comes with certain restrictions or conditions
¡ Acknowledge the original source of the data
¡ Each collection might have additional rules
INTRODUCTION – ATTRIBUTION
8. ¡ Persistence of the correction
¡ Corrections made in local copies do not persist
¡ The next researcher must repeat the cleaning process
¡ Error flagging is an excellent tool for reporting issues
¡ Once issues are reported, owners can clean the data
¡ Example of flagging: annotations
INTRODUCTION – PERSISTENCE
9. ¡ Data manipulation: add a piece of information to the original record
¡ New fields, populated if an issue is detected
¡ Recommendation: use (and document) a coding scheme
INTRODUCTION – MECHANISMS
Free-text flag | Coded flag
Coordinates swapped | 1
Swapped coordinates | 1
Coordinates transposed | 1
Coordnates transposed | 1
… | 1
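The comparison above shows why a documented code helps: free-text notes for the same issue drift apart (including typos), while a code stays identical. A small Python sketch, assuming a hypothetical code table in which `1` means swapped coordinates:

```python
# Hypothetical, documented coding scheme: each code has one fixed meaning
FLAG_CODES = {
    1: "Latitude and longitude appear to be swapped",
}

# Free-text notes written by different people for the same issue
free_text = [
    "Coordinates swapped",
    "Swapped coordinates",
    "Coordinates transposed",
    "Coordnates transposed",
]

# The same four records flagged with the documented code instead
coded = [1, 1, 1, 1]

print(len(set(free_text)))  # 4 distinct values for a single issue
print(len(set(coded)))      # 1 distinct value

# Filtering by code is a simple equality test; free text would need fuzzy matching
swapped = [i for i, code in enumerate(coded) if code == 1]
print(FLAG_CODES[1], len(swapped))
```

Documenting the table (here, `FLAG_CODES`) is what lets other users and the data owner interpret the flag without guessing.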
10. ¡ Data Usage Terms
§ Accepted when using the portal
§ Among other things, they require citing the data
¡ Data Sharing Agreement
§ “GBIF Secretariat may cache a copy and serve full or partial data further to other users together with the terms and conditions for use set by the Data Publisher”
§ Partial: based on quality issues detected in the data
¡ How does GBIF detect issues?
§ Processing routines search for the most common issues
§ Errors are flagged; GBIF cannot alter the data
§ Flags are used to alert users and are reported back to owners
INTRODUCTION – EXAMPLE: GBIF
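A rough Python sketch of such a processing routine. The issue labels are modelled on GBIF-style issue codes (`ZERO_COORDINATE`, `COORDINATE_OUT_OF_RANGE`), but the checks, the record layout and the `interpret` function are assumptions for illustration, not GBIF's actual pipeline. The routine only attaches an `issues` list; it never rewrites the coordinates.

```python
from typing import Dict, List


def interpret(record: Dict) -> Dict:
    """Attach issue flags to a copy of the record without altering its values."""
    result = dict(record)
    issues: List[str] = []
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is not None and lon is not None:
        # 0,0 is a common placeholder rather than a real locality
        if lat == 0 and lon == 0:
            issues.append("ZERO_COORDINATE")
        # Decimal degrees must fall within the valid ranges
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            issues.append("COORDINATE_OUT_OF_RANGE")
    result["issues"] = issues
    return result


raw = [
    {"id": "a", "decimalLatitude": 41.39, "decimalLongitude": 2.17},
    {"id": "b", "decimalLatitude": 0.0, "decimalLongitude": 0.0},
    {"id": "c", "decimalLatitude": 2.17, "decimalLongitude": 241.39},
]

processed = [interpret(r) for r in raw]
for r in processed:
    print(r["id"], r["issues"])
```

Because the values are kept as published, users can decide for themselves whether to exclude flagged records, and the owner receives the flag rather than a silently altered record.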
11. INTRODUCTION – EXAMPLE: GBIF
Example flag: “Coordinates fall outside specified country, territory or island”
12. INTRODUCTION – EXAMPLE: GBIF
138,458 records with coordinates
138,312 records in map
146 records with wrong coordinates
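The three figures are consistent: 138,458 − 138,312 = 146, so the records missing from the map are exactly as many as those with wrong coordinates. A small Python sketch of that split, assuming each record carries a list of issue flags (a hypothetical layout; the toy records and the GBIF-style label are illustrative only):

```python
from typing import Dict, List, Tuple


def split_for_map(records: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate records that can be mapped from those with flagged coordinates."""
    mappable, flagged = [], []
    for record in records:
        if record["issues"]:
            flagged.append(record)
        else:
            mappable.append(record)
    return mappable, flagged


def check_counts(with_coordinates: int, in_map: int, wrong: int) -> bool:
    """True if the records missing from the map match the flagged count."""
    return with_coordinates - in_map == wrong


# Toy data with a hypothetical layout
records = [
    {"id": 1, "issues": []},
    {"id": 2, "issues": ["COUNTRY_COORDINATE_MISMATCH"]},
    {"id": 3, "issues": []},
]
mappable, flagged = split_for_map(records)
print(len(mappable), len(flagged))  # 2 1

# Figures from the slide
print(check_counts(138_458, 138_312, 146))  # True
```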
13. ¡ What happens when errors are flagged?
¡ Flags or annotations should reach the owner
¡ The owner is the only one who can fix issues at the source
¡ Corrected data is then deployed and re-indexed
¡ This has happened often…
INTRODUCTION – RESOLUTION PATH
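For flags to reach owners, they have to be grouped by who can fix them. A minimal Python sketch, assuming each record carries a hypothetical `owner` field (for instance, the publishing institution) and a list of flag codes; the owner names are made up:

```python
from collections import defaultdict
from typing import Dict, List


def build_reports(records: List[Dict]) -> Dict[str, List[str]]:
    """Group flagged record IDs by owner so each owner gets one report."""
    reports: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        if record["flags"]:
            line = f"{record['id']}: {', '.join(record['flags'])}"
            reports[record["owner"]].append(line)
    return dict(reports)


records = [
    {"id": "r1", "owner": "Museum A", "flags": ["ZERO_COORDINATE"]},
    {"id": "r2", "owner": "Museum A", "flags": []},
    {"id": "r3", "owner": "Herbarium B", "flags": ["COORDINATE_OUT_OF_RANGE"]},
]

for owner, lines in build_reports(records).items():
    print(owner)
    for line in lines:
        print("  " + line)
```

Records without flags are left out, so each owner only sees the issues that need action at the source.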
15. ¡ Key factor: awareness and involvement of data owners
§ Some owners correct their data
§ Some owners don’t
¡ Without this step, error flagging loses much of its purpose
INTRODUCTION – RESOLUTION PATH
16. ¡ Error flagging can be applied to several data storage formats
¡ Each format has its own requirements
¡ Formats:
§ Text files: tab-delimited, CSV files…
§ Spreadsheets: LibreOffice Calc, Google Sheets, Microsoft Excel…
§ Database tables
ERROR FLAGGING
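For text files, flagging amounts to writing the original columns back unchanged plus one extra column. A Python sketch using the standard `csv` module; the column names, the in-memory input and the `flag` codes (1 = latitude outside [-90, 90], 0 = no issue detected) are assumptions for illustration:

```python
import csv
import io

# Hypothetical input; in practice this would be read from a file
source = io.StringIO(
    "id,decimalLatitude,decimalLongitude\n"
    "1,40.42,-3.70\n"
    "2,-95.00,10.00\n"
)

reader = csv.DictReader(source)
fieldnames = reader.fieldnames + ["flag"]  # original columns plus one new column

output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
for row in reader:
    lat = float(row["decimalLatitude"])
    # Hypothetical codes: 1 = latitude out of range, 0 = no issue detected
    row["flag"] = 1 if not -90 <= lat <= 90 else 0
    writer.writerow(row)  # original values are written back as read

print(output.getvalue())
```

The original cell values are copied through as text, so the flagged file can still be compared line by line with the source.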
17. ¡ In some respects, the most convenient way of managing data
¡ Semi-structured, visual management of information
§ Rows, columns and cells
§ Not restricted to any specific type of data
§ Plotting records in several ways
¡ Calculations with cells
¡ Some of the most common operations:
ERROR FLAGGING – SPREADSHEETS
24. ¡ Error flagging: the process of reporting issues without modifying the original data
¡ Useful when working with shared data
¡ In spreadsheets
§ Simple, yet powerful
§ Adaptable levels of difficulty
§ Several possibilities to filter and flag records
CONCLUSION