Data Cleaning and Data Publishing Workshop 2013
18-22 February, Nairobi, Kenya
Javier Otegui
@jotegui
TAXONOMIC ASSESSMENTS
• What is Taxonomy?
  ▪ CBD: “Taxonomy is the science of naming, describing and classifying organisms and includes all plants, animals and microorganisms of the world”
  ▪ Using morphological, behavioral, genetic and biochemical observations, taxonomists identify, describe and arrange species into classifications, including those that are new to science.
• Taxonomy is related to:
  ▪ The identification of an organism
  ▪ Placing the organism in context with the rest of living organisms
TAXONOMY – WHAT IS IT?
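• Taxonomy is based on names
• Humans have always given names to organisms
• Binomial nomenclature: each species name combines a genus name and a specific epithet (e.g. Antilocapra americana)
• Names define individuals and groups
• Each name defines a taxon
TAXONOMY – TAXONOMIC NAMES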
• Organization and classification of organisms
• According to common features
• Taxonomic classification
TAXONOMY - HIERARCHIES
http://wp.lps.org/jbenson2/blog/2012/01/18/january-18-taxonomy-chart-lab
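A taxonomic hierarchy can be sketched as an ordered list of ranks plus one name per rank. The snippet below is only an illustration: the rank list is the usual Linnaean set, the pronghorn values follow one commonly used classification, and the function name `lineage` is an assumption, not something from the slides.

```python
# Ranks from most general to most specific (the usual Linnaean ranks).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

# One commonly used classification of the pronghorn (illustrative values).
pronghorn = {
    "kingdom": "Animalia",
    "phylum": "Chordata",
    "class": "Mammalia",
    "order": "Artiodactyla",
    "family": "Antilocapridae",
    "genus": "Antilocapra",
    "species": "Antilocapra americana",
}


def lineage(taxon):
    """Return the names of a taxon from highest to lowest rank, skipping gaps."""
    return [taxon[rank] for rank in RANKS if taxon.get(rank)]


print(" > ".join(lineage(pronghorn)))
```

Keeping the ranks in a fixed order makes it easy to see later which levels of a record are missing.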
• Taxonomy has a strong subjective component
• Classifications depend on the expertise and point of view of the specialist
• Frequent episodes of:
  ▪ Name removals
  ▪ Taxon splits
  ▪ Taxon merges
  ▪ Different arrangements according to different features
• Some cases…
TAXONOMY – NAMES AND TAXONOMIES
• Two different names are applied to the same organism
• An expert argues that two taxa originally described as different are the same
• Generally one name remains valid; the other is considered a synonym and is no longer valid
TAXONOMY - SYNONYMY
Photo: Arthur Chapman
Antilocapra americana Ord, 1815
Antilocapra anteflexa Gray, 1855
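In a dataset, synonymy is often handled with a lookup from each synonym to its accepted name. A minimal sketch, assuming the pronghorn example above; the table `SYNONYMS` and the function `accepted_name` are hypothetical names, not part of any real tool.

```python
# Hypothetical synonym table: synonym -> accepted name (from the slide example).
SYNONYMS = {
    "Antilocapra anteflexa": "Antilocapra americana",
}


def accepted_name(name):
    """Return the accepted name for a synonym, or the name itself if not listed."""
    return SYNONYMS.get(name, name)


print(accepted_name("Antilocapra anteflexa"))
print(accepted_name("Antilocapra americana"))
```

Names that are not in the table pass through unchanged, so the lookup never invents a correction.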
• The same name is applied to two different organisms
• A new description uses an “already taken” name
• Generally, the oldest name prevails and the newest has to change
TAXONOMY - HOMONYMY
Echidna Cuvier, 1797
Echidna Forster, 1777
Photo: David R
Photo: Petr Baum
Echidna Cuvier, 1797 (the newer homonym) was replaced by Tachyglossus Illiger, 1811
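Because homonyms share the same string, the name alone cannot be used as a lookup key; the authorship has to be part of it. A minimal sketch using the Echidna example; the dictionary layout and the function `current_name` are assumptions made for illustration.

```python
# Homonyms share the name string, so the key is (name, authorship).
GENERA = {
    ("Echidna", "Forster, 1777"): {"status": "valid", "current": "Echidna"},
    ("Echidna", "Cuvier, 1797"): {"status": "junior homonym", "current": "Tachyglossus"},
}


def current_name(name, authorship):
    """Return the name currently in use for a given name + authorship pair."""
    return GENERA[(name, authorship)]["current"]


print(current_name("Echidna", "Forster, 1777"))
print(current_name("Echidna", "Cuvier, 1797"))
```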
• Taxonomic classifications are subjective
• They are based on common features
• Different experts select different features
• Scientific names might remain the same
• Higher-level taxa or groups might differ
• See example…
TAXONOMY – ALTERNATE CLASSIFICATIONS
• Issues with names mean that taxonomic names alone are not enough to be effective
• New term: taxon concept
• Name: a concatenation of characters
• Concept: name + context
• Even if the name is the same, the concept is different when it applies to different organisms
TAXONOMY – NAME VS CONCEPT
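The difference between a name and a concept can be sketched with a small data class: the name is a plain string, the concept is that string plus its context. The class `TaxonConcept`, the placeholder name "Aus bus" and both checklist sources are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    name: str          # the name: a plain string of characters
    according_to: str  # the context: the source that circumscribes the taxon


# Same name used by two invented sources with different circumscriptions.
a = TaxonConcept("Aus bus", "Checklist A (hypothetical)")
b = TaxonConcept("Aus bus", "Checklist B (hypothetical)")

print(a.name == b.name)  # the names are identical
print(a == b)            # the concepts are not
```

Comparing only `name` would treat the two records as the same taxon; comparing the whole concept keeps them apart.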
TAXONOMY - STANDARDS
• Taxonomic names: scientific name and all higher taxa
• Taxon concept: taxonConceptID, nameAccordingTo, namePublishedIn…
nameAccordingTo: the source in which the specific taxon concept circumscription is defined or implied
For taxa that result from identifications, a reference to the keys, monographs, experts and other sources should be given
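The terms above are Darwin Core terms, so a record that carries both the name and the concept could look like the sketch below. The field names are real Darwin Core terms; the values of `nameAccordingTo` and `taxonConceptID` are invented for illustration, and `concept_terms_present` is a hypothetical helper.

```python
# A record using Darwin Core term names. Values marked "invented" are
# placeholders, not real identifiers or publications.
record = {
    "scientificName": "Antilocapra americana Ord, 1815",
    "kingdom": "Animalia",
    "class": "Mammalia",
    "family": "Antilocapridae",
    "genus": "Antilocapra",
    "nameAccordingTo": "Hypothetical Mammal Checklist, 2013",  # invented source
    "taxonConceptID": "urn:example:concept:1",                 # invented identifier
}

CONCEPT_TERMS = ("taxonConceptID", "nameAccordingTo", "namePublishedIn")


def concept_terms_present(rec):
    """List which taxon-concept terms are filled in for a record."""
    return [term for term in CONCEPT_TERMS if rec.get(term)]


print(concept_terms_present(record))
```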
• One of the most common issues
• Random alteration of one or more characters in a name
• Possible causes:
  ▪ Purely accidental
  ▪ Limited knowledge of the name
• Tend to appear at the time of digitization
NOISE - MISSPELLINGS
NOISE - MISSPELLINGS
Photo: Barracuda1983
Correct: Pipistrellus
Misspelled: Pipistrelus, Pippistrellus, Pipistrella, Pippistrela, …
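Misspellings like these can often be caught by fuzzy matching against a controlled list. The sketch below uses `difflib.get_close_matches` from the Python standard library; the list `CONTROLLED`, the function `suggest` and the 0.7 similarity cutoff are assumptions that would need tuning on real data.

```python
import difflib

# Hypothetical controlled list of accepted genus names.
CONTROLLED = ["Pipistrellus", "Myotis", "Eptesicus"]


def suggest(name, cutoff=0.7):
    """Return the closest controlled name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(name, CONTROLLED, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for variant in ["Pipistrelus", "Pippistrellus", "Pipistrella", "Pippistrela"]:
    print(variant, "->", suggest(variant))
```

A suggestion is only a candidate correction: a close match can still be a different valid name, so suggestions should be reviewed before they are applied.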
• Misidentification
  ▪ A more obscure type of error
  ▪ A taxon is wrongly identified
  ▪ Can only be solved through close examination by an expert taxonomist
  ▪ Might not be resolvable at all
• Emptiness
  ▪ Seriousness depends on which level or levels are missing
  ▪ Importance decreases as taxonomic rank increases
  ▪ Scientific name missing?
  ▪ Special cases: homonymies, synonymies…
NOISE – MISIDENTIFICATIONS & EMPTINESS
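Emptiness can be detected mechanically, even though misidentification cannot. A minimal sketch that lists the empty levels of a record, most specific first, so the most serious gaps come first; the field list and the function `missing_ranks` are assumptions.

```python
# Fields checked, from most general to most specific.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "scientificName"]


def missing_ranks(record):
    """Return the empty levels of a record, most specific (most serious) first."""
    return [r for r in reversed(RANKS) if not str(record.get(r) or "").strip()]


record = {
    "scientificName": "Pipistrellus pipistrellus",
    "genus": "Pipistrellus",
    "family": " ",  # whitespace only counts as empty
}
print(missing_ranks(record))
```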
• Not defining the taxonomy used
  ▪ Can have the same effect as having only the scientific name
  ▪ We might complete the hierarchy, but how reliable would it be?
  ▪ Provide the taxonomy employed (taxonomic concept)
  ▪ Use identification qualifiers: “Sensu Otegui, 2013”, or “Sensu Biologia Centrali Americana”
• Synonymies and homonymies
  ▪ Again, background information (metadata, taxonomic concept) is needed
  ▪ Use identification qualifiers
NOISE – NATURE OF TAXONOMY
• Taxonomic identifications are unstable
• Background information helps greatly
• Keeping a record of the source of each change also helps
NOISE – NATURE OF TAXONOMY
• Aims of taxonomic assessments
  ▪ Correct issues
  ▪ Reconcile taxonomies
  ▪ Complete hierarchies
• Basic general process: controlled name list
  ▪ Take a name
  ▪ Check whether it exists in a reliable list of names
  ▪ Extract related information
  ▪ Apply it to our dataset
ASSESSMENTS
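The four steps of the basic process can be sketched as a single function. This is only an illustration under stated assumptions: `REFERENCE` stands in for a reliable list of names, and the function `assess` and the `matched` flag are hypothetical.

```python
# Hypothetical controlled list: accepted name -> related higher taxonomy.
REFERENCE = {
    "Antilocapra americana": {"family": "Antilocapridae", "order": "Artiodactyla"},
}


def assess(record):
    """Look a record's name up in the reference list and fill in what is found."""
    name = (record.get("scientificName") or "").strip()  # 1. take a name
    info = REFERENCE.get(name)                            # 2. check the list
    merged = dict(record)
    merged["matched"] = info is not None
    if info is None:
        return merged                                     # unmatched: leave as is
    for field, value in info.items():                     # 3. extract information
        if not merged.get(field):                         # 4. apply, keeping
            merged[field] = value                         #    existing values
    return merged


print(assess({"scientificName": "Antilocapra americana", "family": ""}))
print(assess({"scientificName": "Unknown name"}))
```

Existing non-empty values are never overwritten, and unmatched records are returned unchanged apart from the flag, so they can be reviewed later.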
• General databases
  ▪ Ideally, global high-quality information
  ▪ Not complete
  ▪ Rely on taxon-specific sources and their completeness
• Thematic databases and regional checklists
  ▪ Useful if our collection is taxon-specific or location-specific
  ▪ Gather all available knowledge on their topic
  ▪ Reliable authoritative sources
• Taxonomic literature
  ▪ Most specific source
  ▪ Very high reliability
  ▪ Relevant literature is hard to retrieve
  ▪ Some processing needed
ASSESSMENTS – SOURCES OF DATA
• Free of misspellings
  ▪ From the start (ab initio), or at least reduced to the minimum
  ▪ Some tools (Refine, Excel processing…) help accomplish this
  ▪ Taxonomic reconciliation depends on this requirement
• Completeness
  ▪ At least to a certain point
  ▪ The minimum is the scientific name
  ▪ But the scientific name alone might not be enough
• Helpful metadata
  ▪ Not related to the organism, but to the process of identification
  ▪ The person who made the identification, the taxonomic classification used
ASSESSMENTS - REQUIREMENTS
• Manual
  ▪ Removing inconsistencies, updating wrong information
  ▪ Taxonomy is an interpretation of explicit and implicit knowledge
  ▪ Explicit knowledge: records
  ▪ Implicit knowledge: human deduction
  ▪ Machines are not good at interpreting implicit knowledge
  ▪ Prone to errors; an automated approach is recommended
• Automatic
  ▪ Large amounts of data
  ▪ Repetitive tasks
  ▪ Removal of misspellings, checking against a source, updating
  ▪ Works only with explicit knowledge, so explicit metadata is mandatory
ASSESSMENTS - METHODS
ASSESSMENTS - SEQUENCE
• After cleaning, validate the output
• Check:
  ▪ The data that has been corrected
  ▪ The data that could not be corrected
  ▪ The data that might have become worse
• Taxonomic validation:
  ▪ Requires expertise
  ▪ Mixture of explicit and implicit knowledge
  ▪ Not completely automatable
• If assessments fail:
  ▪ Our data: document and report reliability
  ▪ Distributed data: flag and report
VALIDATION
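The three checks can be partly automated by comparing names before and after cleaning against the reference list. A minimal sketch; the function `validate`, the report keys and the sample names are hypothetical. It only tests membership in the list, so a valid name changed into a different valid name would go unnoticed and still needs expert review.

```python
def validate(before, after, reference):
    """Sort cleaned names into the three groups the slide asks us to check."""
    report = {"corrected": [], "not_corrected": [], "worsened": []}
    for old, new in zip(before, after):
        was_valid, is_valid = old in reference, new in reference
        if not was_valid and is_valid:
            report["corrected"].append((old, new))
        elif not is_valid:
            # Either never fixed, or a valid name was broken by the cleaning.
            key = "worsened" if was_valid else "not_corrected"
            report[key].append((old, new))
    return report


reference = {"Pipistrellus", "Myotis"}          # hypothetical reference list
before = ["Pipistrelus", "Myotis", "Xyzus"]     # names before cleaning
after = ["Pipistrellus", "Myotys", "Xyzus"]     # names after cleaning
report = validate(before, after, reference)
print(report)
```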
