HIGHLIGHTING FITNESS-FOR-USE OF
PUBLISHED BIODIVERSITY DATA
Javier Otegui, Arturo H. Ariño
University of Navarra
JAVIER OTEGUI & ARTURO H. ARIÑO: HIGHLIGHTING FITNESS-FOR-USE OF PUBLISHED BIODIVERSITY DATA. TDWG2012, BEIJING, 22-X-2012
PUBLISHING DATA
• Data published in papers
• Data papers published
• Data published
WHAT, WHERE, WHEN
OUR TARGET DATA
Primary Biodiversity Data Record
PBR
WHAT, WHERE, WHEN
PBR
Megaptera novaeangliae
Adult female, live
Off North Truro, MA, USA
42.101 N, 70.169 W
2010.09.29 21:47 GMT
Arturo H. Ariño
Aboard Dolphin VI
Canon Eos 450D, 200 mm lens
GBIF GEOREFERENCED DATA
237,348,923 animal data records by Oct. 2012 (total georeferenced records: 327,048,532)
THE CASE FOR DIRECT DATA PUBLICATION
• Access to massive data is increasingly common: GBIF
• The spectrum of possible uses is widening: new science, new paradigms
• Data-Intensive Science
– Reliance on good data: Opportunity for discovery
– Reliance on bad data: Risk of “undiscoveries”
WHAT, WHERE, WHEN
PBR
Nautilus pompilius
4 specimens
Off Palau Islands
1921
Legit: unknown
Det.: J.A. Salinas
Collection: JDR at MZNA
BARRIERS TO DIRECT DATA PUBLICATION
• Data availability
• Data sharing mechanisms
• Data publication incentives
• Data quality
DATA AVAILABILITY INCREASE
GBIF, October 2012
Ariño, 2010. Biodiv. Informat. 7: 15-26
ESTIMATED DATA IN MISSING COLLECTIONS
[Figure: labels BCI, GBIF, GSAP –DNHC survey, unknown, est. CI]
Cexp = 8.37K
Nexp = 2.01G
WHAT, WHERE, WHEN
PBR
Saccharomyces cerevisiae
TCP1-beta
Cask in wreck of ARGO
2 km E of Akta Képhalos
Stratum 2000 BC
Legit: Homer S.
Det.: LoScanSQ-X
Collection: Museum of Beer History
(MBH)
THE TRUST PARADOX
• Papers are generally more trusted than raw/downloadable data
– Papers have gone through peer review
• Published data have common sources:
– Experiments,
– Observations,
– Digitizations
• Raw data in published papers can go unreviewed
– Review focuses on soundness, methods, conclusions
– Data assumed to be true & correct
• Direct publication of data should, in fact, make review easier
– Enforcing rules
– Filtering
– Pattern detection
FITNESS-FOR-USE
• FFU defines whether data can be used for a specific purpose
• Useful compromise for publishing data
• FFU is not the same as data quality
Quality                            Fitness-for-use
Intrinsic to data                  Depends on intended use
Conceptual                         Pragmatic
Good quality predicts good FFU     Good FFU does not predict good quality
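The dependence of FFU on intended use can be sketched as follows. This is only an illustration: the two uses, their required fields, and the field names are hypothetical assumptions, not part of GBIF or Darwin Core. It checks the Nautilus record from the earlier slide, for which only the year is known.

```python
# Hypothetical requirement sets: which record fields each intended use needs.
REQUIREMENTS = {
    "species_checklist": {"scientific_name", "locality"},
    "phenology_study": {"scientific_name", "locality", "year", "month", "day"},
}


def fit_for_use(record, use):
    """Return True if every field the use requires is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in REQUIREMENTS[use])


# The Nautilus record from the earlier slide: month and day are unknown.
nautilus = {
    "scientific_name": "Nautilus pompilius",
    "locality": "Off Palau Islands",
    "year": 1921,
    "month": None,
    "day": None,
}

print(fit_for_use(nautilus, "species_checklist"))  # True
print(fit_for_use(nautilus, "phenology_study"))    # False
```

The same record is fit for one use and unfit for the other, while its intrinsic quality has not changed.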
FFU ASSESSMENT
• In 2006 AHA started analyzing our own DB for FFU, creating pattern-detection visualizations
– First reported in TDWG-2006 (St. Louis)
• In 2008 we started to analyze raw & processed GBIF data (2.4G records by 2012) (JOT’s thesis)
– Building on works by Chapman, Yesson, Wieczorek, etc., changing scope and perspective
• Started producing reports in 2009, 2010
• Teamed up with GBIF-Sec, 2011
• Created BIDDSAT, 2012
Latitude Longitude
30.37, -87.1765
48.5584, -123.463
42.3487, -123.78
41.7866, -100.061
-73.9071, 42.7028
38.8749, -104.88
44.3964, -75.6668
42.1927, -89.106
32.693, -79.9606
44.2124, -88.42
41.6637, -81.3782
39.6992, -121.778
46.13, -72.7196
38.7231, -77.0674
27.7349, -82.6479
36.0852, -121.616
39.0901, -77.5203
-83.1662, 43.0622
41.2956, -74.5956
45.5146, -73.8131
42.0755, -122.759
41.1047, -81.4944
42.4792, -89.0333
40.6956, -74.8913
...
Otegui & Ariño, 2009. Proceedings of the TDWG 2009 Annual Conference, Montpellier, FR
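Two pairs in the list above (-73.9071, 42.7028 and -83.1662, 43.0622) are valid coordinates taken alone, so a plain range check would not catch them. A minimal sketch of one way to spot them, assuming the records are expected to fall inside a known region: the bounding box below is an illustrative assumption, and this is not the authors' or GBIF's actual algorithm.

```python
# Assumed bounding box for the region the records are expected to fall in
# (roughly the contiguous USA and southern Canada); illustrative values only.
LAT_RANGE = (24.0, 50.0)
LON_RANGE = (-125.0, -66.0)


def in_box(lat, lon):
    """True if the point lies inside the expected region."""
    return (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
            and LON_RANGE[0] <= lon <= LON_RANGE[1])


def classify(lat, lon):
    """Label a coordinate pair as 'ok', 'probably swapped' or 'suspect'."""
    if in_box(lat, lon):
        return "ok"
    if in_box(lon, lat):  # the pair only fits the region after swapping
        return "probably swapped"
    return "suspect"


records = [(30.37, -87.1765), (-73.9071, 42.7028),
           (44.3964, -75.6668), (-83.1662, 43.0622)]
for lat, lon in records:
    print(lat, lon, classify(lat, lon))
```

The check only tags records; deciding whether to correct them is left to whoever owns the original data.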
DATA INDEXING
[Diagram: Providers A, B, C and D; GBIF index; ?]
DATA QUERYING
[Diagram: Providers A, B, C and D; GBIF index; ?]
Detailed assessment of GBIF
Bad data Good data
Otegui, Ariño, Gaiji & Chavan, in press
CONTROL AND FFU TOOLS AT INDEXING
Gaiji et al., 2011-2012. EMBARGOED DECEMBER 2012
CORRECTING MECHANISMS: EXAMPLES
• GBIF has implemented many georeferencing correction algorithms, such as coordinate/country match
• This removes many bogus data points, for example by redressing reversed lat/long when serving data
• Still, original data need to be corrected: GBIF cannot alter original data (only tag them)
David Remsen, TDWG-2011. In ViBRANT, http://vbrant.eu/content/gbif-integration
FILTER MECHANISMS: EXAMPLE
2010.04.28
2011.14.09 ErrCode: 10
10: Fields “month” and “day” probably swapped
• Original data unchanged
• Index entry corrected
• Error entry generated in issue log
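The behaviour in this example (raw value kept, index entry corrected, issue logged) could be sketched like this. It assumes dates arrive as `YYYY.MM.DD` strings. Error code 10 comes from the slide; code 99 and the function name are hypothetical. It is an illustration, not GBIF's indexing code.

```python
from datetime import date

ERR_MONTH_DAY_SWAPPED = 10  # code shown on the slide
ERR_INVALID_DATE = 99       # hypothetical code for dates that cannot be repaired


def index_date(raw):
    """Parse 'YYYY.MM.DD' and return (indexed_date, issues).

    The raw string is never modified; only the returned index value is corrected.
    """
    year, month, day = (int(part) for part in raw.split("."))
    issues = []
    try:
        indexed = date(year, month, day)
    except ValueError:
        try:
            # Retry with month and day exchanged, and log the issue.
            indexed = date(year, day, month)
            issues.append(ERR_MONTH_DAY_SWAPPED)
        except ValueError:
            indexed = None
            issues.append(ERR_INVALID_DATE)
    return indexed, issues


for raw in ("2010.04.28", "2011.14.09"):
    print(raw, index_date(raw))
```

A date such as 2011.05.09 would pass unflagged even if its month and day had been swapped, since both readings are valid.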
Otegui et al., 2012
FILTERS CANNOT GET ALL
Otegui, Ariño, Gaiji & Chavan, in press
FILTERS CANNOT SOLVE ALL
Modified from Otegui et al., in press
All GBIF data
Some date element wrong
Some date element missing
All date elements missing
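The categories in this figure can be expressed as a small classifier. This is a sketch under stated assumptions: the function name is hypothetical, and the rule that "wrong" takes precedence over "missing" is an assumption, since the slide does not say how overlapping cases were counted.

```python
import calendar


def date_status(year, month, day):
    """Classify the date elements of a record into the figure's categories."""
    parts = (year, month, day)
    if all(p is None for p in parts):
        return "all date elements missing"
    if month is not None and not 1 <= month <= 12:
        return "some date element wrong"
    if day is not None:
        # Without a usable year and month, fall back to the loosest limit of 31.
        limit = calendar.monthrange(year, month)[1] if year and month else 31
        if not 1 <= day <= limit:
            return "some date element wrong"
    if any(p is None for p in parts):
        return "some date element missing"
    return "complete"


print(date_status(1921, None, None))  # some date element missing
print(date_status(2011, 14, 9))       # some date element wrong
```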
BIDDSAT
• Tool to detect space-time and other patterns
• Applicable to data publishers sharing data through GBIF
• Uses tailored visualizations
• http://www.unav.es/unzyec/mzna/biddsat/
• Open source: https://github.com/jotegui/BIDDSAT
• Bioinformatics, DOI: 10.1093/bioinformatics/BTS359
DATA COMPLETENESS
[Histogram: number of collections (0 to 60) against percentage of completeness (0 to 100)]
Source: BIDDSAT
DATA COMPLETENESS
[Histogram: number of collections (0 to 60) against percentage of completeness (0 to 100)]
• Wrong implementation of exchange standards (DwC): solvable
• Data loss: not solvable
• Limited room for improvement
Source: BIDDSAT
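A completeness histogram of this kind could be computed along the following lines. The list of checked fields and both function names are hypothetical; BIDDSAT's own definition of completeness may differ.

```python
from collections import Counter

# Hypothetical set of fields checked for completeness.
FIELDS = ("scientific_name", "latitude", "longitude", "event_date")


def completeness(records):
    """Percentage of non-empty values across all checked fields of a collection."""
    if not records:
        return 0.0
    filled = sum(1 for rec in records for f in FIELDS
                 if rec.get(f) not in (None, ""))
    return 100.0 * filled / (len(records) * len(FIELDS))


def histogram(percentages, bin_width=10):
    """Count collections per completeness bin; 100% is folded into the top bin."""
    top = 100 - bin_width
    return Counter(min(int(p // bin_width) * bin_width, top) for p in percentages)


collection = [
    {"scientific_name": "A", "latitude": 1.0, "longitude": 2.0, "event_date": "2010-04-28"},
    {"scientific_name": "B", "latitude": None, "longitude": None, "event_date": "1921"},
]
print(completeness(collection))  # 75.0
```

Each collection contributes one percentage, and `histogram` then counts how many collections fall in each bin.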
Data Provider LEONIDAS, Resource SHIELD
GBIF 2008/05 Version
Data Provider LEONIDAS, Resource SHIELD
GBIF 2009/09 Version
[Figure: chronhorogram with day-of-year marks from 1/Jan to 31/Dec, seasons (Winter, Spring, Summer, Fall), a Year axis from 1750 to 2012, and a density scale from - to +]
Chronhorogram. Introduced by Ariño & Otegui, 2008, TDWG
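The axis labels of this plot suggest a polar layout, with the day of the year around the circle and the year along the radius. Under that assumption, a record date could be mapped to plot coordinates as sketched below. The function name and the choice to put the oldest year at the centre are illustrative guesses, not taken from BIDDSAT.

```python
import calendar
import math
from datetime import date


def polar_point(d, first_year=1750, last_year=2012):
    """Map a date to (angle in radians, radius in [0, 1]).

    Assumed layout: day of the year around the circle,
    year along the radius with the oldest year at the centre.
    """
    days_in_year = 366 if calendar.isleap(d.year) else 365
    day_index = d.timetuple().tm_yday - 1  # 0 for 1 January
    angle = 2 * math.pi * day_index / days_in_year
    radius = (d.year - first_year) / (last_year - first_year)
    return angle, radius


angle, radius = polar_point(date(2010, 9, 29))
print(round(math.degrees(angle)), round(radius, 3))
```

Counting the records that fall in each (angle, radius) cell would then give the density that the - to + scale encodes.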
Source: BIDDSAT
Hebdogram. Introduced by Ariño & Otegui, 2008. Proceedings of TDWG
Ariño, Otegui & Robles, 2009
Provider 180
All datasets
2008/05
Data Provider
Codename:
BORODIN
Data Provider Codename: BORODIN
2009/09 2008/05
Treemap by Google Charts API on authors’ data
Fungi
INDEX TAXONOMY
Gaiji et al. in press
• Patchy data publishing… also in papers
• Opportunistic behavior: “Low-hanging fruit”
• Data can (and will) evolve
• The human factor still counts
PATTERNS OF PATTERNS
PAPER WOES: PBR FROM LITERATURE
[Plot: imprecision class (0 to 25) against distance class (0 to 15)]
[Bar chart: values from 0 to 40000 for Chordata, Orthoptera, Lepidoptera, Hymenoptera, Diptera, Coleoptera, Thysanoptera, Collembola, Acari, Polychaeta, Oligochaeta and Nematoda, split into georeferenced, locality without coordinates, and no locality]
Publisher: Swedish
Publisher: German
Publisher: French
Publisher: British
Publisher: Norwegian
A MATTER OF CONVENIENCE
Otegui, Robles & Ariño, 2009. eBiosphere, London, UK.
Publisher: Parisien
Publisher: Spanish
CLASSIFICATION
ACCORDING TO:
Ariño, Otegui & Robles, 2009
PROVIDERSP2K
GBIF RECORDS
SAMPLE
EVOLUTIONARY DATA
THE END
THANK YOU
WITH SPECIAL THANKS TO:
VISHWAS CHAVAN, SAMY GAIJI, ANDREA HAHN, TIM ROBERTSON, AND
THE DIGIT SCIENCE SUBCOMMITTEE AND THE GSAP-NHC AND CNA TASK GROUPS
THE GBIF SECRETARIAT (COPENHAGEN) AND
THE SPANISH COORDINATION NODE (GBIF.ES)
ESTRELLA ROBLES AND
THE PEOPLE AT THE DEPARTMENT OF
ZOOLOGY AND ECOLOGY (UNZYEC),
THE UNIVERSITY OF NAVARRA
No bytes were seriously harmed while preparing this PPTX.
(And copies exist of those that actually were, anyway).
This file used 328 watt-hours, offset by forfeiting Cantonese roast duck for far too long.
All images, plots and analyses by the authors except where otherwise noted
PPTX © 2012 A.H. Ariño, University of Navarra
www.unav.es/unzyec
BIDDSAT, WWW.UNAV.ES/UNZYEC/MZNA/BIDDSAT/, WWW.NCBI.NLM.NIH.GOV/PUBMED/22730433. SOON IN A PDF NEAR YOU.

Highlighting Fitness-For-Use of Published Biodiversity Data

  • 1. HIGHLIGHTING FITNESS-FOR-USE OF PUBLISHED BIODIVERSITY DATA Javier Otegui, Arturo H. Ariño University of Navarra
  • 2.
  • 3. JAVIER OTEGUI & ARTURO H. ARIÑO: HIGHLIGHTING FITNESS-FOR-USE OF PUBLISHED BIODIVERSITY DATA. TDWG2012, BEIJING, 22-X-2012 PUBLISHING DATA • Data published in papers • Data papers published • Data published
  • 4. OUR TARGET DATA • Primary Biodiversity Data Record (PBR) • WHAT, WHERE, WHEN
  • 5. PBR: WHAT, WHERE, WHEN • Megaptera novaeangliae • Adult female, live • Off North Truro, MA, USA • 42.101 N, 70.169 W • 2010.09.29 21:47 GMT • Arturo H. Ariño • Aboard Dolphin VI • Canon EOS 450D, 200 mm lens
  • 6. GBIF GEOREFERENCED DATA 237,348,923 animal data records by Oct. 2012 (total georeferenced records: 327,048,532)
  • 7. THE CASE FOR DIRECT DATA PUBLICATION • Access to massive data increasingly commonplace: GBIF • Spectrum of possible uses increasing: new science, new paradigms • Data-Intensive Science – Reliance on good data: Opportunity for discovery – Reliance on bad data: Risk of “undiscoveries”
  • 8. PBR: WHAT, WHERE, WHEN • Nautilus pompilius • 4 specimens • Off Palau Islands • 1921 • Legit: unknown • Det.: J.A. Salinas • Collection: JDR at MZNA
  • 9. BARRIERS TO DIRECT DATA PUBLICATION • Data availability • Data sharing mechanisms • Data publication incentives • Data quality
  • 10. DATA AVAILABILITY INCREASE (GBIF, October 2012)
  • 11. Ariño, 2010. Biodiv. Informat. 7: 15-26
  • 12. ESTIMATED DATA IN MISSING COLLECTIONS [Figure legend: BCI, GBIF, GSAP –DNHC survey, unknown, est. CI] Cexp = 8.37K, Nexp = 2.01G
  • 13. PBR: WHAT, WHERE, WHEN • Saccharomyces cerevisiae TCP1-beta • Cask in wreck of ARGO • 2 km E of Akta Képhalos • Stratum 2000 BC • Legit: Homer S. • Det.: LoScanSQ-X • Collection: Museum of Beer History (MBH) • ??
  • 14. THE TRUST PARADOX • Papers are generally more trusted than raw/downloadable data – Papers have gone through peer review • Published data have common sources: – Experiments, – Observations, – Digitizations • Raw data in published papers can go unreviewed – Review focuses on soundness, methods, conclusions – Data assumed to be true & correct • Direct publication of data, in fact, should facilitate revision – Enforcing rules – Filtering – Pattern detection
  • 15. FITNESS-FOR-USE • FFU defines whether data can be used for a specific purpose • Useful compromise for publishing data • FFU is not equal to data quality • Quality: intrinsic to data; conceptual; good quality predicts good FFU • Fitness-for-use: depends on intended use; pragmatic; good FFU does not predict good quality
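
The point that fitness-for-use depends on the intended use can be illustrated with a minimal Python sketch. This is not the authors' method: the record layout, the use names and the required-field lists are all illustrative assumptions. The same record, modelled on the name-and-year-only example of slide 8, is checked against three hypothetical uses.

```python
# Assumed field requirements per intended use (illustrative, not from the talk).
REQUIRED_FIELDS = {
    "species_checklist": ["name"],
    "distribution_map": ["name", "latitude", "longitude"],
    "phenology": ["name", "year", "month", "day"],
}


def fit_for_use(record, use):
    """Return True if every field that the given use needs is present."""
    return all(record.get(field) is not None for field in REQUIRED_FIELDS[use])


# Hypothetical record with "what" and a partial "when", but no "where".
nautilus = {
    "name": "Nautilus pompilius",
    "latitude": None,
    "longitude": None,
    "year": 1921,
    "month": None,
    "day": None,
}

for use in REQUIRED_FIELDS:
    print(use, fit_for_use(nautilus, use))
```

Under these assumed requirements the record is usable for a checklist but not for mapping or phenology, although the record itself never changes.
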
  • 16. FFU ASSESSMENT • In 2006 AHA started analyzing our own DB for FFU, creating pattern-detection visualizations – First reported at TDWG-2006 (St. Louis) • In 2008 we started to analyze raw & processed GBIF data (2.4G records by 2012) (JOT’s thesis) – Building on works by Chapman, Yesson, Wieczorek, etc., changing scope and perspective • Started producing reports in 2009, 2010 • Teamed up with GBIF-Sec, 2011 • Created BIDDSAT, 2012
  • 17.
  • 18. Latitude, Longitude: 30.37, -87.1765; 48.5584, -123.463; 42.3487, -123.78; 41.7866, -100.061; -73.9071, 42.7028; 38.8749, -104.88; 44.3964, -75.6668; 42.1927, -89.106; 32.693, -79.9606; 44.2124, -88.42; 41.6637, -81.3782; 39.6992, -121.778; 46.13, -72.7196; 38.7231, -77.0674; 27.7349, -82.6479; 36.0852, -121.616; 39.0901, -77.5203; -83.1662, 43.0622; 41.2956, -74.5956; 45.5146, -73.8131; 42.0755, -122.759; 41.1047, -81.4944; 42.4792, -89.0333; 40.6956, -74.8913; ... [Figure scale: - to +] Otegui & Ariño, 2009. Proceedings of the TDWG 2009 Annual Conference, Montpellier, FR
  • 19. DATA INDEXING [Diagram: Provider A, Provider B, Provider C, Provider D, GBIF index, ?]
  • 20. DATA QUERYING [Diagram: Provider A, Provider B, Provider C, Provider D, GBIF index, ?]
  • 21. Detailed assessment of GBIF [Figure legend: Bad data, Good data]
  • 22. Otegui, Ariño, Gaiji & Chavan, in press
  • 23. CONTROL AND FFU TOOLS AT INDEXING Gaiji et al., 2011-2012 – EMBARGOED DECEMBER 2012
  • 24. CORRECTING MECHANISMS: EXAMPLES • GBIF has implemented many georeferencing correction algorithms, such as coordinate/country match • This removes many bogus data points, for example by redressing reversed lat/long when serving data • Still, original data need to be corrected: GBIF cannot alter original data (only tag them) David Remsen, TDWG-2011. In ViBRANT, http://vbrant.eu/content/gbif-integration
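
A reversed lat/long check of the kind described on this slide could look roughly like the Python sketch below. It is not GBIF's implementation: the bounding box stands in for a real coordinate/country match, and its limits, the function name and the issue strings are assumptions made for illustration. The input pair is never altered; the function only returns the values to serve and a tag.

```python
from typing import NamedTuple, Optional, Tuple


class BoundingBox(NamedTuple):
    """Rectangular region in decimal degrees."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Rough, hypothetical box for the region a provider declares for its records.
DECLARED_REGION = BoundingBox(24.0, 50.0, -125.0, -66.0)


def check_coordinates(
    lat: float, lon: float, box: BoundingBox = DECLARED_REGION
) -> Tuple[float, float, Optional[str]]:
    """Return (lat, lon, issue) to serve; the original pair is left untouched."""
    if box.contains(lat, lon):
        return lat, lon, None
    # The pair only fits the declared region when read the other way round.
    if box.contains(lon, lat):
        return lon, lat, "latitude and longitude probably swapped"
    return lat, lon, "coordinates outside declared region"


print(check_coordinates(30.37, -87.1765))
print(check_coordinates(-73.9071, 42.7028))
```

A plain range test would miss a pair such as -73.9071, 42.7028, because both values are valid on either axis; only the comparison with the declared region exposes the swap.
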
  • 25. FILTER MECHANISMS: EXAMPLE • Original data unchanged • Index entry corrected • Error entry generated in issue log • Example dates: 2010.04.28; 2011.14.09 • ErrCode: 10 (10: Fields “month” and “day” probably swapped)
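
The three behaviours listed on this slide (original unchanged, index entry corrected, issue logged) can be sketched in Python as follows. This is a sketch under stated assumptions, not the GBIF filter: the `YYYY.MM.DD` layout is taken from the slide's example dates, error code 10 is the one shown on the slide, and the function names and log format are hypothetical.

```python
import datetime
from typing import List, Optional, Tuple

ERR_MONTH_DAY_SWAPPED = 10  # code shown on the slide


def _is_valid(year: int, month: int, day: int) -> bool:
    """True if the three numbers form a real calendar date."""
    try:
        datetime.date(year, month, day)
        return True
    except ValueError:
        return False


def index_date(
    raw: str, issue_log: List[Tuple[str, int]]
) -> Optional[Tuple[int, int, int]]:
    """Return (year, month, day) for the index entry, or None if unusable.

    The raw string is only read, never rewritten. A swap is logged when the
    date is valid only after exchanging the month and day fields.
    """
    year, month, day = (int(part) for part in raw.split("."))
    if _is_valid(year, month, day):
        return year, month, day
    if _is_valid(year, day, month):
        issue_log.append((raw, ERR_MONTH_DAY_SWAPPED))
        return year, day, month
    return None


log: List[Tuple[str, int]] = []
print(index_date("2010.04.28", log))
print(index_date("2011.14.09", log))
print(log)
```

Only dates that are impossible as written are caught this way. A date such as 2011.03.09 is valid in either reading, so a swap there passes unnoticed, which is one reason filters cannot catch everything.
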
  • 26. FILTERS CANNOT GET ALL [Figure scale: - to +] Otegui et al., 2012
  • 27. FILTERS CANNOT SOLVE ALL Otegui, Ariño, Gaiji & Chavan, in press
  • 28. [Figure categories: All GBIF data; Some date element wrong; Some date element missing; All date elements missing] Modified from Otegui et al., in press
  • 29. BIDDSAT • Tool to detect space-time and other patterns • Applicable to data publishers sharing data through GBIF • Uses tailored visualizations • http://www.unav.es/unzyec/mzna/biddsat/ • Open source: https://github.com/jotegui/BIDDSAT • Bioinformatics, DOI: 10.1093/bioinformatics/BTS359
  • 30.
  • 31. DATA COMPLETENESS [Histogram: x-axis, Percentage of completeness (0 to 100); y-axis, Number of collections (0 to 60)] Source: BIDDSAT
  • 32. DATA COMPLETENESS [Histogram: x-axis, Percentage of completeness (0 to 100); y-axis, Number of collections (0 to 60)] • Wrong implementation of exchange standards (DwC) – solvable • Data loss – not solvable • Limited room for improvement Source: BIDDSAT
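
A per-collection completeness percentage like the one plotted in these histograms could be computed along the lines of the Python sketch below. How BIDDSAT actually defines completeness is not stated on the slides, so the definition used here (a record is complete when its what, where and when fields are all filled) is an assumption. The field names are Darwin Core terms; the collection codes and sample records are hypothetical.

```python
from collections import defaultdict

# Darwin Core terms assumed to stand for "what, where, when".
CORE_FIELDS = ("scientificName", "decimalLatitude", "decimalLongitude", "eventDate")


def completeness_by_collection(records):
    """Return {collectionCode: percentage of records with all core fields}."""
    totals = defaultdict(int)
    complete = defaultdict(int)
    for rec in records:
        code = rec.get("collectionCode", "unknown")
        totals[code] += 1
        # A coordinate of 0.0 counts as present; None and "" count as missing.
        if all(rec.get(field) not in (None, "") for field in CORE_FIELDS):
            complete[code] += 1
    return {code: 100.0 * complete[code] / totals[code] for code in totals}


records = [
    {"collectionCode": "A", "scientificName": "Megaptera novaeangliae",
     "decimalLatitude": 42.101, "decimalLongitude": -70.169,
     "eventDate": "2010-09-29"},
    {"collectionCode": "A", "scientificName": "Nautilus pompilius",
     "eventDate": "1921"},
    {"collectionCode": "B", "scientificName": "",
     "decimalLatitude": None, "decimalLongitude": None, "eventDate": ""},
]

print(completeness_by_collection(records))
```

A field left empty because the standard was implemented incorrectly can be refilled from the source database, whereas a value that was never recorded cannot, which is the solvable versus not solvable distinction on the slide.
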
  • 33. Data Provider LEONIDAS, Resource SHIELD (GBIF 2008/05 version)
  • 34. Data Provider LEONIDAS, Resource SHIELD (GBIF 2009/09 version)
  • 35. Chronhorogram [Figure: day-of-year axis from 1/Jan to 31/Dec, marked monthly, with seasons Winter, Spring, Summer, Fall; Year axis from 1750 to 2012; scale - to +]. Introduced by Ariño & Otegui, 2008, TDWG
  • 36. Source: BIDDSAT
  • 37. Hebdogram [Figure scale: - to +]. Introduced by Ariño & Otegui, 2008. Proceedings of TDWG
  • 38. [Figure panels: Provider 180; All datasets] Ariño, Otegui & Robles, 2009
  • 39. Data Provider Codename: BORODIN [Panel: 2008/05]
  • 40. Data Provider Codename: BORODIN [Panels: 2008/05, 2009/09]
  • 41. Treemap by Google Charts API on authors’ data
  • 42. INDEX TAXONOMY: Fungi. Gaiji et al., in press
  • 43. PATTERNS OF PATTERNS • Patchy data publishing… also in papers • Opportunistic behavior: “Low-hanging fruit” • Data can (and will) evolve • The human factor still counts
  • 44. PAPER WOES: PBR FROM LITERATURE [Figure: axes Distance class and Imprecision class] [Figure: records per taxon (axis 0 to 40000) for Chordata, Orthoptera, Lepidoptera, Hymenoptera, Diptera, Coleoptera, Thysanoptera, Collembola, Acari, Polychaeta, Oligochaeta, Nematoda; legend: Georeferenced, Locality without coordinates, No locality]
  • 45. A MATTER OF CONVENIENCE [Figure panels by publisher: Swedish, German, French, British, Norwegian, Parisian, Spanish] Otegui, Robles & Ariño, 2009. eBiosphere, London, UK.
  • 46. CLASSIFICATION ACCORDING TO: [Figure labels: PROVIDERS, SP2K, GBIF RECORDS SAMPLE, EVOLUTIONARY DATA] Ariño, Otegui & Robles, 2009
  • 47. THE END • THANK YOU, WITH SPECIAL THANKS TO: VISHWAS CHAVAN, SAMY GAIJI, ANDREA HAHN, TIM ROBERTSON, AND THE DIGIT SCIENCE SUBCOMMITTEE AND THE GSAP-NHC AND CNA TASK GROUPS • THE GBIF SECRETARIAT (COPENHAGEN) AND THE SPANISH COORDINATION NODE (GBIF.ES) • ESTRELLA ROBLES AND THE PEOPLE AT THE DEPARTMENT OF ZOOLOGY AND ECOLOGY (UNZYEC), THE UNIVERSITY OF NAVARRA • No bytes were seriously harmed while preparing this PPTX. (And copies exist of those that actually were, anyway.) This file used 328 watt-hours, offset by forfeiting Cantonese roast duck for far too long. • All images, plots and analyses by the authors except where otherwise noted • PPTX © 2012 A.H. Ariño, University of Navarra www.unav.es/unzyec • BIDDSAT, WWW.UNAV.ES/UNZYEC/MZNA/BIDDSAT/, WWW.NCBI.NLM.NIH.GOV/PUBMED/22730433. SOON IN A PDF NEAR YOU.