SWETS – Be Open!
Donat Agosti
Plazi, Bern
23.6.2014, Universität Bern
Open Access and the Future of
(Biodiversity-) Resear...
Future of Biodiversity Research
Data Mining
Background
Rio Earth Summit 1992
Background
Biodiversity Crisis
Background
Indicators
Indicators as powerful widely understood tool
Reed Elsevier, Annual Reports and Financial Statements 2013
http://www.reede...
Biodiversity research and conservation planning
Multi-Taxon Specimen Data for Setting Conservation Priorities
Source: Kre...
Politics
IPBES
Intergovernmental Platform on Biodiversity & Ecosystem Services
EU-Political and Science Decision to support IPBES
EU-BON
European Biodiversity Observation Network
EU-FP7 funded
An EU decision in support of environmental policy making
EU-BON
• To build a European Biodiversity Observation
Network
• Me...
The basic science question
Hardisty, Nature 502, 171 (2013)
BUT: predictive ecology has substantial data needs
Harfoot, BI...
Modeling life on earth
Can we do it?
A realistic goal?
Communication
EU-BON a child of GEOSS
Global Earth Observation System of Systems
Open Access to remote sensing data from a...
The impact of remote sensing data on understanding biodiversity
With sophisticated technologies we can identify different ...
Access to data
…we could create a link to the related data in our biodiversity literature
http://video.ted.com/talk/podcas...
Names as information tags in life sciences
Names
Characteristics
Publications
Genes
Collections
Specimens
Distribution
Treatments as bits of information
Treatment: sections of publications documenting the
features or distribution of a relate...
Treatments as part of publications
DNA
Specimens Observations
Institution
Pharmacology/epidemiology
Publication
Treatment
...
Text
(e.g. PDF)
<tax:treatment>
<tax:nomenclature>
<tax:name>
<tax:xid source="HNS" identifier="193329"/>
<tax:xmldata>
<d...
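A minimal sketch of what machine access to such markup could look like, using Python's standard XML parser. The element names and the values (Mystrium leonie, HNS identifier 193329) are taken from the slide's fragment; the namespace URIs are placeholders, since the slide shows only the "tax:" and "dc:" prefixes, and the function name extract_name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): the slide shows only the prefixes.
NS = {
    "tax": "http://example.org/taxonx",
    "dc": "http://example.org/dc",
}

# Cleaned-up version of the fragment shown on the slide.
FRAGMENT = """
<tax:treatment xmlns:tax="http://example.org/taxonx"
               xmlns:dc="http://example.org/dc">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
</tax:treatment>
"""


def extract_name(xml_text):
    """Return the structured name data from one marked-up treatment."""
    root = ET.fromstring(xml_text.strip())
    name = root.find("tax:nomenclature/tax:name", NS)
    xid = name.find("tax:xid", NS)
    return {
        "genus": name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
        "species": name.findtext("tax:xmldata/dc:Species", namespaces=NS),
        "status": root.findtext("tax:nomenclature/tax:status", namespaces=NS),
        "xid_source": xid.get("source"),
        "xid": xid.get("identifier"),
    }


if __name__ == "__main__":
    print(extract_name(FRAGMENT))
```

Once the name is tagged this way, it can serve as a key for linking the treatment to specimens, genes and other publications, which is the point the slide makes.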
Automatic extraction and visualization of treatment content
Countries
Madagascar
Anochetus grandidieri Forel
Datamining of treatments
Pseudomyrmex ants and Vachellia ant-acacias
are a classic example of mutualism in biology.
alleni...
The Plazi approach
From treatment
to treatment repository
The Plazi approach
Agosti, D., W. Egloff. 2009. Taxonomic inform...
The Plazi approach
Plazi workflow
Plazi
SRS
find scan «OCR» markup store
Analyzing a large corpus of publications: Plazi repository
14,590 specimens
8900 plottable specimens from
1138 unique loca...
Analyzing a journal: Journal of Hymenoptera Research
5170 specimens
4062 plottable specimens from
1138 unique locations
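The share of specimens that can actually be put on a map follows directly from the counts on these two slides. A small sketch of the arithmetic; the function name plottable_share is hypothetical, the counts are the ones quoted above.

```python
def plottable_share(total_specimens, plottable_specimens):
    """Fraction of extracted specimens that carry usable coordinates."""
    return plottable_specimens / total_specimens


# Counts as quoted on the slides.
plazi_repository = plottable_share(14_590, 8_900)
journal_hymenoptera = plottable_share(5_170, 4_062)

if __name__ == "__main__":
    print(f"Plazi repository: {plazi_repository:.1%}")
    print(f"Journal of Hymenoptera Research: {journal_hymenoptera:.1%}")
```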
The biodiversity community
Plants
3,400 Herbaria worldwide
10,000 Associate curators and specialists
350,000,000 specimens...
The biodiversity community
200,000,000+ printed pages
1,900,000 species described
20,000,000+ species treatments
17,000 ne...
The taxonomy publishing world
12,000 Taxonomic Papers on 42,000 Spiders
Since 1757
Publications widely scattered
Source: J...
Why is the system broken?
WHY
does it NOT work?
Access to data limited
…we cannot create a link to the related biodiversity data
http://video.ted.com/talk/podcast/2013G/No...
Communication
200,000,000+ printed pages
1,900,000 species described
20,000,000+ species treatments
17,000 new species per...
Why is the system broken?
Access to a corpus
NOT
single PDF, data point
Why is the system broken?
Access to content
NOT
representations
Why is the system broken?
Legal issues
Technical issues
Social issues
Legal issues: Copyright
Access to ant taxonomic publications through antbase.org /Smithsonian Institution, including curre...
Legal issues: licences
Legal licences for 1000+ journals
cannot be tracked by scientists
Technical issues: Digital Object Identifiers
DOI
Missing
CrossRef an exclusive club
Technical issues: Journal publishing workflow
Journal publishing workflows:
From structured data to unstructured text
Technical issues: Content extraction
Conversion of legacy literature prohibitively expensive
Mark up costs for markup incl...
Social issues: data sharing
The misunderstood attribution
Why is the system broken?
WHY
NOT
make it work?
European Open Biodiversity Knowledge Management System
European Open
Biodiversity Knowledge
Management System
European Uni...
European Open Biodiversity Knowledge Management System
Prepare the ground for the creation of
a system for intelligent man...
Legal issues: Copyright: The Blue List
The Blue List
elements of taxonomic information that are not subject to copyright
P...
Legal issues: Copyright: Legal exceptions for research
Legal exceptions for research
Egloff W, Patterson D, Agosti D, Hage...
Legal issues: Copyright: Open Access
Open Access
Legal issues: Copyright: Creative Commons Licence
Technical issues: DOI
Persistent identifiers for data objects and physical objects
Linking data using agreed vocabularies
...
Technical issues: DOI
Biodiversity Literature Repository @ Zenodo
public repository for legacy literature using DataCite ...
Technical issues: semantic enhanced publishing
Semantic enhanced publishing
TaxPub JATS
Use DOI as widely as possible
Technical issues: machine access
(well documented) API
Technical issues: semantic publishing
Advanced publishing and dissemination
Form based
Semantic enhanced TaxPub JATS base...
Social issues: Bouchout Declaration
http://bouchoutdeclaration.org/ launched June 12, 2014
10 Principles
Free and open use...
Social issues: Bouchout Declaration
Technical issues: business plan
Conclusions
If we want to conserve the
world’s biodiversity, we
need one stop open
shopping for biodiversity
research resu...
Conclusions
We scientists are getting our
acts together.
Conclusions
Will the publishers too?
Thank you!
Donat Agosti
agosti@plazi.org
http://plazi.org
20140623 swets agosti_final

Open Access and the Future of (Biodiversity-) Research
"SWETS Be Open" Event Bern, Switzerland, June 23, 2014;
Donat Agosti, Plazi

Published in: Science, Technology

  1. 1. SWETS – Be Open! Donat Agosti Plazi, Bern 23.6.2014, Universität Bern Open Access and the Future of (Biodiversity-) Research
  2. 2. Future of Biodiversity Research Data Mining
  3. 3. Background Rio Earth Summit 1992
  4. 4. Background Biodiversity Crisis
  5. 5. Background Indicators
  6. 6. Indicators as powerful widely understood tool Reed Elsevier, Annual Reports and Financial Statements 2013 http://www.reedelsevier.com/investorcentre/reports%202007/Documents/2013/reed_elsevier_ar_2013.pdf 39% profit
  7. 7. Biodiversity research and conservation planning Multi-Taxon Specimen Data for Setting Conservation Priorities Source: Kremen C, et al. 2008. Science 320: 222-226. Consensus conservation priority areas and actual and proposed protected areas 2003: Madagascar announces it will triple protected land to 10% coverage
  8. 8. Politics IPBES Intergovernmental Platform on Biodiversity & Ecosystem Services
  9. 9. EU-Political and Science Decision to support IPBES EU-BON European Biodiversity Observation Network EU-FP7 funded
  10. 10. An EU decision in support of environmental policy making EU-BON • To build a European Biodiversity Observation Network • Measure and predict change over space and time • Combine Remote Sensing data and on the ground observation data in predictive modeling • Tools to inform decision makers (EU-politicians)
  11. 11. The basic science question Hardisty, Nature 502, 171 (2013) BUT: predictive ecology has substantial data needs Harfoot, BIH2013, Rome, 2013 What is the future of the biological world? Imagine if we could: …Predict community level dynamics of ecosystems at scales from local to global, based on the ecology and biology of all individual organisms
  12. 12. Modeling life on earth Can we do it? A realistic goal?
  13. 13. Communication EU-BON a child of GEOSS Global Earth Observation System of Systems Open Access to remote sensing data from all over the world
  14. 14. The impact of remote sensing data on understanding biodiversity With sophisticated technologies we can identify different trees in the Amazon… http://video.ted.com/talk/podcast/2013G/None/GregAsner_2013G-480p.mp4
  15. 15. Access to data …we could create a link to the related data in our biodiversity literature http://video.ted.com/talk/podcast/2013G/None/GregAsner_2013G-480p.mp4
  16. 16. Names as information tags in life sciences Names Characteristics Publications GenesCollections Specimens Distribution
  17. 17. Treatments as bits of information Treatment: sections of publications documenting the features or distribution of a related group of organisms (called a “taxon”, plural “taxa”) in ways adhering to highly formalized conventions. (Catapano, 2010) Formica obsoleta, Linnaeus 1758: 580
  18. 18. Treatments as part of publications DNA Specimens Observations Institution Pharmacology/epidemiology Publication Treatment Treatment Treatment Table Appendix Biology/ecology Reference to other biota Publication Treatment Publication
  19. 19. Text (e.g. PDF) <tax:treatment> <tax:nomenclature> <tax:name> <tax:xid source="HNS" identifier="193329"/> <tax:xmldata> <dc:Genus>Mystrium</dc:Genus> <dc:Species>leonie</dc:Species> </tax:xmldata> Mystrium leonie </tax:name> Bohn & Verhaagh <tax:status>n. sp.</tax:status> Fig 1 D - F </tax:nomenclature> <tax:div type="description"> <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0. 1.30, SI 137, PW 0.73, ML 0.38. Mandible oute to a sharp apical tooth, the apex parallel to (Holotype with material in mandibles, so mand $ described below from paratypes.) Median cly .... </treatment> Enhanced and linked text (XML: TaxonX / TaxPub JATS) Plazi: Semantic enhanced treatments
  20. 20. Automatic extraction and visualization of treatment content Countries Madagascar Anochetus grandidieri Forel
  21. 21. Datamining of treatments Pseudomyrmex ants and Vachellia ant-acacias are a classic example of mutualism in biology. allenii melanoceras ruddiae chiapensis collinsii cookii cornigera globulifera hindsii janzenii mayana sphaerocephala boopis flavicornis hesperius ita janzeni kuenckeli mixtecus nigrocinctus nigropilosus opaciceps particeps peperi reconditus satanicus simulans spinicola subtilissimus veneficus ferrugineus gentlei gracilis Transbiotic link network Associated species linked through references in taxonomic treatments Acacia-ant species: Pseudomyrmex gracili Treatment: original description Treatment: redescription Associated ant-acacia: Acacia gentlei Ants Plants Photocredits: Alex Wild Treatment Treatments linked through citations
  22. 22. The Plazi approach From treatment to treatment repository The Plazi approach Agosti, D., W. Egloff. 2009. Taxonomic information exchange and copyright: the Plazi approach. BMC Research Notes 2009, 2:53. doi:10.1186/1756-0500-2-53
  23. 23. The Plazi approach Plazi workflow Plazi SRS find scan «OCR» markup store
  24. 24. Analyzing a large corpus of publications: Plazi repository 14,590 specimens 8900 plottable specimens from 1138 unique locations
  25. 25. Analyzing a journal: Journal of Hymenoptera Research 5170 specimens 4062 plottable specimens from 1138 unique locations
  26. 26. The biodiversity community Plants 3,400 Herbaria worldwide 10,000 Associate curators and specialists 350,000,000 specimens in collections 180,000,000 specimens digitized 2,000,000,000 specimens including animals
  27. 27. The biodiversity community 200,000,000+ printed pages 1,900,000 species described 20,000,000+ species treatments 17,000 new species per year
  28. 28. The taxonomy publishing world 12,000 Taxonomic Papers on 42,000 Spiders Since 1757 Publications widely scattered Source: Jeremy Miller
  29. 29. Why is the system broken? WHY does it NOT work?
  30. 30. Access to data limited …we cannot create a link to the related biodiversity data http://video.ted.com/talk/podcast/2013G/None/GregAsner_2013G-480p.mp4
  31. 31. Communication 200,000,000+ printed pages 1,900,000 species described 20,000,000+ species treatments 17,000 new species per year BUT: The data are hidden Incomplete digitization Publications are not semantically enhanced Collections are incomplete Data is not linked Most data are not open
  32. 32. Why is the system broken? Access to a corpus NOT single PDF, data point
  33. 33. Why is the system broken? Access to content NOT representations
  34. 34. Why is the system broken? Legal issues Technical issues Social issues
  35. 35. Legal issues: Copyright Access to ant taxonomic publications through antbase.org /Smithsonian Institution, including currently the entire body of non-copyrighted publications since 1758 (>4,000 publications or 85,000 pages)
  36. 36. Legal issues: licences Legal licences for 1000+ journals cannot be tracked by scientists
  37. 37. Technical issues: Digital Object Identifiers DOI Missing CrossRef an exclusive club
  38. 38. Technical issues: Journal publishing workflow Journal publishing workflows: From structured data to unstructured text
  39. 39. Technical issues: Content extraction Conversion of legacy literature prohibitively expensive Markup costs including materials citations [Figure: markup effort; axes: Pages, Minutes] Source: Spider Pilot, Jeremy Miller Plazi SRS find scan «OCR» markup store Average: 6 min / page complete OCR: 0.80 EUR / page vendor
  40. 40. Social issues: data sharing The misunderstood attribution
  41. 41. Why is the system broken? WHY NOT make it work?
  42. 42. European Open Biodiversity Knowledge Management System European Open Biodiversity Knowledge Management System European Union FP7 funded project
  43. 43. European Open Biodiversity Knowledge Management System Prepare the ground for the creation of a system for intelligent management of biodiversity knowledge which will improve the present system of taxonomic literature.
  44. 44. Legal issues: Copyright: The Blue List The Blue List elements of taxonomic information that are not subject to copyright Patterson, D. J., Egloff, W., Agosti, D., Eades, D., Franz, N., Hagedorn, G., Rees, J. A. and Remsen, D. P. 2014. Scientific names of organisms: attribution, rights, and licensing BMC Research Notes 7:79 doi:10.1186/1756-0500-7-79.
  45. 45. Legal issues: Copyright: Legal exceptions for research Legal exceptions for research Egloff W, Patterson D, Agosti D, Hagedorn G 2014. Open exchange of scientific knowledge and European copyright: The case of biodiversity information. ZooKeys 414, 109-135. DOI: 10.3897/zookeys.414.7717
  46. 46. Legal issues: Copyright: Open Access Open Access
  47. 47. Legal issues: Copyright: Creative Commons Licence
  48. 48. Technical issues: DOI Persistent identifiers for data objects and physical objects Linking data using agreed vocabularies http://wiki.pro-ibiosphere.eu/wiki/Best_practices_for_stable_URIs
  49. 49. Technical issues: DOI Biodiversity Literature Repository @ Zenodo public repository for legacy literature using DataCite DOI CrossRef to cite (Zenodo) DataCite DOI?!
  50. 50. Technical issues: semantic enhanced publishing Semantic enhanced publishing TaxPub JATS Use DOI as widely as possible
  51. 51. Technical issues: machine access (well documented) API
  52. 52. Technical issues: semantic publishing Advanced publishing and dissemination Form based Semantic enhanced TaxPub JATS based publishing
  53. 53. Social issues: Bouchout Declaration http://bouchoutdeclaration.org/ launched June 12, 2014 10 Principles Free and open use of digital resources Use of persistent identifiers and linking of data Policy developments Developing sustainable business models
  54. 54. Social issues: Bouchout Declaration
  55. 55. Technical issues: business plan
  56. 56. Conclusions If we want to conserve the world’s biodiversity, we need one stop open shopping for biodiversity research results.
  57. 57. Conclusions We scientists are getting our acts together.
  58. 58. Conclusions Will the publishers too?
  59. 59. Thank you! Donat Agosti agosti@plazi.org http://plazi.org
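
To make "prohibitively expensive" on slide 39 concrete, the per-page figures quoted there (6 minutes of markup per page, 0.80 EUR per page for vendor OCR) can be scaled to the 200,000,000+ printed pages from slide 27. This is a rough sketch, not a budget: applying the Spider Pilot average to the whole corpus is an assumption, as is the 1,600 working hours per person-year, and the function name is hypothetical.

```python
# Figures quoted on the slides.
MINUTES_PER_PAGE = 6          # slide 39: average markup time per page
OCR_EUR_PER_PAGE = 0.80       # slide 39: vendor price for complete OCR
PRINTED_PAGES = 200_000_000   # slide 27: lower bound ("200,000,000+")


def legacy_conversion_estimate(pages, hours_per_person_year=1600):
    """Scale the per-page figures to a corpus of the given size.

    hours_per_person_year is an assumption, not a number from the slides.
    """
    markup_hours = pages * MINUTES_PER_PAGE / 60
    return {
        "markup_hours": markup_hours,
        "person_years": markup_hours / hours_per_person_year,
        "ocr_eur": pages * OCR_EUR_PER_PAGE,
    }


if __name__ == "__main__":
    estimate = legacy_conversion_estimate(PRINTED_PAGES)
    print(f"Markup: {estimate['markup_hours']:,.0f} hours "
          f"(about {estimate['person_years']:,.0f} person-years)")
    print(f"OCR: {estimate['ocr_eur']:,.0f} EUR")
```

Under these assumptions, manual markup alone comes to 20 million hours, which is why the later slides push for semantically enhanced publishing at the source rather than retroactive conversion.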