The Future of Scientific Publishing. Donat Agosti (Plazi, Bern), 21 January 2011, Paris
  • Notes: Add in Plazi and the idea of the treatment server
  • Publicly funded science (our topic today, as opposed to military or industry-funded science) has the opposite business model to commercial publishing. Its funding includes the creation of a product (the publication), whereas commercial publishing draws its resources from selling publications. The task of the scientific community is to disseminate its findings as widely as possible. Therefore, barriers linked to copyright need to be avoided: passwords, pay per view, etc. Finding information is the paramount act in science, and thus every impediment to discovery must be removed. This does not exclude the need for a working business model. But neither does it mean that we should become opportunists and let control over our data slip (see context, bibliographic data, other databases).
  • Building a DNA – go to the library and find all about the DNA
  • Reuse of information
  • Measuring and monitoring biodiversity
  • Part of scientific discourse. Records.
  • Access is enough?
  • From print to PDF to content
  • What is context? Can we afford to create and maintain context? If you had to bet, what is the limiting factor for our future? Where is what hosted? Who is paying for it? Do we need cross-sectoral financing? What is the role of the natural history museums? What comes after the infotainment wave? Intelligent customers as opposed to consumers?
  • Where is the border of science and where not?

    1. The Future of Scientific Publishing. Donat Agosti (Plazi, Bern), 21 January 2011, Paris
    2. I don't know the future, but I have a dream…
    3. Immersing in the knowledge
    4. I want to ask a publication a question, not have the author tell me what I have to read.
    5. I want to find out: How many and which species are there? How are they related? Do they disappear? How are they distributed?
    6. I want to find out how many and which species there are, how they are related, and whether they disappear. Other people have different interests.
    7. An example from the Neurocommons text mining pilot ("protein-protein interaction networks", John Wilbanks, Neurocommons):
        • PubMed abstracts: > 16,000,000
        • CNS classified abstracts: 874,727
        • text mining recognized: 368,688
        • text mining processed: 94,381
        • extracted graph of 30,000+ relationships and 5,500 genes and proteins
    9. In a semantic Web environment (where machines talk to each other and do most of our work), data need to be able to talk to each other: "protein-protein interaction networks", John Wilbanks, Neurocommons. Figure labels: 27,266 papers; 4,563 papers; 41,985 papers; 10,365 papers; 128,437 papers.
    10. It will open up scientific literature for data mining: "protein-protein interaction networks", John Wilbanks, Neurocommons
    11. An example from the taxonomy text mining pilot (Taxon mining project):
        • Every year: > 17,000 new species described
        • Every year: > 100,000 species redescribed
        • Total journals: > 2,000 with taxonomic content
        • Total: 1,900,000 species described
        • Total: > 20,000,000 treatments
        • text mining processed: 0
        • extracted graph of 0 species, 0 relationships
    12. 1996: Conservation, Phylogeny, Systematics, Curiosity, Aesthetics, Fascination
    13. 2011: Experience, Frustration, Wonder, Excitement, Satisfaction, Determination
    14. Modeling taxonomic literature: TaxonX, TaxPub, NLM DTD, Plazi
    15. Plazi workflow: GoldenGate mark up as an example
        • Get LSID from Hymenoptera Name Server (HNS) for names; ZooBank?
        • Add new names
        • Get bibliographic metadata from HNS (MODS)
        • Get bibliographic GUIDs from bioguid (or EDIT?)
        • Get geographic long/lat from geonames.org
        • Get GUIDs for CBOL, NCBI, specimen, images, .....
    16. The semantically enhanced treatments, extracted, stored on Plazi.org, and served in a human readable form, are linked to the underlying data: Fisher & Smith, 2008, PLoS ONE.
    17. Plazi Search and Retrieval Server: Access to data. Diagram labels: TAPIR, SPM; You (human, machine).
    18. The conversion comes at a cost, even though GoldenGate and other editors exist
    19. Time per minute to produce clean OCR using ABBYY; publications in chronological order. Production metrics to measure effort and compare various approaches and algorithms.
    20. How to mark up a large body of legacy publications? In-house? Build / use commercial services? Use the community, e.g. volunteers? Diagram labels: Activation energy, Gutenberg, Semantic Web, Cost per knowledge.
    21. Training and demos...
    22. Avoid it
    23. Prospective publications: Zookeys / Phytokeys
    24. Semantic enhancements to published texts
    25. 2036 ?
    26. Why do we publish?
    27. Publicly funded research
    28. Contribute to the welfare of the nations…
    29. Dissemination
    30. Access
    31. Before antbase.org, Harvard's Museum of Comparative Zoology could claim to be the only location with a complete set of ant systematics publications from 1758 to the present. Through antbase.org's digital library, this body of literature is accessible worldwide, and it is actively used (>10,000 visits in a single month).
    32. Access to ant taxonomic publications through antbase.org / Smithsonian Institution, currently including the entire body of non-copyrighted publications since 1758 (>4,000 publications or 85,000 pages)
    33. The Biodiversity Heritage Library is currently digitizing and making accessible >100 million pages, most of them out of copyright, i.e. older than 1925. ........ to be finished in 2048...
    34. What is a publication from publicly funded science?
    36. Open Access
    37. What is a scientific publication? Print, journal, article, treatment, public funding, PDF, XML. Tool to disseminate scientific knowledge.
    38. Why do we publish the way we publish?
    39. What kind of publications serve our needs?
    40. IPBES
    41. Access
    42. Beyond the PDF
    43. Access to what?
    44. Scratchpad, EOL page, Wikipage, species page
    45. Treatment
    46. Treatments come with a lot of overhead
    47. The structure of a systematics publication (diagram labels):
        • Publication: Title, Author, Abstract, Introduction, Taxon descriptions, Suppl. Materials, Acknowledgments, References
        • Genus: Diagnosis, Notes, Biology, Distribution, Key to sp., Species descriptions
        • Species treatments: Species 1, Species 2, Species 3, Species 4, Species .., Species n
        • Species 1: Nomenclature, Diagnosis, Distribution, Material Examined, Comments, Description, Graphic art
    48. Treatments come with a lot of overhead. Treatments are highly structured.
    49. The structure of a systematics publication (same diagram as slide 47)
    50. Treatments come with a lot of overhead. Treatments are highly structured. Content is defined.
    51. Treatments come with a lot of overhead. Treatments are highly structured. Content is defined. XML can define it.
    52. This can also be applied to entire sections of text, such as the descriptions of a species and its parts.
        <tax:treatment>
          <tax:nomenclature>
            <tax:name>
              <tax:xid source="HNS" identifier="193329"/>
              <tax:xmldata>
                <dc:Genus>Mystrium</dc:Genus>
                <dc:Species>leonie</dc:Species>
              </tax:xmldata>
              Mystrium leonie
            </tax:name>
            <tax:status>n. sp.</tax:status>
            Fig 1 D - F
          </tax:nomenclature>
          <tax:div type="description">
            <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL 1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin strongly curving to a sharp apical tooth, the apex parallel to the anterior clypeal margin. (Holotype with material in mandibles, so mandibles and anterior clypeus $ described below from paratypes.) Median clypeus ....
        </tax:treatment>
    53. Treatments come with a lot of overhead. Treatments are highly structured. Content is defined. XML defines them. The question is how to get them.
    54. Mark-up of legacy publications
    55. $$$$$$$$$$$$$$$$$
    56. Prospective semantic mark-up and linking to external sources is the future
    57. Treatment repository + external resources
    58. BHL-Modern
    59. The future is writable.
    60. Happy Birthday! January 15, 2001
    61. What is a scientific publication? Wikipedia entry as a publication?
    62. Quality control
    63. What is a scientific publication? Centrifugal versus centripetal forces, or are we attractive enough?
    64. Continuity
    65. $$$$$$$
    66. http://plazi.org Thank you very much! Donat Agosti [email_address]
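
The point of slides 4 and 52, asking a marked-up treatment a question instead of reading it, can be illustrated with a short sketch. It is not Plazi's tooling, only a minimal example using Python's standard library. The slide's XML fragment is truncated and omits namespace declarations, so the sample below is a completed, well-formed version written for illustration: the two namespace URIs are assumptions to be checked against the TaxonX schema, the description is cut after the measurements, and the function name summarize_treatment is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# ASSUMPTION: these namespace URIs are placeholders for this sketch;
# the slide does not declare them. Check the TaxonX schema for the real ones.
NS = {
    "tax": "http://www.taxonx.org/schema/v1",
    "dc": "http://digir.net/schema/conceptual/darwin/2003/1.0",
}

# A well-formed completion of the fragment on slide 52 (description shortened).
SAMPLE = """\
<tax:treatment xmlns:tax="http://www.taxonx.org/schema/v1"
               xmlns:dc="http://digir.net/schema/conceptual/darwin/2003/1.0">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL 1.30, SI 137, PW 0.73, ML 0.38.</tax:p>
  </tax:div>
</tax:treatment>
"""


def summarize_treatment(xml_text):
    """Pull the name, external identifier, status and measurements out of one treatment."""
    root = ET.fromstring(xml_text)
    name = root.find("tax:nomenclature/tax:name", NS)
    xid = name.find("tax:xid", NS)
    description = root.findtext(
        "tax:div[@type='description']/tax:p", default="", namespaces=NS
    )
    # Measurements appear as two-letter codes followed by a number, e.g. "TL 3.95".
    pairs = re.findall(r"\b([A-Z]{2}) (\d+(?:\.\d+)?)", description)
    return {
        "genus": name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
        "species": name.findtext("tax:xmldata/dc:Species", namespaces=NS),
        "xid": (xid.get("source"), xid.get("identifier")),
        "status": root.findtext("tax:nomenclature/tax:status", namespaces=NS),
        "measurements": {code: float(value) for code, value in pairs},
    }


if __name__ == "__main__":
    for key, value in summarize_treatment(SAMPLE).items():
        print(key, value)
```

Because the genus, species and identifier are explicit elements rather than free text, the same function could be run over many treatments to answer questions such as "which species are described as new?" without anyone rereading the prose.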
