Setting the Scene for ViBRANT – Strategy, Philosophy and Communication

  1. 1. The Future of Scientific Publishing. Donat Agosti (Plazi, Bern), 21 January 2011, Paris
  2. 2. I don't know the future, but I have a dream…
  3. 3. Immersing in the knowledge
  4. 4. I want to ask a publication a question, not have the author tell me what I have to read.
  5. 5. I want to find out: How many and which species are there? How are they related? Do they disappear? How are they distributed?
  6. 6. I want to find out: How many and which species are there? How are they related? Do they disappear? Other people have different interests.
  7. 7. An example from the Neurocommons text mining pilot: PubMed abstracts: >16,000,000; CNS classified abstracts: 874,727; text mining recognized: 368,688; text mining processed: 94,381; extracted graph of 30,000+ relationships and 5,500 genes and proteins. “Protein-protein interaction networks” (John Wilbanks, Neurocommons)
  8. 9. In a semantic Web environment (where machines talk to each other and do most of our work), data need to be able to talk to each other. “Protein-protein interaction networks” (John Wilbanks, Neurocommons). [Figure labels: 27,266 papers; 4,563 papers; 41,985 papers; 10,365 papers; 128,437 papers]
  9. 10. It will open up scientific literature for data mining. “Protein-protein interaction networks” (John Wilbanks, Neurocommons)
  10. 11. An example from the taxonomy text mining pilot (Taxon mining project): every year, >17,000 new species described; every year, >100,000 species redescribed; total journals with taxonomic content: >2,000; total species described: 1,900,000; total treatments: >20,000,000; text mining processed: 0; extracted graph of 0 species and 0 relationships.
  11. 12. 1996 Conservation, Phylogeny, Systematics, Curiosity, Aesthetics, Fascination
  12. 13. 2011 Experience, Frustration, Wonder, Excitement, Satisfaction, Determination
  13. 14. Modeling taxonomic literature: TaxonX Taxpub NLM DTD Plazi
  14. 15. Plazi workflow, with GoldenGate mark-up as an example: get LSIDs for names from the Hymenoptera Name Server (ZooBank?); add new names; get bibliographic metadata from HNS (MODS); get bibliographic GUIDs from bioguid (or EDIT?); get geographic long/lat from geonames.org; get GUIDs for CBOL, NCBI, specimens, images, .....
  15. 16. The semantically enhanced treatments, extracted, stored on Plazi.org, and served in a human readable form, are linked to the underlying data: Fisher & Smith, 2008, PLoS ONE.
  16. 17. Plazi Search and Retrieval Server: access to data (TAPIR, SPM). [Diagram labels: You, You, You; human; machine]
  17. 18. The conversion comes at a cost, even though GoldenGate and other editors exist
  18. 19. Time per minute to produce clean OCR using ABBYY; publications in chronological order. Production metrics to measure effort and compare various approaches and algorithms.
  19. 20. How to mark up a large body of legacy publications? In-house? Build or use commercial services? Use the community, e.g. volunteers? [Figure labels: Activation energy; Gutenberg; Semantic Web; Cost per knowledge]
  20. 21. Training and demos...
  21. 22. Avoid it
  22. 23. Prospective publications: Zookeys / Phytokeys
  23. 24. Semantic enhancements to published texts
  24. 25. 2036 ?
  25. 26. Why do we publish?
  26. 27. Publicly funded research
  27. 28. Contribute to the welfare of the nations…
  28. 29. Dissemination
  29. 30. Access
  30. 31. Before antbase.org, Harvard's Museum of Comparative Zoology could claim to be the only location with a complete set of ant systematics publications from 1758 to the present. Through antbase.org's digital library, this body of literature is accessible worldwide, and it is actively used (>10,000 visits in a single month).
  31. 32. Access to ant taxonomic publications through antbase.org / Smithsonian Institution, currently including the entire body of non-copyrighted publications since 1758 (>4,000 publications or 85,000 pages)
  32. 33. The Biodiversity Heritage Library is currently digitizing and making accessible >100 million pages, most of them out of copyright, i.e. older than 1925. ........ to be finished in 2048...
  33. 34. What is a publication from publicly funded science?
  34. 36. Open Access
  35. 37. What is a scientific publication? Print, journal, article, treatment, public funding, pdf, xml. Tool to disseminate scientific knowledge.
  36. 38. Why do we publish the way we publish?
  37. 39. What kind of publications serve our needs?
  38. 40. IPBES
  39. 41. Access
  40. 42. Beyond the PDF
  41. 43. Access to what?
  42. 44. Scratchpad, EOL page, Wikipage, species page
  43. 45. Treatment
  44. 46. Treatments come with a lot of overhead
  45. 47. The structure of a systematics publication. Publication: Title, Author, Abstract, Introduction, Taxon descriptions, Suppl. Materials, Acknowledgments, References. Genus: Diagnosis, Notes, Biology, Distribution, Key to sp., Species descriptions. Species treatments: Species 1, Species 2, Species 3, Species 4, Species .., Species n. Species 1: Nomenclature, Diagnosis, Distribution, Material Examined, Comments, Description, Graphic art.
  46. 48. Treatments come with a lot of overhead. Treatments are highly structured.
  47. 49. The structure of a systematics publication. Publication: Title, Author, Abstract, Introduction, Taxon descriptions, Suppl. Materials, Acknowledgments, References. Genus: Diagnosis, Notes, Biology, Distribution, Key to sp., Species descriptions. Species treatments: Species 1, Species 2, Species 3, Species 4, Species .., Species n. Species 1: Nomenclature, Diagnosis, Distribution, Material Examined, Comments, Description, Graphic art.
  48. 50. Treatments come with a lot of overhead. Treatments are highly structured. Content is defined.
  49. 51. Treatments come with a lot of overhead. Treatments are highly structured. Content is defined. XML can define it.
  50. 52. This can also be applied to entire sections of text, such as the descriptions of a species and its parts. <tax:treatment> <tax:nomenclature> <tax:name> <tax:xid source="HNS" identifier="193329"/> <tax:xmldata> <dc:Genus>Mystrium</dc:Genus> <dc:Species>leonie</dc:Species> </tax:xmldata> Mystrium leonie </tax:name> <tax:status>n. sp.</tax:status> Fig 1 D - F </tax:nomenclature> <tax:div type="description"> <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL 1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin strongly curving to a sharp apical tooth, the apex parallel to the anterior clypeal margin. (Holotype with material in mandibles, so mandibles and anterior clypeus $ described below from paratypes.) Median clypeus .... </treatment>
  51. 53. Treatments come with a lot of overhead. Treatments are highly structured. Content is defined. XML defines them. The question is how to get them.
  52. 54. Mark-up of legacy publications
  53. 55. $$$$$$$$$$$$$$$$$
  54. 56. Prospective semantic mark-up and linking to external sources is the future
  55. 57. Treatment repository + external resources
  56. 58. BHL-Modern
  57. 59. The future is writable.
  58. 60. Happy Birthday! January 15, 2001
  59. 61. What is a scientific publication? Wikipedia entry as a publication?
  60. 62. Quality control
  61. 63. What is a scientific publication? Centrifugal versus centripetal forces or are we attractive enough?
  62. 64. Continuity
  63. 65. $$$$$$$
  64. 66. http://plazi.org Thank you very much! Donat Agosti [email_address]
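
The treatment mark-up on slide 52 becomes machine-queryable once it is well formed. The following is a minimal Python sketch of that idea, not Plazi's actual tooling. It uses an abridged copy of the slide's fragment, closes the elements that the slide leaves truncated, and declares placeholder namespace URIs (the slide shows no declarations, so the example.org values are assumptions). The function name summarize_treatment is hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs: the slide does not show the real declarations,
# so these values are assumptions for illustration only.
NS = {
    "tax": "http://example.org/taxonx",
    "dc": "http://example.org/dc",
}

# Abridged, well-formed version of the fragment on slide 52.
TREATMENT_XML = """
<tax:treatment xmlns:tax="http://example.org/taxonx"
               xmlns:dc="http://example.org/dc">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93.</tax:p>
  </tax:div>
</tax:treatment>
"""


def summarize_treatment(xml_text):
    """Extract the name, status, external identifier and section types."""
    root = ET.fromstring(xml_text.strip())
    name = root.find("tax:nomenclature/tax:name", NS)
    xid = name.find("tax:xid", NS)
    return {
        "genus": name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
        "species": name.findtext("tax:xmldata/dc:Species", namespaces=NS),
        "status": root.findtext("tax:nomenclature/tax:status", namespaces=NS),
        "id_source": xid.get("source"),
        "identifier": xid.get("identifier"),
        "sections": [div.get("type") for div in root.findall("tax:div", NS)],
    }


if __name__ == "__main__":
    for key, value in summarize_treatment(TREATMENT_XML).items():
        print(f"{key}: {value}")
```

Because the genus, species and identifier sit in their own elements rather than in running text, a question such as "which treatments describe a new species of Mystrium?" reduces to a lookup over these fields instead of a reading task.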
