20110725 ibc xml


This is the preprint of my lecture at the International Botanical Congress (http://www.ibc2011.com/).

Published in: Technology
2 Comments
  • Now, watching the documentary about Aaron Swartz, 'The Internet's Own Boy' (https://archive.org/details/TheInternetsOwnBoyTheStoryOfAaronSwartz), it makes me angry that I gave in to this censorship by the organizer of the symposium. The reason given was that the director of the Mellon Foundation, who was supporting JSTOR, would be sitting in the back of the room and might not like it. The consequence could have been that he would not continue funding the organizers of the meeting (funding that ended anyway a year later due to a thematic shift in the Mellon Foundation).
    I still believe it is a big mistake to lock our knowledge of biodiversity behind paywalls. It needs to be free. And we have to be able to build ways to make this happen, just as we accept that we need free access to air to breathe.
  • There was an argument at the IBC that slide 7 is inflammatory. The slide as shown here does have a problem: it is not displayed properly, because it is missing the citation that was in the original version I uploaded both to SlideShare and to the conference site. The missing text is the citation for the quote, which is an essential part of the slide because it shows one of the extremes of the discussion being held in public. The citation is 'Hufpost Politics, July 19, 2011 http://www.huffingtonpost.com/2011/07/19/huffpost-hill----gang-vio_n_904027.html'



  1. A Schema for Description and Exchange of Taxonomic Publication's Content
     Donat Agosti, Terry Catapano, Lyubomir Penev & Guido Sautter
     Plazi, Bern, Switzerland
     25 July 2011, IBC, Melbourne
  2. WHY?
  3. disseminate
     access
     knowledge
  4. New York Times, July 19, 2011
  5. “JSTOR's the one that should be in prison, man, for locking up knowledge.”
     Hufpost Politics, July 19, 2011
     http://www.huffingtonpost.com/2011/07/19/huffpost-hill----gang-vio_n_904027.html
  6. Open
     Access
  7. An example from the Neurocommons text mining pilot:
       • PubMed abstracts: > 16,000,000
       • CNS classified abstracts: 874,727
       • text mining recognized: 368,688
       • text mining processed: 94,381
       • extracted graph of 30,000+ relationships and 5,500 genes and proteins
     “protein-protein interaction networks”
     John Wilbanks, Neurocommons
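The counts on this slide form a funnel, and the share that survives each stage can be worked out directly. The sketch below uses only the four abstract counts from the slide; the function name `retention` is illustrative. Because the slide gives the PubMed total as "> 16,000,000", the figure is treated here as a lower bound, so the shares relative to it are upper bounds.

```python
# Counts copied from the slide; 16,000,000 is a lower bound ("> 16,000,000").
funnel = [
    ("PubMed abstracts", 16_000_000),
    ("CNS classified abstracts", 874_727),
    ("text mining recognized", 368_688),
    ("text mining processed", 94_381),
]


def retention(stages):
    """Return (label, count, share_of_previous, share_of_first) per stage."""
    first = stages[0][1]
    previous = first
    rows = []
    for label, count in stages:
        rows.append((label, count, count / previous, count / first))
        previous = count
    return rows


for label, count, step, overall in retention(funnel):
    print(f"{label:26s} {count:>12,d}  {step:7.1%} of previous  {overall:6.2%} of all")
```

Read this way, well under one percent of the abstracts reach the processed stage, which is the point the slide makes about how little of the literature is currently usable by machines.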
  12. 27,266 papers
      128,437 papers
      41,985 papers
      4,563 papers
      10,365 papers
      In a semantic Web environment (where machines talk to each other and do most of our work), data need to be able to talk to each other:
      “protein-protein interaction networks”
      John Wilbanks, Neurocommons
  13. “protein-protein interaction networks”
      John Wilbanks, Neurocommons
      It will open up scientific literature for data mining
  14. HOW?
  15. access for human AND machine
  16. It is about digesting millions of pages:
      >>100 M pages taxonomic literature
      25M scientific publications / year
      25K journals
      >2K with zoological taxonomic descriptions
      18K descriptions of new species / year
  17. PDF is not enough
  18. data and information in context
  19. semantic markup
  20. context of content
  21. XML: eXtensible Markup Language
  22. <tax:treatment>
        <tax:nomenclature>
          <tax:name>
            <tax:xid source="HNS" identifier="193329"/>
            <tax:xmldata>
              <dc:Genus>Mystrium</dc:Genus>
              <dc:Species>leonie</dc:Species>
            </tax:xmldata>
            Mystrium leonie Bihn &amp; Verhaagh, new species
          </tax:name>
          <tax:status>n. sp.</tax:status>
          Fig 1 D - F
        </tax:nomenclature>
        <tax:div type="description">
          <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL
          1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin strongly curving
          to a sharp apical tooth, the apex parallel to the anterior clypeal margin.
          (Holotype with material in mandibles, so mandibles and anterior clypeus
          described below from paratypes.) Median clypeus
          ....
      </tax:treatment>
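To illustrate what "access for machine" means for markup like this, here is a minimal sketch that reads the name block of the treatment with Python's standard `xml.etree.ElementTree`. The slide shows only the `tax:` and `dc:` prefixes, so both namespace URIs below are assumptions (the `tax` one is a pure placeholder), and the elided description is left out so that the fragment is well-formed.

```python
import xml.etree.ElementTree as ET

# Assumed prefix bindings; the slide does not show namespace declarations.
NS = {
    "tax": "http://example.org/taxonx",        # placeholder URI, not the real schema
    "dc": "http://purl.org/dc/elements/1.1/",  # assumed binding for the dc: prefix
}

TREATMENT = """
<tax:treatment xmlns:tax="http://example.org/taxonx"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie Bihn &amp; Verhaagh, new species
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
</tax:treatment>
"""


def read_name(xml_text):
    """Extract the normalized name, its external identifier and its status."""
    root = ET.fromstring(xml_text.strip())
    name = root.find("tax:nomenclature/tax:name", NS)
    xid = name.find("tax:xid", NS)
    return {
        "genus": name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
        "species": name.findtext("tax:xmldata/dc:Species", namespaces=NS),
        "source": xid.get("source"),
        "identifier": xid.get("identifier"),
        "status": root.findtext("tax:nomenclature/tax:status", namespaces=NS),
    }


print(read_name(TREATMENT))
```

The same information is present in a PDF of the paper, but only as typeset text; with the markup, a program can ask for the genus, the species and the identifier without guessing.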
  23. content in a complex e-environment
  24. linking
  25. Azteca instabilis
      Would then read like
      <tax:name>
        <!-- Link to external database -->
        <tax:xid source="LSID" identifier="urn:lsid:biosci.ohio-state.edu.osuc_concetps:13452"/>
        <!-- Normalization of data -->
        <tax:xmldata>
          <dc:Genus>Azteca</dc:Genus>
          <dc:Species>instabilis</dc:Species>
        </tax:xmldata>
        Azteca instabilis
      </tax:name>
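The step from the plain string to the marked-up name can be sketched as a small function. This is an illustration only, not the GoldenGate implementation: `mark_up_name` is a hypothetical helper, the `tax` namespace URI is a placeholder, the `dc` binding is assumed, and the name is split naively on whitespace, which only works for simple binomials. The identifier is copied verbatim from the slide.

```python
import xml.etree.ElementTree as ET

TAX = "http://example.org/taxonx"        # placeholder URI, not the real schema
DC = "http://purl.org/dc/elements/1.1/"  # assumed binding for the dc: prefix
ET.register_namespace("tax", TAX)
ET.register_namespace("dc", DC)


def mark_up_name(verbatim, source, identifier):
    """Wrap a binomial in a tax:name element with a link and normalized parts."""
    genus, species = verbatim.split()[:2]  # naive split, simple binomials only
    name = ET.Element(f"{{{TAX}}}name")
    # Link to an external database.
    ET.SubElement(name, f"{{{TAX}}}xid", {"source": source, "identifier": identifier})
    # Normalization of data.
    data = ET.SubElement(name, f"{{{TAX}}}xmldata")
    ET.SubElement(data, f"{{{DC}}}Genus").text = genus
    ET.SubElement(data, f"{{{DC}}}Species").text = species
    # The original string stays in place as text after the normalized block.
    data.tail = verbatim
    return name


element = mark_up_name(
    "Azteca instabilis",
    "LSID",
    "urn:lsid:biosci.ohio-state.edu.osuc_concetps:13452",
)
print(ET.tostring(element, encoding="unicode"))
```

Keeping the verbatim string next to the normalized fields means the human-readable text is unchanged while machines get unambiguous values.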
  26. definition of XML tags
      DTD
      schema
  27. transformations from XML
      html
      print
      pdf
      archiving
      rdf
      database
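Two of the targets listed on this slide, html and database, can be sketched from a single marked-up name. This is a minimal illustration under stated assumptions: the namespace URIs are the same placeholder and assumed bindings as elsewhere, and the HTML class name, the `data-` attributes and the `taxon_name` table are all invented for the example. A real pipeline would more likely use XSLT.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html import escape

# Assumed prefix bindings; the tax URI is a placeholder.
NS = {"tax": "http://example.org/taxonx", "dc": "http://purl.org/dc/elements/1.1/"}

NAME_XML = (
    '<tax:name xmlns:tax="http://example.org/taxonx" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<tax:xid source="HNS" identifier="193329"/>'
    "<tax:xmldata><dc:Genus>Mystrium</dc:Genus>"
    "<dc:Species>leonie</dc:Species></tax:xmldata>"
    "Mystrium leonie</tax:name>"
)


def to_record(xml_text):
    """Flatten a tax:name element into a plain dictionary."""
    name = ET.fromstring(xml_text)
    xid = name.find("tax:xid", NS)
    return {
        "genus": name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
        "species": name.findtext("tax:xmldata/dc:Species", namespaces=NS),
        "source": xid.get("source"),
        "identifier": xid.get("identifier"),
    }


def to_html(record):
    """Render the name for human readers, keeping the link as data attributes."""
    label = escape(f"{record['genus']} {record['species']}")
    return (
        f'<i class="taxon-name" data-source="{escape(record["source"])}" '
        f'data-identifier="{escape(record["identifier"])}">{label}</i>'
    )


def to_database(record, connection):
    """Store the same record as a row (hypothetical table layout)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS taxon_name "
        "(genus TEXT, species TEXT, source TEXT, identifier TEXT)"
    )
    connection.execute(
        "INSERT INTO taxon_name VALUES (:genus, :species, :source, :identifier)",
        record,
    )
    connection.commit()


record = to_record(NAME_XML)
print(to_html(record))
connection = sqlite3.connect(":memory:")
to_database(record, connection)
```

Both outputs come from the same XML source, which is the argument for marking up once and transforming many times.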
  28. legacy
      TaxonX
      TaxPub
      prospective
  29. how to use XML?
  30. legacy publications
  31. Plazi workflow: GoldenGate editor based mark up and linking
      - Get LSID from Hymenoptera Name Server for names; ZooBank?
          • Add new names
      - Get bibliographic GUIDs from bioguid (or EDIT?)
      - Get bibliographic metadata from HNS (MODS)
          • Get GUIDs for CBOL, NCBI, specimen, images, .....
      - Get geographic long/lat from geonames.org
      Legacy publications
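The workflow on this slide is a series of lookups that attach external identifiers to a marked-up treatment. The sketch below shows only that shape. It is not the GoldenGate code and it calls no real service: `Treatment`, `enrich` and the lookup tables are hypothetical stand-ins for the Hymenoptera Name Server, bioguid and geonames.org, and the only identifier used is the LSID shown on the earlier linking slide. The citation and locality strings are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Treatment:
    taxon_name: str
    citation: str
    locality: str
    links: dict = field(default_factory=dict)


def enrich(treatment, resolvers):
    """Run each lookup in order; a miss is recorded as None for manual editing."""
    for key, (attribute, lookup) in resolvers.items():
        treatment.links[key] = lookup(getattr(treatment, attribute))
    return treatment


# Stand-in lookup tables. A real workflow would query the external services.
NAME_IDS = {
    # Identifier copied verbatim from the linking slide.
    "Azteca instabilis": "urn:lsid:biosci.ohio-state.edu.osuc_concetps:13452",
}
CITATION_IDS = {}  # would hold bibliographic GUIDs
COORDINATES = {}   # would hold (longitude, latitude) pairs

resolvers = {
    "name_lsid": ("taxon_name", NAME_IDS.get),
    "citation_guid": ("citation", CITATION_IDS.get),
    "coordinates": ("locality", COORDINATES.get),
}

treatment = enrich(
    Treatment("Azteca instabilis", "placeholder citation", "placeholder locality"),
    resolvers,
)
unresolved = [key for key, value in treatment.links.items() if value is None]
print(treatment.links)
print("needs manual editing:", unresolved)
```

Recording misses explicitly matters for legacy publications, where many names, citations and localities will not resolve automatically and have to be handled in the editor.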
  37. linked data
  38. last resort
  39. prospective publications
  40.
  41. the future
  42. dissemination - access
  43. Plazi:
      access to treatments
      TAPIR, SPM, etc.
      You
      human
      machine
  44. “protein-protein interaction networks”
      John Wilbanks, Neurocommons
      It will open up scientific literature for data mining and extraction
  45. http://plazi.org
      Thank you very much!
      Donat Agosti, Terry Catapano, Lyubomir Penev & Guido Sautter
      agosti@plazi.org
  46. JSTOR did not permit users:
      c. to make other than personal use of individually downloaded articles.
      Aaron Swartz indictment, July 14, 2011
