Biodiversity Informatics Course Presentation

Slides from a presentation to Biodiversity Informatics course, Stockholm, 16-09-2009

Published in: Education, Technology
4 Comments
  • This is a great overview with real world examples that we're wrangling with now. Great to see your work towards solving many of these issues too. Also, +1 for the Rumsfeld quote, it's worth repeating...
  • ok, will pay 'cool'.
    was using quote compilation sites like:
    http://www.brainyquote.com/quotes/authors/n/nicolaus_copernicus_2.html
    http://thinkexist.com/quotation/to_know_that_we_know_what_we_know-and_to_know/201808.html
    etc.
    Real Media? you are kidding me! #bbc #fail
  • Yeah but Rumsfeld is cooler. Plus we know he said this (http://news.bbc.co.uk/media/audio/38078000/rm/_38078601_rummer.ram), whereas Googling suggests it's not clear who said 'To know that we know what we know, and to know that we do not know what we do not know, that is true knowledge'.
  • slide 14 - you might like to contrast the oft-quoted ramblings of Rumsfeld with the more erudite, succinct and poetic 'To know that we know what we know, and to know that we do not know what we do not know, that is true knowledge' of Copernicus.

Notes
  • This ant illustrates a case where three different data sources (NCBI, AntWeb, journal supplementary material) are needed to discover that GenBank does, in fact, have sequences for this ant.
  • If you search NCBI for “Melissotarsus insularis” you find nothing. If you search AntWeb you find some specimens, one of which is CASENT0107663-D01. In the Phil. Trans. barcoding paper, the supplementary material (also online in BoLD) shows that CASENT0107663-D01 has been sequenced, yielding a COI sequence with accession number DQ176312. If you go back to GenBank and look up that accession number you discover the taxon “Melissotarsus sp. BLF m1”, which must be the same as Melissotarsus insularis. Hence GenBank should actually say “yes, I have information on Melissotarsus insularis”. There is latent knowledge in these data sources that we miss if they remain in ignorance of each other.
  • http://www.wired.com/images/article/magazine/1610/ff_barcodeoflife4_f.jpg
  • Leptotyphlops carlae
  • http://species.asu.edu/2009_species04
  • ~/Desktop/GrandChallenge/Data/DVD/LAB0370A/10557903/00420003/06003691/main.xml
  • Citation, user sees bibliography and may be able to follow links
  • Data citation with PageRank scores
  • http://www.flickr.com/photos/bastique/639784702/ by bastique
  • History flow visualisation, after Jeff Atwood’s animated GIF
  • Afrotheria
  • EOL is like Wikipedia, but not. This difference may prove its downfall.
  • For the species in Wikipedia I asked what web site comes top of the Google search for that name. Wikipedia dominates the search ranking. There really is only one game in town.
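
The Melissotarsus insularis note above describes following shared identifiers from one source to the next. A minimal sketch of that chain in Python is given here. The three dictionaries are toy stand-ins for AntWeb, the barcoding paper's supplementary material, and GenBank; their layout and the function name `find_sequences` are assumptions made for illustration, and only the identifiers themselves come from the note.

```python
# Toy stand-ins for the three sources (hypothetical layout; identifiers from the slide note).
antweb_specimens = {"Melissotarsus insularis": ["CASENT0107663-D01"]}  # name -> specimen codes
barcode_supplement = {"CASENT0107663-D01": "DQ176312"}                 # specimen -> accession
genbank_taxon = {"DQ176312": "Melissotarsus sp. BLF m1"}               # accession -> GenBank label


def find_sequences(name):
    """Return (specimen, accession, GenBank label) triples reachable from a taxon name."""
    hits = []
    for specimen in antweb_specimens.get(name, []):
        accession = barcode_supplement.get(specimen)
        if accession is None:
            continue  # specimen was never sequenced, so the chain stops here
        hits.append((specimen, accession, genbank_taxon.get(accession)))
    return hits


print(find_sequences("Melissotarsus insularis"))
```

No single dictionary answers the question on its own; the link only appears once the specimen code and the accession number are used as join keys.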
  • Biodiversity Informatics Course Presentation

    1. 1. Biodiversity Informatics
    2. 4. Ideas: Linking, Mashups, Data mining, RSS, Identifiers, Errors, Wikis
    3. 5. Linking
    4. 6. Apomys datae
    5. 9. Apomys specimen
    6. 11. How do we integrate these data?
    7. 12. Why integrate?
    8. 13. Learn stuff we don’t know
    9. 14. There are known knowns, things we know that we know. There are known unknowns, things we now know we don’t know. But there are also unknown unknowns, things we do not know we don't know.
    10. 15. Unknown knowns
    11. 16. Things we know …without knowing that we know
    12. 17. Melissotarsus insularis
    13. 18. Melissotarsus insularis (no hit) → CASENT0107663-D01 → DQ176312 → Melissotarsus sp. BLF m1; hence Melissotarsus insularis = Melissotarsus sp. BLF m1
    14. 19. No one source has all the answers
    15. 20. Joining the dots
    16. 21. Mashups
    17. 23. Single source
    18. 24. Many sources
    19. 26. Combine sources
    20. 28. ispecies.org
    21. 31. Merge things your way
    22. 32. Don’t like iSpecies?
    23. 33. Make your own!
    24. 36. Data mining
    25. 38. Text mining
    26. 39. Morphological and molecular description of Haematoloechus meridionalis n. sp. (Digenea: Plagiorchioidea: Haematoloechidae) from Rana vaillanti brocchi of Guanacaste, Costa Rica; Halipegus eschi n. sp. (Digenea: Hemiuridae) in Rana vaillanti from Guanacaste Province, Costa Rica; Haematoloechus danbrooksi n. sp. (Digenea: Plagiorchioidea) from Rana vaillanti from Los Tuxtlas, Veracruz, Mexico
    27. 40. RSS
    28. 42. Visualising biodiversity digitisation in real time
    29. 43. gathering new data…
    30. 44. discovering new species…
    31. 45. publishing papers…
    32. 46. Some of this knowledge is being broadcast using RSS
    33. 47. We want RSS feeds that: have timestamps, are georeferenced, have taxonomic names as tags
    34. 48. like Geo RSS: geotagged (latitude, longitude, woeid), taxonomic name (machine tags), timestamp
    35. 49. But what if no RSS?
    36. 50. We can make it ourselves: http://bioguid.info/rss. Web page → secret sauce (= screen scraping) → RSS
    37. 51. Then add tags using services: georeferencing, taxonomic names
    38. 52. Now we have RSS…
    39. 53.
    40. 54. … is anybody listening?
    41. 55. Challenge: aggregate and display RSS. Merge RSS feeds, add missing georeferencing and taxonomic names. Display where, when, what.
    42. 56. http://bioguid.info/ebio09/www/3d Visualising biodiversity digitisation in real time
    43. 57. Identifiers
    44. 58. Digital Object Identifier (DOI)
    45. 60. Identifies a publication
    46. 61. Globally unique
    47. 62. 10.1016/j.ympev.2006.04.006
    48. 63. Paper
    49. 64. Why have DOIs?
    50. 65. Link rot
    51. 66. Refs
    52. 69. Cites 2006 2006
    53. 70. Forward Cites 2006 2009
    54. 71. Shoulders of giants
    55. 72. progress is incremental
    56. 73. reuse past results
    57. 74. Forward Cites 2006 2008
    58. 76. Species Genes
    59. 77. data linking
    60. 78. data citation
    61. 80. Need tools to: resolve identifiers, create new identifiers, find existing identifiers
    62. 81. http://bioguid.info/openurl/
    63. 82. Errors
    64. 83. http://iphylo.org/~rpage/challenge
    65. 84. demo
    66. 85. The Carmen Electra argument for Open Access
    67. 86. reuse data
    68. 87. Electra pilosa
    69. 88. Carmen Electra versus Electra
    70. 89. reuse data
    71. 90. Homo sapiens
    72. 91. AJ711044
    73. 92. should be AJ971044
    74. 93. how do I fix this error?
    75. 94. Closed
    76. 95. Can’t easily fix
    77. 96. Open…
    78. 97. … and editable
    79. 98. Anybody could fix it
    80. 99. Wikis
    81. 100. Wikis
    82. 101. Versions 1, 2, 3, 4: history flow
    83. 102. Afrotheria
    84. 103. EOL
    85. 105. Semantic wikis (or, what’s wrong with Wikipedia?)
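
Slides 47 to 55 set out a challenge: merge RSS feeds, find the items that lack georeferencing or taxonomic names, and display where, when, and what. A rough sketch of the merging step in Python, using only the standard library, is given here. The two inline feeds are invented sample data, and encoding taxonomic names as `<category>` elements with a `taxonomy:binomial=` machine tag is an assumption for illustration; the slides do not specify how the tags are carried. The `georss:point` element follows the GeoRSS-Simple convention of "latitude longitude".

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

GEORSS_POINT = "{http://www.georss.org/georss}point"  # GeoRSS-Simple point element
TAXON_PREFIX = "taxonomy:binomial="                   # assumed machine-tag convention

# Invented sample feeds: one fully tagged, one with no location or taxon.
FEED_A = """<rss version="2.0" xmlns:georss="http://www.georss.org/georss"><channel>
<item><title>New specimen record</title>
<pubDate>Wed, 16 Sep 2009 10:00:00 GMT</pubDate>
<georss:point>59.33 18.06</georss:point>
<category>taxonomy:binomial=Electra pilosa</category></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel>
<item><title>New paper</title>
<pubDate>Wed, 16 Sep 2009 12:30:00 GMT</pubDate></item>
</channel></rss>"""


def parse_items(xml_text):
    """Extract what / when / where / taxa from each item in one RSS 2.0 feed."""
    items = []
    for item in ET.fromstring(xml_text).iter("item"):
        point = item.findtext(GEORSS_POINT)
        tags = [c.text for c in item.findall("category") if c.text]
        items.append({
            "what": item.findtext("title"),
            "when": parsedate_to_datetime(item.findtext("pubDate")),
            "where": tuple(float(x) for x in point.split()) if point else None,
            "taxa": [t[len(TAXON_PREFIX):] for t in tags if t.startswith(TAXON_PREFIX)],
        })
    return items


def merge_feeds(*feeds):
    """Combine items from several feeds, newest first."""
    merged = [entry for feed in feeds for entry in parse_items(feed)]
    return sorted(merged, key=lambda e: e["when"], reverse=True)


merged = merge_feeds(FEED_A, FEED_B)
# Items still missing a location or a taxonomic name are candidates for the tagging services.
needs_tagging = [e["what"] for e in merged if e["where"] is None or not e["taxa"]]
print(needs_tagging)
```

Items listed in `needs_tagging` are the ones a georeferencing or taxonomic name service would have to fill in before they could be placed on a map or timeline.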
