Biodiversity Informatics Course Presentation

Slides from a presentation to the Biodiversity Informatics course, Stockholm, 16-09-2009

Published in: Education, Technology
4 Comments
  • This is a great overview with real world examples that we're wrangling with now. Great to see your work towards solving many of these issues too. Also, +1 for the Rumsfeld quote, it's worth repeating...
  • ok, will pay 'cool'.
    was using quote compilation sites like:
    http://www.brainyquote.com/quotes/authors/n/nicolaus_copernicus_2.html
    http://thinkexist.com/quotation/to_know_that_we_know_what_we_know-and_to_know/201808.html
    etc.
    Real Media? you are kidding me! #bbc #fail
  • Yeah but Rumsfeld is cooler. Plus we know he said this (http://news.bbc.co.uk/media/audio/38078000/rm/_38078601_rummer.ram), whereas Googling suggests it's not clear who said 'To know that we know what we know, and to know that we do not know what we do not know, that is true knowledge'.
  • slide 14 - you might like to contrast the oft-quoted ramblings of Rumsfeld with the more erudite, succinct and poetic 'To know that we know what we know, and to know that we do not know what we do not know, that is true knowledge' of Copernicus.
Notes
  • This ant illustrates a case where three different data sources (NCBI, AntWeb, journal supplementary material) are needed to discover that, in fact, GenBank has sequences for this ant.
  • If you search NCBI for “Melissotarsus insularis” you find nothing. If you search AntWeb you find some specimens, one of which is CASENT0107663-D01. In the Phil. Trans. barcoding paper the supplementary material (also online in BoLD) shows that CASENT0107663-D01 has been sequenced, yielding a COI sequence with accession number DQ176312. If you go back to GenBank and look up the accession number you discover the taxon “Melissotarsus sp. BLF m1”, which must be the same as Melissotarsus insularis. Hence, GenBank should actually say “yes, I have information on Melissotarsus insularis”. There is latent knowledge in these data sources that we miss if they remain in ignorance of each other.
  • http://www.wired.com/images/article/magazine/1610/ff_barcodeoflife4_f.jpg
  • Leptotyphlops carlae
  • http://species.asu.edu/2009_species04
  • ~/Desktop/GrandChallenge/Data/DVD/LAB0370A/10557903/00420003/06003691/main.xml
  • Citation, user sees bibliography and may be able to follow links
  • Data citation with PageRank scores
  • http://www.flickr.com/photos/bastique/639784702/ by bastique
  • History flow visualisation, after Jeff Atwood’s animated GIF
  • Afrotheria
  • EOL is like Wikipedia, but not. This difference may prove its downfall.
  • For the species in Wikipedia I asked what web site comes top of the Google search for that name. Wikipedia dominates the search ranking. There really is only one game in town.
  • Biodiversity Informatics Course Presentation
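The note on Melissotarsus insularis above describes a chain of lookups: name to specimen, specimen to accession number, accession number to the name GenBank uses. A minimal sketch of that chain follows. It uses small in-memory tables in place of the real services, so the dictionary names and the `find_sequences` helper are hypothetical; only the identifiers quoted in the note appear as data.

```python
# Toy stand-ins for three sources. The structures are illustrative
# assumptions, not the real AntWeb, BoLD, or GenBank interfaces.

# AntWeb: taxon name -> specimen codes
ANTWEB_SPECIMENS = {
    "Melissotarsus insularis": ["CASENT0107663-D01"],
}

# Journal supplementary material: specimen code -> GenBank accession
SUPPLEMENT_SEQUENCES = {
    "CASENT0107663-D01": "DQ176312",
}

# GenBank: accession -> taxon name as recorded in GenBank
GENBANK_TAXA = {
    "DQ176312": "Melissotarsus sp. BLF m1",
}


def search_genbank_by_name(name):
    """Return accessions whose GenBank taxon name matches exactly."""
    return [acc for acc, taxon in GENBANK_TAXA.items() if taxon == name]


def find_sequences(name):
    """Follow name -> specimen -> accession -> GenBank taxon name."""
    hits = []
    for specimen in ANTWEB_SPECIMENS.get(name, []):
        accession = SUPPLEMENT_SEQUENCES.get(specimen)
        if accession is not None and accession in GENBANK_TAXA:
            hits.append((specimen, accession, GENBANK_TAXA[accession]))
    return hits


# A direct name search finds nothing; the linked search finds the sequence.
direct = search_genbank_by_name("Melissotarsus insularis")
linked = find_sequences("Melissotarsus insularis")
print(direct)
print(linked)
```

The direct search comes back empty while the linked search returns the specimen, the accession, and the provisional GenBank name, which is the "unknown known" the slides refer to.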

    1. Biodiversity Informatics
    4. Ideas: Linking, Mashups, Data mining, RSS, Identifiers, Errors, Wikis
    5. Linking
    6. Apomys datae
    9. Apomys specimen
    11. How do we integrate these data?
    12. Why integrate?
    13. Learn stuff we don’t know
    14. There are known knowns, things we know that we know. There are known unknowns, things we now know we don’t know. But there are also unknown unknowns, things we do not know we don’t know.
    15. Unknown knowns
    16. Things we know …without knowing that we know
    17. Melissotarsus insularis
    18. Diagram: Melissotarsus insularis (no hit) → CASENT0107663-D01 → DQ176312 → Melissotarsus sp. BLF m1, so Melissotarsus insularis = Melissotarsus sp. BLF m1
    19. No one source has all the answers
    20. Joining the dots
    21. Mashups
    23. Single source
    24. Many sources
    26. Combine sources
    28. ispecies.org
    31. Merge things your way
    32. Don’t like iSpecies?
    33. Make your own!
    36. Data mining
    38. Text mining
    39. Morphological and molecular description of Haematoloechus meridionalis n. sp. (Digenea: Plagiorchioidea: Haematoloechidae) from Rana vaillanti brocchi of Guanacaste, Costa Rica; Halipegus eschi n. sp. (Digenea: Hemiuridae) in Rana vaillanti from Guanacaste Province, Costa Rica; Haematoloechus danbrooksi n. sp. (Digenea: Plagiorchioidea) from Rana vaillanti from Los Tuxtlas, Veracruz, Mexico
    40. RSS
    42. Visualising biodiversity digitisation in real time
    43. gathering new data…
    44. discovering new species…
    45. publishing papers…
    46. Some of this knowledge is being broadcast using RSS
    47. We want RSS feeds that: have timestamps; are georeferenced; have taxonomic names as tags
    48. Like GeoRSS: geotagged (latitude, longitude, woeid); taxonomic name (machine tags); timestamp
    49. But what if no RSS?
    50. We can make it ourselves: http://bioguid.info/rss. Web page → secret sauce (= screen scraping) → RSS
    51. Then add tags using services: georeferencing, taxonomic names
    52. Now we have RSS…
    54. … is anybody listening?
    55. Challenge: aggregate and display RSS. Merge RSS feeds, add missing georeferencing and taxonomic names. Display where, when, what.
    56. http://bioguid.info/ebio09/www/3d (Visualising biodiversity digitisation in real time)
    57. Identifiers
    58. Digital Object Identifier (DOI)
    60. Identifies a publication
    61. Globally unique
    62. 10.1016/j.ympev.2006.04.006
    63. Paper
    64. Why have DOIs?
    65. Link rot
    66. Refs
    69. Cites 2006 2006
    70. Forward Cites 2006 2009
    71. Shoulders of giants
    72. progress is incremental
    73. reuse past results
    74. Forward Cites 2006 2008
    76. Species Genes
    77. data linking
    78. data citation
    80. Need tools to: resolve identifiers; create new identifiers; find existing identifiers
    81. http://bioguid.info/openurl/
    82. Errors
    83. http://iphylo.org/~rpage/challenge
    84. demo
    85. The Carmen Electra argument for Open Access
    86. reuse data
    87. Electra pilosa
    88. Carmen Electra versus Electra
    89. reuse data
    90. Homo sapiens
    91. AJ711044
    92. should be AJ971044
    93. how do I fix this error?
    94. Closed
    95. Can’t easily fix
    96. Open…
    97. … and editable
    98. Anybody could fix it
    99. Wikis
    100. Wikis
    101. Versions 1 2 3 4 History flow
    102. Afrotheria
    103. EOL
    105. Semantic wikis (or, what’s wrong with Wikipedia?)
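
The RSS slides (47 to 55) ask for feed items that carry a timestamp, a georeference, and a taxonomic name as a tag. The sketch below builds one such item with Python's standard library. It is an illustration under stated assumptions, not the code behind bioguid.info: the `make_item` helper, the example link, the coordinates, and the `taxonomy:binomial=` machine-tag form are all placeholders chosen here, and the woeid mentioned on slide 48 is left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

# GeoRSS namespace; registering it gives the "georss:" prefix on output.
GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)


def make_item(title, link, when, latitude, longitude, taxon):
    """Build one RSS <item> with a timestamp, a point, and a taxon tag."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # "When": RSS 2.0 expects an RFC 822 style date in pubDate.
    ET.SubElement(item, "pubDate").text = format_datetime(when)
    # "Where": GeoRSS Simple writes a point as "latitude longitude".
    ET.SubElement(item, f"{{{GEORSS_NS}}}point").text = f"{latitude} {longitude}"
    # "What": taxonomic name as a machine tag (assumed tag convention).
    ET.SubElement(item, "category").text = f"taxonomy:binomial={taxon}"
    return item


# Placeholder values: the link and coordinates are invented for illustration.
item = make_item(
    title="New record (placeholder)",
    link="http://example.org/record/1",
    when=datetime(2009, 9, 16, 12, 0, tzinfo=timezone.utc),
    latitude=16.0,
    longitude=121.0,
    taxon="Apomys datae",
)
print(ET.tostring(item, encoding="unicode"))
```

An aggregator of the kind slide 55 describes could read the same three fields back from each item to display where, when, and what.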