Leveraging an international infrastructure: Case studies from the Encyclopedia of Life


Presentation to the Taxonomic Databases Working Group 2012 meeting on 25 October 2012

Published in: Technology

  • As you may know, the Encyclopedia of Life is a web site providing global access to knowledge about life on Earth. Global: the whole world. Access: free, and freely re-usable. Knowledge: synthesized, not raw. Life on Earth: biological diversity.
  • EOL takes information from about 200 sources so far, mostly scientific databases, but also including Flickr and Wikipedia, and automatically sorts it onto taxon pages. Our curators can then trust or untrust it, and anybody can provide comments or ratings. About a thousand credentialed scientists have already volunteered to help with quality control. Actions and comments get fed back to the original providers, and the material on EOL is also available to other applications via an Application Programming Interface, which I’ll talk more about in a moment. We’re partnering with over two hundred scientific databases as well as public contribution sites like Flickr and Wikipedia.
      – 100+ partner databases
      – 700 curators, 1000s of contributors, 46,000 members
      – 2.8 million pages
      – 500 thousand pages with Creative Commons content
      – Over 2 million data objects and >1 million pages with links to research literature
      – Traffic in past year: 1.7 million unique users, 6.2 million page views
  • These numbers are a bit out of date now.
  • These are only the top subjects; there are many more. Subjects are almost all InfoItems from the TDWG Species Profile Model. “Multiple topics” includes several vague subjects like “Biology”, “TaxonBiology”, and “Description”.
  • These are only checklists that have more than one item.
  • Images are less than half the amount of text (1.37 million). Far fewer examples of videos and sounds, but these are expected to grow.
  • Not going to name names, except to say that the two to the right here with more 5-star ratings are Flickr and Wikipedia, and the ones on the left are the museums and specimens.
  • Full curators have credentials and have more power. Assistant curators do relatively more adding of common names, and are also more likely to add article text. Full curators are the only ones that can trust or untrust (red). Both spend a lot of time rating objects (1 to 5 stars). So far, few full curators are working with classifications.
  • One point: it would have been faster if Rod had just posted the comment directly. Another point: obviously it would be better if EOL regularly updated from ITIS, because it has been four months and we still don’t have the correction on EOL.
  • This was discovered on EOL by a curator. It was a slide in a Smithsonian botanist’s slide deck that an intern scanned and added to the specimen catalog. Luckily it was “identified” as “Undetermined”. Because it was spotted by an EOL curator, the museum was able to remove it from their catalog.

    1. Leveraging an international infrastructure: Case studies from the Encyclopedia of Life. Cynthia Parr, Katja Schulz, and Jennifer Hammock. TDWG 2012, Beijing, China, 25 October 2012. @cydparr @eol
    2. Outline
        • Briefly, who are we and why are we here?
        • The information landscape of species descriptions
        • Thoughts for the future
       Note: Rubenstein Fellows Proposals are due 15 November. See http://eol.org/info/Rubenstein_2013_competition
    3. EOL aggregates and curates. [Diagram labels: Aggregate; Curate; Comment, Rate, Collect; eol.org; Quality control; Third party apps]
    4. Why survey the landscape?
        • Improve standards and set goals
        • Prepare for text mining
        • Learn how best to support quality control
        • Baseline for improving multilingual, open content
        • Because we can
    5. >1.1 million taxon pages with content, from more than 200 providers and 1000s of individuals; 5 million content objects
    6. Total of 1,822,079 images, 9,586 videos, 28,569 sounds
    7. Number of text objects by subject. [Bar chart; axis: number of text objects, 0 to 800,000; subjects of text object shown: Distribution, TypeInformation, Habitat, Threats, Conservation, Trends, Associations, TrophicStrategy, PopulationBiology, Migration, LifeExpectancy, Behaviour, Diseases]
    8. Many user-created EOL collections are local checklists. Geographical checklists n=618; other checklists n=1662
    9. License restrictions vary by object type. [Stacked bar chart, 0% to 100%; licenses: public domain, cc-by, cc-by-sa, cc-by-nc, cc-by-nc-sa; object types: text, images, maps, videos, sounds; chart labels: n=~5 million, n=~3 million]
    10. EOL interface now in 12 languages, via translatewiki.org. [Map labels: Norway, Dutch, USA, Taiwan, Mexico, China, Egypt, India, Costa Rica, Colombia, Peru, Australia, South Africa]
    11. However… Vernacular names in 163 languages; text description objects in 17 languages
    12. Some providers get higher ratings than others. [Stacked bar chart, 0% to 100%; legend: 5 stars, 4 stars, 3 stars, 2 stars, 1 star] Total n = 154,308 rating actions. Showing only those 17 providers who got at least 1000 ratings
    13. Full curators down-rate and non-curators up-rate. [Stacked bar chart, 0% to 100%; ratings 1 to 5; groups: non-curators, assistant curators, full curators]
    14. Assistant (n=177) & full curators (n=984) are different. [Stacked bar chart, 0% to 100%; action types: common names, set exemplar, rating, taxon associations, add articles, classifications, trust/untrust; groups: assistant curators, full/master curators] 5,816 actions, 33 actions per assistant curator; 223,639 actions, 227 actions per full curator
    15. Quality control case studies: The case of “Panisopis”
        1. Rod spots the error on EOL and posts about it on his blog
        2. Cyndy reads the blog and posts it as a comment on EOL
        3. The EOL comment gets sent to ITIS automatically
        4. ITIS fixes its database
        5. EOL hasn’t yet updated from ITIS
    16. The case of the Far Side cartoon
    17. Conclusions
        • We’ve made a lot of progress
            – Large repository, many subjects
            – Great start on collections/checklists
            – Lots of CC-licenses
            – Lots of international partnerships and interface languages
            – Active curators
        • We’ve got plenty of room for more
            – Please share your ideas for the future
    18. Thanks to John D. and Catherine T. MacArthur Foundation, Alfred P. Sloan Foundation, Smithsonian Institution, Marine Biological Laboratory, Harvard University, David Rubenstein, and other funders and eol.org donors. All our users, content providers & global partners, especially the Chinese Academy of Sciences. @cydparr @eol
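
The per-curator averages quoted on slide 14 can be checked with a little arithmetic. The sketch below is only an illustration. It assumes, from the slide layout rather than from any stated pairing, that the 5,816 actions belong to the 177 assistant curators and the 223,639 actions to the 984 full curators. The names `CURATOR_ACTIONS` and `actions_per_curator` are hypothetical.

```python
# Figures copied from slide 14. The pairing of action totals with
# curator groups is an assumption based on the slide layout.
CURATOR_ACTIONS = {
    "assistant": {"curators": 177, "actions": 5_816},
    "full": {"curators": 984, "actions": 223_639},
}


def actions_per_curator(group: str) -> float:
    """Return the mean number of actions per curator for one group."""
    stats = CURATOR_ACTIONS[group]
    return stats["actions"] / stats["curators"]


if __name__ == "__main__":
    for group in CURATOR_ACTIONS:
        print(f"{group}: {actions_per_curator(group):.0f} actions per curator")
```

Under that assumption, the rounded results agree with the 33 and 227 actions per curator shown on the slide.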