Introduction to EOL.org for scientists

Introduction to eol.org

Cynthia Parr (@cydparr)
Semantic reasoning workshop
Washington, DC, 6-7 September 2012 (@eol)

Whirlwind tour
• What kind of information we have
• How we assemble that information
• How machines and people interact with EOL
• Next steps

>1.1 million taxon pages with content
from more than 200 providers and thousands of individuals
5 million content objects

Details tab

Leafy Seadragon example

Total of 1,344,711 images, 9,586 videos, 28,569 sounds

EOL has global partners and is internationalized
Labels shown: Norway, Dutch, USA, Taiwan, Mexico, China, Egypt, India, Costa Rica, Colombia, Peru, Australia, South Africa

From Moorea Biocode

EOL summarizes knowledge

Erosaria caputserpentis
Serpent's Head Cowrie

Depth range based on 51 specimens in 2 taxa.
Water temperature and chemistry ranges based on 40 samples.

Environmental ranges
Depth range (m): -5 - 67
Temperature range (°C): 23.011 - 28.496
Nitrate (umol/l): 0.048 - 0.923
Salinity (PPS): 33.821 - 35.837
Oxygen (ml/l): 4.349 - 4.825
Phosphate (umol/l): 0.088 - 0.228
Silicate (umol/l): 0.983 - 4.026

From GBIF
From OBIS

Erosaria caputserpentis
Serpent's Head Cowrie

Salinity envelope (n=40)

From OBIS

http://eol.org/pages/704102

Richness scores

Cynthia Parr Global Content Summit
Species Pages Group 17-19 Jan 2011

Whirlwind tour
• How we assemble that information
– Big picture
– Subject semantics
– Names infrastructure
– Curation
– Richness score
• Next steps

EOL aggregates and curates
Sources: scientific databases (including BHL, GBIF, ALA, INBio, COL, Scratchpads, LifeDesks) and scientific journals
[Diagram labels: Aggregate, Quality control, Curate, Comment, Rate, Collect, eol.org]

Sharing process adds semantics to content objects

SPM
DwC infoitem
description

Plinian Core
using Darwin Core Archive flat files as transport mechanism
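
A minimal sketch of what reading one such flat file could look like. It assumes a tab-delimited file with a header row; the column names are Darwin Core terms, but the sample rows and the helper name `read_flat_file` are illustrative only. A real archive also carries a meta.xml descriptor that says which column is which, and that part is not handled here.

```python
import csv
import io
from typing import Dict, List


def read_flat_file(text: str, delimiter: str = "\t") -> List[Dict[str, str]]:
    """Parse a delimited flat file with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


# Hypothetical two-row taxon file, for illustration only.
SAMPLE = (
    "taxonID\tscientificName\ttaxonRank\n"
    "t1\tPhyseter macrocephalus Linnaeus, 1758\tspecies\n"
    "t2\tErosaria caputserpentis\tspecies\n"
)

rows = read_flat_file(SAMPLE)
for row in rows:
    print(row["taxonID"], row["scientificName"])
```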

EOL v2

[Bar chart: number of text objects (axis 0 to 800,000) by subject of text object. Subjects shown: Distribution, Multiple topics, Habitat, Threats, Conservation, Trends, Associations, TrophicStrategy, PopulationBiology, Migration, LifeExpectancy, Behaviour, Diseases]

Content objects are associated with taxon names

Wikimedia Commons: Physeter macrocephalus

(note we actually have over 3.3 million named pages)

Names from different providers are matched
Physeter macrocephalus

Animal Diversity Web .... Physeter catodon Linnaeus, 1758
ARKive .................. Physeter macrocephalus Linné
BioPix .................. Physeter macrocephalus L.
INBio ................... Physeter catodon
IUCN .................... Physeter Macrocephalus
ITIS .................... Physeter macrocephalus Linnaeus, 1758
MarLIN .................. Physeter macrocephalus Linné
NCBI .................... Physeter Catodon
Species 2000 ............ Physeter macrocephalus Linnaeus, 1758
Taxon Concept ........... Physeter australasianus Desmoulins, 1822
Wikimedia Commons ....... Physeter macrocephalus
WORMS ................... Physeter macrocephalus Linnaeus 1758
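
A naive sketch of one matching step, not EOL's actual method: reduce each name string to a lower-cased genus plus epithet and group providers by the result. The function names are hypothetical, the sample is a subset of the list above, and the approach assumes simple binomials.

```python
from collections import defaultdict
from typing import Dict, List


def canonical_name(name_string: str) -> str:
    """Naive canonical form: first two words (genus, epithet), lower-cased.

    Assumes a plain binomial followed by optional authorship.
    """
    words = name_string.split()
    return " ".join(words[:2]).lower()


def group_by_canonical(names: Dict[str, str]) -> Dict[str, List[str]]:
    """Group provider names by the canonical form of their name string."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for provider, name_string in names.items():
        groups[canonical_name(name_string)].append(provider)
    return dict(groups)


# Subset of the provider list above.
PROVIDER_NAMES = {
    "Animal Diversity Web": "Physeter catodon Linnaeus, 1758",
    "ARKive": "Physeter macrocephalus Linné",
    "IUCN": "Physeter Macrocephalus",
    "ITIS": "Physeter macrocephalus Linnaeus, 1758",
    "NCBI": "Physeter Catodon",
    "Taxon Concept": "Physeter australasianus Desmoulins, 1822",
}

groups = group_by_canonical(PROVIDER_NAMES)
for name, providers in groups.items():
    print(name, providers)
```

String cleanup merges spelling, case, and authorship variants, but three groups remain: joining synonyms needs synonymy data, not normalization.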

Taxon concept pages: multiple hierarchies on Names tab

Problem: one taxon may have several names

Animal Diversity Web .... Physeter catodon Linnaeus, 1758
ARKive .................. Physeter macrocephalus Linné
BioPix .................. Physeter macrocephalus L.
INBio ................... Physeter catodon
IUCN .................... Physeter Macrocephalus
ITIS .................... Physeter macrocephalus Linnaeus, 1758
MarLIN .................. Physeter macrocephalus Linné
NCBI .................... Physeter Catodon
Species 2000 ............ Physeter macrocephalus Linnaeus, 1758
Taxon Concept ........... Physeter australasianus Desmoulins, 1822
Wikimedia Commons ....... Physeter macrocephalus
WORMS ................... Physeter macrocephalus Linnaeus 1758

Problem: the same name may apply to more than one taxon

EOL curation

• Trust or untrust taxon associations
• Add new taxon association
• Set preferred hierarchies
• Set preferred common names
• Leave comments

Coming: Taxonomic concept curation

EOL is not Wikipedia

…though we have more than 212,000 Wikipedia
articles and 115,000 Wikimedia images
Can’t currently edit within text objects

Whirlwind tour
• How machines and people interact with EOL
– API
– Third party apps
– Collections and communities
• Next steps

EOL enables machine interaction
[Diagram labels: Curate, Aggregate, Comment, Rate, Collect, eol.org, API, Third party apps]

Third party applications: eol.org/api

People interact with EOL content & each other

Collections

Communities

Studies currently underway with University of Maryland
• Cross-cultural study on motivation to engage in citizen science – Dana Rotman
• Interaction among scientists and non-scientists on EOL’s social network – Jae-wook Ahn
• Website traffic analysis to aid conservation communication – Yurong He and Bill Fagan

Using EOL collections to get computable data
Step 1: Search on EOL for organisms with characteristics of interest. Add each one to an EOL collection.
Step 2: Write a program using EOL API methods to retrieve the external database identifiers for the species in that collection.
Step 3: Add code to your program to retrieve data using external database APIs.
Step 4: Analyze, rinse, repeat.
From Arthur Chapman
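
A rough sketch of Step 2, under loud assumptions: the URL patterns and every JSON key below (`collection_items`, `object_type`, `object_id`, `provider_entries`, `provider`, `external_id`) are hypothetical placeholders, not the documented EOL API. Check eol.org/api for the real method names and response shapes before adapting it.

```python
import json
from typing import Callable, Dict, List
from urllib.request import urlopen

# ASSUMPTION: base URL and path patterns are illustrative only.
API_BASE = "http://eol.org/api"


def fetch_json(url: str) -> dict:
    """Download a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def taxon_page_ids(collection: dict) -> List[int]:
    """Pull taxon page ids out of a collection document.

    ASSUMPTION: hypothetical shape
    {"collection_items": [{"object_type": ..., "object_id": ...}]}.
    """
    return [
        item["object_id"]
        for item in collection.get("collection_items", [])
        if item.get("object_type") == "TaxonConcept"
    ]


def external_ids(page: dict) -> Dict[str, str]:
    """Map provider name to that provider's own identifier.

    ASSUMPTION: hypothetical keys "provider_entries", "provider", "external_id".
    """
    return {
        entry["provider"]: entry["external_id"]
        for entry in page.get("provider_entries", [])
    }


def identifiers_for_collection(
    collection_id: int,
    fetch: Callable[[str], dict] = fetch_json,
) -> Dict[int, Dict[str, str]]:
    """Step 2: external database identifiers for each taxon in a collection."""
    collection = fetch(f"{API_BASE}/collections/{collection_id}.json")
    return {
        page_id: external_ids(fetch(f"{API_BASE}/pages/{page_id}.json"))
        for page_id in taxon_page_ids(collection)
    }
```

Passing `fetch` as a parameter lets the same code run against canned responses while the real endpoints are being worked out. Step 3 would loop over the returned identifiers and call each external database's own API.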

Crowd-sourcing for computable data

Lovell and Libby Langstroth, CalPhotos; Foodwebs.org

Efforts underway
Phylogenetic trees: Collaboration with Open Tree of Life project for draft tree

Computable data challenge
http://eol.org/info/data_challenge
Rod Page’s Bionames project
Alexandria Archive Institute

Devries and Thessen using DBpedia Spotlight to extract associations among taxa and add to Linked Open Data cloud

Sloan 2 project: Marine computable data

TraitBank ABI proposal

Research wishes
• Collecting nominations for research ideas where EOL can help:
http://eol.org/info/wishes_for_research
DUE 15 SEPTEMBER

• Will follow with Rubenstein Fellows call for proposals

Thanks to
Our funders
John D. and Catherine T. MacArthur Foundation
Alfred P. Sloan Foundation
Smithsonian Institution
Marine Biological Laboratory
Harvard University
David Rubenstein
and other funders and donors

All our content providers and global partners

Volunteer curators and individual contributors via Flickr, Wikimedia, and members of EOL

Summary of EOL page richness
Overall
• 950,000 have content
• 2 % are rich
• ~22 % have only links to literature
Hot List
• 30 % of 75K are rich
• Average richness = ~30
Red Hot List
• 56 % of 3K are rich
• Average richness = 43

Long Tail in databases contributing to EOL
[Chart: number of taxa for which content is contributed to EOL (axis 0 to 600,000), with partners in order of # taxa contributed to EOL (1 to 131). Second panel: the same data viewed on log scale (axis 1 to 1,000,000).]

Taxon page richness algorithm

a (Breadth) + b (Depth) + c (Diversity)
Weights: a = 60%, b = 30%, c = 10%

Breadth: Images, topics of text objects, references, maps, videos, sounds, conservation status

Depth: # words per text object, # words total

Diversity: Sources (partners)

Score range 0 – 100, Threshold 40
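
A small sketch of the weighted sum above. The weights and the threshold of 40 come from the slide; how each component is scored is not given, so the code assumes each has already been normalized to 0..1. It also assumes a page counts as rich when its score is at or above the threshold. The class and function names are illustrative.

```python
from dataclasses import dataclass

# Weights and threshold as given on the slide.
WEIGHT_BREADTH = 0.60
WEIGHT_DEPTH = 0.30
WEIGHT_DIVERSITY = 0.10
RICH_THRESHOLD = 40.0


@dataclass
class ComponentScores:
    """ASSUMPTION: each component is pre-normalized to the range 0..1."""
    breadth: float
    depth: float
    diversity: float


def richness(scores: ComponentScores) -> float:
    """Weighted sum of the three components, scaled to 0..100."""
    for value in (scores.breadth, scores.depth, scores.diversity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("component scores must lie in 0..1")
    return 100.0 * (
        WEIGHT_BREADTH * scores.breadth
        + WEIGHT_DEPTH * scores.depth
        + WEIGHT_DIVERSITY * scores.diversity
    )


def is_rich(scores: ComponentScores) -> bool:
    """ASSUMPTION: 'rich' means a score at or above the threshold."""
    return richness(scores) >= RICH_THRESHOLD
```

For example, a page with breadth 0.5, depth 0.2, and diversity 0 scores 60 × 0.5 + 30 × 0.2 = 36, just under the threshold.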
