Global content summit: Overview, content partnering, richness

Cynthia Parr, Species Pages Group
Global Content Summit, 17-19 Jan 2011

http://www.eol.org
• All species known to science
• Freely accessible: open access, open source
• Available from a single portal in a common format
• Quality
• Constantly growing
• Aimed at multiple audiences

EOL Global Partners
[Map of partners: GBIF, ViBRANT, Dutch, Pan-Arab, China, Mexico, India, Costa Rica, Colombia, Peru, Australia, South Africa, BHL, BHL-Global]

Aims of global partners
Global access to knowledge about life on Earth
To increase awareness and understanding of living
nature through an Encyclopedia of Life that
gathers, generates and shares knowledge in an
open, freely accessible and trusted digital resource

Work together towards this vision and mission, sharing
expertise and knowledge as appropriate
Expand the global pool of knowledge about biodiversity and
improve access to it

Aims of this workshop
• Gather content experts from Global Partners
• Become familiar with each other’s work
• Learn how core EOL works and provide
feedback on it
• Form the Species Pages Working Group
Team at Smithsonian (SPG)
Representatives from global partners
• Draft individual plans that complement each
other towards a common goal
• Remind ourselves WHY we want to do this

What is content?
Biological information
Names and hierarchies
Descriptive text
Literature
Multimedia
Maps
Links to more information
… what about comments, collection annotations?

Overview of agenda

Day 1: Introductions
Day 2: Sharing
Day 3: Planning

Acknowledgements
• Funding from:
David M. Rubenstein gift
John D. and Catherine T. MacArthur Foundation
Alfred P. Sloan Foundation
Smithsonian Institution
Marine Biological Laboratory
Harvard University
and other funders and donors
• All our content partners and global partners
• Volunteer curators and individual contributors via Flickr, Wikimedia,
and members of EOL
• All of you for coming
• Claire Badgley

Overview of Content Partnering


EOL is a content curation community

[Diagram labels: sources (Databases, Journals, LifeDesks & Scratchpads, Public contributions); eol.org activities (Aggregate, Curate, Comment, Rate, Collect); Quality control, prioritization; API; Third party apps]

http://eol.org/content_partners

http://eol.org/info/content_partner_collections

Low hanging fruit

Photo credit: Stanislas PERRIN

Partner trajectory
[Chart: number of partners (axis 0 to 150) by quarter, Y1Q3 through Y4Q3]

Long Tail in databases contributing to EOL
[Chart: number of taxa for which content is contributed to EOL (axis 0 to 600,000) per partner, with partners in order of # taxa contributed to EOL (1 to 131)]
[Same chart viewed on log scale (axis 1 to 1,000,000)]

Content strategy
Highlights
Priorities
Richness score
Processes
Goals

Content Partner process overview
Partner creates an EOL member account
Adds a content partner
We communicate with them
They (or we) upload a resource file or set a
URL where one can be found
They set a harvest frequency
EOL harvests at that frequency

Current methods of data transfer
EOL resource document (XML) (usually they do
the work)
Spreadsheet upload (either can do the work)
Connector (we do the work)
Scrape web site or PDF
Use web services
Work from a copy of DB
Darwin Core Archive (classifications, soon)
See http://eol.org/info/cp_resource_checklist

How EOL gets content (n=141 partners)
[Bar chart: number of partners (axis 0 to 70) by transfer method: XML resource doc, Connector (segments: CSV, web service, PDF, HTML, DB), LD/eLD/Scratchpad, Spreadsheet]

Example partner
• Pensoft has a process to generate EOL-compliant XML for new species
• Also sends images to Morphbank, specimens to GBIF
• They registered the URL at EOL
• Our script checks for changes once a day
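
A daily change check like the one mentioned above could be sketched as follows. This is only an illustration, not EOL's actual script: the slides do not say how changes are detected, so comparing a SHA-256 fingerprint of the resource file is an assumption, and the names fingerprint and needs_harvest are hypothetical.

```python
import hashlib
from typing import Optional


def fingerprint(resource_bytes: bytes) -> str:
    """Return a SHA-256 hex digest of the partner's resource file contents."""
    return hashlib.sha256(resource_bytes).hexdigest()


def needs_harvest(resource_bytes: bytes, last_fingerprint: Optional[str]) -> bool:
    """Decide whether a registered resource should be harvested again.

    Assumption: a resource is re-harvested when it has never been seen
    before (no stored fingerprint) or when its contents have changed.
    """
    if last_fingerprint is None:
        return True
    return fingerprint(resource_bytes) != last_fingerprint
```

In this sketch the caller would fetch the file from the registered URL once a day, call needs_harvest with the fingerprint stored after the previous harvest, and store the new fingerprint whenever a harvest runs.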

EOL Schema Sources

Content type              Standards used
Taxa                      Darwin Core Archive
Attribution & licensing   Dublin Core & Darwin Core
Text objects & links      Species Profile Model (and now +)
Multimedia                Dublin Core (+ Audubon Core)

Example biological content
EOL Table of Contents                                   TDWG Species Profile Model
Physical Description › Morphology                       #Morphology
Physical Description › Size                             #Size
Ecology › Habitat                                       #Habitat
Ecology › Associations                                  #Associations
Life History & Behavior › Life Expectancy               #LifeExpectancy
Evolution and Systematics › Functional Adaptations      #Evolution
Conservation › Conservation Status                      #ConservationStatus
Molecular Biology and Genetics › Genetics               #Genetics
Molecular Biology and Genetics › Genome                 #MolecularBiology
Molecular Biology and Genetics › Molecular Biology      #MolecularBiology
Nucleotide Sequences                                    #MolecularBiology

[Diagram labels: SPM; DwC infoitem description; Plinian Core, using Darwin Core Archive flat files as transport mechanism; EOL v2; Controlled vocabulary; Numeric values; Relations; EOL v3?]

Partners
Can delete or replace any of their objects
Control how often we harvest, and can force a harvest
Get an automatically updating collection
Can request that we use their classification for browsing
Can change the logo and description of their project
Receive comments and curator actions immediately
Receive monthly reminders that they can get traffic statistics
Get many links back to their original web resources

Partners cannot

Publish the very first time
Decide if they are pre-vetted
Roll back a harvest
Change the objects of any other partner
Change classifications from any other partner

http://eol.org/pages/704102

Richness scores


Taxon page richness algorithm

Richness = a (Breadth) + b (Depth) + c (Diversity)

Weights: a = 60%, b = 30%, c = 10%

Breadth: Images, topics of text objects, references, maps,
videos, sounds, conservation status

Depth: # words per text object, # words total

Diversity: Sources (partners)
Score range: 0 – 100; threshold: 40
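
The weighted sum on this slide can be sketched in a few lines. The weights (60/30/10), the 0 to 100 range, and the threshold of 40 come from the slide; everything else is an assumption made for illustration: the slides do not say how each component is normalized, so the sketch takes breadth, depth, and diversity as values already scaled to 0..1, and it treats a score at or above the threshold as "rich". The function names are hypothetical.

```python
# Weights from the slide: breadth 60%, depth 30%, diversity 10%.
WEIGHTS = {"breadth": 0.60, "depth": 0.30, "diversity": 0.10}

# Threshold from the slide; treating ">= 40" as "rich" is an assumption.
RICH_THRESHOLD = 40.0


def richness(breadth: float, depth: float, diversity: float) -> float:
    """Combine three component scores into a 0..100 richness score.

    Assumption: each component has already been normalized to 0..1.
    How EOL normalizes them is not described in the slides.
    """
    components = {"breadth": breadth, "depth": depth, "diversity": diversity}
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    weighted = sum(WEIGHTS[name] * value for name, value in components.items())
    return round(100.0 * weighted, 2)


def is_rich(score: float) -> bool:
    """Return True when a page's score meets the threshold."""
    return score >= RICH_THRESHOLD
```

Under these assumptions, a page with good breadth but little text and a single source, for example richness(0.6, 0.1, 0.1), lands just at the threshold, which shows how strongly the breadth weight dominates the score.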

Summary of EOL page richness
Overall: 950,000 have content; 2 % are rich; ~22 % have only links to literature
Hot List: 30 % of 75K are rich; average richness = ~30
Red Hot List: 56 % of 3K are rich; average richness = 43

How richness is used
Choose images for home page “March of Life”
Allows sorting in collections (Weird life example)
Helps provide best search and API results

Any other ideas? Could we be matchmakers for
pages needing enrichment and users?

http://synthesis.eol.org/media/treemap

Strategies for improving richness
Crowd-sourcing: Collections, Communities, Mobile apps
Leveraging: Enabling platforms, Enabling journals, Data mining BHL etc.

The page richness index

Helps fill gaps with existing knowledge
Helps prioritize funding and training so that it
has maximum impact on closing true gaps
Will be available via API

Computing and storing richness index on
EOL is a step towards storing and serving
computable data
