Writing The Encyclopedia Of Life (not EoL.org)

A presentation given at the Stockholm Biodiversity Informatics course on 16th Sept. 2009

  • The mission: to describe and document the diversity of life. Inventory the Earth’s species, document their relationships, and “publish” these data.
  • Problems with the current model: bias in what we work on; taxonomy is parochial and requires specialists.
  • NHM struggled to find someone describing a new species for DC2.

Presentation Transcript

  • Vincent S. Smith Writing the Encyclopedia of Life
  • Background 1: The big picture of biodiversity research
    Goal…
    • Inventory the Earth’s species
    • Document their relationships
    • Publish & apply these data
    Data set…
    • 1.8M described species (10M names)
    • 300M pages (over last 250 years)
    • 1.5-3B specimens
    People…
    • 4-6,000 scientists
    • 30-40,000 amateurs
    • Many more citizen scientists?
  • Background 2: The process of biodiversity research
    Parochial…
    • Specialised
    • Experts
    • Fragmented & distributed
    Methodological…
    • Communities of practice
    • Hard to record & update
    • High output but low impact
    Different…
    • Data
    • Interpretations
    • Methods
    How do we integrate the BIG with the small?
  • 250 yr progress report: the story so far…
    • At present rates most species will be extinct long before we describe them
    • Up to 87% of life on Earth is still undescribed
    • 6% of biodiversity scientists cover 80% of the world’s biodiversity
    Timeline: 1758, 2008, 3008 (250 yrs; 1000 yrs!!! ?)
    Described species: Bacteria 9021 spp.; Archaebacteria 259 spp.; Plants 260k spp.; Animals 1.18 M spp.; Other 193k spp.; Fungi 101k
    250 years and counting!
  • Taxonomic effort (1.8 million species)
    Major groups: Bacteria 9021 spp.; Archaebacteria 259 spp.; Plants 260k spp.; Animals 1.18 M spp.; Other 193k spp.; Fungi 101k
  • Taxonomic effort (1.8 million species)
    Major groups: Bacteria 9021 spp.; Archaebacteria 259 spp.; Plants 260k spp.; Animals 1.18 M spp.; Other 193k spp.; Fungi 101k
    Animal groups: Crustaceans 39k; Birds 10k; Reptiles 7.1k; Mammals 5k; Amphib .5k; Sponges 10k; Cnidarians 9k; Rotifers 1.8k; Flatworms 13.7k; Insects 0.82 M spp.; Molluscs 117k; Fish 25k
  • Taxonomic effort (1.8 million species)
    Major groups: Bacteria 9021 spp.; Archaebacteria 259 spp.; Plants 260k spp.; Animals 1.18 M spp.; Other 193k spp.; Fungi 101k
    Animal groups: Crustaceans 39k; Birds 10k; Reptiles 7.1k; Mammals 5k; Amphib .5k; Sponges 10k; Cnidarians 9k; Rotifers 1.8k; Flatworms 13.7k; Insects 0.82 M spp.; Molluscs 117k; Fish 25k
    Insect groups: Beetles 370k spp.; Flies 85k spp.; Butterflies & moths 165k spp.; Bees, wasps & ants 198k spp.
    Publication rates: 0.01 papers per species per year, i.e. 1 paper every 100 years. Birds: 1 paper per species per yr. Mammals: 2 papers per species per yr. Elephants: 47 papers per species per yr.
  • 1,000’s of journals addressing a common set of questions
    • What is a species?
    • How many species are there?
    • Where are species distributed?
    • How have species distributions changed?
    • How are species related?
    • How have species characters changed?
    • To what extent are species relationships predictive?
    Traditional publication: “Paper minds”
    DATA
  • 1,000’s of journals addressing a common set of questions (Mol. Phyl. Evol.: 21,964 pp. since 2000)
    • What is a species?
    • How many species are there?
    • Where are species distributed?
    • How have species distributions changed?
    • How are species related?
    • How have species characters changed?
    • To what extent are species relationships predictive?
    Traditional publication: “Paper minds”
    [Figure: taxon-name labels, Menopon gallinae … Liposcelis bostrychophilus]
  • 1,000’s of journals addressing a common set of questions
    • What is a species?
    • How many species are there?
    • Where are species distributed?
    • How have species distributions changed?
    • How are species related?
    • How have species characters changed?
    • To what extent are species relationships predictive?
    Traditional publication: “Paper minds”
    “Species Name”: the universal linker
    RAW DATA > Logically interconnected but presently fragmented by the publication process
    Other problems…
    • Time & money
    • Audience mismatch
    • Findability & reusability
  • Looking within a paper: data mining publications
    1. Scan 2. Extract text (OCR) 3. Find keywords
    Example: Palma, R.L., and R.L.C. Pilgrim. 2002. A revision of the genus Naubates (Insecta: Phthiraptera: Philopteridae). J. R. Soc. N.Z. 32:7-60.
    • Taxonomic names
    • Author names
    • Citations
    • Collection data
    • Morphological data
    • Descriptions
    • Identification keys
    • Illustrations
    • Photographs
  • Looking within a paper: data mining publications
    1. Scan 2. Extract text (OCR) 3. Find keywords 4. Index 5. Annotate online
    Example: Palma, R.L., and R.L.C. Pilgrim. 2002. A revision of the genus Naubates (Insecta: Phthiraptera: Philopteridae). J. R. Soc. N.Z. 32:7-60.
    • Taxonomic names
    • Author names
    • Citations
    • Collection data
    • Morphological data
    • Descriptions
    • Identification keys
    • Illustrations
    • Photographs
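The "find keywords" step listed above can be sketched for one of the target items, taxonomic names. This is a minimal illustration and not the pipeline used in the talk: it assumes the OCR text is already available as a string, and `KNOWN_GENERA` and `find_taxon_names` are hypothetical names standing in for a real name index.

```python
import re

# Hypothetical index of genus names; a real system would load a full checklist.
KNOWN_GENERA = {"Menopon", "Ricinus", "Naubates"}

# Candidate binomial: a capitalised word followed by a lowercase word.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]+)\b")


def find_taxon_names(text):
    """Return sorted, unique 'Genus species' strings whose genus is in KNOWN_GENERA."""
    found = set()
    for genus, species in BINOMIAL.findall(text):
        if genus in KNOWN_GENERA:
            found.add(f"{genus} {species}")
    return sorted(found)


# Made-up OCR output used only to exercise the function.
ocr_text = (
    "Specimens of Menopon gallinae were collected from domestic fowl. "
    "The species Ricinus fringillae was also recorded."
)
print(find_taxon_names(ocr_text))
```

Filtering candidates against a known list keeps ordinary phrases such as "The species" out of the results, at the cost of missing genera that are not yet in the index.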
  • How do we bring this all together?
    • Technical issues
    • Social issues
    • Needs to scale (web)
    • Needs to be sustainable
    People “Publications” Specimens ?
  • Technical issues 1: Data standards
    • TDWG (since 1986)
    • GBIF
    • Bridging computer science & biology
    • It’s not science!
    • “Standards” can mean many things:
    - Data exchange standards (e.g. Darwin Core)
    - Common restricted vocabularies (Sp.2000 classification)
    - Programming standards
    - Data quality
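As a rough illustration of a data exchange standard, the sketch below writes one specimen record as CSV using Darwin Core term names as column headers. The term names are real Darwin Core terms; the values, the `record` dictionary and the `to_csv` helper are made up for this example.

```python
import csv
import io

# Column names are Darwin Core terms; the values are invented for illustration.
record = {
    "occurrenceID": "example:0001",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Menopon gallinae",
    "institutionCode": "EXAMPLE",
    "catalogNumber": "0001",
    "country": "Sweden",
    "eventDate": "2009-09-16",
}


def to_csv(records):
    """Serialise records to CSV text with a header row of Darwin Core terms."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv([record])
print(csv_text)
```

Because both sides agree on the column names, a receiving database can map the fields without a custom converter for each provider.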
  • Technical issues 2: Platforms
    • Generic databases with custom interfaces (MySQL, Oracle), e.g. Species 2000, IPNI
    • Bespoke (usually commercial) databases, e.g. KE EMu, Biota
    • Content Management Systems & blogging platforms such as Drupal, Plone, WordPress etc., e.g. EOL’s LifeDesks, GBIF websites
    • Wikis such as MediaWiki, Semantic MediaWiki, e.g. Wikipedia, iTaxon
  • Technical issues 2: Platforms. Technology moves fast!
  • Technical issues 2: Platforms (common design considerations)
    • Need scalable and flexible platforms that support:
    • large numbers of users as passive readers and active contributors
    • editorial hierarchies serving individual and community needs
    • the epistemological richness and diversity of all contributors
    • flexible data models that can be modified or added by contributors
    • automated integration of third party content
    • automated semantic enrichment of contributed and 3rd party content
    • content workflows and curation tools
    • content archival and citation
    • content licensing and a conditions of use framework
    • web services
    • ease of use
  • Technical issues 3: Web services (integration hacks)
  • Social issues 1: The community
    • Taxonomy as a team sport
    • (Community size and the community of one)
    • Networking effects
    • (quality, multi-disciplinarity and utility of data)
    • The rise and rise of the “amateur”
    • Cost of professionals
    • Top down and bottom up organization
    • (how to partition the community)
    • Bottom up benefits, low transaction costs
    • (social information flows, motivation and relations self organize the group)
    • Support epistemological richness
    • Collaborative output, peer review, credit
    • (incentives)
  • Social issues 2: Nationalism / Politics
    • Convention on Biological Diversity, 1992
    • Biodiversity does not respect national boundaries
    • Biodiversity questions do not respect national boundaries
    • Funding is (usually) national / regional
    • Benefits are expected to be national
    • Often don’t match the questions we want to address
    • Politics amongst researchers and institutions (e.g. EDIT and Lifewatch)
    • Good politicians are not always good scientists
  • Social issues 3: Incentives
    • Article citation (most common method of peer recognition)
    • Influences authors’ employment, reputation and research opportunities
    • Traditional metrics of scholarly activity (no. papers, impact factor, H-Index)
    • Taxonomy is not usually high impact, but has a long half life
    • High cost of traditional publication (unaffordable to authors & libraries)
    • Lessons from Zootaxa (low cost, high volume) and Wikipedia (highly linked)
  • Social issues 4: Licensing
    • Mickey Mouse, copyright and 1923
    • Copyright transfer agreements
    • As of 2009 half of all taxonomic treatments are in copyright
    Publications on ants
  • Social issues 4: Licensing
    • Who owns your work (your employer?)
    • Branding and credit
    • Creative Commons
    • Open Access
    • Open Science (making science more accountable)
  • Social issues 5: Human Computer Interactions
  • Technical solutions & social models: current options for writing the Encyclopedia of Life
    • “New” scholarly publishing (semantic enrichment of publications)
    • One database to rule them all - the Common Data Model (CDM)
    • EOL.org, ToL.org & related initiatives
    • Wikipedia / Wikispecies
    • Scratchpads / LifeDesks
  • Encyclopedia of Life (EOL): “A web page for every species” http://www.eol.org/
    • A web page for all 1.8M species
    • Multi-institution collaboration
    • $50m funding (5 years)
    - MacArthur and Sloan Foundations
    • Megascience mashup
    - Aggregating data from the web
    • 10 years to complete
    - First draft 2008, “finished” 2017!
    • Multiple audiences
    - Science & outreach
  • Encyclopedia of Life (EOL): “A web page for every species”
    • First draft 27 Feb. 2008
    - 24 “exemplar” pages
    - 30,000 detailed pages (fish & amphib.)
    - 1 million “stubs” (names & links)
    • Huge interest
    - 11.5 million hits in first 5 hours
    - 500+ press articles
    - Pages unavailable for first two days!
    - Growth (needs 1,000 spp. per day)
    • Much praise but growing criticism
    - Quality vs. quantity of information
    - Authoritative “vetting” process
    - Credit for “authors”
    • Eight more years to go
    - Get more content online
    - Better tools to engage more people
  • What is a Scratchpad? A website for you & your community
    1. Your data 2. Uploaded & tagged 3. Published & reviewed on your site
  • What is a Scratchpad? A website for you & your community
    1. Your data 2. Uploaded & tagged 3. Published & reviewed on your site
    Fast, intuitive, fit for use
  • What can Scratchpads do? Import, manage, search & browse: DNA & Phylogenies, Literature, Specimens, Images
  • What can Scratchpads do? Integration & connectivity within & between sites: Taxonomy, DNA & Phylogenies, Specimens, Literature, Images
  • What can Scratchpads do? In summary:
    + Administration
      - Change your site information
      - Change your front page
      - Change your logo
      - Activity and access logs
    + Backup
      - Backing up your data
      - Restoring your data
    + Bibliography
      - Creating a record
      - Importing from a ref. manager
      - Exporting to a reference manager
    + Blog
      - Creating and adding a blog
    + Custom Content
      - Defining a CCK
      - Importing from a spreadsheet
      - Creating a custom view
    + Fileshare
      - Creating and using a fileshare
    + Forum
      - Altering the forum settings
      - Creating a container for a forum
      - Creating a new forum
      - Creating a new topic inside a forum
    + Groups
      - Creating a group
      - Subscribing to a group
    + Image
      - Uploading & basic annotation
      - Linking image & location records
      - Linking image & specimen records
      - Linking image & publication records
      - Overlay annotations on images
    + Layout
      - Change your theme
      - Menus
      - Blocks and sidebars
    + Locations
      - Creating a record
      - Importing from a spreadsheet
    + Pages
      - Creating, editing, cloning & deleting
      - Configuring the panels template
    + Panels
      - Adding & configuring content
      - Creating a new panel
      - Citing a Panels page
    + Phylogeny
      - Adding a phylogenetic tree
    + Specimens
      - Creating a record
      - Importing from a spreadsheet
      - Linking specimen & location records
      - Linking specimen & pub. records
    + Tasks
      - Creating a tasklist
    + Taxonomy
      - Importing from a spreadsheet
      - Importing from ClassificationBank
      - Starting from scratch
      - Taxonomy manager
      - Displaying a classification
      - Adding names
      - Deleting names
      - Taxonomy & panels
    + Users
      - Your settings
      - Adding a new user
      - User roles and permissions
      - Adding and editing user profile fields
      - Logging in
    + Webform
      - Creating and using webforms
  • What can Scratchpads do? Visual taskguide
  • Current Scratchpads (since March 2007). Sites: 130+; Users: 1500+; Pages: 170k
    Ants, Bees, Beetles, Big-headed flies, Birds, Blackflies, Ciliates, Cockroaches, Dragon Trees, Dung Beetles, False Buttonweed, Flat worms, Flies, Foraminifera, Fossil Insects, Fungus Gnats, Holometabola, Leaf-miner Flies, Lice, Lichens of Bermuda, Malvaceae, Megalastrum ferns, Milichiid flies, Mosquitoes, Mosses, Nannotax fossils, Nepticuloid moths, Palms, Pearl oysters, Polychaete worms, Scaleworms, Termites, Triticid grasses, Weevils, Wood Ferns, Sulawesi Ferns, Stick insects
  • Scratchpad applications: a multipurpose, flexible technology
    eBooks: 4th Edition Howard & Moore, Birds of the world (fact checking, data compilation, 2010, funding)
  • Scratchpad applications: a multipurpose, flexible technology
    eJournals: European Mosquito Bulletin (ISSN 1460-6127), Phasmid Studies (ISSN 0966-0011) (submission, review, & dissemination of articles)
  • Scratchpad applications: a multipurpose, flexible technology
    Image galleries: Nanno fossils, Cockroaches, Stick insects, Flatworms, Grasses, Lichens & many more… (rapid upload, annotation, & display of images)
  • Scratchpad applications: a multipurpose, flexible technology
    Societies & Organizations: GBIF, Zootaxa, Threatened Plants of the World (Kew), BarCoVer (DNA Barcoding) & more (space for data collection, services, discussion, & organization)
    ZOOTAXA: A rapid international journal for animal taxonomists. ISSN 1175-5326 (Print Edition) & ISSN 1175-5334 (Online Edition)
  • How do Scratchpads work? Getting a Scratchpad: http://scratchpads.eu/apply
    Requirements
    • Biological focus
    • Agree to T&Cs (click-thru)
    • CC license “by-nc-sa”
    Application
    • Maintainer
    • Scope/Mission/API Keys
    • (Sub)domain name
    Content
    • Unrestricted (overlapping)
    • No branding (focus on authors)
    • Value added
  • How do Scratchpads work? Using a Scratchpad
    Management
    • User categories (maintainer, ed. contrib.)
    • Public / private content (flexible groups)
    • Admin. page (site settings & behavior)
    Data Input
    • Content types (biblio, maps, “page” etc)
    • Forms, managers, Excel, EndNote etc
    • Custom content (add or extend data types)
    Tagging (indexing)
    • Taxonomy terms (2M +)
    • Multiple classifications
    • Auto-tagging
  • Autotagging: indexing data to make it findable
    1. Create content (e.g. reference) 2. Find terms (Autotag) 3. Submit (Index)
    Journal citation mentions taxon name
  • Autotagging: indexing data to make it findable
    1. Create content (e.g. reference) 2. Find terms (Autotag) 3. Submit (Index)
    Matches taxonomy term (Drag & Drop)
  • Autotagging: indexing data to make it findable
    1. Create content (e.g. reference) 2. Find terms (Autotag) 3. Submit (Index)
    Page tagged (indexed) with taxon name
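The three autotagging steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the Scratchpads implementation: `TAXONOMY_TERMS`, its numeric ids and the `autotag` function are hypothetical, and only single-word terms are matched.

```python
import re

# Hypothetical taxonomy vocabulary: term name -> term id.
TAXONOMY_TERMS = {
    "Naubates": 101,
    "Philopteridae": 102,
    "Phthiraptera": 103,
    "Menopon": 104,
}


def autotag(text, terms=TAXONOMY_TERMS):
    """Return the taxonomy terms mentioned in text, as {name: id}."""
    words = set(re.findall(r"[A-Za-z]+", text))
    return {name: term_id for name, term_id in terms.items() if name in words}


# 1. Create content (a bibliographic reference taken from the slides).
citation = (
    "Palma, R.L., and R.L.C. Pilgrim. 2002. A revision of the genus "
    "Naubates (Insecta: Phthiraptera: Philopteridae). J. R. Soc. N.Z. 32:7-60."
)
node = {"type": "reference", "body": citation}

# 2. Find terms, then 3. submit: store the matched terms as tags on the page.
node["tags"] = autotag(node["body"])
print(node["tags"])
```

Once a page carries taxonomy tags like these, it can be retrieved by taxon name alongside images, specimens and other content tagged with the same term.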
  • How do Scratchpads work? Indexing data to make it findable
    • Tagged data can be presented differently
    • For example as part of a traditional bibliography
    • Or as small windows or “panels” of data
  • How do Scratchpads work? Integrating data & “publishing” in a Scratchpad
    Types of Scratchpad Panel (built with “tagged data”)…
    • Taxonomic hierarchies
    • Files and documents
    • Phylogenetic trees
    • Customized content
    • Specimen records
    • Photographs & illustrations
    • Personalized instructions
    • Common names
    • Bibliographic literature
  • How do Scratchpads work? Integrating data & “publishing” in a Scratchpad
    • Dynamically built species pages
    • Browsed through a taxonomy
    • Including 3rd party content
    • With data curation tools
    • Listing all “authors”
    • Dated, permanent & citable
  • How do Scratchpads work? Adjusting the panels layout: choose which panels to display
  • How do Scratchpads work? An example based on the Catalogue of Life classification: 2 million taxon pages, open curation at http://catlife.myspecies.info
  • Questions?
  • Scratchpad management: scalable & sustainable technology. Hardware, software & user support: virtual machine, open-source software, self-archiving, backed-up, multi-site configuration (easy to move & upgrade, secure & reliable, citable, screencasts, low admin., low marginal costs)