Introduction to eol.org




Cynthia Parr (@cydparr)
Semantic reasoning workshop
Washington, DC, 6-7 September 2012
@eol
Whirlwind tour
•   What kind of information we have
•   How we assemble that information
•   How machines and people interact with EOL
•   Next steps
More than 1.1 million taxon pages with content
from more than 200 providers and thousands of individuals
5 million content objects
Details tab

Leafy Seadragon example
Total of 1,344,711 images, 9,586 videos, 28,569 sounds
Maps
Literature
EOL has Global Partners and is internationalized
[Map labels: Norway, Dutch, USA, Taiwan, Mexico, China, Egypt, India, Costa Rica, Colombia, Peru, Australia, South Africa]
EOL summarizes knowledge

Erosaria caputserpentis
Serpent's Head Cowrie

Depth range based on 51 specimens in 2 taxa.
Water temperature and chemistry ranges based on 40 samples.

Environmental ranges
  Depth range (m): -5 - 67
  Temperature range (°C): 23.011 - 28.496
  Nitrate (umol/L): 0.048 - 0.923
  Salinity (PPS): 33.821 - 35.837
  Oxygen (ml/l): 4.349 - 4.825
  Phosphate (umol/l): 0.088 - 0.228
  Silicate (umol/l): 0.983 - 4.026

From Moorea Biocode
From GBIF
From OBIS
Erosaria caputserpentis
Serpent's Head Cowrie

[Chart: Salinity envelope (n=40)]
From OBIS
http://eol.org/pages/704102

Richness scores
Cynthia Parr, Species Pages Group
Global Content Summit, 17-19 Jan 2011
Whirlwind tour
• What kind of information we have
• How we assemble that information
  –   Big picture
  –   Subject semantics
  –   Names infrastructure
  –   Curation
  –   Richness score
• How machines and people interact with EOL
• Next steps
EOL aggregates and curates
Sources: Scientific Databases (including BHL, GBIF, ALA, INBio, COL, Scratchpads, LifeDesks) and Scientific Journals
[Diagram labels: Aggregate, Quality control, Curate, Comment, Rate, Collect, eol.org]
Sharing process adds semantics to content objects
[Diagram labels: DwC description, SPM infoitem, Plinian Core, EOL v2]
Using Darwin Core Archive flat files as transport mechanism
[Bar chart: number of text objects (axis 0 to 800,000) by subject of text object. Subjects shown: Distribution, Multiple topics, Habitat, Threats, Conservation, Trends, Associations, TrophicStrategy, PopulationBiology, Migration, LifeExpectancy, Behaviour, Diseases]
Content objects are associated with taxon names
Wikimedia Commons: Physeter macrocephalus
(note we actually have over 3.3 million named pages)
Names from different providers are matched
             Physeter macrocephalus




Animal Diversity Web ....   Physeter   catodon Linnaeus, 1758
ARKive ..................   Physeter   macrocephalus Linné
BioPix ..................   Physeter   macrocephalus L.
INBio ...................   Physeter   catodon
IUCN ....................   Physeter   Macrocephalus
ITIS ....................   Physeter   macrocephalus Linnaeus, 1758
MarLIN ..................   Physeter   macrocephalus Linné
NCBI ....................   Physeter   Catodon
Species 2000 ............   Physeter   macrocephalus Linnaeus, 1758
Taxon Concept ...........   Physeter   australasianus Desmoulins, 1822
Wikimedia Commons .......   Physeter   macrocephalus
WORMS ...................   Physeter   macrocephalus Linnaeus 1758
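
The provider list above can be used to sketch why matching needs more than string cleanup. The following is a minimal illustration, not EOL's actual matching algorithm: the helper names `canonical_binomial` and `group_by_canonical` are hypothetical, and the normalization (keep genus and species epithet, drop authorship, fix case) is an assumed simplification.

```python
from collections import defaultdict

# Name strings exactly as listed on the slide, keyed by provider.
PROVIDER_NAMES = {
    "Animal Diversity Web": "Physeter catodon Linnaeus, 1758",
    "ARKive": "Physeter macrocephalus Linné",
    "BioPix": "Physeter macrocephalus L.",
    "INBio": "Physeter catodon",
    "IUCN": "Physeter Macrocephalus",
    "ITIS": "Physeter macrocephalus Linnaeus, 1758",
    "MarLIN": "Physeter macrocephalus Linné",
    "NCBI": "Physeter Catodon",
    "Species 2000": "Physeter macrocephalus Linnaeus, 1758",
    "Taxon Concept": "Physeter australasianus Desmoulins, 1822",
    "Wikimedia Commons": "Physeter macrocephalus",
    "WORMS": "Physeter macrocephalus Linnaeus 1758",
}


def canonical_binomial(name_string):
    """Reduce a name string to 'Genus epithet' (hypothetical helper).

    Assumes the first two whitespace-separated tokens are the genus and
    the species epithet; everything after them is treated as authorship.
    """
    parts = name_string.split()
    if len(parts) < 2:
        raise ValueError(f"not a binomial: {name_string!r}")
    genus, epithet = parts[0], parts[1]
    return f"{genus.capitalize()} {epithet.lower()}"


def group_by_canonical(provider_names):
    """Group providers by the canonical binomial of the name they use."""
    groups = defaultdict(list)
    for provider, name in provider_names.items():
        groups[canonical_binomial(name)].append(provider)
    return dict(groups)


groups = group_by_canonical(PROVIDER_NAMES)
for binomial, providers in sorted(groups.items()):
    print(f"{binomial}: {len(providers)} provider(s)")
```

Cleanup alone merges the authorship and capitalization variants, but it still leaves three separate groups (macrocephalus, catodon, australasianus). Joining those requires synonymy information, which string normalization cannot supply.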
Taxon concept pages: multiple hierarchies on Names tab
Problem: one taxon may have several names


Animal Diversity Web ....   Physeter   catodon Linnaeus, 1758
ARKive ..................   Physeter   macrocephalus Linné
BioPix ..................   Physeter   macrocephalus L.
INBio ...................   Physeter   catodon
IUCN ....................   Physeter   Macrocephalus
ITIS ....................   Physeter   macrocephalus Linnaeus, 1758
MarLIN ..................   Physeter   macrocephalus Linné
NCBI ....................   Physeter   Catodon
Species 2000 ............   Physeter   macrocephalus Linnaeus, 1758
Taxon Concept ...........   Physeter   australasianus Desmoulins, 1822
Wikimedia Commons .......   Physeter   macrocephalus
WORMS ...................   Physeter   macrocephalus Linnaeus 1758
Problem: the same name may apply to more than one taxon
EOL curation

•   Trust or untrust taxon associations
•   Add new taxon association
•   Set preferred hierarchies
•   Set preferred common names
•   Leave comments

Coming: Taxonomic concept curation
EOL is not Wikipedia
…though we have more than 212,000 Wikipedia articles and 115,000 Wikimedia images
Can’t currently edit within text objects
Whirlwind tour
• What kind of information we have
• How we assemble that information
• How machines and people interact with EOL
  – API
  – Third party apps
  – Collections and communities
• Next steps
EOL enables machine interaction
[Diagram labels: Aggregate, Curate, Comment, Rate, Collect, eol.org, API, Third party apps]
Third party applications: eol.org/api
People interact with EOL content & each other
• Collections
• Communities
Studies currently underway with University of Maryland
• Cross-cultural study on motivation to engage in citizen science – Dana Rotman
• Interaction among scientists and non-scientists on EOL’s social network – Jae-wook Ahn
• Website traffic analysis to aid conservation communication – Yurong He and Bill Fagan
Whirlwind tour
•   What kind of information we have
•   How we assemble that information
•   How machines and people interact with EOL
•   Next steps
Using EOL collections to get computable data
Step 1: Search on EOL for organisms with characteristics of interest. Add each one to an EOL collection.
Step 2: Write a program using EOL API methods to retrieve the external database identifiers for the species in that collection.
Step 3: Add code to your program to retrieve data using external database APIs.
Step 4: Analyze, rinse, repeat.
From Arthur Chapman
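
Step 2 above can be sketched roughly as follows. This is an illustration under stated assumptions, not a reference client: the endpoint paths under `API_ROOT` and the JSON field names (`collection_items`, `object_type`, `object_id`, `taxonConcepts`, `nameAccordingTo`, `sourceIdentifier`) are assumptions that should be checked against the documentation at eol.org/api. The function names are hypothetical. The HTTP fetcher is passed in as a parameter so the logic can be tried against canned responses.

```python
import json
from urllib.request import urlopen

# Assumed base URL and versioned paths; verify against eol.org/api.
API_ROOT = "https://eol.org/api"


def fetch_json(url):
    """Default fetcher: GET a URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def collection_page_ids(collection_id, fetch=fetch_json):
    """Return the taxon page ids held in a collection (assumed fields)."""
    data = fetch(f"{API_ROOT}/collections/1.0/{collection_id}.json")
    return [
        item["object_id"]
        for item in data.get("collection_items", [])
        if item.get("object_type") == "TaxonConcept"
    ]


def external_identifiers(page_id, fetch=fetch_json):
    """Map provider name -> that provider's identifier for one page."""
    data = fetch(f"{API_ROOT}/pages/1.0/{page_id}.json")
    identifiers = {}
    for concept in data.get("taxonConcepts", []):
        provider = concept.get("nameAccordingTo")
        source_id = concept.get("sourceIdentifier")
        if provider and source_id:
            identifiers[provider] = source_id
    return identifiers


def identifiers_for_collection(collection_id, fetch=fetch_json):
    """Step 2: external identifiers for every taxon in a collection."""
    return {
        page_id: external_identifiers(page_id, fetch)
        for page_id in collection_page_ids(collection_id, fetch)
    }
```

Calling `identifiers_for_collection(some_collection_id)` would yield one dictionary of provider identifiers per taxon page. Step 3 would then pass those identifiers to each external database's own API.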
Crowd-sourcing for computable data
Image: Lovell and Libby Langstroth, Calphotos
Foodwebs.org
Efforts underway
Phylogenetic trees: Collaboration with Open Tree of Life project for draft tree

Computable data challenge
   http://eol.org/info/data_challenge
   Rod Page’s Bionames project
   Alexandria Archive Institute

Devries and Thessen using DBpedia Spotlight to extract associations among taxa and add to Linked Open Data cloud

Sloan 2 project: Marine computable data

TraitBank ABI proposal
Research wishes
• Collecting nominations for research ideas where EOL can help:
    http://eol.org/info/wishes_for_research
    DUE 15 SEPTEMBER

• Will follow with Rubenstein Fellows call for proposals
Thanks to
Our funders
   John D. and Catherine T. MacArthur Foundation
   Alfred P. Sloan Foundation
   Smithsonian Institution
   Marine Biological Laboratory
   Harvard University
   David Rubenstein
        and other funders and donors


All our content providers and global partners

Volunteer curators and individual contributors via
  Flickr, Wikimedia, and members of EOL
Summary of EOL page richness
Overall
• 950,000 have content
• 2 % are rich
• ~22 % have only links to literature
Hot List
• 30 % of 75K are rich
• Average richness = ~30
Red Hot List
• 56 % of 3K are rich
• Average richness = 43
Long Tail in databases contributing to EOL
[Two charts: number of taxa for which content is contributed to EOL, on a linear scale (0 to 600,000) and viewed on a log scale (1 to 1,000,000), plotted against partners in order of # taxa contributed to EOL (1 to 131)]
Taxon page richness algorithm

a (Breadth) + b (Depth) + c (Diversity)
Weights: a = 60%, b = 30%, c = 10%

Breadth: Images, topics of text objects, references, maps, videos, sounds, conservation status
Depth: # words per text object, # words total
Diversity: Sources (partners)

Score: 0 – 100, Threshold 40
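
The weighted sum above can be sketched as follows. This is a rough illustration, not the production scoring code. It assumes each component has already been normalized to the range 0 to 1 (the slide does not say how Breadth, Depth, and Diversity are individually measured), and it assumes the threshold of 40 is the cutoff at or above which a page counts as "rich". The function names are hypothetical.

```python
# Weights from the slide, expressed as percentage points.
WEIGHT_BREADTH = 60
WEIGHT_DEPTH = 30
WEIGHT_DIVERSITY = 10

# Assumed meaning of "Threshold 40": minimum score for a "rich" page.
RICH_THRESHOLD = 40


def richness_score(breadth, depth, diversity):
    """Combine three component scores into a 0-100 richness score.

    Each argument is assumed to be pre-normalized to the range 0..1.
    """
    for label, value in (("breadth", breadth),
                         ("depth", depth),
                         ("diversity", diversity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{label} must be in [0, 1], got {value}")
    return (WEIGHT_BREADTH * breadth
            + WEIGHT_DEPTH * depth
            + WEIGHT_DIVERSITY * diversity)


def is_rich(score):
    """Apply the assumed threshold to a 0-100 richness score."""
    return score >= RICH_THRESHOLD


# A page with good breadth but little text from a single source.
example = richness_score(breadth=0.5, depth=0.25, diversity=0.0)
print(example, is_rich(example))
```

Because Breadth carries 60% of the weight, a page with every kind of media but almost no text can cross the threshold, while a page with one long text object and nothing else cannot.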

Introduction to EOL.org for scientists

  • 1. Introduction to eol.org Cynthia Parr Semantic reasoning workshop @cydparr Washington, DC 6-7 September 2012 @eol
  • 2. Whirlwind tour • What kind of information we have • How we assemble that information • How machines and people interact with EOL • Next steps
  • 3. >1.1 million taxon pages with content from more than 200 providers, 1000s individuals 5 million content objects
  • 5. Total of 1,344,711 images 9,586 videos 28,569 sounds
  • 8. EOL has Global Partners and is internationalized Norway Dutch USA Taiwan Mexico China Egypt India Costa Rica Colombia Peru Australia South Africa
  • 9. From Moorea Biocode EOL summarizes knowledge Erosaria caputserpentis Serpent's Head Cowrie Depth range based on 51 specimens in 2 taxa. Water temperature and chemistry ranges based on 40 samples. Environmental ranges Depth range (m): -5 - 67 Temperature range (°C): 23.011 - 28.496 Nitrate (umol/L): 0.048 - 0.923 Salinity (PPS): 33.821 - 35.837 Oxygen (ml/l): 4.349 - 4.825 Phosphate (umol/l): 0.088 - 0.228 From GBIF Silicate (umol/l): 0.983 - 4.026 From OBIS
  • 10. Erosaria caputserpentis Serpent's Head Cowrie Salinity envelope (n=40) From OBIS
  • 11. http://eol.org/pages/704102 Richness scores Cynthia Parr Global Content Summit Species Pages Group 17-19 Jan 2011
  • 12. Whirlwind tour • What kind of information we have • How we assemble that information – Big picture – Subject semantics – Names infrastructure – Curation – Richness score • How machines and people interact with EOL • Next steps
  • 13. EOL aggregates and curates Scientific Databases, including BHL, GBIF, ALA, INBio, COL, Scratchpads, LifeDesks Scientific Journals Curate Aggregate Comment Rate, Collect eol.org Quality control
  • 14. Sharing process adds semantics to content objects SPM DwC infoitem description Plinian Core using Darwin Core Archive flat files as transport mechanism EOL v2
  • 15. Number of text objects 0 100000 200000 300000 400000 500000 600000 700000 800000 Distribution Multiple topics Subject of text object Habitat Threats Conservation Trends Associations TrophicStrategy PopulationBiology Migration LifeExpectancy Behaviour Diseases
  • 16. Content objects are associated with taxon names Wikimedia Commons: Physeter macrocephalus (note we actually have over 3.3 million named pages)
  • 17. Names from different providers are matched Physeter macrocephalus Animal Diversity Web .... Physeter catodon Linnaeus, 1758 ARKive .................. Physeter macrocephalus Linné BioPix .................. Physeter macrocephalus L. INBio ................... Physeter catodon IUCN .................... Physeter Macrocephalus ITIS .................... Physeter macrocephalus Linnaeus, 1758 MarLIN .................. Physeter macrocephalus Linné NCBI .................... Physeter Catodon Species 2000 ............ Physeter macrocephalus Linnaeus, 1758 Taxon Concept ........... Physeter australasianus Desmoulins, 1822 Wikimedia Commons ....... Physeter macrocephalus WORMS ................... Physeter macrocephalus Linnaeus 1758
  • 18. Taxon concept pages: multiple hierarchies on Names tab
  • 19. Problem: one taxon may have several names Animal Diversity Web .... Physeter catodon Linnaeus, 1758 ARKive .................. Physeter macrocephalus Linné BioPix .................. Physeter macrocephalus L. INBio ................... Physeter catodon IUCN .................... Physeter Macrocephalus ITIS .................... Physeter macrocephalus Linnaeus, 1758 MarLIN .................. Physeter macrocephalus Linné NCBI .................... Physeter Catodon Species 2000 ............ Physeter macrocephalus Linnaeus, 1758 Taxon Concept ........... Physeter australasianus Desmoulins, 1822 Wikimedia Commons ....... Physeter macrocephalus WORMS ................... Physeter macrocephalus Linnaeus 1758
  • 20. Problem: the same name may apply to more than one taxon
  • 21. EOL curation • Trust or untrust taxon associations • Add new taxon association • Set preferred hierarchies • Set preferred common names • Leave comments Coming: Taxonomic concept curation
  • 22. EOL is not Wikipedia … though we have more than 212,000 Wikipedia articles and 115,000 Wikimedia images. Can’t currently edit within text objects.
  • 23. Whirlwind tour • What kind of information we have • How we assemble that information • How machines and people interact with EOL – API – Third party apps – Collections and communities • Next steps
  • 24. EOL enables machine interaction Curate Aggregate Comment Rate, Collect eol.org API Third party apps
  • 26. People interact with EOL content & each other Collections Communities
  • 27. Studies currently underway with University of Maryland • Cross-cultural study on motivation to engage in citizen science – Dana Rotman • Interaction among scientists and non-scientists on EOL’s social network – Jae-wook Ahn • Website traffic analysis to aid conservation communication – Yurong He and Bill Fagan
  • 28. Whirlwind tour • What kind of information we have • How we assemble that information • How machines and people interact with EOL • Next steps
  • 29. Using EOL collections to get computable data Step 1: Search on EOL for organisms with characteristics of interest. Add each one to an EOL collection. Step 2: Write a program using EOL API methods to retrieve the external database identifiers for the species in that collection. Step 3: Add to your program code to retrieve data using external database APIs. Step 4: Analyze, rinse, repeat. From Arthur Chapman
  • 30. Crowd-sourcing for computable data Lovell and Libby Langstroth, Calphotos Foodwebs.org
  • 31. Efforts underway • Phylogenetic trees: Collaboration with Open Tree of Life project for draft tree • Computable data challenge http://eol.org/info/data_challenge • Rod Page’s Bionames project • Alexandria Archive Institute • Devries and Thessen using DBPedia Spotlight to extract associations among taxa and add to Linked Open Data cloud • Sloan 2 project: Marine computable data • TraitBank ABI proposal
  • 32. Research wishes • Collecting nominations for research ideas where EOL can help: http://eol.org/info/wishes_for_research DUE 15 SEPTEMBER • Will follow with Rubenstein Fellows call for proposals
  • 33. Thanks to • Our funders: John D. and Catherine T. MacArthur Foundation, Alfred P. Sloan Foundation, Smithsonian Institution, Marine Biological Laboratory, Harvard University, David Rubenstein and other funders and donors • All our content providers and global partners • Volunteer curators and individual contributors via Flickr, Wikimedia, and members of EOL
  • 34. Summary of EOL page richness. Overall • 950,000 have content • 2 % are rich • ~22 % have only links to literature. Hot List • 30 % of 75K are rich • Average richness = ~30. Red Hot List • 56 % of 3K are rich • Average richness = 43
  • 35. Long Tail in databases contributing to EOL [Chart: number of taxa for which content is contributed to EOL, by partners in order of # taxa contributed to EOL; shown on a linear scale and on a log scale]
  • 36. Taxon page richness algorithm: a (Breadth) + b (Depth) + c (Diversity), with weights a = 60%, b = 30%, c = 10%. Breadth: images, topics of text objects, references, maps, videos, sounds, conservation status. Depth: # words per text object, # words total. Diversity: sources (partners). Score range 0 – 100, threshold 40.
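
The weighted sum on slide 36 can be sketched in a few lines. This is only an illustration, not EOL's implementation. It assumes the 60/30/10 weights map to breadth, depth and diversity in that order, that each component has already been scaled to the range 0 to 1, and that a page counts as rich when its score is at or above the threshold of 40. The names `WEIGHTS`, `richness` and `is_rich` are hypothetical.

```python
# Weights as read from slide 36; the order breadth/depth/diversity is an assumption.
WEIGHTS = {"breadth": 0.60, "depth": 0.30, "diversity": 0.10}

# Slide 36 gives a 0-100 score range and a threshold of 40.
RICH_THRESHOLD = 40.0


def richness(breadth: float, depth: float, diversity: float) -> float:
    """Combine three component scores (each assumed pre-scaled to 0..1)
    into a single 0-100 richness score, rounded to two decimals."""
    components = {"breadth": breadth, "depth": depth, "diversity": diversity}
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total = sum(WEIGHTS[name] * value for name, value in components.items())
    return round(100.0 * total, 2)


def is_rich(score: float) -> bool:
    """Assumption: 'rich' means at or above the threshold."""
    return score >= RICH_THRESHOLD
```

Because breadth carries most of the weight, a page with many kinds of content but short text can still clear the threshold, while a page with long text from a single source and little else may not.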

Editor's Notes

  1. Whirlwind tour of EOL. As you may know, Encyclopedia of Life is a web site providing global access to knowledge about life on earth. Global: the whole world. Access: free, and freely re-usable. Knowledge: synthesized, not raw. Life on Earth: biological diversity.
  2. My goals are to give you the whirlwind tour with enough information to ring some bells in areas that might be of interest to you, and inspire you to ask deeper questions
  3. I want to emphasize that EOL deals in summarized knowledge, not raw specimen data. For example, for the Serpent's Head Cowrie, we have images like this from the Moorea Biocode project, but instead of serving the individual specimen data, we get the overall distribution of specimen data on a map from GBIF. We also get a summary of environmental data associated with specimens in the Ocean Biogeographic Information System database. Imagine if we could do a summary like this across databases.
  4. This is a graphical way of presenting the summarized data from OBIS, which Jen Hammock on my staff worked on with Edward Van den berghe and our team at the Marine Biological Lab. The salinity range for the species is shown here as just a small, specific slice of the global ocean minimum and maximum. Looking just at 15 content providers we already work with, it is possible that numeric data such as lifespan or average body weight is already available for more than 800,000 species.
  5. EOL takes information from about 200 sources so far, mostly scientific databases, but also including Flickr and Wikipedia, and automatically sorts it onto taxon pages. Our curators can then trust or untrust it, or anybody can provide comments or ratings. About a thousand credentialed scientists have already volunteered to help with quality control. Actions and comments get fed back to the original providers, and the material on EOL is also available to other applications via an Application Programming Interface, which I’ll talk more about in a moment. We’re partnering with over two hundred scientific databases as well as public contribution sites like Flickr and Wikipedia. 100+ partner databases; 700 curators, 1000s of contributors, 46,000 members; 2.8 million pages; 500 thousand pages with Creative Commons content; over 2 million data objects and >1 million pages with links to research literature. Traffic in past year: 1.7 million unique users, 6.2 million page views.
  6. Extension. Leveraging strengths.
  7. EOL takes information from about 200 sources so far, mostly scientific databases, but also including Flickr and Wikipedia, and automatically sorts it onto taxon pages. Our curators can then trust or untrust it, or anybody can provide comments or ratings. About a thousand credentialed scientists have already volunteered to help with quality control. Actions and comments get fed back to the original providers, and the material on EOL is also available to other applications via an Application Programming Interface, which I’ll talk more about in a moment. We’re partnering with over two hundred scientific databases as well as public contribution sites like Flickr and Wikipedia. 100+ partner databases; 700 curators, 1000s of contributors, 46,000 members; 2.8 million pages; 500 thousand pages with Creative Commons content; over 2 million data objects and >1 million pages with links to research literature. Traffic in past year: 1.7 million unique users, 6.2 million page views.
  8. Free for third party applications, as long as licenses are respected. Field guides. Mobile application. Web page widget.
  9. Please see me afterwards if you are interested in any of these topics
  10. We have a feature where users can create customized collections of pages or objects on EOL. A scientist could search for a characteristic, say, red flowers, and create a collection of those taxa. Actually, we’ve been doing this with blue coloration in the “Life is blue” collection. If you wanted to test what might be driving the evolution of coloration, you could write a program that uses EOL to get, for each of those species, the GenBank IDs or the identifiers of some other EOL partner that we’ve mapped to those taxon pages, and then use those to go to that database and pull raw data to analyze. For example, genetic sequences, or specimen locations. In the future we hope to make step 2 and step 3 even easier, so you might just be able to click a button and download lots of raw data for your collection from certain data sources.
  11. You can also use EOL for crowd-sourcing. For example, Jennifer Hammock has started a collection called “Mystery associates” and asked people to try to identify the partners shown in photos that have some sort of ecological association. When they’ve been identified, like this sea star and anemone predation interaction, then she moves the image to the “known associates” collection. This adds to the information we have from a bunch of partners on food web interactions, and then would be available for foodweb modelers. There are many other possible ways that the large crowds on EOL could be harnessed to generate new datasets from EOL content. And this is all possible to some degree now.
  12. For the future, we are working on a few new angles. First, we are working to get a more phylogenetic organization available on EOL, because that will definitely help those who are doing comparative analyses and who want a true evolutionary framework. The deadline for submitting a large tree is this weekend, Monday really. The second challenge is to propose research work using computable data and EOL in some concrete way. Perhaps as I suggested with using collections to harvest computable data or perhaps using text mining. Here the deadline is next month for the idea, and then we’re providing funds to accomplish the pilot project over the next year. Finally, in September here in Washington we’re bringing in computer scientists and biologists who have an interest in broad scale data-intensive science using biodiversity data. We expect this to lead to other projects and enhancements of the EOL platform. All this could, in my personal opinion, lead up to EOL beginning to serve as The Smithsonian’s phenotype repository. In parallel with GenBank, we could be the initial point of entry for ecologists or other biologists seeking large-scale structured information about the observable characteristics of organisms.
  13. Also note that there is an implication that a “rich page” is a “high quality page”; this is not necessarily true, but often it is. As EOL goes forward with our version 2 we’ll be gathering other inputs that can tell us if a page is successful, such as ratings of its objects. The numbers in yellow are definitely out of date.
  14. Inspired by community ecology and measures of species diversity, which of course were originally inspired by information theory, but we haven’t used those measures. Instead we put together these factors in a way that let us assign weights to different factors based on how well they capture “a rich page”. We sampled dozens of pages and had team members assess them for their gestalt “richness” based on their own criteria. Then we compared those scores to those generated by the algorithm, and iteratively changed weights until we achieved a set of weights that appeared to reflect human perception of “richness.” Note that there’s a penalty: unvetted material is only worth about 75% of vetted material. Also there are maximums for many of these input values; having 200 images may not make a page much more rich than having 25 images. We reserve the right to change this to ensure that the index is as useful as possible. As with Google PageRank, we want to ensure that nobody can game the system.
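
The two adjustments mentioned in note 14, the discount for unvetted material and the maximums on input values, can be sketched as below. This is an illustrative guess at the shape of the calculation, not the actual EOL code. The function name `capped_input` is hypothetical, the hard cap is one assumed way of applying a maximum, and the cap of 25 images is borrowed from the note's example rather than from any documented setting. Only the 75% figure comes from the notes.

```python
# Note 14: unvetted material is worth about 75% of vetted material.
UNVETTED_WEIGHT = 0.75


def capped_input(vetted: int, unvetted: int, cap: float) -> float:
    """Return a 0..1 input value for one kind of content (e.g. images).

    Unvetted items are discounted, and anything beyond `cap` effective
    items adds nothing further (a hard cap is an assumption here).
    """
    if cap <= 0:
        raise ValueError("cap must be positive")
    if vetted < 0 or unvetted < 0:
        raise ValueError("counts must be non-negative")
    effective = vetted + UNVETTED_WEIGHT * unvetted
    return min(effective, cap) / cap


# Hypothetical cap: 200 vetted images score the same as 25.
IMAGE_CAP = 25
many = capped_input(vetted=200, unvetted=0, cap=IMAGE_CAP)
few = capped_input(vetted=25, unvetted=0, cap=IMAGE_CAP)
unvetted_only = capped_input(vetted=0, unvetted=20, cap=IMAGE_CAP)
```

A cap like this is also one simple way to make the index harder to game, since bulk-adding one kind of object stops raising the score once the maximum is reached.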