Global content summit: Overview, content partnering, richness

These are Cyndy Parr's presentations at the EOL Global Partner Summit, starting with an overview of the meeting, and including an overview of how we set up content partnerships, and how we calculate and use page richness scores.

Published in: Technology

  1. Global Content Summit, 17-19 Jan 2011
     Cynthia Parr, Species Pages Group
  2. http://www.eol.org
     • All species known to science
     • Freely accessible: open access, open source
     • Available from a single portal in a common format
     • Quality
     • Constantly growing
     • Aimed at multiple audiences
  3. EOL Global Partners
     [Map labels: GBIF, ViBRANT, Dutch, Pan-Arab, China, Mexico, India, Costa Rica, Colombia, Peru, Australia, South Africa, BHL, BHL-Global]
  4. Aims of global partners
     Global access to knowledge about life on Earth
     To increase awareness and understanding of living nature through an Encyclopedia of Life that gathers, generates and shares knowledge in an open, freely accessible and trusted digital resource
     Work together towards this vision and mission, sharing expertise and knowledge as appropriate
     Expand the global pool of knowledge about biodiversity and improve access to it
  5. Aims of this workshop
     • Gather content experts from Global Partners
     • Become familiar with each other’s work
     • Learn how core EOL works and provide feedback on it
     • Form the Species Pages Working Group
         Team at Smithsonian (SPG)
         Representatives from global partners
     • Draft individual plans that complement each other towards a common goal
     • Remind ourselves WHY we want to do this
  6. What is content?
     Biological information
       Names and hierarchies
       Descriptive text
       Literature
       Multimedia
       Maps
       Links to more information
     ...what about comments, collection annotations?
  7. Overview of agenda
     Day 1: Introductions
     Day 2: Sharing
     Day 3: Planning
  8. Acknowledgements
     • Funding from:
         David M. Rubenstein gift
         John D. and Catherine T. MacArthur Foundation
         Alfred P. Sloan Foundation
         Smithsonian Institution
         Marine Biological Laboratory
         Harvard University
         and other funders and donors
     • All our content partners and global partners
     • Volunteer curators and individual contributors via Flickr, Wikimedia, and members of EOL
     • All of you for coming
     • Claire Badgley
  9. Overview of Content Partnering
     Global Content Summit, 17-19 Jan 2011
     Cynthia Parr, Species Pages Group
  10. EOL is a content curation community
      [Diagram labels: Databases; Journals; LifeDesks & Scratchpads; Public contributions; Aggregate; Curate; Comment, Rate, Collect; eol.org; Quality control, prioritization; API; Third party apps]
  11. http://eol.org/content_partners
  12. http://eol.org/info/content_partner_collections
  13. Low hanging fruit (Photo credit: Stanislas PERRIN)
  14. Partner trajectory
      [Chart: number of partners (axis 0 to 150) by quarter, Y1Q3 through Y4Q3]
  15. Long Tail in databases contributing to EOL
      [Two charts: number of taxa for which content is contributed to EOL, by partner, with partners in order of # taxa contributed to EOL; the first on a linear scale (0 to 600000), the second viewed on log scale (1 to 1000000)]
  16. Content strategy
      Highlights
      Priorities
      Richness score
      Processes
      Goals
  17. http://eol.org/info/partners
  18. Content Partner process overview
      Partner creates an EOL member account
      Adds a content partner
      We communicate with them
      They (or we) upload a resource file or set a URL where one can be found
      They set a harvest frequency
      EOL harvests at that frequency
  19. Current methods of data transfer
      EOL resource document (XML) (usually they do the work)
      Spreadsheet upload (either can do the work)
      Connector (we do the work)
        Scrape web site or PDF
        Use web services
        Work from a copy of DB
      Darwin Core Archive (classifications, soon)
      See http://eol.org/info/cp_resource_checklist
  20. How EOL gets content (n=141 partners)
      [Chart: number of partners (axis 0 to 70) by method: XML resource doc, Connector, LD/eLD/Scratchpad, Spreadsheet; legend: CSV, web service, PDF, HTML, DB]
  21. Example partner
      • Pensoft has a process to generate EOL-compliant XML for new species
      • Also sends images to Morphbank, specimens to GBIF
      • They registered the URL at EOL
      • Our script checks for changes once a day
  22. EOL Schema Sources
      Content type | Standards used
      Taxa | Darwin Core Archive
      Attribution & licensing | Dublin & Darwin Core
      Text objects & links | Species Profile Model (and now +)
      Multimedia | Dublin (+ Audubon Core)
  23. Example biological content
      EOL Table of Contents | TDWG Species Profile Model
      Physical Description › Morphology | #Morphology
      Physical Description › Size | #Size
      Ecology › Habitat | #Habitat
      Ecology › Associations | #Associations
      Life History & Behavior › Life Expectancy | #LifeExpectancy
      Evolution and Systematics › Functional Adaptations | #Evolution
      Conservation › Conservation Status | #ConservationStatus
      Molecular Biology and Genetics › Genetics | #Genetics
      Molecular Biology and Genetics › Genome | #MolecularBiology
      Molecular Biology and Genetics › Molecular Biology | #MolecularBiology
      Nucleotide Sequences | #MolecularBiology
  24. EOL v2
      [Diagram labels: SPM, DwC, infoitem, description, Plinian Core]
      Using Darwin Core Archive flat files as transport mechanism
  25. EOL v3?
      Controlled vocabulary
      Numeric values
      Relations
  26. Partners
      Can delete or replace any of their objects
      Control how often we harvest, and can force a harvest
      Get an automatically updating collection
      Can request that we use their classification for browsing
      Can change the logo and description of their project
      Receive comments and curator actions immediately
      Receive monthly reminders that they can get traffic statistics
      Get many links back to their original web resources
  27. Partners cannot
      Publish the very first time
      Decide if they are pre-vetted
      Roll back a harvest
      Change the objects of any other partner
      Change classifications from any other partner
  28. Richness scores
      http://eol.org/pages/704102
      Global Content Summit, 17-19 Jan 2011
      Cynthia Parr, Species Pages Group
  29. Taxon page richness algorithm
      a (Breadth) + b (Depth) + c (Diversity), with weights 60%, 30%, 10%
      Breadth: Images, topics of text objects, references, maps, videos, sounds, conservation status
      Depth: # words per text object, # words total
      Diversity: Sources (partners)
      0 – 100, Threshold 40
  30. Summary of EOL page richness
      Overall
        950,000 have content
        2 % are rich
        ~22 % have only links to literature
      Hot List
        30 % of 75K are rich
        Average richness = ~30
      Red Hot List
        56 % of 3K are rich
        Average richness = 43
  31. How richness is used
      Choose images for home page “March of Life”
      Allows sorting in collections (Weird life example)
      Helps provide best search and API results
      Any other ideas? Could we be matchmakers for pages needing enrichment and users?
  32. http://synthesis.eol.org/media/treemap
  33. Strategies for improving richness
      Crowd-sourcing
        Collections
        Communities
        Mobile apps
      Leveraging
        Enabling platforms
        Enabling journals
        Data mining BHL etc.
  34. The page richness index
      Helps fill gaps with existing knowledge
      Helps prioritize funding and training so that it has maximum impact on closing true gaps
      Will be available via API
      Computing and storing richness index on EOL is a step towards storing and serving computable data

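
Slide 21 says that a script checks a partner's registered URL for changes once a day, without saying how the change is detected. One possible approach, shown here purely as an assumption, is to compare a hash of the downloaded resource file against the hash stored at the last harvest. The function names `content_fingerprint` and `needs_harvest` are hypothetical.

```python
import hashlib
from typing import Optional


def content_fingerprint(resource_bytes: bytes) -> str:
    """Return a SHA-256 hex digest of the raw resource file."""
    return hashlib.sha256(resource_bytes).hexdigest()


def needs_harvest(resource_bytes: bytes, last_fingerprint: Optional[str]) -> bool:
    """Decide whether a resource should be harvested again.

    Assumption: a resource is re-harvested when it has never been
    harvested before or when its bytes differ from the last harvest.
    """
    if last_fingerprint is None:
        return True
    return content_fingerprint(resource_bytes) != last_fingerprint


if __name__ == "__main__":
    first = b"<response><taxon>Example one</taxon></response>"
    stored = content_fingerprint(first)
    print(needs_harvest(first, None))      # never harvested
    print(needs_harvest(first, stored))    # same bytes as last time
    print(needs_harvest(first + b" ", stored))  # bytes changed
```

In a daily job the bytes would come from fetching the registered URL, for example with `urllib.request.urlopen(url).read()`, and the stored fingerprint would be updated after each successful harvest.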