Global content summit: Overview, content partnering, richness

These are Cyndy Parr's presentations at the EOL Global Partner Summit, starting with an overview of the meeting, and including an overview of how we set up content partnerships, and how we calculate and use page richness scores.




Usage Rights

CC Attribution License

  • EOL is a giant mashup: it merges information created elsewhere onto its pages, where curators (mostly credentialed scientists) can trust, untrust, and rate it, and anybody can add comments or tags. We're partnering with over a hundred scientific databases as well as public contribution sites like Flickr and Wikipedia.
    100+ partner databases
    700 curators / 1000s of contributors / 46,000 members
    2.8 million pages
    500 thousand pages with Creative Commons content
    Over 2 million data objects and >1 million pages with links to research literature
    Traffic in past year: 1.7 million unique users, 6.2 million page views
  • Low-hanging fruit is mostly gone
    Fellows
    Smaller partners
  • Partners are projects and databases that we are sharing data with
  • Why it is important to streamline
    About 32 partners have managed to make their own XML resource documents, but that probably has the lowest cost per return.
    Connectors may be even more important: web services and database connectors are putting content on at least half a million pages.
    LifeDesks (LDs) and Scratchpads are important for small partners.
    Spreadsheets are popular; with the new transfer schema and flat-file archive format, the XML bar may go down and the spreadsheet bar may go up.
  • Overview › Brief Summary, Comprehensive Description, Distribution
    Physical Description › Morphology, Size, Diagnostic Description, Type Information, Look Alikes, Development
    Ecology › Habitat, Migration, Dispersal, Diseases and Parasites, Population Biology, General Ecology
    Life History and Behavior › Behavior, Cyclicity, Life Cycle, Reproduction, Growth
    Evolution and Systematics › Evolution, Fossil History, Systematics or Phylogenetics, Functional Adaptations
    Physiology and Cell Biology › Physiology, Cell Biology
    Molecular Biology and Genetics › Genetics
    Conservation › Conservation Status, Trends, Threats, Legislation, Management
    Relevance to Humans and Ecosystems › Benefits, Risks
    Notes
    Taxonomy
    Education Resources
    Citizen Science
    Identification Resources
  • Extension
    Leveraging strengths
  • The index was inspired by community ecology and measures of species diversity, which of course were originally inspired by information theory, but we haven't used those measures. Instead we combined these factors so that we could assign each one a weight based on how well it captures "a rich page".
    We sampled dozens of pages and had team members assess them for their gestalt "richness" based on their own criteria. Then we compared those scores to the ones generated by the algorithm, and iteratively changed the weights until they appeared to reflect human perception of "richness."
    Note that there is a penalty: unvetted material is worth only about 75% of vetted material.
    There are also maximums for many of these input values: having 200 images may not make a page much richer than having 25 images.
    We reserve the right to change this to ensure that the index is as useful as possible. As with Google PageRank, we want to ensure that nobody can game the system.
  • Also note that there is an implication that a "rich page" is a "high quality page". That is not necessarily true, but often it is.
    As EOL goes forward with our version 2, we'll be gathering other inputs that can tell us if a page is successful, such as ratings of its objects.
  • This treemap summarizes the 1.9 million described species that each have a page on the Encyclopedia of Life. Some of these pages have only a name so far, but about a million of them have more than that: maps, multimedia, text, or at least literature references.
    Each of these species potentially represents a volume in a "living library," as each has evolved solutions to nature's challenges, solutions that can benefit human society. For example, the genomics revolution and half of our synthetic drugs were made possible by understanding the characteristics of particular species.
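The richness note above mentions two mechanics: unvetted material counts for about 75% of vetted material, and many inputs have a maximum. The following is a minimal sketch of how those two rules could combine for a single input (images). The function name, the cap of 25, and the order of operations (penalty first, then cap) are assumptions made for illustration; the notes do not give EOL's actual values or implementation.

```python
# Hypothetical illustration only; not EOL's actual code.
UNVETTED_FACTOR = 0.75  # from the notes: unvetted is worth about 75% of vetted
IMAGE_CAP = 25          # assumed maximum, chosen only for this example


def capped_image_credit(vetted: int, unvetted: int, cap: float = IMAGE_CAP) -> float:
    """Return the effective image count that would feed a richness score.

    Unvetted images are discounted first, then the total is capped so that
    very large image counts stop adding credit.
    """
    if vetted < 0 or unvetted < 0:
        raise ValueError("counts must be non-negative")
    effective = vetted + UNVETTED_FACTOR * unvetted
    return min(effective, cap)


if __name__ == "__main__":
    print(capped_image_credit(10, 4))   # a small page, below the cap
    print(capped_image_credit(200, 0))  # a page with many images hits the cap
```

Capping after the discount means a page cannot exceed the maximum simply by adding large amounts of unvetted material, which fits the stated goal that nobody should be able to game the index.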

Presentation Transcript

  • Cynthia Parr
    Global Content Summit, Species Pages Group
    17-19 Jan 2011
  • http://www.eol.org
    All species known to science
    Freely accessible: open access, open source
    Available from a single portal in a common format
    Quality
    Constantly growing
    Aimed at multiple audiences
  • EOL Global Partners
    GBIF, ViBRANT, Dutch, China, Mexico, Pan-Arab, India, Costa Rica, Colombia, Peru, Australia, South Africa, BHL, BHL-Global
  • Aims of global partners
    Global access to knowledge about life on Earth
    To increase awareness and understanding of living nature through an Encyclopedia of Life that gathers, generates and shares knowledge in an open, freely accessible and trusted digital resource
    Work together towards this vision and mission, sharing expertise and knowledge as appropriate
    Expand the global pool of knowledge about biodiversity and improve access to it
  • Aims of this workshop
    Gather content experts from Global Partners
    Become familiar with each other's work
    Learn how core EOL works and provide feedback on it
    Form the Species Pages Working Group: the team at Smithsonian (SPG) and representatives from global partners
    Draft individual plans that complement each other towards a common goal
    Remind ourselves WHY we want to do this
  • What is content?
    Biological information
    Names and hierarchies
    Descriptive text
    Literature
    Multimedia
    Maps
    Links to more information
    … what about comments, collection annotations?
  • Overview of agenda
    Day 1: Introductions
    Day 2: Sharing
    Day 3: Planning
  • Acknowledgements
    Funding from: David M. Rubenstein gift, John D. and Catherine T. MacArthur Foundation, Alfred P. Sloan Foundation, Smithsonian Institution, Marine Biological Laboratory, Harvard University, and other funders and donors
    All our content partners and global partners
    Volunteer curators and individual contributors via Flickr, Wikimedia, and members of EOL
    All of you for coming
    Claire Badgley
  • Overview of Content Partnering
    Cynthia Parr
    Global Content Summit, Species Pages Group
    17-19 Jan 2011
  • EOL is a content curation community
    Databases, Journals, LifeDesks & Scratchpads, Public contributions
    Aggregate; Curate; Comment, Rate, Collect
    eol.org: Quality control, prioritization
    API: Third party apps
  • http://eol.org/content_partners
  • http://eol.org/info/content_partner_collections
  • Low-hanging fruit (Photo credit: Stanislas PERRIN)
  • Partner trajectory
    [Chart: number of partners (0 to 150) by quarter, Y1Q3 to Y4Q3]
  • Long Tail in databases contributing to EOL
    [Charts: number of taxa for which content is contributed to EOL, with partners in order of # taxa contributed to EOL; shown on a linear scale and on a log scale]
  • Content strategy
    Highlights
    Priorities
    Richness score
    Processes
    Goals
  • http://eol.org/info/partners
  • Content Partner process overview
    Partner creates an EOL member account
    Adds a content partner
    We communicate with them
    They (or we) upload a resource file or set a URL where one can be found
    They set a harvest frequency
    EOL harvests at that frequency
  • Current methods of data transfer
    EOL resource document (XML) (usually they do the work)
    Spreadsheet upload (either can do the work)
    Connector (we do the work): scrape web site or PDF, use web services, or work from a copy of the DB
    Darwin Core Archive (classifications, soon)
    See http://eol.org/info/cp_resource_checklist
  • How EOL gets content (n=141 partners)
    [Bar chart: number of partners by transfer method: XML resource doc, Connector, LD/eLD/Scratchpad, Spreadsheet; legend: CSV, web service, PDF, HTML, DB]
  • Example partner
    Pensoft has a process to generate EOL-compliant XML for new species
    Also sends images to Morphbank, specimens to GBIF
    They registered the URL at EOL
    Our script checks for changes once a day
  • EOL Schema Sources
    Content type → Standards used
    Taxa → Darwin Core Archive
    Attribution & licensing → Dublin & Darwin Core
    Text objects & links → Species Profile Model (and now +)
    Multimedia → Dublin (+ Audubon Core)
  • Example biological content
    EOL Table of Contents → TDWG Species Profile Model
    Physical Description › Morphology → #Morphology
    Physical Description › Size → #Size
    Ecology › Habitat → #Habitat
    Ecology › Associations → #Associations
    Life History & Behavior › Life Expectancy → #LifeExpectancy
    Evolution and Systematics › Functional Adaptations → #Evolution
    Conservation › Conservation Status → #ConservationStatus
    Molecular Biology and Genetics › Genetics → #Genetics
    Molecular Biology and Genetics › Genome → #MolecularBiology
    Molecular Biology and Genetics › Molecular Biology → #MolecularBiology
    Nucleotide Sequences → #MolecularBiology
  • EOL v2
    SPM (infoitem), DwC (description), Plinian Core
    Using Darwin Core Archive flat files as transport mechanism
  • EOL v3?
    Controlled vocabulary
    Numeric values
    Relations
  • Partners
    Can delete or replace any of their objects
    Control how often we harvest, and can force a harvest
    Get an automatically updating collection
    Can request that we use their classification for browsing
    Can change the logo and description of their project
    Receive comments and curator actions immediately
    Receive monthly reminders that they can get traffic statistics
    Get many links back to their original web resources
  • Partners cannot
    Publish the very first time
    Decide if they are pre-vetted
    Roll back a harvest
    Change the objects of any other partners
    Change classifications from any other partners
  • Richness scores
    http://eol.org/pages/704102
    Cynthia Parr
    Global Content Summit, Species Pages Group
    17-19 Jan 2011
  • Taxon page richness algorithm
    a (Breadth) + b (Depth) + c (Diversity), weighted 60%, 30%, 10%
    Breadth: Images, topics of text objects, references, maps, videos, sounds, conservation status
    Depth: # words per text object, # words total
    Diversity: Sources (partners)
    Range 0 – 100, Threshold 40
  • Summary of EOL page richness
    Overall: 950,000 have content; 2 % are rich; ~22 % have only links to literature
    Hot List: 30 % of 75K are rich; average richness = ~30
    Red Hot List: 56 % of 3K are rich; average richness = 43
  • How richness is used
    Choose images for home page "March of Life"
    Allows sorting in collections (Weird life example)
    Helps provide best search and API results
    Any other ideas? Could we be matchmakers for pages needing enrichment and users?
  • http://synthesis.eol.org/media/treemap
  • Strategies for improving richness
    Crowd-sourcing: Collections, Communities, Mobile apps
    Leveraging: Enabling platforms, Enabling journals, Data mining BHL etc.
  • The page richness index
    Helps fill gaps with existing knowledge
    Helps prioritize funding and training so that it has maximum impact on closing true gaps
    Will be available via API
    Computing and storing richness index on EOL is a step towards storing and serving computable data
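The richness slide gives the score as a weighted sum of Breadth (60%), Depth (30%) and Diversity (10%) on a 0 to 100 scale, with a threshold of 40. A minimal sketch of that arithmetic follows. It assumes each component has already been normalized to the range 0 to 1 and that a page counts as rich when its score is at or above the threshold; the slides do not specify either point, and the function names are hypothetical.

```python
# Hypothetical illustration of the weighted sum on the slide; not EOL's code.
WEIGHTS = {"breadth": 0.60, "depth": 0.30, "diversity": 0.10}  # from the slide
RICH_THRESHOLD = 40  # from the slide; treating it as inclusive is an assumption


def richness(breadth: float, depth: float, diversity: float) -> float:
    """Combine three component scores (assumed to be in 0..1) into a 0..100 score."""
    components = {"breadth": breadth, "depth": depth, "diversity": diversity}
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return 100.0 * sum(WEIGHTS[name] * value for name, value in components.items())


def is_rich(score: float) -> bool:
    """Return True when a score reaches the richness threshold."""
    return score >= RICH_THRESHOLD


if __name__ == "__main__":
    score = richness(breadth=0.5, depth=0.2, diversity=0.1)
    print(round(score, 1), is_rich(score))
```

Because Breadth carries most of the weight, a page with several kinds of content from one source can outscore a page with one long text drawn from many sources, which is consistent with how the slide orders the three factors.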