Bibliography 2.0: A case study from the Wellcome Trust Genome Campus
Dr. Duncan Hull, http://twitter.com/dullhunk
European Bioinformatics Institute, EBI.ac.uk
e-Science workshop: The influence and impact of Web 2.0 on various applications
11th-12th May 2010, Edinburgh

Overview
• Introduction: Wellcome Trust Genome Campus
  • The European Bioinformatics Institute (ebi.ac.uk)
  • The Wellcome Trust Sanger Institute (sanger.ac.uk)
  • The Library
• Problem: economics and “freakonomics” of publishing
  • The unintended consequences of “publish or perish”
  • Burying data in publication silos
  • Obscuring identities and obstructing social applications
• Solution? Bibliography 2.0 with citeulike
  • Incentives
  • Disincentives
• Case study: what we’ve learnt
• Conclusions and future work
21.05.10

Wellcome to the Genome Campus
Home of:
• The European Bioinformatics Institute
• The Sanger Institute
Just outside Cambridge, UK

EBI: a data hub for bioinformatics in Europe
• Literature: ebi.ac.uk/citexplore
• DNA + RNA sequences: ebi.ac.uk/ena
• Genomes: ensembl.org
• Transcriptomes: e.g. ArrayExpress
• Protein structure: ebi.ac.uk/pdbe
• Protein domains, families: ebi.ac.uk/interpro
• Pathways: reactome.org
• Systems: biomodels.net
• Small molecules: ebi.ac.uk/chebi and ebi.ac.uk/chembl
• Protein sequence: uniprot.org
• Protein-protein interactions: ebi.ac.uk/intact
~400 staff (research/services), publishing data on the web

e.g. Chemical Entities of Biological Interest (ChEBI)
Free database/ontology of 500,000 small molecules (many of them drugs)

The Wellcome Trust Sanger Institute
Alex Bateman
~900 Sanger staff (total)

Shared Library
Annual journal subscription budget: £500,000 (modest compared to the multi-million-pound journal budgets of university libraries)
More later

“People respond to incentives, although not necessarily in ways that are predictable and manifest. Therefore, one of the most powerful laws in the universe is the law of unintended consequences. This applies to schoolteachers and Realtors and crack dealers as well as expectant mothers, sumo wrestlers, bagel salesmen, and the Ku Klux Klan…”
… and scientists too…

Unintended consequences, an example
Incentive: “publish or perish”
• Publications are rewarded with recognition, hiring, promotion, tenure, fame, funding, fortune, prizes, job satisfaction, etc.
Unintended consequences:
• Valuable data gets damaged, destroyed or “buried” (see later)
• Inaccessible to data and text mining on the Web
  • Copyright and toll-access journals
• Luddite scientists
  • Minimal exploitation of social software for sharing data
  • Minimal exploitation of Web 2.0 for sharing data

Gene names: e.g. Hexokinase, HK1, HK2, HK3
Protein names: e.g. Hexokinase, HK1, HK2, HK3
Chemical names: e.g. Glucose-6-phosphate, G6P, Glu, Gluc
Author names: e.g. Mark Baker (see next slide)
Poor precision and recall
“Why bury it [data] first and then mine it again?” Barend Mons, Wikiproteins, http://proteins.wikiprofessional.org
Mons, B. (2005). Which gene did you mean? BMC Bioinformatics 6, 142 (7 June 2005). DOI:10.1186/1471-2105-6-142

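A minimal Python sketch of why ambiguous names hurt precision and recall. Everything in it is hypothetical toy data, not a measurement: a naive search for “glucose” using the synonym list ["glucose", "Glu"] picks up documents where Glu means glutamate (false positives, so lower precision) and misses one that writes Gluc (a false negative, so lower recall).

```python
# Toy illustration with hypothetical data: naive synonym matching for "glucose".

def precision_recall(predicted: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = TP / |predicted|; recall = TP / |relevant|."""
    tp = len(predicted & relevant)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

def naive_match(docs: dict[str, str], synonyms: list[str]) -> set[str]:
    """Return ids of documents containing any synonym as a whole token (case-insensitive)."""
    wanted = {s.lower() for s in synonyms}
    return {doc_id for doc_id, text in docs.items()
            if wanted & set(text.lower().split())}

docs = {
    "d1": "glucose uptake in muscle",        # about glucose: found
    "d2": "Glu receptors in the synapse",    # Glu = glutamate: false positive
    "d3": "plasma Gluc was measured",        # about glucose, but 'Gluc' not in list: missed
    "d4": "Glu and Gly residues in a loop",  # Glu = glutamic acid residue: false positive
    "d5": "insulin signalling pathway",      # unrelated: correctly ignored
}
relevant = {"d1", "d3"}  # hypothetical hand-labelled ground truth

predicted = naive_match(docs, ["glucose", "Glu"])
precision, recall = precision_recall(predicted, relevant)
print(sorted(predicted), round(precision, 2), recall)  # ['d1', 'd2', 'd4'] 0.33 0.5
```

Adding "Gluc" to the list would raise recall, while only context (or an identifier such as a ChEBI ID) can remove the glutamate false positives.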
Identity crisis: Mark Baker
http://pubmed.gov?term=Baker+M[author]
http://pubmed.gov?term=Mark+Baker[author]
etc.
Until we have unique author identifiers, it is difficult or impossible to reliably find the papers published by a particular person.
Open Researcher and Contributor ID: http://orcid.org
“Tell me whenever Mark Baker publishes a paper”

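The “Tell me whenever Mark Baker publishes a paper” request can be sketched as two filters over a feed of new papers. The records and the `id-0001`-style identifiers are hypothetical stand-ins for unique author identifiers such as ORCID iDs; this illustrates the idea and is not the ORCID or PubMed API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Paper:
    title: str
    author_names: tuple[str, ...]  # name strings as printed, e.g. "Baker M"
    author_ids: tuple[str, ...]    # hypothetical unique researcher identifiers

new_papers = [
    Paper("Paper A", ("Baker M", "Smith J"), ("id-0001", "id-0042")),
    Paper("Paper B", ("Baker M",), ("id-0007",)),  # a different M. Baker
    Paper("Paper C", ("Jones K", "Baker M"), ("id-0099", "id-0001")),
]

def alert_by_name(papers: list[Paper], name: str) -> list[str]:
    """Name-string match: cannot tell apart different people with the same name."""
    return [p.title for p in papers if name in p.author_names]

def alert_by_id(papers: list[Paper], researcher_id: str) -> list[str]:
    """Identifier match: only papers by this specific person."""
    return [p.title for p in papers if researcher_id in p.author_ids]

print(alert_by_name(new_papers, "Baker M"))  # ['Paper A', 'Paper B', 'Paper C']
print(alert_by_id(new_papers, "id-0001"))    # ['Paper A', 'Paper C']
```

The name-based alert wrongly includes Paper B; the identifier-based alert only works once every paper carries author identifiers.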
Social information (needs identity)
Socialisation (e-science > “we-science”):
• How many other people have read this paper?
• What are my friends / enemies reading?
• What other papers did they also read?
Personalisation (e-science > “me-science”):
• These are my publications
• This is my bibliography (stuff I’m reading / have read)
Digital libraries are “document-centred” rather than “people-centred”
Torvik, V. I. and N. R. Smalheiser (2009). Author name disambiguation in MEDLINE. ACM Trans. Knowl. Discov. Data 3 (3), pp. 1-29. DOI:10.1145/1552303.1552304

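One simple way to answer “what other papers did they also read?” is to count co-occurrences across personal libraries. The sketch below assumes hypothetical user libraries (sets of article IDs). It is a generic co-readership count, not a description of how citeulike itself computes recommendations.

```python
from collections import Counter

# Hypothetical libraries: user -> set of article ids they have bookmarked/read
libraries = {
    "alice": {"p1", "p2", "p3"},
    "bob":   {"p1", "p3", "p4"},
    "carol": {"p2", "p5"},
    "dave":  {"p1", "p4"},
}

def readership(libraries: dict[str, set[str]], article: str) -> int:
    """How many users have this article in their library?"""
    return sum(article in items for items in libraries.values())

def also_read(libraries: dict[str, set[str]], article: str,
              top_n: int = 3) -> list[tuple[str, int]]:
    """Rank other articles by how many readers of `article` also have them."""
    counts: Counter[str] = Counter()
    for items in libraries.values():
        if article in items:
            counts.update(items - {article})
    # Sort by count (descending), then article id, so ties are deterministic
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

print(readership(libraries, "p1"))  # 3
print(also_read(libraries, "p1"))   # [('p3', 2), ('p4', 2), ('p2', 1)]
```

This only works once readers have identities: the counts are per user, which a “document-centred” library cannot provide.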
A solution: citeulike.org?
http://www.citeulike.org
• Lack of personalisation of library data
• Lack of socialisation of library data
Works a lot like http://www.delicious.com

Click “Post to Citeulike”

Tag it (optional), e.g. author tags

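A sketch of how tagged postings can be indexed so that a tag such as an author name becomes an access point (compare the carole-goble tag URL on the incentives slide). The posts and the normalisation rule (lower-case, spaces to hyphens) are assumptions for illustration, not citeulike’s documented behaviour.

```python
from collections import defaultdict

def normalise_tag(tag: str) -> str:
    """Assumed convention: lower-case and join words with hyphens,
    e.g. 'Carole Goble' -> 'carole-goble'."""
    return "-".join(tag.lower().split())

# Hypothetical (user, article, tags) postings
posts = [
    ("u1", "art-1", ["Carole Goble", "workflows"]),
    ("u2", "art-2", ["carole-goble", "semantic web"]),
    ("u3", "art-3", ["text mining"]),
]

def build_tag_index(posts: list[tuple[str, str, list[str]]]) -> dict[str, set[str]]:
    """Map each normalised tag to the set of articles carrying it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for _user, article, tags in posts:
        for tag in tags:
            index[normalise_tag(tag)].add(article)
    return index

index = build_tag_index(posts)
print(sorted(index["carole-goble"]))  # ['art-1', 'art-2']
```

Without normalisation, “Carole Goble” and “carole-goble” would be two separate tags, splitting the author’s page in two.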
Journal Picks is a group of 40+ invited users on campus who select interesting papers

2,016 unique articles in Journal Picks (in less than one year)
3,880,055 unique articles in citeulike in total

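Counting “unique articles” requires deciding when two postings are the same paper. A minimal sketch, assuming every posting carries a DOI (in practice many do not) and that the only variation is a prefix and letter case; DOI names are case-insensitive, so lower-casing is safe. The DOIs are those cited in this talk; how citeulike actually deduplicates articles is not described here.

```python
import re

def normalise_doi(raw: str) -> str:
    """Strip common DOI prefixes and lower-case (DOI names are case-insensitive)."""
    doi = raw.strip()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi, flags=re.IGNORECASE)
    return doi.lower()

# Hypothetical postings of the same and different papers in different forms
postings = [
    "10.1371/journal.pcbi.1000204",
    "doi:10.1371/JOURNAL.PCBI.1000204",
    "http://dx.doi.org/10.1371/journal.pcbi.1000204",
    "10.1186/1471-2105-6-142",
    "10.1145/1552303.1552304",
]
unique = {normalise_doi(d) for d in postings}
print(len(postings), len(unique))  # 5 3
```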
Citeulike + Zeitgeist = CiteGeist: http://www.citeulike.org/citegeist

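The idea behind a “what’s popular” page can be sketched as ranking articles by how many distinct users posted them recently. The seven-day window, the tie-break by article ID and the sample posts are all assumptions; CiteGeist’s actual ranking method is not described in this talk.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def popular(posts: list[tuple[str, str, datetime]], now: datetime,
            window_days: int = 7, top_n: int = 3) -> list[tuple[str, int]]:
    """Rank articles by the number of distinct users who posted them within the window."""
    cutoff = now - timedelta(days=window_days)
    users_by_article: defaultdict[str, set[str]] = defaultdict(set)
    for user, article, posted_at in posts:
        if posted_at >= cutoff:
            users_by_article[article].add(user)
    # Most distinct posters first; ties broken alphabetically by article id
    ranked = sorted(users_by_article.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(article, len(users)) for article, users in ranked[:top_n]]

now = datetime(2010, 5, 11)
posts = [  # hypothetical (user, article, posted_at) records
    ("u1", "a", now - timedelta(days=1)),
    ("u2", "a", now - timedelta(days=2)),
    ("u2", "a", now - timedelta(days=3)),   # same user again: counted once
    ("u3", "b", now - timedelta(days=1)),
    ("u1", "c", now - timedelta(days=30)),  # outside the window: ignored
]
print(popular(posts, now))  # [('a', 2), ('b', 1)]
```

Counting distinct users rather than raw postings stops a single enthusiastic user from pushing an article to the top.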
Citeulike incentives
• Selfish scientist (just organise my reference mess)
• What’s popular (interesting stuff: CiteGeist)
• Serendipity (find papers you wouldn’t find normally)
• Increase visibility and PageRank of papers?
• Person-centred access points into the first / second page of Google results, e.g. http://www.google.com/search?q=carole+goble returns the page below fairly high up the list: http://www.citeulike.org/group/10570/tag/carole-goble

Citeulike disincentives
• Privacy: don’t want to share with rivals (but can make collections private)
• Citeulike might go bust? (But it is sponsored by Springer)
• Parsers are fragile, easily (and deliberately) broken by publishers
• Valuable data in the hands of a commercial company? (But Facebook? LinkedIn? Twitter, etc.?)
• No academic reward for using it: publication = “finished”
• Social software works best with network effects
• There are LOTS of other tools that do this…

And the rest…
• www.mendeley.com
• www.zotero.org
• www.connotea.org
• www.mekentosj.com
• www.hubmed.org
• www.refworks.com
“iTunes for PDF files”
“Last.fm of research”

Giant corporate commercial competitors
With significant vested financial interests:
• Scopus: http://www.scopus.com/
• ISI Web of Knowledge (WOK): http://isiknowledge.com
Wrote a review of these systems: Hull, D., S. R. Pettifer, and D. B. Kell (2008). Defrosting the digital library: Bibliographic tools for the next generation web. PLoS Comput Biol 4 (10), e1000204. DOI:10.1371/journal.pcbi.1000204

Conclusions
• “Publish or perish” has some unfortunate and unintended consequences in science
• Citeulike is an interesting Web 2.0 tool
• We’ve had some success using it (typical “long tail”)
• Weak incentives for use by many; cultural barriers to adoption
• Technical barriers to adoption: many tools, messy data
Future work
• Social network analysis, clickthroughs, tag analysis
• Any other ideas…
But the times they are a changin’
• Citeulike or something like it will work much better if/when “publishing” incentives change over time…

Acknowledgements
• Mark Baker for organising this workshop
• EBI: Christoph Steinbeck (laboratory head)
• Carole Goble, University of Manchester
• The Sanger: Alex Bateman, Frances Martin, Tim Hubbard and all the contributors to the Journal Picks group
• Richard Cameron, Kevin Emamy and the rest of the citeulike team
• BBSRC for funding
Any questions?

