Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

Abstract: This talk will describe the use of http://www.citeulike.org to manage and share bibliographic references among 1,300 scientists and engineers working at the Sanger Institute (http://www.sanger.ac.uk) and the European Bioinformatics Institute (http://www.ebi.ac.uk), both based on the Wellcome Trust Genome Campus in Cambridge, UK. Using data from the references shared so far, we will illustrate the costs, benefits and adoption of citeulike for creating and sharing bibliographic data on the web.

Presentation from The Influence and Impact of Web 2.0 on Various Applications at the National e-Science Centre, Edinburgh, UK.


Transcript

  • 1. Bibliography 2.0: A case study from the Wellcome Trust Genome Campus Dr. Duncan Hull http://twitter.com/dullhunk European Bioinformatics Institute, EBI.ac.uk e-Science workshop: The influence and impact of Web 2.0 on various applications 11th-12th May 2010, Edinburgh
  • 2. Overview
    • Introduction: Wellcome Trust Genome Campus
      • The European Bioinformatics Institute (ebi.ac.uk)
      • The Wellcome Trust Sanger Institute (sanger.ac.uk)
      • The Library
    • Problem: economics and “freakonomics” of publishing
      • The unintended consequences of “publish or perish”
      • Burying data in publication silos
      • Obscuring identities and obstructing social applications
    • Solution? Bibliography 2.0 with citeulike
      • Incentives
      • Disincentives
      • Case study: What we’ve learnt
    • Conclusions and future work
    21.05.10
  • 3. Wellcome to the Genome Campus
    • Home of
    • The European Bioinformatics Institute
    • The Sanger Institute
    • Just outside Cambridge, UK
  • 4. EBI: a data hub for bioinformatics in Europe
    • Literature: ebi.ac.uk/citexplore
    • DNA + RNA sequences: ebi.ac.uk/ena
    • Genomes: ensembl.org
    • Transcriptomes: e.g. ArrayExpress
    • Protein structure: ebi.ac.uk/pdbe
    • Protein domains, families: ebi.ac.uk/interpro
    • Pathways: reactome.org
    • Systems: biomodels.net
    • Small molecules: ebi.ac.uk/chebi and ebi.ac.uk/chembl
    • Protein sequence: uniprot.org
    • Protein-protein interactions: ebi.ac.uk/intact
    • ~400 staff (research/services), publishing data on the web
  • 5. e.g. Chemical Entities of Biological Interest (ChEBI): free database/ontology of 500,000 small molecules (many drugs)
  • 6. The Wellcome Trust Sanger Institute. Alex Bateman. ~900 Sanger staff (total)
  • 7. Shared Library: annual journal subscription budget of £500,000 (modest compared to the multi-million-pound journal budgets of university libraries). More later
  • 8. “People respond to incentives, although not necessarily in ways that are predictable and manifest. Therefore, one of the most powerful laws in the universe is the law of unintended consequences. This applies to schoolteachers and Realtors and crack dealers as well as expectant mothers, sumo wrestlers, bible salesman, and the Ku Klux Klan…” … and scientists too…
  • 9. Unintended consequences, an example
    • Incentive: “publish or perish”
      • Publications are rewarded with recognition, hiring, promotion, tenure, fame, funding, fortune, prizes, job satisfaction etc.
    • Unintended consequences:
      • Valuable data gets damaged, destroyed or “buried” (see later)
      • Inaccessible to data and text mining on the Web
        • Copyright and toll-access journals
      • Luddite scientists
        • Minimal exploitation of social software for sharing data
        • Minimal exploitation of Web 2.0 for sharing data
  • 10.
    • Gene names: e.g. Hexokinase, HK1, HK2, HK3
    • Protein names: e.g. Hexokinase, HK1, HK2, HK3
    • Chemical names: e.g. Glucose-6-phosphate, G6P, Glu, Gluc
    • Author names: e.g. Mark Baker (see next slide)
    • Poor precision and recall
    • “Why bury it [data] first and then mine it again?” Barend Mons, Wikiproteins http://proteins.wikiprofessional.org
    • Which gene did you mean? BMC Bioinformatics. 2005 Jun 7;6:142 DOI:10.1186/1471-2105-6-142
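The "poor precision and recall" point on slide 10 can be made concrete with a small sketch. The paper identifiers and the search scenario below are hypothetical, invented purely for illustration; they are not data from the talk.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one search.

    retrieved: identifiers the search returned
    relevant:  identifiers that are actually about the intended entity
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical scenario: a search for the ambiguous name "HK1" returns four
# papers, but only two concern the gene of interest. A third relevant paper
# uses a different synonym and is missed entirely.
retrieved = ["paper1", "paper2", "paper3", "paper4"]
relevant = ["paper1", "paper2", "paper5"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Ambiguous names lower precision (unrelated papers are returned), while unlisted synonyms lower recall (relevant papers are missed).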
  • 11. Identity crisis: Mark Baker
    • http://pubmed.gov?term=Baker+M[author]
    • http://pubmed.gov?term=Mark+Baker[author]
    • etc
    • Until we have unique author identifiers, it is difficult or impossible to reliably find the papers published by a particular person
    • Open Researcher and Contributor ID http://orcid.org
    • “Tell me whenever Mark Baker publishes a paper”
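As a rough illustration of the identity problem on slide 11, the sketch below generates the two query URLs shown on the slide from a first and last name. The function names are hypothetical, and the URL pattern is simply copied from the slide rather than taken from any PubMed API documentation.

```python
from urllib.parse import quote_plus


def author_name_variants(first, last):
    """Return the two name forms used on the slide: 'Last I' and 'First Last'."""
    return [f"{last} {first[0]}", f"{first} {last}"]


def pubmed_author_urls(first, last):
    """Build one search URL per name variant.

    The URL pattern is copied from the slide; the square brackets are left
    unescaped to match it.
    """
    return [
        f"http://pubmed.gov?term={quote_plus(name)}[author]"
        for name in author_name_variants(first, last)
    ]


urls = pubmed_author_urls("Mark", "Baker")
for url in urls:
    print(url)
```

Each variant is a separate query, and none of them can distinguish one Mark Baker from another, which is the gap that unique author identifiers are meant to close.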
  • 12. Social information (need identity for this)
    • Socialisation: (e-science > “we-science”)
      • How many other people have read this paper?
      • What are my friends / enemies reading?
      • What other papers did they also read?
    • Personalisation (e-science > “me-science”)
      • These are my publications
      • This is my bibliography (stuff I’m reading / have read)
      • Digital libraries are “document-centred” rather than “people-centred”
    • Author name disambiguation in MEDLINE, by Vetle I. Torvik and Neil R. Smalheiser. ACM Trans. Knowl. Discov. Data, Vol. 3, No. 3 (2009), pp. 1-29. DOI:10.1145/1552303.1552304
  • 13. A solution, citeulike.org?
    • http://www.citeulike.org
    • Lack of personalisation of library data
    • Lack of socialisation of library data
    • Works a lot like http://www.delicious.com
  • 14. Click Post to Citeulike
  • 15. Tag it (optional) e.g. author tags
  • 16. Journal picks is a group of 40+ invited users on campus, who select interesting papers
  • 17. 2,016 unique articles in journal picks (less than one year); 3,880,055 unique articles total
  • 18. Citeulike + ZeitGeist = CiteGeist http://www.citeulike.org/citegeist
  • 19. Citeulike incentives
    • Selfish scientist (just organise my reference mess)
    • What’s popular (interesting stuff, via CiteGeist)
    • Serendipity (find papers you wouldn’t find normally)
    • Increase visibility and PageRank of papers?
    • Person-centred access points into first / second page of Google results
    • e.g. http://www.google.com/search?q=carole+goble
    • Has the result below fairly high up the list:
    • http://www.citeulike.org/group/10570/tag/carole-goble
  • 20. Citeulike disincentives
    • Privacy, don’t want to share with rivals
      • (but can make collections private)
    • Citeulike might go bust?
      • But Springer sponsored
    • Parsers are fragile
      • easily (and deliberately) broken by publishers
    • Valuable data in the hands of a commercial company?
      • But Facebook? LinkedIn? Twitter etc?
    • No academic reward for using it
      • publication = “finished”
    • Social software works best with network effects
      • There are LOTS of other tools that do this…
  • 21. And the rest…
    • www.mendeley.com
    • www.zotero.org
    • www.connotea.org
    • www.mekentosj.com
    • www.hubmed.org
    • www.refworks.com
    • “iTunes for PDF files”
    • “Last.fm of research”
  • 22. Giant corporate commercial competitors
    • With significant vested financial interests
    • Scopus http://www.scopus.com/
    • ISI WOK http://isiknowledge.com
    • Wrote a review of these systems: Hull, D., S. R. Pettifer, and D. B. Kell (2008). Defrosting the digital library: Bibliographic tools for the next generation web. PLoS Comput Biol 4 (10), e1000204. DOI:10.1371/journal.pcbi.1000204
  • 23. Conclusions
    • “Publish or perish” has some unfortunate and unintended consequences in science
    • Citeulike is an interesting Web 2.0 tool
      • We’ve had some success using it (typical “long tail”)
      • Weak incentives for use, and many cultural barriers to adoption
      • Technical barriers to adoption, many tools, messy data
    • Future work
      • Social network analysis, clickthroughs, tag analysis
      • Any other ideas…
    • But the times they are a changin’
      • Citeulike or something like it will work much better if/when “publishing” incentives change over time…
  • 24. Acknowledgements
    • Mark Baker for organising this workshop
    • EBI, Christoph Steinbeck (laboratory head)
    • Carole Goble, University of Manchester
    • The Sanger, Alex Bateman, Frances Martin, Tim Hubbard and all the contributors to the Journal Picks group
    • Richard Cameron, Kevin Emamy and the rest of the citeulike team
    • BBSRC for funding
    • Any questions?