  • Academics and biologists have embraced open source. The limits of librarians' and scientists' patience are being reached, and the tie-ins and sweeteners for initial multiyear online deals are now running out.

talk talk Presentation Transcript

  • Text mining and Open Access publishing Matthew Cockerill Technical Director, BioMed Central
  • Summary
    • What is Open Access publishing?
    • Open Access publishing and text mining
    • About BMC Bioinformatics
    • The BioCreative supplement
  • The current model of publishing scientific research
    • Scientists carry out research
    • They write up their results
    • They submit them to a journal
    • Other scientists act as peer reviewers and editorial advisers
    • Finally, the publisher sells access to that research back to the scientific community
  • What’s wrong with this status quo?
    • Restricted access to scientific research is contrary to the interests of
      • the scientists who do the research
      • the funders who pay for it
      • society as a whole
    • It is an historical artefact of the economics of print publishing
    • It is a serious obstacle to mining of full text information
  • BioMed Central The Open Access publisher
    • Commercial organization
    • Published first article in mid-2000
    • Strict policy of immediate Open Access to all research articles
  • Growth of BioMed Central
  • Momentum for Open Access
    • PubMed Central
    • Public Library of Science
    • Open Access declarations: Budapest/Bethesda/Berlin
    • Software open-source movement
    • Mass cancellation of titles from traditional publishers
  • BioMed Central’s business model for open access publishing
    • Keep costs down via
      • Online submission and peer review
      • Automated tools to streamline article processing, conversion and layout
    • Processing charge (currently $525) for accepted articles
    • No processing charge for authors at member institutions
  • Institutional membership
    More than 400 institutions are members of BioMed Central, including, to name just a few:
    • CalTech
    • Cancer Research UK
    • Columbia University
    • Cornell University
    • University of California
    • Dana-Farber Cancer Institute
    • Harvard University
    • INSERM
    • Imperial College
    • Institut Pasteur
    • John Innes Centre
    • Johns Hopkins University
    • Kyoto University
    • Max Planck Institutes
    • Memorial Sloan-Kettering Cancer Center
    • MRC Laboratory of Molecular Biology
    • National Institutes of Health
    • National Institute for Medical Research
    • NHS England
    • Princeton University
    • Rockefeller University
    • TIGR
    • TSRI
    • Tufts University
    • Wellcome Trust Sanger Institute
    • University of Wisconsin
    • World Health Organization
    • Yale University
  • Summary
    • What is Open Access publishing?
    • Open Access publishing and text mining
    • About BMC Bioinformatics
    • The BioCreative supplement
  • Mining the full text
    • Analysing results of high-throughput experiments means biologists increasingly need text-mining tools
    • PubMed is currently the primary resource for text mining (“it’s what’s available”) but:
      • Abstracts omit critical information
      • Techniques developed for abstracts may not effectively use extra information in full text
    • Fully Open Access corpora, in standard XML formats, will help
  • Data mining - BioMed Central
    • Entire corpus of full text XML downloadable by ftp as a single zip file
    • Various groups working with the data
      • E.g. Pre-BIND (automatic extraction of possible protein-protein interaction information from full text)
    • No restrictions on redistribution
    • This means other groups can use same corpus to repeat and build on results
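Working with a corpus shipped as a single zip of XML files can be sketched as follows. This is a minimal illustration, not BioMed Central's tooling: the archive is built in memory so the example runs without a download, and the file names and element names (`art`, `ttl`, `bdy`) are hypothetical stand-ins rather than the actual article DTD.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def iter_article_text(zip_file):
    """Yield (member_name, plain_text) for every XML file in a zip archive."""
    with zipfile.ZipFile(zip_file) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".xml"):
                continue  # skip anything that is not an article file
            root = ET.fromstring(archive.read(name))
            # itertext() walks every text node, so no particular DTD is assumed.
            text = " ".join(" ".join(root.itertext()).split())
            yield name, text


def build_demo_corpus():
    """Build a tiny in-memory zip (hypothetical content) standing in for the corpus."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "a1.xml",
            "<art><ttl>Kinase A</ttl><bdy>binds <p>protein B</p>.</bdy></art>",
        )
        archive.writestr("readme.txt", "not an article")
    buffer.seek(0)
    return buffer


corpus = dict(iter_article_text(build_demo_corpus()))
```

Because the corpus can be redistributed freely, a second group could run the same loop over the same archive and compare results directly.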
  • Data mining - BioMed Central (screen shot)
  • Data mining - PubMed Central
    • Standard NLM archiving/interchange XML DTD: common format across multiple publishers
    • Only a subset of PubMed Central participating publishers allow download of full text XML
      • BioMed Central
      • Public Library of Science
    • Hopefully, more will follow….
    • XML made available via OAI interface
    http://www.pubmedcentral.com/about/oai.html
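Harvesting over an OAI interface generally follows the OAI-PMH pattern: request `ListRecords`, read the record identifiers, and continue with the returned `resumptionToken` until none is left. The sketch below only builds request URLs and parses a response; the base URL and the sample response are hypothetical, and `oai_dc` is used because every OAI-PMH repository must support it, not because it is known to be the format PubMed Central serves full text in.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token = root.find(".//oai:resumptionToken", OAI_NS)
    next_token = token.text if token is not None and token.text else None
    return ids, next_token


# Hypothetical response, shaped like an OAI-PMH ListRecords reply.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

BASE_URL = "https://repository.example.org/oai"  # placeholder, not a real endpoint
first_url = list_records_url(BASE_URL, metadata_prefix="oai_dc")
ids, next_token = parse_list_records(SAMPLE)
next_url = list_records_url(BASE_URL, resumption_token=next_token)
```

A real harvester would fetch each URL in turn and stop when no resumption token comes back.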
  • Data mining - PubMed Central
  • Adding structure to full text data
    • Some examples of useful structure:
    • Structure of article itself (figure legends, materials and methods, references etc)
    • MathML, CML etc
    • Disambiguated references to genes/proteins…
  • Authoring tools are key
    • Manuscript structure
      • EndNote, TeX/BibTeX pretty good already
    • MathML
      • Publicon, TeX etc.
    • CML
      • Chemsketch etc.
    • Gene/protein reference markup?
      • Semi-automatic markup during authoring
      • Author reviews and confirms markup
      • System prompts author to clarify ambiguity (cf. grammar checker, code intelligence)
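The semi-automatic markup idea can be illustrated with a toy dictionary lookup: symbols with a single candidate are proposed for the author to confirm, while symbols with several candidates are flagged so the system can ask the author which one is meant. This is only a sketch under loud assumptions: the lexicon, its identifiers and the function name are all made up, and a real tool would need a far richer matcher than exact, case-sensitive lookup.

```python
import re

# Hypothetical lexicon: symbol -> candidate identifiers (all identifiers invented).
LEXICON = {
    "BRCA1": ["gene:0001"],
    "CAT": ["gene:0002", "gene:0003"],
}


def suggest_markup(text, lexicon=LEXICON):
    """Split dictionary hits into proposals and items needing clarification."""
    proposed, needs_clarification = [], []
    for match in re.finditer(r"\b[A-Za-z0-9]+\b", text):
        candidates = lexicon.get(match.group())  # exact, case-sensitive lookup
        if not candidates:
            continue
        entry = (match.start(), match.group(), candidates)
        if len(candidates) == 1:
            proposed.append(entry)  # author still reviews and confirms
        else:
            needs_clarification.append(entry)  # prompt author to choose
    return proposed, needs_clarification


proposed, needs_clarification = suggest_markup("BRCA1 interacts with CAT in the cat.")
```

As with a grammar checker, nothing is committed automatically: both lists are suggestions shown to the author at writing time.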
  • Summary
    • What is Open Access publishing?
    • Open Access publishing and text mining
    • BMC Bioinformatics
    • The BioCreative supplement
  • BMC series of online journals
    • BMC Biochemistry
    • BMC Bioinformatics
    • BMC Biotechnology
    • BMC Cell Biology
    • BMC Chemical Biology
    • BMC Developmental Biology
    • BMC Ecology
    • BMC Evolutionary Biology
    • BMC Genetics
    • BMC Genomics
    • BMC Immunology
    • BMC Microbiology
    • BMC Molecular Biology
    • BMC Neuroscience
    • BMC Pharmacology
    • BMC Physiology
    • BMC Plant Biology
    • BMC Structural Biology
    • BMC Anesthesiology
    • BMC Blood Disorders
    • BMC Cancer
    • BMC Cardiovascular Disorders
    • BMC Clinical Pathology
    • BMC Clinical Pharmacology
    • BMC Complementary and Alternative Medicine
    • BMC Dermatology
    • BMC Ear, Nose and Throat Disorders
    • BMC Emergency Medicine
    • BMC Endocrine Disorders
    • BMC Family Practice
    • BMC Gastroenterology
    • BMC Geriatrics
    • BMC Health Services Research
    • BMC Infectious Diseases
    • BMC International Health and Human Rights
    • BMC Medical Education
    • BMC Medical Ethics
    • BMC Medical Genetics
    • BMC Medical Imaging
    • BMC Medical Informatics and Decision Making
    • BMC Medical Research Methodology
    • BMC Musculoskeletal Disorders
    • BMC Nephrology
    • BMC Neurology
    • BMC Nuclear Medicine
    • BMC Nursing
    • BMC Ophthalmology
    • BMC Oral Health
    • BMC Palliative Care
    • BMC Pediatrics
    • BMC Pregnancy and Childbirth
    • BMC Psychiatry
    • BMC Public Health
    • BMC Pulmonary Medicine
    • BMC Surgery
    • BMC Urology
    • BMC Women's Health
  • BMC Bioinformatics
  • RSS feeds
  • Open access leads to high visibility
    • Indexing/Linking
      • PubMed
      • ISI
      • BIOSIS
      • CAS
      • CrossRef
      • Scirus
      • Open Archive Initiative
      • Citebase
      • Google
    • Archiving
      • PubMed Central
      • INIST
      • LOCKSS
      • Max Planck
      • OhioLINK
  • BMC Bioinformatics - citation impact
  • Summary
    • What is Open Access publishing?
    • Open Access publishing and text mining
    • About BMC Bioinformatics
    • The BioCreative supplement
  • Process for publishing in BMC Bioinformatics supplement
    • Follow BMC Bioinformatics ‘Research Article’ instructions for authors
    • Send articles to BioCreative organizers who will coordinate peer review [do not submit articles online]
    • Supplement passed on to BioMed Central for XML markup and publication
    • $400 processing charge/article
  • Instructions for authors
  • Access to supplement
    • All articles in supplement covered by BioMed Central’s Open Access licence agreement
      • Free access
      • Free re-distribution/re-use
    • Supplement indexed in PubMed and permanently archived in PubMed Central
  • That’s it