Matthew Cockerill
Technical Director, BioMed Central
Text mining and Open Access publishing
March 30th 2004 BioCreative 2004
  • Academics and biologists have embraced open source.
    The limits of librarians' and scientists' patience are being reached, and the tie-ins and sweeteners for initial multiyear online deals are now running out.

    1. Matthew Cockerill, Technical Director, BioMed Central: Text mining and Open Access publishing
    2. Summary
        • What is Open Access publishing?
        • Open Access publishing and text mining
        • About BMC Bioinformatics
        • The BioCreative supplement
    3. Summary
        • What is Open Access publishing?
        • Open Access publishing and text mining
        • About BMC Bioinformatics
        • The BioCreative supplement
    4. The current model of publishing scientific research
        • Scientists carry out research
        • They write up their results
        • They submit them to a journal
        • Other scientists act as peer reviewers and editorial advisers
        • Finally, the publisher sells access to that research back to the scientific community
    5. What’s wrong with this status quo?
        • Restricted access to scientific research is contrary to the interests of
            – the scientists who do the research
            – the funders who pay for it
            – society as a whole
        • It is an historical artefact of the economics of print publishing
        • It is a serious obstacle to mining of full text information
    6. BioMed Central: The Open Access publisher
        • Commercial organization
        • Published first article in mid-2000
        • Strict policy of immediate Open Access to all research articles
    7. Growth of BioMed Central
        [Chart: Open Access research article publications per year, 2000 to 2003 (axis 0 to 2000)]
        [Chart: Fulltext accesses to Open Access articles per year, 2000 to 2003 (axis 0m to 5m)]
    8. Momentum for Open Access
        • PubMed Central
        • Public Library of Science
        • Open Access declarations: Budapest/Bethesda/Berlin
        • Software open-source movement
        • Mass cancellation of titles from traditional publishers
    9. BioMed Central’s business model for open access publishing
        • Keep costs down via
            – Online submission and peer review
            – Automated tools to streamline article processing, conversion and layout
        • Processing charge (currently $525) for accepted articles
        • No processing charge for authors at member institutions
    10. Institutional membership
        More than 400 institutions are members of BioMed Central, including, to name just a few:
        • CalTech
        • Cancer Research UK
        • Columbia University
        • Cornell University
        • University of California
        • Dana-Farber Cancer Institute
        • Harvard University
        • INSERM
        • Imperial College
        • Institut Pasteur
        • John Innes Centre
        • Johns Hopkins University
        • Kyoto University
        • Max Planck Institutes
        • Memorial Sloan-Kettering Cancer Center
        • MRC Laboratory of Molecular Biology
        • National Institutes of Health
        • National Institute for Medical Research
        • NHS England
        • Princeton University
        • Rockefeller University
        • TIGR
        • TSRI
        • Tufts University
        • Wellcome Trust Sanger Institute
        • University of Wisconsin
        • World Health Organization
        • Yale University
    11. Summary
        • What is Open Access publishing?
        • Open Access publishing and text mining
        • About BMC Bioinformatics
        • The BioCreative supplement
    12. Mining the full text
        • Analysing results of high-throughput experiments means biologists increasingly need text-mining tools
        • PubMed is currently the primary resource for text mining (“it’s what’s available”) but:
            – Abstracts omit critical information
            – Techniques developed for abstracts may not effectively use extra information in full text
        • Fully Open Access corpora, in standard XML formats, will help
    13. Data mining - BioMed Central
        • Entire corpus of full text XML downloadable by ftp as a single zip file
        • Various groups working with the data
            – E.g. Pre-BIND (automatic extraction of possible protein-protein interaction information from full text)
        • No restrictions on redistribution
        • This means other groups can use same corpus to repeat and build on results
        http://www.biomedcentral.com/info/about/datamining
    14. Data mining - BioMed Central (screen shot)
    15. Data mining - PubMed Central
        • Standard NLM archiving/interchange XML DTD: common format across multiple publishers
        • Only a subset of PubMed Central participating publishers allow download of full text XML
            – BioMed Central
            – Public Library of Science
        • Hopefully, more will follow…
        • XML made available via OAI interface
        http://www.pubmedcentral.com/about/oai.html
    16. Data mining - PubMed Central
    17. Adding structure to full text data
        Some examples of useful structure:
        1. Structure of article itself (figure legends, materials and methods, references etc)
        2. MathML, CML etc
        3. Disambiguated references to genes/proteins…
    18. Authoring tools are key
        • Manuscript structure: EndNote, TeX/BibTeX pretty good already
        • MathML: Publicon, TeX etc.
        • CML: Chemsketch etc.
        • Gene/protein reference markup?
            – Semi-automatic markup during authoring
            – Author reviews and confirms markup
            – System prompts author to clarify ambiguity
            – cf. grammar checker, code intelligence
    19. Summary
        • What is Open Access publishing?
        • Open Access publishing and text mining
        • BMC Bioinformatics
        • The BioCreative supplement
    20. BMC series of online journals
        • BMC Biochemistry
        • BMC Bioinformatics
        • BMC Biotechnology
        • BMC Cell Biology
        • BMC Chemical Biology
        • BMC Developmental Biology
        • BMC Ecology
        • BMC Evolutionary Biology
        • BMC Genetics
        • BMC Genomics
        • BMC Immunology
        • BMC Microbiology
        • BMC Molecular Biology
        • BMC Neuroscience
        • BMC Pharmacology
        • BMC Physiology
        • BMC Plant Biology
        • BMC Structural Biology
        • BMC Anesthesiology
        • BMC Blood Disorders
        • BMC Cancer
        • BMC Cardiovascular Disorders
        • BMC Clinical Pathology
        • BMC Clinical Pharmacology
        • BMC Complementary and Alternative Medicine
        • BMC Dermatology
        • BMC Ear, Nose and Throat Disorders
        • BMC Emergency Medicine
        • BMC Endocrine Disorders
        • BMC Family Practice
        • BMC Gastroenterology
        • BMC Geriatrics
        • BMC Health Services Research
        • BMC Infectious Diseases
        • BMC International Health and Human Rights
        • BMC Medical Education
        • BMC Medical Ethics
        • BMC Medical Genetics
        • BMC Medical Imaging
        • BMC Medical Informatics and Decision Making
        • BMC Medical Research Methodology
        • BMC Musculoskeletal Disorders
        • BMC Nephrology
        • BMC Neurology
        • BMC Nuclear Medicine
        • BMC Nursing
        • BMC Ophthalmology
        • BMC Oral Health
        • BMC Palliative Care
        • BMC Pediatrics
        • BMC Pregnancy and Childbirth
        • BMC Psychiatry
        • BMC Public Health
        • BMC Pulmonary Medicine
        • BMC Surgery
        • BMC Urology
        • BMC Women's Health
    21. BMC Bioinformatics
    22. RSS feeds
    23. Open access leads to high visibility
        Indexing/Linking
        • PubMed
        • MEDLINE
        • ISI
        • BIOSIS
        • CAS
        • CrossRef
        • Scirus
        • Open Archive Initiative
        • Citebase
        • Google
        Archiving
        • PubMed Central
        • INIST
        • LOCKSS
        • Max Planck
        • OhioLINK
    24. BMC Bioinformatics - citation impact
        [Chart: BMC Bioinformatics, number of articles published and times cited (ISI), for 2001, 2002 and 2003 (projected); axis 0 to 400]
    25. Summary
        • What is Open Access publishing?
        • Open Access publishing and text mining
        • About BMC Bioinformatics
        • The BioCreative supplement
    26. Process for publishing in BMC Bioinformatics supplement
        • Follow BMC Bioinformatics ‘Research Article’ instructions for authors
        • Send articles to BioCreative organizers who will coordinate peer review [do not submit articles online]
        • Supplement passed on to BioMed Central for XML markup and publication
        • $400 processing charge/article
    27. Instructions for authors
    28. Access to supplement
        • All articles in supplement covered by BioMed Central’s Open Access licence agreement
            – Free access
            – Free re-distribution/re-use
        • Supplement indexed in PubMed and permanently archived in PubMed Central
    29. That’s it
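
Slide 13 describes the BioMed Central corpus as full text XML distributed in a single zip file. As a rough illustration of what working with such an archive might look like, here is a minimal Python sketch that walks the XML members of a zip file and flattens each article to plain text. The archive contents, file names, and element names (`art`, `ti`, `bdy`, `p`) are made up for the demo and are not taken from the actual corpus or its DTD.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def iter_articles(zip_source):
    """Yield (member_name, root_element) for every .xml member of a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue  # skip READMEs, images and other non-article files
            with archive.open(name) as handle:
                yield name, ET.parse(handle).getroot()


def article_text(root):
    """Join all text nodes of an article into one whitespace-normalised string."""
    return " ".join(" ".join(root.itertext()).split())


def build_demo_zip():
    """Build a tiny in-memory stand-in for a downloaded corpus (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "a1.xml",
            "<art><ti>Demo one</ti><bdy><p>Protein A binds protein B.</p></bdy></art>",
        )
        archive.writestr(
            "a2.xml",
            "<art><ti>Demo two</ti><bdy><p>No interaction reported.</p></bdy></art>",
        )
        archive.writestr("README.txt", "not an article")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, root in iter_articles(build_demo_zip()):
        print(name, "->", article_text(root))
```

To try this on a real download, pass the path of the zip file to `iter_articles` in place of the in-memory demo archive; the element names in the real files will differ from the ones assumed here.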
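
Slide 15 mentions that PubMed Central makes XML available via an OAI interface. The sketch below shows, under stated assumptions, how a client for the OAI-PMH protocol might build a `ListRecords` request and read record identifiers and the paging token from a response. The endpoint `https://example.org/oai` and the sample response are placeholders, not PubMed Central's actual address or output; `oai_dc` is used only because OAI-PMH requires every repository to support it, and the prefix needed for full text may be different. No network call is made.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Placeholder endpoint (assumption): substitute the repository's real base URL.
BASE_URL = "https://example.org/oai"

# Hand-written sample response used only for this demo.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords URL; a resumption token replaces the other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (identifiers, resumption_token_or_None) from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    identifiers = [
        element.text
        for element in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_element = root.find(".//oai:resumptionToken", OAI_NS)
    has_token = token_element is not None and token_element.text
    return identifiers, (token_element.text.strip() if has_token else None)


if __name__ == "__main__":
    print(list_records_url(BASE_URL, metadata_prefix="oai_dc"))
    identifiers, token = parse_list_records(SAMPLE_RESPONSE)
    print(identifiers, token)
    if token:
        print(list_records_url(BASE_URL, resumption_token=token))
```

A real harvester would fetch each URL, parse the response, and repeat with the returned resumption token until none is given.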
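
Slide 18 proposes semi-automatic gene/protein markup during authoring, where the system prompts the author to clarify ambiguity. One way this could work, sketched here purely as an illustration, is a dictionary lookup that separates unambiguous mentions (ready for the author to confirm) from ambiguous ones (needing a prompt). The lexicon, the identifiers in it, and the function name `suggest_markup` are all hypothetical; the talk does not describe an implementation.

```python
import re

# Hypothetical lexicon (assumption): surface form -> candidate identifiers.
# The identifiers are invented placeholders, not real database accessions.
LEXICON = {
    "p53": ["GENE:0001"],
    "CAT": ["GENE:0002", "GENE:0003"],
}


def suggest_markup(text, lexicon):
    """Split lexicon hits into unambiguous suggestions and ambiguous prompts.

    Each hit is a tuple (surface_form, start_offset, candidate_ids).
    Lookup is case-sensitive, so "cat" in running prose does not match "CAT".
    """
    confirmed, ambiguous = [], []
    for match in re.finditer(r"\b[\w-]+\b", text):
        candidates = lexicon.get(match.group())
        if not candidates:
            continue
        hit = (match.group(), match.start(), candidates)
        if len(candidates) == 1:
            confirmed.append(hit)   # author only needs to confirm
        else:
            ambiguous.append(hit)   # author must choose among candidates
    return confirmed, ambiguous


if __name__ == "__main__":
    sentence = "We found that p53 regulates CAT expression."
    confirmed, ambiguous = suggest_markup(sentence, LEXICON)
    for name, offset, ids in confirmed:
        print(f"Suggest tagging '{name}' at {offset} as {ids[0]}")
    for name, offset, ids in ambiguous:
        print(f"Please clarify '{name}' at {offset}: one of {', '.join(ids)}?")
```

This mirrors the grammar-checker analogy on the slide: the tool proposes, and the author stays in control of what is finally marked up.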