SlideShare a Scribd company logo
1 of 14
Download to read offline
ContentMine:
extracting millions of facts from scientific literature
@jenny_molloy EMBL-EBI – 17 June 2015
What is content?
What is mining?
1982
“Automatically generating logical representations of
text passages... by means of an analysis of the
coherence structure of the passages.”
Jerry R. Hobbs, Donald E. Walker, and Robert A. Amsler. 1982. Natural language access to structured text. In Proceedings of the 9th
conference on Computational linguistics - Volume 1(COLING '82), Ján Horecký (Ed.), Vol. 1. Academia Praha, , Czechoslovakia, 127-132.
DOI=10.3115/991813.991833 http://dx.doi.org/10.3115/991813.991833
2008
“The use of automated methods for exploiting
the enormous amount of knowledge available in
the biomedical literature.”
Cohen, K. Bretonnel; Hunter, Lawrence (2008). "Getting Started in Text Mining". PLoS Computational
Biology 4 (1): e20. doi:10.1371/journal.pcbi.0040020. PMC 2217579.PMID 18225946.
Mining Examples
Building bacterial supertrees
Mining chemical reactions
Better genome annotation
Only ~4% phylogenetic analyses
make underlying data available.
Supertrees
Content Mining enables AUTOMATED
extraction from daily literature and
conversion to NeXML:
- Machine-readable
- Open
- Reuseable
RAW data would be optimal!
PLUTo: Ross Mounce & Peter Murray-Rust
Chemistry
AMI reads and recognises chemicals
structures.
Can even create reaction animation.
Natural language processing
can be used to analyse
chemical methods. These are
FACTS but the paper itself may
be copyrighted.
Clinical Trials
Clinical trials offer clear use cases
for content mining.
Data extraction from graphs could be very
useful for meta-analyses where raw data is
unavailable.
Annotation
Many applications:
- Find primers
- Enhance positive controls
- Find novel sequence information
- More detailed and accurate annotation
Potential to improve
quality and efficiency
of genomic research.
Legal Considerations
Copyright
Database
rights
Contract
Law
2011
2014
From 2014
UK Law
Workshops, hackdays, presentations, collaborations,
discussions with librarians and publishers.
Putting new rights into action.
In Europe
2013
Shortly after
20132015
Research commisioned through H2020...any EU Directive >5 years away.
Ireland already considering following UK - plus other member states?.
Thank you very much
for your attention!
Any questions?
Peter Murray-Rust
Ross Mounce
Richard Smith-Unna
Steph Unna
Jenny Molloy
Mark MacGillivray
Graham Steel
Stefan Kasberger
Christopher Kittel
With thanks to:
Charles Oppenheim
Michelle Brook
Follow
@TheContentMine
contentmine.org
Find the code on
github.com/ContentMine
Funded by:
All images are licensed under CC-BY unless otherwise stated
What is Content?
Phylogenetic Tree from Figure 1 in Evolution and Taxonomic Classification of Human Papillomavirus 16 (HPV16)-Related Variant Genomes: HPV31,
HPV33, HPV35, HPV52, HPV58 and HPV67. Chen Z, Schiffman M, Herrero R, DeSalle R, Anastos K, et al. (2011) Evolution and Taxonomic
Classification of Human Papillomavirus 16 (HPV16)-Related Variant Genomes: HPV31, HPV33, HPV35, HPV52, HPV58 and HPV67. PLoS ONE 6(5):
e20183. doi: 10.1371/journal.pone.0020183
Graph from He F, Fromion V, Westerhoff HV. (Im)Perfect robustness and adaptation of metabolic networks subject to metabolic and gene-expression
regulation: marrying control engineering with metabolic control analysis. BMC Syst Biol. 2013;7 131. doi:10.1186/1752-0509-7-131. PubMed PMID:
24261908; PubMed Central PMCID: PMC4222491.
Table from Table 1 Young GR, Mavrommatis B, Kassiotis G. Microarray analysis reveals global modulation of endogenous retroelement transcription by
microbes. Retrovirology. 2014;11 59. doi:10.1186/1742-4690-11-59. PubMed PMID: 25063042; PubMed Central PMCID: PMC4222864.
Text from Laidlaw CT, Condon JM, Belk MC. Viability Costs of Reproduction and Behavioral Compensation in Western Mosquitofish (Gambusia affinis).
PLoS One. 2014;9(11) e110524. doi:10.1371/journal.pone.0110524. PubMed PMID: 25365426; PubMed Central PMCID: PMC4217728.
Cell microscopy image from Pettinato G, Vanden Berg-Foels WS, Zhang N, Wen X. ROCK Inhibitor Is Not Required for Embryoid Body Formation from
Singularized Human Embryonic Stem Cells. PLoS One. 2014;9(11) e100742. doi:10.1371/journal.pone.0100742. PubMed PMID: 25365581; PubMed
Central PMCID: PMC4217711.
Supertrees:
Lang JM, Darling AE, Eisen JA. Phylogeny of bacterial and archaeal genomes using conserved genes: supertrees and supermatrices. PLoS One.
2013;8(4) e62510. doi:10.1371/journal.pone.0062510. PubMed PMID: 23638103; PubMed Central PMCID: PMC3636077.
McDowell A, Nagy I, Magyari M, Barnard E, Patrick S. The opportunistic pathogen Propionibacterium acnes: insights into typing, human disease, clonal
diversification and CAMP factor evolution. PLoS One. 2013;8(9) e70897. doi:10.1371/journal.pone.0070897. PubMed PMID: 24058439; PubMed Central
PMCID: PMC3772855.
Chemistry:
Diagram from Klejnstrup ML, Frandsen RJ, Holm DK, Nielsen MT, Mortensen UH, Larsen TO, Nielsen JB. Genetics of Polyketide Metabolism in
Aspergillus nidulans. Metabolites. 2012;2(1) 100-133. doi:10.3390/metabo2010100. PubMed PMID: 24957370; PubMed Central PMCID: PMC3901194.
Methods text from Greshock, T. J., Grubbs, A. W., Jiao, P., Wicklow, D. T., Gloer, J. B., & Williams, R. M. (2008). Isolation, Structure Elucidation, and
Biomimetic Total Synthesis of Versicolamide B, and the Isolation of Antipodal (−)‐Stephacidin A and (+)‐Notoamide B from Aspergillus versicolor NRRL
35600. Angewandte Chemie m frokInternational Edition, 47(19), 3573-3577.
Annotation:
Stubben, C. J., & Challacombe, J. F. (2014). Mining locus tags in PubMed Central to improve microbial gene annotation. BMC bioinformatics, 15(1), 43.
Figure from Haeussler, M., Gerner, M., & Bergman, C. M. (2011). Annotating genes and genomes with DNA sequences extracted from biomedical
articles. Bioinformatics, 27(7), 980-986.

More Related Content

What's hot

Anticancer ruthenium(ii) complexes - Anjali Devi J S
Anticancer ruthenium(ii) complexes - Anjali Devi J SAnticancer ruthenium(ii) complexes - Anjali Devi J S
Anticancer ruthenium(ii) complexes - Anjali Devi J SAnjali Devi J S
 
Bacillus anthracis-NEB2011
Bacillus anthracis-NEB2011Bacillus anthracis-NEB2011
Bacillus anthracis-NEB2011NEB-2011
 
HVP5: Meeting summary and thoughts - Garry Cutting
HVP5: Meeting summary and thoughts - Garry CuttingHVP5: Meeting summary and thoughts - Garry Cutting
HVP5: Meeting summary and thoughts - Garry CuttingHuman Variome Project
 
Jonathan Eisen talk for 2019 ADVANCE Scholar Award Symposium
Jonathan Eisen talk for 2019 ADVANCE Scholar Award SymposiumJonathan Eisen talk for 2019 ADVANCE Scholar Award Symposium
Jonathan Eisen talk for 2019 ADVANCE Scholar Award SymposiumJonathan Eisen
 
Genome Wide SNP Analysis for Inferring the Population Structure and Genetic H...
Genome Wide SNP Analysis for Inferring the Population Structure and Genetic H...Genome Wide SNP Analysis for Inferring the Population Structure and Genetic H...
Genome Wide SNP Analysis for Inferring the Population Structure and Genetic H...Hong ChangBum
 
Marine Host-Microbiome Interactions: Challenges and Opportunities
Marine Host-Microbiome Interactions: Challenges and OpportunitiesMarine Host-Microbiome Interactions: Challenges and Opportunities
Marine Host-Microbiome Interactions: Challenges and OpportunitiesJonathan Eisen
 
Nanoweapons: Nanotechnology Weapons Of Genocide
Nanoweapons: Nanotechnology Weapons Of GenocideNanoweapons: Nanotechnology Weapons Of Genocide
Nanoweapons: Nanotechnology Weapons Of Genocidebrightrainbow1172
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Bioanth.5hominidconc
Bioanth.5hominidconcBioanth.5hominidconc
Bioanth.5hominidconcjmckendricks
 
Wagner College Forum for Undergraduate Research, Vol. 15 No. 2
Wagner College Forum for Undergraduate Research, Vol. 15 No. 2Wagner College Forum for Undergraduate Research, Vol. 15 No. 2
Wagner College Forum for Undergraduate Research, Vol. 15 No. 2Wagner College
 
CHEM 2P20 SciFinder Exercise November 2015
CHEM 2P20 SciFinder Exercise November 2015CHEM 2P20 SciFinder Exercise November 2015
CHEM 2P20 SciFinder Exercise November 2015Brock University
 
A Field Guide to Sars-CoV-2
A Field Guide to Sars-CoV-2A Field Guide to Sars-CoV-2
A Field Guide to Sars-CoV-2Jonathan Eisen
 
Lesson 5.1 Activity: The Tree of Life
Lesson 5.1 Activity: The Tree of LifeLesson 5.1 Activity: The Tree of Life
Lesson 5.1 Activity: The Tree of LifeBig History Project
 

What's hot (20)

Databases Part II
Databases Part IIDatabases Part II
Databases Part II
 
Anticancer ruthenium(ii) complexes - Anjali Devi J S
Anticancer ruthenium(ii) complexes - Anjali Devi J SAnticancer ruthenium(ii) complexes - Anjali Devi J S
Anticancer ruthenium(ii) complexes - Anjali Devi J S
 
CV_Todd Lorenz
CV_Todd LorenzCV_Todd Lorenz
CV_Todd Lorenz
 
Gene
GeneGene
Gene
 
Bacillus anthracis-NEB2011
Bacillus anthracis-NEB2011Bacillus anthracis-NEB2011
Bacillus anthracis-NEB2011
 
HVP5: Meeting summary and thoughts - Garry Cutting
HVP5: Meeting summary and thoughts - Garry CuttingHVP5: Meeting summary and thoughts - Garry Cutting
HVP5: Meeting summary and thoughts - Garry Cutting
 
Kowalewski
KowalewskiKowalewski
Kowalewski
 
Human genetic diversity. ESHG Barcelona
Human genetic diversity. ESHG BarcelonaHuman genetic diversity. ESHG Barcelona
Human genetic diversity. ESHG Barcelona
 
Jonathan Eisen talk for 2019 ADVANCE Scholar Award Symposium
Jonathan Eisen talk for 2019 ADVANCE Scholar Award SymposiumJonathan Eisen talk for 2019 ADVANCE Scholar Award Symposium
Jonathan Eisen talk for 2019 ADVANCE Scholar Award Symposium
 
Genome Wide SNP Analysis for Inferring the Population Structure and Genetic H...
Genome Wide SNP Analysis for Inferring the Population Structure and Genetic H...Genome Wide SNP Analysis for Inferring the Population Structure and Genetic H...
Genome Wide SNP Analysis for Inferring the Population Structure and Genetic H...
 
Marine Host-Microbiome Interactions: Challenges and Opportunities
Marine Host-Microbiome Interactions: Challenges and OpportunitiesMarine Host-Microbiome Interactions: Challenges and Opportunities
Marine Host-Microbiome Interactions: Challenges and Opportunities
 
1
11
1
 
Nanoweapons: Nanotechnology Weapons Of Genocide
Nanoweapons: Nanotechnology Weapons Of GenocideNanoweapons: Nanotechnology Weapons Of Genocide
Nanoweapons: Nanotechnology Weapons Of Genocide
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
leung-summary
leung-summaryleung-summary
leung-summary
 
Bioanth.5hominidconc
Bioanth.5hominidconcBioanth.5hominidconc
Bioanth.5hominidconc
 
Wagner College Forum for Undergraduate Research, Vol. 15 No. 2
Wagner College Forum for Undergraduate Research, Vol. 15 No. 2Wagner College Forum for Undergraduate Research, Vol. 15 No. 2
Wagner College Forum for Undergraduate Research, Vol. 15 No. 2
 
CHEM 2P20 SciFinder Exercise November 2015
CHEM 2P20 SciFinder Exercise November 2015CHEM 2P20 SciFinder Exercise November 2015
CHEM 2P20 SciFinder Exercise November 2015
 
A Field Guide to Sars-CoV-2
A Field Guide to Sars-CoV-2A Field Guide to Sars-CoV-2
A Field Guide to Sars-CoV-2
 
Lesson 5.1 Activity: The Tree of Life
Lesson 5.1 Activity: The Tree of LifeLesson 5.1 Activity: The Tree of Life
Lesson 5.1 Activity: The Tree of Life
 

Viewers also liked

Architecture of ContentMine Components contentmine.org
Architecture of ContentMine Components contentmine.orgArchitecture of ContentMine Components contentmine.org
Architecture of ContentMine Components contentmine.orgpetermurrayrust
 
Can machines understand the scientific literature
Can machines understand the scientific literatureCan machines understand the scientific literature
Can machines understand the scientific literaturepetermurrayrust
 
TheContentMine: Mining for Everyone
TheContentMine: Mining for EveryoneTheContentMine: Mining for Everyone
TheContentMine: Mining for EveryoneTheContentMine
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in NeuroscienceTheContentMine
 
Mining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistryMining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistrypetermurrayrust
 
Asking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolismAsking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolismpetermurrayrust
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literaturepetermurrayrust
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectivepetermurrayrust
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literatureHigh throughput mining of the scholarly literature
High throughput mining of the scholarly literaturepetermurrayrust
 
Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literatureAutomatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literaturepetermurrayrust
 
Open software and knowledge for MIOSS
Open software and knowledge for MIOSSOpen software and knowledge for MIOSS
Open software and knowledge for MIOSSpetermurrayrust
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trustpetermurrayrust
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHpetermurrayrust
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKpetermurrayrust
 
Mining Scientific Images
Mining Scientific ImagesMining Scientific Images
Mining Scientific Imagespetermurrayrust
 
Asking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolismAsking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolismpetermurrayrust
 
Content Mining of Science in Cambridge
Content Mining of Science in CambridgeContent Mining of Science in Cambridge
Content Mining of Science in CambridgeTheContentMine
 
Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)TheContentMine
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literature High throughput mining of the scholarly literature
High throughput mining of the scholarly literature TheContentMine
 
Mining Scientific Diagrams for facts
Mining Scientific Diagrams for factsMining Scientific Diagrams for facts
Mining Scientific Diagrams for factspetermurrayrust
 

Viewers also liked (20)

Architecture of ContentMine Components contentmine.org
Architecture of ContentMine Components contentmine.orgArchitecture of ContentMine Components contentmine.org
Architecture of ContentMine Components contentmine.org
 
Can machines understand the scientific literature
Can machines understand the scientific literatureCan machines understand the scientific literature
Can machines understand the scientific literature
 
TheContentMine: Mining for Everyone
TheContentMine: Mining for EveryoneTheContentMine: Mining for Everyone
TheContentMine: Mining for Everyone
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 
Mining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistryMining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistry
 
Asking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolismAsking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolism
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspective
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literatureHigh throughput mining of the scholarly literature
High throughput mining of the scholarly literature
 
Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literatureAutomatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature
 
Open software and knowledge for MIOSS
Open software and knowledge for MIOSSOpen software and knowledge for MIOSS
Open software and knowledge for MIOSS
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trust
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIH
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UK
 
Mining Scientific Images
Mining Scientific ImagesMining Scientific Images
Mining Scientific Images
 
Asking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolismAsking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolism
 
Content Mining of Science in Cambridge
Content Mining of Science in CambridgeContent Mining of Science in Cambridge
Content Mining of Science in Cambridge
 
Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literature High throughput mining of the scholarly literature
High throughput mining of the scholarly literature
 
Mining Scientific Diagrams for facts
Mining Scientific Diagrams for factsMining Scientific Diagrams for facts
Mining Scientific Diagrams for facts
 

Similar to ContentMine (EMBL-EBI Industry Programme)

ContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data SeminarContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data SeminarJenny Molloy
 
The genomes of four tapeworm species reveal adaptations to parasitism
The genomes of four tapeworm species reveal adaptations to parasitismThe genomes of four tapeworm species reveal adaptations to parasitism
The genomes of four tapeworm species reveal adaptations to parasitismJoão Soares
 
Bishop reproducibility references nov2016
Bishop reproducibility references nov2016Bishop reproducibility references nov2016
Bishop reproducibility references nov2016Dorothy Bishop
 
CV.Ximiao_He
CV.Ximiao_HeCV.Ximiao_He
CV.Ximiao_HeXimiao He
 
References on Reproducibility Crisis in Science by D.V.M. Bishop
References on Reproducibility Crisis in Science by D.V.M. BishopReferences on Reproducibility Crisis in Science by D.V.M. Bishop
References on Reproducibility Crisis in Science by D.V.M. BishopDorothy Bishop
 
Resume-CV - April 2016
Resume-CV - April 2016Resume-CV - April 2016
Resume-CV - April 2016Avanti Gokhale
 
s4021907_phd_finalthesis
s4021907_phd_finalthesiss4021907_phd_finalthesis
s4021907_phd_finalthesisPaul Berkman
 
PAPER 3.1 ~ HUMAN GENOME PROJECT
PAPER 3.1 ~  HUMAN GENOME PROJECTPAPER 3.1 ~  HUMAN GENOME PROJECT
PAPER 3.1 ~ HUMAN GENOME PROJECTNusrat Gulbarga
 
Referencias bibliograficas en formato apa y vancouver de elena rodado
Referencias bibliograficas en formato apa y vancouver de elena rodadoReferencias bibliograficas en formato apa y vancouver de elena rodado
Referencias bibliograficas en formato apa y vancouver de elena rodadoelenard6
 
A natural history of the human mind
A natural history of the human mindA natural history of the human mind
A natural history of the human mindFrancys Subiaul
 
Personalized Oral Medicine
Personalized Oral MedicinePersonalized Oral Medicine
Personalized Oral MedicineHarold Slavkin
 

Similar to ContentMine (EMBL-EBI Industry Programme) (20)

ContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data SeminarContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data Seminar
 
Tsoi cv umms_8_6_15
Tsoi cv umms_8_6_15Tsoi cv umms_8_6_15
Tsoi cv umms_8_6_15
 
The genomes of four tapeworm species reveal adaptations to parasitism
The genomes of four tapeworm species reveal adaptations to parasitismThe genomes of four tapeworm species reveal adaptations to parasitism
The genomes of four tapeworm species reveal adaptations to parasitism
 
My presentation2
My presentation2My presentation2
My presentation2
 
ncomms10165
ncomms10165ncomms10165
ncomms10165
 
Bishop reproducibility references nov2016
Bishop reproducibility references nov2016Bishop reproducibility references nov2016
Bishop reproducibility references nov2016
 
BioPosterPP
BioPosterPPBioPosterPP
BioPosterPP
 
Genetic engineering
Genetic engineeringGenetic engineering
Genetic engineering
 
Bio
BioBio
Bio
 
CV.Ximiao_He
CV.Ximiao_HeCV.Ximiao_He
CV.Ximiao_He
 
References on Reproducibility Crisis in Science by D.V.M. Bishop
References on Reproducibility Crisis in Science by D.V.M. BishopReferences on Reproducibility Crisis in Science by D.V.M. Bishop
References on Reproducibility Crisis in Science by D.V.M. Bishop
 
Human Genome Project
Human Genome ProjectHuman Genome Project
Human Genome Project
 
Resume-CV - April 2016
Resume-CV - April 2016Resume-CV - April 2016
Resume-CV - April 2016
 
Sarah Fox-Greer Resume
Sarah Fox-Greer ResumeSarah Fox-Greer Resume
Sarah Fox-Greer Resume
 
s4021907_phd_finalthesis
s4021907_phd_finalthesiss4021907_phd_finalthesis
s4021907_phd_finalthesis
 
PAPER 3.1 ~ HUMAN GENOME PROJECT
PAPER 3.1 ~  HUMAN GENOME PROJECTPAPER 3.1 ~  HUMAN GENOME PROJECT
PAPER 3.1 ~ HUMAN GENOME PROJECT
 
Human genome project and elsi
Human genome project and elsiHuman genome project and elsi
Human genome project and elsi
 
Referencias bibliograficas en formato apa y vancouver de elena rodado
Referencias bibliograficas en formato apa y vancouver de elena rodadoReferencias bibliograficas en formato apa y vancouver de elena rodado
Referencias bibliograficas en formato apa y vancouver de elena rodado
 
A natural history of the human mind
A natural history of the human mindA natural history of the human mind
A natural history of the human mind
 
Personalized Oral Medicine
Personalized Oral MedicinePersonalized Oral Medicine
Personalized Oral Medicine
 

More from Jenny Molloy

Engineering Life with Synthetic Biology
Engineering Life with Synthetic BiologyEngineering Life with Synthetic Biology
Engineering Life with Synthetic BiologyJenny Molloy
 
YEAR Conference 2015 - How to share our research data
YEAR Conference 2015 - How to share our research dataYEAR Conference 2015 - How to share our research data
YEAR Conference 2015 - How to share our research dataJenny Molloy
 
Legal Framework for TDM
Legal Framework for TDMLegal Framework for TDM
Legal Framework for TDMJenny Molloy
 
Introducing Open Science
Introducing Open ScienceIntroducing Open Science
Introducing Open ScienceJenny Molloy
 
SciDataCon 2014 TDM Workshop Intro Slides
SciDataCon 2014 TDM Workshop Intro SlidesSciDataCon 2014 TDM Workshop Intro Slides
SciDataCon 2014 TDM Workshop Intro SlidesJenny Molloy
 

More from Jenny Molloy (6)

Engineering Life with Synthetic Biology
Engineering Life with Synthetic BiologyEngineering Life with Synthetic Biology
Engineering Life with Synthetic Biology
 
YEAR Conference 2015 - How to share our research data
YEAR Conference 2015 - How to share our research dataYEAR Conference 2015 - How to share our research data
YEAR Conference 2015 - How to share our research data
 
Legal Framework for TDM
Legal Framework for TDMLegal Framework for TDM
Legal Framework for TDM
 
Introducing Open Science
Introducing Open ScienceIntroducing Open Science
Introducing Open Science
 
SciDataCon 2014 TDM Workshop Intro Slides
SciDataCon 2014 TDM Workshop Intro SlidesSciDataCon 2014 TDM Workshop Intro Slides
SciDataCon 2014 TDM Workshop Intro Slides
 
Id2 presentation
Id2 presentationId2 presentation
Id2 presentation
 

Recently uploaded

Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Temporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of MasticationTemporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of Masticationvidulajaib
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Heredity: Inheritance and Variation of Traits
Heredity: Inheritance and Variation of TraitsHeredity: Inheritance and Variation of Traits
Heredity: Inheritance and Variation of TraitsCharlene Llagas
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |aasikanpl
 
Welcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayWelcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayZachary Labe
 
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
insect anatomy and insect body wall and their physiology
insect anatomy and insect body wall and their  physiologyinsect anatomy and insect body wall and their  physiology
insect anatomy and insect body wall and their physiologyDrAnita Sharma
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett SquareIsiahStephanRadaza
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxEran Akiva Sinbar
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 

Recently uploaded (20)

Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Temporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of MasticationTemporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of Mastication
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Heredity: Inheritance and Variation of Traits
Heredity: Inheritance and Variation of TraitsHeredity: Inheritance and Variation of Traits
Heredity: Inheritance and Variation of Traits
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
 
Welcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayWelcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work Day
 
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
insect anatomy and insect body wall and their physiology
insect anatomy and insect body wall and their  physiologyinsect anatomy and insect body wall and their  physiology
insect anatomy and insect body wall and their physiology
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett Square
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 

ContentMine (EMBL-EBI Industry Programme)

  • 1. ContentMine: extracting millions of facts from scientific literature @jenny_molloy EMBL-EBI – 17 June 2015
  • 3. What is mining? 1982 “Automatically generating logical representations of text passages... by means of an analysis of the coherence structure of the passages.” Jerry R. Hobbs, Donald E. Walker, and Robert A. Amsler. 1982. Natural language access to structured text. In Proceedings of the 9th conference on Computational linguistics - Volume 1(COLING '82), Ján Horecký (Ed.), Vol. 1. Academia Praha, , Czechoslovakia, 127-132. DOI=10.3115/991813.991833 http://dx.doi.org/10.3115/991813.991833 2008 “The use of automated methods for exploiting the enormous amount of knowledge available in the biomedical literature.” Cohen, K. Bretonnel; Hunter, Lawrence (2008). "Getting Started in Text Mining". PLoS Computational Biology 4 (1): e20. doi:10.1371/journal.pcbi.0040020. PMC 2217579.PMID 18225946.
  • 4.
  • 5. Mining Examples Building bacterial supertrees Mining chemical reactions Better genome annotation
  • 6. Only ~4% phylogenetic analyses make underlying data available. Supertrees Content Mining enables AUTOMATED extraction from daily literature and conversion to NeXML: - Machine-readable - Open - Reuseable RAW data would be optimal! PLUTo: Ross Mounce & Peter Murray-Rust
  • 7. Chemistry AMI reads and recognises chemicals structures. Can even create reaction animation. Natural language processing can be used to analyse chemical methods. These are FACTS but the paper itself may be copyrighted.
  • 8. Clinical Trials Clinical trials offer clear use cases for content mining. Data extraction from graphs could be very useful for meta-analyses where raw data is unavailable.
  • 9. Annotation Many applications: - Find primers - Enhance positive controls - Find novel sequence information - More detailed and accurate annotation Potential to improve quality and efficiency of genomic research.
  • 11. 2011 2014 From 2014 UK Law Workshops, hackdays, presentations, collaborations, discussions with librarians and publishers. Putting new rights into action.
  • 12. In Europe 2013 Shortly after 20132015 Research commisioned through H2020...any EU Directive >5 years away. Ireland already considering following UK - plus other member states?.
  • 13. Thank you very much for your attention! Any questions? Peter Murray-Rust Ross Mounce Richard Smith-Unna Steph Unna Jenny Molloy Mark MacGillivray Graham Steel Stefan Kasberger Christopher Kittel With thanks to: Charles Oppenheim Michelle Brook Follow @TheContentMine contentmine.org Find the code on github.com/ContentMine Funded by:
  • 14. All images are licensed under CC-BY unless otherwise stated What is Content? Phylogenetic Tree from Figure 1 in Evolution and Taxonomic Classification of Human Papillomavirus 16 (HPV16)-Related Variant Genomes: HPV31, HPV33, HPV35, HPV52, HPV58 and HPV67. Chen Z, Schiffman M, Herrero R, DeSalle R, Anastos K, et al. (2011) Evolution and Taxonomic Classification of Human Papillomavirus 16 (HPV16)-Related Variant Genomes: HPV31, HPV33, HPV35, HPV52, HPV58 and HPV67. PLoS ONE 6(5): e20183. doi: 10.1371/journal.pone.0020183 Graph from He F, Fromion V, Westerhoff HV. (Im)Perfect robustness and adaptation of metabolic networks subject to metabolic and gene-expression regulation: marrying control engineering with metabolic control analysis. BMC Syst Biol. 2013;7 131. doi:10.1186/1752-0509-7-131. PubMed PMID: 24261908; PubMed Central PMCID: PMC4222491. Table from Table 1 Young GR, Mavrommatis B, Kassiotis G. Microarray analysis reveals global modulation of endogenous retroelement transcription by microbes. Retrovirology. 2014;11 59. doi:10.1186/1742-4690-11-59. PubMed PMID: 25063042; PubMed Central PMCID: PMC4222864. Text from Laidlaw CT, Condon JM, Belk MC. Viability Costs of Reproduction and Behavioral Compensation in Western Mosquitofish (Gambusia affinis). PLoS One. 2014;9(11) e110524. doi:10.1371/journal.pone.0110524. PubMed PMID: 25365426; PubMed Central PMCID: PMC4217728. Cell microscopy image from Pettinato G, Vanden Berg-Foels WS, Zhang N, Wen X. ROCK Inhibitor Is Not Required for Embryoid Body Formation from Singularized Human Embryonic Stem Cells. PLoS One. 2014;9(11) e100742. doi:10.1371/journal.pone.0100742. PubMed PMID: 25365581; PubMed Central PMCID: PMC4217711. Supertrees: Lang JM, Darling AE, Eisen JA. Phylogeny of bacterial and archaeal genomes using conserved genes: supertrees and supermatrices. PLoS One. 2013;8(4) e62510. doi:10.1371/journal.pone.0062510. PubMed PMID: 23638103; PubMed Central PMCID: PMC3636077. McDowell A, Nagy I, Magyari M, Barnard E, Patrick S. The opportunistic pathogen Propionibacterium acnes: insights into typing, human disease, clonal diversification and CAMP factor evolution. PLoS One. 2013;8(9) e70897. doi:10.1371/journal.pone.0070897. PubMed PMID: 24058439; PubMed Central PMCID: PMC3772855. Chemistry: Diagram from Klejnstrup ML, Frandsen RJ, Holm DK, Nielsen MT, Mortensen UH, Larsen TO, Nielsen JB. Genetics of Polyketide Metabolism in Aspergillus nidulans. Metabolites. 2012;2(1) 100-133. doi:10.3390/metabo2010100. PubMed PMID: 24957370; PubMed Central PMCID: PMC3901194. Methods text from Greshock, T. J., Grubbs, A. W., Jiao, P., Wicklow, D. T., Gloer, J. B., & Williams, R. M. (2008). Isolation, Structure Elucidation, and Biomimetic Total Synthesis of Versicolamide B, and the Isolation of Antipodal (−)‐Stephacidin A and (+)‐Notoamide B from Aspergillus versicolor NRRL 35600. Angewandte Chemie m frokInternational Edition, 47(19), 3573-3577. Annotation: Stubben, C. J., & Challacombe, J. F. (2014). Mining locus tags in PubMed Central to improve microbial gene annotation. BMC bioinformatics, 15(1), 43. Figure from Haeussler, M., Gerner, M., & Bergman, C. M. (2011). Annotating genes and genomes with DNA sequences extracted from biomedical articles. Bioinformatics, 27(7), 980-986.