Workshop overview
• Your and our backgrounds and interests, and what we want
• How does mining work and what can it do for YOU/Cochrane?
• Demonstration with emphasis on dictionaries
• What would YOU like a system to do?
• Your dictionaries in action
• Advanced (chemistry, diagram mining)
• ANY early adopter can obtain our (Open) software and run it at
home for any resource (medical, agricultural, government, climate,
etc.). We will help you during the next 24 hours.
• All material CC BY.
Cochrane UK & Ireland
Symposium 2016,
Birmingham, UK, 2016-03-15
Let the Machine Help
with your
Systematic Reviews
Peter Murray-Rust1,2
Christopher Kittel2
[1]University of Cambridge
[2]TheContentMine
Simple, Universal,
Knowledge creation and re-use
The Right to Read is the Right to Mine (Peter Murray-Rust, 2011)
http://contentmine.org
Resources
• Europe PubMed Central http://europepmc.org/
• ContentMine toolkit https://github.com/ContentMine/
• Wikidata: https://www.wikidata.org/wiki/Wikidata:Main_Page
• Hypothes.is https://hypothes.is/ [1]
• Etherpad: http://pads.cottagelabs.com/p/cochrane2016
• Note: early adopters can obtain our (Open) software and
run it at home…
• [1] Not used in the Cochrane Birmingham workshop
Europe PubMed Central
[Figure: CONTENTMINE, a complete OPEN platform for mining scientific literature. getpapers queries catalogues (EPMC, arXiv, CORE, HAL, university repositories); quickscrape runs scrapers on publisher sites (PLoS ONE, BMC, PeerJ… Nature, IEEE, Elsevier…) from URLs and DOIs; a daily crawl of ToC services handles about 30,000 pages/day. Inputs arrive as PDF, HTML, DOC, ePUB, TeX, XML, PNG, EPS, CSV or XLS. norma (normalizer, structurer) turns text, data and figures into Semantic ScholarlyHTML (abstract, methods, references, captioned figures, HTML tables). ami (semantic tagger, with dictionaries and community plugins for Chem, Phylo, Trials, Crystal, Plants) extracts facts for search, lookup, visualization and analysis.]
Dictionaries!
[Figure: MINING with sections and dictionaries. Each paper is split into sections (abstract, methods, references, captioned figures, HTML tables); dictionaries (Dict A, Dict B) are run over the sections, including image captions and table captions. Annotation: W3C Annotation / https://hypothes.is/]
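A minimal sketch of section-wise dictionary matching in Python, assuming a paper has already been split into named sections and that a dictionary uses the <dictionary>/<entry term="..."/> layout shown on the following slides. The function names (load_terms, compile_matcher, count_by_section) and the sample paper are hypothetical; this is not ami's own implementation. Terms are matched case-insensitively as whole words, longest first.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# A cut-down dictionary in the <dictionary>/<entry term="..."/> shape used on the slides.
DICTIONARY_XML = """<dictionary title="cochrane">
  <entry term="Cochrane Library"/>
  <entry term="meta-analysis"/>
  <entry term="MEDLINE"/>
  <entry term="Embase"/>
</dictionary>"""


def load_terms(xml_text):
    """Return the term attribute of every <entry> element."""
    root = ET.fromstring(xml_text)
    return [entry.get("term") for entry in root.iter("entry")]


def compile_matcher(terms, ignore_case=True):
    """One regex matching any term as a whole word; longer terms are tried first."""
    alternatives = [re.escape(t) for t in sorted(terms, key=len, reverse=True)]
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", flags)


def count_by_section(sections, matcher):
    """Count dictionary hits separately for each named section of one paper."""
    return {name: Counter(m.group(0).lower() for m in matcher.finditer(text))
            for name, text in sections.items()}


# Hypothetical paper, already split into sections.
paper = {
    "abstract": "We performed a meta-analysis of trials indexed in MEDLINE and Embase.",
    "methods": "We searched MEDLINE and the Cochrane Library.",
}
hits = count_by_section(paper, compile_matcher(load_terms(DICTIONARY_XML)))
```

Keeping counts per section means a term found only in the references can be weighted differently from one found in the methods. The \b boundaries assume terms start and end with a letter or digit; entries such as "(r)-fenfluramine" would need a different boundary rule.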
Disease Dictionary (ICD-10)
<dictionary title="disease">
<entry term="1p36 deletion syndrome"/>
<entry term="1q21.1 deletion syndrome"/>
<entry term="1q21.1 duplication syndrome"/>
<entry term="3-methylglutaconic aciduria"/>
<entry term="3mc syndrome"/>
<entry term="corpus luteum cyst"/>
<entry term="cortical blindness" />
SELECT DISTINCT ?thingLabel WHERE {
  ?thing wdt:P494 ?wd .
  ?thing wdt:P279 wd:Q12136 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" }
}
wdt:P494 = ICD-10 identifier (P494)
wdt:P279 = subclass of (P279)
wd:Q12136 = disease (Q12136): abnormal condition that
affects the body of an organism
Wikidata ontology for disease
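One way the query above could feed a dictionary, sketched in Python under stated assumptions: the query is sent to the public Wikidata endpoint (https://query.wikidata.org/sparql) asking for SPARQL 1.1 JSON results, and the labels are written out in the <dictionary>/<entry term=.../> format. The helper names are hypothetical, and the network fetch is left out. Lower-casing and sorting mirror the disease excerpt above. Skipping values that look like item IDs reflects the label service's understood fallback to the bare Q-number when no English label exists.

```python
import re
import urllib.parse
import xml.etree.ElementTree as ET

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def query_url(sparql):
    """GET URL asking the Wikidata endpoint for SPARQL JSON results."""
    return WIKIDATA_SPARQL + "?" + urllib.parse.urlencode({"query": sparql, "format": "json"})


def labels_from_results(results, var="thingLabel"):
    """Pull label strings out of a SPARQL 1.1 JSON results document.

    Values that look like bare item IDs (e.g. "Q12136") are skipped, on the
    assumption that they are label-service fallbacks rather than real names.
    """
    labels = []
    for binding in results["results"]["bindings"]:
        value = binding.get(var, {}).get("value")
        if value and not re.fullmatch(r"Q\d+", value):
            labels.append(value)
    return labels


def to_dictionary_xml(title, terms):
    """Serialize terms as <dictionary title="..."><entry term="..."/>...</dictionary>.

    Terms are lower-cased, de-duplicated and sorted, like the slide excerpt.
    """
    root = ET.Element("dictionary", title=title)
    for term in sorted({t.lower() for t in terms}):
        ET.SubElement(root, "entry", term=term)
    return ET.tostring(root, encoding="unicode")


# Hand-written stand-in for a real response (no network access here).
sample = {"head": {"vars": ["thingLabel"]}, "results": {"bindings": [
    {"thingLabel": {"type": "literal", "value": "Cortical blindness"}},
    {"thingLabel": {"type": "literal", "value": "Q123"}},
    {"thingLabel": {"type": "literal", "value": "corpus luteum cyst"}},
]}}
disease_xml = to_dictionary_xml("disease", labels_from_results(sample))
```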
• ChEBI (Chemical Entities of Biological Interest, at EBI):
ftp://ftp.ebi.ac.uk/pub/databases/chebi/Flat_file_tab_delimited/names_3star.tsv.gz
• combined with Wikidata: World Health Organisation International Nonproprietary Name
(P2275)
=> 4947 items in the dictionary (inn.xml)
DRUGS
<dictionary title="inn">
<entry term="(r)-fenfluramine"/>
<entry term="abacavir"/>
<entry term="abafungin"/>
<entry term="abafungina"/>
<entry term="abafungine"/>
<entry term="abafunginum"/>
<entry term="abamectin"/>
<entry term="abarelix"/>
<entry term="abatacept"/>
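The slide does not say exactly how the ChEBI names and the Wikidata INNs were combined. The sketch below only reproduces what the excerpt shows: lower-cased, de-duplicated, alphabetically sorted terms. Two points are assumptions, not taken from the slides. First, the TSV is assumed to have a header row with a NAME column. Second, the optional keep set is a hypothetical way to restrict ChEBI names to those that also carry an INN. The sample rows are invented.

```python
import csv
import gzip
import io


def read_names(tsv_gz_bytes, column="NAME"):
    """Yield the non-empty values of one column from gzipped, tab-separated data."""
    with gzip.open(io.BytesIO(tsv_gz_bytes), mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: chemical names may contain quote characters.
        for row in csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            value = (row.get(column) or "").strip()
            if value:
                yield value


def build_term_list(names, keep=None):
    """Lower-case, de-duplicate and sort names; optionally keep only those in keep."""
    terms = {name.lower() for name in names}
    if keep is not None:
        terms &= {k.lower() for k in keep}
    return sorted(terms)


# Invented sample rows; the real file's header is assumed, not checked.
sample = (
    "ID\tCOMPOUND_ID\tTYPE\tSOURCE\tNAME\n"
    "1\t10\tINN\tX\tAbacavir\n"
    "2\t10\tSYNONYM\tX\tABACAVIR\n"
    "3\t11\tINN\tX\tAbarelix\n"
    "4\t12\tSYNONYM\tX\taspirin\n"
)
names = list(read_names(gzip.compress(sample.encode("utf-8"))))
```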
<dictionary title="funders">
<!-- from http://help.crossref.org/funder-registry with thanks -->
<entry id="http://dx.doi.org/10.13039/100001436"
term="1675 Foundation"/>
<entry id="http://dx.doi.org/10.13039/100004343"
term="3M"/>
<entry id="http://dx.doi.org/10.13039/501100005957"
term="8020 Promotion Foundation"/>
<entry id="http://dx.doi.org/10.13039/501100007139"
term="A Richer Life Foundation"/>
<entry id="http://dx.doi.org/10.13039/100006543"
term="A World Celiac Community Foundation"/>
<entry id="http://dx.doi.org/10.13039/100001962"
term="A-T Children's Project"/>
<entry id="http://dx.doi.org/10.13039/100008456"
term="A. Alfred Taubman Medical Research Institute"/>
11566 entries
Funders Dictionary
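Because each funder entry carries its Funder Registry DOI in the id attribute, a match can be resolved straight to an identifier. The following is a minimal sketch, assuming acknowledgement text has already been extracted; the helper names and sample sentences are hypothetical. Matching here is case-sensitive, and longer names are tried first. Word-character lookarounds stand in for \b so that names containing hyphens or apostrophes still match cleanly.

```python
import re
import xml.etree.ElementTree as ET

# Three entries copied from the funders dictionary above.
FUNDERS_XML = """<dictionary title="funders">
  <entry id="http://dx.doi.org/10.13039/100004343" term="3M"/>
  <entry id="http://dx.doi.org/10.13039/501100007139" term="A Richer Life Foundation"/>
  <entry id="http://dx.doi.org/10.13039/100001962" term="A-T Children's Project"/>
</dictionary>"""


def load_id_map(xml_text):
    """Map each funder name (term) to its identifier (id)."""
    root = ET.fromstring(xml_text)
    return {e.get("term"): e.get("id") for e in root.iter("entry")}


def find_funders(text, id_map):
    """(name, id) pairs for funders mentioned in text, in order of first mention."""
    names = sorted(id_map, key=len, reverse=True)
    pattern = re.compile(r"(?<!\w)(?:" + "|".join(map(re.escape, names)) + r")(?!\w)")
    seen = []
    for m in pattern.finditer(text):
        if m.group(0) not in seen:
            seen.append(m.group(0))
    return [(name, id_map[name]) for name in seen]


funder_ids = load_id_map(FUNDERS_XML)
ack = "This work was funded by the A-T Children's Project and by 3M. 3M also supplied reagents."
found = find_funders(ack, funder_ids)
```

Case sensitivity is a deliberate choice for short names such as "3M", which would otherwise produce noisy matches.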
Dengue Mosquito
<dictionary name="genus">
<entry term="Aa"/>
<entry term="Aaaba"/>
<entry term="Aacanthocnema"/>
<entry term="Aaosphaeria"/>
<entry term="Aaptos"/>
<entry term="Aaptosyax"/>
<entry term="Aaroniella"/>
<entry term="Aaronsohnia"/>
<entry term="Abablemma"/>
Genera from NCBI TaxDump
<dictionary title="hgnc">
<entry term="A1BG" name="alpha-1-B glycoprotein"/>
<entry term="A1BG-AS1" name="A1BG antisense RNA 1"/>
<entry term="A1CF"
name="APOBEC1 complementation factor"/>
<entry term="A2M" name="alpha-2-macroglobulin"/>
<entry term="A2M-AS1"
name="A2M antisense RNA 1 (head to head)"/>
<entry term="A2ML1" name="alpha-2-macroglobulin-like 1"/>
<entry term="A2ML1-AS1" name="A2ML1 antisense RNA 1"/>
Human Genes (HGNC)
<entry term="Aaas"
name="achalasia, adrenocortical insufficiency, alacrimia"/>
<entry term="Aacs" name="acetoacetyl-CoA synthetase"/>
<entry term="Aadac"
name="arylacetamide deacetylase (esterase)"/>
<entry term="Aadacl2"
name="arylacetamide deacetylase-like 2"/>
<entry term="Aadacl3"
name="arylacetamide deacetylase-like 3"/>
<entry term="Aadat" name="aminoadipate aminotransferase"/>
<entry term="Aaed1"
name="AhpC/TSA antioxidant enzyme domain containing 1"/>
<entry term="Aagab"
name="alpha- and gamma-adaptin binding protein"/>
<entry term="Aak1" name="AP2 associated kinase 1"/>
<entry term="Aamdc"
name="adipogenesis associated Mth938 domain containing"/>
<entry term="Aamp"
name="angio-associated migratory protein"/>
Mouse genes (JAXson)
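Gene dictionaries differ from the others because case carries meaning. Human HGNC symbols are written in capitals (A2M), while mouse symbols start with a capital followed by lower case (Aamp). The sketch below keeps them apart by matching case-sensitively and returns the name attribute with each hit. A hyphen also counts as part of a symbol, so "A1BG" is not reported inside "A1BG-AS1". The function name and the "mousegenes" title are hypothetical; the entries are copied from the slides.

```python
import re
import xml.etree.ElementTree as ET

HGNC_XML = """<dictionary title="hgnc">
  <entry term="A2M" name="alpha-2-macroglobulin"/>
  <entry term="A1BG" name="alpha-1-B glycoprotein"/>
</dictionary>"""
MOUSE_XML = """<dictionary title="mousegenes">
  <entry term="Aamp" name="angio-associated migratory protein"/>
  <entry term="Aak1" name="AP2 associated kinase 1"/>
</dictionary>"""


def gene_matcher(xml_text):
    """Return a function that finds (symbol, full name) pairs in a text."""
    root = ET.fromstring(xml_text)
    names = {e.get("term"): e.get("name") for e in root.iter("entry")}
    symbols = sorted(names, key=len, reverse=True)
    # No re.IGNORECASE: "A2M" (human) and "A2m" (mouse) must stay distinct.
    # Hyphens count as part of a symbol, so "A1BG-AS1" does not yield "A1BG".
    pattern = re.compile(r"(?<![\w-])(?:" + "|".join(map(re.escape, symbols)) + r")(?![\w-])")

    def match(text):
        return [(m.group(0), names[m.group(0)]) for m in pattern.finditer(text)]

    return match


human = gene_matcher(HGNC_XML)
mouse = gene_matcher(MOUSE_XML)
text = "Levels of A2M and Aamp rose, but A1BG-AS1 and a2m did not."
```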
Ebola!
<dictionary title="tropicalVirus">
<entry term="ZIKV" name="Zika virus"/>
<entry term="Zika" name="Zika virus"/>
<entry term="DENV" name="Dengue virus"/>
<entry term="Dengue" name="Dengue virus"/>
<entry term="CHIKV" name="Chikungunya virus"/>
<entry term="Chikungunya" name="Chikungunya virus"/>
<entry term="WNV" name="West Nile virus"/>
<entry term="West Nile" name="West Nile virus"/>
<entry term="YFV" name="Yellow fever virus"/>
<entry term="Yellow fever" name="Yellow fever virus"/>
<entry term="HPV" name="Human papilloma virus"/>
<entry term="Human papilloma virus"
name="Human papilloma virus"/>
</dictionary>
Terms co-occurring with “Zika”
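In the tropicalVirus dictionary, several surface forms (ZIKV, Zika) share one name attribute. That allows mentions to be normalised before counting what co-occurs with Zika. The following is a minimal sketch, assuming each document is plain text. The function names and the three sample documents are hypothetical, and matching is case-sensitive so that abbreviations such as ZIKV stay precise.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Excerpt of the tropicalVirus dictionary above.
VIRUS_XML = """<dictionary title="tropicalVirus">
  <entry term="ZIKV" name="Zika virus"/>
  <entry term="Zika" name="Zika virus"/>
  <entry term="DENV" name="Dengue virus"/>
  <entry term="Dengue" name="Dengue virus"/>
  <entry term="CHIKV" name="Chikungunya virus"/>
  <entry term="Chikungunya" name="Chikungunya virus"/>
  <entry term="WNV" name="West Nile virus"/>
  <entry term="West Nile" name="West Nile virus"/>
</dictionary>"""


def load_synonyms(xml_text):
    """Map each surface form (term) to its canonical name (name attribute)."""
    root = ET.fromstring(xml_text)
    return {e.get("term"): e.get("name") for e in root.iter("entry")}


def canonical_mentions(text, synonyms):
    """Set of canonical names mentioned in text (case-sensitive, whole words)."""
    forms = sorted(synonyms, key=len, reverse=True)
    pattern = re.compile(r"(?<!\w)(?:" + "|".join(map(re.escape, forms)) + r")(?!\w)")
    return {synonyms[m.group(0)] for m in pattern.finditer(text)}


def cooccurring_with(target, documents, synonyms):
    """Among documents mentioning target, count how many mention each other name."""
    counts = Counter()
    for text in documents:
        found = canonical_mentions(text, synonyms)
        if target in found:
            counts.update(found - {target})
    return counts


synonyms = load_synonyms(VIRUS_XML)
docs = [
    "ZIKV and DENV co-circulate; Zika cases rose.",
    "Dengue and Chikungunya outbreaks were reported.",
    "Zika virus and West Nile virus were both detected.",
]
zika_partners = cooccurring_with("Zika virus", docs, synonyms)
```

Counting each canonical name at most once per document gives document frequencies, so a paper that repeats "Zika" many times still counts once.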
<dictionary title="cochrane">
<entry term="Cochrane Library"/>
<entry term="Cochrane Reviews"/>
<entry
term="Cochrane Central Register of Controlled Trials"/>
<entry term="Cochrane"/>
<entry term="randomize"/>
<entry term="meta-analysis"/>
<entry term="Embase"/>
<entry term="MEDLINE"/>
<entry term="eligibility"/>
<entry term="exclusion"/>
<entry term="outcome"/>
<entry term="Review Manager"/>
<entry term="STATA"/>
<entry term="RCT"/>
</dictionary>
Terms lexically related to “meta-analysis”
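Entries such as "randomize" and "meta-analysis" appear in text in several spellings: randomised, randomization, meta analysis, meta-analyses. The slides do not say how these variants are handled. The sketch below shows one hypothetical way to widen a term into a variant-tolerant regex. It accepts -ize/-ise with common suffixes, -sis/-ses plurals, and a hyphen, space or nothing between word parts. The rules are illustrative, not exhaustive.

```python
import re


def word_pattern(word):
    """Regex for one word, tolerant of -ize/-ise spellings and -sis/-ses plurals."""
    if word.endswith("ize"):
        # "randomize" -> randomi[sz] followed by e, es, ed, ing, ation or ations
        return re.escape(word[:-2]) + r"[sz](?:e[sd]?|ing|ations?)"
    if word.endswith("sis"):
        # "analysis" -> analys[ie]s, so "analyses" also matches
        return re.escape(word[:-2]) + r"[ie]s"
    return re.escape(word)


def variant_pattern(term):
    """Whole-word regex whose parts may be joined by a hyphen, a space or nothing."""
    words = re.split(r"[-\s]+", term.lower())
    body = r"[-\s]?".join(word_pattern(w) for w in words)
    return re.compile(r"(?<!\w)" + body + r"(?!\w)", re.IGNORECASE)


rand = variant_pattern("randomize")
meta = variant_pattern("meta-analysis")
```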
Mining strategy
• Discover; negotiate permissions => bibliography
• Crawl/scrape (download) documents AND supplemental files
• Normalize: PDF => XML
• Index: facets => facts and snippets (“entities”)
• Interpret/analyze entities => relationships,
aggregations (“Transformative”)
• Publish
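The "Index" step, which turns matches into facts and snippets, can be pictured as keeping each hit together with a little surrounding text. The following is a minimal sketch, assuming any compiled regex as the matcher. The prefix/exact/suffix field names are borrowed from the W3C Web Annotation TextQuoteSelector, which would suit publishing via Hypothes.is. The Snippet class, the snippets function and the width default are hypothetical, not ContentMine's actual output format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    """One hit plus its context, named after the W3C TextQuoteSelector fields."""
    exact: str
    start: int
    prefix: str
    suffix: str


def snippets(text, pattern, width=30):
    """Return a Snippet for every match, with up to width characters of context each side."""
    return [
        Snippet(
            exact=m.group(0),
            start=m.start(),
            prefix=text[max(0, m.start() - width):m.start()],
            suffix=text[m.end():m.end() + width],
        )
        for m in pattern.finditer(text)
    ]


found = snippets("Patients received abacavir daily.", re.compile(r"\babacavir\b"), width=9)
```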
Demo
PMR (Peter Murray-Rust) runs getpapers and ami
Chris (Christopher Kittel) runs a Python visualization of drug co-occurrence
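The demo's own visualization code is not included in the slides. As a sketch of the counting behind such a plot, assuming each document has already been reduced to the drug terms found in it, pairwise co-occurrence can be tallied and then laid out as a symmetric matrix ready for a heatmap. The function names and sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(doc_terms):
    """Count how many documents mention each unordered pair of terms."""
    pairs = Counter()
    for terms in doc_terms:
        # set(): repeated mentions within one document count once
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs


def to_matrix(pairs):
    """Symmetric matrix (list of lists) plus row/column labels, e.g. for a heatmap."""
    labels = sorted({t for pair in pairs for t in pair})
    index = {t: i for i, t in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for (a, b), n in pairs.items():
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = n
    return labels, matrix


docs = [
    ["abacavir", "abatacept"],
    ["abacavir", "abatacept", "abarelix"],
    ["abacavir", "abacavir"],
]
pairs = cooccurrence(docs)
labels, matrix = to_matrix(pairs)
```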
Systematic Reviews
Can we:
• eliminate true negatives automatically?
• extract data from formulaic language?
• mine diagrams?
• annotate existing sources?
• forward-reference clinical trials?
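"Formulaic language" covers phrasing that reviews repeat almost verbatim, such as effect estimates with confidence intervals. The following is a minimal sketch, assuming the common pattern "OR/RR/HR value (95% CI low to high)"; real abstracts vary far more than this one regex allows. The pattern, the function name and the sample sentence are hypothetical.

```python
import re

# Measure, point estimate, then "(95% CI low to/- high)" with optional "=" and ":".
EFFECT = re.compile(
    r"\b(?P<measure>OR|RR|HR)\s*(?:=\s*)?(?P<value>\d+(?:\.\d+)?)\s*"
    r"\(\s*95\s*%\s*CI[:,]?\s*(?P<low>\d+(?:\.\d+)?)\s*(?:to|-|–)\s*(?P<high>\d+(?:\.\d+)?)\s*\)"
)


def extract_effects(text):
    """List of (measure, estimate, lower bound, upper bound) tuples."""
    return [
        (m["measure"], float(m["value"]), float(m["low"]), float(m["high"]))
        for m in EFFECT.finditer(text)
    ]


sentence = ("Mortality was lower (RR 0.82 (95% CI 0.70 to 0.96)); "
            "readmission OR = 1.45 (95% CI: 1.10-1.91).")
effects = extract_effects(sentence)
```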
Polly has 20 seconds to read this paper…
…and 10,000 more
ContentMine software can do this in a few minutes
Polly: “there were 10,000 abstracts and due
to time pressures, we split this between 6
researchers. It took about 2-3 days of work
(working only on this) to get through
~1,600 papers each. So, at a minimum this
equates to 12 days of full-time work (and
would normally be done over several weeks
under normal time pressures).”
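The figures in Polly's account can be checked directly. The per-researcher share and the upper bound are derived here; the slide itself gives only "~1,600" and "12 days". The last line shows, for scale, the reading time implied by 20 seconds per abstract.

```latex
\begin{aligned}
\frac{10\,000\ \text{abstracts}}{6\ \text{researchers}} &\approx 1\,667\ \text{abstracts per researcher}\\
6\ \text{researchers} \times (2\text{--}3)\ \text{days} &= 12\text{--}18\ \text{researcher-days}\\
10\,000 \times 20\ \text{s} &= 200\,000\ \text{s} \approx 55.6\ \text{h}
\end{aligned}
```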
400,000 Clinical Trials
In 10 government registries
Mapping trials => papers
http://www.trialsjournal.com/content/16/1/80
2009 => 2015. What’s
happened in the last 6 years?
Search the whole scientific literature
for “2009-0100068-41”
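Mapping trials to papers amounts to finding registry identifiers in full text and inverting the result. The following is a minimal sketch. It assumes two registry formats: ClinicalTrials.gov numbers ("NCT" plus 8 digits) and ISRCTN numbers ("ISRCTN" plus 8 digits). It also searches for any given identifier as an escaped literal, so the identifier on the slide is looked up exactly as written, without assuming which registry issued it. The paper IDs and texts are invented.

```python
import re
from collections import defaultdict

# Assumed formats: ClinicalTrials.gov "NCT" + 8 digits; ISRCTN + 8 digits.
TRIAL_ID = re.compile(r"\b(?:NCT\d{8}|ISRCTN\d{8})\b")


def trials_to_papers(papers):
    """Map each registry ID found in the texts to the sorted IDs of papers citing it."""
    index = defaultdict(set)
    for paper_id, text in papers.items():
        for trial_id in TRIAL_ID.findall(text):
            index[trial_id].add(paper_id)
    return {trial_id: sorted(ids) for trial_id, ids in index.items()}


def papers_mentioning(identifier, papers):
    """Sorted IDs of papers containing identifier as a whole token."""
    pattern = re.compile(r"(?<![\w-])" + re.escape(identifier) + r"(?![\w-])")
    return sorted(pid for pid, text in papers.items() if pattern.search(text))


papers = {
    "paper-a": "Registered as NCT01234567 and ISRCTN12345678.",
    "paper-b": "See NCT01234567; trial 2009-0100068-41.",
    "paper-c": "No registration reported (NCT0123).",
}
```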
Diagram Mining
[Figure: Ln bacterial load per fly (y-axis, tick labels 6.0 to 11.5) against days post-infection (x-axis, 0 to 5)]
Bitmap Image and Tesseract OCR
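After OCR has read the tick labels, turning plotted points back into numbers mainly requires a linear calibration from pixel positions to axis values. The following is a minimal sketch, assuming the pixel positions of two labelled ticks per axis are already known. The positions below are hypothetical, and the OCR step itself is not shown. It also assumes "Ln" means the natural logarithm, so exp() recovers the load.

```python
import math


def axis_calibration(pixel_a, value_a, pixel_b, value_b):
    """Linear map from a pixel coordinate to a data value, from two labelled ticks."""
    if pixel_a == pixel_b:
        raise ValueError("the two ticks must be at different pixel positions")
    scale = (value_b - value_a) / (pixel_b - pixel_a)
    return lambda pixel: value_a + (pixel - pixel_a) * scale


# Hypothetical tick positions; image rows grow downwards, so the y scale is negative.
to_ln_load = axis_calibration(400, 6.0, 100, 11.0)  # row 400 -> 6.0, row 100 -> 11.0
to_days = axis_calibration(50, 0.0, 550, 5.0)       # column 50 -> day 0, column 550 -> day 5

ln_load = to_ln_load(250)  # a point plotted at row 250
days = to_days(300)        # ... and column 300
load = math.exp(ln_load)   # assumes "Ln" is the natural logarithm
```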
Workshop overview dictionary demonstration
WikiFactMine: Ontology for Everybody and Everything
 
Disrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic ComplexDisrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic Complex
 
Paradise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to MineParadise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to Mine
 
Young people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge NeocolonialismYoung people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge Neocolonialism
 

Recently uploaded

Music Therapy's Impact in Palliative Care| IAPCON2024| Dr. Tara Rajendran
Music Therapy's Impact in Palliative Care| IAPCON2024| Dr. Tara RajendranMusic Therapy's Impact in Palliative Care| IAPCON2024| Dr. Tara Rajendran
Music Therapy's Impact in Palliative Care| IAPCON2024| Dr. Tara RajendranTara Rajendran
 
PNEUMOTHORAX AND ITS MANAGEMENTS.pdf
PNEUMOTHORAX   AND  ITS  MANAGEMENTS.pdfPNEUMOTHORAX   AND  ITS  MANAGEMENTS.pdf
PNEUMOTHORAX AND ITS MANAGEMENTS.pdfDolisha Warbi
 
Hematology and Immunology - Leukocytes Functions
Hematology and Immunology - Leukocytes FunctionsHematology and Immunology - Leukocytes Functions
Hematology and Immunology - Leukocytes FunctionsMedicoseAcademics
 
Presentation on General Anesthetics pdf.
Presentation on General Anesthetics pdf.Presentation on General Anesthetics pdf.
Presentation on General Anesthetics pdf.Prerana Jadhav
 
Glomerular Filtration and determinants of glomerular filtration .pptx
Glomerular Filtration and  determinants of glomerular filtration .pptxGlomerular Filtration and  determinants of glomerular filtration .pptx
Glomerular Filtration and determinants of glomerular filtration .pptxDr.Nusrat Tariq
 
Presentation on Parasympathetic Nervous System
Presentation on Parasympathetic Nervous SystemPresentation on Parasympathetic Nervous System
Presentation on Parasympathetic Nervous SystemPrerana Jadhav
 
Wessex Health Partners Wessex Integrated Care, Population Health, Research & ...
Wessex Health Partners Wessex Integrated Care, Population Health, Research & ...Wessex Health Partners Wessex Integrated Care, Population Health, Research & ...
Wessex Health Partners Wessex Integrated Care, Population Health, Research & ...Wessex Health Partners
 
systemic bacteriology (7)............pptx
systemic bacteriology (7)............pptxsystemic bacteriology (7)............pptx
systemic bacteriology (7)............pptxEyobAlemu11
 
PULMONARY EDEMA AND ITS MANAGEMENT.pdf
PULMONARY EDEMA AND  ITS  MANAGEMENT.pdfPULMONARY EDEMA AND  ITS  MANAGEMENT.pdf
PULMONARY EDEMA AND ITS MANAGEMENT.pdfDolisha Warbi
 
Tans femoral Amputee : Prosthetics Knee Joints.pptx
Tans femoral Amputee : Prosthetics Knee Joints.pptxTans femoral Amputee : Prosthetics Knee Joints.pptx
Tans femoral Amputee : Prosthetics Knee Joints.pptxKezaiah S
 
SGK HÓA SINH NĂNG LƯỢNG SINH HỌC 2006.pdf
SGK HÓA SINH NĂNG LƯỢNG SINH HỌC 2006.pdfSGK HÓA SINH NĂNG LƯỢNG SINH HỌC 2006.pdf
SGK HÓA SINH NĂNG LƯỢNG SINH HỌC 2006.pdfHongBiThi1
 
SWD (Short wave diathermy)- Physiotherapy.ppt
SWD (Short wave diathermy)- Physiotherapy.pptSWD (Short wave diathermy)- Physiotherapy.ppt
SWD (Short wave diathermy)- Physiotherapy.pptMumux Mirani
 
Nutrition of OCD for my Nutritional Neuroscience Class
Nutrition of OCD for my Nutritional Neuroscience ClassNutrition of OCD for my Nutritional Neuroscience Class
Nutrition of OCD for my Nutritional Neuroscience Classmanuelazg2001
 
Introduction to Sports Injuries by- Dr. Anjali Rai
Introduction to Sports Injuries by- Dr. Anjali RaiIntroduction to Sports Injuries by- Dr. Anjali Rai
Introduction to Sports Injuries by- Dr. Anjali RaiGoogle
 
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptxSYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptxdrashraf369
 
ANTI-DIABETICS DRUGS - PTEROCARPUS AND GYMNEMA
ANTI-DIABETICS DRUGS - PTEROCARPUS AND GYMNEMAANTI-DIABETICS DRUGS - PTEROCARPUS AND GYMNEMA
ANTI-DIABETICS DRUGS - PTEROCARPUS AND GYMNEMADivya Kanojiya
 
COVID-19 (NOVEL CORONA VIRUS DISEASE PANDEMIC ).pptx
COVID-19  (NOVEL CORONA  VIRUS DISEASE PANDEMIC ).pptxCOVID-19  (NOVEL CORONA  VIRUS DISEASE PANDEMIC ).pptx
COVID-19 (NOVEL CORONA VIRUS DISEASE PANDEMIC ).pptxBibekananda shah
 
world health day presentation ppt download
world health day presentation ppt downloadworld health day presentation ppt download
world health day presentation ppt downloadAnkitKumar311566
 
PERFECT BUT PAINFUL TKR -ROLE OF SYNOVECTOMY.pptx
PERFECT BUT PAINFUL TKR -ROLE OF SYNOVECTOMY.pptxPERFECT BUT PAINFUL TKR -ROLE OF SYNOVECTOMY.pptx
PERFECT BUT PAINFUL TKR -ROLE OF SYNOVECTOMY.pptxdrashraf369
 
maternal mortality and its causes and how to reduce maternal mortality
maternal mortality and its causes and how to reduce maternal mortalitymaternal mortality and its causes and how to reduce maternal mortality
maternal mortality and its causes and how to reduce maternal mortalityhardikdabas3
 

Recently uploaded (20)

Music Therapy's Impact in Palliative Care| IAPCON2024| Dr. Tara Rajendran
Music Therapy's Impact in Palliative Care| IAPCON2024| Dr. Tara RajendranMusic Therapy's Impact in Palliative Care| IAPCON2024| Dr. Tara Rajendran
Music Therapy's Impact in Palliative Care| IAPCON2024| Dr. Tara Rajendran
 
PNEUMOTHORAX AND ITS MANAGEMENTS.pdf
PNEUMOTHORAX   AND  ITS  MANAGEMENTS.pdfPNEUMOTHORAX   AND  ITS  MANAGEMENTS.pdf
PNEUMOTHORAX AND ITS MANAGEMENTS.pdf
 
Hematology and Immunology - Leukocytes Functions
Hematology and Immunology - Leukocytes FunctionsHematology and Immunology - Leukocytes Functions
Hematology and Immunology - Leukocytes Functions
 
Presentation on General Anesthetics pdf.
Presentation on General Anesthetics pdf.Presentation on General Anesthetics pdf.
Presentation on General Anesthetics pdf.
 
Glomerular Filtration and determinants of glomerular filtration .pptx
Glomerular Filtration and  determinants of glomerular filtration .pptxGlomerular Filtration and  determinants of glomerular filtration .pptx
Glomerular Filtration and determinants of glomerular filtration .pptx
 
Presentation on Parasympathetic Nervous System
Presentation on Parasympathetic Nervous SystemPresentation on Parasympathetic Nervous System
Presentation on Parasympathetic Nervous System
 
Wessex Health Partners Wessex Integrated Care, Population Health, Research & ...
Wessex Health Partners Wessex Integrated Care, Population Health, Research & ...Wessex Health Partners Wessex Integrated Care, Population Health, Research & ...
Wessex Health Partners Wessex Integrated Care, Population Health, Research & ...
 
systemic bacteriology (7)............pptx
systemic bacteriology (7)............pptxsystemic bacteriology (7)............pptx
systemic bacteriology (7)............pptx
 
PULMONARY EDEMA AND ITS MANAGEMENT.pdf
PULMONARY EDEMA AND  ITS  MANAGEMENT.pdfPULMONARY EDEMA AND  ITS  MANAGEMENT.pdf
PULMONARY EDEMA AND ITS MANAGEMENT.pdf
 
Tans femoral Amputee : Prosthetics Knee Joints.pptx
Tans femoral Amputee : Prosthetics Knee Joints.pptxTans femoral Amputee : Prosthetics Knee Joints.pptx
Tans femoral Amputee : Prosthetics Knee Joints.pptx
 
SGK HÓA SINH NĂNG LƯỢNG SINH HỌC 2006.pdf
SGK HÓA SINH NĂNG LƯỢNG SINH HỌC 2006.pdfSGK HÓA SINH NĂNG LƯỢNG SINH HỌC 2006.pdf
SGK HÓA SINH NĂNG LƯỢNG SINH HỌC 2006.pdf
 
SWD (Short wave diathermy)- Physiotherapy.ppt
SWD (Short wave diathermy)- Physiotherapy.pptSWD (Short wave diathermy)- Physiotherapy.ppt
SWD (Short wave diathermy)- Physiotherapy.ppt
 
Nutrition of OCD for my Nutritional Neuroscience Class
Nutrition of OCD for my Nutritional Neuroscience ClassNutrition of OCD for my Nutritional Neuroscience Class
Nutrition of OCD for my Nutritional Neuroscience Class
 
Introduction to Sports Injuries by- Dr. Anjali Rai
Introduction to Sports Injuries by- Dr. Anjali RaiIntroduction to Sports Injuries by- Dr. Anjali Rai
Introduction to Sports Injuries by- Dr. Anjali Rai
 
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptxSYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
 
ANTI-DIABETICS DRUGS - PTEROCARPUS AND GYMNEMA
ANTI-DIABETICS DRUGS - PTEROCARPUS AND GYMNEMAANTI-DIABETICS DRUGS - PTEROCARPUS AND GYMNEMA
ANTI-DIABETICS DRUGS - PTEROCARPUS AND GYMNEMA
 
COVID-19 (NOVEL CORONA VIRUS DISEASE PANDEMIC ).pptx
COVID-19  (NOVEL CORONA  VIRUS DISEASE PANDEMIC ).pptxCOVID-19  (NOVEL CORONA  VIRUS DISEASE PANDEMIC ).pptx
COVID-19 (NOVEL CORONA VIRUS DISEASE PANDEMIC ).pptx
 
world health day presentation ppt download
world health day presentation ppt downloadworld health day presentation ppt download
world health day presentation ppt download
 
PERFECT BUT PAINFUL TKR -ROLE OF SYNOVECTOMY.pptx
PERFECT BUT PAINFUL TKR -ROLE OF SYNOVECTOMY.pptxPERFECT BUT PAINFUL TKR -ROLE OF SYNOVECTOMY.pptx
PERFECT BUT PAINFUL TKR -ROLE OF SYNOVECTOMY.pptx
 
maternal mortality and its causes and how to reduce maternal mortality
maternal mortality and its causes and how to reduce maternal mortalitymaternal mortality and its causes and how to reduce maternal mortality
maternal mortality and its causes and how to reduce maternal mortality
 

  • 1. Workshop overview
      • Your/our backgrounds and interests, and what we want
      • How does mining work and what can it do for YOU/Cochrane?
      • Demonstration with emphasis on dictionaries
      • What would YOU like a system to do?
      • Your dictionary/ies in action
      • Advanced (chemistry, diagram mining)
      • ANY early adopter can obtain our (Open) software and run it at home for any resource (medical, agricultural, government, climate, etc.). We will help you during the next 24 hours.
      • All material CC BY.
  • 2. Let the Machine Help with your Systematic Reviews
      Cochrane UK & Ireland Symposium 2016, Birmingham, UK, 2016-03-15
      Peter Murray-Rust [1,2], Christopher Kittel [2]
      [1] University of Cambridge; [2] TheContentMine
      Simple, Universal, Knowledge creation and re-use
  • 3. “The Right to Read is the Right to Mine” (Peter Murray-Rust, 2011) http://contentmine.org
  • 4. Resources
      • Europe PubMedCentral: http://europepmc.org/
      • ContentMine toolkit: https://github.com/ContentMine/
      • Wikidata: https://www.wikidata.org/wiki/Wikidata:Main_Page
      • Hypothes.is: https://hypothes.is/ [1]
      • Etherpad: http://pads.cottagelabs.com/p/cochrane2016
      • Note: early adopters can obtain our (Open) software and run it at home…
      [1] Not used in the CochraneBham workshop
  • 7. CONTENTMINE: Complete OPEN Platform for Mining Scientific Literature [architecture diagram]
      • getpapers: queries catalogues (EPMC, arXiv, CORE, HAL, university repositories)
      • quickscrape: crawls and scrapes publisher sites (PLoS ONE, BMC, PeerJ… Nature, IEEE, Elsevier…) with scrapers, starting from URLs and DOIs (daily crawl, ToC services)
      • norma (Normalizer, Structurer, Semantic Tagger): turns PDF, HTML, DOC, ePUB, TeX, XML, PNG, EPS, CSV and XLS into Semantic ScholarlyHTML (abstract, methods, references, captioned figures, HTML tables)
      • ami: mines text, data and figures with dictionaries and COMMUNITY plugins (Chem, Phylo, Trials, Crystal, Plants), producing Facts for search, lookup, visualization and analysis
      • Throughput: 30,000 pages/day
  • 9. MINING with sections and dictionaries [diagram: papers split into sections (abstract, methods, references, captioned figures with image captions, HTML tables with table captions); dictionaries (Dict A, Dict B) applied to the sections; results expressed as annotations (W3C Annotation / https://hypothes.is/)]
  • 10. Disease Dictionary (ICD-10)
      <dictionary title="disease">
        <entry term="1p36 deletion syndrome"/>
        <entry term="1q21.1 deletion syndrome"/>
        <entry term="1q21.1 duplication syndrome"/>
        <entry term="3-methylglutaconic aciduria"/>
        <entry term="3mc syndrome"/>
        <entry term="corpus luteum cyst"/>
        <entry term="cortical blindness"/>
      </dictionary>
      Wikidata ontology for disease (SPARQL query):
      SELECT DISTINCT ?thingLabel WHERE {
        ?thing wdt:P494 ?wd .
        ?thing wdt:P279 wd:Q12136 .
        SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
      }
      wdt:P494 = ICD-10 identifier (P494)
      wdt:P279 = subclass of (P279)
      wd:Q12136 = disease (Q12136), “abnormal condition that affects the body of an organism”
  • 11. DRUGS
      • ChEBI (chemicals at EBI): ftp://ftp.ebi.ac.uk/pub/databases/chebi/Flat_file_tab_delimited/names_3star.tsv.gz
      • combined with WIKIDATA: World Health Organisation International Nonproprietary Name (P2275)
      • => 4947 items in the dictionary (inn.xml)
      <dictionary title="inn">
        <entry term="(r)-fenfluramine"/>
        <entry term="abacavir"/>
        <entry term="abafungin"/>
        <entry term="abafungina"/>
        <entry term="abafungine"/>
        <entry term="abafunginum"/>
        <entry term="abamectin"/>
        <entry term="abarelix"/>
        <entry term="abatacept"/>
      </dictionary>
  • 12. Funders Dictionary (11566 entries)
      <dictionary title="funders">
        <!-- from http://help.crossref.org/funder-registry with thanks -->
        <entry id="http://dx.doi.org/10.13039/100001436" term="1675 Foundation"/>
        <entry id="http://dx.doi.org/10.13039/100004343" term="3M"/>
        <entry id="http://dx.doi.org/10.13039/501100005957" term="8020 Promotion Foundation"/>
        <entry id="http://dx.doi.org/10.13039/501100007139" term="A Richer Life Foundation"/>
        <entry id="http://dx.doi.org/10.13039/100006543" term="A World Celiac Community Foundation"/>
        <entry id="http://dx.doi.org/10.13039/100001962" term="A-T Children's Project"/>
        <entry id="http://dx.doi.org/10.13039/100008456" term="A. Alfred Taubman Medical Research Institute"/>
      </dictionary>
  • 14. Genera from NCBI TaxDump
      <dictionary name="genus">
        <entry term="Aa"/>
        <entry term="Aaaba"/>
        <entry term="Aacanthocnema"/>
        <entry term="Aaosphaeria"/>
        <entry term="Aaptos"/>
        <entry term="Aaptosyax"/>
        <entry term="Aaroniella"/>
        <entry term="Aaronsohnia"/>
        <entry term="Abablemma"/>
      </dictionary>
  • 15. Human Genes (HGNC)
      <dictionary title="hgnc">
        <entry term="A1BG" name="alpha-1-B glycoprotein"/>
        <entry term="A1BG-AS1" name="A1BG antisense RNA 1"/>
        <entry term="A1CF" name="APOBEC1 complementation factor"/>
        <entry term="A2M" name="alpha-2-macroglobulin"/>
        <entry term="A2M-AS1" name="A2M antisense RNA 1 (head to head)"/>
        <entry term="A2ML1" name="alpha-2-macroglobulin-like 1"/>
        <entry term="A2ML1-AS1" name="A2ML1 antisense RNA 1"/>
      </dictionary>
  • 16. Mouse genes (JAXson)
        <entry term="Aaas" name="achalasia, adrenocortical insufficiency, alacrimia"/>
        <entry term="Aacs" name="acetoacetyl-CoA synthetase"/>
        <entry term="Aadac" name="arylacetamide deacetylase (esterase)"/>
        <entry term="Aadacl2" name="arylacetamide deacetylase-like 2"/>
        <entry term="Aadacl3" name="arylacetamide deacetylase-like 3"/>
        <entry term="Aadat" name="aminoadipate aminotransferase"/>
        <entry term="Aaed1" name="AhpC/TSA antioxidant enzyme domain containing 1"/>
        <entry term="Aagab" name="alpha- and gamma-adaptin binding protein"/>
        <entry term="Aak1" name="AP2 associated kinase 1"/>
        <entry term="Aamdc" name="adipogenesis associated Mth938 domain containing"/>
        <entry term="Aamp" name="angio-associated migratory protein"/>
  • 18. Terms co-occurring with “Zika”
      <dictionary title="tropicalVirus">
        <entry term="ZIKV" name="Zika virus"/>
        <entry term="Zika" name="Zika virus"/>
        <entry term="DENV" name="Dengue virus"/>
        <entry term="Dengue" name="Dengue virus"/>
        <entry term="CHIKV" name="Chikungunya virus"/>
        <entry term="Chikungunya" name="Chikungunya virus"/>
        <entry term="WNV" name="West Nile virus"/>
        <entry term="West Nile" name="West Nile virus"/>
        <entry term="YFV" name="Yellow fever virus"/>
        <entry term="Yellow fever" name="Yellow fever virus"/>
        <entry term="HPV" name="Human papilloma virus"/>
        <entry term="Human papilloma virus" name="Human papilloma virus"/>
      </dictionary>
  • 19. Terms lexically related to “meta-analysis”
      <dictionary title="cochrane">
        <entry term="Cochrane Library"/>
        <entry term="Cochrane Reviews"/>
        <entry term="Cochrane Central Register of Controlled Trials"/>
        <entry term="Cochrane"/>
        <entry term="randomize"/>
        <entry term="meta-analysis"/>
        <entry term="Embase"/>
        <entry term="MEDLINE"/>
        <entry term="eligibility"/>
        <entry term="exclusion"/>
        <entry term="outcome"/>
        <entry term="Review Manager"/>
        <entry term="STATA"/>
        <entry term="RCT"/>
      </dictionary>
  • 20. Mining strategy
      • Discover; negotiate permissions => bibliography
      • Crawl/scrape (download) documents AND supplemental material
      • Normalize: PDF => XML
      • Index: facets => facts and snippets (“entities”)
      • Interpret/analyze entities => relationships, aggregations (“Transformative”)
      • Publish
  • 21. CONTENTMINE: Complete OPEN Platform for Mining Scientific Literature [architecture diagram repeated from slide 7: getpapers (EuPMC, arXiv, CORE, HAL, university repositories), quickscrape, norma, ami; 30,000 pages/day]
  • 22. Demo
      • PMR runs getpapers and ami
      • Chris runs a Python visualization of drug co-occurrence
  • 23. Systematic Reviews
      Can we:
      • eliminate true negatives automatically?
      • extract data from formulaic language?
      • mine diagrams?
      • annotate existing sources?
      • forward-reference clinical trials?
  • 24. Polly has 20 seconds to read this paper… …and 10,000 more
  • 25. ContentMine software can do this in a few minutes.
      Polly: “there were 10,000 abstracts and due to time pressures, we split this between 6 researchers. It took about 2-3 days of work (working only on this) to get through ~1,600 papers each. So, at a minimum this equates to 12 days of full-time work (and would normally be done over several weeks under normal time pressures).”
  • 26. 400,000 clinical trials in 10 government registries
      • Mapping trials => papers: http://www.trialsjournal.com/content/16/1/80
      • 2009 => 2015: what has happened in the last 6 years?
      • Search the whole scientific literature for “2009-0100068-41”
  • 29. Bitmap Image and Tesseract OCR [plot recovered by OCR: y-axis “Ln Bacterial load per fly”; x-axis “Days post-infection” (0 to 5)]
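
Slide 9 (“MINING with sections and dictionaries”) applies dictionaries to individual sections of a paper, and slide 20 turns the hits into “facts and snippets”. The following is a minimal Python sketch of that idea, not ami's implementation: it assumes each paper is already split into named sections and each dictionary is a plain list of terms. The functions `build_pattern` and `mine_sections` and the toy texts are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def build_pattern(terms: List[str], case_sensitive: bool = False) -> "re.Pattern[str]":
    """Compile one alternation for a dictionary, longest terms first."""
    # Longest first, so "West Nile virus" is preferred over "West Nile"
    ordered = sorted(set(terms), key=len, reverse=True)
    flags = 0 if case_sensitive else re.IGNORECASE
    # Lookarounds instead of \b, so terms such as "(r)-fenfluramine" still match
    return re.compile(r"(?<!\w)(?:" + "|".join(map(re.escape, ordered)) + r")(?!\w)", flags)


def mine_sections(
    sections: Dict[str, str],
    dictionaries: Dict[str, List[str]],
    case_sensitive: bool = False,
    width: int = 30,
) -> List[Tuple[str, str, str, str]]:
    """Return (dictionary, section, matched text, snippet) for every hit."""
    hits = []
    for dict_name, terms in dictionaries.items():
        if not terms:
            continue  # an empty alternation would match the empty string everywhere
        pattern = build_pattern(terms, case_sensitive)
        for section, text in sections.items():
            for m in pattern.finditer(text):
                # Keep some context around the hit as a reviewable snippet
                snippet = text[max(0, m.start() - width): m.end() + width]
                hits.append((dict_name, section, m.group(0), snippet))
    return hits


sections = {
    "abstract": "Toy abstract mentioning abacavir and West Nile virus.",
    "methods": "Patients received Abacavir daily.",
}
dictionaries = {
    "inn": ["abacavir", "abatacept"],
    "tropicalVirus": ["West Nile", "West Nile virus"],
}
for hit in mine_sections(sections, dictionaries):
    print(hit[:3])
```

Matching is case-insensitive by default, which suits the lower-case disease and INN dictionaries. For gene dictionaries such as the HGNC and mouse excerpts, `case_sensitive=True` is probably the safer choice, since human symbols (upper case, e.g. A2M) and mouse symbols (initial capital, e.g. Aamp) differ mainly by capitalisation.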
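
Slide 10 pairs a SPARQL query over Wikidata with the resulting `<dictionary title="disease">` file. A minimal sketch of the last step, turning a list of labels into that XML shape with Python's standard `xml.etree.ElementTree`, might look like this. Fetching the labels from the Wikidata Query Service is not shown; the `labels` list is a hypothetical stand-in for the `?thingLabel` column, and `make_dictionary` is an illustrative name, not a ContentMine function.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def make_dictionary(title: str, labels: Iterable[str]) -> str:
    """Serialize labels as <dictionary title=...><entry term=.../>...</dictionary>."""
    root = ET.Element("dictionary", title=title)
    # Trim, lower-case and de-duplicate; sort so regenerated files diff cleanly
    terms = sorted({label.strip().lower() for label in labels if label.strip()})
    for term in terms:
        # ElementTree escapes &, < and quotes inside attribute values
        ET.SubElement(root, "entry", term=term)
    return ET.tostring(root, encoding="unicode")


# Hypothetical stand-in for the ?thingLabel results of the slide 10 query
labels = ["Cortical blindness", "corpus luteum cyst", "cortical blindness ", ""]
print(make_dictionary("disease", labels))
```

Lower-casing follows the disease and INN excerpts on the slides; it would be wrong for case-sensitive vocabularies such as gene symbols, so that step would need to be optional in practice.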
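
The tropicalVirus dictionary on slide 18 maps several surface forms (`term`, e.g. "ZIKV", "Zika") to one canonical `name` ("Zika virus"), so mentions can be counted per virus rather than per spelling. A minimal sketch of that normalisation, assuming exact case-sensitive matching; `load_synonyms` and `count_canonical` are hypothetical helpers:

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict

# Excerpt of the tropicalVirus dictionary from slide 18
DICTIONARY = """<dictionary title="tropicalVirus">
  <entry term="ZIKV" name="Zika virus"/>
  <entry term="Zika" name="Zika virus"/>
  <entry term="DENV" name="Dengue virus"/>
  <entry term="Dengue" name="Dengue virus"/>
</dictionary>"""


def load_synonyms(xml_text: str) -> Dict[str, str]:
    """Map each term to its canonical name (or to itself if it has none)."""
    root = ET.fromstring(xml_text)
    return {e.get("term"): e.get("name", e.get("term")) for e in root.iter("entry")}


def count_canonical(text: str, synonyms: Dict[str, str]) -> Counter:
    """Count mentions per canonical name, using exact (case-sensitive) matches."""
    terms = sorted(synonyms, key=len, reverse=True)
    pattern = re.compile(r"(?<!\w)(" + "|".join(map(re.escape, terms)) + r")(?!\w)")
    return Counter(synonyms[m.group(1)] for m in pattern.finditer(text))


synonyms = load_synonyms(DICTIONARY)
print(count_canonical("Toy text: ZIKV, DENV, Zika and Dengue are all mentioned here.", synonyms))
```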
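
Slide 22 mentions a Python visualization of drug co-occurrence; that code is not in the slides. The sketch below covers only the counting step such a plot could be built from, assuming per-paper lists of drug hits (for example from the inn dictionary). The paper ids and drug lists are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable


def cooccurrence(doc_terms: Dict[str, Iterable[str]]) -> Counter:
    """Count, for each unordered pair of drugs, the papers mentioning both."""
    pairs: Counter = Counter()
    for terms in doc_terms.values():
        # set(): repeated mentions in one paper count once; sorted(): (a, b) == (b, a)
        for pair in combinations(sorted(set(terms)), 2):
            pairs[pair] += 1
    return pairs


# Hypothetical per-paper drug hits
doc_terms = {
    "paper1": ["abacavir", "lamivudine", "abacavir"],
    "paper2": ["abacavir", "lamivudine", "zidovudine"],
    "paper3": ["zidovudine"],
}
print(cooccurrence(doc_terms).most_common(3))
```

Counting papers rather than raw mentions keeps one long paper from dominating; the resulting pair counts can be used directly as edge weights in a network plot.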
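
Slide 23 asks whether true negatives can be eliminated automatically. One naive, unvalidated way to frame this is a dictionary screen: keep an abstract only if it hits every required dictionary, and treat the rest as candidate exclusions for a human to confirm. The grouping into "design" and "drug" dictionaries and the function `screen` are assumptions for illustration; the terms come from the cochrane and inn excerpts.

```python
import re
from typing import Dict, List, Tuple


def screen(abstracts: Dict[str, str], required: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Return (keep, candidate_exclusions).

    An abstract is kept if every required dictionary has at least one term
    that starts a word in it. Prefix matching is deliberate, so "randomize"
    also hits "randomized"; it will over-match, which is the safer direction
    for a screen whose exclusions a human still checks.
    """
    patterns = [
        re.compile(r"(?<!\w)(?:" + "|".join(map(re.escape, terms)) + ")", re.IGNORECASE)
        for terms in required.values()
    ]
    keep, exclude = [], []
    for pid, text in abstracts.items():
        (keep if all(p.search(text) for p in patterns) else exclude).append(pid)
    return keep, exclude


required = {
    "design": ["randomize", "RCT", "meta-analysis"],  # from the cochrane dictionary
    "drug": ["abacavir", "abatacept"],                # from the inn dictionary
}
abstracts = {
    "A": "Toy abstract: a randomized trial of abatacept.",
    "B": "Toy abstract: a cohort study of abacavir.",
    "C": "Toy abstract: a meta-analysis of exercise.",
}
keep, exclude = screen(abstracts, required)
print(keep, exclude)
```

Note that "randomize" would not catch the British "randomised"; a real screen would need both spellings in the dictionary.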
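
The figures in Polly's quote on slide 25 fit together as follows: splitting 10,000 abstracts six ways gives about 1,667 each (quoted as ~1,600), and 2 to 3 full-time days per researcher across six researchers gives the 12-day minimum, or up to 18 days.

```latex
\[
\frac{10\,000\ \text{abstracts}}{6\ \text{researchers}} \approx 1\,667\ \text{abstracts per researcher}
\]
\[
6 \times 2\ \text{days} = 12\ \text{person-days (minimum)}, \qquad
6 \times 3\ \text{days} = 18\ \text{person-days}
\]
```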
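
Slide 26 proposes searching the whole literature for a trial registry identifier such as “2009-0100068-41”. A likely snag (an assumption, not stated on the slide) is that text extracted from PDF or HTML often replaces the hyphens with en dashes or minus signs, or surrounds them with spaces. This sketch normalises both before an exact match; the in-memory `papers` dictionary is a hypothetical stand-in for a mined corpus, and `normalise` and `papers_mentioning` are illustrative names.

```python
import re
from typing import Dict, List

# Dash-like characters that text extraction often substitutes for "-"
# (hyphen, non-breaking hyphen, figure dash, en dash, em dash, minus sign)
DASHES = dict.fromkeys(map(ord, "\u2010\u2011\u2012\u2013\u2014\u2212"), "-")


def normalise(text: str) -> str:
    """Map dash variants to ASCII '-' and drop whitespace around hyphens."""
    return re.sub(r"\s*-\s*", "-", text.translate(DASHES))


def papers_mentioning(trial_id: str, papers: Dict[str, str]) -> List[str]:
    """Return ids of papers whose normalised text contains trial_id as a whole token."""
    target = re.escape(normalise(trial_id))
    pattern = re.compile(r"(?<![\w-])" + target + r"(?![\w-])")
    return [pid for pid, text in papers.items() if pattern.search(normalise(text))]


papers = {
    "P1": "Trial registration: 2009\u20130100068\u201341.",
    "P2": "Trial registration: 2009-0100068-42.",
    "P3": "Registered as 2009 - 0100068 - 41 (see methods).",
}
print(papers_mentioning("2009-0100068-41", papers))
```

Collapsing spaces around every hyphen also joins ordinary phrases such as "pre - and post", which is harmless for this kind of exact-ID search but would matter if the normalised text were reused elsewhere.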
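
Slide 29 shows Tesseract OCR recovering axis labels and tick values from a bitmap plot. Turning point positions into data values then needs an axis calibration. A minimal sketch, assuming a linear, unbroken axis, that OCR supplies each tick label's pixel position, and that “Ln” means natural log, so `exp()` recovers the bacterial load. The OCR'd ticks on the slide jump from 9.0 to 6.5 (an axis break or missed labels), so the sketch calibrates from two ticks on the same segment. All pixel positions are hypothetical, and `axis_map` is an illustrative name, not ContentMine's method.

```python
import math
from typing import Callable


def axis_map(p1: float, v1: float, p2: float, v2: float) -> Callable[[float], float]:
    """Linear pixel -> value map through two (pixel, tick value) pairs."""
    if p1 == p2:
        raise ValueError("the two ticks must be at different pixel positions")
    slope = (v2 - v1) / (p2 - p1)
    return lambda p: v1 + slope * (p - p1)


# Hypothetical OCR result: tick "9.0" centred at y=300 px, "11.5" at y=50 px
# (image y grows downwards, so the slope comes out negative)
to_ln_load = axis_map(300, 9.0, 50, 11.5)
ln_load = to_ln_load(200)  # a data point found at y=200 px
# The y-axis is labelled "Ln Bacterial load per fly", so exp() undoes the log
print(ln_load, math.exp(ln_load))
```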