ContentMine and WikiData
Peter Murray-Rust
Wikimania,
London UK 2014-08-08
ContentMine: We use machines to
liberate 100 million facts /yr from the
scientific literature and make them free
for everyone (WikiData)
With Wikipedia we are ALL scientists
ContentMine is a social machine
WikiData is the future of science data
http://en.wikipedia.org/wiki/Tim_Berners-Lee
Everything in this presentation is ODOSOS
(Open Data, Open Standards, Open Source)
CC0, CC-BY, W3C etc., Apache2, etc. *
http://contentmine.org
http://bitbucket.org/petermr
http://wwmm.ch.cam.ac.uk
*Sorry about the PowerPoint (“Power corrupts, PowerPoint corrupts absolutely”, Tufte)
A promise: I (Petermr) will never sell out to non-transparent organizations.
petermr: I believe in Wikipedia
• 2006 http://en.wikipedia.org/wiki/User:Petermr
• 2006 started Open Data (term unknown then!)
• 2009: “the bit of Wikipedia that I wrote is correct” [challenging the
idea of “WP is junk”]
• 2009: “Wikipedia is the digital library of this century”
• 2012: I alert WP that Springer has copyrighted > 1000 of our
images [Springergate]
• 2014: “For facts in maths, physical and biological sciences I trust
Wikipedia.” (Wikimania2014)
A meritocratic
critical
volunteer
community
Volunteer community in chemistry: Open Data/Source/Standards
Scientific and Medical publication (STM)[+]
• World Citizens pay $400,000,000,000…
• … for research in 1,500,000 articles …
• … cost $300,000 each to create …
• … $7000 each to “publish” [*]…
• … $10,000,000,000 from academic libraries …
• … to “publishers” who forbid access to 99.9% of
citizens of the world …
[+] Figures probably ±50%
[*] The arXiv preprint server costs 7 USD per paper
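The per-article figures above can be checked with simple division. This is a minimal sketch using only the numbers quoted on the slide, which the slide itself says are good to about ±50%; the variable names are illustrative.

```python
# Figures quoted on the slide (all approximate, per the slide's own caveat).
research_spend_usd = 400_000_000_000   # paid by world citizens for research
articles_per_year = 1_500_000          # articles that research produces
library_spend_usd = 10_000_000_000     # paid by academic libraries to publishers
arxiv_cost_per_paper_usd = 7           # arXiv figure from the footnote

# Average cost to create one article, and to "publish" one article.
cost_to_create = research_spend_usd / articles_per_year
cost_to_publish = library_spend_usd / articles_per_year

print(f"research cost per article:   ${cost_to_create:,.0f}")
print(f"publishing cost per article: ${cost_to_publish:,.0f}")
print(f"publishing cost vs arXiv:    {cost_to_publish / arxiv_cost_per_paper_usd:,.0f}x")
```

The results come to roughly $267,000 and $6,700 per article, consistent with the rounded $300,000 and $7000 on the slide.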
4 Billion USD on human genome
yielded 800 Billion USD and 4 M job-years
Gloom Warning
…three problems—flawed design, non-
publication, and poor reporting—together
meant >85% of research funds were wasted, a
global total loss >100 billion USD per year.
[Lancet 2009]
[Even more] waste clearly occurs after
publication: from poor access, poor
dissemination, and poor uptake of the findings
of research. [PLOS Medicine 2014-05-27]
Bad publication wastes science
Publishers’ PDFs destroy science
PDFs do not contain words
or subscripts!
PDFs do not contain tables
and do not have columns
SVG is turned into JPEG because it’s easier to process
Elsevier wants to control Open Data
[asked by Michelle Brook]
STM Publishers Licence
2012_03_15_Sample_Licence_Text_Data_Mining.pdf
(Summary: PMR has NO rights)
• [cannot publish to: ] “libraries, repositories, or archives”
• [cannot] “Make the results of any TDM Output available on an externally facing server or
website”
• “Subscriber shall pay a […] fee”
Heather Piwowar: “negotiating with publishers [made me physically ill]”
WE WALKED OUT
• Brit Library
• JISC
• RLUK
• OKFN
• …
• Ross Mounce
• PM-R
Licences destroy Content Mining
CLOSED ACCESS MEANS PEOPLE DIE
CLOSED DATA MEANS PEOPLE DIE
Happiness Restored
http://www.budapestopenaccessinitiative.org/read
… an unprecedented public good. …
… completely free and unrestricted access to [peer-
reviewed literature] by all scientists, scholars, teachers,
students, and other curious minds. …
…Removing access barriers to this literature will
accelerate research, enrich education, share the
learning of the rich with the poor and the poor with
the rich, make this literature as useful as it can be, and
lay the foundation for uniting humanity in a common
intellectual conversation and quest for knowledge.
(Budapest Open Access Initiative, 2002)
The Right to Read is the Right to Mine
http://contentmine.org
• Science can be read and understood by
human-machine Amanuensis-symbionts.
• Amanuenses are based on Wikipedia,
databases and software (e.g. ContentMine’s
AMI)
• The results are fed back into WP and WikiData
http://en.wikipedia.org/wiki/Symbiosis
http://en.wikipedia.org/wiki/Eric_Fenby
• Crawl scientific literature
(Open Bibliography)
• Scrape each scientific article
(ContentMine-quickscrape)
• Extract the facts (ContentMine-AMI)
• Index (Wikipedia)
• Republish (WikiData)
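The five steps above can be sketched as a chain of small functions. This is a toy illustration only: it does not call quickscrape, AMI, Wikipedia or WikiData, and every function name, the sample sentences and the term index are hypothetical stand-ins kept in memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    article_id: str
    term: str
    wikipedia_title: str


# Made-up stand-in for the literature; a real crawl/scrape would use the web.
SAMPLE_ARTICLES = {
    "article-1": "Cultures of Aspergillus oryzae were grown on soybean meal.",
    "article-2": "Panthera leo was observed near the reserve.",
}

# Hypothetical index from terms in the text to Wikipedia page titles.
WIKIPEDIA_INDEX = {
    "Aspergillus oryzae": "Aspergillus_oryzae",
    "soybean": "Soybean",
    "Panthera leo": "Lion",
}


def crawl():
    """Step 1: list the articles to process."""
    return list(SAMPLE_ARTICLES)


def scrape(article_id):
    """Step 2: fetch the text of one article (here an in-memory lookup)."""
    return SAMPLE_ARTICLES[article_id]


def extract(text):
    """Step 3: find known terms in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return [term for term in WIKIPEDIA_INDEX if term.lower() in lowered]


def index(article_id, terms):
    """Step 4: link each extracted term to a Wikipedia page title."""
    return [Fact(article_id, term, WIKIPEDIA_INDEX[term]) for term in terms]


def republish(facts):
    """Step 5: turn facts into plain records ready to be published."""
    return [
        {"source": f.article_id, "term": f.term, "wikipedia": f.wikipedia_title}
        for f in facts
    ]


def run_pipeline():
    records = []
    for article_id in crawl():
        terms = extract(scrape(article_id))
        records.extend(republish(index(article_id, terms)))
    return records


for record in run_pipeline():
    print(record)
```

Keeping each step a separate function means any one of them (for example the extractor) can be swapped for a domain-specific plugin without touching the others.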
Machine Extraction of scientific facts
Human-machine symbionts can read science!
WP_Lion
WP_Aspergillus_oryzae
WP_Soybean
Facts Marked by “non-scientists” in ContentMine workshops
With Wikipedia everyone can be a scientist
“nuggets” in a scientific paper
quantity
units
Value ranges
Humans aren’t designed to mine this … 
chemical
project places
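As a rough illustration of how a machine might pick out the quantity, units and value-range nuggets named above, here is a minimal regular-expression sketch. The unit list, the sample sentence and the function name are assumptions made for this example; real papers need far richer handling.

```python
import re

# A number, optionally followed by "-" or "to" and a second number (a range),
# then a unit drawn from a small illustrative list.
QUANTITY = re.compile(
    r"(?<![\w.])(?P<low>\d+(?:\.\d+)?)"
    r"(?:\s*(?:-|to)\s*(?P<high>\d+(?:\.\d+)?))?"
    r"\s*(?P<unit>mg|mL|g|h|min|K)\b"
)


def find_quantities(text):
    """Return (low, high, unit) tuples; a single value has low == high."""
    results = []
    for match in QUANTITY.finditer(text):
        low = float(match.group("low"))
        high = float(match.group("high")) if match.group("high") else low
        results.append((low, high, match.group("unit")))
    return results


sample = "The mixture was stirred at 333-353 K for 2 h, then 5.5 mg of catalyst was added."
for low, high, unit in find_quantities(sample):
    print(low, high, unit)
```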
Parsing chemical sentences
A FACT, uncopyrightable, and representable by triples
http://wwmm.ch.cam.ac.uk/chemicaltagger
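To make "representable by triples" concrete, here is a minimal sketch of facts held as subject-predicate-object triples. The identifiers and predicate names are invented for illustration and are not ChemicalTagger output or WikiData properties.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical facts, as might be read from a sentence such as
# "The solution in ethanol was heated to 353 K for 2 h."
facts = [
    Triple("reaction-1", "hasSolvent", "ethanol"),
    Triple("reaction-1", "hasTemperature", "353 K"),
    Triple("reaction-1", "hasDuration", "2 h"),
]


def objects_of(triples, subject, predicate):
    """Return every object recorded for the given subject and predicate."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


print(objects_of(facts, "reaction-1", "hasTemperature"))
```

Each triple carries a single statement and none of the original wording, which is what makes it a fact rather than a copy of the text.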
Typical chemical synthesis
Open Content Mining of FACTs
Machines can interpret chemical reactions
We have processed 500,000 patents. There are >
3,000,000 reactions/year. Added value > 1B EUR.
RSU: Richard Smith-Unna
PMR: Peter Murray-Rust
CL: CottageLabs
Queues
Repos
Scientific
literature
Science
Plugins
Science
Volunteers
But we can now
turn PDFs into
Science
We can’t turn a hamburger into a cow
UNITS
TICKS
QUANTITY
SCALE
TITLES
DATA!!
2000+ points
Dumb PDF
CSV
Semantic
Spectrum
2nd Derivative
Gaussian
Filter
Automatic
extraction
Takes < 1 second
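The slide labels mention a Gaussian filter and a second derivative. One plausible reading, sketched here under that assumption, is peak picking: smooth the extracted points with a Gaussian kernel, then report positions where the second derivative has a strong local minimum. The function names, parameters and the synthetic spectrum are illustrative and not the ContentMine implementation.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalised Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(values, sigma=2.0):
    """Convolve with a Gaussian kernel, clamping indices at the edges."""
    radius = int(3 * sigma)
    kernel = gaussian_kernel(sigma, radius)
    n = len(values)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += weight * values[j]
        smoothed.append(acc)
    return smoothed


def second_derivative(values):
    """Central finite difference; the two end points are left at 0."""
    d2 = [0.0] * len(values)
    for i in range(1, len(values) - 1):
        d2[i] = values[i - 1] - 2 * values[i] + values[i + 1]
    return d2


def find_peaks(values, sigma=2.0, threshold=-0.01):
    """Indices where the smoothed second derivative is a local minimum below threshold."""
    d2 = second_derivative(smooth(values, sigma))
    return [
        i
        for i in range(1, len(d2) - 1)
        if d2[i] < threshold and d2[i] <= d2[i - 1] and d2[i] <= d2[i + 1]
    ]


# Synthetic spectrum: two Gaussian peaks centred at indices 50 and 120.
spectrum = [
    math.exp(-((i - 50) ** 2) / 50.0) + 0.6 * math.exp(-((i - 120) ** 2) / 50.0)
    for i in range(200)
]
print(find_peaks(spectrum))
```

Smoothing first matters because differentiation amplifies noise; the threshold then discards shallow wiggles in the baseline.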
Bacterial WP_phylogenetic tree
Our machines have read and interpreted 4300 trees in an hour with > 95% accuracy
Trees from http://ijs.sgmjournals.org/ used under new UK legislation (Hargreaves)
WP: Clostridium_butyricum
Genbank ID
American Type
Culture Collection
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0036933
(“Adaptive Evolution of HIV at HLA Epitopes Is Associated with Ethnicity in Canada”)
((n122,((n121,n205),((n39,(n84,((((n35,n98),n191),n22),n17))),((n10,n182),((((n232,n76),n68),(n109,n30)),(n73,(n106,n58))))))),((((((n103,n86),(n218,(n215,n157))),((n164,n143),((n190,((n108,n177),(n192,n220))),((n233,n187),n41)))),((((n59,n184),((n134,n200),(n137,(n212,((n92,n209),n29))))),(n88,(n102,n161))),((((n70,n140),(n18,n188)),(n49,((n123,n132),(n219,n198)))),(((n37,(n65,n46)),(n135,(n11,(n113,n142)))),(n210,((n69,(n216,n36)),(n231,n160))))))),(((n107,n43),((n149,n199),n74)),(((n101,(n19,n54)),n96),(n7,((n139,n5),((n170,(n25,n75)),(n146,(n154,(n194,(((n14,n116),n112),(n126,n222))))))))))),(((((n165,(n168,n128)),n129),((n114,n181),(n48,n118))),((n158,(n91,(n33,n213))),(n87,n235))),((n197,(n175,n117)),(n196,((n171,(n163,n227)),((n53,n131),n159)))))));
http://en.wikipedia.org/wiki/Digital_image_processing
http://en.wikipedia.org/wiki/Newick_format http://en.wikipedia.org/wiki/Phylogenetics
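A Newick string like the one above is nested parentheses around comma-separated names, ended by a semicolon. The following is a minimal parser sketch for that simple subset (leaf names only, no branch lengths, no internal labels, no whitespace inside the tree); the function names are illustrative, and the sample is a small subtree taken from the string above.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested lists of leaf names."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
            return children
        # Otherwise this is a leaf: read up to the next structural character.
        start = pos
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]

    tree = parse_node()
    if pos != len(text) - 1:
        raise ValueError(f"unexpected content at position {pos}")
    return tree


def leaves(node):
    """Flatten a parsed tree into its leaf names, left to right."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


tree = parse_newick("((n35,n98),n191);")
print(tree)
print(leaves(tree))
```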
Open notebook science is the practice of
making the entire primary record of a research
project publicly available online as it is
recorded. (WP)
Jean-Claude Bradley was a chemist who
actively promoted Open Science in
chemistry,… He coined the term Open
Notebook Science. … A memorial
symposium was held July 14, 2014 at
Cambridge University, UK.
My Wikiwishes
• An Open Bibliography of science, updated
daily
• An interface for ContentMine to feed new
facts into WikiData
• Domain-specific enthusiasts to create and run
fact extraction and validation
• Wikipedia to become a C21 publisher of
science
Thanks
• Shuttleworth Foundation and Fellowship
• Contentmine.org: Michelle Brook, Jenny Molloy,
Ross Mounce, Richard Smith-Unna,
CottageLabs, Charles Oppenheim
• Open Knowledge Foundation Community
• Wikimedia Community
• Blue Obelisk Community

More Related Content

What's hot

ContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and thesesContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and thesespetermurrayrust
 
The Content Mine (presented at UKSG)
The Content Mine (presented at UKSG)The Content Mine (presented at UKSG)
The Content Mine (presented at UKSG)petermurrayrust
 
Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)petermurrayrust
 
Disruptive Communities and Technology
Disruptive Communities and TechnologyDisruptive Communities and Technology
Disruptive Communities and Technologypetermurrayrust
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trustpetermurrayrust
 
Copyright Reform and Open Data
Copyright Reform and Open DataCopyright Reform and Open Data
Copyright Reform and Open Datapetermurrayrust
 
Embrace the Open Revolution
Embrace the Open RevolutionEmbrace the Open Revolution
Embrace the Open Revolutionpetermurrayrust
 
ContentMine and WikiData
ContentMine and WikiDataContentMine and WikiData
ContentMine and WikiDataTheContentMine
 
Principles and practice of Open Science
Principles and practice of Open SciencePrinciples and practice of Open Science
Principles and practice of Open Sciencepetermurrayrust
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neurosciencepetermurrayrust
 
ContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesTheContentMine
 
The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData TheContentMine
 
Ontologies in Physical Science
Ontologies in Physical ScienceOntologies in Physical Science
Ontologies in Physical Sciencepetermurrayrust
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureTheContentMine
 
ContentMining and Clinical Trials
ContentMining and Clinical TrialsContentMining and Clinical Trials
ContentMining and Clinical Trialspetermurrayrust
 
ContentMining and Clinical Trials
ContentMining and Clinical TrialsContentMining and Clinical Trials
ContentMining and Clinical TrialsTheContentMine
 
Open scholarship [a FOSTER open science talk]
Open scholarship [a FOSTER open science talk]Open scholarship [a FOSTER open science talk]
Open scholarship [a FOSTER open science talk]Ross Mounce
 
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Ross Mounce
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in NeuroscienceTheContentMine
 

What's hot (20)

ContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and thesesContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and theses
 
The Content Mine (presented at UKSG)
The Content Mine (presented at UKSG)The Content Mine (presented at UKSG)
The Content Mine (presented at UKSG)
 
Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)
 
Petermrjisc20141201
Petermrjisc20141201Petermrjisc20141201
Petermrjisc20141201
 
Disruptive Communities and Technology
Disruptive Communities and TechnologyDisruptive Communities and Technology
Disruptive Communities and Technology
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trust
 
Copyright Reform and Open Data
Copyright Reform and Open DataCopyright Reform and Open Data
Copyright Reform and Open Data
 
Embrace the Open Revolution
Embrace the Open RevolutionEmbrace the Open Revolution
Embrace the Open Revolution
 
ContentMine and WikiData
ContentMine and WikiDataContentMine and WikiData
ContentMine and WikiData
 
Principles and practice of Open Science
Principles and practice of Open SciencePrinciples and practice of Open Science
Principles and practice of Open Science
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 
ContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesContentMine: Open Data and Social Machines
ContentMine: Open Data and Social Machines
 
The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData
 
Ontologies in Physical Science
Ontologies in Physical ScienceOntologies in Physical Science
Ontologies in Physical Science
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the Literature
 
ContentMining and Clinical Trials
ContentMining and Clinical TrialsContentMining and Clinical Trials
ContentMining and Clinical Trials
 
ContentMining and Clinical Trials
ContentMining and Clinical TrialsContentMining and Clinical Trials
ContentMining and Clinical Trials
 
Open scholarship [a FOSTER open science talk]
Open scholarship [a FOSTER open science talk]Open scholarship [a FOSTER open science talk]
Open scholarship [a FOSTER open science talk]
 
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 

Viewers also liked

How Chinese Companies Make Investment Decisions in the U.S.
How Chinese Companies Make Investment Decisions in the U.S.How Chinese Companies Make Investment Decisions in the U.S.
How Chinese Companies Make Investment Decisions in the U.S.IntelCollab.com
 
Competitive intelligence in action final copy 1 english
Competitive intelligence in action final copy 1 englishCompetitive intelligence in action final copy 1 english
Competitive intelligence in action final copy 1 englishtom tao
 
Llb sc u 1.4 law of agency
Llb sc u 1.4 law of agencyLlb sc u 1.4 law of agency
Llb sc u 1.4 law of agencyRai University
 
Diari del 3 de novembre de 2014
Diari del 3 de novembre de 2014Diari del 3 de novembre de 2014
Diari del 3 de novembre de 2014diarimes
 
Diari del 10 de novembre de 2014
Diari del 10 de novembre de 2014Diari del 10 de novembre de 2014
Diari del 10 de novembre de 2014diarimes
 
1 massacre da lapa o delator e a tropicália maoista final
1 massacre da lapa o delator e a tropicália maoista final1 massacre da lapa o delator e a tropicália maoista final
1 massacre da lapa o delator e a tropicália maoista finalMarcia Nascimento
 
Loans, Marketing, Strategy And Many More
Loans, Marketing, Strategy And Many More Loans, Marketing, Strategy And Many More
Loans, Marketing, Strategy And Many More Rahul Tiwari
 
ISI TICs - SENAI INSTITUTES and status - english version v3
ISI TICs - SENAI INSTITUTES and status - english version v3ISI TICs - SENAI INSTITUTES and status - english version v3
ISI TICs - SENAI INSTITUTES and status - english version v3Jones Albuquerque
 
Webcast - Creative Best Practices for Mortgage Marketing
Webcast - Creative Best Practices for Mortgage MarketingWebcast - Creative Best Practices for Mortgage Marketing
Webcast - Creative Best Practices for Mortgage MarketingHarland Clarke
 
Apresentação (EQV– Encontro da Qualidade de Vida) Oficial
Apresentação (EQV– Encontro da Qualidade de Vida) OficialApresentação (EQV– Encontro da Qualidade de Vida) Oficial
Apresentação (EQV– Encontro da Qualidade de Vida) OficialHaroldo Jr Lima
 
Aci 318 08-seismic-requirements-l e garcia
Aci 318 08-seismic-requirements-l e garciaAci 318 08-seismic-requirements-l e garcia
Aci 318 08-seismic-requirements-l e garciaAycin Mungan
 
Haroldo Lima: Dilma, convoque eleições já!
Haroldo Lima: Dilma, convoque eleições já!Haroldo Lima: Dilma, convoque eleições já!
Haroldo Lima: Dilma, convoque eleições já!44610
 
March 2010 - New Regulation for the Oil Sector - A Salty Debate
March 2010 - New Regulation for the Oil Sector - A Salty DebateMarch 2010 - New Regulation for the Oil Sector - A Salty Debate
March 2010 - New Regulation for the Oil Sector - A Salty DebateFGV Brazil
 
Atlas of economic complexity part I
Atlas of economic complexity part IAtlas of economic complexity part I
Atlas of economic complexity part IMarcos CAVALCANTI
 

Viewers also liked (17)

Cultural event presentation_khurram
Cultural event presentation_khurramCultural event presentation_khurram
Cultural event presentation_khurram
 
How Chinese Companies Make Investment Decisions in the U.S.
How Chinese Companies Make Investment Decisions in the U.S.How Chinese Companies Make Investment Decisions in the U.S.
How Chinese Companies Make Investment Decisions in the U.S.
 
Competitive intelligence in action final copy 1 english
Competitive intelligence in action final copy 1 englishCompetitive intelligence in action final copy 1 english
Competitive intelligence in action final copy 1 english
 
Llb sc u 1.4 law of agency
Llb sc u 1.4 law of agencyLlb sc u 1.4 law of agency
Llb sc u 1.4 law of agency
 
Diari del 3 de novembre de 2014
Diari del 3 de novembre de 2014Diari del 3 de novembre de 2014
Diari del 3 de novembre de 2014
 
Diari del 10 de novembre de 2014
Diari del 10 de novembre de 2014Diari del 10 de novembre de 2014
Diari del 10 de novembre de 2014
 
1 massacre da lapa o delator e a tropicália maoista final
1 massacre da lapa o delator e a tropicália maoista final1 massacre da lapa o delator e a tropicália maoista final
1 massacre da lapa o delator e a tropicália maoista final
 
Loan
Loan Loan
Loan
 
Loans, Marketing, Strategy And Many More
Loans, Marketing, Strategy And Many More Loans, Marketing, Strategy And Many More
Loans, Marketing, Strategy And Many More
 
ISI TICs - SENAI INSTITUTES and status - english version v3
ISI TICs - SENAI INSTITUTES and status - english version v3ISI TICs - SENAI INSTITUTES and status - english version v3
ISI TICs - SENAI INSTITUTES and status - english version v3
 
Webcast - Creative Best Practices for Mortgage Marketing
Webcast - Creative Best Practices for Mortgage MarketingWebcast - Creative Best Practices for Mortgage Marketing
Webcast - Creative Best Practices for Mortgage Marketing
 
Apresentação (EQV– Encontro da Qualidade de Vida) Oficial
Apresentação (EQV– Encontro da Qualidade de Vida) OficialApresentação (EQV– Encontro da Qualidade de Vida) Oficial
Apresentação (EQV– Encontro da Qualidade de Vida) Oficial
 
Aci 318 08-seismic-requirements-l e garcia
Aci 318 08-seismic-requirements-l e garciaAci 318 08-seismic-requirements-l e garcia
Aci 318 08-seismic-requirements-l e garcia
 
Haroldo Lima: Dilma, convoque eleições já!
Haroldo Lima: Dilma, convoque eleições já!Haroldo Lima: Dilma, convoque eleições já!
Haroldo Lima: Dilma, convoque eleições já!
 
March 2010 - New Regulation for the Oil Sector - A Salty Debate
March 2010 - New Regulation for the Oil Sector - A Salty DebateMarch 2010 - New Regulation for the Oil Sector - A Salty Debate
March 2010 - New Regulation for the Oil Sector - A Salty Debate
 
Atlas of economic complexity part I
Atlas of economic complexity part IAtlas of economic complexity part I
Atlas of economic complexity part I
 
PORQUE SOY UJIER?
PORQUE SOY UJIER?PORQUE SOY UJIER?
PORQUE SOY UJIER?
 

Similar to ContentMine and WikiData

Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the Literaturepetermurrayrust
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and  Medicine from the scholarly literatureAutomatic Extraction of Science and  Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literaturepetermurrayrust
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literatureAutomatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literatureTheContentMine
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic Biologypetermurrayrust
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic BiologyTheContentMine
 
Climate Change and Human Migration
Climate Change and Human MigrationClimate Change and Human Migration
Climate Change and Human Migrationpetermurrayrust
 
Rapid biomedical search
Rapid biomedical search Rapid biomedical search
Rapid biomedical search petermurrayrust
 
Paradise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to MineParadise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to Minepetermurrayrust
 
Scientific search for everyone
Scientific search for everyoneScientific search for everyone
Scientific search for everyonepetermurrayrust
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in NeuroscienceTheContentMine
 
The culture of researchData
The culture of researchDataThe culture of researchData
The culture of researchDatapetermurrayrust
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome TrustTheContentMine
 
Automatic mining of data from materials science literature
Automatic mining of data from materials science literatureAutomatic mining of data from materials science literature
Automatic mining of data from materials science literaturepetermurrayrust
 
Open Knowledge and University of Cambridge European Bioinformatics Institute
Open Knowledge and University of Cambridge European Bioinformatics InstituteOpen Knowledge and University of Cambridge European Bioinformatics Institute
Open Knowledge and University of Cambridge European Bioinformatics InstituteTheContentMine
 
The Culture of Research Data, by Peter Murray-Rust
The Culture of Research Data, by Peter Murray-RustThe Culture of Research Data, by Peter Murray-Rust
The Culture of Research Data, by Peter Murray-RustLEARN Project
 
Content Mining for Machines and Humans
Content Mining for Machines and HumansContent Mining for Machines and Humans
Content Mining for Machines and HumansTheContentMine
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literaturepetermurrayrust
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature TheContentMine
 
Disruptive Communities and Technology
Disruptive Communities and TechnologyDisruptive Communities and Technology
Disruptive Communities and TechnologyTheContentMine
 
Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016Jisc
 

Similar to ContentMine and WikiData (20)

Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the Literature
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and  Medicine from the scholarly literatureAutomatic Extraction of Science and  Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literature
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literatureAutomatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literature
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic Biology
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic Biology
 
Climate Change and Human Migration
Climate Change and Human MigrationClimate Change and Human Migration
Climate Change and Human Migration
 
Rapid biomedical search
Rapid biomedical search Rapid biomedical search
Rapid biomedical search
 
Paradise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to MineParadise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to Mine
 
Scientific search for everyone
Scientific search for everyoneScientific search for everyone
Scientific search for everyone
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 
The culture of researchData
The culture of researchDataThe culture of researchData
The culture of researchData
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trust
 
Automatic mining of data from materials science literature
Automatic mining of data from materials science literatureAutomatic mining of data from materials science literature
Automatic mining of data from materials science literature
 
Open Knowledge and University of Cambridge European Bioinformatics Institute
Open Knowledge and University of Cambridge European Bioinformatics InstituteOpen Knowledge and University of Cambridge European Bioinformatics Institute
Open Knowledge and University of Cambridge European Bioinformatics Institute
 
The Culture of Research Data, by Peter Murray-Rust
The Culture of Research Data, by Peter Murray-RustThe Culture of Research Data, by Peter Murray-Rust
The Culture of Research Data, by Peter Murray-Rust
 
Content Mining for Machines and Humans
Content Mining for Machines and HumansContent Mining for Machines and Humans
Content Mining for Machines and Humans
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature
 
Disruptive Communities and Technology
Disruptive Communities and TechnologyDisruptive Communities and Technology
Disruptive Communities and Technology
 
Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016
 

More from petermurrayrust

Omdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital AgeOmdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital Agepetermurrayrust
 
Open Science Principles and Practice
Open Science Principles and PracticeOpen Science Principles and Practice
Open Science Principles and Practicepetermurrayrust
 
Open Virus Indian Presentation
Open Virus Indian PresentationOpen Virus Indian Presentation
Open Virus Indian Presentationpetermurrayrust
 
Can machines understand the scientific literature?
Can machines understand the scientific literature?Can machines understand the scientific literature?
Can machines understand the scientific literature?petermurrayrust
 
OpenVirus at OpenPublishingFest
OpenVirus at OpenPublishingFestOpenVirus at OpenPublishingFest
OpenVirus at OpenPublishingFestpetermurrayrust
 
Open Virus Indian Presentation
Open Virus Indian PresentationOpen Virus Indian Presentation
Open Virus Indian Presentationpetermurrayrust
 
openVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on virusesopenVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on virusespetermurrayrust
 
XML for science; its huge potential; but are pubiishers preventing it?
XML for science; its huge potential; but are pubiishers preventing it?XML for science; its huge potential; but are pubiishers preventing it?
XML for science; its huge potential; but are pubiishers preventing it?petermurrayrust
 
Early Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be BraveEarly Career Reseachers in Science. Start Early, Be Open , Be Brave
ContentMine and WikiData

  • 1. ContentMine and WikiData Peter Murray-Rust Wikimania, London UK 2014-08-08
  • 2. ContentMine: We use machines to liberate 100 million facts /yr from the scientific literature and make them free for everyone (WikiData) With Wikipedia we are ALL scientists ContentMine is a social machine WikiData is the future of science data
  • 3. http://en.wikipedia.org/wiki/Tim_Berners-Lee Everything in this presentation is ODOSOS (Open Data, Open Standards, Open Source) CC0, CC-BY, W3C etc., Apache2, etc. * http://contentmine.org http://bitbucket.org/petermr http://wwmm.ch.cam.ac.uk *Sorry about the Powerpoint (Power corrupts, Powerpoint corrupts absolutely (Tufte)) A promise: I (Petermr) will never sell out to non-transparent organizations.
  • 4. petermr: I believe in Wikipedia • 2006 http://en.wikipedia.org/wiki/User:Petermr • 2006 started Open Data (term unknown then!) • 2009: “the bit of Wikipedia that I wrote is correct” [challenging the idea of “WP is junk”] • 2009: “Wikipedia is the digital library of this century” • 2012: I alert WP that Springer has copyrighted > 1000 of our images [Springergate] • 2014: “For facts in maths, physical and biological sciences I trust Wikipedia.” (Wikimania2014)
  • 6. Volunteer community in chemistry: Open Data/Source/Standards
  • 7. Scientific and Medical publication (STM)[+] • World Citizens pay $400,000,000,000… • … for research in 1,500,000 articles … • … cost $300,000 each to create … • … $7000 each to “publish” [*]… • … $10,000,000,000 from academic libraries … • … to “publishers” who forbid access to 99.9% of citizens of the world … [+] Figures probably ±50% [*] arXiv preprint server costs $7 USD per paper
  • 8. 4 Billion USD on human genome yielded 800 Billion USD and 4 M job-years
  • 10. …three problems—flawed design, non-publication, and poor reporting—together meant >85% of research funds were wasted, a global total loss >100 billion USD per year. [Lancet 2009] [Even more] waste clearly occurs after publication: from poor access, poor dissemination, and poor uptake of the findings of research. [PLOS Medicine 2014-05-27] Bad publication wastes science
  • 11. Publishers’ PDFs destroy science PDFs do not contain words or subscripts! PDFs do not contain tables and do not have columns SVG is turned into JPEG because it’s easier to process
  • 12. Elsevier wants to control Open Data [asked by Michelle Brook]
  • 13. STM Publishers Licence 2012_03_15_Sample_Licence_Text_Data_Mining.pdf (Summary: PMR has NO rights) • [cannot publish to: ] “libraries, repositories, or archives” • [cannot] “Make the results of any TDM Output available on an externally facing server or website” • “Subscriber shall pay a […] fee” Heather Piwowar: “negotiating with publishers [made me physically ill]” WE WALKED OUT • Brit Library • JISC • RLUK • OKFN • … • Ross Mounce • PM-R Licences destroy Content Mining
  • 14. CLOSED ACCESS MEANS PEOPLE DIE CLOSED DATA MEANS PEOPLE DIE
  • 16. http://www.budapestopenaccessinitiative.org/read … an unprecedented public good. … … completely free and unrestricted access to [peer-reviewed literature] by all scientists, scholars, teachers, students, and other curious minds. … …Removing access barriers to this literature will accelerate research, enrich education, share the learning of the rich with the poor and the poor with the rich, make this literature as useful as it can be, and lay the foundation for uniting humanity in a common intellectual conversation and quest for knowledge. (Budapest Open Access Initiative, 2002)
  • 17. The Right to Read is the Right to Mine http://contentmine.org
  • 18. • Science can be read and understood by human-machine Amanuensis-symbionts. • Amanuenses are based on Wikipedia, databases and software (e.g. ContentMine’s AMI) • The results are fed back into WP and WikiData http://en.wikipedia.org/wiki/Symbiosis http://en.wikipedia.org/wiki/Eric_Fenby
  • 19. • Crawl scientific literature (Open Bibliography) • Scrape each scientific article (ContentMine-quickscrape) • Extract the facts (ContentMine-AMI) • Index (Wikipedia) • Republish (WikiData) Machine Extraction of scientific facts
  • 20. Human-machine symbionts can read science! WP_Lion WP_Aspergillus_oryzae WP_Soybean
  • 21. Facts Marked by “non-scientists” in ContentMine workshops With Wikipedia everyone can be a scientist
  • 22. “nuggets” in a scientific paper quantity units Value ranges Humans aren’t designed to mine this …  chemical project places
  • 23. Parsing chemical sentences A FACT, uncopyrightable, and representable by triples
  • 25. Open Content Mining of FACTs Machines can interpret chemical reactions We have done 500,000 patents. There are > 3,000,000 reactions/year. Added value > 1B Eur.
  • 26. RSU: Richard Smith-Unna PMR: Peter Murray-Rust CL: CottageLabs Queues Repos Scientific literature Science Plugins Science Volunteers
  • 27. But we can now turn PDFs into Science We can’t turn a hamburger into a cow
  • 30. Bacterial WP_phylogenetic tree Our machines have read and interpreted 4300 in an hour with > 95% accuracy Trees From http://ijs.sgmjournals.org/ used under new UK legislation (Hargreaves) WP: Clostridium_butyricum Genbank ID American Type Culture Collection
  • 31. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0036933 (“Adaptive Evolution of HIV at HLA Epitopes Is Associated with Ethnicity in Canada”) ((n122,((n121,n205),((n39,(n84,((((n35,n98),n191),n22),n17))),((n10,n182),((((n232,n76),n68),(n109,n30)),(n73,(n106,n58))))))),((((((n103,n86),(n218,(n215,n157))),((n164,n143),((n190,((n108,n177),(n192,n220))),((n233,n187),n41)))),((((n59,n184),((n134,n200),(n137,(n212,((n92,n209),n29))))),(n88,(n102,n161))),((((n70,n140),(n18,n188)),(n49,((n123,n132),(n219,n198)))),(((n37,(n65,n46)),(n135,(n11,(n113,n142)))),(n210,((n69,(n216,n36)),(n231,n160))))))),(((n107,n43),((n149,n199),n74)),(((n101,(n19,n54)),n96),(n7,((n139,n5),((n170,(n25,n75)),(n146,(n154,(n194,(((n14,n116),n112),(n126,n222))))))))))),(((((n165,(n168,n128)),n129),((n114,n181),(n48,n118))),((n158,(n91,(n33,n213))),(n87,n235))),((n197,(n175,n117)),(n196,((n171,(n163,n227)),((n53,n131),n159))))))); http://en.wikipedia.org/wiki/Digital_image_processing http://en.wikipedia.org/wiki/Newick_format http://en.wikipedia.org/wiki/Phylogenetics
  • 32. Open notebook science is the practice of making the entire primary record of a research project publicly available online as it is recorded. (WP) Jean-Claude Bradley was a chemist who actively promoted Open Science in chemistry,… He coined the term Open Notebook Science. … A memorial symposium was held July 14, 2014 at Cambridge University, UK.
  • 33. RSU: Richard Smith-Unna PMR: Peter Murray-Rust CL: CottageLabs Queues Repos Scientific literature Science Plugins Science Volunteers
  • 34. My Wikiwishes • An Open Bibliography of science, updated daily • An interface for ContentMine to feed new facts into WikiData • Domain-specific enthusiasts to create and run fact extraction and validation • Wikipedia to become a C21 publisher of science
  • 35. Thanks • Shuttleworth Foundation and Fellowship • Contentmine.org: Michelle Brook, Jenny Molloy, Ross Mounce, Richard Smith-Unna, CottageLabs, Charles Oppenheim • Open Knowledge Foundation Community • Wikimedia Community • Blue Obelisk Community
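
Slide 23 describes a fact as "representable by triples", and slide 31 shows a phylogenetic tree that machines recovered from a figure as a Newick string. The sketch below connects the two ideas: it parses a topology-only Newick string and emits one (subject, predicate, object) triple per branch. It is an illustration under stated assumptions, not ContentMine's AMI code: the function names `parse_newick` and `to_triples`, the `childOf` predicate, and the `node0`, `node1`, … identifiers for unnamed internal nodes are all hypothetical. The parser handles only leaf names, commas and parentheses (no branch lengths or internal labels), and it discards all whitespace, which suits line-wrapped text like the slide's but would damage names that legitimately contain spaces.

```python
from typing import List, Tuple, Union

# A tree is either a leaf name or a list of child trees.
Tree = Union[str, List["Tree"]]


def parse_newick(text: str) -> Tree:
    """Parse a topology-only Newick string (leaf names, commas, parentheses)."""
    s = "".join(text.split())  # drop whitespace introduced by line wrapping
    if not s.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    s = s[:-1]
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if pos < len(s) and s[pos] == "(":
            pos += 1  # consume '('
            children = [node()]
            while pos < len(s) and s[pos] == ",":
                pos += 1  # consume ','
                children.append(node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
            return children
        start = pos
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        if start == pos:
            raise ValueError(f"expected a leaf name at position {pos}")
        return s[start:pos]

    tree = node()
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def to_triples(tree: Tree) -> List[Tuple[str, str, str]]:
    """Emit one (child, 'childOf', parent) triple per branch of the tree."""
    triples: List[Tuple[str, str, str]] = []
    counter = 0

    def walk(t: Tree, parent: Union[str, None]) -> None:
        nonlocal counter
        if isinstance(t, str):
            name = t
        else:
            name = f"node{counter}"  # hypothetical ID for an unnamed internal node
            counter += 1
        if parent is not None:
            triples.append((name, "childOf", parent))
        if not isinstance(t, str):
            for child in t:
                walk(child, name)

    walk(tree, None)
    return triples


if __name__ == "__main__":
    # A small fragment of the tree shown on slide 31.
    fragment = "(((n35,n98),n191),n22);"
    for triple in to_triples(parse_newick(fragment)):
        print(triple)
```

Each triple is a plain statement about the tree's topology, which is the kind of uncopyrightable fact the talk proposes feeding into WikiData. Mapping the hypothetical `childOf` predicate onto real WikiData properties is outside the scope of this sketch.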