SlideShare a Scribd company logo
MIOSS 2016, EBI, UK, 2016-
05-17
Open Content +
Open Programs
Peter Murray-Rust1,2
[1]University of Cambridge
[2]TheContentMine
pm286 AT cam DOT ac DOT uk
Open
Software
Articles
Data
Infrastructure
Themes
• Open :
– Faster
– Better
– Agile
– Inclusive
– Re-usable
Pomerantz, J. and Peek, R. 2016. Fifty shades of open
http://dx.doi.org/10.5210/fm.v21i5.6360
Open source. Open access. Open society. Open knowledge. Open government. Even open food. The word “open” has been applied to a wide variety of
words to create new terms, some of which make sense, and some not so much. This essay disambiguates the many meanings of the word “open” as
it is used in a wide range of contexts.
• Demos:
– OSCAR, OPSIN, etc.
– Content Mining and Annotation
Open Source Demos
• Centre for Molecular Informatics
– OSCAR (chemical entity recognition)
– Opsin (name2structure)
– ChemicalTagger (chemical language parsing)
– [OSRIC] (chemical image interpretation)
• ContentMine
– getpapers, quickscrape, norma, ami, canary
– Mining the complete scholarly literature
• 10,000 articles per day
• > 1 million facts per day
Open: [state+private] investment
Rufus Pollock “The Open Information age” TBP 2016 [1]
CMI Software (OSCAR, OPSIN, ChemicalTagger, [OSRIC]): sponsors
– 2006 …
– Unilever
– Nature PG, RSC, Int Union of Cryst.
– EPSRC
– OMII
– NCI
– [Microsoft]
– JISC
– CambridgeIP
– Linguamatics
– … 2016
• Peter Corbett, Joe Townsend, Chris Waudby, Sam Adams, David Jessop, Lezan
Hawizy, Nico Adams, Mark Williamson, Andy Howlett, Daniel Lowe…
[1] https://www.youtube.com/watch?v=D2oNxhn6POA
Community
Mat Todd (Sydney) and MANY collaborators
http://opensourcemalaria.org/ (Chrome for interactivity)
Mat Todd, Univ Sydney, runs an Open Notebook community
to create new antimalarials.
Interactive OPEN chemical search tool from cheminfo.org
Interactive OPEN molecular display Jmol (Bob Hanson et al)
Interactive OPEN chemical search tool from cheminfo.org
data is associated with the proposed
scientific endeavour prior to or at the
point of creation rather than by
annotating the data with commentary
after the experiment has taken place
University of Southampton
Natural Language Processing
Part of speech tagging (Wordnet, Brown Corpus, etc.)
http://chemicaltagger.ch.cam.ac.uk/
• Typical
Typical chemical synthesis
Automatic semantic markup of chemistry
Could be used for analytical, crystallization, etc.
Parsing chemical sentences
This could be extended to much other scientific language
Open Content Mining of FACTs
Machines can interpret chemical reactions
We have done 500,000 patents. There are >
3,000,000 reactions/year. Added value > 1B Eur.
AMI https://bitbucket.org/petermr/xhtml2stm/wiki/Home
Example reaction scheme, taken from MDPI Metabolites 2012, 2, 100-133; page 8, CC-BY:
AMI reads the complete diagram,
recognizes the paths and
generates the molecules. Then
she creates a stop-fram animation
showing how the 12 reactions
lead into each other
CLICK HERE FOR ANIMATION
(may be browser dependent)
“OSRIC”
Is anyone interested
In taking this further?
@Senficon (Julia Reda) :Text & Data mining in times of
#copyright maximalism:
"Elsevier stopped me doing my research"
http://onsnetwork.org/chartgerink/2015/11/16/elsevi
er-stopped-me-doing-my-research/ … #opencon #TDM
Elsevier stopped me doing my research
Chris Hartgerink
I am a statistician interested in detecting potentially problematic research such as data fabrication,
which results in unreliable findings and can harm policy-making, confound funding decisions, and
hampers research progress.
To this end, I am content mining results reported in the psychology literature. Content mining the
literature is a valuable avenue of investigating research questions with innovative methods. For
example, our research group has written an automated program to mine research papers for errors in
the reported results and found that 1/8 papers (of 30,000) contains at least one result that could
directly influence the substantive conclusion [1].
In new research, I am trying to extract test results, figures, tables, and other information reported in
papers throughout the majority of the psychology literature. As such, I need the research papers
published in psychology that I can mine for these data. To this end, I started ‘bulk’ downloading research
papers from, for instance, Sciencedirect. I was doing this for scholarly purposes and took into account
potential server load by limiting the amount of papers I downloaded per minute to 9. I had no intention
to redistribute the downloaded materials, had legal access to them because my university pays a
subscription, and I only wanted to extract facts from these papers.
Full disclosure, I downloaded approximately 30GB of data from Sciencedirect in approximately 10 days.
This boils down to a server load of 0.0021GB/[min], 0.125GB/h, 3GB/day.
Approximately two weeks after I started downloading psychology research papers, Elsevier notified my
university that this was a violation of the access contract, that this could be considered stealing of
content, and that they wanted it to stop. My librarian explicitly instructed me to stop downloading
(which I did immediately), otherwise Elsevier would cut all access to Sciencedirect for my university.
I am now not able to mine a substantial part of the literature, and because of this Elsevier is directly
hampering me in my research.
[1] Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2015). The
prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 1–22.
doi: 10.3758/s13428-015-0664-2
Chris Hartgerink’s blog post
Julia Reda, Pirate MEP, running ContentMine
software to liberate science 2016-04-16
The Right to Read is the Right to Mine
http://contentmine.org
What is “Content”?
http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.01113
03&representation=PDF CC-BY
SECTIONS
MAPS
TABLES
CHEMISTRY
TEXT
MATH
contentmine.org tackles these
catalogue
getpapers
query
Daily
Crawl
EuPMC, arXiv
CORE , HAL,
(UNIV repos)
ToC
services
PDF HTML
DOC ePUB
TeX XML
PNG
EPS CSV
XLSURLs
DOIs
crawl
quickscrape
norma
Normalizer
Structurer
Semantic
Tagger
Text
Data
Figures
ami
UNIV
Repos
search
Lookup
CONTENT
MINING
Chem
Phylo
Trials
Crystal
Plants
COMMUNITY
plugins
Visualization
and Analysis
PloSONE, BMC,
peerJ… Nature, IEEE,
Elsevier…
Publisher Sites
scrapers
queries
taggers
abstract
methods
references
Captioned
Figures
Fig. 1
HTML tables
30, 000 pages/day
Semantic ScholarlyHTML
Facts
Latest 20150908
Mining for phytochemicals
• getpapers –q carvone –o carvone –x –k 100
Search for “carvone”, output directory, XML, limit hits to 100
• –q carvne –o carvone –x –k 100
Search for “carvone”, output directory, XML, limit hits to 100
getpapers

More Related Content

What's hot

Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature
TheContentMine
 
Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016 Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016
TheContentMine
 
Digital Scholarship: Enlightenment or Devastated Landscape?
Digital Scholarship: Enlightenment or Devastated Landscape? Digital Scholarship: Enlightenment or Devastated Landscape?
Digital Scholarship: Enlightenment or Devastated Landscape?
TheContentMine
 
Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)
TheContentMine
 
Cochrane workshop 2016
Cochrane workshop 2016Cochrane workshop 2016
Cochrane workshop 2016
TheContentMine
 
The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData
TheContentMine
 
Content Mining of Science in Cambridge
Content Mining of Science in CambridgeContent Mining of Science in Cambridge
Content Mining of Science in Cambridge
TheContentMine
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and  Medicine from the scholarly literatureAutomatic Extraction of Science and  Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literature
petermurrayrust
 
Cochrane workshop2016
Cochrane workshop2016Cochrane workshop2016
Cochrane workshop2016
petermurrayrust
 
Mining Scientific Images
Mining Scientific ImagesMining Scientific Images
Mining Scientific Images
TheContentMine
 
Mining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistryMining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistry
petermurrayrust
 
Biovision2017 Accessing the scientific literature
Biovision2017 Accessing the scientific literatureBiovision2017 Accessing the scientific literature
Biovision2017 Accessing the scientific literature
petermurrayrust
 
ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!
petermurrayrust
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIH
petermurrayrust
 
ContentMine (TDM) at JISC Digifest
ContentMine (TDM) at JISC DigifestContentMine (TDM) at JISC Digifest
ContentMine (TDM) at JISC Digifest
petermurrayrust
 
ContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific LiteratureContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific Literature
petermurrayrust
 
Content Mining of Science in Europe
Content Mining of Science in EuropeContent Mining of Science in Europe
Content Mining of Science in Europe
petermurrayrust
 
ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika! ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!
TheContentMine
 
Content Mining of Science and Medicine
Content Mining of Science and MedicineContent Mining of Science and Medicine
Content Mining of Science and Medicine
TheContentMine
 
Can machines understand the scientific literature
Can machines understand the scientific literatureCan machines understand the scientific literature
Can machines understand the scientific literature
petermurrayrust
 

What's hot (20)

Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature
 
Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016 Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016
 
Digital Scholarship: Enlightenment or Devastated Landscape?
Digital Scholarship: Enlightenment or Devastated Landscape? Digital Scholarship: Enlightenment or Devastated Landscape?
Digital Scholarship: Enlightenment or Devastated Landscape?
 
Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)
 
Cochrane workshop 2016
Cochrane workshop 2016Cochrane workshop 2016
Cochrane workshop 2016
 
The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData
 
Content Mining of Science in Cambridge
Content Mining of Science in CambridgeContent Mining of Science in Cambridge
Content Mining of Science in Cambridge
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and  Medicine from the scholarly literatureAutomatic Extraction of Science and  Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literature
 
Cochrane workshop2016
Cochrane workshop2016Cochrane workshop2016
Cochrane workshop2016
 
Mining Scientific Images
Mining Scientific ImagesMining Scientific Images
Mining Scientific Images
 
Mining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistryMining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistry
 
Biovision2017 Accessing the scientific literature
Biovision2017 Accessing the scientific literatureBiovision2017 Accessing the scientific literature
Biovision2017 Accessing the scientific literature
 
ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIH
 
ContentMine (TDM) at JISC Digifest
ContentMine (TDM) at JISC DigifestContentMine (TDM) at JISC Digifest
ContentMine (TDM) at JISC Digifest
 
ContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific LiteratureContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific Literature
 
Content Mining of Science in Europe
Content Mining of Science in EuropeContent Mining of Science in Europe
Content Mining of Science in Europe
 
ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika! ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!
 
Content Mining of Science and Medicine
Content Mining of Science and MedicineContent Mining of Science and Medicine
Content Mining of Science and Medicine
 
Can machines understand the scientific literature
Can machines understand the scientific literatureCan machines understand the scientific literature
Can machines understand the scientific literature
 

Viewers also liked

The OpenCon Intro to Open Data
The OpenCon Intro to Open DataThe OpenCon Intro to Open Data
The OpenCon Intro to Open Data
Ross Mounce
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
TheContentMine
 
TheContentMine: Mining for Everyone
TheContentMine: Mining for EveryoneTheContentMine: Mining for Everyone
TheContentMine: Mining for Everyone
TheContentMine
 
Asking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolismAsking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolism
petermurrayrust
 
Architecture of ContentMine Components contentmine.org
Architecture of ContentMine Components contentmine.orgArchitecture of ContentMine Components contentmine.org
Architecture of ContentMine Components contentmine.org
petermurrayrust
 
ContentMine (EMBL-EBI Industry Programme)
ContentMine (EMBL-EBI Industry Programme)ContentMine (EMBL-EBI Industry Programme)
ContentMine (EMBL-EBI Industry Programme)
Jenny Molloy
 
The Open Science Agenda in Europe: Policy convergence & diversity of approaches
The Open Science Agenda in Europe: Policy convergence & diversity of approachesThe Open Science Agenda in Europe: Policy convergence & diversity of approaches
The Open Science Agenda in Europe: Policy convergence & diversity of approaches
LIBER Europe
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspective
petermurrayrust
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trust
petermurrayrust
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UK
petermurrayrust
 
Mining Scientific Images
Mining Scientific ImagesMining Scientific Images
Mining Scientific Images
petermurrayrust
 
Asking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolismAsking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolism
petermurrayrust
 
Active actionable DMPs
Active actionable DMPsActive actionable DMPs
Active actionable DMPs
Sarah Jones
 
Mining Scientific Diagrams for facts
Mining Scientific Diagrams for factsMining Scientific Diagrams for facts
Mining Scientific Diagrams for facts
petermurrayrust
 

Viewers also liked (14)

The OpenCon Intro to Open Data
The OpenCon Intro to Open DataThe OpenCon Intro to Open Data
The OpenCon Intro to Open Data
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 
TheContentMine: Mining for Everyone
TheContentMine: Mining for EveryoneTheContentMine: Mining for Everyone
TheContentMine: Mining for Everyone
 
Asking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolismAsking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolism
 
Architecture of ContentMine Components contentmine.org
Architecture of ContentMine Components contentmine.orgArchitecture of ContentMine Components contentmine.org
Architecture of ContentMine Components contentmine.org
 
ContentMine (EMBL-EBI Industry Programme)
ContentMine (EMBL-EBI Industry Programme)ContentMine (EMBL-EBI Industry Programme)
ContentMine (EMBL-EBI Industry Programme)
 
The Open Science Agenda in Europe: Policy convergence & diversity of approaches
The Open Science Agenda in Europe: Policy convergence & diversity of approachesThe Open Science Agenda in Europe: Policy convergence & diversity of approaches
The Open Science Agenda in Europe: Policy convergence & diversity of approaches
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspective
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trust
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UK
 
Mining Scientific Images
Mining Scientific ImagesMining Scientific Images
Mining Scientific Images
 
Asking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolismAsking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolism
 
Active actionable DMPs
Active actionable DMPsActive actionable DMPs
Active actionable DMPs
 
Mining Scientific Diagrams for facts
Mining Scientific Diagrams for factsMining Scientific Diagrams for facts
Mining Scientific Diagrams for facts
 

Similar to Open software and knowledge for MIOSS

WikiFactMine: Science for Everyone
WikiFactMine: Science for EveryoneWikiFactMine: Science for Everyone
WikiFactMine: Science for Everyone
petermurrayrust
 
Early Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be BraveEarly Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be Brave
petermurrayrust
 
Big Data and ContentMining for Libraries
Big Data and ContentMining for LibrariesBig Data and ContentMining for Libraries
Big Data and ContentMining for Libraries
petermurrayrust
 
Early Career Reseachers and Open Healthcare
Early Career Reseachers and Open HealthcareEarly Career Reseachers and Open Healthcare
Early Career Reseachers and Open Healthcare
petermurrayrust
 
Disrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic ComplexDisrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic Complex
petermurrayrust
 
Rapid biomedical search
Rapid biomedical search Rapid biomedical search
Rapid biomedical search
petermurrayrust
 
The culture of researchData
The culture of researchDataThe culture of researchData
The culture of researchData
petermurrayrust
 
Paradise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to MineParadise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to Mine
petermurrayrust
 
The Culture of Research Data, by Peter Murray-Rust
The Culture of Research Data, by Peter Murray-RustThe Culture of Research Data, by Peter Murray-Rust
The Culture of Research Data, by Peter Murray-Rust
LEARN Project
 
Open access impact
Open access impactOpen access impact
Open access impact
Iryna Kuchma
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)
Duncan Hull
 
Digital Scholarship
Digital ScholarshipDigital Scholarship
Digital Scholarship
petermurrayrust
 
Evolution of e-Research
Evolution of e-ResearchEvolution of e-Research
Evolution of e-Research
David De Roure
 
Journal Club - Best Practices for Scientific Computing
Journal Club - Best Practices for Scientific ComputingJournal Club - Best Practices for Scientific Computing
Journal Club - Best Practices for Scientific Computing
Bram Zandbelt
 
Murpha11
Murpha11Murpha11
Murpha11
Philip Bourne
 
Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration
Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration
Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
The Evolution of e-Research: Machines, Methods and Music
The Evolution of e-Research: Machines, Methods and MusicThe Evolution of e-Research: Machines, Methods and Music
The Evolution of e-Research: Machines, Methods and Music
David De Roure
 
Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Ontologies for baby animals and robots From "baby stuff" to the world of adul...Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Aaron Sloman
 
Why ContentMining is useful
Why ContentMining is usefulWhy ContentMining is useful
Why ContentMining is useful
TheContentMine
 
Why ContentMining is useful
Why ContentMining is usefulWhy ContentMining is useful
Why ContentMining is useful
petermurrayrust
 

Similar to Open software and knowledge for MIOSS (20)

WikiFactMine: Science for Everyone
WikiFactMine: Science for EveryoneWikiFactMine: Science for Everyone
WikiFactMine: Science for Everyone
 
Early Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be BraveEarly Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be Brave
 
Big Data and ContentMining for Libraries
Big Data and ContentMining for LibrariesBig Data and ContentMining for Libraries
Big Data and ContentMining for Libraries
 
Early Career Reseachers and Open Healthcare
Early Career Reseachers and Open HealthcareEarly Career Reseachers and Open Healthcare
Early Career Reseachers and Open Healthcare
 
Disrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic ComplexDisrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic Complex
 
Rapid biomedical search
Rapid biomedical search Rapid biomedical search
Rapid biomedical search
 
The culture of researchData
The culture of researchDataThe culture of researchData
The culture of researchData
 
Paradise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to MineParadise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to Mine
 
The Culture of Research Data, by Peter Murray-Rust
The Culture of Research Data, by Peter Murray-RustThe Culture of Research Data, by Peter Murray-Rust
The Culture of Research Data, by Peter Murray-Rust
 
Open access impact
Open access impactOpen access impact
Open access impact
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)
 
Digital Scholarship
Digital ScholarshipDigital Scholarship
Digital Scholarship
 
Evolution of e-Research
Evolution of e-ResearchEvolution of e-Research
Evolution of e-Research
 
Journal Club - Best Practices for Scientific Computing
Journal Club - Best Practices for Scientific ComputingJournal Club - Best Practices for Scientific Computing
Journal Club - Best Practices for Scientific Computing
 
Murpha11
Murpha11Murpha11
Murpha11
 
Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration
Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration
Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration
 
The Evolution of e-Research: Machines, Methods and Music
The Evolution of e-Research: Machines, Methods and MusicThe Evolution of e-Research: Machines, Methods and Music
The Evolution of e-Research: Machines, Methods and Music
 
Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Ontologies for baby animals and robots From "baby stuff" to the world of adul...Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Ontologies for baby animals and robots From "baby stuff" to the world of adul...
 
Why ContentMining is useful
Why ContentMining is usefulWhy ContentMining is useful
Why ContentMining is useful
 
Why ContentMining is useful
Why ContentMining is usefulWhy ContentMining is useful
Why ContentMining is useful
 

More from petermurrayrust

Omdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital AgeOmdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital Age
petermurrayrust
 
Open Science Principles and Practice
Open Science Principles and PracticeOpen Science Principles and Practice
Open Science Principles and Practice
petermurrayrust
 
Open Virus Indian Presentation
Open Virus Indian PresentationOpen Virus Indian Presentation
Open Virus Indian Presentation
petermurrayrust
 
Can machines understand the scientific literature?
Can machines understand the scientific literature?Can machines understand the scientific literature?
Can machines understand the scientific literature?
petermurrayrust
 
OpenVirus at OpenPublishingFest
OpenVirus at OpenPublishingFestOpenVirus at OpenPublishingFest
OpenVirus at OpenPublishingFest
petermurrayrust
 
Open Virus Indian Presentation
Open Virus Indian PresentationOpen Virus Indian Presentation
Open Virus Indian Presentation
petermurrayrust
 
Automatic mining of data from materials science literature
Automatic mining of data from materials science literatureAutomatic mining of data from materials science literature
Automatic mining of data from materials science literature
petermurrayrust
 
Climate Change and Human Migration
Climate Change and Human MigrationClimate Change and Human Migration
Climate Change and Human Migration
petermurrayrust
 
openVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on virusesopenVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on viruses
petermurrayrust
 
XML for science; its huge potential; but are pubiishers preventing it?
XML for science; its huge potential; but are pubiishers preventing it?XML for science; its huge potential; but are pubiishers preventing it?
XML for science; its huge potential; but are pubiishers preventing it?
petermurrayrust
 
Scientific search for everyone
Scientific search for everyoneScientific search for everyone
Scientific search for everyone
petermurrayrust
 
Openplant2018 Poster; Semantic searching
Openplant2018 Poster; Semantic searchingOpenplant2018 Poster; Semantic searching
Openplant2018 Poster; Semantic searching
petermurrayrust
 
Extracting science from the archive
Extracting science from the archiveExtracting science from the archive
Extracting science from the archive
petermurrayrust
 
WikiFactMine: Ontology for Everybody and Everything
WikiFactMine: Ontology for Everybody and EverythingWikiFactMine: Ontology for Everybody and Everything
WikiFactMine: Ontology for Everybody and Everything
petermurrayrust
 
Young people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge NeocolonialismYoung people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge Neocolonialism
petermurrayrust
 
ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017
petermurrayrust
 
The mining "Revolution"; are Libraries supporting Researchers or Publishers"?
The mining "Revolution"; are Libraries supporting Researchers or Publishers"?The mining "Revolution"; are Libraries supporting Researchers or Publishers"?
The mining "Revolution"; are Libraries supporting Researchers or Publishers"?
petermurrayrust
 
WikiFactMine for Plant Chemistry
WikiFactMine for Plant ChemistryWikiFactMine for Plant Chemistry
WikiFactMine for Plant Chemistry
petermurrayrust
 

More from petermurrayrust (18)

Omdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital AgeOmdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital Age
 
Open Science Principles and Practice
Open Science Principles and PracticeOpen Science Principles and Practice
Open Science Principles and Practice
 
Open Virus Indian Presentation
Open Virus Indian PresentationOpen Virus Indian Presentation
Open Virus Indian Presentation
 
Can machines understand the scientific literature?
Can machines understand the scientific literature?Can machines understand the scientific literature?
Can machines understand the scientific literature?
 
OpenVirus at OpenPublishingFest
OpenVirus at OpenPublishingFestOpenVirus at OpenPublishingFest
OpenVirus at OpenPublishingFest
 
Open Virus Indian Presentation
Open Virus Indian PresentationOpen Virus Indian Presentation
Open Virus Indian Presentation
 
Automatic mining of data from materials science literature
Automatic mining of data from materials science literatureAutomatic mining of data from materials science literature
Automatic mining of data from materials science literature
 
Climate Change and Human Migration
Climate Change and Human MigrationClimate Change and Human Migration
Climate Change and Human Migration
 
openVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on virusesopenVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on viruses
 
XML for science; its huge potential; but are pubiishers preventing it?
XML for science; its huge potential; but are pubiishers preventing it?XML for science; its huge potential; but are pubiishers preventing it?
XML for science; its huge potential; but are pubiishers preventing it?
 
Scientific search for everyone
Scientific search for everyoneScientific search for everyone
Scientific search for everyone
 
Openplant2018 Poster; Semantic searching
Openplant2018 Poster; Semantic searchingOpenplant2018 Poster; Semantic searching
Openplant2018 Poster; Semantic searching
 
Extracting science from the archive
Extracting science from the archiveExtracting science from the archive
Extracting science from the archive
 
WikiFactMine: Ontology for Everybody and Everything
WikiFactMine: Ontology for Everybody and EverythingWikiFactMine: Ontology for Everybody and Everything
WikiFactMine: Ontology for Everybody and Everything
 
Young people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge NeocolonialismYoung people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge Neocolonialism
 
ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017
 
The mining "Revolution"; are Libraries supporting Researchers or Publishers"?
The mining "Revolution"; are Libraries supporting Researchers or Publishers"?The mining "Revolution"; are Libraries supporting Researchers or Publishers"?
The mining "Revolution"; are Libraries supporting Researchers or Publishers"?
 
WikiFactMine for Plant Chemistry
WikiFactMine for Plant ChemistryWikiFactMine for Plant Chemistry
WikiFactMine for Plant Chemistry
 

Recently uploaded

Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
Leonel Morgado
 
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
Advanced-Concepts-Team
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of ProteinsGBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
Areesha Ahmad
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
Daniel Tubbenhauer
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
by6843629
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdf
European Sustainable Phosphorus Platform
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
RitabrataSarkar3
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Texas Alliance of Groundwater Districts
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
Aditi Bajpai
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
terusbelajar5
 
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdfwaterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
LengamoLAppostilic
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
TinyAnderson
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
MAGOTI ERNEST
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
University of Maribor
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
Sérgio Sacani
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
Sharon Liu
 
Direct Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart AgricultureDirect Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart Agriculture
International Food Policy Research Institute- South Asia Office
 

Recently uploaded (20)

Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
 
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of ProteinsGBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdf
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
 
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdfwaterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 
Direct Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart AgricultureDirect Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart Agriculture
 

Open software and knowledge for MIOSS

  • 1. MIOSS 2016, EBI, UK, 2016- 05-17 Open Content + Open Programs Peter Murray-Rust1,2 [1]University of Cambridge [2]TheContentMine pm286 AT cam DOT ac DOT uk Open Software Articles Data Infrastructure
  • 2. Themes • Open : – Faster – Better – Agile – Inclusive – Re-usable Pomerantz, J. and Peek, R. 2016. Fifty shades of open http://dx.doi.org/10.5210/fm.v21i5.6360 Open source. Open access. Open society. Open knowledge. Open government. Even open food. The word “open” has been applied to a wide variety of words to create new terms, some of which make sense, and some not so much. This essay disambiguates the many meanings of the word “open” as it is used in a wide range of contexts. • Demos: – OSCAR, OPSIN, etc. – Content Mining and Annotation
  • 3. Open Source Demos • Centre for Molecular Informatics – OSCAR (chemical entity recognition) – Opsin (name2structure) – ChemicalTagger (chemical language parsing) – [OSRIC] (chemical image interpretation) • ContentMine – getpapers, quickscrape, norma, ami, canary – Mining the complete scholarly literature • 10,000 articles per day • > 1 million facts per day
  • 4. Open: [state+private] investment Rufus Pollock “The Open Information age” TBP 2016 [1] CMI Software (OSCAR, OPSIN, ChemicalTagger, [OSRIC]): sponsors – 2006 … – Unilever – Nature PG, RSC, Int Union of Cryst. – EPSRC – OMII – NCI – [Microsoft] – JISC – CambridgeIP – Linguamatics – … 2016 • Peter Corbett, Joe Townsend, Chris Waudby, Sam Adams, David Jessop, Lezan Hawizy, Nico Adams, Mark Williamson, Andy Howlett, Daniel Lowe… [1] https://www.youtube.com/watch?v=D2oNxhn6POA
  • 6. Mat Todd (Sydney) and MANY collaborators http://opensourcemalaria.org/ (Chrome for interactivity) Mat Todd, Univ Sydney, runs an Open Notebook community to create new antimalarials.
  • 7. Interactive OPEN chemical search tool from cheminfo.org
  • 8. Interactive OPEN molecular display Jmol (Bob Hanson et al)
  • 9. Interactive OPEN chemical search tool from cheminfo.org
  • 10. data is associated with the proposed scientific endeavour prior to or at the point of creation rather than by annotating the data with commentary after the experiment has taken place University of Southampton
  • 11. Natural Language Processing Part of speech tagging (Wordnet, Brown Corpus, etc.)
  • 13. Automatic semantic markup of chemistry Could be used for analytical, crystallization, etc.
  • 14. Parsing chemical sentences This could be extended to much other scientific language
  • 15. Open Content Mining of FACTs Machines can interpret chemical reactions We have done 500,000 patents. There are > 3,000,000 reactions/year. Added value > 1B Eur.
  • 16. AMI https://bitbucket.org/petermr/xhtml2stm/wiki/Home Example reaction scheme, taken from MDPI Metabolites 2012, 2, 100-133; page 8, CC-BY: AMI reads the complete diagram, recognizes the paths and generates the molecules. Then she creates a stop-fram animation showing how the 12 reactions lead into each other CLICK HERE FOR ANIMATION (may be browser dependent) “OSRIC” Is anyone interested In taking this further?
  • 17. @Senficon (Julia Reda) :Text & Data mining in times of #copyright maximalism: "Elsevier stopped me doing my research" http://onsnetwork.org/chartgerink/2015/11/16/elsevi er-stopped-me-doing-my-research/ … #opencon #TDM Elsevier stopped me doing my research Chris Hartgerink
  • 18. I am a statistician interested in detecting potentially problematic research such as data fabrication, which results in unreliable findings and can harm policy-making, confound funding decisions, and hampers research progress. To this end, I am content mining results reported in the psychology literature. Content mining the literature is a valuable avenue of investigating research questions with innovative methods. For example, our research group has written an automated program to mine research papers for errors in the reported results and found that 1/8 papers (of 30,000) contains at least one result that could directly influence the substantive conclusion [1]. In new research, I am trying to extract test results, figures, tables, and other information reported in papers throughout the majority of the psychology literature. As such, I need the research papers published in psychology that I can mine for these data. To this end, I started ‘bulk’ downloading research papers from, for instance, Sciencedirect. I was doing this for scholarly purposes and took into account potential server load by limiting the amount of papers I downloaded per minute to 9. I had no intention to redistribute the downloaded materials, had legal access to them because my university pays a subscription, and I only wanted to extract facts from these papers. Full disclosure, I downloaded approximately 30GB of data from Sciencedirect in approximately 10 days. This boils down to a server load of 0.0021GB/[min], 0.125GB/h, 3GB/day. Approximately two weeks after I started downloading psychology research papers, Elsevier notified my university that this was a violation of the access contract, that this could be considered stealing of content, and that they wanted it to stop. My librarian explicitly instructed me to stop downloading (which I did immediately), otherwise Elsevier would cut all access to Sciencedirect for my university. I am now not able to mine a substantial part of the literature, and because of this Elsevier is directly hampering me in my research. [1] Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2015). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 1–22. doi: 10.3758/s13428-015-0664-2 Chris Hartgerink’s blog post
  • 19. Julia Reda, Pirate MEP, running ContentMine software to liberate science 2016-04-16
  • 20. The Right to Read is the Right to Mine http://contentmine.org
  • 22. catalogue getpapers query Daily Crawl EuPMC, arXiv CORE , HAL, (UNIV repos) ToC services PDF HTML DOC ePUB TeX XML PNG EPS CSV XLSURLs DOIs crawl quickscrape norma Normalizer Structurer Semantic Tagger Text Data Figures ami UNIV Repos search Lookup CONTENT MINING Chem Phylo Trials Crystal Plants COMMUNITY plugins Visualization and Analysis PloSONE, BMC, peerJ… Nature, IEEE, Elsevier… Publisher Sites scrapers queries taggers abstract methods references Captioned Figures Fig. 1 HTML tables 30, 000 pages/day Semantic ScholarlyHTML Facts Latest 20150908
  • 23. Mining for phytochemicals • getpapers –q carvone –o carvone –x –k 100 Search for “carvone”, output directory, XML, limit hits to 100 • –q carvne –o carvone –x –k 100 Search for “carvone”, output directory, XML, limit hits to 100 getpapers