The Avoidable Waste of Scholarly Publishing
Peter Murray-Rust*,
ContentMine.org and the University of Cambridge
PLoS, Cambridge, UK 2015-07-09
Scholarly Publishing un/wittingly destroys huge amounts of publicly
funded research.
There are solutions; what is needed is will
Background
• ContentMine aims to make large areas of scientific fact OPEN (100
million facts/year)
• We’re working with WellcomeTrust, Europe PubMedCentral, etc.
• A politically “hot” area (Hargreaves legislation, EU activity)
• 2015 WellcomeTrust workshop on TDM and Neuroscience; “rough
consensus” on what was needed.
• Day workshop at Cochrane, UK (Amy Price, Anna Noel Storr, Ben
Goldacre)
• 2-day workshop at Edinburgh on Systematic Reviews of Animal Test
publications
• In the last few months we’ve prototyped a unique Open starting
point, continuously released.
• Can PLoS and ContentMine find constructive ways forward?
PM-R’s “first real paper”, doing science by
re-using the results of others in a novel way
1974:
Each point represented 1-4 hours
in library – discovery, volume delivery,
Transcription, hand calculation.
http://www.nytimes.com/2015/04/08/opinion/yes-we-were-warned-about-ebola.html
We were stunned recently when we stumbled across an article by European
researchers in Annals of Virology [1982]: “The results seem to indicate that
Liberia has to be included in the Ebola virus endemic zone.” In the future,
the authors asserted, “medical personnel in Liberian health centers should be
aware of the possibility that they may come across active cases and thus be
prepared to avoid nosocomial epidemics,” referring to hospital-acquired
infection.
Adage in public health: “The road to inaction is paved with research
papers.”
Bernice Dahn (chief medical officer of Liberia’s Ministry of Health)
Vera Mussah (director of county health services)
Cameron Nutt (Ebola response adviser to Partners in Health)
A System Failure of Scholarly Publishing
MONROVIA, Liberia — The conventional
wisdom among public health authorities is
that the Ebola virus, which killed at least
10,000 people in Liberia, Sierra Leone and
Guinea, was a new phenomenon, not seen in
West Africa before 2013. (The one exception
was an anomalous case in Ivory Coast in 1994,
when a Swiss primatologist was infected after
performing an autopsy on a chimpanzee.)
The conventional wisdom is wrong. We were
stunned recently when we stumbled across an
article by European researchers in Annals of
Virology: “The results seem to indicate that
Liberia has to be included in the Ebola virus
endemic zone.” In the future, the authors
asserted, “medical personnel in Liberian health
centers should be aware of the possibility that
they may come across active cases and thus be
prepared to avoid nosocomial epidemics,”
referring to hospital-acquired infection.
As members of a team drafting Liberia’s Ebola
recovery plan last month, we systematically
reviewed the literature on Ebola surveillance
since the virus’s discovery in central Africa in
1976. We learned that the virologists who wrote
that report, who were from Germany, had
analyzed frozen blood samples taken in 1978 and
1979 from 433 Liberian citizens. They found that
26 (or 6 percent) had antibodies to the Ebola
virus.
Three other studies published in 1986
documented Ebola antibody prevalence rates of
10.6, 13.4 and 14 percent, respectively, in
northwestern Liberia, not far from its borders
with Sierra Leone and Guinea. These articles,
along with other forgotten reports from the
1980s on antibody prevalence in neighboring
Sierra Leone and Guinea, suggest the possibility
of what some call “sanctuary sites,” or
persistent, if latent, Ebola infection in humans.
Bernice Dahn is the chief medical officer of Liberia’s Ministry of Health, where Vera Mussah
is the director of county health services. Cameron Nutt is the Ebola response adviser to Dr.
Paul Farmer at the nonprofit group Partners in Health.
“Free” and “Open”
• “Free software is a matter of liberty, not price.
‘Free speech’, not ‘free beer’.” (R M Stallman)
• “A piece of data or content is open if anyone is
free to use, reuse, and redistribute it”
(OKFN) http://opendefinition.org/
• “open” (access) has multiple incompatible “definitions”. Major split
is “human eyeballs” vs copying and machine “reusability”
• “Open” is a marketing term for publishers, who frequently (often
deliberately) do not grant full Openness.
“Gratis” vs “Libre”
http://www.budapestopenaccessinitiative.org/read
… an unprecedented public good. …
… completely free and unrestricted access to [peer-
reviewed literature] by all scientists, scholars, teachers,
students, and other curious minds. …
…Removing access barriers to this literature will
accelerate research, enrich education, share the
learning of the rich with the poor and the poor with
the rich, make this literature as useful as it can be, and
lay the foundation for uniting humanity in a common
intellectual conversation and quest for knowledge.
(Budapest Open Access Initiative, 2002)
Scientific and Medical publication (STM)[+]
• World Citizens pay $400,000,000,000…
• … for research in 1,500,000 articles …
• … cost $300,000 each to create …
• … $7000 each to “publish” [*]…
• … $10,000,000,000 from academic libraries …
• … to “publishers” who forbid access to 99.9% of citizens of
the world …
• 85% of medical research is wasted (not published, badly
conceived, duplicated, …)
[+] Figures probably ±50%
[*] arXiv preprint server costs $7 USD per paper
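The per-article figures above can be checked against the totals with a few lines of arithmetic. This is only a sketch using the numbers quoted on the slide (which carry the slide's own ±50% caveat); the variable names are illustrative.

```python
# Rough per-article arithmetic using only the figures quoted on the slide.
research_spend_usd = 400_000_000_000   # paid by world citizens for research
articles_per_year = 1_500_000          # articles produced
library_spend_usd = 10_000_000_000     # paid by academic libraries to publishers
arxiv_cost_per_paper_usd = 7           # arXiv figure from the footnote

cost_to_create = research_spend_usd / articles_per_year
cost_to_publish = library_spend_usd / articles_per_year
publish_vs_arxiv = cost_to_publish / arxiv_cost_per_paper_usd

print(f"research cost per article: ${cost_to_create:,.0f}")
print(f"library spend per article: ${cost_to_publish:,.0f}")
print(f"ratio to arXiv cost:       {publish_vs_arxiv:,.0f}x")
```

The totals give roughly $267,000 to create and $6,700 to “publish” each article, consistent with the rounded $300,000 and $7000 on the slide, and a publishing cost close to a thousand times the arXiv figure.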
• “creative use of these large data sets in the US health care sector
could generate more than $300bn in value per annum” [MGI,
McKinsey]
• Gartner Inc. has identified 'Big Data' and 'Next-Generation
Analytics' as two of the 'Top 10 Strategic Technologies' for 2012.
• Given the volume of text generated by business, academic and
social activities – in for example competitor reports, research
publications or customer opinions on social networking sites – text
mining is, however, highly important. [JISC]
• there are some tasks that simply could not be achieved without
using text mining. For example, a major pharmaceutical company
used text mining tools to evaluate 50,000 patents in 18 months.
This would have taken 50 person years to achieve manually,
meaning that it would not even have been contemplated. [JISC]
“Big Data” and Analytics (ContentMining)
Prof. Ian Hargreaves (2011): David Cameron's
“exam question”: “Could it be true that laws
designed more than three centuries ago with the
express purpose of creating economic incentives
for innovation by protecting creators' rights are
today obstructing innovation and economic
growth?”
“Yes. We have found that the UK's intellectual
property framework, especially with regard to
copyright, is falling behind what is needed.”
"Digital Opportunity" by Prof Ian Hargreaves - http://www.ipo.gov.uk/ipreview.htm. Licensed under CC BY 3.0 via Wikipedia -
https://en.wikipedia.org/wiki/File:Digital_Opportunity.jpg#/media/File:Digital_Opportunity.jpg
PUBLISHER TDM LICENCE INITIATIVES
GENERALLY DO NOT HELP
• Publishers have started offering their own TDM licences and policies
• Their licences often impose unfair (and in the case of the UK, unenforceable)
constraints on researchers’ freedom to exploit TDM, e.g., requiring users to
employ publisher’s API, putting unnecessary restrictions on how much can be
copied, or how fast it can be copied.
• Why “unenforceable”? Because, as noted earlier, UK law specifically states
that any contract or licence term that prevents anyone from doing TDM in the
manner prescribed in the new exception shall be deemed null and void.
• Really need a test case on these attempted restrictions.
• Springer and Royal Society offer generous TDM provisions.
• So why are so many publishers offering restrictive licences in the UK? Maybe
they hope licensees are ignorant of the strength of the new law, or the
publishers themselves don’t know about it. So they are either deliberately
misleading or ignorant.
Prof Charles Oppenheim and contentmine.org
Elsevier wants to control Open Data
[asked by Michelle Brook]
Front. Pharmacol., 03 October 2011 |
http://dx.doi.org/10.3389/fphar.2011.00051
How “data” are published in the 21st C
http://drugmonkey.scientopia.org/2010/08/11/yay-j-neuroscience-agrees-with-me-that-supplementary-materials-is-bs-and-ruining-science/
w00000t!!!!1111!!!!ELEVEN!!!!
YAYAYAYAYAYAY!!!! Damn
tootin'!!!!!
Supplemental material also
undermines the concept of a
self-contained research report
by providing a place for critical
material to get lost. Methods
that are essential for replicating
the experiments, analyses that
are central to validating the
results, and awkward
observations are increasingly
being relegated to supplemental
material. Such material is not
supplemental and belongs in the
body of the article, but authors
can be tempted (or, with some
journals, encouraged) to place
essential article components in
the supplemental material.
[Diagram: the ContentMine pipeline.
Sources: EuPMC, arXiv, CORE, HAL, university repositories, ToC services, and publisher sites (PLoS ONE, BMC, PeerJ … Nature, IEEE, Elsevier …).
Daily crawl: getpapers (queries) and quickscrape (scrapers) collect URLs and DOIs; formats include PDF, HTML, DOC, ePUB, TeX, XML, PNG, EPS, CSV and XLS.
norma (Normalizer, Structurer, Semantic Tagger; taggers): Semantic ScholarlyHTML with abstract, methods, references, captioned figures and HTML tables; 30,000 pages/day.
ami (content mining of text, data and figures): community plugins for Chem, Phylo, Trials, Crystal and Plants.
Outputs: Facts, a catalogue with search and lookup, and Visualization and Analysis.]
Regular Expressions for Systematic Reviews of Animal Tests
Preceding Text
Following Text
Extracted term
Today’s Results!! We searched papers for 200 regex-based
Terms and got ca 100 hits per paper
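The three labels on the slide (preceding text, extracted term, following text) suggest a record per regex hit. A minimal sketch of that idea follows; the three patterns are hypothetical stand-ins, since the 200 terms used in the talk are not listed, and the function is not the AMI regex plugin itself.

```python
import re

# Hypothetical term patterns; the real term list is not given in the slides.
TERM_PATTERNS = {
    "randomisation": re.compile(r"\brandomi[sz](?:ed|ation)\b", re.IGNORECASE),
    "blinding": re.compile(r"\bblind(?:ed|ing)\b", re.IGNORECASE),
    "sample_size": re.compile(r"\bn\s*=\s*\d+\b", re.IGNORECASE),
}

def extract_terms(text, window=40):
    """Return one record per regex hit, with its preceding and following text."""
    hits = []
    for name, pattern in TERM_PATTERNS.items():
        for match in pattern.finditer(text):
            start, end = match.span()
            hits.append({
                "term": name,
                "start": start,
                "pre": text[max(0, start - window):start],
                "exact": match.group(0),
                "post": text[end:end + window],
            })
    # Report hits in document order
    return sorted(hits, key=lambda hit: hit["start"])

sample = ("Mice were randomized to treatment groups (n = 12) and "
          "outcomes were assessed by blinded observers.")
for hit in extract_terms(sample):
    print(f'{hit["term"]:>14}: ...{hit["pre"]}[{hit["exact"]}]{hit["post"]}...')
```

Keeping the surrounding text with each hit lets a human reviewer judge quickly whether the match is a real mention or a false positive.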
Questions we can tackle
• How do we find (mentions of) clinical/animal trials?
• Is a document a trial?
• What is the subject of the trial?
• What is the methodology used?
• Does the design and practice conform to
CONSORT/ARRIVE?
• What are the outcomes?
• Can we extract specific re-usable information?
• Who are involved? (researchers, sponsors, patients?)
• Has a proposed trial been completed and reported?
Linked Open Data – the world’s knowledge
very little physical science and THESES?? 
http://upload.wikimedia.org/wikipedia/commons/3/34/LOD_Cloud_Diagram_as_of_September_2011.png
[Diagram labels: DBPedia; BIO; Comp; Lib; PDB; Ontologies; GOV; GOV.uk; Music, Art, Literature; Social; Knowledge bases; RDF triples]
Liberation Software
The Right to Read is the Right to Mine
http://contentmine.org
https://en.wikipedia.org/wiki/Irrigation#mediaviewer/File:Pump-enabled_Riverside_Irrigation_in_Comilla,_Bangladesh,_25_April_2014.jpg CC BY-SA 3.0
Daily Stream of 100,000 Open Facts
Twitter? Indexed by CAT
What is “Content”?
http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.0111303&representation=PDF CC-BY
SECTIONS
MAPS
TABLES
CHEMISTRY
TEXT
MATH
contentmine.org tackles these
[Diagram: Daily Crawl of 2000-5000 articles. Sources: PLoS ONE, BMC 1, BMC 2, Hybrid, Closed 1, Closed 2. Steps: Crawl … Scrape … Normalize … Mine. Outputs: FACTS, enhanced annotated articles, semantic scientific objects, the CATalog, Linked Open Data.]
Machine-Human symbioses
• Wikipedia
• OpenStreetMap
• Google
We aim to make it trivial for a human+machine
to mine the scientific literature.
By building Communities
ContentMine Workshops and
Hackdays
Open Science Brazil, 2014-08
Easily distributed software
Get started in 30 mins
Build application
in a morning
Start simple: bagOfWords, Stemming, Regex, templates
Facts Marked by “non-scientists” in ContentMine workshops
With Wikipedia everyone can be a scientist
Oxford 2013
Berlin 2014
Delhi 2014
Jenny Molloy with mascot AMI
Workshops
(1-hour -> full day or more)
2014-May->Nov
• Budapest/Shuttleworth
• Leicester Univ
• Electronic Theses and Dissertations
• Austrian Science Fund AT
• OKFest DE
• Eur. Bioinformatics Institute
• Open Science Rio de Janeiro BR
• SciDataCon, Delhi IN
• Univ of Chicago US
• OpenCon 2014, Washington DC, US
• JISC, London
Upcoming
• LIBER
• Cochrane
• BL
• Wellcome Trust (April)
• WHO
Collaborators
• Wikimedia/Wikidata
• Mozilla
• Open Knowledge
• LIBER (European Research Libraries)
• British Library
• Wellcome Trust
• EBI (Eur. Bioinf. Inst.)
• JISC
• Open Access Button
• SPARC
• Creative Commons
• CORE
• EuropePubmedCentral
• CRAWL the web for scientific documents
(articles, grey literature, repositories)
• quickSCRAPE pages (text, graphics, images, data)
• NORMA-lize page to semantic form
…Open semantic science …
• MINE pages with your methods and tools (AMI)
• CAT-alogue results in searchable index
• Automate daily process (CANARY)
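The stages listed above can be read as a chain of functions, each taking the previous stage's output. The sketch below is purely illustrative: the function names, the in-memory stand-in for the web, and the two-species term list are all assumptions, and none of this is the actual API of quickscrape, norma, ami or CAT.

```python
from html.parser import HTMLParser

# Stand-in for the web (URL -> raw page); purely illustrative data.
FAKE_WEB = {
    "http://example.org/paper1":
        "<html><body><h1>Trial</h1><p>Mus  musculus was studied.</p></body></html>",
}

class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML page."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def crawl():
    """CRAWL: discover document URLs."""
    return sorted(FAKE_WEB)

def scrape(url):
    """SCRAPE: fetch the raw page for one URL."""
    return {"url": url, "raw": FAKE_WEB[url]}

def normalize(doc):
    """NORMA-lize: strip markup and collapse whitespace into plain text."""
    parser = _TextExtractor()
    parser.feed(doc["raw"])
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return {**doc, "text": text}

def mine(doc, terms=("Mus musculus", "Rattus norvegicus")):
    """MINE: record which (hypothetical) dictionary terms occur in the text."""
    return {**doc, "facts": [term for term in terms if term in doc["text"]]}

def catalogue(docs):
    """CAT-alogue: build a searchable index from fact to URLs."""
    index = {}
    for doc in docs:
        for fact in doc["facts"]:
            index.setdefault(fact, []).append(doc["url"])
    return index

def run_daily():
    """One pass of the automated daily process."""
    return catalogue(mine(normalize(scrape(url))) for url in crawl())

print(run_daily())
```

The normalization step matters even in this toy: the raw page contains a double space inside the species name, so mining the raw text would miss it.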
contentmine.org Infrastructure
[Diagram: the CANARY pipeline.
Inputs: scientific literature and repositories, addressed by URL and DOI, in formats such as PDF, DOC, CSV, TXT, XML and bad HTML; OCR for diagrams.
Stages: Crawl, Feed, quickscrape (bespoke per-journal XPath scrapers), Norma (index and transform to sHTML; per-journal taggers, metadata), AMI plugins (Regex, Sequences, Species, Chemistry, Phylogenetics, Farming).
Outputs: open NORMA-lized scientific literature + facts, and the CAT-alogue index.]
https://commons.wikimedia.org/wiki/File:Flickr_-_DVIDSHUB_-_RSP_Warrior_Challenge_Prepares_Soldiers_Mentally,_Physically_%281%29.jpg
CRAWLing the Literature
NO Central Table of Contents
Massive technical, political, legal opposition
Little interest from Academia
Tedious
Few general tools
The Right to Read is The Right To Mine
PMR in 2012: http://blog.okfn.org/2012/06/01/the-right-to-read-is-the-right-to-mine/
SCRAPE
https://en.wikipedia.org/wiki/Gleaning#mediaviewer/File:Millet_Gleaners.jpg PublicDomain
PDF
HTML
XML quickscrape*
*Scrapers created by
Richard Smith-Unna +
Community
HTML
PDF
XML
PNG
SVG
CSV
DOC
LaTeX
CIF
…
Non-standard per-publisher site
https://en.wikipedia.org/wiki/W._Heath_Robinson#mediaviewer/File:Robinson%28WH%29-%28%27Uncle_Lubin%27%29.jpg PublicDomain
NORMA-lization of Scientific Literature
PDFs, Broken HTML
PNGs for Math, etc.
NORMA
Unicode
Diacritics
Well-formed
Sectioned
Tagged
SVG diagrams
AMI-plugins
• BagOfWords, Stemming and Regular Expressions
• Species
• Biological Sequences
• Chemical compounds & reactions
• Farming * (Rory Aronson)
• Crystallography * (Saulius Grazulis, COD)
• Clinical Trials * (Amy Price)
• Phylogenetics * (Ross Mounce)
• Phytochemistry * (Chris Steinbeck, PMR)
* subcommunities
Text-based plugins
• Bag of words
(https://en.wikipedia.org/wiki/Bag-of-words_model)
• Tf–idf: term frequency, inverse document frequency
(https://en.wikipedia.org/wiki/Tf%E2%80%93idf)
• Templates and regexes (regular expressions).
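The first two techniques in this list fit in a few lines. The sketch below uses the plain textbook definitions (raw term frequency and log(N/df) for the inverse document frequency); it is an assumption that these are the right variants, and the AMI plugins may weight terms differently. The three sample “documents” are made up.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lower-case the text, split on non-letters, and count each word."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def tf_idf(docs):
    """Score each word in each document as tf * log(N / df)."""
    bags = [bag_of_words(doc) for doc in docs]
    n_docs = len(bags)
    # Iterating a Counter yields each distinct word once, so this counts documents
    doc_freq = Counter(word for bag in bags for word in bag)
    scores = []
    for bag in bags:
        total = sum(bag.values())
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in bag.items()
        })
    return scores

docs = [
    "the trial enrolled patients",
    "the trial used mice",
    "the crystal structure",
]
for doc, score in zip(docs, tf_idf(docs)):
    top = max(score, key=score.get)
    print(f"{doc!r}: most distinctive word is {top!r}")
```

Words that occur in every document (here “the”) score zero, which is why tf–idf is a cheap way to surface what a paper is about.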
“Bag of Words”
Three fulltext articles from trialsjournal.com
Regular Expressions for Systematic Reviews of Animal Tests
Preceding Text
Following Text
Extracted term
“nuggets” in a scientific paper
quantity
units
Value ranges
Humans aren’t designed to mine this … 
chemical
project places
http://chemicaltagger.ch.cam.ac.uk/
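Quantities, units and value ranges are the kind of “nugget” a regular expression can pick out. The sketch below is a simplified illustration with a deliberately tiny, hypothetical unit list; it is not how ChemicalTagger works, which uses a full grammar rather than a single pattern.

```python
import re

# A deliberately small, hypothetical unit list.
UNITS = r"(?:mg|g|kg|mL|L|mmol|mol|°C|K|h|min)"
NUMBER = r"\d+(?:\.\d+)?"
QUANTITY = re.compile(
    rf"(?P<low>{NUMBER})(?:\s*(?:-|–|to)\s*(?P<high>{NUMBER}))?\s*(?P<unit>{UNITS})\b"
)

def find_quantities(text):
    """Return (low, high, unit) tuples; a single value has low == high."""
    results = []
    for match in QUANTITY.finditer(text):
        low = float(match.group("low"))
        high = float(match.group("high")) if match.group("high") else low
        results.append((low, high, match.group("unit")))
    return results

sentence = ("The mixture was stirred at 60-80 °C for 2 h, "
            "then 1.5 g of catalyst was added.")
print(find_quantities(sentence))
```

Representing a single value as a range with equal bounds keeps every hit in the same shape, which simplifies later comparison and indexing.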
Typical chemical synthesis
Open Content Mining of FACTs
Machines can interpret chemical reactions
We have done 500,000 patents. There are >
3,000,000 reactions/year. Added value > 1B Eur.
[Plot: Ln bacterial load per fly (y-axis, ticks from 6.0 to 11.5) against days post-infection (x-axis, 0 to 5)]
Bitmap Image and Tesseract OCR
UNITS
TICKS
QUANTITY
SCALE
TITLES
DATA!!
2000+ points
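Once OCR has read the tick labels, turning pixel positions into data values is a linear calibration per axis. The sketch below assumes linear axes and two recognised ticks on each; the tick values come from the plot on the slide, but the pixel coordinates are invented for illustration, and this is not the AMI implementation.

```python
def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Return a function mapping a pixel coordinate to a data value,
    assuming a linear axis calibrated from two tick marks."""
    scale = (value_b - value_a) / (pixel_b - pixel_a)
    return lambda pixel: value_a + (pixel - pixel_a) * scale

# Hypothetical pixel positions of ticks recognised by OCR.
x_of = make_axis_mapper(100, 0.0, 600, 5.0)    # x ticks "0" and "5"
y_of = make_axis_mapper(400, 6.0, 70, 11.5)    # y ticks "6.0" and "11.5"; pixel rows grow downwards

def pixels_to_data(points):
    """Convert (column, row) pixel centres of plotted points to (x, y) data values."""
    return [(x_of(px), y_of(py)) for px, py in points]

print(pixels_to_data([(100, 400), (350, 235), (600, 70)]))
```

Because image rows count downwards, the y-axis scale comes out negative; calibrating from two ticks handles that without a special case.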
Dumb PDF
CSV
Semantic
Spectrum
2nd Derivative
Smoothing
Gaussian Filter
Automatic
extraction
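The processing steps named here (Gaussian filter, smoothing, 2nd derivative) are standard signal-processing operations. The sketch below is a generic pure-Python version with a synthetic single-peak spectrum, offered under the assumption that peaks are located where the smoothed second derivative reaches a negative minimum; the actual AMI code may differ.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Normalised Gaussian weights over [-radius, radius]."""
    radius = radius if radius is not None else int(3 * sigma)
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma=1.0):
    """Convolve with a Gaussian kernel, repeating the end values at the edges."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    return [
        sum(w * signal[min(max(i + k - radius, 0), last)]
            for k, w in enumerate(kernel))
        for i in range(len(signal))
    ]

def second_derivative(signal):
    """Central finite difference; the two end points are left at 0."""
    d2 = [0.0] * len(signal)
    for i in range(1, len(signal) - 1):
        d2[i] = signal[i - 1] - 2 * signal[i] + signal[i + 1]
    return d2

def find_peaks(signal, sigma=1.0):
    """Indices where the smoothed second derivative has a negative local minimum."""
    d2 = second_derivative(smooth(signal, sigma))
    return [i for i in range(1, len(d2) - 1)
            if d2[i] < 0 and d2[i] <= d2[i - 1] and d2[i] <= d2[i + 1]]

# Synthetic spectrum: one Gaussian peak centred at index 20.
spectrum = [math.exp(-((i - 20) ** 2) / 18.0) for i in range(41)]
print(find_peaks(spectrum))
```

Smoothing before differentiating matters on real data, because a second difference amplifies pixel-level noise from the bitmap.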
AMI https://bitbucket.org/petermr/xhtml2stm/wiki/Home
Example reaction scheme, taken from MDPI Metabolites 2012, 2, 100-133; page 8, CC-BY:
AMI reads the complete diagram,
recognizes the paths and
generates the molecules. Then
she creates a stop-frame animation
showing how the 12 reactions
lead into each other.
https://blogs.ch.cam.ac.uk/pmr/2014/06/25/content-mining-we-can-now-mine-images-of-phylogenetic-trees-and-more/ for story of extraction
Thinning Topology
Serialization
Newick
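The last step, serialization to Newick, is a short recursion once the tree topology has been recovered. The sketch below covers only that step, assumes the tree is already available as nested (name, children) pairs (a hypothetical representation), and omits branch lengths.

```python
def to_newick(node):
    """Serialize a nested (name, children) tree to Newick, without the final ';'."""
    name, children = node
    if not children:
        return name
    return "(" + ",".join(to_newick(child) for child in children) + ")" + name

# Hypothetical topology: leaf A, plus an unnamed internal node with leaves B and C.
tree = ("root", [
    ("A", []),
    ("", [("B", []), ("C", [])]),
])
newick = to_newick(tree) + ";"
print(newick)
```

A plain-text Newick string can be loaded directly by standard phylogenetics software, which is what makes a tree locked inside an image re-usable.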
Peter
Murray-Rust
BMC publisher
Blue Obelisk paper (20
co-authors)
Sub-network
From CATalog
Phytochemistry extraction
O. dayi
“volatile composition of “
A.sibeiri
A. judaica
Displayed by CAT (CottageLabs)
What we can do
• Recognize and promote autonomous sub-
communities
• Engage Early Career Researchers, including
undergraduates and let THEM BUILD the
systems.
• COMMUNALLY build tools for data checking
• Insist on semantic data input, even if it costs
submissions
contentmine.org team

More Related Content

What's hot

Text and data mining in UK and France (ADBU - 13 Dec 16)
Text and data mining in UK and France (ADBU - 13 Dec 16)Text and data mining in UK and France (ADBU - 13 Dec 16)
Text and data mining in UK and France (ADBU - 13 Dec 16)Rob Johnson
 
Principles and practice of Open Science
Principles and practice of Open SciencePrinciples and practice of Open Science
Principles and practice of Open Sciencepetermurrayrust
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literature High throughput mining of the scholarly literature
High throughput mining of the scholarly literature TheContentMine
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectivepetermurrayrust
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureTheContentMine
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and  Medicine from the scholarly literatureAutomatic Extraction of Science and  Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literaturepetermurrayrust
 
ContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific LiteratureContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific Literaturepetermurrayrust
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the Literaturepetermurrayrust
 
HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7Scott Edmunds
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHpetermurrayrust
 
Scott Edmunds: Publishing in the Open Data Era, talk at Hackerspace.sg
Scott Edmunds: Publishing in the Open Data Era, talk at Hackerspace.sgScott Edmunds: Publishing in the Open Data Era, talk at Hackerspace.sg
Scott Edmunds: Publishing in the Open Data Era, talk at Hackerspace.sgGigaScience, BGI Hong Kong
 
Scott Edmunds at Tech4Dev on Open Publishing for the Big-Data Era
Scott Edmunds at Tech4Dev on Open Publishing	for the Big-Data EraScott Edmunds at Tech4Dev on Open Publishing	for the Big-Data Era
Scott Edmunds at Tech4Dev on Open Publishing for the Big-Data EraGigaScience, BGI Hong Kong
 
Open Data and Open Science
Open Data and Open ScienceOpen Data and Open Science
Open Data and Open ScienceTheContentMine
 
Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016 Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016 TheContentMine
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)Duncan Hull
 
Scott Edmunds & Mendel Wong, Citizen Science #101. HKU MPA lecuture
Scott Edmunds & Mendel Wong, Citizen Science #101. HKU MPA lecutureScott Edmunds & Mendel Wong, Citizen Science #101. HKU MPA lecuture
Scott Edmunds & Mendel Wong, Citizen Science #101. HKU MPA lecutureScott Edmunds
 
Learn to speak open
Learn to speak openLearn to speak open
Learn to speak openLilian Juma
 

What's hot (20)

Text and data mining in UK and France (ADBU - 13 Dec 16)
Text and data mining in UK and France (ADBU - 13 Dec 16)Text and data mining in UK and France (ADBU - 13 Dec 16)
Text and data mining in UK and France (ADBU - 13 Dec 16)
 
Principles and practice of Open Science
Principles and practice of Open SciencePrinciples and practice of Open Science
Principles and practice of Open Science
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literature High throughput mining of the scholarly literature
High throughput mining of the scholarly literature
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspective
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the Literature
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and  Medicine from the scholarly literatureAutomatic Extraction of Science and  Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literature
 
ContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific LiteratureContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific Literature
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the Literature
 
HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIH
 
Scott Edmunds: Publishing in the Open Data Era, talk at Hackerspace.sg
Scott Edmunds: Publishing in the Open Data Era, talk at Hackerspace.sgScott Edmunds: Publishing in the Open Data Era, talk at Hackerspace.sg
Scott Edmunds: Publishing in the Open Data Era, talk at Hackerspace.sg
 
Scott Edmunds at Tech4Dev on Open Publishing for the Big-Data Era
Scott Edmunds at Tech4Dev on Open Publishing	for the Big-Data EraScott Edmunds at Tech4Dev on Open Publishing	for the Big-Data Era
Scott Edmunds at Tech4Dev on Open Publishing for the Big-Data Era
 
Open Data and Open Science
Open Data and Open ScienceOpen Data and Open Science
Open Data and Open Science
 
Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016 Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)
 
Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration
Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration
Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration
 
Cochrane workshop2016
Cochrane workshop2016Cochrane workshop2016
Cochrane workshop2016
 
Science as open enterprise
Science as open enterpriseScience as open enterprise
Science as open enterprise
 
Scott Edmunds & Mendel Wong, Citizen Science #101. HKU MPA lecuture
Scott Edmunds & Mendel Wong, Citizen Science #101. HKU MPA lecutureScott Edmunds & Mendel Wong, Citizen Science #101. HKU MPA lecuture
Scott Edmunds & Mendel Wong, Citizen Science #101. HKU MPA lecuture
 
Learn to speak open
Learn to speak openLearn to speak open
Learn to speak open
 

Viewers also liked

ContentMining at Cambridge
ContentMining at CambridgeContentMining at Cambridge
ContentMining at Cambridgepetermurrayrust
 
High throughput mining of the scholarly literature: journals and theses
High throughput mining of the scholarly literature: journals and thesesHigh throughput mining of the scholarly literature: journals and theses
High throughput mining of the scholarly literature: journals and thesespetermurrayrust
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic Biologypetermurrayrust
 
Open software and knowledge for MIOSS
Open software and knowledge for MIOSSOpen software and knowledge for MIOSS
Open software and knowledge for MIOSSpetermurrayrust
 
Why ContentMining is useful
Why ContentMining is usefulWhy ContentMining is useful
Why ContentMining is usefulpetermurrayrust
 
ContentMining and Clinical Trials
ContentMining and Clinical TrialsContentMining and Clinical Trials
ContentMining and Clinical Trialspetermurrayrust
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literatureHigh throughput mining of the scholarly literature
High throughput mining of the scholarly literaturepetermurrayrust
 
Architecture of ContentMine Components contentmine.org
Architecture of ContentMine Components contentmine.orgArchitecture of ContentMine Components contentmine.org
Architecture of ContentMine Components contentmine.orgpetermurrayrust
 
Mining Scientific Diagrams for facts
Mining Scientific Diagrams for factsMining Scientific Diagrams for facts
Mining Scientific Diagrams for factspetermurrayrust
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trustpetermurrayrust
 
Semantic Web in Physical Science
Semantic Web in Physical ScienceSemantic Web in Physical Science
Semantic Web in Physical Sciencepetermurrayrust
 
ContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesContentMine: Open Data and Social Machines
ContentMine: Open Data and Social Machinespetermurrayrust
 
Open data and Open Science
Open data and Open ScienceOpen data and Open Science
Open data and Open Sciencepetermurrayrust
 

Viewers also liked (17)

Contentmineatopencon2
Contentmineatopencon2Contentmineatopencon2
Contentmineatopencon2
 
ContentMining at Cambridge
ContentMining at CambridgeContentMining at Cambridge
ContentMining at Cambridge
 
High throughput mining of the scholarly literature: journals and theses
High throughput mining of the scholarly literature: journals and thesesHigh throughput mining of the scholarly literature: journals and theses
High throughput mining of the scholarly literature: journals and theses
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic Biology
 
Open software and knowledge for MIOSS
Open software and knowledge for MIOSSOpen software and knowledge for MIOSS
Open software and knowledge for MIOSS
 
Digital Scholarship
Digital ScholarshipDigital Scholarship
Digital Scholarship
 
Why ContentMining is useful
Why ContentMining is usefulWhy ContentMining is useful
Why ContentMining is useful
 
ContentMining and Clinical Trials
ContentMining and Clinical TrialsContentMining and Clinical Trials
ContentMining and Clinical Trials
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literatureHigh throughput mining of the scholarly literature
High throughput mining of the scholarly literature
 
Architecture of ContentMine Components contentmine.org
Architecture of ContentMine Components contentmine.orgArchitecture of ContentMine Components contentmine.org
Architecture of ContentMine Components contentmine.org
 
Mining Scientific Diagrams for facts
Mining Scientific Diagrams for factsMining Scientific Diagrams for facts
Mining Scientific Diagrams for facts
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trust
 
Petermrjisc20141201
Petermrjisc20141201Petermrjisc20141201
Petermrjisc20141201
 
Semantic Web in Physical Science
Semantic Web in Physical ScienceSemantic Web in Physical Science
Semantic Web in Physical Science
 
Open Notebook Science
Open Notebook ScienceOpen Notebook Science
Open Notebook Science
 
ContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesContentMine: Open Data and Social Machines
ContentMine: Open Data and Social Machines
 
Open data and Open Science
Open data and Open ScienceOpen data and Open Science
Open data and Open Science
 

Plosslides

  • 1. The Avoidable Waste of Scholarly Publishing Peter Murray-Rust*, ContentMine.org and the University of Cambridge PLoS, Cambridge, UK 2015-07-09 Scholarly Publishing un/wittingly destroys huge amounts of publicly funded research. There are solutions; what is needed is will
  • 2. Background • ContentMine aims to make large areas of scientific fact OPEN (100 million facts/year) • We’re working with WellcomeTrust, Europe PubMedCentral, etc. • A politically “hot” area (Hargreaves legislation, EU activity) • 2015 WellcomeTrust workshop on TDM and Neuroscience; “rough consensus” on what was needed. • Day workshop at Cochrane, UK (Amy Price, Anna Noel Storr, Ben Goldacre) • 2-day workshop at Edinburgh on Systematic Reviews of Animal Test publications • In the last few months we’ve prototyped a unique Open starting point, continuously released. • Can PLoS and ContentMine find constructive ways forward?
  • 3. PM-R’s “first real paper”, doing science by re-using the results of others in a novel way
  • 4. 1974: Each point represented 1-4 hours in the library – discovery, volume delivery, transcription, hand calculation.
  • 5.
  • 6. http://www.nytimes.com/2015/04/08/opinion/yes-we-were-warned-about-ebola.html We were stunned recently when we stumbled across an article by European researchers in Annals of Virology [1982]: “The results seem to indicate that Liberia has to be included in the Ebola virus endemic zone.” In the future, the authors asserted, “medical personnel in Liberian health centers should be aware of the possibility that they may come across active cases and thus be prepared to avoid nosocomial epidemics,” referring to hospital-acquired infection. Adage in public health: “The road to inaction is paved with research papers.” Bernice Dahn (chief medical officer of Liberia’s Ministry of Health) Vera Mussah (director of county health services) Cameron Nutt (Ebola response adviser to Partners in Health) A System Failure of Scholarly Publishing
  • 7. MONROVIA, Liberia — The conventional wisdom among public health authorities is that the Ebola virus, which killed at least 10,000 people in Liberia, Sierra Leone and Guinea, was a new phenomenon, not seen in West Africa before 2013. (The one exception was an anomalous case in Ivory Coast in 1994, when a Swiss primatologist was infected after performing an autopsy on a chimpanzee.) The conventional wisdom is wrong. We were stunned recently when we stumbled across an article by European researchers in Annals of Virology: “The results seem to indicate that Liberia has to be included in the Ebola virus endemic zone.” In the future, the authors asserted, “medical personnel in Liberian health centers should be aware of the possibility that they may come across active cases and thus be prepared to avoid nosocomial epidemics,” referring to hospital-acquired infection. As members of a team drafting Liberia’s Ebola recovery plan last month, we systematically reviewed the literature on Ebola surveillance since the virus’s discovery in central Africa in 1976. We learned that the virologists who wrote that report, who were from Germany, had analyzed frozen blood samples taken in 1978 and 1979 from 433 Liberian citizens. They found that 26 (or 6 percent) had antibodies to the Ebola virus. Three other studies published in 1986 documented Ebola antibody prevalence rates of 10.6, 13.4 and 14 percent, respectively, in northwestern Liberia, not far from its borders with Sierra Leone and Guinea. These articles, along with other forgotten reports from the 1980s on antibody prevalence in neighboring Sierra Leone and Guinea, suggest the possibility of what some call “sanctuary sites,” or persistent, if latent, Ebola infection in humans. Bernice Dahn is the chief medical officer of Liberia’s Ministry of Health, where Vera Mussah is the director of county health services. Cameron Nutt is the Ebola response adviser to Dr. Paul Farmer at the nonprofit group Partners in Health.
  • 8. “Free” and “Open” • “Free software is a matter of liberty, not price. ‘Free speech’, not ‘free beer’.” (R M Stallman) • “A piece of data or content is open if anyone is free to use, reuse, and redistribute it” (OKFN) http://opendefinition.org/ • “Open” (access) has multiple incompatible “definitions”. The major split is “human eyeballs” vs copying and machine “reusability” • “Open” is a marketing term for publishers, who frequently (often deliberately) do not grant full Openness. “Gratis” vs “Libre”
  • 9. http://www.budapestopenaccessinitiative.org/read … an unprecedented public good. … … completely free and unrestricted access to [peer-reviewed literature] by all scientists, scholars, teachers, students, and other curious minds. … …Removing access barriers to this literature will accelerate research, enrich education, share the learning of the rich with the poor and the poor with the rich, make this literature as useful as it can be, and lay the foundation for uniting humanity in a common intellectual conversation and quest for knowledge. (Budapest Open Access Initiative, 2002)
  • 10. Scientific and Medical publication (STM)[+] • World Citizens pay $400,000,000,000… • … for research in 1,500,000 articles … • … cost $300,000 each to create … • … $7000 each to “publish” [*]… • … $10,000,000,000 from academic libraries … • … to “publishers” who forbid access to 99.9% of citizens of the world … • 85% of medical research is wasted (not published, badly conceived, duplicated, …) [+] Figures probably ±50% [*] arXiv preprint server costs $7 USD per paper
  • 11. “Big Data” and Analytics (ContentMining) • “creative use of these large data sets in the US health care sector could generate more than $300bn in value per annum” [MGI, McKinsey] • Gartner Inc. has identified 'Big Data' and 'Next-Generation Analytics' as two of the 'Top 10 Strategic Technologies' for 2012. • Given the volume of text generated by business, academic and social activities – in for example competitor reports, research publications or customer opinions on social networking sites – text mining is, however, highly important. [JISC] • there are some tasks that simply could not be achieved without using text mining. For example, a major pharmaceutical company used text mining tools to evaluate 50,000 patents in 18 months. This would have taken 50 person years to achieve manually, meaning that it would not even have been contemplated. [JISC]
  • 12. Prof. Ian Hargreaves (2011): “David Cameron's exam question”: “Could it be true that laws designed more than three centuries ago with the express purpose of creating economic incentives for innovation by protecting creators' rights are today obstructing innovation and economic growth?” “yes. We have found that the UK's intellectual property framework, especially with regard to copyright, is falling behind what is needed.” “Digital Opportunity” by Prof Ian Hargreaves - http://www.ipo.gov.uk/ipreview.htm. Licensed under CC BY 3.0 via Wikipedia - https://en.wikipedia.org/wiki/File:Digital_Opportunity.jpg#/media/File:Digital_Opportunity.jpg
  • 13. PUBLISHER TDM LICENCE INITIATIVES GENERALLY DO NOT HELP • Publishers have started offering their own TDM licences and policies. • Their licences often impose unfair (and in the case of the UK, unenforceable) constraints on researchers’ freedom to exploit TDM, e.g., requiring users to employ the publisher’s API, or putting unnecessary restrictions on how much can be copied or how fast it can be copied. • Why “unenforceable”? Because, as noted earlier, UK law specifically states that any contract or licence term that prevents anyone from doing TDM in the manner prescribed in the new exception shall be deemed null and void. • We really need a test case on these attempted restrictions. • Springer and Royal Society offer generous TDM provisions. • So why are so many publishers offering restrictive licences in the UK? Maybe they hope licensees are ignorant of the strength of the new law, or the publishers in fact don’t know about it. So they are either deliberately misleading, or ignorant. [Prof Charles Oppenheim and contentmine.org]
  • 14. Elsevier wants to control Open Data [asked by Michelle Brook]
  • 17. Front. Pharmacol., 03 October 2011 | http://dx.doi.org/10.3389/fphar.2011.00051
  • 20. How “data” are published in the 21st C
  • 22. http://drugmonkey.scientopia.org/2010/08/11/yay-j-neuroscience-agrees-with-me-that-supplementary-materials-is-bs-and-ruining-science/ w00000t!!!!1111!!!!ELEVEN!!!! YAYAYAYAYAYAY!!!! Damn tootin'!!!!! Supplemental material also undermines the concept of a self-contained research report by providing a place for critical material to get lost. Methods that are essential for replicating the experiments, analyses that are central to validating the results, and awkward observations are increasingly being relegated to supplemental material. Such material is not supplemental and belongs in the body of the article, but authors can be tempted (or, with some journals, encouraged) to place essential article components in the supplemental material.
  • 35. catalogue getpapers query Daily Crawl EuPMC, arXiv CORE, HAL, (UNIV repos) ToC services PDF HTML DOC ePUB TeX XML PNG EPS CSV XLS URLs DOIs crawl quickscrape norma Normalizer Structurer Semantic Tagger Text Data Figures ami UNIV Repos search Lookup CONTENT MINING Chem Phylo Trials Crystal Plants COMMUNITY plugins Visualization and Analysis PloSONE, BMC, peerJ… Nature, IEEE, Elsevier… Publisher Sites scrapers queries taggers abstract methods references Captioned Figures Fig. 1 HTML tables 30,000 pages/day Semantic ScholarlyHTML Facts
  • 36. Regular Expressions for Systematic Reviews of Animal Tests Preceding Text Following Text Extracted term Today’s Results!! We searched papers for 200 regex-based Terms and got ca 100 hits per paper
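The layout on slide 36 (preceding text, extracted term, following text) can be sketched with Python's standard `re` module. This is only an illustration: the talk mentions about 200 regex-based terms but does not list them, so the three patterns, the `find_terms` helper and the sample sentence below are all hypothetical.

```python
import re

# Hypothetical term patterns; the real list of ~200 terms is not in the slides.
TERMS = {
    "randomisation": r"randomi[sz](?:ed|ation)",
    "blinding": r"blind(?:ed|ing)",
    "sample_size": r"sample size",
}

def find_terms(text, context=30):
    """Return (name, preceding text, extracted term, following text) tuples."""
    hits = []
    for name, pattern in TERMS.items():
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            pre = text[max(0, m.start() - context):m.start()]
            post = text[m.end():m.end() + context]
            hits.append((name, pre, m.group(0), post))
    return hits

sample = "Mice were randomized to treatment groups and outcome assessment was blinded."
hits = find_terms(sample)
for name, pre, term, post in hits:
    print(f"{name}: ...{pre}[{term}]{post}...")
```

Keeping a window of context around each hit lets a human reviewer judge quickly whether the match is a real mention or a false positive.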
  • 37. Questions we can tackle • How do we find (mentions of) clinical/animal trials? • Is a document a trial? • What is the subject of the trial? • What is the methodology used? • Does the design and practice conform to CONSORT/ARRIVE? • What are the outcomes? • Can we extract specific re-usable information? • Who are involved? (researchers, sponsors, patients?) • Has a proposed trial been completed and reported?
  • 38. Linked Open Data – the world’s knowledge very little physical science and THESES??  http://upload.wikimedia.org/wikipedia/commons/3/34/LOD_Cloud_Diagram_as_of_September_2011.png DBPedia BIO Comp Lib PDB Ontologies GOV GOV.uk Music, Art Literature Social Knowledge bases RDF triples
  • 40. The Right to Read is the Right to Mine http://contentmine.org
  • 43. PLoSONE BMC 1 BMC 2 Closed1 Closed2 Hybrid CATalog Enhanced annotated articles FACTS FACTS Daily Crawl Crawl … Scrape … Normalize … Mine Linked OpenData Semantic Scientific Objects 2000-5000 Articles
  • 46. Machine-Human symbioses • Wikipedia • Open StreetMap • Google We aim to make it trivial for a human+machine to mine the scientific literature. By building Communities
  • 47. ContentMine Workshops and Hackdays Open Science Brazil, 2014-08 Easily distributed software Get started in 30 mins Build application in a morning Start simple: bagOfWords, Stemming, Regex, templates
  • 48. Facts Marked by “non-scientists” in ContentMine workshops With Wikipedia everyone can be a scientist
  • 49. Oxford 2013 Berlin 2014 Delhi 2014 Jenny Molloy with mascot AMI
  • 50. Workshops (1-hour -> full day or more) 2014-May->Nov • Budapest/Shuttleworth • Leicester Univ • Electronic Theses and Dissertations • Austrian Science Fund AT • OKFest DE • Eur. Bioinformatics Institute • Open Science Rio de Janeiro BR • SciDataCon, Delhi IN • Univ of Chicago US • OpenCon 2014, Wash DC, US • JISC, London Upcoming • LIBER • Cochrane • BL • Wellcome Trust (April) • WHO Collaborators • Wikimedia/Wikidata • Mozilla • Open Knowledge • LIBER (European Research Libraries) • British Library • Wellcome Trust • EBI (Eur. Bioinf. Inst.) • JISC • Open Access Button • SPARC • Creative Commons • CORE • EuropePubmedCentral
  • 51. • CRAWL the web for scientific documents (articles, grey literature, repositories) • quickSCRAPE pages (text, graphics, images, data) • NORMA-lize page to semantic form …Open semantic science … • MINE pages with your methods and tools (AMI) • CAT-alogue results in searchable index • Automate daily process (CANARY) contentmine.org Infrastructure
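The stages on slide 51 chain together as a simple pipeline. A toy sketch of that flow, under loud assumptions: every function below is a stand-in written for illustration, not the real getpapers, quickscrape, norma, ami or CAT interface, and the two sample documents are made up.

```python
import re
import unicodedata
from collections import defaultdict

# Toy stand-ins for the stages on slide 51; none of this is the real tool API.

def crawl():
    """CRAWL: yield (document id, raw content); here a fixed in-memory sample."""
    yield "doc1", "<p>Drosophila   melanogaster was infected.</p>"
    yield "doc2", "<p>Extract of Artemisia judaica was analysed.</p>"

def scrape(raw):
    """SCRAPE: strip markup to get text (real sites need per-publisher scrapers)."""
    return re.sub(r"<[^>]+>", " ", raw)

def normalize(text):
    """NORMA-lize: apply Unicode NFC and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())

def mine(text):
    """MINE: a crude binomial-name regex standing in for a species plugin."""
    return re.findall(r"\b[A-Z][a-z]+ [a-z]{3,}\b", text)

def catalogue(facts_by_doc):
    """CAT-alogue: invert {doc: facts} into a searchable {fact: docs} index."""
    index = defaultdict(list)
    for doc_id, facts in facts_by_doc.items():
        for fact in facts:
            index[fact].append(doc_id)
    return dict(index)

facts_by_doc = {doc_id: mine(normalize(scrape(raw))) for doc_id, raw in crawl()}
index = catalogue(facts_by_doc)
print(index)
```

Each stage takes the previous stage's output and nothing else, which is what would let a daily automated run (the CANARY step on the slide) rerun the whole chain unattended.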
  • 53. quickscrape Crawl Feed Norma Index & Transform TXT XML URL DOI Scientific literature Repositories DOC CSV sHTML Plugins Regex Sequences Species Bespoke Scrapers XPath Per-Journal Taggers Per-Journal Metadata Chemistry Phylogenetics Farming AMI BadHTML OCR Diagrams Open NORMA-lized Scientific Literature + Facts CANARY pipeline CAT-alogue index PDF
  • 54. https://commons.wikimedia.org/wiki/File:Flickr_-_DVIDSHUB_-_RSP_Warrior_Challenge_Prepares_Soldiers_Mentally,_Physically_%281%29.jpg CRAWLing the Literature NO Central Table of Contents Massive technical, political, legal opposition Little interest from Academia Tedious Few general tools
  • 55. The Right to Read is The Right To Mine PMR in 2012: http://blog.okfn.org/2012/06/01/the-right-to-read-is-the-right-to-mine/
  • 56. SCRAPE https://en.wikipedia.org/wiki/Gleaning#mediaviewer/File:Millet_Gleaners.jpg PublicDomain PDF HTML XML quickscrape* *Scrapers created by Richard Smith-Unna + Community HTML PDF XML PNG SVG CSV DOC LaTeX CIF … Non-standard per-publisher site
  • 57. https://en.wikipedia.org/wiki/W._Heath_Robinson#mediaviewer/File:Robinson%28WH%29-%28%27Uncle_Lubin%27%29.jpg PublicDomain NORMA-lization of Scientific Literature PDFs, Broken HTML PNGs for Math, etc. NORMA Unicode Diacritics Well-formed Sectioned Tagged SVG diagrams
  • 58. AMI-plugins • BagOfWords, Stemming and Regular Expressions • Species • Biological Sequences • Chemical compounds & reactions • Farming * (Rory Aaronson) • Crystallography * (Saulius Grazulis, COD) • Clinical Trials * (Amy Price) • Phylogenetics * (Ross Mounce) • Phytochemistry * (Chris Steinbeck, PMR) * subcommunities
  • 59. Text-based plugins • Bag of words (https://en.wikipedia.org/wiki/Bag-of-words_model) • https://en.wikipedia.org/wiki/Tf%E2%80%93idf (Term-frequency, inverse document frequency) • Templates and regexes (regular expressions).
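The first two text-based techniques on slide 59 fit in a few lines of standard-library Python. A minimal sketch, assuming three made-up snippets in place of full-text articles and one common tf-idf variant (raw term frequency times natural-log inverse document frequency, with no smoothing); the helper names are illustrative:

```python
import math
import re
from collections import Counter

# Three made-up snippets standing in for full-text articles.
docs = {
    "a": "the trial enrolled patients and the trial was randomised",
    "b": "the mice were infected and the bacterial load was measured",
    "c": "the patients received the drug",
}

def bag_of_words(text):
    """Count lower-cased word tokens; word order is discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

bags = {name: bag_of_words(text) for name, text in docs.items()}

def tf_idf(term, name):
    """Term frequency in one document times log inverse document frequency."""
    tf = bags[name][term] / sum(bags[name].values())
    df = sum(1 for bag in bags.values() if term in bag)
    return tf * math.log(len(bags) / df) if df else 0.0

for term in ("the", "trial", "patients"):
    print(term, round(tf_idf(term, "a"), 3))
```

A word that occurs in every document ("the") scores zero, while a word concentrated in one document ("trial") scores highest, which is why tf-idf surfaces the terms that characterise a paper.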
  • 60. “Bag of Words” Three fulltext articles from trialsjournal.com
  • 61. Regular Expressions for Systematic Reviews of Animal Tests Preceding Text Following Text Extracted term
  • 62. “nuggets” in a scientific paper quantity units Value ranges Humans aren’t designed to mine this …  chemical project places
  • 64. Open Content Mining of FACTs Machines can interpret chemical reactions We have done 500,000 patents. There are > 3,000,000 reactions/year. Added value > 1B Eur.
  • 66. Bitmap Image and Tesseract OCR (plot: Ln bacterial load per fly against days post-infection)
  • 70. AMI https://bitbucket.org/petermr/xhtml2stm/wiki/Home Example reaction scheme, taken from MDPI Metabolites 2012, 2, 100-133; page 8, CC-BY: AMI reads the complete diagram, recognizes the paths and generates the molecules. Then she creates a stop-frame animation showing how the 12 reactions lead into each other.
  • 72. Peter Murray-Rust BMC publisher Blue Obelisk paper (20 co-authors) Sub-network From CATalog
  • 73. Phytochemistry extraction O. dayi “volatile composition of “ A.sibeiri A. judaica Displayed by CAT (CottageLabs)
  • 74. What we can do • Recognize and promote autonomous sub-communities • Engage Early Career Researchers, including undergraduates, and let THEM BUILD the systems. • COMMUNALLY build tools for data checking • Insist on semantic data input, even if it costs submissions

Editor's Notes

  1. Hi, I’m here to talk about AMI, a data extraction framework and tool. First, I just want to highlight some of the key contributors to the project: Andy for his work on the ChemistryVisitor and Peter for the overall architecture. In this talk, I’m going to stress the importance of data in a specific format and its utility for automated machine processing. Then I’m going to demonstrate AMI’s architecture and the transformation of data as it flows through the process. I’m going to dwell a little on a core format used, Scalable Vector Graphics (SVG), before introducing the concept of visitors, which are pluggable, context-specific data extractors. Next, I’m going to introduce Andy’s ChemVisitor, for extracting semantic chemistry data, along with a few other visitors that can process non-chemistry-specific data. Finally, I will demonstrate some uses of the ChemVisitor within the realm of validation and metabolism.