Published on Nov 13, 2014 by PMR
Scientific information is often hidden or not published properly. The ContentMine is a Social Machine consisting of semantic software and communities of domain expertise; it aims to liberate all scientific facts from the published literature on a daily basis.
The talk , delivered to the Computational Institute, will be /was followed by a hands-on workshop learning how to use the technology and work as a community.
Scientific information is often hidden or not published properly. The ContentMine is a Social Machine consisting of semantic software and communities of domain expertise; it aims to liberate all scientific facts from the published literature on a daily basis.
The talk , delivered to the Computational Institute, will be /was followed by a hands-on workshop learning how to use the technology and work as a community.
An overview of ContentMining for JISC (the infrastructure provider of UK academia). Examples, details leading to hands-on exercise (http://contentmine.org/workflow
contentmine.org (funded by Shuttleworth Foundation) has developed tools and workshops to allow anyone to mine scientific content. This 10-minute presentation at Wellcome Trust encourages you to become involved - no previous knowledge required.
Open Data and Open Science presented in Rio for Open Science 2014-08-22. I argue that Open Notebook Science is the way forward and will lead to great benefits
Can Computers understand the scientific literature (includes compscie material)petermurrayrust
With the semantic web machines can autonomously carry out many knowledge-based tasks as well as humans. The main problems are not technical but the prevention of access to information. I advocate automatic downloading and indexing of all scientific information
Scientific information is often hidden or not published properly. The ContentMine is a Social Machine consisting of semantic software and communities of domain expertise; it aims to liberate all scientific facts from the published literature on a daily basis.
The talk , delivered to the Computational Institute, will be /was followed by a hands-on workshop learning how to use the technology and work as a community.
An overview of ContentMining for JISC (the infrastructure provider of UK academia). Examples, details leading to hands-on exercise (http://contentmine.org/workflow
contentmine.org (funded by Shuttleworth Foundation) has developed tools and workshops to allow anyone to mine scientific content. This 10-minute presentation at Wellcome Trust encourages you to become involved - no previous knowledge required.
Open Data and Open Science presented in Rio for Open Science 2014-08-22. I argue that Open Notebook Science is the way forward and will lead to great benefits
Can Computers understand the scientific literature (includes compscie material)petermurrayrust
With the semantic web machines can autonomously carry out many knowledge-based tasks as well as humans. The main problems are not technical but the prevention of access to information. I advocate automatic downloading and indexing of all scientific information
Copyright is one of the greatest barrier to Open Data. This presentation for insidegovernment UK shows the struggle between those who want to reform copyright and those opposed to reform
Maths, Chemistry, Physics are very well suited for the Semantic Web, but very poorly represented. Here I show how valuable it can be and what (relatively little) needs to be dome
A 5-minute presentation at University of Edinburgh for UK Ontology Workshop 2013-04-11. The animals demonstrate that ontologies can be simple and lament the lack og good ontologies in most of physical science, especially computational chemistry. Blog at http://blogs.ch.cam.ac.uk/pmr
Automatic Extraction of Science and Medicine from the scholarly literatureTheContentMine
Published on Jun 04, 2015 by PMR
Many scientists have to extract many facts out the scholarly literature - to evaluate other work or to extract useful collections of facts. This shows the approach, especially for systematic reviews of animal or clinical trials
Jean-Claude Bradley was a pioneer of doing Open Science and on 2014-07-14 we held a memorial meeting in Cambridge (see also http://inmemoriamjcb.wikispaces.com/Jean-Claude+Bradley+Memorial+Symposium)
PhD Theses are normally locked away digitally. They cost 20 billion dollars to create and we waste much of this value. By making them open we can use software to read, index, reuse, compute and add massive value
ContentMine: Liberating scholarship from Open publications and thesesTheContentMine
Published on Apr 21, 2015 by PMR
Theses represent a huge amount of untapped value. We show how contentmine.org technology can be used to mine them and extract knowledge
The scientific and medical literature is a vast resource of knowledge, but it needs turning into semantic FAIR form. The ContentMine can do this and we presented a rapid overview of the potential
Keynote talk to LEARN (LERU/H2020 project) for research data management. Emphasizes that problems are cultural not technical. Promotes modern approaches such as Git / continuousIntegration, announces DAT. Asserts that the Right to Read in the Right to Mine. Calls for widespread development of contentmining (TDM)
Copyright is one of the greatest barrier to Open Data. This presentation for insidegovernment UK shows the struggle between those who want to reform copyright and those opposed to reform
Maths, Chemistry, Physics are very well suited for the Semantic Web, but very poorly represented. Here I show how valuable it can be and what (relatively little) needs to be dome
A 5-minute presentation at University of Edinburgh for UK Ontology Workshop 2013-04-11. The animals demonstrate that ontologies can be simple and lament the lack og good ontologies in most of physical science, especially computational chemistry. Blog at http://blogs.ch.cam.ac.uk/pmr
Automatic Extraction of Science and Medicine from the scholarly literatureTheContentMine
Published on Jun 04, 2015 by PMR
Many scientists have to extract many facts out the scholarly literature - to evaluate other work or to extract useful collections of facts. This shows the approach, especially for systematic reviews of animal or clinical trials
Jean-Claude Bradley was a pioneer of doing Open Science and on 2014-07-14 we held a memorial meeting in Cambridge (see also http://inmemoriamjcb.wikispaces.com/Jean-Claude+Bradley+Memorial+Symposium)
PhD Theses are normally locked away digitally. They cost 20 billion dollars to create and we waste much of this value. By making them open we can use software to read, index, reuse, compute and add massive value
ContentMine: Liberating scholarship from Open publications and thesesTheContentMine
Published on Apr 21, 2015 by PMR
Theses represent a huge amount of untapped value. We show how contentmine.org technology can be used to mine them and extract knowledge
The scientific and medical literature is a vast resource of knowledge, but it needs turning into semantic FAIR form. The ContentMine can do this and we presented a rapid overview of the potential
Keynote talk to LEARN (LERU/H2020 project) for research data management. Emphasizes that problems are cultural not technical. Promotes modern approaches such as Git / continuousIntegration, announces DAT. Asserts that the Right to Read in the Right to Mine. Calls for widespread development of contentmining (TDM)
Dagger 2 Injeção de dependências no mundo AndroidClerton Leal
Uma palestra sobre o histórico do uso de injeção de dependências no mundo android, passando pelo Guice, Dagger e Dagger 2. Link para o código de exemplo no ultimo slide.
Are you facing problem with your LAPTOP or TABLET, searching one permenant solutation for all these issues. Comeon step into Amcsquare is the Solutation for all your problems.
A presentation by Open Climate Knowledge for European Forum for Advanced Practices. Showing how the scientific literature can be searched for knowledge on this multidisciplinary topic.
Basics of ContentMining presented to Synthetic Biologists. This was followed by a lively discussion of what components could be extracted from the literature
Published on May 18, 2015 by PMR
Basics of ContentMining presented to Synthetic Biologists. This was followed by a lively discussion of what components could be extracted from the literature
Automatic Extraction of Science and Medicine from the scholarly literaturepetermurrayrust
Many scientists have to extract many facts out the scholarly literature - to evaluate other work or to extract useful collections of facts. This shows the approach, especially for systematic reviews of animal or clinical trials
Paradise Lost and The Right to Read is the Right to Minepetermurrayrust
Presented to UIUC CIRSS seminars to a mixed group of Library, CS, domain scientists with a great contingent of Early Career Researchers. Starts by honouring the creation of the wonderful NCSA Mosaic at UIUC in 1993 and the paradise of knowledge and community it opened. Then shows the gradual and tragic decline of the web into a megacorporate neocolonialist empire, where knowledge is sacrificed for money and power.
You have seen many of the slides before but the words are different and have been recorded.
Opening talk at the "Interdisciplinary Data Resources to Address the Challenges of Urban Living” Workshop at the Urban Big Data Centre, University of Glasgow, 4 April 2016
Can Computers understand the scientific literature (includes compscie material)TheContentMine
Published on Jan 24, 2014 by PMR
With the semantic web machines can autonomously carry out many knowledge-based tasks as well as humans. The main problems are not technical but the prevention of access to information. I advocate automatic downloading and indexing of all scientific information
Published on Jan 29, 2016 by PMR
Keynote talk to LEARN (LERU/H2020 project) for research data management. Emphasizes that problems are cultural not technical. Promotes modern approaches such as Git / continuous Integration, announces DAT. Asserts that the Right to Read in the Right to Mine. Calls for widespread development of content mining (TDM)
The Culture of Research Data, by Peter Murray-RustLEARN Project
1st LEARN Workshop. Embedding Research Data as part of the research cycle. 29 Jan 2016. Presentation by Peter Murray-Rust, ContentMine.org and University of Cambridge
The ContentMine system (Open Source) can search EuropePMC and download hundreds of articles in seconds. These can be indexed by AMI dictionaries allowing a rapid evaluations and refinement of the search
Published on Mar 05, 2015 by PMR
contentmine.org (funded by Shuttleworth Foundation) has developed tools and workshops to allow anyone to mine scientific content. This 10-minute presentation at Wellcome Trust encourages you to become involved - no previous knowledge required.
A dystopian view of our evolving knowledge infrastructure. Talk in session "Reproducibility in new digital scholarship – bigger, faster, better?" at the Alan Turing Institute Symposium on Reproducibility for Data Centric Research, St Hugh's, Oxford, 7th April 2016
ContentMining for France and Europe; Lessons from 2 years in UKpetermurrayrust
I have spend 2 years carrying out Content Mining (aka Text and Data Mining) in the UK under the 2014 "Hargreaves" exception. This talk was given in Paris, to ADBU , after France had passed the law of the numeric Republique. I illustrate what worked in what did not and why and offer ideas to France and Europe
Keynote talk at the Web Science Summer School, Singapore, 8 December 2014. Today we see the rise of Social Machines, like Twitter, Wikipedia and Galaxy Zoo—where communities identify and solve their own problems, harnessing commitment, local knowledge and embedded skills, without having to rely on experts or governments.
The Social Machines paradigm provides a lens onto the interacting sociotechnical systems of our hybrid digital-physical world, citizen-centric and at scale—emphasising empowerment and sociality in a world of pervasive technology adoption and automation.
This talk will present the Social Machines paradigm as an approach to social media analytics and a rethinking of our scholarly practices and knowledge infrastructure.
Similar to ContentMine: Open Data and Social Machines (20)
High throughput mining of the scholarly literature TheContentMine
Published on Jun 7, 2016 by PMR
Talk given to statisticians in Tilburg, with emphasis on scholarly comms for detecting unusual features. Includes demo of Amanuens.is and image mining
Amanuens.is HUmans and machines annotating scholarly literature TheContentMine
Published on May 19, 2016 by PMR
about 10,000 scholarly articles ("papers") are published each day. Amanuens.is is a symbiont of ContentMine and Hypothes.is (both Shuttleworth projects/Fellows) which annotates theses using an array of controlled vocabularies ("dictionaries"). The results, in semantic form are used to annotate the original material. The talk had live demos and used plant chemistry as the examples
Published on May 18, 2016 by PMR
Talk to EBI Industry group on Open Software for chemical and pharmaceutical sciences. Covers examples of chemistry , wit demos, and argues that all public knowledge should be Openly accessible
Automatic Extraction of Knowledge from the LiteratureTheContentMine
Published on May 11, 2016 by PMR
ContentMine tools (and the Harvest alliance) can be used to search the literature for knowledge, especially in biomedicine. All tools are Open and shortly we shall be indexing the complete daily scholarly literature
Automatic Extraction of Knowledge from Biomedical literature TheContentMine
Published on Mar 16, 2016 by PMR
A plenary lecture to Cochrane Collaboration in Birmingham, on the value of automatically extracting knowledge. Covers the Why? How? What? Who? and problems and invites collaboration
Liberating facts from the scientific literature - Jisc Digifest 2016 TheContentMine
Published on Mar 4, 2016 by PMR
Text and data mining (TDM) techniques can be applied to a wide range of materials, from published research papers, books and theses, to cultural heritage materials, digitised collections, administrative and management reports and documentation, etc. Use cases include academic research, resource discovery and business intelligence.
This workshop will show the value and benefits of TDM techniques and demonstrate how ContentMine aims to liberate 100,000,000 facts from the scientific literature, and ContentMine will provide a hands on demo on a topical and accessible scientific/medical subject.
Published on Feb 29, 2016 by PMR
An overview of Text and Data Mining (ContentMining) including live demonstrations. The fundamentals: discover, scrape, normalize , facet/index, analyze, publish are exemplified using the recent Zika outbreak. Mining covers textual and non-textual content and examples of chemistry and phylogenetic tress are given.
Published on Feb 07, 2016 by PMR
Use of ContentMine tools on the Open Access subset of EuropePubMedCentral to discover new knowledge about the Zika virus. Includes clips of the software in action
Published on Jan 27, 2016 by PMR
We have developed image processing techniques to extract data from diagrams used in science and scientific publications. These slides were presented at a workshop session for the Cambridge MPhil in Computational biology. There is an overview of the main techniques for cleaning diagrams, such as thresholding, binarization, edge detection and thinning. Examples are given from plots, phylogenetic trees, chemistry and neuroscience spikes. All software is Open Source and most is Java
Digital Scholarship: Enlightenment or Devastated Landscape? TheContentMine
Published on Dec 17, 2015 by PMR
Every year 500 Billion USD of public funding is spent on research, but much of this lies hidden in papers that are never read. I describe how machines can help us to read the literature. However there is massive opposition from publishers who are trying to prevent open scholarship and who build walled gardens that they control
Published on Jul 21, 2014 by PMR
Jean-Claude Bradley was a pioneer of doing Open Science and on 2014-07-14 we held a memorial meeting in Cambridge (see also http://inmemoriamjcb.wikispaces.com/Jean-Claude+Bradley+Memorial+Symposium)
Published on Jul 24, 2014 by PMR
PhD Theses are normally locked away digitally. They cost 20 billion dollars to create and we waste much of this value. By making them open we can use software to read, index, reuse, compute and add massive value
Published on Aug 22, 2014 by PMR
Open Data and Open Science presented in Rio for Open Science 2014-08-22. I argue that Open Notebook Science is the way forward and will lead to great benefits
Published on Nov 26, 2014 by PMR
Followup meeting in London to OpenCon2014, on the need for different models of scholarly communication. I explore the history of 20thC academic student-based revolutions, with special relevance to young people and the scope for action today.
Published on Dec 01, 2014 by PMR
An overview of ContentMining for JISC (the infrastructure provider of UK academia). Examples, details leading to hands-on exercise (http://contentmine.org/workflow
What is greenhouse gasses and how many gasses are there to affect the Earth.moosaasad1975
What are greenhouse gasses how they affect the earth and its environment what is the future of the environment and earth how the weather and the climate effects.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest
imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters
spanning 0.4−0.9µm) and novel JWST images with 14 filters spanning 0.8−5µm, including 7 mediumband filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data
at > 2.3µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and
30.3-31.0 AB mag (5σ, r = 0.1” circular aperture) in individual filters. We measure photometric
redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts
z = 11.5 − 15. These objects show compact half-light radii of R1/2 ∼ 50 − 200pc, stellar masses of
M⋆ ∼ 107−108M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr−1
. Our search finds no candidates
at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to
infer the properties of the evolving luminosity function without binning in redshift or luminosity that
marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the
impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results,
and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5
from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical
models for evolution of the dark matter halo mass function.
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...University of Maribor
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
Richard's aventures in two entangled wonderlandsRichard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Studia Poinsotiana
I Introduction
II Subalternation and Theology
III Theology and Dogmatic Declarations
IV The Mixed Principles of Theology
V Virtual Revelation: The Unity of Theology
VI Theology as a Natural Science
VII Theology’s Certitude
VIII Conclusion
Notes
Bibliography
All the contents are fully attributable to the author, Doctor Victor Salas. Should you wish to get this text republished, get in touch with the author or the editorial committee of the Studia Poinsotiana. Insofar as possible, we will be happy to broker your contact.
This presentation explores a brief idea about the structural and functional attributes of nucleotides, the structure and function of genetic materials along with the impact of UV rays and pH upon them.
Nutraceutical market, scope and growth: Herbal drug technologyLokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people want natural and preventative health solutions more and more, this industry is increasing quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant chances for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
Seminar of U.V. Spectroscopy by SAMIR PANDASAMIR PANDA
Spectroscopy is a branch of science dealing the study of interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflect spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that can measure the amount of light received by the analyte.
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptxRASHMI M G
Abnormal or anomalous secondary growth in plants. It defines secondary growth as an increase in plant girth due to vascular cambium or cork cambium. Anomalous secondary growth does not follow the normal pattern of a single vascular cambium producing xylem internally and phloem externally.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...University of Maribor
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills MN
Travis Hills of Minnesota developed a method to convert waste into high-value dry fertilizer, significantly enriching soil quality. By providing farmers with a valuable resource derived from waste, Travis Hills helps enhance farm profitability while promoting environmental stewardship. Travis Hills' sustainable practices lead to cost savings and increased revenue for farmers by improving resource efficiency and reducing waste.
1. ContentMine: Open data
And Social Machines
Peter Murray-Rust
,
Computation Lab, Univ of Chicago, 2014-11-12
2. ContentMine: We use machines to
liberate[1] 100 million facts /yr from
the scientific scholarly literature and
make them free for everyone
(WikiData)
WikiData and ContentMines are social
machines
There are no longer any technical
obstacles, only people.
[1] Friday workshop: build your own social machine: scraping XML,
4. http://en.wikipedia.org/wiki/Tim_Berners-Lee
Everything in this presentation is ODOSOS
(Open Data, Open Standards, Open Source)
CC0, CC-BY, W3C etc., Apache2, etc.
Open = “Free to use, re-use and redistribute
http://contentmine.org
http://bitbucket.org/petermr
http://wwmm.ch.cam.ac.uk
A promise: I (Petermr) will never sell out to non-transparent organizations.
5. http://www.budapestopenaccessinitiative.org/read
… an unprecedented public good. …
… completely free and unrestricted access to [peer-
reviewed literature] by all scientists, scholars, teachers,
students, and other curious minds. …
…Removing access barriers to this literature will
accelerate research, enrich education, share the
learning of the rich with the poor and the poor with
the rich, make this literature as useful as it can be, and
lay the foundation for uniting humanity in a common
intellectual conversation and quest for knowledge.
(Budapest Open Access Initiative, 2003)
6. Scientific and Medical publication (STM)[+]
• World Citizens pay $400,000,000,000…
• … for research in 1,500,000 articles …
• … cost $300,000 each to create …
• … $7000 each to “publish” [*]…
• … $10,000,000,000 from academic libraries …
• … to “publishers” who forbid access to 99.9% of
citizens of the world …
[+] Figures probably +- 50 %
[*] arXiV preprint server costs $7 USD per paper
7. petermr: I believe in Wikipedia
• 2006 http://en.wikipedia.org/wiki/User:Petermr
• 2006 started Open Data (term unknown then!)
• 2009: “the bit of Wikipedia that I wrote is correct” [challenging the
idea of “WP is junk”]
• 2009: “Wikipedia is the digital library of this century”
• 2012: I alert WP that Springer has copyrighted > 1000 of our
images [Springergate]
• 2014: “For facts in maths, physical and biological sciences I trust
Wikipedia.” (Wikimania2014)
12. …three problems—flawed design, non-
publication, and poor reporting—together
meant >85% of research funds were wasted, a
global total loss >100 billion USD per year.
[Lancet 2009]
[Even more] waste clearly occurs after
publication: from poor access, poor
dissemination, and poor uptake of the findings
of research. [PLOS Medicine 2014-05-27]
Bad publication wastes science
13. Publishers’ PDFs destroy science
PDFs do not contain words
or subscripts!
PDFs do not contain tables
and do not have columns
SVG is turned into JPEG because it’s easier to process
15. STM Publishers Licence
2012_03_15_Sample_Licence_Text_Data_Mining.pdf
(Summary: PMR has NO rights)
• [cannot publish to: ] “libraries, repositories, or archives”
• [cannot] “Make the results of any TDM Output available on an externally facing server or
website”
• “Subscriber shall pay a […] fee”
Heather Piwowar: “negotiating with publishers [made me physically ill]”
WE WALKED OUT
• Brit Library
• JISC
• RLUK
• OKFN
• …
• Ross Mounce
• PM-R
Licences destroy Content Mining
18. The scientist’s amanuensis
• "The bane of my life is doing things I know computers could do
for me" (Dan Connolly, W3C)
Example: A semantic amanuensis could
• Give me a daily digest of mineralogy papers
• Extract all the crystal structures from them
• Compute physical properties with GULP and NWChem
• Compare the results statistically
• Preserve and distribute the complete operation
• Prepare the results for publication
The semantic web is having a personal amanuensis
19. Artificial Intelligence in science
In 1970 chess and chemistry were the sandboxes for AI. Some
approaches:
• Lookup (Knowledge)
• Natural Language Processing (NLP)
• Brute force calculation (inc. physical methods)
• Tree-pruning and heuristics
• Logic (cf. OWL-DL)
• Human-machine integration (crowdsourcing)
• Computer Vision
Domain-specific Turing test: Can a machine pass a first-year
chemistry exam?
20. The Semantic Web
"The Semantic Web is an extension of the
current web in which information is given well-
defined meaning, better enabling computers
and people to work in cooperation."
Tim Berners-Lee, James Hendler, Ora Lassila, The
Semantic Web, Scientific American, May 2001
CC-BY-SA Images from Wikipedia
21. “Which Rivers flow into the Rhine and are longer
than 50 kilometers?” or “Which Skyscrapers
in China have more than 50 floors and have
been constructed before the year 2000?”
Open Crystallography?
“Which countries where tropical diseases are
endemic have published structures of chiral
natural products?”
Linked Open data from Wikipedia
CC-BY-SA from Wikipedia
22. The Right to Read is the Right to Mine
http://contentmine.org
23. • Science can be read and understood by
human-machine Amanuensis-symbionts.
• Amanuenses are based on Wikipedia,
databases and software (e.g. ContentMine’s
AMI)
• The results are fed back into WP and WikiData
http://en.wikipedia.org/wiki/Symbiosishttp://en.wikipedia.org/wiki/Eric_Fenby
24. • Crawl scientific literature
(Open Bibliography)
• Scrape each scientific article
(ContentMine-quickscrape)
• Extract the facts (ContentMine-AMI)
• Index (Wikipedia)
• Republish (WikiData)
Machine Extraction of scientific facts
25. RSU: Richard Smith-Unna
PMR: Peter Murray-Rust
CL: CottageLabs
Queues
Repos
Scientific
literature
Science
Plugins
Science
Volunteers
26. Linked Open Data – the world’s knowledge
very little physical science
http://upload.wikimedia.org/wikipedia/commons/3/34/LOD_Cloud_Diagram_as_of_September_2011.png
DBPedia
BIO
Comp
Lib
PDB
Ontologies
GOV
GOV.uk
Music,
Art
Literature
Social
Knowledge
bases
RDF
triples
27. Part of a COD RDF entry
The Semantic Web understands this
28. Mathematics Markup Language
Energy of c.c.p lattice of argon
4 pages clippedHuman-friendly
Machine-
friendly
Many editors and tools exist
We used MathWeaver
Automatic!
MathML
32. Semantic network closes the loop
Data mined from
document
Data available for
e-science and re-
use
Computation
Measurement
Semantic
Authoring
Community
Analysis
33. The network grows autonomously
Machine-machine
Machine-human
Human-machine
Human-human
41. Open Content Mining of FACTs
Machines can interpret chemical reactions
We have done 500,000 patents. There are >
3,000,000 reactions/year. Added value > 1B Eur.
42. But we can now
turn PDFs into
Science
We can’t turn a hamburger into a cow
48. Chemical Optical Character Recognition
Small alphabet, clean typefaces, clear boundaries make
this relatively tractable. Problems are “I” “O” etc.
49. AMI https://bitbucket.org/petermr/xhtml2stm/wiki/Home
Example reaction scheme, taken from MDPI Metabolites 2012, 2, 100-133; page 8, CC-BY:
AMI reads the complete diagram,
recognizes the paths and
generates the molecules. Then
she creates a stop-fram animation
showing how the 12 reactions
lead into each other
CLICK HERE FOR ANIMATION
(may be browser dependent)
51. Bacterial WP_phylogenetic tree
Our machines have read and interpreted 4300 in an hour with > 95% accuracy
Trees From http://ijs.sgmjournals.org/ used under new UK legislation (Hargreaves)
WP: Clostridium_butyricum
Genbank ID
American Type
Culture Collection
52. (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0036933 –
“Adaptive Evolution of HIV at HLA Epitopes Is Associated with Ethnicity in Canada” .
((n122,((n121,n205),((n39,(n84,((((n35,n98),n191),n22),n17))),((n10,n182),(
(((n232,n76),n68),(n109,n30)),(n73,(n106,n58))))))),((((((n103,n86),(n218,(n
215,n157))),((n164,n143),((n190,((n108,n177),(n192,n220))),((n233,n187),
n41)))),((((n59,n184),((n134,n200),(n137,(n212,((n92,n209),n29))))),(n88,(n
102,n161))),((((n70,n140),(n18,n188)),(n49,((n123,n132),(n219,n198)))),(((
n37,(n65,n46)),(n135,(n11,(n113,n142)))),(n210,((n69,(n216,n36)),(n231,n1
60))))))),(((n107,n43),((n149,n199),n74)),(((n101,(n19,n54)),n96),(n7,((n139
,n5),((n170,(n25,n75)),(n146,(n154,(n194,(((n14,n116),n112),(n126,n222)))
)))))))),(((((n165,(n168,n128)),n129),((n114,n181),(n48,n118))),((n158,(n91,(
n33,n213))),(n87,n235))),((n197,(n175,n117)),(n196,((n171,(n163,n227)),((
n53,n131),n159)))))));
http://en.wikipedia.org/wiki/Digital_image_processing
http://en.wikipedia.org/wiki/Newick_format http://en.wikipedia.org/wiki/Phylogenetics
53. Open notebook science is the practice of
making the entire primary record of a research
project publicly available online as it is
recorded. (WP)
Jean-Claude Bradley was a chemist who
actively promoted Open Science in
chemistry,… He coined the term Open
Notebook Science. … A memorial
symposium was held July 14, 2014 at
Cambridge University, UK.[9]
54. RSU: Richard Smith-Unna
PMR: Peter Murray-Rust
CL: CottageLabs
Queues
Repos
Scientific
literature
Science
Plugins
Science
Volunteers
55. Thanks
• Shuttleworth Foundation and Fellowship
• Contentmine.org: Michelle Brook, Jenny Molloy,
Ross Mounce, Richard Smith-Unna,
CottageLabs, Charles Oppenheim
• Open Knowledge Foundation Community
• Wikimedia Community
• Blue Obelisk Community
56. My/our Dream
• An Open Bibliography of science, updated
daily
• An interface for ContentMine to feed new
facts into WikiData
• Domain-specific enthusiasts to create and run
fact extraction and validation
• Wikipedia to become a C21 publisher of
reference science