Maths, chemistry and physics are very well suited to the Semantic Web but very poorly represented on it. Here I show how valuable it can be and what (relatively little) needs to be done.
1. The Semantic Web in Physical Science/Engineering
Peter Murray-Rust
University of Cambridge
Open Knowledge Foundation
Culham Laboratory, 2013-09-11, UK
2. Themes
To make the complete scientific literature
accessible to machines and humans
• The Semantic Web
• The power and need for Open
• Building Communities
• Multidisciplinarity
Funding includes JISC, Unilever, EPSRC.
3. The Semantic Web
"The Semantic Web is an extension of the
current web in which information is given well-defined meaning, better enabling computers
and people to work in cooperation."
Tim Berners-Lee, James Hendler, Ora Lassila, The
Semantic Web, Scientific American, May 2001
4. The scientist’s amanuensis
• "The bane of my life is doing things I know computers could do
for me" (Dan Connolly, W3C)
Example: A semantic amanuensis could
• Give me a daily digest of zeolite papers
• Extract all the crystal structures from them
• Compute physical properties with GULP and NWChem
• Compare the results statistically
• Preserve and distribute the complete operation
• Prepare the results for publication
The semantic web is having a personal amanuensis
5. Linked Open Data – the world’s knowledge
[Figure: Linked Open Data cloud diagram. Labels: RDF triples; Music; Social; Art; Literature; Knowledge bases; DBPedia; Lib; GOV.uk; Comp; PDB; GOV; Ontologies; BIO. Annotation: very little physical science]
http://upload.wikimedia.org/wikipedia/commons/3/34/LOD_Cloud_Diagram_as_of_September_2011.png
6. Linked Open data from Wikipedia
“Which Rivers flow into the Rhine and are longer
than 50 kilometers?” or “Which Skyscrapers
in China have more than 50 floors and have
been constructed before the year 2000?”
Open Crystallography?
“Which countries where tropical diseases are
endemic have published structures of chiral
natural products?”
CC-BY-SA from Wikipedia
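Questions like the Rhine example are answered by matching patterns against RDF triples. The following is a minimal sketch in plain Python rather than SPARQL. The `ex:` identifiers, the `ex:flowsInto` and `ex:lengthKm` predicates and the `match` and `rivers_into` helpers are all hypothetical, and the lengths are approximate values for illustration only; this is not DBpedia's actual vocabulary.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, object]

# Illustrative triples only: identifiers and predicates are hypothetical,
# and lengths (km) are approximate values used for demonstration.
TRIPLES: List[Triple] = [
    ("ex:Main", "ex:flowsInto", "ex:Rhine"),
    ("ex:Main", "ex:lengthKm", 525),
    ("ex:Neckar", "ex:flowsInto", "ex:Rhine"),
    ("ex:Neckar", "ex:lengthKm", 362),
    ("ex:ShortStream", "ex:flowsInto", "ex:Rhine"),  # made-up river
    ("ex:ShortStream", "ex:lengthKm", 12),
    ("ex:Inn", "ex:flowsInto", "ex:Danube"),
    ("ex:Inn", "ex:lengthKm", 518),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[object] = None) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) \
                and (o is None or t[2] == o):
            yield t


def rivers_into(triples: List[Triple], target: str, min_km: float) -> List[str]:
    """Rivers that flow into `target` and are longer than `min_km`."""
    found = []
    for river, _, _ in match(triples, p="ex:flowsInto", o=target):
        # Join on the shared subject: look up the length of the same river.
        for _, _, length in match(triples, s=river, p="ex:lengthKm"):
            if length > min_km:
                found.append(river)
    return sorted(found)


if __name__ == "__main__":
    print(rivers_into(TRIPLES, "ex:Rhine", 50))
```

The same two-pattern join is what a SPARQL engine performs at scale. The crystallography question above would need structures, countries and diseases published as triples in the same way.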
7. Semantics: (Things Take Time)*
• 1994 1st WWW Conference
• 1994 Chemical MIME, Chemical Markup Language (Henry Rzepa, PMR)
• 2001 UK eScience programme, eMinerals
• 2005 Materials Grid (Martin Dove group)
• 2006 Blue Obelisk (Open Source chemistry)
• 2011 PNNL (US) meetings and visit
• 2012 Semantic Physical Science (Cambridge)
*TTT: Piet Hein
12. MathML
Mathematics Markup Language
Energy of c.c.p lattice of argon
Automatic!
Human-friendly
4 pages clipped
Many editors and tools exist
We used MathWeaver
Machine-friendly
14. Current scientific information flow
… is broken for data-rich science
[Diagram labels: Non-semantic data; PDF; Lineprinter output; Human input; Text files; Data extraction difficult and incomplete; Human readers]
15. Semantic network closes the loop
[Diagram labels: Measurement; Computation; Semantic Authoring; Analysis; Community; Data available for e-science and reuse; Data mined from document]
16. The network grows autonomously
Human-machine
Human-human
Machine-human
Machine-machine
24. Materials Search Challenge
• What would you like a “Google for materials”
to find for you in the scientific literature?
25. Tim Berners-Lee’s Open data
http://5stardata.info
★ make your stuff available on the Web (whatever format) under an OPEN license
★★ make it available as structured data (i.e. NOT PDF)
★★★ use non-proprietary formats (e.g., CSV)
★★★★ use URIs to denote things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
Resources placed on this scale in the slide: CIFDIC, ACS, IUCr, CRYSTALEYE
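The five criteria on this slide are cumulative: each star presupposes the ones before it. A minimal sketch of that rule as a scoring function follows. The `Dataset` fields and the `star_rating` name are hypothetical, chosen only to mirror the slide's wording.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (one flag per star)."""
    on_web_open_licence: bool      # 1 star: on the Web under an open licence
    structured: bool               # 2 stars: structured data, not PDF
    non_proprietary_format: bool   # 3 stars: e.g. CSV
    uses_uris: bool                # 4 stars: URIs denote things
    linked_to_other_data: bool     # 5 stars: linked to other data


def star_rating(d: Dataset) -> int:
    """Count stars cumulatively: stop at the first criterion not met."""
    criteria = [
        d.on_web_open_licence,
        d.structured,
        d.non_proprietary_format,
        d.uses_uris,
        d.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


if __name__ == "__main__":
    pdf_supplement = Dataset(True, False, False, False, False)
    open_csv = Dataset(True, True, True, False, False)
    print(star_rating(pdf_supplement), star_rating(open_csv))
```

Because the count stops at the first unmet criterion, a dataset that uses URIs and links but has no open licence scores zero under this reading.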
29. Semantic authoring IUCr
• http://blogs.ch.cam.ac.uk/pmr/2012/01/23/brian-mcmahonpublishing-semantic-crystallography-every-science-data-publishershould-watch-this-all-the-way-through/
• 1:08 CIF
• 3:36 CIF Syntax and dataTypes
• 4:30 Publishing with CIF
• 6:41 Demonstration: CheckCIF
• 12:02 Interactive Chemical validation
• 14:42 Linking data to journal article and search for novelty of data
• 15:08 Jmol display applet
• 21:03 Supplementary data
• 21:47 PublCIF a tool to merge data and text and annotate them
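CIF is machine-friendly largely because much of a file is plain name-value pairs. The sketch below reads only the simplest subset: a `data_` block header and single-line `_name value` items. It skips `loop_` tables, semicolon-delimited text fields and trailing comments, so it is not a conforming CIF parser. The sample values are illustrative, and `parse_simple_cif` and `numeric_value` are hypothetical helper names.

```python
import re
from typing import Dict, Optional, Tuple

# Illustrative sample; the values are for demonstration only.
SAMPLE_CIF = """\
data_example
# comment lines start with '#'
_cell_length_a    5.4307(2)
_cell_length_b    5.4307(2)
_symmetry_space_group_name_H-M  'F d -3 m'
_chemical_formula_sum  Si
"""


def parse_simple_cif(text: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (block name, {data name: raw value}) for single-line items."""
    block = None
    items: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.lower().startswith("data_"):
            block = line[5:]
            continue
        if line.startswith("_"):
            parts = line.split(None, 1)
            if len(parts) != 2:
                continue  # value is on a following line: not handled here
            name, value = parts[0], parts[1].strip()
            # Strip one pair of matching quotes around the value.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            items[name] = value
    return block, items


def numeric_value(cif_number: str) -> float:
    """Drop a trailing standard uncertainty, e.g. '5.4307(2)' -> 5.4307."""
    return float(re.sub(r"\(\d+\)$", "", cif_number))


if __name__ == "__main__":
    block, items = parse_simple_cif(SAMPLE_CIF)
    print(block, numeric_value(items["_cell_length_a"]))
```

For real files a full CIF library is the safer choice. The point here is only that typed, named values can be read without any human interpretation.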
34. Open Content Mining of FACTs
Machines can interpret chemical reactions
We have processed 500,000 patents. There are more than 3,000,000 reactions/year. Added value > 1B EUR.
35. Open Content Mining of FACTS
Machines can interpret phylogenetic trees
[Diagram labels: Unusable FACT; Re-usable FACT]
>100,000 diagrams in literature; cost 1,000,000,000 hours
37. Crowdcrafting for AEgIS/CERN
• Does antimatter fall down or up?
• Help the AEgIS experiment at CERN to work out how antimatter is affected by gravity. Just join the dots!
• Antimatter
• The observable universe is composed almost entirely of matter, but we can produce stuff called antimatter in the lab. Antimatter is material composed of antiparticles.
• Antiparticles have the same mass as normal matter particles but the opposite charge. When an antiparticle collides with an ordinary matter particle they both annihilate, producing a burst of other particles and radiation.
• Antiparticles should interact gravitationally just like particles of ordinary matter because Einstein's weak equivalence principle states that gravity doesn't depend on composition. But if they don't, then gravity is much more complicated than our current understanding indicates.
• http://crowdcrafting.org http://crowdcrafting.org/antimatter
39. RCUK, Wellcome, ERC, NSF … require fully OPEN
Neelie Kroes, Vice-President of the European Commission, said at the Research Data Alliance that we are entering a new “era of open science”, which will be “good for citizens, good for scientists and good for society”.
She explicitly highlighted the transformative potential of open access, open data, open software and open educational resources, mentioning the EU’s policy requiring open access to all publications and data resulting from EU funded research.
http://blog.okfn.org/2013/03/21/we-are-entering-an-era-of-open-science-says-eu-vp-neeliekroes/#sthash.3SWDXDE6.dpuf
40. Open Definition
• “A piece of data or content is open if anyone is
free to use, reuse, and redistribute it —
subject only, at most, to the requirement to
attribute and/or share-alike.”
OPEN
NOT OPEN
PDB
COD,Crystaleye
CCDC, ICSD
RSC/ACS/IUCr CIFs
Elsevier/Wiley/Springer CIFs
Acta Cryst E
Acta Cryst ABCD (default)
CIF dictionaries
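The Open Definition quoted on this slide allows at most two conditions on reuse: attribution and share-alike. A minimal sketch of that test for Creative Commons style licence codes follows. The function names and the simple code parsing are assumptions for illustration, and tokens other than BY, SA, NC and ND (version numbers, for example) are ignored.

```python
from typing import Set

# Conditions the Open Definition permits: attribution and share-alike.
ALLOWED = {"BY", "SA"}
# Condition tokens recognised in Creative Commons licence codes.
KNOWN = {"BY", "SA", "NC", "ND"}


def cc_conditions(code: str) -> Set[str]:
    """Extract condition tokens from a code such as 'CC-BY-NC' or 'CC0'."""
    code = code.strip().upper()
    if code in {"CC0", "CC-ZERO", "CCZERO"}:
        return set()  # public-domain dedication: no conditions at all
    parts = code.split("-")
    if parts[0] != "CC":
        raise ValueError("not a Creative Commons style code: " + code)
    # Unknown tokens such as version numbers are ignored in this sketch.
    return {p for p in parts[1:] if p in KNOWN}


def is_open(code: str) -> bool:
    """True if every condition is attribution and/or share-alike."""
    return cc_conditions(code) <= ALLOWED


if __name__ == "__main__":
    for code in ["CC0", "CC-BY", "CC-BY-SA", "CC-NC", "CC-ND"]:
        print(code, is_open(code))
```

Under this test the non-commercial (NC) and no-derivatives (ND) variants fail, because each adds a restriction beyond attribution and share-alike.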
42. Crystaleye
• A database of 200,000 crystal structures scraped
from the CIF supplemental information of publications
• CML molecules and name-value pairs
• Re-usable as fragment base
Nick Day, Jim Downing, Sam Adams, N. W. England
and Peter Murray-Rust*
J.Appl.Cryst. (2012). 45 , 316–323,
doi:10.1107/S0021889812006462
http://wwmm.ch.cam.ac.uk/crystaleye
48. COD Letter to Editors 2012
[We] have become aware of growing concerns regarding the publication,
preservation and quality maintenance of crystallographic data. …However,
we believe that completely open deposition of data and multiple checks can
ensure the quality and wide availability of scientific data
[Please] recommend to your authors that, they also deposit their
supplementary crystallographic data into the COD when they submit
scientific papers to your journals.
Being open by its design, the COD enables the creation of multiple mirrors
and backup copies. It provides, thus, archival storage of scientific data with
adequate reliability. … services for reviewers and editors to facilitate the
peer-review. …since our database follows the Open Access model, all
material deposited into the COD is available to other databases. The COD
team actually encourages the use of our data collection for any possible
scientific or industrial application by putting the database into the public
domain
49. Recommendations for Open
Crystallography
• Require Open Crystal Data for all publications
• Deposition of Open Data in COD
• Integrate CIF dictionaries as RDF into Linked
Open Data
• Integrate COD into Linked Open Data Cloud
• CCDC/ICSD to publish RAW author CIFs Openly
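As a rough illustration of the third recommendation, CIF name-value pairs can be written out as RDF triples once each data name is given a URI. The sketch below emits N-Triples lines with plain string literals. The `example.org` URIs and the `to_ntriples` helper are hypothetical and no published CIF-to-RDF mapping is implied.

```python
from typing import Dict, List


def to_ntriples(subject_uri: str, items: Dict[str, str],
                vocab_base: str) -> List[str]:
    """Turn CIF name-value pairs into N-Triples lines with string literals."""
    lines = []
    for name, value in sorted(items.items()):
        # Hypothetical mapping: data name minus its leading underscore.
        predicate = vocab_base + name.lstrip("_")
        # Escape backslashes and double quotes inside the literal.
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append('<%s> <%s> "%s" .' % (subject_uri, predicate, literal))
    return lines


if __name__ == "__main__":
    # Illustrative values; both URIs below are placeholders.
    items = {"_cell_length_a": "5.4307(2)", "_chemical_formula_sum": "Si"}
    for line in to_ntriples("http://example.org/structure/1", items,
                            "http://example.org/cif-core#"):
        print(line)
```

A real integration would take predicate URIs from the CIF dictionaries themselves and give numeric values proper datatypes. The sketch shows only the shape of the output.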
50. Most “Open Access” is not re-usable
[Chart: price per article in USD (axis 0 to 6000) by licence. Categories: CC-BY / Reusable; Restricted by licence or lack of clarity: CC-NC, CC-ND, Nothing/unclear. Source: Ross Mounce, Panton Fellow, 2012]
51. Panton Principles for Open Data in Science
Why? Wanted to avoid the mess in OA
• Peter Murray-Rust, Cameron Neylon, Rufus Pollock, John Wilbanks
2008 -> 2010 (launch) at Panton Arms
Launch 2010; Panton Fellowships (2012)
Pictured: Peter Murray-Rust, John Wilbanks, Jordan Hatcher, Jenny Molloy, Rufus Pollock, Cameron Neylon
“Licence STM Data as CC0”
53. * Data should be open
• Make your wishes clear
• Use an appropriate licence
54. Open Mining Manifesto
1. Define ‘open content mining’ in a broad and useful
manner
‘Open Content Mining’ means the unrestricted right of subscribers to extract, process and
republish content manually or by machine in whatever form (without prior specific
permissions and subject only to community norms of responsible behaviour in the electronic
age).
Text
Numbers
Tables
Diagrams
Graphical representations of relationships between variables
Images and video and audio when it is the means of expressing a fact.
Semantics (XML, RDF)
55. 2. Urge publishers and institutional repositories to adhere to the following principles:
Principle 1: Right of Legitimate Accessors to Mine
We assert that there is no legal, ethical or moral reason to refuse to allow legitimate
accessors of research content (OA or otherwise) to use machines to analyse the published
output of the research community. Researchers expect to access and process the full
content of the research literature with their computer programs and should be able to use
their machines as they use their eyes. The
right to read is the right to mine
Principle 2: Lightweight Processing Terms and Conditions
Mining by legitimate subscribers should not be prohibited by contractual or other legal
barriers. Publishers should add clarifying language in subscription agreements that content
is available for information mining by download or by remote access. Where access is
through researcher-provided tools, no further cost should be required. Users and
providers should encourage machine processing
Principle 3: Use
Researchers can and will publish facts and excerpts which they discover by reading and
processing documents. They expect to disseminate and aggregate statistical results as facts
and context text as fair use excerpts, openly and with no restrictions other than attribution.
Publisher efforts to claim rights in the results of mining further retard the advancement of
science by making those results less available to the research community; such claims should
be prohibited.
Facts don’t belong to anyone.
56. 3. Strategies
Assert the above rights by:
Educating researchers and librarians about the potential of
content mining and the current impediments to doing so,
including alerting librarians to the need not to cede any of the
above rights when signing contracts with publishers
Compiling a list of publishers and indicating what rights they
currently permit, in order to highlight the gap between the
rights here being asserted and what is currently possible
Urging governments and funders to promote and aid the
enjoyment of the above rights.
57. Take-away messages
• Lost/unused STM* data costs 30-100 billion/yr [1]
• Licence: DATA as CCZero and TEXT as CC-BY
• Content Mining for DATA is a RIGHT
• Apathy is our worst enemy
• Trust and empower young people
“A piece of content or data is open if anyone is free to
use, reuse, and redistribute it — subject only, at most,
to the requirement to attribute and/or share-alike.”
Une donnée est ouverte, si chacun est libre de l'utiliser,
de la réutiliser et de la redistribuer
*Scientific Technical Medical
[1] PMR: submission to UK Hargreaves process
58. To make the complete scientific literature
accessible to machines and humans