contentmine.org (funded by Shuttleworth Foundation) has developed tools and workshops to allow anyone to mine scientific content. This 10-minute presentation at Wellcome Trust encourages you to become involved - no previous knowledge required.
Talk to OpenForum Academy (Open Forum Europe) about Text and Data Mining. Four use cases selected for non-scientists. Also discussion of the latest on European copyright reform and TDM exceptions.
ContentMining for France and Europe: Lessons from 2 years in the UK (petermurrayrust)
I have spent two years carrying out Content Mining (aka Text and Data Mining) in the UK under the 2014 "Hargreaves" exception. This talk was given in Paris, to ADBU, after France had passed the Loi pour une République numérique (Digital Republic law). I illustrate what worked, what did not, and why, and offer ideas to France and Europe.
Towards Responsible Content Mining: A Cambridge perspective (petermurrayrust)
ContentMining (Text and Data Mining) is now legal in the UK for non-commercial research. Cambridge UK is a natural centre, with several components:
* a world-class University and Library
* many publishers, both Open Access and conventional
* a digital culture
* ContentMine - a leading proponent and practitioner of mining
Cambridge University Press welcomes content mining and invited PMR to give a talk there. He showed the technology and protocols and proposed a practical way forward in 2017
High throughput mining of the scholarly literature; talk at NIH (petermurrayrust)
The scientific and medical literature contains huge amounts of valuable unused information. This talk shows how to discover it, extract, re-use and interpret it. Wikidata is presented as a key new tool and infrastructure. Everyone can become involved. However some of the barriers to use are sociopolitical and these are identified and discussed.
Published on Jan 29, 2016 by PMR
Keynote talk to LEARN (LERU/H2020 project) on research data management. Emphasizes that the problems are cultural, not technical. Promotes modern approaches such as Git / continuous integration, and announces DAT. Asserts that the Right to Read is the Right to Mine. Calls for widespread development of content mining (TDM).
An overview of Text and Data Mining (ContentMining) including live demonstrations. The fundamentals: discover, scrape, normalize, facet/index, analyze, publish, exemplified using the recent Zika outbreak. Mining covers textual and non-textual content, and examples from chemistry and phylogenetic trees are given.
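The six fundamentals named above can be sketched in miniature. This is a hypothetical Python toy (the corpus, function names and steps are invented for illustration, not the ContentMine implementation):

```python
import re
from collections import Counter

# Toy "scraped" corpus standing in for downloaded papers (hypothetical text).
papers = {
    "paper1": "Zika virus spread by Aedes aegypti. Zika causes fever.",
    "paper2": "Aedes albopictus may also transmit Zika virus.",
}

def normalize(text):
    """Normalize: lowercase and strip everything but letters and spaces."""
    return re.sub(r"[^a-z\s]", "", text.lower())

def index(papers):
    """Facet/index: map each term to the set of paper ids containing it."""
    idx = {}
    for pid, text in papers.items():
        for word in normalize(text).split():
            idx.setdefault(word, set()).add(pid)
    return idx

def analyze(papers):
    """Analyze: simple term frequencies across the whole corpus."""
    counts = Counter()
    for text in papers.values():
        counts.update(normalize(text).split())
    return counts

idx = index(papers)
print(sorted(idx["zika"]))      # ['paper1', 'paper2']
print(analyze(papers)["zika"])  # 3
```

The real pipeline does the same kind of thing at scale against daily publisher feeds, and the "publish" step emits the extracted facts in open, semantic form.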
Open Data and Open Science presented in Rio for Open Science 2014-08-22. I argue that Open Notebook Science is the way forward and will lead to great benefits
An overview of ContentMining for JISC (the infrastructure provider for UK academia). Examples and details leading to a hands-on exercise (http://contentmine.org/workflow).
A Global Commons for Scientific Data: Molecules and Wikidata (petermurrayrust)
Methods for extracting facts from the scientific literature and linking them to Wikidata IDs. Wikidata is introduced through an architectural example and bioscience use cases. Then we explore how data can be extracted from text and from images.
Can Computers understand the scientific literature (includes compsci material) (petermurrayrust)
With the semantic web machines can autonomously carry out many knowledge-based tasks as well as humans. The main problems are not technical but the prevention of access to information. I advocate automatic downloading and indexing of all scientific information
Scientific information is often hidden or not published properly. The ContentMine is a Social Machine consisting of semantic software and communities of domain expertise; it aims to liberate all scientific facts from the published literature on a daily basis.
The talk, delivered to the Computational Institute, was followed by a hands-on workshop learning how to use the technology and work as a community.
Specimen-level mining: bringing knowledge back 'home' to the Natural History ... (Ross Mounce)
A talk given at the Geological Society of London, UK on 2016/03/09 as part of the Lyell meeting on Palaeoinformatics. http://www.geolsoc.org.uk/lyell16 #lyell16
Copyright is one of the greatest barriers to Open Data. This presentation for insidegovernment UK shows the struggle between those who want to reform copyright and those opposed to reform.
Use of ContentMine tools on the Open Access subset of EuropePubMedCentral to discover new knowledge about the Zika virus.
Three slides have embedded movies - these do not show in slideshare and a first pass of this can be seen as a single file at https://vimeo.com/154705161
Asking the scientific literature to tell us about metabolism (petermurrayrust)
Talk to Lhasa Ltd (a world leader in predicting drug metabolism and toxicity). Uses the scientific literature to answer questions on metabolism and chemical transformation. Almost all of the data in a paper can be queried.
Mining the scientific literature for plants and chemistry (petermurrayrust)
ContentMine can read the daily scientific literature and extract facts. This talk was given to the OpenPlant project, with whom ContentMine collaborates, at a meeting on 2016-07-25/27 in Norwich. Examples of extracted facts are given.
Architecture of ContentMine Components contentmine.org (petermurrayrust)
This is the evolving architecture of ContentMine (contentmine.org). It includes an overview (slide #2) showing getpapers, quickscrape, norma and ami.
The key container is the CTree and the architecture shows where components are added or transformed to this.
These slides are dated and may be out-of-date wrt code. Some diagrams are autogenerated from *.dot files.
Please use http://discuss.contentmine.org/c/software as the main source of up-to-date info. Feel free to ask questions, offer help, critique, etc.
All s/w is Open (BSD, Apache2)
Amanuens.is: Humans and machines annotating scholarly literature (petermurrayrust)
About 10,000 scholarly articles ("papers") are published each day. Amanuens.is is a symbiont of ContentMine and Hypothes.is (both Shuttleworth projects/Fellows) which annotates theses using an array of controlled vocabularies ("dictionaries"). The results, in semantic form, are used to annotate the original material. The talk had live demos and used plant chemistry as the examples.
Automatic Extraction of Knowledge from Biomedical literature (petermurrayrust)
A plenary lecture to the Cochrane Collaboration in Birmingham, on the value of automatically extracting knowledge. Covers the Why? How? What? Who? and the problems, and invites collaboration.
Talk to the EBI Industry group on Open Software for chemical and pharmaceutical sciences. Covers examples of chemistry, with demos, and argues that all public knowledge should be Openly accessible.
Asking the scientific literature to tell us about metabolism (petermurrayrust)
Talk at Lhasa (https://www.lhasalimited.org/), a leading organization for "in silico prediction and database systems for use in metabolism, toxicology and related sciences". ContentMine software can extract data from papers on compound metabolism in reusable semantic form, including metabolic pathways and pharmacokinetic data.
Can Computers understand the scientific literature (includes compsci material) (TheContentMine)
Published on Jan 24, 2014 by PMR
With the semantic web machines can autonomously carry out many knowledge-based tasks as well as humans. The main problems are not technical but the prevention of access to information. I advocate automatic downloading and indexing of all scientific information
High throughput mining of the scholarly literature (TheContentMine)
Published on Jun 7, 2016 by PMR
Talk given to statisticians in Tilburg, with emphasis on scholarly comms for detecting unusual features. Includes demo of Amanuens.is and image mining
We have developed image processing techniques to extract data from diagrams used in science and scientific publications. These slides were presented at a workshop session for the Cambridge MPhil in Computational biology. There is an overview of the main techniques for cleaning diagrams, such as thresholding, binarization, edge detection and thinning. Examples are given from plots, phylogenetic trees, chemistry and neuroscience spikes. All software is Open Source and most is Java
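The thresholding and binarization steps mentioned above can be illustrated on a toy raster. A minimal plain-Python sketch (the pixel values are hypothetical, and the actual ContentMine code is Java and selects thresholds adaptively):

```python
# A tiny 4x4 "grayscale image": 0 = black ink, 255 = white background
# (hypothetical values standing in for a scanned diagram).
image = [
    [250, 240,  30, 245],
    [235,  20,  25, 240],
    [ 15,  22, 238, 250],
    [245, 239, 236,  10],
]

def binarize(pixels, cutoff=128):
    """Global thresholding: pixels darker than the cutoff become
    foreground (1), the rest background (0). Production pipelines
    usually choose the cutoff automatically, e.g. with Otsu's method."""
    return [[1 if p < cutoff else 0 for p in row] for row in pixels]

binary = binarize(image)
for row in binary:
    print(row)
# The dark strokes are now a clean 0/1 mask on which later steps,
# such as edge detection and thinning, can operate.
```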
Keynote talk to LEARN (LERU/H2020 project) on research data management. Emphasizes that the problems are cultural, not technical. Promotes modern approaches such as Git / continuous integration, and announces DAT. Asserts that the Right to Read is the Right to Mine. Calls for widespread development of content mining (TDM).
Basics of ContentMining presented to Synthetic Biologists. This was followed by a lively discussion of what components could be extracted from the literature
Published on May 18, 2015 by PMR
Basics of ContentMining presented to Synthetic Biologists. This was followed by a lively discussion of what components could be extracted from the literature
The Culture of Research Data, by Peter Murray-Rust (LEARN Project)
1st LEARN Workshop. Embedding Research Data as part of the research cycle. 29 Jan 2016. Presentation by Peter Murray-Rust, ContentMine.org and University of Cambridge
Published on Aug 22, 2014 by PMR
Open Data and Open Science presented in Rio for Open Science 2014-08-22. I argue that Open Notebook Science is the way forward and will lead to great benefits
ContentMine: Open Data and Social Machines (TheContentMine)
Published on Nov 13, 2014 by PMR
Scientific information is often hidden or not published properly. The ContentMine is a Social Machine consisting of semantic software and communities of domain expertise; it aims to liberate all scientific facts from the published literature on a daily basis.
The talk, delivered to the Computational Institute, was followed by a hands-on workshop learning how to use the technology and work as a community.
Paradise Lost and The Right to Read is the Right to Mine (petermurrayrust)
Presented to UIUC CIRSS seminars to a mixed group of Library, CS, domain scientists with a great contingent of Early Career Researchers. Starts by honouring the creation of the wonderful NCSA Mosaic at UIUC in 1993 and the paradise of knowledge and community it opened. Then shows the gradual and tragic decline of the web into a megacorporate neocolonialist empire, where knowledge is sacrificed for money and power.
You have seen many of the slides before but the words are different and have been recorded.
Automatic Extraction of Science and Medicine from the scholarly literature (TheContentMine)
Published on Jun 04, 2015 by PMR
Many scientists have to extract many facts out of the scholarly literature, to evaluate other work or to build useful collections of facts. This shows the approach, especially for systematic reviews of animal or clinical trials.
Automatic Extraction of Science and Medicine from the scholarly literature (petermurrayrust)
Many scientists have to extract many facts out of the scholarly literature, to evaluate other work or to build useful collections of facts. This shows the approach, especially for systematic reviews of animal or clinical trials.
The scientific scholarly literature now contains many millions of articles. They contain semi-structured information of high quality and veracity. We show how this resource can be converted to a universal Wikicite format and full-text indexed against Wikidata dictionaries. We now have > 5 million bibliographic records and over 200 dictionaries based on Wikidata properties, queryable by SPARQL.
A presentation by Open Climate Knowledge for European Forum for Advanced Practices. Showing how the scientific literature can be searched for knowledge on this multidisciplinary topic.
The Evolution of e-Research: Machines, Methods and Music (David De Roure)
David De Roure's Inaugural Lecture on 28th October at Oxford e-Research Centre, University of Oxford, UK
10 years ago we saw a few early adopters of e-Science technology; now we see acceleration of research through broader adoption and sharing of tools, techniques and artefacts, both for 'big science' and the 'long tail scientist'.
Will this incremental trend continue or are we seeing glimpses of a phase change ahead, where researchers harness these emerging digital capabilities to address research questions in ways that simply were not possible before?
This talk will describe three generations of e-Research, using the myExperiment social website as a lens to glimpse future research practice, and focusing on a web-scale computational musicology project as an illustration of 3rd generation thinking.
Also available from http://wiki.myexperiment.org/index.php/Presentations
The scientific and medical literature is a vast resource of knowledge, but it needs turning into semantic FAIR form. The ContentMine can do this and we presented a rapid overview of the potential
Amanuens.is: Humans and machines annotating scholarly literature (TheContentMine)
Published on May 19, 2016 by PMR
About 10,000 scholarly articles ("papers") are published each day. Amanuens.is is a symbiont of ContentMine and Hypothes.is (both Shuttleworth projects/Fellows) which annotates theses using an array of controlled vocabularies ("dictionaries"). The results, in semantic form, are used to annotate the original material. The talk had live demos and used plant chemistry as the examples.
Published on May 18, 2016 by PMR
Talk to the EBI Industry group on Open Software for chemical and pharmaceutical sciences. Covers examples of chemistry, with demos, and argues that all public knowledge should be Openly accessible.
Automatic Extraction of Knowledge from the Literature (TheContentMine)
Published on May 11, 2016 by PMR
ContentMine tools (and the Harvest alliance) can be used to search the literature for knowledge, especially in biomedicine. All tools are Open and shortly we shall be indexing the complete daily scholarly literature
Automatic Extraction of Knowledge from Biomedical literature (TheContentMine)
Published on Mar 16, 2016 by PMR
A plenary lecture to Cochrane Collaboration in Birmingham, on the value of automatically extracting knowledge. Covers the Why? How? What? Who? and problems and invites collaboration
Liberating facts from the scientific literature - Jisc Digifest 2016 (TheContentMine)
Published on Mar 4, 2016 by PMR
Text and data mining (TDM) techniques can be applied to a wide range of materials, from published research papers, books and theses, to cultural heritage materials, digitised collections, administrative and management reports and documentation, etc. Use cases include academic research, resource discovery and business intelligence.
This workshop will show the value and benefits of TDM techniques and demonstrate how ContentMine aims to liberate 100,000,000 facts from the scientific literature, and ContentMine will provide a hands on demo on a topical and accessible scientific/medical subject.
Published on Feb 29, 2016 by PMR
An overview of Text and Data Mining (ContentMining) including live demonstrations. The fundamentals: discover, scrape, normalize, facet/index, analyze, publish, exemplified using the recent Zika outbreak. Mining covers textual and non-textual content, and examples from chemistry and phylogenetic trees are given.
Published on Feb 07, 2016 by PMR
Use of ContentMine tools on the Open Access subset of EuropePubMedCentral to discover new knowledge about the Zika virus. Includes clips of the software in action
Published on Jan 27, 2016 by PMR
We have developed image processing techniques to extract data from diagrams used in science and scientific publications. These slides were presented at a workshop session for the Cambridge MPhil in Computational biology. There is an overview of the main techniques for cleaning diagrams, such as thresholding, binarization, edge detection and thinning. Examples are given from plots, phylogenetic trees, chemistry and neuroscience spikes. All software is Open Source and most is Java
Digital Scholarship: Enlightenment or Devastated Landscape? (TheContentMine)
Published on Dec 17, 2015 by PMR
Every year $500 billion of public funding is spent on research, but much of the output lies hidden in papers that are never read. I describe how machines can help us read the literature. However, there is massive opposition from publishers who are trying to prevent open scholarship and who build walled gardens that they control.
Published on Jul 21, 2014 by PMR
Jean-Claude Bradley was a pioneer of doing Open Science and on 2014-07-14 we held a memorial meeting in Cambridge (see also http://inmemoriamjcb.wikispaces.com/Jean-Claude+Bradley+Memorial+Symposium)
Published on Jul 24, 2014 by PMR
PhD Theses are normally locked away digitally. They cost 20 billion dollars to create and we waste much of this value. By making them open we can use software to read, index, reuse, compute and add massive value
Published on Nov 26, 2014 by PMR
Follow-up meeting in London to OpenCon2014, on the need for different models of scholarly communication. I explore the history of 20th-century academic student-based revolutions, with special relevance to young people and the scope for action today.
Published on Mar 05, 2015 by PMR
contentmine.org (funded by Shuttleworth Foundation) has developed tools and workshops to allow anyone to mine scientific content. This 10-minute presentation at Wellcome Trust encourages you to become involved - no previous knowledge required.
Published on Dec 01, 2014 by PMR
An overview of ContentMining for JISC (the infrastructure provider for UK academia). Examples and details leading to a hands-on exercise (http://contentmine.org/workflow).
Published on Mar 19, 2015 by PMR
Copyright is one of the greatest barriers to Open Data. This presentation for insidegovernment UK shows the struggle between those who want to reform copyright and those opposed to reform.
1. Open Mining of the Bioscience Literature
Peter Murray-Rust,
ContentMine.org and the University of Cambridge
UNAM, MX 2015-10-09
Millions of data points are hidden in the bioscience literature.
ContentMine has Open technology to liberate them automatically.
Using OpenNotebook approaches
The major problem is politico-legal
This is an exploratory talk, looking for ideas and projects
The future depends on young people
4. Some particularly relevant Fellows/Alumni and projects:
• Rufus Pollock: Open Knowledge Foundation
• Mark Surman: Mozilla
• Dan Whaley: Hypothes.is
• Daniel Lombrana-Gonzales: PyBossa/Crowdcrafting
Erin McKiernan, 2015 Flash Award
ContentMine and Peter Murray-Rust are funded by:
5. The Right to Read is the Right to Mine
http://contentmine.org
6. ContentMine Workshops and Hackdays
Open Science Brazil, 2014-08
• Easily distributed software
• Get started in 30 mins
• Build an application in a morning
• Start simple: bag of words, stemming, regex, templates
8. Why do we publish science?
• Communicate our results
• Archival
• Get feedback from peers.
• Provide material that others can re-use.
• Priority and esteem.
9.
10. http://www.nytimes.com/2015/04/08/opinion/yes-we-were-warned-about-ebola.html
[Liberian Ministry of Health] were stunned recently when we stumbled across an article by European researchers in Annals of Virology [1982]: “The results seem to indicate that Liberia has to be included in the Ebola virus endemic zone.” In the future, the authors asserted, “medical personnel in Liberian health centers should be aware of the possibility that they may come across active cases and thus be prepared to avoid nosocomial epidemics,” referring to hospital-acquired infection.
Adage in public health: “The road to inaction is paved with research papers.”
Bernice Dahn (chief medical officer of Liberia’s Ministry of Health)
Vera Mussah (director of county health services)
Cameron Nutt (Ebola response adviser to Partners in Health)
A System Failure of Scholarly Publishing
16. Output of scholarly publishing
586,364 Crossref DOIs — 201,507 [1] per month
1.5 million (papers + supplemental data) per year [citation needed]*
each 3 mm thick: a stack 4500 m high per year [2]
* Most is not publicly readable
[1] http://www.crossref.org/01company/crossref_indicators.html
[2] https://en.wikipedia.org/wiki/Mont_Blanc#/media/File:Mont_Blanc_depuis_Valmorel.jpg
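The Mont Blanc comparison on this slide can be sanity-checked with simple arithmetic, assuming the slide's own figures (~1.5 million items per year, each 3 mm thick):

```python
# Sanity check of the paper-stack comparison, using the slide's figures.
items_per_year = 1_500_000   # papers + supplemental data per year (estimate)
thickness_mm = 3             # assumed thickness per item

stack_height_m = items_per_year * thickness_mm / 1000
print(stack_height_m)  # 4500.0 metres per year — close to Mont Blanc (4810 m)
```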
17. Scientific and Medical publication (STM)[+]
• World Citizens pay $450,000,000,000 …
• … for research in 1,500,000 articles …
• … cost $300,000 each to create …
• … $7000 each to “publish” [*] …
• … $10,000,000,000 from academic libraries …
• … to “publishers” who forbid access to 99.9% of citizens of the world …
• 85% of medical research is wasted (not published, badly conceived, duplicated, …) [Lancet 2009]
[+] Figures probably ±50%
[*] arXiv preprint server costs $7 per paper
19. ContentMine approaches
0. Open software, Open content, Open notebooks
1. Daily liberation of facts which are easy and widely useful:
– Species (Bacillus subtilis, Okapia johnstoni)
– Genes (BRCA1*, APOE)
– Chemicals (acetone, CH3OH)
– Identifiers (RRIDs, museum specimens, …)
2. Communities of practice with bespoke tools:
– Clinical Trials
– Phylogenetic trees
– Systematic reviews
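As a purely illustrative sketch (not ContentMine's actual patterns), a crude regex for Latin binomial species names like those listed above, combined with a tiny genus dictionary to suppress false positives, might look like:

```python
import re

# Hypothetical, simplified extractor: a real pipeline uses curated
# dictionaries and disambiguation, not just a regex.
GENERA = {"Bacillus", "Escherichia", "Okapia"}  # tiny illustrative dictionary
BINOMIAL = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]{2,})\b")

def find_species(text):
    # The raw regex also matches phrases like "We sequenced";
    # the genus dictionary filters those out.
    return [f"{g} {s}" for g, s in BINOMIAL.findall(text) if g in GENERA]

text = "We sequenced Bacillus subtilis and compared it with Escherichia coli."
print(find_species(text))  # ['Bacillus subtilis', 'Escherichia coli']
```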
21. Open Content Mining of FACTs
Machines can interpret chemical reactions.
We have processed 500,000 patents; there are >3,000,000 reactions/year. Added value >1B EUR.
22. C) What’s the problem with this spectrum?
Org. Lett., 2011, 13 (15), pp 4084–4087
Original thanks to ChemBark
28. Formulaic language in reporting clinical trials
“adult nonpregnant patients, aged ≥18 years”,
“randomization sequence using a permuted block design with random block sizes stratified by study center”.
“blinding of the patients and caregivers is not possible”.
“Investigators performing analysis are blinded for the intervention”.
“Continuous normally distributed variables … mean and standard deviation, counts (n) and percentages (%). … Student’s t-test … or the Mann–Whitney U test … Categorical … Chi-square test or Fisher's exact tests. Statistical significance is considered to be at a P value <0.05 …”
29. Text-based plugins
• Bag of words (https://en.wikipedia.org/wiki/Bag-of-words_model)
• Tf–idf, term frequency–inverse document frequency (https://en.wikipedia.org/wiki/Tf%E2%80%93idf)
• Templates and regexes (regular expressions)
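A minimal from-scratch sketch of the bag-of-words and tf–idf ideas listed above (not ContentMine's implementation; the tokenizer and corpus are illustrative):

```python
import math
from collections import Counter

def bag_of_words(text):
    # Lower-case and split on whitespace: the crudest possible tokenizer.
    return Counter(text.lower().split())

def tf_idf(term, doc_bow, all_bows):
    # Term assumed to occur in at least one document.
    tf = doc_bow[term] / sum(doc_bow.values())
    df = sum(1 for bow in all_bows if term in bow)
    return tf * math.log(len(all_bows) / df)

docs = ["ebola virus in liberia", "virus genome sequence", "liberia health centers"]
bows = [bag_of_words(d) for d in docs]
# "ebola" appears in only one document, so it scores higher than "virus",
# which appears in two.
print(tf_idf("ebola", bows[0], bows) > tf_idf("virus", bows[0], bows))  # True
```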
31. Regular Expressions for Systematic Reviews of Animal Tests
Pattern structure: preceding text → extracted term → following text
In 30 minutes, 6 scientists (most unfamiliar with regexes) wrote 200 regexes for ARRIVE (NC3Rs guidelines)
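The preceding-text / extracted-term / following-text structure described above can be sketched with a hypothetical ARRIVE-style rule (the pattern and sentence are invented for illustration):

```python
import re

# Hypothetical rule: the surrounding literal text ("anaesthetised with")
# anchors the capture group, which is the extracted term.
ANAESTHETIC = re.compile(r"anaesthetised\s+with\s+([a-z]+)", re.IGNORECASE)

sentence = "Rats were anaesthetised with isoflurane before surgery."
m = ANAESTHETIC.search(sentence)
print(m.group(1))  # isoflurane
```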
43. Traditional Research and Publication
[Workflow diagram] “Lab” work → write/rewrite → paper/thesis → publish
Re-experiment? Validation??
DATA: output “belongs” to publisher
45. Open Notebook Content Mining
• “No insider knowledge”
• Anyone can become involved
• All raw non-copyright material on Github
• Planning and discussion on Open Discourse
• All output (however imperfect) on Github CC0
• Immediate upload
• Inspired by Free/Libre/Open Source, Wikipedia, OpenStreetMap.
50. Automatic Open Notebook of computations
Everything is posted to Github before being analyzed
51. Bacillus subtilis [131238]*
Bacteroides fragilis [221817]
Brevibacillus brevis
Cyclobacterium marinum
Escherichia coli [25419]
Filobacillus milosensis
Flectobacillus major [15809775]
Flexibacter flexilis [15809789]
Formosa algae
Gelidibacter algens [16982233]
Halobacillus halophilus
Lentibacillus salicampi [18345921]
Octadecabacter arcticus
Psychroflexus torquis [16988834]
Pseudomonas aeruginosa [31856]
Sagittula stellata [16992371]
Salegentibacter salegens
Sphingobacterium spiritivorum
Terrabacter tumescens
• [Identifier in Wikidata]
• Missing = not found with Wikidata API
20 commonest organisms (in > 30 papers) in trees from IJSEM*
Half do not appear to be in Wikidata
Can the Wikipedia Scientists comment?
*Int. J. Syst. Evol. Microbiol.
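The "not found with Wikidata API" note above refers to lookups against Wikidata's search endpoint. A sketch of such a lookup, using the real `wbsearchentities` API module but a canned response so the parsing can be shown without network access:

```python
import json
from urllib.parse import urlencode

# Build a wbsearchentities query URL (real Wikidata API endpoint);
# actually issuing the request requires network access.
def search_url(name):
    params = {"action": "wbsearchentities", "search": name,
              "language": "en", "format": "json"}
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)

def first_qid(response_text):
    # Return the first matching entity id (e.g. "Q131238"), if any.
    hits = json.loads(response_text).get("search", [])
    return hits[0]["id"] if hits else None

print(search_url("Bacillus subtilis"))
# Canned response shaped like the API's output, matching the slide's
# identifier for Bacillus subtilis:
canned = '{"search": [{"id": "Q131238", "label": "Bacillus subtilis"}]}'
print(first_qid(canned))  # Q131238
```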
70. Prof. Ian Hargreaves (2011): "David Cameron's exam question": "Could it be true that laws designed more than three centuries ago with the express purpose of creating economic incentives for innovation by protecting creators' rights are today obstructing innovation and economic growth?"
"Yes. We have found that the UK's intellectual property framework, especially with regard to copyright, is falling behind what is needed."
"Digital Opportunity" by Prof Ian Hargreaves - http://www.ipo.gov.uk/ipreview.htm. Licensed under CC BY 3.0 via Wikipedia - https://en.wikipedia.org/wiki/File:Digital_Opportunity.jpg#/media/File:Digital_Opportunity.jpg
Hi, I’m here to talk about AMI, a data-extraction framework and tool. First, I want to highlight some of the key contributors to the project: Andy for his work on the ChemistryVisitor and Peter for the overall architecture.
In this talk, I’m going to stress the importance of data in a specific format and its utility for automated machine processing. Then I’ll demonstrate AMI’s architecture and the transformation of data as it flows through the process. I’ll dwell a little on a core format, Scalable Vector Graphics (SVG), before introducing the concept of visitors, which are pluggable, context-specific data extractors. Next, I’ll introduce Andy’s ChemVisitor for extracting semantic chemistry data, along with a few other visitors that can process non-chemistry-specific data. Finally, I’ll demonstrate some uses of the ChemVisitor in the realm of validation and metabolism.