A publisher would care about open data for several reasons:
1) Open data increases the value of all parts of the web by allowing programs, not just people, to utilize the data through interconnecting and joining it.
2) Publishers are evolving from linear supply chains focused on content delivery to users, to becoming marketplaces that optimize the number of interactions between users through networked open science.
3) The future of publishing involves networked open science where data is openly accessible, annotated with metadata, and linked together in research objects, increasing findability, accessibility, interoperability, and reusability of research outputs.
What role can publishers play in the open data ecosystem? – Varsha Khodiyar
Presentation at session 3 of the NIH workshop 'Role of Generalist Repositories to Enhance Data Discoverability and Reuse' on Feb 11th, at the NIH Main Campus.
Keynote presentation at 2020 NIH/NLM workshop on generalist repositories. Central themes include software as a richer pathway to data than articles, the development of new metrics for software (such as the CHAOSS framework), working with the technology companies through organizations like the Eclipse Foundation, and the importance of linked data. In particular, the concept of the "value line" as a means to map generalist repositories represents an important opportunity.
An update on the latest BioSharing work; including work with ELIXIR and NIH BD2K, also our survey to assess user needs (530 replies) and the work on the recommender tool
February 18 2015 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Network Effects: RMap Project
Sheila M. Morrissey, Senior Researcher, ITHAKA
FAIR Data Management and FAIR Data Sharing – Merce Crosas
Presentation at the Critical Perspectives on the Practice of Digital Archaeology symposium: http://archaeology.harvard.edu/critical-perspectives-practice-digital-archaeology
Slides from Friday 3rd August - Data in the Scholarly Communications Life Cycle Course which is part of the FORCE11 Scholarly Communications Institute.
Presenter - Natasha Simons
Feb 26 NISO Training Thursday
Crafting a Scientific Data Management Plan
About the Training
Addressing a data management plan for the first time can be an intimidating exercise. Join NISO for a hands-on workshop that will guide you through the elements of creating a data management plan, including gathering necessary information, identifying needed resources, and navigating potential pitfalls. Participants explore the important components of a data management plan and critique excerpts of sample plans provided by the instructors.
This session is meant to be a guided, step-by-step session that will follow the February 18 NISO Virtual Conference, Scientific Data Management: Caring for Your Institution and its Intellectual Wealth.
About the Instructors
Kiyomi D. Deards, MSLIS, Assistant Professor, University of Nebraska-Lincoln Libraries
Jennifer Thoegersen, Data Curation Librarian, University of Nebraska-Lincoln Libraries
Making Data FAIR (Findable, Accessible, Interoperable, Reusable) – Tom Plasterer
What to do About FAIR…
In the experience of most pharma professionals, FAIR remains fairly abstract, bordering on the inconclusive. This session will outline specific case studies – real problems with real data – and address real opportunities and concerns.
Why making data Findable, Accessible, Interoperable and Reusable is important.
Talk presented at the Data Driven Drug Development (D4) conference on March 20th, 2019.
FAIR Data Knowledge Graphs – from Theory to Practice – Tom Plasterer
FAIR data has flown up the hype curve without a clear sense of return from the required data stewardship investment. The killer use case for FAIR data is a science knowledge graph. It enables you to richly address novel questions of your and the world’s data. We started with data catalogues (findability) which exploited linked/referenced data using a few focused vocabularies (interoperability), for credentialed users (accessibility), with provenance and attribution (reusability) to make this happen. Our processes enable simple creation of dataset records and linking to source data, providing a seamless federated knowledge graph for novice and advanced users alike.
Presented May 7th, 2019 at the Knowledge Graph Conference, Columbia University.
An overview on FAIR Data and FAIR Data stewardship, and the roadmap for FAIR Data solutions coordinated by the Dutch Techcentre for Life Sciences. This presentation was given at the Netherlands eScience Center's "Essential skills in data-intensive research" course week.
FAIR data has flown up the hype curve without a clear sense of return from the required data stewardship investment. The killer use case for FAIR data is a science knowledge graph. It enables you to richly address novel questions of your and the world’s data. We started with data catalogues (findability) which exploited linked/referenced data using a few focused vocabularies (interoperability), for credentialed users (accessibility), with provenance and attribution (reusability) to make this happen.
This talk was presented at The Molecular Medicine Tri-Conference/Bio-IT West on March 11, 2019.
BioPharma and FAIR Data, a Collaborative Advantage – Tom Plasterer
The concept of FAIR (Findable, Accessible, Interoperable and Reusable) data is becoming a reality as stakeholders from industry, academia, funding agencies and publishers are embracing this approach. For BioPharma being able to effectively share and reuse data is a tremendous competitive advantage, within a company, with peer organizations, key opinion leaders and regulatory agencies. A few key drivers, success stories and preliminary results of an industry data stewardship survey are presented.
Being FAIR: FAIR data and model management, SSBSS 2017 Summer School – Carole Goble
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research – that is, the "assets" of data, models, codes, SOPs and workflows. The "FAIR" (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying cry. Funding agencies expect data (and increasingly software) management, retention and access plans. Journals are raising their expectations of the availability of data and codes, both pre- and post-publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face ranging across European programmes (SysMO and ERASysAPP ERANets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE) as well as in PI's labs and Centres such as the SynBioChem Centre at Manchester.
In this talk I will explore how FAIRDOM has been designed to support Systems Biology projects and show examples of its configuration and use. I will also explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences. ELIXIR (http://www.elixir-europe.org/) is the European Research Infrastructure of 21 national nodes and a hub, funded by national agreements to coordinate and sustain key data repositories and archives for the Life Science community, improve access to them and related tools, support training, and create a platform for dataset interoperability. As Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform, I will show how this work relates to your projects.
[1] Wilkinson et al., The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data 3 (2016), doi:10.1038/sdata.2016.18
FAIR for the future: embracing all things data – ARDC
FAIR for the future: embracing all things data - Natasha Simons, Keith Russell and Liz Stokes, presented at Taylor & Francis Scholarly Summits in Sydney 11 Feb 2019 and Melbourne 14 Feb 2019.
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving... – Sarah Anna Stewart
Presentation given at the M25 Consortium of Academic Libraries, CPD25 Event on 'The Role of the Library in Supporting Research'. Provides an introduction to data, software and PIDs and a brief look at how libraries can enable researchers to gain impact and credit for their research data and software.
Presented by Michael Victor, Abenet Yabowork, Jane Poole, Harrison Njamba, Erick Rutto and Peter Ballantyne at the ILRI open access week workshop, ILRI, Nairobi, 23-25 October 2019
NISO Two Day Virtual Conference:
Using the Web as an E-Content Distribution Platform:
Challenges and Opportunities
Oct 21-22, 2014
Maryann Martone, Ph.D., Professor of Neuroscience, University of California, San Diego
Brown Bag Talk with Micah Altman: Integrating Open Data into Open Access Journals – Micah Altman
This talk is part of the MIT Program on Information Science brown bag series (http://informatics.mit.edu).
This talk discusses findings from an analysis of data sharing and citation policies in Open Access journals and describes a set of novel tools for open data publication in open access journal workflows. Bring your lunch and enjoy a discussion fit for scholars, Open Access fans, and students alike.
Dr Micah Altman is Director of Research and Head/Scientist, Program on Information Science for the MIT Libraries, at the Massachusetts Institute of Technology.
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear... – dkNET
The NIDDK Information Network (dkNET; http://dknet.org) is an open community resource for basic and clinical investigators in metabolic, digestive and kidney disease. dkNET's portal facilitates access to a collection of diverse research resources (i.e. the multitude of data, software tools, materials, services, projects and organizations available to researchers in the public domain) that advance the mission of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). This webinar was presented by dkNET principal investigator Dr. Jeffrey Grethe.
Ross Wilkinson - Data Publication: Australian and Global Policy Developments – Wiley
Australia invests AUD $1-2B per annum in research data. Like most countries, it wants to get the best possible return on this data. Europe is spending €1.4B on its open data "pilot". This means the data should be FAIR: findable, accessible, interoperable, and reusable. Part of this is that data should be routinely "published" and available in a "data repository". But what does this mean?
Ross Wilkinson
CEO, Australian National Data Service
Presented at the 2015 Wiley Publishing Seminar, 5 November, Melbourne, Australia.
Facilitating good research data management practice as part of scholarly publ... – Varsha Khodiyar
Presentation given to the SciDataCon #IDW2018 session: Democratising Data Publishing: A Global Perspective, on Tuesday 6th November 2018, Gaborone, Botswana
Talk at the World Science Festival at Columbia, June 2, 2017: session on Big Data and Physics: http://www.worldsciencefestival.com/programs/big-data-future-physics/
Data Repositories: Recommendation, Certification and Models for Cost Recovery – Anita de Waard
Talk at NITRD Workshop "Measuring the Impact of Digital Repositories" February 28 – March 1, 2017 https://www.nitrd.gov/nitrdgroups/index.php?title=DigitalRepositories
The Importance of Martian Atmosphere Sample Return – Sérgio Sacani
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Richard's entangled adventures in wonderland – Richard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Cancer cell metabolism: special reference to the lactate pathway – AADYARAJPANDEY1
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy they need to function.
Energy is stored in the bonds of glucose, and when glucose is broken down, much of that energy is released.
Cells utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules of a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to "burn" the pyruvate made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos). It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis, Krebs cycle, oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
IN CANCER CELLS:
Unlike healthy cells that "burn" the entire sugar molecule to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis, and frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per glucose molecule instead of the roughly 36 ATP healthy cells gain. As a result, cancer cells need to use far more sugar molecules to get enough energy to survive.
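The ATP arithmetic above can be checked with a short illustrative calculation. The yields per glucose are the rounded textbook figures used in these slides (real yields vary by cell type and conditions), and the ATP demand chosen for the comparison is arbitrary:

```python
# Approximate ATP yield per glucose molecule (rounded textbook figures
# from the slides above; actual yields vary with cell type and conditions).
ATP_GLYCOLYSIS_ONLY = 2    # glycolysis alone (cancer-cell-like metabolism)
ATP_FULL_RESPIRATION = 36  # glycolysis + Krebs cycle + oxidative phosphorylation

def glucose_needed(atp_demand: int, atp_per_glucose: int) -> float:
    """How many glucose molecules are needed to meet a given ATP demand."""
    return atp_demand / atp_per_glucose

demand = 360  # arbitrary ATP demand, chosen only for the comparison
normal = glucose_needed(demand, ATP_FULL_RESPIRATION)  # 10 glucose
cancer = glucose_needed(demand, ATP_GLYCOLYSIS_ONLY)   # 180 glucose

# A glycolysis-only cell needs ~18x more glucose for the same ATP.
print(normal, cancer, cancer / normal)
```

This 18-fold difference in glucose demand is the quantitative core of the "wasteful" metabolism described above.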
Introduction to the WARBURG EFFECT:
Cancer cells are usually highly glycolytic ("glucose addiction") and take up more glucose from their surroundings than normal cells do.
Otto Heinrich Warburg (8 October 1883 – 1 August 1970) was awarded the 1931 Nobel Prize in Physiology or Medicine for his "discovery of the nature and mode of action of the respiratory enzyme".
WARBURG EFFECT: the tendency of cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg observed that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
Introduction:
RNA interference (RNAi) or Post-Transcriptional Gene Silencing (PTGS) is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing by which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) is reported in a wide range of eukaryotes, including worms, insects, mammals and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993 Rosalind Lee (Victor Ambros lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing another gene, lin-14, at the appropriate time in the development of the worm.
Two small transcripts of lin-4 (22 nt and 61 nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that were causing the silencing through RNA-RNA interactions.
Types of RNAi (non-coding RNA):
miRNA: 23-25 nt long; trans-acting; binds its target mRNA with mismatches; inhibits translation.
siRNA: 21 nt long; cis-acting; binds its target mRNA with a perfectly complementary sequence.
piRNA: 25-36 nt long; expressed in germ cells; regulates transposon activity.
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
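The three steps above can be sketched as a toy simulation. This is purely illustrative: the RNA sequences are invented, the cut size is fixed at 21 nt, and the real biochemistry of Dicer processing and RISC loading is far more involved:

```python
# Toy model of the RNAi pathway described above:
# Dicer cuts long dsRNA into ~21-nt siRNAs, RISC retains one (guide)
# strand, and any mRNA containing the complementary site is degraded.

COMPLEMENT = str.maketrans("AUGC", "UACG")

def dicer(long_rna: str, size: int = 21) -> list[str]:
    """Cut a long RNA strand into siRNA-sized pieces."""
    return [long_rna[i:i + size]
            for i in range(0, len(long_rna) - size + 1, size)]

def risc_silence(guide: str, mrnas: list[str]) -> list[str]:
    """Return the mRNAs that survive: those lacking the guide's target site."""
    target_site = guide.translate(COMPLEMENT)[::-1]  # reverse complement
    return [m for m in mrnas if target_site not in m]

# Invented 43-nt dsRNA sense strand, cut by "Dicer" into 21-nt pieces.
sirnas = dicer("AUGGCUACGAUCGUAGCUAAUGCCAUUGGCAUCGAUCGAUCGA")
guide = sirnas[0]  # RISC discards the other strand, keeping the guide
target_site = guide.translate(COMPLEMENT)[::-1]
mrnas = ["GGG" + target_site + "CCC",  # contains the target site: silenced
         "AUGAAACCCGGGUUU"]            # unrelated mRNA: survives
surviving = risc_silence(guide, mrnas)
print(surviving)  # only the unrelated mRNA remains
```

The sequence-specificity of the pathway falls out of the string containment test: only mRNAs carrying the guide's reverse complement are destroyed.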
THE RISC COMPLEX:
RISC is a large (>500 kDa) multi-protein RNA-binding complex that triggers mRNA degradation.
The double-stranded siRNA is unwound by an ATP-independent helicase.
The active component of RISC is the Argonaute (Ago) protein, an endonuclease that cleaves the target mRNA.
DICER: endonuclease (RNase Family III)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN:
1. PAZ (PIWI/Argonaute/Zwille): recognition of the target mRNA.
2. PIWI (P-element induced wimpy testis): breaks the phosphodiester bond of the mRNA (RNase H activity).
miRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they play a key role in regulating gene expression.
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink of how data can be made machine- and AI-ready - the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple, but effective semantic and latent representations, and to make these available into standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and those of others in the field, creates a baseline for building trustworthy and easy to deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
1. Why would a publisher care about open data?
Anita de Waard
November 2019
2. Why would a publisher care about open data?
What do we mean by open?
What do we mean by data?
What do we mean by a publisher?
3. data
"Data, after all, is stuff machines can handle […] we could create a world in which it would be programs -- not just people -- that would enjoy the data. For data, as for documents, the value of any part of the web is increased by the amount of other stuff out there. For documents it is the ability to follow links, but for open data it is the ability to also interconnect and join, to summarise and compare, to monitor, extrapolate, to infer."
-- Tim Berners-Lee, 2009
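The "interconnect and join" idea in the quote can be illustrated with a minimal sketch: two hypothetical open datasets joined on a shared identifier, so that a program, not a person, combines and summarises them. The records, fields, and DOIs below are invented for illustration:

```python
# Minimal illustration of machine-joinable open data: two invented record
# sets share a DOI field, which acts as the join key a program can follow.
datasets = [
    {"doi": "10.5061/xyz.1", "title": "Reef survey 2018", "n_samples": 120},
    {"doi": "10.5061/xyz.2", "title": "Reef survey 2019", "n_samples": 95},
]
citations = [
    {"doi": "10.5061/xyz.1", "cited_by": 14},
]

def join_on_doi(left: list[dict], right: list[dict]) -> list[dict]:
    """Left-join two lists of records on their 'doi' field."""
    index = {r["doi"]: r for r in right}
    return [{**l, **index.get(l["doi"], {})} for l in left]

joined = join_on_doi(datasets, citations)
# Once joined, programs can summarise and compare across sources:
total_cited = sum(r.get("cited_by", 0) for r in joined)
print(total_cited)  # 14
```

In real linked open data the join keys would be persistent identifiers (DOIs, RRIDs, URIs), which is exactly why the provenance and citation practices listed on the next slide matter.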
NOW!
• Provenance of data: STAR Methods at Cell
• Contributor Roles (CRediT) taxonomy
• Citation and linking to data and software
• Versioned linking to data & software
Example STAR Methods key resources table (REAGENT/RESOURCE | SOURCE | IDENTIFIER):
Antibodies
- Rabbit monoclonal anti-Snail | Cell Signaling Technology | Cat#3879S; RRID: AB_2255011
- Mouse monoclonal anti-Tubulin (clone DM1A) | Sigma-Aldrich | Cat#T9026; RRID: AB_477593
- Rabbit polyclonal anti-BMAL1 | This paper | N/A
Bacterial and Virus Strains
- pAAV-hSyn-DIO-hM3D(Gq)-mCherry | Krashes et al., 2011 | Addgene AAV5; 44361-AAV5
- AAV5-EF1a-DIO-hChR2(H134R)-EYFP | Hope Center Viral Vectors Core | N/A
- Cowpox virus Brighton Red | BEI Resources | NR-88
- Zika-SMGC-1, GENBANK: KX266255 | Isolated from patient (Wa 2016) | N/A
- Staphylococcus aureus | ATCC | ATCC 29213
- Streptococcus pyogenes: M1 serotype strain SF370; M1 GAS | ATCC | ATCC 700294
Biological Samples
- Healthy adult BA9 brain tissue | University of Maryland Brain & Tissue Bank | Cat#UMB1455
4. 19.11.2019
Elsevier Data Solutions for Research
open
Scholix: a Linked Open Data Hub to connect papers and datasets
Research Object Composer: an open-source editor for Research Objects
5. a publisher
What does a publisher even do anymore?
Example 1: Human papilloma virus causes cervical cancer
[Figure: citation network ("cites" links) connecting new (2008) and existing (1977) literature]
6. What does a publisher even do anymore?
Example 2: Top 20 universities in Quantum Computing
7.
[Figure: Author, Editor/Publisher, Reader/User and Researcher connected via Data → Results → Article → UI, shown as a network of articles, tools, data and users]
Model: Castle
• Goal: selling content
• Metrics: number of units sold
• Strategy: optimize content delivery to users
Model: Marketplace
• Goal: grow number of interactions
• Metrics: number of interactions between users
• Strategy: optimize number of network interactions
Why publishers care about open science:
Today: linear supply chains. The future: networked open science.
Linear supply chains are evolving into complex, dynamic and connected value webs.
Win by reputation → win by trust.
8. 19.11.2019
Elsevier Data Solutions for Research
Extra Slides:
1. Elsevier in numbers
2. Research Data Management
3. Research Object Composer
4. Entellect and Life Science Solutions
5. Data analytics: Quantum Computing
6. Elsevier and Open Science
10. Elsevier by the numbers
25,000: our products are used at more than 25,000 academic and government institutes globally.
14+ m: people a month use ScienceDirect, our flagship platform for academic research.
320+: Reaxys®'s ML capability enables the chemistry of drug discovery and materials innovation for over 320 pharma innovators, 130 chemical companies, and over 1,100 …
7,500: Elsevier has 7,500 employees and serves customers in over 180 countries.
430,000: Elsevier publishes 430,000 peer-reviewed articles annually.
9 m: Mendeley is a scientific social media platform that enables around 9 million users worldwide to organize, write, collaborate and promote their …
12. Elsevier Data Solutions for Research
Create & Collect · Store · Control · Collaborate · Analyze · Disseminate
Collect
Create
Extract
Store
Secure
Manage
Control
Workspaces
Researchers
Data sets
Search
Integrate
Analyze
Share
Publish
Archive
EntellectTM
MACRO EDC
Hivebench GDPR
13.
How we deliver
1. Open system: through open APIs, modules can be integrated with other RDM tools
2. Data remains private and is owned by the institution
3. The system is integrated with researcher workflows, to ensure simple and clear use
4. Researchers continue to work the same way, avoiding additional bureaucracy and administration
14.
Data Search
Retrieve active data, discover public data
Discover data
• 10 million+ datasets indexed from more than
35 repositories
• Deep indexing of data significantly enhances
the relevancy of results
• Keyword search within data files
• Filter search results by specific author,
institution, journal, subject category
Retrieve active data*
• Navigate to locally held institutional data
• Powerful keyword search and filtering
15.
Data Manager
Researchers can
• Share data privately within a research project
• Invite external collaborators to join a project
• Gather research data from data sources as it’s
generated (including ELNs)
• Annotate research data with detailed, subject-
specific metadata
• Curate data according to project or institutional
workflows
• Prepare to publish data on a repository of your
choice
• Open APIs allow tailored upload forms, automated workflows, and analysis and re-upload of data files
Go from raw files to active datasets
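To make the "annotate with subject-specific metadata" and open-API bullets above concrete, here is a minimal sketch of preparing a dataset record for upload. The field names and the required-field set are illustrative assumptions for this sketch, not the actual Mendeley Data API schema.

```python
import json

# Hypothetical metadata template; field names are illustrative only,
# not taken from the actual Mendeley Data API.
REQUIRED_FIELDS = {"title", "description", "authors", "license"}

def prepare_dataset_record(metadata: dict) -> str:
    """Validate subject-specific metadata and serialize it for upload."""
    missing = REQUIRED_FIELDS - metadata.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return json.dumps(metadata, sort_keys=True)

record = prepare_dataset_record({
    "title": "HPV citation corpus",
    "description": "Edge list of the HPV/cervical-cancer citation network",
    "authors": ["A. Researcher"],
    "license": "CC-BY-4.0",
    "subject_metadata": {"field": "oncology", "organism": "human"},
})
print(record[:60])
```

A curation workflow would run a validator like this before the record ever reaches the repository, which is how "tailored upload forms" can enforce subject-specific templates.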
16.
Data Repository
Researchers can
• Store up to 100 GB of data per
dataset in many formats
• Describe how experiments can be
reproduced
• Keep track of dataset versions
• Create a DOI for citation (optionally under a university prefix)
Store datasets in a secure and trusted repository
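Minting a DOI for a dataset means registering metadata with DataCite. The sketch below shows a simplified DataCite-style record; the suffix of the DOI is illustrative, and real registrations go through the DataCite REST API with repository credentials (the 10.17632 prefix is the one Mendeley Data uses).

```python
import json

# Simplified DataCite-style record; real registrations require a
# repository prefix and credentials. The DOI suffix here is made up.
doi_metadata = {
    "doi": "10.17632/example.1",
    "creators": [{"name": "Researcher, A."}],
    "titles": [{"title": "HPV citation corpus"}],
    "publisher": "Mendeley Data",
    "publicationYear": 2019,
    "types": {"resourceTypeGeneral": "Dataset"},
    "version": "2",  # dataset versioning, as in the slide above
}
print(json.dumps(doi_metadata, indent=2))
```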
17.
Data Monitor
Institutions can
• Keep track of data inside
and outside institution
• Achieve credibility,
visibility and integrity of
key research outputs
• Maintain visibility of
events in RDM space
• Improve researchers' adoption of data sharing tools
• Communicate value of
data sharing to
researchers during the
research process
Encourage and monitor compliance
18. Five Facts about Elsevier and Research Data
Fact #1: Elsevier's Mendeley Data supports the entire lifecycle of research data
The 5 modules that make up Mendeley Data are specifically designed to utilize data to its fullest potential, simplifying and enhancing current ways of working.
Fact #2: Researchers and institutions own and control all the data
Mendeley Data allows researchers to keep data private, or publish it under one of 16 open data licenses, so they stay in full control.
Fact #3: Mendeley Data is an open system
It is a flexible platform: modules are designed to be used together, standalone, or combined with other Elsevier and non-Elsevier solutions.
Fact #4: Mendeley Data can increase the exposure and impact of research
Mendeley Data Search indexes over 10 million datasets from more than 35 repositories.
Fact #5: Elsevier is an active participant in the open data community
Elsevier partners with the open data community, and is currently working on more than 20 projects globally.
19.
Mendeley Data already integrates through open APIs with the global Research Data
Management ecosystem, as well as other Elsevier solutions
+ 35 repositories
(BePress planned)
• Mendeley Data Repository
datasets are automatically
synced with the Pure
curation workflow
• Projects, grants,
equipment, showcase
on portal (planned)
• Mendeley Data Search results
are visible on Scopus
• Notify new articles to Monitor
for data sharing compliance
• Datasets appear as records
on Scopus (planned)
• Mendeley Data usage is
accessible through Plum API
and widget
• Plumx metrics (citations,
usage, social mentions) are
captured and shown on
Mendeley Data Repository
Publish datasets
alongside an article
on Mendeley Data
within the SSRN
publication flow
Publish or link datasets
alongside an article on
Mendeley Data within the
ScienceDirect publication flow
Researcher and
Institutional
Dataset metrics
• User identity & login
• Library (planned)
• Notes (planned)
• Projects (planned)
Existing integration
Planned integration
• Mendeley Data indexed by the OpenAIRE index
• OpenAIRE Zenodo repository indexed by Mendeley Data Search
Long-term
preservation of
published datasets
Links between articles and datasets:
• Contributed by Mendeley
Data to Scholix
• Indexed by Mendeley Data Search and Data Monitor
• Consumed by Scopus and
ScienceDirect
Integrate with machine-readable data management plans
• For more than 35 repositories the
metadata as well as the underlying
datasets are indexed by Mendeley
Data Search
• First repositories are actively
integrating with the free and open
‘push API’ of Mendeley Data
Search
• Mint DOIs for Mendeley Data
Repository
• DataCite indexed by
Mendeley Data Search
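The Scholix article-dataset links mentioned above are queryable through the ScholeXplorer service. As a rough sketch, a client could build a link query for an article DOI like this; the endpoint and parameter name follow the ScholeXplorer v2 API as I understand it, and the DOI is illustrative, so verify both against the current documentation.

```python
from urllib.parse import urlencode

# ScholeXplorer v2 link endpoint (assumed from the v2 API; verify
# against current docs before use).
BASE = "http://api.scholexplorer.openaire.eu/v2/Links"

def scholix_query(article_doi: str) -> str:
    """Return a URL asking for all links whose source is this DOI."""
    return BASE + "?" + urlencode({"sourcePid": article_doi})

# Illustrative DOI, not a real article.
url = scholix_query("10.1016/j.example.2019.01.001")
print(url)
```

Fetching that URL would return Scholix link records (article → dataset and dataset → article), which is exactly what Scopus and ScienceDirect consume per the bullet above.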
21. Building an open interoperable data ecosystem:
• Identification: locate things regardless of where they are
• Aggregates: link things together
• Annotations: about things and their relationships
• Container: packaging content and links (zip files, BagIt, Docker images)
22. Building an open interoperable data ecosystem:
[Diagram: a workflow tool (Input → Task 1 → Task 2 → Task 3 → Output) draws on a database and an open repository; its outputs flow through the Research Object Composer, via an Open API, into Mendeley Data.]
Research Object Composer: http://www.researchobject.org
• Research Object Profiler: adds annotations and relationships (metadata) to a collection to describe a research object: URI, length, filename, checksums, etc.
• Research Object Serializer: serialises the Research Object in a standard format based on BagIt (a manifest itemizing file names).
Mendeley Data provides:
• DOIs and metadata (Findability)
• An open repository (Accessibility)
• Versioning and the RO standard (Interoperability, Reusability)
23. Purpose of the Research Object Composer*:
• The RO Composer is not a registry of research objects, but it can list research objects currently under construction.
• The RO Composer is a microservice whose responsibility is to help other services create and deposit research objects.
• The Composer acts as a temporary construction site that can be completed by multiple services (e.g. a data management system, a workflow system, a user interface).
• These clients jointly build a Research Object that can then be validated against the schema, before the RO is downloaded or deposited into an archive (like Zenodo or Mendeley Data).
• Clients of the RO Composer are applications (driven by a user interface) or agents (engaged automatically by other events, e.g. a workflow run).
• The RO Composer is not a required component: any software may generate research objects by following the Research Object specifications.
* From: https://github.com/ResearchObject/research-object-composer/blob/master/introduction.ipynb
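The Profiler/Serializer idea from slide 22 can be sketched in a few lines: collect per-file metadata (filename, length, checksum) and write a BagIt-style manifest itemizing the aggregated files. This is an illustration of the pattern, not the RO Composer's actual code; the manifest layout here is a simplified stand-in for the RO/BagIt serialization.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def profile_file(path: Path) -> dict:
    """Collect the metadata the RO Profiler records for each resource."""
    data = path.read_bytes()
    return {
        "uri": path.as_uri(),
        "filename": path.name,
        "length": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

def serialize_research_object(paths, manifest_path: Path) -> dict:
    """Write a BagIt-style manifest itemizing the aggregated files."""
    manifest = {"aggregates": [profile_file(p) for p in paths]}
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest

# Demo with two temporary files standing in for research outputs.
tmp = Path(tempfile.mkdtemp())
(tmp / "data.csv").write_text("id,value\n1,42\n")
(tmp / "notes.txt").write_text("experiment notes")
ro = serialize_research_object(sorted(tmp.glob("*")), tmp / "manifest.json")
print(len(ro["aggregates"]))
```

Because the checksums travel with the manifest, an archive such as Zenodo or Mendeley Data can verify the deposit file-by-file on ingest.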
27. Human Papilloma Virus and Cervical Cancer
1946: Papanicolaou develops the Pap smear
1976: zur Hausen proposes a link between HPV and cervical cancer
2006: Gardasil HPV vaccine approved
2008: zur Hausen awarded the Nobel Prize
This talk studies the impact of the intervening research.
28. Early Work (1977): “a hypothesis has been presented that the virus found in genital warts may be involved in the etiology of human genital cancer”
30. Citation Mapping Process
• Build a corpus of papers using a broad search (~20,000 papers) on all aspects of cervical cancer and HPV
• Expand the corpus by adding all cited works not in the original corpus
• Add cited works from the cited corpus (“grandchild” references)
• Connect the discrete steps of scientific advances connecting the works
• Apply graph mathematics to find all connected paths
31. Assembling the Graph
• Dense interconnected web of citations
• Filter for only cited works within 3 years of the citing work, so each edge reflects building on relevant knowledge
[Diagram: first-level and second-level references; identities are recognized in the graph and matched to the corpus.]
32. Building the Corpus
'papillomaviridae' AND 'cancer' AND [article]/lim: 2,747 results from 1975-2019
• 55,414 references total cited in this set
• 29,064 unique references (the references overlap), spanning 1870-2019
• 719,470 references cited in this set of 29,064 papers
• 259,908 unique references in this set
The total corpus of work using this method is 182,402 unique articles
• The citation network has 103,443 edges
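The corpus-building and 3-year recency filter described on the slides above can be sketched in a few lines of plain Python (toy data; the real corpus has 182,402 articles and 103,443 edges):

```python
# Toy citation records: (citing_paper, citing_year, cited_paper, cited_year).
# The real corpus comes from the broad literature search described above;
# this sketch only illustrates the 3-year recency filter.
citations = [
    ("p3", 1980, "p1", 1977),  # cited within 3 years of the citer: kept
    ("p3", 1980, "p0", 1950),  # 30 years older than the citer: dropped
    ("p5", 1984, "p3", 1980),  # 4 years older: dropped
    ("p5", 1984, "p2", 1983),  # 1 year older: kept
]

def build_graph(citations, window=3):
    """Keep only edges where the cited work is within `window` years."""
    graph = {}
    for citer, citer_year, cited, cited_year in citations:
        if 0 <= citer_year - cited_year <= window:
            graph.setdefault(citer, set()).add(cited)
    return graph

graph = build_graph(citations)
for citer in sorted(graph):
    print(citer, sorted(graph[citer]))
```

The filter is what turns a dense web of citations into edges that plausibly represent one advance building on a recent one.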
33. Path Finding
Select “interesting” endpoints:
• Significant starting point: the proposal that HPV could be related to cancer
• Significant endpoint: recognition of the HPV/cancer connection
Use graph traversal analytics to find all paths of more than 5 papers that connect the two ideas, then separate them by year.
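Enumerating all simple paths of more than 5 papers between the two endpoints is a depth-first search. A pure-Python sketch over a toy graph (node names are illustrative placeholders for papers):

```python
def all_paths(graph, start, end, min_len=5):
    """Enumerate simple paths from start to end with more than min_len nodes."""
    results, stack = [], [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            if len(path) > min_len:
                results.append(path)
            continue
        for nxt in graph.get(node, ()):
            if nxt not in path:  # simple paths only: no repeated papers
                stack.append((nxt, path + [nxt]))
    return results

# Toy citation chain from the 1977 hypothesis to the 2008 recognition.
graph = {
    "hpv1977": ["a", "b"],
    "a": ["b"], "b": ["c"], "c": ["d"], "d": ["nobel2008"],
}
paths = all_paths(graph, "hpv1977", "nobel2008")
for p in paths:
    print(" -> ".join(p))
```

Here the 5-node shortcut through "b" is discarded and only the 6-node path survives, matching the "greater than 5 papers" threshold on the slide.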
44. Top 20 universities active in Quantum Computing
[Scatter plot: FWCI (y-axis, 0 to 5) against number of publications (x-axis, 0 to 250). Institutions shown: University of Waterloo, National University of Singapore, MIT, University of Science and Technology of China, University of Oxford, Tsinghua University, University of Tokyo, Harvard University, University of Maryland, University of New South Wales, UC Santa Barbara, ETH Zurich, University of Sydney, RAS, University of Southern California, Perimeter Institute for Theoretical Physics, University College London, Princeton University, University of Michigan.]
55. ELSEVIER | Elsevier Open Science: Creating value through collaboration | CONFIDENTIAL
Global market dynamics and technologies are reconfiguring the academic ecosystem:
Macroeconomic developments
Ecological and societal sustainability
• Global population is growing; 9B people in 2050
• Challenge to produce more with less and cleaner
input
• Challenge to solve poverty and unequal
allocation of resources
Shifting power balance from West to East
• Strong economic growth in China and India
• Rise of the middle class; improvement of
educational and health care system and food
supply chain
Technological developments
The web
• Everyone is a publisher
• Content access is ubiquitous
The social web
• Professional and personal networks emerge
without traditional institutions
• Everyone is a peer reviewer
Big data
• Explosion of data through networking of
measurement tools
• Radically cheaper tools and computing
power
Social developments
• Pressure from society and funders to justify the costs of science
• Need for reliable research results (that can be trusted)
• Patients/citizens demand access and increased participation
• Distributed computing makes it easier to make and share tools, content and code
• Overall need for more transparency and accountability, also in doing and reporting research
Emergence of open science: Open Peer Review, new social networks, Open Data (data, tools and workflows are shared), open APIs, Open Source Software, and greater engagement from society.
56.
Carl Kesselman builds tools to enable
neuroscientists to store and share their data
in a better way
Viktor Pankratius builds software programs
that generate hypotheses about volcano
eruptions: the software can steer drones to
collect data.
Lena Deus solves scientific problems through Kaggle: the system awards her points for scoring highest on machine learning tasks.
Captions: Scientists build data-sharing tools · Computers are scientists · Science becomes a game, which anyone can play
Some examples of Open Science:
57.
Moving to a network of connected components:
1. Take an open-source data repository and find some open data: Deriva, an open-source data repository, holding neuroscience data.
2. Write some open-source software to mash them up: a Jupyter Notebook to calculate properties.
3. Share outputs as OA/OD/OS: share new datasets on Deriva, publish papers in an OA journal, share code on platforms like GitHub.
Networked system:
1. The community adds elements to open science platforms that can be used by everyone.
2. Researchers build upon the combination of shared content and system elements. This leads to new scientific knowledge and output.
3. All sharable elements find their way to other open platforms and formats and can be re-used, causing a network effect.
[Diagram: users A, B and C work across Platform A (Data v1), Platform B (Tools B), and an Open Research Platform (Data v2, Tools C, article), connected to open data repositories, open access journals, and code networks.]
58.
[Diagram, left: a linear supply chain of Suppliers → Manufacturers → Distributors → Consumers. Right: a dense network of articles, tools, data, and users linked to one another.]
Open Science represents a transition from a pipeline to a networked knowledge system:
Today: linear supply chains. Model: Castle
• Goal: selling content
• Metrics: number of units sold
• Strategy: optimize content delivery to users
• Win by reputation
The future: networked open science. Model: Marketplace
• Goal: grow the number of interactions
• Metrics: number of interactions between users
• Strategy: optimize the number of network interactions
• Win by trust
Linear supply chains are evolving into complex, dynamic and connected value webs.
59.
Some current Open Science efforts:
[Diagram: Open Science spans Open Access, Open Data, Open Metrics, Research Integrity & Reproducibility, Science & Society, and Open Tools and Software.]
Open Access:
- Hybrid/Gold journals, open/self-archive options
- Contributing to CHORUS, CrossMark, RA21
- ‘Platinum OA’ on bepress Digital Commons
- Piloting SSRN preprints of The Lancet
Research Integrity and Reproducibility:
Many efforts, including:
- Full GDPR Compliance across all Elsevier products
- Preregistration and Registered Reports
- STAR Methods for Cell, transparent reporting
- Plagiarism and Image manipulation detection
- Statistics checking
- Reproducibility badges/TOP guidelines
- Transparency in contributorship roles (CRediT
Taxonomy)
- Research collaborations, e.g. Humboldt, Data Integrity
Science and Society:
- Science Literacy effort: Topic Pages,
Audioslides, Science and People
- Access to content via Patient Inform,
Research4life, Bookshare and Load2Learn.
- The Elsevier Foundation supports many projects including Green and Sustainable Chemistry, awards for early-career women scientists from the developing world, and many more
Open Data:
- All data is open on all platforms
- Following TOP guidelines across the board
- Co-leads Enabling FAIR Data, requiring data deposits in Earth & Space Science
- Co-leads the Data Citation Principles in FORCE11
- Supporting the Scholix linked-data repository and other open data standards efforts through RDA, ORCID, CrossRef, etc.
Open Metrics:
- CiteScore free API
- PlumX metrics and NewsFlo: free layer of
societal impact metrics on article level
- Helping lead the RDA Make Data Count effort with CDL/DataCite to establish data metrics
Open Tools and Software:
- Open APIs for most products
- Many research collaborations leading to Open Source
software, e.g. Github4Labs, NIH Data commons
- Hackathons, in medicine (Elsevier Hacks) and for Mendeley
- Content and data available for research and development
and hackathons
Editor's Notes
Analogies:
Manager is like OneDrive for datasets: collaborate on active projects; allows for review and approval of datasets prior to publication by the library
Manager is the Trello for research project management
RESEARCHER: Example from Wouter: Why would a psychologist use this?
Project management dashboard : It enables organized project management (where is the data? Could be dropbox)
Templates can be set up
MOVE FROM FILES TO DATASET (files with description, metadata and structure)
Manager helps make your data FAIR
INSTITUTION: Monitor allows for clear presentation and enables librarians to decide whether to keep or delete private data, especially when someone has left the institution. Archival policies. Monitor helps prevent “data loss”
Now let’s dive a little deeper into each module, starting with Repository. We know that counting only publications does not reflect the true amount of research created during an experiment; there is likely more than 1 dataset tied to a published article. By using Repository, researchers can:
Store up to 100GB of data per dataset
Ensure proper metadata tagging and storage
Increase discoverability of their dataset by easily creating a DOI to allow for citation. This also ensures datasets gets counted as a research output.
Standards-based metadata framework for logically and physically bundling resources with context http://researchobject.org
So let’s get to quantum computing, which is the area we were asked to focus on within the larger topic of quantum technologies. Here we can see the institutions that create the largest number of papers on QC, with the Chinese Academy of Sciences and CNRS, two national lab systems, at or near the top.
If we flip this to look at field-weighted citation impact, however, a measure of the works relative impact in the field, we get a very different picture—still highly international, but more US institutions here, and notably a number of US companies producing high-impact work.
The word cloud represents the top 50 semantically-derived keyphrases for the total set of papers representing quantum computing.
If we click on the specific term “polynomial approximation” in the word cloud, we find out how quickly the topic has grown over the last 5 years, and even which individuals and institutions worldwide are working on the particular concept of polynomial approximation. It’s immediately evident that quantum computing is a highly international and competitive field. And remember, 50 keyphrases exist for each of the 100,000 topics that are modeled in the topic prominence calculation.
Let’s slice the data in a different way. Here are top 20 institutions outside the US, again arranged by FWCI, who are doing important work in quantum computing. Notice anything? Virtually every one of these is a university.
Here’s the same list for the US. What is different here? For the US list alone, there are 3 large corporations, the NSF, and a DOE national lab contributing high-impact research. We know that quantum computing is being invested in and chased vigorously across the globe. The Chinese are pouring immense financial resources into this, and they have plenty of human talent, including many who are likely employed by the people in this room.
In my view, it is this nexus of different organizations, the close linkages between them, that gives the US its edge, if we have any edge. SEMATECH is another example of a complex of different organizations engaging in coordinated action. Over 90% of the research papers that Google publishes, and over 80% that IBM publishes, are done with one or more collaborators from academia.
So what does this difference look like in action? This geomap captures global research activity in quantum computing. The size of the bubble is the number of papers, the color intensity is the FWCI. Here we can see research is fairly evenly distributed in the US, Europe, and East Asia.
The Y axis here is the Field-Weighted Citation Impact for each university, while the position on the X-axis shows the total number of papers; clearly UC Santa Barbara is doing something exceptional here, which we’ll explore a bit more later. Waterloo and NUS are producing a lot of papers, though at a relatively low citation impact. Generally, the more papers one is publishing, the lower the overall impact will be. (Taken from a slightly different data set.)
We can look at other proxies for quality, including the number of outputs in top percentiles—here the percentage of research in the top 10% of cited outputs, which is around 29% for the US in 2016, around 15.5% for non-US institutions.
Here’s the same map, but now the color intensity is the level of academic-corporate collaboration. The dark red are tech companies, but US universities also have much higher levels of AC collaboration than others. Europe and Asia are very pale by comparison.
Let’s look at a different and more granular view of the same information. There is a lot going on in this graphic; it’s a different way of looking at the landscape. The bluer the dot, the higher the FWCI. The thicker the line, the more papers are shared between the two nodes. Network centrality implies higher levels of connectedness. Japan is peripheral and mostly connected to other Japanese entities. China, particularly Tsinghua and UST China, are more connected, Singapore still more so. However, they are not as connected or central as a few key US, UK, Australian and Canadian institutions, and one can clearly see that a few large US corporations are also quite central here.
In my view, one remaining advantage the US seems to have (in addition to lots of high-quality research) is the nexus between industry and academia--because of the enormous manufacturing complexities, the SEMATECH kind of highly coordinated approach (academia/industry/govt) may make more sense in this sector than many--also given questions of cryptographic security and national security implications.
We can also look at three-factor analysis. Here we map total scholarly output on the Y axis. US output of 2,392 papers (2008-2016) represents about 27% of global output. The X axis is the level of academic-corporate collaboration. 7.7% of US papers, but only 1.2% of non-US papers, are AC collaborations. Finally, the size of the bubble shows the number of patent citations for every thousand papers published. For the US, this is 111 citations, meaning over 11% of these papers were cited in patents worldwide. It generally takes 3-5 years before papers are cited in patents, so this likely understates the total since we have 2016 papers in here. The same measure for non-US institutions is 21.6 per 1,000, less than one-fifth the level.
This graphic really points out the large gap between the US and the ROW regarding UI collaboration, and overall patenting activity driven by university research as well.
The quantum computing topic is actually an aggregate made up of somewhat more and distinct granular topics—the same kinds of analysis can be done on these topics, which are generated directly from the topical model that I covered earlier.
This is the same topics by country and number of downloaded articles.
We can look at top corporations publishing in this area, and can see that the bulk are US firms with some Japanese representation as well.
Top universities for the same topic—Yale, UCSB, Berkeley, and MIT produce a great deal, with UCSB and Yale authors having a particularly high FWCI
One can always do a Keyphrase-based analysis if you want to delve into a particular aspect of the topic. Here we look at the same set of papers on flux qubits that cover the concepts of circuits, resonators, and Josephson junctions—note the number of papers from Yale has gone down from 85 to 46 here. Dr. Devoret has produced more work than anyone else covering these concepts.