NPG Scientific Data - Metabolomics Society meeting, Tsuruoka, Japan, 2014 - Susanna-Assunta Sansone
This document provides information about Scientific Data, an online publication from Nature Research that publishes peer-reviewed descriptions of scientifically valuable datasets. It summarizes the goals of Scientific Data, which are to promote data sharing, reuse, and reproducibility. The document outlines the structured format for Data Descriptors, which include both a narrative component and experimental metadata. It describes the peer review process, which focuses on data quality, completeness of description, and potential for reuse rather than novelty of findings. Finally, it provides examples of diverse current content and encourages collaboration with data repositories.
Scientific Data is a data journal. In brief:
Scientific Data publishes structured data descriptors and accompanying research data to promote open and reproducible science. Data Descriptors provide detailed methods and validation to allow other researchers to understand and reuse shared data. Through peer review of data quality and reuse potential, as well as incentives like citations, Scientific Data aims to help address issues like selective reporting and make shared research data more accessible and useful.
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014 - Susanna-Assunta Sansone
- The document discusses the need for open and accessible data in research. It notes that over 50% of studies are not published due to selective reporting of results.
- There is a movement for "FAIR data" in life and medical sciences, where data is findable, accessible, interoperable, and reusable. However, not much data currently meets these standards.
- Publishers can play a role in incentivizing data sharing by implementing policies requiring data availability and format standards for publishing research. This includes supporting data citations and data journals.
The Kaleidoscope of Impact: same data, different perspectives, constantly cha... - Kudos
Scholars, scientists, academic institutions, publishers and funders are all interested in impact. We have different roles and goals, and therefore different reasons for needing to understand impact; we ask different questions about impact, and those questions continue to evolve, much as the concept of impact itself is evolving. To answer our different questions, do we need different data, in separate silos, or are we looking at the same data, from different angles? This session gathered researcher, library, publisher and metrics provider perspectives to consider who has an interest in impact, what data they are interested in, how they use it, and how the situation is evolving as, for example, business models and technical infrastructures shift.
Access to scientific information has changed in ways that the early pioneers of the internet likely never imagined. The quantities of data, the array of tools available to search and analyze them, the range of devices, and the shift in community participation continue to expand, and the pace of change shows no sign of slowing. ChemSpider is one of the chemistry community’s primary online public compound databases. Containing tens of millions of chemical compounds and their associated data, ChemSpider serves data to tens of thousands of chemists every day, and it serves as the foundation for many important international projects to integrate chemistry and biology data, facilitate drug discovery efforts, and help identify new chemicals from under the ocean. This presentation will provide an overview of the expanding reach of this eScience cheminformatics platform and the nature of the solutions it helps to enable, including structure validation, text mining and semantic markup, the National Chemical Database Service for the United Kingdom, and the development of a chemistry data repository. We will also discuss the possibilities it offers in the domain of crowdsourcing and open data sharing. The future of scientific information and communication will be underpinned by these efforts, shaped by increasing participation from the scientific community and facilitated collaboration, ultimately accelerating scientific progress.
On community-standards, data curation and scholarly communication - Stanford M... - Susanna-Assunta Sansone
This document discusses content standards for better describing scientific data. It notes that while some common features exist across domains, descriptions of experimental context are often inconsistent or duplicated. The author advocates for community-developed content standards to structure, enrich and report dataset descriptions and their experimental context to facilitate discovery, sharing, understanding and reuse of data. Standards should include minimum reporting requirements, controlled vocabularies and conceptual models to allow data to flow between systems. This will help enable better science from better described data.
On community-standards, data curation and scholarly communication - BITS, Ita... - Susanna-Assunta Sansone
The document discusses the vision of a "connected digital research enterprise" where researchers can more easily find and collaborate with others based on shared data and outputs. It describes a scenario where Researcher X discovers commonalities in data with Researcher Y, views Y's datasets and publications, and initiates a collaboration. Their joint work is captured and indexed, and a company utilizes some of the outputs while providing funding back to the researchers. The vision aims to more closely connect scientific work through shared digital resources.
This presentation was provided by Patricia Payton of ProQuest during the NISO webinar, Engineering Access Under the Hood, Part Two, held on November 15, 2017.
Wouter Haak's presentation on open science and research data management from the Elsevier Library Connect Event 2016 "Navigating the new publishing & open science terrain: what librarians need to know." Wouter is Elsevier's Vice President of Research Data Management Solutions.
This document summarizes Catriona MacCallum's presentation on data publishing at PLOS. The key points are:
1) PLOS requires authors to make all underlying data openly available without restriction, with rare exceptions. Authors must provide a Data Availability Statement describing compliance.
2) Over 47,000 PLOS papers have included a data statement. Most data is found within submission files or repositories like Dryad and Figshare. PLOS checks data accessibility and ensures anonymity of clinical datasets.
3) PLOS supports initiatives like CRediT for attributing research contributions and data citation principles for giving credit to data producers. PLOS is also involved in projects beyond traditional publishing like preprints and experimental
The document summarizes the bioCADDIE team's work developing the DATS (Data Tag Suite) metadata model. It describes the iterative development process including collecting use cases, mapping existing schemas, and refining the model. The key features of the DATS model include a set of core and extended metadata elements with defined properties, definitions, and allowed values. Core elements are generic while extended elements include domain-specific elements. Serializations include JSON and JSON-LD using schema.org vocabulary. The goal is to enable scalable discovery and access to datasets.
Overview of Bibliometrics - IAP Course version 1.1 - Micah Altman
Whose articles cite a body of work? Is this a high-impact journal? How might others assess my scholarly impact? Citation analysis is one of the primary methods used to answer these questions.
Rebecca Raworth presented a workshop on research data management. The presentation covered:
- Why research data management plans are important, such as satisfying funder requirements and increasing research efficiency.
- Current requirements for data management plans in Canada.
- Tools for research data management, including Portage for creating data management plans and Dataverse for data storage and access.
- Best practices for organizing, documenting, storing and sharing research data, including using metadata standards, file naming conventions, and choosing appropriate data repositories.
Knowledge graph construction for research & medicine - Paul Groth
1) Elsevier aims to build knowledge graphs to help address challenges in research and medicine like high drug development costs and medical errors.
2) Knowledge graphs link entities like people, concepts, and events to provide answers by going beyond traditional bibliographic descriptions.
3) Elsevier constructs knowledge graphs using techniques like information extraction from text, integrating data sources, and predictive modeling of large patient datasets to identify statistical correlations.
Presentation by Ruth Wilson on Nature Publishing Group's Scientific Data journal given at the Now and Future of Data Publishing Symposium, 22 May 2013, Oxford, UK
Transparency and reproducibility in research - Louise Corti
Talk given at the ESS Summer School: An introduction to using big data in the social sciences, 20-24 July 2020, University of Essex, Colchester, UK.
In the morning we look at publishing and sharing data, the importance of research replication and code sharing, and what methodological issues peer reviewers might look for in a published paper using big data. An increasing number of journals in the sciences and social sciences expect a high degree of transparency, and knowing how best to publish high-quality raw (or processed) data, methodology and code is a useful skill. We show how ‘data papers’ help to elucidate how datasets were constructed, compiled and processed, and help to showcase the value of data beyond the original research.
This document provides an overview of library instruction for nursing students presented by Maletta Payne, the Nursing Library Liaison. The agenda covers accessing the library's website and LibGuides, research strategies like using subject headings and Boolean operators, locating print and electronic materials, and services for graduate students including interlibrary loan and study rooms. Library hours and contact information for research help are also included.
No more waiting! Tools that work Today to reveal dataset use - Heather Piwowar
This document discusses the need to better understand the impact of datasets beyond just citations. It notes that datasets can be engaged with in many ways, such as through views, saves, discussions, and recommendations, by various groups like researchers, teachers, students, and policymakers. It calls for exposing more metrics of engagement, supporting more tools for interacting with datasets at all stages, and making metrics and data more openly available to help reveal how datasets are being used.
F1000Research is an open science publishing platform that aims to provide unrestricted access to research findings including reanalyses, confirmatory, and negative results. It uses open peer review after publication and versioning of "living articles" to increase transparency and reproducibility. Authors are required to include the underlying source data for their results, hosted in open repositories, to allow reanalysis and reduce waste.
Leveraging publication metadata to help overcome the data ingest bottleneck - Todd Vision
This document discusses leveraging publication metadata to help address the data ingest bottleneck in scientific publishing. It proposes integrating data submission with manuscript submission to journals to make data archiving integral to the publication process. This integrated approach would help overcome issues around orphan data and allow linking of publications to underlying data through identifiers. Benefits include increased data findability, reuse, and credit to data creators. Challenges include gaining widespread adoption among journals and developers.
Federal Funding Agency's Public Access Policies and You - Margaret Janz
Slides used for an information session about agency responses to the Feb. 22, 2013 OSTP Memo. Session was held May 7, 2015 in the Science & Engineering Library at Temple University and presented by Margaret Janz.
Focus was on NSF, NASA, and NIH response documents, based on the interests attendees indicated when they RSVP'd to the event.
Notes added 5/13/15.
This presentation was provided by Clara Llebot of Oregon State University, during the NISO hot topic virtual conference "Effective Data Management," which was held on September 29, 2021.
This document outlines best practices for creating research data. It recommends using consistent data organization with standardized formats and descriptive file names; performing quality assurance checks and using scripted programs to analyze data while keeping notes; and thoroughly documenting all aspects of data collection and analysis. Following these practices will improve data usability, sharing, and reproducibility.
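One of the practices summarized above, scripted and repeatable quality-assurance checks, can be sketched in a few lines. This is a minimal illustration, not material from the presentation itself; the column names and the accepted temperature range are invented for the example.

```python
import csv
from io import StringIO

# Hypothetical sample data; in practice this would be read from a file.
raw = StringIO(
    "sample_id,temperature_c\n"
    "S001,21.5\n"
    "S002,19.8\n"
)

# Scripted QA check: every row must have an identifier and a plausible
# temperature. Running checks as code makes them repeatable and loggable,
# in keeping with the "keep notes" advice above.
problems = []
for i, row in enumerate(csv.DictReader(raw), start=1):
    if not row["sample_id"]:
        problems.append(f"row {i}: missing sample_id")
    temperature = float(row["temperature_c"])
    if not (-40.0 <= temperature <= 60.0):  # invented range for illustration
        problems.append(f"row {i}: temperature {temperature} out of range")

print(f"{len(problems)} problem(s) found")
```

Because the checks live in a script rather than in manual inspection, they can be re-run unchanged whenever the data are updated.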
This document discusses the development of the DATS (Data Tag Suite), which is needed for DataMed to index data sources in a scalable way, similar to how JATS indexes literature for PubMed. The DATS model was developed through a community-driven process involving use cases and existing metadata schemas. It includes core and extended elements to describe datasets and other digital research objects. The model is designed around the dataset entity and serialized in JSON and JSON-LD mapped to schema.org to increase visibility, accessibility, and searchability. Efforts are ongoing to further align DATS with schema.org and integrate it with related metadata standards and tools.
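As a rough illustration of the kind of schema.org-mapped JSON-LD record described above, here is a minimal sketch in Python. This is not the actual DATS schema; it is a plain schema.org `Dataset` record in the same spirit, and all field values (title, identifier, creator) are invented placeholders.

```python
import json

# A minimal schema.org Dataset record in JSON-LD. Real DATS records carry
# many more core and extended elements; every value here is a placeholder.
record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example metabolomics dataset",
    "description": "Illustrative record showing the shape of a "
                   "JSON-LD dataset description for discovery indexing.",
    "identifier": "https://doi.org/10.0000/example",  # placeholder DOI
    "keywords": ["metabolomics", "data sharing"],
    "creator": {"@type": "Person", "name": "A. Researcher"},
}

serialized = json.dumps(record, indent=2)
print(serialized)
```

Serializing dataset descriptions this way is what lets generic search infrastructure index them alongside literature, which is the scalability point the DATS work makes.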
The document discusses an organization that works in several areas related to data management including data capture, publication, provenance, ontologies, standards, and software development. They work with various communities and consortia in the UK, Europe, and internationally including those focused on enabling reproducible research and open science. They aim to increase annotation and use of standards in lab notes, spreadsheets, and tables and represent facts as linked data statements to make information accessible to both humans and machines.
This document outlines plans for a new content type called a Data Descriptor to be launched in May 2014 by the journal Scientific Data. A Data Descriptor will consist of an article component and a structured metadata component. It aims to provide all the information needed to understand, reuse, and reproduce datasets in a peer-reviewed publication. This will help address barriers to data sharing by providing credit for sharing data and making datasets more discoverable and reusable through standardized metadata. The Data Descriptor is presented as a complement to traditional journal articles and data repositories.
"Standards landscape" NIF Big Data 2 Knowledge (BD2K) Initiative, Sep, 2013Susanna-Assunta Sansone
Overview of the landscape of standards in life sciences for the NIH BD2K
"Frameworks for Community-Based Standards Efforts" workshop
September 25, 2013 - September 26, 2013
Co-Chairs: Susanna Sansone, PhD, and David Kennedy, PhD.
The overall goal of this workshop is to learn what has worked and what has not worked in community-based standards efforts. Participants will have experience in leading specific community based standards initiatives. Prior to the workshop, participants will be asked to address in writing answers to specific questions regarding formulating, conducting, and maintaining such efforts. This information will be used to facilitate focused and actionable discussion at the workshop. Issuance of a Request for Information soliciting comment from the broader community on some of the key issues addressed in the workshop is currently envisioned.
Contact: BD2Kworkshops@mail.nih.gov
Agenda: Frameworks for Community-Based Standards Efforts (PDF 40.7KB)
Participant List: Roster of Invited Participants (PDF 32KB)
Forum (Join the discussion): http://frameworks.prophpbb.com
Watch Live: http://videocast.nih.gov/summary.asp?live=13088 - See more at: http://bd2k.nih.gov/workshops.html#cbse
This document summarizes Susanna-Assunta Sansone's presentation on open access and open data at Nature Publishing Group. Some key points discussed include:
- The benefits of open data including reducing errors/fraud and increasing return on investment in research. However, barriers also exist such as lack of incentives and standards.
- Recent initiatives at NPG to improve data/reproducibility such as requiring data behind figures and expanding methods sections.
- The role of data journals in increasing credit/visibility for shared data and promoting standards/best practices.
- Market research found researchers want increased visibility, usability, and credit for sharing their data.
Short overview responding to the following 4 questions, as suggested by the RDA Long Tail Data IG:
1. Name and location of institution/service
2. What type of data do you collect and how do you acquire the data?
3. What services do you provide?
4. How do you intend to interoperate with a global ecosystem of research data?
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu... - Susanna-Assunta Sansone
This document outlines the services provided by Scientific Data, a publication from Nature that helps authors publish, discover, and reuse research data. It provides structured metadata and a narrative component for Data Descriptors, which describe datasets in detail without new scientific findings. The publication works with over 50 repositories and provides submission assistance and semantic annotation to help authors find appropriate data archiving locations.
Susanna-Assunta Sansone is a data consultant and honorary academic editor who works on several projects related to making data FAIR (Findable, Accessible, Interoperable, Reusable). She is the associate director of Scientific Data, a peer-reviewed journal focused on publishing data descriptors to describe and provide access to scientifically valuable datasets. The goal of Scientific Data is to help promote open science and data reuse by publishing structured metadata and narratives about datasets alongside traditional research articles.
Susanna-Assunta Sansone is the Principal Investigator and Associate Director of a team that works on data capture, curation, publication, and standards. The team develops databases, works with various communities involved in data, and conducts software development and training. Key areas of focus include data provenance, ontologies, the semantic web, and working with pre-competitive initiatives.
The document discusses challenges in data standards, sharing, and publication in life sciences. It notes there are many reporting standards to describe experiments but issues in identifying, tracking usage, and assessing impact of standards. It proposes creating a registry of standards that is searchable and associates standards with data policies, databases, and metrics to evaluate use. This would help stakeholders identify appropriate standards and credit contributors to maintaining open standards.
This document discusses data management and curation in bioinformatics. It describes Susanna-Assunta Sansone as the principal investigator and team leader at the University of Oxford e-Research Centre, where her team works on data management, biocuration, software development, databases, and community standards and ontologies for various domains including toxicology, health, and agriculture. The document promotes the importance of data standards to enable data sharing and reproducibility in bioscience research.
Update on the BioSharing WG activities at the joint ELIXIR IG and BioSharing WG breakout: https://rd-alliance.org/joint-meeting-ig-elixir-bridging-force-wg-biosharing-registry.html
Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015 (Susanna-Assunta Sansone)
This document discusses the importance of digital research objects being not just open, but FAIR (Findable, Accessible, Interoperable, Reusable). It notes that big life science companies have evolved from keeping data and innovation mostly inside the company to now distributing data more openly and collaborating in heterogeneous partnerships across different organizations. However, current academic incentive and evaluation systems do not properly recognize or reward activities like sharing data, software, publications or patents. The document calls for rethinking these systems and designing new career paths for data scientists to better align incentives with open and collaborative research practices.
SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific ... (Susanna-Assunta Sansone)
Part of the SciDataCon14 workshop on "Data Papers and their applications", run by myself and Brian Hole, to help attendees understand current data-publishing journals and trends, and the editorial processes of NPG's Scientific Data and Ubiquity's Open Health Data.
Scientific Data overview of Data Descriptors - WT Data-Literature integration... (Susanna-Assunta Sansone)
This document introduces Scientific Data, a new peer-reviewed journal for publishing data descriptors from Nature Publishing Group. It will provide structured metadata and narrative articles to describe datasets for reuse. The journal is now open for submissions and will launch in May 2014, featuring an advisory panel and sections for standardized data descriptor articles and experimental metadata. It aims to give proper credit for data sharing and promote open access, reuse and peer review of curated scientific datasets.
The document summarizes a presentation about making scientific data FAIR (Findable, Accessible, Interoperable, Reusable). It discusses the concept of FAIR data and several of the presenter's related projects. Examples are provided of using standards like ISA-Tab to structure metadata and make datasets interoperable. The presentation outlines the presenter's roles in data capture, publication, and standards development efforts to promote FAIR data principles. Scientific Data, a new journal for peer-reviewed data descriptions, is introduced as a way to make datasets more discoverable and reusable.
The document discusses making experimental data and methods more reproducible and accessible by providing structured metadata alongside narrative descriptions. It recommends using community standards and ontologies to semantically tag key information, and machine-readable formats to structure descriptions in a consistent way. Tools are proposed to help authors report structured information and curate it according to these standards to make data fully FAIR (findable, accessible, interoperable, reusable). The goal is to move from experiments that are difficult to reproduce to those that are "born reproducible".
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data (Susanna-Assunta Sansone)
1) The document discusses Susanna-Assunta Sansone's roles and work related to promoting FAIR data standards and practices.
2) It highlights some of her leadership positions with organizations like BioSharing that work to map and promote standards.
3) The document also discusses Scientific Data, a peer-reviewed journal launched by Nature Publishing Group to publish detailed descriptions of scientifically valuable datasets to facilitate reuse.
How to make your published data findable, accessible, interoperable and reusable (Phoenix Bioinformatics)
Seminar Presentation for PMB Department, UC Berkeley for Love Data Week. Subject is how to prepare publications and associated data sets for maximum reuse.
This document summarizes the work of developing a Data Discovery Index prototype that helps users find and access shared biomedical data from various repositories. It ingests metadata from different standards and sources using Elasticsearch. It was presented at the Alan Turing Institute Symposium in April 2016. The project aims to organize data through an aggregator framework and portal. It involves mapping various metadata standards to have maximum coverage of use cases with minimal data elements. More information can be found at the listed websites.
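The mapping of heterogeneous metadata standards onto a minimal common element set could be sketched as follows; the field names, standard names, and example records below are illustrative assumptions, not the project's actual schema:

```python
# Illustrative sketch of the aggregation idea behind a data discovery index:
# records arriving in different community standards are mapped onto a minimal
# common set of elements before indexing. All names here are hypothetical.

# Per-standard mappings from common element -> source field name.
FIELD_MAPS = {
    "standard_a": {"title": "titles", "creator": "creators", "id": "identifier"},
    "standard_b": {"title": "title",  "creator": "authors",  "id": "identifier"},
}

def normalize(record, standard):
    """Map a source metadata record onto the minimal common schema."""
    mapping = FIELD_MAPS[standard]
    return {common: record.get(src) for common, src in mapping.items()}

records = [
    ({"titles": "RNA-seq of X", "creators": "Lab A", "identifier": "doi:10.1/a"}, "standard_a"),
    ({"title": "Soil survey",   "authors": "Lab B",  "identifier": "doi:10.1/b"}, "standard_b"),
]

# The normalized documents would then be sent to a search index.
index = [normalize(rec, std) for rec, std in records]
for doc in index:
    print(doc["id"], "-", doc["title"])
```

In a real deployment the normalized documents would be bulk-loaded into Elasticsearch; the point of the sketch is only the normalization step that makes cross-standard search possible.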
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...) (Akshay Bhagat)
This document discusses the DataBridge project, which aims to enable easier discoverability and use of long tail science data. DataBridge will create a multidimensional network and social network for scientific data by mapping datasets connected by relationships between their metadata, usage, and the methods used to analyze them. This will allow researchers to more easily find relevant datasets by automatically forming communities of similar data. The document outlines DataBridge's vision and progress to date, including the algorithms it is investigating for measuring similarity between datasets in order to facilitate searching for collaborators and discoveries.
Being FAIR: FAIR data and model management, SSBSS 2017 Summer School (Carole Goble)
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research – that is, the "assets" of data, models, codes, SOPs, workflows. The "FAIR" (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying-cry. Funding agencies expect data (and increasingly software) management, retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face ranging across European programmes (SysMO and ERASysAPP ERANets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE) as well as in PI's labs and Centres such as the SynBioChem Centre at Manchester.
In this talk I will explore how FAIRDOM has been designed to support Systems Biology projects and show examples of its configuration and use. I will also explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences. ELIXIR (http://www.elixir-europe.org/) is the European Research Infrastructure of 21 national nodes and a hub, funded by national agreements to coordinate and sustain key data repositories and archives for the Life Science community, improve access to them and related tools, support training, and create a platform for dataset interoperability. As the Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform, I will show how this work relates to your projects.
[1] Wilkinson et al, The FAIR Guiding Principles for scientific data management and stewardship Scientific Data 3, doi:10.1038/sdata.2016.18
Scientific Data is a new category of publication that provides detailed descriptions of scientifically valuable datasets to improve data reproducibility and reuse, with descriptors covering topics like methods, data records, and technical validation. These descriptors undergo a peer review process to assess completeness, consistency, integrity, and experimental rigor. The publication is hosted on Nature.com and aims to improve data discoverability, curation, and peer review through machine-readable metadata and clear links between data, descriptors, and related research papers.
Data Communities - reusable data in and outside your organization (Paul Groth)
Description
Data is critical both to the operation of an organization and as a product. How can you make that data more usable for both internal and external stakeholders? There are a myriad of recommendations, advice, and strictures about what data providers should do to facilitate data (re)use. It can be overwhelming. Based on recent empirical work (analyzing data reuse proxies at scale, understanding data sensemaking, and looking at how researchers search for data), I talk about which practices are a good place to start for helping others to reuse your data. I put this in the context of the notion of data communities, which organizations can use to help foster the use of data both within the organization and externally.
Talk given at the Sciencedigital@UNGA75 on 29th September as part of a series of side events to mark the 75th anniversary of the United Nations General Assembly.
The challenge of sharing data well, how publishers can help (Varsha Khodiyar)
Researchers, academic institutes and funders are increasingly recognizing the importance of data sharing for reproducible science. However, it is not always straightforward and clear to researchers as to how best to share data in a useful way. At Springer Nature we are working on several initiatives to help facilitate the sharing of research data in a reusable way, with our overarching goal being to publish research that is robust and reproducible. I will talk about the effort that goes into our flagship data journal, Scientific Data, to facilitate best practices in publication and sharing of research data, and share some of our experiences publishing Challenge datasets. I will also describe some of the newer Research Data Services that are now available to help all researchers (not only Springer Nature authors) to share their data in a useful way.
A 45min presentation given at the 'Getting published in Nature's Scientific Data journal', hosted by the University of Cambridge Research Data Management team (www.data.cam.ac.uk). Presented on Monday 11th January 2016.
1) Big data standards are needed to make data understandable, reusable, and shareable across different databases and domains.
2) Effective standards require reporting sufficient experimental details and context in both human-readable and machine-readable formats.
3) Developing standards is a collaborative process involving different stakeholder groups to define requirements, vocabularies, and data models through both formal standards bodies and grassroots organizations.
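To illustrate point 2), here is a minimal sketch of the same experimental detail reported in both forms: a human-readable sentence, and structured key-value metadata a machine can parse. The keys and the ontology-style term identifiers are invented for illustration, not taken from any real standard:

```python
import json

# The same experimental context, twice.
# 1) Human-readable free text, as it might appear in a methods section:
human_readable = "Liver samples from 8-week-old mice were profiled by RNA-seq."

# 2) Machine-readable structured metadata. Field names and the
#    "EX:..." term identifiers are hypothetical, for illustration only.
machine_readable = {
    "material": {"value": "liver", "term_source": "EXAMPLE-ONTOLOGY", "term_id": "EX:0001"},
    "organism": {"value": "Mus musculus"},
    "age":      {"value": 8, "unit": "week"},
    "assay":    {"value": "RNA-seq"},
}

print(json.dumps(machine_readable, indent=2))
```

The structured form makes the units, species, and assay type queryable, which is what enables cross-database search and aggregation; the free text alone does not.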
Capturing and Analyzing Publication, Citation and Usage Data for Contextual C... (NASIG)
Libraries have long sought to demonstrate the value of their collections through a variety of usage statistics. Traditionally, a strong emphasis is placed on high usage statistics when evaluating journals in collection development discussions. However, as budget pressures persist, administrators are increasingly concerned with looking beyond traditional usage metrics to determine the real impact of library services and collections. By examining journal usage in the context of scholarly communication, we hope to gain a more holistic understanding of the use and impact of our library’s resources. In this session, we begin by outlining our methodology for gathering comprehensive publication and citation data for authors affiliated with Northwestern University’s Feinberg School of Medicine, utilizing Web of Science as our primary data source and leveraging a custom Python script to manage the data. Using this data we discuss various potential metrics that could be employed to measure and evaluate journals in institutional and field-specific contexts, including but not limited to: number of publications and references per journal, co-citation networks, percentage of references per journal, and increases or decreases of references over time per title. We then consider the development of normalized benchmarks and criteria for creating field-specific core journal lists. We also discuss a process for establishing usage thresholds to evaluate existing journal subscriptions and to highlight potential gaps in the collection. Finally, we apply and compare these metrics to traditional collection development tools like COUNTER usage reports, cost-per-use analysis, Inter-Library Loan statistics and turnaway reports, to determine what correlations or discrepancies might exist. 
We finish by highlighting some use-cases which demonstrate the value of considering publication and citation metrics, and provide suggestions for incorporating these metrics into library collection development practices.
Speakers: Joelen Pastva and Jonathan Shank, Northwestern University
Project GitHub page: https://goo.gl/2C2Pcy
This document discusses challenges around scholarly data, including fragmented and poorly described data. It emphasizes the importance of experimental details, data availability, and data publication for reproducibility. Springer Nature's Scientific Data is highlighted as a new open-access journal for detailed data descriptors. The Scientific Data ISA-explorer is presented as a web application for discovering, exploring and visualizing data descriptors.
This document summarizes a presentation given by Susanna Sansone at the GSC 23rd meeting education day in Bangkok, Thailand on August 7, 2023. The presentation discussed standards across life sciences, including definitions of different types of standards and over 1,600 identified standards. It covered standard organizations and grassroots groups, as well as the FAIRsharing database which catalogs over 2,885 standards and databases and aims to promote their use and value across research.
The "FAIRsharing journey in RDA" document discusses:
1) FAIRsharing's growth and involvement with RDA since 2011, including its Working Group established in 2015 to curate standards, databases, and policies to promote FAIR data.
2) FAIRsharing's current activities and impact, such as its registry of over 4,000 records from many disciplines and usage in various tools and services.
3) Opportunities for further engagement with RDA, such as leveraging their expertise for contributions to the FAIR Cookbook, an open resource providing technical recipes for applying FAIR principles to life science data.
Overview of metadata standards, and how FAIRsharing and the FAIR Cookbook help in selecting and using them. Presentation to the "What is metadata? Common standards and properties" EHP Workshop, November 9, 2022: https://ephconference.eu/pre-conference-programme-441
Pharmas and academia are joining forces to make data FAIR (Findable, Accessible, Interoperable, and Reusable) through the development of the FAIR Cookbook. The FAIR Cookbook provides a growing collection of over 70 recipes that give step-by-step guidance on improving the FAIRness of different data types through the use of tools, technologies, and best practices. It aims to provide practical examples and guidelines to support researchers, data managers, and others in managing data according to FAIR principles. The FAIR Cookbook is an open, community-developed resource overseen by an editorial board, with contributions from nearly 100 life sciences professionals.
FAIR, community standards and data FAIRification: components and recipes (Susanna-Assunta Sansone)
Overview of FAIR, FAIRsharing and the FAIR Cookbook at the ATI event on Knowledge Graphs: https://github.com/turing-knowledge-graphs/meet-ups/blob/main/symposium-2022.md
Presentation to the EOSC workshop on policies (https://eoscfuture.eu/eventsfuture/monitoring-eosc-readiness-fair-data-policies) on what FAIRsharing does for policies, including providing registration, discovery, flexible and clearer descriptions, relationships, machine readability and comparability.
The document summarizes how FAIRsharing assists others with promoting FAIR data principles without directly assessing FAIRness compliance. It does this by (1) providing a lookup service for standards and repositories via its API, (2) serving as a registry for FAIRness tests and indicators to make them discoverable, and (3) enabling communities to create profiles declaring which standards and repositories they use. The document also outlines FAIRsharing's operations, advisory boards, and future plans to further support assessment and tracking of FAIRness improvements over time.
ELIXIR is a European infrastructure that brings together life science resources from across Europe. It offers databases, tools, computing capabilities, and training opportunities. ELIXIR nodes provide these services and connect national data infrastructures. ELIXIR communities connect infrastructure experts to drive service developments. ELIXIR is funded through a mixed model including public sources. It works to sustain important biological data resources and make data FAIR through recommended standards and interoperability resources. ELIXIR also aims to develop a sustainable tools ecosystem and provides training through its portal.
Presentation to the EC Workshop on Maximizing investments in health research: FAIR data for a coordinated COVID-19 response. Workshop III, November 8, 2021.
Presentation to the EC Workshop on Maximizing investments in health research: FAIR data for a coordinated COVID-19 response. Workshop I, October 11, 2021.
The FAIR Cookbook poster, as presented at the ELIXIR-UK Node and the UK Conference of Bioinformatics and Computational Biology 2021: https://www.earlham.ac.uk/uk-conference-bioinformatics-and-computational-biology-21
The FAIR Cookbook poster, as presented at the UK Conference of Bioinformatics and Computational Biology 2021: https://www.earlham.ac.uk/uk-conference-bioinformatics-and-computational-biology-21
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
1. High quality data publications: drives and needs
Susanna-Assunta Sansone, PhD
@biosharing | @isatools | @scientificdata
B-DEBATE: Big Data in Biomedicine. Challenges and Opportunities, 12 Nov 2014
Data Consultant; Honorary Academic Editor; Associate Director; Principal Investigator
3. Plagued by selective reporting of data and methods
• Over 50% of completed studies in biomedicine do not appear in the published literature
• Often because results do not conform to authors' hypotheses
"Only half the health-related studies funded by the European Union between 1998 and 2006 - an expenditure of €6 billion - led to identifiable reports"
4. Incentivizing individual contributors to share data
• Big science efforts
  o data are often better organized, reported and shared
• Small independent efforts, yielding a rich variety of specialty datasets
  o most of these data (such as null findings) are unpublished
  o these dark data hold a potential wealth of knowledge
5. From made reproducible to born reproducible
"Reproducing the method took several months of effort, and required using new versions and new software that posed challenges to reconstructing and validating the results"
10. Data/reproducibility at NPG
Wang et al, Nature, 2013, doi:10.1038/nature12730
• Figure source data
  o putting data behind figures/graphs
11. Data/reproducibility at NPG
• Figure source data
  o putting data behind figures/graphs
• Data citation
  o tackling both styling and format; monitoring community developments, such as the Data Citation Synthesis Group
• Code reproducibility
  o peer review, availability and reuse
• NPG's Linked Data release - CC0
• A new data journal
12. Role of data papers and data journals
• Incentive, credit for sharing
• Peer review focus
• Value of data vs. analysis
• Discoverability and reusability
13. Market research (2011)
• What do researchers want from a data publication?
  o 96% - increased visibility and discovery
  o 95% - increased usability of their research data
  o 93% - credit mechanism for deposit of data
  o 80% - peer review of content/datasets
Respondent characteristics: 387 respondents (329 active researchers)
• Physics (24%)
• Earth and environmental science (21%)
• Biology (20%)
• Chemistry (19%)
• Others (16%)
14. Helping you publish, discover and reuse research data
• Credit for sharing your data
• Focused on reuse and reproducibility
• Peer reviewed, curated
• Promoting community data and code repositories
• Open Access
• Currently covering life, natural and environmental sciences
• Big and small data
  o the power of small data is in their aggregation and integration with other datasets
• New and previously published individual datasets, curated collections and citizen science
  o a fuller, more in-depth look at the data processing steps, additional data files, code etc.
  o tutorial-like information for scientists interested in reusing or integrating the data with their own
15. Introducing a new content type: Data Descriptor
Methods and technical analyses supporting the quality of the measurements:
• What did I do to generate the data?
• How was the data processed?
• Where is the data?
• Who did what, when?
• How can the data be used or reused?
Designed to make data more discoverable, interpretable and reusable
16. Relation with traditional article - content
Scientific hypotheses:
• Synthesis
• Analysis
• Conclusions
Methods and technical analyses supporting the quality of the measurements:
• What did I do to generate the data?
• How was the data processed?
• Where is the data?
• Who did what, when?
• How can the data be used or reused?
17. Relation with traditional article - time
Publish Data:
• AFTER: expand on your research articles, adding further information for reuse of the data
• AT THE SAME TIME: publish your Data Descriptor(s) alongside research article(s)
• OR BEFORE
18. Share your data, get credited and cited
• Code in GitHub
• Data in OpenfMRI
19. Data Descriptor: narrative and structure
• Experimental metadata or structured component (in-house curated, machine-readable formats)
• Article or narrative component (PDF and HTML)
20. Data Descriptor: narrative
Focus on data reuse.
Detailed descriptions of the methods and technical analyses supporting the quality of the measurements.
Does not contain tests of new scientific hypotheses.
Sections:
• Title
• Abstract
• Background & Summary
• Methods
• Technical Validation
• Data Records
• Usage Notes
• Figures & Tables
• References
• Data Citations
Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group
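A fixed section list like this lends itself to automated completeness checks at submission time. The sketch below is purely illustrative (it is not Scientific Data's actual submission tooling); only the section names come from the slide:

```python
# Required Data Descriptor sections, as listed on the slide above.
REQUIRED_SECTIONS = [
    "Title", "Abstract", "Background & Summary", "Methods",
    "Technical Validation", "Data Records", "Usage Notes",
    "Figures & Tables", "References", "Data Citations",
]

def missing_sections(manuscript):
    """Return the required sections that are absent or empty in a manuscript dict."""
    return [s for s in REQUIRED_SECTIONS if not manuscript.get(s)]

# Hypothetical incomplete draft: only Title and Methods are filled in.
draft = {"Title": "Transcriptome of X", "Methods": "Samples were..."}
print(missing_sections(draft))
```

Running this on the hypothetical draft would list every other required section, which is exactly the kind of feedback an editorial checklist gives authors before peer review.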
21. Data Descriptor: narrative
Focus on data reuse.
Detailed descriptions of the methods and technical analyses supporting the quality of the measurements.
Does not contain tests of new scientific hypotheses.
Sections:
• Title
• Abstract
• Background & Summary
• Methods
• Technical Validation
• Data Records
• Usage Notes
• Figures & Tables
• References
• Data Citations
In traditional publications this information is not provided in a sufficiently detailed manner. However, this information is essential for understanding, reusing, and reproducing datasets.
22. Data Descriptor: structure (CC0)
In-house editorial curator:
• assists users to submit the structured content via simple templates and an internal authoring tool
• performs value-added semantic annotation of the experimental metadata
For advanced users/service providers willing to export ISA-Tab for direct submission, we have released a technical specification.
[Diagram: data file or record in a database; analysis; method; script]
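ISA-Tab, mentioned above as the direct-submission format, is a tab-delimited layout. A study sample table could be sketched roughly as follows; the columns and values are a simplified illustration only, and the real layout is defined by the ISA-Tab specification:

```python
import csv
import io

# Simplified, ISA-Tab-like study sample table: each row traces a sample
# back to its source material through a named protocol. Column names
# follow the general ISA-Tab style but are illustrative here.
header = ["Source Name", "Characteristics[organism]", "Protocol REF", "Sample Name"]
rows = [
    ["patient-1", "Homo sapiens", "sample collection", "sample-1"],
    ["patient-2", "Homo sapiens", "sample collection", "sample-2"],
]

# ISA-Tab files are tab-separated text, so a plain TSV writer suffices.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
print(buf.getvalue())
```

The value of the format is that each column has a defined meaning, so curators and tools can validate and semantically annotate the metadata rather than parsing free text.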
23. Adding value to research articles and data records
[Diagram: research papers linked to Data Descriptors and data records]
We currently recognize over 60 public data repositories
25. Peer review process focused on quality and reuse
Evaluation is not based on the perceived impact or novelty of the findings, or on the size of the data.
• Experimental rigour and technical data quality
  o methodologically sound
  o technical validation experiments and statistical analyses
  o depth, coverage, size, and/or completeness of data sufficient for the types of applications
• Completeness of the description
  o sufficient details to allow others to reproduce the results, reuse the data or integrate it with other data
  o compliance with relevant minimum information or reporting standards
• Integrity of the data files and repository record
  o data files match the descriptions in the Data Descriptor
  o deposited in the most appropriate available databases