The document discusses the vision of a "connected digital research enterprise" where researchers can more easily find and collaborate with others based on shared data and outputs. It describes a scenario where Researcher X discovers commonalities in data with Researcher Y, views Y's datasets and publications, and initiates a collaboration. Their joint work is captured and indexed, and a company utilizes some of the outputs while providing funding back to the researchers. The vision aims to more closely connect scientific work through shared digital resources.
"On community-standards, data curation and scholarly communication" Stanford M... (Susanna-Assunta Sansone)
This document discusses content standards for better describing scientific data. It notes that while some common features exist across domains, descriptions of experimental context are often inconsistent or duplicated. The author advocates for community-developed content standards to structure, enrich and report dataset descriptions and their experimental context to facilitate discovery, sharing, understanding and reuse of data. Standards should include minimum reporting requirements, controlled vocabularies and conceptual models to allow data to flow between systems. This will help enable better science from better described data.
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014 (Susanna-Assunta Sansone)
- The document discusses the need for open and accessible data in research. It notes that over 50% of studies are not published due to selective reporting of results.
- There is a movement for "FAIR data" in life and medical sciences, where data is findable, accessible, interoperable, and reusable. However, not much data currently meets these standards.
- Publishers can play a role in incentivizing data sharing by implementing policies requiring data availability and format standards for publishing research. This includes supporting data citations and data journals.
Feb 26 NISO Training Thursday
Crafting a Scientific Data Management Plan
About the Training
Addressing a data management plan for the first time can be an intimidating exercise. Join NISO for a hands-on workshop that will guide you through the elements of creating a data management plan, including gathering necessary information, identifying needed resources, and navigating potential pitfalls. Participants explore the important components of a data management plan and critique excerpts of sample plans provided by the instructors.
This session is meant to be a guided, step-by-step session that will follow the February 18 NISO Virtual Conference, Scientific Data Management: Caring for Your Institution and its Intellectual Wealth.
About the Instructors
Kiyomi D. Deards, MSLIS, Assistant Professor, University of Nebraska-Lincoln Libraries
Jennifer Thoegersen, Data Curation Librarian, University of Nebraska-Lincoln Libraries
RDAP13 Elizabeth Moss: The impact of data reuse (ASIS&T)
Kathleen Fear, ICPSR, University of Michigan
“The impact of data reuse: a pilot study of 5 measures”
Panel: Data citation and altmetrics
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
This document discusses the need to make research data more discoverable and usable by connecting disparate data through metadata. Currently, the majority of research data is stored in isolated locations like personal hard drives, resulting in lost opportunities for analysis across experiments. The document advocates for culture change where researchers curate and share their data in centralized repositories to enable new insights from aggregating and comparing data in connected ways. This would help address challenges like variability between specimens and complexity in living systems that reductionist approaches cannot capture alone. Ensuring long-term sustainability of data repositories and defining roles for libraries and institutions are also discussed.
The Roots: Linked data and the foundations of successful Agriculture Data (Paul Groth)
Some thoughts on successful data for the agricultural domain. Keynote at Linked Open Data in Agriculture
MACS-G20 Workshop in Berlin, September 27th and 28th, 2017 https://www.ktbl.de/inhalte/themen/ueber-uns/projekte/macs-g20-loda/lod/
February 18 2014 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Capacity Building: Leveraging existing library networks to take on research data
Heidi Imker, Director of the Research Data Service, University of Illinois at Urbana-Champaign
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ... (Amanda Whitmire)
A workshop as part of the International Digital Curation Conference 2016 on DMP development and support. This presentation demonstrates how we can use data management plans as a source of information to better understand researcher data stewardship practices and how to support them. Be sure to see the slide notes to better understand the presentation (most slides are just photos/icons).
Some Ideas on Making Research Data: "It's the Metadata, stupid!" (Anita de Waard)
This document discusses challenges and opportunities around research data management. It notes that while the majority of research data is currently stored locally on hard drives, funding agencies and researchers are increasingly focused on sharing, curating and ensuring long-term access to data. However, there are open questions around how to incentivize researchers to share data, ensure sustainable funding models for repositories, and develop interoperable metadata standards. The document explores potential roles for libraries, institutions, publishers and domain-specific repositories in addressing these issues.
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs (Paul Groth)
A look at how the thinking about Web Data and the sources of semantics can help drive decisions on combining latent and explicit knowledge. Examples from Elsevier and lots of pointers to related work.
Introduction to research data management; Lecture 01 for GRAD521 (Amanda Whitmire)
Lesson 1: Introduction to research data management. From a series of lectures from a 10-week, 2-credit graduate-level course in research data management (GRAD521, offered at Oregon State University).
The course description is: "Careful examination of all aspects of research data management best practices. Designed to prepare students to exceed funder mandates for performance in data planning, documentation, preservation and sharing in an increasingly complex digital research environment. Open to students of all disciplines."
Major course content includes: Overview of research data management, definitions and best practices; Types, formats and stages of research data; Metadata (data documentation); Data storage, backup and security; Legal and ethical considerations of research data; Data sharing and reuse; Archiving and preservation.
See also, "Whitmire, Amanda (2014): GRAD 521 Research Data Management Lectures. figshare. http://dx.doi.org/10.6084/m9.figshare.1003835. Retrieved 23:25, Jan 07, 2015 (GMT)"
Knowledge graph construction for research & medicine (Paul Groth)
1) Elsevier aims to build knowledge graphs to help address challenges in research and medicine like high drug development costs and medical errors.
2) Knowledge graphs link entities like people, concepts, and events to provide answers by going beyond traditional bibliographic descriptions.
3) Elsevier constructs knowledge graphs using techniques like information extraction from text, integrating data sources, and predictive modeling of large patient datasets to identify statistical correlations.
Our regular Introduction to Data Management (DM) workshop (90-minutes). Covers very basic DM topics and concepts. Audience is graduate students from all disciplines. Most of the content is in the NOTES FIELD.
This document provides an introduction to data management. It discusses why data management is important, covering key aspects like developing data management plans, file organization, documentation and metadata, storage and backup, legal and ethical considerations, sharing and reuse, and preservation. Effective data management is critical for research success as it supports reproducibility, sharing, and preventing data loss. The document outlines best practices and resources like the library that can help with developing strong data management strategies.
This document discusses challenges in managing large amounts of scientific data from various sources like experiments, simulations, literature, and archives. It proposes making all scientific data available online to increase scientific information sharing and productivity. Key steps discussed are data ingest, organization, modeling, integration with literature, documentation, curation and long-term preservation. The cloud is presented as a way to provide scalable access and analysis of large datasets.
The literature contains a myriad of recommendations, advice, and strictures about what data providers should do to facilitate data reuse. It can be overwhelming. Based on recent empirical work (analyzing data reuse proxies at scale, understanding data sensemaking and looking at how researchers search for data), I talk about what practices are a good place to start for helping others to reuse your data.
The Kaleidoscope of Impact: same data, different perspectives, constantly cha... (Kudos)
Scholars, scientists, academic institutions, publishers and funders are all interested in impact. We have different roles and goals, and therefore different reasons for needing to understand impact; we are therefore asking different questions about impact, and those questions continue to evolve, much as the concept of impact itself is evolving. To answer our different questions, do we need different data, in separate silos, or are we looking at the same data, from different angles? This session gathered researcher, library, publisher and metrics provider perspectives to consider who has an interest in impact, what data they are interested in, how they use it, and how the situation is evolving as e.g. business models and technical infrastructures shift.
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ..." (Jonathan Tedds)
This document discusses open access to research data and peer review of data publications. It notes that as a first step, data underpinning journal articles should be made concurrently available in accessible databases. The Royal Society report in 2012 advocated for all science literature and data to be online and interoperable. Key issues in linking data to the scientific record are data persistence, quality, attribution, and credit. The document provides examples from astronomy of data reuse leading to new publications and cites a study finding poor reproducibility of ecological data sets over time as data availability declines. It outlines different levels of research data from raw to processed to published and discusses initiatives for open data publication and peer review.
This document outlines best practices for creating research data. [1] It recommends using consistent data organization with standardized formats and descriptive file names. [2] Researchers should perform quality assurance checks and use scripted programs to analyze data while keeping notes. [3] All aspects of data collection and analysis should be thoroughly documented. Following these practices will improve data usability, sharing, and reproducibility.
This document summarizes Catriona MacCallum's presentation on data publishing at PLOS. The key points are:
1) PLOS requires authors to make all underlying data openly available without restriction, with rare exceptions. Authors must provide a Data Availability Statement describing compliance.
2) Over 47,000 PLOS papers have included a data statement. Most data is found within submission files or repositories like Dryad and Figshare. PLOS checks data accessibility and ensures anonymity of clinical datasets.
3) PLOS supports initiatives like CRediT for attributing research contributions and data citation principles for giving credit to data producers. PLOS is also involved in projects beyond traditional publishing like preprints and experimental
Recommendations for infrastructure and incentives for open science, presented to the Research Data Alliance 6th Plenary. Presenter: William Gunn, Director of Scholarly Communications for Mendeley.
RARE and FAIR Science: Reproducibility and Research Objects (Carole Goble)
Keynote at JISC Digifest 2015 on Reproducibility and Research Objects in Scholarly Communication
Includes hidden slides
All material except maybe the IT Crowd screengrab reusable
Developing data services: a tale from two Oregon universities (Amanda Whitmire)
While the generation or collection of large, complex research datasets is becoming easier and less expensive all the time, researchers often lack the knowledge and skills that are necessary to properly manage them. Having these skills is paramount in ensuring data quality, integrity, discoverability, integration, reproducibility, and reuse over time. Librarians have been preserving, managing and disseminating information for thousands of years. As scholarly research is increasingly carried out digitally, and products of research have expanded from primarily text-based manuscripts to include datasets, metadata, maps, software code etc., it is a natural expansion of scope for libraries to be involved in the stewardship of these materials as well. This kind of evolution requires that libraries bring in faculty with new skills and collaborate more intimately with researchers during the research data lifecycle, and this is exactly what is happening in academic libraries across the country. In this webinar, two researchers-turned-data-specialists, both based in academic libraries, will share their experiences and perspectives on the development of research data services at their respective institutions. Each will share their perspective on the important role that libraries can play in helping researchers manage, preserve, and share their data.
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear... (dkNET)
dkNET provides a single portal for discovering over 3,500 biomedical research resources and datasets. It aims to make these resources findable, accessible, interoperable, and reusable in accordance with the FAIR principles. The portal contains three main sections for browsing community resources, additional resources, and literature. It utilizes faceted searching and provides analytics and notifications to help users track changes to resources over time.
This document discusses challenges and proposed solutions for improving data sharing, integration, and reuse in research. It outlines the current research data lifecycle and issues like a lack of linking between data and publications. A proposal is made for researchers to publish data in repositories under embargo and automatically notify funders, then link the data to publications. The document also describes efforts by organizations like FORCE11, the National Data Service, and RDA to improve data search, linking, and publishing through collaboration. Key areas discussed include electronic lab notebooks, data repositories, search, linking data to publications, and citation.
The document discusses the THOR project, which aims to place Persistent Identifiers (PIDs) at the fingertips of researchers and integrate them into existing research services and outputs. The goals are to uniquely attribute work to researchers and make PID use the default across the research lifecycle. The project focuses on biological sciences, earth sciences, physical sciences, social sciences, and humanities. It provides examples of how PIDs can improve credit for researchers, discoverability and reuse of data and publications, demonstrate value for data centers, improve evidence for publishers, and measure impact for funders.
Alain Frey: Research Data for universities and information producers (Incisive_Events)
Research data is growing exponentially but is disparate and challenging to understand fully. Universities face challenges in managing research data to meet funding and standards requirements. Thomson Reuters launched the Data Citation Index to make research data discoverable, accessible, and citable by bringing important data from diverse repositories into one searchable index. This addresses the need for a single access point for quality research data across disciplines and locations.
ODIN: Connecting research and researchers (Sergio Ruiz)
The document discusses the ODIN project which aims to explore opportunities for linking ORCID and DataCite identifiers to support open science. It notes that ORCID and DataCite are emerging as participative initiatives that could play a significant role in underpinning a sustainable persistent identifier e-infrastructure. The project conducted a gap analysis and developed a roadmap. It carried out proofs-of-concept in the humanities/social sciences and high energy physics domains. The second year of ODIN focused on promoting adoption of ORCID and DataCite, encouraging interoperability with other systems, establishing workflows for specific domains, and exploring common approaches across domains.
Why would a publisher care about open data? – Anita de Waard
A publisher would care about open data for several reasons:
1) Open data increases the value of all parts of the web by allowing programs, not just people, to utilize the data through interconnecting and joining it.
2) Publishers are evolving from linear supply chains focused on content delivery to users, to becoming marketplaces that optimize the number of interactions between users through networked open science.
3) The future of publishing involves networked open science where data is openly accessible, annotated with metadata, and linked together in research objects, increasing findability, accessibility, interoperability, and reusability of research outputs.
Managing, Sharing and Curating Your Research Data in a Digital Environment – philipdurbin
This document discusses research data management and curation. It describes how data sharing has increased as open science mandates have promoted data availability. Research data is now often shared alongside research articles through bi-directional linking. Self-curation repositories are being developed to help researchers publish and share their data. The benefits of open access include increased visibility, new discoveries through wider collaboration, and compliance with funder mandates. Key requirements for open data include availability, access, redistribution and reuse. Dataverse is presented as a solution for research data management that facilitates data sharing, preservation, citation, exploration and analysis. It issues persistent identifiers and supports various data formats and protocols. Challenges of data management include meaningful aggregation and privacy concerns.
FAIR for the future: embracing all things data – ARDC
FAIR for the future: embracing all things data - Natasha Simons, Keith Russell and Liz Stokes, presented at Taylor & Francis Scholarly Summits in Sydney 11 Feb 2019 and Melbourne 14 Feb 2019.
This paper was presented at the European Survey Research Association 2013 conference, in the session Research Data Management for Re-use: Bringing Researchers and Archivists Closer.
SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific ... – Susanna-Assunta Sansone
Part of the SciDataCon14 workshop on "Data Papers and their applications", run by myself and Brian Hole, to help attendees understand current data-publishing journals and trends, and the editorial processes behind NPG's Scientific Data and Ubiquity's Open Health Data.
Acting as Advocate? Seven steps for libraries in the data decade – LizLyon
UKOLN advocates that libraries take seven steps to support data management and open science in the data decade:
1) Provide briefings on cloud data services in partnership with IT services.
2) Build usable data management tools in partnership with researchers.
3) Develop data sustainability strategies and articulate the costs and benefits.
4) Publish case studies on open science to show benefits of universal data sharing.
5) Present at university ethics committees to highlight open data issues.
6) Raise awareness of citizen science opportunities and guidelines for good practice.
7) Promote data citation and attribution to embed in publication practice.
This document summarizes Susanna-Assunta Sansone's presentation on open access and open data at Nature Publishing Group. Some key points discussed include:
- The benefits of open data including reducing errors/fraud and increasing return on investment in research. However, barriers also exist such as lack of incentives and standards.
- Recent initiatives at NPG to improve data/reproducibility such as requiring data behind figures and expanding methods sections.
- The role of data journals in increasing credit/visibility for shared data and promoting standards/best practices.
- Market research found researchers want increased visibility, usability, and credit for sharing their data.
The document summarizes the Jisc Managing Research Data Programme which aims to support universities in improving research data management. It discusses why managing research data is important, highlighting funder policies and the benefits of open data. It provides an overview of Jisc's activities including training projects, guidance resources, and funding for institutional infrastructure services and repositories. The presentation emphasizes the importance of institutional policies, support services, skills development and cultural change to effectively manage research data in line with funder expectations.
This document summarizes a presentation on open science and open data. It discusses the importance of open research data for reproducibility and innovation. It outlines key policy developments promoting open data, including funder data policies and journal data policies. It also describes CODATA's activities related to data policies, frameworks for developing open data strategies, and components of the international open science ecosystem.
This document summarizes a presentation given by Susanna Sansone at the GSC 23rd meeting education day in Bangkok, Thailand on August 7, 2023. The presentation discussed standards across life sciences, including definitions of different types of standards and over 1,600 identified standards. It covered standard organizations and grassroots groups, as well as the FAIRsharing database which catalogs over 2,885 standards and databases and aims to promote their use and value across research.
The FAIRsharing journey in RDA document discusses:
1) FAIRsharing's growth and involvement with RDA since 2011, including its Working Group established in 2015 to curate standards, databases, and policies to promote FAIR data.
2) FAIRsharing's current activities and impact, such as its registry of over 4,000 records from many disciplines and usage in various tools and services.
3) Opportunities for further engagement with RDA, such as leveraging their expertise for contributions to the FAIR Cookbook, an open resource providing technical recipes for applying FAIR principles to life science data.
Overview of metadata standards, and how FAIRsharing and the FAIR Cookbook help in selecting and using them. Presented in the "What is metadata? Common standards and properties" session of the EHP Workshop, November 9, 2022: https://ephconference.eu/pre-conference-programme-441
Pharmas and academia are joining forces to make data FAIR (Findable, Accessible, Interoperable, and Reusable) through the development of the FAIR Cookbook. The FAIR Cookbook provides a growing collection of over 70 recipes that give step-by-step guidance on improving the FAIRness of different data types through the use of tools, technologies, and best practices. It aims to provide practical examples and guidelines to support researchers, data managers, and others in managing data according to FAIR principles. The FAIR Cookbook is an open, community-developed resource overseen by an editorial board, with contributions from nearly 100 life sciences professionals.
FAIR, community standards and data FAIRification: components and recipes – Susanna-Assunta Sansone
Overview of FAIR, FAIRsharing and the FAIR Cookbook at the ATI event on Knowledge Graphs: https://github.com/turing-knowledge-graphs/meet-ups/blob/main/symposium-2022.md
Presentation to the EOSC workshop on policies (https://eoscfuture.eu/eventsfuture/monitoring-eosc-readiness-fair-data-policies) on what FAIRsharing does for policies, including providing registration, discovery, flexible and clearer descriptions, relationships, machine readability and comparability.
The document summarizes how FAIRsharing assists others with promoting FAIR data principles without directly assessing FAIRness compliance. It does this by (1) providing a lookup service for standards and repositories via its API, (2) serving as a registry for FAIRness tests and indicators to make them discoverable, and (3) enabling communities to create profiles declaring which standards and repositories they use. The document also outlines FAIRsharing's operations, advisory boards, and future plans to further support assessment and tracking of FAIRness improvements over time.
ELIXIR is a European infrastructure that brings together life science resources from across Europe. It offers databases, tools, computing capabilities, and training opportunities. ELIXIR nodes provide these services and connect national data infrastructures. ELIXIR communities connect infrastructure experts to drive service developments. ELIXIR is funded through a mixed model including public sources. It works to sustain important biological data resources and make data FAIR through recommended standards and interoperability resources. ELIXIR also aims to develop a sustainable tools ecosystem and provides training through its portal.
Presentation to the EC Workshop on Maximizing investments in health research: FAIR data for a coordinated COVID-19 response. Workshop III, November 8, 2021.
Presentation to the EC Workshop on Maximizing investments in health research: FAIR data for a coordinated COVID-19 response. Workshop I, October 11, 2021.
The FAIR Cookbook poster, as presented at the ELIXIR-UK Node and the UK Conference of Bioinformatics and Computational Biology 2021: https://www.earlham.ac.uk/uk-conference-bioinformatics-and-computational-biology-21
The FAIR Cookbook poster, as presented at the UK Conference of Bioinformatics and Computational Biology 2021: https://www.earlham.ac.uk/uk-conference-bioinformatics-and-computational-biology-21
On community-standards, data curation and scholarly communication - BITS, Italy, 2016
1. On community-standards, data curation and
scholarly communication
Susanna-Assunta Sansone, PhD
@SusannaASansone
13th Annual Meeting of the Bioinformatics Italian Society, University of Salerno, Italy, 15-17 June 2016.
Data Consultant,
Founding Academic Editor
Associate Director,
Principal Investigator
Member,
Executive Committee
2. • Better data, better science – the FAIR meme
• Publication of digital research outputs – why it matters
• Interoperability standards – as enablers
Outline
3–8. Research as a Connected Digital Enterprise aka The Commons
The vision - P. Bourne (NIH Associate Director for Data Science)
• Researcher X is automatically made aware of researcher Y through commonalities in their respective data located in the Commons.
• Researcher X locates researcher Y’s datasets, with their associated usage statistics, navigates to the associated publications and starts to explore ideas for engaging with researcher Y and their research network.
• A fruitful collaboration ensues and they generate publications, datasets and software; their output is captured in PubMed and the Commons, and is indexed by the data and software catalogs.
• Company Z identifies relevant data and software that, based on the metrics from the catalogs, have utilization above a threshold indicating heavy use by the community. An open-source version remains, but the company adds services on top of the software, and revenue flows back to the labs of researchers X and Y, where it is used to develop new innovative software for open distribution.
• Researchers X and Y provide hands-on advice on the use of the new version, and their course is offered as a MOOC (Massive Open Online Course).
9. Research as a Connected Digital Enterprise aka The Commons
The vision - P. Bourne (NIH Associate Director for Data Science)
https://datascience.nih.gov/commons
10. A Data Discovery Index prototype that:
• Helps users find and access shared data
• Interoperates in the NIH Commons
16. “Over 50% of completed studies in biomedicine do not
appear in the published literature….Often because
results do not conform to author's hypotheses”
“Only half the health-related studies funded by the
European Union between 1998 and 2006 - an
expenditure of €6 billion - led to identifiable reports”
Selective reporting is still an unfortunate practice
• Small, independent efforts yield a rich variety of specialty datasets
o Most of these data (such as null findings) are unpublished
o These dark data hold a potential wealth of knowledge
17. • Researchers still have no, or insufficient, motivation
• Hypothesis-confirming results get prioritized
• Agreements, disagreements and timing
• Loose requirements and monitoring by journals and funders
But why?
18. • Most researchers are
sharing data, and using the
data of others
• Direct contact* between
researchers (on request) is
a common way of sharing
data
• Repositories are second
most common method of
sharing
Kratz JE, Strasser C (2015) Researcher Perspectives on Publication and Peer Review of Data. PLoS ONE 10(2): e0117619.
Current approaches to sharing
* Data associated with published works disappears at a rate of ~17% per year (Vines et al. 2014, doi:10.1016/j.cub.2013.11.014)
Datasets not referenced in a manuscript are essentially invisible and data producers do not get appropriate credit for their work
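That ~17%-per-year loss compounds quickly. A back-of-the-envelope sketch (assuming, as a simplification of Vines et al.'s odds-based estimate, a constant annual rate):

```python
# Fraction of datasets still retrievable after n years, assuming a
# constant ~17% annual disappearance rate (a simplification of the
# odds-based model in Vines et al. 2014).
def surviving_fraction(years, annual_loss=0.17):
    return (1 - annual_loss) ** years

for n in (1, 5, 10, 20):
    print(f"after {n:2d} years: {surviving_fraction(n):.1%}")
```

Under this assumption, barely one in six datasets would remain retrievable after a decade.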
19. • Outputs are multi-dimensional, and not always well cited or stored
o Software, code and workflows are hard(er) to get hold of
• Poorly described for third-party reuse
o Different levels of detail and annotation
• Curation activities are perceived as time-consuming
o Collection and harmonization of detailed methods and experimental steps is done, or rushed, at the publication stage
Shared data is not always understandable or reusable
20. A B C D E
1 Group1 Group2
2 Day 0
3 Sodium 139 142
4 Potassium 3.3 4.8
5 Chloride 100 108
6 BUN 18 18
7 Creatine 1.2 1.2
8 Uric acid 5.5* 6.2*
9 Day 7
10 Sodium 140 146
11 Potassium 3.4 5.1
12 Chloride 97 108
S1Sh.cuo
Sharing starts with good metadata…
Credit to: Iain Hrynaszkiewicz
21. ….…but this is not!
The same spreadsheet (S1Sh.cuo) again, annotated with its problems:
• Meaningless column titles
• Special characters can cause text-mining errors
• No units
• Unhelpful document name
• Undefined abbreviation
• Formatting used for information that should be in metadata
Credit to: Iain Hrynaszkiewicz
22. A B C D E F
1 Parameter Day Control Treated Units P
2 Sodium 0 139 142 mEq/l 0.82
3 Sodium 7 140 146 mEq/l 0.70
4 Sodium 14 140 158 mEq/l 0.03
5 Sodium 21 143 160 mEq/l 0.02
6 Potassium 0 3.3 4.8 mEq/l 0.06
7 Potassium 7 3.4 5.1 mEq/l 0.07
8 Potassium 14 3.7 4.7 mEq/l 0.10
9 Potassium 21 3.1 3.6 mEq/l 0.52
10 Chloride 0 100 108 mEq/l 0.56
11 Chloride 7 97 108 mEq/l 0.68
12 Chloride 14 101 106 mEq/l 0.79
Table_S1_Shanghai_blood.xls
….this is much clearer!
Credit to: Iain Hrynaszkiewicz
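The clearer layout is essentially "tidy" long-format data: one measurement per row, with units and p-values as explicit columns. A minimal sketch of writing such a table programmatically, using a few of the values shown (Python's standard csv module; the filename follows the slide's example):

```python
import csv

# One row per measurement; units and p-values are explicit columns,
# so nothing has to be inferred from layout or formatting.
header = ["Parameter", "Day", "Control", "Treated", "Units", "P"]
rows = [
    ("Sodium",    0, 139, 142, "mEq/l", 0.82),
    ("Sodium",    7, 140, 146, "mEq/l", 0.70),
    ("Potassium", 0, 3.3, 4.8, "mEq/l", 0.06),
    ("Potassium", 7, 3.4, 5.1, "mEq/l", 0.07),
]

# A descriptive file name, unlike "S1Sh.cuo"
with open("Table_S1_Shanghai_blood.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)
```

Because every value carries its parameter, day and units in the same row, the file can be filtered, joined or text-mined without human interpretation.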
24. The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta
Sansone www.ebi.ac.uk/net-project
…breadth and depth of the
context is pivotal…
…including capturing
experimental design and
statistical analysis
25. Among these, publishers occupy a leverage point, because of the importance of formal publications in the academic incentive structure
Stakeholders mobilizations, old and new driving forces
26. • Incentive, credit for sharing
o Big and small data
o Unpublished data
o Long tail of data
o Curated aggregation
• Peer review of data
• Value of data vs. analysis
• Discoverability and reusability
o Complementing community
databases
Growing number of data papers and data journals
27. nature.com/scientificdata
Honorary Academic Editor
Susanna-Assunta Sansone, PhD
Managing Editor
Andrew L Hufton, PhD
Editorial Curator
Varsha Khodiyar
Publisher
Iain Hrynaszkiewicz
A new open-access, online-only publication for
descriptions of scientifically valuable datasets
Supported by
28. A new article type
A new category of publication that provides detailed
descriptors of scientifically valuable datasets
Mandates open data, without unnecessary
restrictions, as a condition of submission
30. Scientific hypotheses:
Synthesis
Analysis
Conclusions
Methods and technical analyses supporting the quality
of the measurements:
What did I do to generate the data?
How was the data processed?
Where is the data?
Who did what, and when?
Relation with traditional articles – content
32. Experimental metadata or
structured component
(in-house curated, machine-
readable formats)
Article or
narrative component
(PDF and HTML)
Each Data Descriptor has two components
33. The Data Curation Editor is responsible for creating and
curating the machine-readable structured component
• Enables browsing and searching the articles
• Facilitates links to related journal articles and repository
records
Curation and discoverability
34. Created with the input of the authors, includes value-added semantic annotation of the experimental metadata
(Diagram: analysis – method/script – data file or record in a database)
Data Descriptors: structured component
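The structured component can be pictured as a small machine-readable graph linking each data file or database record to the method and analysis that produced it. A hypothetical sketch in JSON (all field names and identifiers here are illustrative, not Scientific Data's actual schema):

```python
import json

# Hypothetical structured component linking a data record to the
# method and analysis that produced it; ontology terms make the
# annotation unambiguous for machines. All values are placeholders.
descriptor = {
    "data_record": {
        "repository": "GEO",
        "accession": "GSE00000",          # placeholder accession
    },
    "method": {
        "name": "RNA extraction",
        "ontology_term": "OBI:0666666",   # placeholder ontology ID
    },
    "analysis": {
        "name": "differential expression",
        "script": "scripts/deseq_run.R",  # illustrative path
    },
}

print(json.dumps(descriptor, indent=2))
```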
39. “The Data Descriptor made it easier to use
the data, for me it was critical that everything
was there…all the technical details like voxel
size.”
Professor Daniele Marinazzo
Why data papers? Data reuse is easier!
Credit to: Varsha Khodiyar
41. • Better data, better science – the FAIR meme
• Publication of digital research outputs – why it matters
• Interoperability standards – as enablers
Outline
42. (Diagram: de jure and de facto standards, developed by standard organizations and grass-roots groups, e.g. the Nanotechnology Working Group)
• To structure, enrich and report the description of the datasets and the experimental context under which they were produced
• To facilitate discovery, sharing, understanding and reuse of datasets
Community-developed content standards
43. Content standards as enablers for better-described data
• Minimum-information reporting requirements, or checklists, to report the same core, essential information
• Controlled vocabularies, taxonomies, thesauri, ontologies etc., to use the same word to refer to the same ‘thing’
• Conceptual models and schemas, from which an exchange format is derived, to allow data to flow from one system to another
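These three kinds of standard work together: a checklist says which fields must be reported, a terminology constrains their values, and a model/format carries the record between systems. A minimal validator sketch (the checklist fields and vocabulary here are invented for illustration; real standards such as MIxS or OBI are far richer):

```python
# Hypothetical minimum-information checklist and controlled vocabulary.
REQUIRED_FIELDS = {"organism", "assay_type", "sample_count"}
ASSAY_VOCAB = {"transcription profiling", "metagenomic sequencing"}

def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record complies."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - record.keys()]
    if record.get("assay_type") not in ASSAY_VOCAB:
        problems.append(f"assay_type not in controlled vocabulary: {record.get('assay_type')!r}")
    return problems

good = {"organism": "Homo sapiens",
        "assay_type": "transcription profiling",
        "sample_count": 12}
bad = {"organism": "Homo sapiens", "assay_type": "RNA stuff"}

assert validate(good) == []
assert len(validate(bad)) == 2  # missing sample_count + term not in vocabulary
```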
45. Is there a database, implementing standards, where I can deposit my metagenomics dataset?
My funder’s data-sharing policy recommends the use of established standards, but which ones are widely endorsed and applicable to my toxicological and clinical data?
Am I using the most up-to-date version of this terminology to annotate cell-based assays?
I understand this format has been deprecated; what has it been replaced by, and who is leading the work?
Are there databases implementing this exchange format, whose development we have funded?
What are the mature standards and standards-compliant databases we should recommend to our authors?
But how do we help users to make informed decisions?
46. A web-based, curated and searchable registry ensuring that
standards and databases are registered, informative and
discoverable; monitoring development and evolution of standards,
their use in databases and adoption of both in data policies
An informative and educational resource
1,400 records and growing
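The questions on the previous slide are, in effect, filtered lookups over the registry's records. A toy in-memory sketch of such a lookup (the records and fields are invented for illustration; the real registry is a curated, web-based resource):

```python
# Toy registry records; the fields are illustrative, not the real schema.
REGISTRY = [
    {"name": "MIxS",    "type": "reporting guideline", "domain": "metagenomics",    "status": "ready"},
    {"name": "OBI",     "type": "terminology",         "domain": "assays",          "status": "ready"},
    {"name": "MAGE-ML", "type": "exchange format",     "domain": "transcriptomics", "status": "deprecated"},
]

def find(domain=None, status=None):
    """Filter registry records by domain and/or development status."""
    hits = REGISTRY
    if domain:
        hits = [r for r in hits if r["domain"] == domain]
    if status:
        hits = [r for r in hits if r["status"] == status]
    return [r["name"] for r in hits]

# e.g. "which mature standards apply to metagenomics?"
print(find(domain="metagenomics", status="ready"))
```

The same pattern answers the deprecation question: filtering on `status="deprecated"` surfaces formats whose records point to their replacements.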
48. Tracking evolution, e.g. deprecations and substitutions
49. Model/format formalizing reporting guideline -->
<-- Reporting guideline used by model/format
Cross-linking standards to standards and databases
57. Philippe
Rocca-Serra, PhD
Senior Research Lecturer
Alejandra
Gonzalez-Beltran, PhD
Research Lecturer
Milo
Thurston, DPhil
Research Software Engineer
Massimiliano
Izzo, PhD
Research Software Engineer
Peter
McQuilton, PhD
Knowledge Engineer
Allyson
Lister, PhD
Knowledge Engineer
Eamonn
Maguire, DPhil
Software Engineer contractor
David
Johnson, PhD
Research Software Engineer
Susanna-Assunta Sansone, PhD
Principal Investigator, Associate Director
We also acknowledge our network of collaborators
in the following active projects: H2020 PhenoMeNal,
H2020 ELIXIR-EXCELERATE, H2020 MultiMot,
NIH bioCADDIE, NIH CEDAR and IMI eTRIKS