This document summarizes roles for libraries in providing research data management services. It describes data services at the University of Oregon Library including consultations, education workshops, and developing data management web pages. It discusses support for documentation provided by the University of Idaho Library through instruction sessions, research consultations, and emphasizing good documentation practices. It outlines data management trainings provided by Oregon Health & Science University Library including workshops with researchers, individual consultations, and developing new data services.
The MIAPA ontology: An annotation ontology for validating minimum metadata re... - Hilmar Lapp
This document describes the MIAPA (Minimum Information About a Phylogenetic Analysis) ontology, which was developed to standardize the annotation and reporting of metadata for phylogenetic analyses. The MIAPA ontology reuses terms from existing ontologies and is designed according to OBO Foundry best practices. It provides a standard way to annotate key information about phylogenetic tree topologies, operational taxonomic units, branch lengths, character matrices, alignment and tree inference methods. The goal is to facilitate increased access to and reuse of phylogenetic data through consistent annotation of published trees according to the MIAPA standard.
Managing data throughout the research lifecycle - Marieke Guy
This document summarizes a presentation about managing data throughout the research lifecycle. It discusses the stages of the research lifecycle, including planning, data creation, documentation, storage, sharing, and preservation. It provides examples of research lifecycle models and addresses key questions to consider at each stage, such as what formats to use, how to document data, where to store it, and how to share and preserve it. The presentation emphasizes making informed decisions about data management and talking to colleagues for support and advice.
This document summarizes a webinar about data management plans and tools. It discusses what a data management plan is, why researchers should create one, and what funders like NSF require in a DMP. It also introduces several tools like the DMPTool that can help researchers create DMPs and describes the goals and development of the DMPTool project. Finally, it provides information on how researchers can participate and give feedback to improve DMP tools.
The document provides logistics for a webinar on data curation profiles and the DMPTool. It includes instructions for calling into the audio, asking questions in the chat, and finding recordings and slides. The webinar will discuss the history of data curation profiles, comparing them to data management plans, and a case study of using data curation profiles. Data curation profiles involve interviewing researchers about their data practices and needs in order to understand how to support them, while data management plans focus on requirements for funding. Both tools can help librarians engage with researchers, though data curation profiles provide a more in-depth understanding of researchers' full data lifecycles.
1) Producing life sciences linked open data presents challenges as biologists want to publish and control their data but providing query and analysis services is expensive. They need technical assistance and funding support.
2) Consuming linked data in life sciences means connecting data to existing standards like pathways and proteins. Data analysis, mining, crawling and reasoning services are needed but expensive for individual database owners.
3) Scalability issues arise when reasoning over complex ontologies like BioPAX Level 3 with large datasets, as state-of-the-art reasoners cannot handle inconsistencies or provide query endpoints for such data.
Florida State University (FSU) entered into a formal digital preservation strategy agreement with the Florida Digital Archive (FDA) in 2009. Before joining FDA, however, FSU requested permission from FDA to develop a plan to preserve a faculty member's research data. FDA agreed to let FSU develop a demo preservation of images from the biological silica collection of FSU Biological Scientist Dr. A.K.S.K. Prasad, which was later presented at several national and international conferences.
This talk will include an oral history and a presentation detailing how FSU came to use the locally developed preservation strategy of DAITSS (Dark Archive in the Sunshine State), starting with the demo preservation of faculty research data, which was later used to persuade senior management to join FDA.
Investigating plant systems using data integration and network analysis - Catherine Canevet
The document discusses challenges in integrating plant data from multiple sources and proposes solutions. It notes that plant data is sparse, distributed across many databases in various formats, and focused primarily on the model plant Arabidopsis. Data integration is necessary to address key biological questions by consolidating information from pathway databases, gene annotations, protein interactions, and more. The document outlines approaches to data integration including controlled vocabularies, ontologies, data standards, and integration applications specifically designed to combine data sources like Ondex. Effective integration is important to fully leverage available plant data.
The NIH as a Digital Enterprise: Implications for PAG - Philip Bourne
The document discusses the NIH's vision of becoming a digital enterprise to enhance biomedical research. It outlines how research is becoming more digital and data-driven. The NIH aims to foster open sharing of data and tools through its Commons platform to facilitate collaboration and reproducibility. It also stresses the importance of training the next generation of data scientists to enable the digital enterprise. The end goal is to accelerate discovery and improve health outcomes through more integrated and data-driven research.
Spring 2014 Data Management Lab: Session 1 Slides (more details at http://ulib.iupui.edu/digitalscholarship/dataservices/datamgmtlab)
What you will learn:
1. Build awareness of research data management issues associated with digital data.
2. Introduce methods to address common data management issues and facilitate data integrity.
3. Introduce institutional resources supporting effective data management methods.
4. Build proficiency in applying these methods.
5. Build strategic skills that enable attendees to solve new data management problems.
Data management plans existed long before the NSF started requiring them. DMPs have inherent value despite being relatively unknown to researchers until now. Proper, thorough data management plans are potentially a major time saver and a huge asset for the project. In this webinar, we will cover how to go beyond funder requirements and develop more thorough DMPs. The Gulf of Mexico Research Initiative requires an extensive data management plan for projects it funds; we will hear about their efforts and how they are planning to use the DMPTool going forward.
Functional and Architectural Requirements for Metadata: Supporting Discovery... - Jian Qin
The tremendous growth in digital data has led to an increase in metadata initiatives for different types of scientific data, as evident in Ball’s survey (2009). Although individual communities have specific needs, there are shared goals that need to be recognized if systems are to effectively support data sharing within and across all domains. This paper considers this need, and explores systems requirements that are essential for metadata supporting the discovery and management of scientific data. The paper begins with an introduction and a review of selected research specific to metadata modeling in the sciences. Next, the paper’s goals are stated, followed by the presentation of valuable systems requirements. The results include a base-model with three chief principles: principle of least effort, infrastructure service, and portability. The principles are intended to support “data user” tasks. Results also include a set of defined user tasks and functions, and applications scenarios.
DataONE Education Module 03: Data Management Planning - DataONE
Lesson 3 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
The document summarizes data science education resources developed by researchers at Oregon Health & Science University. It describes the challenges of managing vast amounts of biomedical data and the goal of providing training to address this issue. The team developed skills courses and open educational resources (OERs) on topics across the data science life cycle. Courses included introductory, advanced, and targeted workshops. OER modules covered a range of data science topics and mapped to competencies for health sciences librarians. The team seeks to disseminate the resources broadly while addressing challenges around customization for different users and protection of intellectual property.
Couture Curricula - BD2K Data Science Tailored to Your Needs - Nicole Vasilevsky
Poster presentation at Force2016 (https://www.force11.org/meetings/force2016) describing Big Data to Knowledge (BD2K) efforts at Oregon Health & Science University.
On the Reproducibility of Science: Unique Identification of Research Resourc... - Nicole Vasilevsky
Poster presentation at the Data Information Literacy Symposium at Purdue University in Indiana, Sept. 2013. This study is published here: https://peerj.com/articles/148/
Enhancing the Human Phenotype Ontology for Use by the Layperson - Nicole Vasilevsky
Presentation at the International Conference on Biological Ontology & BioCreative, August 1-4, 2016, Corvallis, Oregon, USA.
Abstract
In rare or undiagnosed diseases, physicians rely upon genotype and phenotype information in order to compare abnormalities to other known cases and to inform diagnoses. Patients are often the best sources of information about their symptoms and phenotypes. The Human Phenotype Ontology (HPO) contains over 12,000 terms describing abnormal human phenotypes. However, the labels and synonyms in the HPO primarily use medical terminology, which can be difficult for patients and their families to understand. In order to make the HPO more accessible to non-medical experts, we systematically added new synonyms using non-expert terminology (i.e., layperson terms) to the existing HPO classes or tagged existing synonyms as layperson. As a result, the HPO contains over 6,000 classes with layperson synonyms.
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO... - Robert H. McDonald
This is the slide deck for my ACRL 2015 TechConnect Presentation with Nicole Vasilevsky (OHSU). For more on the program, see: http://bit.ly/1xcQbCr
Empowering patients by increasing accessibility to clinical terminology - Nicole Vasilevsky
Flash talk at Medical Library Association Pacific Northwest Chapter meeting in Portland, OR on October 18, 2016.
http://pnc-mla.cloverpad.org/annual2016
Authors: Erin Foster, Mark Engelstad, Chris Mungall, Peter Robinson, Sebastian Kohler, Melissa Haendel and Nicole Vasilevsky
The Role of Libraries in Data Management and Curation - Nicole Vasilevsky
The Role of Libraries in Data Management and Curation, presented at the American Library Association conference in Las Vegas, NV, 07/29/14.
Abstract:
As increasing amounts of data are being generated, applying best practices in handling data is important, and librarians are well poised to assist users. During this session, we will discuss the role of libraries in assisting with data management, application of metadata, ontologies, data standards, and the publication of data in repositories and on the Semantic Web. This talk will describe best data practices and engage the attendees in interactive activities to demonstrate these principles.
The Human Phenotype Ontology (HPO) was developed to describe phenotypic abnormalities, a practice known as “deep phenotyping”, whereby symptoms and characteristic phenotypic findings (a phenotypic profile) are captured. The HPO has been used with great success to assist computational phenotype comparison against known diseases, other patients, and model organisms to support diagnosis of rare disease patients. Clinicians and geneticists create phenotypic profiles based on clinical evaluation, but this is time-consuming and can miss important phenotypic features. Patients are sometimes the best source of information about symptoms that might otherwise be missed in a clinical encounter. However, the HPO primarily uses medical terminology, which can be difficult for patients and their families to understand. To make the HPO accessible to patients, we systematically added synonyms that use non-expert terminology (i.e., layperson terms). Using semantic similarity, patient-recorded phenotypic profiles can be evaluated against those created clinically for undiagnosed patients to determine the improvement gained from the patient-driven phenotyping, as well as how much the patient phenotyping narrows the diagnosis. This patient-centric HPO can be utilized by all: in patient-centered rare disease websites, in patient community platforms and registries, or even to post one’s hard-to-diagnose phenotypic profile on the Web.
Why the world needs phenopacketeers, and how to be one - mhaendel
Keynote presented at the Ninth International Biocuration Conference, Geneva, Switzerland, April 10-14, 2016.
The health of an individual organism results from complex interplay between its genes and environment. Although great strides have been made in standardizing the representation of genetic information for exchange, there are no comparable standards to represent phenotypes (e.g. patient disease features, variation across biodiversity) or environmental factors that may influence such phenotypic outcomes. Phenotypic features of individual organisms are currently described in diverse places and in diverse formats: publications, databases, health records, registries, clinical trials, museum collections, and even social media. In these contexts, biocuration has been pivotal to obtaining a computable representation, but is still deeply challenged by the lack of standardization, accessibility, persistence, and computability among these contexts. How can we help all phenotype data creators contribute to this biocuration effort when the data is so distributed across so many communities, sources, and scales? How can we track contributions and provide proper attribution? How can we leverage phenotypic data from the model organism or biodiversity communities to help diagnose disease or determine evolutionary relatedness? Biocurators unite in a new community effort to address these challenges.
Research Data Management in the Humanities and Social Sciences - Celia Emmelhainz
This document provides an introduction to research data management for humanities and social sciences librarians. It discusses why data management is an important part of a librarian's role in supporting faculty research, and some key concepts in data management including data formats, storage, security, preservation, and sharing. The document emphasizes that while librarians do not need to be data experts, having a basic understanding of data management concepts can help librarians better serve faculty research needs and expand their role on campus.
Research Data Management and Sharing for the Social Sciences and Humanities - Rebekah Cummings
This document summarizes a presentation on research data management for social and behavioral sciences and humanities. The presentation covered topics such as what data management is, why it is important to manage and share data, how to create data management plans, organize data files through naming conventions and folder structures, describe data through metadata and codebooks, issues around data ownership, and data storage, archiving and sharing options. The presentation was aimed at providing guidance to researchers at the University of Utah on best practices for managing and sharing their research data.
Presenters: Libbie Stephenson, Jared Lyle
This session discusses the value of and methods for curating data, especially in light of recent government and academic initiatives. Special attention will be paid to data management plans.
Librarians can provide valuable data management services to researchers on campus. An effective strategy includes surveying researchers to identify needs, communicating service offerings through workshops and consultations, and providing in-depth guidance on data management plans and long-term data preservation. Developing workshops involves setting learning objectives, evaluating content, and securing resources like space and food. Consultations allow librarians to help with specific topics like choosing file formats or finding metadata standards. Creating a data management plan requires detailing a data inventory, metadata description, long-term preservation and access methods. Trusted disciplinary repositories and use of stable identifiers help ensure long-term findability and access.
Investigating plant systems using data integration and network analysisCatherine Canevet
The document discusses challenges in integrating plant data from multiple sources and proposes solutions. It notes that plant data is sparse, distributed across many databases in various formats, and focused primarily on the model plant Arabidopsis. Data integration is necessary to address key biological questions by consolidating information from pathway databases, gene annotations, protein interactions, and more. The document outlines approaches to data integration including controlled vocabularies, ontologies, data standards, and integration applications specifically designed to combine data sources like Ondex. Effective integration is important to fully leverage available plant data.
The NIH as a Digital Enterprise: Implications for PAGPhilip Bourne
The document discusses the NIH's vision of becoming a digital enterprise to enhance biomedical research. It outlines how research is becoming more digital and data-driven. The NIH aims to foster open sharing of data and tools through its Commons platform to facilitate collaboration and reproducibility. It also stresses the importance of training the next generation of data scientists to enable the digital enterprise. The end goal is to accelerate discovery and improve health outcomes through more integrated and data-driven research.
Spring 2014 Data Management Lab: Session 1 Slides (more details at http://ulib.iupui.edu/digitalscholarship/dataservices/datamgmtlab)
What you will learn:
1. Build awareness of research data management issues associated with digital data.
2. Introduce methods to address common data management issues and facilitate data integrity.
3. Introduce institutional resources supporting effective data management methods.
4. Build proficiency in applying these methods.
5. Build strategic skills that enable attendees to solve new data management problems.
Data management plans existed long before the NSF started requiring them. DMPs have inherent value despite their being relatively unknown to researchers until now. Proper, thorough data management plans are potentially a major time saver and a huge asset for the project. In this webinar, we will cover how to go beyond funder requirements and develop more thorough data DMPs The Gulf of Mexico Research Initiative requires an extensive data management plan for projects it funds; we will hear about their efforts and how they are planning to use the DMPTool going forward.
Functional and Architectural Requirements for Metadata: Supporting Discovery...Jian Qin
The tremendous growth in digital data has led to an increase in metadata initiatives for different types of scientific data, as evident in Ball’s survey (2009). Although individual communities have specific needs, there are shared goals that need to be recognized if systems are to effectively support data sharing within and across all domains. This paper considers this need, and explores systems requirements that are essential for metadata supporting the discovery and management of scientific data. The paper begins with an introduction and a review of selected research specific to metadata modeling in the sciences. Next, the paper’s goals are stated, followed by the presentation of valuable systems requirements. The results include a base-model with three chief principles: principle of least effort, infrastructure service, and portability. The principles are intended to support “data user” tasks. Results also include a set of defined user tasks and functions, and applications scenarios.
DataONE Education Module 03: Data Management PlanningDataONE
Lesson 3 in a set of 10 created by DataONE on Best Practices fo Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
The document summarizes data science education resources developed by researchers at Oregon Health & Science University. It describes the challenges of managing vast amounts of biomedical data and the goal of providing training to address this issue. The team developed skills courses and open educational resources (OERs) on topics across the data science life cycle. Courses included introductory, advanced, and targeted workshops. OER modules covered a range of data science topics and mapped to competencies for health sciences librarians. The team seeks to disseminate the resources broadly while addressing challenges around customization for different users and protection of intellectual property.
Couture Curricula - BD2K Data Science Tailored to Your NeedsNicole Vasilevsky
Poster presentation at Force2016 (https://www.force11.org/meetings/force2016) describing Big Data to Science (BD2K) efforts at Oregon Health & Science University.
On the Reproducibility of Science: Unique Identification of Research Resourc...Nicole Vasilevsky
Poster presentation at the Data Information Literacy Symposium at Purdue University in Indiana, Sept. 2013. This study is published here: https://peerj.com/articles/148/
Enhancing the Human Phenotype Ontology for Use by the LaypersonNicole Vasilevsky
Presentation at the International Conference on Biological Ontology & BioCreative, August 1-4, 2016, Corvallis, Oregon, USA.
Abstract
In rare or undiagnosed diseases, physicians rely upon genotype and phenotype information in order to compare abnormalities to other known cases and to inform diagnoses. Patients are often the best sources of information about their symptoms and phenotypes. The Human Phenotype Ontology (HPO) contains over 12,000 terms describing abnormal human phenotypes. However, the labels and synonyms in the HPO primarily use medical terminology, which can be difficult for patients and their families to understand. In order to make the HPO more accessible to non-medical experts, we systematically added new synonyms using non-expert terminology (i.e., layperson terms) to the existing HPO classes or tagged existing synonyms as layperson. As a result, the HPO contains over 6,000 classes with layperson synonyms.
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...Robert H. McDonald
This is the slidedeck for my ACRL 2015 TechConnect Presentation with Nicole Vasilevsky (OHSU). For more on the program see - <a>http://bit.ly/1xcQbCr</a>.
Empowering patients by increasing accessibility to clinical terminologyNicole Vasilevsky
Flash talk at Medical Library Association Pacific Northwest Chapter meeting in Portland, OR on October 18, 2016.
http://pnc-mla.cloverpad.org/annual2016
Authors: Erin Foster, Mark Engelstad, Chris Mungall, Peter Robinson, Sebastian Kohler, Melissa Haendel and Nicole Vasilevsky
The Role of Libraries in Data Management and CurationNicole Vasilevsky
The Role of Libraries in Data Management and Curation, presented at the American Library Association conference in Las Vegas, NV, 07/29/14.
Abstract:
As increasing amounts of data are being generated, applying best practices in handling data is important, and librarians are well poised to assist users. During this session, we will discuss the role of libraries in assisting with data management, application of metadata, ontologies, data standards, and the publication of data in repositories and on the Semantic Web. This talk will describe best data practices and engage the attendees in interactive activities to demonstrate these principles.
The Human Phenotype Ontology (HPO) was developed to describe phenotypic abnormalities, aka, “deep phenotyping”, whereby symptoms and characteristic phenotypic findings (a phenotypic profile) are captured. The HPO has been utilized to great success for assisting computational phenotype comparison against known diseases, other patients, and model organisms to support diagnosis of rare disease patients. Clinicians and geneticists create phenotypic profiles based on clinical evaluation, but this is time consuming and can miss important phenotypic features. Patients are sometimes the best source of information about their symptoms that might otherwise be missed in a clinical encounter. However, HPO primarily use medical terminology, which can be difficult for patients and their families to understand. To make the HPO accessible to patients, we systematically added non-expert terminology (i.e., layperson terms) synonyms. Using semantic similarity, patient-recorded phenotypic profiles can be evaluated against those created clinically for undiagnosed patients to determine the improvement gained from the patient-driven phenotyping, as well as how much the patient phenotyping narrows the diagnosis. This patient-centric HPO can be utilized by all: in patient-centered rare disease websites, in patient community platforms and registries, or even to post one’s hard-to-diagnosed phenotypic profile on the Web.
Why the world needs phenopacketeers, and how to be onemhaendel
Keynote presented at the the Ninth International Biocuration Conference Geneva, Switzerland, April 10-14, 2016
The health of an individual organism results from complex interplay between its genes and environment. Although great strides have been made in standardizing the representation of genetic information for exchange, there are no comparable standards to represent phenotypes (e.g. patient disease features, variation across biodiversity) or environmental factors that may influence such phenotypic outcomes. Phenotypic features of individual organisms are currently described in diverse places and in diverse formats: publications, databases, health records, registries, clinical trials, museum collections, and even social media. In these contexts, biocuration has been pivotal to obtaining a computable representation, but is still deeply challenged by the lack of standardization, accessibility, persistence, and computability among these contexts. How can we help all phenotype data creators contribute to this biocuration effort when the data is so distributed across so many communities, sources, and scales? How can we track contributions and provide proper attribution? How can we leverage phenotypic data from the model organism or biodiversity communities to help diagnose disease or determine evolutionary relatedness? Biocurators unite in a new community effort to address these challenges.
Research Data Management in the Humanities and Social SciencesCelia Emmelhainz
This document provides an introduction to research data management for humanities and social sciences librarians. It discusses why data management is an important part of a librarian's role in supporting faculty research, and some key concepts in data management including data formats, storage, security, preservation, and sharing. The document emphasizes that while librarians do not need to be data experts, having a basic understanding of data management concepts can help librarians better serve faculty research needs and expand their role on campus.
Research Data Management and Sharing for the Social Sciences and Humanities (Rebekah Cummings)
This document summarizes a presentation on research data management for social and behavioral sciences and humanities. The presentation covered topics such as what data management is, why it is important to manage and share data, how to create data management plans, organize data files through naming conventions and folder structures, describe data through metadata and codebooks, issues around data ownership, and data storage, archiving and sharing options. The presentation was aimed at providing guidance to researchers at the University of Utah on best practices for managing and sharing their research data.
Presenters : Libbie Stephenson, Jared Lyle
This session discusses the value of and methods for curating data, especially in light of recent government and academic initiatives. Special attention will be paid to data management plans.
Librarians can provide valuable data management services to researchers on campus. An effective strategy includes surveying researchers to identify needs, communicating service offerings through workshops and consultations, and providing in-depth guidance on data management plans and long-term data preservation. Developing workshops involves setting learning objectives, evaluating content, and securing resources like space and food. Consultations allow librarians to help with specific topics like choosing file formats or finding metadata standards. Creating a data management plan requires detailing a data inventory, metadata description, long-term preservation and access methods. Trusted disciplinary repositories and use of stable identifiers help ensure long-term findability and access.
Lesson 2 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
This document discusses the need for critical infrastructure to promote data synthesis and evidence-based nutrient management. It outlines 10 steps for real-time data uptake, analysis, and customized nutrient recommendations. Key challenges include data standards, minimum data sets, provenance, and repositories. The Purdue University Research Repository is presented as a solution, providing preservation, curation, and publication of agricultural data. Hands-on support from librarians and agronomists is discussed to help researchers transition data and ensure best practices.
Immersive informatics - research data management at Pitt iSchool and Carnegie... (Keith Webster)
A joint presentation by Liz Lyon and Keith Webster on providing education for librarians engaged in research data management. This was delivered at Library Research Seminar VI, at the University of Illinois Urbana Champaign in September 2014. The presentation looks at a class delivered by Lyon at the University of Pittsburgh's iSchool in 2014, and the related needs for immersive training opportunities amongst experienced practicing librarians, using Carnegie Mellon University's library, led by Webster, as a case study.
1. The document discusses some early observations from the Associate Director for Data Science at the National Institutes of Health regarding data at NIH.
2. It notes that NIH does not fully understand how existing data is used, has focused more on why data should be shared rather than how to share it, and lacks plans for long-term sustainability of data.
3. Potential solutions discussed include developing a biomedical commons, modifying the review process, improving education in data science, and expanding the Big Data to Knowledge initiative. The goal is to create a digital research enterprise that better connects all aspects of the research lifecycle.
Scholars and researchers are being asked by an increasing number of research sponsors and journals to outline how they will manage and share their research data. This is an introduction to data management and sharing practices with some specific information for Columbia University researchers.
This document discusses issues related to attributing and citing scientific data. It addresses technical, scientific, institutional, and socio-cultural challenges. Key questions are outlined regarding data citation standards and practices. The roles of different actors in the research enterprise are also discussed. Effective data attribution requires consideration of provenance, ethics, discoverability, relationships between data, intellectual property issues, and policies. Metrics for data use must be grounded in scientific theory to ensure their validity and reliability.
Ginny Pannabecker, Life Science & Scholarly Communications Librarian at Virginia Tech, is an ACRL Science and Technology Section (STS) liaison to the American Institute of Biological Sciences (AIBS). This presentation shares key points for librarians and researchers from an AIBS workshop on "Changing Practices in Data Publications," which took place in December 2014 and involved representatives from federal funding agencies; publishers and librarians; scientific societies and journals; and data services / providers.
Meeting the NSF DMP Requirement June 13, 2012 (IUPUI)
The document provides guidance on developing a data management plan (DMP) to meet requirements for National Science Foundation grant proposals. It discusses the context and rationale for federal data policies, defines the key elements required for a DMP, and provides examples of DMPs for different types of research data. The main points are: understanding the NSF data policy aims to increase research impact and data sharing/reuse; a DMP must address the types of data generated, metadata standards, data access/sharing plans, long-term preservation, and associated costs; and good planning helps ensure data remains accessible, usable and preserved into the future. Resources and guidance are available to help researchers develop robust and fundable DMPs.
Responsible conduct of research: Data Management (C. Tobin Magle)
A presentation for the Food and Nutrition Science Responsible conduct of research class on data management best practices. Covers material in the context of writing a data management plan.
Introduction to research data management (rds-wayne-edu)
This document provides an introduction to research data management. It discusses why sharing and preserving data is important, including meeting funder requirements and enabling data reuse. It outlines common barriers to data sharing, such as time and lack of credit. The document then reviews data sharing policies from various funders and journals. It provides examples of National Science Foundation data management plans and ways to share data, such as through repositories, personal websites or data journals. Overall, the document aims to introduce best practices for managing, sharing and preserving research data.
Next generation data services at the Marriott Library (Rebekah Cummings)
This document discusses next generation data services at the Marriott Library. It begins by asking how data needs in the social sciences and humanities may change over the next five years, and how libraries can partner with faculty on data needs. The document then discusses the library's role in data curation, challenges, and examples of data services like research data consultation, metadata assistance, and repository services. It provides examples of collaborations like embedded librarianship and a project with the UCLA Civil Rights Project to archive publications and datasets. The discussion emphasizes the changing landscape and growing importance of data sharing and management.
Small Science: First Impressions of Curation Needs. Presentation at Digital L... (Sarah Shreeves)
The document discusses the challenges of curating scientific data from small research projects and laboratories. It provides examples of different types of data collected from various sciences, including biology, crystallography, LIDAR imaging, and crop studies. The data varies widely in size, format, access restrictions, and curation needs. Curation challenges include data heterogeneity, determining access and use policies, long-term preservation, and linking data to related publications and analyses. However, libraries are well-positioned to work with individual scientists and disciplines to negotiate data curation solutions and help make research outputs accessible over time.
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving... (Sarah Anna Stewart)
Presentation given at the M25 Consortium of Academic Libraries, CPD25 Event on 'The Role of the Library in Supporting Research'. Provides an introduction to data, software and PIDs and a brief look at how libraries can enable researchers to gain impact and credit for their research data and software.
Survey of research data management practices up2010digschol2011heila1
An analysis of data management practices at a large South African university was conducted through interviews with researchers and students to identify needs and challenges. The findings showed that while data collection methods vary, data storage is often ad hoc with no centralized support or resources. Researchers expressed a need for a central university server or repository for secure data storage and assistance with time constraints. It was concluded that a formal research data management program and staff support are needed to improve current practices.
Funding agencies are instituting requirements for data management and sharing as a condition of receiving research funds. This presentation addresses why researchers should care about research data management, what libraries have to do with it, and a case study of what one research specialist at the University of Colorado Anschutz Medical Campus is doing in this area.
This document summarizes a webinar about managing and preserving scientific data sets. It discusses the definition of science data according to the federal government, why science data is different than other data, current trends and challenges in digital preservation for science. It outlines several levels of digital preservation and provides examples of data being preserved. The webinar discusses the benefits of data management, such as supporting open access and future funding. It also describes existing problems around data management including lack of standards, resources and staffing. Potential solutions discussed include implementing research data management plans and using existing and upcoming tools to help with various stages of the research lifecycle from data creation to long-term preservation and access.
The document discusses the Community Cancer Data Harmonization (CCDH) project which aims to harmonize data across different cancer research data commons (CRDCs). It provides an overview of the multi-step process used to develop a harmonized data model called CRDC-H. This includes:
1. Standardizing documentation of source node models
2. Generating an Aggregated Data Model (ADM) representing all source model elements
3. Mapping the ADM to standard models like BRIDG and FHIR
4. Refactoring the ADM into a more normalized Conceptual Domain Model (CDM) prototype
The presentation describes the first phase of work focusing on harmonizing administrative and biospecimen
This document summarizes Nicole Vasilevsky's presentation on teaching data science to undergraduate students. It discusses the need for data science training, the open educational resources (OERs) developed by OHSU Library to address this need, and workshops offered including "Data and Donuts". The OERs cover the entire research process, from finding data to analysis to sharing results. Workshops are hands-on and interactive. Future plans include continuing "Data and Donuts" and potentially a larger OHSU Library Data Science Institute. The overall goal is to provide accessible data science training to address the growing demand.
Enhancing the Human Phenotype Ontology for Use by the Layperson (Nicole Vasilevsky)
The document describes efforts to enhance the Human Phenotype Ontology (HPO) by adding layperson synonyms to make it more useful and accessible for patients. The workflow involved systematically reviewing over 12,000 HPO classes, searching external sources to identify potential layperson synonyms, assigning the synonyms and validating their use, and integrating the new synonyms into the HPO ontology. As a result, 44% of existing HPO synonyms are now classified as layperson terms, increasing the usability and interoperability of HPO for patients, clinicians and researchers.
Poster presentation at the Rare Disease Symposium at Oregon Health & Science University in Portland, Oregon, 2015.
http://openwetware.org/wiki/OHSU_Rare_Disease_Research_Consortium_Symposium_2015
Poster presentation about the Resource Identification Initiative (http://www.force11.org/Resource_identification_initiative) at the Research Data Alliance meeting in Dublin, Ireland in March 2014 (https://rd-alliance.org/rda-third-plenary-meeting.html).
1. Roles for Libraries in Providing
Research Data Management
Services
Nicole Vasilevsky, Oregon Health & Science University
Victoria Mitchell, University of Oregon
Jeremy Kenyon, University of Idaho
5. Why do our patrons
need to know about
data management?
6.
7. Why?
Researcher Perspective
• Version control
• Track processes for reproducibility
• Quality control
• Stay organized
• Save time and stress
• Avoid data loss
• Format data for reuse (by self, team, or others)
• Document for own recollection, accountability, reuse
11. The UO Environment
• No campus-wide research data policy
• Library leading on research data
management and preservation
• Collaborating with campus IT, Research
Services
12. The UO Environment
• Digital Scholarship Center
• Open Access Publishing
• Digital Collections
• Institutional Repository
• Interactive Media Development
• Data Services
• Science Data Services Librarian
• Social Science Data Librarian
17. Graduate Seminar in Data
Management
• 2 iterations so far
• 1st: Spring 2013 – 1 credit course, LIB 407/507
• Made it available to upper-division undergrads; none signed up
• 2nd: Spring 2014 – 1 credit course, LIB 607
18. Graduate Seminar in Data
Management
Based course around creation of a DMP for a
funding agency
• Students registering for the course were
strongly encouraged to have a research
project already in mind or underway
• Also used, in part and with modification, the
education modules created by DataONE
19. Data Loss
• Natural disaster
• Facilities infrastructure failure
• Storage failure
• Server hardware/software failure
• Application software failure
• External dependencies (e.g. PKI failure)
• Format obsolescence
• Legal encumbrance
• Human error
• Malicious attack by human or automated agents
• Loss of staffing competencies
• Loss of institutional commitment
• Loss of financial stability
• Changes in user expectations and requirements
CC image by Sharyn Morrow on Flickr
CC image by momboleum on Flickr
Slide adapted from DataONE Education Module: Why Data Management. DataONE. Retrieved March 21, 2013
20. Spreadsheet for Help with Organizing
Research Project: [Name of research project]
Name: [Your name]
Dates: [when you'll be conducting your research, e.g. 7/14-1/15]
Project Data Folder: [e.g. dissertation_coldfusion_data]
Column headers:
• Research Process/Method / Data Source
• Collection Dates
• Storage Format
• Original Format
• Working Format
• Access Format
• Preservation Format(s)
• File Naming Convention
• Folder / Convention
• Versioning Strategy
• Storage Location
• Who can help?
• Access restrictions?
• Who needs access?
• Software / Tools Required
• Metadata Schema
• Notes
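As a rough illustration of how the organizing spreadsheet above could be started as a plain CSV file, here is a minimal Python sketch. The column names are copied from the slide; the function name `build_template`, the in-memory output, and the sample row are assumptions made purely for illustration and are not part of the course materials.

```python
import csv
import io

# Column headers copied from the slide. Everything else in this sketch
# (function name, sample row, CSV output) is a hypothetical illustration.
COLUMNS = [
    "Research Process/Method / Data Source",
    "Collection Dates",
    "Storage Format",
    "Original Format",
    "Working Format",
    "Access Format",
    "Preservation Format(s)",
    "File Naming Convention",
    "Folder / Convention",
    "Versioning Strategy",
    "Storage Location",
    "Who can help?",
    "Access restrictions?",
    "Who needs access?",
    "Software / Tools Required",
    "Metadata Schema",
    "Notes",
]


def build_template(rows=()):
    """Return CSV text with the header row plus any partially filled rows."""
    buffer = io.StringIO()
    # restval="" leaves unanswered columns blank so they can be filled in later.
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical first entry: one data source, most columns still empty.
template = build_template([
    {"Research Process/Method / Data Source": "interviews", "Working Format": "csv"},
])
print(template)
```

Leaving unanswered columns blank mirrors the way the sheet is used in class, where rows are filled in gradually as the course progresses.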
21. LIB 607 v.3
• Changed to Data Management for the
Social Sciences (and Digital Humanities)
• Less emphasis on DMP per funder
requirements
• More time to address issues specific to the
social sciences and humanities
22. Research Data Services @ the University of Idaho Library
Credit: University of Idaho Creative Services
23. University of Idaho Characteristics:
• Public, comprehensive, land-grant university
• Strong emphasis on agriculture, environmental science, engineering
• Recent emphasis on developing research data and research
cyberinfrastructure, including library research data services, INSIDE
Idaho, the geospatial data repository, and NKN, a multi-disciplinary
institutional data repository
27. Research Data Services at the U-Idaho Library
Many modes of service
• Appointments & Consultations
• Northwest Knowledge Network (institutional data repository)
• Embedded Services (buy-outs of librarian time)
• Tool & Technology Support: IQ-Station, ESRI Products, DMPTool, metadata editors
• Website: Data Management Best Practices Guide
• Instruction & Workshops
Raise awareness of research data management & our services
Create a culture of documentation
Transform thinking across disciplines about data distribution & publishing
28. Focus: creating a culture of documentation
FISH502 “One-shot” Instruction Session
- Class participants: fisheries biology and statistics graduate students
- Exercise:
1) review the following spreadsheet
2) identify the information needed to re-use this dataset
29. Focus: creating a culture of documentation
Research consultation: environmental modelling
Post-doc from a multi-institutional project was
primary contact for several teams
Consultation on metadata was made towards the
end of project
Producing 6 discrete collections of data as netCDF
(format required by funder)
Repository required ISO 19115 XML metadata for
describing whole collections
30. Focus: creating a culture of documentation
Challenges:
Understanding the standard
Attribute Conventions for Dataset Discovery
ISO 19115-2
Codelists and controlled vocabularies
Rules for free-text fields
what does a good title look like?
Placement of content
should variables be listed in keywords, title, or description?
Responsibilities
who should create XML files – the researcher or us?
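One way to make the "understanding the standard" challenge concrete is a small completeness check on a dataset's global attributes. The sketch below is an assumption-laden illustration, not the workflow used in the consultation: it uses a plain dictionary in place of a real netCDF file, the example attribute values are invented, and the attribute list is the set that the Attribute Convention for Data Discovery (version 1.3) describes as highly recommended.

```python
# Global attributes that ACDD 1.3 lists as "highly recommended".
# The dictionary-based check and the example values are hypothetical.
HIGHLY_RECOMMENDED = ("title", "summary", "keywords", "Conventions")


def missing_attributes(global_attrs):
    """Return the highly recommended attribute names that are absent or blank."""
    return [
        name
        for name in HIGHLY_RECOMMENDED
        if not str(global_attrs.get(name, "")).strip()
    ]


# Invented example standing in for the global attributes of one collection.
example_attrs = {
    "title": "Hypothetical modeled streamflow, 1980-2010",
    "keywords": "streamflow, hydrology",
    "Conventions": "ACDD-1.3",
}

missing = missing_attributes(example_attrs)
print(missing)
```

A check like this only confirms that fields are present; questions such as what a good title looks like, or where variables should be listed, still need a human decision.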
31. Focus: creating a culture of documentation
• Re-use and comprehension of data require good documentation
• Researchers often have idiosyncratic and localized, i.e. customized, documentation practices
• Content standards are often not well-known among researchers
• Disciplinary content standards are necessary for enabling advanced modes of data access
Library services must emphasize documentation
32. Future Directions
Fienberg, S.E. et al. (1985). Sharing Research Data. Washington, D.C.: National Academies Press.
http://www.nap.edu/catalog/2033/sharing-research-data
33. Research Data Management Efforts at Oregon Health & Science University
34. What would you do with
$1k today to make
research communication
better that doesn’t involve
building another tool?
39. Your Data: Gummy Bear Raw Data
Bounces Amplitude Color
15 4 blue
43 3 red
58 9 green
75 82 purple
Materials:
• Haribo Gummi Bears Sugar Free, 5 lb bag, Amazon.com (UPC: 422384500110)
• SpringOMatic 3000 (ICanPickleThat, Portland, OR)
http://laughingsquid.com/the-anatomy-of-a-gummy-bear-by-jason-freeny/
40. Gummy Bear Final Figure
[Figure: panels A and B; bar chart with y-axis "Springiness (bounces/length)" from 0 to 16 and x-axis "Sample Color" (blue, red, green, purple)]
Figure 1. A) Gummy skeleton with belly button annotated with red arrow B) Springiness by sample color.
Methods Section: Haribo Gummi Bears (Sugar Free) were purchased from Amazon.com (UPC: 422384500110). Gummy bears were placed in the SpringOMatic 3000 (ICanPickleThat, Portland OR) according to the manufacturer's instructions. The Gummy Anatomy (Jason Freeny) image was cropped in PPT (Microsoft) and annotated to highlight the bellybutton.
Callouts:
• Figure legends/metadata
• Manipulating images
• Attribution
• Metadata about research resources
41.
42. Group 1: Gummy Bear Final Data
[Figure: bar chart with y-axis "Springiness (Bounces/Amplitude)" from 0 to 16 and x-axis categories blue, red, green, purple]
4 3 9 82
15 43 58 75
15 4 blue
43 3 red
58 9 green
75 82 purple
Methods:
A schematic of a Gummi Bear was cropped to indicate where the belly button is located (Fig. 1). At this point, raw experimental data showing the bounce, amplitude, and color were analyzed and the springiness calculated for each color of bear. This was accomplished by dividing the bounce by the amplitude and plotting this against bear color.
Fig. 1: Belly button of Haribo Sugar Free Gummi Bear
What is missing?
A. Image manipulation
B. Attribution
C. Figure Legends
D. Metadata about resources
43. Group 2: Gummy Bear Final Data
[Figure: panels A and B; bar chart with y-axis "Springiness (bounces/length)" from 0 to 16 and x-axis "Sample Color" (blue, red, green, purple)]
Figure 1. A) Gummy skeleton with belly button annotated with red arrow B) Springiness by sample color.
Methods Section: Haribo Gummi Bears (Sugar Free) were purchased from Amazon.com (UPC: 422384500110). Gummy bears were placed in the SpringOMatic 3000 (ICanPickleThat, Portland OR) according to the manufacturer's instructions.
What is missing?
A. Image manipulation
B. Attribution
C. Figure Legends
D. Metadata about resources
44. Group 3: Gummy Bear Final Data
Figure 2: Schematic depiction of Haribo Gummi Bear umbilical skeletal anatomy.
Methods & Materials
Gummi Bears were obtained through Amazon in 3 kg bags. Lot and temperature during transport data were not made available. Bears were housed in a plastic bowl in accordance with IACUC policy and national standards for gummi bear care. They were housed at room temperature on a natural light cycle.
Food and water were provided ad libitum (consumption was not monitored)
Each bear was sampled only once to reduce costs
What is missing?
A. Image manipulation
B. Attribution
C. Figure Legends
D. Metadata about resources
45. Group 4: Gummy Bear Final Data
[Figure: (a) gummy bear schematic labeled "Belly Button"; (b) bar chart with y-axis "Springiness (bounces/amplitude)" from 0.00 to 16.00 and x-axis "Gummy Bear Color" (blue, red, green, purple)]
Fig. 1. (a) schematic of the anatomy of a gummy bear (adapted from 1). (b) springiness of bear by color using spring-o-matic.
Methods: Insert the sample of interest, specifically a colored gummy bear (Haribo, Japan). Position the probe above the sample. Press "Tickle" and the SpringOMatic (ICanPickleThat, Portland) will poke the belly button a standard depth of 1 cm. Record the number of bounces and the amplitude of the largest bounce in cm. From these values, the springiness can be calculated (bounce/amplitude).
What is missing?
A. Image manipulation
B. Attribution
C. Figure Legends
D. Metadata about resources
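The springiness calculation described in the Methods text above (bounces divided by amplitude) can be sketched in a few lines of Python. The raw values are the ones given in the exercise's raw data table; the variable names, the function name `springiness`, and the rounding to two decimals are assumptions for illustration only.

```python
# Raw data from the exercise: (bounces, amplitude, color).
RAW_DATA = [
    (15, 4, "blue"),
    (43, 3, "red"),
    (58, 9, "green"),
    (75, 82, "purple"),
]


def springiness(bounces, amplitude):
    """Springiness as defined in the exercise: bounces divided by amplitude."""
    if amplitude == 0:
        raise ValueError("amplitude must be non-zero")
    return bounces / amplitude


# Rounding to two decimals is a presentation choice, not part of the exercise.
results = {color: round(springiness(b, a), 2) for b, a, color in RAW_DATA}

for color, value in results.items():
    print(f"{color}: {value}")
```

Keeping the raw rows and the derived values in one short script is one way to document how the final figure was produced, which is the kind of workflow record the exercise was designed to highlight.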
46. GUMMY BEARS TAUGHT US…
• People see the same data very
differently
• “Detailed” means different things…
• Metadata?!?
• File management is difficult
• Workflow
Vasilevsky N, Wirz J, Champieux R, Hannon T, Laraway B, Banerjee K, Shaffer C, and Haendel M. Lions, Tigers, and Gummi Bears: Springing Towards Effective Engagement with Research Data Management (2014). Scholar Archive. Paper 3571.
48. Conclusions
Researchers DO need assistance:
• Finding and choosing data standards
• File versioning
• Applying metadata to facilitate data sharing
“Gummi Bear” themed data management exercise resonated well with students
Lack of awareness of services and expertise offered by the Library
49. OHSU New Directions
OHSU Library is developing
data services for researchers
BD2K educational grants in
collaboration with DMICE
www.ohsu.edu/xd/education/library/data
50. Acknowledgements
OHSU
Melissa Haendel
Robin Champieux
Jackie Wirz
Kyle Banerjee
Bryan Laraway
Chris Shaffer
Kaiser
Todd Hannon
UO
Brian Westra
Karen Estlund
Cathy Flynn-Purvis
John Russell
Idaho
Bruce Godfrey
Nancy Sprague
Lynn Baird
Greg Gollberg
Luke Sheneman
Steven Daley-Laursen
Why |
Funding agencies are creating mandates to develop data management and sharing plans, and additionally, there is increased focus on reproducibility of science and other disciplines that stems from a need for improved data management.
Victoria is going to add a different slide with more examples.
As professionals in curation, organization and classification of information, librarians are well poised to assist researchers by providing data management services and training.
Soc sci data librarian: More recently created (partial) position
Consultations with faculty about data produced by their research, their needs for collecting, managing, etc., data; depositing data in our repository
Also, Northwest Indian Language Institute – Endangered Languages
E.g., Office of Research and Innovation – workshop for new faculty on grant-writing for NSF and NIH – give us a little time.
EXAMPLE of slide borrowed from DataONE
Use as in-class exercise; students keep adding information as course progresses
At the DMOH, we discussed topics including scholarly attribution, data sharing, managing your scholarly footprint. At the DMOH, we had researchers attend from various career levels, from grad students to post-docs, to core lab directors to PIs.
While the research at OHSU is primarily focused on biomedical health research, the specific research projects vary quite greatly, from bench science, to clinical research, across topics such as cancer biology or biomedical engineering.
We wanted to come up with an interactive exercise where we could demonstrate the importance of data management skills at each step, centered on a topic that was neither too specific to anyone's field nor too distant from it. We chose a topic that was fictional and playful: we asked them to pretend they were doing a study that assessed the “springiness” of a gummi bear.
These are the materials that we gave the students
This is best viewed as the “slide show”, so you can see the animations.
I wanted to point out what the ideal results would be, and some of the key attributes we wanted them to take away.
I am going to show them all the results from each group, then ask them to raise their hands to answer ‘what is missing’. For example, this group is missing all of the options.
This group is missing attribution of the image.
I left off the graph here because I was running out of room.
At the DMOH and Data Wrangling session, we recruited individual researchers to schedule individual consultations with us. We had X # sign up and X # follow through with consultations.
We found, even with the incentive of the gift card, it was difficult to recruit researchers to participate in the consultations.