This document proposes a system called "Landing Pages" to improve data citation practices. Landing Pages would serve as a publishing record for datasets, describing the data to allow for proper citation. They would provide context on how to access and use the data and include links directly to the data. Related "Citation Pages" created by authors would link to the Landing Page and describe any additional processing of the data. The system aims to address challenges around identifying datasets, assigning persistent identifiers, and encouraging authors to cite data properly.
This document discusses several topics that will drive the future of digital libraries, including data management plans, data citation, curation service models, sustainability, training data practitioners, and more. Specific issues covered include scientific data support, data identifiers, curation best practices, cost models, educating librarians in data management, and the role of digital libraries in enabling reproducible science through 2050.
This document provides an overview of research data and the role of libraries in supporting research data services. It discusses that research data takes many forms and differs across disciplines. Libraries can help with research data in several ways, including learning about data practices in their organizations, identifying gaps, and helping researchers find and manage data through various services and skills like data analysis and visualization. The document outlines potential areas libraries can provide support and ways to continue building data skills, such as through online courses and conferences.
Feb 26 NISO Training Thursday
Crafting a Scientific Data Management Plan
About the Training
Addressing a data management plan for the first time can be an intimidating exercise. Join NISO for a hands-on workshop that will guide you through the elements of creating a data management plan, including gathering necessary information, identifying needed resources, and navigating potential pitfalls. Participants explore the important components of a data management plan and critique excerpts of sample plans provided by the instructors.
This is meant to be a guided, step-by-step session that follows the February 18 NISO Virtual Conference, Scientific Data Management: Caring for Your Institution and its Intellectual Wealth.
About the Instructors
Kiyomi D. Deards, MSLIS, Assistant Professor, University of Nebraska-Lincoln Libraries
Jennifer Thoegersen, Data Curation Librarian, University of Nebraska-Lincoln Libraries
February 18, 2015 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Using data management plans as a research tool: an introduction to the DART Project
Amanda L. Whitmire, Ph.D., Assistant Professor, Data Management Specialist, Oregon State University Libraries & Press
This document outlines best practices for creating research data. It recommends using consistent data organization with standardized formats and descriptive file names. Researchers should perform quality assurance checks and use scripted programs to analyze data while keeping notes. All aspects of data collection and analysis should be thoroughly documented. Following these practices will improve data usability, sharing, and reproducibility.
This presentation was provided by Maria Praetzellis of California Digital Library, during the NISO hot topic virtual conference "Effective Data Management," which was held on September 29, 2021.
The document discusses data management plan requirements for proposals submitted to the U.S. Department of Energy Office of Science for research funding. It provides context on the history of data management policies, outlines the four main requirements for inclusion of a data management plan, and suggests elements that should be included in the plan such as data types/sources, content/format, sharing/preservation, and protection. It also discusses tools like the Public Access Gateway for Energy and Science that can help manage access to research publications and data.
RDAP13 Elizabeth Moss: The impact of data reuse
ASIS&T
Kathleen Fear, ICPSR, University of Michigan
“The impact of data reuse: a pilot study of 5 measures”
Panel: Data citation and altmetrics
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
1. The document discusses best practices for managing research data over the data life cycle, from collection through sharing and archiving. It provides tips for organizing, documenting, and storing data in sustainable file formats and naming conventions. Following best practices helps ensure usability, reproducibility, and long-term access to research data.
2. Specific best practices covered include using consistent organization, standardized naming and formats, descriptive filenames, quality assurance, scripting for processing, documenting file contents, and choosing open file formats. The document also addresses data security, backup, and storage considerations.
3. Managing data properly is important for reuse and sharing data with others now or in the future. Scripting helps capture data workflows for reproducibility.
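The scripted-workflow practice the summary describes can be sketched in a short, self-contained example. Everything here is illustrative: the file names, the column names, and the plausibility bounds are assumptions, not part of any of the documents above. The pattern is what matters: the raw file is never edited by hand, the clean file is always regenerated from it, and each run appends a note to a log.

```python
"""A minimal, reproducible processing script: raw data stays untouched,
clean data is regenerated, and every run is logged. All file names,
column names, and the cleaning rule are invented for illustration."""
import csv
import datetime

RAW, CLEAN, LOG = "raw_temps.csv", "clean_temps.csv", "processing_log.txt"

# Create a tiny raw file so the example is self-contained.
with open(RAW, "w", newline="") as f:
    f.write("site,temp_c\nA,21.5\nB,999\nC,\n")

def clean_rows(rows):
    """Keep rows whose temperature parses and is plausible; count the rest."""
    kept, dropped = [], 0
    for row in rows:
        try:
            value = float(row["temp_c"])
        except (KeyError, ValueError):
            dropped += 1
            continue
        if -60.0 <= value <= 60.0:      # plausibility bounds (assumed)
            kept.append(row)
        else:
            dropped += 1
    return kept, dropped

with open(RAW, newline="") as f:
    kept, dropped = clean_rows(list(csv.DictReader(f)))

with open(CLEAN, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["site", "temp_c"])
    writer.writeheader()
    writer.writerows(kept)

# The log captures what was done and when, for the project notes.
with open(LOG, "a") as f:
    f.write(f"{datetime.date.today()}: kept {len(kept)}, dropped {dropped}\n")
```

Because the script, not an undocumented spreadsheet edit, produces the clean file, anyone with the raw data can rerun it and get the same result.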
This presentation was provided by Carly Strasser of the Chan Zuckerberg Initiative during the NISO hot topic virtual conference "Effective Data Management," which was held on September 29, 2021.
Data Services presentation for Psychology
Lynda Kellam
This document provides an overview of data services and resources available through UNCG Library and ICPSR. It describes how the library supports data discovery, management, and instruction. Key resources highlighted include ICPSR, which collects and shares social science data for research and teaching, and the many longitudinal datasets it provides, such as Add Health. Services for acquiring, analyzing, and curating data are discussed.
Research Data Access and Preservation Summit, 2014
San Diego, CA
March 26-28, 2014
Jared Lyle, ICPSR
Jennifer Doty, Emory University
Joel Herndon, Duke University
Libbie Stephenson, University of California, Los Angeles
An analysis and characterization of DMPs in NSF proposals from the University...
Megan O'Donnell
The document analyzes 1,260 Data Management Plans (DMPs) from NSF grant proposals submitted to the University of Illinois between 2011-2013. It finds that most proposals planned to store data on PI servers, websites, or campus resources like IDEALS. While there were no significant differences between funded and unfunded proposals, more recent plans were more likely to use IDEALS and disciplinary repositories for data storage and sharing. This suggests an increasing role for libraries, universities, and disciplines in research data management.
This document discusses creating a data management plan. It explains that a data management plan is a comprehensive plan for managing research data throughout a project's lifecycle, including a brief description of how data will be shared per a funder's policy. It provides an overview of key elements to include in a plan, such as file formats, organization, sharing, and preservation. The document also reviews funder requirements and available tools for creating plans, noting they can be tailored to different funders' guidelines.
S. Venkataraman (DCC) talks about the basics of Research Data Management and how to apply this when creating or reviewing a Data Management Plan (DMP). He discusses data formats and metadata standards, persistent identifiers, licensing, controlled vocabularies and data repositories.
Link to: dcc.ac.uk/resources
This document provides an introduction to data management. It discusses why data management is important, covering key aspects like developing data management plans, file organization, documentation and metadata, storage and backup, legal and ethical considerations, sharing and reuse, and preservation. Effective data management is critical for research success as it supports reproducibility, sharing, and preventing data loss. The document outlines best practices and resources like the library that can help with developing strong data management strategies.
February 18, 2015 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Capacity Building: Leveraging existing library networks to take on research data
Heidi Imker, Director of the Research Data Service, University of Illinois at Urbana-Champaign
This document discusses the importance of research data management. It covers the data lifecycle and components of a data management plan. The data lifecycle includes collecting, processing, analyzing, storing, preserving, and sharing data. A data management plan outlines how data will be managed and preserved during and after a research project. It includes information about the data, metadata, data sharing policies, long-term storage, and budget. Developing a data management plan helps keep data organized, track processes, control versions, prepare data for sharing and reuse, and ensure long-term access.
This document discusses research lifecycles and data management. It begins by outlining typical stages in a research lifecycle from planning to publication. It then discusses how data is created and managed at various stages, and raises questions researchers should consider around formatting, documenting, storing, sharing and preserving data. The document provides examples of research lifecycle models and gives advice on best practices for managing data at each stage of the research process to support reuse and ensure data is well documented and preserved.
Using a Case Study to Teach Data Management to Librarians
Sherry Lake
This document outlines the agenda and learning objectives for a workshop on research data management for libraries. The workshop uses a case study approach and hands-on activities to teach librarians best practices for data collection, organization, documentation, backup/storage, and sharing/preservation. The goal is to prepare librarians to teach researchers about data management and illustrate opportunities for library involvement in the area. Based on a survey after the workshop, most attendees felt their expectations were met or exceeded, and they found the hands-on case study activities and practical tips to be most useful.
This presentation was provided by Clara Llebot of Oregon State University, during the NISO hot topic virtual conference "Effective Data Management," which was held on September 29, 2021.
Data Services/ICPSR presentation for School of Education
Lynda Kellam
UNCG Data Services & ICPSR provides data services and instruction to support research and teaching. This includes a data portal, data consultations, and assistance acquiring openly available data. ICPSR is a large social science data archive that collects, preserves, and disseminates research data for further analysis. ICPSR's most popular datasets cover topics like health, politics, and demographics. Downloads from ICPSR include documentation, codebooks, and data files in various formats. ICPSR also offers training programs, a bibliography of data-related literature, and tools to search and compare variables across datasets.
DataONE Education Module 01: Why Data Management?
DataONE
Lesson 1 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/education-modules. Released under a CC0 license; attribution and citation requested.
This slideshow was used in a Preparing Your Research Data for the Future course taught in the Medical Sciences Division, University of Oxford, on 2015-06-08. It provides an overview of some key issues, focusing on long-term data management, sharing, and curation.
Next generation data services at the Marriott Library
Rebekah Cummings
This document discusses next generation data services at the Marriott Library. It begins by asking how data needs in the social sciences and humanities may change over the next five years, and how libraries can partner with faculty on data needs. The document then discusses the library's role in data curation, challenges, and examples of data services like research data consultation, metadata assistance, and repository services. It provides examples of collaborations like embedded librarianship and a project with the UCLA Civil Rights Project to archive publications and datasets. The discussion emphasizes the changing landscape and growing importance of data sharing and management.
This presentation was provided by Joe Zucca of the University of Pennsylvania, during Session Five of the NISO event "Assessment Practices and Metrics for the 21st Century," held on November 22, 2019.
Documentation and Metadata - VA DM Bootcamp
Sherry Lake
This document discusses documentation and metadata for research data. It begins with an overview of why documentation is important at different stages of the research data lifecycle from collection through archiving. Key elements to document include how the data was created, its content and structure, who created and maintains it, and how it can be accessed and cited. The document then discusses common documentation formats like readmes, data dictionaries, and codebooks. It also introduces metadata as structured information that describes resources and explains common metadata standards and tools for creating structured metadata files. Exercises guide creating documentation in these formats for a weather dataset example.
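A data dictionary of the kind the bootcamp summary mentions can be generated with a few lines of code rather than written by hand. The dataset, column names, and descriptions below are invented for illustration; the weather theme simply echoes the exercise described above.

```python
"""Sketch of generating a plain-text data dictionary, one of the
documentation formats discussed above. All column names, units, and
descriptions are hypothetical."""

# Each entry: variable name, type, units, and a short definition.
COLUMNS = [
    ("date",    "YYYY-MM-DD", "n/a", "Calendar date of the observation"),
    ("site_id", "string",     "n/a", "Short code for the weather station"),
    ("temp_c",  "float",      "degrees Celsius", "Daily mean air temperature"),
    ("precip",  "float",      "mm",  "Total daily precipitation"),
]

def data_dictionary(columns):
    """Render the column descriptions as README-ready text."""
    lines = ["Data dictionary", "==============="]
    for name, dtype, units, definition in columns:
        lines.append(f"{name} ({dtype}, units: {units}): {definition}")
    return "\n".join(lines)

print(data_dictionary(COLUMNS))
```

Keeping the dictionary next to the data (for example, pasted into a README) means a reuser never has to guess what a column or unit means.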
The document summarizes the evolution of data citation practices over time. It discusses how data citation was initially part of literature but became more complex with digital data. Early efforts in the 1990s-2000s had little traction. Starting in the mid-2000s, multiple disciplines began developing their own data citation approaches and guidelines, with DOIs becoming a major driver. There is now a consensus phase with joint principles being developed, though implementation is just beginning and will require local cultural changes. The document provides examples of how data is currently cited and discusses best practices around identifiers, versions, and microcitations.
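As a small illustration of the identifier and version practices that summary mentions, a dataset citation can be assembled from its parts. The creators, title, repository, and DOI below are invented, and the element order loosely follows common data-citation guidance rather than any single standard.

```python
"""Assemble a one-line dataset citation from creator, year, title,
version, repository, and persistent identifier. All field values are
hypothetical; the element list is a common pattern, not a standard."""

def cite_dataset(creators, year, title, version, publisher, doi):
    """Return a citation string with the DOI rendered as a resolvable URL."""
    return (f"{'; '.join(creators)} ({year}). {title} "
            f"(Version {version}) [Data set]. {publisher}. "
            f"https://doi.org/{doi}")

print(cite_dataset(
    creators=["Example, A.", "Sample, B."],
    year=2013,
    title="Regional temperature observations",
    version="2.1",
    publisher="Example Data Repository",
    doi="10.1234/example.5678",   # hypothetical DOI
))
```

Including the version and a resolvable identifier is what makes such a citation point at the exact dataset used, which is the crux of the practices described above.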
Presentation by Cuna Ekmekcioglu (The University of Edinburgh)
- Creating and Managing Digital Research Data in Creative Arts: An overview (2016)
CC BY-NC-SA 4.0
Presentation given at the Indiana University School of Medicine's Ruth Lilly Medical Library. Contains information and resources specific to Indiana University Purdue University Indianapolis (IUPUI). For full class materials, see LYD17_IUPUIWorkshop folder here: https://osf.io/r8tht/.
Lesson 8 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/education-modules. Released under a CC0 license; attribution and citation requested.
Lab Notebooks as Data Management (SLA Winter Virtual Conference 2012)
Kristin Briney
This talk, aimed at librarians, describes the data management issues surrounding paper and electronic lab notebooks. It offers several ways for librarians to support good practices and the transition from paper to electronic.
SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific ...
Susanna-Assunta Sansone
Part of the SciDataCon14 workshop on "Data Papers and their applications," run by myself and Brian Hole, to help attendees understand current data-publishing journals and trends, and the editorial processes at NPG's Scientific Data and Ubiquity's Open Health Data.
The term "life cycle" refers to the series of stages or phases that an organism, system, or product goes through from its beginning to its end. It is a concept that can be applied to various contexts, such as biology, ecology, business, technology, and project management. Here are a few examples of life cycles:
Biological Life Cycle: In biology, the life cycle refers to the sequence of stages that an organism undergoes from birth to reproduction and eventually death. This can include processes like birth or germination, growth and development, reproduction, and death.
Product Life Cycle: The product life cycle describes the stages a product goes through from its introduction to the market until its eventual decline. These stages typically include introduction, growth, maturity, and decline. Companies monitor the product life cycle to make strategic decisions regarding marketing, production, and product development.
Project Life Cycle: The project life cycle outlines the stages involved in the management and execution of a project. These stages typically include initiation, planning, execution, monitoring and control, and closure. Each phase has specific activities and deliverables, ensuring that the project progresses in a structured and organized manner.
Ecological Life Cycle: Ecological life cycles refer to the stages that ecosystems or species go through over time. This can involve the growth and decline of populations, adaptation to environmental changes, and interactions within the ecosystem.
Human Life Cycle: The human life cycle encompasses the different stages of development and growth that individuals go through from birth to death. This includes infancy, childhood, adolescence, adulthood, and eventually old age.
Understanding life cycles is important as it provides insight into the processes and changes that occur within various systems. It allows for better planning, decision-making, and adaptation to ensure sustainable growth, effective management, and optimal utilization of resources throughout the life cycle.
The document provides guidance for completing a dissertation journey, including:
- Deciding between a qualitative or quantitative study approach, each with its own advantages and disadvantages.
- Creating a proposal presentation for the dissertation committee on chapters 1-3 that includes frameworks, definitions, variables, and methodology.
- Working with the research ethics committee and following submission guidelines.
- Undergoing multiple revisions of all chapters based on committee feedback.
This document provides information and recommendations for preventing data loss through proper storage, organization, and backup of research files. It discusses developing a consistent file naming convention and folder structure for projects. The document also recommends storing multiple copies of important files in different locations and using version control software to track changes over time. Activities are included to help attendees evaluate their current practices and develop improved plans for organizing, backing up, and locking important versions of their data and files.
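A file naming convention like the one recommended above can be enforced with a small helper rather than remembered by each collaborator. The `project_dataset_YYYYMMDD_vN.ext` pattern and the project code below are assumptions for illustration; the point is that any consistent, sortable pattern can be checked by machine.

```python
"""Sketch of enforcing a descriptive, sortable file-naming convention:
project_dataset_YYYYMMDD_vN.ext. The pattern and project code are
hypothetical; consistency, not this exact format, is the goal."""
import datetime
import re

# Lowercase project code, dataset slug, 8-digit date, version, extension.
PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9-]+_\d{8}_v\d+\.[a-z0-9]+$")

def make_name(project, dataset, version, ext, date=None):
    """Build a file name that matches the convention, or raise."""
    date = date or datetime.date.today()
    name = f"{project}_{dataset}_{date:%Y%m%d}_v{version}.{ext}"
    if not PATTERN.match(name):
        raise ValueError(f"name breaks the convention: {name}")
    return name

# Example for a hypothetical "birdsurvey" project:
print(make_name("birdsurvey", "site-counts", 2, "csv",
                date=datetime.date(2014, 3, 26)))
# birdsurvey_site-counts_20140326_v2.csv
```

Embedding the date and version in the name keeps files chronologically sortable and makes "locking" an important version as simple as bumping `vN` and never overwriting the old file.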
This document discusses sharing research data. It describes the Data Services Center, which provides data services including finding and providing access to datasets. It notes that funders and publishers require data sharing, and that shared data receives more citations. It recommends sharing the minimum data needed to reproduce results, and considering timing, usability and granularity of data sharing. For sharing methods, it recommends using disciplinary or general repositories like UR Research, Dryad and REACTUR, which provide long-term preservation and access. Workshops and help are available for data management and sharing.
Incentivising the uptake of reusable metadata in the survey production process - Louise Corti
This document discusses incentivizing the uptake of reusable metadata in survey production. It notes that there is no universal language used to document survey questions and variables, leading to wasted resources. The Data Documentation Initiative (DDI) is proposed as a standard. Barriers to adopting metadata best practices include legacy systems, manual processes, and reluctance to change. The document outlines ideas to incentivize metadata use such as specifying documentation requirements in funding calls and improving documentation tools and workflows. Showing tangible benefits through applications like question banks and data exploration systems is also suggested.
This document discusses the importance of properly documenting research data. It notes that documentation allows data to be understood by those outside the original project and prevents inaccurate assumptions from being made if the data manipulations or variable meanings are unclear. Insufficient documentation can make data unusable or misinterpreted. The document outlines key elements to document like data elements, study details, and decisions made. It provides examples of documentation tools like codebooks, annotated instruments, and data narratives. Thorough documentation ensures research data remains useful and understandable.
Online citation tools allow researchers to store and manage citations online rather than using paper or local files. Previously, finding information involved slow searches across multiple databases and printing or writing citations by hand [1]. New tools like CiteULike, Zotero, and Mendeley allow storing citations in the cloud, sharing with others, and importing references into documents [2]. By scraping citation data from websites, these tools make literature searching more efficient and collaborative [3].
aOS Moscow - Microsoft Teams: From 'Send to' to 'Share with' - Sasja Beerendonk
Sasja Beerendonk gave a presentation on moving from email-based collaboration to using Microsoft Teams. Beerendonk discussed the problems with using email to share files, such as creating multiple duplicate versions. Beerendonk proposed "stopping" practices like using file shares and renaming files, and "starting" practices like storing files in OneDrive and Teams, enabling versioning and conversations around files, and using Teams features for real-time collaboration. Beerendonk demonstrated these capabilities in Teams and emphasized transforming work practices to be more collaborative.
Writing a successful data management plan with the DMPTool - kfear
This document provides an overview of how to write an effective Data Management Plan (DMP) using the DMPTool. It discusses the key components of a DMP including data products, standards, access and sharing, preservation, and documentation. The goals are to help researchers generate a DMP, understand the basic elements, and recognize how good data management leads to a strong plan. Writing a thorough DMP is now required by many funders and helps ensure data is organized, accessible, and preserved for future use.
Data Publishing Models by Sünje Dallmeier-Tiessen - datascienceiqss
Data Publishing is becoming an integral part of scholarly communication today. Thus, it is indispensable to understand how data publishing works across disciplines. Are there best practices others can learn from or even data publishing standards? How do they impact interoperability in the Open Science landscape? The presentation will look at a range of examples, and the main building blocks of data publishing today. The work has been conducted as part of the RDA Data Publishing Workflows group.
1. Recommendation
“Landing Pages”
RDAP 2012
Presented 2012-March-22nd, by Joe Hourclé
2. This is last-minute filler, as I only found out the day before that one of my panel members couldn’t make it, so I did this on the plane ... then found out that I had lost a second panel member.
For those playing along at home, you have to completely forget the order the slides are in while presenting, and already be flustered from having tried to present someone else’s slides while the projector displayed them really tiny, so you quit Keynote & came back in, which ended up making the slides bigger but changed the font, making them mostly illegible.
You may also want to turn on the presenter notes.
3. Current Practice
• Acknowledge the mission or instrument team w/ standardized text
• Cite the PI’s paper on the instrument
• Cite a paper describing the data
• Cite “the data”
4. Acknowledgement
• Isn’t tracked by most bibliographic tools
• Doesn’t tell:
• which processed / packaged form of the data
• which mirror was used
• which version / edition of the data
• where to get the data
5. Citing the inst. paper
• Can’t distinguish between citing the data and citing something else in the paper
• Doesn’t say that the data was useful
• Doesn’t tell:
• which processed / packaged form of the data
• which mirror was used
• which version / edition of the data
• where to get the data
6. Citing the data paper
• Is fixed in time
• Has the calibration changed since the data was initially released?
• Has the data moved between archives?
• Has the data been removed?
7. Citing the data
• How?
• Not all data is formally ‘published’
• May not have a formal title / author / etc.
• Different needs from different disciplines
• May use data from 100 studies
• May use a small portion of a large study
• Different subsetting needs
8. BRDI Meeting, Aug. 2011
• Two days discussing data attribution &
citation
• Breakouts made lists of the challenges &
issues concerning attribution & citation
• Technical, Scientific, Institutional/Financial/Legal/
Socio-Cultural, Main Actors & Roles, Add’l info
needed to proceed.
9. Technical Breakout
• Most of the issues came back to identity
• How do I decide what to call the data I’m citing?
• What are the essential properties when defining
a data set?
• How do I differentiate between similar data?
• (there were more, but I don’t have that notebook with me)
10. We need:
• Some official ‘record’ of the data
• An official ‘title’ for the data
• The data provider to name who the ‘author’ is (the instrument? the PI team? the software pipeline?)
• Something to persist even if the data doesn’t
11. Our Proposal
• “Landing Pages”
• Serve as a publishing record of the data set
• Act as an endpoint for citation
• Describe the data so that it can be cited
• Provide links to the data
• Provide context so the data can be used
• Persist, yet can be updated
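The elements slide 11 asks a landing page to carry (an official title, a provider-chosen ‘author’, a version, links, context) could be sketched as one structured record from which a citation is generated. A minimal illustration in Python; the field names and example values are hypothetical, not a published schema:

```python
# Sketch of a landing-page record for a data set, carrying the elements
# slide 11 calls for. All field names and values here are illustrative.
def landing_page_record():
    return {
        "title": "SDO/AIA Level 1 171 Angstrom Images",    # official 'title'
        "author": "SDO/AIA science team",                  # provider names the 'author'
        "version": "2.1",                                  # version / edition of the data
        "publisher": "Virtual Solar Observatory",
        "year": 2012,
        "landing_page": "http://example.org/data/aia-171", # persistent citation endpoint
        "data_links": ["http://example.org/data/aia-171/files"],
    }

def format_citation(rec):
    """Render the record as a simple human-readable citation string."""
    return ("{author} ({year}). {title}, version {version}. "
            "{publisher}. {landing_page}").format(**rec)

print(format_citation(landing_page_record()))
```

Because the provider defines the record once, the same data can be cited consistently even when the files themselves move between mirrors or are re-versioned; only the record needs updating.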
12. Our proposal
• “Citation Pages”
• Created by the article author
• Stored with the paper
• Link to the “Landing Page” of the data used
• Describe additional processing that might have
been applied (‘extended methods’)
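A citation page as slide 12 describes it is essentially a small author-side record: pointers to the landing pages of the data used, plus the ‘extended methods’ applied. A minimal sketch of that structure (field names and the example reduction steps are assumptions for illustration):

```python
# Sketch of a 'citation page': created by the article author and stored
# with the paper. It links to the landing pages of the data used and
# records any additional processing. Structure is illustrative only.
def citation_page(landing_pages, processing_steps):
    return {
        "landing_pages": list(landing_pages),   # one entry per data set used
        "processing": list(processing_steps),   # 'extended methods' applied
    }

page = citation_page(
    ["http://example.org/data/aia-171"],
    ["selected 30 minutes of a multi-year mission",
     "rebinned images to 1024x1024 before analysis"],
)
```

Keeping the processing description with the paper, rather than in the citation string itself, is what lets the citation stay short while the reduction steps remain recoverable.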
13. The Challenge
• Creating records for all of the data out there
• Minting DOIs for them
• Making sure they’re maintained & preserved
• Getting authors to use them for citation
• Can work with journal editors
15. Poster, Handout & Links:
http://docs.virtualsolar.org/wiki/Citation
Editor's Notes
Yes, I know, I didn’t put my name on the slides ... it was a rush job. I probably should have said ‘recommendation FOR landing pages’ (see next slide). This was presented on 2012-March-22nd, by Joe Hourclé. It’s discussing the poster (see link at the end), which has co-authors (Wo Chang, Franciel Linares, Bruce Wilson, Giri Palanisamy), but they didn’t have a chance to review any of this before I presented (see next slide). I also have some stuff in there that came more from the original discussion that didn’t make it into the poster, so I should also acknowledge the other people involved in the initial discussion: Paul Groth, Allen Renear, Herbert van de Sompel, and Martie Van Deventer. And part of the notes in here are expanding on questions that people had at the RDAP meeting … so, for the notes portion, I should acknowledge them, but I can’t remember who they all were. I remember Karen Wickett, but we’ll have to wait for the video to check who else raised issues.
This slide was *not* in there when I presented at RDAP. All of the presenter notes were typed in *after* the presentation, so I’ve taken the opportunity to try to make things a little clearer than when I presented. I’ve left those slides untouched, and inserted this slide, and two more at the end.
And it likely would’ve helped if I had set the context up correctly ... however, I jumped straight to the material in what’s now slide #8, talking about the meeting where these recommendations came from. Basically, we had a group trying to make a list of the technical challenges to good data citation. So, first, a quick overview of the current practices in different fields.
An example: “SOHO is a project of international cooperation between ESA and NASA” ... of course, there’s multiple instruments on the spacecraft, and *lots* of different data products. And it’s been running for 15 years, and the data won’t be made ‘final’ until 1-2 years *after* the mission ends (which could be years if they keep getting funding to continue LASCO operations). And the SOHO ack. text has changed over time ... it used to be that you cited SOHO and the specific instrument (with their PI institutions). But that could mean that you’re mentioning 10+ instruments, so that faded out over time. And ‘which mirror was used’ is referring to when there’s copies of the data in more than one place … there’s *also* the issue of instruments with multiple optical pathways through them (eg, Hinode’s SOT telescope), but I lump all of that into ‘observing modes’.
Inst. == Instrument. Ie, ‘sensor’, ‘detector’ … whatever your field calls it. For instance, you might cite the paper because you’re interested in how the mirrors in a telescope were manufactured, or the shielding of the electronics ... we have no way of differentiating between you’re using the data vs. you’re interested in the design of the instrument. And, as was explained in the Dryad Data talk, the ‘authors’ (creators) for the data can be a different group for the research … There are going to be people involved with Phase E (the actual observing portion of the mission) that weren’t there for the construction of the instrument, which in the case of launch delays might’ve been years earlier.
The thing about initial calibrations is ... they’re always wrong. They might be pretty close, but there’s always going to be *something* that we learn over time. For instance, did you know that if you point a telescope at the sun for years, it degrades? If you search for ‘EIT calibration’, you get: http://umbra.nascom.nasa.gov/eit/eit_guide/calibration.htm But it wasn’t the PI team who came up with the best model of how the instrument degrades, it was another scientist: http://umbra.nascom.nasa.gov/eit/eit_guide/offpoint.htm … Should he be the one getting all of the citations whenever someone uses the re-calibrated data, or the original PI team? Or both?
Much of the data that is ‘served’ in the solar physics community is a bunch of files dumped on an FTP server or website. They’re FITS files, so they’re self-documenting, but because the instrument might be changing observing modes, it’s hard to define specific ‘collections’ that have titles. Just naming the instrument (or spacecraft, as shown in the SOHO case) is rarely useful to actually determine what data was used in the research. Each discipline has different needs for describing how they reduced the data before analysis -- we only used 30 minutes of a multi-year mission; we only used one measurement per day (vs. averaging each day’s observations); we only used the strip right in the middle of each image to generate an alternate projection of the sun before analysis; we rebinned the image as our analysis software was written for 1024x1024 images; we only looked at a smaller region, not the whole sun. Or you might’ve done more than one of those. Trying to cram all of that into a citation gets really, really complex. And that’s just for solar physics. Other disciplines use other techniques (eg, the polar folks may only use observations in January, to remove seasonal variations).
BRDI meeting program w/ links to presentations is at: http://sites.nationalacademies.org/PGA/brdi/PGA_064019
I should’ve mentioned that we were trying to look at the technical barriers to citation. Part of the issue was that we weren’t sure if it was the researchers’ or the data provider’s responsibility to generate the citations. We agreed that there were lots of ways to cut things up, but that the data provider should provide a description of what they consider the ‘collections’ to be, so that people have *something* to reference.
Basically, if you want people to cite your data in a specific way, you need to tell them how to cite it. Don’t let them pick the ‘author’ or the ‘title’, as there could be infinite variation. If you want to glob lots of stuff under one title (a ‘collected works of Shakespeare’ type situation) so you don’t have to do as much work, you can do it … but it’s not nearly as useful as being able to cite one of the individual parts, and there’s an economy of scale … so making descriptions of 10 individual parts isn’t going to be 10x the work of describing them as 1 collection.
Leaving the citation generation up to the individual researcher won’t work ... but also, giving a specific ‘you must use this format’ isn’t as useful, because the author may not be using APA or whatever style guide your discipline’s journal uses. (this is where DataCite is useful) There’s no reason why we can’t describe the data *once* using DataCite, DublinCore-SAM (once that profile is done) or similar, and present it in BibTeX or other formats for citation manager software to use. The search engines could even spit out a little report when you download data, as a sort of ‘receipt’, telling you what you’ve just downloaded, and how to cite it (which could provide the basis for the ‘citation page’; next slide). We also don’t specifically define the granularity of the collection described -- it could be an individual file; all data for the life of the investigation; broken out by year; divided up by observing mode ... or you could do multiple, as there’s nothing to say that a file can’t be a member of more than one collection; the provider can define whatever groupings make sense for their data. And they can add new ‘virtual’ collections later, if they find lots of people are repeatedly citing one specific grouping of data. The landing pages should also be machine harvestable -- XML w/ XSLT transforms to make it XHTML, RDFa, HTML+microformats, HTTP content negotiation, links to OAI-ORE, etc. If you have lots (but not TBs) of files to download, it can link to OAI-ORE, MetaLink, sparse bags (BagIt) or similar to automate downloading of those files. We talked about lots of different technical ways of doing it, but we’re likely going to need to test them and see which work best for which fields. ... I could keep typing, but it might make more sense to just go read the handout. (see the last slide for a link)
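The ‘describe the data once, present it in BibTeX or other formats’ idea in the note above amounts to HTTP content negotiation over a single metadata record. A minimal sketch of that pattern; the record fields, the `application/x-bibtex` media type mapping, and the renderers are illustrative assumptions, not a deployed service:

```python
# One landing-page record, rendered differently depending on the client's
# Accept header: HTML for browsers, BibTeX for citation-manager software.
# Field names and the negotiation logic are deliberately simplified.
RECORD = {
    "key": "aia171",
    "title": "SDO/AIA Level 1 171 Angstrom Images",
    "author": "SDO/AIA science team",
    "year": 2012,
    "url": "http://example.org/data/aia-171",
}

def as_bibtex(rec):
    """Render the record as a BibTeX @misc entry."""
    return ("@misc{%(key)s,\n"
            "  title  = {%(title)s},\n"
            "  author = {%(author)s},\n"
            "  year   = {%(year)s},\n"
            "  url    = {%(url)s}\n"
            "}" % rec)

def as_html(rec):
    """Render the record as a human-readable HTML fragment."""
    return "<h1>%(title)s</h1><p>%(author)s, %(year)s</p>" % rec

def negotiate(accept_header):
    """Pick a representation from the Accept header (very simplified)."""
    if "application/x-bibtex" in accept_header:
        return as_bibtex(RECORD)
    return as_html(RECORD)
```

The point is that the description lives in one place; adding another output format (RDFa, DataCite XML, a download ‘receipt’) means adding a renderer, not re-cataloging the data.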
If the journal publisher won't take it, then an Institutional Repository, or even the author's website, is better than nothing. As the data provenance models mature (e.g., the work at globalchange.gov), we may be able to make the processing description machine readable. In those cases where you can store the data with the publication (e.g., the data's small enough), you can do it, but you still want to describe how the data came to be, for reproducibility. The citation page can also be used for 'enhanced publications' or 'interactive data pages', to allow readers of the paper to quickly visualize the data; change the scaling or field of view of a plot; reorder or filter tabular data; etc. The citation page may just link to a single published dataset, or it may aggregate hundreds of datasets. If you only have a few (<5) sources of data, consider citing them individually rather than in aggregate, even if you have limits on how many citations you're allowed -- without the data, you wouldn't have been able to do the research.
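A citation page that aggregates several datasets and records how each was processed might look like the structure below. This is a sketch under stated assumptions: the field names, DOIs, and processing notes are all invented, and a real machine-readable version would use a proper provenance model rather than free-text notes.

```python
# Hypothetical 'citation page' record: one paper, several datasets,
# each with a human-readable note on how it was processed.
citation_page = {
    "paper_doi": "10.0000/example-paper",    # invented DOI
    "datasets": [
        {"doi": "10.0000/dataset-a",
         "processing": "subset to 2010; despiked"},
        {"doi": "10.0000/dataset-b",
         "processing": "rebinned to daily averages"},
    ],
}


def dataset_dois(page):
    """Every dataset DOI this citation page links to."""
    return [d["doi"] for d in page["datasets"]]


print(dataset_dois(citation_page))
```

With only two sources, as here, the advice in the text applies: cite both datasets individually in the paper, and let the citation page carry the processing detail.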
If we're going to do this efficiently, we likely need to do it by discipline -- having each data archive describe its own data would be nice, but it's a lot of time reading documentation, interpreting what each field/attribute means for their data, etc. We get an efficiency of scale working by discipline. We'd likely need at least two people: (1) a person experienced with metadata/cataloging, and (2) a discipline scientist who can verify the work, and possibly write missing documentation. New postdocs often write the best documentation, as they come at it fresh and have to research things, rather than just assuming 'everyone' knows about that little quirk that's not worth mentioning to someone in the field, but would likely not be known by people from other disciplines.
This slide was *not* in there when I presented at RDAP. I usually acknowledge the image I'm using in the background ... I even had a master slide so I could insert it easily ... but I forgot to put it in. This time, I've linked it to the mocked-up 'landing page' for SDO/AIA Lev1 171 Angstrom images. And wow, converting from Keynote to PowerPoint really made this slide ugly because of the link in it. (I should probably go and modify the page slightly, to explain that it's a mockup, now that I'm giving out the URL to it.)
This slide was *not* in there when I presented at RDAP. The URL goes to the Virtual Solar Observatory's wiki. It has links to all of the references mentioned in the poster & handouts ... and I should probably add 'MetaLink' and 'BagIt' and other stuff ... but if you follow that link, also follow the ESIP wiki link; they have lots of other references, as do most of the other ones (but those have the same 'fixed in time' issue as citing a paper).