DataUp: An overview for the DataONE Users GroupCarly Strasser
This document summarizes a presentation on DataUp, an open source tool to help researchers better manage, organize, share, and archive their tabular data. DataUp includes features like a best practices check, metadata generation, citation creation, and ability to upload data to repositories. It is being developed as an Excel add-in and web application to make it easy for researchers to find and integrate into their existing workflows. Feedback was sought on features and engaging the community to help ensure sustainability of the project.
Webinar presented on December 5, 2012, by Joan Starr and Perry Willett of CDL/UC3, and Lisa Federer and Claudia Horning from UCLA. Part of the ACRL Digital Curation Interest (DCIG) Group Webinar Series.
Publishing your research: Research Data Management (Introduction) Jamie Bisset
Publishing your research: Research Data Management (Introduction) (November 2013) slides. Delivered as part of the Durham University Researcher Development Programme. Further Training available at https://www.dur.ac.uk/library/research/training/
Designing the Garden: Getting Grounded in Linked DataJenn Riley
Riley, Jenn. “Designing the Garden: Getting Grounded in Linked Data.” Beyond the Looking Glass: Real World Linked Data. What Does it Take to Make it Work? ALCTS Preconference, San Francisco, CA, June 26, 2015.
Riley, Jenn. “Getting Comfortable with Metadata Reuse.” O Rare! Performance in Special Collections: The 54th Annual RBMS Preconference, Minneapolis, June 23 – 26, 2013
DataUp: An overview for the DataONE Users GroupCarly Strasser
This document summarizes a presentation on DataUp, an open source tool to help researchers better manage, organize, share, and archive their tabular data. DataUp includes features like a best practices check, metadata generation, citation creation, and ability to upload data to repositories. It is being developed as an Excel add-in and web application to make it easy for researchers to find and integrate into their existing workflows. Feedback was sought on features and engaging the community to help ensure sustainability of the project.
Webinar presented on December 5, 2012, by Joan Starr and Perry Willett of CDL/UC3, and Lisa Federer and Claudia Horning from UCLA. Part of the ACRL Digital Curation Interest (DCIG) Group Webinar Series.
Publishing your research: Research Data Management (Introduction) Jamie Bisset
Publishing your research: Research Data Management (Introduction) (November 2013) slides. Delivered as part of the Durham University Researcher Development Programme. Further Training available at https://www.dur.ac.uk/library/research/training/
Designing the Garden: Getting Grounded in Linked DataJenn Riley
Riley, Jenn. “Designing the Garden: Getting Grounded in Linked Data.” Beyond the Looking Glass: Real World Linked Data. What Does it Take to Make it Work? ALCTS Preconference, San Francisco, CA, June 26, 2015.
Riley, Jenn. “Getting Comfortable with Metadata Reuse.” O Rare! Performance in Special Collections: The 54th Annual RBMS Preconference, Minneapolis, June 23 – 26, 2013
Transforming Networking within ESIP using ResearchBitErin Robinson
Geoscientists increasingly need interdisciplinary teams to solve their research problems. Currently, geoscientists use Research Networking (RN) systems to connect with each other and find people of similar and dissimilar interests. As we shift to digitally mediated scholarship, we need innovative methods for scholarly communication. Formal methods for scholarly communication are undergoing vast transformation owing to the open-access movement and reproducible research. However, informal scholarly communication that takes place at professional society meetings and conferences, like AGU, has received limited research attention relying primarily on serendipitous interaction.
The ResearchBit project aims to fundamentally improve informal methods of scholarly communication by leveraging the serendipitous interactions of researchers and making them more aware of co-located potential collaborators with mutual interests. This presentation will describe our preliminary hardware testing done at the Federation for Earth Science Information Partners (ESIP) Summer meeting this past July and the initial recommendation system design. The presentation will also cover the cultural shifts and hurdles to introducing new technology, the privacy concerns of tracking technology and how we are addressing those new issues.
Presented at 2015 AGU Fall Meeting
https://agu.confex.com/agu/fm15/webprogram/Paper60869.html
This document discusses issues related to science research data. It notes that practices in science research drive institutional approaches to supporting research. The data lifecycle is discussed, including data management planning, storage, publishing, and more. Challenges with science data are also addressed, such as reproducibility and sharing practices. New tools and initiatives are emerging to help address these challenges, including crowd-funding of science, reproducibility initiatives, unique researcher identifiers, sharing code and data, and altmetrics.
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"hypertext2007
Carole Goble, Professor in the School of Computer Science in the University of Manchester. This is the slides of the keynote presentation opening the Hypertext 2007 Conference in Manchester, UK on the 10th September 2007.
Visit http://www.ht07.org for more details
When Data is Everywhere, Where Do You Start?: Using Drupal to Manage, Distrib...Forum One
This document discusses how Drupal can be used to manage, distribute, and visualize data on websites. It notes that data comes from multiple sources and formats and managing it presents many challenges. However, Drupal allows users to treat basic data as content that can be imported, stored, and presented using modules like Views. More advanced visualization and analysis of data is also possible using third-party libraries that integrate with Drupal. The document encourages readers to start small by creating a "/data" page on their own websites that catalogs and shares available data sets and developer resources to help distribute their data.
MPhil Lecture of Data Vis for PresentationShawn Day
This document provides an introduction to structured data presentation tools for digital humanities scholars. It discusses Exhibit, a lightweight framework for presenting, searching, and faceted browsing of digital collections. The document gives an overview of Exhibit's capabilities and includes code examples for basic implementation. It also discusses other tools like Omeka, Prezi, and visualizations in TimeFlow, Google Fusion Tables, Dipity and Many Eyes. The document concludes with a hands-on exercise to install and configure Exhibit.
A talk at the RPI-NSF Workshop on Multiscale Modeling of Complex Data, September 12, 2011, Troy NY, USA.
We have made much progress over the past decade toward effectively
harnessing the collective power of IT resources distributed across the
globe. In fields such as high-energy physics, astronomy, and climate,
thousands benefit daily from tools that manage and analyze large
quantities of data produced and consumed by large collaborative teams.
But we now face a far greater challenge: Exploding data volumes and powerful simulation tools mean that far more--ultimately
most?--researchers will soon require capabilities not so different from those used by these big-science teams. How is the general population of researchers and institutions to meet these needs? Must every lab be filled
with computers loaded with sophisticated software, and every researcher become an information technology (IT) specialist? Can we possibly afford to equip our labs in this way, and where would we find the experts to operate them?
Consumers and businesses face similar challenges, and industry has
responded by moving IT out of homes and offices to so-called cloud providers (e.g., GMail, Google Docs, Salesforce), slashing costs and complexity. I suggest that by similarly moving research IT out of the lab, we can realize comparable economies of scale and reductions in complexity. More importantly, we can free researchers from the burden of managing IT, giving them back their time to focus on research and empowering them to go beyond the scope of what was previously possible.
I describe work we are doing at the Computation Institute to realize this approach, focusing initially on research data lifecycle management. I present promising results obtained to date and suggest a path towards
large-scale delivery of these capabilities.
EMBL Australian Bioinformatics Resource AHM - Data CommonsVivien Bonazzi
This document discusses the development of the NIH Data Commons, which aims to create a shared framework and infrastructure for biomedical data. It notes the increasing amounts of data being generated and the need for data sharing and interoperability. The Data Commons framework treats data, tools, and publications as digital objects that are findable, accessible, interoperable and reusable. Current pilots include deploying reference datasets in the cloud, indexing data and tools, and a credits system for cloud resources. Challenges discussed include metrics, costs, standards, incentives and sustainability. The framework's relevance for supporting open data in Australia is also addressed.
The document describes a summer institute on discovering big data held in San Diego from August 5-9, 2013. It discusses several topics related to big data in neuroscience including available resources, how to find and connect relevant information, challenges around data integration from disparate sources, and using ontologies and machine learning for tasks like data tagging.
deck from talk at YOW Data in Sydney, covers VariantSpark, custom Apache Spark Machine Learning library and also GT-Scan2 using AWS Lambda architecture for bioinformatics
The document provides an overview of the development of the NIH Data Commons. It discusses factors driving the need for a data commons, including large amounts of data being generated and increased support for data sharing. It outlines the goals of making data findable, accessible, interoperable and reusable. Several pilots are exploring the feasibility of the commons framework, including placing large datasets in the cloud and developing indexing methods. Considerations in fully realizing the commons are also discussed, such as standards, discoverability, policies and incentives.
This document contains stable isotope data from algal and reference samples collected from Wash Cresc Lake. It includes a table with sample identifiers, weight, carbon and nitrogen content, carbon and nitrogen isotope ratios, and other metadata. Additionally, it notes that the data is from Stephanie Hampton's 2010 ESA Workshop presentation and is from an Excel file titled "Wash Cres Lake Dec 15 Dont_Use.xls". Regression statistics are provided for the isotope data.
Keeping Up to Date on Data Management - UC3 Data Curation WorkshopCarly Strasser
This document provides resources for staying up to date with data management. It lists toolboxes, blogs, and websites that provide information on data management plans and practices. It also recommends attending conferences, webinars, and following listservs and people on Twitter to learn about current issues and innovations in data management. Key organizations mentioned include UC3, IASSIST, CNI, and domain-specific conferences. Non-US resources include the Digital Curation Centre, Australian National Data Center, and Canadian Association of Research Libraries.
This document lists locations visited by someone, including staircases, hallways, and rooms inside a building numbered 1 through 4 as well as the top lawn and front drive of a property. The person is recorded as visiting rooms 1, 2, 4, and the top lawn multiple times over the course of their movements.
This document discusses dataset citation and identifiers. It begins with an introduction to data citation and the benefits it provides. It then covers identifiers in more detail, explaining what they are, how they work, and examples. The document also discusses the EZID service for assigning persistent identifiers to datasets and metadata. It provides information on Digital Object Identifiers (DOIs) and Archival Resource Keys (ARKs), comparing the two. Finally, it encourages the use of identifiers and data citation and provides contacts for more information.
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015Carly Strasser
The document discusses foundation support for data science tools and skills training. It notes that while career tracks and barriers to interdisciplinary work remain unchanged, computational and data analysis skills are increasingly important for researchers. The Data-Driven Discovery Initiative aims to catalyze shifts that encourage and reward data-intensive research. This includes making data science resources more accessible and ensuring students understand data analysis by 2020. The initiative promotes tools for data-driven research and funds environments welcoming data scientists to biology.
Funders and publishers have something in common: for better or worse, we have the ability to influence the behavior of researchers. This talk will focus on what both groups can do to improve research now and in the future.
DCXL Lightning Talk: Archiving Small DatasetsCarly Strasser
The document discusses the development of an Excel add-in to help scientists better manage, share, and archive their small science data sets. A survey found that many scientists are not trained in data management and do not share or archive their data. The add-in aims to address this by allowing scientists to check and fix CSV files, add metadata templates, generate citations, and deposit data and metadata in repositories directly from Excel for improved data preservation, documentation, and sharing.
Transforming Networking within ESIP using ResearchBitErin Robinson
Geoscientists increasingly need interdisciplinary teams to solve their research problems. Currently, geoscientists use Research Networking (RN) systems to connect with each other and find people of similar and dissimilar interests. As we shift to digitally mediated scholarship, we need innovative methods for scholarly communication. Formal methods for scholarly communication are undergoing vast transformation owing to the open-access movement and reproducible research. However, informal scholarly communication that takes place at professional society meetings and conferences, like AGU, has received limited research attention relying primarily on serendipitous interaction.
The ResearchBit project aims to fundamentally improve informal methods of scholarly communication by leveraging the serendipitous interactions of researchers and making them more aware of co-located potential collaborators with mutual interests. This presentation will describe our preliminary hardware testing done at the Federation for Earth Science Information Partners (ESIP) Summer meeting this past July and the initial recommendation system design. The presentation will also cover the cultural shifts and hurdles to introducing new technology, the privacy concerns of tracking technology and how we are addressing those new issues.
Presented at 2015 AGU Fall Meeting
https://agu.confex.com/agu/fm15/webprogram/Paper60869.html
This document discusses issues related to science research data. It notes that practices in science research drive institutional approaches to supporting research. The data lifecycle is discussed, including data management planning, storage, publishing, and more. Challenges with science data are also addressed, such as reproducibility and sharing practices. New tools and initiatives are emerging to help address these challenges, including crowd-funding of science, reproducibility initiatives, unique researcher identifiers, sharing code and data, and altmetrics.
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"hypertext2007
Carole Goble, Professor in the School of Computer Science in the University of Manchester. This is the slides of the keynote presentation opening the Hypertext 2007 Conference in Manchester, UK on the 10th September 2007.
Visit http://www.ht07.org for more details
When Data is Everywhere, Where Do You Start?: Using Drupal to Manage, Distrib...Forum One
This document discusses how Drupal can be used to manage, distribute, and visualize data on websites. It notes that data comes from multiple sources and formats and managing it presents many challenges. However, Drupal allows users to treat basic data as content that can be imported, stored, and presented using modules like Views. More advanced visualization and analysis of data is also possible using third-party libraries that integrate with Drupal. The document encourages readers to start small by creating a "/data" page on their own websites that catalogs and shares available data sets and developer resources to help distribute their data.
MPhil Lecture of Data Vis for PresentationShawn Day
This document provides an introduction to structured data presentation tools for digital humanities scholars. It discusses Exhibit, a lightweight framework for presenting, searching, and faceted browsing of digital collections. The document gives an overview of Exhibit's capabilities and includes code examples for basic implementation. It also discusses other tools like Omeka, Prezi, and visualizations in TimeFlow, Google Fusion Tables, Dipity and Many Eyes. The document concludes with a hands-on exercise to install and configure Exhibit.
A talk at the RPI-NSF Workshop on Multiscale Modeling of Complex Data, September 12, 2011, Troy NY, USA.
We have made much progress over the past decade toward effectively
harnessing the collective power of IT resources distributed across the
globe. In fields such as high-energy physics, astronomy, and climate,
thousands benefit daily from tools that manage and analyze large
quantities of data produced and consumed by large collaborative teams.
But we now face a far greater challenge: Exploding data volumes and powerful simulation tools mean that far more--ultimately
most?--researchers will soon require capabilities not so different from those used by these big-science teams. How is the general population of researchers and institutions to meet these needs? Must every lab be filled
with computers loaded with sophisticated software, and every researcher become an information technology (IT) specialist? Can we possibly afford to equip our labs in this way, and where would we find the experts to operate them?
Consumers and businesses face similar challenges, and industry has
responded by moving IT out of homes and offices to so-called cloud providers (e.g., GMail, Google Docs, Salesforce), slashing costs and complexity. I suggest that by similarly moving research IT out of the lab, we can realize comparable economies of scale and reductions in complexity. More importantly, we can free researchers from the burden of managing IT, giving them back their time to focus on research and empowering them to go beyond the scope of what was previously possible.
I describe work we are doing at the Computation Institute to realize this approach, focusing initially on research data lifecycle management. I present promising results obtained to date and suggest a path towards
large-scale delivery of these capabilities.
EMBL Australian Bioinformatics Resource AHM - Data CommonsVivien Bonazzi
This document discusses the development of the NIH Data Commons, which aims to create a shared framework and infrastructure for biomedical data. It notes the increasing amounts of data being generated and the need for data sharing and interoperability. The Data Commons framework treats data, tools, and publications as digital objects that are findable, accessible, interoperable and reusable. Current pilots include deploying reference datasets in the cloud, indexing data and tools, and a credits system for cloud resources. Challenges discussed include metrics, costs, standards, incentives and sustainability. The framework's relevance for supporting open data in Australia is also addressed.
The document describes a summer institute on discovering big data held in San Diego from August 5-9, 2013. It discusses several topics related to big data in neuroscience including available resources, how to find and connect relevant information, challenges around data integration from disparate sources, and using ontologies and machine learning for tasks like data tagging.
deck from talk at YOW Data in Sydney, covers VariantSpark, custom Apache Spark Machine Learning library and also GT-Scan2 using AWS Lambda architecture for bioinformatics
The document provides an overview of the development of the NIH Data Commons. It discusses factors driving the need for a data commons, including large amounts of data being generated and increased support for data sharing. It outlines the goals of making data findable, accessible, interoperable and reusable. Several pilots are exploring the feasibility of the commons framework, including placing large datasets in the cloud and developing indexing methods. Considerations in fully realizing the commons are also discussed, such as standards, discoverability, policies and incentives.
This document contains stable isotope data from algal and reference samples collected from Wash Cresc Lake. It includes a table with sample identifiers, weight, carbon and nitrogen content, carbon and nitrogen isotope ratios, and other metadata. Additionally, it notes that the data is from Stephanie Hampton's 2010 ESA Workshop presentation and is from an Excel file titled "Wash Cres Lake Dec 15 Dont_Use.xls". Regression statistics are provided for the isotope data.
Keeping Up to Date on Data Management - UC3 Data Curation WorkshopCarly Strasser
This document provides resources for staying up to date with data management. It lists toolboxes, blogs, and websites that provide information on data management plans and practices. It also recommends attending conferences, webinars, and following listservs and people on Twitter to learn about current issues and innovations in data management. Key organizations mentioned include UC3, IASSIST, CNI, and domain-specific conferences. Non-US resources include the Digital Curation Centre, Australian National Data Center, and Canadian Association of Research Libraries.
This document lists locations visited by someone, including staircases, hallways, and rooms inside a building numbered 1 through 4 as well as the top lawn and front drive of a property. The person is recorded as visiting rooms 1, 2, 4, and the top lawn multiple times over the course of their movements.
This document discusses dataset citation and identifiers. It begins with an introduction to data citation and the benefits it provides. It then covers identifiers in more detail, explaining what they are, how they work, and examples. The document also discusses the EZID service for assigning persistent identifiers to datasets and metadata. It provides information on Digital Object Identifiers (DOIs) and Archival Resource Keys (ARKs), comparing the two. Finally, it encourages the use of identifiers and data citation and provides contacts for more information.
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015Carly Strasser
The document discusses foundation support for data science tools and skills training. It notes that while career tracks and barriers to interdisciplinary work remain unchanged, computational and data analysis skills are increasingly important for researchers. The Data-Driven Discovery Initiative aims to catalyze shifts that encourage and reward data-intensive research. This includes making data science resources more accessible and ensuring students understand data analysis by 2020. The initiative promotes tools for data-driven research and funds environments welcoming data scientists to biology.
Funders and publishers have something in common: for better or worse, we have the ability to influence the behavior of researchers. This talk will focus on what both groups can do to improve research now and in the future.
DCXL Lightning Talk: Archiving Small DatasetsCarly Strasser
The document discusses the development of an Excel add-in to help scientists better manage, share, and archive their small science data sets. A survey found that many scientists are not trained in data management and do not share or archive their data. The add-in aims to address this by allowing scientists to check and fix CSV files, add metadata templates, generate citations, and deposit data and metadata in repositories directly from Excel for improved data preservation, documentation, and sharing.
1. The document discusses tips and tools for data stewardship, including planning for data management, best practices for data collection and organization, documenting workflows, creating metadata, and sharing data.
2. It emphasizes writing a data management plan, keeping raw data separate and secure, using version control and backups, and revisiting plans periodically.
3. The document encourages learning skills for data management, using resources like libraries and repositories, and embracing changes that support more open and reproducible science.
CDL has recently launched a new project dubbed Digital Curation for Excel (DCXL), funded by the Gordon and Betty Moore Foundation and Microsoft Research. The goal of the DCXL project is to facilitate data management, sharing, and archiving for earth, environmental, and ecological scientists. The main result from the project will be an open source add-in for Microsoft Excel that will assist scientists in preparing their Excel data for sharing.
Amundsen is a metadata-driven application developed by Lyft to solve data discovery challenges. It provides a search-based UI and uses a distributed architecture with various microservices to index and serve metadata from multiple sources. Key components include a metadata service using Neo4j, a search service using Elasticsearch, and a frontend. The tool has been hugely successful at Lyft and is now open source. Future work includes expanding metadata coverage and integrating with other tools.
How Cyverse.org enables scalable data discoverability and re-useMatthew Vaughn
Cyverse.org designs, builds, and operates an innovative, integrated life sciences cyberinfrastructure. It provides data management and analysis capabilities with point and click, cloud, API, and command-line interfaces that engage users of any computing proficiency and is based on an extensible platform that integrates local and national-scale HPC, storage, and cloud resources. Cyverse directly supports thousands of users who store and access over 2PB of research data, use millions of compute hours annually, and participate in the platform's improvement, plus a secondary user community from partner projects that have built atop it. Cyverse is organized around "Data Store" and "App Catalog" services, each of which enables users to upload digital research assets that can be kept private, shared, or made public. Recently, Cyverse has been transitioning from passively enabling digital sharing towards active facilitation. It is partnering with repositories like NCBI SRA to enable direct submission from Cyverse applications, adopting commonly-used ontologies, enabling import/export of virtual machine images, developing metadata-driven persistent landing pages for data sets, and providing DOI (and other identifier) services. These new features are expected to further catalyze growth of an interoperable, interconnected network of shared research infrastructure across the biological sciences.
NSF Workshop Data and Software Citation, 6-7 June 2016, Boston USA, Software Panel
FIndable, Accessible, Interoperable, Reusable Software and Data Citation: Europe, Research Objects, and BioSchemas.org
The document discusses data management practices in earth and environmental sciences. It notes that currently, many scientists in these fields are not taught proper data management skills. It then outlines best practices for data management, including planning, organizing data collection, quality control, adding metadata, workflows, and data sharing. The document also summarizes requirements for data management plans from funders like the National Science Foundation and provides examples of resources like data repositories and tools that can help scientists improve their data management.
Airbnb aims to democratize data within the company by building a graph database of all internal data resources connected by relationships. This graph is queried through a search interface to help employees explore, discover, and build trust in company data. Challenges include modeling complex data dependencies and proxy nodes, merging graph updates from different sources, and designing a data-dense interface simply. Future goals are to gamify content production, deliver recommendations, certify trusted content, and analyze the information network.
Data Management for Scientists: Workshop at Ocean Sciences 2012Carly Strasser
Here are some random notes on data from Peter's old lab:
- Table with stable isotope data from algal samples collected at Wash Cresc Lake on Dec 16. Includes sample IDs, weights, %C, delta C-13 and N-15 values, and spectrometer numbers.
- Reference standards analyzed to calculate sample delta values. SDs reported for delta C-13 and N-15 of reference standards.
- Samples include algae from shore, lake outlet, and various collection points around lake labeled ALG01, 03, 05, 07.
- Delta values range from -30.17 to -21.11 for C-13 and -1.65 to 0.87 for N-15
UC Santa Cruz: Data Management for ScientistsCarly Strasser
This document contains data from an isotopic analysis of algal samples from Wash Cresc Lake. It includes the sample identifiers, weights, carbon and nitrogen composition percentages and delta values, and spectrometer numbers. There are notes that some of the samples are from the lake outlet or shore. The data is organized in a table with sample positions and reference standards included for quality control.
Palestra, em inglês, "Publishing Data on the Web" sobre o documento Data on the Web Best Practices, apresentada na Semana de Metodologia NIC.br, em São Paulo, dia 12 de abril de 2016.
DataEd Slides: Growing Practical Data Governance ProgramsDATAVERSITY
At its core, Data Governance (DG) is managing data with guidance. This immediately provokes the question: Would you tolerate any of your assets to be managed without guidance? (In all likelihood, your organization has been managing data without adequate guidance, and this accounts for its current, less-than-optimal state.) This program provides a practical guide to implementing DG or recharging your existing program. It provides an understanding of what Data Governance functions are required and how they fit with other Data Management disciplines. Understanding these aspects is a necessary prerequisite to eliminate the ambiguity that often surrounds initial discussions and implement effective Data Governance/stewardship programs that manage data in support of the organizational strategy. Program learning objectives include:
• Understanding why Data Governance can be tricky for organizations due to data’s confounding characteristics
• Strategy #1: Keeping DG practically focused
• Strategy #2: DG must exist at the same level as HR
• Strategy #3: Gradually add ingredients
• Data Governance in action: storytelling
The document discusses data, data science, and finding data sources. It defines data as raw facts about the world and notes that data comes from various sources like government, scientific research, citizens, and private companies. It then discusses the growth of digital data and issues around open data. The document defines data science as using analysis methods to describe facts, detect patterns, and test hypotheses. Finally, it provides tips on finding needed data, such as searching open data sources, APIs, scraping, and joining datasets.
Synopsis: Biological research increasingly depends on computational analysis of large and complex data sets. These slides were used in a one-hour webinar that provided a comprehensive look at platforms, tools, and services for large-scale data analysis provided by CyVerse, a cyberinfrastructure (CI) project of the US National Science Foundation. The webinar is available at https://youtu.be/QErkkoDFdyU.
The webinar was aimed at viewers interested in the compute and platform architecture of Cyverse. It introduced the basic components of CyVerse CI including the Discovery Environment (a simple web portal for managing data, analyses and workflows); the Data Store (scalable, secure, and reliable storage for terabyte-scale data management); Atmosphere (one-click, on-demand cloud computing); and the Visual and Interactive Computing Environment (flexible implementations of Jupyter Labs, Rstudio, and R Shiny).
CyVerse provides a full stack of CI services with entry points for computational novices and software developers. All resources are freely available to the community and free accounts can be obtained at user.cyverse.org. New users can check out the Cyverse Youtube channel (https://www.youtube.com/channel/UC-gvdjTz9rq6RovZ57LoDDA/featured) for webinars on how to get started with Cyverse and how to use specific tools and workflows.
Speaker: Jason Williams. Assistant Director, External Collaborations Cold Spring Harbor Laboratory, DNA Learning Center and Education, Outreach and Training lead for CyVerse.
ESA Ignite talk on UC3 Dash platform for data sharingCarly Strasser
Ignite talk (20 slides / 15 seconds per slide) for ESA 2014 meeting in Sacramento, CA 12 August 2014. On the Dash platform for helping researchers manage and share their data via institutional repositories
Data Management for Mountain Observatories WorkshopCarly Strasser
This document provides tips and recommendations for data stewardship. It encourages enabling data sharing, exploring new tools, and working with libraries and researchers to help change systems. It emphasizes that data is more important than ever due to digital data and complex workflows. Proper data management helps ensure reproducibility, credibility, collaboration and faster progress. Researchers must have data management plans and make their data open and useful to others. They should include data in their credentials and publications to get proper recognition. The document recommends tools and resources for planning, documenting, and getting credit for data work.
Libraries & Research Data Management for CO Alliance of Resrch LibrariesCarly Strasser
Keynote presentation for the Colorado Alliance of Research Libraries 2014 Research Data Management Conference, 11 July 2014. Focuses on why data management and sharing is important, and the role of libraries.
Open Science for Australian Institute of Marine Science WorkshopCarly Strasser
*Please excuse the typos :)
Presentation on open science and open data for the Australian Institute of Marine Science (AIMS) workshop on "Raising your research profile using research data". 18 June 2014.
Data management overview and UC3 tools for IASSIST 2014Carly Strasser
Presentation to introduce current landscape of data management and UC3 tools and services that support data sharing. For IASSIST in Toronto, 5 June 2014.
The document discusses repository choices for research data, including institutional and discipline-specific repositories. It notes that institutional repositories tell the story of a researcher's work, but may only include some data from a given paper, while discipline-specific repositories could include all data but are less discoverable. The document then outlines UCSF's DataShare repository, including its goals of lowering barriers to data sharing and building an engaged user community. It proposes expanding DataShare to be UC-wide under the name "UC Dash" and customizing it for each campus using the Merritt repository platform. Features and future enhancements are also listed.
This document discusses data publication and sharing. It defines key aspects of data publication as making data available, citable, and trustworthy. It provides examples of how data can be published, including as supplemental materials, in data papers, or as standalone datasets with rich metadata. The document also summarizes a survey of researchers' views on data publication, sharing, and citation. It promotes solving simple problems first, like enabling easy sharing and citable datasets, to advance data publishing and open science.
Data Publication for UC Davis Publish or PerishCarly Strasser
Intro presentation for panel on going beyond publishing journal articles. UC Davis "Publish or Perish?" Event, 13 Feb 2014. Sorry about missing gradient on some of slides!
October 18, 2013 @ Kennedy Library, Data Studio, Cal Poly. We hear about all things “open” these days: open access, open source, open data, open science, et cetera. But what does it really mean for how we do science? How are things changing, and what are the implications for individual researchers?
Cal Poly - Data Management: Who knew it was a hot topic?Carly Strasser
October 17, 2013 @ Robert E. Kennedy Library, Data Studio, California Polytechnic State University.
New mandates, announcements, memos, and requirements are emerging that encourage better data management, data sharing, and data preservation. In this presentation, data curation specialist Carly Strasser, PhD, offers a lay of the data management land by discussing recent events, resources, and new directions for data stewardship.
Cal Poly - Data Management and the DMPToolCarly Strasser
October 17, 2013 @ Robert E. Kennedy Library, Data Studio, California Polytechnic State University.
Many funders now require researchers to submit a Data Management Plan alongside their project proposals. The DMPTool is a free, online wizard that helps you create a data management plan specific to your project, and provides you with links and resources for ensuring your plan is successful.
Cal Poly - Data Management for ResearchersCarly Strasser
The document describes stable isotope data from algal samples collected from Wash Cresc Lake. It includes a table with sample identifiers, weights, carbon and nitrogen composition percentages, delta C-13 and delta N-15 values, and calibration corrections for delta values. The table contains stable isotope measurements for 23 algal samples. Additional context is provided by notes that it is stable isotope data from algal samples collected on December 16th and references are given for the standard deviations of delta values.
1. DataUp:
Helping
manage &
archive data
Carly Strasser, PhD
From Flickr by kaniths
California Digital Library
@carlystrasser
AGU 2012
2. Digital data
+
Complex
workflows
From Calisphere via San Jose Public Library
3. The Fallout
Data
Reuse
Data
Sharing
Data
Management
4. Hurdles to
From Flickr by iowa_spirit_walker
Data
Stewardship
Cost
Confusion about standards
Disparate datasets
Lack of training
Fear of lost rights or benefits
No incentives
5. ?
The Fallout
Data
Reuse
Data
Sharing
Data
Management
8. Facilitate
Archiving
Data
management & Data Reuse &
Sharing Reproducibility
organization
Publishing
9. Open Source
Tool Add-in & Web
Application
Earth,
environmental,
?
ecological
researchers
10. Add-in
• Software you download & install
• Appears as “ribbon” in Excel
• Works for Windows Excel 2007+
Web-based application
• Website that does something
with user’s files
• Any platform
• But… new user interface
12. ~200 scientists
• No data preservation
– Unaware of archives
– Resistant to sharing
• Poor data documentation
• 90% use Excel + other programs
13. Requirements
DataUp Features
1. Best practices check
2. Generate metadata
3. Generate identifier +
citation
4. Post data to repository
From Flickr by Thewmatt
30. Establish
Partnerships
From animationresources.org
Engage Developers
Build Community
31. Website dataup.cdlib.org
Twitter feed @DataUpCDL
Facebook facebook.com/DataUpCDL
Code site bitbucket.org/dataup/main
My website carlystrasser.net
Email me carlystrasser@gmail.com
Tweet me @carlystrasser
My slides slideshare.net/carlystrasser
CDL Blog datapub.cdlib.org
Editor's Notes
Data reuse = meta-analysis!
Data reuse = meta-analysis!
Fields of research: (including computer science, informatics, and social science)Technologies: (semantic mediation, scientific workflows, visualization)