Funders and publishers have something in common: for better or worse, we have the ability to influence the behavior of researchers. This talk will focus on what both groups can do to improve research now and in the future.
NISO Webinar on data curation services at the CDL - Carly Strasser
"Building communities and Services in Support of Data-Intensive Research". Webinar on 18 Sept 2013 for the NISO Webinar Series. This was part 2 of 2 for Data Curation
Data Management for Mountain Observatories Workshop - Carly Strasser
Keynote presentation for 2014 Mountain Observatories Workshop, 16 July 2014.
Abstract:
While methods for collecting data are well taught, there is less emphasis on managing the resulting data effectively. New mandates, announcements, memos, and requirements from agencies and publishers are emerging that encourage better data management, data sharing, and data preservation. Scientists with good management skills will be able to maximize the productivity of their own research, effectively and efficiently share their data with the community, and benefit from the re-use of their data by others. I will offer an overview of the data management landscape, discussing recent events, resources, and new directions for data stewardship. I will also cover best practices for data management, which will facilitate data sharing and reuse, and introduce tools researchers can use to help in their data stewardship endeavours.
Data Repositories: Recommendation, Certification and Models for Cost Recovery - Anita de Waard
Talk at NITRD Workshop "Measuring the Impact of Digital Repositories" February 28 – March 1, 2017 https://www.nitrd.gov/nitrdgroups/index.php?title=DigitalRepositories
ESA Ignite talk on UC3 Dash platform for data sharing - Carly Strasser
Ignite talk (20 slides / 15 seconds per slide) for ESA 2014 meeting in Sacramento, CA 12 August 2014. On the Dash platform for helping researchers manage and share their data via institutional repositories
Data Citation Implementation Guidelines By Tim Clark - datascienceiqss
This talk presents a set of detailed technical recommendations for operationalizing the Joint Declaration of Data Citation Principles (JDDCP) - the most widely agreed set of principle-based recommendations for direct scholarly data citation.
We will provide initial recommendations on identifier schemes, identifier resolution behavior, required metadata elements, and best practices for realizing programmatic machine actionability of cited data.
We hope that these recommendations along with the new NISO JATS document schema revision, developed in parallel, will help accelerate the wide adoption of data citation in scholarly literature. We believe their adoption will enable open data transparency for validation, reuse and extension of scientific results; and will significantly counteract the problem of false positives in the literature.
February 18 2015 NISO Virtual Conference Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Using data management plans as a research tool: an introduction to the DART Project
Amanda L. Whitmire, Ph.D., Assistant Professor, Data Management Specialist, Oregon State University Libraries & Press
February 18 2014 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Capacity Building: Leveraging existing library networks to take on research data
Heidi Imker, Director of the Research Data Service, University of Illinois at Urbana-Champaign
RDAP13 Elizabeth Moss: The impact of data reuse - ASIS&T
Kathleen Fear, ICPSR, University of Michigan
“The impact of data reuse: a pilot study of 5 measures”
Panel: Data citation and altmetrics
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
Our regular Introduction to Data Management (DM) workshop (90 minutes). Covers very basic DM topics and concepts. Audience is graduate students from all disciplines. Most of the content is in the NOTES FIELD.
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ... - Amanda Whitmire
A workshop as part of the International Digital Curation Conference 2016 on DMP development and support. This presentation demonstrates how we can use data management plans as a source of information to better understand researcher data stewardship practices and how to support them. Be sure to see the slide notes to better understand the presentation (most slides are just photos/icons).
NSF Data Management Plan - Implications for Librarians - Andrew Sallans
A. Sallans. "NSF Data Management Plan - Implications for Librarians." Presented at the Science and Technology Section (STS) Hot Topics Discussion Group Meeting of the American Library Association's 2011 Midwinter Meeting. 8 January 2011
Introduction to research data management; Lecture 01 for GRAD521 - Amanda Whitmire
Lesson 1: Introduction to research data management. From a series of lectures from a 10-week, 2-credit graduate-level course in research data management (GRAD521, offered at Oregon State University).
The course description is: "Careful examination of all aspects of research data management best practices. Designed to prepare students to exceed funder mandates for performance in data planning, documentation, preservation and sharing in an increasingly complex digital research environment. Open to students of all disciplines."
Major course content includes: Overview of research data management, definitions and best practices; Types, formats and stages of research data; Metadata (data documentation); Data storage, backup and security; Legal and ethical considerations of research data; Data sharing and reuse; Archiving and preservation.
See also, "Whitmire, Amanda (2014): GRAD 521 Research Data Management Lectures. figshare. http://dx.doi.org/10.6084/m9.figshare.1003835. Retrieved 23:25, Jan 07, 2015 (GMT)"
This presentation was provided by Melissa Levine of the University of Michigan during a NISO Virtual Conference on the topic of data curation, held on Wednesday, August 31, 2016
FAIRy stories: tales from building the FAIR Research Commons - Carole Goble
Plenary Lecture Presented at INCF Neuroinformatics 2019 https://www.neuroinformatics2019.org
Title: FAIRy stories: tales from building the FAIR Research Commons
Findable, Accessible, Interoperable, Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any kind of Research Object is a mantra; a method; a meme; a myth; a mystery. For the past 15 years I have been working on FAIR in a range of projects and initiatives in the Life Sciences as we try to build the FAIR Research Commons. Some are top-down like the European Research Infrastructures ELIXIR, ISBE and IBISBA, and the NIH Data Commons. Some are bottom-up, supporting FAIR for investigator-led projects (FAIRDOM), biodiversity analytics (BioVel), and FAIR drug discovery (Open PHACTS, FAIRplus). Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. There are villains and heroes. Some have happy endings; all have morals.
Written and presented by Wolfgang Müller (HITS) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
Presentation given at the Indiana University School of Medicine's Ruth Lilly Medical Library. Contains information and resources specific to Indiana University Purdue University Indianapolis (IUPUI). For full class materials, see LYD17_IUPUIWorkshop folder here: https://osf.io/r8tht/.
Deep phenotyping to aid identification of coding & non-coding rare disease v... - mhaendel
Whole-exome sequencing has revolutionized disease research, but many cases remain unsolved because ~100-1000 candidates remain after removing common or non-pathogenic variants. We present Genomiser to prioritize coding and non-coding variants by leveraging phenotype data encoded with the Human Phenotype Ontology and a curated database of non-coding Mendelian variants. Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes.
This slideshow was used in a Preparing Your Research Material for the Future course for the Humanities Division, University of Oxford, on 2017-02-22. It provides an overview of some key issues, focusing on the long-term management of data and other research material, including sharing and curation.
This slideshow was used in a Preparing Your Research Material for the Future course for the Humanities Division, University of Oxford, on 2018-06-08. It provides an overview of some key issues, focusing on the long-term management of data and other research material, including sharing and curation.
Data Visibility and Protection at the Scale of Life Sciences - Adam Marko
Data generation in the life sciences continues at a rapid pace. There are always risks of data loss, including hardware failures, inability of staff to access data centers, and user error. During challenging times like these, understanding and protecting your data can save lives. Join us to see how you can protect and visualize your files at the scale of Life Sciences, with integrated search, restore, and visibility.
This presentation was provided by Carly Strasser of the Chan Zuckerberg Initiative during the NISO hot topic virtual conference "Effective Data Management," which was held on September 29, 2021.
This session covers topics related to data archiving and sharing. This includes data formats, metadata, controlled vocabularies, preservation, archiving and repositories.
Libraries & Research Data Management for CO Alliance of Resrch Libraries - Carly Strasser
Keynote presentation for the Colorado Alliance of Research Libraries 2014 Research Data Management Conference, 11 July 2014. Focuses on why data management and sharing is important, and the role of libraries.
Open Science for Australian Institute of Marine Science Workshop - Carly Strasser
*Please excuse the typos :)
Presentation on open science and open data for the Australian Institute of Marine Science (AIMS) workshop on "Raising your research profile using research data". 18 June 2014.
Data management overview and UC3 tools for IASSIST 2014 - Carly Strasser
Presentation to introduce current landscape of data management and UC3 tools and services that support data sharing. For IASSIST in Toronto, 5 June 2014.
Data Publication for UC Davis Publish or Perish - Carly Strasser
Intro presentation for panel on going beyond publishing journal articles. UC Davis "Publish or Perish?" Event, 13 Feb 2014. Sorry about missing gradient on some of slides!
October 18, 2013 @ Kennedy Library, Data Studio, Cal Poly. We hear about all things “open” these days: open access, open source, open data, open science, et cetera. But what does it really mean for how we do science? How are things changing, and what are the implications for individual researchers?
Cal Poly - Data Management: Who knew it was a hot topic? - Carly Strasser
October 17, 2013 @ Robert E. Kennedy Library, Data Studio, California Polytechnic State University.
New mandates, announcements, memos, and requirements are emerging that encourage better data management, data sharing, and data preservation. In this presentation, data curation specialist Carly Strasser, PhD, offers a lay of the data management land by discussing recent events, resources, and new directions for data stewardship.
Cal Poly - Data Management and the DMPTool - Carly Strasser
October 17, 2013 @ Robert E. Kennedy Library, Data Studio, California Polytechnic State University.
Many funders now require researchers to submit a Data Management Plan alongside their project proposals. The DMPTool is a free, online wizard that helps you create a data management plan specific to your project, and provides you with links and resources for ensuring your plan is successful.
Cal Poly - Data Management for Researchers - Carly Strasser
October 17, 2013 @ Robert E. Kennedy Library, Data Studio, California Polytechnic State University.
Researchers rarely learn about good data management practices. Instead we develop our own systems that are often unintelligible to others. In this talk, Strasser, PhD, will focus on the common mistakes that scientists make and how to avoid them. She will provide best practices for data management, which will facilitate data sharing and reuse, and introduce tools you can use.
"Undergrad ecologists aren't learning data management" - ESA 2013Carly Strasser
Presentation for Ecological Society of America 2013 Meeting in Minneapolis, MN on 6 August 2013. Results published in Ecosphere doi: 10.1890/ES12-00139.1
Overview of data management policies and data management plans, including the DMPTool. For Ecological Society of America 2013 Meeting in Minneapolis, MN 5 August 2013.
Data Matters for AGU Early Career Conference
1. Data Matters
Tips & Tools for Better Research
Carly Strasser, California Digital Library
carlystrasser@gmail.com
AGU Student & Early Career Scientist Conference
14 Dec 2014
From Flickr by Lachlan Donald
2. Why are
you here?
Science: you’re (probably)
doing it wrong
8. Digital data
From Flickr by Flickmor
From Flickr by DW0825
From Flickr by US Army Environmental Command
C. Strasser
Courtesy of WHOI
From Flickr by deltaMike
20. Because reproducibility is one of the fundamental tenets of science.
Because we need to be credible.
Because Fox News, creationism, and the war on science.
21. “Help us identify grants that are wasteful or that you don’t think are a good use of taxpayer dollars.”
Rep. Adrian Smith (R-Nebraska), a member of the House Committee on Science and Technology
22. Because reproducibility is one of the fundamental tenets of science.
Because we need to be credible.
Because Fox News, creationism, and the war on science.
Because it means faster progress.
30. Feb 2013
… “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.”
32. Data management Best Practices
From Flickr by Big Swede Guy
33. From Flickr by Mark Sardella
Plan before data collection
34. Design sample naming scheme
Planning
• Create a key (data dictionary)
• Make sure names are unique
• Define codes
From Flickr by zebbie
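A minimal sketch of how the three checks on this slide (a key, unique names, defined codes) could be automated. The `SITE-NNN` name pattern, the `SITE_CODES` dictionary, and the function name are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical data dictionary (key): site code -> meaning.
SITE_CODES = {"NAN": "Nanaimo", "LAK": "Lake site"}


def check_sample_names(names, site_codes=SITE_CODES):
    """Return a list of problems: duplicate names and undefined site codes."""
    problems = []
    # Names must be unique.
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"duplicate name: {name}")
    # Every code used in a name must be defined in the key.
    for name in dict.fromkeys(names):  # de-duplicate, keep order
        code = name.split("-")[0]
        if code not in site_codes:
            problems.append(f"undefined code in {name}: {code}")
    return problems
```

Running the check before data collection starts, and again whenever samples are added, keeps the key and the names in step.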
35. Design file naming scheme
Planning
Use descriptive file names
• Unique
• Reflect contents
From
R
Cook,
ESA
Best
Practices
Workshop
2010
Bad:
Mydata.xls
2001_data.csv
best version.txt
Better:
Eaffinis_nanaimo_2010_counts.xls
(Study organism: Eaffinis | Site name: nanaimo | Year: 2010 | What was measured: counts)
*Not for everyone
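A naming scheme like the "Better" example can be checked by a short script. The following is a minimal sketch, assuming a four-part organism_site_year_measurement layout; the pattern and the function names `build_name` and `parse_name` are illustrative, not part of the original slides.

```python
import re

# Assumed layout (hypothetical): organism_site_year_measurement.extension
NAME_PATTERN = re.compile(
    r"^(?P<organism>[A-Za-z]+)_(?P<site>[A-Za-z]+)_"
    r"(?P<year>\d{4})_(?P<measured>[A-Za-z]+)\.(?P<ext>[a-z]+)$"
)


def build_name(organism, site, year, measured, ext="csv"):
    """Assemble a descriptive file name from its parts."""
    return f"{organism}_{site}_{year}_{measured}.{ext}"


def parse_name(filename):
    """Return the parts of a file name, or None if it does not fit the scheme."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


# The "Better" name fits the scheme; the "Bad" names do not.
print(build_name("Eaffinis", "nanaimo", 2010, "counts", ext="xls"))
for name in ["Mydata.xls", "2001_data.csv", "best version.txt"]:
    print(name, "->", parse_name(name))
```

Running a check like this before files go into shared storage catches names that do not reflect their contents.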
36. Design file organization
Planning
Biodiversity
Lake
Experiments
Field work
Grassland
Biodiv_H20_heatExp_2005to2008.csv
Biodiv_H20_predatorExp_2001to2003.csv
…
Biodiv_H20_PlanktonCount_2001toActive.csv
Biodiv_H20_ChlAprofiles_2003.csv
…
Consider…
• Dependencies?
• File formats?
• Time of collection?
• Order of analysis?
From S. Hampton
37. Planning
Design your spreadsheet
Constrain entries
Atomize
Break down spreadsheets
From Flickr by Ulleskelf
38. Consider a database
Planning
A relational database is
A set of tables
Relationships among the tables
A language to specify & query the tables
An RDB provides
Scalability: millions+ records
Features for sub-setting, querying, sorting
Reduced redundancy & entry errors
From Mark Schildhauer
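The three ingredients listed above (tables, relationships, a query language) can be tried out with SQLite, which ships with Python. This is a minimal sketch; the table names, columns, and values are made up for illustration.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the table relationship

# A set of tables, with a relationship between them (hypothetical schema).
conn.executescript("""
CREATE TABLE site (
    site_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE count_obs (
    obs_id  INTEGER PRIMARY KEY,
    site_id INTEGER NOT NULL REFERENCES site(site_id),
    year    INTEGER NOT NULL,
    n       INTEGER NOT NULL CHECK (n >= 0)
);
""")

conn.executemany(
    "INSERT INTO site (site_id, name) VALUES (?, ?)",
    [(1, "nanaimo"), (2, "lake")],
)
conn.executemany(
    "INSERT INTO count_obs (site_id, year, n) VALUES (?, ?, ?)",
    [(1, 2010, 12), (1, 2010, 30), (2, 2010, 7)],
)

# A language to query the tables: total count per site.
totals = conn.execute("""
    SELECT site.name, SUM(count_obs.n)
    FROM count_obs
    JOIN site ON site.site_id = count_obs.site_id
    GROUP BY site.name
    ORDER BY site.name
""").fetchall()
print(totals)
```

Each site name is stored once and referenced by key, which is where the reduced redundancy and fewer entry errors come from: an observation pointing at a site that does not exist is rejected.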
39. Pick a data repository
Store your data in a repository
Institutional archive
Discipline/specialty archive
From Flickr by torkildr
Planning
40. Pick a data repository
Store your data in a repository
Institutional archive
Discipline/specialty archive
From Flickr by torkildr
Planning
Ask a librarian
41. Pick a data repository
Store your data in a repository
Institutional archive
Discipline/specialty archive
From Flickr by torkildr
Planning
Ask a librarian
Repos of repos:
databib.org
re3data.org
42. Decide on preservation/backup
From Flickr by sepa synod
From Flickr by taberandrew
From Flickr by withassociates
Planning
43. Decide on preservation/backup
From Flickr by sepa synod
From Flickr by taberandrew
From Flickr by withassociates
What software?
What hardware?
What personnel?
How often?
Set up reminders!
Test system
Planning
44. …document that
describes what you will
do with your data
throughout
the research project
From Flickr by Barbies Land
Write a data
management plan!
Planning
45. Planning
DMP components
• What will be collected
• Methods
• Standards
• Metadata
• Sharing/access
• Long-term storage
But they all have different requirements and express them in different ways
From Flickr by Barbies Land
48. Realistically:
• Archive .csv version of raw data
• Make a “raw” tab in working data file
• Do all work on other tabs
During collection
Keep raw data raw
49. Keep raw data raw
Raw data as .csv
During
collection
R script for processing & analysis
Ideally:
• Use scripts to process data
• Save them with data
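The slide suggests an R script; the same idea is sketched here in Python. It assumes a hypothetical raw file with `site` and `temp_c` columns: the script only reads the raw file, writes results to a separate file, and uses a checksum to confirm the raw file is unchanged.

```python
import csv
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Fingerprint a file so later changes can be detected."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def process(raw_path, out_path):
    """Read the raw CSV, drop rows with no temperature, write a derived file."""
    kept = 0
    with open(raw_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["temp_c"].strip():
                writer.writerow(row)
                kept += 1
    return kept


# Made-up raw data in a temporary folder, for illustration only.
workdir = Path(tempfile.mkdtemp())
raw = workdir / "raw_temps.csv"
raw.write_text("site,temp_c\nA,11.5\nB,\nC,9.0\n")

before = sha256_of(raw)
kept = process(raw, workdir / "clean_temps.csv")
unchanged = sha256_of(raw) == before  # the raw file was never written to
print(kept, unchanged)
```

Saving a script like this next to the data records exactly how the cleaned file was produced.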
50. Document your workflow
During collection
Workflow: how you get from the raw data to the final
products of your research
Simple workflow: flow chart
Temperature data + Salinity data
→ Data import into Excel
→ Data in spreadsheet
→ Quality control & data cleaning
→ “Clean” T & S data
→ Analysis: mean, SD
→ Summary statistics
→ Graph production
51. During
collection
Workflow: how you get from the raw data to the final
products of your research
Commented script
• R, SAS, MATLAB…
• Well-documented code is
Easier to review
Easier to share
Easier to use for repeat analysis
Document your workflow
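A commented script can double as the workflow record. The sketch below uses Python rather than the R, SAS, or MATLAB named on the slide, and the readings and the plausible-range limits are made-up assumptions.

```python
import statistics

# Step 1: raw readings (made-up values); None marks a missing reading.
temperature_c = [11.2, 11.4, None, 11.9, 99.0]
salinity_psu = [30.1, 30.3, 30.2, None, 30.4]


def clean(values, low, high):
    """Step 2, quality control: drop missing values and values outside [low, high]."""
    return [v for v in values if v is not None and low <= v <= high]


def summarize(values):
    """Step 3, analysis: mean and sample standard deviation."""
    return {"mean": statistics.mean(values), "sd": statistics.stdev(values)}


# Assumed plausible ranges; adjust for the instrument and study site.
clean_t = clean(temperature_c, low=-2.0, high=40.0)
clean_s = clean(salinity_psu, low=0.0, high=42.0)

summary = {
    "temperature_c": summarize(clean_t),
    "salinity_psu": summarize(clean_s),
}
print(summary)
```

Because each step is named and commented, a reviewer can see which readings were dropped and why, and the analysis can be repeated on next season's data without changes.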
52. Constrain data entries
• Excel lists
• Data validation
• Google docs forms
Modified from K. Vanderbilt
During
collection
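Outside a spreadsheet, the same constraints can be applied with a small validation function. This is a sketch under assumed rules: the allowed site codes and the `site` and `count` column names are hypothetical.

```python
# Hypothetical list of allowed site codes, like an Excel drop-down list.
ALLOWED_SITES = {"A", "B", "C"}


def validate_row(row):
    """Return a list of problems in one data-entry row (empty list means valid)."""
    problems = []
    if row.get("site") not in ALLOWED_SITES:
        problems.append(f"unknown site: {row.get('site')!r}")
    try:
        count = int(row.get("count", ""))
        if count < 0:
            problems.append("count must be non-negative")
    except (TypeError, ValueError):
        problems.append(f"count is not an integer: {row.get('count')!r}")
    return problems


for row in [
    {"site": "A", "count": "12"},
    {"site": "Z", "count": "-1"},
    {"site": "B", "count": "many"},
]:
    print(row, "->", validate_row(row))
```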
54. Break down spreadsheets
During collection
Fake a relational database
Create parameter table
From doi:10.3334/ORNLDAAC/777
From doi:10.3334/ORNLDAAC/777
From R Cook, ESA Best Practices Workshop 2010
Create a site table
55. Metadata: data reporting
WHO created the data?
WHAT is the content
of the data set?
WHEN was it created?
WHERE was it collected?
HOW was it developed?
WHY was it developed?
From Flickr by //ichael Patric|{
During collection
Create metadata
56. Create metadata
During collection
Digital context
• Name of the data set
• The name(s) of the data file(s) in the data set
• Date the data set was last modified
• Example data file records for each data type file
• Pertinent companion files
• List of related or ancillary data sets
• Software (including version number) used to prepare/read the data set
• Data processing that was performed
Personnel & stakeholders
• Who collected
• Who to contact with questions
• Funders
Scientific context
• Scientific reason why the data were collected
• What data were collected
• What instruments (including model & serial number) were used
• Environmental conditions during collection
• Temporal & spatial resolution
• Standards or calibrations used
Information about parameters
• How each was measured or produced
• Units of measure
• Format used in the data set
• Precision & accuracy if known
Information about data
• Definitions of codes used
• Quality assurance & control measures
• Known problems that limit data use (e.g. uncertainty, sampling problems)
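A few of the items above can be captured in a small machine-readable record stored alongside the data. This is a sketch only: the field names are an ad hoc assumption rather than a formal metadata standard, and every value is a placeholder.

```python
import json

# Ad hoc required fields (an assumption, not a formal standard).
REQUIRED_FIELDS = ["title", "creator", "contact", "date_modified", "location", "parameters"]

# Placeholder values for illustration only.
metadata = {
    "title": "Example temperature data set",
    "creator": "A. Researcher",
    "contact": "researcher@example.org",
    "date_modified": "2014-12-14",
    "location": "Example field site",
    "files": ["raw_temps.csv"],
    "parameters": [
        {"name": "temp_c", "units": "degrees Celsius", "method": "handheld probe"},
    ],
    "known_problems": "None recorded",
}


def missing_fields(record):
    """List required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


print(missing_fields(metadata))
print(json.dumps(metadata, indent=2))
```

A record like this is a starting point; a repository will usually ask for the same information in a standard format.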
57. Create metadata
During collection
What is metadata?
Standard
Metadata standards…
• Provide structure to describe data
Common terms | definitions | language | structure
• Come in many flavors
EML, FGDC, ISO19115, DarwinCore,…
• Can be met using software tools
Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
58. Back up daily
During
collection
From Flickr by lippo
From Flickr by see phar
Original
Near
Far
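The original, near, and far copies can be made and checked by a short script run on a schedule. This is a minimal sketch: `back_up` is a hypothetical helper, and the temporary folders stand in for a second local drive and an off-site location.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksum(path):
    """Fingerprint a file so copies can be compared with the original."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def back_up(original, destinations):
    """Copy `original` into each destination folder and verify every copy."""
    copies = []
    for dest_dir in destinations:
        dest_dir = Path(dest_dir)
        dest_dir.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(original, dest_dir))
        if checksum(copy) != checksum(original):
            raise IOError(f"backup at {copy} does not match the original")
        copies.append(copy)
    return copies


# Temporary folders stand in for the real "near" and "far" locations.
root = Path(tempfile.mkdtemp())
original = root / "original" / "counts.csv"
original.parent.mkdir()
original.write_text("site,count\nA,12\n")

copies = back_up(original, [root / "near", root / "far"])
print([str(c) for c in copies])
```

Verifying the checksum of each copy is a simple way to test the backup system instead of assuming it worked.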
59. During
collection
From Flickr by Barbies Land
Remember that data
management plan?
Revisit
Review
Revise
60. During
collection
Schedule a time each
week or month
Revisit
Review
Revise
From Flickr by purplemattfish