This document contains data from a study on stable isotopes in algal samples from Wash Cresc Lake. It includes a table with sample identifiers, weights, carbon and nitrogen percentages and delta values, and spectrometer numbers. There are also notes that this is old pilot data from Peter's lab that should not be used, and comments identifying average concentrations and locations for some samples. Additional files with these data are stored on Hampton's computer.
Data Management: Scientist Perspective - UC3 Data Curation Workshop (Carly Strasser)
Presentation on data management: the current landscape, barriers to management, and data types. For UC3-CDL data curation for practitioners workshop, 8 Nov 2012 in Oakland CA.
Data Herding for Scientists - UC Davis OA Week (Carly Strasser)
Presentation for UC Davis Open Access Week. Covers the current status of data management in the sciences, best practices for data management, data management planning, and tools for researchers.
Data Management: Scientist Perspective - DLF 2012 (Carly Strasser)
Presentation at the 2012 Digital Libraries Federation Fall Forum in Denver, CO. Workshop on Data Management Services, held 5 Nov 2012. http://www.diglib.org/forums/2012forum/data-management-services-at-the-library-the-3-hour-tour/
Cal Poly - Data Management for Researchers (Carly Strasser)
October 17, 2013 @ 1 Robert E. Kennedy Library, Data Studio, California Polytechnic State University.
Researchers rarely learn about good data management practices. Instead we develop our own systems that are often unintelligible to others. In this talk, Strasser, PhD, will focus on the common mistakes that scientists make and how to avoid them. She will provide best practices for data management, which will facilitate data sharing and reuse, and introduce tools you can use.
CDL has recently launched a new project dubbed Digital Curation for Excel (DCXL), funded by the Gordon and Betty Moore Foundation and Microsoft Research. The goal of the DCXL project is to facilitate data management, sharing, and archiving for earth, environmental, and ecological scientists. The main result from the project will be an open source add-in for Microsoft Excel that will assist scientists in preparing their Excel data for sharing.
RDAP 15: You’re in good company: Unifying campus research data services (ASIS&T)
Research Data Access and Preservation Summit, 2015
Minneapolis, MN
April 22-23
Cynthia Hudson-Vitale, Digital Data Outreach Librarian, Washington University
Brianna Marshall, Digital Curation Coordinator, University of Wisconsin-Madison
Amy Nurnberger, Research Data Manager, Columbia University
Data Equivalence
Mark Parsons, Lead Project Manager, Senior Associate Scientist, National Snow and Ice Data Center
Data citation, especially using persistent identifiers like Digital Object Identifiers (DOIs), is an increasingly accepted scientific practice. Recently, several, respected organizations have developed guidelines for data citation. The different guidelines are largely congruent in that they agree on the basic practice and elements of data citation, especially for relatively static, whole data collections. There is less agreement on the more subtle nuances of data citation that are sometimes necessary to ensure precise reference and scientific reproducibility--the core purpose of data citation. We need to be sure that if you follow a data reference you get to the precise data that were used or at least their scientific equivalent. Identifiers such as DOIs are necessary but not sufficient for the precise, detailed, references necessary. This talk discusses issues around data set versioning, micro-citation, and scientific equivalence. I propose some interim solutions and suggest research strategies for the future.
Funders and publishers have something in common: for better or worse, we have the ability to influence the behavior of researchers. This talk will focus on what both groups can do to improve research now and in the future.
ESA Ignite talk on UC3 Dash platform for data sharing (Carly Strasser)
Ignite talk (20 slides / 15 seconds per slide) for ESA 2014 meeting in Sacramento, CA 12 August 2014. On the Dash platform for helping researchers manage and share their data via institutional repositories
Data Management for Mountain Observatories Workshop (Carly Strasser)
Keynote presentation for 2014 Mountain Observatories Workshop, 16 July 2014.
Abstract:
While methods for collecting data are well taught, there is less emphasis on managing the resulting data effectively. New mandates, announcements, memos, and requirements from agencies and publishers are emerging that encourage better data management, data sharing, and data preservation. Scientists with good management skills will be able to maximize the productivity of their own research, effectively and efficiently share their data with the community, and benefit from the re-use of their data by others. I will offer an overview of the data management landscape, discussing recent events, resources, and new directions for data stewardship. I will also cover best practices for data management, which will facilitate data sharing and reuse, and introduce tools researchers can use to help in their data stewardship endeavours.
Libraries & Research Data Management for CO Alliance of Research Libraries (Carly Strasser)
Keynote presentation for the Colorado Alliance of Research Libraries 2014 Research Data Management Conference, 11 July 2014. Focuses on why data management and sharing is important, and the role of libraries.
Open Science for Australian Institute of Marine Science Workshop (Carly Strasser)
*Please excuse the typos :)
Presentation on open science and open data for the Australian Institute of Marine Science (AIMS) workshop on "Raising your research profile using research data". 18 June 2014.
Data management overview and UC3 tools for IASSIST 2014 (Carly Strasser)
Presentation to introduce current landscape of data management and UC3 tools and services that support data sharing. For IASSIST in Toronto, 5 June 2014.
Data Publication for UC Davis Publish or Perish (Carly Strasser)
Intro presentation for a panel on going beyond publishing journal articles. UC Davis "Publish or Perish?" event, 13 Feb 2014. Sorry about the missing gradient on some of the slides!
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis at the DASA Connect conference, 30 May 2024. We discuss what testing is, then what agile testing is, and finally Testing in DevOps. We closed with a lovely workshop in which participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf (Peter Spielvogel)
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Elevating Tactical DDD Patterns Through Object Calisthenics (Dorra BARTAGUIZ)
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf (Paige Cruz)
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
PHP Frameworks: I want to break free (IPC Berlin 2024) (Ralf Eggert)
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
UC Riverside: Data Management for Scientists
1. Data Management for Scientists
Reduce your workload | Reuse your ideas | Recycle your data
Carly Strasser, PhD
California Digital Library, UC Office of the President
UC Riverside, February 2012
carly.strasser@ucop.edu | www.carlystrasser.net
(Image: www.oddee.com)
2. Roadmap
1. Background
2. Mistakes we make
3. How to improve
4. Toolbox
3. What role can libraries play in data education?
What barriers to sharing can we eliminate?
Why don’t people share data?
Is data management being taught?
Do attitudes about sharing differ among disciplines?
How can we promote storing data in repositories?
5. Roadmap
1. Background
2. Mistakes we make
3. How to improve
4. Toolbox
6. Digital data
Image credits: From Flickr by DW0825, Flickmor, deltaMike, and US Army Environmental Command; www.woodrow.org; C. Strasser; Courtesy of WHOI
8. Data
Models | Maximum Likelihood estimation | Matrix Models | Images | Tables | Paper
9. UGLY TRUTH
Many Earth | Environmental | Ecological scientists…
• are not taught data management
• don’t know what metadata are
• can’t name data centers or repositories
• don’t share data publicly or store it in an archive
• aren’t convinced they should share data
(5shortessays.blogspot.com)
14. Data Hangover
What happened?
From Flickr by SteveMcN
15. Where data end up
[Diagram of where data and metadata end up, recreated from Klump et al. 2006]
Image credits: From Flickr by diylibrarian and csessums; blog.order2disorder.com
16. Who cares?
Image credits: From Flickr by Redden-McAllister and AJC1; www.rba.gov.au
17. Where data end up
[Diagram of data and metadata on the www, recreated from Klump et al. 2006]
Image credits: From Flickr by diylibrarian and torkildr
19. Trends in Data Archiving
Journal publishers: Joint Data Archiving Agreement; Data Papers (Ecological Archives, etc.); Beyond the PDF
20. Trends in Data Archiving
Journal publishers: Joint Data Archiving Agreement; Data Papers (Ecological Archives, etc.); Beyond the PDF
Funders: Data management requirements
21. Roadmap
1. Background
2. Mistakes we make
3. How to improve
4. Toolbox
22. Best Practices for Data Management
1. Planning
2. Data collection & organization
3. Quality control & assurance
4. Metadata
5. Workflows
6. Data stewardship & reuse
23. 2. Data collection & organization
Create unique identifiers
• Decide on naming scheme early
• Create a key
• Different for each sample
From Flickr by zebbie and sjbresnahan
24. 2. Data collection & organization
Standardize
• Consistent within columns – only numbers, dates, or text
• Consistent names, codes, formats
Modified from K. Vanderbilt. From Pink Floyd, The Wall (themurkyfringe.com)
25. 2. Data collection & organization
Standardize
• Reduce possibility of manual error by constraining entry choices: Excel lists, data validation, Google Docs Forms
Modified from K. Vanderbilt
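The constrained-entry idea above can also be applied after the fact: a few lines of code can report every cell that falls outside an agreed vocabulary. A minimal sketch in Python (the slide's actual examples are Excel lists and Google Docs Forms; the column name and allowed site codes here are invented for illustration):

```python
import csv
import io

# Hypothetical controlled vocabulary for a "site" column; in practice this
# would come from the project's key (see "Create a key" above).
ALLOWED_SITES = {"nanaimo", "wash_cresc", "grassland"}

def invalid_entries(rows, column, allowed):
    """Return (row_number, value) pairs whose value is not in the allowed set.
    Row numbers start at 2 because row 1 is the header."""
    return [(i, row[column]) for i, row in enumerate(rows, start=2)
            if row[column] not in allowed]

# Tiny inline CSV standing in for a real data file.
raw = "site,count\nnanaimo,12\nNanaimo,9\ngrassland,3\n"
rows = list(csv.DictReader(io.StringIO(raw)))
print(invalid_entries(rows, "site", ALLOWED_SITES))  # [(3, 'Nanaimo')]
```

Note that the inconsistent capitalization ("Nanaimo" vs "nanaimo") is exactly the kind of within-column inconsistency the slide warns about.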
26. 2. Data collection & organization
Create a parameter table. Create a site table.
From doi:10.3334/ORNLDAAC/777. From R. Cook, ESA Best Practices Workshop 2010
27. 2. Data collection & organization
Use descriptive file names
PhDcomics.com
28. 2. Data collection & organization
Use descriptive file names*
• Unique
• Reflect contents
Bad: Mydata.xls, 2001_data.csv, best version.txt
Better: Eaffinis_nanaimo_2010_counts.xls (study organism, site name, year, what was measured)
*Not for everyone. From R. Cook, ESA Best Practices Workshop 2010
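The Bad/Better naming rule above is easy to automate so that every file in a project follows the same scheme. A minimal sketch, assuming the four name parts shown on the slide (study organism, site name, year, what was measured); the helper name is mine, not from the deck:

```python
import re

def data_filename(organism, site, year, measured, ext="csv"):
    """Build a descriptive, unique file name: organism_site_year_measured.ext.
    Characters outside A-Z, a-z, 0-9 (e.g. the space in "best version.txt")
    are stripped, since they cause trouble across operating systems."""
    parts = [organism, site, str(year), measured]
    clean = [re.sub(r"[^A-Za-z0-9]+", "", p) for p in parts]
    return "_".join(clean) + "." + ext

print(data_filename("Eaffinis", "nanaimo", 2010, "counts", ext="xls"))
# Eaffinis_nanaimo_2010_counts.xls
```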
29. 2. Data collection & organization
Organize files logically:
Biodiversity
  Lake
    Experiments: Biodiv_H20_heatExp_2005to2008.csv, Biodiv_H20_predatorExp_2001to2003.csv, …
    Field work: Biodiv_H20_PlanktonCount_2001toActive.csv, Biodiv_H20_ChlAprofiles_2003.csv, …
  Grassland
From S. Hampton
30. 2. Data collection & organization
Preserve information
• Keep raw data raw
• Use scripts to process data & save them with data
Raw data as .csv → R script for processing & analysis
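The "keep raw data raw" pattern can be sketched in a few lines: the raw file is only ever read, and every derivation lives in a script that can be re-run from scratch. The slide shows an R script; this Python version, with a made-up "drop records with a missing count" step, is just an illustration:

```python
import csv
import io

def clean_rows(rows):
    """Derive the 'clean' data: here, drop records with a missing count.
    (A stand-in for whatever processing your project actually needs.)"""
    return [r for r in rows if r["count"] != ""]

def process(raw_csv_text):
    """Parse raw CSV text and return cleaned rows. The raw input is never
    rewritten, so re-running the script always starts from the same data."""
    rows = list(csv.DictReader(io.StringIO(raw_csv_text)))
    return clean_rows(rows)

raw = "site,count\nlake_a,12\nlake_b,\n"
print(process(raw))  # [{'site': 'lake_a', 'count': '12'}]
```

Saving this script next to the raw .csv documents the processing and keeps it repeatable, which is the point of the slide.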
31. 2. Data collection & organization
All of the things that make Excel great for data organization are bad for archiving! What to do?
1. Create archive-ready raw data
2. Put it somewhere special
3. Have your fun with fancy Excel techniques
4. Keep archiving in mind
32. 3. Quality control and quality assurance
• Define & enforce standards
• Double data entry
• Document changes
• Minimize manual data entry
• No missing, impossible, or anomalous values
[Example scatter plot]
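The "no missing, impossible, or anomalous values" check lends itself to a small script that flags suspect records for a human to review rather than silently fixing them, which also supports "document changes". A sketch, where the plausible temperature range is an assumption made up for the example:

```python
def qc_flags(values, low, high):
    """Flag missing and impossible (out-of-range) values.
    Returns (index, value, reason) tuples; nothing is deleted or altered,
    so every subsequent change can be documented."""
    flags = []
    for i, v in enumerate(values):
        if v is None:
            flags.append((i, v, "missing"))
        elif not (low <= v <= high):
            flags.append((i, v, "out of range"))
    return flags

# Water temperatures in deg C; -5..40 is an assumed plausible range.
temps = [12.1, None, 13.0, 131.0]
print(qc_flags(temps, low=-5, high=40))
# [(1, None, 'missing'), (3, 131.0, 'out of range')]
```

The 131.0 here is the classic transcription error (a misplaced digit) that double data entry is meant to catch.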
34. 4. Metadata basics
Metadata = Data reporting
WHO created the data? WHAT is the content of the data set? WHEN was it created? WHERE was it collected? HOW was it developed? WHY was it developed?
35. 4. Metadata basics
• Scientific context
  • Scientific reason why the data were collected
  • What data were collected
  • What instruments (including model & serial number) were used
  • Environmental conditions during collection
  • Where collected & spatial resolution
  • When collected & temporal resolution
  • Standards or calibrations used
• Information about parameters
  • How each was measured or produced
  • Units of measure
  • Format used in the data set
  • Precision & accuracy if known
• Information about data
  • Definitions of codes used
  • Quality assurance & control measures
  • Known problems that limit data use (e.g. uncertainty, sampling problems)
• Digital context
  • Name of the data set
  • The name(s) of the data file(s) in the data set
  • Date the data set was last modified
  • Example data file records for each data type file
  • Pertinent companion files
  • List of related or ancillary data sets
  • Software (including version number) used to prepare/read the data set
  • Data processing that was performed
• Personnel & stakeholders
  • Who collected the data
  • Who to contact with questions
  • Funders
  • How to cite the data set
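One low-tech way to make sure the who/what/when/where/how/why questions above get answered is to keep a machine-readable stub alongside each data set. The field names and sample values below are illustrative only, not drawn from the deck or from a formal standard such as EML or FGDC:

```python
# A minimal metadata stub covering the six questions from the slide.
# Every value here is a made-up placeholder to show the shape.
metadata = {
    "who": {"creator": "A. Researcher", "contact": "researcher@example.edu"},
    "what": {"dataset_name": "Plankton counts", "files": ["counts.csv"],
             "units": {"count": "individuals per liter"}},
    "when": {"collected": "2010", "last_modified": "2012-02-01"},
    "where": {"site": "example lake site", "spatial_resolution": "single station"},
    "how": {"instrument": "dissecting microscope", "processing": "counts_script"},
    "why": {"purpose": "zooplankton population study"},
}
```

Saved as a sidecar file (e.g. JSON) next to the data, a stub like this is a starting point that a tool such as Morpho can later turn into standard-compliant metadata.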
36. 4. Metadata basics
What is metadata? What is a metadata standard? Select the appropriate metadata standard.
• Provides structure to describe data: common terms | definitions | language | structure
• Lots of different standards: EML, FGDC, ISO 19115, DarwinCore, …
• Tools for creating metadata files: Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
38. 5. Workflows
Simplest workflows: commented scripts, flow charts
Temperature data + Salinity data → Data import into R → Data in R format → Quality control & cleaning → “Clean” T & S data → Analysis: mean, SD → Summary statistics → Graph production
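The flow chart above maps naturally onto a commented script in which each box is a named step. The slide's workflow is written in R; here is a Python sketch of the "quality control & cleaning → analysis: mean, SD" stages, with invented sample numbers:

```python
from statistics import mean, stdev

def summarize(series):
    """One pass of the commented-script workflow: quality control &
    cleaning, then summary statistics. Each stage is explicit, so the
    flow chart's boxes map one-to-one onto lines of code."""
    clean = [x for x in series if x is not None]  # quality control & cleaning
    return {"n": len(clean), "mean": mean(clean), "sd": stdev(clean)}

temperature = [11.8, 12.4, None, 12.1]  # temperature data (made-up values)
salinity = [31.2, 31.0, 31.5, 31.1]     # salinity data (made-up values)
print(summarize(temperature))
print(summarize(salinity))
```

Because each stage is a named function in a saved script, the workflow is re-runnable, which is what makes it preferable to undocumented point-and-click edits.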
40. 5. Workflows
Workflows enable:
• Reproducibility: can someone independently validate findings?
• Transparency: others can understand how you arrived at your results
• Executability: others can re-run or re-use your analysis
From Flickr by merlinprincesse
41. 6. Data stewardship & reuse
The 20-Year Rule: The metadata accompanying a data set should be written for a user 20 years into the future. (National Research Council 1991)
From Flickr by greensambaman
42. 6. Data stewardship & reuse
• Use stable formats: csv, txt, tiff
• Create back-up copies: original, near, far
• Periodically test ability to restore information
Modified from R. Cook
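"Periodically test ability to restore information" can be made concrete by storing a checksum with each back-up copy (original, near, far) and recomparing it after a test restore. A minimal sketch:

```python
import hashlib

def checksum(data: bytes) -> str:
    """SHA-256 fingerprint of a file's bytes. Record it when a back-up
    copy is made; recompute it on restore to confirm the bytes survived
    intact."""
    return hashlib.sha256(data).hexdigest()

original = b"site,count\nlake_a,12\n"
restored = b"site,count\nlake_a,12\n"  # bytes read back from a back-up
print(checksum(original) == checksum(restored))  # True
```

A mismatch means the stored copy (or the restore process) silently corrupted the data, which is exactly what a periodic restore test is meant to catch.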
43. 6. Data stewardship & reuse
Where do I put my data?
• Institutional archive
• Discipline/specialty archive
DataCite list of repositories: www.datacite.org/repolist
From Flickr by torkildr
44. 6. Data stewardship & reuse
Data Citation: Why everyone should do it
• Allow readers to find data products
• Get credit for data and publications
• Promote reproducibility
• Better measure of research impact
Example: Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Learn more at www.datacite.org. Modified from R. Cook
45. Best Practices for Data Management
1. Planning
2. Data collection & organization
3. Quality control & assurance
4. Metadata
5. Workflows
6. Data stewardship & reuse
7. Planning
46. 1. Planning
What is a data management plan? A document that describes what you will do with your data during your research and after you complete your research.
Data Hangover
47. 1. Planning
Why should I prepare a DMP?
• Saves time
• Increases efficiency
• Easier to use data
• Others can understand & use data
• Credit for data products
• Funders require it
48. NSF DMP Requirements
From Grant Proposal Guidelines, the DMP supplement may include:
1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project
2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
4. policies and provisions for re-use, re-distribution, and the production of derivatives
5. plans for archiving data, samples, and other research products, and for preservation of access to them
49. 1. Types of data & other information
• Types of data produced
• Relationship to existing data
• How/when/where will the data be captured or created?
• How will the data be processed?
• Quality assurance & quality control measures
• Security: version control, backing up
• Who will be responsible for data management during/after project?
Image credits: C. Strasser; biology.kenyon.edu; From Flickr by Lazurite
50. 2. Data & metadata standards
• What metadata are needed to make the data meaningful?
• How will you create or capture these metadata?
• Why have you chosen particular standards and approaches for metadata?
Wired.com
51. 3. Policies for access & sharing; 4. Policies for re-use & re-distribution
• Are you under any obligation to share data?
• How, when, & where will you make the data available?
• What is the process for gaining access to the data?
• Who owns the copyright and/or intellectual property?
• Will you retain rights before opening data to wider use? How long?
• Are permission restrictions necessary?
• Embargo periods for political/commercial/patent reasons?
• Ethical and privacy issues?
• Who are the foreseeable data users?
• How should your data be cited?
52. 5. Plans for archiving & preservation
• What data will be preserved for the long term? For how long?
• Where will data be preserved?
• What data transformations need to occur before preservation?
• What metadata will be submitted alongside the datasets?
• Who will be responsible for preparing data for preservation? Who will be the main contact person for the archived data?
From Flickr by theManWhoSurfedTooMuch
53. Don’t forget: Budget
• Costs of data preparation & documentation: hardware, software; personnel; archive fees
• How costs will be paid: request funding!
dorrvs.com
54. NSF’s Vision*
• DMPs and their evaluation will grow & change over time (similar to broader impacts)
• Peer review will determine next steps
• Community-driven guidelines
  – Different disciplines have different definitions of acceptable data sharing
  – Flexibility at the directorate and division levels
  – Tailor implementation of DMP requirement
• Evaluation will vary with directorate, division, & program officer
*Unofficially. Help from Jennifer Schopf, NSF
55. Roadmap
1. Background
2. Mistakes we make
3. How to improve
4. Toolbox
56. DMPTool: dmp.cdlib.org
• Step-by-step wizard for generating a DMP
• Create | edit | re-use | share | save | generate
• Open to community
• Links to institutional resources
• Directorate information & updates
58. CDL Services for UC Community
Where should I put my data? Data Repository: Deposit | Manage | Share | Preserve
www.cdlib.org/services/uc3
59. CDL Services for UC Community
Create & manage persistent identifiers
• Precise identification of a dataset
• Credit to data producers and data publishers
• A link from the traditional literature to the data
• Research metrics for datasets
Example: Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
www.cdlib.org/services/uc3
60. Why are you promoting Excel?
• Open source add-in
• Facilitate data management, sharing, archiving for scientists
• Focus on atmospheric, ecological, hydrological, and oceanographic data
• Collecting requirements for add-in from scientists, data centers, libraries
Funders: Gordon and Betty Moore Foundation, Microsoft Research
61. Why are you promoting Excel?
Everyone uses it. Stopgap measure.
63. www.dataone.org
• Data Education Tutorials
• Database of best practices & software tools
• Links to DMPTool
• Primer on data management
From Flickr by Robert Hruzek
66. Process
1. Assess needs
2. Gather requirements
3. Build requirements document
4. Build community
67. Requirements
1. Must work for Excel users without the add-in
2. No additional software (other than add-in and Excel) necessary
3. Can be used offline
4. Perform CSV compatibility checks, reporting, and automated fixes
5. Add metadata to data file
  a. Can use existing metadata as a template
  b. Add-in can automatically generate some of the metadata where the info is available from the file
6. Generate a citation for the data file
7. Deposit data and metadata in a repository
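Requirement 4 (CSV compatibility checks and reporting) can be illustrated with a few lines of code. This is my sketch, not DCXL's actual implementation; the checks shown (ragged rows, embedded newlines) are assumptions about what "CSV compatibility" might cover, and a real add-in would presumably also flag Excel-specific features such as formulas and merged cells:

```python
def csv_compatibility_report(rows):
    """Report features of a spreadsheet (given as a list of rows of cell
    strings) that will not survive export to plain CSV: rows whose width
    differs from the header, and cells containing embedded newlines."""
    problems = []
    width = len(rows[0])  # header row defines the expected width
    for i, row in enumerate(rows):
        if len(row) != width:
            problems.append((i, "ragged row"))
        if any("\n" in cell for cell in row):
            problems.append((i, "embedded newline"))
    return problems

rows = [["site", "count"], ["lake_a", "12"], ["lake_b"]]
print(csv_compatibility_report(rows))  # [(2, 'ragged row')]
```

Reporting row numbers rather than auto-fixing matches the requirement's split between "checks, reporting, and automated fixes": the user sees what will change before anything does.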
68. The Great Debate
Add-in
• Little pieces of software
• Download to extend the capabilities of Excel
• Appear as a “ribbon”
Web-based application
• Requires the web: www + wba
• Does not require that you download a program
• Websites that do something with info/files provided by the user
• Examples: Facebook, YouTube
69. Add-in
Download add-in → DCXL spreadsheet add-in → New & improved add-in
Check Compatibility
1. Parse for compatibility
2. Report potential errors
3. Allow user-directed error correction
Create Metadata
1. Make template
2. Auto-fill
3. Parameter list selection
4. Citation generation
5. DOI connection
Connect to repository
1. Version control
2. Backing up
3. Retrieve info: authentication, keyword list, metadata standard, citation format, acceptable file formats
70. Summary: Add-in
The Good
• Integrated in workflow
• Familiar UI, functionality
• Smaller shift
• Available offline
The Bad
• Windows only
• Install & updates required
• Not as generalizable/extensible
• Not as easy for community to get involved
71. Web application
Upload spreadsheet → Web-based spreadsheet application → New & improved spreadsheet
Check Compatibility
1. Parse for compatibility
2. Report potential errors
3. Allow user-directed error correction
Create Metadata
1. Make template
2. Auto-fill
3. Parameter list selection
4. Citation generation
5. DOI connection
Connect to repository
1. Version control
2. Backing up
3. Retrieve info: authentication, keyword list, metadata standard, citation format, acceptable file formats
72. Summary: Web based
The Good
• Easier to maintain, update
• Can use with Mac
• Generalizable/extensible
• Community involvement possible
The Bad
• Not familiar
• Requires new UI
• Not integrated in Excel
• Offline use not guaranteed
73. Moving forward…
• Simple, clean user interface
• Connect to web application from within Excel
• Offline use of web application, especially ability to create metadata offline
74. Send me feedback!
Comment on the blog: dcxl.cdlib.org
Email me: carlystrasser@gmail.com
Tweet me: @carlystrasser
FB message me: DCXLatCDL
From Flickr by hashmil
75. Diane Bisom | Ann Frenkel | Dr. Ruth Jackson
dcxl.cdlib.org | @dcxlCDL | www.facebook.com/DCXLatCDL
www.carlystrasser.net | carlystrasser@gmail.com | @carlystrasser