Here are some random notes on data from Peter's old lab:
- Table with stable isotope data from algal samples collected at Wash Cresc Lake on Dec 16. Includes sample IDs, weights, %C, delta C-13 and N-15 values, and spectrometer numbers.
- Reference standards analyzed to calculate sample delta values. SDs reported for delta C-13 and N-15 of reference standards.
- Samples include algae from shore, lake outlet, and various collection points around lake labeled ALG01, 03, 05, 07.
- Delta values range from -30.17 to -21.11 for C-13 and from -1.65 to 0.87 for N-15.
Data Management for Scientists: Workshop at Ocean Sciences 2012
1. Data Management for Scientists
Reduce your workload | Reuse your ideas | Recycle your data
(Image: www.oddee.com)
Carly Strasser, PhD
Ocean Sciences Meeting, February 2012
California Digital Library, UC Office of the President
carly.strasser@ucop.edu | www.carlystrasser.net
2. Roadmap: 1. Background | 2. Data management landscape | 3. How to improve | 4. Toolbox
3. NSF-funded DataNet Project, Office of Cyberinfrastructure
Community | Cyberinfrastructure | Engagement & Outreach
(Image from Flickr by wetwebwork; courtesy of DataONE)
4. What role can libraries play in data education?
What barriers to sharing can we eliminate?
Why don’t people share data?
Is data management being taught?
Do attitudes about sharing differ among disciplines?
How can we promote storing data in repositories?
7. Digital data
(Images from Flickr by DW0825, Flickmor, deltaMike, and US Army Environmental Command; www.woodrow.org; C. Strasser; courtesy of WHOI)
9. Data: Models | Maximum likelihood estimation | Matrix models | Images | Tables | Paper
11. UGLY TRUTH
Many Earth | Environmental | Ecological scientists…
• are not taught data management
• don’t know what metadata are
• can’t name data centers or repositories
• don’t share data publicly or store it in an archive
• aren’t convinced they should share data
(Image: 5shortessays.blogspot.com)
12. Data Hangover: What happened?
(Image from Flickr by SteveMcN)
13. Where data end up
[Figure: data and metadata, recreated from Klump et al. 2006]
(Images from Flickr by diylibrarian and csessums; blog.order2disorder.com)
14. Who cares?
(Images from Flickr by Redden-McAllister and AJC1; www.rba.gov.au)
15. Where data end up
[Figure: data and metadata, recreated from Klump et al. 2006]
(Images from Flickr by diylibrarian and torkildr)
17. Trends in Data Archiving
• Journal publishers: Joint Data Archiving Agreement
• Data papers etc.: Ecological Archives, Beyond the PDF
• Funders: data management requirements
18. Roadmap: 1. Background | 2. Data management landscape | 3. Best practices | 4. Toolbox
19. Best Practices for Data Management: 1. Planning | 2. Data collection & organization | 3. Quality control & assurance | 4. Metadata | 5. Workflows | 6. Data stewardship & reuse | 7. Planning
24. 2. Data collection & organization: Create unique identifiers
• Decide on naming scheme early
• Create a key
• Different for each sample
(Images from Flickr by zebbie and sjbresnahan)
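The identifier advice can be made concrete with a short script. This is a minimal sketch, assuming a hypothetical SITE-YYYYMMDD-NNN scheme (not from the slides) and an in-memory key that maps each ID to a free-text description; a per-site, per-day counter is what keeps every ID distinct.

```python
# Sketch: one unique, scheme-based ID per sample, plus a key describing each ID.
# Assumption (not from the slides): IDs look like SITE-YYYYMMDD-NNN.
from datetime import date


def make_sample_id(site_code: str, sampled_on: date, seq: int) -> str:
    """Build one ID from a site code, the sampling date, and a sequence number."""
    return f"{site_code}-{sampled_on:%Y%m%d}-{seq:03d}"


def build_key(samples):
    """Assign IDs to (site_code, date, description) tuples; return {sample_id: description}.

    A counter per (site, date) pair means two samples never share an ID.
    """
    key = {}
    counters = {}
    for site, day, description in samples:
        counters[(site, day)] = counters.get((site, day), 0) + 1
        key[make_sample_id(site, day, counters[(site, day)])] = description
    return key


key = build_key([
    ("LK1", date(2012, 2, 15), "surface water, north shore"),
    ("LK1", date(2012, 2, 15), "surface water, outlet"),
    ("LK2", date(2012, 2, 16), "sediment core"),
])
for sample_id, description in key.items():
    print(sample_id, description)
```

Writing the key out next to the data (for example as its own CSV) is what lets someone else decode the IDs later.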
25. 2. Data collection & organization: Standardize
• Consistent within columns – only numbers, dates, or text
• Consistent names, codes, formats
(Modified from K. Vanderbilt. Image from Pink Floyd, The Wall; themurkyfringe.com)
26. 2. Data collection & organization: Standardize
• Reduce possibility of manual error by constraining entry choices
Excel lists | Data validation | Google Docs Forms
(Modified from K. Vanderbilt)
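The same idea as an Excel drop-down list or data validation rule can be applied in a script that checks each row before it is accepted. A minimal sketch; the column names (`site`, `method`, `depth_m`) and the allowed codes are hypothetical.

```python
# Sketch: constrain entry choices the way a drop-down list would.
# Column names and allowed codes are hypothetical examples.
ALLOWED = {
    "site": {"LK1", "LK2", "OUT"},       # controlled list of site codes
    "method": {"grab", "core", "net"},   # controlled list of methods
}


def validate_row(row: dict) -> list:
    """Return a list of problems found in one data-entry row (empty if OK)."""
    problems = []
    for column, choices in ALLOWED.items():
        value = row.get(column)
        if value not in choices:
            problems.append(f"{column}={value!r} not in {sorted(choices)}")
    # Keep the column to one type: depth must be a number, not free text.
    try:
        float(row.get("depth_m", ""))
    except (TypeError, ValueError):
        problems.append(f"depth_m={row.get('depth_m')!r} is not numeric")
    return problems


rows = [
    {"site": "LK1", "method": "grab", "depth_m": "2.5"},
    {"site": "lk1", "method": "Grab", "depth_m": "approx 3"},
]
for i, row in enumerate(rows, start=1):
    print(i, validate_row(row) or "OK")
```

The second row fails on all three checks, even though a person would probably read "lk1" and "Grab" correctly. That is the kind of inconsistency that breaks sorting and grouping later.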
27. 2. Data collection & organization: Identify missing data
• Numeric fields: distinct value (e.g. 9999)
• Text fields: NULL or NA
• Use data flags in a separate column to qualify empty cells, e.g.:
M1 = missing; no sample collected
E1 = estimated from grab sample
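When the file is read back in, those codes must be translated rather than treated as real values; otherwise 9999 ends up in an average. A minimal sketch, assuming the slide's conventions (9999 for numeric gaps, NA/NULL for text, M1/E1 flags) and a hypothetical three-row table.

```python
# Sketch: read a table that codes missing values as the slide suggests.
# Assumptions: numeric gaps = 9999, text gaps = NA or NULL, and a separate
# `flag` column with the codes M1 and E1. The data rows are hypothetical.
import csv
import io

MISSING_NUMERIC = 9999.0
MISSING_TEXT = {"NA", "NULL"}
FLAG_CODES = {
    "M1": "missing; no sample collected",
    "E1": "estimated from grab sample",
}

raw = io.StringIO(
    "sampleID,temp_c,observer,flag\n"
    "S001,12.4,KV,\n"
    "S002,9999,NA,M1\n"
    "S003,11.9,MS,E1\n"
)


def parse(reader):
    """Yield rows with missing-value codes turned into None and flags spelled out."""
    for row in reader:
        temp = float(row["temp_c"])
        yield {
            "sampleID": row["sampleID"],
            "temp_c": None if temp == MISSING_NUMERIC else temp,
            "observer": None if row["observer"] in MISSING_TEXT else row["observer"],
            "flag": FLAG_CODES.get(row["flag"]),  # None when the flag cell is empty
        }


rows = list(parse(csv.DictReader(raw)))
for row in rows:
    print(row)
```

Whatever sentinel you pick has to be a value that can never occur as a real measurement, and it belongs in the metadata.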
28. 2. Data collection & organization: Create a parameter table; create a site table
(Examples from doi:10.3334/ORNLDAAC/777. From R. Cook, ESA Best Practices Workshop 2010)
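A parameter table documents every column (meaning and units), and a site table documents every site code once. Both can also be used to check a data file. A minimal sketch; every column name, unit, site name, and coordinate below is a hypothetical placeholder.

```python
# Sketch: a parameter table and a site table kept alongside the data file.
# All names, descriptions, units, and coordinates are hypothetical placeholders.
PARAMETERS = [
    # (column name, description, units)
    ("siteID", "site code; details in the site table", "none"),
    ("sample_date", "date sampled", "YYYY-MM-DD"),
    ("temp_c", "water temperature", "degrees C"),
    ("chl_a", "chlorophyll a concentration", "ug/L"),
]

SITES = {
    # siteID: (site name, latitude, longitude); coordinates are made up
    "LK1": ("North shore", 45.10, -120.20),
    "LK2": ("Lake outlet", 45.12, -120.25),
}


def undefined_columns(header):
    """Columns in a data file that the parameter table does not describe."""
    defined = {name for name, _, _ in PARAMETERS}
    return [col for col in header if col not in defined]


def unknown_sites(site_ids):
    """Site codes used in the data that are missing from the site table."""
    return sorted(set(site_ids) - set(SITES))


print(undefined_columns(["siteID", "sample_date", "temp_c", "do_mgL"]))
print(unknown_sites(["LK1", "LK2", "LK3", "LK1"]))
```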
29. 2. Data collection & organization: SPREADSHEETS: THE GOOD
Quick on the draw: Clickety-click and you’re ready to fire
Always there in time: Everyone has Excel
Smarter than he lets on: Stats, Pivot tables, VB scripts
Cleans up real pretty: Graphics, fonts, colors, borders
(From Mark Schildhauer)
30. 2. Data collection & organization: SPREADSHEETS: THE BAD
Shoot first, ask later: Click&fire, Click&fire, Click&fire
No scruples: Delete row, click&fire, ctrl-x/ctrl-c, click&fire, re-sort, save
Talks a good story but not much education: Stats
(From Mark Schildhauer)
31. 2. Data collection & organization: SPREADSHEETS: THE UGLY
Ill-mannered: Takes data prisoner; conflates raw and summary data
Gaudy: Use of visual cues as metadata: color, font, border
Shifty: Cross-linking worksheets sets up “invisible” dependencies
Shiftless: No provenance
The more complicated your spreadsheet, the uglier it gets for use with other software.
(From Mark Schildhauer)
32. 2. Data collection & organization
All of the things that make Excel great for data are bad for archiving!
1. Create archive-ready raw data | 2. Put it somewhere special | 3. Have your fun with fancy Excel techniques | 4. Keep archiving in mind
33. 2. Data collection & organization: What about databases?
A relational database is:
• A set of tables
• Relationships among the tables
• A language to specify & query the tables
(From Mark Schildhauer)
34. 2. Data collection & organization
Sample sites: *siteID, site_name, latitude, longitude, description
Samples: *sampleID, siteID, sample_date, speciesID, height, flowering, flag, comments
Species: *speciesID, species_name, common_name, family, order
* Denotes the primary key
(From Mark Schildhauer)
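The three linked tables can be tried out with SQLite through Python's built-in `sqlite3` module. This is a minimal sketch. The table and column names follow the slide, but the SQL types, the `REFERENCES` clauses, and every data row are assumptions added for illustration.

```python
# Sketch: the slide's sites / samples / species tables in SQLite.
# Table and column names follow the slide; SQL types, REFERENCES clauses,
# and all data rows are assumptions for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (
    siteID TEXT PRIMARY KEY, site_name TEXT, latitude REAL, longitude REAL, description TEXT
);
CREATE TABLE species (
    speciesID TEXT PRIMARY KEY, species_name TEXT, common_name TEXT, family TEXT,
    "order" TEXT  -- ORDER is an SQL keyword, so the column name is quoted
);
CREATE TABLE samples (
    sampleID TEXT PRIMARY KEY,
    siteID TEXT REFERENCES sites(siteID),
    sample_date TEXT,
    speciesID TEXT REFERENCES species(speciesID),
    height REAL, flowering INTEGER, flag TEXT, comments TEXT
);
""")
conn.executemany("INSERT INTO sites VALUES (?, ?, ?, ?, ?)", [
    ("M1", "Upper meadow", 45.0, -120.0, "hypothetical site"),
    ("M2", "Lower meadow", 45.1, -120.1, "hypothetical site"),
])
conn.executemany("INSERT INTO species VALUES (?, ?, ?, ?, ?)", [
    ("ACMI", "Achillea millefolium", "common yarrow", "Asteraceae", "Asterales"),
    ("LUPE", "Lupinus perennis", "sundial lupine", "Fabaceae", "Fabales"),
])
conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    ("S001", "M1", "2011-07-01", "ACMI", 42.0, 1, None, None),
    ("S002", "M1", "2011-07-01", "LUPE", 30.5, 0, None, None),
    ("S003", "M2", "2011-07-02", "LUPE", 35.0, 1, None, None),
])

# Site and species details are stored once and joined back in when needed.
rows = conn.execute("""
    SELECT s.sampleID, si.site_name, sp.common_name, s.height
    FROM samples AS s
    JOIN sites AS si ON si.siteID = s.siteID
    JOIN species AS sp ON sp.speciesID = s.speciesID
    WHERE s.flowering = 1
    ORDER BY s.sampleID
""").fetchall()
for row in rows:
    print(row)
```

Because the site name and species details live in one row each, correcting a misspelled common name is a single update instead of a find-and-replace across every sample row.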
35. 2. Data collection & organization: Databases often enforce good practice
Must define: Tables | Attributes | Relationships (constraints)
[Figure: diagram labeling tables A to E, their attributes, and the relationships among them]
Databases provide:
• Scalability: millions+ records
• Features for sub-setting, querying, sorting
• Scripted language: SQL
• Reduced redundancy & potential data entry errors
(From Mark Schildhauer)
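"Enforce good practice" means the database refuses rows that a spreadsheet would silently accept. A minimal sketch with Python's `sqlite3`, assuming a hypothetical two-table schema and a plausible-range `CHECK` on temperature (the -5 to 40 bound is illustrative only). SQLite needs `PRAGMA foreign_keys = ON` before it checks references.

```python
# Sketch: constraints rejecting bad entries at insert time (SQLite via sqlite3).
# The schema and the -5..40 C plausibility range are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default
conn.executescript("""
CREATE TABLE sites (siteID TEXT PRIMARY KEY, site_name TEXT NOT NULL);
CREATE TABLE samples (
    sampleID TEXT PRIMARY KEY,
    siteID   TEXT NOT NULL REFERENCES sites(siteID),
    temp_c   REAL CHECK (temp_c BETWEEN -5 AND 40)
);
INSERT INTO sites VALUES ('LK1', 'North shore');
""")

attempts = [
    ("S001", "LK1", 12.4),   # valid
    ("S001", "LK1", 11.0),   # duplicate primary key
    ("S002", "LK9", 10.2),   # site not in the sites table
    ("S003", "LK1", 121.0),  # keying error outside the allowed range
]
results = []
for row in attempts:
    try:
        conn.execute("INSERT INTO samples VALUES (?, ?, ?)", row)
        results.append((row[0], "accepted"))
    except sqlite3.IntegrityError as err:
        results.append((row[0], f"rejected: {err}"))

for sample_id, outcome in results:
    print(sample_id, outcome)
```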
36. 2. Data collection & organization
Spreadsheets
• Good for simple, self-contained charts, graphs, calculations
• Handy for collecting raw data
• Flexible cell content type
But…
• Hard to subset or sort
• Lack “record” integrity: can sort a column independently of all others
• Harder to maintain as complexity and size of data grows
Databases
• Works well with lots of data
• Easy to query and subset data
• Data fields are constrained
• Columns cannot be sorted independently of each other
• Normalization reduces data entry and potential for error
But…
• More to learn
• Harder to use
(From Mark Schildhauer)
37. 2. Data collection & organization
Invest time in learning databases if your data sets are large or complex.
Consider investing time in learning databases if…
• your data are small and humble
• you ever intend to share your data
• you are < 30 years old
(Image: www.top20training.com. From Mark Schildhauer)
38. 2. Data collection & organization: Use descriptive file names
(Image: PhDcomics.com)
39. 2. Data collection & organization: Use descriptive file names*
• Unique
• Reflect contents
Bad: Mydata.xls | 2001_data.csv | best version.txt
Better: Eaffinis_nanaimo_2010_counts.xls (study organism, site name, year, what was measured)
*Not for everyone
(From R. Cook, ESA Best Practices Workshop 2010)
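If names are built by a helper rather than typed by hand, every file follows the same order: organism, site, year, what was measured. A minimal sketch; the helper and its normalization rules (lowercase site, spaces and other unsafe characters turned into hyphens) are assumptions for illustration, not a standard.

```python
# Sketch: build descriptive file names from the parts the slide labels.
# The helper and its normalization rules are hypothetical conventions.
import re


def descriptive_name(organism: str, site: str, year: int, measured: str, ext: str = "csv") -> str:
    """Join study organism, site name, year, and what was measured with underscores."""
    parts = [organism, site.lower(), str(year), measured]
    # Replace spaces and other characters that cause trouble in paths with hyphens.
    cleaned = [re.sub(r"[^A-Za-z0-9-]+", "-", part).strip("-") for part in parts]
    return "_".join(cleaned) + "." + ext


print(descriptive_name("Eaffinis", "Nanaimo", 2010, "counts", ext="xls"))
print(descriptive_name("Eaffinis", "Departure Bay", 2011, "body length"))
```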
40. 2. Data collection & organization: Organize files logically
Biodiversity
  Lake
    Experiments
      Biodiv_H20_heatExp_2005to2008.csv
      Biodiv_H20_predatorExp_2001to2003.csv
      …
    Field work
      Biodiv_H20_PlanktonCount_2001toActive.csv
      Biodiv_H20_ChlAprofiles_2003.csv
      …
  Grassland
(From S. Hampton)
41. 2. Data collection & organization: Preserve information
• Keep raw data raw
• Use scripts to process data & save them with data
[Figure: raw data saved as .csv alongside the R script for processing & analysis]
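"Keep raw data raw" works best when a script reads the raw file and writes a separate processed file, so the original never changes. The slide pairs the raw .csv with an R script; this minimal sketch mirrors the idea in Python. The folder names, file names, and the cm-to-m conversion are hypothetical.

```python
# Sketch of "keep raw data raw": read from raw/, write to derived/, never edit raw.
# Folder and file names and the cm-to-m conversion are hypothetical.
import csv
import stat
import tempfile
from pathlib import Path


def process(raw_file: Path, out_dir: Path) -> Path:
    """Read a raw CSV, convert depth from cm to m, and write a separate processed file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / (raw_file.stem + "_processed.csv")
    with raw_file.open(newline="") as src, out_file.open("w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=["sampleID", "depth_m"])
        writer.writeheader()
        for row in csv.DictReader(src):
            writer.writerow({"sampleID": row["sampleID"],
                             "depth_m": float(row["depth_cm"]) / 100})
    return out_file


with tempfile.TemporaryDirectory() as root:
    raw_file = Path(root, "raw", "lake_depths_2011.csv")
    raw_file.parent.mkdir()
    raw_file.write_text("sampleID,depth_cm\nS001,250\nS002,310\n")
    raw_file.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)  # read-only guards against edits
    processed = process(raw_file, Path(root, "derived"))
    processed_text = processed.read_text()
    raw_text = raw_file.read_text()

print(processed_text)
```

Re-running the script always regenerates the processed file from the untouched original. The script itself becomes part of the record of how the derived numbers were produced.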
43. 3. Quality control and quality assurance: Before data collection
• Define & enforce standards
• Assign responsibility for data quality
(Image from Flickr by StacieBee)
44. 3. Quality control and quality assurance: During data collection/entry
• Minimize manual entry
• Use double entry
• Use text-to-speech program to read data back
• Use a database
• Document changes
(Image from Flickr by schock)
45. 3. Quality control and quality assurance: After data entry
• Check for missing, impossible, anomalous values
• Perform statistical summaries
• Look for outliers: normal probability plots | regression | scatter plots | maps
[Figure: example scatter plot]
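The post-entry checks can be scripted for each numeric column. This minimal sketch uses Python's `statistics` module and makes three assumptions: `None` marks a missing value, -5 to 40 °C is a hypothetical plausible range for water temperature, and outliers are screened with Tukey's 1.5 × IQR rule. The IQR rule is a swapped-in numeric screen and does not replace the plots listed above.

```python
# Sketch: post-entry checks for one numeric column.
# Assumptions: None = missing; -5..40 C is a hypothetical plausible range;
# outliers screened with Tukey's 1.5 x IQR rule (a numeric stand-in for plots).
import statistics


def screen(values, low=-5.0, high=40.0):
    """Summarize missing and impossible values, IQR outliers, and basic statistics."""
    present = [v for v in values if v is not None]
    impossible = [v for v in present if not low <= v <= high]
    plausible = [v for v in present if low <= v <= high]
    q1, _, q3 = statistics.quantiles(plausible, n=4)
    iqr = q3 - q1
    outliers = [v for v in plausible if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "impossible": impossible,
        "outliers": outliers,
        "mean": statistics.mean(plausible),
        "sd": statistics.stdev(plausible),
        "min": min(plausible),
        "max": max(plausible),
    }


temps = [12.1, 12.4, 11.8, 12.0, 12.3, None, 19.9, -80.0, 12.2]
for name, value in screen(temps).items():
    print(name, value)
```

A flagged value is a prompt to go back to the field sheet, not an automatic deletion. 19.9 might be a keying error for 11.9, or it might be a real warm-water event.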
48. 4. Metadata basics
Metadata = Data reporting
WHO created the data?
WHAT is the content of the data set?
WHEN was it created?
WHERE was it collected?
HOW was it developed?
WHY was it developed?
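A first step toward answering the six questions is to record them in a structured file and check that none is left blank. A minimal sketch; this is an informal checklist written as JSON, not a metadata standard (EML, FGDC, ISO 19115 and others define their own structures), and all field values are hypothetical.

```python
# Sketch: the six metadata questions as a plain record, plus a completeness check.
# Informal checklist only (not EML/FGDC/ISO 19115); all values are hypothetical.
import json

QUESTIONS = ("who", "what", "when", "where", "how", "why")

record = {
    "who": "C. Researcher, Example University",
    "what": "Daily surface water temperature from one logger",
    "when": "2011-06-01 to 2011-09-30",
    "where": "North shore site LK1",
    "how": "",  # not written yet
    "why": "Baseline for a warming experiment",
}


def unanswered(meta):
    """Return the questions that are missing or blank in a metadata record."""
    return [q for q in QUESTIONS if not str(meta.get(q, "")).strip()]


print(json.dumps(record, indent=2))
print("Still to document:", unanswered(record))
```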
49. 4. Metadata basics
• Scientific context
  • Scientific reason why the data were collected
  • What data were collected
  • What instruments (including model & serial number) were used
  • Environmental conditions during collection
  • Where collected & spatial resolution
  • When collected & temporal resolution
  • Standards or calibrations used
• Information about parameters
  • How each was measured or produced
  • Units of measure
  • Format used in the data set
  • Precision & accuracy if known
• Information about data
  • Definitions of codes used
  • Quality assurance & control measures
  • Known problems that limit data use (e.g. uncertainty, sampling problems)
  • How to cite the data set
• Digital context
  • Name of the data set
  • The name(s) of the data file(s) in the data set
  • Date the data set was last modified
  • Example data file records for each data type file
  • Pertinent companion files
  • List of related or ancillary data sets
  • Software (including version number) used to prepare/read the data set
  • Data processing that was performed
• Personnel & stakeholders
  • Who collected
  • Who to contact with questions
  • Funders
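One way to use the checklist above is to turn it into a template and check a draft metadata record for blanks before archiving. Below is a minimal Python sketch. The section and field names are hypothetical paraphrases of the checklist; they are not a formal metadata standard.

```python
from typing import Dict, List

# Hypothetical field names paraphrasing the metadata checklist; not a formal standard.
METADATA_CHECKLIST: Dict[str, List[str]] = {
    "scientific_context": ["purpose", "data_collected", "instruments", "conditions",
                           "where_and_spatial_resolution", "when_and_temporal_resolution",
                           "standards_or_calibrations"],
    "parameters": ["method", "units", "format", "precision_accuracy"],
    "data": ["code_definitions", "qa_qc", "known_problems", "citation"],
    "digital_context": ["dataset_name", "file_names", "last_modified", "example_records",
                        "companion_files", "related_datasets", "software_and_version",
                        "processing"],
    "personnel": ["collected_by", "contact", "funders"],
}

def missing_metadata(record: Dict[str, Dict[str, str]]) -> List[str]:
    """List 'section.field' entries that are absent or blank in a metadata record."""
    gaps = []
    for section, fields in METADATA_CHECKLIST.items():
        provided = record.get(section, {})
        for field in fields:
            if not str(provided.get(field, "")).strip():
                gaps.append(f"{section}.{field}")
    return gaps

draft = {"personnel": {"collected_by": "Field crew", "contact": "data@example.org", "funders": ""}}
gaps = missing_metadata(draft)
print(len(gaps), gaps[:3])
```

A blank field is reported the same way as a missing one, so an empty placeholder cannot pass for a completed record.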
50. 4. Metadata basics
What is metadata?
Select the appropriate metadata standard
• Provides structure to describe data
  Common terms | definitions | language | structure
• Lots of different standards
  EML, FGDC, ISO 19115, Darwin Core, …
• Tools for creating metadata files
  Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
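Tools like Morpho write standard-conformant files for you. To show what "structure" means in practice, below is a minimal Python sketch that builds the skeleton of an EML document with the standard library. It assumes the EML 2.2.0 namespace and our understanding that a dataset needs at least a title, a creator, and a contact. The `system` value "local" and the example names are hypothetical, and the output has not been validated against the EML schema. Check the EML specification before depositing anything.

```python
import xml.etree.ElementTree as ET

# Assumed EML 2.2.0 namespace; child elements in EML are unqualified.
EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"

def minimal_eml(package_id: str, title: str, creator_surname: str, contact_email: str) -> str:
    """Return a skeletal EML document as a string (title, creator, contact only)."""
    ET.register_namespace("eml", EML_NS)
    root = ET.Element(f"{{{EML_NS}}}eml", {"packageId": package_id, "system": "local"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    creator = ET.SubElement(dataset, "creator")
    ET.SubElement(ET.SubElement(creator, "individualName"), "surName").text = creator_surname
    contact = ET.SubElement(dataset, "contact")
    ET.SubElement(ET.SubElement(contact, "individualName"), "surName").text = creator_surname
    ET.SubElement(contact, "electronicMailAddress").text = contact_email
    return ET.tostring(root, encoding="unicode")

doc = minimal_eml("hypothetical.1.1", "Lake temperature and salinity", "Doe", "doe@example.org")
print(doc)
```

A real record would go on to describe each data table, its attributes, units, and methods, which is where most of the checklist items end up.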
53. 5. Workflows
Workflow: how you get from the raw data to the final products of your research
Simple workflows: flow charts
Temperature data, Salinity data
→ Data import into R → Data in R format
→ Quality control & data cleaning → “Clean” T & S data
→ Analysis: mean, SD → Summary statistics
→ Graph production
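The flow chart above maps almost step for step onto a short, commented script. The slide uses R; below is a Python sketch of the same path, from raw temperature and salinity data to cleaned data to summary statistics. The inline CSV text stands in for the raw files. The column names, the "NA" missing-value code, and the plausible ranges are all assumptions for illustration.

```python
import csv
import io
import statistics
from typing import Dict

# Step 1: raw data. Inline text stands in for the temperature and salinity files (hypothetical).
RAW_TEMPERATURE = "sample,temp_c\n1,12.1\n2,NA\n3,12.5\n4,99.0\n5,12.3\n"
RAW_SALINITY = "sample,salinity_psu\n1,33.1\n2,33.4\n3,-1.0\n4,33.0\n5,33.2\n"

def import_data(text: str, column: str) -> Dict[str, str]:
    """Step 2: import a CSV into a {sample: raw value} mapping."""
    return {row["sample"]: row[column] for row in csv.DictReader(io.StringIO(text))}

def clean(raw: Dict[str, str], lower: float, upper: float) -> Dict[str, float]:
    """Step 3: quality control and cleaning; drop missing ('NA') and out-of-range values."""
    cleaned = {}
    for sample, value in raw.items():
        if value == "NA":
            continue
        x = float(value)
        if lower <= x <= upper:
            cleaned[sample] = x
    return cleaned

def summarize(values: Dict[str, float]) -> Dict[str, float]:
    """Step 4: analysis; sample size, mean, and sample standard deviation."""
    xs = list(values.values())
    return {"n": len(xs), "mean": statistics.mean(xs), "sd": statistics.stdev(xs)}

clean_t = clean(import_data(RAW_TEMPERATURE, "temp_c"), lower=-2.0, upper=40.0)
clean_s = clean(import_data(RAW_SALINITY, "salinity_psu"), lower=0.0, upper=42.0)
summary = {"temperature": summarize(clean_t), "salinity": summarize(clean_s)}
print(summary)
```

Graph production would be one more function fed from the cleaned data. Because each box in the flow chart is a named, commented function, the script itself documents the workflow.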
54. 5. Workflows
Workflow: how you get from the raw data to the final products of your research
Simple workflows: commented scripts
• R, SAS, MATLAB
• Well-documented code is…
  Easier to review
  Easier to share
  Easier to repeat analysis
56. 5. Workflows
Workflows enable
• Reproducibility: can someone independently validate findings?
• Transparency: others can understand how you arrived at your results
• Executability: others can re-run or re-use your analysis
From Flickr by merlinprincesse
57. 5. Workflows
Minimally: document your analysis (commented code; simple flow chart)
www.littlebytesoflife.com
Emerging workflow applications will…
− Link software for executable end-to-end analysis
− Provide detailed info about data & analysis
− Facilitate re-use & refinement of complex, multi-step analyses
− Enable efficient swapping of alternative models & algorithms
− Help automate tedious tasks
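Workflow tools capture "detailed info about data & analysis" automatically, but even a plain script can record its own provenance. Below is a minimal Python sketch that writes a JSON record with the run time, Python version, platform, a SHA-256 checksum of each input, and the analysis parameters. The function name and record fields are hypothetical. To keep the sketch self-contained, inputs are passed in as bytes; a real script would read them from files.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from typing import Dict

def provenance_record(inputs: Dict[str, bytes], parameters: Dict[str, object]) -> str:
    """Return a JSON description of what went into one analysis run."""
    record = {
        "run_at_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        # A checksum shows whether a later re-run used byte-identical input data.
        "inputs_sha256": {name: hashlib.sha256(data).hexdigest() for name, data in inputs.items()},
        "parameters": parameters,
    }
    return json.dumps(record, indent=2, sort_keys=True)

print(provenance_record({"temperature.csv": b"abc"}, {"lower": -2.0, "upper": 40.0}))
```

Saving this record next to the results lets someone else check whether they are re-running the same analysis on the same data.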
59. 6. Data stewardship & reuse
The 20-Year Rule
The metadata accompanying a data set should be written for a user 20 years into the future (National Research Council 1991)
From Flickr by greensambaman
60. 6. Data stewardship & reuse
• Use stable formats: csv, txt, tiff
• Create back-up copies: original, near, far
• Periodically test your ability to restore information
Modified from R. Cook
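A backup is only useful if it can be restored intact. Below is a minimal Python sketch of both steps: copying a data file to several locations, then restoring each copy to a scratch directory and comparing SHA-256 checksums with the original. The function names are hypothetical. The "near" and "far" directories here are local stand-ins; in practice they would be a second drive and an off-site or cloud location.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import List

def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large data files do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def back_up(original: Path, destinations: List[Path]) -> List[Path]:
    """Copy `original` into each destination directory and return the copies."""
    copies = []
    for d in destinations:
        d.mkdir(parents=True, exist_ok=True)
        copies.append(Path(shutil.copy2(original, d / original.name)))
    return copies

def restore_test(original: Path, copies: List[Path]) -> bool:
    """Restore each copy to a scratch directory and confirm its checksum matches the original."""
    expected = sha256_of(original)
    with tempfile.TemporaryDirectory() as scratch:
        for copy in copies:
            restored = Path(shutil.copy2(copy, Path(scratch) / copy.name))
            if sha256_of(restored) != expected:
                return False
    return True

with tempfile.TemporaryDirectory() as root:
    base = Path(root)
    data = base / "lake_samples.csv"
    data.write_text("sample,temp_c\n1,12.1\n")
    copies = back_up(data, [base / "near", base / "far"])
    ok = restore_test(data, copies)
print(ok)
```

Running a check like this on a schedule turns "periodically test" into a routine task rather than something discovered after a disk fails.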
61. 6. Data stewardship & reuse
Store your data in a repository
• Institutional archive
• Discipline/specialty archive
DataCite list of repositories: www.datacite.org/repolist
From Flickr by torkildr
62. 6. Data stewardship & reuse
Data Citation
• Allows readers to find data products
• Get credit for data and publications
• Promotes reproducibility
• Better measure of research impact
Example: Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Learn more at www.datacite.org
Modified from R. Cook
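The example citation follows a simple pattern: creator, year, title, repository, DOI. Below is a minimal Python sketch that assembles that pattern from its parts and also builds a resolvable link through the https://doi.org/ resolver. The `DataCitation` class is hypothetical; it reproduces the slide's example format rather than any formal citation style.

```python
from dataclasses import dataclass

@dataclass
class DataCitation:
    creator: str
    year: int
    title: str
    repository: str
    doi: str  # bare DOI, e.g. "10.5061/dryad.20"

    def text(self) -> str:
        """Citation string in the same order as the example above."""
        return f"{self.creator} {self.year}. {self.title}. {self.repository}. doi:{self.doi}"

    def url(self) -> str:
        """Link through the DOI resolver, so readers can find the data product."""
        return f"https://doi.org/{self.doi}"

example = DataCitation(
    creator="Sidlauskas, B.",
    year=2007,
    title=("Data from: Testing for unequal rates of morphological diversification in the "
           "absence of a detailed phylogeny: a case study from characiform fishes"),
    repository="Dryad Digital Repository",
    doi="10.5061/dryad.20",
)
print(example.text())
print(example.url())
```

Keeping the DOI as a separate field means the same record can produce both the printed citation and the clickable link.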
63. Best Practices for Data Management
1. Planning
2. Data collection & organization
3. Quality control & assurance
4. Metadata
5. Workflows
6. Data stewardship & reuse
7. Planning & data management plans in particular
64. 1. Planning
What is a data management plan?
A document that describes what you will do with your data during your research and after you complete your research
Data Hangover
65. 1. Planning
Why should I prepare a DMP?
• Saves time
• Increases efficiency
• Easier to use data
• Others can understand & use data
• Credit for data products
• Funders require it
66. NSF DMP Requirements
From Grant Proposal Guidelines: DMP supplement may include:
1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project
2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
4. policies and provisions for re-use, re-distribution, and the production of derivatives
5. plans for archiving data, samples, and other research products, and for preservation of access to them
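Because the guidelines list five components, a draft plan can be checked mechanically for sections that have not been written yet. Below is a minimal Python sketch of that check. The section keys and the draft text are hypothetical. The check only tests whether each section exists and is not blank; it cannot judge whether what is written would satisfy a reviewer.

```python
from typing import Dict, List

# Hypothetical keys for the five components listed in the NSF guidelines above.
NSF_DMP_COMPONENTS = {
    "types": "Types of data, samples, software, and other materials",
    "standards": "Data and metadata standards",
    "access": "Policies for access and sharing",
    "reuse": "Policies for re-use, re-distribution, and derivatives",
    "archiving": "Plans for archiving and preservation of access",
}

def dmp_gaps(draft: Dict[str, str]) -> List[str]:
    """Return the components that are missing or left blank in a DMP draft, in guideline order."""
    return [desc for key, desc in NSF_DMP_COMPONENTS.items() if not draft.get(key, "").strip()]

draft = {
    "types": "CSV tables of lake temperature and salinity.",
    "standards": "EML metadata.",
    "access": "   ",
}
for gap in dmp_gaps(draft):
    print("Still to write:", gap)
```

Tools such as DMPTool provide the same kind of structure interactively; this sketch only shows the idea of treating the guideline list as a checklist.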
67. 1. Types of data & other information
• Types of data produced
• Relationship to existing data
• How/when/where will the data be captured or created?
• How will the data be processed?
• Quality assurance & quality control measures
• Security: version control, backing up
• Who will be responsible for data management during/after project?
C. Strasser | biology.kenyon.edu | From Flickr by Lazurite
68. 2. Data & metadata standards
• What metadata are needed to make the data meaningful?
• How will you create or capture these metadata?
• Why have you chosen particular standards and approaches for metadata?
Wired.com
69. 3. Policies for access & sharing
4. Policies for re-use & re-distribution
• Are you under any obligation to share data?
• How, when, & where will you make the data available?
• What is the process for gaining access to the data?
• Who owns the copyright and/or intellectual property?
• Will you retain rights before opening data to wider use? How long?
• Are permission restrictions necessary?
• Embargo periods for political/commercial/patent reasons?
• Ethical and privacy issues?
• Who are the foreseeable data users?
• How should your data be cited?
70. 5. Plans for archiving & preservation
• What data will be preserved for the long term? For how long?
• Where will data be preserved?
• What data transformations need to occur before preservation?
• What metadata will be submitted alongside the datasets?
• Who will be responsible for preparing data for preservation? Who will be the main contact person for the archived data?
From Flickr by theManWhoSurfedTooMuch
71. Don’t forget: Budget
• Costs of data preparation & documentation
  Hardware, software
  Personnel
  Archive fees
• How costs will be paid
  Request funding!
dorrvs.com
72. NSF’s Vision*
• DMPs and their evaluation will grow & change over time (similar to broader impacts)
• Peer review will determine next steps
• Community-driven guidelines
  – Different disciplines have different definitions of acceptable data sharing
  – Flexibility at the directorate and division levels
  – Tailor implementation of DMP requirement
• Evaluation will vary with directorate, division, & program officer
*Unofficially
Help from Jennifer Schopf, NSF
73. Roadmap
1. Background
2. Data management landscape
3. Best practices
4. Toolbox
75. DMPTool: dmp.cdlib.org
• Step-by-step wizard for generating DMP
  Create | edit | re-use | share | save | generate
• Open to community
• Links to institutional resources
• Directorate information & updates
76. CDL Services: www.cdlib.org/services/uc3
Data Repository
Deposit | Manage | Share | Preserve
• Precise identification of a dataset
• Credit to data producers and data publishers
• A link from the traditional literature to the data
• Research metrics for datasets
Example: Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
77. Why are you promoting Excel?
• Open source add-in
• Facilitate data management, sharing, archiving for scientists
• Focus on atmospheric, ecological, hydrological, and oceanographic data
• Collecting requirements for add-in from scientists, data centers, libraries
Funders: Gordon and Betty Moore Foundation, Microsoft Research
78. Why are you promoting Excel?
• Everyone uses it
• Stopgap measure
Funders: Gordon and Betty Moore Foundation, Microsoft Research
79. www.dataone.org
• Data Education Tutorials
• Database of best practices & software tools
• Links to DMPTool
• Primer on data management
From Flickr by Robert Hruzek
82. Handy References
• Hook, Santhana Vannan, Beaty, Cook, & Wilson. September 2010. Best Practices for Preparing Environmental Data Sets to Share and Archive. http://daac.ornl.gov/PI/BestPractices-2010.pdf
• Borer, Seabloom, Jones, & Schildhauer. April 2009. Some Simple Guidelines for Effective Data Management. Bull Ecol Soc Amer: 205-214.