Data Management: A Scientist’s Perspective. Carly Strasser, California Digital Library, University of California Curation Center. University of Florida Libraries, August 2012. (Image: From Calisphere via California State University Libraries, ark:/13030/c818356g)
(Images: C. Strasser; Courtesy of WHOI)
(Images: C. Strasser; North Atlantic right whale mother and calf, by Gill Braulik under Permit No. 655-1652)
Roadmap
1. A brief history of data collection
2. The world of data
3. The Fallout
4. Barriers
5. Landscape
(Image: C. Strasser)
A Brief History of Data Collection. Or… how scientists came to be so bad at data management (Image: From Calisphere via Santa Clara University, ark:/13030/kt696nc7j2)
The lab/field notebook: Curie, Newton, Darwin, Da Vinci (Image: classicalschool.blogspot.com)
The lab/field notebook (Image: From Calisphere via Fullerton College, ark:/13030/kt5c60273t)
The lab/field notebook…? (Images: From Flickr by DW0825; From Flickr by Flickmor; From Flickr by deltaMike; www.woodrow.org; C. Strasser; Courtesy of WHOI; From Flickr by US Army Environmental Command)
Digital data (Images: From Flickr by DW0825; From Flickr by Flickmor; From Flickr by deltaMike; www.woodrow.org; C. Strasser; Courtesy of WHOI; From Flickr by US Army Environmental Command)
The Long Tail [Figure: long-tail curve; axis labels: size of dataset, grant ($); # datasets, # researchers, # grants]
The Long Tail [Figure: histogram of NSF DEB awards, 2005-2010, n = 1234; x-axis: Award Amount (millions of dollars), 0.1 to >2.5; y-axis: Number of Awards, 0 to 300] Hampton et al., In press, Frontiers in Ecology and Evolution
UGLY TRUTH
Many (most?) researchers…
• are not taught data management
• don’t know what metadata are
• can’t name data centers or repositories
• don’t share data publicly or store it in an archive
• aren’t convinced they should share data
(Image: 5shortessays.blogspot.com)
Information Entropy Fig. 1 of Michener et al. 1997
How bad can it be? (Image: From Calisphere via San Jose Public Library)
The Fallout: Where data end up [Figure: Data, Metadata; recreated from Klump et al. 2006] (Images: From Flickr by diylibrarian; www blog.order2disorder.com; From Flickr by csessums)
The Fallout Data Reuse Data Sharing Data Management
100 NSF DEB awards, 2005-2009; one paper from each award
Is data produced or reused? Produced: 57% (37); Reused: 8% (5); Both: 35% (23)
Is the data produced shared? Shared all: 28% (17); Shared some: 15% (9); Shared none: 57% (34)
Where are data shared? GenBank or TreeBase: 81% (21); Elsewhere: 19% (5)
Hampton et al., In press, Frontiers in Ecology and Evolution
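The percentages reported on this slide can be re-derived from the counts given in parentheses. A minimal sketch in Python, assuming each count is the number of papers in that category; the `COUNTS` dictionary and the `percentages` helper are illustrative names, not part of the original analysis:

```python
# Counts as reported on the slide (Hampton et al., in press).
# Assumption: each count is the number of papers in that category.
COUNTS = {
    "Is data produced or reused?": {"Produced": 37, "Reused": 5, "Both": 23},
    "Is the data produced shared?": {"Shared all": 17, "Shared some": 9, "Shared none": 34},
    "Where are data shared?": {"GenBank or TreeBase": 21, "Elsewhere": 5},
}


def percentages(counts):
    """Convert category counts to whole-number percentages of their total."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


for question, counts in COUNTS.items():
    print(question, percentages(counts))
```

Each question has its own denominator: 65 papers for the first, 60 for the second, and 26 for the third, which matches the 17 + 9 papers that shared all or some of their data.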
Why? Barriers to Data Stewardship From Flickr by iowa_spirit_walker
Barriers: Cost (Images: From Flickr by indigoprime; From Flickr by kobiz7; C. Strasser)
Barriers: Sociocultural. Not the norm (Image: From Flickr by freefotouk)
Barriers: Sociocultural Not the norm Lack of / too many standards
Barriers: Sociocultural. Not the norm; Lack of / too many standards; Disparate data (Images: From Flickr by toucanradio; From Flickr by Chris Campbell)
Barriers: Sociocultural. Not the norm; Lack of / too many standards; Disparate data; Lack of training (Image: From Flickr by uniinnsbruck)
Barriers: Sociocultural. Loss of rights or benefits; Missed opportunities; Conflict; Misuse (Images: From Flickr by Christina Ann VanMeter; From Flickr by pnh; From Flickr by tymesynk)
Barriers: Sociocultural. Lack of incentives; Time consuming & expensive; No requirements; Reward structure (Image: From Flickr by bthomso)
But what about the next generation? (Image: From Flickr by Marquette University)
Are Undergrads Learning About Data Management?
• Metadata generation
• Software choice
• File naming
• QAQC
• Backing up
• Workflows
• Data sharing
• Data re-use
• Meta-analysis
• Reproducibility
• Notebook protocols
• Databases
[Figure: Important (y-axis, 0 to 40) versus Assessed (x-axis, 0 to 40)]
If it’s important, why isn’t it taught?
Are Undergrads Learning About Data Management? Barriers: Too advanced; Not a priority; Not appropriate level; Students don’t know software; Time; No Lab; No training; Covered in Lab; Too big
NSF DMP Requirements
From Grant Proposal Guidelines: DMP supplement may include:
1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project
2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
4. policies and provisions for re-use, re-distribution, and the production of derivatives
5. plans for archiving data, samples, and other research products, and for preservation of access to them
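One way to use the five components listed on this slide is as a checklist while drafting a plan. A minimal sketch in Python, assuming a draft is kept as a dictionary of section text; the key names, the `missing_components` helper, and the sample draft are hypothetical and are not part of any NSF tool or template:

```python
# Short labels for the five components in the guidelines excerpt.
# The keys are illustrative names chosen for this sketch.
DMP_COMPONENTS = {
    "data_types": "Types of data, samples, collections, software, and other materials produced",
    "standards": "Standards for data and metadata format and content",
    "access_sharing": "Policies for access and sharing, including privacy and IP protections",
    "reuse": "Policies and provisions for re-use, re-distribution, and derivatives",
    "archiving": "Plans for archiving and for preserving access to research products",
}


def missing_components(draft):
    """Return the component keys that the draft omits or leaves blank."""
    return [key for key in DMP_COMPONENTS if not draft.get(key, "").strip()]


# Hypothetical draft with three components still missing.
draft = {
    "data_types": "Field counts and water temperature logs in CSV files.",
    "standards": "Metadata described with a community standard.",
    "access_sharing": "",
}

print(missing_components(draft))  # ['access_sharing', 'reuse', 'archiving']
```

The check only confirms that each component has some text. Whether that text is adequate is left to reviewers.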
NSF’s Vision*
DMPs and their evaluation will grow & change over time (similar to broader impacts)
Peer review will determine next steps
Community-driven guidelines
– Different disciplines have different definitions of acceptable data sharing
– Flexibility at the directorate and division levels
– Tailor implementation of DMP requirement
Evaluation will vary with directorate, division, & program officer
*Unofficially
Help from Jennifer Schopf, NSF
Individual Challenges
• What is a data management plan?
• Will I get credit for my work?
• What is metadata?
• What tools do I use?
• Are there standards?
• How much will it cost?
• Who can help me?
• Where do I preserve my data?
• How do I preserve my data?
[Figure: data life cycle: Collect, Assure, Describe, Deposit, Preserve, Discover, Integrate, Analyze]
NSF-funded DataNet Project, Office of Cyberinfrastructure. Community; Cyberinfrastructure; Engagement & Outreach (Courtesy of DataONE)
• What role can libraries play in data education?
• What barriers to sharing can we eliminate?
• Why don’t people share data?
• Is data management being taught?
• Do attitudes about sharing differ among disciplines?
• How can we promote storing data in repositories?