The Metadata [R]evolution: Transformative Opportunities
September 18, 2013

Some Ideas on Making Research Data: "It's the Metadata, stupid!"

Talk at OCLC Collective Insight symposium, Johns Hopkins, Baltimore, MD, September 18, 2013

Transcript of "Some Ideas on Making Research Data: "It's the Metadata, stupid!""

  1. The Metadata [R]evolution: Transformative Opportunities, September 18, 2013. Some Ideas on Making Research Data Discoverable and Usable: “It’s the Metadata, Stupid!” Anita de Waard, VP Research Data Collaborations, Elsevier Research Data Services (VT)
  2. Everybody’s talking about research data. Roles and needs wrt Research Data:
     Roles: Gov; Funding bodies; University management; Researchers; Librarians; Data Repositories
     Needs:
     • Share research outputs; Demonstrate impact to public; Data availability drives growth
     • Demonstrate impact; Guarantee permanence, discoverability; Avoid fraud
     • Generate, track outputs; Comply with mandates; Ensure availability
     • Archive, track, curate; Support researcher/institution
     • Archive; Add curation; Allow reuse
     • Derive credit; Comply with mandates; Discover and use; Cite/acknowledge
     Quotes:
     • Todd Vision, DataDryad, OAI8, 6/23/13: “We need to find a way to keep Dryad funded, and would love to hear your ideas about doing that.”
     • Phil Bourne, Associate Vice Chancellor, UCSD, 4/13: “We are thinking about the university as a digital enterprise.”
     • Mike Huerta, Assoc. Director, NLM Office of Health Info at NIH, 6/13: “Today, the major public product of science are concepts, written down in papers. But tomorrow, data will be the main product of science…. We will require scientists to track and share their data at least as well, if not better, than they are sharing their ideas today.”
     • Mara Saule, Dean University Libraries/CIO, UVM, 5/13: “We need to do something about data.”
     • Nathan Urban, PI Urban Lab, CMU, 3/13: “If we can share our data, we can write a paper that will knock everybody’s socks off!”
     • Barbara Ransom, NSF Program Director Earth Sciences, 2/13: “We’re not going to spend any more money for you to go out and get more data! We want you first to show us how you’re going to use all the data we paid y’all to collect in the past!”
  6. Where research data goes now:
     • > 50 My papers; 2 M scientists; 2 My papers/year
     • Majority of data (90%?) is stored on local hard drives
     • Some data (8%?) stored in large, generic data repositories: Dryad: 7,631 files; Dataverse: 0.6 My; Institutional Repositories
     • A small portion of data (1-2%?) stored in small, topic-focused data repositories: MiRB: 25k; PetDB: 1,5 k; TAIR: 72,1 k; PDB: 88,3 k; SedDB: 0.6 k
     Questions:
     • How do we get researchers to curate, store and share their data?
     • How do we ensure long-term sustainability for high-end repositories?
     • What role do libraries/institutions play?
  13. Research data management in action: Using antibodies and squishy bits, grad students experiment and enter details into their lab notebook. The PI then tries to make sense of their slides and writes a paper. End of story.
  14. An attempt to get researchers to curate (but only partially share!) their data (de Waard, A., Burton, S. et al., 2013)
  16. What to do in the meantime:
     [Chart labels: 49 publications, 193 publications, 76 publications, 214 publications, 210 publications]
     • In 220 publications, only 40% of antibodies, 40% of cell lines and 25% of constructs can be manually identified (Vasilevsky et al., submitted)
     • Proposal (with NIH/NIF and Force11 Group):
       – Adding minimal data standards
       – Tool extracts likely reagents/resources
       – User interface asks author to confirm or select
  17. How can research databases become sustainable in the long term?
     1. With IEDA:
        – Building a database for lunar geochemistry
        – Writing a joint report on building a repository, curation costs and challenges
     2. With WDS/RDA WG:
        – Planning survey of cost recovery models
        – Input/inspiration: ICPSR Sloan-funded project ‘Sustaining Domain Repositories for Digital Data’
        – Developing overarching funding model with Todd Vision/DataDryad
  18. Making lunar sample data usable:
  22. A research database funding model (DRAFT - CC-BY-NC 2013, Todd Vision & Anita de Waard)
     [Diagram; recoverable labels:]
     • Data producer or sponsor; Data user; Flow of funds
     • Access: Closed; Limited; Limited Academic Use/Limited; Public
     • Private store; Data publication; Service; Collaboration; Conclave; Subscription content; Commercial overlay
     • Examples: ICPSR, CERN-LHC; KEGG; GeoFacets; Reaxys; many small operations, e.g. try-db.org, plhdb.org; Dryad, arXiv, PDB; commercial and institutional storage
  23. Comparing data repository types:
     • Local data repository. Advantages: Easy! No one steals your data. Disadvantages: No one sees it. Not compliant with requirements.
     • Generic data repository. Advantages: Not very hard to do. Have complied! Disadvantages: Data can’t be easily reused. Credit?
     • Institutional Repository. Advantages: Can use existing IR? Tracking and compliance checks. Disadvantages: Data can’t easily be reused. Credit?
     • Domain-specific data repository. Advantages: Data can be reused. Credit! Disadvantages: Lot of work for curators. Long-term sustainable?
     [Axis labels: Effort, Reuse, Credit, Compliance; Habit, Ease, Privacy, Control; Higher quality metadata]
  24. Metadata madness…
     • Funding Agency; University; Collaborators; Domain of study
     • Domain-Specific Data Repository; Local Data Repository; Institutional Data Repository; Generic Data Repository
     AND THEY ALL WANT DIFFERENT METADATA!!!!
  25. Where do IRs/libraries fit in?
     • Planning series of interviews at key institutions:
       – What role do libraries/institutions play wrt research data management?
       – What tools/metadata standards are used?
       – What aspects of data deposition are the Research Office/IR/Institution interested in?
       – How does this compare with what scientists want and do in their labs?
     • Goal: share knowledge; establish plan of action
  26. Principles of Elsevier RDS:
     • Main goal: make research data optimally available, discoverable and reusable.
     • Collaboration is tailored to partner’s unique needs:
       – Working with a few domain-specific and institutional repositories and institutions
       – Aspects where collaboration is needed are discussed
       – Collaboration plan is drawn up using SLA: agree on time, conditions, etc.
     • 2013: series of pilots, studies and reports to enable feasibility study:
       – What are key needs?
       – Can Elsevier play a role: skillsets, partnerships?
       – Is there a (transparent) business model for this?
  27. In summary: If researchers start to curate and share their data… And research databases become long-term sustainable… … we enable enrichment with high-quality metadata that makes research data truly discoverable and reusable.
     Many questions remain:
     • What role would the institution/library play?
     • How do we ensure interoperable metadata?
     • What are sustainable models, moving forward?
     • Is there a place for publishers in all this?
  28. Thank you! Collaborations and discussions gratefully acknowledged:
     • CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Ed Hovy
     • UCSD: Phil Bourne, Brian Schottlaender, David Minor, Declan Fleming, Ilya Zaslavsky
     • NIF: Maryann Martone, Anita Bandrowski
     • MSU: Brian Bothner
     • OHSU: Melissa Haendel, Nicole Vasilevsky
     • California Digital Library: Carly Strasser, John Kunze, Stephen Abrams
     • Columbia/IEDA: Kerstin Lehnert, Leslie Hsu
     • CNI: Clifford Lynch
     • Harvard: Michael Kurtz, Chris Erdmann
     • MIT: Micah Altman
     • UVM: Mara Saule
  29. Your questions? Anita de Waard, VP Research Data Collaborations, Elsevier Research Data Services (VT), a.dewaard@elsevier.com, http://researchdata.elsevier.com/