A very short, very minimal presentation I prepared for the Yale Libraries' SCOPA event to introduce librarians in diverse disciplines to the concepts and challenges of data curation.
5. What is data? Definition varies by discipline and can include experimental, observational, and computational data. In general, “research data” refers to the raw or processed products of a research project. These products can be video, images, or numeric files in the form of geographic information, spreadsheets, and other formats.
8. What is data curation? “Data curation is the active and ongoing management of research data through its lifecycle of interest and usefulness to scholarship, science, and education.” – Carole Palmer, UIUC GSLIS. “Curation” includes selection, appraisal, maintenance, and preservation.
11. Why is data curation important for us? According to Paul F. Uhlir, Director of the Board on Research Data and Information, researchers are “contributing to a networked information enterprise where data are a fundamental infrastructural component of the modern research system.” Increasingly, data itself is a product and record of scholarship.
18. Some problems that make curation difficult: No standards. Lack of interoperability. Missing controlled vocabularies. Limited storage space. Unclear domain of stewardship/responsibility. Individual repositories become silos of content.
24. Ideas for solutions! Experiment tracking software and electronic notebooks. Automatic metadata. Integrating curation early into the researcher workflow. Educating graduate students on proper data management. DataONE and the Data Conservancy.
25. Other stuff! Data citation. Data sharing. Reward models. Identity control (ORCID, EZID). Semantic web and linked data. Cyberinfrastructure.
Hi! I’m Michelle Hudson. I’m the science and social science librarian, working in the Social Science Library for now, but I’ll be in the new Center for Science and Social Science Information when we open in January up in the Kline Biology Tower. I started at Yale in April and have been going to conferences and workshops all summer. Today I’m going to talk about things I learned at the Summer Institute on Data Curation, put on by UIUC’s GSLIS, and at the NSF-funded Princeton Research Data Lifecycle Workshop.
Director of CIRSS (Center for Informatics Research in Science & Scholarship).
As well as many other things – too many to list.
In many cases there are no standards for storage or metadata, either within or across disciplines.
Poor interoperability makes it difficult to maintain or convert data to appropriate formats – open source software and standards are lacking.
Even simple controlled vocabularies to standardize the outcomes of experiments are missing – people don’t talk about the same things the same way.
Some data are cheaper to generate than to store. That’s not to say they’re cheap to generate – it’s just that adequate backed-up storage is still unreasonably expensive for most research enterprises.
Domain of stewardship/responsibility: whose job is it to keep university assets for thirty, forty years, or more? Is it the library? Is it IT? As a profession, we don’t have liaison models and services to researchers in place for this. Who can grant this authority, and who is willing to fill the gap?
So Cornell has a repository with a lot of great data they’re willing to share. This isn’t helpful unless we know data sets are stored there. It’s even less helpful if we can’t access them. This is true for all kinds of universities and centers all over the world.
Using software that tracks and records the progress of experiments and captures data as it is produced is gaining popularity. Every scientist uses a notebook, and, until recently, these notebooks were paper, shelved somewhere inaccessible or forgotten once the project was done. With electronic notebooks, it’s much easier to keep track of what was done to the data, and when.
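As a toy illustration of what an electronic notebook records, here is a minimal sketch of an append-only provenance log. It is not based on any particular notebook product; the `log_step` function and its field names are invented for this example.

```python
import json
from datetime import datetime, timezone

def log_step(logfile, action, inputs, outputs, note=""):
    """Append one processing step to a JSON Lines provenance log.

    A stand-in for what an electronic lab notebook captures:
    what was done to the data, when, and to which files.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "inputs": inputs,
        "outputs": outputs,
        "note": note,
    }
    # Append-only: earlier steps are never overwritten, so the
    # file reads as a chronological history of the experiment.
    with open(logfile, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

Because each line is a complete JSON record, a curator can replay the history of a dataset years later without any special software.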
Automatic metadata application came up as a main point during the Princeton workshop. Scientists want to advocate for instrument makers to give more robust options for creating metadata at the point of data creation/observation.
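One way to picture “metadata at the point of creation” is a small script that writes a metadata sidecar file whenever an instrument saves data. This is only a sketch: the field names, the `write_sidecar` function, and the `.meta.json` convention are assumptions for illustration, not any instrument vendor’s actual API or a community standard.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def write_sidecar(data_path, instrument, operator, extra=None):
    """Write a JSON metadata sidecar next to a data file at creation time.

    Field names are illustrative; a real deployment would follow a
    community metadata standard instead.
    """
    path = Path(data_path)
    record = {
        "file": path.name,
        "size_bytes": path.stat().st_size,
        # A checksum lets curators verify file integrity later.
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "instrument": instrument,
        "operator": operator,
    }
    if extra:
        record.update(extra)
    sidecar = path.parent / (path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar
```

The point is that none of this requires the researcher to fill in a form after the fact: the richest, most accurate metadata is available automatically at the moment the data is created.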
If we start early in the workflow, it’s easier to get usable data that we can select and keep with less work down the line. If data is taken care of early, with the aim of keeping it safe for a long time, it requires less work on the part of the curators.
Some institutions are beginning to educate graduate students on good data management techniques. Simple data and file management isn’t often taught as part of research methods courses, so students have been welcoming this information, and they see librarians as experts on this kind of thing.
DataONE and the Data Conservancy are two NSF-funded projects with important goals for nationwide cyberinfrastructure and research data storage. DataONE intends to connect “nodes” of research institutions to make the data they share widely available. It also has a mission to educate students, librarians, and researchers on data management best practices. The Data Conservancy is a model for a large-scale repository for research data. Instances will start rolling out as early as next year, and it’ll be interesting to see how it develops and whether it’s viable. You can read a lot more about it on their website, which is on your handout.
Feel free to talk to me about anything else related to data as well. Obviously not much time to cover it all today, but let’s meet and chat! Also refer to the handout for good resources on learning more.