Data curation issues Michelle Hudson SCOPA Forum 9.21.11
What is data?
Definition varies by discipline and can include experimental, observational, and computational data.
In general, “research data” refers to raw or processed products of a research project.
These products can be video, images, or numeric files in the form of geographic information, spreadsheets, and other formats.
What is data curation?
“Data curation is the active and ongoing management of research data through its lifecycle of interest and usefulness to scholarship, science, and education.” – Carole Palmer, UIUC GSLIS
“Curation” includes selection, appraisal, maintenance, and preservation.
Why is data curation important for us?
According to Paul F. Uhlir, Director of the Board on Research Data and Information, researchers are “contributing to a networked information enterprise where data are a fundamental infrastructural component of the modern research system.”
Increasingly, data itself is a product and record of scholarship.
Some problems that make curation difficult
No standards.
Lack of interoperability.
Controlled vocabularies are missing.
Storage space is limited.
Domain of stewardship/responsibility is unclear.
Individual repositories make silos of content.
Ideas for solutions!
Experiment tracking software and electronic notebooks.
Automatic metadata.
Integrating curation early into the researcher workflow.
Educating graduate students on proper data management.
DataONE and the Data Conservancy.
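One way to picture the "automatic metadata" idea is a script that records basic facts about a data file at the moment it is saved, so the researcher does not have to type them in. The sketch below is only an illustration under stated assumptions: the function name extract_basic_metadata and the chosen fields are hypothetical, not part of the talk or of any named tool, and it uses only the Python standard library.

```python
import hashlib
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def extract_basic_metadata(path):
    """Collect metadata that can be read from a file without asking the researcher.

    Hypothetical helper for illustration; the field names are assumptions.
    """
    stat = os.stat(path)
    # A checksum lets a curator later confirm the file has not changed.
    with open(path, "rb") as handle:
        checksum = hashlib.sha256(handle.read()).hexdigest()
    # Guess the format from the file extension; fall back to a generic type.
    mime_type, _ = mimetypes.guess_type(path)
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "filename": os.path.basename(path),
        "size_bytes": stat.st_size,
        "modified_utc": modified.isoformat(),
        "mime_type": mime_type or "application/octet-stream",
        "sha256": checksum,
    }


if __name__ == "__main__":
    # Write a small throwaway spreadsheet-style file and describe it.
    with tempfile.TemporaryDirectory() as folder:
        sample = os.path.join(folder, "observations.csv")
        with open(sample, "wb") as handle:
            handle.write(b"site,count\nA,3\n")
        for key, value in extract_basic_metadata(sample).items():
            print(f"{key}: {value}")
```

Capturing these fields at save time is one small way of integrating curation into the researcher workflow; descriptive metadata such as subject terms would still need a person or a controlled vocabulary.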
Other stuff!
Data citation
Data sharing
Reward models
Identity control (ORCID, EZID)
Semantic web and linked data
Cyberinfrastructure
Questions?
michelle.hudson@yale.edu
203.432.4587
@michellehudson
In person for coffee @ KBT Cafe

data curation issues

  • 1. Data curation issues Michelle Hudson SCOPA Forum 9.21.11
  • 5. What is data? Definition varies by discipline and can include experimental, observational, and computational data. In general “research data” refers to raw or processed products of a research project. These products can be video, images, or numeric files in the form of geographic information, spreadsheets, and other formats.
  • 8. What is data curation? “Data curation is the active and ongoing management of research data through its lifecycle of interest and usefulness to scholarship, science, and education.” – Carole Palmer, UIUC GSLIS “Curation” includes selection, appraisal, maintenance, preservation.
  • 11. Why is data curation important for us? According to Paul F. Uhlir, Director of the Board on Research Data and Information, researchers are “contributing to a networked information enterprise where data are a fundamental infrastructural component of the modern research system.” Increasingly, data itself is a product and record of scholarship.
  • 18. Some Problems that make curation difficult. No standards. Lack of interoperability. Controlled vocabularies are missing. Storage space is limited. Domain of stewardship/responsibility is unclear. Individual repositories make silos of content.
  • 24. Ideas for solutions! Experiment tracking software and electronic notebooks. Automatic metadata. Integrating curation early into the researcher workflow. Educating graduate students on proper data management. DataONE and the Data Conservancy.
  • 25. Other stuff! Data citation; data sharing; reward models; identity control (ORCID, EZID); semantic web and linked data; cyberinfrastructure.
  • 26. Questions? michelle.hudson@yale.edu; 203.432.4587; @michellehudson; in person for coffee @ kbt cafe

Editor's Notes

  1. Hi! I’m Michelle Hudson. I’m the science and social science librarian working in the social science library for now, but I’ll be in the new Center for Science and Social Science Information when we open in January up in the Kline Biology Tower. I started at Yale in April and I’ve been going to conferences and workshops all summer. Today I’m going to talk about things I learned at the Summer Institute on Data Curation put on by UIUC’s GSLIS and the NSF-funded Princeton Research Data Lifecycle Workshop.
  2. Carole Palmer is Director of CIRSS, the Center for Informatics Research in Science & Scholarship.
  3. Data can take many other forms as well, too many to list.
  4. No standards for storage or metadata either within or across disciplines in many cases.
  5. Poor interoperability makes it difficult to maintain or convert data to appropriate formats – open source software and standards are lacking.
  6. Controlled vocabularies are missing, even simple ones that would standardize how the outcomes of experiments are described. People don’t talk about things the same way.
  7. Some data are cheaper to generate than to store. That’s not to say they’re cheap to generate – it’s just that adequate backed-up storage is still unreasonably expensive for most research enterprises.
  8. Domain of stewardship/responsibility: whose job is it to keep university assets for thirty, forty years or more? Who can give this authority and who is willing to fill the gap? As a discipline, we don’t have liaison models and services to researchers in place. Is it the library? Is it IT?
  9. So Cornell has a repository with a lot of great data they’re willing to share. This isn’t helpful unless we know data sets are stored there. It’s even less helpful if we can’t access them. This is true for all kinds of universities and centers all over the world.
  10. Utilizing software that tracks and records the progress of experiments and keeps data as it happens is something that’s gaining more popularity. Every scientist uses a notebook, and, until recently, these notebooks were paper and shelved somewhere inaccessible or forgotten once the project was done. With electronic notebooks, it’ll be easier to keep track of what’s done to the data, when, etc.
  11. Automatic metadata application came up as a main point during the Princeton workshop. Scientists want to advocate for instrument makers to give more robust options for creating metadata at the point of data creation/observation.
  12. If we start early in the workflow, it’s easier to get usable data we can decide to select and keep with less work down the line. If data is taken care of early with the aim of keeping it safe for a long time, it would require less work on the part of the curators.
  13. Some institutions are beginning to educate grad students on good data management techniques. Simple data or file management isn’t often taught as part of a research methods scope, so students have been welcoming this information, and see librarians as experts on this kind of thing.
  14. DataONE and the Data Conservancy are two NSF-funded projects with important goals for nation-wide cyberinfrastructure and research data storage. DataONE intends to connect “nodes” of research institutions to make the data they share widely available. They also have a mission to educate students, librarians, and researchers on data management best practices. The Data Conservancy is a model for a large-scale repository for research data. Instances will start rolling out as early as next year and it’ll be interesting to see how it develops and if it’s viable. You can read a lot more about it on their website, which is on your handout.
  15. Feel free to talk to me about anything else related to data as well. Obviously not much time to cover it all today, but let’s meet and chat! Also refer to the handout for good resources on learning more.
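
To make the "automatic metadata" idea from note 11 concrete, here is a minimal sketch of recording metadata at the moment a data file is saved, rather than reconstructing it later. Everything in it is an illustrative assumption and not something from the talk or the Princeton workshop: the function names `describe_file` and `write_sidecar`, the `.meta.json` sidecar convention, and the choice of fields (checksum, size, timestamp, creator, instrument). A real instrument or repository would likely follow a discipline-specific metadata standard instead.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path, creator, instrument):
    """Build a small metadata record for one data file (hypothetical field set)."""
    path = Path(path)
    data = path.read_bytes()
    stat = path.stat()
    return {
        "filename": path.name,
        "size_bytes": stat.st_size,
        # A checksum lets a curator verify years later that the file is unchanged.
        "sha256": hashlib.sha256(data).hexdigest(),
        "modified_utc": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        "creator": creator,
        "instrument": instrument,
    }


def write_sidecar(path, creator, instrument):
    """Write the record next to the data file as '<name>.meta.json' and return its path."""
    record = describe_file(path, creator, instrument)
    sidecar = Path(str(path) + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Calling `write_sidecar("run1.csv", "Example Researcher", "example-sensor")` right after an instrument or script saves `run1.csv` would leave a `run1.csv.meta.json` file beside it, so the description travels with the data from the start of the workflow.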