
Introduction to Scientific Data Stewardship Maturity Matrix




An introduction, with high-level background information, to the scientific data stewardship maturity matrix.

What's new in this version: updated reference list for maturity assessment models and applications.

As SlideShare has disabled the re-upload feature, the latest version will be maintained at:

Introduction to Scientific Data Stewardship Maturity Matrix

  1. Introduction to Scientific Data Stewardship Maturity Matrix: A Unified Framework for Measuring Stewardship Practices Applied to Digital Environmental Datasets. Ge Peng, Cooperative Institute for Climate and Satellites – North Carolina (CICS-NC), NC State University, and NOAA’s National Centers for Environmental Information – NC (NCEI-NC; formerly NOAA’s National Climatic Data Center, NCDC). In collaboration with Jeff Privette, Ed Kearns, Nancy Ritchey, and Steve Ansari (NCEI-NC/NOAA). Version: 09/15/2016 r2
  2. In This Presentation: • What is scientific data stewardship? What does it mean? • Why should we care? • Why do we need a data stewardship maturity matrix (DSMM)? • Where are we now? • What is the NCEI/CICS-NC Scientific Data Stewardship Maturity Matrix? • How did we get to where we are? • Who could use the DSMM? What are the ways to use it? • Putting maturity assessment into perspective • What to do next? An overview of the scientific data stewardship maturity assessment model with high-level background information on
  3. What Is Scientific Data Stewardship? • Activities to preserve or improve the information content, accessibility, and usability of environmental data and metadata (National Research Council, 2007) • Data quality screening/assurance/control/evaluation/assessment/monitoring: activities to ensure or improve the quality and usability of geoscience data and products. To ensure data are always meaningful and trustworthy. (Diagram elements: common data format; spatial and temporal characteristics; uncertainty estimates.)
  4. What Does Scientific Data Stewardship Mean? Ensure your data are: • preserved and secure • available, discoverable, and accessible • credible and understandable • usable and useful • sustainable and extendable • citable and traceable. Version: 20141017 Rev. 2.2 POC:
  5. Why Should We Care? • The quality of data, and what is being done with and to data, matter! • Knowing stewardship maturity is essential to making informed, actionable, and efficient data management decisions!
  6. Why Do We Need a Data Stewardship Maturity Matrix? Problem: Most data centers currently cannot readily convey, or even assess, the level of stewardship practices for their stakeholders or customers. No community scorecard exists. Hypothetical questions to a data center: 1. Congress: Are your datasets compliant with the U.S. Data Quality Act? If not, then what? 2. Business: Is your product credible? Readily accessible in a common data format? Sustainable? 3. Modelers: Is the quality of a routinely updated product being assessed? • This is a vulnerability, and an opportunity! • The value and quality of a dataset depend, in part, on the stewardship practices applied after its production. Solution: Define a stewardship maturity matrix to assess the stewardship practices applied to individual data products.
  7. Where Are We Now? • A stewardship maturity matrix for individual digital environmental datasets: baselined • A paper: published in a peer-reviewed journal with free online access (Peng et al., 2015: doi:10.2481/dsj.14-049)
  8. What Is the NCEI/CICS-NC Scientific Data Stewardship Maturity Matrix (DSMM)? A unified framework for measuring stewardship practices applied to individual digital Earth science data products that are publicly available online, leveraging institutional knowledge and community best practices and standards.
  9. DSMM Defines Measurable, Five-Level Progressive Practices in Nine Quasi-Independent Key Components (Data system integrity is also very important but is not included in the matrix because of potential security risks to the system.)
  10. The Scope of Stewardship Practices • Those applied to individual datasets: measurable and progressive • Those associated with the functional entities of the Open Archival Information System (OAIS) reference model (the shaded box in the OAIS functional diagram; CCSDS, 2012, CCSDS 650.0-M-2)
  11. How Did We Get Here? Pathway to Identify Key Components and Define Levels of the Stewardship Maturity Matrix. • Policies, processes, procedures/standards, and tasks, e.g.: U.S. laws, agencies’ guidelines, and experts’ recommendations; research to operations, data/metadata management, data application, and data preservation; data governance, data provenance, and data quality assessment; evaluate product, verify file checksum, create metadata, and monitor data quality. • Also considered: non-functional requirements, functional core areas, and community practices. • Key matrix components must be: relevant, measurable, progressive, and quasi-independent.
  12. DSMM Follows the CMMI Level Structure • Level 1: Ad Hoc (not managed) • Level 2: Minimal (limited management) • Level 3: Intermediate (managed; community good practices) • Level 4: Advanced (well managed; community best practices) • Level 5: Optimal (well managed; measured, controlled, audited). Reference maturity level structures: • Capability Maturity Model Integration (CMMI) • Levels of maturity of digital repositories. (The diagram also marks the recommended level for online operational products stewarded by national data centers.)
  13. Overarching Goals: assess, convey, and chart a path forward, while staying general, simple, and concise. Do not reinvent the wheel; leverage: • NCEI subject matter experts (SMEs) (institutional knowledge) • Community-accepted good and best practices and standards • SMEs from national and international communities
  14. Who Could Use the Matrix? • Data providers and scientific stewards: to evaluate and improve the quality and usability of their products against community best practices • Modelers, decision-support system users, and scientists: to improve their products and uncertainty estimates; to make investment and usage decisions • Data managers/stewards of data centers and repositories: to validate their compliance (or lack thereof) with community-accepted stewardship practices or standards; to assess the current state; to create a roadmap for improving the stewardship maturity of the practices applied to a given product or to all of their holdings • General data users: to make an educated choice when selecting or using a dataset
  15. Ways to Utilize DSMM & Assessment Results • To know the current state of your dataset(s): maturity assessment (stewardship maturity scoreboard) • To know where you want or need to be: stewardship requirements • To know how to get there: roadmap forward (informed, actionable steps) • A reference model for stewardship planning and resource allocation: informed decision-making support • A consolidated, transparent source of information about stewardship practices: assessment with detailed justifications • Content-rich quality metadata: enhanced discoverability and usability. (Diagram: stewardship maturity scoreboard and roadmap forward, from the current state to the need-to-be state.)
  16. Putting Maturity Assessment into Perspective
  17. Tiers of Maturity Assessment within the Context of Scientific Data Stewardship • Organizations (capability): repository procedures maturity (e.g., ISO 16363:2012, trustworthiness); repository processes maturity (e.g., CMMI Data Management Maturity) • Portfolios (asset management): asset management maturity (e.g., National Geospatial Data Asset Lifecycle Maturity Model (FGDC, 2016)) • Individual datasets (practices): stewardship practices maturity (e.g., NCEI/CICS-NC Data Stewardship Maturity Matrix (Peng et al., 2015))
  18. Individual Dataset Maturity Assessment within the Context of Dataset Lifecycle Stages: An End-to-End, Consistent, Integrated Maturity Matrix Suite • Science (define/develop/validate): science maturity matrix • Product (create/evaluate/obtain): product maturity matrix • Stewardship (maintain/preserve/access): stewardship maturity matrix (Peng et al., 2015) • Use/user service: service maturity matrix (NCEI MM-Serv WG, 2017). Other dataset maturity models: Bates and Privette (2012); EUMETSAT (2013; 2015); Zhou et al. (2016). Goal: a consistent measure of product, stewardship, and service maturity. (See Peng et al. (2016a) for an overview of the current state of dataset-centric maturity assessment models.)
  19. Communities Are Interested in This Subject! • Introduction to Stewardship Maturity Matrix: 1598 views globally since first upload in July 2014 • Data Stewardship Maturity Matrix: 976 views globally since first upload in July 2014 • DSMM Self-Assessment Template: 465 downloads since first upload in February 2015 (View and download metrics as of 9/15/2016.)
  20. What To Do Next? • ESIP (Federation of Earth Science Information Partners) Data Stewardship Committee: ensure consistent application and implementation of DSMM across agencies and potentially obtain committee endorsement (e.g., Downs et al., 2015) • EUMETSAT: provide a common stewardship assessment framework for NOAA and EUMETSAT satellite Climate Data Records (CDRs) • OMB A-16 NGDA portfolio lifecycle maturity assessment model working group: potentially integrate DSMM into their portfolio assessment model • Use case studies (NCEI, ESIP, NSIDC, NCAR, DataONE, CSIRO, etc.): apply and refine DSMM and define roles and responsibilities for assessment (e.g., Ritchey and Peng, 2015; Hou et al., 2015; Peng et al., 2016b,c) • Decision-support tools (NOAA OSD & TRIO, CICS-NC, NCEI): assess, display, and integrate content-rich quality information more systematically (e.g., Austin and Peng, 2015; Ritchey et al., 2016; Zinn et al., 2017).
  21. What Is Good Scientific Data Stewardship? Make it easier for users: • to trust your data • to find your dataset(s) • to get your data files • to understand your data • to learn the quality of your data • to use your data • to integrate your data. Version: 20141017 Rev. 2.1 POC:
  22. Acknowledgements: This work benefited greatly from input and feedback from many people at, or affiliated with, NCEI-NC and other data centers and agencies. We appreciate the support and guidance of NCEI-NC (formerly NCDC), CICS-NC, CDR Program, RSAD, and Product Branch management.
  23. NCEI-NC Informal Focus Groups • Data Preservability: Nancy Ritchey, Ed Kearns, Drew Saunders, Jason Cooper, Ge Peng • Data Accessibility/Usability: Steve Ansari, Drew Saunders, John Keck, John Stachniewicz, Philip Jones, Jay Morris, Louis Vasquez, Christina Lief, Jeff Privette, Ge Peng • Data Integrity/Security: Scott Koger, Jason Symonds, David Bowman, Ryan Nelson, Steve Ansari, Ed Kearns, Ken Schmidt, Ge Peng • Production Sustainability: Jeff Privette, Walter Jesse Glance, Ken Knapp, Tom Zhao, Ge Peng • Data Quality: Jeff Privette, Richard Kauffold, Otis Brown, Ken Knapp, Bryant Cramer, Ed Kearns, Ge Peng • Transparency/Traceability: Ana Privette, Drew Saunders, Ge Peng • User Requirements: Sam McCown, Jeff Robel, Derek Arndt, Jenny Dissen, Ge Peng
  24. We Would Like to Thank Them All! Special thanks to Jeff Privette, Ed Kearns, Nancy Ritchey, Steve Ansari, Ken Knapp, Drew Saunders, John Keck, Scott Koger, John Bates, Otis Brown, Bryant Cramer, Richard Kauffold, Linda Copley, Phil Jones, Daniel Wunder, Terry McPherson, Dan Kowal, Ken Casey, Grace Peng, Ruth Duerr, Donna Scott, Matthew Austin, Ana Privette, and the NCEI-NC Metadata Working Group.
  25. Would you like to learn more? Could you contribute? • Contact us at or • Register at or
  26. References
      Austin, M., and G. Peng, 2015: A prototype for content-rich decision-making support in NOAA using data as an asset. Poster IN21A-1676, 2015 AGU Fall Meeting, 14–18 December 2015, San Francisco, CA, USA.
      Bates, J. J., and J. L. Privette, 2012: A maturity model for assessing the completeness of climate data records. Eos, Transactions of the AGU, 93(44), 441.
      CCSDS (Consultative Committee for Space Data Systems), 2012: Reference Model for an Open Archival Information System (OAIS). Recommended Practice, Issue 2, CCSDS 650.0-M-2, 135 pp.
      DAMA International, 2010: Guide to the Data Management Body of Knowledge (DAMA-DMBOK). M. Mosley, M. Brackett, and S. Earley, Eds., Technics Publications, LLC, New Jersey, USA, 2nd print edition, 406 pp.
      Downs, R. R., R. Duerr, D. J. Hills, and H. K. Ramapriyan, 2015: Data stewardship in the Earth sciences. D-Lib Magazine, 21, doi:10.1045/july2015-downs.
      EUMETSAT, 2013: CORE-CLIMAX Climate Data Record Assessment Instruction Manual. Version 2, 25 November 2013.
      EUMETSAT, 2015: GAIA-CLIM Measurement Maturity Matrix Guidance: Gap Analysis for Integrated Atmospheric ECV Climate Monitoring: Report on system of systems approach adopted and rationale. Version: 27 November 2015.
      FGDC, 2016: National Geospatial Data Asset (NGDA) Lifecycle Maturity Assessment (LMA) 2015 Report: Analysis and Recommendations. Version: 8 December 2016.
      Hou, C.-Y., M. Mayernik, G. Peng, R. Duerr, and A. Rosati, 2015: Assessing information quality: Use case studies for the data stewardship maturity matrix. Poster IN21A-1675, 2015 AGU Fall Meeting, 14–18 December 2015, San Francisco, CA, USA.
      National Research Council, 2007: Environmental Data Management at NOAA: Archiving, Stewardship, and Access. The National Academies Press, Washington, D.C., 116 pp.
      NCEI MM-Serv WG (Use/Service Maturity Matrix Working Group), 2017: A reference framework for assessing service maturity of digital environmental datasets. Under development.
  27. References (continued)
      Peng, G., J. L. Privette, E. J. Kearns, N. A. Ritchey, and S. Ansari, 2015: A unified framework for measuring stewardship practices applied to digital environmental datasets. Data Science Journal, 13, 231–253, doi:10.2481/dsj.14-049.
      Peng, G., H. Ramapriyan, and D. F. Moroni, 2016a: The state of building a consistent framework for curation and presentation of Earth science data quality. Poster IN41C-1666, 2016 AGU Fall Meeting, 12–16 December 2016, San Francisco, CA, USA.
      Peng, G., N. A. Ritchey, K. S. Casey, E. J. Kearns, J. L. Privette, D. Saunders, P. Jones, T. Maycock, and S. Ansari, 2016b: Scientific stewardship in the Open Data and Big Data era: Roles and responsibilities of stewards and other major product stakeholders. D-Lib Magazine, 22, doi:10.1045/may2016-peng.
      Peng, G., J. Lawrimore, V. Toner, C. Lief, R. Baldwin, N. Ritchey, and D. Bringar, 2016c: Assessment of stewardship maturity of the Global Historical Climatology Network-Monthly (GHCN-M) dataset and lessons learned. D-Lib Magazine, 22, doi:10.1045/nov2016-peng.
      Ritchey, N., and G. Peng, 2015: Assessing stewardship maturity: Use case study results and lessons learned. IN14A-05, 2015 AGU Fall Meeting, 14–18 December 2015, San Francisco, CA, USA.
      Ritchey, N. A., G. Peng, A. Milan, P. Lemieux, R. Partee, R. Lonin, and K. S. Casey, 2016: Practical application of the data stewardship maturity model for NOAA’s OneStop project. IN42D-08, 2016 AGU Fall Meeting, 12–16 December 2016, San Francisco, CA, USA.
      Zhou, L. H., M. Divakarla, and X. P. Liu, 2016: An overview of the Joint Polar Satellite System (JPSS) science data product calibration and validation. Remote Sensing, 8(2), doi:10.3390/rs8020139.
      Zinn, S., J. Relph, G. Peng, A. Milan, and A. Rosenberg, 2017: Design and implementation of automation tools for DSMM diagrams and reports. Invited talk, ESIP 2017 Winter Meeting, 11–13 January 2017, Bethesda, MD, USA.
  28. A self-assessment template using the latest DSMM is available at:
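
The five-level structure (slide 12) and the "current state versus need-to-be" scoreboard and roadmap (slide 15) can be illustrated with a small data model. The Python sketch below is hypothetical throughout. The names `Level`, `ComponentRating`, and `roadmap`, the rule of ordering the roadmap by largest gap first, and the sample ratings are illustrative assumptions. They are not the DSMM self-assessment template or any published NCEI tool. The component names borrow the focus-group topics from slide 23 rather than the matrix's own list of nine key components.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Five progressive maturity levels; short labels paraphrase slide 12."""
    AD_HOC = 1        # not managed
    MINIMAL = 2       # limited management
    INTERMEDIATE = 3  # managed; community good practices
    ADVANCED = 4      # well managed; community best practices
    OPTIMAL = 5       # well managed; measured, controlled, audited


@dataclass(frozen=True)
class ComponentRating:
    """Hypothetical scoreboard row: one component, its current and required level."""
    component: str
    current: Level
    required: Level

    @property
    def gap(self) -> int:
        # Number of levels still to climb; exceeding the requirement counts as 0.
        return max(0, self.required - self.current)


def roadmap(scoreboard: list[ComponentRating]) -> list[tuple[str, int]]:
    """Components below their required level, largest gap first (ties sorted by name)."""
    behind = [row for row in scoreboard if row.gap > 0]
    behind.sort(key=lambda row: (-row.gap, row.component))
    return [(row.component, row.gap) for row in behind]


if __name__ == "__main__":
    # Made-up ratings for illustration only; not results from any real assessment.
    scoreboard = [
        ComponentRating("Preservability", Level.ADVANCED, Level.ADVANCED),
        ComponentRating("Accessibility", Level.MINIMAL, Level.ADVANCED),
        ComponentRating("Production Sustainability", Level.INTERMEDIATE, Level.ADVANCED),
        ComponentRating("Transparency/Traceability", Level.AD_HOC, Level.INTERMEDIATE),
    ]
    # Prints Accessibility (2), Transparency/Traceability (2), then
    # Production Sustainability (1), one per line.
    for component, gap in roadmap(scoreboard):
        print(f"{component}: raise by {gap} level(s)")
```

Capping each gap at zero keeps a component that already exceeds its requirement from masking shortfalls elsewhere. A real assessment would also record the detailed justification behind each rating, which this sketch omits.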