A first step towards defining roles and formalizing responsibilities of key players and stakeholders for ensuring and improving data quality and usability of digital Earth Science datasets
Introducing the concept of multi-domain stewards serving as a knowledge and communication hub for effective long-term scientific stewardship of digital environmental data products.
Applying a User-Centered Design Approach to Improve Data Use in Decision Making (MEASURE Evaluation)
This document summarizes the application of a user-centered design approach to improve data use in decision making. Key activities included conducting immersion interviews with data users, holding design workshops to understand barriers and generate ideas, and prototyping solutions. Some prototypes developed included a digital portal for accessing data and policies, a social media platform for communication, and data use scorecards for facilities. The process identified technical, behavioral, and organizational barriers to data use and provided lessons on engaging stakeholders and testing prototypes.
This presentation introduced participants to the DC 101 course and was given at the Digital Curation and Preservation Outreach and Capacity Building Workshop in Belfast on September 14-15, 2009.
http://www.dcc.ac.uk/events/workshops/digital-curation-and-preservation-outreach-and-capacity-building-workshop
This document discusses the fundamentals of data quality management. It begins by introducing the speaker, Laura Sebastian-Coleman, and providing an abstract and agenda for the presentation. The abstract states that while organizations rely on data, traditional data management requires many skills and a strategic perspective. Technology changes have increased data volume, velocity and variety, but veracity is still a challenge. Both traditional and big data must be managed together. The presentation will revisit data quality management fundamentals and how to apply them to traditional and big data environments. Attendees will learn how to assess their data environment and provide reliable data to stakeholders.
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup (Edward Curry)
Data management efforts such as Master Data Management and Data Curation are popular approaches to high-quality enterprise data. However, Data Curation can be heavily centralised and labour-intensive, and its cost and effort can become prohibitively high. The concentration of data management and stewardship in a few highly skilled individuals, such as developers and data experts, can be a significant bottleneck. This talk explores how to effectively involve a wider community of users in big data management activities. The bottom-up approach of involving crowds in the creation and management of data has been demonstrated by projects like Freebase, Wikipedia, and DBpedia. The talk discusses how crowdsourcing data management techniques can be applied within an enterprise context.
Topics covered include:
- Data Quality And Data Curation
- Crowdsourcing
- Case Studies on Crowdsourced Data Curation
- Setting up a Crowdsourced Data Curation Process
- Linked Open Data Example
- Future Research Challenges
Data-Ed: Unlock Business Value through Data Quality Engineering (DATAVERSITY)
This webinar focuses on obtaining business value from data quality initiatives. The presenter will illustrate how chronic business challenges can often be traced to poor data quality. Data quality should be engineered by providing a framework to more quickly identify business and data problems, as well as prevent recurring issues caused by structural or process defects. The webinar will cover data quality definitions, the data quality engineering cycle and complications, causes of data quality issues, quality across the data lifecycle, tools for data quality engineering, and takeaways.
Data-Ed: Unlock Business Value through Data Quality Engineering (Data Blueprint)
Organizations must realize what it means to utilize data quality management in support of business strategy. This webinar focuses on obtaining business value from data quality initiatives. I will illustrate how organizations with chronic business challenges often can trace the root of the problem to poor data quality. Showing how data quality should be engineered provides a useful framework in which to develop an effective approach. This in turn allows organizations to more quickly identify business problems as well as data problems caused by structural issues versus practice-oriented defects, and prevent these from recurring.
You can sign up for future Data-Ed webinars here: http://www.datablueprint.com/resource-center/webinar-schedule/
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ... (DATAVERSITY)
Are you looking to measure Data Quality in a more organized way? Look no further: use the Conformed Dimensions of Data Quality to organize your efforts, improve communication with stakeholders, and track improvement over time. In this webinar, Information Quality practitioner Dan Myers will present the Conformed Dimensions of Data Quality framework along with the complete results of the 3rd Annual Dimensions of Data Quality survey. This presentation will provide the first view of the 2017 results, and all attendees will receive the associated whitepaper free.
In this webinar you will learn:
Why organizations use the Dimensions of Data Quality
Why there are so many options, and what he recommends you use
3rd Annual Survey data about how frequently organizations use the dimensions and specifically which dimensions are most used
Industry trends in adoption and more resources on the topic
This presentation covers:
Introduction: What Is “Research Data”? and Data Lifecycle
Part 1:
Why Manage Your Data?
Formatting and organizing the data
Storage and Security of Data
Data documentation and metadata
Quality Control
Version control
Working with sensitive data
Controlled Vocabulary
Centralized Data Management
Part 2:
Data sharing
What are publishers & funders saying about data sharing?
Researchers’ Attitudes
Benefits of data sharing
Considerations before data sharing
Methods of Data Sharing
Shared Data Uses and Their Limitations
Data management plans
Brief summary
Acknowledgment, References
Curlew Research Brussels 2014 Electronic Data & Knowledge Management (Nick Lynch)
An overview of Life Science externalisation and collaboration, and the challenges that Life Science companies face in delivering successful data sharing with their partners in either Open Innovation or pre-competitive workflows.
An information system should have several key elements and characteristics for success. It should be functional, usable, operational, scalable and allow for revisions. Specifically, it needs to be user-friendly, reliable, efficient, secure and easily maintained. Maximizing the system requires features like interoperability, portability, reusability and integration. High-quality data is also essential for decision making, and must be timely, complete and accurate. Common challenges to implementation include lengthy timelines and lack of post-go-live support. Critical success factors include stakeholder buy-in, local ownership and an enabling environment.
These slides give an overview of advanced data quality management (ADQM): why data quality is important, and the steps of DQ management.
Data-Ed Webinar: Data Quality Success Stories (DATAVERSITY)
Organizations must realize what it means to utilize data quality management in support of business strategy. This webinar will illustrate how organizations with chronic business challenges often can trace the root of the problem to poor data quality. Showing how data quality should be engineered provides a useful framework in which to develop an effective approach. This in turn allows organizations to more quickly identify business problems as well as data problems caused by structural issues versus practice-oriented defects, and prevent these from recurring.
Takeaways:
•Understanding foundational data quality concepts based on the DAMA DMBOK
•Utilizing data quality engineering in support of business strategy
•Case Studies illustrating data quality success
•Data Quality guiding principles & best practices
•Steps for improving data quality at your organization
Custodian Interviews - How to Leverage a Valuable Opportunity (Logikcull.com)
Custodian interviews are a valuable opportunity to satisfy preservation obligations, gather discovery intelligence, and disseminate litigation information. Key custodians, representative custodians, and departmental custodians should be interviewed. Questions should focus on who they are, what they do, how they communicate and store information, and whether they understand preservation obligations. Answers can be documented through conversational interviews, scripted interviews, or automated questionnaires. The results of custodian interviews can be leveraged to validate preservation efforts, prioritize collection and processing, and customize document review.
Sharon Dawes (CTG Albany) – Open data quality: a practical view (Open City Foundation)
This document discusses open data quality and focuses on ensuring data is fit for its intended use. It notes that while open data aims to provide easy access, the value depends on the quality and how users apply the data. Quality issues can arise from how data is originally collected and maintained by different government systems. The document recommends open data providers adopt stewardship practices to maintain metadata and ensure quality, while users should approach data cautiously and look for ways to engage in data communities. Overall it promotes openness but also a realistic view of potential quality problems and the need for tools and strategies to maximize data value for various users.
This document discusses ensuring data quality and mapping outcomes for quality assurance and control. It covers defining data quality standards, quality assurance and quality control activities, and mapping research outcomes to data. Key points include recognizing the need for quality standards, identifying quality assurance and control processes before and during data collection, and documenting all study details to ensure data integrity and reproducibility.
This document provides an introduction to data management. It discusses why managing data well is important, including enabling reproducibility, data sharing and citation. It covers topics such as data entry and manipulation, quality control and assurance. The goal of good data management is to produce high quality, accessible data that can be easily shared and reused.
- Thanuja T has over 9 years of experience in clinical data management, primarily working in oncology.
- She has worked at Quintiles as an Assistant Manager for the past 6 years, managing clinical data and a team of 7 direct reports.
- Prior to Quintiles, she worked at Accenture and Jubilant Biosys in data validation and research roles respectively.
John Koch of Merck presented on improving scientific information management at Merck Research Labs. He discussed the challenges of managing vast amounts of scientific data and information from multiple sources. Merck's Scientific Information Architecture and Search group developed an approach to engage business areas, identify pain points, pilot solutions, and embed improved practices. Their solution called QUICK created a centralized knowledgebase of pre-clinical compound data to address issues like dispersed data, duplicative data capture, and inaccessible definitive data. It is expected to improve data reporting efficiency, analytical productivity, and collaboration while enabling better study selection decisions.
This document provides an introduction to data management. It discusses why data should be managed, including benefits like enabling verification, new research, and cost savings. It also covers topics like data entry and manipulation, quality control, and sharing data. Effective data management results in high quality, accessible data that can be cited, reused, and helps researchers gain recognition.
Digital Preservation - Manage and Provide AccessMichaelPaulmeno
This document discusses managing and providing access to digital content over the long term. It covers several key points:
- Digital preservation involves managing content through its entire lifecycle, from initial creation through long-term storage and access.
- Effective management requires addressing organizational needs, technological opportunities and changes, and available resources. It involves designating responsible people, policies, and technology.
- When providing access, it is important to use proven, sustainable technologies and deliver content completely and accurately according to access policies.
- Legal and rights issues must be considered to ensure appropriate access to content over time based on factors like donor agreements or confidential information.
- Understanding current and future users is essential for developing access strategies
Data Governance Final 011315 (Ashley Ohmann)
This presentation discusses enterprise data governance with Tableau. It defines data governance as processes that formally manage important data assets. The goals of data governance include establishing standards, processes, compliance, security, and metrics. Good data governance benefits an organization by improving accuracy, enabling better decisions with less waste. The presentation provides examples of how one organization improved data governance through stakeholder involvement, establishing metrics, building a data warehouse, and implementing Tableau for analytics. Key goals discussed are building trust, communicating validity, enabling access, managing metadata, provisioning rights, and maintaining compliance.
How do you assess the quality and reliability of data sources in data analysis... (Soumodeep Nanee Kundu)
**Assessing the Quality and Reliability of Data Sources in Data Analysis**
Data is often referred to as the lifeblood of data analysis. It forms the foundation upon which decisions are made, insights are drawn, and actions are taken. However, not all data is created equal. The quality and reliability of data sources are paramount to the success of data analysis efforts. In this essay, we will explore the intricate process of assessing data quality and reliability, touching on the methods, considerations, and best practices to ensure the data used in the analysis is trustworthy and fit for purpose.
Ray Scott - Agile Solutions – Leading with Test Data Management - EuroSTAR 2012 (TEST Huddle)
Ray Scott discusses test data management in agile environments. He notes that while development may be agile, supporting test data often cannot keep up with frequent changes. Traditional test data generation methods take weeks but agile needs data in hours. He advocates treating test data management as a development project and service. Testers should own the data by determining usage, mapping test conditions to data conditions, and ensuring versioning. With solid data provisioning focusing on business rules and repeatability, testing can add value in agile projects.
Overview of the Research on Open Educational Resources for Development (ROER4D) Open Data initiative, highlighting data management principles, the five pillars of the ROER4D data publication approach and the project de-identification approach.
Improving Stewardship of Scientific Data Through Use of a Maturity Matrix (Ge Peng)
This presentation uses the highly utilized monthly land surface temperature data product derived from the Global Historical Climatology Network (GHCN-M) to demonstrate how a data stewardship maturity matrix (DSMM) can help identify potential areas of improvement in both stewardship practices and system integration. This success story shows how people from multiple disciplines utilized the DSMM to address topics needing improvement by integrating interoperable, high quality metadata with product-specific descriptive information, resulting in enhanced product accessibility and usability.
Service Tools and Social Media Data Sharing Use Case (Ge Peng)
Want to improve sharing and expand the user base of your data? This presentation provides a use case study with some of the available service tools and social media, in addition to peer-reviewed data publishing.
Similar to New Paradigm for Ensuring and Improving Data Quality and Usability
Non Functional Requirements for Climate Data Records (Ge Peng)
This document discusses the importance of data stewardship for climate data records. It outlines several principles and guidelines for ensuring accessible, credible, and useful environmental data, including: (1) data and metadata require expert stewardship to preserve and improve information; (2) long-term stewardship of scientific data and oversight from scientists is important; (3) guidelines from the Office of Management and Budget aim to ensure data quality, objectivity, and integrity.
This document presents a draft maturity matrix for long-term scientific data stewardship. The matrix defines 5 levels of maturity for 10 key components of data stewardship, including preservation, accessibility, usability, production sustainability, and data quality. Each increasing level represents more advanced and formalized approaches to managing the data according to established standards and community best practices. The authors thank various subject matter experts who helped define the maturity levels based on their expertise in areas such as data archiving, access, and product development.
Scientific Data Stewardship Maturity Matrix (Ge Peng)
The document presents a stewardship maturity matrix for digital environmental data products. It outlines six levels of maturity for various aspects of data preservation, accessibility, usability, production sustainability, and data quality assurance/control. Each increasing level incorporates greater definition, implementation, and conformance to community standards for things like archiving, metadata, documentation, data quality procedures, and integrity/authenticity verification. The highest level involves national/international commitments, external reviews, and fully monitored and reported performance of all quality assurance processes.
Analysis insight about a Flyball dog competition team's performance (roli9797)
Insights from my analysis of a Flyball dog competition team's performance over the past year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
End-to-end pipeline agility - Berlin Buzzwords 2024 (Lars Albertsson)
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change?", the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... (Aggregage)
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data (Kiwi Creative)
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W... (Social Samosa)
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Global Situational Awareness of A.I. and Where It's Headed (vikram sood)
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or me; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we're lucky, we'll be in an all-out race with the CCP; if we're unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Learn SQL from Basic Queries to Advanced Queries (manishkhaire30)
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Open Source Contributions to Postgres: The Basics POSETTE 2024 (ElizabethGarrettChri)
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
New Paradigm for Ensuring and Improving Data Quality and Usability
1. A New Paradigm for Ensuring and Improving Dataset Quality and Usability – Roles and Responsibilities of Stewards and Other Major Product Stakeholders
Ge Peng
NOAA's Cooperative Institute for Climate and Satellites – North Carolina (CICS-NC), NC State University, and NOAA's National Centers for Environmental Information (NCEI)
In collaboration with Nancy Ritchey, Kenneth Casey, Edward Kearns, Jeffrey Privette, Drew Saunders, Philip Jones, Tom Maycock, and Steve Ansari
Version 20160515 | CC-BY-SA 4.0 | POC: gpeng@cicsnc.org
2. What Is Data Quality? Who Should Care?
Ø How good or bad a data product is.
Ø All key players - everyone who develops, creates, produces, stewards, manages, publishes, or serves the product
Ø Other major product stakeholders (including sponsors, power users, and management)
Ø General users
What Is Data Usability?
Ø How easy or hard a data product is to understand and use.
3. Quality - how good or bad something is
• Product quality – the degree to which the data product is produced and described correctly.
• Stewardship quality – the degree to which the data product is preserved and cared for properly.
Steward - a person managing or caring for another's assets
• A role that incorporates processes, policies, guidelines, and responsibilities into administering an organization's data in compliance with policy and/or regulatory obligations.
• Requires expert domain knowledge, general knowledge of the relevant domains, and the intention to ensure and improve the stewardship of other people's datasets.
§ Data steward: a role responsible for managing both the dataset and its metadata
§ Scientific steward: a role responsible for managing data quality and usability
§ Technology steward: a role responsible for managing tools and systems
(Source: Chisholm 2014; Peng et al. 2016)
4. Something about Stewards
• Stewards are stewardship roles assigned to domain subject matter experts (SMEs) who have general knowledge of other relevant domains.
§ SMEs are people with extensive knowledge and experience in their local domains.
§ The role of SME is gained, not assigned.
• Stewards need to have a mindset of caring for other people's assets (e.g., data products) and must be capable of communicating within and across domains.
• One person could be assigned more than one stewardship role.
(Source: Chisholm 2014; Peng et al. 2016)
5. Ensuring and improving data quality and usability throughout the life cycle of a dataset
• Old days – one person
Ø Primarily done by data producers
Ø Usability, i.e., ease of use, was usually not taken into consideration
Ø Information about procedures or practices for data quality was hard to come by
Ø Data choices were limited; users had no option but to wait for the release of the dataset
• Nowadays – an integrated team
Ø Needs to be more scalable
Ø Needs to be more integrated
Ø Needs to be more timely
Ø Information about methods and results needs to be readily available, in an easy-to-understand and interoperable format
Ø Users have many choices; they do not have to wait for, or even use, your data
7. Food quality as an analog for data quality – a shared responsibility in ensuring quality!
• Product Quality – Data Producers: define/create/obtain. Food analog: requirements; production/distribution; info on product specs. Data-quality counterpart: producers.
• Stewardship Quality – Stewards: maintain/preserve/document/access. Food analog: storage, transport, re-distribution; product packing/labels; cooking instructions. Data-quality counterpart: middlemen.
• Use/Service Quality – Data Providers/Users: use/service. Food analog: stores/restaurants/homes; derived products --> timeliness/presentation. Data-quality counterpart: providers.
8. So We All Have To Talk To Each Other – That Is The Problem!
(Another example: adapting the ISO OAIS Reference Model for long-term preservation)
[Diagram: potential interfaces in knowledge domains, mapping functional entities to roles]
• Functional entities: data production, ingest, metadata documentation, archive, dissemination, access service, and data use
• Roles: data producer, metadata specialist, access POC, science POC, user service POC, access specialist, archive POC, and data consumer – plus stakeholders, including sponsors and management
• We do not talk in the same language
• We do not communicate through the same channel
9. Why Do We Need to Define Roles of Stewards?
[Cartoon: a Data Producer and a Metadata POC adapting the ISO Data Quality (DQ) Metadata Standard]
10. Why Do We Need to Define Roles of Stewards?
Stewards help capture and convey DQ info into the context of DQ metadata!
[Cartoon: a Data Producer and a Metadata POC adapting the ISO Data Quality (DQ) Metadata Standard]
11. Why Do We Need to Define Responsibilities of Key Players and Stakeholders?
[Cartoon: a Data Producer, Program Managers, a Metadata POC, and Stewardship Management adapting ISO Data Quality (DQ) Metadata Standards, with dueling speech bubbles: "You are responsible for the data quality of your data. So you should provide us with the DQ metadata!" and "You are responsible for metadata. You should create the DQ metadata yourself!"]
Ø Creating and improving DQ metadata and documentation is beyond the current job scope and expertise of data providers and metadata curators.
Ø Defining responsibilities will help facilitate the process!
Ø It will help raise awareness of, and improve requirements for, data quality and usability.
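As a minimal sketch of what "capturing DQ info into the context of DQ metadata" might look like in practice, the example below assembles producer-supplied quality information into a machine-readable record loosely patterned on ISO 19157 quality elements. The field names, values, and dataset identifier are illustrative assumptions, not the standard's normative schema.

```python
import json

# Producer-supplied data quality (DQ) information, as a steward might
# collect it from product documentation or a producer interview.
# All names and values below are illustrative, not from ISO 19157 itself.
dq_report = {
    "dataset_id": "ghcn-m-lst-v3",  # hypothetical identifier
    "scope": "dataset",
    "reports": [
        {
            "element": "quantitative_attribute_accuracy",  # ISO-19157-flavored
            "measure": "mean absolute error vs. reference stations",
            "value": 0.5,
            "unit": "degC",
        },
        {
            "element": "completeness_omission",
            "measure": "fraction of missing monthly values",
            "value": 0.02,
            "unit": "1",
        },
    ],
    "lineage": "Derived from GHCN-M station records; bias-adjusted.",
    "steward_contact": "scientific.steward@example.org",  # placeholder
}

# Serialize so the record is both human-readable and machine-integrable.
print(json.dumps(dq_report, indent=2))
```

A metadata POC could then map such a record onto the formal standard's elements without having to chase the producer for missing numbers, which is exactly the hub role the slides argue stewards should play.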
12. First Step in Formalizing Roles and High-Level Responsibilities
13. Roles and Responsibilities (within the context of ensuring and improving dataset quality (DQ) and usability)
Data Producer:
• Ensure and improve Scientific Quality of the data product - defining and documenting data product accuracy, precision, and uncertainty sources and estimates
• Ensure Data Quality during production – screening/assurance
• Assess and improve Data Quality – verification/validation
• Ensure Data Integrity – creation/staging
• Help ensure Preservability - providing information about the data product (time, space, size, variables, etc.)
• Ensure Production Sustainability
• Help ensure Transparency - providing information on data source, algorithm and processing steps, and error estimates/sources
• Ensure and improve Data Usability - providing information about the product (update frequency, latency, variable attributes, etc.) and guidance on data use
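To make the "screening/assurance" bullet concrete, here is a minimal production-time screening sketch; the variable, valid range, and sample values are assumptions for illustration, not from the presentation.

```python
# Minimal production-time screening sketch: flag physically implausible
# land surface temperatures and record simple assurance statistics.
# The valid range below is an illustrative assumption, not a NOAA spec.
VALID_RANGE_DEGC = (-90.0, 60.0)

def screen(values):
    """Return (passed, flagged) lists and a small assurance summary."""
    lo, hi = VALID_RANGE_DEGC
    passed = [v for v in values if lo <= v <= hi]
    flagged = [v for v in values if not (lo <= v <= hi)]
    summary = {
        "n_input": len(values),
        "n_flagged": len(flagged),
        "flagged_fraction": len(flagged) / len(values) if values else 0.0,
        "valid_range_degc": VALID_RANGE_DEGC,
    }
    return passed, flagged, summary

passed, flagged, summary = screen([12.3, -95.1, 30.0, 61.2, 18.4])
print(summary)  # two values fall outside the assumed valid range
```

Recording the summary alongside the data is one simple way the producer's screening results can later feed the DQ metadata described above.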
14. Roles and Responsibilities (within the context of ensuring and improving dataset quality (DQ) and usability)
Data Steward:
• Ensure Data Integrity – ingest and archive
• Ensure and improve Data Provenance and Traceability
• Improve Data Quality metadata
• Ensure and improve archiving requirements
Scientific Steward:
• Assess/improve Data Quality – evaluation/verification
• Promote and improve Data Usability – characterization
• Help ensure and improve Data Quality metadata
• Ensure and improve data quality and usability requirements
Technology Steward:
• Ensure Data Integrity – ingest, archive retrieval, data access, and file system and technology upgrades
• Ensure and improve Data Accessibility and Discoverability
• Promote and improve Data Interoperability
• Ensure and improve software and system requirements
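"Ensure Data Integrity – ingest and archive" is commonly implemented as fixity checking: computing a checksum at ingest and re-verifying it for the archived copy. A minimal sketch using Python's standard hashlib follows; the file paths are placeholders, and the deck does not prescribe any particular mechanism.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# At ingest: checksum the incoming file, then verify the archived copy
# against it. Both paths are illustrative placeholders.
expected = sha256_of("incoming/ghcn_m_lst_201605.nc")
actual = sha256_of("archive/ghcn_m_lst_201605.nc")
if actual != expected:
    raise RuntimeError("Fixity check failed: archived copy differs from ingest")
```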
15. Roles and Responsibilities (within the context of ensuring and improving dataset quality (DQ) and usability)
End-User:
• Request Transparency in data quality procedures and practices
• Request Provenance of the data product
• Request evaluation results of product, stewardship, and service maturity of the data product
• Provide feedback on Quality and Usability of the data product
Manager:
• Help increase awareness of Data Quality and Usability
• Help improve data quality and usability requirements
• Help ensure Data Interoperability
Sponsor:
• Define Data Quality and Usability requirements
• Require data quality oversight and monitoring
• Encourage Transparency in data quality procedures and practices
Data Distributor:
• Ensure and improve Representation of data quality information
• Ensure and improve Traceability of data quality information
• Ensure user feedback
• Help improve data quality and usability requirements
16. Roles and Responsibilities – Summary (within the context of ensuring and improving dataset quality (DQ) and usability). One person may wear several hats!
Data Originator:
• Ensure and improve Scientific Quality of the data product - defining and documenting data product accuracy, precision, uncertainty sources and estimates
• Ensure Data Quality during production – screening/assurance
• Assess and improve Data Quality – verification/validation
• Ensure Data Integrity – creation/staging
• Help ensure Preservability - providing information about data product (time, space, size, variables, etc.)
• Ensure Production Sustainability
• Help Ensure Transparency - providing information on data source, algorithm and processing steps, and error estimates/sources
• Ensure and improve Data Usability - providing information about the product (update frequency, latency, variable attributes, etc.) and guidance on data use
Data Steward:
• Ensure Data Integrity – ingest and archive
• Ensure and improve Data Provenance and Traceability
• Improve Data Quality metadata
• Ensure and improve archiving requirements
Technology Steward:
• Ensure Data Integrity – ingest, archive retrieval, data access, and file system and technology upgrades
• Ensure and Improve Data Accessibility and Discoverability
• Promote and improve Data Interoperability
• Ensure and improve software and system requirements
Scientific Steward:
• Assess/improve Data Quality – evaluation/verification
• Promote and improve Data Usability – characterization
• Help ensure and improve Data Quality metadata
• Ensure and improve data quality and usability requirements
End-User:
• Request Transparency in data quality procedures and practices
• Request Provenance of the data product
• Request evaluation results of product, stewardship, and service maturity of the data product
• Provide feedback on Quality and Usability of the data product
Data Distributor:
• Ensure and improve Representation of data quality information
• Ensure and improve Traceability of data quality information
• Ensure user feedback
• Help improve data quality and usability requirements
Sponsor:
• Define Data Quality and Usability requirements
• Require data quality oversight and monitoring
• Encourage Transparency in data quality procedures and practices
Manager:
• Help increase awareness of Data Quality and Usability
• Help improve data quality and usability requirements
• Help ensure Data Interoperability
Documentation: capture and convey; be traceable, transparent, machine-readable, and human-understandable.
Quality rating: assess and improve; be transparent, quantifiable, machine-readable, and human-understandable.
• Understandable info for users • Actionable info for management • Integrable tags for machines
Version: 20160515 | CC-BY-SA 4.0 | POC: gpeng@cicsnc.org
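For digital environmental data products, "integrable tags for machines" can mean embedding quality and stewardship information directly in the data files. The sketch below assumes the third-party netCDF4 Python package and uses illustrative attribute names (they are not from any normative convention) to write such tags as netCDF global attributes.

```python
from netCDF4 import Dataset  # third-party package: pip install netCDF4

# Create a product file and attach quality/stewardship tags as global
# attributes so downstream tools can read them programmatically.
# Attribute names and values below are illustrative assumptions.
ds = Dataset("example_product.nc", "w")
ds.setncattr("title", "Example gridded product")
ds.setncattr("dq_accuracy_degc", 0.5)  # producer-estimated accuracy
ds.setncattr("dq_validation_report", "http://example.org/dq/report")  # placeholder URL
ds.setncattr("stewardship_maturity_reference", "doi:10.1045/may2016-peng")
ds.setncattr("license", "CC-BY-SA 4.0")
ds.close()
```

Tags written this way are readable by both people and software, serving the slide's three audiences at once: users, management, and machines.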
17. Take-Away Messages
• Ensuring data quality is an end-to-end process and a shared responsibility of all key players (data producers, managers/stewards, providers/publishers) and other major stakeholders (sponsors, power users, and management).
• Effective stewardship of scientific data requires:
§ Expert domain knowledge in data management, technology, and science
§ Continuous oversight from all stewards, and
§ Open and continuous communication among key players and stakeholders
• Defining the roles and responsibilities of key players and stakeholders will help facilitate the process of:
§ Ensuring and improving dataset quality and usability
§ Capturing and conveying information about data quality
18. Acknowledgement
The idea of using food quality as an analog for data quality originated from one of our family dinner table discussions. I thank my family for the beneficial discussions that followed, for allowing me to use them as "Guinea Pigs", and for their helpful comments!
To cite this presentation:
Peng, G., 2015: A New Paradigm for Ensuring and Improving Dataset Quality and Usability – Roles and Responsibilities of Stewards and Other Major Product Stakeholders. Updated: May 15, 2016. Slideshare. Access date: mm/dd/yyyy.
View the latest version of this presentation: http://tinyurl.com/RolesRs-DQU
Related presentation: Stewards – Knowledge and Communication Hub: http://tinyurl.com/Stewards-Hub
19. Image sources:
http://www.busyinbrooklyn.com/wp-content/uploads/2013/09/USDA_GRADES.jpg;
http://www.kaleelbrothers.com/images/Fresh-Produce.png;
http://www.pgabeef.com/images/storage_chart.gif;
https://www.colorado.gov/pacific/sites/default/files/u/6556/Egg-Grading.JPG;
http://www.hickmanseggs.com/w3/wp-content/uploads/2014/04/egg_size.jpg;
https://c2.staticflickr.com/8/7159/6801729225_82e823a5d6_z.jpg;
http://www.thepoultrysite.com/articles/contents/09-12CobbChicks1.jpg;
http://www.topratedsteakhouses.com/wp-content/uploads/2013/12/Grilled-Beef-with-Tomato.jpg;
http://cdn2.hubspot.net/hub/66214/file-15223310-jpg/images/wearingmanyhats.jpg
References
Chisholm, M., 2014: Data Stewards versus Subject Matter Experts and Data Managers. Information Management. Version: May 28, 2014. [Available online at: http://www.information-management.com/news/news/data-stewards-versus-subject-matter-experts-and-data-managers-10025704-1.html.]
Peng, G., N. A. Ritchey, K. S. Casey, E. J. Kearns, J. L. Privette, D. Saunders, P. Jones, T. Maycock, and S. Ansari, 2016: Scientific Stewardship in the Open Data and Big Data Era – Roles and Responsibilities of Stewards and Other Major Product Stakeholders. D-Lib Magazine, 22. doi:10.1045/may2016-peng. [Available online at: http://dlib.org/dlib/may16/peng/05peng.html.]