This document discusses data quality and data profiling. It begins by describing problems with data like duplication, inconsistency, and incompleteness. Good data is a valuable asset while bad data can harm a business. Data quality is assessed based on dimensions like accuracy, consistency, completeness, and timeliness. Data profiling statistically examines data to understand issues before development begins. It helps assess data quality and catch problems early. Common analyses include analyzing null values, keys, formats, and more. Data profiling is conducted using SQL or profiling tools during requirements, modeling, and ETL design.
Good data is like good water: best served fresh, and ideally well-filtered. Data Management strategies can produce tremendous procedural improvements and increased profit margins across the board, but only if the data being managed is of a high quality. Determining how Data Quality should be engineered provides a useful framework for utilizing Data Quality management effectively in support of business strategy, which in turns allows for speedy identification of business problems, delineation between structural and practice-oriented defects in Data Management, and proactive prevention of future issues.
Over the course of this webinar, we will:
Help you understand foundational Data Quality concepts based on “The DAMA Guide to the Data Management Body of Knowledge” (DAMA DMBOK), as well as guiding principles, best practices, and steps for improving Data Quality at your organization
Demonstrate how chronic business challenges for organizations are often rooted in poor Data Quality
Share case studies illustrating the hallmarks and benefits of Data Quality success
Data Profiling: The First Step to Big Data QualityPrecisely
Big data offers the promise of a data-driven business model generating new revenue and competitive advantage fueled by new business insights, AI, and machine learning. Yet without high quality data that provides trust, confidence, and understanding, business leaders continue to rely on gut instinct to drive business decisions.
The critical foundation and first step to deliver high quality data in support of a data-driven view that truly leverages the value of big data is data profiling - a proven capability to analyze the actual data content and help you understand what's really there.
View this webinar on-demand to learn five core concepts to effectively apply data profiling to your big data, assess and communicate the quality issues, and take the first step to big data quality and a data-driven business.
Data Verification and Validation - Melissa Data helps you in analyzing, cleansing & match data quality, data standardization and data quality management services for your organization.
Tackling data quality problems requires more than a series of tactical, one off improvement projects. By their nature, many data quality problems extend across and often beyond an organization. Addressing these issues requires a holistic architectural approach combining people, process and technology. Join Donna Burbank and Nigel Turner as they provide practical ways to control data quality issues in your organization.
Good data is like good water: best served fresh, and ideally well-filtered. Data Management strategies can produce tremendous procedural improvements and increased profit margins across the board, but only if the data being managed is of a high quality. Determining how Data Quality should be engineered provides a useful framework for utilizing Data Quality management effectively in support of business strategy, which in turns allows for speedy identification of business problems, delineation between structural and practice-oriented defects in Data Management, and proactive prevention of future issues.
Over the course of this webinar, we will:
Help you understand foundational Data Quality concepts based on “The DAMA Guide to the Data Management Body of Knowledge” (DAMA DMBOK), as well as guiding principles, best practices, and steps for improving Data Quality at your organization
Demonstrate how chronic business challenges for organizations are often rooted in poor Data Quality
Share case studies illustrating the hallmarks and benefits of Data Quality success
Data Profiling: The First Step to Big Data QualityPrecisely
Big data offers the promise of a data-driven business model generating new revenue and competitive advantage fueled by new business insights, AI, and machine learning. Yet without high quality data that provides trust, confidence, and understanding, business leaders continue to rely on gut instinct to drive business decisions.
The critical foundation and first step to deliver high quality data in support of a data-driven view that truly leverages the value of big data is data profiling - a proven capability to analyze the actual data content and help you understand what's really there.
View this webinar on-demand to learn five core concepts to effectively apply data profiling to your big data, assess and communicate the quality issues, and take the first step to big data quality and a data-driven business.
Data Verification and Validation - Melissa Data helps you in analyzing, cleansing & match data quality, data standardization and data quality management services for your organization.
Tackling data quality problems requires more than a series of tactical, one off improvement projects. By their nature, many data quality problems extend across and often beyond an organization. Addressing these issues requires a holistic architectural approach combining people, process and technology. Join Donna Burbank and Nigel Turner as they provide practical ways to control data quality issues in your organization.
Introduction to Data Governance
Seminar hosted by Embarcadero technologies, where Christopher Bradley presented a session on Data Governance.
Drivers for Data Governance & Benefits
Data Governance Framework
Organization & Structures
Roles & responsibilities
Policies & Processes
Programme & Implementation
Reporting & Assurance
DAS Slides: Data Quality Best PracticesDATAVERSITY
Tackling Data Quality problems requires more than a series of tactical, one-off improvement projects. By their nature, many Data Quality problems extend across and often beyond an organization. Addressing these issues requires a holistic architectural approach combining people, process, and technology. Join Nigel Turner and Donna Burbank as they provide practical ways to control Data Quality issues in your organization.
DAS Slides: Data Governance - Combining Data Management with Organizational ...DATAVERSITY
Data Governance is both a technical and an organizational discipline, and getting Data Governance right requires a combination of Data Management fundamentals aligned with organizational change and stakeholder buy-in. Join Nigel Turner and Donna Burbank as they provide an architecture-based approach to aligning business motivation, organizational change, Metadata Management, Data Architecture and more in a concrete, practical way to achieve success in your organization.
Tackling Data Quality problems requires more than a series of tactical, one-off improvement projects. By their nature, many Data Quality problems extend across and often beyond an organization. Addressing these issues requires a holistic architectural approach combining people, process, and technology. Join Nigel Turner and Donna Burbank as they provide practical ways to control Data Quality issues in your organization.
In this lecture we discuss data quality and data quality in Linked Data. This 50 minute lecture was given to masters student at Trinity College Dublin (Ireland), and had the following contents:
1) Defining Quality
2) Defining Data Quality - What, Why, Costs
3) Identifying problems early - using a simple semantic publishing process as an example
4) Assessing Linked (big) Data quality
5) Quality of LOD cloud datasets
References can be found at the end of the slides
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 (CC-BY-SA-40) International License.
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
How to Strengthen Enterprise Data Governance with Data QualityDATAVERSITY
If your organization is in a highly-regulated industry – or relies on data for competitive advantage – data governance is undoubtedly a top priority. Whether you’re focused on “defensive” data governance (supporting regulatory compliance and risk management) or “offensive” data governance (extracting the maximum value from your data assets, and minimizing the cost of bad data), data quality plays a critical role in ensuring success.
Join our webinar to learn how enterprise data quality drives stronger data governance, including:
The overlaps between data governance and data quality
The “data” dependencies of data governance – and how data quality addresses them
Key considerations for deploying data quality for data governance
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
Enterprise Architecture (EA) provides a visual blueprint of the organization, and shows key interrelationships between data, process, applications, and more. By abstracting these assets in a graphical view, it’s possible to see key interrelationships, particularly as they relate to data and its business impact across the organization. Join us for a discussion on how Data Architecture is a key component of an overall Enterprise Architecture for enhanced business value and success.
• History of Data Management
• Business Drivers for implementation of data governance • Building Data Strategy & Governance Framework
• Data Management Maturity Models
• Data Quality Management
• Metadata and Governance
• Metadata Management
• Data Governance Stakeholder Communication Strategy
Data Management, Metadata Management, and Data Governance – Working TogetherDATAVERSITY
The data disciplines listed in the title must work together. The key to success requires understanding the boundaries and overlaps between the disciplines. Wouldn’t it be great to be able to present the relationships between the disciplines in a simple all-in diagram? At the end of this webinar, you will be able to do just that.
This new RWDG webinar with Bob Seiner will outline how Data Management, Metadata Management, and Data Governance can be optimized to work together. Bob will share a diagram that has successfully communicated the relationship between these disciplines to leadership resulting in the disciplines working in harmony and delivering success.
Bob will share the following in this webinar:
- Categories of disciplines focused on managing data as an asset
- A definition of Data Management that embraces numerous data disciplines
- The importance of Metadata -Management to all data disciplines
- Why data and metadata require formal governance
- A graphic that effectively exhibits the relationship between the disciplines
Data Marketplace and the Role of Data VirtualizationDenodo
Watch full webinar here: https://bit.ly/3IS9sQS
A data marketplace is like an online shopping interface specializing in data. Ideally, it should work just like an online store, with minimal latency and maximum responsiveness. However, this does not mean that all of the data in the data marketplace needs to be stored in the same central repository.
In this session, Shadab Hussain, Americas Sales Head, Data Analytics at Wipro, a partner company with Denodo and a co-sponsor of DataFest 2021, talks about the role of data virtualization in enabling full-featured data marketplaces. Such data marketplaces provide real-time, curated access to data, even when the data is stored across many different sources throughout the organization.
You will learn:
- The main features of a data marketplace
- Why organizations need data marketplaces
- Why data marketplaces sometimes fail
- How data virtualization enables the most effective data marketplaces
- How one of Europe’s premiere public healthcare system organizations leveraged a data marketplace to improve data consumption and ease of access
Driving Data Intelligence in the Supply Chain Through the Data Catalog at TJXDATAVERSITY
Roles and responsibilities are a critical component of every Data Governance program. Building a set of roles that are practical and that will not interfere with people’s “day jobs” is an important consideration that will influence how well your program is adopted. This tutorial focuses on sharing a proven model guaranteed to represent your organization.
Join Bob Seiner for this lively webinar where he will dissect a complete Operating Model of Roles and Responsibilities that encompasses all levels of the organization. Seiner will detail the roles and describe the most effective way to associate people with the roles. You will walk out of this webinar with a model to apply to your organization.
In this session Bob will share:
- The five levels of Data Governance roles
- A proven Operating Model of Roles and Responsibilities
- How to customize the model to meet your requirements
- Setting appropriate role expectations
- How to operationalize the roles and demonstrate value
To take a “ready, aim, fire” tactic to implement Data Governance, many organizations assess themselves against industry best practices. The process is not difficult or time-consuming and can directly assure that your activities target your specific needs. Best practices are always a strong place to start.
Join Bob Seiner for this popular RWDG topic, where he will provide the information you need to set your program in the best possible direction. Bob will walk you through the steps of conducting an assessment and share with you a set of typical results from taking this action. You may be surprised at how easy it is to organize the assessment and may hear results that stimulate the actions that you need to take.
In this webinar, Bob will share:
- The value of performing a Data Governance best practice assessment
- A practical list of industry Data Governance best practices
- Criteria to determine if a practice is best practice
- Steps to follow to complete an assessment
- Typical recommendations and actions that result from an assessment
Data Governance Best Practices, Assessments, and RoadmapsDATAVERSITY
When starting or evaluating the present state of your Data Governance program, it is important to focus on best practices such that you don’t take a ready, fire, aim approach. Best practices need to be practical and doable to be selected for your organization, and the program must be at risk if the best practice is not achieved.
Join Bob Seiner for an important webinar focused on industry best practice around standing up formal Data Governance. Learn how to assess your organization against the practices and deliver an effective roadmap based on the results of conducting the assessment.
In this webinar, Bob will focus on:
- Criteria to select the appropriate best practices for your organization
- How to define the best practices for ultimate impact
- Assessing against selected best practices
- Focusing the recommendations on program success
- Delivering a roadmap for your Data Governance program
How do you assess the quality and reliability of data sources in data analysi...Soumodeep Nanee Kundu
**Assessing the Quality and Reliability of Data Sources in Data Analysis**
Data is often referred to as the lifeblood of data analysis. It forms the foundation upon which decisions are made, insights are drawn, and actions are taken. However, not all data is created equal. The quality and reliability of data sources are paramount to the success of data analysis efforts. In this essay, we will explore the intricate process of assessing data quality and reliability, touching on the methods, considerations, and best practices to ensure the data used in the analysis is trustworthy and fit for purpose.
Introduction to Data Governance
Seminar hosted by Embarcadero technologies, where Christopher Bradley presented a session on Data Governance.
Drivers for Data Governance & Benefits
Data Governance Framework
Organization & Structures
Roles & responsibilities
Policies & Processes
Programme & Implementation
Reporting & Assurance
DAS Slides: Data Quality Best PracticesDATAVERSITY
Tackling Data Quality problems requires more than a series of tactical, one-off improvement projects. By their nature, many Data Quality problems extend across and often beyond an organization. Addressing these issues requires a holistic architectural approach combining people, process, and technology. Join Nigel Turner and Donna Burbank as they provide practical ways to control Data Quality issues in your organization.
DAS Slides: Data Governance - Combining Data Management with Organizational ...DATAVERSITY
Data Governance is both a technical and an organizational discipline, and getting Data Governance right requires a combination of Data Management fundamentals aligned with organizational change and stakeholder buy-in. Join Nigel Turner and Donna Burbank as they provide an architecture-based approach to aligning business motivation, organizational change, Metadata Management, Data Architecture and more in a concrete, practical way to achieve success in your organization.
Tackling Data Quality problems requires more than a series of tactical, one-off improvement projects. By their nature, many Data Quality problems extend across and often beyond an organization. Addressing these issues requires a holistic architectural approach combining people, process, and technology. Join Nigel Turner and Donna Burbank as they provide practical ways to control Data Quality issues in your organization.
In this lecture we discuss data quality and data quality in Linked Data. This 50 minute lecture was given to masters student at Trinity College Dublin (Ireland), and had the following contents:
1) Defining Quality
2) Defining Data Quality - What, Why, Costs
3) Identifying problems early - using a simple semantic publishing process as an example
4) Assessing Linked (big) Data quality
5) Quality of LOD cloud datasets
References can be found at the end of the slides
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 (CC-BY-SA-40) International License.
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
How to Strengthen Enterprise Data Governance with Data QualityDATAVERSITY
If your organization is in a highly-regulated industry – or relies on data for competitive advantage – data governance is undoubtedly a top priority. Whether you’re focused on “defensive” data governance (supporting regulatory compliance and risk management) or “offensive” data governance (extracting the maximum value from your data assets, and minimizing the cost of bad data), data quality plays a critical role in ensuring success.
Join our webinar to learn how enterprise data quality drives stronger data governance, including:
The overlaps between data governance and data quality
The “data” dependencies of data governance – and how data quality addresses them
Key considerations for deploying data quality for data governance
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
Enterprise Architecture (EA) provides a visual blueprint of the organization, and shows key interrelationships between data, process, applications, and more. By abstracting these assets in a graphical view, it’s possible to see key interrelationships, particularly as they relate to data and its business impact across the organization. Join us for a discussion on how Data Architecture is a key component of an overall Enterprise Architecture for enhanced business value and success.
• History of Data Management
• Business Drivers for implementation of data governance • Building Data Strategy & Governance Framework
• Data Management Maturity Models
• Data Quality Management
• Metadata and Governance
• Metadata Management
• Data Governance Stakeholder Communication Strategy
Data Management, Metadata Management, and Data Governance – Working TogetherDATAVERSITY
The data disciplines listed in the title must work together. The key to success requires understanding the boundaries and overlaps between the disciplines. Wouldn’t it be great to be able to present the relationships between the disciplines in a simple all-in diagram? At the end of this webinar, you will be able to do just that.
This new RWDG webinar with Bob Seiner will outline how Data Management, Metadata Management, and Data Governance can be optimized to work together. Bob will share a diagram that has successfully communicated the relationship between these disciplines to leadership resulting in the disciplines working in harmony and delivering success.
Bob will share the following in this webinar:
- Categories of disciplines focused on managing data as an asset
- A definition of Data Management that embraces numerous data disciplines
- The importance of Metadata -Management to all data disciplines
- Why data and metadata require formal governance
- A graphic that effectively exhibits the relationship between the disciplines
Data Marketplace and the Role of Data VirtualizationDenodo
Watch full webinar here: https://bit.ly/3IS9sQS
A data marketplace is like an online shopping interface specializing in data. Ideally, it should work just like an online store, with minimal latency and maximum responsiveness. However, this does not mean that all of the data in the data marketplace needs to be stored in the same central repository.
In this session, Shadab Hussain, Americas Sales Head, Data Analytics at Wipro, a partner company with Denodo and a co-sponsor of DataFest 2021, talks about the role of data virtualization in enabling full-featured data marketplaces. Such data marketplaces provide real-time, curated access to data, even when the data is stored across many different sources throughout the organization.
You will learn:
- The main features of a data marketplace
- Why organizations need data marketplaces
- Why data marketplaces sometimes fail
- How data virtualization enables the most effective data marketplaces
- How one of Europe’s premiere public healthcare system organizations leveraged a data marketplace to improve data consumption and ease of access
Driving Data Intelligence in the Supply Chain Through the Data Catalog at TJXDATAVERSITY
Roles and responsibilities are a critical component of every Data Governance program. Building a set of roles that are practical and that will not interfere with people’s “day jobs” is an important consideration that will influence how well your program is adopted. This tutorial focuses on sharing a proven model guaranteed to represent your organization.
Join Bob Seiner for this lively webinar where he will dissect a complete Operating Model of Roles and Responsibilities that encompasses all levels of the organization. Seiner will detail the roles and describe the most effective way to associate people with the roles. You will walk out of this webinar with a model to apply to your organization.
In this session Bob will share:
- The five levels of Data Governance roles
- A proven Operating Model of Roles and Responsibilities
- How to customize the model to meet your requirements
- Setting appropriate role expectations
- How to operationalize the roles and demonstrate value
To take a “ready, aim, fire” tactic to implement Data Governance, many organizations assess themselves against industry best practices. The process is not difficult or time-consuming and can directly assure that your activities target your specific needs. Best practices are always a strong place to start.
Join Bob Seiner for this popular RWDG topic, where he will provide the information you need to set your program in the best possible direction. Bob will walk you through the steps of conducting an assessment and share with you a set of typical results from taking this action. You may be surprised at how easy it is to organize the assessment and may hear results that stimulate the actions that you need to take.
In this webinar, Bob will share:
- The value of performing a Data Governance best practice assessment
- A practical list of industry Data Governance best practices
- Criteria to determine if a practice is best practice
- Steps to follow to complete an assessment
- Typical recommendations and actions that result from an assessment
Data Governance Best Practices, Assessments, and RoadmapsDATAVERSITY
When starting or evaluating the present state of your Data Governance program, it is important to focus on best practices such that you don’t take a ready, fire, aim approach. Best practices need to be practical and doable to be selected for your organization, and the program must be at risk if the best practice is not achieved.
Join Bob Seiner for an important webinar focused on industry best practice around standing up formal Data Governance. Learn how to assess your organization against the practices and deliver an effective roadmap based on the results of conducting the assessment.
In this webinar, Bob will focus on:
- Criteria to select the appropriate best practices for your organization
- How to define the best practices for ultimate impact
- Assessing against selected best practices
- Focusing the recommendations on program success
- Delivering a roadmap for your Data Governance program
How do you assess the quality and reliability of data sources in data analysi...Soumodeep Nanee Kundu
**Assessing the Quality and Reliability of Data Sources in Data Analysis**
Data is often referred to as the lifeblood of data analysis. It forms the foundation upon which decisions are made, insights are drawn, and actions are taken. However, not all data is created equal. The quality and reliability of data sources are paramount to the success of data analysis efforts. In this essay, we will explore the intricate process of assessing data quality and reliability, touching on the methods, considerations, and best practices to ensure the data used in the analysis is trustworthy and fit for purpose.
From Asset to Impact - Presentation to ICS Data Protection Conference 2011Castlebridge Associates
This is a presentation I delivered to the Irish Computer Society Data Protection Conference in February 2011 and again on a webinar for dataqualitypro.com in March 2011.
It looks (for what I believe was the first time) at the relationship between Information Quality and Data Governance principles and practices and the objectives of Data Protection/Privacy compliance. it includes my first version of the mapping of the 8 Data Protection principles to the POSMAD Information Life Cycle referred to by McGilvray and others in the IQ/DQ fields.
Top 30 Data Analyst Interview Questions.pdfShaikSikindar1
Data Analytics has emerged has one of the central aspects of business operations. Consequently, the quest to grab professional positions within the Data Analytics domain has assumed unimaginable proportions. So if you too happen to be someone who is desirous of making through a Data Analyst .
Data-Ed Online: Trends in Data ModelingDATAVERSITY
Businesses cannot compete without data. Every organization produces and consumes it. Data trends are hitting the mainstream and businesses are adopting buzzwords such as Big Data, data vault, data scientist, etc., to seek solutions for their fundamental data issues. Few realize that the importance of any solution, regardless of platform or technology, relies on the data model supporting it. Data modeling is not an optional task for an organization’s data remediation effort. Instead, it is a vital activity that supports the solution driving your business.
This webinar will address emerging trends around data model application methodology, as well as trends around the practice of data modeling itself. We will discuss abstract models and entity frameworks, as well as the general shift from data modeling being segmented to becoming more integrated with business practices.
Takeaways:
How are anchor modeling, data vault, etc. different and when should I apply them?
Integrating data models to business models and the value this creates
Application development (Data first, code first, object first)
Data quality testing – a quick checklist to measure and improve data qualityJaveriaGauhar
Don't wait for a data migration event to test your data quality. Perform data quality tests now before it gets too late. Here's everything you need to know!
https://dataladder.com/data-quality-test-checklist/
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
How to Split Bills in the Odoo 17 POS ModuleCeline George
Bills have a main role in point of sale procedure. It will help to track sales, handling payments and giving receipts to customers. Bill splitting also has an important role in POS. For example, If some friends come together for dinner and if they want to divide the bill then it is possible by POS bill splitting. This slide will show how to split bills in odoo 17 POS.
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdfTechSoup
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
Palestine last event orientationfvgnh .pptxRaedMohamed3
An EFL lesson about the current events in Palestine. It is intended to be for intermediate students who wish to increase their listening skills through a short lesson in power point.
Model Attribute Check Company Auto PropertyCeline George
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
2. Introduction
Today is world of heterogeneity.
We have different technologies.
We operate on different platforms.
We have large amount of data being generated
everyday in all sorts of organizations and
Enterprises.
And we do have problems with data.
4. Why data quality matters?
Good data is your most valuable asset, and
bad data can seriously harm your
business and credibility…
1.What have you missed?
2.When things go wrong.
3.Making confident decisions.
5. What is data quality?
Data quality is a perception or an assessment
of data’s fitness to serve its purpose in a given
context.
It is described by several dimensions like
•Correctness / Accuracy : Accuracy of data is the
degree to which the captured data correctly describes
the real world entity.
•Consistency: This is about the single version of truth.
Consistency means data throughout the enterprise
should be sync with each other.
6. Contd…
•Completeness: It is the extent to which
the expected attributes of data are
provided.
•Timeliness: Right data to the right person
at the right time is important for business.
•Metadata: Data about data.
7. Maintenance of data quality
Data quality results from the process of going through
the data and scrubbing it, standardizing it, and de
duplicating records, as well as doing some of the data
enrichment.
1. Maintain complete data.
2. Clean up your data by standardizing it using rules.
3. Use fancy algorithms to detect duplicates. Eg: ICS
and Informatics Computer System.
4. Avoid entry of duplicate leads and contacts.
5. Merge existing duplicate records.
6. Use roles for security.
8. Bill no CustomerName SocialSecurityNumber
101 Mr. Aleck Stevenson ADWPS10017
Bill no CustomerName SocialSecurityNumber
205 Mr. S Aleck ADWPS10017
Bill no CustomerName SocialSecurityNumber
314 Mr. Stevenson Aleck ADWPS10017
Bill no CustomerName SocialSecurityNumber
316 Mr. Alec Stevenson ADWPS10017
Invoice 3
Invoice 2
Invoice 4
Invoice 1
Inconsistent data before cleaning up
9. Bill no CustomerName SocialSecurityNumber
205 Mr. Aleck Stevenson ADWPS10017
Bill no CustomerName SocialSecurityNumber
101 Mr. Aleck Stevenson ADWPS10017
Bill no CustomerName SocialSecurityNumber
314 Mr. Aleck Stevenson ADWPS10017
Bill no CustomerName SocialSecurityNumber
316 Mr. Aleck Stevenson ADWPS10017
Invoice 1
Invoice 4
Invoice 3
Invoice 2
Consistent data after cleaning up
11. Context
In process of data warehouse design, many database
professionals face situations like:
1. Several data inconsistencies in source, like missing
records or NULL values.
2. Or, column they chose to be the primary key column is
not unique throughout the table.
3. Or, schema design is not coherent to the end user
requirement.
4. Or, any other concern with the data, that must have been
fixed right at the beginning.
12. To fix such data quality issues would mean
making changes in ETL data flow
packages., cleaning the identified
inconsistencies etc.
This in turn will lead to a lot of re-work to be
done.
Re-work will mean added costs to the
company, both in terms of time and effort.
So, what one would do in such a case?
13. Solution
Instead of a solution to the problem, it would be
better to catch it right at the start before it
becomes a problem.
After all “PREVENTION IS BETTER THAN CURE”.
Hence data profiling software came to the
rescue.
14. What is data profiling ?
It is the process of statistically examining and analyzing
the content in a data source, and hence collecting
information about the data. It consists of techniques
used to analyze the data we have for accuracy and
completeness.
1. Data profiling helps us make a thorough assessment
of data quality.
2. It assists the discovery of anomalies in data.
3. It helps us understand
content, structure, relationships, etc. about the data
in the data source we are analyzing.
15. Contd…
4. It helps us know whether the existing data can be
applied to other areas or purposes.
5. It helps us understand the various
issues/challenges we may face in a database
project much before the actual work begins. This
enables us to make early decisions and act
accordingly.
6. It is also used to assess and validate metadata.
16. When and how to conduct data
profiling?
Generally, data profiling is conducted in two
ways:
1.Writing SQL queries on sample data extracts
put into a database.
2.Using data profiling tools.
17. When to conduct Data
Profiling?
-> At the discovery/requirements
gathering phase
-> Just before the dimensional modeling
process
-> During ETL package design.
18. How to conduct Data Profiling?
Data profiling involves statistical analysis of the data at
source and the data being loaded, as well as analysis of
metadata. These statistics may be used for various
analysis purposes. Common examples of analyses to be
done are:
Data quality: Analyze the quality of data at the data
source.
NULL values: Look out for the number of NULL values in
an attribute.
19. Candidate keys: Analysis of the extent to which certain
columns are distinct will give developer useful
information w. r. t. selection of candidate keys.
Primary key selection: To check whether the candidate key
column does not violate the basic requirements of not
having NULL values or duplicate values.
Empty string values: A string column may contain NULL or
even empty sting values that may create problems later.
String length: An analysis of largest and shortest possible
length as well as the average string length of a sting-type
column can help us decide what data type would be
most suitable for the said column.
20. Identification of cardinality: The cardinality relationships
are important for inner and outer join considerations
with regard to several BI tools.
Data format: Sometimes, the format in which certain
data is written in some columns may or may not be
user-friendly.
21. Common Data Profiling Software
Most of the data-integration/analysis soft-wares have data
profiling built into them. Alternatively, various independent
data profiling tools are also available. Some popular ones are:
• Trillium Enterprise Data quality
• Datiris Profiler
• Talend Data Profiler
• IBM Infosphere Information Analyzer
• SSIS Data Profiling Task
• Oracle Warehouse Builder