Understand Your Data
Transform your downstream cloud analytics with
data quality
Marco de Jong | Sales Engineering Director
What you will learn today
• Why Data needs Data Quality
• How Data Profiling helps you
understand your data
• The top 5 steps needed for
effective data profiling
• How another company saw
success through data profiling
• What you can do in the next 90
days to take action on DQ
Data needs
data quality
“Societal trust in business is
arguably at an all-time low
and, in a world increasingly
driven by data and technology,
reputations and brands are
ever harder to protect.”
EY “Trust in Data and Why it Matters”, 2017.
80%
of AI/ML projects are stalling due to
poor data quality
Dimensional Research, 2019
90%
of executives are concerned about
how misused data can impact
corporate reputation
PwC, 22nd Annual Global CEO Survey, 2019
64%
of IT executives have
trouble finding and cleaning the right
data for strategic data projects
Sierra Ventures, 2020
The importance of data
quality in the enterprise:
• Decision making
• Customer centricity
• Compliance
• Machine learning & AI
Understanding your data
Data Profiling
• The set of analytical techniques
that evaluate actual data
content (vs. metadata) to
provide a complete view of
each data element in a data
source.
• Provides summarized
inferences, and details of value
and pattern frequencies to
quickly gain data insights.
Business Rules
• The data quality or validation
rules that help ensure that data
is “fit for use” in its intended
operational and decision-
making contexts.
• Covers the accuracy,
completeness, consistency,
relevance, timeliness and
validity of data.
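As a rough illustration of the value and pattern frequencies that profiling produces, here is a minimal Python sketch. It is not how any particular profiling product works; the function names (pattern_of, profile_column) and the sample postal codes are assumptions made up for this example.

```python
from collections import Counter

def pattern_of(value: str) -> str:
    """Map a value to a character-class pattern: letters -> A, digits -> 9."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)  # keep punctuation and spaces as they are
    return "".join(out)

def profile_column(values):
    """Summarize one column from its actual content (not its metadata)."""
    non_null = [v for v in values if v is not None and v.strip() != ""]
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "value_freq": Counter(non_null),
        "pattern_freq": Counter(pattern_of(v) for v in non_null),
    }

# Hypothetical sample column; real input would come from a data source
postal_codes = ["1012 AB", "1012 AB", "3511CD", "", None, "90210"]
profile = profile_column(postal_codes)
print(profile["pattern_freq"].most_common())
```

A pattern that occurs only once or twice among thousands of values is often the quickest pointer to a formatting problem worth a closer look.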
Five Key Steps to effective
Data Profiling
These are not new, but good to reiterate
1. How do you want to analyze the data?
2. What should you review? (there's a lot of stuff)
3. What should you look for? (based on data “type”)
4. When should you build rules? (laser-focus; CDEs)
5. What needs to be communicated?
1. How do you want to analyze the data?
“Never lead with a data set;
lead with a question.”
Anthony Scriffignano, Chief Data Scientist, Dun & Bradstreet
Forbes Insights, May 31, 2017, “The Data Differentiator”
Universal DQ best practices:
Understand the End Goal
• How does the business intend
to use the data (i.e. what’s the
use case)?
• Empower users (“Who”) to
gain new clarity into the core
problem (“Why”)
• What will the data be used
for?
• What defines the Fitness for
your Purpose?
Establish Scope
• Ask the “right questions”
about the use case and the
data (not just “what” and
“how”)
• What data is relevant to the
effort?
• Big Data or other, you need to
set boundaries for the work
Understand Context
• How does the business define
the data?
• What are the important
characteristics and context of
the data?
• What are the Critical Data
Elements?
• What qualities will you need
to address, or leave alone?
• “High-quality data” definition
will vary by business problem
“If you don’t know what you
want to get out of the data, how
can you know what data you
need – and what insight you’re
looking for?”
Wolf Ruzicka, Chairman of the Board at EastBanc
Technologies, Blog post: June 1, 2017, “Grow A Data Tree
Out Of The “Big Data” Swamp”
2. What should you review?
Common data quality measurements
What measures can we take advantage of?
1. Completeness – Are the relevant fields populated?
2. Integrity – Does the data maintain an internal
structural integrity or a relational integrity across
sources?
3. Uniqueness – Are keys or records unique?
4. Validity – Does the data have the correct values?
• Code and reference values
• Valid ranges
• Valid value combinations
5. Consistency – Is the data at consistent levels of
aggregation or does it have consistent valid
values over time?
6. Timeliness – Did the data arrive in a time period
that makes it useful or usable?
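Several of these measurements reduce to simple ratios over a record set. The sketch below shows one possible way to compute completeness, uniqueness, validity and timeliness in Python. The field names, the reference code set, the age range and the seven-day freshness window are all assumptions for illustration; integrity and consistency are left out because they need more than one source or time period.

```python
from datetime import date

VALID_COUNTRIES = {"NL", "DE", "FR"}  # assumed reference values

def ratio(flags):
    """Share of True values in an iterable of booleans."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0

def measure(records, as_of, max_age_days=7):
    """Score a list of dict records on four of the dimensions above."""
    return {
        # Completeness: is the relevant field populated?
        "completeness": ratio(r["country"] is not None for r in records),
        # Uniqueness: distinct keys relative to the number of records
        "uniqueness": len({r["id"] for r in records}) / len(records) if records else 0.0,
        # Validity: code and reference values
        "validity_codes": ratio(r["country"] in VALID_COUNTRIES for r in records),
        # Validity: valid ranges
        "validity_range": ratio(0 <= r["age"] <= 120 for r in records),
        # Timeliness: did the record arrive within the usable window?
        "timeliness": ratio((as_of - r["received"]).days <= max_age_days for r in records),
    }

# Hypothetical records
records = [
    {"id": 1, "country": "NL", "age": 34, "received": date(2021, 3, 1)},
    {"id": 2, "country": "XX", "age": 130, "received": date(2021, 3, 1)},
    {"id": 2, "country": None, "age": 51, "received": date(2021, 2, 1)},
]
for name, score in measure(records, as_of=date(2021, 3, 2)).items():
    print(f"{name}: {score:.2f}")
```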
New data quality problems
New data, new data quality challenges
• 3rd Party and external data with unknown provenance or
relevance
• Bias in the data – whether in collection, extraction, or other
processing
• Data without standardized structure or formatting
• Continuously streaming data
• Disjointed data (e.g. gaps in receipt)
• Consistency and verification of data sources
• Changes and transformations applied to data (i.e. does it really
represent the original input?)
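Of these challenges, disjointed data is one of the easier ones to check mechanically. The sketch below assumes a feed that is expected once per day and reports the date ranges where nothing was received; the function name find_gaps and the daily interval are assumptions for this example.

```python
from datetime import date, timedelta

def find_gaps(received_dates, expected_interval=timedelta(days=1)):
    """Return (first_missing, last_missing) pairs for each gap in receipt.

    Assumes deliveries are expected on a fixed interval aligned with
    the dates that did arrive.
    """
    ordered = sorted(set(received_dates))
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > expected_interval:
            gaps.append((prev + expected_interval, curr - expected_interval))
    return gaps

# Hypothetical receipt log: nothing arrived on March 3 and 4
received = [date(2021, 3, 1), date(2021, 3, 2), date(2021, 3, 5), date(2021, 3, 6)]
for start, end in find_gaps(received):
    print(f"No data received from {start} to {end}")
```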
Let data profiling guide you
• Contextual visualizations
• Value and pattern distributions
• Attribute summaries and metadata
• Sort and filter to quickly find data of interest
• Detail drilldowns to any content
3. What should you look for?
Common data types
What do you need to be aware of?
1. Identifiers – data that uniquely identifies
something
2. Indicators – data that flags a specific
condition
3. Dates – data that identifies a point in time
4. Quantities – data that identifies an amount
or value of something
5. Codes – data that segments other data
6. Text – data that describes or names
something
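What to look for differs by data type. The sketch below pairs four of the types above with one plausible check each. The checks themselves (duplicates for identifiers, unexpected values for indicators, unparseable values for dates, negative or non-numeric values for quantities) are assumptions chosen for illustration, as are the function names, the Y/N indicator values and the date format.

```python
from datetime import datetime

def check_identifier(values):
    """Identifiers: report values that occur more than once."""
    seen, dupes = set(), set()
    for v in values:
        if v in seen:
            dupes.add(v)
        seen.add(v)
    return sorted(dupes)

def check_indicator(values, allowed=("Y", "N")):
    """Indicators: report values outside the expected flag set."""
    return sorted({v for v in values if v not in allowed})

def check_date(values, fmt="%Y-%m-%d"):
    """Dates: report strings that do not parse as a real calendar date."""
    bad = []
    for v in values:
        try:
            datetime.strptime(v, fmt)
        except ValueError:
            bad.append(v)
    return bad

def check_quantity(values):
    """Quantities: report non-numeric or negative amounts."""
    bad = []
    for v in values:
        try:
            if float(v) < 0:
                bad.append(v)
        except ValueError:
            bad.append(v)
    return bad

# Hypothetical column samples, all held as strings
print(check_identifier(["A1", "A2", "A1"]))
print(check_indicator(["Y", "N", "?"]))
print(check_date(["2021-03-01", "2021-02-30"]))
print(check_quantity(["10", "-5", "abc"]))
```

Codes can be checked the same way as indicators, with a larger reference set, and free text usually calls for pattern or length profiling rather than a pass/fail check.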
4. When should you build rules?
Build rules for defined conditions
Focus on:
• Critical Data Elements (data quality dimensions)
• Policy-based conditions (e.g. regulatory compliance)
• Correlated data conditions (e.g. If x, then y)
• Filtering and segmenting data (refining evaluations;
investigating root cause)
Benefits of business rules
• Validate critical requirements within or across
data sources
• Build common rules that can be readily tested
and shared
• Evaluate and remediate issues
• Take action on incorrect data and defaults
• Create flags for subsequent use in marking or
remediating data
• Filter result sets and export for additional use
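A business rule can be expressed as a named test that a record either passes or fails. The sketch below shows a critical-element rule and a correlated "if x, then y" rule, then flags and filters the failing records. The Rule class, the rule names and the field names (customer_id, marketing_opt_in, consent_date) are hypothetical and only serve the example.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    passes: Callable[[dict], bool]

RULES = [
    # Critical data element: the key must be populated
    Rule("customer_id_populated", lambda r: bool(r.get("customer_id"))),
    # Correlated condition: if opted in to marketing, then a consent date must exist
    Rule(
        "consent_required_for_marketing",
        lambda r: not r.get("marketing_opt_in") or r.get("consent_date") is not None,
    ),
]

def evaluate(records, rules):
    """Attach the failed rule names and a flag to a copy of each record."""
    results = []
    for r in records:
        failed = [rule.name for rule in rules if not rule.passes(r)]
        results.append({**r, "dq_failed_rules": failed, "dq_flag": bool(failed)})
    return results

# Hypothetical records
records = [
    {"customer_id": "C1", "marketing_opt_in": True, "consent_date": "2021-01-05"},
    {"customer_id": "", "marketing_opt_in": False, "consent_date": None},
    {"customer_id": "C3", "marketing_opt_in": True, "consent_date": None},
]
results = evaluate(records, RULES)

# Filter the result set for remediation or export
to_fix = [r for r in results if r["dq_flag"]]
for r in to_fix:
    print(r["customer_id"] or "<missing>", r["dq_failed_rules"])
```

Keeping rules as small named objects makes them easy to test on their own and to share across data sources.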
5. What should you communicate?
Communicate!
Culture of Data Literacy
• “Democratization of Data” requires
cultural support
Program of Data Governance
• Provide the processes and practices
necessary for success
Center of Excellence/Knowledge Base
• Where do you go to find answers?
• Who can help show you how?
Annotate results with findings
Large European Telco
Leveraging data as a critical asset
• Business Rules
Goal
• Ensure accurate data to support
customer service, marketing,
retention and loyalty
• Implement enterprise-wide data
governance
Challenge
• Data from multiple
sources/systems, stored in many
different formats
• No enterprise standard for data
quality
• Moving to the cloud
Benefits Achieved
• Trusted data for faster, better
strategic and operational decision
making
• More effective marketing and
better customer service
Solution
• Precisely Trillium Discovery
• Precisely Trillium Quality
Looking at the Next 90 Days…
• Make profiling actionable (you don’t know what you don’t know until you
profile)
• Keep the 5 key questions top of mind!
Visit us to learn more about Cloud Transformation:
https://www.precisely.com/campaigns/cloud-transformation
Q&A
Transform Your Downstream Cloud Analytics with Data Quality 

More Related Content

What's hot

TargetStateFutureArchitect - DV
TargetStateFutureArchitect - DVTargetStateFutureArchitect - DV
TargetStateFutureArchitect - DV
Bhavendra Chavan
 
Data Quality Rules introduction
Data Quality Rules introductionData Quality Rules introduction
Data Quality Rules introduction
datatovalue
 
Data analytics vs. Data analysis
Data analytics vs. Data analysisData analytics vs. Data analysis
Data analytics vs. Data analysis
Dr. C.V. Suresh Babu
 
Data Analytics and Big Data on IoT
Data Analytics and Big Data on IoTData Analytics and Big Data on IoT
Data Analytics and Big Data on IoT
Shivam Singh
 
Data Analytics
Data AnalyticsData Analytics
Data Analytics
Srinimf-Slides
 
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
ssuser23e4f31
 
000 introduction to big data analytics 2021
000   introduction to big data analytics  2021000   introduction to big data analytics  2021
000 introduction to big data analytics 2021
Dendej Sawarnkatat
 
Advanced Business Analytics for Actuaries - Canadian Institute of Actuaries J...
Advanced Business Analytics for Actuaries - Canadian Institute of Actuaries J...Advanced Business Analytics for Actuaries - Canadian Institute of Actuaries J...
Advanced Business Analytics for Actuaries - Canadian Institute of Actuaries J...
Kevin Pledge
 
Data quality
Data qualityData quality
Data quality
sethnainaa
 
Data-Ed Webinar: Data Governance Strategies
Data-Ed Webinar: Data Governance StrategiesData-Ed Webinar: Data Governance Strategies
Data-Ed Webinar: Data Governance Strategies
DATAVERSITY
 
Analytics for actuaries cia
Analytics for actuaries ciaAnalytics for actuaries cia
Analytics for actuaries cia
Kevin Pledge
 
Data Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: MetadataData Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: Metadata
DATAVERSITY
 
Data Analytics Domain
Data Analytics DomainData Analytics Domain
Data Analytics Domain
Multisoft Virtual Academy
 
Graduation Thesis Sample
Graduation Thesis SampleGraduation Thesis Sample
Graduation Thesis Sample
Graduate Thesis
 
Enterprise Data Management
Enterprise Data ManagementEnterprise Data Management
Enterprise Data Management
Bhavendra Chavan
 
These Are The Data You Are Looking For
These Are The Data You Are Looking ForThese Are The Data You Are Looking For
These Are The Data You Are Looking For
Embarcadero Technologies
 
Data science and data analytics major similarities and distinctions (1)
Data science and data analytics  major similarities and distinctions (1)Data science and data analytics  major similarities and distinctions (1)
Data science and data analytics major similarities and distinctions (1)
Robert Smith
 
An Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data miningAn Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data mining
Barry Leventhal
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
ANAND PRAKASH
 
Introduction to Business Data Analytics
Introduction to Business Data AnalyticsIntroduction to Business Data Analytics
Introduction to Business Data Analytics
VadivelM9
 

What's hot (20)

TargetStateFutureArchitect - DV
TargetStateFutureArchitect - DVTargetStateFutureArchitect - DV
TargetStateFutureArchitect - DV
 
Data Quality Rules introduction
Data Quality Rules introductionData Quality Rules introduction
Data Quality Rules introduction
 
Data analytics vs. Data analysis
Data analytics vs. Data analysisData analytics vs. Data analysis
Data analytics vs. Data analysis
 
Data Analytics and Big Data on IoT
Data Analytics and Big Data on IoTData Analytics and Big Data on IoT
Data Analytics and Big Data on IoT
 
Data Analytics
Data AnalyticsData Analytics
Data Analytics
 
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
 
000 introduction to big data analytics 2021
000   introduction to big data analytics  2021000   introduction to big data analytics  2021
000 introduction to big data analytics 2021
 
Advanced Business Analytics for Actuaries - Canadian Institute of Actuaries J...
Advanced Business Analytics for Actuaries - Canadian Institute of Actuaries J...Advanced Business Analytics for Actuaries - Canadian Institute of Actuaries J...
Advanced Business Analytics for Actuaries - Canadian Institute of Actuaries J...
 
Data quality
Data qualityData quality
Data quality
 
Data-Ed Webinar: Data Governance Strategies
Data-Ed Webinar: Data Governance StrategiesData-Ed Webinar: Data Governance Strategies
Data-Ed Webinar: Data Governance Strategies
 
Analytics for actuaries cia
Analytics for actuaries ciaAnalytics for actuaries cia
Analytics for actuaries cia
 
Data Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: MetadataData Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: Metadata
 
Data Analytics Domain
Data Analytics DomainData Analytics Domain
Data Analytics Domain
 
Graduation Thesis Sample
Graduation Thesis SampleGraduation Thesis Sample
Graduation Thesis Sample
 
Enterprise Data Management
Enterprise Data ManagementEnterprise Data Management
Enterprise Data Management
 
These Are The Data You Are Looking For
These Are The Data You Are Looking ForThese Are The Data You Are Looking For
These Are The Data You Are Looking For
 
Data science and data analytics major similarities and distinctions (1)
Data science and data analytics  major similarities and distinctions (1)Data science and data analytics  major similarities and distinctions (1)
Data science and data analytics major similarities and distinctions (1)
 
An Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data miningAn Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data mining
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Introduction to Business Data Analytics
Introduction to Business Data AnalyticsIntroduction to Business Data Analytics
Introduction to Business Data Analytics
 

Similar to Transform Your Downstream Cloud Analytics with Data Quality 

Foundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your DataFoundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Precisely
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
Precisely
 
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on TrackYour AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Precisely
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?
Precisely
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?
Precisely
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?
Precisely
 
What Data Do You Have and Where is It?
What Data Do You Have and Where is It? What Data Do You Have and Where is It?
What Data Do You Have and Where is It?
Caserta
 
Predictive Human Capital Analytics (1).pptx
Predictive Human Capital Analytics (1).pptxPredictive Human Capital Analytics (1).pptx
Predictive Human Capital Analytics (1).pptx
SaminaNawaz14
 
When the business needs intelligence (15Oct2014)
When the business needs intelligence   (15Oct2014)When the business needs intelligence   (15Oct2014)
When the business needs intelligence (15Oct2014)
Dipti Patil
 
DC Salesforce1 Tour Data Governance Lunch Best Practices deck
DC Salesforce1 Tour Data Governance Lunch Best Practices deckDC Salesforce1 Tour Data Governance Lunch Best Practices deck
DC Salesforce1 Tour Data Governance Lunch Best Practices deck
Beth Fitzpatrick
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
Caserta
 
Data driven decision making
Data driven decision makingData driven decision making
Data driven decision making
SHAHZAD M. SALEEM
 
Accenture Big Data Expo
Accenture Big Data ExpoAccenture Big Data Expo
Accenture Big Data Expo
BigDataExpo
 
Data Detectives - Presentation
Data Detectives - PresentationData Detectives - Presentation
Data Detectives - Presentation
Clint Campbell
 
Introduction to Business and Data Analysis Undergraduate.pdf
Introduction to Business and Data Analysis Undergraduate.pdfIntroduction to Business and Data Analysis Undergraduate.pdf
Introduction to Business and Data Analysis Undergraduate.pdf
AbdulrahimShaibuIssa
 
You Need a Data Catalog. Do You Know Why?
 You Need a Data Catalog. Do You Know Why? You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?
Precisely
 
Introduction to Big Data Analytics
Introduction to Big Data AnalyticsIntroduction to Big Data Analytics
Introduction to Big Data Analytics
Utkarsh Sharma
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and Innovation
Caserta
 
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...
Concept Searching, Inc
 
lec1.pdf
lec1.pdflec1.pdf
lec1.pdf
nimmakiran1
 

Similar to Transform Your Downstream Cloud Analytics with Data Quality  (20)

Foundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your DataFoundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your Data
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
 
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on TrackYour AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?
 
What Data Do You Have and Where is It?
What Data Do You Have and Where is It? What Data Do You Have and Where is It?
What Data Do You Have and Where is It?
 
Predictive Human Capital Analytics (1).pptx
Predictive Human Capital Analytics (1).pptxPredictive Human Capital Analytics (1).pptx
Predictive Human Capital Analytics (1).pptx
 
When the business needs intelligence (15Oct2014)
When the business needs intelligence   (15Oct2014)When the business needs intelligence   (15Oct2014)
When the business needs intelligence (15Oct2014)
 
DC Salesforce1 Tour Data Governance Lunch Best Practices deck
DC Salesforce1 Tour Data Governance Lunch Best Practices deckDC Salesforce1 Tour Data Governance Lunch Best Practices deck
DC Salesforce1 Tour Data Governance Lunch Best Practices deck
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 
Data driven decision making
Data driven decision makingData driven decision making
Data driven decision making
 
Accenture Big Data Expo
Accenture Big Data ExpoAccenture Big Data Expo
Accenture Big Data Expo
 
Data Detectives - Presentation
Data Detectives - PresentationData Detectives - Presentation
Data Detectives - Presentation
 
Introduction to Business and Data Analysis Undergraduate.pdf
Introduction to Business and Data Analysis Undergraduate.pdfIntroduction to Business and Data Analysis Undergraduate.pdf
Introduction to Business and Data Analysis Undergraduate.pdf
 
You Need a Data Catalog. Do You Know Why?
 You Need a Data Catalog. Do You Know Why? You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?
 
Introduction to Big Data Analytics
Introduction to Big Data AnalyticsIntroduction to Big Data Analytics
Introduction to Big Data Analytics
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and Innovation
 
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...
 
lec1.pdf
lec1.pdflec1.pdf
lec1.pdf
 

More from Precisely

Making Your Data and AI Ready for Business Transformation.pdf
Making Your Data and AI Ready for Business Transformation.pdfMaking Your Data and AI Ready for Business Transformation.pdf
Making Your Data and AI Ready for Business Transformation.pdf
Precisely
 
Getting a Deeper Look at Your IBM® Z and IBM i Data in ServiceNow
Getting a Deeper Look at Your IBM® Z and IBM i Data in ServiceNowGetting a Deeper Look at Your IBM® Z and IBM i Data in ServiceNow
Getting a Deeper Look at Your IBM® Z and IBM i Data in ServiceNow
Precisely
 
Predictive Powerhouse - Elevating AI ML Accuracy and Relevance with Third-Par...
Predictive Powerhouse - Elevating AI ML Accuracy and Relevance with Third-Par...Predictive Powerhouse - Elevating AI ML Accuracy and Relevance with Third-Par...
Predictive Powerhouse - Elevating AI ML Accuracy and Relevance with Third-Par...
Precisely
 
Predictive Powerhouse: Elevating AI Accuracy and Relevance with Third-Party Data
Predictive Powerhouse: Elevating AI Accuracy and Relevance with Third-Party DataPredictive Powerhouse: Elevating AI Accuracy and Relevance with Third-Party Data
Predictive Powerhouse: Elevating AI Accuracy and Relevance with Third-Party Data
Precisely
 
Predictive Powerhouse: Elevating AI Accuracy and Relevance with Third-Party Data
Predictive Powerhouse: Elevating AI Accuracy and Relevance with Third-Party DataPredictive Powerhouse: Elevating AI Accuracy and Relevance with Third-Party Data
Predictive Powerhouse: Elevating AI Accuracy and Relevance with Third-Party Data
Precisely
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Precisely
 
信頼できるデータでESGイニシアチブを成功に導く方法.pdf How to drive success with ESG initiatives with...
信頼できるデータでESGイニシアチブを成功に導く方法.pdf How to drive success with ESG initiatives with...信頼できるデータでESGイニシアチブを成功に導く方法.pdf How to drive success with ESG initiatives with...
信頼できるデータでESGイニシアチブを成功に導く方法.pdf How to drive success with ESG initiatives with...
Precisely
 
AI-Ready Data - The Key to Transforming Projects into Production.pptx
AI-Ready Data - The Key to Transforming Projects into Production.pptxAI-Ready Data - The Key to Transforming Projects into Production.pptx
AI-Ready Data - The Key to Transforming Projects into Production.pptx
Precisely
 
Building a Multi-Layered Defense for Your IBM i Security
Building a Multi-Layered Defense for Your IBM i SecurityBuilding a Multi-Layered Defense for Your IBM i Security
Building a Multi-Layered Defense for Your IBM i Security
Precisely
 
Optimierte Daten und Prozesse mit KI / ML + SAP Fiori.pdf
Optimierte Daten und Prozesse mit KI / ML + SAP Fiori.pdfOptimierte Daten und Prozesse mit KI / ML + SAP Fiori.pdf
Optimierte Daten und Prozesse mit KI / ML + SAP Fiori.pdf
Precisely
 
Chaining, Looping, and Long Text for Script Development and Automation.pdf
Chaining, Looping, and Long Text for Script Development and Automation.pdfChaining, Looping, and Long Text for Script Development and Automation.pdf
Chaining, Looping, and Long Text for Script Development and Automation.pdf
Precisely
 
Revolutionizing SAP® Processes with Automation and Artificial Intelligence
Revolutionizing SAP® Processes with Automation and Artificial IntelligenceRevolutionizing SAP® Processes with Automation and Artificial Intelligence
Revolutionizing SAP® Processes with Automation and Artificial Intelligence
Precisely
 
Navigating the Cloud: Best Practices for Successful Migration
Navigating the Cloud: Best Practices for Successful MigrationNavigating the Cloud: Best Practices for Successful Migration
Navigating the Cloud: Best Practices for Successful Migration
Precisely
 
Unlocking the Power of Your IBM i and Z Security Data with Google Chronicle
Unlocking the Power of Your IBM i and Z Security Data with Google ChronicleUnlocking the Power of Your IBM i and Z Security Data with Google Chronicle
Unlocking the Power of Your IBM i and Z Security Data with Google Chronicle
Precisely
 
How to Build Data Governance Programs That Last - A Business-First Approach.pdf
How to Build Data Governance Programs That Last - A Business-First Approach.pdfHow to Build Data Governance Programs That Last - A Business-First Approach.pdf
How to Build Data Governance Programs That Last - A Business-First Approach.pdf
Precisely
 
Zukuntssichere SAP Prozesse dank automatisierter Massendaten
Zukuntssichere SAP Prozesse dank automatisierter MassendatenZukuntssichere SAP Prozesse dank automatisierter Massendaten
Zukuntssichere SAP Prozesse dank automatisierter Massendaten
Precisely
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
Precisely
 
Crucial Considerations for AI-ready Data.pdf
Crucial Considerations for AI-ready Data.pdfCrucial Considerations for AI-ready Data.pdf
Crucial Considerations for AI-ready Data.pdf
Precisely
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Precisely
 
Justifying Capacity Managment Webinar 4/10
Justifying Capacity Managment Webinar 4/10Justifying Capacity Managment Webinar 4/10
Justifying Capacity Managment Webinar 4/10
Precisely
 

More from Precisely (20)

Making Your Data and AI Ready for Business Transformation.pdf
Making Your Data and AI Ready for Business Transformation.pdfMaking Your Data and AI Ready for Business Transformation.pdf
Making Your Data and AI Ready for Business Transformation.pdf
 
Getting a Deeper Look at Your IBM® Z and IBM i Data in ServiceNow
Getting a Deeper Look at Your IBM® Z and IBM i Data in ServiceNowGetting a Deeper Look at Your IBM® Z and IBM i Data in ServiceNow
Getting a Deeper Look at Your IBM® Z and IBM i Data in ServiceNow
 
Predictive Powerhouse - Elevating AI ML Accuracy and Relevance with Third-Par...
Predictive Powerhouse - Elevating AI ML Accuracy and Relevance with Third-Par...Predictive Powerhouse - Elevating AI ML Accuracy and Relevance with Third-Par...
Predictive Powerhouse - Elevating AI ML Accuracy and Relevance with Third-Par...
 
Predictive Powerhouse: Elevating AI Accuracy and Relevance with Third-Party Data
Predictive Powerhouse: Elevating AI Accuracy and Relevance with Third-Party DataPredictive Powerhouse: Elevating AI Accuracy and Relevance with Third-Party Data
Predictive Powerhouse: Elevating AI Accuracy and Relevance with Third-Party Data
 
Predictive Powerhouse: Elevating AI Accuracy and Relevance with Third-Party Data
Predictive Powerhouse: Elevating AI Accuracy and Relevance with Third-Party DataPredictive Powerhouse: Elevating AI Accuracy and Relevance with Third-Party Data
Predictive Powerhouse: Elevating AI Accuracy and Relevance with Third-Party Data
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
 
信頼できるデータでESGイニシアチブを成功に導く方法.pdf How to drive success with ESG initiatives with...
信頼できるデータでESGイニシアチブを成功に導く方法.pdf How to drive success with ESG initiatives with...信頼できるデータでESGイニシアチブを成功に導く方法.pdf How to drive success with ESG initiatives with...
信頼できるデータでESGイニシアチブを成功に導く方法.pdf How to drive success with ESG initiatives with...
 
AI-Ready Data - The Key to Transforming Projects into Production.pptx
AI-Ready Data - The Key to Transforming Projects into Production.pptxAI-Ready Data - The Key to Transforming Projects into Production.pptx
AI-Ready Data - The Key to Transforming Projects into Production.pptx
 
Building a Multi-Layered Defense for Your IBM i Security
Building a Multi-Layered Defense for Your IBM i SecurityBuilding a Multi-Layered Defense for Your IBM i Security
Building a Multi-Layered Defense for Your IBM i Security
 
Optimierte Daten und Prozesse mit KI / ML + SAP Fiori.pdf
Optimierte Daten und Prozesse mit KI / ML + SAP Fiori.pdfOptimierte Daten und Prozesse mit KI / ML + SAP Fiori.pdf
Optimierte Daten und Prozesse mit KI / ML + SAP Fiori.pdf
 

Transform Your Downstream Cloud Analytics with Data Quality 

  • 1. Understand Your Data Transform your downstream cloud analytics with data quality Marco de Jong | Sales Engineering Director
  • 2. What you will learn today • Why Data needs Data Quality • How Data Profiling helps you understand your data • The top 5 steps needed for effective data profiling • How another company saw success through data profiling • What you can do in the next 90 days to take action on DQ
  • 3. Data needs data quality “Societal trust in business is arguably at an all-time low and, in a world increasingly driven by data and technology, reputations and brands are ever harder to protect.” EY “Trust in Data and Why it Matters”, 2017. 80% of AI/ML projects are stalling due to poor data quality Dimensional Research, 2019 90% of executives are concerned about how misused data can impact corporate reputation PWC, 22nd Annual Global CEO Survey, 2019 64% of IT executives have trouble finding and cleaning the right data for strategic data projects Sierra Venture, 2020 The importance of data quality in the enterprise: • Decision making • Customer centricity • Compliance • Machine learning & AI
  • 4. Understanding your data Data Profiling • The set of analytical techniques that evaluate actual data content (vs. metadata) to provide a complete view of each data element in a data source. • Provides summarized inferences, and details of value and pattern frequencies to quickly gain data insights. • Business Rules • The data quality or validation rules that help ensure that data is “fit for use” in its intended operational and decision- making contexts. • Covers the accuracy, completeness, consistency, relevance, timeliness and validity of data.
  • 5. Five Key Steps to effective Data Profiling These are not new, but good to reiterate 1. How do you want to analyze the data? 2. What should you review? (there's a lot of stuff) 3. What should you look for? (based on data “type”) 4. When should you build rules? (laser-focus; CDEs) 5. What needs to be communicated?
  • 6. 1. How do you want to analyze the data?
  • 7. “Never lead with a data set; lead with a question.” Anthony Scriffignano, Chief Data Scientist, Dun & Bradstreet Forbes Insights, May 31, 2017, “The Data Differentiator”
  • 8. Universal DQ best practices: Understand the End Goal • How does the business intend to use the data (i.e. what’s the use case)? • Empower users (“Who”) to gain new clarity into the core problem (“Why”) • What will the data be used for? • What defines the Fitness for your Purpose? Establish Scope • Ask the “right questions” about the use case and the data (not just “what” and “how”) • What data is relevant to the effort? • Big Data or other, you need to set boundaries for the work Understand Context • How does the business define the data? • What are the important characteristics and context of the data? • What are the Critical Data Elements? • What qualities will you need to address, or leave alone? • “High-quality data” definition will vary by business problem “If you don’t know what you want to get out of the data, how can you know what data you need – and what insight you’re looking for?” Wolf Ruzicka, Chairman of the Board at EastBanc Technologies, Blog post: June 1, 2017, “Grow A Data Tree Out Of The “Big Data” Swamp”
  • 9. 2. What do you want to review?
  • 10. Common data quality measurements What measures can we take advantage of? 1. Completeness – Are the relevant fields populated? 2. Integrity – Does the data maintain an internal structural integrity or a relational integrity across sources? 3. Uniqueness – Are keys or records unique? 4. Validity – Does the data have the correct values? • Code and reference values • Valid ranges • Valid value combinations 5. Consistency – Is the data at consistent levels of aggregation or does it have consistent valid values over time? 6. Timeliness – Did the data arrive in a time period that makes it useful or usable?
  • 11. New data quality problems New data, new data quality challenges • 3rd Party and external data with unknown provenance or relevance • Bias in the data – whether in collection, extraction, or other processing • Data without standardized structure or formatting • Continuously streaming data • Disjointed data (e.g. gaps in receipt) • Consistency and verification of data sources • Changes and transformation applied to data (i.e. does it really represent the original input)
  • 12. Let data profiling guide you • Contextual visualizations • Value and pattern distributions • Attribute summaries and metadata • Sort and filter to quickly find data of interest • Detail drilldowns to any content
  • 13. 3. What should you look for?
  • 14. Common data types What do you need to be aware of? 1. Identifiers – data that uniquely identifies something 2. Indicators – data that flags a specific condition 3. Dates – data that identifies a point in time 4. Quantities – data that identifies an amount or value of something 5. Codes – data that segments other data 6. Text – data that describes or names something
  • 15. 4. When do you build rules?
  • 16. Build rules for defined conditions Focus on: • Critical Data Elements (data quality dimensions) • Policy-based conditions (e.g. regulatory compliance) • Correlated data conditions (e.g. If x, then y) • Filtering and segmenting data (refining evaluations; investigating root cause)
  • 17. Benefits of business rules • Validate critical requirements within or across data sources • Build common rules that can be readily tested and shared • Evaluate and remediate issues • Take action on incorrect data and defaults • Create flags for subsequent use in marking or remediating data • Filter result sets and export for additional use
  • 18. 5. What should you communicate?
  • 19. Communicate! Culture of Data Literacy • “Democratization of Data” requires cultural support Program of Data Governance • Provide the processes and practices necessary for success Center of Excellence/Knowledge Base • Where do you go to find answers? • Who can help show you how?
  • 21. Large European Telco Leveraging data as a critical asset • Business Rules Goal • Ensure accurate data to support customer service, marketing, retention and loyalty • Implement enterprise-wide data governance Challenge • Data from multiple sources/systems, stored in many different formats • No enterprise standard for data quality • Moving to the cloud Benefits Achieved • Trusted data for faster, better strategic and operational decision making • More effective marketing and better customer service Solution • Precisely Trillium Discovery • Precisely Trillium Quality
  • 22. Looking at the Next 90 Days… • Make profiling actionable (you don’t know what you don’t know until you profile) • Keep the 5 key questions top of mind! Visit us to learn more about Cloud Transformation: https://www.precisely.com/campaigns/cloud-transformation
  • 23. Q&A
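
The measurements on slide 10, the value and pattern distributions on slide 12, and the correlated "If x, then y" rules on slide 16 can be illustrated with a small sketch. This is not how Precisely Trillium implements profiling; it is a minimal standard-library Python example, and every name in it (`pattern_of`, `profile_column`, `rule_closed_needs_end_date`, the `customer_id`, `status`, and `end_date` fields, and the sample records) is a hypothetical assumption made for illustration.

```python
import re
from collections import Counter


def pattern_of(value: str) -> str:
    """Reduce a value to its shape: letters become 'A', digits become '9'."""
    shaped = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"\d", "9", shaped)


def profile_column(values):
    """Summarize one column: completeness, uniqueness, pattern frequencies."""
    total = len(values)
    # Treat None and blank strings as "not populated".
    populated = [v for v in values if v is not None and str(v).strip() != ""]
    distinct = set(populated)
    return {
        "completeness": len(populated) / total if total else 0.0,
        "distinct": len(distinct),
        "is_unique": len(distinct) == len(populated),
        "patterns": Counter(pattern_of(str(v)) for v in populated),
    }


def rule_closed_needs_end_date(record) -> bool:
    """Hypothetical correlated rule: if status is CLOSED, end_date must be set."""
    if record.get("status") == "CLOSED":
        return bool(record.get("end_date"))
    return True


# Hypothetical sample data, invented for this sketch.
records = [
    {"customer_id": "C001", "status": "OPEN", "end_date": ""},
    {"customer_id": "C002", "status": "CLOSED", "end_date": "2020-01-31"},
    {"customer_id": "C002", "status": "CLOSED", "end_date": ""},
    {"customer_id": "", "status": "OPEN", "end_date": ""},
]

profile = profile_column([r["customer_id"] for r in records])
failed = [r for r in records if not rule_closed_needs_end_date(r)]

print(profile)
print(f"{len(failed)} record(s) failed the rule")
```

In this sketch the profile surfaces a blank and a duplicated identifier before any rule is written, and the single rule is then aimed at one specific condition. That mirrors the deck's ordering of profiling first and laser-focused rules second.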