AN EXTENDED DATA OBJECT-DRIVEN APPROACH TO DATA
QUALITY EVALUATION: CONTEXTUAL DATA QUALITY ANALYSIS
21st International Conference on Enterprise Information Systems (ICEIS),
Heraklion, Crete – Greece, 2019
Anastasija Nikiforova, Janis Bicevskis
Faculty of Computing, University of Latvia
Anastasija.Nikiforova@lu.lv
QUALITY AND DATA QUALITY
• “Quality” is a desirable goal to be achieved through management of the production process.
• «Data quality» is a relative concept, largely dependent on specific requirements resulting from the data use.
Source: Bičevska (2018)
Source: ISO 9001:2015: Quality management principles.
2016: Decisions resulting from bad data cost the US economy $3.1 trillion per year (IBM)
2017: Organizations believe poor data quality to be responsible for an average of $15 million per year in losses (Gartner)
Data quality weaknesses can lead to huge losses
!!! The same data may be of sufficient quality in one case BUT completely useless under other circumstances.
«Dimensions are not defined in a measurable and formal way»
-Batini et al., 2016, DAMA, 2019, Huang et al., 1999, Eppler, 2006
«…Even amongst data quality professionals the key data quality dimensions
are not universally agreed. This state of affairs has led to much confusion
within the data quality community and is even more bewildering for those
who are new to the discipline and more importantly to business
stakeholders…»
-DAMA, 2019
RELATED RESEARCH
• General studies on data and information quality define different dimensions of quality and their groupings, as well as data assessment methodologies.
• Assessments of data and information quality in specific industries propose sector-specific methods.
  • Cancer registry, Healthcare, Manufacturing, Chemical Hazard Risk Assessments, etc.
BUT!!!
• There is no consensus on data quality dimensions and their usability.
• How to relate a particular dimension (and which one?) to a particular use-case?
• Dimensions of the same name can have different semantics in different studies.
Problem: data quality experts have to be involved at every stage of the data quality analysis process
Solution: data object-driven approach to data quality evaluation
(Bicevskis, Bicevska, Nikiforova, Oditis, 2018)
TDQM data quality lifecycle: data quality definition → data quality measuring → data quality analysis → data quality improvement
MAIN PRINCIPLES OF THE PROPOSED SOLUTION
• Each specific application can have its own specific DQ checks;
• DQ requirements can be formulated on several levels:
  • from informal text in natural language
  • to an automatically executable model, SQL statements or program code;
• DQ can be checked at various stages of data processing;
• The DQ definition language is a graphical DSL:
  • the diagrams are easy to read, create, understand and edit, even by non-IT and non-Data Quality professionals;
  • syntax and semantics can be easily applied to any new IS.
!!! All three components are defined by using a graphical domain-specific language (DSL)**
** Three DSL families were developed as graphical languages based on the capabilities of the modelling platform DIMOD
DATA QUALITY MODEL (instead of dimensions)
1. DATA OBJECT (DO) - the set of values of the parameters that characterize a real-life object
  • primary data object - the initial DO whose quality is analysed;
  • secondary data object - a DO that determines the context for analysis of the primary DO.
  * Many objects of the same structure form a class of data objects
2. DATA QUALITY REQUIREMENTS - conditions that must be met for a data object to be considered of high quality.
  ** May contain informal or formalized implementation-independent descriptions of conditions
3. DATA QUALITY MEASURING PROCESS - procedures that should be performed to evaluate the data object's quality.
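The three components of the model can be illustrated with a minimal Python sketch. This is not the authors' graphical DSL: the names `DataObject`, `QualityRequirement` and `measure_quality`, as well as the sample records, are assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# 1. Data object: modelled here as a mapping from parameter name to value.
DataObject = Dict[str, Optional[str]]


@dataclass
class QualityRequirement:
    """2. One condition a data object must meet (hypothetical structure)."""
    name: str
    condition: Callable[[DataObject], bool]


def measure_quality(objects: List[DataObject],
                    requirements: List[QualityRequirement]) -> List[dict]:
    """3. Measuring process: test every object against every requirement.

    Returns a test protocol with one entry per failed requirement.
    """
    protocol = []
    for index, obj in enumerate(objects):
        for req in requirements:
            if not req.condition(obj):
                protocol.append({"object": index, "failed": req.name})
    return protocol


# A class of data objects with made-up values.
companies = [
    {"CompanyName": "EXAMPLE LTD", "CompanyNumber": "01234567"},
    {"CompanyName": "", "CompanyNumber": "1234"},
]

requirements = [
    QualityRequirement("name exists", lambda o: bool(o.get("CompanyName"))),
    QualityRequirement(
        "number has 8 digits",
        lambda o: (o.get("CompanyNumber") or "").isdigit()
        and len(o.get("CompanyNumber") or "") == 8,
    ),
]

protocol = measure_quality(companies, requirements)
for entry in protocol:
    print(entry)
```

Keeping the requirements separate from the measuring loop mirrors the idea that the same data object can be checked against different requirements for different use cases.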
DATA QUALITY ANALYSIS. STEP-BY-STEP GUIDE
0-1. Definition of the use case
0-2. Analysis of source data
1-1. Definition of the primary data object
1-2. Definition of the secondary data object(-s)
1-3. Primary and secondary data objects linking
2-1. Primary data object quality specification
2-2. Primary and secondary data objects linking conditions
3. Data quality measuring process
defined using
graphical DSL
4-1. Analysis of the results
4-2. Data quality improvement (MS DQS)
Use-cases:
1. company search/identification (by its name, registration number, incorporation date);
2. contacting by post (by address and postal code)
Company registers of:
• United Kingdom (UK)
• Latvia (LV)
• Estonia (EE)
• Norway (NOR)
Global Open Data Index
UK: 1st place
LV: 18th place
EE: -
NOR: 1st place
APPROBATION. DATA SETS
Country          # of columns   # of columns with quality problems (number, %)
United Kingdom   55             15 (27.3%)
Latvia           22             11 (50%)
Estonia          14             7 (50%)
Norway           42             8 (19%)
1) Company identification (by its name, registration number and incorporation date)
Country   Identification   Name           Reg. Nr.   Incorporation date
UK        -                1 (0.0001%)    0          3 invalid (0.0004%)
Latvia    -                10 (0.0025%)   0          94 NULL (0.02%)
Estonia   +                0              0          -
Norway    -                0              0          9 doubtful (0.001%)

2) Contacting by post (by its address and postal code)
Country   Contacting by post   Address                                Postal code
UK        -                    7 514 NULL (1%), 4 invalid (0.0005%)   12 151 (1.6%)
Latvia    -                    366 (0.09%)                            20 498 (5.16%)
Estonia   -                    29 918 (11.24%)                        22 621 (8.5%)
Norway    -                    68 128 (6.2%)                          14 683 (1.3%)
APPROBATION. RESULTS
Mainly syntactic analysis was done -
analysis in scope of one data object
!!!
More in-depth and comprehensive
analysis should be done -
analysis in scope of multiple data
objects
TOTAL: 128 different values,
that possibly contain data quality problems
Various names indicating the
same country
USA
United States
United States of America
Northern Ireland
Republic of Ireland
Ireland
Virgin Island
British Virgin Island
Virgin Islands, British
Scotland
Scotland UK
…
???
Which of them
is valid?
APPROBATION. ADDITIONAL CHECKS
OF «COMPANIES HOUSE» (UK)
#   Type of issue                                          Example
1   various names indicating the same country              USA, United States and United States of America, etc.
2   names of dissolved countries                           Czechoslovakia, Yugoslavia, USSR
3   values indicating administrative division or region    Wales, Scotland, England & Wales, England, …
4   not countries at all                                   “SW7”, “EAST SUSSEX”, “BWI”, “DE 19901”
The single data object analysis indicates the mere
existence of the data quality problem without
detecting all the defective records.
The secondary data object is
needed!!!
DATA OBJECT
• A data object is platform-independent.
• The checking of parameter values is a local and formal process.
• The quality checking for one of the DO parameter values is an examination of properties of the individual values, e.g. whether:
  • (1) a text string may serve as a value of the field Name,
  • (2) the value of the field Address is a correct address.
• Can be formulated at different levels of abstraction:
  • from the formal language grammar
  • to definitions of variables in programming languages.
[Diagram labels: Primary DO, Secondary DO]
• Quality conditions are defined only for the
primary data object.
• DQ requirements are defined by using logical
expressions.
• The names of DO attributes/ fields serve as
operands in the logical expressions.
• Both syntactical and semantical data quality can
be analysed according to unified principles.
DATA QUALITY SPECIFICATION
[Diagram: quality checks of the primary DO with a linked secondary DO. Each "Assess Field" step has an OK and a NO outcome and an associated SendMessage action.]
Primary DO:
  Assess Field "CountryOfOrigin": checkvalueExists(CountryOfOrigin)
  Assess Field "URI": checkValueExists(URI), checkValueURI(URI, 'http://business.data.gov.uk.id/company/$CompanyName')
  Assess Field "CompanyNumber": checkValueExists(CompanyNumber), checkValueDigits(8)
  Assess Field "RegAddress AddressLine1": checkValueExists(RegAddress AddressLine1)
  Assess Field "IncorporationDate": checkValueExists(IncorporationDate), checkValueDate(IncorporationDate, "DD/MM/YYYY")
  Assess Field "RegAddress AddressPostCode": checkvalueExists(RegAddress AddressPostCode)
  Assess Field "CompanyName": checkValueExists(CompanyName)
  Assess Field "RegAddressCountry": checkvalueExists(RegAddressCountry)
Secondary DO: ShortName, OfficialName, ISO2, ISO3, UNDP
Link between primary and secondary DOs (informal rule):
  checkCountryOfOriginName(Country, CountryOfOrigin)
  checkRegAddressCountryName(Country, RegAddressCountry)
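The field-level checks named in the diagram (checkValueExists, checkValueDigits, checkValueDate) could be implemented along the following lines. This is only a sketch under assumptions: the Python function names, the reading of checkValueDigits(8) as "exactly 8 digits", the mapping of "DD/MM/YYYY" to the `strptime` format `%d/%m/%Y`, and the sample record are all illustrative and not taken from the authors' tooling.

```python
import re
from datetime import datetime
from typing import Dict, List, Optional


def check_value_exists(value: Optional[str]) -> bool:
    """True when the field holds a non-empty value."""
    return value is not None and value.strip() != ""


def check_value_digits(value: Optional[str], length: int) -> bool:
    """True when the value consists of exactly `length` ASCII digits (assumed meaning)."""
    return value is not None and re.fullmatch(r"[0-9]{%d}" % length, value) is not None


def check_value_date(value: Optional[str], fmt: str = "%d/%m/%Y") -> bool:
    """True when the value parses as a calendar date in the given format."""
    if value is None:
        return False
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def assess(record: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of the fields that fail their checks (the NO branches)."""
    failures = []
    if not check_value_exists(record.get("CompanyName")):
        failures.append("CompanyName")
    number = record.get("CompanyNumber")
    if not (check_value_exists(number) and check_value_digits(number, 8)):
        failures.append("CompanyNumber")
    date = record.get("IncorporationDate")
    if not (check_value_exists(date) and check_value_date(date)):
        failures.append("IncorporationDate")
    return failures


# Made-up record: the number is too short and 31 February is not a valid date.
sample = {
    "CompanyName": "EXAMPLE LTD",
    "CompanyNumber": "1234567",
    "IncorporationDate": "31/02/2019",
}
print(assess(sample))
```

Each failed field corresponds to a NO outcome in the diagram, where a message would be sent instead of the name being collected in a list.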
DATA QUALITY MEASURING
PROCESS
• The activities to be taken to select data object values from data sources.
• One or more steps to evaluate the quality of the data, each of which describes one test of the data object's compliance with a specific quality specification.
+ For contextual checks: gather values of the secondary DOs from the data sources if the defined quality condition refers to a secondary DO's value:
  1. read/write operations from the data source into the database,
  2. connection of primary and secondary data objects via appropriate parameters.
• The steps to improve data quality, automatically or manually triggering changes in the data source.
The language describing the quality evaluation process involves verification activities for a particular DO that can be defined:
• informally as natural language text,
• using UML activity diagrams,
• in its own DSL.
Additionally, processing of instances of DO classes may require looping constructs, similar to iterators in C#.
• A concrete DO or a class of DOs is used as an input for a quality verification process.
• The quality verification process creates a test protocol.
In case of SQL:
• the SELECT statement specifies the target DO
• the WHERE clause specifies the quality requirements
+
• the JOIN clause links the primary and secondary DOs
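The SQL mapping above can be sketched with Python's built-in `sqlite3` module. The table names `company` and `country`, their columns, and the sample rows are assumptions made for this example; the real registers have many more fields. The first query is a check within a single data object, the second adds the secondary data object through a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (CompanyName TEXT, CountryOfOrigin TEXT);  -- primary DO
    CREATE TABLE country (ShortName TEXT, ISO2 TEXT);               -- secondary DO
    INSERT INTO company VALUES
        ('A LTD', 'Latvia'),
        ('B LTD', 'USSR'),
        ('C LTD', 'DE 19901'),
        ('D LTD', NULL);
    INSERT INTO country VALUES ('Latvia', 'LV'), ('Greece', 'GR');
""")

# Single data object: SELECT names the target DO, WHERE states the requirement.
missing = conn.execute("""
    SELECT CompanyName
    FROM company
    WHERE CountryOfOrigin IS NULL OR TRIM(CountryOfOrigin) = ''
""").fetchall()

# Multiple data objects: the JOIN links primary and secondary DOs, so values
# that exist but match no known country name are reported as defective.
unknown = conn.execute("""
    SELECT c.CompanyName, c.CountryOfOrigin
    FROM company AS c
    LEFT JOIN country AS k
        ON UPPER(c.CountryOfOrigin) = UPPER(k.ShortName)
    WHERE c.CountryOfOrigin IS NOT NULL AND k.ShortName IS NULL
    ORDER BY c.CompanyName
""").fetchall()

print(missing)
print(unknown)
conn.close()
```

A LEFT JOIN is used so that primary records without a match in the secondary data object are kept and can be reported, rather than silently dropped as an inner join would do.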
DATA QUALITY MEASURING
PROCESS
Results in scope of single data object
TOTAL: 128 different values that possibly contain data quality problems
Various names indicating the same country: USA; United States; United States of America; Northern Ireland; Republic of Ireland; Ireland; Virgin Island; British Virgin Island; Virgin Islands, British; Scotland; Scotland UK; …
??? Which of them is valid?

Results in scope of multiple data objects
TOTAL: 48 different values that definitely have data quality problems
Country Of Origin          Short Name                 Official Name                  ISO3   ISO2
…                          …                          …                              …      …
DE 19901                   NULL                       NULL                           NULL   NULL
GREECE                     Greece                     the Hellenic Republic          GRC    GR
…                          …                          …                              …      …
LATVIA                     Latvia                     the Republic of Latvia         LVA    LV
…                          …                          …                              …      …
United States of America   United States of America   the United States of America   USA    US
…                          …                          …                              …      …
Invalid names: BERMUDA; BWI; …; CZECHOSLOVAKIA; DE 19901; EAST SUSSEX; ENGLAND; ENGLAND & WALES; GIBRALTAR; Great Britain; HOLLAND; …; JERSEY; …; ST VINCENT; NORTHERN IRELAND; REPUBLIC OF IRELAND; REPUBLIC OF NIGERIA; …; SCOTLAND UK; SOUTH KOREA; SW7; TADJIKISTAN; TAIWAN; TURKS & CAICOS ISLANDS; UNITED STATES; UK; USSR; VENEZUELA; VIETNAM; VIRGIN ISLANDS; WEST GERMANY; YEMEN ARAB REPUBLIC; YUGOSLAVIA
SINGLE vs MULTIPLE DATA OBJECT ANALYSIS
• Analysis of 2 parameters containing names of countries against 4 representations of countries’ names and their subdivisions.
• Although this problem was observed in 27.6% of records, it could be solved by making just 48 corrections.
• All values of “CountryOfOrigin” and 73 of 74 values of “RegAddress Country” conform to one standard, i.e., the short name.
ONLY 13 instead of 48 invalid
values were detected!!!
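Checking a country value against several representations can be sketched as a lookup over the secondary data object. The rows below are taken from the excerpt shown on the slide; the function name `matched_representation` and the case-insensitive comparison are assumptions made for this illustration.

```python
from typing import Dict, List, Optional

# Excerpt of the secondary data object (country reference data).
COUNTRIES: List[Dict[str, str]] = [
    {"ShortName": "Greece", "OfficialName": "the Hellenic Republic",
     "ISO3": "GRC", "ISO2": "GR"},
    {"ShortName": "Latvia", "OfficialName": "the Republic of Latvia",
     "ISO3": "LVA", "ISO2": "LV"},
    {"ShortName": "United States of America",
     "OfficialName": "the United States of America",
     "ISO3": "USA", "ISO2": "US"},
]
REPRESENTATIONS = ("ShortName", "OfficialName", "ISO3", "ISO2")


def matched_representation(value: str) -> Optional[str]:
    """Return the representation the value conforms to, or None if it matches none."""
    needle = value.strip().casefold()
    for country in COUNTRIES:
        for representation in REPRESENTATIONS:
            if country[representation].casefold() == needle:
                return representation
    return None


values = ["LATVIA", "USA", "DE 19901", "USSR"]
report = {value: matched_representation(value) for value in values}
invalid = [value for value, representation in report.items() if representation is None]

print(report)
print(invalid)
```

Recording which representation matched, not just whether one did, is what makes it possible to state that the values of a field conform to a single standard such as the short name.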
A FEW REMARKS
• Data quality analysis in the context of multiple data objects was applied to 23 «external» open datasets; 22 different secondary DOs were used;
• 21 of 23 datasets (91.3%) have at least a few data quality issues that weren’t detected previously;
  • initial version: indicated records potentially containing data quality problems - a very resource-consuming process.
  • proposed extension of the approach: detects only the records that certainly have data quality problems.
• The initial analysis detected 128 values:
  • only 13 values had a data quality problem, instead of 48.
  • 115 values didn’t have data quality problems (false positives).
In this particular case, results of analysis were improved by 72.9%.
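The figures on this slide are consistent with a simple calculation, shown here as a sketch. Reading the 72.9% as the share of the 48 defective values that the single data object analysis missed is an interpretation, not something the slide states explicitly; the variable names are illustrative.

```python
flagged_single_do = 128     # values flagged by the single data object analysis
confirmed_single_do = 13    # of those, values with a real data quality problem
confirmed_multiple_do = 48  # defective values found with the secondary data object

# Flagged values that turned out to be fine.
not_defective = flagged_single_do - confirmed_single_do

# Defective values the single data object analysis did not identify.
missed = confirmed_multiple_do - confirmed_single_do

# Share of all defective values that only the contextual analysis revealed.
improvement_percent = round(missed / confirmed_multiple_do * 100, 1)

print(not_defective, missed, improvement_percent)
```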
RESULTS
!!! The proposed structure eliminated the need for additional in-depth quality analysis, as well as for writing complex queries and analysing the results individually.
• A data object-driven approach to data quality evaluation:
  • 3 components: data object, quality specification, quality measuring process, defined using graphical DSLs;
  • provides the ability to analyse «foreign»/«external» data without the involvement of data holders (higher level of abstraction);
  • very intuitive - suitable even for non-IT and non-DQ experts.
• The contextual quality analysis significantly improves data quality analysis results:
  • possibility to analyse a real data object’s quality within the context of multiple data objects;
  • detects the records that certainly have a data quality problem;
  • the number of possible checks where the proposed extended approach can yield valuable results is very high.
• Both syntactical and contextual data quality are analysed according to unified description principles, so the diagram’s structure remains easy to read, create, understand and edit.
Users’ participation in [open] data quality analysis using the presented approach benefits not only the users themselves but also data holders when users share their feedback, as data holders may not even be aware of data quality problems.
FUTURE WORK
• application and evaluation of the extended approach in cases of complex data object structure, including supplementing data objects when a direct connection between the primary and the secondary data objects is not possible,
• detection of possible limitations of the proposed extended approach,
• ensuring the possibility to evaluate data sets’ evolution,
• assessment of the possibility to provide users with suggestions for data improvement,
• developing data quality theory.
THANK YOU!
For more information, see ResearchGate
See also anastasijanikiforova.com
For questions or any other queries, contact me via email - Anastasija.Nikiforova@lu.lv
Article: Nikiforova, A., & Bicevskis, J. (2019). An Extended Data Object-driven Approach to Data Quality
Evaluation: Contextual Data Quality Analysis. In ICEIS (1) (pp. 274-281).

More Related Content

What's hot

Linked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A SurveyLinked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A SurveyAmrapali Zaveri, PhD
 
Loops of humans and bots in Wikidata
Loops of humans and bots in WikidataLoops of humans and bots in Wikidata
Loops of humans and bots in WikidataElena Simperl
 
Metadata Quality Assurance
Metadata Quality AssuranceMetadata Quality Assurance
Metadata Quality AssurancePéter Király
 
Crowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentCrowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentAmrapali Zaveri, PhD
 
Metadata Quality Assurance Part II. The implementation begins
Metadata Quality Assurance Part II. The implementation beginsMetadata Quality Assurance Part II. The implementation begins
Metadata Quality Assurance Part II. The implementation beginsPéter Király
 
TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...
TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...
TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...Anastasija Nikiforova
 
A SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTIONA SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTIONIJDKP
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataeXascale Infolab
 
A comprehensive survey of link mining and anomalies detection
A comprehensive survey of link mining and anomalies detectionA comprehensive survey of link mining and anomalies detection
A comprehensive survey of link mining and anomalies detectioncsandit
 
PETROCHEMICAL PRODUCTION BIG DATA AND ITS FOUR TYPICAL APPLICATION PARADIGMS
PETROCHEMICAL PRODUCTION BIG DATA AND ITS FOUR TYPICAL APPLICATION PARADIGMSPETROCHEMICAL PRODUCTION BIG DATA AND ITS FOUR TYPICAL APPLICATION PARADIGMS
PETROCHEMICAL PRODUCTION BIG DATA AND ITS FOUR TYPICAL APPLICATION PARADIGMSIJDKP
 
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...Artificial Intelligence Institute at UofSC
 
Data Collection Methods for Building a Free Response Training Simulation
Data Collection Methods for Building a Free Response Training SimulationData Collection Methods for Building a Free Response Training Simulation
Data Collection Methods for Building a Free Response Training SimulationMelissa Moody
 
PhD Consortium ADBIS presetation.
PhD Consortium ADBIS presetation.PhD Consortium ADBIS presetation.
PhD Consortium ADBIS presetation.Giuseppe Ricci
 
Sherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type deteSherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type detemayank272369
 
Characterizing Data and Software for Social Science Research
Characterizing Data and Software for Social Science ResearchCharacterizing Data and Software for Social Science Research
Characterizing Data and Software for Social Science ResearchMicah Altman
 
Semantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationSemantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationGong Cheng
 
Enhancing educational data quality in heterogeneous learning contexts using p...
Enhancing educational data quality in heterogeneous learning contexts using p...Enhancing educational data quality in heterogeneous learning contexts using p...
Enhancing educational data quality in heterogeneous learning contexts using p...Alex Rayón Jerez
 
A Framework for Linked Data Quality based on Data Profiling and RDF Shape Ind...
A Framework for Linked Data Quality based on Data Profiling and RDF Shape Ind...A Framework for Linked Data Quality based on Data Profiling and RDF Shape Ind...
A Framework for Linked Data Quality based on Data Profiling and RDF Shape Ind...Nandana Mihindukulasooriya
 

What's hot (20)

Linked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A SurveyLinked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A Survey
 
Loops of humans and bots in Wikidata
Loops of humans and bots in WikidataLoops of humans and bots in Wikidata
Loops of humans and bots in Wikidata
 
Metadata Quality Assurance
Metadata Quality AssuranceMetadata Quality Assurance
Metadata Quality Assurance
 
Crowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentCrowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality Assessment
 
Metadata Quality Assurance Part II. The implementation begins
Metadata Quality Assurance Part II. The implementation beginsMetadata Quality Assurance Part II. The implementation begins
Metadata Quality Assurance Part II. The implementation begins
 
TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...
TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...
TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...
 
A SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTIONA SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTION
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data
 
A comprehensive survey of link mining and anomalies detection
A comprehensive survey of link mining and anomalies detectionA comprehensive survey of link mining and anomalies detection
A comprehensive survey of link mining and anomalies detection
 
PETROCHEMICAL PRODUCTION BIG DATA AND ITS FOUR TYPICAL APPLICATION PARADIGMS
PETROCHEMICAL PRODUCTION BIG DATA AND ITS FOUR TYPICAL APPLICATION PARADIGMSPETROCHEMICAL PRODUCTION BIG DATA AND ITS FOUR TYPICAL APPLICATION PARADIGMS
PETROCHEMICAL PRODUCTION BIG DATA AND ITS FOUR TYPICAL APPLICATION PARADIGMS
 
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...
 
Amrapali Zaveri Defense
Amrapali Zaveri DefenseAmrapali Zaveri Defense
Amrapali Zaveri Defense
 
Data Collection Methods for Building a Free Response Training Simulation
Data Collection Methods for Building a Free Response Training SimulationData Collection Methods for Building a Free Response Training Simulation
Data Collection Methods for Building a Free Response Training Simulation
 
PhD Consortium ADBIS presetation.
PhD Consortium ADBIS presetation.PhD Consortium ADBIS presetation.
PhD Consortium ADBIS presetation.
 
Sherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type deteSherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type dete
 
Characterizing Data and Software for Social Science Research
Characterizing Data and Software for Social Science ResearchCharacterizing Data and Software for Social Science Research
Characterizing Data and Software for Social Science Research
 
PhD defense
PhD defense PhD defense
PhD defense
 
Semantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationSemantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and Summarization
 
Enhancing educational data quality in heterogeneous learning contexts using p...
Enhancing educational data quality in heterogeneous learning contexts using p...Enhancing educational data quality in heterogeneous learning contexts using p...
Enhancing educational data quality in heterogeneous learning contexts using p...
 
A Framework for Linked Data Quality based on Data Profiling and RDF Shape Ind...
A Framework for Linked Data Quality based on Data Profiling and RDF Shape Ind...A Framework for Linked Data Quality based on Data Profiling and RDF Shape Ind...
A Framework for Linked Data Quality based on Data Profiling and RDF Shape Ind...
 

Similar to AN EXTENDED DATA OBJECT-DRIVEN APPROACH TO DATA QUALITY EVALUATION: CONTEXTUAL DATA QUALITY ANALYSIS

A step towards a data quality theory
 A step towards a data quality theory A step towards a data quality theory
A step towards a data quality theoryAnastasija Nikiforova
 
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSINGMETA DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSINGIJCSEIT Journal
 
Data Quality
Data QualityData Quality
Data QualityVijaya K
 
Towards an extensible measurement of metadata quality (DATeCH 2017)
Towards an extensible measurement of metadata quality (DATeCH 2017)Towards an extensible measurement of metadata quality (DATeCH 2017)
Towards an extensible measurement of metadata quality (DATeCH 2017)Péter Király
 
Pradeep_ETL Testing_CV with 3 years of Exerience
Pradeep_ETL Testing_CV with 3 years of ExeriencePradeep_ETL Testing_CV with 3 years of Exerience
Pradeep_ETL Testing_CV with 3 years of ExeriencePradeep Shahapur
 
TejGaurThesis
TejGaurThesisTejGaurThesis
TejGaurThesisTej Gaur
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best PracticesChristopher Eaker
 
Prof. Melinda Laituri, Colorado State University | Map Data Integrity | SotM ...
Prof. Melinda Laituri, Colorado State University | Map Data Integrity | SotM ...Prof. Melinda Laituri, Colorado State University | Map Data Integrity | SotM ...
Prof. Melinda Laituri, Colorado State University | Map Data Integrity | SotM ...Kathmandu Living Labs
 
Dublin Core In Practice
Dublin Core In PracticeDublin Core In Practice
Dublin Core In PracticeMarcia Zeng
 
Creating a Data validation and Testing Strategy
Creating a Data validation and Testing StrategyCreating a Data validation and Testing Strategy
Creating a Data validation and Testing StrategyRTTS
 
Intro to Data warehousing lecture 10
Intro to Data warehousing   lecture 10Intro to Data warehousing   lecture 10
Intro to Data warehousing lecture 10AnwarrChaudary
 
How Clean is your Database? Data Scrubbing for all Skill Sets
How Clean is your Database? Data Scrubbing for all Skill SetsHow Clean is your Database? Data Scrubbing for all Skill Sets
How Clean is your Database? Data Scrubbing for all Skill SetsChad Petrovay
 
Making data typing efforts or automatically detecting data types for automat...
Making data typing efforts or automatically detecting data types  for automat...Making data typing efforts or automatically detecting data types  for automat...
Making data typing efforts or automatically detecting data types for automat...National Institute of Informatics
 
Dqs mds-matching 15042015
Dqs mds-matching 15042015Dqs mds-matching 15042015
Dqs mds-matching 15042015Neil Hambly
 
Etl And Data Test Guidelines For Large Applications
Etl And Data Test Guidelines For Large ApplicationsEtl And Data Test Guidelines For Large Applications
Etl And Data Test Guidelines For Large ApplicationsWayne Yaddow
 
DIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITY
DIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITYDIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITY
DIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITYIJDKP
 
Lecture 6 Data Pre-processing in data mining.pdf
Lecture 6  Data Pre-processing in data mining.pdfLecture 6  Data Pre-processing in data mining.pdf
Lecture 6 Data Pre-processing in data mining.pdfpoly146408
 
SFScon22 - Chiara Masci - Davide Montesin - How can we make no-code data qual...
SFScon22 - Chiara Masci - Davide Montesin - How can we make no-code data qual...SFScon22 - Chiara Masci - Davide Montesin - How can we make no-code data qual...
SFScon22 - Chiara Masci - Davide Montesin - How can we make no-code data qual...South Tyrol Free Software Conference
 

Similar to AN EXTENDED DATA OBJECT-DRIVEN APPROACH TO DATA QUALITY EVALUATION: CONTEXTUAL DATA QUALITY ANALYSIS (20)

A step towards a data quality theory
 A step towards a data quality theory A step towards a data quality theory
A step towards a data quality theory
 
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSINGMETA DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING
 
Data Quality
Data QualityData Quality
Data Quality
 
Towards an extensible measurement of metadata quality (DATeCH 2017)
Towards an extensible measurement of metadata quality (DATeCH 2017)Towards an extensible measurement of metadata quality (DATeCH 2017)
Towards an extensible measurement of metadata quality (DATeCH 2017)
 
Quality key users
Quality key usersQuality key users
Quality key users
 
Pradeep_ETL Testing_CV with 3 years of Exerience
Pradeep_ETL Testing_CV with 3 years of ExeriencePradeep_ETL Testing_CV with 3 years of Exerience
Pradeep_ETL Testing_CV with 3 years of Exerience
 
TejGaurThesis
TejGaurThesisTejGaurThesis
TejGaurThesis
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best Practices
 
Prof. Melinda Laituri, Colorado State University | Map Data Integrity | SotM ...
Prof. Melinda Laituri, Colorado State University | Map Data Integrity | SotM ...Prof. Melinda Laituri, Colorado State University | Map Data Integrity | SotM ...
Prof. Melinda Laituri, Colorado State University | Map Data Integrity | SotM ...
 
Intro to Data Management
Intro to Data ManagementIntro to Data Management
Intro to Data Management
 
Dublin Core In Practice
Dublin Core In PracticeDublin Core In Practice
Dublin Core In Practice
 
Creating a Data validation and Testing Strategy
Creating a Data validation and Testing StrategyCreating a Data validation and Testing Strategy
Creating a Data validation and Testing Strategy
 
Intro to Data warehousing lecture 10
Intro to Data warehousing   lecture 10Intro to Data warehousing   lecture 10
Intro to Data warehousing lecture 10
 
How Clean is your Database? Data Scrubbing for all Skill Sets
How Clean is your Database? Data Scrubbing for all Skill SetsHow Clean is your Database? Data Scrubbing for all Skill Sets
How Clean is your Database? Data Scrubbing for all Skill Sets
 
Making data typing efforts or automatically detecting data types for automat...
Making data typing efforts or automatically detecting data types  for automat...Making data typing efforts or automatically detecting data types  for automat...
Making data typing efforts or automatically detecting data types for automat...
 
Dqs mds-matching 15042015
Dqs mds-matching 15042015Dqs mds-matching 15042015
Dqs mds-matching 15042015
 
Etl And Data Test Guidelines For Large Applications
Etl And Data Test Guidelines For Large ApplicationsEtl And Data Test Guidelines For Large Applications
Etl And Data Test Guidelines For Large Applications
 
DIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITY
DIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITYDIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITY
DIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITY
 
Lecture 6 Data Pre-processing in data mining.pdf
Lecture 6  Data Pre-processing in data mining.pdfLecture 6  Data Pre-processing in data mining.pdf
Lecture 6 Data Pre-processing in data mining.pdf
 
SFScon22 - Chiara Masci - Davide Montesin - How can we make no-code data qual...
SFScon22 - Chiara Masci - Davide Montesin - How can we make no-code data qual...SFScon22 - Chiara Masci - Davide Montesin - How can we make no-code data qual...
SFScon22 - Chiara Masci - Davide Montesin - How can we make no-code data qual...
 

AN EXTENDED DATA OBJECT-DRIVEN APPROACH TO DATA QUALITY EVALUATION: CONTEXTUAL DATA QUALITY ANALYSIS

  • 1. AN EXTENDED DATA OBJECT-DRIVEN APPROACH TO DATA QUALITY EVALUATION: CONTEXTUAL DATA QUALITY ANALYSIS 21st International Conference on Enterprise Information Systems (ICEIS), Heraklion, Crete – Greece, 2019 Anastasija Nikiforova, Janis Bicevskis Faculty of Computing, University of Latvia Anastasija.Nikiforova@lu.lv
  • 2. QUALITY AND DATA QUALITY
      ▪ “Quality” is a desirable goal to be achieved through management of the production process.
      ▪ «Data quality» is a relative concept, largely dependent on the specific requirements resulting from the data use.
      Sources: Bičevska (2018); ISO 9001:2015, Quality management principles.
      Data quality weaknesses can lead to huge losses:
          • 2016: decisions resulting from bad data cost the US economy $3.1 trillion per year (IBM).
          • 2017: organizations believe poor data quality to be responsible for an average of $15 million per year in losses (Gartner).
      !!! The same data may be of sufficient quality in one case BUT completely useless under other circumstances.
  • 3. RELATED RESEARCH
      «Dimensions are not defined in a measurable and formal way» (Batini et al., 2016; DAMA, 2019; Huang et al., 1999; Eppler, 2006)
      «…Even amongst data quality professionals the key data quality dimensions are not universally agreed. This state of affairs has led to much confusion within the data quality community and is even more bewildering for those who are new to the discipline and more importantly to business stakeholders…» (DAMA, 2019)
      ▪ General studies on data and information quality: define different dimensions of quality and their groupings, as well as data assessment methodologies.
      ▪ Assessments of specific industry data and information quality: sector-specific methods.
          • Cancer registry, Healthcare, Manufacturing, Chemical Hazard Risk Assessments, etc.
      BUT!!!
      ▪ There is no consensus on data quality dimensions and their usability.
      ▪ How to relate a particular dimension (and which one?) to a particular use case?
      ▪ Dimensions of the same name can have different semantics in different studies.
      Problem: the necessity to involve data quality experts at every stage of the data quality analysis process.
      Solution: a data object-driven approach to data quality evaluation (Bicevskis, Bicevska, Nikiforova, Oditis, 2018).
  • 4. MAIN PRINCIPLES OF THE PROPOSED SOLUTION
      TDQM data quality lifecycle: data quality definition → data quality measuring → data quality analysis → data quality improvement.
      ▪ Each specific application can have its own specific DQ checks;
      ▪ DQ requirements can be formulated on several levels:
          • from informal text in natural language
          • to an automatically executable model, SQL statements or program code;
      ▪ DQ can be checked at various stages of the data processing;
      ▪ The DQ definition language is a graphical DSL:
          • the diagrams are easy to read, create, understand and edit, even by non-IT and non-data-quality professionals;
          • syntax and semantics can be easily applied to any new IS.
  • 5. DATA QUALITY MODEL instead of dimensions
      1. DATA OBJECT (DO): the set of values of the parameters that characterize a real-life object
          ▪ primary data object: the initial DO whose quality is analysed;
          ▪ secondary data object: a DO that determines the context for the analysis of the primary DO.
          * Many objects of the same structure form a class of data objects.
      2. DATA QUALITY REQUIREMENTS: conditions that must be met for a data object to be considered of high quality. They may contain informal or formalized implementation-independent descriptions of conditions.
      3. DATA QUALITY MEASURING PROCESS: procedures that should be performed to evaluate the data object's quality.
      !!! All three components are defined by using a graphical domain-specific language (DSL)**
      ** Three DSL families were developed as graphic languages based on the possibilities of the modelling platform DIMOD.
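A minimal sketch of how the three components of the data quality model could be mirrored in code, assuming plain Python stands in for the graphical DSL. All names here (`DataObject`, `QualityRequirement`, `measure_quality`, the sample records) are hypothetical illustrations and are not part of DIMOD or of the authors' tooling.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DataObject:
    """Parameter values that characterise one real-life object."""
    values: Dict[str, Optional[str]]


@dataclass
class QualityRequirement:
    """A named condition that a data object must satisfy."""
    name: str
    condition: Callable[[DataObject], bool]


def measure_quality(objects: List[DataObject],
                    requirements: List[QualityRequirement]) -> List[str]:
    """Measuring process: apply each requirement to each object and
    return a protocol that lists the violations found."""
    protocol = []
    for index, obj in enumerate(objects):
        for req in requirements:
            if not req.condition(obj):
                protocol.append(f"object {index}: failed '{req.name}'")
    return protocol


def name_exists(obj: DataObject) -> bool:
    return bool(obj.values.get("CompanyName"))


def number_has_8_digits(obj: DataObject) -> bool:
    number = obj.values.get("CompanyNumber") or ""
    return len(number) == 8 and all(ch in "0123456789" for ch in number)


# Two instances of the same (hypothetical) class of data objects.
companies = [
    DataObject({"CompanyName": "ACME LTD", "CompanyNumber": "01234567"}),
    DataObject({"CompanyName": None, "CompanyNumber": "0123"}),
]
requirements = [
    QualityRequirement("name exists", name_exists),
    QualityRequirement("number has 8 digits", number_has_8_digits),
]
protocol = measure_quality(companies, requirements)
```

Keeping the requirements separate from the data object and from the measuring loop reflects the slide's point that each application can attach its own checks to the same data.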
  • 6. DATA QUALITY ANALYSIS. STEP-BY-STEP GUIDE
      0-1. Definition of the use case
      0-2. Analysis of source data
      1-1. Definition of the primary data object
      1-2. Definition of the secondary data object(s)
      1-3. Primary and secondary data objects linking
      2-1. Primary data object quality specification
      2-2. Primary and secondary data objects linking conditions
      3. Data quality measuring process defined using graphical DSL
      4-1. Analysis of the results
      4-2. Data quality improvement (MS DQS)
  • 7. APPROBATION. DATA SETS
      Use cases:
      1. company search/identification (by its name, registration number, incorporation date);
      2. contacting by post (by address and postal code).
      Company registers of:
      ▪ United Kingdom (UK)
      ▪ Latvia (LV)
      ▪ Estonia (EE)
      ▪ Norway (NOR)
      Global Open Data Index: UK: 1st place; LV: 18th place; EE: -; NOR: 1st place.
      Country          # of columns   # of columns with quality problems (number, %)
      United Kingdom   55             15 (27.3%)
      Latvia           22             11 (50%)
      Estonia          14             7 (50%)
      Norway           42             8 (19%)
  • 8. APPROBATION. RESULTS
      1) Company identification (by its name, registration number and incorporation date)
      Country   Identification   Name           Reg. Nr.   Incorporation date
      UK        -                1 (0.0001%)    0          3 invalid (0.0004%)
      Latvia    -                10 (0.0025%)   0          94 NULL (0.02%)
      Estonia   +                0              0          -
      Norway    -                0              0          9 doubtful (0.001%)
      2) Contacting by post (by its address and postal code); rows follow the country order of the first table
      Country   Contacting by post   Address                                Postal code
      UK        -                    7 514 NULL (1%); 4 invalid (0.0005%)   12 151 (1.6%)
      Latvia    -                    366 (0.09%)                            20 498 (5.16%)
      Estonia   -                    29 918 (11.24%)                        22 621 (8.5%)
      Norway    -                    68 128 (6.2%)                          14 683 (1.3%)
      Mainly syntactic analysis was done: analysis in scope of one data object.
      !!! More in-depth and comprehensive analysis should be done: analysis in scope of multiple data objects.
  • 9. APPROBATION. ADDITIONAL CHECKS OF «COMPANIES HOUSE» (UK)
      TOTAL: 128 different values that possibly contain data quality problems.
      #   Type of issue                                         Example
      1   various names indicating the same country             USA, United States and United States of America, etc.
      2   names of dissolved countries                          Czechoslovakia, Yugoslavia, USSR
      3   values indicating administrative division or region   Wales, Scotland, England & Wales, England, …
      4   not countries at all                                  “SW7”, “EAST SUSSEX”, “BWI”, “DE 19901”
      Various names indicating the same country (??? which of them is valid?):
          • USA / United States / United States of America
          • Northern Ireland / Republic of Ireland / Ireland
          • Virgin Island / British Virgin Island / Virgin Islands, British
          • Scotland / Scotland UK
          • …
      The single data object analysis indicates the mere existence of the data quality problem without detecting all the defective records.
      The secondary data object is needed!!!
  • 10. DATA OBJECT (diagram labels: Primary DO, Secondary DO)
      • A data object is platform-independent.
      • The checking of parameter values is a local and formal process.
      • The quality checking for one of the DO parameter values is an examination of the properties of the individual values, e.g. whether:
          (1) a text string may serve as a value of the field Name,
          (2) the value of the field Address is a correct address.
      • It can be formulated at different levels of abstraction:
          • from the formal language grammar
          • to definitions of variables in programming languages.
  • 11. DATA QUALITY SPECIFICATION
      • Quality conditions are defined only for the primary data object.
      • DQ requirements are defined by using logical expressions.
      • The names of DO attributes/fields serve as operands in the logical expressions.
      • Both syntactic and semantic data quality can be analysed according to unified principles.
      Diagram (graphical DSL): each Assess Field step has an OK and a NO outcome, and the diagram contains one SendMessage per field.
          Assess Field "CountryOfOrigin": checkvalueExists(CountryOfOrigin)
          Assess Field "URI": checkValueExists(URI); checkValueURI(URI, 'http://business.data.gov.uk.id/company/$CompanyName')
          Assess Field "CompanyNumber": checkValueExists(CompanyNumber); checkValueDigits(8)
          Assess Field "RegAddress AddressLine1": checkValueExists(RegAddress AddressLine1)
          Assess Field "IncorporationDate": checkValueExists(IncorporationDate); checkValueDate(IncorporationDate, "DD/MM/YYYY")
          Assess Field "RegAddress AddressPostCode": checkvalueExists(RegAddress AddressPostCode)
          Assess Field "CompanyName": checkValueExists(CompanyName)
          Assess Field "RegAddressCountry": checkvalueExists(RegAddressCountry)
      Secondary DO (Country): ShortName, OfficialName, ISO2, ISO3, UNDP
      Link between primary and secondary DOs (informal rule):
          checkCountryOfOriginName(Country, CountryOfOrigin)
          checkRegAddressCountryName(Country, RegAddressCountry)
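The checks named in the specification above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are Python-style renderings of `checkValueExists`, `checkValueDigits` and `checkValueDate`, the `CHECKS` table and the sample record are hypothetical, and the `DD/MM/YYYY` pattern is assumed to correspond to `%d/%m/%Y`.

```python
from datetime import datetime


def check_value_exists(value):
    """True when the field holds a non-empty value."""
    return value is not None and str(value).strip() != ""


def check_value_digits(value, length):
    """True when the value consists of exactly `length` ASCII digits."""
    text = "" if value is None else str(value)
    return len(text) == length and all(ch in "0123456789" for ch in text)


def check_value_date(value, fmt="%d/%m/%Y"):
    """True when the value parses as a calendar date in the given format."""
    if not check_value_exists(value):
        return False
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


# Hypothetical mapping of fields to their checks, following the slide.
CHECKS = {
    "CompanyName": [check_value_exists],
    "CompanyNumber": [check_value_exists, lambda v: check_value_digits(v, 8)],
    "IncorporationDate": [check_value_exists, check_value_date],
    "RegAddress AddressLine1": [check_value_exists],
    "RegAddress AddressPostCode": [check_value_exists],
}


def assess(record):
    """Return the fields for which a check fails (the NO branches)."""
    failed = []
    for field, checks in CHECKS.items():
        if not all(check(record.get(field)) for check in checks):
            failed.append(field)
    return failed


record = {
    "CompanyName": "ACME LTD",
    "CompanyNumber": "1234567",          # only 7 digits
    "IncorporationDate": "31/02/2019",   # not a real calendar date
    "RegAddress AddressLine1": "1 High Street",
    "RegAddress AddressPostCode": "",    # missing
}
failed_fields = assess(record)
```

Each failed field corresponds to one message that would be sent for the record; fields that pass all checks produce nothing.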
  • 12. DATA QUALITY MEASURING PROCESS
      The process consists of:
      ▪ the activities to be taken to select data object values from data sources;
      ▪ one or more steps to evaluate the quality of the data, each of which describes one test for the compliance of the data object with a specific quality specification;
      ▪ + for contextual checks: gathering the values of the secondary DOs from the data sources if the parameter indicating the secondary DO's value in scope of the defined quality condition is true:
          1. read/write operations from the data source into the database,
          2. connection of primary and secondary data objects via appropriate parameters;
      ▪ the steps to improve data quality, automatically or manually triggering changes in the data source.
      The language describing the quality evaluation process involves verification activities for a particular DO that can be defined:
      ▪ informally as a natural language text,
      ▪ using UML activity diagrams,
      ▪ in its own DSL.
      Additionally, processing instances of DO classes may require looping constructs, similar to the iterators used in C#.
  • 13. DATA QUALITY MEASURING PROCESS
    • A concrete DO or a class of DOs is used as the input to a quality verification process.
    • The quality verification process creates a test protocol.
    In the case of SQL:
     the SELECT statement specifies the target DO,
     the WHERE clause specifies the quality requirements,
     + the JOIN clause links the primary and secondary DOs.
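    The SQL mapping above can be illustrated with a small self-contained sketch using Python's sqlite3 module. The table names company and country, the sample rows, and the choice to select the records that violate a requirement are assumptions made for illustration; the slides do not show the actual queries.

```python
import sqlite3

# In-memory database with a hypothetical primary DO (company)
# and a hypothetical secondary DO (country).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (CompanyName TEXT, CompanyNumber TEXT, CountryOfOrigin TEXT);
    CREATE TABLE country (ShortName TEXT, OfficialName TEXT, ISO3 TEXT, ISO2 TEXT);
    INSERT INTO company VALUES
        ('A Ltd', '01234567', 'LATVIA'),
        ('B Ltd', '0123',     'GREECE'),
        ('C Ltd', '76543210', 'DE 19901'),
        ('D Ltd', '11223344', 'USA');
    INSERT INTO country VALUES
        ('Latvia', 'the Republic of Latvia', 'LVA', 'LV'),
        ('Greece', 'the Hellenic Republic', 'GRC', 'GR'),
        ('United States of America', 'the United States of America', 'USA', 'US');
""")

# Syntactic check on the primary DO only:
# SELECT names the target DO, WHERE encodes the violated requirement
# (CompanyNumber must exist and consist of exactly 8 digits).
syntactic_failures = conn.execute("""
    SELECT CompanyName
    FROM company
    WHERE CompanyNumber IS NULL
       OR length(CompanyNumber) <> 8
       OR CompanyNumber GLOB '*[^0-9]*'
    ORDER BY CompanyName
""").fetchall()

# Contextual check: JOIN links the primary DO to the secondary DO.
# A row with no matching country representation is reported.
contextual_failures = conn.execute("""
    SELECT c.CompanyName, c.CountryOfOrigin
    FROM company AS c
    LEFT JOIN country AS k
      ON upper(c.CountryOfOrigin) IN
         (upper(k.ShortName), upper(k.OfficialName), k.ISO3, k.ISO2)
    WHERE k.ShortName IS NULL
    ORDER BY c.CompanyName
""").fetchall()

print(syntactic_failures)
print(contextual_failures)
```

    The rows returned by each query play the role of the test protocol entries in this sketch. The case-insensitive comparison via upper() is a design choice of the example, not something the slides prescribe.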
  • 14. SINGLE vs MULTIPLE DATA OBJECT ANALYSIS
    • Analysis of 2 parameters containing names of countries against 4 representations of countries' names and their subdivisions.
    • Although this problem was observed in 27.6% of records, it could be solved by making just 48 corrections.
    • All values of "CountryOfOrigin" and 73 of 74 values of "RegAddress Country" conform to one standard, i.e., the short name.
    ONLY 13 instead of 48 invalid values were detected!!!
    Results in scope of single data object (excerpt of observed values):
    BERMUDA, BWI, …, CZECHOSLOVAKIA, DE 19901, EAST SUSSEX, ENGLAND, ENGLAND & WALES, GIBRALTAR, Great Britain, HOLLAND, …, JERSEY, …, ST VINCENT, NORTHERN IRELAND, REPUBLIC OF IRELAND, …, REPUBLIC OF NIGERIA, …, SCOTLAND UK, SOUTH KOREA, SW7, TADJIKISTAN, TAIWAN, TURKS & CAICOS ISLANDS, UNITED STATES, UK, USSR, VENEZUELA, VIETNAM, VIRGIN ISLANDS, WEST GERMANY, YEMEN ARAB REPUBLIC, YUGOSLAVIA
    ??? Which of them is valid?
    TOTAL: 128 different values that may contain data quality problems
    Results in scope of multiple data objects (excerpt):
    Country Of Origin | Short Name | Official Name | ISO3 | ISO2
    … | … | … | … | …
    DE 19901 | NULL | NULL | NULL | NULL
    GREECE | Greece | the Hellenic Republic | GRC | GR
    … | … | … | … | …
    LATVIA | Latvia | the Republic of Latvia | LVA | LV
    … | … | … | … | …
    United States of America | United States of America | the United States of America | USA | US
    … | … | … | … | …
    Invalid names
    TOTAL: 48 different values that definitely have data quality problems
    Various names indicating the same country:
    • USA / United States / United States of America
    • Northern Ireland / Republic of Ireland / Ireland
    • Virgin Island / British Virgin Island / Virgin Islands, British
    • Scotland / Scotland UK
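    One way to read the difference between checking against a single representation and checking against several is sketched below in Python. The COUNTRIES rows come from the excerpt above; the observed list, the helper names build_lookup and invalid_values, and the case-insensitive matching are assumptions of this example rather than the authors' procedure.

```python
# Secondary DO excerpt: each country in four representations.
COUNTRIES = [
    {"short": "Greece", "official": "the Hellenic Republic", "iso3": "GRC", "iso2": "GR"},
    {"short": "Latvia", "official": "the Republic of Latvia", "iso3": "LVA", "iso2": "LV"},
    {
        "short": "United States of America",
        "official": "the United States of America",
        "iso3": "USA",
        "iso2": "US",
    },
]


def build_lookup(keys):
    """Collect the accepted spellings for the chosen representations."""
    return {row[key].casefold() for row in COUNTRIES for key in keys}


def invalid_values(values, lookup):
    """Return the values that match none of the accepted spellings."""
    return sorted(v for v in values if v.casefold() not in lookup)


# Hypothetical distinct values observed in the primary DO.
observed = ["GREECE", "LATVIA", "USA", "United States of America", "DE 19901", "SW7"]

short_only = build_lookup(["short"])
all_representations = build_lookup(["short", "official", "iso3", "iso2"])

print(invalid_values(observed, short_only))
print(invalid_values(observed, all_representations))
```

    In this toy data, "USA" is reported as invalid when only short names are accepted but passes once the ISO3 representation is also consulted, which mirrors the kind of reduction in reported invalid values described on the slide.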
  • 15. FEW REMARKS
     Data quality analysis in the context of multiple data objects was applied to 23 «external» open datasets; 22 different secondary DOs were used.
     21 of 23 datasets (91.3%) have at least a few data quality issues that weren't detected previously:
      • initial version: indicated records potentially containing data quality problems, which is a very resource-consuming process;
      • proposed extension of the approach: detects only the records that certainly have a data quality problem.
     The initial analysis detected 128 values:
      • only 13 values had a data quality problem, instead of 48;
      • 115 values didn't have data quality problems (false positives).
      In this particular case, the results of the analysis were improved by 72.9%.
    !!! The proposed structure eliminated the need for additional in-depth quality analysis, for writing complex queries, and for individual analysis of the results.
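    The figures on this slide appear to be related by the following arithmetic. This is a reading of the reported numbers, assuming the 72.9% refers to the reduction from 48 to 13 invalid values; the slides do not state the formula.

```latex
\begin{align*}
  128 - 13 &= 115 && \text{(flagged values without a data quality problem)} \\
  \frac{48 - 13}{48} &= \frac{35}{48} \approx 0.729 = 72.9\% && \text{(reduction in reported invalid values)} \\
  \frac{21}{23} &\approx 0.913 = 91.3\% && \text{(datasets with newly detected issues)}
\end{align*}
```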
  • 16. RESULTS
     A data object-driven approach to data quality evaluation:
      • 3 components: data object, quality specification and quality measuring process, defined using graphical DSLs;
      • provides the ability to analyse «foreign»/«external» data without the involvement of data holders (higher level of abstraction);
      • very intuitive: suitable even for non-IT and non-DQ experts.
     The contextual quality analysis significantly improves data quality analysis results:
      • makes it possible to analyse a real data object's quality within the context of multiple data objects;
      • detects the records that certainly have a data quality problem;
      • the number of possible controls for which the proposed extended approach can yield valuable results is very high.
     Both syntactic and contextual data quality are analysed according to unified description principles.
     The diagram's structure remains easy to read, create, understand and edit.
    Users' participation in [open] data quality analysis using the presented approach benefits not only the users themselves but also data holders, when users share their feedback, since data holders may not even be aware of data quality problems.
  • 17. FUTURE WORK
     application and evaluation of the extended approach in cases of complex data object structure, including supplementing data objects when a direct connection between the primary and the secondary data objects is not possible,
     detection of possible limitations of the proposed extended approach,
     ensuring the possibility to evaluate data sets' evolution,
     assessment of the possibility to provide users with suggestions for data improvement,
     developing data quality theory.
  • 18. THANK YOU!
    For more information, see ResearchGate. See also anastasijanikiforova.com
    For questions or any other queries, contact me via email: Anastasija.Nikiforova@lu.lv
    Article: Nikiforova, A., & Bicevskis, J. (2019). An Extended Data Object-driven Approach to Data Quality Evaluation: Contextual Data Quality Analysis. In ICEIS (1) (pp. 274-281).