2. DATA INTEGRATION:
Data integration is straightforward: new information is merged with information that already exists. Any business that regularly collects information is concerned with data integration, since businesses want their information to be accurate and up to date.
3. ACCURACY:
The degree of conformity of a measure to a standard or a true value (see also Accuracy and precision). Accuracy is very hard to achieve through data cleansing in the general case, because it requires accessing an external source of data that contains the true value: such "gold standard" data is often unavailable.
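Where reference data does exist, an accuracy check can compare recorded values against it. A minimal sketch, assuming a hypothetical postcode-to-city lookup table as the gold standard:

    # A sketch of an accuracy check against an external reference. The
    # postcode-to-city table stands in for a "gold standard" source and is
    # entirely hypothetical.

    GOLD_STANDARD = {"SW1A 1AA": "London", "CB2 1TN": "Cambridge"}

    def is_accurate(postcode: str, city: str) -> bool:
        """Return True only if the recorded city matches the reference value.

        When the reference has no entry, accuracy cannot be verified,
        which is precisely the problem described above.
        """
        true_city = GOLD_STANDARD.get(postcode)
        return true_city is not None and true_city == city

    print(is_accurate("SW1A 1AA", "London"))  # True
    print(is_accurate("SW1A 1AA", "Paris"))   # False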
4. COMPLETENESS:
The degree to which all required measures are known. Incompleteness is almost impossible to fix with data-cleansing methodology: one cannot infer facts that were not captured when the data in question was initially recorded. In the case of systems that insist certain columns should not be empty, one may work around the problem by designating a value that indicates "unknown" or "missing".
5. CONSISTENCY:
The degree to which a set of measures is equivalent across systems (see also Consistency). Inconsistency occurs when two data items in the data set contradict each other. Fixing inconsistency is not always possible: it requires a variety of strategies, e.g., deciding which data were recorded more recently, or which data source is likely to be most reliable (the latter knowledge may be specific to a given organization).
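Both strategies can be expressed as a simple record-resolution function. A sketch, assuming hypothetical timestamped records and an organization-specific source ranking:

    # A sketch of two resolution strategies for contradictory records:
    # prefer the more recently recorded value, or prefer the more trusted
    # source. The source ranking is a hypothetical, organization-specific
    # input.

    from datetime import date

    SOURCE_RELIABILITY = {"crm": 2, "web_form": 1}  # higher = more trusted

    def resolve(a: dict, b: dict, by: str = "recency") -> dict:
        """Pick one of two contradictory records by the chosen strategy."""
        if by == "recency":
            return a if a["recorded_on"] >= b["recorded_on"] else b
        return a if SOURCE_RELIABILITY[a["source"]] >= SOURCE_RELIABILITY[b["source"]] else b

    rec1 = {"email": "ada@old.example", "recorded_on": date(2020, 1, 1), "source": "crm"}
    rec2 = {"email": "ada@new.example", "recorded_on": date(2023, 6, 1), "source": "web_form"}
    print(resolve(rec1, rec2)["email"])                    # ada@new.example (newer)
    print(resolve(rec1, rec2, by="reliability")["email"])  # ada@old.example (trusted source)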
6. UNIFORMITY:
The degree to which a set of data measures is specified using the same units of measure in all systems (see also Unit of measure). In datasets pooled from different locales, weight may be recorded either in pounds or kilograms, and must be converted to a single unit using an arithmetic transformation.
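For example, a pooled weight column can be normalized to kilograms; the unit labels below are hypothetical:

    # A sketch of normalizing pooled weight measurements to a single unit
    # (kilograms) with an arithmetic transformation.

    LB_PER_KG = 2.20462  # pounds per kilogram

    def to_kilograms(value: float, unit: str) -> float:
        """Convert a weight measurement to kilograms."""
        if unit == "kg":
            return value
        if unit == "lb":
            return value / LB_PER_KG
        raise ValueError(f"unknown unit: {unit}")

    pooled = [(150.0, "lb"), (70.0, "kg")]
    print([round(to_kilograms(v, u), 1) for v, u in pooled])  # [68.0, 70.0]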
7. ERROR CORRECTION AND LOSS OF INFORMATION:
The most challenging problem within data cleansing remains the correction of values to remove duplicates and invalid entries. In many cases, the available information on such anomalies is limited and insufficient to determine the necessary transformations or corrections, leaving the deletion of such entries as a primary solution. The deletion of data, though, leads to loss of information; this loss can be particularly costly if a large amount of data is deleted.
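A sketch of deletion as a last-resort correction, assuming a hypothetical validity rule (non-negative age) and duplicate detection on the record ID; reporting the number of deleted rows keeps the information loss visible:

    # Duplicate ids and rows failing the (hypothetical) validity rule are
    # dropped; the count of deleted rows is reported so the information
    # loss stays visible.

    def cleanse_by_deletion(rows: list) -> list:
        seen_ids, kept, deleted = set(), [], 0
        for row in rows:
            valid = row.get("age") is not None and row["age"] >= 0
            if row["id"] in seen_ids or not valid:
                deleted += 1  # each deletion is lost information
                continue
            seen_ids.add(row["id"])
            kept.append(row)
        print(f"deleted {deleted} of {len(rows)} rows")
        return kept

    data = [{"id": 1, "age": 30}, {"id": 1, "age": 30}, {"id": 2, "age": -4}]
    clean = cleanse_by_deletion(data)  # deleted 2 of 3 rows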
8. MAINTENANCE OF CLEANSED DATA:
After data cleansing has been performed and an error-free data collection achieved, one would want to avoid re-cleansing the data in its entirety whenever some values in the collection change. The process should be repeated only on values that have changed; this means that a cleansing lineage would need to be kept, which would require efficient data collection and management techniques.
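One way to keep such a lineage is to store a fingerprint of each record at cleansing time and re-cleanse only records whose fingerprint has changed. A minimal sketch, assuming hash-based change detection (one possible design, not a prescribed one):

    # A cleansing lineage maps each record key to a content hash taken at
    # cleansing time; a later pass re-cleanses only changed records.

    import hashlib
    import json

    def fingerprint(record: dict) -> str:
        """Stable content hash of a record."""
        return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

    def incremental_cleanse(records: dict, lineage: dict, cleanse) -> dict:
        """Re-run `cleanse` only on records whose fingerprint changed."""
        for key, record in records.items():
            if lineage.get(key) != fingerprint(record):
                records[key] = cleanse(record)
                lineage[key] = fingerprint(records[key])
        return records

    def strip_name(record: dict) -> dict:
        return {**record, "name": record["name"].strip()}

    lineage: dict = {}
    rows = {1: {"name": " ada "}, 2: {"name": "grace"}}
    incremental_cleanse(rows, lineage, strip_name)  # cleans both rows
    rows[2]["name"] = "grace hopper "
    incremental_cleanse(rows, lineage, strip_name)  # re-cleanses only row 2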
9. DATA CLEANSING IN VIRTUALLY INTEGRATED ENVIRONMENTS:
In virtually integrated sources such as IBM’s DiscoveryLink, the cleansing of data has to be performed every time the data is accessed, which considerably increases the response time and lowers efficiency.
10. DATA-CLEANSING FRAMEWORK:
In many cases, it will not be possible to derive a complete data-cleansing graph to guide the process in advance. This makes data cleansing an iterative process involving significant exploration and interaction, which may require a framework in the form of a collection of methods for error detection and elimination, in addition to data auditing. This can be integrated with other data-processing stages such as integration and maintenance.