SlideShare a Scribd company logo
1 of 10
DATA INTEGRATION
S.AZHAGARAMMAL
M.SC(Info Tech)
NADAR SARASWATHI
COLLEGE OF ARTS AND
SCIENCE,THENI.
DATA INTEGRATION:
data integration is straight forward:
New information is merged with
information that already exists.
Any business that regularly collects
information is concerned with data
integration.
Businesses want their information to be
accurate and up-to-date.
ACCURACY:
The degree of conformity of a measure to
a standard or a true value - see also
Accuracy and precision.
Accuracy is very hard to achieve through
data-cleansing in the general case,
because it requires accessing an external
source of data that contains the true value:
such "gold standard" data is often
unavailable.
COMPLETENESS:
The degree to which all required
measures are known.
 Incompleteness is almost impossible to
fix with data cleansing methodology: one
cannot infer facts that were not captured
when the data in question was initially
recorded.
In the case of systems that insist certain
columns should not be empty, one may
work around the problem by designating a
value that indicates "unknown" or
"missing".
CONSISTENCY:
The degree to which a set of measures are
equivalent in across systems (see also
Consistency). Inconsistency occurs when two
data items in the data set contradict each other.
Fixing inconsistency is not always possible: it
requires a variety of strategies - e.g., deciding
which data were recorded more recently.
which data source is likely to be most reliable
(the latter knowledge may be specific to a given
organization).
UNIFORMITY:
The degree to which a set data
measures are specified using the
same units of measure in all
systems ( see also Unit of
measure).
In datasets pooled from different
locales, weight may be recorded
either in pounds or kilos, and must
be converted to a single measure
using an arithmetic transformation.
ERROR CORRECTION AND LOSS OF
INFORMATION:
The most challenging problem within data
cleansing remains the correction of values to
remove duplicates and invalid entries.
 In many cases, the available information on
such anomalies is limited and insufficient to
determine the necessary transformations or
corrections, leaving the deletion of such entries
as a primary solution.
The deletion of data, though, leads to loss of
information; this loss can be particularly costly if
there is a large amount of deleted data.
MAINTENANCE OF CLEANSED
DATA:
So after having performed data
cleansing and achieving a data
collection free of errors, one would want
to avoid the re-cleansing of data in its
entirety after some values in data
collection change.
The process should only be repeated
on values that have changed; this means
that a cleansing lineage would need to
be kept, which would require efficient
data collection and management
techniques.
DATA CLEANSING IN VIRTUALLY
INTEGRATED ENVIRONMENTS:
In virtually integrated sources like
IBM’s DiscoveryLink, the cleansing
of data has to be performed every
time the data is accessed, which
considerably increases the
response time and lowers
efficiency.
DATA-CLEANSING FRAMEWORK:
In many cases, it will not be possible to
derive a complete data-cleansing graph to
guide the process in advance.
This makes data cleansing an iterative
process involving significant exploration and
interaction, which may require a framework in
the form of a collection of methods for error
detection and elimination in addition to data
auditing.
This can be integrated with other data-
processing stages like integration and
maintenance.

More Related Content

What's hot

How to Draw Data Flow Diagrams
How to Draw Data Flow DiagramsHow to Draw Data Flow Diagrams
How to Draw Data Flow DiagramsJim Johnson
 
Data Preprocessing || Data Mining
Data Preprocessing || Data MiningData Preprocessing || Data Mining
Data Preprocessing || Data MiningIffat Firozy
 
Is 581 milestone 11 and 12 case study coastline systems consulting
Is 581 milestone 11 and 12 case study coastline systems consultingIs 581 milestone 11 and 12 case study coastline systems consulting
Is 581 milestone 11 and 12 case study coastline systems consultingprintwork4849
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingankur bhalla
 
Approximated Bayes Estimators for the Parameters and Reliability Function of ...
Approximated Bayes Estimators for the Parameters and Reliability Function of ...Approximated Bayes Estimators for the Parameters and Reliability Function of ...
Approximated Bayes Estimators for the Parameters and Reliability Function of ...Conferenceproceedings
 
[Infographic] 4 Problems with Using Excel for Analytics
[Infographic] 4 Problems with Using Excel for Analytics[Infographic] 4 Problems with Using Excel for Analytics
[Infographic] 4 Problems with Using Excel for Analytics3C Software
 
Preparing data for spss analysis
Preparing data for spss analysis Preparing data for spss analysis
Preparing data for spss analysis Ritika Beria
 
Project Experience3
Project Experience3Project Experience3
Project Experience3ajith k
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingksamyMCA
 
Review comments nt
Review comments ntReview comments nt
Review comments nttuanhsm
 

What's hot (16)

How to Draw Data Flow Diagrams
How to Draw Data Flow DiagramsHow to Draw Data Flow Diagrams
How to Draw Data Flow Diagrams
 
XL-MINER: Associations
XL-MINER: AssociationsXL-MINER: Associations
XL-MINER: Associations
 
Data PreProcessing
Data PreProcessingData PreProcessing
Data PreProcessing
 
Data Mining: Data processing
Data Mining: Data processingData Mining: Data processing
Data Mining: Data processing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data Preprocessing || Data Mining
Data Preprocessing || Data MiningData Preprocessing || Data Mining
Data Preprocessing || Data Mining
 
Is 581 milestone 11 and 12 case study coastline systems consulting
Is 581 milestone 11 and 12 case study coastline systems consultingIs 581 milestone 11 and 12 case study coastline systems consulting
Is 581 milestone 11 and 12 case study coastline systems consulting
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Approximated Bayes Estimators for the Parameters and Reliability Function of ...
Approximated Bayes Estimators for the Parameters and Reliability Function of ...Approximated Bayes Estimators for the Parameters and Reliability Function of ...
Approximated Bayes Estimators for the Parameters and Reliability Function of ...
 
[Infographic] 4 Problems with Using Excel for Analytics
[Infographic] 4 Problems with Using Excel for Analytics[Infographic] 4 Problems with Using Excel for Analytics
[Infographic] 4 Problems with Using Excel for Analytics
 
Dmblog
DmblogDmblog
Dmblog
 
Preparing data for spss analysis
Preparing data for spss analysis Preparing data for spss analysis
Preparing data for spss analysis
 
Project Experience3
Project Experience3Project Experience3
Project Experience3
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Review comments nt
Review comments ntReview comments nt
Review comments nt
 

Similar to Hema dm

Data masking techniques for Insurance
Data masking techniques for InsuranceData masking techniques for Insurance
Data masking techniques for InsuranceNIIT Technologies
 
Data preparation and processing chapter 2
Data preparation and processing chapter  2Data preparation and processing chapter  2
Data preparation and processing chapter 2Mahmoud Alfarra
 
Overview of Data Cleaning.pdf
Overview of Data Cleaning.pdfOverview of Data Cleaning.pdf
Overview of Data Cleaning.pdfSheetalDandge
 
The Growing Importance of Data Cleaning
The Growing Importance of Data CleaningThe Growing Importance of Data Cleaning
The Growing Importance of Data CleaningCarolineSmith912130
 
thegrowingimportanceofdatacleaning-211202141902.pptx
thegrowingimportanceofdatacleaning-211202141902.pptxthegrowingimportanceofdatacleaning-211202141902.pptx
thegrowingimportanceofdatacleaning-211202141902.pptxYashaswiniSrinivasan1
 
Data mining
Data miningData mining
Data miningSilicon
 
What is Data Observability.pdf
What is Data Observability.pdfWhat is Data Observability.pdf
What is Data Observability.pdf4dalert
 
IRJET- A Review of Data Cleaning and its Current Approaches
IRJET- A Review of Data Cleaning and its Current ApproachesIRJET- A Review of Data Cleaning and its Current Approaches
IRJET- A Review of Data Cleaning and its Current ApproachesIRJET Journal
 
BDA TAE 2 (BMEB 83).pptx
BDA TAE 2 (BMEB 83).pptxBDA TAE 2 (BMEB 83).pptx
BDA TAE 2 (BMEB 83).pptxAkash527744
 
Data Preparation and Preprocessing , Data Cleaning
Data Preparation and Preprocessing , Data CleaningData Preparation and Preprocessing , Data Cleaning
Data Preparation and Preprocessing , Data CleaningShivarkarSandip
 
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineQlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineSrikanth Sharma Boddupalli
 
DATA WRANGLING presentation.pptx
DATA WRANGLING presentation.pptxDATA WRANGLING presentation.pptx
DATA WRANGLING presentation.pptxAbdullahAbbasi55
 
DATA PREPROCESSING AND DATA CLEANSING
DATA PREPROCESSING AND DATA CLEANSINGDATA PREPROCESSING AND DATA CLEANSING
DATA PREPROCESSING AND DATA CLEANSINGAhtesham Ullah khan
 
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREEA ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREEijcsa
 
Data Cleaning Service for Data Warehouse: An Experimental Comparative Study o...
Data Cleaning Service for Data Warehouse: An Experimental Comparative Study o...Data Cleaning Service for Data Warehouse: An Experimental Comparative Study o...
Data Cleaning Service for Data Warehouse: An Experimental Comparative Study o...TELKOMNIKA JOURNAL
 
Data mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedData mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedYugal Kumar
 

Similar to Hema dm (20)

Data masking techniques for Insurance
Data masking techniques for InsuranceData masking techniques for Insurance
Data masking techniques for Insurance
 
Unit2
Unit2Unit2
Unit2
 
Data preparation and processing chapter 2
Data preparation and processing chapter  2Data preparation and processing chapter  2
Data preparation and processing chapter 2
 
Overview of Data Cleaning.pdf
Overview of Data Cleaning.pdfOverview of Data Cleaning.pdf
Overview of Data Cleaning.pdf
 
The Growing Importance of Data Cleaning
The Growing Importance of Data CleaningThe Growing Importance of Data Cleaning
The Growing Importance of Data Cleaning
 
thegrowingimportanceofdatacleaning-211202141902.pptx
thegrowingimportanceofdatacleaning-211202141902.pptxthegrowingimportanceofdatacleaning-211202141902.pptx
thegrowingimportanceofdatacleaning-211202141902.pptx
 
Data mining
Data miningData mining
Data mining
 
What is Data Observability.pdf
What is Data Observability.pdfWhat is Data Observability.pdf
What is Data Observability.pdf
 
IRJET- A Review of Data Cleaning and its Current Approaches
IRJET- A Review of Data Cleaning and its Current ApproachesIRJET- A Review of Data Cleaning and its Current Approaches
IRJET- A Review of Data Cleaning and its Current Approaches
 
Chapter 3.pdf
Chapter 3.pdfChapter 3.pdf
Chapter 3.pdf
 
GROPSIKS.pptx
GROPSIKS.pptxGROPSIKS.pptx
GROPSIKS.pptx
 
BDA TAE 2 (BMEB 83).pptx
BDA TAE 2 (BMEB 83).pptxBDA TAE 2 (BMEB 83).pptx
BDA TAE 2 (BMEB 83).pptx
 
Data Preparation and Preprocessing , Data Cleaning
Data Preparation and Preprocessing , Data CleaningData Preparation and Preprocessing , Data Cleaning
Data Preparation and Preprocessing , Data Cleaning
 
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineQlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
 
DATA WRANGLING presentation.pptx
DATA WRANGLING presentation.pptxDATA WRANGLING presentation.pptx
DATA WRANGLING presentation.pptx
 
DATA PREPROCESSING AND DATA CLEANSING
DATA PREPROCESSING AND DATA CLEANSINGDATA PREPROCESSING AND DATA CLEANSING
DATA PREPROCESSING AND DATA CLEANSING
 
DataPreprocessing.pptx
DataPreprocessing.pptxDataPreprocessing.pptx
DataPreprocessing.pptx
 
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREEA ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
 
Data Cleaning Service for Data Warehouse: An Experimental Comparative Study o...
Data Cleaning Service for Data Warehouse: An Experimental Comparative Study o...Data Cleaning Service for Data Warehouse: An Experimental Comparative Study o...
Data Cleaning Service for Data Warehouse: An Experimental Comparative Study o...
 
Data mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedData mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updated
 

More from GowriLatha1

Filtering in frequency domain
Filtering in frequency domainFiltering in frequency domain
Filtering in frequency domainGowriLatha1
 
Demand assigned and packet reservation multiple access
Demand assigned and packet reservation multiple accessDemand assigned and packet reservation multiple access
Demand assigned and packet reservation multiple accessGowriLatha1
 
Software engineering
Software engineeringSoftware engineering
Software engineeringGowriLatha1
 
Web services & com+ components
Web services & com+ componentsWeb services & com+ components
Web services & com+ componentsGowriLatha1
 
Comparison with Traditional databases
Comparison with Traditional databasesComparison with Traditional databases
Comparison with Traditional databasesGowriLatha1
 
Comparison with Traditional databases
Comparison with Traditional databasesComparison with Traditional databases
Comparison with Traditional databasesGowriLatha1
 
Inter process communication
Inter process communicationInter process communication
Inter process communicationGowriLatha1
 
computer network
computer networkcomputer network
computer networkGowriLatha1
 
Operating System
Operating SystemOperating System
Operating SystemGowriLatha1
 
Data mining query language
Data mining query languageData mining query language
Data mining query languageGowriLatha1
 
Path & application(ds)2
Path & application(ds)2Path & application(ds)2
Path & application(ds)2GowriLatha1
 

More from GowriLatha1 (20)

Filtering in frequency domain
Filtering in frequency domainFiltering in frequency domain
Filtering in frequency domain
 
Demand assigned and packet reservation multiple access
Demand assigned and packet reservation multiple accessDemand assigned and packet reservation multiple access
Demand assigned and packet reservation multiple access
 
Software engineering
Software engineeringSoftware engineering
Software engineering
 
Shadow paging
Shadow pagingShadow paging
Shadow paging
 
Multithreading
MultithreadingMultithreading
Multithreading
 
Hive
HiveHive
Hive
 
Web services & com+ components
Web services & com+ componentsWeb services & com+ components
Web services & com+ components
 
Comparison with Traditional databases
Comparison with Traditional databasesComparison with Traditional databases
Comparison with Traditional databases
 
Recovery system
Recovery systemRecovery system
Recovery system
 
Comparison with Traditional databases
Comparison with Traditional databasesComparison with Traditional databases
Comparison with Traditional databases
 
Static analysis
Static analysisStatic analysis
Static analysis
 
Data reduction
Data reductionData reduction
Data reduction
 
Inter process communication
Inter process communicationInter process communication
Inter process communication
 
computer network
computer networkcomputer network
computer network
 
Operating System
Operating SystemOperating System
Operating System
 
Data mining query language
Data mining query languageData mining query language
Data mining query language
 
Enterprice java
Enterprice javaEnterprice java
Enterprice java
 
Ethernet
EthernetEthernet
Ethernet
 
Java script
Java scriptJava script
Java script
 
Path & application(ds)2
Path & application(ds)2Path & application(ds)2
Path & application(ds)2
 

Recently uploaded

APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 

Recently uploaded (20)

APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 

Hema dm

  • 1. DATA INTEGRATION S.AZHAGARAMMAL M.SC(Info Tech) NADAR SARASWATHI COLLEGE OF ARTS AND SCIENCE,THENI.
  • 2. DATA INTEGRATION: data integration is straight forward: New information is merged with information that already exists. Any business that regularly collects information is concerned with data integration. Businesses want their information to be accurate and up-to-date.
  • 3. ACCURACY: The degree of conformity of a measure to a standard or a true value - see also Accuracy and precision. Accuracy is very hard to achieve through data-cleansing in the general case, because it requires accessing an external source of data that contains the true value: such "gold standard" data is often unavailable.
  • 4. COMPLETENESS: The degree to which all required measures are known.  Incompleteness is almost impossible to fix with data cleansing methodology: one cannot infer facts that were not captured when the data in question was initially recorded. In the case of systems that insist certain columns should not be empty, one may work around the problem by designating a value that indicates "unknown" or "missing".
  • 5. CONSISTENCY: The degree to which a set of measures are equivalent in across systems (see also Consistency). Inconsistency occurs when two data items in the data set contradict each other. Fixing inconsistency is not always possible: it requires a variety of strategies - e.g., deciding which data were recorded more recently. which data source is likely to be most reliable (the latter knowledge may be specific to a given organization).
  • 6. UNIFORMITY: The degree to which a set data measures are specified using the same units of measure in all systems ( see also Unit of measure). In datasets pooled from different locales, weight may be recorded either in pounds or kilos, and must be converted to a single measure using an arithmetic transformation.
  • 7. ERROR CORRECTION AND LOSS OF INFORMATION: The most challenging problem within data cleansing remains the correction of values to remove duplicates and invalid entries.  In many cases, the available information on such anomalies is limited and insufficient to determine the necessary transformations or corrections, leaving the deletion of such entries as a primary solution. The deletion of data, though, leads to loss of information; this loss can be particularly costly if there is a large amount of deleted data.
  • 8. MAINTENANCE OF CLEANSED DATA: So after having performed data cleansing and achieving a data collection free of errors, one would want to avoid the re-cleansing of data in its entirety after some values in data collection change. The process should only be repeated on values that have changed; this means that a cleansing lineage would need to be kept, which would require efficient data collection and management techniques.
  • 9. DATA CLEANSING IN VIRTUALLY INTEGRATED ENVIRONMENTS: In virtually integrated sources like IBM’s DiscoveryLink, the cleansing of data has to be performed every time the data is accessed, which considerably increases the response time and lowers efficiency.
  • 10. DATA-CLEANSING FRAMEWORK: In many cases, it will not be possible to derive a complete data-cleansing graph to guide the process in advance. This makes data cleansing an iterative process involving significant exploration and interaction, which may require a framework in the form of a collection of methods for error detection and elimination in addition to data auditing. This can be integrated with other data- processing stages like integration and maintenance.