SlideShare a Scribd company logo
Submitted by,
M. Kavitha M.Sc.,
Nadar Saraswathi College of
Art & Science, Theni.
Data Mining
Data Integration and
Transformation
Data Integration
* Data Integration involves combining data from
several disparate source, which are stored using various
technologies and provide a unified view of the data.
* The later initiative is often called a data warehouse.
* It merges the data from multiple data stores (data
source).
* It includes multiple databases, data cubes or flat
files.
* Metadata, correlation analysis, data conflict detection
and resolution of semantic heterogeneity contribute towards
smooth data integration.
Advantages :
1. Independence.
2. Faster query processing.
3. Complex query processing.
4. Advanced data summarization & storage possible.
5. High volume data processing.
Disadvantages :
1. Latency (since data needs to be loaded using ETL).
2. Costlier (data localization, infrastructure, security).
There are a number of issues to consider during data integration.
1. Schema Integration.
2. Redundancy.
3. Detection and resolution of data value conflicts.
Schema integration :
The real-world entities from multiple source be matched
is referred to as the entity identification problem.
For example,
Data analyst or the computer be sure that customer_id in
one database and cust_number in another refer to the same
entity. Databases and data warehouses that is a data about the
data it’s a meta data.
Redundancy :
* It is another important issue.
* An attribute may be redundant if it can be “derived”
from another table, such as annual revenue.
* Some redundancies can be detected by correlation
analysis.
For example,
Two attributes, such analysis can measure how
strongly one attribute implies the other based on the
available data.
The correlation between attributes attribute A and B by
Detection and resolution of data value conflicts :
* A third important issue in data integration is the
detection and resolution of data value conflicts.
* The same real-world entity, attribute values from
different sources. This may be due to differences in
representation, scaling, or encoding.
* An attribute in one system may be recorded at a
lower level of abstraction than the “same” attribute in another.
* For example, the total sales in one database may
refer to one branch of All Electronics, an attribute of the same
name in another database may refer to the total sales for All
Electronics stores in a given region.
Data Transformation
* Data transformation the data are transformed or
consolidated into forms in appropriate for mining.
* Data transformation can involve
1. Smoothing.
2. Aggregation.
3. Generalization.
4. Normalization.
5. Attribute construction.
Smoothing :
Which works to remove the noise from data. Such
techniques include binning, clustering and regression.
Aggregation :
* Where summary or aggregation operations are applied
to the data.
* For example, the daily sales data may be aggregated so
as to compute monthly and annual total amounts.
Generalization :
* The data where low-level or “primitive” data are placed
by higher-level concepts through the use of concept through
the use of concept hierarchies.
* For example, the attributes like street can be
generalized to higher-level concept city or country when the
numeric attributes to higher-level concept young, middle-
aged and street.
Normalization :
Where the attribute data are scaled so as to fall within
a specified range, such as -1.0 to 1.0 or 0.0 to 1.0
Attribute construction :
Where new attribute are a constructed and added
from the given set of attributes to help the mining
process.
There are many method for data normalization.
* Min-Max normalization.
* Z-Score normalization.
* Normalization by decimal scaling.
Min – Max Normalization :
It performs a linear transformation on the original data.
Suppose that min A and max A are the minimum and
maximum values of attributes A. A Min – Max
normalization maps a value v of A to v’ in the range.
Z – Score Normalization :
The Z – Score normalization a value of an attribute A
are normalized based on the mean and standard deviation of
A. A value v of A is normalized to v’
Normalization by Decimal Scaling :
Normalization by decimal scaling normalizes by moving
the decimal point of values of attribute A.
The number of decimal points moved depends on the
maximum absolute value of A. A value v of A is normalized
to v’ by computing
where j is the smallest integer such that Max(|V’|) < 1.
Thank You

More Related Content

What's hot

Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataSalah Amean
 
1.8 discretization
1.8 discretization1.8 discretization
1.8 discretizationKrish_ver2
 
Attribute oriented analysis
Attribute oriented analysisAttribute oriented analysis
Attribute oriented analysisHirra Sultan
 
Data Reduction
Data ReductionData Reduction
Data ReductionRajan Shah
 
Discretization and concept hierarchy(os)
Discretization and concept hierarchy(os)Discretization and concept hierarchy(os)
Discretization and concept hierarchy(os)snegacmr
 
2.4 rule based classification
2.4 rule based classification2.4 rule based classification
2.4 rule based classificationKrish_ver2
 
multi dimensional data model
multi dimensional data modelmulti dimensional data model
multi dimensional data modelmoni sindhu
 
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMSkoolkampus
 
Data Preprocessing || Data Mining
Data Preprocessing || Data MiningData Preprocessing || Data Mining
Data Preprocessing || Data MiningIffat Firozy
 
Types Of Keys in DBMS
Types Of Keys in DBMSTypes Of Keys in DBMS
Types Of Keys in DBMSPadamNepal1
 
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3 preprocessing
Data Mining:  Concepts and Techniques (3rd ed.)- Chapter 3 preprocessingData Mining:  Concepts and Techniques (3rd ed.)- Chapter 3 preprocessing
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3 preprocessingSalah Amean
 
Dbms Notes Lecture 9 : Specialization, Generalization and Aggregation
Dbms Notes Lecture 9 : Specialization, Generalization and AggregationDbms Notes Lecture 9 : Specialization, Generalization and Aggregation
Dbms Notes Lecture 9 : Specialization, Generalization and AggregationBIT Durg
 
Integrity Constraints
Integrity ConstraintsIntegrity Constraints
Integrity Constraintsmadhav bansal
 

What's hot (20)

Clusters techniques
Clusters techniquesClusters techniques
Clusters techniques
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, data
 
1.8 discretization
1.8 discretization1.8 discretization
1.8 discretization
 
Attribute oriented analysis
Attribute oriented analysisAttribute oriented analysis
Attribute oriented analysis
 
Data Reduction
Data ReductionData Reduction
Data Reduction
 
Data preprocess
Data preprocessData preprocess
Data preprocess
 
Discretization and concept hierarchy(os)
Discretization and concept hierarchy(os)Discretization and concept hierarchy(os)
Discretization and concept hierarchy(os)
 
2.4 rule based classification
2.4 rule based classification2.4 rule based classification
2.4 rule based classification
 
Deductive databases
Deductive databasesDeductive databases
Deductive databases
 
multi dimensional data model
multi dimensional data modelmulti dimensional data model
multi dimensional data model
 
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS
 
DATABASE CONSTRAINTS
DATABASE CONSTRAINTSDATABASE CONSTRAINTS
DATABASE CONSTRAINTS
 
Data Preprocessing || Data Mining
Data Preprocessing || Data MiningData Preprocessing || Data Mining
Data Preprocessing || Data Mining
 
Types Of Keys in DBMS
Types Of Keys in DBMSTypes Of Keys in DBMS
Types Of Keys in DBMS
 
2 phase locking protocol DBMS
2 phase locking protocol DBMS2 phase locking protocol DBMS
2 phase locking protocol DBMS
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3 preprocessing
Data Mining:  Concepts and Techniques (3rd ed.)- Chapter 3 preprocessingData Mining:  Concepts and Techniques (3rd ed.)- Chapter 3 preprocessing
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3 preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Dbms Notes Lecture 9 : Specialization, Generalization and Aggregation
Dbms Notes Lecture 9 : Specialization, Generalization and AggregationDbms Notes Lecture 9 : Specialization, Generalization and Aggregation
Dbms Notes Lecture 9 : Specialization, Generalization and Aggregation
 
Integrity Constraints
Integrity ConstraintsIntegrity Constraints
Integrity Constraints
 

Similar to Data Integration and Transformation in Data mining

Similar to Data Integration and Transformation in Data mining (20)

Data integration
Data integrationData integration
Data integration
 
Data Mining: Data Preprocessing
Data Mining: Data PreprocessingData Mining: Data Preprocessing
Data Mining: Data Preprocessing
 
Unit_2_Feature Engineering.pdf
Unit_2_Feature Engineering.pdfUnit_2_Feature Engineering.pdf
Unit_2_Feature Engineering.pdf
 
Chapter 3.pdf
Chapter 3.pdfChapter 3.pdf
Chapter 3.pdf
 
1234
12341234
1234
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
data processing.pdf
data processing.pdfdata processing.pdf
data processing.pdf
 
Data Preprocessing&tools
Data Preprocessing&toolsData Preprocessing&tools
Data Preprocessing&tools
 
Datapreprocessing
DatapreprocessingDatapreprocessing
Datapreprocessing
 
Data Mining: Data processing
Data Mining: Data processingData Mining: Data processing
Data Mining: Data processing
 
Data MIning: Data processing
Data MIning: Data processingData MIning: Data processing
Data MIning: Data processing
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine Learning
 
Data preprocessing in Data Mining
Data preprocessing in Data MiningData preprocessing in Data Mining
Data preprocessing in Data Mining
 
Analysis Of Attribute Revelance
Analysis Of Attribute RevelanceAnalysis Of Attribute Revelance
Analysis Of Attribute Revelance
 
Introduction of data science
Introduction of data scienceIntroduction of data science
Introduction of data science
 
Characterization and Comparison
Characterization and ComparisonCharacterization and Comparison
Characterization and Comparison
 
Unit 3-2.ppt
Unit 3-2.pptUnit 3-2.ppt
Unit 3-2.ppt
 
20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt
 
20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 

More from kavitha muneeshwaran (13)

Physical Security
Physical SecurityPhysical Security
Physical Security
 
Digital Audio
Digital AudioDigital Audio
Digital Audio
 
Java
JavaJava
Java
 
Data structure
Data structureData structure
Data structure
 
Internet Programming with Java
Internet Programming with JavaInternet Programming with Java
Internet Programming with Java
 
Digital image processing
Digital image processing  Digital image processing
Digital image processing
 
Staffing level estimation
Staffing level estimation Staffing level estimation
Staffing level estimation
 
Transaction Management - Deadlock Handling
Transaction Management - Deadlock HandlingTransaction Management - Deadlock Handling
Transaction Management - Deadlock Handling
 
Process
ProcessProcess
Process
 
Digital Logic circuit
Digital Logic circuitDigital Logic circuit
Digital Logic circuit
 
C and C++ functions
C and C++ functionsC and C++ functions
C and C++ functions
 
I/O system in intel 80386 microcomputer architecture
I/O system in intel 80386 microcomputer architectureI/O system in intel 80386 microcomputer architecture
I/O system in intel 80386 microcomputer architecture
 
narrow Band ISDN
narrow Band ISDNnarrow Band ISDN
narrow Band ISDN
 

Recently uploaded

How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17Celine George
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleCeline George
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxbennyroshan06
 
Basic Civil Engg Notes_Chapter-6_Environment Pollution & Engineering
Basic Civil Engg Notes_Chapter-6_Environment Pollution & EngineeringBasic Civil Engg Notes_Chapter-6_Environment Pollution & Engineering
Basic Civil Engg Notes_Chapter-6_Environment Pollution & EngineeringDenish Jangid
 
Telling Your Story_ Simple Steps to Build Your Nonprofit's Brand Webinar.pdf
Telling Your Story_ Simple Steps to Build Your Nonprofit's Brand Webinar.pdfTelling Your Story_ Simple Steps to Build Your Nonprofit's Brand Webinar.pdf
Telling Your Story_ Simple Steps to Build Your Nonprofit's Brand Webinar.pdfTechSoup
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXMIRIAMSALINAS13
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345beazzy04
 
Salient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptxSalient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptxakshayaramakrishnan21
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePedroFerreira53928
 
Keeping Your Information Safe with Centralized Security Services
Keeping Your Information Safe with Centralized Security ServicesKeeping Your Information Safe with Centralized Security Services
Keeping Your Information Safe with Centralized Security ServicesTechSoup
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersPedroFerreira53928
 
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...Nguyen Thanh Tu Collection
 
Application of Matrices in real life. Presentation on application of matrices
Application of Matrices in real life. Presentation on application of matricesApplication of Matrices in real life. Presentation on application of matrices
Application of Matrices in real life. Presentation on application of matricesRased Khan
 
Basic Civil Engineering Notes of Chapter-6, Topic- Ecosystem, Biodiversity G...
Basic Civil Engineering Notes of Chapter-6,  Topic- Ecosystem, Biodiversity G...Basic Civil Engineering Notes of Chapter-6,  Topic- Ecosystem, Biodiversity G...
Basic Civil Engineering Notes of Chapter-6, Topic- Ecosystem, Biodiversity G...Denish Jangid
 
size separation d pharm 1st year pharmaceutics
size separation d pharm 1st year pharmaceuticssize separation d pharm 1st year pharmaceutics
size separation d pharm 1st year pharmaceuticspragatimahajan3
 
Industrial Training Report- AKTU Industrial Training Report
Industrial Training Report- AKTU Industrial Training ReportIndustrial Training Report- AKTU Industrial Training Report
Industrial Training Report- AKTU Industrial Training ReportAvinash Rai
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaasiemaillard
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfVivekanand Anglo Vedic Academy
 
The impact of social media on mental health and well-being has been a topic o...
The impact of social media on mental health and well-being has been a topic o...The impact of social media on mental health and well-being has been a topic o...
The impact of social media on mental health and well-being has been a topic o...sanghavirahi2
 

Recently uploaded (20)

How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 
Basic Civil Engg Notes_Chapter-6_Environment Pollution & Engineering
Basic Civil Engg Notes_Chapter-6_Environment Pollution & EngineeringBasic Civil Engg Notes_Chapter-6_Environment Pollution & Engineering
Basic Civil Engg Notes_Chapter-6_Environment Pollution & Engineering
 
Telling Your Story_ Simple Steps to Build Your Nonprofit's Brand Webinar.pdf
Telling Your Story_ Simple Steps to Build Your Nonprofit's Brand Webinar.pdfTelling Your Story_ Simple Steps to Build Your Nonprofit's Brand Webinar.pdf
Telling Your Story_ Simple Steps to Build Your Nonprofit's Brand Webinar.pdf
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
Salient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptxSalient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptx
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
 
Keeping Your Information Safe with Centralized Security Services
Keeping Your Information Safe with Centralized Security ServicesKeeping Your Information Safe with Centralized Security Services
Keeping Your Information Safe with Centralized Security Services
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
 
Application of Matrices in real life. Presentation on application of matrices
Application of Matrices in real life. Presentation on application of matricesApplication of Matrices in real life. Presentation on application of matrices
Application of Matrices in real life. Presentation on application of matrices
 
Basic Civil Engineering Notes of Chapter-6, Topic- Ecosystem, Biodiversity G...
Basic Civil Engineering Notes of Chapter-6,  Topic- Ecosystem, Biodiversity G...Basic Civil Engineering Notes of Chapter-6,  Topic- Ecosystem, Biodiversity G...
Basic Civil Engineering Notes of Chapter-6, Topic- Ecosystem, Biodiversity G...
 
size separation d pharm 1st year pharmaceutics
size separation d pharm 1st year pharmaceuticssize separation d pharm 1st year pharmaceutics
size separation d pharm 1st year pharmaceutics
 
Industrial Training Report- AKTU Industrial Training Report
Industrial Training Report- AKTU Industrial Training ReportIndustrial Training Report- AKTU Industrial Training Report
Industrial Training Report- AKTU Industrial Training Report
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
 
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
 
The impact of social media on mental health and well-being has been a topic o...
The impact of social media on mental health and well-being has been a topic o...The impact of social media on mental health and well-being has been a topic o...
The impact of social media on mental health and well-being has been a topic o...
 

Data Integration and Transformation in Data mining

  • 1. Submitted by, M. Kavitha M.Sc., Nadar Saraswathi College of Art & Science, Theni. Data Mining Data Integration and Transformation
  • 2. Data Integration * Data Integration involves combining data from several disparate source, which are stored using various technologies and provide a unified view of the data. * The later initiative is often called a data warehouse. * It merges the data from multiple data stores (data source). * It includes multiple databases, data cubes or flat files. * Metadata, correlation analysis, data conflict detection and resolution of semantic heterogeneity contribute towards smooth data integration.
  • 3. Advantages : 1. Independence. 2. Faster query processing. 3. Complex query processing. 4. Advanced data summarization & storage possible. 5. High volume data processing. Disadvantages : 1. Latency (since data needs to be loaded using ETL). 2. Costlier (data localization, infrastructure, security).
  • 4. There are a number of issues to consider during data integration. 1. Schema Integration. 2. Redundancy. 3. Detection and resolution of data value conflicts. Schema integration : The real-world entities from multiple source be matched is referred to as the entity identification problem. For example, Data analyst or the computer be sure that customer_id in one database and cust_number in another refer to the same entity. Databases and data warehouses that is a data about the data it’s a meta data.
  • 5. Redundancy : * It is another important issue. * An attribute may be redundant if it can be “derived” from another table, such as annual revenue. * Some redundancies can be detected by correlation analysis. For example, Two attributes, such analysis can measure how strongly one attribute implies the other based on the available data. The correlation between attributes attribute A and B by
  • 6. Detection and resolution of data value conflicts : * A third important issue in data integration is the detection and resolution of data value conflicts. * The same real-world entity, attribute values from different sources. This may be due to differences in representation, scaling, or encoding. * An attribute in one system may be recorded at a lower level of abstraction than the “same” attribute in another. * For example, the total sales in one database may refer to one branch of All Electronics, an attribute of the same name in another database may refer to the total sales for All Electronics stores in a given region.
  • 7. Data Transformation * Data transformation the data are transformed or consolidated into forms in appropriate for mining. * Data transformation can involve 1. Smoothing. 2. Aggregation. 3. Generalization. 4. Normalization. 5. Attribute construction. Smoothing : Which works to remove the noise from data. Such techniques include binning, clustering and regression.
  • 8. Aggregation : * Where summary or aggregation operations are applied to the data. * For example, the daily sales data may be aggregated so as to compute monthly and annual total amounts. Generalization : * The data where low-level or “primitive” data are placed by higher-level concepts through the use of concept through the use of concept hierarchies. * For example, the attributes like street can be generalized to higher-level concept city or country when the numeric attributes to higher-level concept young, middle- aged and street.
  • 9. Normalization : Where the attribute data are scaled so as to fall within a specified range, such as -1.0 to 1.0 or 0.0 to 1.0 Attribute construction : Where new attribute are a constructed and added from the given set of attributes to help the mining process. There are many method for data normalization. * Min-Max normalization. * Z-Score normalization. * Normalization by decimal scaling.
  • 10. Min – Max Normalization : It performs a linear transformation on the original data. Suppose that min A and max A are the minimum and maximum values of attributes A. A Min – Max normalization maps a value v of A to v’ in the range. Z – Score Normalization : The Z – Score normalization a value of an attribute A are normalized based on the mean and standard deviation of A. A value v of A is normalized to v’
  • 11. Normalization by Decimal Scaling : Normalization by decimal scaling normalizes by moving the decimal point of values of attribute A. The number of decimal points moved depends on the maximum absolute value of A. A value v of A is normalized to v’ by computing where j is the smallest integer such that Max(|V’|) < 1.