SlideShare a Scribd company logo
Data Mining
Data Integration
and
Transformation
Dr.J.Kalavathi. M.Sc., P.hD.,
Assistant Professor,
Department of Information Technology,
V.V.Vanniaperumal College for Women,
Virudhunagar.
Data Integration
* Data Integration involves combining data from several
disparate source, which are stored using various technologies
and provide a unified view of the data.
* The later initiative is often called a data warehouse.
* It merges the data from multiple data stores (data
source).
* It includes multiple databases, data cubes or flat
files.
* Metadata, correlation analysis, data conflict detection
and resolution of semantic heterogeneity contribute towards
smooth data integration.
Data Integration Define :
It combines data from multiple sources into a coherent
data store, as in data warehousing. These sources may include
multiple databases, data cubes, or flat files.
The data integration systems are formally defined as triple<G,S,M>
Where G: The global schema
S:Heterogeneous source of schemas
M: Mapping between the queries of source and global
schema
Advantages :
1. Independence.
2. Faster query processing.
3. Complex query processing.
4. Advanced data summarization & storage possible.
5. High volume data processing.
Disadvantages :
1. Latency (since data needs to be loaded using ETL).
2. Costlier (data localization, infrastructure, security).
Data Warehouse Approach
Data Integration Approach:
There are mainly 2 major approaches for data
integration – one is “tight coupling approach” and another is
“loose coupling approach”.
Tight Coupling:
• Here, a data warehouse is treated as an information retrieval component.
• In this coupling, data is combined from different sources into a single physical
location through the process of ETL – Extraction, Transformation and Loading.
Loose Coupling:
• Here, an interface is provided that takes the query from the user, transforms it in a
way the source database can understand and then sends the query directly to the
source databases to obtain the result.
• And the data only remains in the actual source databases.
There are a number of issues to consider during data integration.
1. Schema Integration.
2. Redundancy.
3. Detection and resolution of data value conflicts.
Schema integration :
The real-world entities from multiple source be matched
is referred to as the entity identification problem.
For example,
Data analyst or the computer be sure that customer_id in
one database and cust_number in another refer to the same
entity. Databases and data warehouses that is a data about the
data it’s a meta data.
Redundancy :
* It is another important issue.
* An attribute may be redundant if it can be “derived”
from another table, such as annual revenue.
* Some redundancies can be detected by correlation
analysis.
For example,
Two attributes, such analysis can measure how
strongly one attribute implies the other based on the
available data.
The correlation between attributes attribute A and Bby
Detection and resolution of data value conflicts :
* A third important issue in data integration is the
detection and resolution of data value conflicts.
* The same real-world entity, attribute values from
different sources. This may be due to differences in
representation, scaling, or encoding.
* An attribute in one system may be recorded at a
lower level of abstraction than the “same” attribute in another.
* For example, the total sales in one database may
refer to one branch of All Electronics, an attribute of the same
name in another database may refer to the total sales for All
Electronics stores in a given region.
Data Transformation
* Data transformation the data are transformed or
consolidated into forms in appropriate for mining.
* Data transformation can involve
1. Smoothing.
2. Aggregation.
3. Generalization.
4. Normalization.
5. Attribute construction.
Such
Smoothing :
Which works to remove the noise from data.
techniques include binning, clustering and regression.
Aggregation :
* Where summary or aggregation operations are applied to
the data.
* For example, the daily sales data may be aggregated so
as to compute monthly and annual total amounts.
Generalization :
* The data where low-level or “primitive” data are placed
by higher-level concepts through the use of concept through
the use of concept hierarchies.
* For example, the attributes like street can be generalized
to higher-level concept city or country when the numeric
attributes to higher-level concept young, middle- aged and
street.
Normalization :
Where the attribute data are scaled so as to fall within
a specified range, such as -1.0 to 1.0 or 0.0 to 1.0
Attribute construction :
Where new attribute are a constructed and added
from the given set of attributes to help the mining
process.
There are many method for data normalization.
* Min-Max normalization.
* Z-Score normalization.
* Normalization by decimal scaling.
Min – Max Normalization :
It performs a linear transformation on the original data.
Suppose that min A and max A are the minimum and
maximum values of attributes A. A Min – Max
normalization maps a value v ofAto v’in the range.
Z – Score Normalization :
The Z – Score normalization a value of an attribute A
are normalized based on the mean and standard deviation of
A.Avalue v ofAis normalized to v’
Normalization by Decimal Scaling :
Normalization by decimal scaling normalizes by moving
the decimal point of values of attribute A.
The number of decimal points moved depends on the
maximum absolute value of A. A value v of A is normalized
to v’ by computing
where j is the smallest integer such that Max(|V’|) < 1.
Thank You

More Related Content

What's hot

Data warehousing
Data warehousingData warehousing
Data warehousing
Shruti Dalela
 
Kdd process
Kdd processKdd process
Kdd process
Rajesh Chandra
 
Metadata ppt
Metadata pptMetadata ppt
Metadata ppt
Shashikant Kumar
 
Data Analytics Life Cycle
Data Analytics Life CycleData Analytics Life Cycle
Data Analytics Life Cycle
Dr. C.V. Suresh Babu
 
OLAP operations
OLAP operationsOLAP operations
OLAP operations
kunj desai
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data mining
DataminingTools Inc
 
Data mining query language
Data mining query languageData mining query language
Data mining query language
GowriLatha1
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
SSaudia
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, data
Salah Amean
 
Data mining primitives
Data mining primitivesData mining primitives
Data mining primitives
lavanya marichamy
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An Overview
MachinePulse
 
What’s The Difference Between Structured, Semi-Structured And Unstructured Data?
What’s The Difference Between Structured, Semi-Structured And Unstructured Data?What’s The Difference Between Structured, Semi-Structured And Unstructured Data?
What’s The Difference Between Structured, Semi-Structured And Unstructured Data?
Bernard Marr
 
Artificial intelligence Pattern recognition system
Artificial intelligence Pattern recognition systemArtificial intelligence Pattern recognition system
Artificial intelligence Pattern recognition system
REHMAT ULLAH
 
OLAP in Data Warehouse
OLAP in Data WarehouseOLAP in Data Warehouse
OLAP in Data Warehouse
SOMASUNDARAM T
 
3 tier data warehouse
3 tier data warehouse3 tier data warehouse
3 tier data warehouse
J M
 
Data Visualization - A Brief Overview
Data Visualization - A Brief OverviewData Visualization - A Brief Overview
Data Visualization - A Brief Overview
Rotary Club of North Raleigh
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
King Julian
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSE
Zalpa Rathod
 
Introduction to Data Management
Introduction to Data ManagementIntroduction to Data Management
Introduction to Data Management
Amanda Whitmire
 

What's hot (20)

Data warehousing
Data warehousingData warehousing
Data warehousing
 
Kdd process
Kdd processKdd process
Kdd process
 
Metadata ppt
Metadata pptMetadata ppt
Metadata ppt
 
Data Analytics Life Cycle
Data Analytics Life CycleData Analytics Life Cycle
Data Analytics Life Cycle
 
OLAP operations
OLAP operationsOLAP operations
OLAP operations
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data mining
 
Data mining query language
Data mining query languageData mining query language
Data mining query language
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, data
 
Data mining primitives
Data mining primitivesData mining primitives
Data mining primitives
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An Overview
 
Data mart
Data martData mart
Data mart
 
What’s The Difference Between Structured, Semi-Structured And Unstructured Data?
What’s The Difference Between Structured, Semi-Structured And Unstructured Data?What’s The Difference Between Structured, Semi-Structured And Unstructured Data?
What’s The Difference Between Structured, Semi-Structured And Unstructured Data?
 
Artificial intelligence Pattern recognition system
Artificial intelligence Pattern recognition systemArtificial intelligence Pattern recognition system
Artificial intelligence Pattern recognition system
 
OLAP in Data Warehouse
OLAP in Data WarehouseOLAP in Data Warehouse
OLAP in Data Warehouse
 
3 tier data warehouse
3 tier data warehouse3 tier data warehouse
3 tier data warehouse
 
Data Visualization - A Brief Overview
Data Visualization - A Brief OverviewData Visualization - A Brief Overview
Data Visualization - A Brief Overview
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSE
 
Introduction to Data Management
Introduction to Data ManagementIntroduction to Data Management
Introduction to Data Management
 

Similar to Data integration

Data Integration and Transformation in Data mining
Data Integration and Transformation in Data miningData Integration and Transformation in Data mining
Data Integration and Transformation in Data mining
kavitha muneeshwaran
 
Data Mining: Data Preprocessing
Data Mining: Data PreprocessingData Mining: Data Preprocessing
Data Mining: Data Preprocessing
Lakshmi Sarvani Videla
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
T Kavitha
 
Chapter 3.pdf
Chapter 3.pdfChapter 3.pdf
Chapter 3.pdf
DrGnaneswariG
 
U - 2 Emerging.pptx
U - 2 Emerging.pptxU - 2 Emerging.pptx
U - 2 Emerging.pptx
MulukenTamrat2
 
UNIT 2: Part 2: Data Warehousing and Data Mining
UNIT 2: Part 2: Data Warehousing and Data MiningUNIT 2: Part 2: Data Warehousing and Data Mining
UNIT 2: Part 2: Data Warehousing and Data Mining
Nandakumar P
 
Characterizing and Processing of Big Data Using Data Mining Techniques
Characterizing and Processing of Big Data Using Data Mining TechniquesCharacterizing and Processing of Big Data Using Data Mining Techniques
Characterizing and Processing of Big Data Using Data Mining Techniques
IJTET Journal
 
data mining
data miningdata mining
data mining
manasa polu
 
Chapter 2 - Intro to Data Sciences[2].pptx
Chapter 2 - Intro to Data Sciences[2].pptxChapter 2 - Intro to Data Sciences[2].pptx
Chapter 2 - Intro to Data Sciences[2].pptx
JethroDignadice2
 
Introduction to data mining
Introduction to data miningIntroduction to data mining
Introduction to data mining
Ujjawal
 
Chapter 2 Cond (1).ppt
Chapter 2 Cond (1).pptChapter 2 Cond (1).ppt
Chapter 2 Cond (1).ppt
kannaradhas
 
Data Mining: Data processing
Data Mining: Data processingData Mining: Data processing
Data Mining: Data processing
DataminingTools Inc
 
Data MIning: Data processing
Data MIning: Data processingData MIning: Data processing
Data MIning: Data processing
Datamining Tools
 
20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt
SamPrem3
 
Unit i
Unit iUnit i
Cross Domain Data Fusion
Cross Domain Data FusionCross Domain Data Fusion
Cross Domain Data Fusion
IRJET Journal
 
1 UNIT-DSP.pptx
1 UNIT-DSP.pptx1 UNIT-DSP.pptx
1 UNIT-DSP.pptx
PothyeswariPothyes
 
20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt
PalaniKumarR2
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
Mahir Haque
 

Similar to Data integration (20)

Data Integration and Transformation in Data mining
Data Integration and Transformation in Data miningData Integration and Transformation in Data mining
Data Integration and Transformation in Data mining
 
Data Mining: Data Preprocessing
Data Mining: Data PreprocessingData Mining: Data Preprocessing
Data Mining: Data Preprocessing
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Chapter 3.pdf
Chapter 3.pdfChapter 3.pdf
Chapter 3.pdf
 
1234
12341234
1234
 
U - 2 Emerging.pptx
U - 2 Emerging.pptxU - 2 Emerging.pptx
U - 2 Emerging.pptx
 
UNIT 2: Part 2: Data Warehousing and Data Mining
UNIT 2: Part 2: Data Warehousing and Data MiningUNIT 2: Part 2: Data Warehousing and Data Mining
UNIT 2: Part 2: Data Warehousing and Data Mining
 
Characterizing and Processing of Big Data Using Data Mining Techniques
Characterizing and Processing of Big Data Using Data Mining TechniquesCharacterizing and Processing of Big Data Using Data Mining Techniques
Characterizing and Processing of Big Data Using Data Mining Techniques
 
data mining
data miningdata mining
data mining
 
Chapter 2 - Intro to Data Sciences[2].pptx
Chapter 2 - Intro to Data Sciences[2].pptxChapter 2 - Intro to Data Sciences[2].pptx
Chapter 2 - Intro to Data Sciences[2].pptx
 
Introduction to data mining
Introduction to data miningIntroduction to data mining
Introduction to data mining
 
Chapter 2 Cond (1).ppt
Chapter 2 Cond (1).pptChapter 2 Cond (1).ppt
Chapter 2 Cond (1).ppt
 
Data Mining: Data processing
Data Mining: Data processingData Mining: Data processing
Data Mining: Data processing
 
Data MIning: Data processing
Data MIning: Data processingData MIning: Data processing
Data MIning: Data processing
 
20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt
 
Unit i
Unit iUnit i
Unit i
 
Cross Domain Data Fusion
Cross Domain Data FusionCross Domain Data Fusion
Cross Domain Data Fusion
 
1 UNIT-DSP.pptx
1 UNIT-DSP.pptx1 UNIT-DSP.pptx
1 UNIT-DSP.pptx
 
20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 

More from kalavathisugan

Serial Communication.pptx
Serial Communication.pptxSerial Communication.pptx
Serial Communication.pptx
kalavathisugan
 
Timer and counting.pptx
Timer and counting.pptxTimer and counting.pptx
Timer and counting.pptx
kalavathisugan
 
SS-assemblers 1.pptx
SS-assemblers 1.pptxSS-assemblers 1.pptx
SS-assemblers 1.pptx
kalavathisugan
 
SS-CISC -1.pptx
SS-CISC -1.pptxSS-CISC -1.pptx
SS-CISC -1.pptx
kalavathisugan
 
SS-SIC (1).pptx
SS-SIC (1).pptxSS-SIC (1).pptx
SS-SIC (1).pptx
kalavathisugan
 
Chapter 3.4.pptx
Chapter 3.4.pptxChapter 3.4.pptx
Chapter 3.4.pptx
kalavathisugan
 
Cloud Computing 1.3.pptx
Cloud Computing 1.3.pptxCloud Computing 1.3.pptx
Cloud Computing 1.3.pptx
kalavathisugan
 
Cloud computing 2.pptx
Cloud computing 2.pptxCloud computing 2.pptx
Cloud computing 2.pptx
kalavathisugan
 
Data reduction
Data reductionData reduction
Data reduction
kalavathisugan
 
Data pre processing
Data pre processingData pre processing
Data pre processing
kalavathisugan
 
Games
GamesGames
Functions in c
Functions in cFunctions in c
Functions in c
kalavathisugan
 
Structures in c
Structures in cStructures in c
Structures in c
kalavathisugan
 

More from kalavathisugan (13)

Serial Communication.pptx
Serial Communication.pptxSerial Communication.pptx
Serial Communication.pptx
 
Timer and counting.pptx
Timer and counting.pptxTimer and counting.pptx
Timer and counting.pptx
 
SS-assemblers 1.pptx
SS-assemblers 1.pptxSS-assemblers 1.pptx
SS-assemblers 1.pptx
 
SS-CISC -1.pptx
SS-CISC -1.pptxSS-CISC -1.pptx
SS-CISC -1.pptx
 
SS-SIC (1).pptx
SS-SIC (1).pptxSS-SIC (1).pptx
SS-SIC (1).pptx
 
Chapter 3.4.pptx
Chapter 3.4.pptxChapter 3.4.pptx
Chapter 3.4.pptx
 
Cloud Computing 1.3.pptx
Cloud Computing 1.3.pptxCloud Computing 1.3.pptx
Cloud Computing 1.3.pptx
 
Cloud computing 2.pptx
Cloud computing 2.pptxCloud computing 2.pptx
Cloud computing 2.pptx
 
Data reduction
Data reductionData reduction
Data reduction
 
Data pre processing
Data pre processingData pre processing
Data pre processing
 
Games
GamesGames
Games
 
Functions in c
Functions in cFunctions in c
Functions in c
 
Structures in c
Structures in cStructures in c
Structures in c
 

Recently uploaded

Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
DhatriParmar
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
timhan337
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
Chapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdfChapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdf
Kartik Tiwari
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 

Recently uploaded (20)

Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
Chapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdfChapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdf
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 

Data integration

  • 1. Data Mining Data Integration and Transformation Dr.J.Kalavathi. M.Sc., P.hD., Assistant Professor, Department of Information Technology, V.V.Vanniaperumal College for Women, Virudhunagar.
  • 2. Data Integration * Data Integration involves combining data from several disparate source, which are stored using various technologies and provide a unified view of the data. * The later initiative is often called a data warehouse. * It merges the data from multiple data stores (data source). * It includes multiple databases, data cubes or flat files. * Metadata, correlation analysis, data conflict detection and resolution of semantic heterogeneity contribute towards smooth data integration.
  • 3. Data Integration Define : It combines data from multiple sources into a coherent data store, as in data warehousing. These sources may include multiple databases, data cubes, or flat files. The data integration systems are formally defined as triple<G,S,M> Where G: The global schema S:Heterogeneous source of schemas M: Mapping between the queries of source and global schema
  • 4. Advantages : 1. Independence. 2. Faster query processing. 3. Complex query processing. 4. Advanced data summarization & storage possible. 5. High volume data processing. Disadvantages : 1. Latency (since data needs to be loaded using ETL). 2. Costlier (data localization, infrastructure, security).
  • 6. Data Integration Approach: There are mainly 2 major approaches for data integration – one is “tight coupling approach” and another is “loose coupling approach”. Tight Coupling: • Here, a data warehouse is treated as an information retrieval component. • In this coupling, data is combined from different sources into a single physical location through the process of ETL – Extraction, Transformation and Loading. Loose Coupling: • Here, an interface is provided that takes the query from the user, transforms it in a way the source database can understand and then sends the query directly to the source databases to obtain the result. • And the data only remains in the actual source databases.
  • 7. There are a number of issues to consider during data integration. 1. Schema Integration. 2. Redundancy. 3. Detection and resolution of data value conflicts. Schema integration : The real-world entities from multiple source be matched is referred to as the entity identification problem. For example, Data analyst or the computer be sure that customer_id in one database and cust_number in another refer to the same entity. Databases and data warehouses that is a data about the data it’s a meta data.
  • 8. Redundancy : * It is another important issue. * An attribute may be redundant if it can be “derived” from another table, such as annual revenue. * Some redundancies can be detected by correlation analysis. For example, Two attributes, such analysis can measure how strongly one attribute implies the other based on the available data. The correlation between attributes attribute A and Bby
  • 9. Detection and resolution of data value conflicts : * A third important issue in data integration is the detection and resolution of data value conflicts. * The same real-world entity, attribute values from different sources. This may be due to differences in representation, scaling, or encoding. * An attribute in one system may be recorded at a lower level of abstraction than the “same” attribute in another. * For example, the total sales in one database may refer to one branch of All Electronics, an attribute of the same name in another database may refer to the total sales for All Electronics stores in a given region.
  • 10. Data Transformation * Data transformation the data are transformed or consolidated into forms in appropriate for mining. * Data transformation can involve 1. Smoothing. 2. Aggregation. 3. Generalization. 4. Normalization. 5. Attribute construction. Such Smoothing : Which works to remove the noise from data. techniques include binning, clustering and regression.
  • 11. Aggregation : * Where summary or aggregation operations are applied to the data. * For example, the daily sales data may be aggregated so as to compute monthly and annual total amounts. Generalization : * The data where low-level or “primitive” data are placed by higher-level concepts through the use of concept through the use of concept hierarchies. * For example, the attributes like street can be generalized to higher-level concept city or country when the numeric attributes to higher-level concept young, middle- aged and street.
  • 12. Normalization : Where the attribute data are scaled so as to fall within a specified range, such as -1.0 to 1.0 or 0.0 to 1.0 Attribute construction : Where new attribute are a constructed and added from the given set of attributes to help the mining process. There are many method for data normalization. * Min-Max normalization. * Z-Score normalization. * Normalization by decimal scaling.
  • 13. Min – Max Normalization : It performs a linear transformation on the original data. Suppose that min A and max A are the minimum and maximum values of attributes A. A Min – Max normalization maps a value v ofAto v’in the range. Z – Score Normalization : The Z – Score normalization a value of an attribute A are normalized based on the mean and standard deviation of A.Avalue v ofAis normalized to v’
  • 14. Normalization by Decimal Scaling : Normalization by decimal scaling normalizes by moving the decimal point of values of attribute A. The number of decimal points moved depends on the maximum absolute value of A. A value v of A is normalized to v’ by computing where j is the smallest integer such that Max(|V’|) < 1.