Data integration is a data processing technique that combines data from different sources (such as data cubes, multiple databases, and flat files) and offers users a unified view of that data. In data mining, data integration has to deal with issues such as duplicate data, inconsistent data, and legacy systems. It can be carried out manually or through middleware and applications.
2. Key Points
Why Is Data Integration In Data Mining Important?
What are two major systems for data integration?
What are the Issues Of Data Integration in Data Mining?
3. Why Is Data Integration In Data Mining Important?
Data integration is a data processing technique that combines data from different sources (such as data cubes, multiple databases, and flat files) and offers users a unified view of that data.
4. In data mining, data integration has to deal with issues such as duplicate data, inconsistent data, and legacy systems. It can be carried out manually or through middleware and applications.
5. What are two major systems for data integration?
There are two major systems for data integration:
Tight Coupling
Loose Coupling
6. Tight Coupling
In this method, the data warehouse is treated as an information retrieval component. Data from the different sources is combined into a single physical location through a process known as ETL, which stands for Extraction, Transformation, and Loading.
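The three ETL steps can be sketched in a few lines of Python. This is only a minimal illustration, not a production pipeline: the flat file contents, the `customers` source table, and the `dim_customer` warehouse table are all hypothetical names invented for the example, and in-memory SQLite databases stand in for both the source system and the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source (normally read from disk).
FLAT_FILE = "id,name,city\n1,ann,pune\n2,raj,delhi\n"


def extract():
    """Extract: read raw rows from a flat file and a source database."""
    file_rows = list(csv.DictReader(io.StringIO(FLAT_FILE)))
    source_db = sqlite3.connect(":memory:")  # stand-in for an operational DB
    source_db.execute(
        "CREATE TABLE customers (cust_id INTEGER, full_name TEXT, town TEXT)"
    )
    source_db.execute("INSERT INTO customers VALUES (3, 'mei', 'mumbai')")
    db_rows = source_db.execute(
        "SELECT cust_id, full_name, town FROM customers"
    ).fetchall()
    source_db.close()
    return file_rows, db_rows


def transform(file_rows, db_rows):
    """Transform: map both sources onto one schema and clean the values."""
    unified = [
        (int(r["id"]), r["name"].title(), r["city"].title()) for r in file_rows
    ]
    unified += [(cid, name.title(), town.title()) for cid, name, town in db_rows]
    return unified


def load(rows):
    """Load: write the unified rows into the warehouse table."""
    warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    warehouse.execute(
        "CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    warehouse.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    warehouse.commit()
    return warehouse


warehouse = load(transform(*extract()))
result = warehouse.execute(
    "SELECT id, name, city FROM dim_customer ORDER BY id"
).fetchall()
print(result)
```

After loading, every query runs against the warehouse copy, so the sources are not touched at query time.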
7. Loose Coupling
In this method, an interface is offered that takes a query from the user, transforms it into a form the source databases can understand, sends it directly to those source databases, and returns the result. The data itself stays in the source databases.
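A rough sketch of such an interface, assuming two hypothetical source databases (`sales_db` and `support_db`) whose table and column names differ. The function `customers_in` and the `SOURCE_QUERIES` mapping are illustrative names, and in-memory SQLite databases stand in for the real sources.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory database that stands in for one source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Two hypothetical sources that store the same kind of data differently.
sales_db = make_source(
    "CREATE TABLE clients (client_name TEXT, town TEXT)",
    "INSERT INTO clients VALUES (?, ?)",
    [("Ann", "Pune"), ("Raj", "Delhi")],
)
support_db = make_source(
    "CREATE TABLE users (name TEXT, city TEXT)",
    "INSERT INTO users VALUES (?, ?)",
    [("Mei", "Pune")],
)

# The unified view is (name, city); each entry holds the translated query
# for one source, written in that source's own schema.
SOURCE_QUERIES = [
    (sales_db, "SELECT client_name, town FROM clients WHERE town = ?"),
    (support_db, "SELECT name, city FROM users WHERE city = ?"),
]


def customers_in(city):
    """Answer a query on the unified view by querying each source directly."""
    results = []
    for conn, sql in SOURCE_QUERIES:
        results.extend(conn.execute(sql, (city,)).fetchall())
    return sorted(results)


print(customers_in("Pune"))
```

No data is copied ahead of time here; each call reaches into the sources, which is the main contrast with the warehouse-based approach.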
8. What are the Issues Of Data Integration in Data Mining?
There are three main issues to consider during data integration in data mining: Schema Integration, Redundancy, and Detection and resolution of data value conflicts.
10. 1. Schema Integration - Metadata from multiple sources has to be integrated. Matching up equivalent real-world entities across those sources is known as the entity identification problem.
2. Redundancy - An attribute may be redundant if it can be derived from another attribute or set of attributes. Inconsistent attribute naming across sources can also cause redundancies in the resulting data set.
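One common way to spot redundant numeric attributes is correlation analysis, which the slides do not mention by name; the sketch below is offered only as an illustration of the idea. The attribute names, the sample values, and the 0.95 threshold are assumptions made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def redundant_pairs(table, threshold=0.95):
    """Return attribute pairs whose absolute correlation meets the threshold."""
    names = sorted(table)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(table[a], table[b])) >= threshold:
                pairs.append((a, b))
    return pairs


# Hypothetical integrated table: annual_salary is derivable from monthly_salary.
table = {
    "monthly_salary": [3000, 4200, 5100, 2800],
    "annual_salary": [36000, 50400, 61200, 33600],
    "age": [41, 29, 35, 52],
}

print(redundant_pairs(table))
```

A flagged pair is only a candidate: a high correlation suggests that one attribute might be dropped, but the decision still needs someone who knows what the attributes mean.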
11. 3. Detection and resolution of data value conflicts - This is the third critical issue in data integration. Attribute values collected from different sources may differ for the same real-world entity, for example because of differences in representation, scaling, or encoding (a weight stored in kilograms in one system and in pounds in another). An attribute in one system may also be recorded at a lower level of abstraction than the "same" attribute in another.
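As a small illustration of a scaling conflict, the sketch below assumes two hypothetical sources that record product weight in kilograms and in pounds. The record layout, the `sku` key, and the 0.01 kg tolerance are invented for the example; the values are converted to one unit before being compared.

```python
LB_TO_KG = 0.45359237  # exact definition of the international pound


def to_kg(record):
    """Return the record's weight in kilograms, whichever unit it uses."""
    if "weight_kg" in record:
        return record["weight_kg"]
    return record["weight_lb"] * LB_TO_KG


def find_conflicts(source_a, source_b, tolerance_kg=0.01):
    """List SKUs whose weights still disagree after unit conversion."""
    weights_b = {r["sku"]: to_kg(r) for r in source_b}
    conflicts = []
    for record in source_a:
        sku = record["sku"]
        if sku in weights_b and abs(to_kg(record) - weights_b[sku]) > tolerance_kg:
            conflicts.append(sku)
    return conflicts


# Hypothetical records describing the same two products in two systems.
source_a = [
    {"sku": "A1", "weight_kg": 2.0},
    {"sku": "B2", "weight_kg": 2.0},
]
source_b = [
    {"sku": "A1", "weight_lb": 4.41},  # about 2.0 kg, so only a unit difference
    {"sku": "B2", "weight_lb": 5.0},   # about 2.27 kg, a genuine disagreement
]

print(find_conflicts(source_a, source_b))
```

Converting first separates differences that are purely about representation from values that really disagree and need to be resolved.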