Evolution of data mining and
Data collection and database
Database Management systems
Advanced database systems
Data warehousing and Data
Web Based Databases
Big data retrieval
Data Mining refers to
extracting or “mining”
knowledge from large
amounts of data
Knowledge mining from
Knowledge discovery from
Knowledge Discovery Process:
Object Relational Databases
Temporal, Sequence and Time series
Spatial and Spatio Temporal Databases
Text and Multimedia Databases
Heterogeneous and Legacy Databases
Data Streams and WWW
A set of variables
A set of messages
A set of methods
A temporal database typically stores
relational data that include time-related attributes.
These attributes may involve several
timestamps, each having different
A sequence database stores sequences
of ordered events, with or without a
concrete notion of time.
Examples include customer shopping
sequences, Web click streams, and
A time-series database stores sequences
of values or events obtained over
repeated measurements of time (e.g.,
hourly, daily, weekly).
Examples include data collected from the
stock exchange, inventory control, and the
observation of natural phenomena (like
temperature and wind).
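
As a minimal sketch of the time-series idea above, the following Python snippet stores repeated measurements as (timestamp, value) pairs and rolls hourly readings up to daily averages. The readings and the helper name daily_mean are made up for illustration; they do not come from any particular database product.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical hourly temperature readings: (timestamp, value) pairs.
readings = [
    (datetime(2024, 1, 1, 0), 10.0),
    (datetime(2024, 1, 1, 1), 12.0),
    (datetime(2024, 1, 2, 0), 8.0),
    (datetime(2024, 1, 2, 1), 6.0),
]


def daily_mean(series):
    """Group measurements by calendar day and average each group."""
    by_day = defaultdict(list)
    for timestamp, value in series:
        by_day[timestamp.date()].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}


daily = daily_mean(readings)
for day, average in daily.items():
    print(day, average)
```

The same grouping step works for weekly or monthly granularity by changing the key that each timestamp is mapped to.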
A data warehouse is a subject-oriented, integrated, time-variant, and
nonvolatile collection of data in support
very large-scale integration (VLSI) or
computer-aided design databases,
medical and satellite image databases.
Spatial data may be represented in
n-dimensional bit maps or pixel maps.
For example, a 2-D satellite
each pixel registers the rainfall in a given
Maps can be represented in vector
format, where roads, bridges, buildings, and
lakes are represented as unions or
overlays of basic geometric constructs,
such as points, lines, polygons, and the
partitions and networks formed by these components.
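
The two spatial representations described above (pixel maps and vector format) can be sketched roughly as follows. The rainfall grid, the road, and the lake are invented sample data, and the helper functions are illustrative only, not part of any spatial database API.

```python
# Raster form: a 2-D grid in which each cell (pixel) holds one measurement.
# Hypothetical rainfall values in millimetres.
rainfall = [
    [0.0, 1.5, 2.0],
    [0.5, 3.0, 4.0],
]


def wettest_pixel(grid):
    """Return the (row, column) of the largest value in the grid."""
    cells = ((r, c) for r in range(len(grid)) for c in range(len(grid[r])))
    return max(cells, key=lambda rc: grid[rc[0]][rc[1]])


# Vector form: objects built from points, lines, and polygons.
road = [(0.0, 0.0), (3.0, 4.0)]                          # a line through two points
lake = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]  # a polygon


def line_length(points):
    """Sum the straight-line distances between consecutive points."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        total += ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    return total


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    twice_area = 0.0
    for i in range(len(vertices)):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % len(vertices)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


print(wettest_pixel(rainfall), line_length(road), polygon_area(lake))
```

The raster form answers "what is the value at this location?" directly, while the vector form keeps object geometry so that lengths and areas can be computed.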
A spatial database that stores spatial
objects that change with time is called a
spatiotemporal database.
e.g., Cricket Ball
Text databases are databases that
contain word descriptions for objects.
Multimedia databases store image,
audio, and video data.
A heterogeneous database consists of a
set of interconnected, autonomous
A legacy database is a group of
heterogeneous databases that
combines different kinds of data systems,
such as relational or object-oriented databases,
network databases, spreadsheets,
multimedia databases, or file systems.
Stream data flow in and out of an observation
platform (or window) dynamically as they are
generated and analyzed.
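
A small sketch of the observation window idea: only the most recent items of the stream are kept, and the analysis (here a running average) is recomputed as each new item arrives. The class name SlidingWindow and the sample readings are assumptions made for this example.

```python
from collections import deque


class SlidingWindow:
    """Keep only the most recent `size` items of a stream (hypothetical helper)."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest item automatically when full.
        self.items = deque(maxlen=size)

    def push(self, value):
        self.items.append(value)

    def average(self):
        return sum(self.items) / len(self.items)


window = SlidingWindow(size=3)
averages = []
for reading in [10, 20, 30, 40, 50]:   # data flowing into the window
    window.push(reading)
    averages.append(window.average())  # analyzed as it arrives

print(averages)
```

Because old items leave the window as new ones enter, memory use stays bounded no matter how long the stream runs.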
Capturing user access patterns in such
distributed information environments is
called Web usage mining (or Weblog mining).
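
To make the idea of capturing user access patterns concrete, here is a rough sketch that counts page hits and reconstructs each user's click sequence from a log. The log format ("user page" per line) is a simplifying assumption for the example; real Web server logs carry more fields.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified log lines: "<user> <requested page>".
log_lines = [
    "u1 /home",
    "u2 /home",
    "u1 /products",
    "u1 /cart",
    "u2 /products",
]


def mine_usage(lines):
    """Return (hits per page, ordered page sequence per user)."""
    page_hits = Counter()
    sessions = defaultdict(list)
    for line in lines:
        user, page = line.split()
        page_hits[page] += 1
        sessions[user].append(page)
    return page_hits, dict(sessions)


page_hits, sessions = mine_usage(log_lines)
print(page_hits.most_common(1))
print(sessions["u1"])
```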
Time Variant
The Warehouse data represent the flow of data
through time. It can even contain projected
Once data enter the Data Warehouse, they
are never removed.
The Data Warehouse is always growing.
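
The time-variant and never-removed properties above can be illustrated with an append-only table: every load adds a dated row, nothing is deleted, and earlier states remain queryable. The class WarehouseTable and its methods are hypothetical names for this sketch, not the interface of any warehouse product.

```python
from datetime import date


class WarehouseTable:
    """Append-only table: rows are added with a load date and never deleted."""

    def __init__(self):
        self._rows = []  # list of (as_of, key, value)

    def load(self, key, value, as_of):
        # New facts are appended; existing rows are left untouched.
        self._rows.append((as_of, key, value))

    def value_as_of(self, key, when):
        """Most recent value for `key` loaded on or before `when`, else None."""
        candidates = [(d, v) for d, k, v in self._rows if k == key and d <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda pair: pair[0])[1]

    def __len__(self):
        return len(self._rows)


prices = WarehouseTable()
prices.load("widget", 10.0, date(2024, 1, 1))
prices.load("widget", 12.0, date(2024, 2, 1))

# The January price is still available after the February load.
print(prices.value_as_of("widget", date(2024, 1, 15)))
print(len(prices))
```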
BW - Business Information Warehouse
(SAP NetWeaver BI)
Microsoft SQL Server
IBM DB2 (InfoSphere Warehouse)
1984 — Metaphor Computer Systems,
founded by David Liddle and Don
Massaro, releases Data Interpretation System (DIS).
DIS was a hardware/software package
and GUI for business users to create a
database management and analytic
Survey (S): (2 Minutes)
The students are asked to browse the
following titles and subtitles from the
Han and Kamber, “Data Mining”, Second
Page no: 2-21
1. Data Mining is otherwise called
a) Knowledge mining
b) Knowledge mining from large data
c) Data extraction
d) None of the above
2. In the knowledge discovery process, data mining comes after which step?
a) Data transformation
b) Data selection
c) Neither (a) nor (b)
3. Which property of a data warehouse means that once data enter the
Data Warehouse, they are never removed?
c) Subject oriented
4. An object relational database consists of
d) All the above
5. Web usage mining is otherwise called
a) Web mining
b) Web log mining
c) None of the above
Specify the seven steps in the KDD process.
Explain four categories of data
Define heterogeneous and legacy
What are the data mining task
What are the different kinds of data to
A data warehouse maintains a copy of
information from the source transaction
systems. This architectural complexity
provides the opportunity to:
Congregate data from multiple sources
into a single database so a single query
engine can be used to present data.
Mitigate the problem of database
isolation level lock contention in
transaction processing systems caused by
attempts to run large, long-running
analysis queries in transaction processing
Maintain data history, even if the source
transaction systems do not.
Integrate data from multiple source
systems, enabling a central view across
the enterprise. This benefit is always
valuable, but particularly so when the
organization has grown by merger.
Improve data quality, by providing
consistent codes and descriptions,
flagging or even fixing bad data.
Present the organization's information
Provide a single common data model for
all data of interest regardless of the
Restructure the data so that it makes
sense to the business users.
Restructure the data so that it delivers
excellent query performance, even for
complex analytic queries, without
impacting the operational systems.
Add value to operational business
applications, notably customer
relationship management (CRM)
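
Two of the opportunities listed above, integrating data from multiple source systems and improving data quality through consistent codes and flagging of bad data, can be sketched as follows. The source rows, the COUNTRY_CODES mapping, and the integrate function are all invented for illustration.

```python
# Hypothetical source systems that encode the same field differently.
sales_rows = [
    {"customer": "C1", "country": "UK"},
    {"customer": "C2", "country": "??"},   # bad data in the source
]
crm_rows = [
    {"customer": "C3", "country": "United Kingdom"},
]

# Assumed mapping from source-specific spellings to one warehouse code.
COUNTRY_CODES = {"UK": "GB", "United Kingdom": "GB", "GB": "GB"}


def integrate(*sources):
    """Merge (name, rows) sources into one list with consistent country codes."""
    warehouse = []
    for name, rows in sources:
        for row in rows:
            code = COUNTRY_CODES.get(row["country"])
            warehouse.append({
                "source": name,
                "customer": row["customer"],
                "country": code,
                "bad_data": code is None,   # flag rather than silently drop
            })
    return warehouse


warehouse = integrate(("sales", sales_rows), ("crm", crm_rows))
flagged = [row["customer"] for row in warehouse if row["bad_data"]]
print(len(warehouse), flagged)
```

Keeping the source name on every row preserves lineage, so a flagged record can be traced back to the system that produced it.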