Data mining Introduction
8. Evolution of data mining and warehousing
Year: Milestone
1960s: Data collection and database creation
1970s: Database management systems
Mid-1980s: Advanced database systems
Late 1980s: Data warehousing and data mining
1990s: Web-based databases
2006: Information systems
2013: Big data retrieval
9. Data mining refers to
extracting or “mining”
knowledge from large
amounts of data.
Other names for data mining:
Knowledge mining from data
Knowledge extraction
Data/pattern analysis
Data archaeology
Data dredging
Knowledge discovery from data (KDD)
Knowledge Discovery Process:
Data cleaning
Data integration
Data selection
Data transformation
Data mining
Pattern evaluation
Knowledge presentation
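The seven steps above can be sketched as a chain of small functions. This is only an illustrative toy, not a real mining system: the sales records, the field names, and the function names are all assumptions made up for this example, and the "mining" step is a simple ranking of totals.

```python
from collections import Counter

# Hypothetical raw records from two sources; None marks a missing value.
store_a = [{"customer": "c1", "item": "milk", "qty": 2},
           {"customer": "c2", "item": "bread", "qty": None}]
store_b = [{"customer": "c3", "item": "milk", "qty": 1},
           {"customer": "c4", "item": "milk", "qty": 3}]

def clean(records):
    """Data cleaning: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Data transformation: aggregate quantity sold per item."""
    totals = Counter()
    for r in records:
        totals[r["item"]] += r["qty"]
    return totals

def mine(totals):
    """Data mining: here, simply rank items by total quantity."""
    return totals.most_common()

def evaluate(patterns, min_qty):
    """Pattern evaluation: keep only patterns above a threshold."""
    return [(item, qty) for item, qty in patterns if qty >= min_qty]

def present(patterns):
    """Knowledge presentation: format the result for the user."""
    return [f"{item}: {qty} units sold" for item, qty in patterns]

data = integrate(clean(store_a), clean(store_b))
data = select(data, ["item", "qty"])
report = present(evaluate(mine(transform(data)), min_qty=5))
print(report)
```

The bread record is dropped during cleaning because its quantity is missing, so only milk reaches the report.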
13. Kinds of data to be mined:
Relational databases
Data Warehouses
Transactional Databases
Object Relational Databases
Temporal, Sequence, and Time-Series
Databases
Spatial and Spatiotemporal Databases
Text and Multimedia Databases
Heterogeneous and Legacy Databases
Data Streams and WWW
18. In an object-relational database, each
entity (object) is associated with:
A set of variables
A set of messages
A set of methods
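The three parts listed above (variables, messages, methods) can be pictured with an ordinary class. This is a rough analogy in Python rather than the syntax of any database system; the `Employee` class, its fields, and the `receive` helper are hypothetical names chosen for illustration.

```python
class Employee:
    def __init__(self, name, salary):
        # Variables: the data that describe the object.
        self.name = name
        self.salary = salary

    # Methods: the code that implements each message.
    def get_salary(self):
        return self.salary

    def raise_salary(self, percent):
        self.salary = self.salary * (100 + percent) / 100
        return self.salary

    def receive(self, message, *args):
        """Messages: a request, by name, that is routed to the
        method implementing it."""
        method = getattr(self, message)
        return method(*args)

emp = Employee("Asha", 50000)
new_salary = emp.receive("raise_salary", 10)
print(new_salary)
```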
19.
A temporal database typically stores
relational data that include time-related
attributes.
These attributes may involve several
timestamps, each having different
semantics.
20.
A sequence database stores sequences
of ordered events, with or without a
concrete notion of time.
Examples include customer shopping
sequences, Web click streams, and
biological sequences.
21.
A time-series database stores sequences
of values or events obtained over
repeated measurements of time (e.g.,
hourly, daily, weekly).
Examples include data collected from the
stock exchange, inventory control, and the
observation of natural phenomena (like
temperature and wind).
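As a small illustration of time-series data, the sketch below rolls repeated temperature measurements up from a twice-daily to a daily granularity. The readings and the timestamp format are invented for this example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical temperature readings: (timestamp, degrees Celsius).
readings = [
    ("2024-01-01 00:00", 10.0),
    ("2024-01-01 12:00", 14.0),
    ("2024-01-02 00:00", 8.0),
    ("2024-01-02 12:00", 12.0),
]

# Group the measurements by calendar day.
by_day = defaultdict(list)
for stamp, temp in readings:
    day = datetime.strptime(stamp, "%Y-%m-%d %H:%M").date()
    by_day[day].append(temp)

# Summarize each day by its mean temperature.
daily_mean = {day.isoformat(): mean(temps) for day, temps in by_day.items()}
print(daily_mean)
```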
22.
Data Warehouse
A data warehouse is a subject-oriented,
integrated, time-variant, and nonvolatile
collection of data in support of
management’s decision-making process.
23.
Spatial databases contain spatial-related
information. Examples include
geographic (map) databases,
very large-scale integration (VLSI) or
computer-aided design databases, and
medical and satellite image databases.
Spatial data may be represented in
raster format, consisting of
n-dimensional bit maps or pixel maps.
For example, a 2-D satellite image may be
represented as raster data, where
each pixel registers the rainfall in a given
area.
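A raster can be pictured as a plain grid of numbers. The sketch below assumes a tiny 3 x 3 grid of made-up rainfall values, one per pixel, and finds the wettest pixel; the values and the function name are illustrative only.

```python
# Hypothetical 2-D raster: each cell is the rainfall (mm) for one pixel.
rainfall_mm = [
    [0, 2, 5],
    [1, 7, 9],
    [0, 3, 4],
]

def wettest_pixel(grid):
    """Return ((row, column), value) for the largest cell in the grid."""
    value, row, col = max(
        (value, r, c)
        for r, cells in enumerate(grid)
        for c, value in enumerate(cells)
    )
    return (row, col), value

location, amount = wettest_pixel(rainfall_mm)
print(location, amount)
```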
24. Maps can be represented in vector
format, where roads, bridges, buildings,
and lakes are represented as unions or
overlays of basic geometric constructs,
such as points, lines, polygons, and the
partitions and networks formed by these
components.
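In vector format, a map feature is just a list of coordinates. The sketch below assumes a road stored as a polyline and a lake stored as a polygon, with invented coordinates in arbitrary units, and computes the road length and the lake area (using the shoelace formula).

```python
import math

# Hypothetical map features described by (x, y) points.
road = [(0, 0), (3, 4), (6, 4)]           # a line made of two segments
lake = [(0, 0), (4, 0), (4, 3), (0, 3)]   # a polygon (rectangle)

def polyline_length(points):
    """Sum the lengths of consecutive segments."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def polygon_area(vertices):
    """Shoelace formula for a simple (non-self-intersecting) polygon."""
    total = 0
    closed = vertices[1:] + vertices[:1]  # pair each vertex with the next
    for (x1, y1), (x2, y2) in zip(vertices, closed):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

print(polyline_length(road), polygon_area(lake))
```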
25. A spatial database that stores spatial
objects that change with time is called a
spatiotemporal database,
e.g., one that records the position of a
moving cricket ball.
26.
Text databases are databases that
contain word descriptions for objects.
Multimedia databases store image,
audio, and video data.
27.
A heterogeneous database consists of a
set of interconnected, autonomous
component databases.
A legacy database is a group of
heterogeneous databases that
combines different kinds of data systems,
such as relational or object-oriented
databases, hierarchical databases,
network databases, spreadsheets,
multimedia databases, or file systems.
28.
Stream data are generated and analyzed
dynamically as they flow in and out of an
observation platform (or window).
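The idea of data flowing in and out of a window can be sketched with a fixed-size buffer: each new value pushes the oldest one out, and the analysis (here, an average) only ever sees what is currently in the window. The function name and the sample stream are assumptions for illustration.

```python
from collections import deque

def sliding_averages(stream, window_size):
    """Average of the values currently inside the window, after each arrival."""
    window = deque(maxlen=window_size)  # old values drop out automatically
    averages = []
    for value in stream:
        window.append(value)
        averages.append(sum(window) / len(window))
    return averages

averages = sliding_averages([4, 8, 6, 2], window_size=2)
print(averages)
```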
29.
Capturing user access patterns in
distributed information environments such
as the World Wide Web is called Web usage
mining (or Weblog mining).
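A very small example of Web usage mining is counting which pages are requested most often and what path each visitor followed. The log format below ("visitor page" on each line) is a simplified, hypothetical one, not a real server log format.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified Web log: "<visitor address> <requested page>".
log_lines = [
    "10.0.0.1 /home",
    "10.0.0.2 /home",
    "10.0.0.1 /products",
    "10.0.0.1 /home",
]

page_hits = Counter()
paths = defaultdict(list)
for line in log_lines:
    visitor, page = line.split()
    page_hits[page] += 1          # how often each page is accessed
    paths[visitor].append(page)   # the click sequence of each visitor

print(page_hits.most_common(1))
print(dict(paths))
```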
30. Time-Variant
The warehouse data represent the flow of data
through time. They can even contain projected
data.
Non-Volatile
Once data enter the data warehouse, they
are never removed.
The data warehouse is always growing.
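These two properties can be mimicked with an append-only table in which every row carries the period it describes. This is a toy model under stated assumptions: the `WarehouseTable` class, its methods, and the customer data are hypothetical, and a real warehouse would enforce this at the storage and load-process level.

```python
class WarehouseTable:
    """Append-only store: rows can be loaded and read, never changed or removed."""

    def __init__(self):
        self._rows = []

    def load(self, as_of, record):
        # Time-variant: every row is stamped with the period it describes.
        self._rows.append({"as_of": as_of, **record})

    def history(self, key, value):
        # Non-volatile: earlier rows are still there to be queried.
        return [row for row in self._rows if row[key] == value]

    def __len__(self):
        return len(self._rows)

table = WarehouseTable()
table.load("2024-01", {"customer": "c1", "city": "Pune"})
table.load("2024-02", {"customer": "c1", "city": "Delhi"})

cities = [row["city"] for row in table.history("customer", "c1")]
print(cities, len(table))
```

When the customer moves, a new row is added instead of overwriting the old one, so the table keeps growing and the earlier address remains available.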
34.
1984 — Metaphor Computer Systems,
founded by David Liddle and Don
Massaro, releases Data Interpretation
System (DIS).
DIS was a hardware/software package
and GUI for business users to create a
database management and analytic
system.
35. Survey (S): (2 Minutes)
The students are asked to browse the
following titles and subtitles from the
book.
Textbook:
Han and Kamber, “Data Mining”, Second
Edition, Elsevier, 2008.
Pages: 2-21 and 105-109
38. 1. Data mining is also called
a) Knowledge mining
b) Knowledge mining from large data
c) Data extraction
d) None of the above
2. In the knowledge discovery process, data mining comes after which step?
a) Data transformation
b) Data selection
c) Neither (a) nor (b)
d) Both
3. Which characteristic of a data warehouse means that once data enter
the data warehouse, they are never removed?
a) Integrated
b) Time-variant
c) Subject-oriented
d) Non-volatile
39. 4. An object-relational database consists of
entities with
a) Variables
b) Messages
c) Methods
d) All of the above
5. Web usage mining is also called
a) Web mining
b) Weblog mining
c) None of the above
d) Both
40.
Specify the seven steps in the KDD process.
Explain the four categories of data
warehousing.
Define heterogeneous and legacy
databases.
What are the data mining task
primitives?
What are the different kinds of data to
be mined?
41.
A data warehouse maintains a copy of
information from the source transaction
systems. This architectural complexity
provides the opportunity to:
Congregate data from multiple sources
into a single database so a single query
engine can be used to present data.
Mitigate the problem of database
isolation level lock contention in
transaction processing systems caused by
attempts to run large, long running,
analysis queries in transaction processing
databases.
42.
Maintain data history, even if the source
transaction systems do not.
Integrate data from multiple source
systems, enabling a central view across
the enterprise. This benefit is always
valuable, but particularly so when the
organization has grown by merger.
Improve data quality, by providing
consistent codes and descriptions,
flagging or even fixing bad data.
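The data-quality point above (consistent codes, flagging bad data) can be sketched as a small standardization step applied while loading. The code table, the field names, and the `needs_review` flag are all assumptions invented for this example.

```python
# Hypothetical mapping from source-system spellings to one warehouse code.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

def standardize(record):
    """Return a copy of the record with a consistent gender code.

    Values that cannot be mapped are set to None and flagged for review
    instead of being silently loaded.
    """
    raw = str(record.get("gender", "")).strip().lower()
    code = GENDER_CODES.get(raw)
    return {**record, "gender": code, "needs_review": code is None}

good = standardize({"id": 1, "gender": "Male"})
bad = standardize({"id": 2, "gender": "x"})
print(good)
print(bad)
```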
43.
Present the organization's information
consistently.
Provide a single common data model for
all data of interest regardless of the
data's source.
Restructure the data so that it makes
sense to the business users.
Restructure the data so that it delivers
excellent query performance, even for
complex analytic queries, without
impacting the operational systems.
Add value to operational business
applications, notably customer
relationship management (CRM)