Data mining Introduction


Data mining and data warehousing

Published in: Education, Technology, Business

  1. Evolution of data mining and warehousing, by year:
     • 1960s: Data collection and database creation
     • 1970s: Database management systems
     • Mid-1980s: Advanced database systems
     • Late 1980s: Data warehousing and data mining
     • 1990s: Web-based databases
     • 2006: Information systems
     • 2013: Big data retrieval
  2. Data mining refers to extracting or “mining” knowledge from large amounts of data. It is also known as:
     • Knowledge mining from data
     • Knowledge extraction
     • Data/pattern analysis
     • Data archaeology
     • Data dredging
     • Knowledge discovery from data (KDD)
     Knowledge discovery process, in order:
     • Data cleaning
     • Data integration
     • Data selection
     • Data transformation
     • Data mining
     • Pattern evaluation
     • Knowledge presentation
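The seven knowledge discovery steps listed above can be sketched as a small pipeline. This is only a toy illustration: the purchase records, the function names, and the 50% interest threshold are all assumptions made for the example, not part of the slides.

```python
from collections import Counter

# Hypothetical toy data: two sources of (customer, item) purchase records.
store_a = [("ann", "milk"), ("bob", "bread"), ("ann", None)]
store_b = [("cat", "milk"), ("bob", "milk"), ("ann", "bread")]

def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r)]

def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]

def select(records):
    """Data selection: keep only the field relevant to the task (the item)."""
    return [item for _customer, item in records]

def transform(items):
    """Data transformation: consolidate the items into counts per item."""
    return Counter(items)

def mine(counts, total):
    """Data mining: compute each item's share of all purchases."""
    return {item: n / total for item, n in counts.items()}

def evaluate(patterns, min_share):
    """Pattern evaluation: keep only patterns that meet the threshold."""
    return {item: s for item, s in patterns.items() if s >= min_share}

def present(patterns):
    """Knowledge presentation: render the surviving patterns as text."""
    return [f"{item}: {share:.0%} of purchases"
            for item, share in sorted(patterns.items())]

def kdd(sources, min_share=0.5):
    cleaned = [clean(s) for s in sources]
    merged = integrate(*cleaned)
    items = select(merged)
    counts = transform(items)
    patterns = mine(counts, total=len(items))
    return present(evaluate(patterns, min_share))

report = kdd([store_a, store_b])
print(report)
```

Each function maps to one step, so the order of calls in `kdd` mirrors the order of the list. Real systems often iterate between steps rather than running them once.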
  3. Kinds of data to be mined:
     • Relational databases
     • Data warehouses
     • Transactional databases
     • Object-relational databases
     • Temporal, sequence, and time-series databases
     • Spatial and spatiotemporal databases
     • Text and multimedia databases
     • Heterogeneous and legacy databases
     • Data streams and the World Wide Web
  4. 1. Relational database
  5. Each entity (object) in an object-relational database has:
     • A set of variables
     • A set of messages
     • A set of methods
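A rough way to picture the three parts of an object is a small class. The `Employee` entity, its fields, and the `receive` dispatcher are hypothetical names chosen for this sketch, and Python objects only approximate how an object-relational database models entities.

```python
class Employee:
    """Hypothetical entity used only for illustration."""

    def __init__(self, name, salary):
        # Variables: the attributes that describe the object.
        self.name = name
        self.salary = salary

    # Methods: the code that implements each message.
    def get_salary(self):
        return self.salary

    def raise_salary(self, percent):
        self.salary = self.salary * (100 + percent) / 100
        return self.salary

    def receive(self, message, *args):
        """Messages: a named request, dispatched to the matching method."""
        return getattr(self, message)(*args)

emp = Employee("Ann", 1000)
new_salary = emp.receive("raise_salary", 10)
print(new_salary)
```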
  6. A temporal database typically stores relational data that include time-related attributes. These attributes may involve several timestamps, each having different semantics.
  7. A sequence database stores sequences of ordered events, with or without a concrete notion of time. Examples include customer shopping sequences, Web click streams, and biological sequences.
  8. A time-series database stores sequences of values or events obtained over repeated measurements of time (e.g., hourly, daily, weekly). Examples include data collected from the stock exchange, inventory control, and the observation of natural phenomena (like temperature and wind).
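As a small illustration of time-series data, the sketch below groups timestamped temperature readings by day and averages them. The readings and the `daily_means` helper are made up for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical temperature readings taken every 12 hours: (timestamp, value).
readings = [
    (datetime(2024, 1, 1, 0), 10.0),
    (datetime(2024, 1, 1, 12), 14.0),
    (datetime(2024, 1, 2, 0), 8.0),
    (datetime(2024, 1, 2, 12), 12.0),
]

def daily_means(series):
    """Roll the readings up from sub-daily to daily granularity."""
    by_day = defaultdict(list)
    for timestamp, value in series:
        by_day[timestamp.date()].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}

result = daily_means(readings)
for day, value in result.items():
    print(day.isoformat(), value)
```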
  9. Data warehouse: A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.
  10. Examples of spatial databases:
     • Geographic (map) databases
     • Very large-scale integration (VLSI) or computer-aided design databases
     • Medical and satellite image databases
     Spatial data may be represented in raster format, as n-dimensional bit maps or pixel maps. For example, in a 2-D satellite image, each pixel registers the rainfall in a given area.
  11. Maps can be represented in vector format, where roads, bridges, buildings, and lakes are represented as unions or overlays of basic geometric constructs, such as points, lines, polygons, and the partitions and networks formed by these components.
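The raster and vector representations described above can be contrasted with a minimal sketch. The rainfall values, the coordinates, and both helper functions are assumptions for illustration; real spatial databases use dedicated geometry types and indexes.

```python
# Raster: a 2-D grid in which each pixel registers the rainfall
# (millimetres, made-up values) in a given area.
rainfall = [
    [0.0, 1.5, 2.0],
    [0.5, 3.0, 4.5],
]

def max_rainfall_pixel(grid):
    """Return (row, col, value) of the wettest pixel."""
    return max(
        ((r, c, v) for r, row in enumerate(grid) for c, v in enumerate(row)),
        key=lambda cell: cell[2],
    )

# Vector: basic geometric constructs stored as coordinates.
bridge = (2.0, 1.0)                                        # point
road = [(0.0, 0.0), (2.0, 1.0), (5.0, 1.0)]                # line
lake = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]    # polygon

def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

print(max_rainfall_pixel(rainfall))
print(polygon_area(lake))
```

The raster form answers "what is the value at this location?" directly, while the vector form keeps object boundaries explicit, which makes measurements such as area straightforward.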
  12. A spatial database that stores spatial objects that change with time is called a spatiotemporal database, e.g., a cricket ball.
  13. Text databases are databases that contain word descriptions for objects. Multimedia databases store image, audio, and video data.
  14. A heterogeneous database consists of a set of interconnected, autonomous component databases. A legacy database is a group of heterogeneous databases that combines different kinds of data systems, such as relational or object-oriented databases, hierarchical databases, network databases, spreadsheets, multimedia databases, or file systems.
  15. Data streams: data flow dynamically in and out of an observation platform (or window) as they are generated and analyzed.
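The idea of an observation window can be sketched with a fixed-size buffer: each new value enters the window, the oldest value leaves, and the analysis runs on whatever is currently inside. The `SlidingWindow` class, the window size of 3, and the input values are assumptions for this example.

```python
from collections import deque

class SlidingWindow:
    """Keeps only the most recent `size` values of a stream."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest value automatically.
        self.values = deque(maxlen=size)

    def push(self, value):
        """Add a value to the window and return the current window average."""
        self.values.append(value)
        return sum(self.values) / len(self.values)

window = SlidingWindow(size=3)
averages = [window.push(v) for v in [2, 4, 6, 8]]
print(averages)
```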
  16. Capturing user access patterns in such distributed information environments is called Web usage mining (or Weblog mining).
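One simple form of Web usage mining is counting which page-to-page moves visitors make. The sketch below assumes a simplified, hypothetical log format of `visitor page` per line, which is not a real server log format.

```python
from collections import Counter

# Hypothetical, simplified Web log: "<visitor> <page>", in time order.
log_lines = [
    "10.0.0.1 /home",
    "10.0.0.1 /products",
    "10.0.0.2 /home",
    "10.0.0.1 /cart",
    "10.0.0.2 /products",
]

def page_transitions(lines):
    """Count how often visitors move from one page to the next."""
    last_page = {}
    transitions = Counter()
    for line in lines:
        visitor, page = line.split()
        if visitor in last_page:
            transitions[(last_page[visitor], page)] += 1
        last_page[visitor] = page
    return transitions

transitions = page_transitions(log_lines)
print(transitions.most_common(1))
```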
  17. Data warehouse characteristics:
     • Time-variant: The warehouse data represent the flow of data through time. They can even contain projected data.
     • Non-volatile: Once data enter the data warehouse, they are never removed. The data warehouse is always growing.
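The time-variant and non-volatile properties can be pictured as an append-only store in which every row carries a date. The `AppendOnlyWarehouse` class, its method names, and the price data are hypothetical and only sketch the idea.

```python
from datetime import date

class AppendOnlyWarehouse:
    """Toy store: rows are only ever added, and each row carries a date."""

    def __init__(self):
        self._rows = []

    def load(self, key, value, as_of):
        # Non-volatile: new versions are appended; nothing is overwritten or deleted.
        self._rows.append((as_of, key, value))

    def value_as_of(self, key, when):
        # Time-variant: answer with the latest version known on or before `when`.
        candidates = [(d, v) for d, k, v in self._rows if k == key and d <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda c: c[0])[1]

    def __len__(self):
        return len(self._rows)

warehouse = AppendOnlyWarehouse()
warehouse.load("price", 10, date(2024, 1, 1))
warehouse.load("price", 12, date(2024, 2, 1))

jan_price = warehouse.value_as_of("price", date(2024, 1, 15))
mar_price = warehouse.value_as_of("price", date(2024, 3, 1))
print(jan_price, mar_price, len(warehouse))
```

Because old versions stay in place, a query for a past date still returns the value that was current then, and the store only grows.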
  18. Data warehouse vendors and products:
     • Teradata
     • Oracle
     • SAP BW - Business Information Warehouse (SAP NetWeaver BI)
     • Microsoft SQL Server
     • IBM DB2 (InfoSphere Warehouse)
     • SAS
  19. 1984: Metaphor Computer Systems, founded by David Liddle and Don Massaro, releases Data Interpretation System (DIS). DIS was a hardware/software package and GUI for business users to create a database management and analytic system.
  20. Survey (S): (2 minutes)
     The students are asked to browse the following titles and subtitles from the book.
     Textbook: Han and Kamber, “Data Mining”, Second Edition, Elsevier, 2008.
     • Pages 105-109
     • Pages 2-21
  21. 1. Data mining is also called
        a) Knowledge mining
        b) Knowledge mining from large data
        c) Data extraction
        d) None of the above
      2. In the knowledge discovery process, data mining comes after which step?
        a) Data transformation
        b) Data selection
        c) Neither (a) nor (b)
        d) Both
      3. Which data warehouse characteristic means that once data enter the data warehouse, they are never removed?
        a) Integrated
        b) Time-variant
        c) Subject-oriented
        d) Non-volatile
  22. 4. An object-relational database consists of entities with
        a) Variables
        b) Messages
        c) Methods
        d) All of the above
      5. Web usage mining is also called
        a) Web mining
        b) Weblog mining
        c) None of the above
        d) Both
  23. Review questions:
     • Specify the seven steps in the KDD process.
     • Explain the four categories of data warehousing.
     • Define heterogeneous and legacy databases.
     • What are the data mining task primitives?
     • What are the different kinds of data to be mined?
  24. A data warehouse maintains a copy of information from the source transaction systems. This architectural complexity provides the opportunity to:
     • Congregate data from multiple sources into a single database so a single query engine can be used to present data.
     • Mitigate the problem of database isolation level lock contention in transaction processing systems caused by attempts to run large, long-running analysis queries in transaction processing databases.
  25. Benefits of a data warehouse (continued):
     • Maintain data history, even if the source transaction systems do not.
     • Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.
     • Improve data quality by providing consistent codes and descriptions, and by flagging or even fixing bad data.
  26. Benefits of a data warehouse (continued):
     • Present the organization's information consistently.
     • Provide a single common data model for all data of interest, regardless of the data's source.
     • Restructure the data so that it makes sense to the business users.
     • Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.
     • Add value to operational business applications, notably customer relationship management (CRM) systems.
