DATA WAREHOUSING AND MINING

BY
G. RAJESH CHANDRA
EVOLUTION OF DATABASE TECHNOLOGY


1960s (Primitive File Processing)
  • Data collection, database creation, IMS and network DBMS

1970s to early 1980s (DBMS)
  • Relational data model, relational DBMS implementation, SQL, OLTP, user interfaces, etc.

1980s to present (Advanced Databases)
  • RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
  • Application-oriented DBMS (spatial, scientific, engineering, etc.)

1990s (Advanced Data Analysis)
  • Data mining, data warehousing, multimedia databases, and Web databases

2000s
  • Stream data management and mining
  • Data mining and its applications

WHY MINE DATA? COMMERCIAL VIEWPOINT


• Lots of data is being collected and warehoused:
  – Web data, e-commerce
  – Purchases at department/grocery stores
  – Bank/credit card transactions
• Competitive pressure is strong:
  – Provide better, customized services for an edge (e.g., in Customer Relationship Management)
WHAT IS DATA MINING?

• Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information.
• Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data.
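
As a minimal sketch of the "different perspectives" idea in the definition above, the Python below totals the same hypothetical sales records once by store and once by product. The record layout, the `SALES` data, and the `summarize` helper are illustrative assumptions, not from the slides. Plain aggregation like this only covers the summarizing part of the definition; by itself it does not discover non-trivial, previously unknown patterns.

```python
from collections import defaultdict

# Hypothetical sales records: (store, product, amount). Illustrative data only.
SALES = [
    ("North", "milk", 3.0),
    ("North", "bread", 2.5),
    ("South", "milk", 3.0),
    ("South", "milk", 3.0),
    ("South", "bread", 2.5),
]

def summarize(records, key_index):
    """Total the amount column, grouped by the column at key_index (one 'perspective')."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)

by_store = summarize(SALES, 0)    # perspective 1: where the money was spent
by_product = summarize(SALES, 1)  # perspective 2: what the money was spent on
print(by_store)
print(by_product)
```

The same raw records yield two different summaries depending only on which column is chosen as the grouping key.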
WHY MINE DATA? SCIENTIFIC VIEWPOINT


• Data collected and stored at enormous speeds (GB/hour):
  – remote sensors on a satellite
  – telescopes scanning the skies
  – microarrays generating gene expression data
  – scientific simulations generating terabytes of data
• Traditional techniques infeasible for raw data
• Data mining may help scientists:
  – in classifying and segmenting data
  – in hypothesis formation
EXAMPLES: WHAT IS (NOT) DATA MINING?

What is not data mining?
  – Look up a phone number in a phone directory
  – Query a Web search engine for information about "Amazon"

What is data mining?
  – Certain names are more prevalent in certain US locations (O'Brien, O'Rourke, O'Reilly… in the Boston area)
  – Group together similar documents returned by a search engine according to their context (e.g., Amazon rainforest, Amazon.com)
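
To make the contrast concrete, here is a toy Python sketch under stated assumptions: the directory entries, search-result snippets, function names, and the 0.2 similarity threshold are all hypothetical. `lookup_phone` only retrieves a fact we already know how to ask for (not data mining), `prefix_share` surfaces a regional name pattern, and `group_documents` groups result snippets by word overlap (Jaccard similarity), a deliberately simple stand-in for how real systems cluster documents by context.

```python
# Hypothetical directory entries: (name, city, phone). Illustrative only.
DIRECTORY = [
    ("O'Brien", "Boston", "555-0101"),
    ("O'Reilly", "Boston", "555-0102"),
    ("O'Rourke", "Boston", "555-0103"),
    ("Garcia", "Boston", "555-0104"),
    ("Garcia", "Houston", "555-0201"),
    ("Nguyen", "Houston", "555-0202"),
]

def lookup_phone(name, city):
    """NOT data mining: retrieve a value for an exact, known query."""
    for n, c, phone in DIRECTORY:
        if n == name and c == city:
            return phone
    return None

def prefix_share(city, prefix="O'"):
    """Pattern discovery in miniature: share of names in a city with a given prefix."""
    names = [n for n, c, _ in DIRECTORY if c == city]
    if not names:
        return 0.0
    return sum(n.startswith(prefix) for n in names) / len(names)

def jaccard(a, b):
    """Word-set overlap between two documents (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def group_documents(docs, threshold=0.2):
    """Greedy grouping: each doc joins the first group whose seed doc is similar enough."""
    groups = []  # each group is a list of (doc, word_set); element 0 is the seed
    for doc in docs:
        words = set(doc.lower().split())
        for group in groups:
            if jaccard(words, group[0][1]) >= threshold:
                group.append((doc, words))
                break
        else:
            groups.append([(doc, words)])
    return [[d for d, _ in g] for g in groups]

# Hypothetical search results for the query "amazon".
RESULTS = [
    "amazon rainforest deforestation river basin",
    "amazon river basin wildlife rainforest",
    "amazon com online shopping orders",
    "amazon com prime shopping delivery",
]

print(lookup_phone("O'Reilly", "Boston"))
print(prefix_share("Boston"), prefix_share("Houston"))
print(group_documents(RESULTS))
```

With these inputs, the two rainforest snippets end up in one group and the two Amazon.com snippets in another, even though every snippet contains the word "amazon".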
DATA MINING IS ALSO CALLED...

• Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

REAL-WORLD EXAMPLE: GOLD MINING
DATA WAREHOUSE = COLLECTION OF DATABASES
WE HAVE TO USE DIFFERENT METHODS
RAW DATA = DATABASES + NOISY DATA
DATA SELECTION AND TRANSFORMATION
DATA CLEANING AND INTEGRATION
DATA MINING
PATTERN EVALUATION
KNOWLEDGE REPRESENTATION
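
The cleaning-and-integration step of the gold-mining analogy can be sketched in Python as follows, assuming two hypothetical branch databases of customer records. Keying records by `id`, letting later sources overwrite earlier ones, and treating a missing or out-of-range age as noise are illustrative assumptions; the slides do not specify any particular cleaning rules.

```python
# Two hypothetical source databases with overlapping customers (illustrative only).
BRANCH_A = [
    {"id": 1, "age": 34, "city": "Boston"},
    {"id": 2, "age": -5, "city": "Boston"},    # noise: impossible age
    {"id": 3, "age": None, "city": "Austin"},  # noise: missing value
]
BRANCH_B = [
    {"id": 1, "age": 34, "city": "Boston"},    # duplicate of a BRANCH_A record
    {"id": 4, "age": 51, "city": "Denver"},
]

def integrate(*sources):
    """Integration: merge sources into one collection keyed by id.
    Assumption: a later source overwrites an earlier record with the same id."""
    merged = {}
    for source in sources:
        for record in source:
            merged[record["id"]] = record
    return list(merged.values())

def clean(records, min_age=0, max_age=120):
    """Cleaning: drop records whose age is missing or outside an assumed valid range."""
    return [r for r in records
            if r["age"] is not None and min_age <= r["age"] <= max_age]

warehouse = clean(integrate(BRANCH_A, BRANCH_B))
print(warehouse)
```

The duplicate customer collapses into one record during integration, and the two noisy records are removed during cleaning.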
December 26, 2013

KNOWLEDGE DISCOVERY (KDD) PROCESS

• Data mining is the core of the knowledge discovery process.

[Figure: the KDD process] Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation
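
A minimal Python sketch of the later stages of the KDD flow (selection, data mining, pattern evaluation), assuming the warehouse already holds cleaned and integrated rows. The data, the region filter, the function names, and the 0.5 minimum-support threshold are hypothetical; counting frequent item pairs is just one simple mining task chosen for illustration, since the figure does not name a specific algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical warehouse rows: (region, basket). Assumed already cleaned and integrated.
WAREHOUSE = [
    ("Boston", {"bread", "milk"}),
    ("Boston", {"bread", "milk", "eggs"}),
    ("Boston", {"milk", "eggs"}),
    ("Boston", {"bread", "milk"}),
    ("Austin", {"chips", "salsa"}),
]

def select(rows, region):
    """Selection: keep only the task-relevant data (baskets from one region)."""
    return [basket for r, basket in rows if r == region]

def mine_pairs(baskets):
    """Data mining (toy stand-in): count how often each item pair is bought together."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def evaluate(counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs that appear in at least min_support of the baskets."""
    return {pair: c / n_baskets for pair, c in counts.items()
            if c / n_baskets >= min_support}

task_data = select(WAREHOUSE, "Boston")
patterns = evaluate(mine_pairs(task_data), len(task_data))
print(patterns)
```

Here only the (bread, milk) and (eggs, milk) pairs survive pattern evaluation, with supports 0.75 and 0.5; the (bread, eggs) pair is mined but discarded as uninteresting.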
