Data Mining
Group Members: Alisha Korpal, Nivia Jain, Sharuti Jain
What Is Data Mining?
Huge amounts of data
Electronic record of our decisions
Choices in the supermarket
Data vs. Information
Data: a collection of raw facts and figures.
Information: the processed form of data.
Extracting or “mining” knowledge from large amounts of data
Data-driven discovery and modeling of hidden patterns in large volumes of data
Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases.
Data Mining Process
Defining the Problem
Exploring and Validating Models
Deploying and Updating Models
Data Mining Process
Defining the Problem
What are you looking for? What types of relationships are you trying to find?
Do you want to make predictions from the data mining model, or just look for interesting patterns and associations?
Which attribute of the dataset do you want to try to predict?
How are the columns related? If there are multiple tables, how are the tables related?
Does the problem you are trying to solve reflect the policies or processes of the business?
You must understand the data in order to make appropriate decisions when you create the mining models. Exploration techniques include calculating the minimum and maximum values, calculating the mean and standard deviation, and looking at the distribution of the data.
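The exploration techniques named above can be sketched in a few lines of Python. This is only an illustration using the standard library: the `ages` column, its values, and the `histogram` helper are made-up assumptions, not part of any particular data mining tool.

```python
import statistics
from collections import Counter

# Hypothetical sample column: customer ages (made-up values for illustration).
ages = [23, 35, 31, 42, 35, 58, 27, 35, 44, 31]

# Basic summary statistics for the column.
summary = {
    "min": min(ages),
    "max": max(ages),
    "mean": statistics.mean(ages),
    "stdev": statistics.stdev(ages),  # sample standard deviation
}


def histogram(values, bin_width):
    """Count how many values fall into each bin of the given width.

    Hypothetical helper: each key is the lower edge of a bin.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


bin_width = 10
distribution = histogram(ages, bin_width)

print(summary)
# Print a simple text histogram to show the shape of the distribution.
for start, count in distribution.items():
    print(f"{start}-{start + bin_width - 1}: {'#' * count}")
```

A skewed histogram or an unexpected minimum or maximum at this stage is often the first hint that a column needs cleaning before any model is built on it.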
Exploring and Validating Models
Deploying and Updating Models
Evolution of Data Mining
Data Collection - 1960s
Data Access - 1980s
Data Warehousing & Decision Support - 1990s
Data Mining - Emerging Today
Evolutionary Step: Data Collection (1960s)
Business Question: "What was my total revenue in the last five years?"
Enabling Technologies: Computers, tapes, disks
Characteristics: Retrospective, static data delivery

Evolutionary Step: Data Access (1980s)
Business Question: "What were unit sales in New England last March?"
Enabling Technologies: Relational databases (RDBMS), Structured Query Language (SQL), ODBC
Characteristics: Retrospective, dynamic data delivery at record level

Evolutionary Step: Data Warehousing & Decision Support (1990s)
Business Question: "What were unit sales in New England last March? Drill down to Boston."
Enabling Technologies: On-line analytic processing (OLAP), multidimensional databases, data warehouses
Characteristics: Retrospective, dynamic data delivery at multiple levels

Evolutionary Step: Data Mining (Emerging Today)
Business Question: "What's likely to happen to Boston unit sales next month? Why?"
Enabling Technologies: Advanced algorithms, multiprocessor computers, massive databases
Characteristics: Prospective, proactive information delivery
Data Mining vs. OLAP
On-line Analytical Processing
Provides a very good view of what is happening, but cannot predict what will happen in the future or explain why it is happening.
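The contrast can be illustrated with a small Python sketch. The sales records, the `total_units` and `forecast_next` helpers, and the numbers are all hypothetical. The straight-line trend fit is a deliberately simple stand-in for a predictive model, not a specific data mining algorithm, and it does not address the "why" question.

```python
# Hypothetical sales records: (region, city, month_index, units).
sales = [
    ("New England", "Boston", 1, 100),
    ("New England", "Boston", 2, 110),
    ("New England", "Boston", 3, 120),
    ("New England", "Hartford", 1, 50),
    ("New England", "Hartford", 2, 55),
    ("New England", "Hartford", 3, 60),
]


def total_units(records, region, month, city=None):
    """OLAP-style retrospective query: sum units sold in a region and month,
    optionally drilling down to a single city."""
    return sum(
        units
        for r, c, m, units in records
        if r == region and m == month and (city is None or c == city)
    )


def forecast_next(records, city):
    """Prospective sketch: fit a least-squares line to a city's monthly
    units and extrapolate one month past the last observed month."""
    points = sorted((m, units) for _, c, m, units in records if c == city)
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sum(
        (x - mean_x) ** 2 for x, _ in points
    )
    intercept = mean_y - slope * mean_x
    next_month = points[-1][0] + 1
    return intercept + slope * next_month


# Retrospective: what happened in month 3, then drill down to Boston.
print(total_units(sales, "New England", 3))
print(total_units(sales, "New England", 3, city="Boston"))

# Prospective: what is likely to happen in Boston next month.
print(forecast_next(sales, "Boston"))
```

The first two calls only summarize data that already exists, which is the OLAP view. The last call estimates a value that is not in the data, which is the kind of question data mining is meant to answer.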
Scope of Data Mining
Automated prediction of trends and behaviors
Automated discovery of previously unknown patterns