Knowledge discovery: the process of building and implementing a data mining solution
Data Mining Overview
Knowledge Discovery in Databases (KDD)
No single data mining approach
Each tool can be viewed logically as a client application
Can reside on a separate machine or in a separate process and access the data warehouse
RDBMS or proprietary OLAP engines may embed data mining capabilities deeply within the engine to improve efficiency and add extensions
Requires a good data warehouse foundation
Data Mining Overview (cont.)
Common algorithmic approaches
association, affinity grouping
prediction, sequence-based analysis
Steps are: data selection, data transformation, data mining, and result interpretation.
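The four steps can be sketched as a small Python pipeline, using affinity grouping (counting which items are bought together) as the mining step. This is a minimal illustration, not a production method: the transaction data, the function names, and the 50% support threshold are all hypothetical, and "support" (the share of baskets containing a pair) is just one common way to measure affinity.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw transaction records: (transaction_id, item), including noise.
RAW = [
    (1, "bread"), (1, "Milk"), (1, "butter"),
    (2, "bread"), (2, "milk"),
    (3, "milk"), (3, "butter"), (3, ""),
    (4, "bread"), (4, "milk"), (4, "butter"),
]

def select(records):
    """Data selection: keep only records that name an item."""
    return [(tid, item) for tid, item in records if item]

def transform(records):
    """Data transformation: normalize case and group items into baskets."""
    baskets = {}
    for tid, item in records:
        baskets.setdefault(tid, set()).add(item.lower())
    return list(baskets.values())

def mine(baskets, min_support=0.5):
    """Data mining (affinity grouping): item pairs found together in at least
    min_support of all baskets, mapped to their support."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def interpret(patterns):
    """Result interpretation: readable lines, strongest affinity first."""
    ranked = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{a} + {b}: support {s:.0%}" for (a, b), s in ranked]

for line in interpret(mine(transform(select(RAW)))):
    print(line)
```

Keeping each step as its own function mirrors the outline: the selection and transformation steps can be swapped out without touching the mining logic.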
Strategic Benefit of Data Mining
Forecasting in Financial Markets
Why Data Mining Now?
Unprecedented affordability of processing power (MIPS) and storage (MB)
Enormous amounts of data can be processed
Popularity of data warehouses, data marts
Relatively clean data available
Data Mining compared to Traditional Analysis
Did sales of product X increase in Nov.?
Do sales of product X decrease when there is a promotion on product Y?
Data mining is result oriented
What are the factors that determine sales of product X?
Data Mining compared to Traditional Analysis (cont.)
Traditional analysis is incremental
Does billing level affect turnover?
Does location affect turnover?
Analyst builds model step by step
Data Mining is result oriented
Identify the factors and predict turnover
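The contrast can be made concrete: instead of testing "does billing level affect turnover?" and then "does location affect turnover?" one at a time, a data mining tool scores every candidate factor at once. The Python sketch below assumes information gain (entropy reduction) as the scoring measure, which is one common choice, not necessarily the one any particular tool uses; the customer records and function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

# Hypothetical customer records: candidate factors plus whether the customer left.
CUSTOMERS = [
    {"billing": "high", "location": "urban", "left": True},
    {"billing": "high", "location": "rural", "left": True},
    {"billing": "high", "location": "urban", "left": True},
    {"billing": "low", "location": "rural", "left": False},
    {"billing": "low", "location": "urban", "left": False},
    {"billing": "low", "location": "rural", "left": True},
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of outcomes."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, factor, target="left"):
    """How much splitting on one factor reduces uncertainty about the target."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[factor]].append(row[target])
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

def rank_factors(rows, factors):
    """Score every candidate factor in one pass, most informative first."""
    return sorted(factors, key=lambda f: information_gain(rows, f), reverse=True)

print(rank_factors(CUSTOMERS, ["billing", "location"]))
```

On this toy data billing level separates leavers from stayers, while location carries essentially no information, so billing is ranked first.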
Steps in Data Mining
Data Manipulation - can be 70-80% of the data mining effort
Defining a study
Supervised: articulating the goal, choosing the dependent (output) variable, and specifying the data fields
Unsupervised: grouping similar types of data or identifying exceptions
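Unsupervised study can be illustrated with plain k-means clustering, which groups records without any output variable, plus a simple rule that flags records far from every group as exceptions. This is a sketch under stated assumptions: k-means is one of many clustering techniques, the naive "first k points" initialization and the distance threshold are arbitrary choices, and the data points are hypothetical.

```python
from math import dist

def kmeans(points, k, iterations=10):
    """Plain k-means: group points around k centroids (no labels needed)."""
    centroids = list(points[:k])  # naive deterministic init: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

def exceptions(points, centroids, threshold):
    """Flag points whose distance to the nearest centroid exceeds the threshold."""
    return [p for p in points if min(dist(p, c) for c in centroids) > threshold]

# Hypothetical customers described by (spend, visits) on a small scale.
POINTS = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(POINTS, 2)
print(clusters)
print(exceptions([(5, 5), (1, 1)], centroids, threshold=3.0))
```

The two natural groups are recovered, and the new point (5, 5), which sits between both groups, is reported as an exception.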
Steps in Data Mining (cont.)
Reading the data and building the model
The model summarizes large amounts of data by accumulating indicators (frequencies, weights, conjunctions, differentiation)
Understanding the model
Know the particular model
Choose the best outcome based on historical data
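A model that summarizes data by accumulating frequencies, and then chooses the outcome best supported by history, can be sketched in a few lines of Python. This is a deliberately simple frequency-voting model that only illustrates the "frequencies" indicator; it ignores weights, conjunctions, and differentiation. The records and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical historical records: (region, product_tier, outcome).
HISTORY = [
    ("north", "basic", "renew"), ("north", "basic", "renew"),
    ("north", "premium", "renew"), ("south", "basic", "cancel"),
    ("south", "basic", "cancel"), ("south", "premium", "renew"),
]

def build_model(records):
    """Read the data once, accumulating outcome frequencies per attribute value."""
    model = defaultdict(Counter)
    for *attributes, outcome in records:
        for position, value in enumerate(attributes):
            model[(position, value)][outcome] += 1
    return model

def predict(model, attributes):
    """Choose the outcome seen most often in history for these attribute values."""
    votes = Counter()
    for position, value in enumerate(attributes):
        votes.update(model.get((position, value), Counter()))
    return votes.most_common(1)[0][0] if votes else None

model = build_model(HISTORY)
print(predict(model, ("south", "basic")))
print(predict(model, ("north", "premium")))
```

Because the model is just a table of counts, it can be inspected directly, which is part of "understanding the model".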
Genetic algorithms: an artificial intelligence system that mimics the evolutionary, survival-of-the-fittest processes to generate increasingly better solutions to a problem.
Genetic algorithms produce several generations of solutions, choosing the best of the current set for each new generation.
Generating human faces based on a few known features.
Generating solutions to routing problems.
Generating stock portfolios.
Evolution in Genetic Algorithms
SELECTION - or survival of the fittest. The key is to give preference to better outcomes.
CROSSOVER - combining portions of good outcomes in the hope of creating an even better outcome.
MUTATION - randomly changing parts of an outcome and evaluating the success (or failure) of the result.
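Selection, crossover, and mutation fit together as in the following minimal genetic algorithm. It assumes a toy problem ("OneMax": maximize the number of 1 bits in a bit string) in place of a real routing or portfolio problem; tournament selection, single-point crossover, and keeping the best genome each generation (elitism) are common design choices, not the only ones. All names and parameter values are hypothetical.

```python
import random

GENOME_LENGTH = 20
POPULATION_SIZE = 30
MUTATION_RATE = 0.02

def fitness(genome):
    """Toy objective (OneMax): count of 1 bits; higher is better."""
    return sum(genome)

def select(population, rng):
    """Selection: a tournament of two, the fitter genome survives."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(parent1, parent2, rng):
    """Crossover: splice the front of one parent onto the back of the other."""
    point = rng.randrange(1, GENOME_LENGTH)
    return parent1[:point] + parent2[point:]

def mutate(genome, rng):
    """Mutation: flip each bit with a small probability."""
    return [1 - bit if rng.random() < MUTATION_RATE else bit for bit in genome]

def evolve(generations=60, seed=0):
    """Run the GA; return the best genome and the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(GENOME_LENGTH)]
                  for _ in range(POPULATION_SIZE)]
    history = []
    for _ in range(generations):
        best = max(population, key=fitness)
        history.append(fitness(best))
        # Elitism: carry the best genome over unchanged, breed the rest.
        population = [best] + [
            mutate(crossover(select(population, rng), select(population, rng), rng), rng)
            for _ in range(POPULATION_SIZE - 1)
        ]
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = evolve()
print(history[0], "->", history[-1])
```

Because the best genome always survives, the best fitness never decreases from one generation to the next.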
Neural Networks: Mathematical Model of the Way a Brain Functions
Machine learning approach by which historical data can be examined for pattern recognition
A neural network simulates the human ability to classify things based on the experience of seeing many examples.
Pros: works well with numerical data
Cons: opaque (results are hard to explain); building one is as much art as science
Distinguishing different chemical compounds
Detecting anomalies in human tissue that may signify disease
Detecting fraud in credit card use
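Learning to classify from examples can be shown with a single-neuron perceptron, the simplest form of neural network. This sketch assumes hypothetical yes/no transaction features (large amount, foreign location) and toy labels where a transaction is flagged only when both are present; real fraud detection uses multi-layer networks and far richer data.

```python
def predict(weights, bias, features):
    """Fire (1) when the weighted sum of inputs exceeds zero, else 0."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """Learn weights from labelled examples (label 1 or 0) with the perceptron rule."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)
            # Weights change only when the prediction is wrong (error is -1 or +1).
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical examples: ((large_amount, foreign_location), flagged).
EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(EXAMPLES)
print([predict(weights, bias, f) for f, _ in EXAMPLES])
```

The learned weights are just numbers; nothing in them states the rule "both features must be present", which illustrates why neural networks are described as opaque.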
Intelligent agents: software entities that carry out some set of operations on behalf of a user or program with some degree of autonomy, and that employ some knowledge or representation of the user's goals and desires.
Some common characteristics
ability to communicate, cooperate and coordinate with other agents
ability to act autonomously to achieve the collective goal of the system
Intelligent Agents (cont.)
automate repetitive tasks
finding and filtering information
summarizing complex data
Capability to learn and make recommendations
Black box approach hides complexity and allows for design of scalable system
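A finding-and-filtering agent that learns from its user can be sketched as a small class. Everything here is hypothetical: the class name, the keyword-weight representation of the user's preferences, and the fixed feedback step are illustrative assumptions, one simple way to give an agent "some representation of the user's goals". The caller sees only filter_items and feedback, in keeping with the black-box idea.

```python
class FilteringAgent:
    """Hypothetical information-filtering agent: scores items against a user's
    keyword preferences and adjusts those preferences from feedback."""

    def __init__(self, preferences):
        self.preferences = dict(preferences)  # keyword -> weight

    def score(self, text):
        """Sum the weights of the preferred keywords that appear in the text."""
        return sum(self.preferences.get(word, 0.0) for word in text.lower().split())

    def filter_items(self, items, threshold=1.0):
        """Find and filter: return items worth showing, highest score first."""
        kept = [item for item in items if self.score(item) >= threshold]
        return sorted(kept, key=self.score, reverse=True)

    def feedback(self, item, liked, step=0.5):
        """Learn: raise or lower keyword weights based on the user's reaction."""
        delta = step if liked else -step
        for word in set(item.lower().split()):
            self.preferences[word] = self.preferences.get(word, 0.0) + delta

agent = FilteringAgent({"mining": 1.0, "warehouse": 1.0})
items = ["Data mining trends", "Warehouse mining tools", "Sports results"]
print(agent.filter_items(items))
agent.feedback("Sports results", liked=True)
print(agent.filter_items(items))
```

After one round of positive feedback the agent starts recommending an item it previously filtered out.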
Comparison
AI System | Expert Systems | Neural Networks | Genetic Algorithms | Intelligent Agents
Problem Type | Diagnostic or prescriptive | Identification, classification, prediction | Optimal solution | Specific and repetitive tasks
Based On | Strategies of experts | The human brain | Biological evolution | One or more AI techniques
Starting Information | Expert's know-how | Acceptable patterns | Set of possible solutions | Your preferences