Artificial Neural Networks for Data Mining

Dr. Kamal Gulati

Data Mining: Classification and
Prediction
• 1. Classification with decision trees
• 2. Artificial Neural Networks

1. CLASSIFICATION WITH DECISION
TREES
• Classification is the process of learning a model
that describes different classes of data. The
classes are predetermined.
• Example: In a banking application, customers
who apply for a credit card may be classified as a
“good risk”, a “fair risk” or a “poor risk”. Because
the classes are known in advance, this type of
activity is also called supervised learning.
• Once the model is built, then it can be used to
classify new data.

• The first step, of learning the model, is accomplished by using a
training set of data that has already been classified. Each record in the
training data contains an attribute, called the class label, that indicates
which class the record belongs to.
• The model that is produced is usually in the form of a decision tree or a
set of rules.
• Some of the important issues with regard to the model and the
algorithm that produces the model include:
– the model’s ability to predict the correct class of the new data,
– the computational cost associated with the algorithm
– the scalability of the algorithm.
• Let us examine the approach where the model is in the form of a decision
tree.
• A decision tree is simply a graphical representation of the description
of each class or in other words, a representation of the classification
rules.

• Example : Suppose that we have a database of
customers on the AllElectronics mailing list. The
database describes attributes of the customers, such as
their name, age, income, occupation, and credit rating.
The customers can be classified as to whether or not
they have purchased a computer at AllElectronics.
• Suppose that new customers are added to the
database and that you would like to notify these
customers of an upcoming computer sale. To send out
promotional literature to every new customer in the
database can be quite costly. A more cost-efficient
method would be to target only those new customers
who are likely to purchase a new computer. A
classification model can be constructed and used for
this purpose.
• Figure 2 shows a decision tree for the concept
buys_computer, indicating whether or not a customer
at AllElectronics is likely to purchase a computer.

[Figure 2: A decision tree for the concept buys_computer, indicating whether or not a customer
at AllElectronics is likely to purchase a computer. Each internal node represents a test on an
attribute. Each leaf node represents a class.]

Training data tuples from the AllElectronics customer
database
age      income   student   credit_rating   class
<=30     high     no        fair            No
<=30     high     no        excellent       No
31…40    high     no        fair            Yes
>40      medium   no        fair            Yes
>40      low      yes       fair            Yes
>40      low      yes       excellent       No
31…40    low      yes       excellent       Yes
<=30     medium   no        fair            No
<=30     low      yes       fair            Yes
>40      medium   yes       fair            Yes
<=30     medium   yes       excellent       Yes
31…40    medium   no        excellent       Yes
31…40    high     yes       fair            Yes
>40      medium   no        excellent       No

[Figure: the training tuples partitioned by the root test "age?"]
age <= 30:
income   student   credit_rating   class
high     no        fair            no
high     no        excellent       no
medium   no        fair            no
low      yes       fair            yes
medium   yes       excellent       yes
age 31…40:
income   student   credit_rating   class
high     no        fair            yes
low      yes       excellent       yes
medium   no        excellent       yes
high     yes       fair            yes
age > 40:
income   student   credit_rating   class
medium   no        fair            yes
low      yes       fair            yes
low      yes       excellent       no
medium   yes       fair            yes
medium   no        excellent       no

Extracting Classification Rules from Trees
• Represent the knowledge in the form of IF-THEN rules
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction
• The leaf node holds the class prediction
• Rules are easier for humans to understand.
Example
IF age = “<=30” AND student = “no” THEN buys_computer = “no”
IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
IF age = “31…40” THEN buys_computer = “yes”
IF age = “>40” AND credit_rating = “excellent” THEN buys_computer =
“no”
IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “yes”
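The five rules above can be sketched as a small Python function and checked against the 14 training tuples. This is an illustrative sketch only: the function name `buys_computer`, the constant `TRAINING_TUPLES`, and the lowercase class labels are assumptions made for the example, not part of the original slides.

```python
# Each tuple: (age, income, student, credit_rating, class), copied from the
# AllElectronics training table. Class labels are lowercased (an assumption).
TRAINING_TUPLES = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def buys_computer(age, student, credit_rating):
    """Apply the IF-THEN rules: one branch per root-to-leaf path."""
    if age == "<=30":
        # Young customers are split on the student attribute.
        return "yes" if student == "yes" else "no"
    if age == "31…40":
        # This branch is a leaf: every such customer is predicted "yes".
        return "yes"
    if age == ">40":
        # Older customers are split on the credit rating.
        return "no" if credit_rating == "excellent" else "yes"
    raise ValueError(f"unknown age group: {age!r}")


if __name__ == "__main__":
    correct = sum(
        buys_computer(age, student, credit) == label
        for age, _income, student, credit, label in TRAINING_TUPLES
    )
    print(f"{correct} of {len(TRAINING_TUPLES)} training tuples classified correctly")
```

Note that income never appears in the rules, so the function does not take it as an argument.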

1. NEURAL NETWORK REPRESENTATION
• An ANN is composed of processing elements called neurons or perceptrons,
organized in different ways to form the network’s structure.
Processing Elements
• An ANN consists of perceptrons. Each of the perceptrons receives
inputs, processes inputs and delivers a single output.
The input can be raw input
data or the output of
other perceptrons. The
output can be the final
result (e.g. 1 means yes, 0
means no) or it can be
inputs to other
perceptrons.

The network
• Each ANN is composed of a collection of perceptrons grouped
in layers. A typical structure is shown in Fig.2.
Note the three layers:
input, intermediate
(called the hidden layer)
and output.
Several hidden layers can
be placed between the
input and output layers.
Figure 2

Appropriate Problems for Neural Network
• ANN learning is well-suited to problems in which the training data
corresponds to noisy, complex sensor data. It is also applicable to
problems for which more symbolic representations are used.
• The backpropagation (BP) algorithm is the most commonly used ANN
learning technique. It is appropriate for problems with the following
characteristics:
– Input is high-dimensional discrete or real-valued (e.g. raw sensor input)
– Output is discrete or real valued
– Output is a vector of values
– Possibly noisy data
– Long training times accepted
– Fast evaluation of the learned function required.
– Not important for humans to understand the weights
• Examples:
– Speech phoneme recognition
– Image classification
– Financial prediction

NEURAL NETWORK APPLICATION
DEVELOPMENT
The development process for an ANN application has eight steps.
• Step 1: (Data collection) The data to be used for the training and
testing of ANN are collected. Important considerations
are that the particular problem is amenable to ANN solution and that
adequate data exist and can be obtained.
• Step 2: (Training and testing data separation) Training data must be
identified, and a plan must be made for testing the performance of the
ANN. The available data are divided into training and testing data sets.
For a moderately sized data set, 80% of the data are randomly selected
for training, 10% for testing, and 10% for secondary testing.
• Step 3: (Network architecture) A network architecture and a learning
method are selected. Important considerations are the exact number
of nodes and the number of layers.
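The 80% / 10% / 10% random split described in Step 2 could look like the following sketch. The function name `split_dataset`, its parameters, and the fixed seed are hypothetical choices for illustration; only the proportions come from the slide.

```python
import random


def split_dataset(records, train=0.8, test=0.1, seed=42):
    """Randomly split records into training, testing and secondary-testing sets.

    The remaining fraction (1 - train - test) becomes the secondary test set.
    The seed is fixed only so that the example is repeatable.
    """
    shuffled = list(records)               # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # random selection, as the step requires
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    training = shuffled[:n_train]
    testing = shuffled[n_train:n_train + n_test]
    secondary = shuffled[n_train + n_test:]
    return training, testing, secondary


if __name__ == "__main__":
    # 100 hypothetical record ids stand in for real customer records.
    training, testing, secondary = split_dataset(range(100))
    print(len(training), len(testing), len(secondary))
```

Every record ends up in exactly one of the three sets, so the test sets contain no data seen during training.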

• Step 4: (Parameter tuning and weight initialization) There are
parameters for tuning ANN to the desired learning
performance level. Part of this step is initialization of the
network weights and parameters, followed by modification of
the parameters as training performance feedback is received.
– Often, the initial values are important in determining the effectiveness
and length of training.
• Step 5: (Data transformation) The application data are transformed
into the type and format required by the ANN.
• Step 6: (Training) Training is conducted iteratively by
presenting input and known output data to the ANN. The ANN
computes the outputs and adjusts the weights until the
computed outputs are within an acceptable tolerance of the
known outputs for the input cases.

• Step 7: (Testing) Once the training has been completed, it is
necessary to test the network.
– The testing examines the performance of ANN using the derived
weights by measuring the ability of the network to classify the
testing data correctly.
– Black-box testing (comparing test results to historical results) is the
primary approach for verifying that inputs produce the appropriate
outputs.
• Step 8: (Implementation) A stable set of weights has now been
obtained.
– The ANN can now reproduce the desired output given inputs like those
in the training set.
– The ANN is ready to use as a stand-alone system or as part of
another software system where new input data will be presented
to it and its output will be a recommended decision.
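The black-box test in Step 7 amounts to comparing the network's outputs with known historical results. A minimal sketch of that comparison follows; the function name `classification_accuracy` and the sample labels are assumptions for illustration, and the predictions could come from any trained classifier.

```python
def classification_accuracy(predicted, actual):
    """Return the fraction of test cases whose predicted class matches the known class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one test case is required")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


if __name__ == "__main__":
    # Hypothetical network outputs and historical results for four test cases.
    predicted = ["yes", "no", "yes", "yes"]
    actual = ["yes", "no", "no", "yes"]
    print(classification_accuracy(predicted, actual))
```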

BENEFITS AND LIMITATIONS OF NEURAL NETWORKS
6.1 Benefits of ANNs
• Usefulness for pattern recognition, classification, generalization,
abstraction and interpretation of incomplete and noisy inputs (e.g.
handwriting recognition, image recognition, voice and speech
recognition, weather forecasting).
• Providing some human characteristics to problem solving that are
difficult to simulate using the logical, analytical techniques of expert
systems and standard software technologies. (e.g. financial
applications).
• Ability to solve new kinds of problems. ANNs are particularly effective
at solving problems whose solutions are difficult to define. This
opened up a new range of decision support applications formerly
either difficult or impossible to computerize.

[Artificial] Neural Networks
• A class of powerful, general-purpose tools readily applied to:
– Prediction
– Classification
– Clustering
• Biological Neural Net (human brain) is the most powerful – we
can generalize from experience
• Computers are best at following pre-determined instructions
• Computerized Neural Nets attempt to bridge the gap
– Predicting time-series in financial world
– Diagnosing medical conditions
– Identifying clusters of valuable customers
– Fraud detection
– Etc…

Neural Networks
• When applied in well-defined domains, their ability
to generalize and learn from data “mimics” a
human’s ability to learn from experience.
• Very useful in Data Mining…better results are the
hope
• Drawback – training a neural network results in
internal weights distributed throughout the network
making it difficult to understand why a solution is
valid

Neural Networks
What is a Neural Network?
Similarity with biological network
The fundamental processing element of a neural network
is a neuron, which:
1. Receives inputs from other sources
2. Combines them in some way
3. Performs a generally nonlinear operation on the result
4. Outputs the final result
• Biologically motivated approach to
machine learning

Neural Network History
• 1930s thru 1970s
• 1980s:
– Back propagation – better way of training a neural net
– Computing power became available
– Researchers became more comfortable with n-nets
– Relevant operational data more accessible
– Useful applications (expert systems) emerged
• Check out Fair Isaac (www.fairisaac.com) which has a
division here in San Diego (formerly HNC)

Neural Network
• A neural network learns by adjusting its weights so that
it correctly classifies the training data and hence, after
the testing phase, can classify unknown data.
• A neural network needs a long time to train.
• A neural network has a high tolerance for noisy and
incomplete data.

Neural Network Classifier
• Input: classification data
(it contains the classification attribute)
• Data is divided, as in any classification problem.
[Training data and Testing data]
• All data must be normalized.
(i.e. all attribute values in the database are rescaled to lie in
the interval [0,1] or [-1,1])
A neural network works with data in the range (0,1) or (-1,1)
• Two basic normalization techniques
[1] Min-max normalization
[2] Decimal scaling normalization
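The two normalization techniques named above can be sketched as follows, using their usual textbook definitions: min-max maps the observed minimum and maximum onto the target interval, and decimal scaling divides by the smallest power of ten that brings every absolute value below 1. The function names and the sample values are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so that min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal; map them to the lower bound (an arbitrary choice).
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def decimal_scaling_normalize(values):
    """Divide by 10**j, where j is the smallest integer making every |value| < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]


if __name__ == "__main__":
    # Hypothetical attribute values, chosen only for illustration.
    print(min_max_normalize([10, 20, 30]))
    print(min_max_normalize([10, 20, 30], new_min=-1.0, new_max=1.0))
    print(decimal_scaling_normalize([-986, 917]))
```

In practice the minimum, maximum and scaling power would be computed on the training data and then reused for the testing data, so that both sets are transformed the same way.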

Loan Prospector – HNC/Fair Isaac
• A Neural Network (Expert System) is like a black box that knows how
to process inputs to create a useful output.
• The calculation(s) are quite complex and difficult to understand

Neural Net Limitations
• Neural Nets are good for prediction and estimation
when:
– Inputs are well understood
– Output is well understood
– Experience is available for examples to use to “train” the
neural net application (expert system)
• Neural Nets are only as good as the training set used
to generate them. The resulting model is static and must
be retrained with more recent examples for it to stay
relevant

Neural Network Training
• Training is the process of setting the best weights on the
edges connecting all the units in the network
• The goal is to use the training set to calculate weights where
the output of the network is as close to the desired output as
possible for as many of the examples in the training set as
possible
• Back propagation has been used since the 1980s to adjust the
weights (other methods are now available):
– Calculates the error by taking the difference between the calculated
result and the actual result
– The error is fed back through the network and the weights are
adjusted to minimize the error

Introduction
• Data Mining Definitions:
– Building compact and understandable models
incorporating the relationships between the
description of a situation and a result concerning
the situation.
– Extraction of interesting (non-trivial, implicit,
previously unknown and potentially useful)
information or patterns from data in large
databases.

Kinds of Data Mining Problems
• Classification / Segmentation
• Forecasting/Prediction (how much)
• Association rule extraction (market basket
analysis)
• Sequence detection

Data Mining Techniques:
• Neural Networks
• Decision Trees
• Multivariate Adaptive Regression Splines
(MARS)
• Rule Induction
• Nearest Neighbor Method and discriminant
analysis
• Genetic Algorithms
• Boosting

Neural Networks
• What are they?
– Based on early research aimed at representing the
way the human brain works
– Neural networks are composed of many
processing units called neurons
• Types (Supervised versus Unsupervised)
• Training

Neural Networks are great, but..
• Problem 1: The black box model!
– Solution 1: Do we really need to know?
– Solution 2: Rule extraction techniques
• Problem 2: Long training times
– Solution 1: Get a faster PC with lots of RAM
– Solution 2: Use faster algorithms (for example,
Quickprop)
• Problem 3: Back propagation
– Solution: Evolutionary Neural Networks!

Rule Extraction Techniques
• Representation Methods
• Extraction Strategy
• Network Requirement

Neural Network Concepts
• Neural networks (NN): a brain metaphor for
information processing
• Neural computing
• Artificial neural network (ANN)
• Many uses for ANN for
– pattern recognition, forecasting, prediction, and
classification
• Many application areas
– finance, marketing, manufacturing, operations,
information systems, and so on

Biological Neural Networks
[Figure: two interconnected brain cells (neurons), each labeled with its soma, axon, synapse and dendrites]

Processing Information in ANN
[Figure: a single neuron (processing element, PE) with inputs x1, x2, …, xn, connection weights w1, w2, …, wn, and outputs Y1, Y2, …, Yn]
Summation: $S = \sum_{i=1}^{n} X_i W_i$
Transfer function: $Y = f(S)$
• A single neuron (processing element, PE) with
inputs and outputs
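The computation inside a single processing element, a weighted sum followed by a transfer function, can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the names `neuron_output` and `sigmoid` are hypothetical, and the sigmoid is only one assumed choice for the transfer function f.

```python
import math


def sigmoid(s):
    """One common transfer function (an assumption; the slide only names f)."""
    return 1.0 / (1.0 + math.exp(-s))


def neuron_output(inputs, weights, transfer=sigmoid):
    """Compute Y = f(S), where S is the weighted sum of the inputs."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    s = sum(x * w for x, w in zip(inputs, weights))  # S = sum over i of X_i * W_i
    return transfer(s)


if __name__ == "__main__":
    # Hypothetical inputs and weights, chosen only for illustration.
    print(neuron_output([1.0, 0.0, 1.0], [0.4, -0.2, 0.1]))
```

Passing a different `transfer` argument (for example a threshold function) changes the neuron's behavior without touching the summation step.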

Elements of ANN
• Processing element (PE)
• Network architecture
– Hidden layers
– Parallel processing
• Network information processing
– Inputs
– Outputs
– Connection weights
– Summation function

Neural Network Architectures
Recurrent Neural Networks

Learning in ANN
• A process by which a neural network learns the
underlying relationship between input and outputs,
or just among the inputs
• Supervised learning
– For prediction type problems
– E.g., backpropagation
• Unsupervised learning
– For clustering type problems
– Self-organizing
– E.g., adaptive resonance theory

A Taxonomy of ANN Learning
Algorithms
[Figure: taxonomy of ANN learning algorithms and architectures]
Learning algorithms are classified by input type (discrete/binary or continuous) and by
learning mode (supervised or unsupervised). Algorithms named in the figure:
· Delta rule
· Gradient descent
· Competitive learning
· Neocognitron
· Perceptron
· Simple Hopfield
· Outerproduct AM
· Hamming net
· ART-1
· Carpenter / Grossberg
· ART-3
· SOFM (or SOM)
· Other clustering algorithms
Architectures are classified as supervised (recurrent or feedforward) or unsupervised
(estimator or extractor). Architectures named in the figure:
· Hopfield
· SOFM (or SOM)
· Nonlinear vs. linear
· Backpropagation
· ML perceptron
· Boltzmann
· ART-1
· ART-2

A Supervised Learning Process
[Flowchart: the ANN model computes the output. If the desired output is achieved, stop learning; if not, adjust the weights and compute the output again.]
Three-step process:
1. Compute temporary outputs
2. Compare outputs with desired targets
3. Adjust the weights and repeat the process
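The three-step process above can be sketched for a single neuron that learns the inclusive OR function. This is a hedged illustration rather than the textbook's worked example: the threshold of 0.5, the starting weights, the learning rate of 0.2, and the names `step` and `train_single_neuron` are all assumptions, and momentum is left out to keep the sketch short.

```python
def step(s, threshold=0.5):
    """Threshold transfer function: fire (1) only if the sum exceeds the threshold."""
    return 1 if s > threshold else 0


def train_single_neuron(samples, weights, rate=0.2, max_epochs=100):
    """Repeat compute / compare / adjust until a whole pass produces no errors."""
    weights = list(weights)
    for _ in range(max_epochs):
        errors = 0
        for inputs, desired in samples:
            # 1. Compute the temporary output.
            output = step(sum(x * w for x, w in zip(inputs, weights)))
            # 2. Compare it with the desired target.
            error = desired - output
            if error != 0:
                errors += 1
                # 3. Adjust each weight in proportion to its input and the error.
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
        if errors == 0:
            break  # desired output achieved for every sample: stop learning
    return weights


# Truth table of the inclusive OR operation: ((x1, x2), desired output).
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

if __name__ == "__main__":
    learned = train_single_neuron(OR_SAMPLES, weights=[0.1, 0.3])
    for inputs, desired in OR_SAMPLES:
        output = step(sum(x * w for x, w in zip(inputs, learned)))
        print(inputs, "->", output, "(desired", desired, ")")
```

A larger learning rate changes the weights faster but can overshoot on harder problems, which is why it is treated as a tunable learning parameter.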

How a Network Learns
• Example: single neuron that learns the
inclusive OR operation
* See your book for step-by-step progression of the learning process
Learning parameters:
– Learning rate
– Momentum

Backpropagation Learning
• Backpropagation of Error for a Single Neuron
[Figure: a neuron (PE) with inputs x1, x2, …, xn, weights w1, w2, …, wn, and output Yi, annotated with the error term]
Summation: $S = \sum_{i=1}^{n} X_i W_i$
Transfer function: $Y = f(S)$
Error: $a(Z_i - Y_i)$

Backpropagation Learning
• The learning algorithm procedure:
1. Initialize weights with random values and set other
network parameters
2. Read in the inputs and the desired outputs
3. Compute the actual output (by working forward
through the layers)
4. Compute the error (difference between the actual and
desired output)
5. Change the weights by working backward through the
hidden layers
6. Repeat steps 2-5 until weights stabilize
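The six steps above can be sketched for a small feedforward network with one hidden layer. Everything specific here is an assumption made for illustration: the sigmoid transfer function, two hidden neurons, the learning rate, a fixed number of epochs in place of a "weights stabilize" test, the inclusive OR training data, and the names `forward` and `train_backprop`.

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def forward(x, w_hidden, w_out):
    """Step 3: work forward through the layers. The last weight in each row is a bias."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + w_out[-1])
    return hidden, output


def train_backprop(samples, n_hidden=2, rate=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Step 1: initialize weights with small random values.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    history = []  # mean squared error per epoch
    for _ in range(epochs):  # Step 6: repeat (fixed epoch count, an assumption)
        total = 0.0
        for x, z in samples:  # Step 2: read inputs x and desired output z
            hidden, y = forward(x, w_hidden, w_out)
            err = z - y  # Step 4: error between desired and actual output
            total += err * err
            # Step 5: propagate the error backward and change the weights.
            delta_out = err * y * (1.0 - y)
            delta_hidden = [delta_out * w_out[j] * h * (1.0 - h)
                            for j, h in enumerate(hidden)]
            for j, h in enumerate(hidden):
                w_out[j] += rate * delta_out * h
            w_out[-1] += rate * delta_out
            for j, row in enumerate(w_hidden):
                for i, xi in enumerate(x):
                    row[i] += rate * delta_hidden[j] * xi
                row[-1] += rate * delta_hidden[j]
        history.append(total / len(samples))
    return w_hidden, w_out, history


# Hypothetical training set: the inclusive OR truth table.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

if __name__ == "__main__":
    w_hidden, w_out, history = train_backprop(OR_SAMPLES)
    print("first epoch error:", history[0])
    print("last epoch error:", history[-1])
```

The hidden-layer deltas are computed before the output weights are updated, so each layer's adjustment uses the weights that produced the error.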

Neural Network Architectures
• Architecture of a neural network is driven by the
task it is intended to address
– Classification, regression, clustering, general
optimization, association, ….
• Most popular architecture: Feedforward, multi-
layered perceptron with backpropagation learning
algorithm
– Used for both classification and regression type
problems

Other Popular ANN Paradigms
Self Organizing Maps (SOM)
• Applications of SOM
– Customer segmentation
– Bibliographic classification
– Image-browsing systems
– Medical diagnosis
– Interpretation of seismic activity
– Speech recognition
– Data compression
– Environmental modeling, many more …

Applications Types of ANN
• Classification
– Feedforward networks (MLP), radial basis function, and
probabilistic NN
• Regression
– Feedforward networks (MLP), radial basis function
• Clustering
– Adaptive Resonance Theory (ART) and SOM
• Association
– Hopfield networks
• Provide examples for each type?

Advantages of ANN
• Able to deal with (identify/model) highly
nonlinear relationships
• Not bound by restrictive normality and/or
independence assumptions
• Can handle a variety of problem types
• Usually provides better results (prediction and/or
clustering) than its statistical
counterparts
• Handles both numerical and categorical variables
(transformation needed!)

Disadvantages of ANN
• They are deemed to be black-box solutions, lacking
explainability
• It is hard to find optimal values for the large number of
network parameters
– Optimal design is still an art: requires expertise and
extensive experimentation
• It is hard to handle a large number of variables
(especially the rich nominal attributes)
• Training may take a long time for large datasets,
which may require case sampling

ANN Software
• Standalone ANN software tool
– NeuroSolutions
– BrainMaker
– NeuralWare
– NeuroShell, … for more (see pcai.com) …
• Part of a data mining software suite
– PASW (formerly SPSS Clementine)
– SAS Enterprise Miner
– Statistica Data Miner, … many more …

Applications-I
• Handwritten Digit Recognition
• Face recognition
• Time series prediction
• Process identification
• Process control
• Optical character recognition

Applications-II
• Forecasting/Market Prediction: finance and banking
• Manufacturing: quality control, fault diagnosis
• Medicine: analysis of electrocardiogram data, RNA & DNA
sequencing, drug development without animal testing
• Control: process, robotics
