Dr. Irene Kafeza
NALSAR, University of Law
• Data mining is the process of automatically discovering
useful information in large data repositories.
• It helps find novel and useful patterns in the data
• It can also predict the outcome of a future observation
• For example, with data mining we can predict whether a
newly arrived customer will spend more than 100 USD
at a department store
• Data mining is an integral part of knowledge discovery in
databases (KDD)
Approaches in learning algorithms
• Classification: takes as input a collection of records
(also called instances or examples) and maps each record to a
predefined class label
• Classification: predicts a certain outcome based on a given
input
• The next slide shows which features define a vertebrate as a
mammal, reptile, bird, fish or amphibian
• Suppose that we are given the following
characteristics of a creature called the gila monster
• We can use classification based on the data of
the previous slide to determine to which class
it belongs
How to solve a classification problem
• Use a learning algorithm to create a model
that best fits the relationship between the
attribute set and the class label of the input
data
• Create a training set consisting of records
whose labels are known
• Use a test set, whose records were not used for training,
to measure the accuracy of your model
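The workflow above (labeled training set, learned model, held-out test set) can be sketched in a few lines of Python. This is only an illustrative sketch: the customer records are made up, and a 1-nearest-neighbor rule is swapped in as the learning algorithm because it is short to write. The slides do not prescribe it, and the names RECORDS, split, train_1nn, predict and accuracy are hypothetical.

```python
# Hypothetical toy records: (visits_per_month, avg_basket_usd) -> class label.
# The values are invented for illustration only.
RECORDS = [
    ((1, 20), "low"),
    ((2, 35), "low"),
    ((1, 15), "low"),
    ((3, 40), "low"),
    ((8, 120), "high"),
    ((9, 150), "high"),
    ((7, 110), "high"),
    ((10, 130), "high"),
    ((2, 25), "low"),
    ((8, 140), "high"),
]


def split(records, n_test):
    """Hold out the last n_test records as the test set."""
    return records[:-n_test], records[-n_test:]


def train_1nn(training_set):
    """Training a 1-nearest-neighbor model just stores the labeled records."""
    return list(training_set)


def predict(model, features):
    """Return the label of the stored record closest to `features`."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    _, nearest_label = min(model, key=lambda rec: sq_dist(rec[0], features))
    return nearest_label


def accuracy(model, test_set):
    """Fraction of test records whose predicted label matches the true label."""
    correct = sum(1 for features, label in test_set
                  if predict(model, features) == label)
    return correct / len(test_set)


training_set, test_set = split(RECORDS, n_test=2)
model = train_1nn(training_set)
print("test accuracy:", accuracy(model, test_set))
```

The test records are kept out of training so that the measured accuracy reflects how the model handles records it has not seen.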
Artificial Neural Networks
• Inspired by attempts to simulate biological neural systems
• The human brain consists of nerve cells called neurons
• Neurons are linked to other neurons via strands
of fiber called axons.
• Axons are used to transmit nerve impulses from one
neuron to another via dendrites, which are extensions from
the cell body of the neuron.
• The contact point between a dendrite and an axon is called
a synapse.
• Neurologists have discovered that the human brain learns
by changing the strength of the synaptic connection
between neurons upon repeated stimulation by the same
impulse.
• The Perceptron model
• An artificial neural network (ANN) is
composed of nodes and directed links
• The nodes are the neurons, and the weights on the links
represent the strength of the synaptic connection between the
neurons
• As in a biological neural system, training a
perceptron model means adapting the weights of
the links until they fit the input-output
relationships of the underlying data
• In the specific example the output is
• 1 if 0.3*x1 + 0.3*x2 + 0.3*x3 - 0.4 > 0, and
• -1 if 0.3*x1 + 0.3*x2 + 0.3*x3 - 0.4 < 0
• The weight on each link is 0.3, and 0.4 is the bias
• In this example we can see how the data points of the given
set are divided into two classes. The line is the decision
boundary found by applying the perceptron
learning algorithm to the data set.
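The example perceptron with link weights 0.3 and bias 0.4 can be evaluated directly, as in the short Python sketch below. Two points are assumptions, not taken from the slides: the inputs are treated as binary (0 or 1), and a weighted sum of exactly 0 is mapped to -1, since the slides define the output only for sums strictly above or below zero. The function name perceptron_output is hypothetical.

```python
WEIGHTS = (0.3, 0.3, 0.3)  # weight on each of the three input links
BIAS = 0.4                 # bias subtracted from the weighted sum


def perceptron_output(x1, x2, x3):
    """Return 1 if the weighted sum minus the bias is positive, else -1."""
    weighted_sum = WEIGHTS[0] * x1 + WEIGHTS[1] * x2 + WEIGHTS[2] * x3 - BIAS
    # Assumption: a sum of exactly 0 (on the decision boundary) maps to -1.
    return 1 if weighted_sum > 0 else -1


for inputs in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(inputs, "->", perceptron_output(*inputs))
```

With binary inputs, the output is 1 exactly when at least two of the three inputs are 1. The decision boundary is the set of points where the weighted sum minus the bias equals zero.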
Perceptron learning algorithm
• The algorithm maintains a “guess” at good parameters
(weights and bias) as it runs.
• It processes one example at a time.
• For a given example, it makes a prediction.
• It checks to see if this prediction is correct (recall that this
is training data, so we have access to true labels).
• If the prediction is correct, it does nothing.
• Only when the prediction is incorrect does it change its
parameters, and it changes them in such a way that it
would do better on this example next time around.
• It then goes on to the next example. Once it hits the last
example in the training set, it loops back around for a
specified number of iterations.
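The steps above can be sketched in Python as follows. The slides do not state the exact update formula, so this sketch assumes the classic perceptron rule: on a mistake, add learning_rate * label * x to the weights and learning_rate * label to the bias. The bias is added to the weighted sum here, so it plays the role of the negated bias from the earlier example. The training set (all binary triples, labeled 1 when at least two inputs are 1) and the names predict, train_perceptron and EXAMPLES are hypothetical.

```python
from itertools import product


def predict(weights, bias, x):
    """Predict 1 if the weighted sum plus bias is positive, otherwise -1."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else -1


def train_perceptron(examples, n_epochs, learning_rate=1.0):
    """Run the perceptron learning loop for a fixed number of passes."""
    n_features = len(examples[0][0])
    weights = [0.0] * n_features  # initial guess at the parameters
    bias = 0.0
    for _ in range(n_epochs):             # loop back over the training set
        for x, label in examples:         # process one example at a time
            if predict(weights, bias, x) != label:
                # Wrong prediction: move the parameters toward the true label
                # (assumed classic perceptron update rule).
                weights = [w + learning_rate * label * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * label
            # Correct prediction: do nothing.
    return weights, bias


# Hypothetical training set: label is 1 when at least two inputs are 1.
EXAMPLES = [(x, 1 if sum(x) >= 2 else -1) for x in product((0, 1), repeat=3)]

weights, bias = train_perceptron(EXAMPLES, n_epochs=100)
print("weights:", weights, "bias:", bias)
print("all correct:",
      all(predict(weights, bias, x) == label for x, label in EXAMPLES))
```

This toy set is linearly separable, so the loop stops making updates after a finite number of mistakes. On data that is not linearly separable the parameters would keep changing until the specified number of passes runs out.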