Machine Learning and Pattern Recognition
Term Project
Lecturer: PhD. Asst. Prof. Cemal Okan ŞAKAR
Student: Yusuf Ziya UZUN
Lesson: CMP5130 - Machine Learning and Pattern Recognition (Fall 2014)
Subject: House Price Estimation as a Function Fitting Problem with using ANN Approach
Introduction
Since we already implemented cross-validation and k-fold approaches on our datasets in the first
homework, my aim here is to show how to use them with different machine learning algorithms rather
than how to implement them. In this project I therefore want to introduce the Neural Network Toolbox
that Matlab provides.
For this purpose I picked the Artificial Neural Network (ANN) algorithm and the UCI Housing dataset.
As a brief introduction, here are some links:
- Neural Network Toolbox for Matlab: http://www.mathworks.com/products/neural-network/
- UCI Housing Dataset: https://archive.ics.uci.edu/ml/datasets/Housing
After a short explanation of the tool, I will describe our dataset and run some experiments on it
using the Neural Network Toolbox components. I am happy to say that we will be able to visualize
many things in our solutions.
The outline is as follows:
1- Dataset Information
2- Toolbox Information
3- Experiments
4- Comparisons
5- Conclusion
I also want to mention that my main aim is to give a basic idea of the Neural Network Toolbox (NNT)
and of using it with the ANN algorithm, so this term project covers only the topics that we need for
our Housing dataset. I will code in Matlab and give some details about that code as well.
1- Dataset Information
The dataset describes housing values through 13 properties that give different information about
houses and their surroundings. The first 13 attributes are our input data (features) and the 14th
attribute is our target data (output). There are 506 items and no missing values.
1. Per capita crime rate per town
2. Proportion of residential land zoned for lots over 25,000 sq. ft.
3. Proportion of non-retail business acres per town
4. 1 if tract bounds Charles river, 0 otherwise
5. Nitric oxides concentration (parts per 10 million)
6. Average number of rooms per dwelling
7. Proportion of owner-occupied units built prior to 1940
8. Weighted distances to five Boston employment centres
9. Index of accessibility to radial highways
10. Full-value property-tax rate per $10,000
11. Pupil-teacher ratio by town
12. 1000(Bk - 0.63)^2
13. Percent lower status of the population
14. Median value of owner-occupied homes in $1000's (output)
As you can see, this is a regression problem with several different kinds of input parameters: we
need to estimate the median housing price for given inputs. Neural networks are generally a good fit
for non-linear problems, because a sufficient number of neurons can be used to fit a given non-linear
relationship more closely.
2- Toolbox Information
We start by opening the NNT window with this command: nnstart
After you enter this command, a start window should appear. We are going to use the fitting app,
which I highlighted in yellow. You can also use the nftool command to open it directly.
Here we select our dataset; each data item must be given as a column and each property as a row, as
shown in step 5. In the next step we select the sizes of our cross-validation sets:
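The column-per-item layout can be illustrated outside Matlab as well. The following is a small Python sketch (not toolbox code) that turns sample-per-row records into one row per property plus a separate target list. The function name `to_columns` and the toy values are my own assumptions for illustration; for the Housing data, `n_features` would be 13.

```python
def to_columns(rows, n_features=13):
    """Split sample-per-row records into inputs (one row per feature,
    one column per sample) and a list of targets."""
    # zip(*...) transposes the feature part of the records.
    inputs = [list(col) for col in zip(*(row[:n_features] for row in rows))]
    # The value right after the features is treated as the target.
    targets = [row[n_features] for row in rows]
    return inputs, targets


# Toy records (made-up values): three features followed by the target.
rows = [
    [1.0, 2.0, 3.0, 10.0],
    [4.0, 5.0, 6.0, 20.0],
]
inputs, targets = to_columns(rows, n_features=3)
print(inputs)   # one inner list per feature
print(targets)  # one target per sample
```

With the real dataset this would give 13 rows of 506 values each, which matches the orientation the fitting app asks for.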
Here the size of the training set changes as we change the validation and test set sizes. For this
introduction I leave the defaults. Next we define our network architecture:
As shown above, the network has two layers: a hidden layer and a linear regression layer that
produces the output (predicted) value.
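To make the two-layer structure concrete, here is a minimal Python sketch of a forward pass through a hidden layer followed by a linear output layer. It assumes the hyperbolic tangent as the sigmoid-type transfer function (the report only says "sigmoid"), and all weights below are made-up toy values, not weights from the trained network.

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Two-layer forward pass: sigmoid-type hidden layer, linear output layer."""
    # Each hidden neuron: weighted sum of the inputs plus bias, squashed by tanh.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    # Output neuron: plain linear combination of the hidden activations.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Toy network: 2 inputs, 2 hidden neurons, 1 output (all values are assumptions).
W_HIDDEN = [[0.5, -0.5], [1.0, 1.0]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [2.0, -1.0]
B_OUT = 0.25

print(forward([0.0, 0.0], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT))
print(forward([1.0, 1.0], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT))
```

For the Housing problem the same structure would have 13 inputs and, by default, 10 hidden neurons.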
After that we can choose the training algorithm and train on our divided dataset. We can also train
multiple times with randomly picked cross-validation sets, so each training run will give different
results.
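The random division into training, validation and test sets can be sketched in a few lines of Python. This is only an illustration of the idea, not the toolbox's own division routine; the function name `divide_random` and the rounding of the set sizes are my assumptions, so the exact counts may differ slightly from what Matlab produces.

```python
import random


def divide_random(n_items, train=0.8, val=0.1, seed=None):
    """Randomly divide item indices into training, validation and test sets.
    The test set receives whatever remains after training and validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(n_items * train)
    n_val = round(n_items * val)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )


train_idx, val_idx, test_idx = divide_random(506, train=0.8, val=0.1, seed=0)
print(len(train_idx), len(val_idx), len(test_idx))
```

Calling the function with a different seed gives a different division, which is why repeated training runs do not give identical results.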
After clicking the Train button, you will see a window that shows a summary and results, as
below:
Here we can see some useful information about our training, such as the performance, training
state, error histogram and regression plots.
There will also be some information about the error and correlation in the main window:
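The correlation value measures how closely the network outputs follow the targets. A minimal Python sketch of such a correlation coefficient (Pearson's R) is given here; I am assuming this is the quantity shown as R, and the function name is hypothetical.

```python
import math


def correlation(targets, outputs):
    """Pearson correlation coefficient R between targets and network outputs."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    return cov / math.sqrt(var_t * var_o)


# Toy values: outputs that follow the targets closely give R near 1.
print(correlation([20.0, 30.0, 25.0, 40.0], [21.0, 29.0, 26.0, 38.0]))
```

An R close to 1 means a close linear relationship between outputs and targets, while an R close to 0 means almost none.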
Performance Plot Example:
Training State Plot:
Error Histogram Plot:
Regression Plot:
In the next steps we can train and test our network again if we think it is not a good fit. We can
go back to previous steps and change the data dimensions or increase/decrease our network size.
After that step we can also view our network visually as a Simulink diagram. To do so, click the
Simulink Diagram button in the Deploy Solution window.
A Simulink window should now appear, showing this basic visualization of our
network:
Now click the down arrow in the Function Fitting Neural Network box, which I highlighted in
yellow. You will see this:
When you double-click Layer 1, you get more information:
This is how our hidden layer looks visually:
And our sigmoid transfer function:
Here is Layer 2 visually:
Simulink also gives us the chance to debug our implementation step by step and lets us watch a
simulation of our algorithm. Here I have only introduced the visually generated implementation of
our neural network.
Now let us start with some experiments and use some of these tools to get better outputs.
3- Experiments
Here we will inspect how changing the cross-validation set sizes, the training algorithm and the
number of hidden-layer neurons affects our accuracy. We will then use these results for our
comparisons and finally try to draw some conclusions from the outputs. We use randomly divided
datasets in each experiment; the reason is explained in the Conclusion section.
3.1- Changing Training Size, Validation Size, Test Size
Now let's change our dataset sizes by percentage and get some results. In these experiments we keep
the training algorithm and the number of neurons fixed: the training algorithm is
Levenberg-Marquardt and the number of hidden neurons is 10, both defaults.
3.1.1- Training Size: 50%, Validation Size: 25%, Test Size: 25%
- Training Performance: 13.1571
- Validation Performance: 16.9545
- Test Performance: 34.1554
3.1.2- Training Size: 60%, Validation Size: 20%, Test Size: 20%
- Training Performance: 4.887432
- Validation Performance: 24.091745
- Test Performance: 27.686587
3.1.3- Training Size: 80%, Validation Size: 10%, Test Size: 10%
- Training Performance: 5.254130
- Validation Performance: 6.768036
- Test Performance: 19.687765
3.2- Changing Training Algorithm
After trying different data sizes, we change the training function to see its effect on our dataset.
To compare the functions we need to keep the data division constant, so we take 80% for training,
10% for validation and 10% for testing, and set the number of neurons to 10.
3.2.1- Levenberg-Marquardt (trainlm) Function
We already tried this method as the default training function in the data size comparisons
(section 3.1.3), so we intentionally skip it here. As a reminder, these are the performance results:
- Training Performance: 5.254130
- Validation Performance: 6.768036
- Test Performance: 19.687765
3.2.2- Scaled conjugate gradient back propagation (trainscg) Function
- Training Performance: 21.853845
- Validation Performance: 50.703653
- Test Performance: 20.041648
3.2.3- Adaptive Gradient Descent with Momentum back propagation (traingdx) Function
- Training Performance: 20.875673
- Validation Performance: 19.159098
- Test Performance: 10.541039
3.3- Changing Hidden Layer Neuron Size
After experimenting with data divisions and training functions, we are going to change the number of
neurons in our hidden layer, so that we can see the relation between accuracy and neuron count. We
again pick 80% for training and 10% for validation (the remaining 10% is the test set), and we use
the Adaptive Gradient Descent with Momentum back propagation (traingdx) function with its default
parameters as the training algorithm.
3.3.1- Selecting 5 Neurons
- Training Performance: 59.678844
- Validation Performance: 50.404263
- Test Performance: 100.167308
3.3.2- Selecting 10 Neurons
We already ran this experiment in section 3.2.3 with the same parameters, so it is intentionally
not repeated. As a reminder, these are the performance results:
- Training Performance: 20.875673
- Validation Performance: 19.159098
- Test Performance: 10.541039
3.3.3- Selecting 15 Neurons
- Training Performance: 13.784050
- Validation Performance: 29.685621
- Test Performance: 16.404043
4- Comparisons
4.1- Comparison of Data Sizes
As the table below shows, our total performance value decreases as the training set grows. The
performance value is an error measure, so it is at its best when it reaches zero; in other words,
performance improves as the value decreases towards zero. As we know from cross-validation
techniques, more training data makes our algorithm more accurate, but it may also cause overfitting
problems because less data is left for validation. Therefore, we should keep the validation set at
an adequate proportion.
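For performance calculations we always used the Mean Squared Error (MSE) as the
cost function:
MSE = (1/N) * sum over i = 1..N of (t_i - y_i)^2
where N is the number of items, t_i is the target value and y_i is the network output for item i.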
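The MSE is the average of the squared differences between the target values and the network outputs. A minimal Python sketch of this calculation is shown here; the function name and the toy numbers are my own and are not taken from the experiments.

```python
def mse(targets, outputs):
    """Mean squared error between target values and network outputs."""
    squared_errors = [(t - y) ** 2 for t, y in zip(targets, outputs)]
    return sum(squared_errors) / len(squared_errors)


# Toy example (made-up values): the errors are -2, 1 and 0.
print(mse([20.0, 30.0, 25.0], [22.0, 29.0, 25.0]))
```

Because the errors are squared, a few large misses raise the MSE much more than many small ones, which is worth remembering when comparing the performance values in the tables.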
In the first data division (50-25-25), the training and validation performances are acceptable, but
the test performance differs from them by a large margin. The likely reason is that we did not give
a large enough proportion of the data to training.
In the second row (60-20-20), the training performance has improved considerably, but the
validation and test performances are still far from the training performance. So test accuracy is
still a problem.
In the third row (80-10-10), the training and validation performances are very close to each other,
and the test performance has also improved. Judging by the training and validation sets, we can say
that our algorithm learned well with these proportions, and the effect on the test set is clearly
positive.
We can also check the regression plots of each data division to see how well the outputs fit the
targets. The best fit is obtained with the 80-10-10 data division.
Data Sets / Performance    Training     Validation    Test         Total Perf.
50 – 25 – 25               13.1571      16.9545       34.1554      19.356025
60 – 20 – 20               4.887432     24.091745     27.686587    13.2881256
80 – 10 – 10               5.254130     6.768036      19.687765    6.8489
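The Total Perf. column appears to be the average of the three performance values weighted by the data division ratios. This is an inference from the numbers in the table, not something stated elsewhere in the report, and the function name below is hypothetical. A short Python check:

```python
def total_performance(train, val, test, ratios):
    """Average of the three MSE values, weighted by the division ratios."""
    r_train, r_val, r_test = ratios
    return r_train * train + r_val * val + r_test * test


# Rows of the table above: (training, validation, test, division ratios).
rows = [
    (13.1571, 16.9545, 34.1554, (0.50, 0.25, 0.25)),
    (4.887432, 24.091745, 27.686587, (0.60, 0.20, 0.20)),
    (5.254130, 6.768036, 19.687765, (0.80, 0.10, 0.10)),
]
for train, val, test, ratios in rows:
    print(ratios, round(total_performance(train, val, test, ratios), 4))
```

Under this weighting the training error dominates the total in the 80-10-10 division, so the Test column should still be read on its own.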
4.2- Comparison of Training Functions
The Levenberg-Marquardt algorithm (LMA) interpolates between the Gauss-Newton algorithm
and the method of Gradient Descent (GD). Generally LMA is much faster than GD, because it uses
approximate second-order (curvature) information and therefore typically needs far fewer iterations
to converge. In our experiments, however, the LM algorithm achieved lower precision in terms of
predictive (test) performance than the adaptive GD algorithm. An interesting observation is that LMA,
despite its lower MSE value on the training set, does not give better test set predictions than
Adaptive GD.
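The interpolation between Gauss-Newton and gradient descent can be seen in a small one-parameter example. The following Python sketch is not the toolbox's trainlm; it fits the toy model y = exp(k * x) with a damping factor `mu`. A small `mu` gives a Gauss-Newton step, while a large `mu` gives a short gradient-descent-like step. The model, the starting values and the factor of 10 used to adapt `mu` are all assumptions chosen for illustration.

```python
import math


def lm_fit(xs, ys, k=0.0, mu=1e-3, iters=50):
    """Levenberg-Marquardt sketch for the one-parameter model y = exp(k * x)."""
    def sse(kk):
        return sum((y - math.exp(kk * x)) ** 2 for x, y in zip(xs, ys))

    for _ in range(iters):
        # Residuals e_i = y_i - f(x_i) and Jacobian J_i = d f(x_i) / d k.
        e = [y - math.exp(k * x) for x, y in zip(xs, ys)]
        J = [x * math.exp(k * x) for x in xs]
        jtj = sum(j * j for j in J)
        jte = sum(j * r for j, r in zip(J, e))
        # Damped step: (J'J + mu)^-1 J'e, here in scalar form.
        step = jte / (jtj + mu)
        if sse(k + step) < sse(k):
            k += step      # step helped: accept it and trust Gauss-Newton more
            mu /= 10.0
        else:
            mu *= 10.0     # step hurt: reject it and move towards gradient descent
    return k


# Toy data generated from k = 0.7 (an assumption for the example).
xs = [0.0, 0.5, 1.0, 1.5]
ys = [math.exp(0.7 * x) for x in xs]
k_hat = lm_fit(xs, ys)
print(k_hat)
```

Starting from k = 0, the first undamped steps overshoot and are rejected, so `mu` grows until a step is accepted; after that the method behaves like Gauss-Newton and converges in a few iterations.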
The gradient descent algorithm converges slowly by design. For this reason we added a momentum
term to it, which reduces the risk of getting stuck in a local minimum and lets it converge faster,
with less zig-zagging on the cost function. We also used an adaptive learning rate, so that the
learning rate adjusts itself during training and fits the problem better.
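The combination of momentum and an adaptive learning rate can be sketched in Python on a one-dimensional cost function. This is an illustration of the idea only, not Matlab's traingdx implementation; the constants (`mc`, `lr_inc`, `lr_dec`, `max_inc`) and the toy cost function are assumptions.

```python
def gd_momentum_adaptive(grad, loss, w, lr=0.01, mc=0.9,
                         lr_inc=1.05, lr_dec=0.7, max_inc=1.04, epochs=500):
    """Gradient descent with a momentum term and an adaptive learning rate."""
    velocity = 0.0
    current = loss(w)
    for _ in range(epochs):
        # Momentum blends the previous step with the new gradient step.
        step = mc * velocity - (1.0 - mc) * lr * grad(w)
        candidate = w + step
        new = loss(candidate)
        if new > current * max_inc:
            # Loss grew too much: discard the step and shrink the learning rate.
            lr *= lr_dec
            velocity = 0.0
        else:
            if new < current:
                lr *= lr_inc   # progress: allow slightly larger steps
            w, velocity, current = candidate, step, new
    return w


# Toy cost function with its minimum at w = 3 (an assumption for the example).
w_min = gd_momentum_adaptive(
    grad=lambda w: 2.0 * (w - 3.0),
    loss=lambda w: (w - 3.0) ** 2,
    w=0.0,
)
print(w_min)
```

The learning rate grows while the cost keeps falling and is cut back when a step overshoots, so the step size tunes itself instead of being fixed in advance.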
In the table below, LMA outperforms the other algorithms by far on the training and validation
sets. However, its test performance is worse than that of the Adaptive GD with momentum method. So
LMA looks best in total but is less precise than the GD method on unseen data. In this run the GD
method generalized better, and thus it achieved better accuracy on the test set.
On the other hand, the Scaled Conjugate Gradient (SCG) method does not give good
performance results. This is because our validation checks are the same for each algorithm: I
intentionally left the maximum number of consecutive failed validation checks at its default of 6.
You can also try larger numbers by setting its value (net.trainParam.max_fail), and you should see
SCG perform better.
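The effect of the validation-check limit can be illustrated with a short Python sketch of early stopping. It is a simplified model of the idea behind max_fail, not the toolbox code: training stops once the validation error has failed to improve a given number of times in a row. The function name and the error sequence are made up.

```python
def early_stop_epoch(val_errors, max_fail=6):
    """Return (stop_epoch, best_epoch) for validation-based early stopping."""
    best_err = float("inf")
    best_epoch = 0
    fails = 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            # New best validation error: remember it and reset the counter.
            best_err, best_epoch, fails = err, epoch, 0
        else:
            fails += 1
            if fails >= max_fail:
                return epoch, best_epoch
    # The limit was never reached: training ran through all epochs.
    return len(val_errors) - 1, best_epoch


# Made-up validation errors with a temporary plateau after epoch 2.
errors = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 2.9, 3.0, 3.0]
print(early_stop_epoch(errors, max_fail=3))
print(early_stop_epoch(errors, max_fail=6))
```

With the smaller limit, training stops on the plateau and keeps the network from epoch 2. With the larger limit it survives the plateau and finds the lower error at epoch 6, which is the kind of gain a larger max_fail can give a slowly improving algorithm.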
Function / Performance    Training      Validation    Test         Total Perf.
trainlm                   5.254130      6.768036      19.687765    6.8489
trainscg                  21.853845     50.703653     20.041648    24.5576
traingdx                  20.875673     19.159098     10.541039    19.6706
4.3- Comparison of Hidden Layer Neuron Size
Finding a good number of hidden-layer neurons is one of the important problems in ANNs. A small
number of neurons may give faster results but poor accuracy. Increasing the number of neurons can
improve accuracy, but at the cost of more time and space and a more complex model. Too few neurons
can cause underfitting, while more neurons than necessary can cause overfitting. So we have to find
an optimal number of neurons in the hidden layer, and while deciding on it we have to balance this
tradeoff carefully.
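The growth in model size can be made concrete by counting the adjustable parameters of a network with 13 inputs, one hidden layer and one output. This Python sketch assumes a fully connected network with one bias per neuron; the function name is hypothetical.

```python
def count_parameters(n_inputs, n_hidden, n_outputs=1):
    """Number of weights and biases in a fully connected two-layer network."""
    hidden = (n_inputs + 1) * n_hidden    # input weights plus one bias per hidden neuron
    output = (n_hidden + 1) * n_outputs   # hidden weights plus one bias per output neuron
    return hidden + output


for n_hidden in (5, 10, 15):
    print(n_hidden, count_parameters(13, n_hidden))
```

Every additional hidden neuron adds 15 parameters here, while the training set at an 80% division holds only about 405 items.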
As the performance table below shows, 5 neurons are too few for this dataset: the training and
validation performances are considerably worse than with the other neuron counts. The network
evidently lacks the capacity to carry enough information. Our Adaptive GD algorithm tries to
minimize the cost function, but there are too few weight parameters to represent the relationship,
which causes underfitting.
Comparing the performances of 10 and 15 neurons, we see that 15 neurons give better results on the
training set but not on the test set. The training and validation performances for 15 neurons are
also very different. The likely reason is that too many weight parameters lead to overtraining: we
have 13 input parameters in our dataset, but we defined 15 neurons in the hidden layer.
Neuron Size / Performance    Training      Validation    Test          Total Perf.
5 Neurons                    59.678844     50.404263     100.167308    62.8002
10 Neurons                   20.875673     19.159098     10.541039     19.6706
15 Neurons                   13.784050     29.685621     16.404043     15.6362
5- Conclusion
As we have seen in the plots for different parameters and algorithms, there is no single best
choice. Different dataset divisions may also, though rarely, produce completely different results.
We could certainly change many more parameters in these algorithms and cross-validate all of them,
but for the sake of this project we analyzed most of them only with their default parameters. The
experiments and results are tightly coupled to the given dataset, so we should simply keep the
"no free lunch" theorem in mind.
In most neural network fitting problems we have to resolve some cross-validation
questions to get good fitting results, whether the dataset is simple or complex. Some of them are:
- Number of hidden-layer neurons
- A good number of iterations, to prevent underfitting and overfitting
- The tradeoff between time, space and accuracy
- Algorithm-specific predetermined values (learning rate, bias values, etc.)
For our house price estimation dataset we tried different data sizes, neuron counts and algorithms.
Instead of using one predefined data division, we used a different random division for
every experiment. This let us draw conclusions from each experiment independently, and we saw
that the random division did not affect the predicted results much. We tried to find the best fit
for our target values. In our experiments, we observed that the algorithms achieve different
accuracies on the training, validation and test sets.
We saw that a larger training set gives us more accuracy, but the data also needs to be divided in
good proportions; otherwise it can lead to overtraining.
Another important criterion is the number of neurons in the hidden layer. Too few neurons clearly
cause a dramatic decrease in performance, meaning that the network cannot carry enough information
to generate better results.
From the training functions' point of view, each one performs better at something different. For
example, Adaptive GD is good at test set performance but very slow compared to LMA. There are
tradeoffs (such as time versus accuracy) in the choice of algorithm.
For this dataset, I would go with 10 hidden-layer neurons, an 80% training set, a 10% validation
set, a 10% test set, and the Adaptive GD with momentum method. Because the dataset is not very big,
I would favor accuracy over training time.
Of course, it would be valuable to run further experiments with different parameters, or to try
some dimensionality reduction methods (such as PCA, LCA or some feature selection) before applying
the ANN. We could also have tried different resampling methods such as leave-one-out or
bootstrapping. All of these would give us very useful information. For now, we are aware of these
methods but unfortunately were not able to carry them out.
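For reference, the two resampling methods mentioned here can be sketched in Python as index generators. These were not part of the experiments in this report; the function names are hypothetical and the code only shows how the splits would be formed.

```python
import random


def leave_one_out(n_items):
    """Yield (train_indices, test_index): each item is held out exactly once."""
    for held_out in range(n_items):
        train = [i for i in range(n_items) if i != held_out]
        yield train, held_out


def bootstrap_sample(n_items, seed=None):
    """Draw n_items indices with replacement; the indices never drawn
    form the out-of-bag set that can be used for testing."""
    rng = random.Random(seed)
    sample = [rng.randrange(n_items) for _ in range(n_items)]
    out_of_bag = sorted(set(range(n_items)) - set(sample))
    return sample, out_of_bag


splits = list(leave_one_out(4))
print(splits[1])

sample, oob = bootstrap_sample(506, seed=1)
print(len(sample), len(oob))
```

Leave-one-out on this dataset would mean 506 training runs, which is affordable for a dataset of this size but still much slower than a single random division.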
References:
- https://archive.ics.uci.edu/ml/datasets/Housing
- http://www.mathworks.com/help/nnet/examples/house-price-estimation.html
- http://www.mathworks.com/help/nnet/gs/fit-data-with-a-neural-network.html
- http://www.mathworks.com/help/nnet/ug/choose-a-multilayer-neural-network-training-function.html
- http://www.mathworks.com/help/nnet/ref/traingdx.html
- http://www.mathworks.com/help/nnet/ref/trainlm.html
- http://www.mathworks.com/help/nnet/ref/trainscg.html
- http://radio.feld.cvut.cz/matlab/toolbox/nnet/trainlm.html
- http://radio.feld.cvut.cz/matlab/toolbox/nnet/traingdx.html
- http://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm
- http://en.wikipedia.org/wiki/Gradient_descent
- http://alumni.cs.ucr.edu/~vladimir/cs171/nn_summary.pdf
- http://aix1.uottawa.ca/~isoltani/ANN.pdf

More Related Content

What's hot

Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-LearningMeta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-LearningMLAI2
 
Basic Learning Algorithms of ANN
Basic Learning Algorithms of ANNBasic Learning Algorithms of ANN
Basic Learning Algorithms of ANNwaseem khan
 
08 neural networks
08 neural networks08 neural networks
08 neural networksankit_ppt
 
Text cnn on acme ugc moderation
Text cnn on acme ugc moderationText cnn on acme ugc moderation
Text cnn on acme ugc moderationMarsan Ma
 
Ml10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsMl10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsankit_ppt
 
IRJET- Image Classification – Cat and Dog Images
IRJET- Image Classification – Cat and Dog ImagesIRJET- Image Classification – Cat and Dog Images
IRJET- Image Classification – Cat and Dog ImagesIRJET Journal
 
Face Recognition: From Scratch To Hatch
Face Recognition: From Scratch To HatchFace Recognition: From Scratch To Hatch
Face Recognition: From Scratch To HatchEduard Tyantov
 
Adaline madaline
Adaline madalineAdaline madaline
Adaline madalineNagarajan
 
Predicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine LearningPredicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine LearningLeo Salemann
 
Neural Networks
Neural NetworksNeural Networks
Neural Networksmailund
 
NEURAL NETWORK Widrow-Hoff Learning Adaline Hagan LMS
NEURAL NETWORK Widrow-Hoff Learning Adaline Hagan LMSNEURAL NETWORK Widrow-Hoff Learning Adaline Hagan LMS
NEURAL NETWORK Widrow-Hoff Learning Adaline Hagan LMSESCOM
 
Deep learning summary
Deep learning summaryDeep learning summary
Deep learning summaryankit_ppt
 
sentiment analysis using support vector machine
sentiment analysis using support vector machinesentiment analysis using support vector machine
sentiment analysis using support vector machineShital Andhale
 
IRJET- Mango Classification using Convolutional Neural Networks
IRJET- Mango Classification using Convolutional Neural NetworksIRJET- Mango Classification using Convolutional Neural Networks
IRJET- Mango Classification using Convolutional Neural NetworksIRJET Journal
 
Iterative Determinant Method for Solving Eigenvalue Problems
Iterative Determinant Method for Solving Eigenvalue ProblemsIterative Determinant Method for Solving Eigenvalue Problems
Iterative Determinant Method for Solving Eigenvalue Problemsijceronline
 
House price prediction
House price predictionHouse price prediction
House price predictionSabahBegum
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural NetworkDessy Amirudin
 
Feed forward neural network for sine
Feed forward neural network for sineFeed forward neural network for sine
Feed forward neural network for sineijcsa
 

What's hot (20)

Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-LearningMeta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
 
Basic Learning Algorithms of ANN
Basic Learning Algorithms of ANNBasic Learning Algorithms of ANN
Basic Learning Algorithms of ANN
 
08 neural networks
08 neural networks08 neural networks
08 neural networks
 
Text cnn on acme ugc moderation
Text cnn on acme ugc moderationText cnn on acme ugc moderation
Text cnn on acme ugc moderation
 
Ml10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsMl10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topics
 
IRJET- Image Classification – Cat and Dog Images
IRJET- Image Classification – Cat and Dog ImagesIRJET- Image Classification – Cat and Dog Images
IRJET- Image Classification – Cat and Dog Images
 
Face Recognition: From Scratch To Hatch
Face Recognition: From Scratch To HatchFace Recognition: From Scratch To Hatch
Face Recognition: From Scratch To Hatch
 
Adaline madaline
Adaline madalineAdaline madaline
Adaline madaline
 
IEEE
IEEEIEEE
IEEE
 
Predicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine LearningPredicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine Learning
 
Neural Networks
Neural NetworksNeural Networks
Neural Networks
 
NEURAL NETWORK Widrow-Hoff Learning Adaline Hagan LMS
NEURAL NETWORK Widrow-Hoff Learning Adaline Hagan LMSNEURAL NETWORK Widrow-Hoff Learning Adaline Hagan LMS
NEURAL NETWORK Widrow-Hoff Learning Adaline Hagan LMS
 
Deep learning summary
Deep learning summaryDeep learning summary
Deep learning summary
 
sentiment analysis using support vector machine
sentiment analysis using support vector machinesentiment analysis using support vector machine
sentiment analysis using support vector machine
 
IRJET- Mango Classification using Convolutional Neural Networks
IRJET- Mango Classification using Convolutional Neural NetworksIRJET- Mango Classification using Convolutional Neural Networks
IRJET- Mango Classification using Convolutional Neural Networks
 
Iterative Determinant Method for Solving Eigenvalue Problems
Iterative Determinant Method for Solving Eigenvalue ProblemsIterative Determinant Method for Solving Eigenvalue Problems
Iterative Determinant Method for Solving Eigenvalue Problems
 
House price prediction
House price predictionHouse price prediction
House price prediction
 
Lectura seis
Lectura seisLectura seis
Lectura seis
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
 
Feed forward neural network for sine
Feed forward neural network for sineFeed forward neural network for sine
Feed forward neural network for sine
 

Similar to House Price Estimation as a Function Fitting Problem with using ANN Approach

Final Report
Final ReportFinal Report
Final ReportAman Soni
 
IRJET- American Sign Language Classification
IRJET- American Sign Language ClassificationIRJET- American Sign Language Classification
IRJET- American Sign Language ClassificationIRJET Journal
 
Getting started with Machine Learning
Getting started with Machine LearningGetting started with Machine Learning
Getting started with Machine LearningGaurav Bhalotia
 
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONGENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONijaia
 
Business Market Research on Instant Messaging -2013
Business Market Research on Instant Messaging -2013Business Market Research on Instant Messaging -2013
Business Market Research on Instant Messaging -2013Rajib Layek
 
Human Activity Recognition Using AccelerometerData
Human Activity Recognition Using AccelerometerDataHuman Activity Recognition Using AccelerometerData
Human Activity Recognition Using AccelerometerDataIRJET Journal
 
Higgs Boson Challenge
Higgs Boson ChallengeHiggs Boson Challenge
Higgs Boson ChallengeRaouf KESKES
 
Comparative Study of Pre-Trained Neural Network Models in Detection of Glaucoma
Comparative Study of Pre-Trained Neural Network Models in Detection of GlaucomaComparative Study of Pre-Trained Neural Network Models in Detection of Glaucoma
Comparative Study of Pre-Trained Neural Network Models in Detection of GlaucomaIRJET Journal
 
IRJET- Automatic Detection of Characteristics of Clothing using Image Process...
IRJET- Automatic Detection of Characteristics of Clothing using Image Process...IRJET- Automatic Detection of Characteristics of Clothing using Image Process...
IRJET- Automatic Detection of Characteristics of Clothing using Image Process...IRJET Journal
 
Caravan insurance data mining prediction models
Caravan insurance data mining prediction modelsCaravan insurance data mining prediction models
Caravan insurance data mining prediction modelsMuthu Kumaar Thangavelu
 
Caravan insurance data mining prediction models
Caravan insurance data mining prediction modelsCaravan insurance data mining prediction models
Caravan insurance data mining prediction modelsMuthu Kumaar Thangavelu
 
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
 Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D... Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...Databricks
 
AIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONAIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONIRJET Journal
 

Similar to House Price Estimation as a Function Fitting Problem with using ANN Approach (20)

cnn ppt.pptx
cnn ppt.pptxcnn ppt.pptx
cnn ppt.pptx
 
Final Report
Final ReportFinal Report
Final Report
 
IRJET- American Sign Language Classification
IRJET- American Sign Language ClassificationIRJET- American Sign Language Classification
IRJET- American Sign Language Classification
 
Getting started with Machine Learning
Getting started with Machine LearningGetting started with Machine Learning
Getting started with Machine Learning
 
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONGENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
 
Business Market Research on Instant Messaging -2013
Business Market Research on Instant Messaging -2013Business Market Research on Instant Messaging -2013
Business Market Research on Instant Messaging -2013
 
ANN - UNIT 3.pptx
ANN - UNIT 3.pptxANN - UNIT 3.pptx
ANN - UNIT 3.pptx
 
ANN - UNIT 3.pptx
ANN - UNIT 3.pptxANN - UNIT 3.pptx
ANN - UNIT 3.pptx
 
Human Activity Recognition Using AccelerometerData
Human Activity Recognition Using AccelerometerDataHuman Activity Recognition Using AccelerometerData
Human Activity Recognition Using AccelerometerData
 
Deep learning-practical
Deep learning-practicalDeep learning-practical
Deep learning-practical
 
Artificial Neural Networks , Recurrent networks , Perceptron's
Artificial Neural Networks , Recurrent networks , Perceptron'sArtificial Neural Networks , Recurrent networks , Perceptron's
Artificial Neural Networks , Recurrent networks , Perceptron's
 
Higgs Boson Challenge
Higgs Boson ChallengeHiggs Boson Challenge
Higgs Boson Challenge
 
Comparative Study of Pre-Trained Neural Network Models in Detection of Glaucoma
Comparative Study of Pre-Trained Neural Network Models in Detection of GlaucomaComparative Study of Pre-Trained Neural Network Models in Detection of Glaucoma
Comparative Study of Pre-Trained Neural Network Models in Detection of Glaucoma
 
IRJET- Automatic Detection of Characteristics of Clothing using Image Process...
IRJET- Automatic Detection of Characteristics of Clothing using Image Process...IRJET- Automatic Detection of Characteristics of Clothing using Image Process...
IRJET- Automatic Detection of Characteristics of Clothing using Image Process...
 
Caravan insurance data mining prediction models
Caravan insurance data mining prediction modelsCaravan insurance data mining prediction models
Caravan insurance data mining prediction models
 
Caravan insurance data mining prediction models
Caravan insurance data mining prediction modelsCaravan insurance data mining prediction models
Caravan insurance data mining prediction models
 
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
 Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D... Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
 
AIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONAIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTION
 
N ns 1
N ns 1N ns 1
N ns 1
 
Telecom Churn Analysis
Telecom Churn AnalysisTelecom Churn Analysis
Telecom Churn Analysis
 

House Price Estimation as a Function Fitting Problem with using ANN Approach

  • 1. Machine Learning and Pattern Recognition Term Project Lecturer: PhD. Asst. Prof. Cemal Okan ŞAKAR Student: Yusuf Ziya UZUN Lesson: CMP5130 - Machine Learning and Pattern Recognition (Fall 2014) Subject: House Price Estimation as a Function Fitting Problem with using ANN Approach Introduction As we implemented how to use cross validation and k-Fold approaches in our datasets in first homework, my aim here is how to use these in different machine learning algorithms, rather than how to implement it. So in this project I want to introduce you Neural Network Toolbox that Matlab provides us. For this purpose I picked Artificial Neural Network (ANN) algorithm with UCI Housing dataset. As a brief introduction, I would like to give some links below here: - Neural Network Toolbox for Matlab: http://www.mathworks.com/products/neural-network/ - UCI Housing Dataset: https://archive.ics.uci.edu/ml/datasets/Housing After giving a small explanation about tool I want to explain our dataset and I want to make some experiments over dataset with using Neural Net Toolset components. I am happy to say that we will be able to visualize many things in our solutions. So basically our way to go is like this: 1- Dataset Information 2- Toolbox Information 3- Experiments 4- Comparisons
  • 2. 5- Conclusion Also I want to mention that my main aim is giving a basic idea about NNT and using it with ANN algorithm, so this term project only covers some topics that we are going to use for our Housing dataset. I will code in Matlab and give some details about that code also. 1- Dataset Information We have data of housing values with 13 properties that gives us different information about houses and their environments. First 13 attribute are our input data (features) and 14th data will be our target data (output). We have 506 items and have no missing values. 1. Per capita crime rate per town 2. Proportion of residential land zoned for lots over 25,000 sq. ft. 3. proportion of non-retail business acres per town 4. 1 if tract bounds Charles river, 0 otherwise 5. Nitric oxides concentration (parts per 10 million) 6. Average number of rooms per dwelling 7. Proportion of owner-occupied units built prior to 1940 8. Weighted distances to five Boston employment centres 9. Index of accessibility to radial highways 10. Full-value property-tax rate per $10,000 11. Pupil-teacher ratio by town 12. 1000(Bk - 0.63)^2 13. Percent lower status of the population 14. Median value of owner-occupied homes in $1000's (output) As you can see we have a regression problem and some different kind of input parameters. As a result we need to estimate housing median price for given inputs. Neural networks are basically good fit for non-linear problems. Because you can use some number of neurons to make a better fit in given non-linear problem set.
  • 3. 2- Toolbox Information
First we open the NNT window by writing this command:
nnstart
After you run this command, a start window should appear. We are going to use the fitting app, which I highlighted in yellow. You can also use the nftool command to open it directly.
  • 4. Here we select our dataset. We need to present each data item as a column and each property as a row, as shown in step 5. In the next step we select our cross validation set sizes:
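The same orientation can be checked from the command line. The sketch below assumes the copy of the housing data that ships with the toolbox (house_dataset) rather than a file downloaded from UCI; myTable in the last comment is a hypothetical variable name used only for illustration.

```matlab
% Load the copy of the housing data that ships with the toolbox.
[x, t] = house_dataset;

% Each column is one house (data item); each row is one property.
fprintf('inputs:  %d rows x %d columns\n', size(x, 1), size(x, 2));
fprintf('targets: %d rows x %d columns\n', size(t, 1), size(t, 2));

% If your own data has one house per row, transpose it first, e.g.:
% x = myTable';   % myTable is a hypothetical 506-by-13 matrix
```

With this layout the fitting app should be told that samples are matrix columns.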
  • 5. Here the training set size changes as you change the validation and test set sizes. For this introduction I leave them at their defaults. Next we define our network architecture:
As shown above, we have two layers: one is the hidden layer, and the other is a linear regression layer which gives us the output value (predicted value).
After that we can choose the training algorithm and train on our divided dataset. We can also train multiple times with randomly picked cross validation sets, so each training run will give different results.
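For readers who prefer the command line, the steps above can be sketched roughly as follows. This is not the code the fitting app generates, only a minimal equivalent under the assumption that the toolbox defaults are left untouched; the printed values will differ from run to run because the data division is random.

```matlab
[x, t] = house_dataset;

% Two-layer fitting network: 10 hidden neurons and a linear output layer.
net = fitnet(10);

% Random division into training, validation and test sets.
net.divideFcn = 'dividerand';
disp(net.divideParam);          % prints the ratios currently in use

% Train, then read the MSE of each subset at the best validation epoch.
[net, tr] = train(net, x, t);
fprintf('Training MSE:   %.4f\n', tr.best_perf);
fprintf('Validation MSE: %.4f\n', tr.best_vperf);
fprintf('Test MSE:       %.4f\n', tr.best_tperf);
```

Running the script several times gives a feel for how much the results depend on the random division.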
  • 6. After you click the Train button, you will see a window with a summary and results, as below: Here we can see useful information about our tests, such as the performance, training state, error histogram and regression plots.
  • 7. There is also some information about error and correlation in the main window: Performance Plot Example:
  • 8. Training State Plot: Error Histogram Plot:
  • 9. Regression Plot: In the next steps we can test our network again if we think it is not a good fit. We can go back to the previous steps and change the data dimensions or increase/decrease our network size.
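The four plots listed above can also be produced without the app. The following is a small sketch that assumes a default fitnet network trained on house_dataset; the variable names are illustrative.

```matlab
[x, t] = house_dataset;
net = fitnet(10);
[net, tr] = train(net, x, t);

y = net(x);     % network predictions for all samples
e = t - y;      % prediction errors

figure; plotperform(tr);        % MSE per epoch for the three subsets
figure; plottrainstate(tr);     % training state values per epoch
figure; ploterrhist(e);         % histogram of the errors
figure; plotregression(t, y);   % targets against outputs, with R value
```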
  • 10. After that step we can also see our algorithm visually as a Simulink diagram. For this, click the Simulink diagram button in the Deploy Solution window. A Simulink window should now appear and you should see this basic visualization of our network:
  • 11. Now click the down arrow in the Function Fitting Neural Network box, which I highlighted in yellow. You will see this: When you double-click Layer 1 you will get more information:
  • 12. Here is how our hidden layer looks visually:
  • 13. And our sigmoid transfer function: Here is Layer 2 visually: Simulink also gives us the chance to debug our implementation step by step and lets us watch a simulation of our algorithm. Here, I only introduced the visually generated implementation of our neural network algorithm. Now let us start with some experiments and use some of these tools to get better outputs.
  • 14. 3- Experiments
Here we will inspect how changing the cross validation set sizes, the training algorithm and the number of hidden layer neurons affects our accuracy. Then we will use these results for our comparisons. Finally we will try to draw some conclusions from our outputs. We use randomly divided data sets in each experiment; the reason for this is explained in the conclusion section.
3.1- Changing Training Size, Validation Size, Test Size
Now, let's change our data set sizes by percentage and get some results. In these experiments we keep the training algorithm and the number of neurons the same. Our training algorithm is fixed to Levenberg-Marquardt and the number of neurons is 10, the default.
3.1.1- Training Size: 50%, Validation Size: 25%, Test Size: 25%
- Training Performance: 13.1571
- Validation Performance: 16.9545
- Test Performance: 34.1554
  • 18. 3.1.2- Training Size: 60%, Validation Size: 20%, Test Size: 20%
- Training Performance: 4.887432
- Validation Performance: 24.091745
- Test Performance: 27.686587
  • 21. 3.1.3- Training Size: 80%, Validation Size: 10%, Test Size: 10%
- Training Performance: 5.254130
- Validation Performance: 6.768036
- Test Performance: 19.687765
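The three divisions of section 3.1 could be scripted as below. This is a sketch under the stated settings (Levenberg-Marquardt, 10 hidden neurons, random division), not the procedure used to obtain the numbers above, which came from the fitting app; because the division is random, the printed values will not match them exactly.

```matlab
[x, t] = house_dataset;

% One row per experiment: training, validation and test ratios.
splits = [0.50 0.25 0.25;
          0.60 0.20 0.20;
          0.80 0.10 0.10];

for k = 1:size(splits, 1)
    net = fitnet(10, 'trainlm');
    net.divideFcn = 'dividerand';
    net.divideParam.trainRatio = splits(k, 1);
    net.divideParam.valRatio   = splits(k, 2);
    net.divideParam.testRatio  = splits(k, 3);
    net.trainParam.showWindow  = false;    % no training window in the loop

    [net, tr] = train(net, x, t);
    fprintf('%2.0f-%2.0f-%2.0f  train %.4f  val %.4f  test %.4f\n', ...
        100 * splits(k, :), tr.best_perf, tr.best_vperf, tr.best_tperf);
end
```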
  • 24. 3.2- Changing Training Algorithm
After trying different data sizes, we will change the training function to see its effect on our dataset. In this case we need to keep the data sizes constant to compare the functions. So, let us take 80% for training, 10% for validation and 10% for test, and set the number of neurons to 10.
3.2.1- Levenberg-Marquardt (trainlm) Function
We already tried this method as the default training function in the data size comparisons, so we intentionally skip it here. Here are the performance results as a reminder:
- Training Performance: 5.254130
- Validation Performance: 6.768036
- Test Performance: 19.687765
3.2.2- Scaled conjugate gradient back propagation (trainscg) Function
- Training Performance: 21.853845
- Validation Performance: 50.703653
- Test Performance: 20.041648
  • 27. 3.2.3- Adaptive Gradient Descent with Momentum back propagation (traingdx) Function
- Training Performance: 20.875673
- Validation Performance: 19.159098
- Test Performance: 10.541039
  • 31. 3.3- Changing Hidden Layer Neuron Size
After experimenting with data set sizes and training functions, we are going to change the number of neurons in our hidden layer, so that we can see the relation between accuracy and the number of neurons. Let's again pick 80% for training, 10% for validation and 10% for test, and let's pick the Adaptive Gradient Descent with Momentum back propagation (traingdx) function as the training algorithm, with its default parameters.
3.3.1- Selecting 5 Neurons
- Training Performance: 59.678844
- Validation Performance: 50.404263
- Test Performance: 100.167308
  • 35. 3.3.2- Selecting 10 Neurons
We already ran this experiment in section 3.2.3 with the same parameters, so it is intentionally not repeated. Here are the same performance results as a reminder:
- Training Performance: 20.875673
- Validation Performance: 19.159098
- Test Performance: 10.541039
3.3.3- Selecting 15 Neurons
- Training Performance: 13.784050
- Validation Performance: 29.685621
- Test Performance: 16.404043
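The neuron size experiments of section 3.3 can be sketched in the same way from the command line. The script assumes traingdx with its default parameters and an 80-10-10 random division; the results matrix is an illustrative variable, and its values will vary between runs.

```matlab
[x, t] = house_dataset;

hiddenSizes = [5 10 15];
results = zeros(numel(hiddenSizes), 3);   % columns: train, validation, test MSE

for k = 1:numel(hiddenSizes)
    net = fitnet(hiddenSizes(k), 'traingdx');
    net.divideFcn = 'dividerand';
    net.divideParam.trainRatio = 0.80;
    net.divideParam.valRatio   = 0.10;
    net.divideParam.testRatio  = 0.10;
    net.trainParam.showWindow  = false;   % no training window in the loop

    [net, tr] = train(net, x, t);
    results(k, :) = [tr.best_perf, tr.best_vperf, tr.best_tperf];
end

% One row per network size: neurons, train MSE, validation MSE, test MSE.
disp([hiddenSizes', results]);
```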
  • 39. 4- Comparisons
4.1- Comparison of Data Sizes
As you can see in the table below, as the training set grows, our total performance value decreases. Performance is at its best when it reaches zero, so performance improves as its value decreases towards zero. As we know from cross validation techniques, more training data makes our algorithm more accurate, but it may also cause overfitting problems. Therefore, we should keep the validation set at a sufficient proportion. For performance calculations we always used the Mean Squared Error (MSE) as the cost function:
$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(t_i - y_i)^2$, where $t_i$ is the target value and $y_i$ is the network output for sample $i$.
In the first data set division our training and validation performances are acceptable, but the test set performance is much worse. That is because we could not give the network a large enough proportion of the data to train on. In the second row the training set performance improves considerably, but the validation and test performances are still far from the training accuracy, so test accuracy is still a problem. In the third row the training and validation performances are very close to each other and the test set performance also improves well. Looking at the training and validation sets, we can say that our algorithm learned well with these proportions of data, so the effect on the test set is clearly positive.
We can also check the regression plots of each data division and see how well it fits the target. The best fit is with the 80-10-10 data division.
Data Sets / Performance    Training     Validation    Test         Total Perf.
50 – 25 – 25               13.1571      16.9545       34.1554      19.356025
60 – 20 – 20               4.887432     24.091745     27.686587    13.2881256
80 – 10 – 10               5.254130     6.768036      19.687765    6.8489
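The text does not say how the Total Perf. column is computed. The values in the table are consistent with an average of the three MSE values weighted by the size of each subset, so the sketch below works under that assumption. It also shows the MSE itself on three made-up target/output pairs, which are purely illustrative and not taken from the dataset.

```matlab
% Subset ratios and MSE values copied from the table above.
ratios = [0.50 0.25 0.25;
          0.60 0.20 0.20;
          0.80 0.10 0.10];
perf   = [13.1571   16.9545   34.1554;
           4.887432 24.091745 27.686587;
           5.254130  6.768036 19.687765];

% Weighted average per row (assumed definition of "Total Perf.").
total = sum(ratios .* perf, 2);
fprintf('%.7f\n', total);

% MSE on a toy example: every prediction is off by exactly 1.
tToy = [24.0 21.6 34.7];     % made-up targets
yToy = [25.0 20.6 33.7];     % made-up network outputs
mseToy = mean((tToy - yToy).^2);
fprintf('toy MSE: %.4f\n', mseToy);
```

The first three printed numbers reproduce the Total Perf. column up to rounding, which supports the weighted-average reading.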
  • 40. 4.2- Comparison of Training Functions
The Levenberg-Marquardt algorithm (LMA) interpolates between the Gauss-Newton algorithm and the method of Gradient Descent (GD). Generally LMA is much faster than GD, because it needs far fewer iterations to converge. In our experiment, however, LMA achieved lower predictive performance on the test set than the adaptive GD algorithm. An interesting observation is that the lower training MSE of LMA does not result in better test set predictions compared with Adaptive GD.
The gradient descent algorithm converges slowly by design. For this reason we used it with momentum, which reduces the risk of getting stuck in a local minimum and converges faster, with less zig-zag in the cost function. The traingdx function also uses an adaptive learning rate, so the step size is adjusted during training.
In the table below, LMA is by far the best performing algorithm on the training and validation sets. But its test performance is worse than that of the Adaptive GD with momentum method. So LMA looks better in total but predicts the test set less precisely than the GD method. In other words, the GD model generalized better to unseen data in this run and got better accuracy on the test dataset.
On the other hand, we see that the Scaled Conjugate Gradient (SCG) method does not have good performance results. This is because our validation checks are the same for each algorithm. I intentionally left the number of failed iterations in a row at 6, the default. You can also try bigger numbers by setting this value (net.trainParam.max_fail), and you will see that it performs well.
Function / Performance    Training      Validation    Test         Total Perf.
trainlm                   5.254130      6.768036      19.687765    6.8489
trainscg                  21.853845     50.703653     20.041648    24.5576
traingdx                  20.875673     19.159098     10.541039    19.6706
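As a rough sketch of the suggestion about net.trainParam.max_fail, the script below trains an SCG network with a larger number of allowed validation failures. The value 20 is only an illustrative choice and was not used in the experiments above, so no particular result should be expected from it.

```matlab
[x, t] = house_dataset;

net = fitnet(10, 'trainscg');
net.divideFcn = 'dividerand';
net.divideParam.trainRatio = 0.80;
net.divideParam.valRatio   = 0.10;
net.divideParam.testRatio  = 0.10;

% Show the current limit, then allow more failed validation checks in a row.
fprintf('max_fail before: %d\n', net.trainParam.max_fail);
net.trainParam.max_fail = 20;    % illustrative value, not from the report

[net, tr] = train(net, x, t);
fprintf('Stop reason: %s\n', tr.stop);
fprintf('train %.4f  val %.4f  test %.4f\n', ...
    tr.best_perf, tr.best_vperf, tr.best_tperf);
```

Comparing the stop reason and the three MSE values with a run at the default setting shows whether early stopping was cutting SCG training short.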
  • 41. 4.3- Comparison of Hidden Layer Neuron Size
Finding a good number of hidden layer neurons is one of the important problems in ANN design. A small number of neurons might give you faster results but poor accuracy. On the other hand, increasing the number of neurons can give you better accuracy at the cost of more time and space complexity. A small number of neurons can cause underfitting, but more neurons than necessary can cause overfitting. So we have to find an optimal number of neurons for the hidden layer, and while deciding on it we have to balance this tradeoff carefully.
As we can see in the performance table, 5 neurons are too few for this dataset. The training and validation performances are quite bad compared to the other neuron sizes. Clearly, not enough information can pass through the network: our Adaptive GD algorithm tries to minimize the cost function, but the few weight parameters available cannot carry enough information. This causes an underfitting problem.
When we compare the performances of 10 and 15 neurons, we see that 15 neurons give better results on the training set, but not on the test set. We also see that the training and validation performances for 15 neurons are very different. The reason is that too many weight parameters cause overtraining. We have 13 input parameters in our dataset but we defined 15 neurons in the hidden layer.
Neuron Size / Performance    Training      Validation    Test          Total Perf.
5 Neurons                    59.678844     50.404263     100.167308    62.8002
10 Neurons                   20.875673     19.159098     10.541039     19.6706
15 Neurons                   13.784050     29.685621     16.404043     15.6362
  • 42. 5- Conclusion
As we have seen in the visualized plots for different parameters and algorithms, there is no single best choice. We can also say that different dataset divisions may (rarely) cause totally different results. Of course, we could change a lot of parameters in these algorithms and try to cross validate all of them, but for the sake of this project we analyzed most of them only with default parameters. As you know, the experiments and results of a project are tightly coupled to the given dataset, so we should simply remember the no free lunch theorem.
In most neural network function fitting problems we have to resolve some cross validation problems to get good fitting results, whether the dataset is simple or complex. Some of them are:
- Hidden layer neuron size
- A good number of iterations for preventing underfitting and overfitting
- Time, space and accuracy tradeoff
- Algorithm based predetermined values (learning rate, bias values, etc.)
For our house price estimation dataset we tried different data sizes, neuron sizes and algorithms. Instead of taking one predefined data division, we used different randomly divided data divisions for every experiment. So we are able to draw separate conclusions from each experiment, and we saw that the random division does not affect the predicted results much. We tried to find the best fit for our target results.
In our experiments we observed that the algorithms have different accuracies on the training, validation and test sets. More training data gives us more accuracy, but the data also needs to be divided in good proportions; otherwise it will lead to an overtraining problem.
Another important criterion is the hidden layer's neuron size. Too few neurons clearly cause a dramatic decrease in performance, which means that the neurons cannot carry enough information to generate better results.
From the training functions' point of view, we can say that one performs better at one thing and another at something else. For example, Adaptive GD is good at test set performance but very slow compared to LMA. There are some tradeoffs (like time versus accuracy) in the choice of algorithm.
  • 43. For this dataset, I would go with 10 hidden layer neurons, an 80% training set, a 10% validation set, a 10% test set, and the Adaptive GD with momentum method. Because the dataset is not so big, I would pick accuracy over speed.
Of course, it would be very good to run other experiments with different parameters, or to try some dimensionality reduction methods (like PCA, LCA or some feature selection) before applying the ANN. We could also have tried different resampling methodologies like leave-one-out or bootstrapping. All of these would give us very useful information. For now, we are aware of these methods, but unfortunately were not able to apply them.
References:
- https://archive.ics.uci.edu/ml/datasets/Housing
- http://www.mathworks.com/help/nnet/examples/house-price-estimation.html
- http://www.mathworks.com/help/nnet/gs/fit-data-with-a-neural-network.html
- http://www.mathworks.com/help/nnet/ug/choose-a-multilayer-neural-network-training-function.html
- http://www.mathworks.com/help/nnet/ref/traingdx.html
- http://www.mathworks.com/help/nnet/ref/trainlm.html
- http://www.mathworks.com/help/nnet/ref/trainscg.html
- http://radio.feld.cvut.cz/matlab/toolbox/nnet/trainlm.html
- http://radio.feld.cvut.cz/matlab/toolbox/nnet/traingdx.html
- http://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm
- http://en.wikipedia.org/wiki/Gradient_descent
- http://alumni.cs.ucr.edu/~vladimir/cs171/nn_summary.pdf
- http://aix1.uottawa.ca/~isoltani/ANN.pdf