Predicting rainfall using an ensemble of ensembles∗†
Prolok Sundaresan, Varad Meru, and Prateek Jain‡
University of California, Irvine
{sunderap,vmeru,prateekj}@uci.edu
Abstract
Regression is an approach for modeling the relationship between data X
and a dependent variable y. In this report, we present our experiments
with multiple approaches, ranging from ensembles of learners to deep
learning networks, on weather-modeling data to predict rainfall.
The competition was held on the online data science competition portal
‘Kaggle’. Our weighted ensemble of learners gave us a top-10
ranking, with a testing root-mean-squared error of 0.5878.
1 Introduction
The task of this in-class Kaggle competition was to predict the amount of rainfall
at a particular location using satellite data. We wanted to try various algorithms
and ensembles for regression in order to experiment and learn. The report is
structured in the following manner. Section 2 describes the dataset contents and
the latent structure found using latent-variable analysis and clustering; this was
done by Prolok and Prateek. Section 3 describes the various models used in the
project in detail: the neural network/deep learning work was done by Varad,
random forests by Prolok and Prateek, and gradient boosting by Prateek and
Varad. Section 4 describes the ensemble-of-ensembles technique we used; this
ensemble sits on top of the different ensembles and learners described in
Section 3, and the work on the final ensemble was done by all three members.
Section 5 presents our learnings and conclusion.
2 Understanding The Data
Visualizing the data was a difficult task, since the data had 91 dimensions.
To look for patterns in the data and visualize it, we applied SVD to reduce
the dimensionality of the features to the first 2 principal dimensions. We
then applied k-means clustering with k=5 on the full 91-dimensional data
and plotted the assignments in the 2-dimensional transformed feature space.
We saw patterns in the data: some points were densely clustered, and some
were sparse.
To visualize the data better, we transformed the features into 3-dimensional
space using the first 3 principal components, and saw that the points were
clustered around 3 planes.

∗The online competition is available at the Kaggle website https://inclass.kaggle.com/c/how-s-the-weather. The name of the team was skynet.
†This work was done as part of the project for CS 273: Machine Learning, Fall 2014, taught by Prof. Alexander Ihler.
‡Prolok Sundaresan: Student# 66008474, Varad Meru: Student# 26648958, Prateek Jain: Student# 28321844
Figure 1: Visualizing the data in 3 dimensions
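To make this step concrete, here is a minimal sketch of the reduction-and-clustering analysis, assuming NumPy, scikit-learn, and matplotlib as the tooling (the report does not name the libraries used for this step, and the data-loading file name below is a placeholder):

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans

    # X: (n_samples, 91) feature matrix; the file name is hypothetical.
    X = np.loadtxt('rain_features_train.csv', delimiter=',')

    # SVD of the centered data yields the principal directions.
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    X2d = Xc @ Vt[:2].T  # projection onto the first 2 principal components
    X3d = Xc @ Vt[:3].T  # first 3 components for the 3-D view

    # k-means with k=5 on the full 91-dimensional data, as in the report.
    labels = KMeans(n_clusters=5, n_init=10).fit_predict(X)

    # Plot the 91-D cluster assignments in the 2-D projected space.
    plt.scatter(X2d[:, 0], X2d[:, 1], c=labels, s=4)
    plt.xlabel('PC 1'); plt.ylabel('PC 2')
    plt.show()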
3 Machine Learning Models
3.1 Mixture of Experts
As seen from our visualization in Figure 1, we could identify two highly dense
areas of the feature data on either side of a region of sparsely distributed data.
The idea behind using the mixture-of-experts approach was that, intuitively, it
would be difficult for a single regressor to fit the dataset, since the distribution
is non-uniform. We decided to split the data into clusters. To cluster the data,
we used several initializations of the k-means algorithm with kmeans++ seeding.
We treated the number of clusters as one of the model parameters to tune.
Since each cluster received only a subset of the points from the original dataset,
the number of data points per cluster was not large. Our concern was that any
model we chose would overfit the data in its cluster. Therefore, we used the
ensemble method of gradient boosting for each of the clusters. Since gradient
boosting starts with an underfitting model and then gradually adds complexity,
the chances of overfitting are lower with this model. We decided to use decision
stumps as our regressors for the boosting algorithm.

(a) Cluster assignments of Data Points
(b) Mixture of Experts Error
Figure 2: Visualizing the principal components of Data
For evaluating predictions on the validation split and the test data, we first
check which cluster a data point belongs to. We did this by creating a
k-nearest-neighbor classifier on the centers of the 3 clusters created in the
previous step. The classifier then predicts the cluster assignment for each test
point, and we apply the boosting regressor corresponding to that cluster to the
data point to get its prediction.
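A minimal sketch of this mixture-of-experts pipeline, assuming scikit-learn as the implementation (the report does not name its toolkit) and hypothetical X_train, y_train, and X_test arrays:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.neighbors import KNeighborsClassifier

    n_clusters = 3
    km = KMeans(n_clusters=n_clusters, init='k-means++', n_init=10).fit(X_train)

    # One boosted ensemble of decision stumps per cluster.
    experts = []
    for c in range(n_clusters):
        mask = km.labels_ == c
        expert = GradientBoostingRegressor(n_estimators=700, max_depth=1)  # stumps
        experts.append(expert.fit(X_train[mask], y_train[mask]))

    # Route each test point to a cluster via 1-NN on the cluster centers,
    # then use that cluster's expert for the prediction.
    router = KNeighborsClassifier(n_neighbors=1)
    router.fit(km.cluster_centers_, np.arange(n_clusters))
    assignment = router.predict(X_test)
    y_pred = np.array([experts[a].predict(x.reshape(1, -1))[0]
                       for a, x in zip(assignment, X_test)])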
The parameters of the model we varied were the number of clusters and the
number of regressors used for boosting. We found that although the training
error reduced considerably on increasing the number of boosting regressors, the
validation error increased after a certain point, as can be seen in Figure 2(b).
We got the minimum validation error with 700 regressors.
3.2 Neural Networks
We implemented various types of neural networks, ranging from single-layer
networks to 3-layer sigmoidal neural networks.
Single Layer Network
Figure 3: Single Layer Architecture.
We built the neural networks using MATLAB’s Neural Network Toolbox and the
PyBrain library in Python. For the MATLAB implementation, various runs were
made with different numbers of neurons in the hidden layer. The architecture of
the neural network can be seen in Figure 3, and Figure 4 shows the
train-test-validation plots for the different network architectures.
The dataset was split into 70% (training), 20% (validation), and 10%
(testing) sections for the neural network runs. Table 1 shows the
performance of the models learned. It was seen that the neural networks started
to overfit as the number of neurons was increased beyond 40.
# of Neurons Training Error (RMSE) Testing Error (RMSE)
10 0.5986 0.61341
20 0.5875 0.61301
50 0.5852 0.62889
Table 1: RMSE Error rates for different network architectures.
It was observed that the learner could not learn very accurately, as there was
not enough data for the neural network to learn from.
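For reference, a sketch of the PyBrain side of this setup, with hypothetical X_train, y_train, and X_test arrays (the MATLAB runs used the toolbox’s own training workflow):

    from pybrain.datasets import SupervisedDataSet
    from pybrain.tools.shortcuts import buildNetwork
    from pybrain.supervised.trainers import BackpropTrainer

    # Dataset with 91 inputs and 1 regression target.
    ds = SupervisedDataSet(91, 1)
    for xi, yi in zip(X_train, y_train):
        ds.addSample(xi, (yi,))

    # 91 -> 20 -> 1 network; the hidden layer is sigmoidal by default.
    net = buildNetwork(91, 20, 1, bias=True)

    # Backprop with an internal validation split to stop before overfitting.
    trainer = BackpropTrainer(net, ds)
    trainer.trainUntilConvergence(validationProportion=0.25, maxEpochs=200)

    y_pred = [net.activate(xi)[0] for xi in X_test]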
Figure 4: Train-Validation-Test error plots and error-distribution histograms for
hidden layers of 10 neurons (a, b), 20 neurons (c, d), and 50 neurons (e, f).
Deep Networks
For this project, we tried using deep networks as well. The deep network was
made using PyBrain. We tried different activation functions and architectures
to understand how deep networks would perform. The architecture, sketched
below, had 3 hidden layers: the visible layer contained 91 neurons, the first
hidden layer (tanh) had 91 neurons, the second hidden layer (sigmoid) had 50
neurons, the third hidden layer (sigmoid) had 20 neurons, and the output layer
had 1 linear node. The testing error of 0.83643 was very high compared to the
other approaches. We concluded that the network was learning the training data
well, but was overfitting.
[Architecture diagram: input layer → hidden layer (hyperbolic tangent) → hidden layers (sigmoid) → output layer]
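PyBrain’s buildNetwork shortcut only supports one hidden-layer type, so mixing tanh and sigmoid layers as described above means assembling the network by hand. A sketch of that construction (training then proceeds with a BackpropTrainer, as in the single-layer case):

    from pybrain.structure import (FeedForwardNetwork, LinearLayer,
                                   TanhLayer, SigmoidLayer, FullConnection)

    net = FeedForwardNetwork()
    net.addInputModule(LinearLayer(91, name='in'))   # visible layer
    net.addModule(TanhLayer(91, name='h1'))          # first hidden layer (tanh)
    net.addModule(SigmoidLayer(50, name='h2'))       # second hidden layer (sigmoid)
    net.addModule(SigmoidLayer(20, name='h3'))       # third hidden layer (sigmoid)
    net.addOutputModule(LinearLayer(1, name='out'))  # single linear output node

    for src, dst in [('in', 'h1'), ('h1', 'h2'), ('h2', 'h3'), ('h3', 'out')]:
        net.addConnection(FullConnection(net[src], net[dst]))
    net.sortModules()  # finalize the topology before training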
3.3 Gradient Boosting
In parallel, we worked on training a gradient boosting model with varying
parameters to get the best fit for the data. We started with basic decision
stumps, with the number of regressors ranging from 1 to 2000. We also varied
the maximum depth of the decision tree used as the regression model from 3 to 7,
and we used an alpha of 0.9 for our algorithm. We observed the best performance
with 2000 boosters and a depth of 7.
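In scikit-learn terms (an assumption on our part; the report does not name the library, and we read its alpha of 0.9 as the boosting learning rate), the best configuration looks like:

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    # 2000 boosting rounds over depth-7 regression trees; learning_rate=0.9
    # is our reading of the report's "alpha".
    gbr = GradientBoostingRegressor(n_estimators=2000, max_depth=7,
                                    learning_rate=0.9)
    gbr.fit(X_train, y_train)  # hypothetical training arrays

    # Test RMSE after each boosting round, to reproduce a curve like Figure 5.
    test_rmse = [np.sqrt(np.mean((y_test - p) ** 2))
                 for p in gbr.staged_predict(X_test)]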
3.4 Random Forests
Several aspects of the Random Forest technique were explored. The fundamental
idea behind a Random Forest is to take a model that overfits the data, and then
use feature and data bagging to bring the complexity down so that it fits the
data better. The usual model used in a Random Forest is a high-depth regression
tree. We tried to explore other models that overfit the data.

Figure 5: Train and Test error plot for Gradient Boosting vs. number of learners

The first option was to consider simple linear regression with a feature
transformation. The data X was transformed into [X, X²] features, and linear
regression was done on that. Significantly better results were obtained with
this transformation (the test error improved from 0.4322 to 0.4181), but the
results significantly worsened when X³ features were added to the feature list.
This regressor was used inside the Random Forest, but the results were better
with a tree regressor. The major takeaway from this analysis was the inclusion
of the X² features in the feature list for tree regression. Several other
regressors, such as a kNN regressor, were also tried, but the tree regressor
came out on top.
Since decision-tree regression was significantly better than linear regression
in the Random Forest, we decided to proceed with it, with the X² features also
in place (a total of 182 features). nFeatures was chosen as 150, and the maximum
depth was varied over 13, 14, 15, 16, and 17, of which a maxDepth of 14 obtained
optimal performance. 150 decision trees were learned, and the optimum results
were obtained with 90 learners.
Learner Training Error (MSE) Testing Error (MSE)
Linear Regressor 0.4068 0.4243
Linear Regressor with X² features 0.3996 0.4140
Tree Regressor 0.1951 0.3822
Table 2: MSE Error rates for Random Forests
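A sketch of this final configuration, again assuming scikit-learn and hypothetical arrays; the squared features are simply concatenated onto the originals to form the 182-column design matrix:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # 91 original features plus their squares: 182 features in total.
    X_aug_train = np.hstack([X_train, X_train ** 2])
    X_aug_test = np.hstack([X_test, X_test ** 2])

    # 150 trees, depth capped at 14; feature bagging with 150 of the 182
    # features (applied per split in scikit-learn; the report does not say
    # whether its nFeatures=150 was bagged per tree or per split).
    rf = RandomForestRegressor(n_estimators=150, max_features=150, max_depth=14)
    rf.fit(X_aug_train, y_train)

    # The report found 90 trees optimal; averaging over the first 90 estimators:
    y_pred = np.mean([t.predict(X_aug_test) for t in rf.estimators_[:90]], axis=0)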
4 Ensemble of all Learners
At the end, since we had trained many learners separately, some of which were
ensembles themselves, we decided to aggregate the results of the learners to
improve our prediction. We also analyzed the variance between the results of our
learners, and an average variance of 0.0204 was obtained. Since the variance was
noticeable, a weighted-average aggregation of the results seemed the best
approach. We chose the model parameters of the best-performing models from each
category to get a consolidated result. Figure 6 shows the architecture of our
ensemble. Initially, we chose the very simple approach of assigning all models
the same weight to get a prediction. This gave some improvement, with an RMSE of
0.5908. We saw that this was performing just below our best individual
prediction model, so we decided to increase the weight of our best learner in
the ensemble. This helped improve our aggregated prediction, providing an RMSE
of 0.5878.
Figure 6: Ensemble of Learners
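A minimal sketch of the weighted aggregation; the prediction arrays and weights here are illustrative, since the report only states that the weights started equal and the best learner’s weight was then bumped:

    import numpy as np

    # Per-model predictions on the test set (hypothetical names).
    preds = {
        'random_forest': y_pred_rf,
        'grad_boost': y_pred_gb,
        'neural_net': y_pred_nn,
        'mixture_of_experts': y_pred_moe,
    }

    weights = {name: 1.0 for name in preds}
    weights['random_forest'] = 2.0  # bump the best individual learner (illustrative)

    total = sum(weights.values())
    y_ensemble = sum(weights[name] * preds[name] for name in preds) / total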
5 Conclusion
This project gave us a glimpse of how machine learning techniques are applied
to real-world problems. We applied a variety of techniques, including neural
networks, decision trees, random forests, gradient boosting, k-means clustering,
and PCA. Testing various parameters of the different learner types helped us
identify where each model under-fitted and over-fitted the data. Finally, while
tuning the parameters of each model helped us reduce the variance of the
individual models, we used a final weighted ensemble of the various learners to
reduce the bias of the individual learners.
