+
Training Combined Cycle Power Plant Data Set on HPC
Yuwu Chen
4/28/2015
+
 Data Description
 Data Summary
 Method Description
 Method Comparison
 Conclusion
 Reference
+
Data Description
 In electric power generation a combined cycle is an assembly of
heat engines that work in series from the same source of heat.
 The principle is that after completing its cycle in the first engine,
the working fluid of the first heat engine is still low enough in its
entropy that a second subsequent heat engine may extract energy
from the waste heat (energy) of the working fluid of the first engine.
 The electrical energy output of a power plant is influenced by four
main parameters. The goal is to find a model that quantifies their
influence on the output.
[Figure labels: Turbine, Electric generator]
+
Data Description
 The initial dataset was donated by a confidential power
plant.
 The full dataset is available on UCI’s website:
http://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant
 The dataset recorded 6 years (2006-2011) of electrical
power output when the plant was set to work with full
load.
 9568 observations, 5 variables.
 Observations are independent from each other.
 The data was cleaned (e.g., no missing values) before being
uploaded to UCI.
+
Data Description
 Independent variables:
 Ambient Temperature (AT): 1.81 - 37.11 °C
 Ambient Pressure (AP): 992.89 - 1033.30 millibar
 Relative Humidity (RH): 25.56 - 100.16 %
 Exhaust Vacuum (V): 25.36 - 81.56 cm Hg
 Dependent variable:
 Net hourly electrical energy output (PE): 420.26 - 495.76 MW
+
Data Summary
+
Data Summary
 Visualize Correlation Matrix
+
Method Description
 Step 1: Fit the training data with several untrained (untuned) models
 Multiple linear regression
 Backward selection
 Ridge regression
 Elasticnet
 Lasso
 SVM with linear kernel
 Pruned tree
 MARS
 Boosted tree
 Bagging tree
 Step 2: Predict on the test data set with the fitted models and evaluate each model
in terms of RMSE and MAD.
 Step 3: Train (tune) the models with resampling methods on the LSU HPC, and
evaluate each model by RMSE and MAD.
\[ \mathrm{MAD} = \frac{1}{N} \sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert \]
\[ \mathrm{RMSE} = \sqrt{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2 / N} \]
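The two error measures can be written as short R helpers. This is a minimal sketch: the function names `rmse` and `mad_err` and the four example values are made up for illustration and are not plant data.

```r
# Root mean squared error: square root of the mean squared residual.
rmse <- function(y, y_hat) sqrt(mean((y - y_hat)^2))

# Mean absolute deviation as defined on this slide: mean absolute residual.
# Named mad_err so it does not mask stats::mad (median absolute deviation).
mad_err <- function(y, y_hat) mean(abs(y - y_hat))

# Tiny made-up example (illustrative values only).
y     <- c(450, 460, 470, 480)
y_hat <- c(452, 457, 470, 484)

rmse_val <- rmse(y, y_hat)     # sqrt((4 + 9 + 0 + 16) / 4)
mad_val  <- mad_err(y, y_hat)  # (2 + 3 + 0 + 4) / 4
```

Because RMSE squares the residuals, it penalizes large errors more heavily than MAD, which is why the two measures can rank models slightly differently.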
+
Multiple linear regression
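A minimal sketch of the fit-then-evaluate workflow for MLR, using base R's `lm`. The data frame below is a synthetic stand-in with the same column names as the dataset; the coefficients, noise level, and the 70/30 split are assumptions made for illustration, since the slides do not state the split that was used.

```r
set.seed(1)
n <- 500

# Synthetic stand-in for the plant data. Ranges loosely follow the
# variable ranges listed earlier; coefficients and noise are invented.
ccpp <- data.frame(
  AT = runif(n, 1.81, 37.11),
  V  = runif(n, 25.36, 81.56),
  AP = runif(n, 992.89, 1033.30),
  RH = runif(n, 25.56, 100.16)
)
ccpp$PE <- 500 - 1.9 * ccpp$AT - 0.25 * ccpp$V +
  0.05 * (ccpp$AP - 1000) - 0.15 * ccpp$RH + rnorm(n, sd = 4)

# Hypothetical 70/30 train/test split.
train_idx  <- sample(n, size = round(0.7 * n))
data.train <- ccpp[train_idx, ]
data.test  <- ccpp[-train_idx, ]

# Fit on the training rows, predict on the held-out rows.
mlr  <- lm(PE ~ AT + V + AP + RH, data = data.train)
pred <- predict(mlr, newdata = data.test)

rmse_mlr <- sqrt(mean((data.test$PE - pred)^2))
mad_mlr  <- mean(abs(data.test$PE - pred))
```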
+
Backward selection
Backward selection did not remove any independent variables from
the model, so it gives the same RMSE and MAD as MLR.
+
Ridge regression and Lasso
λ=1 is used for the RMSE and MAD calculation
library(glmnet)
ridge <- glmnet(x.train, y.train, alpha = 0)  # alpha = 0: ridge penalty
lasso <- glmnet(x.train, y.train, alpha = 1)  # alpha = 1: lasso penalty
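The slide fixes λ = 1 for the error calculation; with glmnet this corresponds to passing `s = 1` to `predict`. The sketch below assumes synthetic stand-in data (invented coefficients, hypothetical 200/100 split) and only runs the glmnet part when the package is installed.

```r
set.seed(2)
n <- 300

# Synthetic stand-in data with the dataset's column names.
ccpp <- data.frame(AT = runif(n, 2, 37), V = runif(n, 25, 82),
                   AP = runif(n, 993, 1033), RH = runif(n, 26, 100))
ccpp$PE <- 500 - 1.9 * ccpp$AT - 0.25 * ccpp$V + rnorm(n, sd = 4)

# glmnet expects a numeric predictor matrix and a response vector.
idx     <- sample(n, 200)
x.train <- as.matrix(ccpp[idx, c("AT", "V", "AP", "RH")])
y.train <- ccpp$PE[idx]
x.test  <- as.matrix(ccpp[-idx, c("AT", "V", "AP", "RH")])
y.test  <- ccpp$PE[-idx]

if (requireNamespace("glmnet", quietly = TRUE)) {
  ridge <- glmnet::glmnet(x.train, y.train, alpha = 0)
  lasso <- glmnet::glmnet(x.train, y.train, alpha = 1)

  # s = 1 requests predictions at penalty lambda = 1.
  pred_ridge <- predict(ridge, newx = x.test, s = 1)
  pred_lasso <- predict(lasso, newx = x.test, s = 1)

  rmse_ridge <- sqrt(mean((y.test - pred_ridge)^2))
  rmse_lasso <- sqrt(mean((y.test - pred_lasso)^2))
}
```

A fixed λ = 1 is an arbitrary choice, which may be why the untrained ridge and lasso fits score worse than plain MLR in the results table.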
+
Elasticnet
λ=1 is used for the Elasticnet calculation
elastic <- glmnet(x.train, y.train, alpha = 0.5)  # equal mix of ridge and lasso penalties
+
SVM with linear kernel
library(kernlab)
svmfit <- ksvm(PE ~ AT + V + AP + RH, data = data.train, kernel = "vanilladot", C = 1)  # linear kernel
+
Single pruned tree
library(rpart)
treefit <- rpart(PE ~ AT + V + AP + RH, data = data.train,
                 control = rpart.control(xval = 10, minbucket = 100, cp = 0.01))
+
MARS
library(earth)
mars <- earth(PE ~ AT + V + AP + RH, data = data.train, degree = 1)  # degree = 1: no interaction terms
+
Boosted tree
library(gbm)
boost <- gbm(PE ~ AT + V + AP + RH, data = data.train, distribution = "gaussian",
             n.trees = 1000, interaction.depth = 4, shrinkage = 0.01)
+
Bagging tree
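library(randomForest)
# mtry = 4 considers all four predictors at every split, which makes this bagging
bag <- randomForest(PE ~ AT + V + AP + RH, data = data.train, mtry = 4,
                    importance = TRUE)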
+
The Predictive Results in Terms of the MAD and RMSE Values (untrained)
Model              Package       RMSE      MAD
MLR                              4.583379  3.622121
Backward           leaps         4.583379  3.622121
Ridge              glmnet        4.92302   3.928416
Lasso              glmnet        5.039077  4.015422
Elasticnet         glmnet        4.848145  3.866809
SVM-linear kernel  kernlab       4.588058  3.604371
Pruned tree        rpart         5.422748  4.241274
MARS               earth         4.282067  3.330725
Boost tree         gbm           3.978378  3.026208
Bagging tree       randomForest  3.604678  2.615768
+
Train models with resampling methods
 Train method: The train function in the caret package
 Can train all models used in this project with resampling methods
 Easy to manipulate, well documented.
 Parallelizes automatically when a parallel backend with multiple
CPU cores is registered.
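A minimal sketch of how a parallel backend can be registered before calling `train`. It assumes the `doParallel` package as the backend (the slides do not say which one was used), synthetic stand-in data, and a small hypothetical setup of 2 workers and 25 bootstrap resamples. The caret part is skipped when the packages are not installed.

```r
set.seed(3)
n <- 200

# Synthetic stand-in training data (invented coefficients).
data.train <- data.frame(AT = runif(n, 2, 37), V = runif(n, 25, 82),
                         AP = runif(n, 993, 1033), RH = runif(n, 26, 100))
data.train$PE <- 500 - 1.9 * data.train$AT - 0.25 * data.train$V +
  rnorm(n, sd = 4)

have_pkgs <- requireNamespace("caret", quietly = TRUE) &&
  requireNamespace("doParallel", quietly = TRUE)

if (have_pkgs) {
  suppressPackageStartupMessages(library(caret))

  cl <- parallel::makeCluster(2)        # two workers; adjust to the node
  doParallel::registerDoParallel(cl)    # register the foreach backend

  # train() distributes the resampling iterations over the workers.
  ctrl <- trainControl(method = "boot", number = 25)
  fit  <- train(PE ~ AT + V + AP + RH, data = data.train,
                method = "lm", trControl = ctrl)

  parallel::stopCluster(cl)
  foreach::registerDoSEQ()              # return to sequential execution

  resampled_rmse <- fit$results$RMSE    # RMSE averaged over the resamples
}
```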
+
Train models with resampling methods
Model               Resampling method          Tuning parameter(s)
MLR                 bootstrapping              N/A
Backward selection  cross-validation           # randomly selected predictors
Ridge               cross-validation           λ
Lasso               cross-validation           λ
Elasticnet          cross-validation           α and λ
SVM-linear kernel   cross-validation           cost
Pruned tree         bootstrapping              cp
MARS                bootstrapping              nprune and degree
Boost tree          repeated cross-validation  n.trees, interaction.depth, shrinkage
Bagging (RF)        cross-validation           # randomly selected predictors
+
Parallel computing in R
 Motivation: Save computation time.
 A for loop can be very slow if there are a large number of
computations that need to be carried out.
 Almost all computers now have multicore processors.
 As long as these computations do not need to communicate
(resampling methods are excellent examples), they can be
spread across multiple cores and executed in parallel.
 The parallel package
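Since bootstrap replicates do not need to communicate, they map directly onto `parLapply` from the parallel package. The sketch below is an illustration under stated assumptions: synthetic two-predictor data, 2 workers, 50 replicates, and a hypothetical helper `boot_rmse` that fits on a resample and scores on the rows left out of it.

```r
library(parallel)

set.seed(4)
n <- 300

# Synthetic stand-in data (invented coefficients).
dat <- data.frame(AT = runif(n, 2, 37), V = runif(n, 25, 82))
dat$PE <- 500 - 1.9 * dat$AT - 0.25 * dat$V + rnorm(n, sd = 4)

# One bootstrap replicate: fit on a resample, score on the rows left out.
boot_rmse <- function(i, dat) {
  idx <- sample(nrow(dat), replace = TRUE)
  fit <- lm(PE ~ AT + V, data = dat[idx, ])
  oob <- dat[-unique(idx), ]
  sqrt(mean((oob$PE - predict(fit, newdata = oob))^2))
}

cl <- makeCluster(2)                  # two worker processes
clusterSetRNGStream(cl, iseed = 123)  # reproducible, independent RNG streams
res <- parLapply(cl, 1:50, boot_rmse, dat = dat)
stopCluster(cl)

rmse_boot <- unlist(res)
mean_rmse <- mean(rmse_boot)
```

On a cluster node the worker count would normally be set to the number of allocated cores rather than the fixed 2 used here.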
+
Running R on LONI and LSU HPC clusters
 LONI QueenBee-2 ranked 46th in the world on the TOP500 list (Nov. 2014)
Training model: MLR
Resampling: Bootstrapped (10000 reps)
+
Training backward selection
Training with 10-fold cross-validation still did not remove any independent
variables from the model, so it gives the same RMSE and MAD as MLR.
+
Training ridge, elasticnet and Lasso
The final λ for ridge is 1.417
The final λ for lasso is 0.0497
The final α is 0.5 and λ is 0.0497 for elasticnet
+
Training SVM with linear kernel
The final selected cost is 2
+
Training single tree
library(caret)
treetrain <- train(PE ~ ., data = data.train, method = "rpart",
                   trControl = trainControl(method = "boot", number = 1000),
                   tuneLength = 10)
+
Training MARS
marsGrid <- expand.grid(degree = c(1, 2), nprune = (1:10) * 2)
earthtrain <- train(PE ~ ., data = data.train, method = "earth", tuneGrid = marsGrid,
                    maximize = FALSE,
                    trControl = trainControl(method = "boot", number = 1000))
The final nprune is 14
The final degree is 2
+
Training boosting tree
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
gbmGrid <- expand.grid(interaction.depth = c(1, 4, 7), n.trees = (1:30) * 50,
                       shrinkage = c(0.001, 0.01, 0.1))
boosttrain <- train(PE ~ ., data = data.train, method = "gbm", trControl = fitControl,
                    tuneGrid = gbmGrid)
The final # of trees is 1500
The final interaction depth is 7
The shrinkage is 0.1
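The size of this tuning job follows from simple arithmetic on the grid and the resampling scheme, which illustrates why it was run on the HPC. The sketch below rebuilds the same grid in base R; treating every combination as evaluated on every resample is an assumption that gives an upper bound, not a statement about how caret schedules the fits internally.

```r
# Same tuning grid as on the slide.
gbmGrid <- expand.grid(interaction.depth = c(1, 4, 7),
                       n.trees = (1:30) * 50,
                       shrinkage = c(0.001, 0.01, 0.1))

n_combos    <- nrow(gbmGrid)           # 3 depths * 30 tree counts * 3 shrinkages
n_resamples <- 10 * 10                 # 10 folds, repeated 10 times
n_evals     <- n_combos * n_resamples  # upper bound on model evaluations

# Largest value of each tuning parameter in the grid.
grid_max <- sapply(gbmGrid, max)
```

The selected values (1500 trees, interaction depth 7, shrinkage 0.1) all equal the largest values in the grid, so a wider grid might find a still better setting.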
+
Training bagging trees
Tuning converts the bagging trees into a random forest (RF)
The optimized model retained two predictor variables
+
Training improvement: RMSE
+
Training improvement: MAD
+
Values
                   RMSE                  MAD
Model              untrained  trained    untrained  trained
MLR                4.583379   4.583379   3.622121   3.622121
Backward           4.583379   4.583379   3.622121   3.622121
Ridge              4.92302    4.923921   3.928416   3.92914
Elasticnet         4.848145   4.584562   3.866809   3.626797
Lasso              5.039077   4.584209   4.015422   3.624946
SVM linear kernel  4.588058   4.588086   3.604371   3.604358
Pruned tree        5.422748   5.006409   4.241274   3.913017
MARS               4.282067   4.233078   3.330725   3.279154
BoostingTree       3.978378   3.468537   3.026208   2.506058
BaggingTree        3.604678   3.498071   2.615768   2.536362
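The improvement from training can be read off these values with a few lines of base R. The numbers below are copied from the table; the column names and the percent-change measure are choices made for this sketch.

```r
res <- data.frame(
  model = c("MLR", "Backward", "Ridge", "Elasticnet", "Lasso",
            "SVM linear kernel", "Pruned tree", "MARS",
            "BoostingTree", "BaggingTree"),
  rmse_untrained = c(4.583379, 4.583379, 4.92302, 4.848145, 5.039077,
                     4.588058, 5.422748, 4.282067, 3.978378, 3.604678),
  rmse_trained   = c(4.583379, 4.583379, 4.923921, 4.584562, 4.584209,
                     4.588086, 5.006409, 4.233078, 3.468537, 3.498071),
  mad_untrained  = c(3.622121, 3.622121, 3.928416, 3.866809, 4.015422,
                     3.604371, 4.241274, 3.330725, 3.026208, 2.615768),
  mad_trained    = c(3.622121, 3.622121, 3.92914, 3.626797, 3.624946,
                     3.604358, 3.913017, 3.279154, 2.506058, 2.536362),
  stringsAsFactors = FALSE
)

# Relative change in RMSE after training (negative means improvement).
res$rmse_change_pct <- 100 * (res$rmse_trained - res$rmse_untrained) /
  res$rmse_untrained

# Number of models whose error strictly decreased after training.
n_improved_rmse <- sum(res$rmse_trained < res$rmse_untrained)
n_improved_mad  <- sum(res$mad_trained < res$mad_untrained)

# Model with the lowest trained error under each measure.
best_rmse <- res$model[which.min(res$rmse_trained)]
best_mad  <- res$model[which.min(res$mad_trained)]
```

By this count, training lowers MAD for seven models and RMSE for six, and the boosted tree has the lowest trained error under both measures.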
+
t-test to evaluate the null hypothesis that there is
no difference between models
The bagging (RF) model is significantly different from the other two
models.
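One way such a comparison can be set up is a paired t-test on per-resample errors, where both models are scored on the same resamples. The slides do not show how the test was actually run, so this is only a sketch, and the RMSE values below are invented for illustration.

```r
# Hypothetical per-resample RMSE for two models scored on the same
# five resamples (illustrative values only).
rmse_a <- c(3.50, 3.60, 3.40, 3.55, 3.45)
rmse_b <- rmse_a + c(0.90, 1.00, 1.10, 0.95, 1.05)

# Paired test: the null hypothesis is a mean difference of zero.
tt <- t.test(rmse_a, rmse_b, paired = TRUE)

mean_diff <- unname(tt$estimate)  # mean of (rmse_a - rmse_b)
p_value   <- tt$p.value
```

Pairing matters because resample-to-resample variation is shared by both models; an unpaired test would treat that shared variation as noise.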
+
Summary
 Ten models have been used for testing the influence of the
independent variables.
 The training process in the caret package improves the
performance of seven models in terms of MAD (six in terms of RMSE).
 Parallel computation on the HPC can speed up the resampling
calculation significantly.
 The RMSE and MAD values indicate that, after the training, the
bagging(RF) and boosting trees tend to produce the best
predictions.
+
Future work
 The mechanism of the training in the caret package should be
explored. E.g. there is no tuning parameter available when
training MLR model, so which part has been bootstrapped?
+
Reference
 Pınar Tüfekci. Prediction of full load electrical power output of a
base load operated combined cycle power plant using machine
learning methods. International Journal of Electrical Power &
Energy Systems, Volume 60, September 2014, pp. 126-140.
 Heysem Kaya, Pınar Tüfekci, Sadık Fikret Gürgen. Local and
Global Learning Methods for Predicting Power of a Combined
Gas & Steam Turbine. Proceedings of the International
Conference on Emerging Trends in Computer and Electronics
Engineering (ICETCEE 2012), pp. 13-18.

ASML's Taxonomy Adventure by Daniel Canter
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 

Yuwu chen

  • 1. Training Combined Cycle Power Plant Data Set on HPC. Yuwu Chen, 4/28/2015
  • 2. Data Description; Data Summary; Method Description; Method Comparison; Conclusion; Reference
  • 3. Data Description
      • In electric power generation, a combined cycle is an assembly of heat engines that work in series from the same source of heat.
      • The principle is that, after completing its cycle in the first engine, the working fluid of the first heat engine is still low enough in entropy that a second, subsequent heat engine can extract energy from its waste heat.
      • The electrical energy output of a power plant is influenced by four main parameters. The goal is to find a model for testing that influence.
      • Figure labels: Turbine; Electric generator
  • 4. Data Description
      • The initial dataset was donated by a confidential power plant.
      • The full dataset is available on UCI's website: http://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant
      • The dataset records 6 years (2006-2011) of electrical power output with the plant set to work at full load.
      • 9568 observations, 5 variables.
      • Observations are independent of each other.
      • The data had been cleaned (e.g., no missing values) before being uploaded to UCI.
  • 5. Data Description
      • Independent variables:
          • Temperature (T; named AT in the R code): 1.81-37.11 °C
          • Ambient Pressure (AP): 992.89-1033.30 millibar
          • Relative Humidity (RH): 25.56%-100.16%
          • Exhaust Vacuum (V): 25.36-81.56 cm Hg
      • Dependent variable:
          • Net hourly electrical energy output (PE): 420.26-495.76 MW
  • 7. Data Summary: Visualize Correlation Matrix
  • 8. Method Description
      • Step 1: Fit several models with fixed, untuned settings ("untrained" models) to the training data:
          • Multiple linear regression (MLR)
          • Backward selection
          • Ridge regression
          • Elasticnet
          • Lasso
          • SVM with linear kernel
          • Pruned tree
          • MARS
          • Boosted tree
          • Bagging tree
      • Step 2: Predict the test data set with the fitted models and evaluate each model in terms of the RMSE and MAD values.
      • Step 3: Train the models with resampling methods on the LSU HPC, and evaluate each model by RMSE and MAD, where $y_i$ is the observed value, $\hat{y}_i$ the predicted value, and $N$ the number of observations:
          $\mathrm{MAD} = \frac{1}{N} \sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert$
          $\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$
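The two error measures from slide 8 can be written as short base R helpers. This is a minimal sketch, not the author's code: the function names `rmse` and `mean_abs_dev` and the four example values are made up for illustration.

```r
# Root mean squared error between observed (y) and predicted (y_hat) values.
rmse <- function(y, y_hat) sqrt(mean((y - y_hat)^2))

# Mean absolute deviation, as defined on the slide.
# Note: base R's mad() is the *median* absolute deviation, a different quantity.
mean_abs_dev <- function(y, y_hat) mean(abs(y - y_hat))

# Hypothetical observed and predicted outputs in MW (illustration only).
y     <- c(450, 460, 470, 480)
y_hat <- c(452, 459, 466, 480)

rmse(y, y_hat)          # sqrt(5.25), about 2.29
mean_abs_dev(y, y_hat)  # 1.75
```

Because RMSE squares each residual before averaging, the single 4 MW miss dominates it, while the mean absolute deviation weights all residuals linearly.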
  • 10. Backward selection: Backward selection did not remove any independent variables from the model, so it gives the same RMSE and MAD as MLR.
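The outcome on slide 10 (nothing is dropped, so the result equals MLR) can be illustrated with a small sketch. Two assumptions are worth flagging: the slides list the leaps package for this step, whereas the sketch swaps in base R's AIC-based `step()`, and the data frame `toy` is synthetic, with made-up coefficients rather than the real plant data.

```r
# Synthetic stand-in for the plant data (assumption: not the real CCPP values).
set.seed(1)
n <- 500
toy <- data.frame(AT = runif(n, 2, 37), V = runif(n, 25, 82),
                  AP = runif(n, 993, 1033), RH = runif(n, 25, 100))
toy$PE <- 450 - 2.0 * toy$AT - 0.3 * toy$V + 0.1 * toy$AP - 0.15 * toy$RH +
  rnorm(n, sd = 1)

# Full multiple linear regression, then AIC-based backward elimination.
full    <- lm(PE ~ AT + V + AP + RH, data = toy)
reduced <- step(full, direction = "backward", trace = 0)

# Predictors that survive the elimination (intercept dropped from the names).
kept <- names(coef(reduced))[-1]

# If no predictor was removed, both models produce identical fitted values.
identical_fit <- isTRUE(all.equal(fitted(full), fitted(reduced)))
```

When every predictor carries signal, as in this toy setup, the reduced model is the full model, which is why its test-set RMSE and MAD would match those of MLR.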
  • 11. Ridge regression and Lasso: λ = 1 is used for the RMSE and MAD calculation.
      library(glmnet)
      ridge <- glmnet(x.train, y.train, alpha = 0)  # alpha = 0: ridge penalty
      lasso <- glmnet(x.train, y.train, alpha = 1)  # alpha = 1: lasso penalty
  • 12. Elasticnet: λ = 1 is used for the Elasticnet calculation.
      library(glmnet)
      elastic <- glmnet(x.train, y.train, alpha = 0.5)  # alpha = 0.5: equal mix of ridge and lasso penalties
  • 13. SVM with linear kernel
      library(kernlab)
      # "vanilladot" is kernlab's linear kernel; C is the cost parameter
      svmfit <- ksvm(PE ~ AT + V + AP + RH, data = data.train, kernel = "vanilladot", C = 1)
  • 14. Single pruned tree
      library(rpart)
      treefit <- rpart(PE ~ AT + V + AP + RH, data = data.train,
                       control = rpart.control(xval = 10, minbucket = 100, cp = 0.01))
  • 15. MARS
      library(earth)
      mars <- earth(PE ~ AT + V + AP + RH, data = data.train, degree = 1)  # degree = 1: no interaction terms
  • 16. Boosted tree
      library(gbm)
      boost <- gbm(PE ~ AT + V + AP + RH, data = data.train, distribution = "gaussian",
                   n.trees = 1000, interaction.depth = 4, shrinkage = 0.01)
  • 17. Bagging tree
      library(randomForest)
      # mtry = 4 tries all four predictors at every split, which makes the forest a bagging ensemble
      bag <- randomForest(PE ~ AT + V + AP + RH, data = data.train, mtry = 4, importance = TRUE)
  • 18. Predictive results in terms of the RMSE and MAD values (untrained)
      Model               Package        RMSE       MAD
      MLR                 -              4.583379   3.622121
      Backward            leaps          4.583379   3.622121
      Ridge               glmnet         4.92302    3.928416
      Lasso               glmnet         5.039077   4.015422
      Elasticnet          glmnet         4.848145   3.866809
      SVM-linear kernel   kernlab        4.588058   3.604371
      Pruned tree         rpart          5.422748   4.241274
      MARS                earth          4.282067   3.330725
      Boost tree          gbm            3.978378   3.026208
      Bagging tree        randomForest   3.604678   2.615768
      $\mathrm{MAD} = \frac{1}{N} \sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert$
      $\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$
  • 19. Train models with resampling methods
      • Training method: the train function in the caret package
      • Can train all models used in this project with resampling methods
      • Easy to use and well documented
      • Parallelizes automatically when multiple CPU cores are registered
  • 20. Train models with resampling methods
      Model               Resampling method           Tuning parameter
      MLR                 bootstrapping               N/A
      Backward selection  cross-validation            #Randomly Selected Predictors
      Ridge               cross-validation            λ
      Lasso               cross-validation            λ
      Elasticnet          cross-validation            α and λ
      SVM-linear kernel   cross-validation            cost
      Pruned tree         bootstrapping               cp
      MARS                bootstrapping               #prune and degree
      Boost tree          repeated cross-validation   #trees, shrinkage, interaction.depth
      Bagging (RF)        cross-validation            #Randomly Selected Predictors
  • 21. Parallel computing in R
      • Motivation: save computation time.
      • A for loop can be very slow when a large number of computations need to be carried out.
      • Almost all computers now have multicore processors.
      • As long as these computations do not need to communicate with each other (resampling methods are excellent examples), they can be spread across multiple cores and executed in parallel.
      • Tool: the parallel package
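The idea on slide 21, that independent resampling iterations can be spread across cores, can be sketched with the parallel package alone. Everything specific here is an assumption for illustration: the synthetic data frame `toy`, the helper `boot_rmse`, the two-worker cluster, and the 100 replicates. It is not the job that ran on the HPC.

```r
library(parallel)

# Synthetic regression data (assumption: stands in for the training set).
set.seed(1)
n <- 200
toy <- data.frame(x = runif(n))
toy$y <- 3 + 2 * toy$x + rnorm(n, sd = 0.5)

# One bootstrap replicate: fit on a resample, score on the rows left out of it.
boot_rmse <- function(i, data) {
  idx <- sample(nrow(data), replace = TRUE)
  fit <- lm(y ~ x, data = data[idx, ])
  oob <- data[-unique(idx), ]
  sqrt(mean((oob$y - predict(fit, newdata = oob))^2))
}

# Replicates do not communicate, so they can run on separate worker processes.
cl <- makeCluster(2)
clusterSetRNGStream(cl, iseed = 42)  # reproducible random streams on the workers
res <- parLapply(cl, 1:100, boot_rmse, data = toy)
stopCluster(cl)

boot_rmse_values <- unlist(res)
mean(boot_rmse_values)  # bootstrap estimate of the out-of-sample RMSE
```

Each worker handles a share of the 100 replicates, so on a multicore node the wall-clock time shrinks roughly with the number of workers, minus the overhead of starting the cluster and shipping the data.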
  • 22. Running R on LONI and LSU HPC clusters
      • LONI QueenBee-2 ranked 46th in the world on the TOP500 list (Nov. 2014).
      • Training model: MLR. Resampling: bootstrapping (10000 reps).
  • 23. Training backward selection: The 10-fold CV training still did not remove any independent variables from the model, so it gives the same RMSE and MAD as MLR.
  • 24. Training ridge, elasticnet and lasso
      • The final λ for ridge is 1.417.
      • The final λ for lasso is 0.0497.
      • The final α is 0.5 and the final λ is 0.0497 for elasticnet.
  • 25. Training SVM with linear kernel: The final selected cost is 2.
  • 26. Training single tree
      library(caret)
      # Tune cp over 10 candidate values using 1000 bootstrap resamples
      treetrain <- train(PE ~ ., data = data.train, method = "rpart",
                         trControl = trainControl(method = "boot", number = 1000),
                         tuneLength = 10)
  • 27. Training MARS
      # Grid over interaction degree (1 or 2) and number of retained terms (2, 4, ..., 20)
      marsGrid <- expand.grid(degree = c(1, 2), nprune = (1:10) * 2)
      earthtrain <- train(PE ~ ., data = data.train, method = "earth",
                          tuneGrid = marsGrid, maximize = FALSE,
                          trControl = trainControl(method = "boot", number = 1000))
      The final nprune is 14 and the final degree is 2.
  • 28. Training boosting tree
      # 10-fold cross-validation repeated 10 times
      fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
      gbmGrid <- expand.grid(interaction.depth = c(1, 4, 7), n.trees = (1:30) * 50,
                             shrinkage = c(0.001, 0.01, 0.1))
      boosttrain <- train(PE ~ ., data = data.train, method = "gbm",
                          trControl = fitControl, tuneGrid = gbmGrid)
      The final number of trees is 1500, the final interaction depth is 7, and the shrinkage is 0.1.
  • 29. Training bagging trees: The bagging trees are converted to a random forest (RF). The optimized model retained two predictor variables.
  • 30. Training improvement: RMSE
  • 31. Training improvement: MAD
  • 32. Values
                          RMSE                    MAD
      Model               untrained   trained     untrained   trained
      MLR                 4.583379    4.583379    3.622121    3.622121
      Backward            4.583379    4.583379    3.622121    3.622121
      Ridge               4.92302     4.923921    3.928416    3.92914
      Elasticnet          4.848145    4.584562    3.866809    3.626797
      Lasso               5.039077    4.584209    4.015422    3.624946
      SVM linear kernel   4.588058    4.588086    3.604371    3.604358
      Pruned tree         5.422748    5.006409    4.241274    3.913017
      MARS                4.282067    4.233078    3.330725    3.279154
      BoostingTree        3.978378    3.468537    3.026208    2.506058
      BaggingTree         3.604678    3.498071    2.615768    2.536362
  • 33. t-test to evaluate the null hypothesis that there is no difference between models: The bagging (RF) model is significantly different from the other two models.
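The comparison on slide 33 can be sketched as a paired t-test on per-resample errors. The slide does not say which values were tested or whether the test was paired, so both choices are assumptions here, and the RMSE vectors `rmse_a` and `rmse_b` are simulated rather than taken from the study.

```r
set.seed(42)
n_resamples <- 30

# Hypothetical per-resample RMSE values for two models scored on the same resamples.
shared <- rnorm(n_resamples, sd = 0.10)  # variation common to both models on a given resample
rmse_a <- 3.50 + shared + rnorm(n_resamples, sd = 0.03)
rmse_b <- 4.23 + shared + rnorm(n_resamples, sd = 0.03)

# Paired test of H0: the mean difference in RMSE between the two models is zero.
tt <- t.test(rmse_a, rmse_b, paired = TRUE)
tt$p.value

# TRUE when the difference is significant at the 5% level.
reject_null <- tt$p.value < 0.05
```

Pairing matters because both models see the same resamples: the shared resample-to-resample variation cancels in the differences, which makes the test more sensitive than an unpaired comparison.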
  • 34. Summary
      • Ten models were used to test the influence of the independent variables.
      • The training process in the caret package improved the performance of seven models.
      • Parallel computation on the HPC can speed up the resampling calculations significantly.
      • The RMSE and MAD values indicate that, after training, the bagging (RF) and boosting trees tend to produce the best predictions.
  • 35. Future work
      • The mechanism of training in the caret package should be explored. E.g., no tuning parameter is available when training the MLR model, so which part has been bootstrapped?
  • 36. Reference
      • Pınar Tüfekci, "Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods", International Journal of Electrical Power & Energy Systems, Volume 60, September 2014, pp. 126-140.
      • Heysem Kaya, Pınar Tüfekci, Sadık Fikret Gürgen, "Local and Global Learning Methods for Predicting Power of a Combined Gas & Steam Turbine", Proceedings of the International Conference on Emerging Trends in Computer and Electronics Engineering (ICETCEE 2012), pp. 13-18.

Editor's Notes

  1. RMSE is more sensitive to outliers than the MAD metric.