Test different neural networks models for forecasting of wind, solar and energ... (Tonmoy Ibne Arif)
In this project, a multi-step deep neural network is used to forecast power generation and load demand over a short-term horizon. The feature vectors used to predict the target form a sequential time series. A recurrent neural network is combined with a convolutional neural network to build a better forecasting model for the Windpark, Solar park and Loadpark datasets, and the forecasting performance of a feedforward neural network and a Long Short-Term Memory network is also compared. The work is divided into two parts: in the first approach, the raw dataset is split into train and test sets and no previous-step data are used; in the second, the raw dataset is split into train, test and validation sets, and the current plus seven previous time steps are fed into the model.
Principal component analysis (PCA) is a technique used to simplify complex datasets. It works by converting a set of observations of possibly correlated variables into a set of linearly uncorrelated variables called principal components. PCA identifies patterns in data and expresses the data in such a way as to highlight their similarities and differences. The main implementations of PCA are eigenvalue decomposition and singular value decomposition. PCA is useful for data compression, reducing dimensionality for visualization and building predictive models. However, it works best for data that follows a multidimensional normal distribution.
The aim of this report is to use eigenvectors, eigenvalues, and orthogonality to understand the concept of Principal Component Analysis (PCA) and to show why PCA is useful.
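The eigendecomposition route can be sketched in plain Python for two variables; the `pca_2d` helper and the sample points below are illustrative assumptions, not taken from the report:

```python
import math

def pca_2d(points):
    """Return the eigenvalues of the 2x2 covariance matrix and the
    unit eigenvector of the largest one (the first principal axis)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[a, b], [b, c]] (divisor n - 1).
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    l1, l2 = mean + spread, mean - spread
    # (l1 - c, b) solves the eigenvector equation for l1 when b != 0.
    if b != 0:
        v1 = (l1 - c, b)
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(*v1)
    return (l1, l2), (v1[0] / norm, v1[1] / norm)

pts = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2),
       (3.1, 3.0), (2.3, 2.7), (2.0, 1.6), (1.0, 1.1),
       (1.5, 1.6), (1.1, 0.9)]
(l1, l2), axis = pca_2d(pts)
```

Projecting the centred data onto `axis` keeps the direction of maximum variance; the ratio `l1 / (l1 + l2)` tells how much variance a one-dimensional compression would retain.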
Lower back pain can be caused by problems with any part of the complex, interconnected network of spinal muscles, nerves, bones, discs and tendons in the lumbar spine. This solution aims to identify whether a person is normal or abnormal using collected physical spine measurements.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Enhance The K Means Algorithm On Spatial Dataset (AlaaZ)
The document describes an enhancement to the standard k-means clustering algorithm. The enhancement aims to improve computational speed by storing additional information from each iteration, such as the closest cluster and distance for each data point. This avoids needing to recompute distances to all cluster centers in subsequent iterations if a point does not change clusters. The complexity of the enhanced algorithm is reduced from O(nkl) to O(nk) where n is points, k is clusters, and l is iterations.
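The caching idea can be sketched as follows; `enhanced_kmeans` and the toy points are illustrative, not the authors' code. When a point's distance to its previous centre has not grown, the full search over all k centres is skipped, which is where the speedup comes from:

```python
import math, random

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def enhanced_kmeans(points, k, iters=100, seed=0):
    """k-means that caches each point's closest centre and distance,
    skipping the full search over all centres for points whose
    distance to their previous centre did not increase."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    label = [min(range(k), key=lambda j: dist2(p, centres[j])) for p in points]
    cached = [dist2(p, centres[label[i]]) for i, p in enumerate(points)]
    for _ in range(iters):
        # Move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, label) if l == j]
            if members:
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
        changed = False
        for i, p in enumerate(points):
            d = dist2(p, centres[label[i]])
            if d <= cached[i]:
                cached[i] = d   # centre moved closer: keep label, skip search
                continue
            best = min(range(k), key=lambda j: dist2(p, centres[j]))
            if best != label[i]:
                changed = True
            label[i] = best
            cached[i] = dist2(p, centres[best])
        if not changed:
            break
    return centres, label

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, labels = enhanced_kmeans(pts, 2)
```

The skip is a heuristic: a different centre may have moved even closer in the meantime, which the cached check does not see, trading a little accuracy for fewer distance computations.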
Introductory session on basic MATLAB commands and a brief overview of the k-means clustering algorithm with an image-processing example.
NOTE: you can find the code for the k-means clustering algorithm for image processing in the notes.
The document compares and summarizes several clustering algorithms, including K-Means, DBSCAN, hierarchical clustering, and CURE. It discusses the time and space complexity of each algorithm, how they are affected by the use of KD trees, their benefits and limitations, and provides examples of their performance on benchmark datasets.
1) The Group Steiner Problem involves finding a minimum-cost tree connecting at least one node from each of k disjoint groups in a graph. It has applications in VLSI circuit routing.
2) The authors propose a CUDA-aware MPI-based approach to solve large instances of the Group Steiner Problem using a heuristic that constructs 2-star trees in parallel.
3) Experimental results on a supercomputer show the parallel implementation achieves up to 302x speedup over sequential algorithms and finds accurate solutions to industry benchmark instances.
Elements Space and Amplitude Perturbation Using Genetic Algorithm for Antenna... (CSCJournals)
A simple and fast genetic algorithm (GA) was developed to reduce the sidelobes of non-uniformly spaced linear antenna arrays. The proposed GA optimizes two vectors of variables to increase the main-lobe-to-sidelobe power ratio (M/S) of the array's radiation pattern. In the first phase the algorithm calculates the positions of the array elements, and in the second phase it manipulates the amplitude of the excitation signal for each element. Simulations were performed for 16- and 24-element array structures. The results indicate that in the first phase M/S improved from 13.2 to over 22.2 dB while the half-power beamwidth (HPBW) remained almost unchanged. After element repositioning, amplitude tapering in the second phase achieved a further improvement, up to 32 dB. The simulations also showed that after element-space perturbation some antenna elements can be merged without any degradation of the radiation pattern in terms of gain or sidelobe level.
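The quantity being optimised can be illustrated by evaluating the array factor of a broadside linear array and measuring its main-lobe-to-peak-sidelobe ratio. The `main_to_sidelobe_db` helper, the half-wavelength spacing, and the triangular taper below are illustrative assumptions, not the paper's setup; a uniform 16-element array lands near the 13.2 dB baseline quoted above:

```python
import cmath, math

def main_to_sidelobe_db(positions, amplitudes, wavelength=1.0, samples=1801):
    """Scan the array factor |sum a_n exp(j k x_n cos(theta))| over
    theta in [0, 180] degrees and return the ratio of the main-lobe
    peak to the highest sidelobe, in dB."""
    k = 2 * math.pi / wavelength
    pattern = []
    for s in range(samples):
        theta = math.pi * s / (samples - 1)
        af = sum(a * cmath.exp(1j * k * x * math.cos(theta))
                 for a, x in zip(amplitudes, positions))
        pattern.append(abs(af))
    peak = max(pattern)
    ipeak = pattern.index(peak)
    # Walk down both sides of the main lobe to its first nulls.
    lo = ipeak
    while lo > 0 and pattern[lo - 1] <= pattern[lo]:
        lo -= 1
    hi = ipeak
    while hi < samples - 1 and pattern[hi + 1] <= pattern[hi]:
        hi += 1
    sidelobes = pattern[:lo] + pattern[hi + 1:]
    return 20 * math.log10(peak / max(sidelobes))

# Uniform 16-element, half-wavelength-spaced array.
pos = [0.5 * n for n in range(16)]
msr = main_to_sidelobe_db(pos, [1.0] * 16)
# A triangular amplitude taper trades beamwidth for lower sidelobes.
taper = [1.0 - abs(n - 7.5) / 8.5 for n in range(16)]
msr_taper = main_to_sidelobe_db(pos, taper)
```

A GA for this problem would simply treat `positions` or `amplitudes` as the chromosome and `main_to_sidelobe_db` (possibly penalised by HPBW) as the fitness function.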
This paper analyses optimal power system planning with DGs used as real and reactive power compensators. Planning of DG placement and reactive power compensation are major problems in distribution systems: as power demand grows, DG placement becomes important, and cost analysis becomes a major concern. If the DGs also operate as reactive power compensators, they help maintain power quality. This paper therefore deals with optimal power system planning with renewable DGs that can serve as reactive power compensators. The problem is formulated and solved using two popular meta-heuristic techniques, the cuckoo search algorithm (CSA) and particle swarm optimization (PSO), and comparative results are presented.
1. The document discusses various machine learning algorithms for classification and regression including logistic regression, neural networks, decision trees, and ensemble methods.
2. It explains key concepts like overfitting, regularization, kernel methods, and different types of neural network architectures like convolutional neural networks.
3. Decision trees are described as intuitive algorithms for classification and regression but are unstable and use greedy optimization. Techniques like pre-pruning and post-pruning are used to improve decision trees.
Principal component analysis and matrix factorizations for learning (part 1)... (zukun)
This document discusses principal component analysis (PCA) and matrix factorizations for learning. It provides an overview of PCA and singular value decomposition (SVD), their history and applications. PCA and SVD are widely used techniques for dimensionality reduction and data transformation. The document also discusses how PCA relates to other methods like spectral clustering and correspondence analysis.
This document discusses the application of photon counting multibin detectors to spectral CT. It describes how basis decomposition and energy weighting techniques can be used with multibin detectors to overcome limitations of conventional CT such as beam-hardening artifacts, non-standardized CT numbers, suboptimal contrast-to-noise ratio, and contrast cancellation. Basis decomposition represents the linear attenuation coefficient as a combination of energy-dependent basis functions, while energy weighting optimizes contrast-to-noise by assigning different weights to different energy bins. These techniques enable spectral CT to perform quantitative imaging and low-dose scanning.
Ml srhwt-machine-learning-based-superlative-rapid-haar-wavelet-transformation... (Jumlesha Shaik)
Abstract: This paper introduces a digital image coding technique called ML-SRHWT (Machine Learning based image coding by Superlative Rapid HAAR Wavelet Transform). Digital images are compressed using the Superlative Rapid HAAR Wavelet Transform (SRHWT), and Least-Squares Support Vector Machine regression predicts the hyper-coefficients obtained with a QPSO model. The mathematical models discussed briefly in the paper are SRHWT, which performs well and reduces complexity compared to FHAAR, and EQPSO, which replaces the worst particle in QPSO with the new best particle found. Compared with the JPEG and JPEG2000 standards, ML-SRHWT is found to perform better.
Slides for an introductory session on k-means clustering.
A simple, accessible deck.
Could be used for teaching clustering algorithms for data mining to MCA students.
Prepared by K. T. Thomas, HOD of Computer Science, Santhigiri College, Vazhithala
The document describes a proposed low-power, high-speed multiplier circuit designed using a technique called New Vedic VLSI. The multiplier uses a Vedic multiplication method to generate partial products faster. An addition section with a carry-look-ahead adder sums the partial products, operating faster than a ripple-carry adder. Simulation results showed the proposed design consumed 41.868 μW over 10 ns, compared with 65.4 μW for a design using a ripple-carry adder, a reduction of 23.592 μW. The high-speed, low-power multiplier is suitable for applications such as digital signal processors that require efficient multiplication.
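The carry-look-ahead principle can be sketched in software with generate/propagate terms; the bit width and operands are arbitrary, and this models the logic rather than the proposed circuit:

```python
def cla_add(a, b, width=16):
    """Add two unsigned integers using carry-look-ahead logic:
    carries come from generate/propagate terms instead of rippling
    through every bit position."""
    g = [(a >> i & 1) & (b >> i & 1) for i in range(width)]  # generate
    p = [(a >> i & 1) | (b >> i & 1) for i in range(width)]  # propagate
    carry = [0] * (width + 1)
    for i in range(width):
        # c[i+1] = g[i] OR (p[i] AND c[i]); in hardware this recurrence
        # is flattened so all carries appear after a few gate delays.
        carry[i + 1] = g[i] | (p[i] & carry[i])
    bits = [(a >> i & 1) ^ (b >> i & 1) ^ carry[i] for i in range(width)]
    return sum(bit << i for i, bit in enumerate(bits))
```

The final carry-out (`carry[width]`) is dropped here, so the result wraps modulo 2^width, matching a fixed-width hardware adder.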
Power losses reduction of power transmission network using optimal location o... (IJECEIAES)
Due to growing demand for electric power, power loss reduction receives great attention from utilities. In this paper, low-level generation, i.e. distributed generation (DG), is used to reduce transmission power losses. The Karbala city transmission network (the case study) is modelled in a MATLAB m-file to study its load flow and power losses. The paper applies the particle swarm optimization (PSO) technique to find the optimal number and allocation of DG units with the objective of reducing power losses as much as possible. The results show the effect of optimal DG allocation on power loss reduction.
The document discusses distributed linear classification on Apache Spark. It describes using Spark to train logistic regression and linear support vector machine models on large datasets. Spark improves on MapReduce by keeping communication in memory and supporting fault tolerance. The paper proposes a trust region Newton method to optimize the objective functions for logistic regression and linear SVM; conjugate gradient solves each Newton system using Hessian-vector products, so the large Hessian never has to be stored explicitly.
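A minimal single-machine sketch of the Hessian-free core, assuming plain Newton-CG rather than the paper's trust-region variant, with an illustrative toy dataset (labels in {-1, +1}):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def newton_cg_logistic(X, y, lam=1e-2, newton_iters=20, cg_iters=50):
    """Fit L2-regularised logistic regression with Newton steps whose
    linear systems H d = -g are solved by conjugate gradient using
    Hessian-vector products, never forming H explicitly."""
    n, d = len(X), len(X[0])
    w = [0.0] * d

    def grad(w):
        g = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            m = sum(wj * xj for wj, xj in zip(w, xi))
            coef = -yi * (1.0 - sigmoid(yi * m))  # d/dm log(1+e^(-ym))
            for j in range(d):
                g[j] += coef * xi[j]
        return g

    def hess_vec(w, v):
        # H v = X^T D X v + lam v, with D_ii = s_i (1 - s_i).
        hv = [lam * vj for vj in v]
        for xi in X:
            s = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            coef = s * (1.0 - s) * sum(xj * vj for xj, vj in zip(xi, v))
            for j in range(d):
                hv[j] += coef * xi[j]
        return hv

    for _ in range(newton_iters):
        g = grad(w)
        if math.sqrt(sum(gj * gj for gj in g)) < 1e-8:
            break
        dvec = [0.0] * d
        r = [-gj for gj in g]
        p = r[:]
        rs = sum(rj * rj for rj in r)
        for _ in range(cg_iters):
            if rs < 1e-12:
                break
            hp = hess_vec(w, p)
            alpha = rs / sum(pj * hpj for pj, hpj in zip(p, hp))
            dvec = [dj + alpha * pj for dj, pj in zip(dvec, p)]
            r = [rj - alpha * hpj for rj, hpj in zip(r, hp)]
            rs_new = sum(rj * rj for rj in r)
            p = [rj + (rs_new / rs) * pj for rj, pj in zip(r, p)]
            rs = rs_new
        w = [wj + dj for wj, dj in zip(w, dvec)]
    return w

X = [[1.0, 2.0], [2.0, 1.0], [2.0, 3.0],
     [-1.0, -2.0], [-2.0, -1.0], [-3.0, -2.0]]
y = [1, 1, 1, -1, -1, -1]
w = newton_cg_logistic(X, y)
```

Only gradient and Hessian-vector products are needed, which is exactly why the approach distributes well: each worker computes partial sums over its data shard and the driver reduces them.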
Memory Polynomial Based Adaptive Digital Predistorter (IJERA Editor)
Digital predistortion (DPD) is a baseband signal processing technique that corrects for impairments in RF power amplifiers (PAs). These impairments cause out-of-band emissions or spectral regrowth and in-band distortion, which correlate with an increased bit error rate (BER). Wideband signals with a high peak-to-average ratio are more susceptible to these unwanted effects. To reduce these impairments, this paper proposes modeling the digital predistorter for the power amplifier using the GSA algorithm.
The document summarizes hierarchical clustering techniques. It discusses two main types of hierarchical clustering - agglomerative and divisive. It presents an example dendrogram to illustrate hierarchical clustering. It also summarizes a research paper on a new algorithm called CLUBS that performs faster and more accurate hierarchical clustering compared to existing algorithms. The document concludes by discussing experiments applying hierarchical clustering on two biomedical datasets containing gene expression data to group patients and cell samples.
Advanced Support Vector Machine for classification in Neural Network (Ashwani Jha)
This paper proposes a method to classify EEG signals using a support vector machine (SVM) classifier combined with a quicksort sorting algorithm (SVM-Q). The method involves extracting phase locking value (PLV) features from EEG data, sorting the feature vectors using quicksort, and then applying SVM classification. When tested on two BCI competition datasets, SVM-Q achieved classification accuracies of 86% and 77%, outperforming plain SVM which achieved 62% and 54% respectively. The paper concludes that adding a sorting algorithm to SVM can improve its classification performance for brain-computer interface applications.
The document discusses graph-based clustering methods. It describes how graphs can be used to represent real-world networks from domains like biology, technology, social networks, and economics. It introduces the idea of using minimal spanning trees and hierarchical clustering to identify clusters in graph data. Two common algorithms for finding minimal spanning trees are described: Prim's algorithm and Kruskal's algorithm. Different strategies for iteratively deleting branches from the minimal spanning tree are also summarized to form clusters, such as deleting the branch with the maximum weight or inconsistent branches based on a reference value.
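The max-weight deletion strategy can be sketched end to end: build the minimal spanning tree with Kruskal's algorithm, delete the k-1 heaviest branches, and read clusters off the remaining forest. The `mst_clusters` helper and the points below are illustrative:

```python
def mst_clusters(points, k):
    """Cluster points by building an MST with Kruskal's algorithm,
    then deleting the k-1 heaviest branches (max-weight deletion)."""
    n = len(points)
    edges = sorted(
        (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), i, j)
        for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Kruskal: take the lightest edge that joins two different trees.
    mst = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))
    # Keep all but the k-1 heaviest MST branches, then label the forest.
    mst.sort()
    keep = mst[: len(mst) - (k - 1)]
    parent = list(range(n))
    for w, i, j in keep:
        parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

pts = [(0, 0), (1, 0), (0, 1), (8, 8), (9, 8), (8, 9)]
labels = mst_clusters(pts, 2)
```

Deleting by maximum weight works when clusters are well separated; the inconsistent-branch strategy mentioned above instead compares each branch weight against the average weight of its neighbouring branches.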
Availability of a Redundant System with Two Parallel Active Components (theijes)
This paper considers a redundant system consisting of two parallel active components. The time-to-failure and time-to-repair of the components follow an exponential and a general distribution, respectively. Repairs of failed components are randomly interrupted: the time-to-interrupt is an exponentially distributed random variable, and the interrupt durations are generally distributed. We obtain the availability of the system.
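A rough Monte Carlo check of such an availability model, under simplifying assumptions: no repair interruptions, illustrative parameters (exponential failures at 0.1 per hour, repairs uniform on [1, 3] hours), and the independent-components formula for the parallel system:

```python
import random

def simulate_availability(fail_rate, repair_time, horizon=1_000_000, seed=1):
    """Estimate one component's steady-state availability by simulating
    its alternating up/down renewal process: exponential time-to-failure,
    generally distributed time-to-repair."""
    rng = random.Random(seed)
    up = t = 0.0
    while t < horizon:
        ttf = rng.expovariate(fail_rate)   # exponential up time
        up += ttf
        t += ttf + repair_time(rng)        # draw from the repair law
    return up / t

# Illustrative numbers, not the paper's: MTTF = 10 h, mean repair = 2 h,
# so the analytic single-component availability is 10 / 12 = 0.8333.
a1 = simulate_availability(0.1, lambda rng: rng.uniform(1.0, 3.0))
# Two independent parallel active components: the system is down only
# when both components are down at the same time.
a_sys = 1.0 - (1.0 - a1) ** 2
```

The paper's contribution is precisely what this sketch omits: repair interruptions couple the down periods, so the closed-form product rule no longer applies directly.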
The document discusses clustering techniques and provides details about the k-means clustering algorithm. It begins with an introduction to clustering and lists different clustering techniques. It then describes the k-means algorithm in detail, including how it works, the steps involved, and provides an example illustration. Finally, it discusses comments on the k-means algorithm, focusing on aspects like choosing the value of k, initializing cluster centroids, and different distance measurement methods.
The document summarizes research on thinning semi-elliptical and quarter-elliptical antenna arrays using genetic algorithm optimization to reduce side-lobe levels. Key points:
1) Genetic algorithms were used to optimize thinning of semi-elliptical and quarter-elliptical arrays with uniform excitation and spacing to produce narrow beams without degrading performance.
2) Simulation results showed thinning using genetic algorithms reduced side-lobe levels of the arrays compared to fully populated arrays.
3) Maximum side-lobe levels, beamwidths, and other metrics were tabulated for various array sizes and geometries, indicating thinning effectively lowered side-lobes.
This document summarizes work exploring the use of CUDA GPUs and Cell processors to accelerate a gravitational wave source-modelling application called the EMRI Teukolsky code. The code models gravitational waves generated by a small compact object orbiting a supermassive black hole. The authors implemented the code on a Cell processor and Nvidia GPU using CUDA. They were able to achieve over an order of magnitude speedup compared to a CPU implementation by leveraging the parallelism of these hardware accelerators.
The document discusses combined cycle power plants (CCPP), which use natural gas more efficiently than other power generation technologies by consuming one-third less natural gas per kWh of electricity generated. CCPPs allow countries like France to reduce CO2 emissions while modernizing their electricity production. However, natural gas has disadvantages such as limited supply that must be considered along with the higher costs of transport and treatment compared to other fuels.
Combined Cycle Gas Turbine Power Plant Part 1 (Anurak Atthasit)
Introduction to Combined Cycle Gas Turbine Power Plants, describing the advantages and design limits of the CCGT, with an overview of the Brayton and Rankine cycles and some basic thermodynamics to explain the background of the CCGT.
1) The Group Steiner Problem involves finding a minimum-cost tree connecting at least one node from each of k disjoint groups in a graph. It has applications in VLSI circuit routing.
2) The authors propose a CUDA-aware MPI-based approach to solve large instances of the Group Steiner Problem using a heuristic that constructs 2-star trees in parallel.
3) Experimental results on a supercomputer show the parallel implementation achieves up to 302x speedup over sequential algorithms and finds accurate solutions to industry benchmark instances.
Elements Space and Amplitude Perturbation Using Genetic Algorithm for Antenna...CSCJournals
A simple and fast genetic algorithm (GA) developed to reduce the sidelobes in non-uniformly spaced linear antenna arrays. The proposed GA algorithm optimizes two vectors of variables to increase the Main lobe to Sidelobe power ratio (M/S) of array’s radiation pattern. The algorithm, in the first phase calculates the positions of the array elements and in the second phase, it manipulates the amplitude of excitation signals for each element. The simulations performed for 16 and 24 elements array structure. The results indicated that M/S improved in first phase from 13.2 to over 22.2dB meanwhile the half power beamwidth (HPBW) left almost unchanged. After element replacement, in the second phase, by using amplitude tapering further improvement up to 32dB was achieved. Also, the simulations shown that after element space perturbation, some antenna elements can be merged together without any performance degradation in radiation pattern in terms of gain and sidelobes level.
This paper analyses the optimal power system planning with DGs used as real and reactive power compensator. Recently planning of DG placement reactive power compensation are the major problems in distribution system. As the requirement in the power is more the DG placement becomes important. When planned to make the DG placement, cost analysis becomes as a major concern. And if the DGs operate as reactive power compensator it is most helpful in power quality maintenance. So, this paper deals with the optimal power system planning with renewable DGs which can be used as a reactive power compensators. The problem is formulated and solved using popular meta-heuristic techniques called cuckoo search algorithm (CSA) and particle swarm optimization (PSO). the comparative results are presented.
1. The document discusses various machine learning algorithms for classification and regression including logistic regression, neural networks, decision trees, and ensemble methods.
2. It explains key concepts like overfitting, regularization, kernel methods, and different types of neural network architectures like convolutional neural networks.
3. Decision trees are described as intuitive algorithms for classification and regression but are unstable and use greedy optimization. Techniques like pre-pruning and post-pruning are used to improve decision trees.
Principal component analysis and matrix factorizations for learning (part 1) ...zukun
This document discusses principal component analysis (PCA) and matrix factorizations for learning. It provides an overview of PCA and singular value decomposition (SVD), their history and applications. PCA and SVD are widely used techniques for dimensionality reduction and data transformation. The document also discusses how PCA relates to other methods like spectral clustering and correspondence analysis.
This document discusses the application of photon counting multibin detectors to spectral CT. It describes how basis decomposition and energy weighting techniques can be used with multibin detectors to overcome limitations of conventional CT like beamhardening artifacts, non-standardized CT numbers, suboptimal contrast-to-noise ratio, and contrast cancellation. Basis decomposition involves representing the linear attenuation coefficient as a combination of energy dependent basis functions, while energy weighting optimizes contrast-to-noise by assigning different weights to different energy bins. These techniques enable spectral CT to perform quantitative imaging and low-dose scanning.
Ml srhwt-machine-learning-based-superlative-rapid-haar-wavelet-transformation...Jumlesha Shaik
Abstract: In this paper a digital image coding technique called ML-SRHWT (Machine Learning based image coding by Superlative Rapid HAAR Wavelet Transform) has been introduced. Compression of digital image is done using the model Superlative Rapid HAAR Wavelet Transform (SRHWT). The Least Square Support vector Machine regression predicts hyper coefficients obtained by using QPSO model. The mathematical models are discussed in brief in this paper are SRHWT, which results in good performance and reduces the complexity compared to FHAAR and EQPSO by replacing the least good particle with the new best obtained particle in QPSO. On comparing the ML-SRHWT with JPEG and JPEG2000 standards, the former is considered to be the better.
Slides for Introductory session on K Means Clustering.
simple and good. ppt
Could be used for taking classes for MCA students on Clustering Algorithms for Data mining.
Prepared By K.T.Thomas HOD of Computer Science, Santhigiri College Vazhithala
The document describes a proposed low power, high speed multiplier circuit designed using a technique called New Vedic VLSI. The multiplier uses a Vedic multiplication method to generate partial products faster. An addition section with a carry look ahead adder is used to sum the partial products, providing faster operation than a ripple carry adder. Simulation results showed the proposed design consumed 41.868 μw of power over 10ns, compared to 65.4 μw for a design using a ripple carry adder, for a 23.592 μw power reduction. The high speed, low power multiplier design is suitable for applications like digital signal processors that require efficient multiplication.
Power losses reduction of power transmission network using optimal location o...IJECEIAES
Due to the growth of demand for electric power, electric power loss reduction takes great attention for the power utility. In this paper, a low-level generation or distributed generation (DG) has been used for transmission power losses reduction. Karbala city transmission network (which is the case study) has been represented by using MATLAB m-file to study the load flow and the power loss for it. The paper proposed the particle swarm optimization (PSO) technique in order to find the optimal number and allocation of DG with the objective to decrease power losses as possible. The results show the effect of the optimal allocation of DG on power loss reduction.
The document discusses distributed linear classification on Apache Spark. It describes using Spark to train logistic regression and linear support vector machine models on large datasets. Spark improves on MapReduce by conducting communications in-memory and supporting fault tolerance. The paper proposes using a trust region Newton method to optimize the objective functions for logistic regression and linear SVM. Conjugate gradient is used to approximate the Hessian matrix and solve the Newton system without explicitly storing the large Hessian.
Memory Polynomial Based Adaptive Digital PredistorterIJERA Editor
Digital predistortion (DPD) is a baseband signal processing technique that corrects for impairments in RF
power amplifiers (PAs). These impairments cause out-of-band emissions or spectral regrowth and in-band
distortion, which correlate with an increased bit error rate (BER). Wideband signals with a high peak-to-average
ratio, are more susceptible to these unwanted effects. So to reduce these impairments, this paper proposes the
modeling of the digital predistortion for the power amplifier using GSA algorithm.
The document summarizes hierarchical clustering techniques. It discusses two main types of hierarchical clustering - agglomerative and divisive. It presents an example dendrogram to illustrate hierarchical clustering. It also summarizes a research paper on a new algorithm called CLUBS that performs faster and more accurate hierarchical clustering compared to existing algorithms. The document concludes by discussing experiments applying hierarchical clustering on two biomedical datasets containing gene expression data to group patients and cell samples.
Advanced Support Vector Machine for classification in Neural NetworkAshwani Jha
This paper proposes a method to classify EEG signals using a support vector machine (SVM) classifier combined with a quicksort sorting algorithm (SVM-Q). The method involves extracting phase locking value (PLV) features from EEG data, sorting the feature vectors using quicksort, and then applying SVM classification. When tested on two BCI competition datasets, SVM-Q achieved classification accuracies of 86% and 77%, outperforming plain SVM which achieved 62% and 54% respectively. The paper concludes that adding a sorting algorithm to SVM can improve its classification performance for brain-computer interface applications.
The document discusses graph-based clustering methods. It describes how graphs can be used to represent real-world networks from domains like biology, technology, social networks, and economics. It introduces the idea of using minimal spanning trees and hierarchical clustering to identify clusters in graph data. Two common algorithms for finding minimal spanning trees are described: Prim's algorithm and Kruskal's algorithm. Different strategies for iteratively deleting branches from the minimal spanning tree are also summarized to form clusters, such as deleting the branch with the maximum weight or inconsistent branches based on a reference value.
Availability of a Redundant System with Two Parallel Active Componentstheijes
This paper considers a redundant system which consists of two parallel active components. The time-to-failure
and the time-to-repair of the components follow an exponential and a general distribution, respectively. The
repairs of failed components are randomly interrupted. The time-to-interrupt is taken from an exponentially
distributed random variable and the interrupt times are generally distributed. We obtain the availability for the
system
The document discusses clustering techniques and provides details about the k-means clustering algorithm. It begins with an introduction to clustering and lists different clustering techniques. It then describes the k-means algorithm in detail, including how it works, the steps involved, and provides an example illustration. Finally, it discusses comments on the k-means algorithm, focusing on aspects like choosing the value of k, initializing cluster centroids, and different distance measurement methods.
The document summarizes research on thinning semi-elliptical and quarter-elliptical antenna arrays using genetic algorithm optimization to reduce side-lobe levels. Key points:
1) Genetic algorithms were used to optimize thinning of semi-elliptical and quarter-elliptical arrays with uniform excitation and spacing to produce narrow beams without degrading performance.
2) Simulation results showed thinning using genetic algorithms reduced side-lobe levels of the arrays compared to fully populated arrays.
3) Maximum side-lobe levels, beamwidths, and other metrics were tabulated for various array sizes and geometries, indicating thinning effectively lowered side-lobes.
This document summarizes work exploring the use of CUDA GPUs and Cell processors to accelerate a gravitational wave source-modelling application called the EMRI Teukolsky code. The code models gravitational waves generated by a small compact object orbiting a supermassive black hole. The authors implemented the code on a Cell processor and Nvidia GPU using CUDA. They were able to achieve over an order of magnitude speedup compared to a CPU implementation by leveraging the parallelism of these hardware accelerators.
The document discusses combined cycle power plants (CCPP) which use natural gas more efficiently than other power generation technologies by consuming one-third less natural gas per kW.h of electricity generated. CCPPs allow countries like France to reduce CO2 emissions while modernizing their electricity production. However, natural gas has disadvantages such as limited supply that must be considered along with the higher costs of transport and treatment compared to other fuels.
Combined Cycle Gas Turbine Power Plant Part 1 - Anurak Atthasit
Introduction to Combined Cycle Gas Turbine Power Plant. Describing the advantage and design limit of the CCGT. Overview of Brayton Cycle and Rankine Cycle - showing some basic thermodynamic to explain some background of CCGT.
This document discusses the working cycles of thermal power plants, specifically the Rankine, regenerative, and reheat cycles. It begins with an introduction to the Rayalaseema Thermal Power Plant located in Kadapa, India, including its layout, raw materials (coal, furnace oil, diesel, water), combustion process, and operational data from 1995-2013. It then provides details on the Rankine, regenerative, and reheat thermodynamic cycles used in thermal power generation, including diagrams, components, efficiencies, and factors that increase efficiency. The document contains extensive tables of thermodynamic properties and calculations for the plant components as well as graphs comparing exergy destruction and efficiencies.
Detailed Internship Report about RAJIV GANDHI COMBINED CYCLE POWER PLANT-NTPC LTD. Includes information about Thermodynamic Cycles, Combined Cycle, HRSG (Heat Recovery Steam Generator), and various components of a Combined Cycle Power Plant.
This document provides information on erecting a GE 9000E power plant, outlining the typical logical sequence of erection. It describes the erection of various components including the template and subsole plates, transporting and unloading heavy equipment, erecting the air inlet supports structure, air inlet ducts, air filter, vertical and lateral exhaust ducts, chimney, off base coolers, pipes and skids, walkways, electrical systems, and tank. Diagrams illustrate the erection process and components.
This document provides operating procedures for a heat recovery steam generator (HRSG) unit, including:
1) Pre-operating procedures such as cleaning, chemical cleaning, blowing steam lines, setting safety valves, and filling the unit before cold startup.
2) Cold startup procedures such as adjusting water levels, purging the unit, gradually warming it up and introducing supplementary firing.
3) Guidelines for normal shutdown including stopping supplementary firing and slowly reducing gas flow and pressure.
4) Emergency procedures for low or high water levels and tube failures.
5) A walkdown checklist to inspect for leaks or abnormal conditions.
The document provides an overview and course outline for a training on combined cycle power plants. It discusses the key components of a heat recovery steam generator (HRSG) system including the low pressure, intermediate pressure and high pressure systems. It explains the Brayton and Rankine cycles used in combined cycle plants and how they improve overall efficiency compared to simple cycle plants. Key parameters and operational considerations for the low pressure system are also reviewed.
The document describes the instrumentation and process control systems used at a pulp and paper mill. It discusses the various measurement devices such as level transmitters, pressure transmitters, and flow transmitters. It also describes the control systems that regulate processes like stock preparation, quality control, boiler operations, and turbine generation. The calibration and maintenance of instruments is conducted according to ISO standards to ensure accuracy of measurements.
Thermal plant instrumentation and control - Shilpa Shukla
This document provides an overview of instrumentation and control systems used in a thermal power plant. It discusses the key components measured including pressure, temperature, flow, level, vibration and flue gas analysis. It describes the various sensors and instruments used to measure these variables, including bourdon tubes, diaphragms, bellows, thermocouples, RTDs, orifice plates, and analyzers. It also discusses the control and monitoring systems, laboratories, and pollution control systems used in thermal power plants.
Gas turbine power plants work by compressing air which is then mixed with fuel and ignited in a combustion chamber. This powers a turbine, which drives both a generator to produce electricity and the air compressor. Gas turbines have three main parts - an air compressor, combustion chamber, and turbine. They can use fuels like oil, natural gas, or pulverized coal and are used for power generation especially for peak loads or as backup. Advantages include easier fuel storage and handling as well as lower maintenance costs compared to steam plants.
The document discusses instrumentation and control systems used in thermal power plants. It describes the objectives of instrumentation and control which include safe and efficient plant operation. It provides an overview of the Distributed Digital Control and Management Information System (DDCMIS) and its components, including the burner management system, turbine control system, and generator instruments. It explains the various functions, measurements, controls, and benefits provided by the DDCMIS.
Design Principles of Excel Dashboards & Reports - Wiley
Get yourself into a dashboard state of mind with these best practices for Excel dashboards and reports.
Content from Excel Dashboards & Reports For Dummies by Michael Alexander. Learn more: http://bit.ly/FDExcelDashboards
and
SalesForce.com For Dummies by Tom Wong, Liz Kao, Matt Kaufma. Learn more: http://bit.ly/ForDummiesSF
Forecasting day ahead power prices in Germany using fixed size least squares ... - Niklas Ignell
Using less than 0.5% of the original training dataset, the aggregated out-of-sample result over the test period (7 days in May 2017) shows that the actual and forecasted average daily hourly German EPEX power prices differed by 0.4% on average. The presence of outliers, heteroskedastic residuals and sparseness of prices at lower price levels in the training dataset explains why two of the days in the test period differed by more than +/- 10%.
Adapted Branch-and-Bound Algorithm Using SVM With Model Selection - IJECEIAES
The Branch-and-Bound algorithm is the basis for the majority of solving methods in mixed integer linear programming, and it has proven its efficiency in many fields. It builds a tree of nodes little by little by adopting two strategies: a variable selection strategy and a node selection strategy. In our previous work, we presented a methodology for learning branch-and-bound strategies by using regression-based support vector machines twice. That methodology made it possible, first, to exploit information from previous executions of the Branch-and-Bound algorithm on other instances; second, to create an information channel between the node selection strategy and the variable branching strategy; and third, to obtain good results in terms of running time compared to the standard Branch-and-Bound algorithm. In this work, we focus on increasing SVM performance by using cross-validation coupled with model selection.
Economic Dispatch of Generated Power Using Modified Lambda-Iteration Method - IOSR Journals
This document proposes a modified lambda-iteration method for solving economic dispatch problems and minimizing fuel costs. It involves determining the optimal power output of each generator given constraints like load demand and transmission losses. The method is implemented in MATLAB and tested on a 6-generator system. Results found the total power was 1263.0074 MW at an incremental cost of 13.2539 $/MWh, close to those from a genetic algorithm solution. The proposed method provides a fast, easy-to-use approach for economic dispatch optimization problems.
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI... - IJDKP
Quantum clustering (QC) is a data clustering algorithm based on quantum mechanics, accomplished by substituting each point in a given dataset with a Gaussian. The width of the Gaussian is a σ value, a hyper-parameter which can be manually defined and manipulated to suit the application. Numerical methods are used to find all the minima of the quantum potential, as they correspond to cluster centers. Herein, we investigate the mathematical task of expressing and finding all the roots of the exponential polynomial corresponding to the minima of a two-dimensional quantum potential. This is an outstanding task because such expressions are normally impossible to solve analytically. However, we prove that if the points are all included in a square region of size σ, there is only one minimum. This bound is useful not only for limiting the number of solutions to look for by numerical means; it also allows us to propose a new numerical approach "per block". This technique decreases the number of particles by approximating some groups of particles with weighted particles. These findings are useful not only for the quantum clustering problem but also for the exponential polynomials encountered in quantum chemistry, solid-state physics and other applications.
AI optimizing HPC simulations (presentation from 6th EULAG Workshop) - byteLAKE
See our presentation from the 6th International EULAG Users Workshop. We talked about taking HPC to the "Industry 4.0" by implementing smart techniques to optimize the codes in terms of performance and energy consumption. It explains how Machine Learning can dynamically optimize HPC simulations and byteLAKE's software autotuning solution.
Find out more about byteLAKE at: www.byteLAKE.com
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for De... - Databricks
This document summarizes an approach for joint optimization of AutoML and transfer learning. It discusses challenges with using AutoML for transfer learning due to limitations on the search space from pretrained models and inability to reuse models across datasets. The proposed approach uses AutoML to search for neural network architectures and hyperparameters based on pretrained models. It then fine-tunes the selected models on target datasets, achieving better accuracy and stability than traditional fine-tuning or standalone AutoML. Experimental results on image classification tasks demonstrate the advantages of the joint optimization approach.
On Selection of Periodic Kernels Parameters in Time Series Prediction - cscpconf
This paper analyses the parameters of periodic kernels. Periodic kernels can be used for the prediction task, formulated as a typical regression problem. The prediction of real time series is performed on the basis of the Periodic Kernel Estimator (PerKE). As periodic kernels require their parameters to be set, it is necessary to analyse the influence of these parameters on prediction quality. The paper describes a simple methodology, based on grid search, for finding parameter values for periodic kernels. Two different error measures are taken into account as prediction quality criteria but lead to comparable results. The methodology was tested on benchmark and real datasets and proved to give satisfactory results.
ECONOMIC LOAD DISPATCH USING PARTICLE SWARM OPTIMIZATION - Mln Phaneendra
In this ppt, particle swarm optimization (PSO) is applied to allocate the active power among the generating stations, satisfying the system constraints and minimizing the cost of the power generated. The viability of the method is analyzed for its accuracy and rate of convergence. The economic load dispatch problem is solved for three- and six-unit systems using PSO and a conventional method, both neglecting and including transmission losses. The results of the PSO method were compared with the conventional method and found to be superior.
This document provides an overview of various machine learning algorithms and concepts, including supervised learning techniques like linear regression, logistic regression, decision trees, random forests, and support vector machines. It also discusses unsupervised learning methods like principal component analysis and kernel-based PCA. Key aspects of linear regression, logistic regression, and random forests are summarized, such as cost functions, gradient descent, sigmoid functions, and bagging. Kernel methods are also introduced, explaining how the kernel trick can allow solving non-linear problems by mapping data to a higher-dimensional feature space.
Value Function Approximation via Low-Rank Models - Lyft
We propose a novel value function approximation technique for Markov decision processes that compactly represents the state-action value function using a low-rank and sparse matrix model. Under minimal assumptions, this decomposition is a Robust Principal Component Analysis problem that can be solved exactly via the Principal Component Pursuit convex optimization problem.
Support Vector Machine Optimal Kernel Selection - IRJET Journal
This document discusses selecting the optimal kernel for support vector machines (SVMs) based on different datasets. It provides background on SVMs and how their performance depends on the kernel function used. The document evaluates 4 kernel types (linear, polynomial, radial basis function (RBF), sigmoid) on 3 datasets: heart disease data, digit recognition data, and social network ads data. For each dataset and kernel combination, it reports accuracy, sensitivity, specificity, and kappa statistic metrics from implementing SVMs in R. The linear and RBF kernels generally performed best, with RBF working best for datasets with larger numbers of features like digit recognition data.
A parsimonious SVM model selection criterion for classification of real-world ... - o_almasi
This paper proposes and optimizes a two-term cost function consisting of a sparseness term and a generalized v-fold cross-validation term by a new adaptive particle swarm optimization (APSO). APSO updates its parameters adaptively based on dynamic feedback from the success rate of each particle's personal best. Since the proposed cost function is based on choosing a smaller number of support vectors, the complexity of the SVM models decreases while the accuracy remains in an acceptable range. Therefore, the testing time decreases, making SVM more applicable for practical applications on real datasets. A comparative study on datasets from the UCI database is performed between the proposed cost function and the conventional cost function to demonstrate its effectiveness.
This document summarizes a team's analysis of a flight delay prediction problem. The team analyzed a dataset with 29 features and 484,551 rows to understand missing values, duplicates, outliers, and categorical variables. They performed data visualization, feature engineering including encoding, scaling, and selection. Models tested include linear regression, Ridge regression, SVC, random forest, and neural network. Ridge regression and random forest performed best with 98-99% accuracy. Issues with linear regression like overfitting were addressed using regularization. SVC was unsuitable due to time and accuracy. Future work may include more data and relevant features.
Predicting rainfall using ensemble of ensembles - Varad Meru
This paper was written by a group of three as the class project for CS 273: Introduction to Machine Learning at UC Irvine. The group members were Prolok Sundaresan, Varad Meru, and Prateek Jain.
Regression is an approach for modeling the relationship between data X and the dependent variable y. In this report, we present our experiments with multiple approaches, ranging from Ensemble of Learning to Deep Learning Networks on the weather modeling data to predict the rainfall. The competition was held on the online data science competition portal ‘Kaggle’. The results for weighted ensemble of learners gave us a top-10 ranking, with the testing root-mean-squared error being 0.5878.
MLPfit is a tool for designing and training multi-layer perceptrons (MLPs) for tasks like function approximation and classification. It implements stochastic minimization as well as more powerful methods like conjugate gradients and BFGS. MLPfit is designed to be simple, precise, fast and easy to use for both standalone and integrated applications. Documentation and source code are available online.
- The document discusses using a modified genetic algorithm to optimize parameters for automatic generation control of multi-area electric energy systems.
- It proposes a modified genetic algorithm that employs one-point crossover with modification. The crossover site helps maintain diversity of search points while exploiting known optimal values, balancing exploration and exploitation.
- The algorithm is used along with decomposition techniques and trapezoidal integration to determine optimal control parameters for multi-area systems and obtain better dynamic performance under small load perturbations, with and without considering nonlinearity from governor deadbands.
Large Scale Kernel Learning using Block Coordinate Descent - Shaleen Kumar Gupta
This paper explores using block coordinate descent to scale kernel learning methods to large datasets. It compares exact kernel methods to two approximation techniques, Nystrom and random Fourier features, on speech, text, and image datasets. Experimental results show that Nystrom generally achieves better accuracy than random features but requires more iterations. The paper also analyzes the performance and scalability of computing kernel blocks in a distributed setting.
The iterated IRS method is an iterative model reduction technique that aims to more accurately approximate the modal properties of a full finite element model using a reduced set of degrees of freedom. This paper proves that the iterated IRS method converges, meaning that after repeated iterations the transformation matrix converges to a fixed value. The converged transformation matrix is shown to be equivalent to the transformation obtained from the System Equivalent Reduction Expansion Process (SEREP). The speed of convergence depends on factors like the choice of master degrees of freedom, with lower modes converging more quickly than higher modes.
This document discusses hardware architectures for deep learning and exploiting sparsity. It covers three key topics:
1) Exploiting sparsity in deep neural networks to reduce operations and storage requirements, such as sparsity from ReLU activations and pruning weights. This can reduce data movement and the number of operations.
2) Methods for exploiting sparsity, including compressing sparse data, skipping zero activations, pruning small activations, and exploiting spatial/temporal correlations. Pruning techniques like magnitude-based and sensitivity-based pruning are also covered.
3) The benefits of sparsity for different hardware architectures, such as graph neural networks where the graph representation is often sparse, allowing for reductions in data movement and storage.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... - sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Open Source Contributions to Postgres: The Basics POSETTE 2024 - ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
Learn SQL from basic queries to advanced queries - manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... - Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
The Building Blocks of QuestDB, a Time Series Database - javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
The Ipsos - AI - Monitor 2024 Report.pdf - Social Samosa
According to Ipsos AI Monitor's 2024 report, 65% of Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
End-to-end pipeline agility - Berlin Buzzwords 2024 - Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
2. + Data Description
Data Summary
Method Description
Method Comparison
Conclusion
Reference
3. +
Data Description
In electric power generation a combined cycle is an assembly of
heat engines that work in series from the same source of heat.
The principle is that, after completing its cycle in the first engine, the working fluid is still low enough in entropy that a second, subsequent heat engine can extract energy from its waste heat.
The electrical energy output of the plant is influenced by four
main ambient parameters. The goal is to find a model for testing this
influence.
(Slide diagram: turbine and electric generator)
4. +
Data Description
The initial dataset was donated by a power plant whose identity is kept confidential.
The full dataset is available on UCI's website:
http://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant
The dataset records 6 years (2006-2011) of electrical power output while the plant was operating at full load.
9568 observations, 5 variables.
Observations are independent from each other.
The data were cleaned (e.g. no missing values) before being uploaded to UCI.
5. +
Data Description
Independent variables:
Temperature (T): 1.81 °C - 37.11 °C
Ambient Pressure (AP): 992.89 - 1033.30 millibar
Relative Humidity (RH): 25.56% - 100.16%
Exhaust Vacuum (V): 25.36 - 81.56 cm Hg
Dependent variable:
Net hourly electrical energy output (PE): 420.26 - 495.76 MW
8. +
Method Description
Step 1: Assess the training data with several untrained models
Multiple linear regression
Backward selection
Ridge regression
Elasticnet
Lasso
SVM with linear kernel
Pruned tree
MARS
Boosted tree
Bagging tree
Step 2: Fit the obtained models to the test dataset and evaluate each model in terms
of its RMSE and MAD values.
Step 3: Train the models with resampling methods on the LSU HPC, and evaluate each
model by RMSE and MAD.
MAD = (1/N) × Σ_{i=1}^{N} |y_i − ŷ_i|        RMSE = √( Σ_{i=1}^{N} (y_i − ŷ_i)² / N )
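A minimal sketch of the two error measures (Python for illustration; the project's own computations are in R):

```python
from math import sqrt

def mad(y, y_hat):
    """Mean absolute deviation between observed and predicted values."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean squared error; penalizes large residuals more than MAD."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

# made-up PE values (MW) for illustration only
y     = [430.0, 460.0, 490.0]
y_hat = [432.0, 455.0, 489.0]
print(mad(y, y_hat))   # (2 + 5 + 1) / 3 ≈ 2.667
print(rmse(y, y_hat))  # sqrt((4 + 25 + 1) / 3) ≈ 3.162
```

Because RMSE squares each residual before averaging, it is more sensitive to outliers than MAD, which is why the deck reports both.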
11. +
Ridge regression and Lasso
λ = 1 is used for the RMSE and MAD calculation
library(glmnet)
ridge <- glmnet(x.train, y.train, alpha = 0)  # alpha = 0: pure L2 (ridge) penalty
lasso <- glmnet(x.train, y.train, alpha = 1)  # alpha = 1: pure L1 (lasso) penalty
12. +
Elasticnet
λ = 1 is used for the elasticnet calculation
elastic <- glmnet(x.train, y.train, alpha = 0.5)  # alpha = 0.5: equal mix of L1 and L2 penalties
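The glmnet fits above all minimize the same elastic-net objective (for the Gaussian family); only the mixing parameter α changes:

```latex
\min_{\beta_0,\beta}\; \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i^{\top}\beta \right)^2
  + \lambda \left[ \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 + \alpha\,\lVert \beta \rVert_1 \right]
```

α = 0 gives ridge regression, α = 1 the lasso, and α = 0.5 the elasticnet fit above; λ controls the overall penalty strength.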
13. +
SVM with linear kernel
library(kernlab)
ksvm <- ksvm(PE ~ AT + V + AP + RH, data = data.train, kernel = "vanilladot", C = 1)  # linear kernel
14. +
Single pruned tree
library(rpart)
rpart <- rpart(PE ~ AT + V + AP + RH, data = data.train,
               control = rpart.control(xval = 10, minbucket = 100, cp = 0.01))
15. +
MARS
library(earth)
mars <- earth(PE ~ AT + V + AP + RH, data = data.train, degree = 1)  # additive (degree-1) MARS model
16. +
Boosted tree
library(gbm)
boost <- gbm(PE ~ AT + V + AP + RH, data = data.train, distribution = "gaussian",
             n.trees = 1000, interaction.depth = 4, shrinkage = 0.01)
17. +
Bagging tree
library(randomForest)
bag <- randomForest(PE ~ AT + V + AP + RH, data = data.train, mtry = 4, importance = TRUE)
# mtry = 4 (all predictors) makes the random forest equivalent to bagging
18. +
The Predictive Results in terms of the MAD and RMSE values (untrained)
Model                Package        RMSE      MAD
MLR                  -              4.583379  3.622121
Backward selection   leaps          4.583379  3.622121
Ridge                glmnet         4.92302   3.928416
Lasso                glmnet         5.039077  4.015422
Elasticnet           glmnet         4.848145  3.866809
SVM (linear kernel)  kernlab        4.588058  3.604371
Pruned tree          rpart          5.422748  4.241274
MARS                 earth          4.282067  3.330725
Boosted tree         gbm            3.978378  3.026208
Bagging tree         randomForest   3.604678  2.615768
MAD = (1/N) × Σ_{i=1}^{N} |y_i − ŷ_i|        RMSE = √( Σ_{i=1}^{N} (y_i − ŷ_i)² / N )
19. +
Train models with resampling methods
Train method: the train function in the caret package
Can train all models used in this project with resampling methods.
Easy to use and well documented.
Automatically parallelizes when multiple CPU cores are registered.
20. +
Train models with resampling methods
Model               Resampling method          Tuning parameter(s)
MLR                 bootstrapping              N/A
Backward selection  cross-validation           # randomly selected predictors
Ridge               cross-validation           λ
Lasso               cross-validation           λ
Elasticnet          cross-validation           α and λ
SVM-linear kernel   cross-validation           cost
Pruned tree         bootstrapping              cp
MARS                bootstrapping              # prune and degree
Boost tree          repeated cross-validation  n.trees, shrinkage, interaction.depth
Bagging (RF)        cross-validation           # randomly selected predictors
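The k-fold cross-validation used for tuning λ, α, cost, etc. can be sketched in a few lines (illustrative Python; caret performs this internally in R, and the `fit`/`predict`/`loss` callables here are hypothetical placeholders):

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k near-equal contiguous folds (no shuffling, for clarity)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(fit, predict, loss, X, y, k=10):
    """Average held-out loss over k folds; the tuning value minimizing this wins."""
    scores = []
    for test_idx in kfold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in test_idx]
        scores.append(loss([y[i] for i in test_idx], preds))
    return sum(scores) / k

# toy usage: a "model" that always predicts the training mean, scored by mean absolute error
X, y = list(range(12)), [5.0] * 12
score = cross_validate(lambda X_, y_: sum(y_) / len(y_), lambda m, x: m,
                       lambda yt, yp: sum(abs(a - b) for a, b in zip(yt, yp)) / len(yt),
                       X, y, k=4)
```

Bootstrapping differs only in how the resamples are drawn (sampling with replacement instead of disjoint folds); the fit-then-score loop is the same.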
21. +
Parallel computing in R
Motivation: Save computation time.
A for loop can be very slow if there are a large number of
computations that need to be carried out.
Almost all computers now have multicore processors.
As long as these computations do not need to communicate
(resampling methods are excellent examples), they can be
spread across multiple cores and executed in parallel.
The parallel package
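The same pattern outside R: because the resamples share no state, they can be farmed out to a worker pool. A minimal Python sketch with a hypothetical bootstrap_mean helper (R's parallel package forks worker processes; threads are used here for brevity, and a process pool would be the CPU-bound analogue):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def bootstrap_mean(data, seed):
    """One independent resample: draw n values with replacement, return the mean."""
    rng = random.Random(seed)  # per-task RNG keeps resamples reproducible
    sample = [rng.choice(data) for _ in range(len(data))]
    return sum(sample) / len(sample)

def parallel_bootstrap(data, reps=1000, workers=4):
    """Resamples need no communication, so they run concurrently without coordination."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: bootstrap_mean(data, s), range(reps)))

# made-up RMSE-like values for illustration only
data = [4.58, 4.92, 5.04, 4.85, 4.59, 5.42, 4.28, 3.98, 3.60]
means = parallel_bootstrap(data, reps=200)
```

Each task gets its own seeded RNG, so the result is deterministic regardless of the order in which workers finish.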
22. +
Running R on LONI and LSU HPC clusters
LONI QueenBee-2 ranked 46th on the TOP500 list (Nov. 2014)
Training model: MLR
Resampling: Bootstrapped (10000 reps)
23. +
Training backward selection
The 10-fold CV training still did not remove any independent variables
from the model, so it gives the same RMSE and MAD as MLR.
24. +
Training ridge, elasticnet and Lasso
The final λ for ridge is 1.417
The final λ for lasso is 0.0497
The final α is 0.5 and λ is 0.0497 for elasticnet
26. +
Training single tree
library(caret)
treetrain <- train(PE ~ ., data = data.train, method = "rpart",
                   trControl = trainControl(method = "boot", number = 1000), tuneLength = 10)
27. +
Training MARS
marsGrid <- expand.grid(degree = c(1, 2), nprune = (1:10) * 2)
earthtrain <- train(PE ~ ., data = data.train, method = "earth", tuneGrid = marsGrid,
                    maximize = FALSE, trControl = trainControl(method = "boot", number = 1000))
The final number of pruned terms (nprune) is 14.
The final degree is 2.
28. +
Training boosting tree
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
gbmGrid <- expand.grid(interaction.depth = c(1, 4, 7), n.trees = (1:30) * 50,
                       shrinkage = c(0.001, 0.01, 0.1))
boosttrain <- train(PE ~ ., data = data.train, method = "gbm", trControl = fitControl,
                    tuneGrid = gbmGrid)
The final number of trees is 1500.
The final interaction depth is 7.
The shrinkage is 0.1.
29. +
Training bagging trees
Bagging trees are trained as a random forest (bagging is a random forest with mtry equal to the number of predictors).
The optimized model retained two predictor variables.
33. +
t-test to evaluate the null hypothesis that there is no difference between models
The bagging (RF) model is significantly different from the other two models.
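The paired comparison behind such a conclusion can be sketched as follows (illustrative Python with made-up per-resample RMSE values, not the project's actual numbers):

```python
from statistics import mean, stdev
from math import sqrt

def paired_t(diffs):
    """t statistic for H0: mean paired difference = 0, with n - 1 degrees of freedom."""
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n))

# hypothetical per-resample RMSE values for two models on the SAME resamples
rmse_bag   = [3.61, 3.55, 3.70, 3.58, 3.66, 3.52, 3.63, 3.59, 3.68, 3.60]
rmse_boost = [3.98, 3.92, 4.05, 3.95, 4.01, 3.90, 3.99, 3.96, 4.03, 3.97]

# pair by resample, difference, then test; |t| well above ~2 suggests a real difference
t = paired_t([a - b for a, b in zip(rmse_bag, rmse_boost)])
```

Pairing by resample matters: both models see the same data splits, so the resample-to-resample variation cancels and only the model difference remains.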
34. +
Summary
Ten models were used to test the influence of the
independent variables.
The training process in the caret package improves the
performance of seven of the models.
Parallel computation on the HPC speeds up the resampling
calculations significantly.
After training, the RMSE and MAD values indicate that the
bagging (RF) and boosted trees tend to produce the best
predictions.
35. +
Future work
The mechanism of the training in the caret package should be
explored further. E.g. there is no tuning parameter available when
training the MLR model, so which part is being bootstrapped?
36. +
Reference
Pınar Tüfekci, "Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods," International Journal of Electrical Power & Energy Systems, vol. 60, Sep. 2014, pp. 126-140.
Heysem Kaya, Pınar Tüfekci, Sadık Fikret Gürgen, "Local and Global Learning Methods for Predicting Power of a Combined Gas & Steam Turbine," Proceedings of the International Conference on Emerging Trends in Computer and Electronics Engineering (ICETCEE 2012), pp. 13-18.
Editor's Notes
RMSE is more sensitive to outliers than the MAD metric.