Designed and implemented three variants of evolutionary algorithms using pthreads for hyperparameter optimization of deep neural networks. The variants give up to 9x speedups on 16 cores, scale well with increasing numbers of threads and larger hyperparameter spaces, and improve on search time and accuracy compared to standard baseline algorithms implemented in OpenMP.
Meta Machine Learning: Hyperparameter Optimization
15-618 Parallel Computer Architecture and Programming
https://pbollimp.github.io/Meta-Machine-Learning/
Priyatham Bollimpalli (pbollimp)
Mohit Deep Singh (mohitdes)
Abstract
In this project, we implemented various hyperparameter optimization algorithms, primarily for Deep Neural Networks (Multi-Layer Perceptrons). We used OpenMP as the baseline for grid search, random search and a simple evolutionary algorithm. We then implemented three variants of our own evolutionary algorithms using pthreads that not only give up to 9x speedups on 16 cores and scale very well with increasing numbers of threads, but also explore a much larger hyperparameter space and find hyperparameters that give good accuracy much faster than the baselines. Even though our experiments are specific to MLPs, we are confident that these methods are applicable to hyperparameter optimization for any machine learning problem. We also performed an extensive breakdown of the execution times for our algorithms and reasoned about the performance of each individual algorithm.
Background
The task of hyperparameter search
In the recent past, deep learning has been incredibly successful in various tasks such as image classification and tracking, machine translation, and speech recognition. One of the biggest challenges faced by all machine learning researchers today (especially in deep learning) is selecting hyperparameters for models. A hyperparameter is a parameter whose value is set before the learning process starts. In contrast, model parameters are parameters that are optimized as part of the learning process.
Hyperparameters usually have a huge impact on how long it takes to train a model and on how well the model ultimately performs. Because they end up being critical to model performance, it is important for researchers to find good values for them.
In practice today, most researchers manually tweak the hyperparameters and train the networks to see which hyperparameters give the best results. This process is extremely tedious and time-consuming, and automating it is called hyperparameter optimization. The main approaches that exist today include:
1. Grid Search: This involves exhaustively searching through the search space of the hyperparameters, evaluating how each set of hyperparameters performs in training, and choosing the best combination. This approach is extremely heavyweight, as it involves a brute-force search over the space of hyperparameters. The advantage is that it can be parallelized well, and given enough resources (and good intuition), one can usually end up with the best set of hyperparameters.
2. Random Search: This approach involves randomly sampling hyperparameter sets from the search space, and searching until it gets the desired accuracy. This method is very successful in practice, since it does not exhaustively search the entire space and still gets good results.
3. Bayesian Optimization: This approach uses the hyperparameter sets already tested to learn and sample better sets of hyperparameters to evaluate. Evolutionary search and Gaussian processes are the most common techniques used for Bayesian optimization. The biggest challenge is that the optimization step makes these methods inherently sequential, and harder to parallelize.
Evolutionary Search Algorithm
In the artificial intelligence literature, an evolutionary search algorithm is a generic population-based metaheuristic optimization algorithm. An evolutionary strategy algorithm uses methods that are inspired by evolution, such as mutation, reproduction, recombination and selection. The high-level idea is that, given a population of individuals, we have a fitness function to rank which of the individuals are the best performers. Using this function, we rank the individuals, kill a certain fraction of the weakest part of the population, and have the stronger part of the population reproduce and mutate to arrive at better solutions.
In our case, the individual hyperparameter sets are the candidates in a population. The accuracy of a neural network trained with those hyperparameters is our fitness function. The algorithm inputs are the population size, some initial random population of candidates, and the mutation and crossover mechanisms. The output is the best candidate, i.e. the one with the highest accuracy (fitness). The mutations can vary heavily based on the application. In our particular case, we randomly chose 3 candidates from the pool of the strongest candidates, took the average of their hyperparameters, and added some noise to it to encourage the algorithm to explore more.
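To make the mutation step concrete, the following is a minimal C++ sketch of it, assuming an illustrative Candidate struct with a learning rate and a hidden-unit count as the tuned hyperparameters; the field names and the noise scale are our own assumptions rather than the project's exact code.

// Sketch of the mutation described above: pick 3 candidates from the strong
// pool, average their hyperparameters, and add uniform noise so the search
// keeps exploring. Struct, field names and noise scale are illustrative.
#include <algorithm>
#include <random>
#include <vector>

struct Candidate {
    double lr;        // example hyperparameter: learning rate
    int    hidden;    // example hyperparameter: number of hidden units
    double fitness;   // validation accuracy of the network trained with them
};

Candidate mutate(const std::vector<Candidate>& strongest, std::mt19937& rng) {
    std::uniform_int_distribution<size_t> pick(0, strongest.size() - 1);
    std::uniform_real_distribution<double> noise(-0.1, 0.1);   // assumed scale

    // Average the hyperparameters of 3 randomly chosen strong candidates.
    double lr = 0.0, hidden = 0.0;
    for (int i = 0; i < 3; ++i) {
        const Candidate& c = strongest[pick(rng)];
        lr     += c.lr / 3.0;
        hidden += c.hidden / 3.0;
    }

    // Perturb the averages so the algorithm does not collapse onto one point.
    Candidate child;
    child.lr      = lr * (1.0 + noise(rng));
    child.hidden  = std::max(1, (int)(hidden * (1.0 + noise(rng))));
    child.fitness = 0.0;   // filled in after training a network with these values
    return child;
}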
Initial Analysis: Workload and Data Structures
The main components of our program are the neural network (training and inference), the dataset on which the neural network is being trained, and the set of best hyperparameters, i.e. the hyperparameters we are searching for. In the described algorithms, the main computationally expensive part is training and running inference on multiple neural networks to get the accuracies of the different sets of parameters. Parallelizing this would give us huge benefits, as we can test the fitness of multiple candidates in parallel.
There are a few dependencies in this program. Firstly, for evolutionary search algorithms, we want to search through the hyperparameter space very carefully, so as to make sure we are regularly using past results to restrict our future search space and get better results. This requires us to frequently synchronize all parallel threads, evaluate the current population, rank it and mutate it. Furthermore, depending on which parameters we are searching and what their values are, we could end up with very skewed workloads, where some hyperparameter settings take a much longer time to train and evaluate than others. Therefore, these algorithms could greatly benefit from data parallelism in the first few approaches, where we analyze the fitness of the population in parallel, synchronize, and then parallelize the mutations as well.
The key data structures ended up being the neural networks, which we could not share (since we need to train each network individually through backpropagation to evaluate how well its hyperparameters perform), and the array of all candidates of a population, or of multiple populations. The main operations on this array were updating the accuracies, ranking the candidates with respect to each other, killing the weakest candidates, and mutating and reproducing the strongest ones to produce new candidates.
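A rough sketch of how these operations on the candidate array might fit together, reusing the Candidate struct and mutate() from the sketch above; train_and_evaluate() is an assumed placeholder for training an MLP with a candidate's hyperparameters and returning its validation accuracy.

// Sketch of the population-array operations: update accuracies, rank,
// discard the weakest, and refill from the strongest candidates.
#include <algorithm>
#include <random>
#include <vector>

double train_and_evaluate(const Candidate& c);   // assumed fitness function

void evolve_one_generation(std::vector<Candidate>& population,
                           int kill_count, std::mt19937& rng) {
    // 1. Update accuracies: the expensive step, one network per candidate.
    for (Candidate& c : population)
        c.fitness = train_and_evaluate(c);

    // 2. Rank the candidates with respect to each other, best first.
    std::sort(population.begin(), population.end(),
              [](const Candidate& a, const Candidate& b) { return a.fitness > b.fitness; });

    // 3. Kill the weakest kill_count candidates and repopulate them by
    //    mutating and reproducing the strongest ones.
    std::vector<Candidate> strongest(population.begin(), population.end() - kill_count);
    for (size_t i = population.size() - kill_count; i < population.size(); ++i)
        population[i] = mutate(strongest, rng);
}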
Main Challenge
The task of hyperparameter search is a very tedious one, especially given that we need to train multiple neural networks from scratch with different hyperparameter configurations. As the networks become more complex and the number of hyperparameters increases, this task becomes more and more challenging: the search space grows exponentially, and the more applicable methods, such as Bayesian optimization techniques, are difficult to parallelize.
The main challenge we faced was not just speeding up the algorithms, but also striking the right tradeoff between accuracy and how fast and effectively we can search the hyperparameter space. There is an inherently sequential part of any Bayesian optimization, where, based on past results, we need to come up with better estimates of the hyperparameters. We first tried parallelizing the algorithms in phases. Then, we aimed to optimize that sequential part as well, and traded off the quality of the Bayesian optimization for a faster albeit slightly less accurate search.
Approach
In this section, we describe the approaches and techniques we used to parallelize the algorithms.
Grid Search (GS)
The high-level algorithm is as follows:
For grid search, we loop through all possible configurations of hyperparameters. In our case, the fitness function is training a neural network and getting its accuracy on the validation set. Parallelizing grid search was mostly straightforward, other than lines 5 and 6 in the algorithm. As a result, we just created an array that kept track of the accuracies of all configurations, and parallelized the calculation of the fitness for each possible configuration. At the end, we take the maximum accuracy and select that configuration. We realized this was not very scalable with the size of the hyperparameter space, so we broke the entire parameter space up into chunks, calculated the fitness of those chunks in parallel, and incrementally kept track of the most accurate configuration.
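The following is a small, self-contained OpenMP sketch of this chunked parallel grid search; the hyperparameter grids and the stub fitness function are illustrative assumptions, not the project's actual values.

// Sketch of parallel grid search: evaluate every configuration's fitness in
// parallel, then reduce to the best one.
#include <omp.h>
#include <cstdio>
#include <vector>

struct Config { double lr; int hidden; double acc; };

// Stand-in for training an MLP with these hyperparameters and returning
// its validation accuracy (illustrative stub only).
double train_and_evaluate(double lr, int hidden) { return 0.5 + lr + hidden / 1000.0; }

int main() {
    std::vector<double> lrs     = {0.001, 0.01, 0.1};   // assumed grid values
    std::vector<int>    hiddens = {16, 32, 64, 128};

    // Enumerate the full grid once, then evaluate fitness in parallel.
    std::vector<Config> grid;
    for (double lr : lrs)
        for (int h : hiddens)
            grid.push_back({lr, h, 0.0});

    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < (int)grid.size(); ++i)
        grid[i].acc = train_and_evaluate(grid[i].lr, grid[i].hidden);

    // Sequential reduction: pick the configuration with the best accuracy.
    Config best = grid[0];
    for (const Config& c : grid)
        if (c.acc > best.acc) best = c;

    std::printf("best: lr=%g hidden=%d acc=%g\n", best.lr, best.hidden, best.acc);
    return 0;
}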
Random Search (RS)
The high-level algorithm is as follows:
As can be seen in the algorithm, random search randomly samples hyperparameters from their search space and tests them for their fitness. It keeps track of the best parameter configuration at any given point, and if the fitness is over a threshold, it stops searching the parameter space. Parallelizing this was not absolutely straightforward, but we followed the strategy of creating batches of threads which randomly sample from the hyperparameter space and report their accuracies to an array in global memory. Every few chunks, the master thread gets the best accuracy from this array and, if it is over the threshold, terminates the search. Otherwise it keeps spawning off new blocks of threads to evaluate other random configurations. In practice this is always much faster than grid search, and the parallelized version usually explores the space more efficiently and reaches the desired accuracy quickly.
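A minimal OpenMP sketch of this batched random search is shown below; the sampling ranges, batch size, threshold and stub fitness function are illustrative assumptions.

// Sketch of parallel random search: in each round, a batch of threads samples
// random configurations and reports accuracies; the master checks whether the
// best accuracy so far has crossed the threshold.
#include <omp.h>
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

// Stand-in for training an MLP and returning validation accuracy (stub only).
double train_and_evaluate(double lr, int hidden) { return hidden / 256.0; }

int main() {
    const double threshold  = 0.75;   // e.g. target validation accuracy
    const int    batch_size = 16;     // configurations evaluated per round
    double best_acc = 0.0;

    std::mt19937 seeder(42);
    while (best_acc < threshold) {
        std::vector<double>   accs(batch_size, 0.0);
        std::vector<unsigned> seeds(batch_size);
        for (unsigned& s : seeds) s = seeder();

        // Each thread samples its own random configuration and evaluates it.
        #pragma omp parallel for schedule(dynamic)
        for (int i = 0; i < batch_size; ++i) {
            std::mt19937 rng(seeds[i]);
            std::uniform_real_distribution<double> lr_dist(1e-4, 1e-1);
            std::uniform_int_distribution<int>     hid_dist(8, 256);
            accs[i] = train_and_evaluate(lr_dist(rng), hid_dist(rng));
        }

        // Master step: update the best accuracy seen so far.
        for (double a : accs) best_acc = std::max(best_acc, a);
        std::printf("best so far: %.3f\n", best_acc);
    }
    return 0;
}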
Evolutionary Search - Basic Algorithm
As can be seen in the algorithm, the basic evolutionary search leverages the past evaluations of the fitness function on the explored parameters to optimize over the search space more effectively.
The idea is the following. We start out with a population of candidate hyperparameters and evaluate all of them. After evaluating them, we rank all the candidates in the population according to their fitness. We then kill/discard the last k candidates (the least fit). Then we generate new children using the top/best-performing hyperparameter sets. The way we mutated our top-performing parameters was: we randomly sampled 3 candidates from the top performers and took the average of their hyperparameters. We then added some uniform noise to these averages to encourage our algorithm to explore.
Parallel Evolutionary Search 1 (EV1)
The first approach we took was to try and parallelize the basic evolutionary search algorithm. We first decided to parallelize the evaluation of the fitness functions of the individual candidates. We ended up parallelizing this using OpenMP, creating threads to divide this work up. We noticed that with certain hyperparameters, such as the number of hidden units, the workload imbalance ended up being very high, and OpenMP static scheduling did not give us good speedups. So we decided to use dynamic scheduling, and despite the existence of sequential code (which ranked the population and mutated it), we got decent speedups (discussed in the results section).
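A sketch of the EV1 change, assuming the Candidate struct and train_and_evaluate() helper from the earlier sketches: only the fitness-evaluation loop is parallelized, with dynamic scheduling to absorb the workload imbalance.

// Sketch of EV1: one network is trained per candidate; dynamic scheduling is
// used because candidates with more hidden units take much longer to train,
// so a static split load-balances poorly.
#include <omp.h>
#include <vector>

void evaluate_population(std::vector<Candidate>& population) {
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < (int)population.size(); ++i)
        population[i].fitness = train_and_evaluate(population[i]);
    // The ranking and mutation step (see evolve_one_generation above) stays
    // sequential, as in the basic algorithm.
}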
Evolutionary Search with Islands - Our adaptation of the basic algorithm
The earlier algorithm usually does well as long as the dimensionality of the hyperparameter space is not too high, but it is not ideal when the number of hyperparameters we search over is large. Because of random sampling, if it does not start with good enough points, a basic evolutionary search can get stuck at local optima. It helps if we create multiple local evolutionary searches with specific start points, and then have them communicate their best parameters to each other after a few generations of local optimization. We call this variant the "island" approach, where we have islands of local populations which intermingle every few iterations.
The basic idea was that we kept n different "islands", each of which had its own population of candidate hyperparameters. Each of these islands separately carried out the evolutionary search on its candidate population for a few generations. This means each island would explore its candidates, rank them within its population, and kill and repopulate the weakest members. After a few iterations, we took the top-k candidates from each of these islands, merged them together, and applied a global evolution to them. This means the merged candidates were sorted, and the weakest in this global evolutionary search were killed and repopulated based on the mutations explained in the evolutionary search algorithm section. After this, we would reseed the local populations with these globally mutated seeds, plus some random seeds, so that the algorithm can explore the high-dimensional space in a better way.
We explain how we parallelized this algorithm in the next few sections.
Parallel Evolutionary Search 2 (EV2)
We wanted to parallelize the island approach, to be able to scale better and search dense hyperparameter spaces more efficiently. We decided to use the fork-join model with shared memory to implement this parallel approach.
The initial idea was that the global top candidates would be stored in shared memory. The master then spawns off the requisite number of threads, and every thread essentially ends up acting as an island. Every island then reads its chunk of "seed" candidates from the global top candidates (which is shuffled after mutation by the master). Then each of these threads runs a certain number of iterations of the evolutionary search algorithm locally. After running a few generations and finding its top candidates, each island reports its top-k candidates to the global array. The master waits for all these threads to finish executing their local "generations", using join. Once every thread finishes, the master sorts these reported local winners in the global top array, and applies a "global evolution" on this array. This is followed by the master shuffling this global array, and then spawning off local threads again to do more iterations of these local evolutionary searches.
The approach we took here was to parallelize across islands, rather than parallelizing every single candidate, so that it is easier for us to do a global intermingling of the best candidates.
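A pthread sketch of this fork-join structure is given below, reusing the Candidate struct from the earlier sketches; the island count, top-k size and the local/global evolution helpers are assumed placeholders.

// Sketch of EV2: one pthread per island, fork-join rounds, shared global
// array of winners evolved and shuffled by the master between rounds.
#include <pthread.h>
#include <algorithm>
#include <random>
#include <vector>

constexpr int NUM_ISLANDS = 8;      // assumed: one island per thread
constexpr int TOP_K       = 4;      // winners each island reports back

std::vector<Candidate> global_top(NUM_ISLANDS * TOP_K);   // shared memory

void local_evolution(Candidate* seed, int k, int generations);   // assumed helper
void global_evolution(std::vector<Candidate>& pool);              // sort + mutate weakest

struct IslandArgs { int island_id; int generations; };

void* island_worker(void* p) {
    IslandArgs* a = static_cast<IslandArgs*>(p);
    // Each island seeds its local population from its own chunk of the shared
    // array and writes its top-k winners back into the same chunk.
    local_evolution(&global_top[a->island_id * TOP_K], TOP_K, a->generations);
    return nullptr;
}

void ev2(int rounds, int local_generations) {
    std::mt19937 rng(0);
    for (int r = 0; r < rounds; ++r) {
        pthread_t  tids[NUM_ISLANDS];
        IslandArgs args[NUM_ISLANDS];
        // Fork: spawn one thread per island.
        for (int i = 0; i < NUM_ISLANDS; ++i) {
            args[i] = {i, local_generations};
            pthread_create(&tids[i], nullptr, island_worker, &args[i]);
        }
        // Join: wait for every island to finish its local generations.
        for (int i = 0; i < NUM_ISLANDS; ++i)
            pthread_join(tids[i], nullptr);
        // Master: rank the reported winners, apply a global evolution, and
        // shuffle so islands get mixed seeds in the next round.
        global_evolution(global_top);
        std::shuffle(global_top.begin(), global_top.end(), rng);
    }
}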
Parallel Evolutionary Search 3 (EV3)
Independently of EV2, we also tried implementing a threading model where, instead of using join every time, we used a pool of threads, with shared variables and mutexes to synchronize between the threads. The idea was to compare the performance of the two and see which one would do better.
We used a similar model in terms of splitting up the islands between the threads. We had a shared count variable, a mutex per thread, and also a mutex for the count variable. The master thread created a thread pool of the specified number of threads to begin with and set the count variable to the number of threads. Each thread would work in an infinite loop, where at every iteration it would lock its respective mutex. Then it would get the requisite global "seed" candidates and run the respective local evolutionary search algorithm. After each thread finished its local work, it would decrement the count variable by one, and loop back and try grabbing its own lock again (where it would wait). The master thread waits for the count variable to get to 0, and when it does, it sorts the global tops array, applies the respective global mutations, shuffles the array, and resets the count variable to the number of threads. After that, the master thread unlocks the respective mutex for each thread so they can start running again. This runs until either the master thread finds an accuracy greater than the threshold, or a certain upper limit of iterations is reached. Once the stopping condition is hit, the master thread signals the other threads to quit, using a global variable.
The motivation behind this was that we thought repeated pthread forks and joins might have extra overhead, and using mutexes might help us avoid that overhead. As discussed in the results section, this ended up being slower than EV2 for smaller numbers of threads, mostly because of the synchronization overheads due to mutexes and shared variables. But at 16 threads this outperformed EV2, because EV2 pays the overhead of spawning new threads every iteration.
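The handshake can be sketched as follows. The report describes per-thread mutexes plus a shared count; the version below expresses the same master/worker round protocol with a single mutex, a condition variable and a round counter, which is our own simplification rather than the project's exact code, and the island work is a printing stub.

// Sketch of EV3-style synchronization: a persistent thread pool runs one
// round of local evolution whenever the master releases it; the master waits
// for all workers to finish, does the global step, then starts the next round.
#include <pthread.h>
#include <cstdio>

constexpr int NUM_THREADS = 4;
constexpr int NUM_ROUNDS  = 3;

pthread_mutex_t mtx  = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
int  round_id   = 0;      // incremented by the master to release workers
int  done_count = 0;      // incremented by workers when their round is done
bool stop       = false;  // master signals workers to quit

void local_evolution(int tid, int round) {        // placeholder island work
    std::printf("thread %d: local generations for round %d\n", tid, round);
}

void* worker(void* arg) {
    long tid = (long)arg;
    int  my_round = 0;
    while (true) {
        pthread_mutex_lock(&mtx);
        while (round_id == my_round && !stop)     // wait for the master's release
            pthread_cond_wait(&cond, &mtx);
        if (stop) { pthread_mutex_unlock(&mtx); return nullptr; }
        my_round = round_id;
        pthread_mutex_unlock(&mtx);

        local_evolution((int)tid, my_round);      // read seeds, evolve locally

        pthread_mutex_lock(&mtx);
        if (++done_count == NUM_THREADS)          // last worker wakes the master
            pthread_cond_broadcast(&cond);
        pthread_mutex_unlock(&mtx);
    }
}

int main() {
    pthread_t tids[NUM_THREADS];
    for (long i = 0; i < NUM_THREADS; ++i)
        pthread_create(&tids[i], nullptr, worker, (void*)i);

    for (int r = 0; r < NUM_ROUNDS; ++r) {
        pthread_mutex_lock(&mtx);
        done_count = 0;
        ++round_id;                               // release all workers
        pthread_cond_broadcast(&cond);
        while (done_count < NUM_THREADS)          // wait for the round to finish
            pthread_cond_wait(&cond, &mtx);
        pthread_mutex_unlock(&mtx);
        // Master: sort the global array, apply global evolution, shuffle (omitted).
    }

    pthread_mutex_lock(&mtx);
    stop = true;                                  // tell the pool to quit
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&mtx);
    for (int i = 0; i < NUM_THREADS; ++i)
        pthread_join(tids[i], nullptr);
    return 0;
}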
Finally, we wanted to target and reduce the time the island program was spending in its sequential section. The approach we took is described below.
Parallel Evolutionary Search 4 (EV4)
In the last couple of approaches we noticed a few things. First, the workload distribution sometimes ends up being skewed, which causes some threads to sit idle waiting for other threads to catch up. Also, the global evolution happened sequentially, which was clearly a bottleneck for further speedups.
For this, we went back to the drawing board, and realized that the whole idea behind the island approach was to have different local evolutionary populations communicate and intermingle, to encourage more random exploration and, eventually, better evolution. We realized that this communication does not need to be exact (and accurate) in time, and we could do it in an asynchronous manner. We built this on top of EV2, but got rid of the synchronizations. So every thread, including the master, would exclusively work on its local population. We created three shared arrays of global populations. One of the arrays was called the "read array": this is where the master initializes the first "seed" for all islands. The second array was the "write array", where all threads would report the best candidates from their population. The third is the buffer array, which is used for any intermediate calculations.
The master first initializes the "read array". Then it spawns off all other threads. Each thread basically runs the local evolutionary search algorithm, which either runs until the master signals it to stop (using a shared variable), or until it hits a certain number of iterations. The master also runs a local evolutionary search, but every few iterations it performs the global sorting and mutation step without blocking the other threads. It copies the "write array" results to the buffer, sorts the buffer, applies the global evolution to it, and shuffles it. Then it swaps the pointers of the read array and the buffer, so the threads can now read the newly evolved global array. This allows all threads to operate asynchronously, where the only drawback is that the intermingling does not happen at exactly the same time for all threads. This is acceptable, because the main idea was to exchange information between islands, and by making this asynchronous we did not lose much accuracy.
The only problem we faced with this was that the master thread now had to do extra work, so it severely lagged behind, and that caused the other threads to work on stale data for a much longer time. As a fix, we significantly reduced the population size of the local evolutionary search for the master; that way, the master was always either ahead of the other threads (which did us no harm) or on par with them. We kept executing the master until either it hit the accuracy threshold or some other thread exited, which signalled all other threads to quit via a shared variable. This way, we completely removed any sequential code from our algorithm. We did trade off a little freshness in the global evolution, but since the local evolutions follow the process correctly anyway, this did not cause a big issue with accuracies. We just wanted information to flow through all threads to explore a larger space better and not get stuck in local optima, which this model helps us achieve. This gave us more speedups, as shown in the results section.
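A sketch of the asynchronous global step is shown below. The report swaps raw pointers among the read, write and buffer arrays; in this simplified rendering the master publishes each globally evolved generation as an immutable snapshot through an atomically swapped shared_ptr, and the write array is guarded by a small mutex, both of which are our own substitutions to keep the sketch race-free. The Candidate struct is reused from the earlier sketches.

// Sketch of the EV4 idea: islands never block on the master; the master
// periodically folds reported winners into a buffer, evolves it, and
// publishes it as the new read array.
#include <algorithm>
#include <memory>
#include <mutex>
#include <random>
#include <vector>

using Snapshot = std::vector<Candidate>;

std::shared_ptr<const Snapshot> read_array;   // islands seed themselves from this
std::vector<Candidate> write_array;           // islands report their winners here
std::mutex write_mutex;                       // protects write_array only

// The master publishes the initial random seeds before spawning the islands,
// and each globally evolved generation afterwards ("swapping the pointer").
void publish_seeds(Snapshot next) {
    std::atomic_store(&read_array,
        std::shared_ptr<const Snapshot>(std::make_shared<Snapshot>(std::move(next))));
}

// Island side: grab the current seed snapshot. This never blocks on the
// master; an island may observe a slightly stale generation, which is fine.
std::shared_ptr<const Snapshot> get_seeds() {
    return std::atomic_load(&read_array);
}

// Island side: report the local top-k after a few local generations.
void report_winners(const std::vector<Candidate>& top_k) {
    std::lock_guard<std::mutex> lk(write_mutex);
    write_array.insert(write_array.end(), top_k.begin(), top_k.end());
}

// Master side, every few of its own local iterations: copy the reported
// winners into a buffer, sort, globally evolve, shuffle, and publish.
void master_global_step(std::mt19937& rng) {
    Snapshot buffer;
    {
        std::lock_guard<std::mutex> lk(write_mutex);
        buffer = write_array;
    }
    std::sort(buffer.begin(), buffer.end(),
              [](const Candidate& a, const Candidate& b) { return a.fitness > b.fitness; });
    // ... apply global mutations to the weaker half of buffer here (omitted) ...
    std::shuffle(buffer.begin(), buffer.end(), rng);
    publish_seeds(std::move(buffer));
}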
Parallel Evolutionary Search 5 (EV5)
For this approach we used the same model as EV4, but we also parallelized the part of the code where the master sorts and evolves the global array. We used OpenMP to implement a parallel for loop here, parallelizing the evolutions carried out by the master thread.
In addition, we padded the three global arrays out to the cache line size in order to prevent false sharing. This method is thus our best model in terms of performance and gave us the most speedups.
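A sketch of the two EV5 additions, reusing the Candidate struct from the earlier sketches: per-slot padding/alignment of the shared arrays to an assumed 64-byte cache line, and an OpenMP parallel for over the master's global mutation step; mutate_from() is an assumed helper.

// Sketch of EV5: cache-line padding of shared array slots plus a parallel
// global evolution in the master.
#include <omp.h>
#include <vector>

// Each slot of the shared arrays is padded and aligned to a full cache line
// (64 bytes assumed), so threads writing adjacent slots never falsely share
// a line.
struct alignas(64) PaddedCandidate {
    Candidate c;
    char pad[64 - sizeof(Candidate) % 64];
};

// Assumed helper: create a mutated candidate using only the first n (strong,
// read-only) slots, so the parallel writes below do not race with the reads.
Candidate mutate_from(const PaddedCandidate* strongest, int n, unsigned seed);

void global_evolution(std::vector<PaddedCandidate>& global_top, int kill_count) {
    int keep = (int)global_top.size() - kill_count;
    // The master's global mutation step, parallelized with OpenMP: the
    // weakest kill_count slots are regenerated in parallel from the kept ones.
    #pragma omp parallel for schedule(static)
    for (int i = keep; i < (int)global_top.size(); ++i)
        global_top[i].c = mutate_from(global_top.data(), keep, (unsigned)i);
}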
Experimental Setup
Tiny DNN
Our initial approach was to use a highly optimized neural network library to run hyperparameter search on convolutional neural networks. We wanted to stick with C/C++ for our implementation, as it is much easier to parallelize than Python (which has much better frameworks for deep learning).
We found a library called tiny-dnn which provides a very high-level abstraction of neural networks. It is a highly optimized library which can be included in a project as a header-only dependency. The library requires C++14, which prevented us from running it on GHC. We implemented most of our techniques using the tiny-dnn framework, but on actually running the tests on multiple threads, we saw no performance improvements. We hypothesized that this is because tiny-dnn does some optimizations using OpenMP and pthreads under the hood, and this was clashing with our parallelization. The library was taking up most of the resources on the limited-resource machine we were running our experiments on. We could not profile, because we could not run it on GHC (due to C++14) and AWS blocked proper profiling on their instances. At this point, we decided to abandon this library and use a simpler DNN (a multi-layer perceptron) to train our networks.
MLP
We eventually decided to either use an existing simple multi-layer perceptron codebase or
reimplement our own multi-layer perceptron with backpropagation. We ended up taking some
inspiration and code from [2], and reimplemented parts of it to fit our needs.
This meant that we could not test our algorithms on very complicated networks (only MLPs), but
given our resource constraints and the amount of time it takes to train even a single complicated
network, writing the C++ code and training such networks to good accuracies would have taken a
considerable amount of resources and time. So we scoped the problem down to multi-layer
perceptrons, as they were easier to implement and to use for hyperparameter search. We still
spent a considerable amount of time implementing and debugging a simple MLP in C++, so we
decided to spend the rest of our time parallelizing hyperparameter search rather than building
more complicated networks such as convolutional networks.
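For concreteness, the following is a minimal sketch of the kind of one-hidden-layer MLP with backpropagation described above. It is not the code from [2]; the class, its names, and the toy XOR check in main() are purely illustrative of how a worker might train and score one network for a given (hidden size, learning rate) configuration.

// A minimal one-hidden-layer MLP with backpropagation, written from scratch.
// Compile with: g++ -std=c++11
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

class MLP {
public:
    MLP(int in, int hid, int out)
        : in_(in), hid_(hid), out_(out),
          w1_(in * hid), b1_(hid), w2_(hid * out), b2_(out) {
        std::mt19937 rng(0);
        std::uniform_real_distribution<double> d(-0.5, 0.5);
        for (auto &w : w1_) w = d(rng);
        for (auto &w : w2_) w = d(rng);
    }

    // One stochastic-gradient step on a single (x, y) pair; returns the squared error.
    double train_step(const std::vector<double> &x, const std::vector<double> &y, double lr) {
        std::vector<double> h(hid_), o(out_);
        for (int j = 0; j < hid_; ++j) {                // forward: input -> hidden
            double s = b1_[j];
            for (int i = 0; i < in_; ++i) s += w1_[i * hid_ + j] * x[i];
            h[j] = sigmoid(s);
        }
        for (int k = 0; k < out_; ++k) {                // forward: hidden -> output
            double s = b2_[k];
            for (int j = 0; j < hid_; ++j) s += w2_[j * out_ + k] * h[j];
            o[k] = sigmoid(s);
        }
        std::vector<double> dout(out_), dhid(hid_, 0.0);
        double err = 0.0;
        for (int k = 0; k < out_; ++k) {                // output-layer deltas
            err += (o[k] - y[k]) * (o[k] - y[k]);
            dout[k] = (o[k] - y[k]) * o[k] * (1.0 - o[k]);
        }
        for (int j = 0; j < hid_; ++j)                  // backprop and update layer 2
            for (int k = 0; k < out_; ++k) {
                dhid[j] += dout[k] * w2_[j * out_ + k]; // uses pre-update weight
                w2_[j * out_ + k] -= lr * dout[k] * h[j];
            }
        for (int k = 0; k < out_; ++k) b2_[k] -= lr * dout[k];
        for (int j = 0; j < hid_; ++j) {                // backprop and update layer 1
            double dpre = dhid[j] * h[j] * (1.0 - h[j]);
            for (int i = 0; i < in_; ++i) w1_[i * hid_ + j] -= lr * dpre * x[i];
            b1_[j] -= lr * dpre;
        }
        return err;
    }

private:
    static double sigmoid(double s) { return 1.0 / (1.0 + std::exp(-s)); }
    int in_, hid_, out_;
    std::vector<double> w1_, b1_, w2_, b2_;
};

int main() {
    MLP net(2, 8, 1);                                   // hidden size is a tunable hyperparameter
    std::vector<std::vector<double>> X = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    std::vector<std::vector<double>> Y = {{0}, {1}, {1}, {0}};
    double err = 0.0;
    for (int epoch = 0; epoch < 10000; ++epoch) {       // learning rate 0.5 is also tunable
        err = 0.0;
        for (std::size_t n = 0; n < X.size(); ++n) err += net.train_step(X[n], Y[n], 0.5);
    }
    std::printf("final squared error on XOR: %f\n", err);
    return 0;
}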
Datasets
We primarily tested our results on two datasets. We used the Iris dataset first, because it is a
rather small classification dataset (150 points, 3 classes), so it was very quick to train multiple
networks in parallel with different hyperparameters and check the speedups and results. Once we
had verified our approaches and obtained considerable speedups on Iris, we switched to the
MNIST dataset, which is considerably larger (60,000 images). We initially wanted to report all our
results on CIFAR-10, but since we resorted to using MLPs it was very hard to get any substantial
accuracy on CIFAR, and that was throwing our hyperparameter search off.
Machine Specifications
All experiments were run on AWS c5.4xlarge instances, which have Intel Xeon Platinum
8000-series processors with 16 virtual cores and 32 GB of memory. We also tested on the GHC
machines, which have eight 3.2 GHz Intel Core i7 cores, and noticed similar speedups.
Results and Analysis
Speedups
Wall Clock Times
The wall-clock times for the different algorithms are given here. Observations:
1. GS, RS, and EV1 are faster because the parameter space they search is much smaller than
that of the other algorithms. We deliberately restricted them so we could measure
speedups without wasting time running them on large hyperparameter spaces.
2. EV3 takes the longest, since its synchronization cost is much higher than that of the other
techniques.
3. For random search, the average of three runs is taken, and since the algorithm stops when
it reaches a given accuracy threshold (75% for MNIST and 97% for Iris), its total execution
time is much lower.
4. EV4's execution time is low because it has near-zero synchronization cost.
5. EV5's execution time is the lowest among EV2-EV5, since it benefits the most from
parallelization while keeping the same search space.
Relative Speedups
The relative speedups for the different algorithms on both datasets are given here. Observations:
1. RS stops scaling beyond a point: with a certain number of threads (8 in this case), the
expected number of iterations needed to hit the accuracy threshold has already reached a
steady state, so additional threads do not improve the execution time.
2. EV3, EV4, and EV5 scale similarly since their parallelization technique is similar; EV5 scales
the best since it has no synchronization overhead.
3. As the number of threads increases, the overhead of repeatedly creating pthreads grows
for EV2, so its speedup at high thread counts is not as good as the others'. EV3 creates a
pool of threads once at the start and synchronizes with a mutex; it did not achieve perfect
speedup, but it did better than EV2.
4. EV1 does not scale well: contention increases because the threads operate on the same
array, which leads to false sharing, and there is extra synchronization overhead due to
workload imbalance (explained in the next section). Moreover, its search space (the
amount of work) is small, so more threads do not give ideal speedups.
5. EV5 scales the best, since it has no synchronization (it builds on EV4); in addition, we
padded the global structures, which reduces false sharing, and it exploits extra parallelism
on top of the parallel parameter search, which is why it achieves the best efficiency at the
end.
To summarize, the overhead of creating threads, the synchronization cost, the partly
sequential work inherent in each algorithm (needed to reconcile results and pick the best
hyperparameters), and other costs (like initial data loading) prevent us from achieving
perfect speedups. This is portrayed in more detail for EV2, EV3, and EV4 in the 'Normalized
Execution Time Breakdown' section.
Scaling with different parameters
This section gives the relative change in execution time within each algorithm. The experiments
use the best execution time achieved with 16 threads for each of the algorithms.
Dataset size
The relative execution time (degradation on this scale) as the dataset size is scaled is shown
here. Note that the dataset sizes are increased exponentially, so an exponential slowdown is
observed (except for RS, since the time to reach the accuracy threshold becomes almost constant
beyond a certain amount of data). Since every algorithm passes the data through the neural
network, we should observe the same proportional change in execution times: the forward pass,
backward pass, and testing (on 10% of the dataset) happen inside the neural network and change
in exactly the same way for all algorithms, which is why the same trend is observed.
Number of Initial Hyperparameters
The relative execution time (degradation on this scale) as the hyperparameter search size is
scaled is shown here. There are several interesting observations. Note that the number of
hyperparameters is also increased exponentially.
1. Grid search is essentially an exhaustive search, so an exponential increase in the number
of hyperparameters gives an exponential increase in execution time.
2. When the space random search samples from is larger, random search tends to find a
good model and reach its accuracy threshold more quickly, so its runtime actually
decreases in this case.
3. The whole idea of evolutionary search is to retain only the best hyperparameters. Thus,
even though we sample from a large hyperparameter space, we discard the bad candidates
and only operate on the best ones, so the execution time remains roughly constant; there
is a slight increase due to the higher sampling cost over the larger range.
Size of Neural Network
The relative execution time (degradation on this scale) as the size of the neural network is scaled
is shown here. Note that, as with the increase in dataset size, the cost incurred is the same
across all the algorithms. Training and testing times increase proportionately here too, and since
the depth of the network increases exponentially from 1 to 8, so do the relative execution times.
Workload distribution between the threads in the algorithms
The primary difference between EV2 and EV5 is that in EV2 the pthreads are forked and joined
repeatedly. A major problem with this is the unequal execution time between threads: for
instance, a network with a hidden layer of 512 units takes much longer to train and test than one
with 32 units, which leaves several threads idle while they wait for the long-running ones.
In EV5, since the algorithm is modified to remove synchronization, all the threads can stay busy.
In the figure above, for the 16-thread run, the 'blue' bars from EV5 are better distributed than the
'orange' ones from EV2. This is also confirmed by the speedup and wall-clock time graphs, in
which EV5 is the best.
Note that all results for GS, RS, and EV1 are reported with OpenMP's dynamically scheduled
workload, since a similar trend is observed.
Normalized Execution Time Breakdown
For each algorithm, a detailed normalized execution time breakdown is given below.
The relative trends and speedups in the execution times are already explained above. But from
this graph, we can clearly see exactly why that is the case.
EV3 has much more synchronization cost associated with it, since its threads stall far more often
to acquire the mutexes. This cost adds up and only gets exponentially worse as the number of
threads increases.
EV2, in contrast, has some synchronization cost during the 'join' phase of the pthreads, but this
cost is almost constant, since the master performs the joins a fixed number of times (specific to
the algorithm).
We can clearly see that for EV4 (and for EV5, since it is built on top of EV4) the lack of
synchronization time makes them very efficient.
The algorithms have a similar proportion of total parallel work (shown in orange), but we see
speedups because this work is divided among the threads, so each thread has less to do as the
number of threads increases.
Moreover, we can see that the inherently sequential part of each algorithm prevents us from
achieving perfect speedup.
Accuracy vs. Time
For this chart, we set a threshold for each algorithm to stop searching the parameter space as
soon as it hits 75% accuracy (a good accuracy on MNIST with an MLP). We had significantly
constrained the parameter space for grid search and random search to prevent long runtimes, so
these numbers are the proportion of time taken to reach the required accuracy, relative to the
total time taken (for an iteration-bound or exhaustive search for that particular algorithm).
As we can see, grid search needs to sweep all parameters before it can check the accuracies, so it
takes the most time. Random search has an accuracy threshold that is checked after every block
of parallel execution, and it reaches 75% accuracy in about 20% of the time a longer sweep would
take, with similar results.
EV4 takes the least time, as it searches asynchronously while still sharing information across
threads (islands). EV2 and EV3 are very close, and the difference can even be attributed to the
random initialization of the populations, but the point is that they reach a good solution
significantly faster than grid or random search. EV3 is slightly higher due to the synchronization
cost involved (explained in the previous section).
Credit Distribution
The work was split equally between both partners. We worked together on figuring out the
approaches, setting up the frameworks, and getting a basic version of the project working. We
divided the algorithm implementations equally and contributed equally to the analysis and the
report.
References
[1] https://github.com/tiny-dnn
[2] https://github.com/davidalbertonogueira/MLP/blob/master/README.md
[3] https://deepmind.com/blog/population-based-training-neural-networks/
[4] http://geco.mines.edu/workshop/aug2011/05fri/parallelGA.pdf
[5] https://blog.floydhub.com/guide-to-hyperparameters-search-for-deep-learning-models/
Two images were taken from:
[6] https://www.slideshare.net/alirezaandalib77/evolutionary-algorithms-69187390
[7] https://www.slideshare.net/alirezaandalib77/evolutionary-algorithms-69187390