Deep Learning (DL) is one of the fastest-growing fields in machine learning. To foster its growth, several open-source frameworks have appeared, providing implementations of the most common DL algorithms. These frameworks vary in the algorithms they support and in the quality of their implementations. The purpose of this work is to provide a qualitative and quantitative comparison of three such frameworks: TensorFlow, Theano and CNTK. To ensure that our study is as comprehensive as possible, we consider multiple benchmark datasets from different fields (image processing, NLP, etc.) and measure the performance of the frameworks' implementations of different DL algorithms. For most of our experiments, we find that CNTK's implementations are superior to the others under consideration.
Deep Learning libraries and first experiments with Theano - Vincenzo Lomonaco
In recent years, neural networks and deep learning techniques have been shown to perform well on many problems in image recognition, speech recognition, natural language processing and many other tasks. As a result, a large number of libraries, toolkits and frameworks have emerged, in different languages and with different purposes. In this report, we first take a look at these projects and then choose the framework that best suits our needs: Theano. Finally, we implement a simple convolutional neural network using this framework to test both its ease of use and its efficiency.
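For flavor, here is a minimal sketch of a convolutional layer in Theano's symbolic style, assuming a Theano-era API; the filter count and input shapes are illustrative, not the report's actual network.

```python
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import conv2d

# Symbolic input: (batch, channels, height, width)
x = T.tensor4('x')
# 8 filters of size 3x3 over 1 input channel (illustrative numbers)
w = theano.shared(np.random.randn(8, 1, 3, 3).astype(theano.config.floatX), name='w')
# One convolutional layer with a ReLU non-linearity
out = T.maximum(conv2d(x, w), 0)
# Compile the symbolic graph into a callable function
f = theano.function([x], out)

batch = np.random.randn(2, 1, 28, 28).astype(theano.config.floatX)
print(f(batch).shape)  # (2, 8, 26, 26) with the default 'valid' border mode
```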
Clone group mapping is of great significance for studying the evolution of code clones. This work is the first to apply topic modeling techniques to code clones, and proposes a new clone group mapping method: by using topic modeling to transform the mapping problem from a high-dimensional code space into a low-dimensional topic space, clone group mapping is achieved indirectly by mapping clone group topics. Experiments on four open-source software systems show recall and precision of up to 0.99, so the method can effectively and accurately achieve clone group mapping.
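The core idea, representing each clone group by a low-dimensional topic vector and matching groups across versions by topic similarity, can be sketched as follows; the LDA library, topic count and toy fragments are assumptions for illustration, not the paper's exact setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

# Each clone group is represented by the concatenated text of its code fragments.
groups_v1 = ["int sum(int a,int b){return a+b;}", "float mul(float x,float y){return x*y;}"]
groups_v2 = ["int sum(int a,int b){return a+b;} /*fix*/", "float mul(float x,float y){return x*y;}"]

vec = CountVectorizer(token_pattern=r"[A-Za-z_]+")
X = vec.fit_transform(groups_v1 + groups_v2)

# Project the high-dimensional token space into a low-dimensional topic space.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics = lda.fit_transform(X)
t1, t2 = topics[:len(groups_v1)], topics[len(groups_v1):]

# Map each group in version 1 to its most similar group in version 2.
sim = cosine_similarity(t1, t2)
print(sim.argmax(axis=1))
```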
Cloud Technologies for Microsoft Computational Biology Tools - ijait
Executing a large number of independent tasks, or tasks that require minimal inter-task communication, in parallel is a common requirement in many domains. In this paper, we present our experience in applying two new Microsoft technologies, Dryad and Azure, to three bioinformatics applications. We also compare them with traditional MPI and Apache Hadoop MapReduce implementations in one example. The applications are an EST (Expressed Sequence Tag) sequence assembly program, the PhyloD statistical package to identify HLA-associated viral evolution, and a pairwise Alu gene alignment application. We give a detailed performance discussion on a 768-core Windows HPC Server cluster and an Azure cloud. All the applications start with a “doubly data parallel step” involving independent data chosen from two similar (EST, Alu) or two different databases (PhyloD). There are different structures for the final stages of each application.
The advent of multi-core architecture has greatly influenced the area of high-performance computing. Parallel compilation is an area that still needs significant improvement through the use of this architecture. Recent research has shown some improvement in the lexical analysis phase, but it is difficult to apply the same technique in the parsing phase. This paper highlights some issues related to the implementation of parallel parsing on multi-core machines.
Collective Mind: a collaborative curation tool for program optimization - Grigori Fursin
Designing and optimizing applications becomes increasingly tedious, time-consuming, ad hoc and error-prone due to the ever-changing and complex hardware and software stack. At the same time, it becomes difficult or even impossible to validate, reproduce and extend many of the optimization and auto-tuning techniques proposed in numerous publications. One of the main reasons is the lack of a common and practical way to preserve, systematize and reuse available knowledge and artifacts, including developments, optimizations and experimental data.
In this talk, I will present the modular, extensible, Python-based Collective Mind framework and web-based schema-free repository (c-mind.org), which I developed initially to systematize my own research and experimentation on machine-learning-based program optimization and compiler design. This infrastructure can be customized to preserve, describe, share and reproduce whole experimental setups, including benchmarks, data sets, libraries, tools, predictive models and optimization information, together with related JSON-based meta-data. I will also present and discuss positive and negative feedback from several recent academic and industrial uses of this framework to systematize benchmarking and program optimization, and to initiate a new publication model where experimental results and related research artifacts are shared, reproduced and validated by the community. In the long term, I hope that such an approach and collective knowledge will eventually help us squeeze maximum performance from computer systems while minimizing energy, development time and other costs.
Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming - TELKOMNIKA JOURNAL
Genomic repeats detection, i.e., searching for patterns of repeated base pairs in a deoxyribonucleic acid (DNA) sequence, requires a long processing time. This research builds a big-data computational model to look for patterns in strings by modifying and implementing the Boyer-Moore algorithm on Apache Spark Streaming for human DNA sequences from the Ensembl site. Moreover, we perform experiments on cloud computing, varying the specifications of the computer clusters and the datasets of human DNA sequences involved. The results show that the proposed computational model on Apache Spark Streaming is faster than standalone computing and multicore parallel computing. The main contribution of this research, a computational model that reduces computational costs, has therefore been achieved.
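The paper's Spark Streaming modification is not reproduced here; below is a plain-Python sketch of the bad-character heuristic at the heart of Boyer-Moore (in its Horspool simplification), applied to a toy DNA string.

```python
def horspool_search(text: str, pattern: str):
    """Yield start indices of `pattern` in `text` using the bad-character shift."""
    m, n = len(pattern), len(text)
    # For each symbol, how far we may shift when the window's last character mismatches.
    shift = {c: m for c in set(text)}
    for j, c in enumerate(pattern[:-1]):
        shift[c] = m - 1 - j
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            yield i
        i += shift.get(text[i + m - 1], m)

# Find a repeated base-pair pattern in a toy DNA sequence.
print(list(horspool_search("ACGTACGTTACGT", "ACGT")))  # [0, 4, 9]
```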
Hardware/Software Co-Design for Efficient Microkernel Execution - Martin Děcký
While the performance overhead of IPC in microkernel multiserver operating systems is no longer considered a blocker for their practical deployment (thanks to the many optimization ideas proposed and implemented over the years), the overhead undeniably still exists, and the more fine-grained the architecture of the operating system (which is desirable from the reliability, dependability, safety and security point of view), the more severe the performance penalties it suffers due to IPC overhead. A closely related issue is the overhead of handling hardware interrupts in user-space device drivers.
One reason for the IPC overhead is the fact that current hardware and CPUs were never designed with microkernel multiserver operating systems in mind; rather, they were tailored to traditional monolithic operating systems. This calls for out-of-the-box thinking when designing instruction set architecture (ISA) extensions and other hardware features that would support (a) efficient communication between isolated virtual address spaces using synchronous and asynchronous IPC primitives, and (b) treating object references (e.g. capability references) as first-class entities at the hardware level. A good testbed for evaluating such approaches (with the potential to be eventually adopted as an industry standard) is the still unspecified RV128 ISA (the 128-bit variant of RISC-V).
This talk discusses some specific hardware/software co-design ideas to improve the performance of microkernel multiserver operating systems.
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti... - inside-BigData.com
This talk will focus on challenges in designing software libraries and middleware for upcoming exascale systems with millions of processors and accelerators. Two kinds of application domains, Scientific Computing and Big Data, will be considered. For the scientific computing domain, we will discuss challenges in designing runtime environments for MPI and PGAS (UPC and OpenSHMEM) programming models, taking into account support for multi-core architectures, high-performance networks, GPGPUs and Intel MIC. Features and sample performance numbers from the MVAPICH2 and MVAPICH2-X (supporting hybrid MPI and PGAS (UPC and OpenSHMEM)) libraries will be presented. For the Big Data domain, we will focus on high-performance and scalable designs of Hadoop (including HBase) and Memcached using the native RDMA support of InfiniBand and RoCE.
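As a small illustration of the MPI runtime model such libraries implement, here is a generic mpi4py sketch (Python bindings that sit on top of an MPI library such as MVAPICH2); it is a textbook allreduce, not MVAPICH2-specific code.

```python
# Run with e.g.: mpirun -np 4 python allreduce_demo.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank contributes a partial value; allreduce combines them on every rank.
local = (rank + 1) ** 2
total = comm.allreduce(local, op=MPI.SUM)
print(f"rank {rank}/{size}: sum of squares = {total}")
```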
Spark-MPI: Approaching the Fifth Paradigm with Nikolay Malitsky - Databricks
Over the past decade, the fourth paradigm of data-intensive science rapidly became a major driving concept of multiple application domains, encompassing and generating large-scale devices such as light sources and cutting-edge telescopes. The success of data-intensive projects subsequently triggered the next generation of machine learning approaches. These new artificially intelligent systems clearly represent a paradigm shift from data-processing pipelines towards the fifth paradigm of knowledge-centric cognitive applications, requiring the integration of Big Data processing platforms and HPC technologies.
The talk addresses the existing impedance mismatch between data-intensive and compute-intensive ecosystems by presenting the Spark-MPI approach, based on the MPI Process Management Interface (PMI). The approach was originally designed for building high-performance streaming image reconstruction pipelines at light source facilities. This talk will demonstrate Spark-MPI within the context of distributed deep learning applications by integrating the Apache Spark platform, PMI Exascale and the Horovod MPI-based training framework for TensorFlow.
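A hedged sketch of the Horovod side of such a pipeline, assuming TensorFlow 2-style APIs; the one-layer model and random data are placeholders, and the process group would be supplied by Spark-MPI/PMI or a launcher such as horovodrun.

```python
# Launch with e.g.: horovodrun -np 4 python train.py
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()  # one process per worker

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
opt = tf.optimizers.SGD(0.01 * hvd.size())  # common practice: scale LR by world size
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.normal([32, 8])
y = tf.random.normal([32, 1])
with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x))
tape = hvd.DistributedGradientTape(tape)  # averages gradients across workers
grads = tape.gradient(loss, model.trainable_variables)
opt.apply_gradients(zip(grads, model.trainable_variables))
print(f"worker {hvd.rank()}/{hvd.size()} loss {float(loss):.4f}")
```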
Hadoop training in Mumbai at Asterix Solution covers a system designed to scale up from single servers to thousands of machines, each offering local computation and storage. While the cost of memory has dropped steadily, data processing speeds have not kept pace, so loading large data sets remains a major headache; Hadoop is presented as the solution.
http://www.asterixsolution.com/big-data-hadoop-training-in-mumbai.html
Performance Comparison between PyTorch and MindSpore - ijdms
Deep learning has been widely used in many fields. However, training neural networks involves large amounts of data, which has led to many deep learning frameworks that serve deep learning practitioners, aiming to be more convenient to use and to perform better. MindSpore and PyTorch are both deep learning frameworks: MindSpore is owned by HUAWEI, while PyTorch is owned by Facebook. Some people think that HUAWEI's MindSpore performs better than Facebook's PyTorch, which leaves deep learning practitioners confused about the choice between the two. In this paper, we perform analytical and experimental analyses to compare the training speed of MindSpore and PyTorch on a single GPU. To ensure that our survey is as comprehensive as possible, we carefully selected neural networks in two main domains: computer vision and natural language processing (NLP). The contribution of this work is twofold. First, we conduct detailed benchmarking experiments on MindSpore and PyTorch to analyze the reasons for their performance differences. Second, this work provides guidance for end users choosing between the two frameworks.
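The paper's exact benchmarking protocol is not reproduced here; the following PyTorch sketch shows the usual pattern for timing a training step on a single GPU, with warm-up iterations and explicit synchronization so that asynchronous CUDA kernels are fully counted. The model and batch size are illustrative.

```python
import time
import torch
import torchvision

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torchvision.models.resnet50().to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()
x = torch.randn(32, 3, 224, 224, device=device)
y = torch.randint(0, 1000, (32,), device=device)

def step():
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

for _ in range(5):  # warm-up: first iterations include allocator/cuDNN setup
    step()
if device == "cuda":
    torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(20):
    step()
if device == "cuda":
    torch.cuda.synchronize()
print(f"{(time.perf_counter() - t0) / 20 * 1e3:.1f} ms/step")
```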
The TANGO Project is a new initiative undertaken by a group of European organizations and institutions to fulfil one purpose: simplify the way developers approach the development of next-generation applications based on heterogeneous hardware architectures, configurations and software systems, including heterogeneous clusters, chips and programmable logic devices.
SENSOR SIGNAL PROCESSING USING HIGH-LEVEL SYNTHESIS AND INTERNET OF THINGS WI... - pijans
Sensor routers play a crucial role in Internet of Things applications, where the capacity for transmitting the network signal between cloud systems and sensors (and back) is limited. This paper describes a robust framework with several architectural layers to process data using high-level synthesis. It is designed to sense nodes automatically with the help of the Internet of Things, where the applications arise in cloud systems. An embedded-PE, four-layer design framework architecture is proposed to sense the devices of IoT applications with the support of a high-level synthesis DBMF (database management function) tool.
STUDY ON EMERGING APPLICATIONS ON DATA PLANE AND OPTIMIZATION POSSIBILITIES - ijdpsjournal
By programming both the data plane and the control plane, network operators can adapt their networks to their needs. Thanks to research over the past decade, this concept has become more formalized and more technologically feasible. However, since control plane programmability came first, it has already been successfully implemented in real networks and is beginning to pay off. Today, data plane programmability is evolving very rapidly to reach this level, attracting the attention of researchers and developers: designing data plane languages, developing applications on them, formalizing software switches and architectures that can run data plane code and applications, increasing the performance of software switches, and so on. As the control plane and data plane become more open, many new innovations and technologies are emerging, but some experts warn that consumers may be confused as to which of the many technologies to choose. This is a testament to how much innovation is emerging in networking. This paper outlines some emerging applications on the data plane and identifies opportunities for further improvement and optimization. Our observations show that most of the implementations are done in a test environment and have not been tested well enough in terms of performance, but there are many interesting works; for example, previous control plane solutions are being implemented in the data plane.
NLP on Hadoop: A Distributed Framework for NLP-Based Keyword and Keyphrase Ex... - Paolo Nesi
The recent rapid growth of the World Wide Web and of the number of resources available online represents a massive source of knowledge for various research and business interests. Such knowledge is, for the most part, embedded in the textual content of web pages and documents, largely in unstructured natural language formats. For automatically ingesting and processing such huge amounts of data, single-machine, non-distributed architectures are proving inefficient for tasks like Big Data mining and intensive text processing and analysis. Current Natural Language Processing (NLP) systems are growing in complexity, and their computational power needs have increased significantly, requiring solutions such as distributed frameworks and parallel programming paradigms. This paper presents a distributed framework for executing NLP-related tasks in a parallel environment. This has been achieved by integrating the APIs of the widespread GATE open-source NLP platform into a multi-node cluster built upon the open-source Apache Hadoop file system. The proposed framework has been evaluated against a real corpus of web pages and documents.
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY... - ijdpsjournal
This paper studies the performance and energy consumption of several multi-CPU, multi-core and many-core hardware platforms and software stacks for parallel programming. It uses the Multimedia Multiscale Parser (MMP), a computationally demanding image encoder application, which was ported to several parallel hardware and software environments, as a benchmark. Hardware-wise, the study assesses NVIDIA's Jetson TK1 development board, the Raspberry Pi 2, and a dual Intel Xeon E5-2620/v2 server, as well as NVIDIA's discrete GPUs GTX 680, Titan Black Edition and GTX 750 Ti. The assessed parallel programming paradigms are OpenMP, Pthreads and CUDA, plus a single-thread sequential version, all running in a Linux environment. While the CUDA-based implementation delivered the fastest execution, the Jetson TK1 proved to be the most energy-efficient platform, regardless of the parallel software stack used. Although it has the lowest power demand, the Raspberry Pi 2's energy efficiency is hindered by its lengthy execution times, so it effectively consumes more energy than the Jetson TK1. Surprisingly, OpenMP delivered twice the performance of the Pthreads-based implementation, proving the maturity of the tools and libraries supporting OpenMP.
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi... - OpenStack
Audience Level
Intermediate
Synopsis
High performance computing and cloud computing have traditionally been seen as separate solutions to separate problems, dealing with issues of performance and flexibility respectively. In a diverse research environment, however, both sets of compute requirements can occur. In addition to the administrative benefits of combining both requirements into a single unified system, opportunities are provided for incremental expansion.
The deployment of the Spartan cloud-HPC hybrid system at the University of Melbourne last year is an example of such a design. Despite its small size, it has attracted international attention due to its design features. In addition to explaining why one would wish to build an HPC-cloud hybrid system and presenting the results of the deployment, this presentation provides a complete technical overview of the design from the ground up, as well as problems encountered and planned future developments.
Speaker Bio
Lev Lafayette is the HPC and Training Officer at the University of Melbourne. Prior to that he worked at the Victorian Partnership for Advanced Computing for several years in a similar role.
Learn about TensorFlow for Deep Learning now! Part 1 - Tyrone Systems
In this comprehensive workshop, learn how to use TensorFlow, how to build data pipelines, and how to implement a simple deep learning model using TensorFlow Keras. Enhance your knowledge and skills by gaining a better understanding of TensorFlow with all the resources we have available for you!
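A minimal sketch in the spirit of the workshop: a tf.data input pipeline feeding a small Keras model. The dataset, layer sizes and epoch count are toy stand-ins, not the workshop's material.

```python
import numpy as np
import tensorflow as tf

# Toy data pipeline: shuffle, batch and prefetch, as a stand-in for a real dataset.
x = np.random.rand(1000, 20).astype("float32")
y = (x.sum(axis=1) > 10).astype("int32")
ds = (tf.data.Dataset.from_tensor_slices((x, y))
      .shuffle(1000).batch(32).prefetch(tf.data.AUTOTUNE))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(ds, epochs=3)
```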
Towards high performance computing (HPC) through parallel programming paradigm... - ijpla
Nowadays, we need to find solutions to huge computing problems very rapidly. This brings in the idea of parallel computing, in which several machines or processors work cooperatively on computational tasks. Over the past decades, there have been many changes in how the importance of parallelism in computing machines is perceived, and parallel computing has been observed to be a superior solution to many computing limitations, such as speed and density, non-recurring and high cost, and power consumption and heat dissipation. Commercial multiprocessors have emerged at lower prices than mainframe machines and supercomputers. In this article, high performance computing (HPC) through parallel programming paradigms (PPPs) is discussed, along with their constructs and design approaches.
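To make the paradigm concrete, a small Python sketch of the shared-nothing, data-parallel style such articles survey; the workload is an arbitrary CPU-bound function standing in for a real task.

```python
import time
from multiprocessing import Pool

def heavy(n: int) -> int:
    # Arbitrary CPU-bound work standing in for a real computational task.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    jobs = [2_000_000] * 8

    t0 = time.perf_counter()
    seq = [heavy(n) for n in jobs]
    t_seq = time.perf_counter() - t0

    t0 = time.perf_counter()
    with Pool() as pool:  # one worker process per core by default
        par = pool.map(heavy, jobs)
    t_par = time.perf_counter() - t0

    assert seq == par
    print(f"sequential {t_seq:.2f}s, parallel {t_par:.2f}s")
```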
Similar to Benchmarking open source deep learning frameworks
Bibliometric analysis highlighting the role of women in addressing climate ch... - IJECEIAES
Fossil fuel consumption has increased quickly, contributing to climate change that is evident in unusual flooding, droughts, and global warming. Over the past ten years, women's involvement in society has grown dramatically, and they have succeeded in playing a noticeable role in reducing climate change. A bibliometric analysis of data from the last ten years has been carried out to examine the role of women in addressing climate change. The analysis's findings are discussed in relation to the sustainable development goals (SDGs), particularly SDG 7 and SDG 13. The results consider contributions made by women in various sectors while taking geographic dispersion into account. The bibliometric analysis delves into topics including women's leadership in environmental groups, their involvement in policymaking, their contributions to sustainable development projects, and the influence of gender diversity on attempts to mitigate climate change. The study's results highlight how women have influenced policies and actions related to climate change, point out areas of research deficiency, and offer recommendations on how to increase the role of women in addressing climate change and achieving sustainability. To achieve more successful results, this initiative aims to highlight the significance of gender equality and encourage inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter... - IJECEIAES
Active and reactive load changes have a significant impact on voltage and frequency. In this paper, in order to stabilize the microgrid (MG) against load variations in islanding mode, the active and reactive power of all distributed generators (DGs), including energy storage (battery), a diesel generator, and a micro-turbine, are controlled. The micro-turbine generator is connected to the MG through a three-phase to three-phase matrix converter, and the droop control method is applied for controlling the voltage and frequency of the MG. In addition, a method is introduced for voltage and frequency control of micro-turbines in the transition from grid-connected mode to islanding mode. A novel switching strategy for the matrix converter is used to convert the high-frequency output voltage of the micro-turbine to the grid-side frequency of the utility system. Moreover, with this switching strategy, low-order harmonics in the output current and voltage are not produced, and consequently the size of the output filter can be reduced. In fact, the suggested control strategy is load-independent and has no frequency conversion restrictions. The proposed approach for voltage and frequency regulation demonstrates exceptional performance and a favorable response across various load alteration scenarios. The suggested strategy is examined in several scenarios on the MG test systems, and the simulation results are discussed.
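For reference, conventional droop control relates frequency to active power and voltage to reactive power; a minimal sketch with illustrative coefficients (the constants below are assumptions, not the paper's tuned values):

```python
def droop(p_kw: float, q_kvar: float,
          f0: float = 50.0, v0: float = 1.0,
          kp: float = 0.0005, kq: float = 0.002):
    """Conventional P-f / Q-V droop; per-unit voltage, illustrative gains."""
    f = f0 - kp * p_kw    # frequency falls as active power load rises
    v = v0 - kq * q_kvar  # voltage falls as reactive power load rises
    return f, v

print(droop(p_kw=100.0, q_kvar=20.0))  # (49.95, 0.96)
```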
Enhancing battery system identification: nonlinear autoregressive modeling fo... - IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their performance, enhancing safety, and prolonging their lifespan across various applications, such as electric vehicles and renewable energy systems. This article introduces a nonlinear methodology for system identification of a Li-ion battery, employing a nonlinear autoregressive with exogenous inputs (NARX) model. The proposed approach integrates the benefits of nonlinear modeling with the adaptability of the NARX structure, facilitating a more comprehensive representation of the intricate electrochemical processes within the battery. Experimental data collected from a Li-ion battery operating under diverse scenarios are used to validate the effectiveness of the proposed methodology. The identified NARX model exhibits superior accuracy in predicting the battery's behavior compared to traditional linear models. This study underscores the importance of accounting for nonlinearities in battery modeling, providing insights into the intricate relationships between state of charge, voltage, and current under dynamic conditions.
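A sketch of the NARX idea: regress the next output on lagged outputs and lagged exogenous inputs through a nonlinear model. The regressor, lag orders and toy plant below are illustrative assumptions, not the paper's identified model.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, 500)        # exogenous input (e.g. current)
y = np.zeros(500)                  # output (e.g. terminal voltage)
for k in range(2, 500):            # toy nonlinear plant to generate data
    y[k] = 0.7 * y[k - 1] - 0.1 * y[k - 2] + 0.5 * np.tanh(u[k - 1])

# NARX regressors: y(k-1), y(k-2), u(k-1), u(k-2) -> predict y(k)
X = np.column_stack([y[1:-1], y[:-2], u[1:-1], u[:-2]])
target = y[2:]

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X[:400], target[:400])
print("test R^2:", model.score(X[400:], target[400:]))
```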
Smart grid deployment: from a bibliometric analysis to a survey - IJECEIAES
Smart grids are one of the last decades' innovations in electrical energy. They bring relevant advantages compared to the traditional grid and attract significant interest from the research community. Assessing the field's evolution is essential in order to propose guidelines for facing new and future smart grid challenges. In addition, knowing the main technologies involved in the deployment of smart grids (SGs) is important for highlighting possible shortcomings that can be mitigated by developing new tools. This paper contributes to the research trends mentioned above by focusing on two objectives. First, a bibliometric analysis is presented to give an overview of the current research level on smart grid deployment. Second, a survey of the main technological approaches used for smart grid implementation and their contributions is presented. To that effect, we searched the Web of Science (WoS) and Scopus databases. We obtained 5,663 documents from WoS and 7,215 from Scopus on smart grid implementation or deployment. Given the extraction limitation in the Scopus database, 5,872 of the 7,215 documents were extracted using a multi-step process. These two datasets have been analyzed using a bibliometric tool called bibliometrix. The main outputs are presented, with some recommendations for future research.
Use of analytical hierarchy process for selecting and prioritizing islanding ... - IJECEIAES
One of the problems associated with power systems is the islanding condition, which must be rapidly and properly detected to prevent any negative consequences for the system's protection, stability, and security. This paper offers a thorough overview of several islanding detection strategies, which are divided into two categories: classic approaches, including local and remote approaches, and modern techniques, including techniques based on signal processing and computational intelligence. Additionally, each approach is compared and assessed based on several factors, including implementation costs, non-detection zones, degradation of power quality, and response times, using the analytical hierarchy process (AHP). Comparing all criteria together, the multi-criteria decision-making analysis gives overall weights of 24.7% for passive methods, 7.8% for active methods, 5.6% for hybrid methods, 14.5% for remote methods, 26.6% for signal processing-based methods, and 20.8% for computational intelligence-based methods. Thus, it can be seen from the total weights that hybrid approaches are the least suitable choice, while signal processing-based methods are the most appropriate islanding detection method to select and implement in a power system with respect to the aforementioned factors. The proposed hierarchy model is studied and examined using Expert Choice software.
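The AHP step the paper relies on can be sketched in a few lines of numpy: criterion weights are derived from a pairwise comparison matrix via its principal eigenvector. The 3x3 matrix below is illustrative, not the paper's comparison data.

```python
import numpy as np

# Pairwise comparison matrix (Saaty scale, illustrative values):
# A[i, j] says how much more important criterion i is than criterion j.
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

vals, vecs = np.linalg.eig(A)
principal = np.real(vecs[:, np.argmax(np.real(vals))])
weights = principal / principal.sum()
print(np.round(weights, 3))  # e.g. cost, non-detection zone, response time

# Consistency index CI = (lambda_max - n) / (n - 1); CI divided by the
# random index RI should stay below ~0.1 for an acceptable matrix.
lam = np.max(np.real(vals))
n = A.shape[0]
print("CI =", (lam - n) / (n - 1))
```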
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi... - IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by environmental factors. This variability hampers the control and utilization of the solar cells' peak output. In this study, a single-stage grid-connected PV system is designed to enhance power quality. Our approach employs fuzzy logic in the direct power control (DPC) of a three-phase voltage source inverter (VSI), enabling seamless integration of the PV system connected to the grid. Additionally, a fuzzy logic-based maximum power point tracking (MPPT) controller is adopted, which outperforms traditional methods like incremental conductance (INC) in enhancing solar cell efficiency and minimizing response time. Moreover, the inverter's real-time active and reactive power is directly managed to achieve a unity power factor (UPF). The system's performance is assessed through a MATLAB/Simulink implementation, showing marked improvement over conventional methods, particularly in steady-state and varying weather conditions. For solar irradiances of 500 and 1,000 W/m2, the results show that the proposed method reduces the total harmonic distortion (THD) of the current injected into the grid by approximately 46% and 38%, respectively, compared to conventional methods. Furthermore, we compare the simulation results with IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b... - IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that caters to the future needs of society, owing to their renewable, inexhaustible, and cost-free nature. The power output of these systems relies on solar cell radiation and temperature. In order to mitigate the dependence on atmospheric conditions and enhance power tracking, a conventional approach has been improved by integrating various methods. To optimize the generation of electricity from solar systems, the maximum power point tracking (MPPT) technique is employed. To overcome limitations such as steady-state voltage oscillations and to improve transient response, two traditional MPPT methods, namely the fuzzy logic controller (FLC) and perturb and observe (P&O), have been modified. This paper simulates and validates the step size of the proposed modified P&O and FLC techniques within the MPPT algorithm, using MATLAB/Simulink, for efficient power tracking in photovoltaic systems.
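A sketch of the classic P&O loop that such papers modify: keep perturbing the operating voltage in the direction that increased power, and reverse when power falls. The step size and the toy P-V curve are placeholders, not the paper's modified scheme.

```python
def perturb_and_observe(measure, v_ref=30.0, dv=0.5, steps=50):
    """Classic P&O: perturb the reference voltage towards higher power."""
    v_prev, p_prev = v_ref, measure(v_ref)
    direction = 1.0
    for _ in range(steps):
        v_ref = v_prev + direction * dv
        p = measure(v_ref)
        if p < p_prev:            # power fell: reverse the perturbation
            direction = -direction
        v_prev, p_prev = v_ref, p
    return v_ref

# Toy P-V curve with a single maximum near 34 V (placeholder for a real panel).
pv_power = lambda v: max(0.0, -0.5 * (v - 34.0) ** 2 + 200.0)
print(round(perturb_and_observe(pv_power), 1))  # settles near 34 V, oscillating +/- dv
```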
Adaptive synchronous sliding control for a robot manipulator based on neural ... - IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for robot hands is always an attractive topic in the research community. This is a challenging problem because robot manipulators are complex nonlinear systems and are often subject to fluctuations in loads and external disturbances. This article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller ensures that the positions of the joints track the desired trajectory, synchronizes the errors, and significantly reduces chattering. First, the synchronous tracking errors and synchronous sliding surfaces are presented. Second, the synchronous tracking error dynamics are determined. Third, a robust adaptive control law is designed, the unknown components of the model are estimated online by a neural network, and the parameters of the switching elements are selected by fuzzy logic. The algorithm ensures that the tracking and approximation errors are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results, which show that the proposed controller is effective, with small synchronous tracking errors and significantly reduced chattering.
Remote field-programmable gate array laboratory for signal acquisition and de... - IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students' learning experience in embedded system design, anywhere and anytime. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting the comprehensive debugging tools needed to resolve errors that require internal signal acquisition. This paper proposes a novel remote embedded-system design approach targeting FPGA technologies that is fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that users can seamlessly incorporate into their FPGA designs. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m... - IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
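A hedged sketch of the AutoML step, assuming AutoGluon's TabularPredictor API; the metric columns, the toy labeling rule and the data are illustrative placeholders, not the paper's dataset.

```python
import numpy as np
import pandas as pd
from autogluon.tabular import TabularPredictor

# Illustrative training table: code metrics per method plus a binary smell label.
rng = np.random.default_rng(0)
n = 200
external_calls = rng.integers(0, 20, n)
train = pd.DataFrame({
    "loc": rng.integers(5, 200, n),
    "fan_out": rng.integers(0, 15, n),
    "external_calls": external_calls,
    # toy rule: heavy reliance on external classes marks feature envy
    "feature_envy": (external_calls > 10).astype(int),
})

predictor = TabularPredictor(label="feature_envy").fit(train)
print(predictor.feature_importance(train))  # which metrics drive the classification
```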
Smart monitoring technique for solar cell systems using internet of things ba... - IJECEIAES
Rapidly and remotely monitoring and receiving solar cell system status parameters (solar irradiance, temperature, and humidity) is a critical issue in enhancing their efficiency. Hence, in the present article, an improved smart prototype of an internet of things (IoT) technique based on an embedded system with a NodeMCU ESP8266 (ESP-12E) was implemented experimentally. Three different regions of Egypt, the cities of Luxor, Cairo, and El-Beheira, were chosen to study their solar irradiance profile, temperature, and humidity with the proposed IoT system. The monitored data on solar irradiance, temperature, and humidity were visualized live via Ubidots through the hypertext transfer protocol (HTTP). The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m2, respectively, during the solar day. The accuracy and rapidity of the monitoring results obtained with the proposed IoT system make it a strong candidate for application in monitoring solar cell systems. On the other hand, the solar power radiation results for the three regions strongly favor Luxor and Cairo, rather than El-Beheira, as suitable places to build a solar cell station.
An efficient security framework for intrusion detection and prevention in int... - IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies and malicious intrusions pose several security loopholes, leading to performance degradation and threats to data security in IoT operations. Thus, IoT security systems must monitor and restrict unwanted events in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived for identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to misclassification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention rely on supervised learning, which requires a large amount of labeled training data; such complex datasets are impossible to source in a large network like the IoT. To address this problem, this study introduces an efficient learning mechanism to strengthen IoT security. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with related works, the experimental outcome shows that the model performs well on a benchmark dataset, accomplishing an improved detection accuracy of approximately 99.21%.
Developing a smart system for infant incubators using the internet of things ... - IJECEIAES
This research develops an incubator system that integrates the internet of things and artificial intelligence to improve care for premature babies. The system workflow starts with sensors that collect data from the incubator. The data is then sent in real time to the internet of things (IoT) broker Eclipse Mosquitto using the message queue telemetry transport (MQTT) protocol version 5.0. After that, the data is stored in a database for analysis using the long short-term memory (LSTM) network method and displayed in a web application using an application programming interface (API) service. The experiments produced 2,880 rows of data stored in the database. The correlation coefficients between the target attribute and the other attributes range from 0.23 to 0.48. Several experiments were conducted to evaluate the model's predicted values on the test data. The best results are obtained using a two-layer LSTM configuration, each layer with 60 neurons and a lookback setting of 6. This model produces an R^2 value of 0.934, with a root mean square error (RMSE) of 0.015 and a mean absolute error (MAE) of 0.008. In addition, the R^2 value was also evaluated for each attribute used as input, with values between 0.590 and 0.845.
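A minimal Keras sketch matching the reported configuration (two LSTM layers of 60 units, lookback 6); the feature count and random data are placeholders, not the incubator dataset.

```python
import numpy as np
import tensorflow as tf

lookback, n_features = 6, 4  # e.g. temperature, humidity, etc. (placeholders)
x = np.random.rand(256, lookback, n_features).astype("float32")
y = np.random.rand(256, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(60, return_sequences=True, input_shape=(lookback, n_features)),
    tf.keras.layers.LSTM(60),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(x, y, epochs=2, batch_size=32)
```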
A review on internet of things-based stingless bee's honey production with im... - IJECEIAES
Honey is produced exclusively by honeybees and stingless bees, both of which are well adapted to tropical and subtropical regions such as Malaysia. Stingless bees are known for producing small amounts of honey with a unique flavor profile. A problem identified is that many stingless bee colonies collapse due to weather, temperature, and the environment. It is critical to understand the relationship between stingless bee honey production and environmental conditions in order to improve honey production. Thus, this paper presents a review of stingless bee honey production and prediction modeling. About 54 previous studies have been analyzed and compared to identify the research gaps. A framework for modeling the prediction of stingless bee honey production is derived. The result presents a comparison and analysis of internet of things (IoT) monitoring systems, honey production estimation, convolutional neural networks (CNNs), and automatic identification methods for bee species. Based on image detection methods, the top three in efficiency are a CNN at 98.67%, densely connected convolutional networks with YOLO v3 at 97.7%, and DenseNet201 convolutional networks at 99.81%. This study is significant in assisting researchers in developing a model for predicting stingless bee honey output, which is important for a stable economy and food security.
A trust based secure access control using authentication mechanism for intero... - IJECEIAES
The internet of things (IoT) is a revolutionary innovation in many aspects of our society, including interactions, financial activity, and global security, such as the military and battlefield internet. Due to the limited energy and processing capacity of network devices, security, energy consumption, compatibility, and device heterogeneity are long-term IoT problems. As a result, energy and security are critical for data transmission across edge and IoT networks. Existing IoT interoperability techniques need more computation time, have unreliable authentication mechanisms that break easily, lose data easily, and have low confidentiality. In this paper, a key agreement protocol-based authentication mechanism for IoT devices is offered as a solution to this issue. The system makes use of information exchange, which must be secured to prevent access by unauthorized users. Using the compact Contiki/Cooja simulator, the performance and design of the suggested framework are validated. The simulation findings are evaluated based on the detection of malicious nodes after 60 minutes of simulation. The suggested trust method, based on privacy access control, reduced the packet loss ratio to 0.32%, consumed 0.39% power, and had the greatest average residual energy of 0.99 mJ at 10 nodes.
Fuzzy linear programming with the intuitionistic polygonal fuzzy numbers - IJECEIAES
In real-world applications, data are subject to ambiguity due to several factors; fuzzy sets and fuzzy numbers offer a great tool for modeling such ambiguity. In cases of hesitation, the complement of a membership value in fuzzy numbers can differ from the non-membership value, in which case we can model using intuitionistic fuzzy numbers, as they provide flexibility by defining both a membership and a non-membership function. In this article, we consider the intuitionistic fuzzy linear programming problem with intuitionistic polygonal fuzzy numbers, which generalizes the polygonal fuzzy numbers previously found in the literature. We present a modification of the simplex method that can be used to solve any general intuitionistic fuzzy linear programming problem after approximating the problem by intuitionistic polygonal fuzzy numbers with n edges. The method is given in a simple tableau formulation and then applied to numerical examples for clarity.
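The paper's fuzzy tableau method is not reproduced here; as a hedged illustration of the final step, once a fuzzy problem has been reduced to a crisp LP (for instance by a ranking or defuzzification function, which is an assumption on our part), it can be handed to a standard solver:

```python
from scipy.optimize import linprog

# Crisp LP obtained after defuzzifying the fuzzy coefficients (illustrative numbers):
# maximize 3x + 2y  subject to  x + y <= 4,  x + 3y <= 6,  x, y >= 0
res = linprog(c=[-3, -2],                    # linprog minimizes, so negate
              A_ub=[[1, 1], [1, 3]], b_ub=[4, 6],
              bounds=[(0, None), (0, None)])
print(res.x, -res.fun)  # optimum at x=4, y=0 with objective value 12
```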
The performance of artificial intelligence in prostate magnetic resonance im...IJECEIAES
Prostate cancer is the predominant form of cancer observed in men worldwide. The application of magnetic resonance imaging (MRI) as a guidance tool for conducting biopsies has been established as a reliable and well-established approach in the diagnosis of prostate cancer. The diagnostic performance of MRI-guided prostate cancer diagnosis exhibits significant heterogeneity due to the intricate and multi-step nature of the diagnostic pathway. The development of artificial intelligence (AI) models, specifically through the utilization of machine learning techniques such as deep learning, is assuming an increasingly significant role in the field of radiology. In the realm of prostate MRI, a considerable body of literature has been dedicated to the development of various AI algorithms. These algorithms have been specifically designed for tasks such as prostate segmentation, lesion identification, and classification. The overarching objective of these endeavors is to enhance diagnostic performance and foster greater agreement among different observers within MRI scans for the prostate. This review article aims to provide a concise overview of the application of AI in the field of radiology, with a specific focus on its utilization in prostate MRI.
Seizure stage detection of epileptic seizure using convolutional neural networksIJECEIAES
According to the World Health Organization (WHO), seventy million individuals worldwide suffer from epilepsy, a neurological disorder. While electroencephalography (EEG) is crucial for diagnosing epilepsy and monitoring the brain activity of epilepsy patients, it requires a specialist to examine all EEG recordings to find epileptic behavior. This procedure needs an experienced doctor, and a precise epilepsy diagnosis is crucial for appropriate treatment. To identify epileptic seizures, this study employed a convolutional neural network (CNN) based on raw scalp EEG signals to discriminate between preictal, ictal, postictal, and interictal segments. The possibility of these characteristics is explored by examining how well timedomain signals work in the detection of epileptic signals using intracranial Freiburg Hospital (FH), scalp Children's Hospital Boston-Massachusetts Institute of Technology (CHB-MIT) databases, and Temple University Hospital (TUH) EEG. To test the viability of this approach, two types of experiments were carried out. Firstly, binary class classification (preictal, ictal, postictal each versus interictal) and four-class classification (interictal versus preictal versus ictal versus postictal). The average accuracy for stage detection using CHB-MIT database was 84.4%, while the Freiburg database's time-domain signals had an accuracy of 79.7% and the highest accuracy of 94.02% for classification in the TUH EEG database when comparing interictal stage to preictal stage.
Analysis of driving style using self-organizing maps to analyze driver behaviorIJECEIAES
Modern life is strongly associated with the use of cars, but the increase in acceleration speeds and their maneuverability leads to a dangerous driving style for some drivers. In these conditions, the development of a method that allows you to track the behavior of the driver is relevant. The article provides an overview of existing methods and models for assessing the functioning of motor vehicles and driver behavior. Based on this, a combined algorithm for recognizing driving style is proposed. To do this, a set of input data was formed, including 20 descriptive features: About the environment, the driver's behavior and the characteristics of the functioning of the car, collected using OBD II. The generated data set is sent to the Kohonen network, where clustering is performed according to driving style and degree of danger. Getting the driving characteristics into a particular cluster allows you to switch to the private indicators of an individual driver and considering individual driving characteristics. The application of the method allows you to identify potentially dangerous driving styles that can prevent accidents.
Hyperspectral object classification using hybrid spectral-spatial fusion and ...IJECEIAES
Because of its spectral-spatial and temporal resolution of greater areas, hyperspectral imaging (HSI) has found widespread application in the field of object classification. The HSI is typically used to accurately determine an object's physical characteristics as well as to locate related objects with appropriate spectral fingerprints. As a result, the HSI has been extensively applied to object identification in several fields, including surveillance, agricultural monitoring, environmental research, and precision agriculture. However, because of their enormous size, objects require a lot of time to classify; for this reason, both spectral and spatial feature fusion have been completed. The existing classification strategy leads to increased misclassification, and the feature fusion method is unable to preserve semantic object inherent features; This study addresses the research difficulties by introducing a hybrid spectral-spatial fusion (HSSF) technique to minimize feature size while maintaining object intrinsic qualities; Lastly, a soft-margins kernel is proposed for multi-layer deep support vector machine (MLDSVM) to reduce misclassification. The standard Indian pines dataset is used for the experiment, and the outcome demonstrates that the HSSF-MLDSVM model performs substantially better in terms of accuracy and Kappa coefficient.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...Amil Baba Dawood bangali
Contact with Dawood Bhai Just call on +92322-6382012 and we'll help you. We'll solve all your problems within 12 to 24 hours and with 101% guarantee and with astrology systematic. If you want to take any personal or professional advice then also you can call us on +92322-6382012 , ONLINE LOVE PROBLEM & Other all types of Daily Life Problem's.Then CALL or WHATSAPP us on +92322-6382012 and Get all these problems solutions here by Amil Baba DAWOOD BANGALI
#vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore#blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #blackmagicforlove #blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #Amilbabainuk #amilbabainspain #amilbabaindubai #Amilbabainnorway #amilbabainkrachi #amilbabainlahore #amilbabaingujranwalan #amilbabainislamabad
Quality defects in TMT Bars, Possible causes and Potential Solutions.PrashantGoswami42
Maintaining high-quality standards in the production of TMT bars is crucial for ensuring structural integrity in construction. Addressing common defects through careful monitoring, standardized processes, and advanced technology can significantly improve the quality of TMT bars. Continuous training and adherence to quality control measures will also play a pivotal role in minimizing these defects.
Vaccine management system project report documentation..pdfKamal Acharya
The Division of Vaccine and Immunization is facing increasing difficulty monitoring vaccines and other commodities distribution once they have been distributed from the national stores. With the introduction of new vaccines, more challenges have been anticipated with this additions posing serious threat to the already over strained vaccine supply chain system in Kenya.
Courier management system project report.pdfKamal Acharya
It is now-a-days very important for the people to send or receive articles like imported furniture, electronic items, gifts, business goods and the like. People depend vastly on different transport systems which mostly use the manual way of receiving and delivering the articles. There is no way to track the articles till they are received and there is no way to let the customer know what happened in transit, once he booked some articles. In such a situation, we need a system which completely computerizes the cargo activities including time to time tracking of the articles sent. This need is fulfilled by Courier Management System software which is online software for the cargo management people that enables them to receive the goods from a source and send them to a required destination and track their status from time to time.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Halogenation process of chemical process industries
Benchmarking open source deep learning frameworks
Ghadeer Al-Bdour, Raffi Al-Qurran, Mahmoud Al-Ayyoub, Ali Shatnawi
Jordan University of Science and Technology, Jordan
International Journal of Electrical and Computer Engineering (IJECE), Vol. 10, No. 5, October 2020, pp. 5479-5486
ISSN: 2088-8708, DOI: 10.11591/ijece.v10i5.pp5479-5486
Article history: Received Jul 23, 2019; Revised Apr 25, 2020; Accepted May 6, 2020
Keywords: CNTK, Performance comparison, TensorFlow, Theano
ABSTRACT
Deep Learning (DL) is one of the hottest fields. To foster the growth of DL, several open source frameworks appeared providing implementations of the most common DL algorithms. These frameworks vary in the algorithms they support and in the quality of their implementations. The purpose of this work is to provide a qualitative and quantitative comparison among three such frameworks: TensorFlow, Theano and CNTK. To ensure that our study is as comprehensive as possible, we consider multiple benchmark datasets from different fields (image processing, NLP, etc.) and measure the performance of the frameworks' implementations of different DL algorithms. For most of our experiments, we find out that CNTK's implementations are superior to the other ones under consideration.
Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Mahmoud Al-Ayyoub,
Jordan University of Science and Technology, Irbid, Jordan.
Email: maalshbool@just.edu.jo
1. INTRODUCTION
Deep learning (DL) is the hottest trend in machine learning (ML). Although the theoretical concepts
behind DL are not new, it has enjoyed a surge of interest over the past decade due to many factors. One example
is that DL approaches have significantly outperformed state-of-the-art (SOTA) approaches in many tasks across
different fields such as image processing, computer vision, speech processing, natural language processing
(NLP), etc. Moreover, the scientific community (from both academia and industry) has quickly and
massively adopted DL. Open source implementations of successful DL algorithms quickly appeared on code
sharing websites, and were subsequently used by many researchers in different fields.
Several DL frameworks exist, such as TensorFlow, Theano, CNTK, Caffe and PyTorch, each with
different features and characteristics. Furthermore, each framework utilizes different techniques to optimize its
code. Although the same algorithm is implemented in different frameworks, the performance of the different
implementations can vary greatly. A researcher/practitioner looking to use such an algorithm in his/her work
would face a difficult choice, since the number of different implementations is high and the effort invested by
the research community in scientifically comparing these implementations is limited.
In this work, we aim at providing qualitative and quantitative comparisons between three popular open source DL frameworks: TensorFlow, Theano and CNTK. These frameworks support multi-core CPUs as well as multiple GPUs. All of them rely on cuDNN, a DL library from NVIDIA that provides highly tuned implementations of standard routines such as forward and backward convolution, normalization, pooling and activation layers. We compare these frameworks by training different neural network (NN) architectures on five standard benchmark datasets covering various tasks in image processing, computer vision and NLP.
Despite their importance, comparative studies like ours that focus on performance issues are rare. Limited effort has been dedicated to comparing SOTA DL frameworks running on different hardware platforms (CPU and GPU) and highlighting the advantages and limitations of each framework for different deep NN architectures. These efforts include papers [1-9] as well as online blogs
(https://github.com/soumith/convnet-benchmarks). Due to space constraints, we do not discuss the details of these works here; interested readers are referred to earlier versions of this work [10, 11]. We do note, however, that previous studies focused only on processing time; none of them dealt with CPU and GPU utilization or memory consumption. This work covers these metrics to find which of the considered frameworks achieves the best performance. Finally, and most importantly, our comparisons involve more datasets from more fields than previous studies.
The rest of this paper is organized as follows. Section 2 discusses the frameworks, the way they were
used to train the datasets and a brief comparison between them. The methodology we follow is discussed in
Section 3. Experimental results and the discussion are detailed in Section 4. The work is concluded with final
thoughts presented in Section 5.
2. DEEP LEARNING FRAMEWORKS
The frameworks considered in this comparative study are: CNTK, TensorFlow and Theano. Moreover,
we use Keras on top of these frameworks as discussed later. All of these frameworks provide flexible APIs and
configuration options for performance optimization. Software versions of the frameworks are shown in Table 1
and their properties are shown in Table 2.
Table 1. Frameworks used for this comparative study
Framework Major Version Github Commit ID
CNTK 2.0 7436a00
TensorFlow 1.2.0 49961e5
Theano 0.10.0.dev1 8a1af5b
Keras 2.0.5 78f26df
Table 2. Properties of the considered frameworks
Property             CNTK   TensorFlow   Theano                             Keras
Core                 C++    C++          Python                             Python
CPU                  ✓      ✓            ✓                                  ✓
Multi-threaded CPU   ✓      ✓ (Eigen)    ✓ (BLAS, conv2D, limited OpenMP)   ✓
GPU                  ✓      ✓            ✓                                  ✓
Multi-GPU            ✓      ✓            ✗ (experimental version)           ✓
NVIDIA cuDNN         ✓      ✓            ✓                                  ✓
2.1. Microsoft cognitive toolkit (CNTK)
CNTK is an open source DL framework developed by Microsoft Research [12] for training and testing many types of NN across multiple GPUs or servers. CNTK supports different DL architectures such as feedforward, convolutional, recurrent, Long Short-Term Memory (LSTM) and sequence-to-sequence NN. In CNTK, a Computational Network learns any function by converting it to a directed graph, where leaf nodes consist of input values or learning parameters while other nodes represent matrix operations applied to their children. This gives CNTK an advantage, as it can automatically derive the gradients of all the computations required to learn the parameters. In CNTK, users specify their networks using a configuration file that contains information about the network type, where to find the input data and how to optimize the parameters [13]. The CNTK interface supports APIs in several languages, such as Python, C++ and C#, across both GPU (CUDA) and CPU platforms. According to its developers (https://docs.microsoft.com/en-us/cognitive-toolkit/cntk-evaluation-overview), CNTK was written in C++ in an efficient way: it removes duplicated computations in forward and backward passes, uses minimal memory and reduces reallocation by reusing memory.
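To make the computational-network idea concrete, the following is a minimal sketch in CNTK's 2.x Python API; the layer sizes and data are illustrative assumptions, not the configuration used in our experiments.

    import numpy as np
    import cntk as C

    # Leaf nodes of the graph: input values (the learning parameters live inside the layers)
    features = C.input_variable(784)
    labels = C.input_variable(10)

    # Inner nodes: matrix operations applied to their children
    model = C.layers.Sequential([
        C.layers.Dense(200, activation=C.relu),
        C.layers.Dense(10)
    ])(features)

    loss = C.cross_entropy_with_softmax(model, labels)

    # CNTK derives the gradients of the loss w.r.t. all learnable parameters automatically
    x = np.random.rand(1, 784).astype(np.float32)
    y = np.eye(10, dtype=np.float32)[:1]
    grads = loss.grad({features: x, labels: y}, wrt=loss.parameters)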
2.2. Theano
Theano is an open source Python library developed at the MILA lab at the University of Montreal as a compiler for mathematical expressions that lets users and developers optimize and evaluate their expressions using NumPy-like syntax (NumPy is a Python library that supports large, multi-dimensional arrays) [3, 14]. Theano automatically optimizes the selection of computations, translates them into lower-level languages such as C++ or CUDA (for GPU) and then compiles them into Python modules that run efficiently on CPUs or GPUs. Theano's development started in 2008, and it has a larger research ecosystem than many DL libraries. Several software packages have been developed on top of Theano with a higher-level user interface that makes it easier to express and train different DL architectures, such as Pylearn2, Lasagne and Keras.
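This compile-a-symbolic-expression workflow can be seen in a minimal sketch; the expression below is illustrative, not one of the models benchmarked here.

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.dmatrix('x')                                  # symbolic input
    w = theano.shared(np.random.randn(784, 10), 'w')    # shared learning parameter
    p = T.nnet.softmax(T.dot(x, w))                     # NumPy-like symbolic expression
    cost = -T.mean(T.log(T.max(p, axis=1)))
    g = T.grad(cost, w)                                 # symbolic gradient

    # Theano optimizes the graph and compiles it to C (or CUDA on GPU)
    train = theano.function([x], cost, updates=[(w, w - 0.1 * g)])
    print(train(np.random.randn(2, 784)))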
2.3. TensorFlow
TensorFlow is an open source framework developed by the Google Brain team [15]. It uses a single data flow graph, expressing all numerical computations, to achieve excellent performance. TensorFlow constructs large computation graphs where each node represents a mathematical operation and the edges represent the communication between nodes. This data flow graph makes the communication between sub-computations explicit, which makes it possible to execute independent computations in parallel or to partition a computation across multiple devices. Programmers define large computation graphs from basic operators and then distribute their execution across a heterogeneous distributed system (computation can be deployed to one or more CPUs or GPUs on different hardware platforms such as desktops, servers, or even mobile devices). TensorFlow's flexible architecture allows developers and users to experiment with and train a wide variety of NN models; it is used for deploying ML systems into production in fields including speech recognition, NLP, computer vision, robotics and computational drug discovery. TensorFlow provides APIs in several languages, such as Python, C++ and Java, for constructing and executing a graph; the Python API is the most complete and the easiest to use (https://www.tensorflow.org/).
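A minimal sketch of this data-flow-graph style, using the TF 1.x API that matches the benchmarked version (1.2.0); the shapes are illustrative.

    import numpy as np
    import tensorflow as tf

    # Nodes are operations; edges carry tensors between them
    x = tf.placeholder(tf.float32, shape=(None, 784))
    w = tf.Variable(tf.zeros([784, 10]))
    logits = tf.matmul(x, w)

    # A Session executes (part of) the graph, possibly across several devices
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        out = sess.run(logits, feed_dict={x: np.random.rand(2, 784)})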
2.4. Keras
Keras is an open source DL library written in Python. It runs on top of the CNTK, Theano or TensorFlow frameworks. Keras was created by Google engineer François Chollet in 2015 as part of the research project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System). Keras is designed for fast expression of deep NN and easy, fast prototyping (modularity and extensibility) [16].
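Because Keras delegates computation to whichever backend is configured, the same model definition runs on any of the three frameworks. A minimal sketch follows; Keras 2.0.x reads the backend from keras.json or the KERAS_BACKEND environment variable.

    import os
    os.environ['KERAS_BACKEND'] = 'theano'   # or 'tensorflow' / 'cntk'

    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential([Dense(10, activation='softmax', input_shape=(784,))])
    model.compile(optimizer='sgd', loss='categorical_crossentropy',
                  metrics=['accuracy'])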
3. METHODOLOGY
The goal of this experimental study is to compare the aforementioned frameworks (Theano, TensorFlow and CNTK) by using them to train convolutional NN (CNN) and recurrent NN (RNN) models on standard benchmark datasets of classical problems in image processing (MNIST, CIFAR-10 and Self-Driving Car) and NLP (Penn TreeBank and IMDB). Specifically, we aim at comparing the resources consumed by each framework to reach a certain accuracy level for each problem. Thus, we experiment with different epoch counts in order to make sure the accuracies of all frameworks are close to each other. Each framework's performance is evaluated using running time, memory consumption and CPU and GPU utilization. We use a laptop that has an Intel Core i7-6700HQ CPU @ 2.60GHz (4 cores, 8 hardware threads) with 16 GB RAM, a 64-bit operating system (Windows 10), and an NVIDIA GeForce GTX 960M graphics card with PCI Express 3.0 bus support, equipped with 4 GB GDDR5 memory and 640 CUDA cores.
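The paper does not name the measurement tooling, so as one illustration, running time, CPU utilization and memory consumption for a training run could be collected with Python's psutil (GPU utilization would be sampled externally, e.g. with nvidia-smi):

    import time
    import psutil

    proc = psutil.Process()
    psutil.cpu_percent(interval=None)       # prime the system-wide CPU counter

    start = time.time()
    # model.fit(...)                        # the training call being measured
    elapsed = time.time() - start

    cpu_util = psutil.cpu_percent(interval=None)  # average CPU% since the priming call
    mem_util = proc.memory_percent()              # process memory as % of total RAM
    print(elapsed, cpu_util, mem_util)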
3.1. Benchmark datasets
In this subsection, we discuss the datasets used in our experiments.
3.1.1. MNIST
The MNIST (Mixed National Institute of Standards and Technology) dataset of handwritten digits is widely used in ML [17]. It has 60,000 training images and 10,000 testing images. Each image is 28×28 pixels, flattened into a 784-value vector. The label of each image is a number between 0 and 9 representing the digit appearing in the image.
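For illustration, MNIST can be loaded and flattened exactly as described via keras.datasets; the 0-255 to 0-1 scaling is a common addition, not stated above.

    from keras.datasets import mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()  # 60,000 / 10,000 images
    x_train = x_train.reshape(60000, 784).astype('float32') / 255  # 28x28 -> 784 vector
    x_test = x_test.reshape(10000, 784).astype('float32') / 255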
3.1.2. CIFAR-10
The CIFAR-10 dataset is one of the datasets collected by Krizhevsky et al. [18, 19]. It consists of
60,000 32×32 color images evenly distributed over ten classes: airplane, automobile, bird, cat, deer, dog, frog,
horse, ship and truck. There are 50,000 training images and 10,000 test images. The classes are completely mutually exclusive, i.e., there is no overlap between them. For instance, the "Automobile" class includes sedans, SUVs, etc., while the "Truck" class includes only big trucks; to avoid overlap, neither class includes pickup trucks.
3.1.3. Penn TreeBank
In 1993, Marcus et al. [20] wrote a paper on constructing a large annotated corpus of English called
the Penn TreeBank (PTB). They reviewed their experience with constructing one large annotated corpus that
consists of over 4.5 million words of American English. It was annotated for part-of-speech (POS) tag infor-
mation. Moreover, half of the corpus was annotated for skeletal syntactic structure. The dataset is large and
diverse. It includes the Brown Corpus (retagged) and the Wall Street Journal Corpus, as well as Department of
Energy abstracts, Dow Jones Newswire stories, Department of Agriculture bulletins, Library of America texts,
MUC-3 messages, IBM Manual sentences, WBUR radio transcripts and ATIS sentences.
3.1.4. IMDB
The IMDB dataset [21], to which we also apply a CNN, is drawn from an online database of information about films, TV programs and video games. It consists of 25,000 reviews, each labeled by its sentiment (positive/negative). The reviews have been preprocessed and encoded as sequences of integer word indexes. Words are indexed by overall frequency in the dataset, so that index i encodes the i-th most frequent word; this allows quick frequency-based filtering.
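A sketch of loading the reviews in this integer-encoded form via keras.datasets; the 10,000-word vocabulary cut-off is an illustrative assumption.

    from keras.datasets import imdb

    # keep only the 10,000 most frequent words; rarer indexes are dropped
    (x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)
    # each review is a list of word indexes; index i is the i-th most frequent word
    print(x_train[0][:10], y_train[0])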
3.1.5. Self-Driving Car
This dataset uses Udacity's Self-Driving Car simulator as a testbed for training an autonomous car. Work on autonomous driving started in the 1980s with Carnegie Mellon University's Navlab and ALV projects [22]. The training phase starts with activating the simulator, which is an executable application. A user initiates the data-collection service, and the data are collected as images and saved locally on the computer so that the framework can train on them. Training is done by distinguishing the image edges in frames taken by the three cameras mounted on the front of the simulated car. After the training phase is done, the testing phase takes the model file generated whenever the performance in an epoch improves on the previous best. Finally, the last generated file is executed to make the car drive autonomously and to observe the testing phase results.
3.2. Network architectures
A CNN is used for the MNIST, CIFAR-10, IMDB and Self-Driving Car datasets, with a different network architecture for each dataset. The architecture of each CNN is shown in [23]. For the MNIST and CIFAR-10 datasets, two convolutional layers with the ReLU activation function follow the input layer. The activation function is used to reduce the training time and to prevent vanishing gradients. After each convolutional layer, a max-pooling layer is added to down-sample the input and reduce overfitting. In the max-pooling layer, the stride with which the filter is slid must be specified: when the stride is x, the filter (window) is moved x pixels at a time, producing spatially smaller output volumes. After each max-pooling layer, dropout is used to further reduce overfitting by forcing the model to learn many independent representations of the same data through randomly disabling neurons in the learning phase.
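A minimal Keras sketch of this layer pattern for MNIST; the filter counts, kernel sizes and dropout rates are assumptions here, as the exact values are given in [23].

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))  # stride 2: slide 2 px/step
    model.add(Dropout(0.25))                                   # randomly disable neurons
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(10, activation='softmax'))                 # one output per digit class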
For the Self-Driving Car dataset, the CNN has the same components as the ones used with the MNIST and CIFAR-10 datasets, but with a deeper model consisting of five convolutional layers with the Exponential Linear Unit (ELU) activation function. The convolutional layers perform the feature extraction, a fully connected layer predicts the steering angle (the final output), dropout avoids overfitting and, finally, the ELU activation function addresses the vanishing gradient problem.
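A sketch of such a five-convolution ELU model; the layer sizes below follow the widely used NVIDIA self-driving architecture and are an assumption, not this paper's exact configuration.

    from keras.models import Sequential
    from keras.layers import Conv2D, Dropout, Flatten, Dense

    model = Sequential()
    model.add(Conv2D(24, (5, 5), strides=(2, 2), activation='elu',
                     input_shape=(66, 200, 3)))
    model.add(Conv2D(36, (5, 5), strides=(2, 2), activation='elu'))
    model.add(Conv2D(48, (5, 5), strides=(2, 2), activation='elu'))
    model.add(Conv2D(64, (3, 3), activation='elu'))
    model.add(Conv2D(64, (3, 3), activation='elu'))
    model.add(Dropout(0.5))                  # guard against overfitting
    model.add(Flatten())
    model.add(Dense(100, activation='elu'))
    model.add(Dense(1))                      # regression output: the steering angle
    model.compile(optimizer='adam', loss='mse')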
In the IMDB dataset, the movie reviews are sequences of words of different lengths. Each review is encoded as a sequence of word embeddings, i.e., words mapped to vectors of real numbers. Because the sequences have different lengths, they are padded to the size of the largest sequence. The network architecture consists of an embedding layer, followed by a 1D convolutional layer (used for temporal data), followed by a global max-pooling operation.
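A sketch of this embedding / 1D-convolution / global max-pooling stack; the embedding size, filter count and padded length are illustrative assumptions.

    from keras.datasets import imdb
    from keras.preprocessing import sequence
    from keras.models import Sequential
    from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

    maxlen = 400
    (x_train, y_train), _ = imdb.load_data(num_words=10000)
    x_train = sequence.pad_sequences(x_train, maxlen=maxlen)  # pad to a common length

    model = Sequential()
    model.add(Embedding(10000, 50, input_length=maxlen))  # word index -> real vector
    model.add(Conv1D(250, 3, activation='relu'))          # temporal (1D) convolution
    model.add(GlobalMaxPooling1D())
    model.add(Dense(1, activation='sigmoid'))             # positive/negative sentiment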
The other NN type we consider is an RNN with LSTM. One of the most popular uses of LSTM is for text analysis tasks such as the ones associated with the Penn TreeBank (PTB) dataset. We adopted word-level prediction experiments on PTB, which consists of 929k training words, 73k validation words and 82k test words, with a 10k-word vocabulary. We trained models of two sizes (small and medium) using the same architecture presented in [24]. To evaluate language models on PTB, a special metric called perplexity is used, where better prediction accuracy is achieved when the perplexity value is as low as possible. Perplexity is the inverse of the probability assigned by the model, so minimizing perplexity is the same as maximizing probability. The goal on the PTB dataset is to fit a probabilistic model that assigns probabilities to sentences; this is done by predicting the next word in a text given the history of preceding words. LSTM cells form the core of the model: they process one word at a time and compute the probabilities of the possible values for the next word in the sentence. The memory state of the network is initialized with a vector of zeros and updated after reading each word. In the small model, two hidden layers (with 200 LSTM units per layer) are used with the Tanh activation function, and the weights are initialized to 0.1. We trained with a learning rate of 1 for the first four epochs and then halved the learning rate after each subsequent epoch, for a total of 13 training epochs. The batch size is 20 and the network is unrolled for 20 steps. This model's architecture is shown in [23].
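A Keras-style sketch of the small model and learning-rate schedule just described (two 200-unit LSTM layers, 20 unrolled steps, 13 epochs); this is a stand-in for the exact implementation in [23, 24].

    from keras.models import Sequential
    from keras.layers import Embedding, LSTM, TimeDistributed, Dense
    from keras.callbacks import LearningRateScheduler

    vocab, steps = 10000, 20
    model = Sequential()
    model.add(Embedding(vocab, 200, input_length=steps))
    model.add(LSTM(200, return_sequences=True))   # tanh is the default LSTM activation
    model.add(LSTM(200, return_sequences=True))
    model.add(TimeDistributed(Dense(vocab, activation='softmax')))
    model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy')

    # lr = 1 for the first four epochs, then halved each epoch (Keras epochs are 0-indexed)
    schedule = LearningRateScheduler(lambda epoch: 1.0 * 0.5 ** max(0, epoch - 3))

    # Perplexity is exp(cross-entropy), so minimizing perplexity maximizes probability:
    # perplexity = numpy.exp(mean_per_word_cross_entropy)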
4. RESULTS AND DISCUSSION
In this section, we discuss the results of our experiments. For each model on each dataset, Table 3 shows the CPU and GPU processing times, while Tables 4-8 show the utilization levels of the CPU, GPU and their memories. For the image classification datasets (MNIST and CIFAR-10), one can observe the superiority of CNTK over TensorFlow and Theano in terms of GPU and multi-threaded CPU performance; however, on CIFAR-10 with 8, 16 and 32 CPU threads, TensorFlow was faster than CNTK. Theano, on the other hand, proved more time consuming than the other frameworks. For the sentiment analysis dataset (IMDB), CPU multithreading was not performed because the CNTK implementation used here is written in Python, which does not support it. Without CPU multithreading (the CPU uses the default of one thread per physical core), TensorFlow comes out ahead in both the CPU and GPU environments. The results for the text analysis dataset (Penn TreeBank) show the superiority of TensorFlow over CNTK and Theano for the CPU with 8 threads as well as for the GPU. For the video analysis dataset (Self-Driving Car), TensorFlow again leads in both the CPU and GPU environments, while CNTK proved more time consuming than the other two frameworks.
The processing times clearly show the advantage of the GPU over the CPU for training CNN and RNN models. The advantage of a fast GPU is more significant when training complex models with larger data, as in the Self-Driving Car dataset. The CPU results show that the best performance occurs when the number of threads equals the number of hardware threads the CPU supports; our laptop's CPU has 4 physical cores with 8 hardware threads, and, accordingly, the best processing time for each dataset was achieved with 8 threads. We also measured the utilization metrics of each framework to help explain why a given framework falls behind.
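The paper does not state how the thread counts were pinned; as one illustration, in TF 1.x the thread pool can be sized explicitly (Theano typically picks up the OMP_NUM_THREADS environment variable for its BLAS/OpenMP code):

    import tensorflow as tf

    # Limit intra-op (within one op) and inter-op (between ops) parallelism to 8 threads
    config = tf.ConfigProto(intra_op_parallelism_threads=8,
                            inter_op_parallelism_threads=8)
    sess = tf.Session(config=config)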
We notice poor performance from Theano on most datasets compared to CNTK and TensorFlow. This could be attributed to its low CPU utilization relative to the other frameworks. CNTK outperformed both TensorFlow and Theano while training on the MNIST and CIFAR-10 datasets. This is likely due to its use of the BrainScript format (https://docs.microsoft.com/en-us/cognitive-toolkit/BrainScript-Network-Builder), a custom network description language that makes CNTK more flexible for NN customization. TensorFlow, on the other hand, uses Eigen (http://eigen.tuxfamily.org/index.php?title=Main Page), a C++ template library for linear algebra (BLAS) covering matrices, vectors, numerical solvers and related algorithms, which helps TensorFlow outperform CNTK and Theano on RNN.
In addition to processing time, we also report the utilization levels of the CPU, GPU and their memories for each model on each framework under consideration. These results are shown in Tables 4-8. The utilization levels of both the CPU and the GPU are high for all models; the only surprising numbers are the CPU utilization figures for Theano, which were very low. The tables also show that the utilization levels are rather small for both types of memory, across all models and frameworks. However, in most cases, CNTK had the lowest memory utilization while TensorFlow had the highest. Surprisingly, the situation is almost reversed for the video analysis dataset (Self-Driving Car), where CNTK had the highest utilization and Theano the lowest. Another unexpected finding is that the IMDB models generally needed the largest portions of memory.
Comparing our work to previous works [1, 2], we note the following. Bahrampour et al. [1] based their comparative study on three main aspects: speed, hardware utilization and extensibility. They used three NN types (CNN, AutoEncoder (AE) and LSTM) to train the MNIST, ImageNet [25] and IMDB datasets on the Caffe, Neon, TensorFlow, Theano and Torch frameworks. They came up with the following results. Training on CPU, Torch performed the best followed by Theano, while Neon had the worst
performance. Theano and Torch were the best in terms of extensibility, TensorFlow and Theano were very flexible, and Caffe was the easiest for evaluating the performance of standard architectures. For training on GPU, Torch was the best for larger convolutional and fully connected networks (FCN), followed by Neon, while Theano was the best for smaller networks. For LSTM, Theano's results were the best, while TensorFlow's performance was not competitive compared with the other studied frameworks.
Shi et al. [2], on the other hand, based their comparative study on two metrics: processing time and convergence rate. They used fully connected NN, CNN and RNN to train the ImageNet, MNIST and CIFAR-10 datasets on the Caffe, CNTK, MXNet, TensorFlow and Torch frameworks. TensorFlow gave the best results on CPU. On a single GPU, Caffe, CNTK and Torch performed better than MXNet and TensorFlow on FCN; for small CNN, Caffe and CNTK achieved good performance; and for RNN (LSTM), CNTK was the fastest (5-10x faster than the other frameworks). Using multi-GPU implementations, all frameworks achieved higher throughput and faster convergence than with a single GPU.
Table 3. Processing time for each dataset (in seconds, unless hours are indicated). In the Env column,
CPU (x) denotes a run on the CPU with x threads
Env CNTK TensorFlow Theano
MNIST CPU (1) 847 5130 3560
CPU (2) 630 3180 2500
CPU (4) 574 2070 2260
CPU (8) 560 1740 2060
CPU (16) 567 1920 2050
CPU (32) 588 2010 2050
GPU 66.67 328.93 377.86
CIFAR-10 CPU (1) 20196 25905 26700
CPU (2) 14520 16610 18700
CPU (4) 13662 11550 17250
CPU (8) 11484 9955 15800
CPU (16) 11550 10340 15850
CPU (32) 11649 10835 15750
GPU 926 2166.4 2386.1
IMDB CPU (1) - 1244 538
CPU (2) - 642 412
CPU (4) - 390 380
CPU (8) 486 290 368
CPU (16) - 249 368
CPU (32) - 302 384
GPU 73.1 62.4 220.
Self-Driving Car CPU (1) - ∼33.3 hours ∼50 hours
CPU (2) - ∼19.8 hours ∼44.2 hours
CPU (4) - ∼15 hours ∼42.6 hours
CPU (8) ∼47.6 hours ∼14.1 hours ∼43.5 hours
CPU (16) - ∼16.4 hours ∼43.5 hours
CPU (32) - ∼16.4 hours ∼43.5 hours
GPU ∼8.7 hours ∼6 hours ∼6.8 hours
PTB CPU (1) - 40560 27066
CPU (2) - 26819 23244
CPU (4) - 18733 21541
CPU (8) 4290 16407 21450
CPU (16) - 16848 21476
CPU (32) - 18369 21541
GPU 2106 1342.28 1630
Table 4. Performance metrics of all models on the MNIST dataset
Metrics Environment CNTK TensorFlow Theano
Accuracy CPU 99.27 99.14 99.10
GPU 99.26 99.11 99.17
CPU% - 99.6 92.2 14.7
GPU% - 92 77 95
Memory% CPU 1.7 2.2 2.1
GPU 3.6 5.2 4.9
Epochs# CPU 7 15 10
GPU 7 15 10
Table 5. Performance metrics of all models on the CIFAR-10 dataset
Metrics Environment CNTK TensorFlow Theano
Accuracy CPU 82.68 82.26 82.29
GPU 82.57 82.33 82.30
CPU% - 99.8 87.3 15.3
GPU% - 97 73 94.5
Memory% CPU 3.2 5.3 5.1
GPU 4.5 7.4 7.5
Epochs# CPU 33 55 50
GPU 33 55 50
Table 6. Performance metrics of all models on the IMDB dataset
Metrics Environment CNTK TensorFlow Theano
Accuracy CPU 88.87 88.68 88.72
GPU 88.93 88.83 88.48
CPU% - 94.8 92.2 14.6
GPU% - 76 76 88
Memory% CPU 5.7 6.6 5.1
GPU 6.6 9.1 7.6
Epochs# CPU 2 3 2
GPU 2 3 2
Table 7. Performance metrics of all models on the Self-Driving Car dataset
Metrics Environment CNTK TensorFlow Theano
Accuracy CPU 99.93 99.96 99.71
GPU 99.97 99.97 99.73
CPU% - 93.2 85 22
GPU% - 32.4 34 31
Memory% CPU 5.3 4.3 3.2
GPU 6.6 6.2 5.3
Epochs# CPU 10 10 10
GPU 10 10 10
Table 8. Performance metrics of all models on the Penn TreeBank dataset
Metrics Environment CNTK TensorFlow Theano
Perplexity CPU 113.7 114.79 114.57
GPU 113.2 113.21 113.3
CPU% - 94 91.8 18.3
GPU% - 76.6 77 81
Memory% CPU 1.3 2.3 4.1
GPU 2.2 4.5 5.4
Epochs# CPU 13 13 13
GPU 13 13 13
5. CONCLUSIONS AND FUTURE WORK
In this paper, we have provided a qualitative and quantitative comparison between three of the most popular and most comprehensive DL frameworks, namely Microsoft's CNTK, Google's TensorFlow and the University of Montreal's Theano. The main goal of this work was to help end users make an informed decision
about the DL framework that best suits their needs and resources. To ensure that our study is as comprehensive as possible, we used multiple benchmark datasets, namely MNIST, CIFAR-10, Self-Driving Car and IMDB, trained via multilayer CNN architectures, and the Penn TreeBank dataset, trained via an RNN architecture. We ran our experiments on a laptop with the Windows 10 operating system and measured performance as well as the utilization of CPU multithreading, the GPU and memory. For most of our experiments, we found that CNTK's implementations are superior to the others under consideration.
REFERENCES
[1] S. Bahrampour, N. Ramakrishnan, L. Schott, and M. Shah, “Comparative study of deep learning software
frameworks,” arXiv preprint arXiv:1511.06435, 2015.
[2] S. Shi, et al., “Benchmarking state-of-the-art deep learning software tools,” arXiv preprint
arXiv:1608.07249, 2016.
[3] R. Al-Rfou et al., “Theano: A python framework for fast computation of mathematical expressions,”
arXiv preprint arXiv:1605.02688, 2016.
[4] P. Goldsborough, “A tour of tensorflow,” arXiv preprint arXiv:1610.01178, 2016.
[5] V. Kovalev et al., “Deep learning with theano, torch, caffe, tensorflow, and deeplearning4j: Which one is
the best in speed and accuracy?,” Pattern Recognition and Information Processing (PRIP), 2016.
[6] F. Bastien, et al., “Theano: new features and speed improvements,” arXiv preprint arXiv:1211.5590, 2012.
[7] W. Ding, R. Wang, F. Mao, and G. Taylor, “Theano-based large-scale visual recognition with multiple
gpus,” arXiv preprint arXiv:1412.2302, 2014.
[8] W. Dai and D. Berleant, “Benchmarking contemporary deep learning hardware and frameworks: A survey
of qualitative metrics,” International Conference on Cognitive Machine Intelligence, pp. 148-155, 2019.
[9] C. Coleman et al., “Dawnbench: An end-to-end deep learning benchmark and competition,” 31st Confer-
ence on Neural Information Processing Systems, vol. 100, no. 101, 2017.
[10] A. Shatnawi et al., “A comparative study of open source deep learning frameworks,” 9th International
Conference on Information and Communication Systems, pp. 72-77, 2018.
[11] G. Al-Bdour, R. Al-Qurran, M. Al-Ayyoub, and A. Shatnawi, “A detailed comparative study of open
source deep learning frameworks,” arXiv preprint arXiv:1903.00102, 2019.
[12] D. Yu et al., “An introduction to computational networks and the computational network toolkit,”
Microsoft, Tech. Rep. MSR-TR-2014-112, 2014.
[13] D. Yu, K. Yao, and Y. Zhang, “The computational network toolkit [best of the web],” IEEE Signal Pro-
cessing Magazine, vol. 32, no. 6, pp. 123-126, 2015.
[14] J. Bergstra et al., “Theano: A cpu and gpu math compiler in python,” Proc. 9th Python in Science
Conference, vol. 1, pp. 3-10, 2010.
[15] M. Abadi et al., “Tensorflow: A system for large-scale machine learning,” Proceedings of the 12th
USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 265-283, 2016.
[16] F. Chollet et al., “Keras,” 2015.
[17] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recogni-
tion,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural
networks,” Advances in neural information processing systems, pp. 1097-1105, 2012.
[19] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” University of
Toronto, Tech. Rep., 2009.
[20] M. P. Marcus, M. A. Marcinkiewicz, and B. Santorini, “Building a large annotated corpus of english: The
penn treebank,” Computational linguistics, vol. 19, no. 2, pp. 313-330, 1993.
[21] A. Maas et al., “Learning word vectors for sentiment analysis,” Proceedings of the 49th Annual Meeting
of the Association for Computational Linguistics: Human Language Technologies, 2011.
[22] R. Wallace et al., “First results in robot road-following,” IJCAI, pp. 1089-1095, 1985.
[23] G. Al-Bdour, “Comparative study between deep learning frameworks using multiple benchmark datasets,”
Master’s thesis, Jordan University of Science and Technology, 2017.
[24] W. Zaremba, I. Sutskever, and O. Vinyals, “Recurrent neural network regularization,” arXiv preprint
arXiv:1409.2329, 2014.
[25] J. Deng, et al., “Imagenet: A large-scale hierarchical image database,” IEEE conference on computer
vision and pattern recognition, pp. 248-255, 2009.