The Leap Motion controller is an input device that can track hand and finger positions quickly and precisely. In some gaming environments, a need may arise to capture letters written in the air with the Leap Motion, which cannot currently be done directly. In this paper, we propose an approach to capture and recognize the letter a user has drawn with the Leap Motion. The approach is based on Deep Belief Networks (DBN) with Resilient Backpropagation (Rprop) fine-tuning. To assess its performance, we conduct experiments on 30,000 samples of handwritten capital letters, 8,000 of which are to be recognized. Our experiments indicate that DBN with Rprop achieves an accuracy of 99.71%, better than DBN with Backpropagation or a Multi-Layer Perceptron (MLP) with either Backpropagation or Rprop. They also show that Rprop makes fine-tuning significantly faster and yields much more accurate recognition than ordinary Backpropagation. The time needed to recognize a letter is on the order of 5,000 microseconds, which is excellent even for an online gaming experience.
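The Rprop fine-tuning the abstract credits for the speed-up adapts a separate step size per weight from gradient sign changes alone, ignoring gradient magnitudes. A minimal NumPy sketch of one generic Rprop update step (illustrative function name and default hyperparameters; not the paper's implementation):

```python
import numpy as np

def rprop_step(grads, prev_grads, steps, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One Rprop update: per-weight step sizes adapt from gradient sign changes.

    Returns the weight deltas plus the updated (grads, steps) state to carry
    into the next iteration.
    """
    sign_change = grads * prev_grads
    # Same sign as last step -> grow the step; sign flip -> shrink it.
    steps = np.where(sign_change > 0, np.minimum(steps * eta_plus, step_max), steps)
    steps = np.where(sign_change < 0, np.maximum(steps * eta_minus, step_min), steps)
    # On a sign flip, suppress the update for that weight this iteration.
    grads = np.where(sign_change < 0, 0.0, grads)
    delta = -np.sign(grads) * steps
    return delta, grads, steps
```

Because only the sign of the gradient is used, the update is insensitive to the vanishing gradient magnitudes common in deep networks, which is consistent with the faster fine-tuning the abstract reports.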
Two-level data security using steganography and 2D cellular automata - eSAT Publishing House
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching, and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars, and Students of related fields of Engineering and Technology.
We are progressing toward new discoveries and inventions in science and technology, but unfortunately very few inventions have addressed the problems faced by physically challenged people, who have difficulty communicating with others because sign language is their primary medium of communication and is mostly not understood by the general public. Many research works have attempted to eliminate this communication barrier, but they rely on microcontrollers or other complicated techniques. Our study advances this process by using the Kinect sensor, a highly sensitive motion-sensing device with many other applications. Our workflow proceeds from capturing an image of the body, to conversion into a skeletal image, and from image processing to feature extraction on the detected image, finally producing an output together with its meaning and a voice rendering. The experimental results of our proposed algorithm are very promising, with an accuracy of 94.5%.
The presentation briefly answers the questions:
1. What is Machine Learning?
2. What are the ideas behind Neural Networks?
3. What is Deep Learning? How does it differ from NNs?
4. Practical examples of applications.

For more information:
https://www.quora.com/How-does-deep-learning-work-and-how-is-it-different-from-normal-neural-networks-and-or-SVM
http://stats.stackexchange.com/questions/114385/what-is-the-difference-between-convolutional-neural-networks-restricted-boltzma
https://www.youtube.com/watch?v=n1ViNeWhC24 - presentation by Ng
http://techtalks.tv/talks/deep-learning/58122/ - deep learning tutorial and slides - http://www.cs.nyu.edu/~yann/talks/lecun-ranzato-icml2013.pdf
Deep learning for NLP - http://www.socher.org/index.php/DeepLearningTutorial/DeepLearningTutorial
papers: http://www.cs.toronto.edu/~hinton/science.pdf
http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2010_ErhanCBV10.pdf
http://arxiv.org/pdf/1206.5538v3.pdf
http://arxiv.org/pdf/1404.7828v4.pdf
More recommendations - https://www.quora.com/What-are-the-best-resources-to-learn-about-deep-learning
Vision Based Gesture Recognition Using Neural Networks Approaches: A Review - Waqas Tariq
The aim of gesture recognition research is to create systems that easily identify gestures and use them for device control or to convey information. In this paper we discuss research in the area of hand gesture recognition based on Artificial Neural Network approaches. Several hand gesture recognition studies that use Neural Networks are discussed, comparisons between these methods are presented, their advantages and drawbacks are included, and the implementation tools for each method are presented as well.
Machine Learning approach for Assisting Visually Impaired - IJTET Journal
Abstract - India has the largest blind population in the world. The complex Indian environment makes it difficult for people to navigate using present technology. In order to navigate effectively, a wearable computing system should learn the environment by itself, providing enough information for the visually impaired to adapt to the environment. Traditional learning algorithms require the entire percept sequence to learn. This paper proposes algorithms for learning from various sensory inputs with a selected percept sequence; analyzes what features and data should be considered for real-time learning and how they can be applied to autonomous navigation for the blind; and discusses the problem parameters to be considered for blind navigation and protection, the tools involved, and how they can be used in other applications.
In our world today, the quest to get rich at all costs without working for money has led some youths into crimes such as robbery and kidnapping. Because of this, and because vehicles are now very expensive to buy, people need to safeguard their vehicles against these criminals to avoid losing their precious assets. Tracking is a technology used by many companies and individuals to track a vehicle, an individual, or an asset, either by GPS, which operates using satellites and ground-based stations, or by our approach, which depends on cellular mobile towers. A vehicle tracking system can be used to monitor and locate a vehicle, prevent theft or recover a stolen vehicle, monitor vehicle routes to ensure strict compliance with predefined routes, monitor driver behavior, predict bus arrival, and manage fleets. The Internet of Things has made it possible for devices to intercommunicate and exchange information, helping to acquire and analyze information faster than before; this has helped especially in vehicle monitoring, so that vehicle owners can feel safe about their investments. In this paper, we propose a vehicle monitoring system based on IoT technology, using 4G/LTE to obtain the coordinates, speed, and overall condition of the vehicle, and to process and send them to a remote server where they are analyzed and used to locate the vehicle and monitor its other configured parameters. This is realized using a Raspberry Pi, 4G/LTE, GPS, an accelerometer, and other sensors which communicate among themselves to obtain environmental parameters, which are processed and sent to a remote server where they are analyzed and represented on a map to locate the vehicle and monitor the other set parameters.
4G/LTE provides fast internet connectivity, which overcomes the delay usually experienced in sending the acquired signals for processing. The true vehicle position is represented using the Google geolocation service, and the actual position is triangulated in real time.
Artificial Neural Network / Handwritten Character Recognition - Dr. Uday Saikia
1. Overview
2. Development of System
3. GCR Model
4. Proposed Model
5. Background Information
6. Preprocessing
7. Architecture
8. ANN (Artificial Neural Network)
9. How the Human Brain Learns?
10. Synapse
11. The Neuron Model
12. A Typical Feed-forward Neural Network Model
13. The Neural Network
14. Training of Characters Using Neural Networks
15. Regression of Trained Neural Networks
16. Training State of Neural Networks
17. Graphical User Interface
This presentation is part of a webinar. Here is the link to the webinar recording: https://www.anymeeting.com/geospatialworld/E955DA81854C39
Presentation Credits: NVIDIA & Geospatial Media
Smart surveillance systems play an important role in security today. The goal of security systems is to protect users against fires, car accidents, and other forms of violence. The primary function of these systems is to offer security in residential areas. In today's culture, protecting our homes is critical, and surveillance, which ranges from private houses to large corporations, is critical to making us feel safe. There are numerous machine learning algorithms for home security systems; however, the deep learning convolutional neural network (CNN) technique outperforms the others. The Keras, TensorFlow, cv2, glob, imutils, and PIL libraries are used to train and assess the detection method. A web application, built with the Flask web framework, provides a user-friendly environment. The Flask-Mail, requests, and Telegram application programming interface (API) packages are used in the alerting approach. The surveillance system tracks abnormal activities and uses machine learning to determine whether a scenario is normal based on the acquired image. After an image is captured, it is compared with the existing dataset; the model is trained on normal events. When an anomalous event occurs, the model produces an output from which the mean distance for each frame is calculated.
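The final scoring step, a mean distance per frame against a model trained on normal events, can be sketched as follows. The abstract does not specify the distance metric or feature representation, so this assumes mean absolute deviation from a centroid of normal-event frames; all function names are illustrative:

```python
import numpy as np

def fit_normal_model(normal_frames):
    """Centroid of frames from normal events: the 'trained' reference."""
    return np.mean(normal_frames, axis=0)

def frame_anomaly_scores(frames, centroid):
    """Mean per-pixel absolute distance of each frame from the normal centroid."""
    return np.mean(np.abs(frames - centroid), axis=tuple(range(1, frames.ndim)))

def is_abnormal(scores, threshold):
    """Flag frames whose mean distance exceeds a chosen threshold."""
    return scores > threshold
```

In practice the frames would be CNN feature maps rather than raw pixels, and the threshold would be calibrated on held-out normal footage.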
Hand LightWeightNet: an optimized hand pose estimation for interactive mobile... - IJECEIAES
In this paper, a hand pose estimation method is introduced that combines MobileNetV3 and CrossInfoNet into a single pipeline. The proposed approach is tailored for mobile phone processors through optimizations, modifications, and enhancements made to both architectures, resulting in a lightweight solution. MobileNetV3 provides the bottleneck for feature extraction and refinement, while CrossInfoNet benefits the proposed system through a multitask information sharing mechanism. In the feature extraction stage, we utilized an inverted residual block that balances accuracy and efficiency within a limited parameter budget. Additionally, in the feature refinement stage, we incorporated a new best-performing activation function called "activate or not" (ACON), which demonstrated stability and superior performance in learning linear and non-linear gates over the whole activation area of the network by setting hyperparameters to switch between active and inactive states. As a result, our network operates with 65% fewer parameters yet improves speed by 39%, which is suitable for running on a mobile device processor. During experiments, we evaluated the system on three hand pose datasets to assess its generalization capacity. On all the tested datasets, the proposed approach demonstrates consistently higher performance while using significantly fewer parameters than existing methods. This indicates that the proposed system has the potential to enable new hand pose estimation applications such as virtual reality, augmented reality, and sign language recognition on mobile devices.
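The "activate or not" switching the abstract describes can be illustrated with the ACON-C form from the original ACON paper (Ma et al., "Activate or Not: Learning Customized Activation"). The parameter names p1, p2, and beta follow that paper, not this one; with p1=1, p2=0, beta=1 the function reduces to Swish, and as beta approaches 0 it tends toward a linear (inactive) function:

```python
import math

def acon_c(x, p1=1.0, p2=0.0, beta=1.0):
    """ACON-C activation: f(x) = (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x.

    beta acts as a learnable switch between an active (nonlinear) and an
    inactive (nearly linear) regime; p1 and p2 are learnable slopes.
    """
    d = (p1 - p2) * x
    return d / (1.0 + math.exp(-beta * d)) + p2 * x
```

In the paper's setting p1, p2, and beta are learned per channel; here they are plain scalars for illustration.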
AUTOMATIC THEFT SECURITY SYSTEM (SMART SURVEILLANCE CAMERA) - csandit
The proposed work aims to create a smart application camera, with the intention of eliminating the need for a human presence to detect unwanted sinister activities, such as theft in this case. Spread across the campus, at arbitrary locations, are certain valuable biometric identification systems. The application monitors these systems (hereafter referred to as the "object") using our smart camera system based on the OpenCV platform. Using OpenCV Haar training, which employs the Viola-Jones algorithm implementation in OpenCV, we teach the machine to identify the object under environmental conditions. An added face recognition feature is based on Principal Component Analysis (PCA) to generate eigenfaces; test images are verified against the eigenfaces using a distance-based algorithm, such as the Euclidean distance or Mahalanobis distance algorithm. If the object is misplaced, or an unauthorized user is in the immediate vicinity of the object, an alarm signal is raised.
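The PCA/eigenface verification described above can be sketched as a generic eigenface recognizer with Euclidean matching (not the authors' code; function names, the SVD-based PCA, and the choice of k are illustrative):

```python
import numpy as np

def train_eigenfaces(faces, k=10):
    """faces: (n_samples, n_pixels) flattened training images.

    Returns the mean face, the top-k eigenfaces, and the training projections.
    """
    mean = faces.mean(axis=0)
    centered = faces - mean
    # SVD of the centered data yields the principal components (eigenfaces).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = vt[:k]                  # (k, n_pixels)
    weights = centered @ eigenfaces.T    # training images in eigenface space
    return mean, eigenfaces, weights

def recognize(test_face, mean, eigenfaces, weights):
    """Index of the nearest training face by Euclidean distance in eigenface space."""
    w = (test_face - mean) @ eigenfaces.T
    dists = np.linalg.norm(weights - w, axis=1)
    return int(np.argmin(dists)), float(dists.min())
```

The Mahalanobis variant the abstract mentions would divide each eigenface coordinate by its corresponding singular value before taking the norm, weighting each principal direction by its variance.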
Similar to Deep Belief Networks for Recognizing Handwriting Captured by Leap Motion Controller
Bibliometric analysis highlighting the role of women in addressing climate ch... - IJECEIAES
Fossil fuel consumption has increased quickly, contributing to climate change that is evident in unusual flooding, droughts, and global warming. Over the past ten years, women's involvement in society has grown dramatically, and they have succeeded in playing a noticeable role in reducing climate change. A bibliometric analysis of data from the last ten years has been carried out to examine the role of women in addressing climate change. The analysis's findings are discussed in relation to the sustainable development goals (SDGs), particularly SDG 7 and SDG 13. The results consider contributions made by women in various sectors while taking geographic dispersion into account. The bibliometric analysis delves into topics including women's leadership in environmental groups, their involvement in policymaking, their contributions to sustainable development projects, and the influence of gender diversity on attempts to mitigate climate change. The study's results highlight how women have influenced policies and actions related to climate change, point out areas of research deficiency, and give recommendations on how to increase the role of women in addressing climate change and achieving sustainability. To achieve more successful results, this initiative aims to highlight the significance of gender equality and encourage inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter... - IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from grid-connected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo... - IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
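The regressor structure common to any NARX model (lagged outputs plus lagged exogenous inputs predicting the next output) can be sketched as below. The paper maps these regressors through a nonlinear stage, which is omitted here; the function name and lag counts ny, nu are illustrative:

```python
import numpy as np

def narx_features(y, u, ny=2, nu=2):
    """Build the NARX regressor matrix.

    Each row is [y[t-1], ..., y[t-ny], u[t-1], ..., u[t-nu]], the lagged
    outputs and exogenous inputs used to predict y[t]. Returns the regressor
    matrix and the aligned targets y[start:].
    """
    start = max(ny, nu)
    rows = []
    for t in range(start, len(y)):
        rows.append(np.r_[y[t - ny:t][::-1], u[t - nu:t][::-1]])
    return np.array(rows), y[start:]
```

For a battery, y would be the terminal voltage and u the applied current (and possibly temperature); the nonlinear map from regressors to prediction is what distinguishes NARX from a plain linear ARX model.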
Smart grid deployment: from a bibliometric analysis to a survey - IJECEIAES
Smart grids are one of the last decade's innovations in electrical energy. They bring relevant advantages over the traditional grid and have attracted significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ... - IJECEIAES
One of the problems associated with power systems is the islanding condition, which must be detected rapidly and properly to prevent negative consequences for the system's protection, stability, and security. This paper offers a thorough overview of several islanding detection strategies, divided into two categories: classic approaches, including local and remote approaches, and modern techniques, including techniques based on signal processing and computational intelligence. Additionally, each approach is compared and assessed on several factors, including implementation costs, non-detected zones, declining power quality, and response times, using the analytical hierarchy process (AHP). Based on the comparison of all criteria together, the multi-criteria decision-making analysis yields overall weights of 24.7% for passive methods, 7.8% for active methods, 5.6% for hybrid methods, 14.5% for remote methods, 26.6% for signal processing-based methods, and 20.8% for computational intelligence-based methods. Thus, the total weights show that hybrid approaches are the least suitable choice, while signal processing-based methods are the most appropriate islanding detection methods to select and implement in a power system with respect to the aforementioned factors. The proposed hierarchy model is studied and examined using Expert Choice software.
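The AHP weighting behind those percentages can be sketched with the standard geometric-mean approximation of the principal eigenvector of a reciprocal pairwise-comparison matrix. This is a generic AHP step, not the authors' judgment matrices; the function name and the 2x2 example are illustrative:

```python
import numpy as np

def ahp_weights(pairwise):
    """Priority weights from a reciprocal pairwise-comparison matrix.

    Uses the geometric mean of each row, normalized to sum to 1, as the
    classic approximation of the matrix's principal eigenvector.
    """
    a = np.asarray(pairwise, dtype=float)
    gm = a.prod(axis=1) ** (1.0 / a.shape[1])  # row geometric means
    return gm / gm.sum()
```

For example, a matrix stating "criterion A is 3 times as important as criterion B" yields weights of 0.75 and 0.25; in the paper, such matrices span all the comparison criteria listed above.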
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi... - IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar irradiances of 500 and 1,000 W/m², the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b... - IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
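The classic P&O step that the paper modifies can be sketched as a textbook perturb-and-observe iteration with a fixed step size (the paper's contribution is a modified, adaptive step, which is not reproduced here; the function name and step default are illustrative):

```python
def perturb_and_observe(v, p, v_prev, p_prev, step=0.01):
    """One P&O iteration: return the next reference voltage.

    If the last perturbation increased power, keep moving in the same
    direction; otherwise reverse. At the maximum power point the reference
    oscillates around the optimum, which is the steady-state oscillation
    the modified methods aim to reduce.
    """
    dp = p - p_prev
    dv = v - v_prev
    if dp == 0:
        return v                      # no change in power: hold
    if (dp > 0) == (dv > 0):
        return v + step               # moving toward the MPP: continue
    return v - step                   # overshot the MPP: step back
```

The fixed step is the source of the trade-off the abstract mentions: a large step converges fast but oscillates widely at steady state, while a small step does the opposite, which motivates the fuzzy-logic step-size adjustment.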
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed,the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embeddedsystem design approach targeting FPGA technologies that are fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that allows users to seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving the solar cell systems status parameters, solar irradiance, temperature, and humidity, are critical issues in enhancement their efficiency. Hence, in the present article an improved smart prototype of internet of things (IoT) technique based on embedded system through NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions at Egypt; Luxor, Cairo, and El-Beheira cities were chosen to study their solar irradiance profile, temperature, and humidity by the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were live visualized directly by Ubidots through hypertext transfer protocol (HTTP) protocol. The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m 2 respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system made it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly candidate Luxor and Cairo as suitable places to build up a solar cells system station rather than El-Beheira.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threat to data security in IoT operations. Thereby, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to miss-classification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data to be trained. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this proposed study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with the related works, the experimental outcome shows that the model performs well in a benchmark dataset. It accomplishes an improved detection accuracy of approximately 99.21%.
Developing a smart system for infant incubators using the internet of things ...IJECEIAES
This research is developing an incubator system that integrates the internet of things and artificial intelligence to improve care for premature babies. The system workflow starts with sensors that collect data from the incubator. Then, the data is sent in real-time to the internet of things (IoT) broker eclipse mosquito using the message queue telemetry transport (MQTT) protocol version 5.0. After that, the data is stored in a database for analysis using the long short-term memory network (LSTM) method and displayed in a web application using an application programming interface (API) service. Furthermore, the experimental results produce as many as 2,880 rows of data stored in the database. The correlation coefficient between the target attribute and other attributes ranges from 0.23 to 0.48. Next, several experiments were conducted to evaluate the model-predicted value on the test data. The best results are obtained using a two-layer LSTM configuration model, each with 60 neurons and a lookback setting 6. This model produces an R 2 value of 0.934, with a root mean square error (RMSE) value of 0.015 and a mean absolute error (MAE) of 0.008. In addition, the R 2 value was also evaluated for each attribute used as input, with a result of values between 0.590 and 0.845.
A review on internet of things-based stingless bee's honey production with im...IJECEIAES
Honey is produced exclusively by honeybees and stingless bees which both are well adapted to tropical and subtropical regions such as Malaysia. Stingless bees are known for producing small amounts of honey and are known for having a unique flavor profile. Problem identified that many stingless bees collapsed due to weather, temperature and environment. It is critical to understand the relationship between the production of stingless bee honey and environmental conditions to improve honey production. Thus, this paper presents a review on stingless bee's honey production and prediction modeling. About 54 previous research has been analyzed and compared in identifying the research gaps. A framework on modeling the prediction of stingless bee honey is derived. The result presents the comparison and analysis on the internet of things (IoT) monitoring systems, honey production estimation, convolution neural networks (CNNs), and automatic identification methods on bee species. It is identified based on image detection method the top best three efficiency presents CNN is at 98.67%, densely connected convolutional networks with YOLO v3 is 97.7%, and DenseNet201 convolutional networks 99.81%. This study is significant to assist the researcher in developing a model for predicting stingless honey produced by bee's output, which is important for a stable economy and food security.
A trust based secure access control using authentication mechanism for intero...IJECEIAES
The internet of things (IoT) is a revolutionary innovation in many aspects of our society including interactions, financial activity, and global security such as the military and battlefield internet. Due to the limited energy and processing capacity of network devices, security, energy consumption, compatibility, and device heterogeneity are the long-term IoT problems. As a result, energy and security are critical for data transmission across edge and IoT networks. Existing IoT interoperability techniques need more computation time, have unreliable authentication mechanisms that break easily, lose data easily, and have low confidentiality. In this paper, a key agreement protocol-based authentication mechanism for IoT devices is offered as a solution to this issue. This system makes use of information exchange, which must be secured to prevent access by unauthorized users. Using a compact contiki/cooja simulator, the performance and design of the suggested framework are validated. The simulation findings are evaluated based on detection of malicious nodes after 60 minutes of simulation. The suggested trust method, which is based on privacy access control, reduced packet loss ratio to 0.32%, consumed 0.39% power, and had the greatest average residual energy of 0.99 mJoules at 10 nodes.
Fuzzy linear programming with the intuitionistic polygonal fuzzy numbersIJECEIAES
In real world applications, data are subject to ambiguity due to several factors; fuzzy sets and fuzzy numbers propose a great tool to model such ambiguity. In case of hesitation, the complement of a membership value in fuzzy numbers can be different from the non-membership value, in which case we can model using intuitionistic fuzzy numbers as they provide flexibility by defining both a membership and a non-membership functions. In this article, we consider the intuitionistic fuzzy linear programming problem with intuitionistic polygonal fuzzy numbers, which is a generalization of the previous polygonal fuzzy numbers found in the literature. We present a modification of the simplex method that can be used to solve any general intuitionistic fuzzy linear programming problem after approximating the problem by an intuitionistic polygonal fuzzy number with n edges. This method is given in a simple tableau formulation, and then applied on numerical examples for clarity.
The performance of artificial intelligence in prostate magnetic resonance im...IJECEIAES
Prostate cancer is the predominant form of cancer observed in men worldwide. The application of magnetic resonance imaging (MRI) as a guidance tool for conducting biopsies has been established as a reliable and well-established approach in the diagnosis of prostate cancer. The diagnostic performance of MRI-guided prostate cancer diagnosis exhibits significant heterogeneity due to the intricate and multi-step nature of the diagnostic pathway. The development of artificial intelligence (AI) models, specifically through the utilization of machine learning techniques such as deep learning, is assuming an increasingly significant role in the field of radiology. In the realm of prostate MRI, a considerable body of literature has been dedicated to the development of various AI algorithms. These algorithms have been specifically designed for tasks such as prostate segmentation, lesion identification, and classification. The overarching objective of these endeavors is to enhance diagnostic performance and foster greater agreement among different observers within MRI scans for the prostate. This review article aims to provide a concise overview of the application of AI in the field of radiology, with a specific focus on its utilization in prostate MRI.
Seizure stage detection of epileptic seizure using convolutional neural networksIJECEIAES
According to the World Health Organization (WHO), seventy million individuals worldwide suffer from epilepsy, a neurological disorder. While electroencephalography (EEG) is crucial for diagnosing epilepsy and monitoring the brain activity of epilepsy patients, it requires a specialist to examine all EEG recordings to find epileptic behavior. This procedure needs an experienced doctor, and a precise epilepsy diagnosis is crucial for appropriate treatment. To identify epileptic seizures, this study employed a convolutional neural network (CNN) based on raw scalp EEG signals to discriminate between preictal, ictal, postictal, and interictal segments. The possibility of these characteristics is explored by examining how well timedomain signals work in the detection of epileptic signals using intracranial Freiburg Hospital (FH), scalp Children's Hospital Boston-Massachusetts Institute of Technology (CHB-MIT) databases, and Temple University Hospital (TUH) EEG. To test the viability of this approach, two types of experiments were carried out. Firstly, binary class classification (preictal, ictal, postictal each versus interictal) and four-class classification (interictal versus preictal versus ictal versus postictal). The average accuracy for stage detection using CHB-MIT database was 84.4%, while the Freiburg database's time-domain signals had an accuracy of 79.7% and the highest accuracy of 94.02% for classification in the TUH EEG database when comparing interictal stage to preictal stage.
Analysis of driving style using self-organizing maps to analyze driver behaviorIJECEIAES
Modern life is strongly associated with the use of cars, but the increase in acceleration speeds and their maneuverability leads to a dangerous driving style for some drivers. In these conditions, the development of a method that allows you to track the behavior of the driver is relevant. The article provides an overview of existing methods and models for assessing the functioning of motor vehicles and driver behavior. Based on this, a combined algorithm for recognizing driving style is proposed. To do this, a set of input data was formed, including 20 descriptive features: About the environment, the driver's behavior and the characteristics of the functioning of the car, collected using OBD II. The generated data set is sent to the Kohonen network, where clustering is performed according to driving style and degree of danger. Getting the driving characteristics into a particular cluster allows you to switch to the private indicators of an individual driver and considering individual driving characteristics. The application of the method allows you to identify potentially dangerous driving styles that can prevent accidents.
Hyperspectral object classification using hybrid spectral-spatial fusion and ...IJECEIAES
Because of its spectral-spatial and temporal resolution of greater areas, hyperspectral imaging (HSI) has found widespread application in the field of object classification. The HSI is typically used to accurately determine an object's physical characteristics as well as to locate related objects with appropriate spectral fingerprints. As a result, the HSI has been extensively applied to object identification in several fields, including surveillance, agricultural monitoring, environmental research, and precision agriculture. However, because of their enormous size, objects require a lot of time to classify; for this reason, both spectral and spatial feature fusion have been completed. The existing classification strategy leads to increased misclassification, and the feature fusion method is unable to preserve semantic object inherent features; This study addresses the research difficulties by introducing a hybrid spectral-spatial fusion (HSSF) technique to minimize feature size while maintaining object intrinsic qualities; Lastly, a soft-margins kernel is proposed for multi-layer deep support vector machine (MLDSVM) to reduce misclassification. The standard Indian pines dataset is used for the experiment, and the outcome demonstrates that the HSSF-MLDSVM model performs substantially better in terms of accuracy and Kappa coefficient.
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfKamal Acharya
The College Bus Management system is completely developed by Visual Basic .NET Version. The application is connect with most secured database language MS SQL Server. The application is develop by using best combination of front-end and back-end languages. The application is totally design like flat user interface. This flat user interface is more attractive user interface in 2017. The application is gives more important to the system functionality. The application is to manage the student’s details, driver’s details, bus details, bus route details, bus fees details and more. The application has only one unit for admin. The admin can manage the entire application. The admin can login into the application by using username and password of the admin. The application is develop for big and small colleges. It is more user friendly for non-computer person. Even they can easily learn how to manage the application within hours. The application is more secure by the admin. The system will give an effective output for the VB.Net and SQL Server given as input to the system. The compiled java program given as input to the system, after scanning the program will generate different reports. The application generates the report for users. The admin can view and download the report of the data. The application deliver the excel format reports. Because, excel formatted reports is very easy to understand the income and expense of the college bus. This application is mainly develop for windows operating system users. In 2017, 73% of people enterprises are using windows operating system. So the application will easily install for all the windows operating system users. The application-developed size is very low. The application consumes very low space in disk. Therefore, the user can allocate very minimum local disk space for this application.
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Technical Specifications
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
Key Features
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface
• Compatible with MAFI CCR system
• Copatiable with IDM8000 CCR
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
Application
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSEDuvanRamosGarzon1
AIRCRAFT GENERAL
The Single Aisle is the most advanced family aircraft in service today, with fly-by-wire flight controls.
The A318, A319, A320 and A321 are twin-engine subsonic medium range aircraft.
The family offers a choice of engines
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Quality defects in TMT Bars, Possible causes and Potential Solutions.PrashantGoswami42
Maintaining high-quality standards in the production of TMT bars is crucial for ensuring structural integrity in construction. Addressing common defects through careful monitoring, standardized processes, and advanced technology can significantly improve the quality of TMT bars. Continuous training and adherence to quality control measures will also play a pivotal role in minimizing these defects.
Courier management system project report.pdfKamal Acharya
It is now-a-days very important for the people to send or receive articles like imported furniture, electronic items, gifts, business goods and the like. People depend vastly on different transport systems which mostly use the manual way of receiving and delivering the articles. There is no way to track the articles till they are received and there is no way to let the customer know what happened in transit, once he booked some articles. In such a situation, we need a system which completely computerizes the cargo activities including time to time tracking of the articles sent. This need is fulfilled by Courier Management System software which is online software for the cargo management people that enables them to receive the goods from a source and send them to a required destination and track their status from time to time.
International Journal of Electrical and Computer Engineering (IJECE)
Vol. 8, No. 6, December 2018, pp. 4693 – 4704
ISSN: 2088-8708, DOI: 10.11591/ijece.v8i6.pp4693-4704
Institute of Advanced Engineering and Science
www.iaesjournal.com
Deep Belief Networks for Recognizing Handwriting Captured
by Leap Motion Controller
Abas Setiawan¹,² and Reza Pulungan¹
¹ Dept. of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Indonesia
² Dept. of Informatics Engineering, Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang, Indonesia
Article Info
Article history:
Received December 29, 2017
Revised May 30, 2018
Accepted July 13, 2018
Keyword:
Handwriting
Leap Motion
Deep belief network
Accuracy
Resilient backpropagation
ABSTRACT
Leap Motion controller is an input device that can track hand and finger positions quickly and precisely. In some gaming environments, a need may arise to capture letters written in the air with Leap Motion, which cannot currently be done directly. In this paper, we propose an approach to capture and recognize which letter has been drawn by the user with Leap Motion. This approach is based on Deep Belief Networks (DBN) with Resilient Backpropagation (Rprop) fine-tuning. To assess the performance of our proposed approach, we conduct experiments involving 30,000 samples of handwritten capital letters, 8,000 of which are to be recognized. Our experiments indicate that DBN with Rprop achieves an accuracy of 99.71%, which is better than DBN with Backpropagation or Multi-Layer Perceptron (MLP), either with Backpropagation or with Rprop. Our experiments also show that Rprop makes the fine-tuning process significantly faster and results in much more accurate recognition compared to ordinary Backpropagation. The time needed to recognize a letter is on the order of 5,000 microseconds, which is excellent even for an online gaming experience.
Copyright © 2018 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Reza Pulungan
Department of Computer Science and Electronics, Universitas Gadjah Mada, Indonesia
Gedung KPTU FMIPA UGM, Sekip Utara, Bulaksumur, Sleman, Yogyakarta, 55281, Indonesia
+6282136261216
Email: pulungan@ugm.ac.id
1. INTRODUCTION
Computer-aided handwriting recognition can be performed either on-line or off-line. Typically, on-line handwriting recognition is based on specific input devices, such as a real-time capturing digital pen, a finger-touch screen, or mouse movement. In the recent decade, such input devices have been equipped with computer vision technology that enables recognition of natural hand or body gestures, as seen in Kinect-based handwriting recognition [1]. Huang et al. [2] introduce digit recognition on Kinect by using Multiple Segment and Scaled Coding with 94.6% accuracy. Other methods have been proposed using a Depth-Skin Background Mixture Model for hand segmentation [3] and Dynamic Time Warping combined with Support Vector Machines (SVM) for digit recognition [1].

Kinect has the ability to recognize body shape or hand movements, but it has difficulty obtaining precise finger positions compared to the Leap Motion controller. The Leap Motion controller (or simply Leap Motion) is a new computer vision device that can track the position of fingers more precisely. The 3D finger data can be obtained at over 100 frames per second [4], making it possible to efficiently track the position of each finger in detail. However, Leap Motion still faces some limitations: it cannot directly recognize a specific sequence of finger strokes (movements) performed when drawing a letter or a digit in the air.

Several algorithms have been proposed to recognize these sequences of strokes with better accuracy. Vikram et al. develop handwriting recognition based on Leap Motion by using the Dynamic Time Warping algorithm [4]. Agarwal et al. [5] present segmentation and recognition of 3D handwritten text captured by Leap Motion. A Hidden Markov Model (HMM) is used to train segmented words and achieves an accuracy of 77.6%. Another algorithm for rapid recognition of hand graphical gestures is SVM, which has been used to recognize 100 testing samples with an average recognition rate of 82.4% [6].
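Dynamic Time Warping, used by several of the Leap Motion recognizers cited above, measures the distance between two stroke sequences of possibly different lengths by finding the cheapest monotone alignment between them. The following is a minimal sketch of the classic algorithm; the function name and the (x, y)-tuple sequence format are illustrative, not taken from any of the cited works:

```python
import math

def dtw_distance(seq_a, seq_b):
    """Classic dynamic-time-warping distance between two stroke
    sequences, each a list of (x, y) fingertip positions."""
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = DTW distance between prefixes seq_a[:i] and seq_b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # match both
    return cost[n][m]
```

A stroke drawn slowly and the same stroke drawn quickly yield near-zero distance under this measure, which is what makes DTW attractive for gesture matching despite its quadratic cost.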
Journal Homepage: http://iaescore.com/journals/index.php/IJECE
Research shows that algorithms with shallow architectures (i.e., Decision Tree [7], Gaussian Mixture Models
(GMM), HMM, Conditional Random Fields (CRF), Multi-Layer Perceptron (MLP) [8], and SVM [9]) are effective
in solving many simple or well-constrained problems. However, these algorithms may face difficulties in complicated
real-world applications involving natural signals, such as human speech, natural sound and language, natural images,
and visual scenes [10], because of their modelling limitations. A deep architecture algorithm, namely Convolutional
Neural Networks (CNN), is already able to recognize handwritten digits captured by Leap Motion. However, it
is hard to train deep MLPs (and hence also CNNs) without GPU parallelization [11].
CNN has been used to simulate 3D digit recognition model that can be projected as 2D in a 3D environment
on Leap Motion tracking data [12]. This method achieves an accuracy of 92.4%. In this paper, we propose to use
Deep Belief Networks (DBN) to solve this recognition problem. DBN is a deep architecture algorithm that
consists of two phases: greedy layer-wise unsupervised pre-training and discriminative fine-tuning [13, 14, 15]. Pre-
training, initialized by stacked Restricted Boltzmann Machines, generates a better starting point than a purely
random configuration of the MLP. After the pre-training phase, the DBN can be treated simply as an
MLP and trained with Backpropagation to enhance its weights [14, 15]. Alternatively, Resilient Backpropagation
can be used to train the network instead of ordinary Backpropagation.
2. PRELIMINARIES
2.1. Leap Motion controller
Leap Motion is an input device that recognizes and tracks hands, fingers, and finger-like tools.
It was developed by Michael Buckwald and David Holz at Leap Motion, Inc. Leap Motion is able to track
3D finger data precisely and obtain it at over 100 frames per second [4]. Besides, the Leap Motion device is also
cheaper than other similar computer-vision devices. Its dimensions are 1.2×3×7.6 cm and it
uses optical sensors and infrared light, as shown in Figure 1 [16]. Leap Motion's sensors can detect hands or fingers
across the upward y-axis and possess a 150° field of view with an effective range of 25 to 600 millimeters [17].
Figure 1. Detail of the Leap Motion’s sensors [16]
To develop applications that use Leap Motion as an input device, we can make use of an available API that can
be accessed from various programming languages. Leap Motion API version 2.0 [17] introduces a New Skeletal
Tracking Model that provides additional information for arms and fingers, and overall tracking enhancement. To use
Leap Motion, it must first be connected to a USB port and positioned near the monitor within the cable's reach.
The device detects gestures as well as the position and movement of hands or fingers as the user moves her hands
within its range. To retrieve the desired hand tracking data, the Leap Motion API provides an update set, or frame,
of data. Every Frame object from the API represents a frame that contains a list of tracking entities called motion
tracking data. Motion tracking data, such as hands, fingers, and tools, can also be accessed through the API.
2.2. Restricted Boltzmann machines
A Boltzmann machine is a stochastic recurrent neural network with binary stochastic units. There are two
types of layers in Boltzmann machines: visible layer and hidden layer. Visible units or neurons on the visible layer
are connected to other units and receive information from the environment. Visible units also bring the input signal
from the environment and send it to other units during training. Hidden units, on the other hand, operate freely
and extract features from the input signal. In Boltzmann machines, units may be interconnected even within a
single layer. Unfortunately, learning in Boltzmann machines is often impractical, and furthermore they also face
Int J Elec & Comp Eng, Vol. 8, No. 6, December 2018: 4693 – 4704
scalability issues. To make learning in Boltzmann machines tractable, restrictions on connectivity are
enforced. The restricted model is called a Restricted Boltzmann Machine (RBM) [18].
In an RBM, there are no connections between units of the same layer. An RBM is composed of a visible layer and a
hidden layer of stochastic neurons that are connected by symmetrical weights (i.e., there is no weight toward itself).
Hidden units in an RBM are considered feature detectors.
2.3. Deep belief networks
A deep belief network is a generative model consisting of multiple stochastic layers. A stochastic layer has
several latent variables; latent variables typically have binary values and are called hidden units or feature detec-
tors [13]. Each DBN layer is an RBM, and to create many layers, several RBMs are stacked on top of each
other to construct the DBN. Hinton et al. [13] propose the idea of learning one layer at a time and restricting
the connectivity of binary stochastic units in order to make learning simpler and more efficient [18].
DBN training begins with training the first RBM on input samples and then sequentially progresses
to the next RBM. Patterns generated by the top RBM can be propagated back to the input layer by using
conditional probabilities, as in a belief network or sigmoid belief network. This training procedure
is called DBN's greedy layer-wise training [13]. DBN can work with or without supervision. Although DBN can be
used in a supervised or discriminative mode, greedy layer-wise training is basically an unsupervised training of each
layer of stacked RBMs, and is called the pre-training procedure [19, 20, 21]. The purpose of pre-training
each layer is to put the parameters of all layers in a region of the parameter space from which a well-generalizing
local optimum can be reached by local gradient descent [20, 21].
In an RBM, the joint distribution p(v, h; w) of visible units v and hidden units h with parameters w is defined
by the energy function E(v, h; w). For Bernoulli-visible units that lead to Bernoulli-hidden units, the energy function
is defined by:
E(v, h; w) = − ∑_{i=1}^{I} ∑_{j=1}^{J} wij vi hj − ∑_{i=1}^{I} bi vi − ∑_{j=1}^{J} aj hj,
where wij represents the symmetrical interaction between visible unit vi and hidden unit hj, bi and aj are the bias
terms of the visible and hidden units, and I and J are the numbers of visible and hidden units, respectively. Therefore, the conditional probabilities
can be calculated by:
p(hj = 1 | v; w) = ϕ(∑_{i=1}^{I} vi wij + aj), and (1)
p(vi = 1 | h; w) = ϕ(∑_{j=1}^{J} hj wij + bi), (2)
where ϕ(x) = 1/(1 + exp(−x)).
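Vectorized over a whole layer, Equations (1) and (2) amount to the following (a minimal NumPy sketch; the paper's own implementation is in C#, and all names here are ours):

```python
import numpy as np

def sigmoid(x):
    # phi(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def p_hidden_given_visible(v, w, a):
    # Equation (1): p(h_j = 1 | v; w) = phi(sum_i v_i w_ij + a_j),
    # with w an I-by-J weight matrix and a the hidden biases
    return sigmoid(v @ w + a)

def p_visible_given_hidden(h, w, b):
    # Equation (2): p(v_i = 1 | h; w) = phi(sum_j h_j w_ij + b_i),
    # with b the visible biases
    return sigmoid(h @ w.T + b)
```

With zero weights and biases, both probabilities are exactly 0.5, reflecting the symmetry of ϕ around zero.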
The greedy procedure on stacked RBMs is used to perform maximum-likelihood learning. Equation (3) is the
learning rule for updating the weights of each RBM, following the gradient of the log-likelihood:
∆wij = Edata(vi, hj) − Emodel(vi, hj), (3)
where Edata(vi, hj) is the expectation with respect to the training data and Emodel(vi, hj) is the expectation under
the distribution defined by the model. Calculating Emodel(vi, hj) is intractable, so the Contrastive Divergence (CD)
algorithm is used to solve this problem [13].
The CD algorithm performs Gibbs sampling to replace Emodel(vi, hj) in one or more steps. CD-1, the CD
algorithm that performs only one step of Gibbs sampling on the data, is used to train the model. The steps of
CD-1 are as follows. First, the input sample x is used to compute the hidden-unit probabilities p(hj = 1 | v; w). Second, units
are activated by comparing the sampled probabilities with random values to obtain the binary state of
each unit. Third, the visible units are reconstructed from those binary units by calculating p(vi = 1 | h; w). Fourth,
the output probabilities of the reconstruction are calculated, and the reconstruction becomes the input. Fifth, after all
statistics from the previous steps are collected, the weights of the RBM are updated.
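The five CD-1 steps above can be sketched as follows (an illustrative NumPy version for a single sample; the paper's implementation is in C#, and the variable names here are our own):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    # phi(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def cd1_gradient(v0, w, a, b):
    """One CD-1 pass for a single binary sample v0.

    w: visible-to-hidden weights, a: hidden biases, b: visible biases.
    Returns the statistic <v h>^0 - <v h>^1 used in the weight update."""
    # Steps 1-2: hidden probabilities for the data, then stochastic binary states
    ph0 = sigmoid(v0 @ w + a)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Step 3: reconstruct the visible layer from the binary hidden units
    pv1 = sigmoid(h0 @ w.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    # Step 4: hidden probabilities of the reconstruction
    ph1 = sigmoid(v1 @ w + a)
    # Step 5: difference of pairwise statistics, data term minus model term
    return np.outer(v0, ph0) - np.outer(v1, ph1)
```

In practice this gradient is averaged over a mini-batch before the weights are updated.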
After completing pre-training procedure in DBN, the next phase is to perform discriminative fine-tuning with
Backpropagation [13, 14]. Fine-tuning with Backpropagation is carried out by adding a final output layer of class
labels, which is excluded from the pre-training procedure.
Deep Belief Networks for Recognizing Handwriting Captured by Leap Motion Controller (Setiawan and Pulungan)
3. PROPOSED APPROACH
Figure 2 depicts the architecture of our proposed approach for in-air handwriting recognition using Leap
Motion. The proposed approach is developed based on the idea laid out in [22]. Data acquisition is the process of
collecting every letter sample by using Leap Motion. A captured letter sample can be used for recognition, or
it can be stored in a dataset. The letter dataset is split into two parts, namely the training and testing datasets. The
training dataset is fed into a learning system by using several proposed algorithms; this basically constitutes
the training process. After the training process is completed, several model algorithms are constructed. These
model algorithms are then used for testing (with the testing dataset) in order to obtain and compare their accuracy.
The best model algorithm is then selected for further recognition of letter input data captured by Leap Motion.
Figure 2. Our proposed approach (data acquisition using Leap Motion produces a single data item, which is either saved to the letter dataset feeding the learning system, or passed with the saved model to recognition)
3.1. Data acquisition
Data acquisition is a mechanism to obtain handwritten sample data by using Leap Motion. A data sample
is a sequence of information about the movement of a hand (inputs use only one hand and one finger) in the air as
captured by Leap Motion. The motion of the finger draws a letter stroke, which is then displayed on the screen of
the simulation application integrated with the Leap Motion API. Figure 3 depicts the flow for creating one data
sample using the existing API.
Figure 3. Data acquisition workflow: Leap Motion initialization, hand-gesture input, 2D mapping, normalizing stabilized position, touch emulation, image matrix creation, and image matrix dilation, producing a letter as a binary vector
Leap Motion initialization Initialization is performed to connect the simulation application to the API. In the sim-
ulation application, the user is given information about what letter should be drawn with the Leap Motion. As the user
starts drawing the letter in the air, we can access the captured Frame object of the Leap Motion through the API.
2D mapping Through the Frame we can access the Finger object, and hence the tip positions of all fingers can
be obtained through the API. However, we do not need the positions of all fingers, but only the dominant finger as
when pointing with the forefinger. The dominant finger can be obtained via a property of the Finger object, namely
Frontmost. In addition, in order to obtain the position of the dominant finger’s tip with minimal jitter, a property
named StabilizedTipPosition can be used. Because of 2D mapping, the z-position of the forefinger produced by
StabilizedTipPosition is ignored, and the x and y-position are then denoted by xpos and ypos, respectively.
Normalizing stabilized position InteractionBox property of Frame is an object on the API to ensure the user’s
finger-hand always stays on the device’s visibility range. While finger-hand is inside InteractionBox, movements of
the fingers will be detected on 3D coordinates, as shown in Figure 4(left). When the application is mapped to 2D
coordinates, the InteractionBox is as shown in Figure 4(right).
Figure 4. Field-of-view of the (left) 3D and the (right) 2D Interactionbox [17]
InteractionBox simplifies the mapping of every finger-tip position by normalizing the range of
captured hands, from left to right and top to bottom, to values between 0 and 1, relative to the application screen. If wapp
and happ are the width and height, respectively, of the 2D field in the application screen, InteractionBox provides the
normalized position of the final stroke relative to the application's coordinates (xapp, yapp), given by:
xapp = NP(xpos)·wapp and yapp = (1 − NP(ypos))·happ,
where NP is the normalization function of InteractionBox, called normalizePoint.
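This mapping can be sketched as follows (a hypothetical Python helper with our own names; `normalizePoint` itself belongs to the Leap Motion API and is assumed to have already produced values in [0, 1]):

```python
def to_app_coords(nx, ny, w_app, h_app):
    """Map a normalized tip position (nx, ny), each component in [0, 1] as
    returned by normalizePoint, to application screen coordinates.
    The y-component is flipped because screen coordinates grow downward."""
    return nx * w_app, (1.0 - ny) * h_app
```

For example, a centered tip position (0.5, 0.5) on a 336×336 field maps to the screen point (168, 168).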
Touch emulation From the positions of all strokes representing a single letter, a sequence of strokes S = (S1, S2, S3,
S4, . . . , Sm) is formed, where each Si = (xapp, yapp) for 1 ≤ i ≤ m, and m is the number of strokes required to draw
the letter. Strokes can be in two states: hovering and touching. Not all strokes are used; those that are in hovering state
can simply be ignored, as illustrated in Figure 5(left).
Figure 5. Touch emulation: the difference between hovering and touching [17]
The state is determined by a normalized distance in the range of +1 to −1. The finger first enters the hovering
state at a normalized distance of +1, and the distance decreases toward 0 as the finger approaches the touching surface.
When the finger penetrates the surface, the normalized distance decreases further, down to −1, which indicates that it
is now in the touching state. Stroke points are only recorded while the finger's tip is in the touching state.
Image matrix creation The tracing in the previous process is related to the tile-based editor of the simulation
application. Tile-based editors are often developed in game technology, specifically to create the game environ-
ment [23, 24]. In this research, all tiles are of the same width and height, namely 12 pixels, and the tile-map is a
28×28 tile matrix.
If the tile-map is considered as a representation of an image, each tile will have an intensity value and an
index position on the tile-map. Each tile indicates a pixel image drawn by the user in the air. Therefore, in order to
select a tile that corresponds to the user’s creation, a fixed stroke point position with the hovering or touching state is
used. Figure 6 provides an illustration on how the selection of tile-based position for each stroke captured by Leap
Motion is done.
In Figure 6, mt is map-top distance, ml is map-left distance, Ts is the size of a tile, and (xt, yt) is the
selected index of a tile. The sequence of strokes S will be converted into an indexed (vector) sequence of strokes
V = (V1, V2, V3, V4, . . . , Vn), with Vi = (xi, yi) for 1 ≤ i ≤ n, where n is the vector size of the 28×28 image matrix,
namely 784. Each Vi has an intensity value of 0 or 1, determined by:
Vi = 1 if d > 0, and Vi = 0 otherwise,
where d is the touch distance, which defines whether the finger is in hovering or touching state. A coordinate (xt, yt)
of each stroke point is then defined by:
xt = (xapp − ml)/Ts and yt = (yapp − mt)/Ts.
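These formulas can be sketched directly (a Python sketch with our own names; integer division stands in for selecting the enclosing tile):

```python
def tile_index(x_app, y_app, ml, mt, ts):
    """Select the tile (xt, yt) containing a stroke point (x_app, y_app),
    given the map-left/map-top margins and the tile size Ts in pixels."""
    xt = int((x_app - ml) // ts)
    yt = int((y_app - mt) // ts)
    return xt, yt
```

With the values from Figure 6 (xapp = 218, yapp = 158, ml = mt = 50, Ts = 12), this yields the tile (14, 9).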
Figure 6. Selecting an indexed tile on the tile-map (example: xapp = 218, yapp = 158, Ts = 12 px, mt = ml = 50 px, giving (xt, yt) = (14, 9) on the 28×28 matrix)
In the end, this phase basically converts an image (tile) matrix obtained by the Leap Motion into a vector with binary
elements representing the image.
Image matrix dilation Sometimes when the user writes a letter in the air by using Leap Motion too quickly, the
handwritten letter becomes unclear. To clarify the handwritten letter, dilation operation can be applied to the image
matrix. Dilation simply takes several specific neighboring points of a point and gives them an intensity value of 1, as
depicted in Figure 7.
Figure 7. Image matrix dilation for letter "D": a tile (xt, yt) and its eight neighbors all receive intensity 1
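A minimal sketch of this dilation, assuming the 3×3 structuring element shown in Figure 7 (i.e., a tile and its eight neighbors):

```python
def dilate(matrix):
    """Dilate a binary image matrix: every tile adjacent to an 'on' tile
    (eight-connected, i.e., a 3x3 structuring element) is also set to 1."""
    rows, cols = len(matrix), len(matrix[0])
    out = [row[:] for row in matrix]
    for y in range(rows):
        for x in range(cols):
            if matrix[y][x] == 1:
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        if 0 <= y + dy < rows and 0 <= x + dx < cols:
                            out[y + dy][x + dx] = 1
    return out
```

A single 'on' tile thus grows into a 3×3 block, thickening a too-thin stroke.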
3.2. Data, dataset, and data preparation
Each instance of data acquisition creates a single data sample, represented by a 784-element
vector that encodes the 28×28 matrix produced in image matrix creation. The data is used for two purposes: as
a labeled addition to the training or testing dataset, or as input to the recognition
process, where the label is inferred by the previously trained learning algorithm.
In this research, 22,000 training samples and 8,000 testing samples are collected from 10
respondents. The respondents are undergraduate students at Universitas Dian Nuswantoro, Semarang, Indonesia, aged
20-21 years. Each of them produces around 3,000 handwritten samples with Leap Motion, captured within one month. The
training and testing datasets are restricted for upper-case letters from A to Z. Because the learning task is supervised,
the dataset used for the DBN is in the form of [(V1, L1), (V2, L2), . . . , (Vk, Lk)]. For each letter sample, input data is
a vector representing the sequence of strokes Vi for 1 ≤ i ≤ k whose label is Li, and k is the number of samples in
the dataset. All elements of input vector Vi are of binary value.
The label Li takes a value from "A" to "Z"; encoding it as Li ∈ {0, 1, 2, . . . , 25} cannot be fed
directly into a DBN. To map Li to a vector, a one-hot encoding mechanism is used. In one-hot
encoding, a label Li with value "A" is encoded as [1, 0, 0, . . . , 0], "B" as [0, 1, 0, . . . , 0], etc.
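The one-hot encoding above can be sketched as:

```python
def one_hot(label, num_classes=26):
    """Encode a capital letter 'A'..'Z' as a 26-element one-hot vector."""
    index = ord(label) - ord('A')  # 'A' -> 0, 'B' -> 1, ..., 'Z' -> 25
    vector = [0] * num_classes
    vector[index] = 1
    return vector
```

For example, one_hot('A') yields [1, 0, . . . , 0] and one_hot('B') yields [0, 1, 0, . . . , 0], matching the encoding above.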
3.3. Learning system
This research makes use of Deep Belief Networks (DBN) in supervised fashion. DBN basically learns the
model of Fw : {V } → {L}, which maps an input vector V into a labeled vector L with parameter weights w.
Figure 8(a) shows the architecture of a DBN with two hidden layers, h1 and h2. Two main processes exist in learning
this DBN model: first, greedy layer-by-layer training of each stacked RBM by using the Contrastive
Divergence algorithm, called pre-training; second, fine-tuning the entire network using
Resilient Backpropagation (Rprop).
3.3.1. RBM initialization and construction
As described earlier, a DBN consists of stacked RBMs [13], therefore, each RBM layer first needs to be
initialized. Figure 8(b) shows a DBN constructed by two layers of RBMs. The following is the pre-training process,
Figure 8. (a) A DBN model with Rprop, (b) Constructing two RBMs in a DBN, (c) Fine-tuning DBN with Rprop
which uses an unsupervised learning algorithm on the DBN to construct the stacked RBMs:
Step 0: In the first RBM, visible layer v1 gets direct input from sample V .
Step 1: Train the RBM’s visible v1 and hidden h1 layers to adjust weights w1.
Step 2: After the training is completed, use the pattern activity of the first RBM’s hidden layer h1 as input to visible
layer v2 to train the second RBM.
Step 3: By copying h1 as v2 in the second RBM, train visible layer v2 and hidden layer h2 to adjust weights w2.
Note that in this process, class labels of data are not involved.
3.3.2. Training each RBM layer
Every RBM is trained by using Contrastive Divergence with one step of Gibbs sampling (CD-1).
CD-1 approximately follows the gradient of a difference of Kullback-Leibler divergences. Here are the steps of CD-1:
Step 1: While the maximum epoch is not reached, repeat steps 2 to 7.
Step 2: Use a data sample V as the visible layer v.
Step 3: Update the hidden layer by calculating p(hj = 1 | v; w) (cf. Equation (1)), then activate the stochastic binary
neurons hj on the hidden layer by sampling with probability p(hj = 1 | v; w).
Step 4: Reconstruct the visible neurons v(1)i by calculating p(vi = 1 | h; w) (cf. Equation (2)), using the value of the
stochastic binary neurons calculated before as input.
Step 5: Calculate the output probabilities of the reconstruction from the previous step, namely p(h(1)j = 1 | v(1); w).
Step 6: Calculate the weight update (cf. Equation (3)) by using the learning rate α as follows:
∆wij = α(⟨vihj⟩⁰ − ⟨vihj⟩¹),
where ⟨vihj⟩⁰ = Edata(vihj) and ⟨vihj⟩¹ = Emodel(vihj).
Step 7: Update the new weights by adding momentum µ and weight decay wd as follows:
wgrad = ⟨vihj⟩⁰ − ⟨vihj⟩¹, and wij(t) = wij(t − 1) + (µ∆wij + α·wgrad − wd·wij(t − 1)).
Learning rate is a parameter that influences the learning speed of the algorithm in reaching convergence.
Momentum is a parameter used to prevent the system from being trapped in local minima. Weight decay works by
adding “additional terms” to the normal gradient. The additional term is a derivative of a function that penalizes an
excessive weight [25].
The process of updating the weights of the RBM follows the update rules of gradient descent. For
large-scale datasets, batch gradient descent is computationally expensive, since a single update step
involves the entire training set. Therefore, we use a more efficient learning mode that processes the dataset in
predetermined batches; this mode is called mini-batch gradient descent [25].
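One reading of the Step 7 update rule, applied per mini-batch, can be sketched as follows (a NumPy sketch with our own names; the momentum term µ∆wij is kept as a running velocity, and the defaults are the pre-training constants reported in Section 4):

```python
import numpy as np

def update_weights(w, velocity, grad, lr=0.5, momentum=0.5, weight_decay=0.001):
    """One mini-batch weight update with momentum and weight decay.

    grad is the statistic <v h>^0 - <v h>^1 (wgrad) averaged over the
    mini-batch; velocity carries the previous update for the momentum term."""
    velocity = momentum * velocity + lr * grad - weight_decay * w
    return w + velocity, velocity
```

The returned velocity is fed back in on the next call, so successive updates in the same direction accelerate.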
3.3.3. Fine-tuning Rprop
Because the DBN is to be used in a supervised fashion (after the pre-training process is completed), which involves label
classes, the weights have to be fine-tuned. Fine-tuning is conducted by training the network with the Backpropagation
algorithm [13, 14]. In this research, besides fine-tuning with Backpropagation, we also apply the Resilient
Backpropagation (Rprop) algorithm, as shown in Figure 8(c).
Theoretically, the Rprop algorithm is more efficient in terms of learning speed than ordinary Backpropagation.
The difference between Rprop and Backpropagation lies in the weight update, where Rprop makes use
of the sign of the last gradient update [26]. In the Rprop algorithm, we use the weights and biases
obtained from the pre-training process, namely wij = w1 (from the input to the first hidden layer) and wjk = w2 (from the
first hidden layer to the second hidden layer). The weights and biases connecting the second hidden layer to the
output layer use random initialization in the interval (−1, 1). We initialize the maximum epoch and set the values of
∆0 = 0.0125, ∆max = 50.0, ∆min = 10⁻⁶, η⁺ = 1.2, and η⁻ = 0.5.
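With these constants, one Rprop iteration can be sketched as follows (the Rprop− variant, with our own variable names: the per-weight step ∆ grows by η⁺ while the gradient keeps its sign and shrinks by η⁻ when it flips):

```python
import numpy as np

def rprop_step(w, grad, prev_grad, delta,
               eta_plus=1.2, eta_minus=0.5, delta_max=50.0, delta_min=1e-6):
    """One Rprop update over all weights at once.

    delta holds the per-weight step sizes (initially Delta_0 = 0.0125).
    Returns the new weights, the gradient to remember, and the new steps."""
    sign_change = grad * prev_grad
    # Same sign: grow the step; flipped sign: shrink it (within the bounds)
    delta = np.where(sign_change > 0, np.minimum(delta * eta_plus, delta_max), delta)
    delta = np.where(sign_change < 0, np.maximum(delta * eta_minus, delta_min), delta)
    # On a sign flip the gradient is zeroed, so this step is neutral (Rprop-)
    grad = np.where(sign_change < 0, 0.0, grad)
    # Move each weight against the sign of its gradient by its own step size
    w = w - np.sign(grad) * delta
    return w, grad, delta
```

Because only the sign of the gradient is used, Rprop needs no global learning rate, which is one reason it fine-tunes faster than ordinary Backpropagation here.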
4. RESULT AND ANALYSIS
A learning algorithm naturally minimizes a cost function in order to obtain an optimal result. When the
algorithm is trained using a training dataset, we are basically creating a model algorithm that closely reflects
the available data. Afterwards, the obtained model algorithm can be applied to the testing dataset to assess the
performance of the algorithm.
4.1. Experimental setup
All methods and algorithms in our proposed approach are implemented in C#. All experiments are conducted
using this implementation on a computer with a Core(TM) i7-2600 CPU @ 3.40GHz (8 CPUs), 3.7 GHz. Learning
is performed with parallel computation on one logical core per CPU. For our experiments, we perform data
acquisition that produces 30,000 samples of capital letters, partitioned into two groups: 22,000 samples as the training
dataset and 8,000 samples as the testing dataset. There are about 843 samples of each letter from A to Z in the training
dataset, whereas in the testing dataset each letter has about 307 samples. Every single data item in the training dataset
is used as input for the learning algorithm.
In order to assess the performance of our proposed approach, we not only conduct experiments on DBN
with Rprop fine-tuning, but also on three other approaches for the purpose of comparison. The three approaches are
MLP with Backpropagation (Bp), MLP with Rprop, and DBN with Backpropagation. Table 1 shows the experimental
configuration we use for each of those approaches.
Table 1. Experimental configurations
Dataset         Architecture            Pre-training      Fine-tuning
Letter          784-400-100-26 (MLP)    -                 Bp (Ep.: 300)
Letter          784-400-100-26 (MLP)    -                 Rprop (Ep.: 300)
Letter          784-400-100-26 (DBN)    CD-1 (Ep.: 300)   Bp (Ep.: 300)
Letter          784-400-100-26 (DBN)    CD-1 (Ep.: 300)   Rprop (Ep.: 300)
Digit (MNIST)   400-400-100-10 (MLP)    -                 Bp (Ep.: 500)
Digit (MNIST)   400-400-100-10 (MLP)    -                 Rprop (Ep.: 500)
Digit (MNIST)   400-400-100-10 (DBN)    CD-1 (Ep.: 300)   Bp (Ep.: 500)
Digit (MNIST)   400-400-100-10 (DBN)    CD-1 (Ep.: 300)   Rprop (Ep.: 500)
Table 1 shows that the neural network architecture we use when dealing with the letter dataset is 784-400-
100-26. In this architecture, the input is a 784-sized vector, which is a flattening of the 28×28 tile-map matrix for
each letter sample. Both MLP and DBN use 400 and 100 neurons in the first and second hidden layers, respectively.
The size of the output vector is adjusted to the number of letter labels, namely 26 classes with one-hot-encoding
mechanism.
Besides using the letter dataset captured by our own data acquisition mechanism, we also make use of the
digit dataset from the MNIST database [27]. The MNIST digit dataset consists of 60,000 training samples and
10,000 testing samples. Each sample in the MNIST digit dataset is a grayscale image. To use
them, it is necessary first to preprocess them by transforming the grayscale images into binary images. Afterwards,
width normalization is carried out by changing the original size of 28×28 pixels to 20×20 pixels. The 20×20 tile-
map matrix is then flattened into an input vector of size 400 as shown in the last four rows of Table 1. Similar to the
letter configuration, in this digit configuration, both MLP and DBN use 400 neurons and 100 neurons in the first and
second hidden layers, respectively. Because the digit labels range from 0 to 9, the size of the output vector is 10 with
a one-hot encoding mechanism.
Both for the letter and MNIST digit datasets, we use the same configuration on the pre-training phase of the
DBN. There are two RBM architectures in letter configuration: the first RBM is 784-400 and the second RBM is
400-100. In MNIST digit configuration, the first RBM architecture is 400-400 and the second is 400-100. There is no
pre-training phase in the MLP. Each RBM in the letter or MNIST digit configurations is trained with CD-1 algorithm.
Based on our experiments, we select the same initialization throughout, namely learning-rate = 0.5, momentum = 0.5,
and weight-decay = 0.001. The maximum epoch for each RBM is set to 300. The batch size parameter for each RBM
is set to 100, which means that if there are 22,000 training samples in the dataset, they will be processed 100 at a time.
The Backpropagation for fine-tuning the DBN or training the MLP uses learning rate = 0.05, momentum =
0.5, and batch size = 100. Unlike Backpropagation, Rprop does not need a learning rate and momentum, but instead
needs the constants ∆0, ∆max, ∆min, η⁺, and η⁻ (cf. Section 3.3.3). The initial epoch for Rprop is set to 300 for the
letter and 500 for the MNIST digit configuration.
4.2. Testing and recognition
Table 2 shows accuracy and the computational times required in recognizing each letter (from testing dataset)
by the four different methods. Recall that there are about 307 samples for each letter in testing dataset.
Table 2. Accuracy and computational times for the letter (testing) dataset
Lab
MLP-Bp MLP-Rprop DBN-Bp DBN-Rprop
Acc. t (µs) stdev Acc. t (µs) stdev Acc. t (µs) stdev Acc. t (µs) stdev
A 99.35 4,572.2 1,081.2 100 5,305.1 1,718.6 100 5,117.5 1,392.5 100 3,278.2 811.7
B 92.86 5,740.0 1,534.4 99.68 4,622.1 918.5 99.03 5,369.4 1,416.5 100 5,235.6 2,074.0
C 94.81 3,727.2 912.2 100 5,337.8 1,117.6 99.35 5,126.9 1,684.3 100 3,922.0 2,140.9
D 92.88 2,972.8 628.1 100 4,991.5 1,229.1 98.71 4,124.2 996.1 100 5,668.1 1,935.6
E 92.51 3,151.1 751.4 100 4,428.3 968.1 99.35 3,718.4 941.1 100 5,212.0 1,156.5
F 97.73 3,265.4 724.0 99.68 5,219.6 1,224.0 99.03 4,776.6 1,213.6 99.35 4,602.7 1,556.9
G 77.67 4,862.7 1,100.5 100 5,068.3 1,262.4 98.38 3,558.6 958.8 99.68 5,459.3 1,452.2
H 90.29 4,893.7 1,163.0 99.68 4,205.8 935.3 99.68 3,139.1 819.7 99.68 4,549.9 1,634.3
I 90.58 5,055.4 1,231.4 97.40 3,573.4 1,095.7 95.45 3,714.9 1,205.8 98.38 5,856.4 3,309.5
J 96.74 5,439.8 1,390.0 100 4,720.8 1,233.2 100 5,064.3 1,357.8 100 4,439.5 1,304.9
K 94.48 4,619.5 1,006.0 99.03 4,766.1 1,270.2 98.70 5,416.3 1,433.2 100 5,325.4 1,096.8
L 99.68 3,705.6 913.6 100 4,446.1 1,392.5 100 5,260.5 1,268.8 100 5,545.3 1,287.1
M 91.53 3,833.7 907.5 99.67 5,013.1 1,151.6 99.02 4,462.5 1,136.8 99.67 3,833.9 896.3
N 98.38 4,287.0 1,097.0 99.35 4,819.3 1,019.2 99.35 3,699.3 914.0 99.35 4,909.1 863.1
O 95.45 3,338.1 816.7 99.03 5,237.3 1,000.0 98.38 4,846.7 1,167.2 99.35 4,849.6 1,563.8
P 95.44 5,606.4 1,390.2 99.67 5,486.6 1,502.8 99.67 3,633.0 880.7 100 5,707.3 2,559.3
Q 91.53 4,486.3 903.3 99.67 5,036.0 1,219.2 99.35 3,387.2 788.2 99.67 4,680.0 1,310.3
R 91.86 4,513.4 1,194.6 99.67 5,515.9 1,335.5 95.44 5,289.7 1,185.9 99.67 4,567.4 1,063.3
S 93.16 4,300.1 1,176.0 99.35 3,967.6 890.0 99.35 5,619.3 1,385.8 100 3,560.6 970.2
T 93.18 5,536.6 1,321.9 100 4,180.4 1,104.3 99.03 4,618.1 1,199.2 100 5,156.8 1,118.2
U 94.16 4,861.5 1,147.3 100 3,639.5 864.8 100 5,545.0 1,445.5 100 4,873.8 1,197.9
V 99.03 4,755.6 1,467.6 100 5,169.7 1,482.1 100 4,198.5 1,255.0 100 3,876.2 1,317.2
W 95.44 4,117.0 1,094.3 99.35 4,782.4 1,781.9 100 4,723.2 1,347.3 100 4,161.0 773.4
X 95.11 4,788.1 1,129.6 99.35 4,025.6 1,605.4 100 3,967.8 893.0 100 4,158.1 1,044.3
Y 98.05 3,699.5 986.6 99.35 5,022.5 1,330.0 99.67 3,239.5 777.4 99.67 5,238.1 1,067.9
Z 77.20 4,996.6 1,105.0 99.35 4,571.7 1,473.3 98.37 4,002.9 973.7 98.05 5,230.8 1,210.8
All 93.43 4,427.9 1,083.6 99.59 4,736.6 1,235.6 99.05 4,446.9 1,155.3 99.71 4,765.3 1,412.2
The first four columns of the first row of Table 2 indicate that, out of 307 samples of the letter A, 99.35%
(Acc.) are recognized correctly, and that the computational time spent until a letter is successfully or
unsuccessfully recognized averages 4,572.2 microseconds (t), with a standard deviation of 1,081.2 (stdev). The
last row of the table provides the overall accuracy and computational-time statistics for the whole letter testing
dataset under the four different methods. Table 3 shows similar results for the recognition of the MNIST digit
testing dataset.
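The per-class statistics reported in Tables 2 and 3 (accuracy, mean recognition time, standard deviation) can be reproduced from raw per-sample results. As a minimal sketch, assuming a hypothetical record layout of (true label, predicted label, recognition time in µs):

```python
import statistics

def per_class_stats(samples):
    """samples: list of (true_label, predicted_label, time_us) tuples.
    Returns {label: (accuracy_pct, mean_time_us, stdev_time_us)}."""
    by_label = {}
    for true, pred, t in samples:
        by_label.setdefault(true, []).append((pred == true, t))
    stats = {}
    for label, results in by_label.items():
        hits = sum(1 for ok, _ in results if ok)         # correctly recognized
        times = [t for _, t in results]
        acc = 100.0 * hits / len(results)
        stats[label] = (acc, statistics.mean(times),
                        statistics.stdev(times) if len(times) > 1 else 0.0)
    return stats
```

Applied to the 307 samples of the letter A, such a computation would yield the 99.35% / 4,572.2 / 1,081.2 triple discussed above.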
Deep Belief Networks for Recognizing Handwriting Captured by Leap Motion Controller (Setiawan and Pulungan)
Table 3. Accuracy and computational times for the MNIST digit (testing) dataset
Label  MLP-Bp               MLP-Rprop            DBN-Bp               DBN-Rprop
       Acc.  t (µs)  stdev  Acc.  t (µs)  stdev  Acc.  t (µs)  stdev  Acc.  t (µs)  stdev
1 97.62 3,012.4 575.6 98.15 3,062.5 688.6 97.71 2,747.4 595.3 98.59 2,844.7 1,019.6
2 86.92 3,028.2 890.5 94.67 2,268.8 562.4 88.76 2,703.7 482.7 95.83 3,017.7 721.4
3 75.54 2,294.3 673.0 95.84 2,783.3 354.4 91.98 2,623.2 802.8 96.14 2,425.2 589.1
4 88.70 3,059.5 699.7 95.52 2,400.4 621.4 88.90 2,616.5 307.7 96.74 2,525.1 819.4
5 90.92 3,041.4 763.8 96.08 2,795.0 377.6 85.76 2,720.6 295.3 95.52 2,452.3 742.4
6 89.56 3,211.7 795.0 96.45 2,774.5 331.8 95.51 2,656.7 834.6 95.93 2,408.7 556.1
7 88.62 2,811.0 736.3 94.94 2,732.5 661.6 90.66 2,759.4 589.3 95.04 3,069.1 777.3
8 85.01 2,920.0 710.8 94.25 2,002.9 506.3 88.19 2,327.9 594.7 95.69 2,429.5 567.5
9 88.01 1,740.0 556.3 93.56 2,835.8 629.6 87.22 2,804.7 790.1 95.44 3,017.0 665.7
0 95.61 3,031.5 561.9 98.06 2,404.7 641.0 95.41 2,773.7 819.0 98.06 2,805.1 379.9
All 88.65 2,815.0 696.3 95.75 2,606.0 537.5 91.01 2,673.4 611.1 96.30 2,699.4 683.9
Table 4 summarizes the results for recognizing both the letter and the MNIST digit testing datasets. The best
accuracy (column Acc.) on both testing datasets is achieved by DBN with Rprop fine-tuning, at 99.71% and 96.30%,
respectively. This is better than MLP with Rprop fine-tuning, which achieves 99.59% and 95.75%, respectively.
These two approaches are evidently far superior to both DBN and MLP with Backpropagation training. Fine-tuning
with Rprop thus contributes significantly to the accuracy of both MLP and DBN.
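Rprop's advantage stems from its update rule, which ignores gradient magnitudes and adapts a separate step size per weight. Below is a minimal sketch of the Rprop⁻ variant of Riedmiller and Braun [26]; the constants η+ = 1.2 and η− = 0.5 are the standard defaults from that paper, not values reported here:

```python
import numpy as np

def rprop_update(w, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """One Rprop- step: grow a weight's step size while its gradient keeps
    the same sign, shrink it after a sign flip, then move each weight
    opposite to its gradient sign."""
    sign_change = grad * prev_grad
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(sign_change < 0, 0.0, grad)  # Rprop-: skip update after a sign flip
    w = w - np.sign(grad) * step
    return w, grad, step
```

Because only the gradient sign is used, a vanishing gradient magnitude in deep layers does not stall learning, which is consistent with the faster convergence observed for both MLP and DBN.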
Table 4. Summary of accuracy and computational times
Algorithm (Dataset)    Acc.    Epoch  Training t (hours)  Recognition t (µs)  stdev
MLP-Bp (Letter)        93.43%  300    > 5                 4,427.9             1,083.6
MLP-Rprop (Letter)     99.59%  35     > 1                 4,736.6             1,235.6
DBN-Bp (Letter)        99.05%  300    > 10                4,446.9             1,155.3
DBN-Rprop (Letter)     99.71%  35     > 6                 4,765.3             1,412.2
MLP-Bp (MNIST)         88.65%  500    > 12                2,815.0             696.3
MLP-Rprop (MNIST)      95.75%  70     > 2                 2,606.0             537.5
DBN-Bp (MNIST)         91.01%  500    > 24                2,673.4             611.1
DBN-Rprop (MNIST)      96.30%  70     > 12                2,699.4             683.9
The last two columns of Table 4 (t and stdev) show that the recognition times of the four approaches are
comparable, although using Rprop results in slightly longer recognition times for both MLP and DBN. Of the four
approaches, DBN fine-tuned with Rprop has the longest recognition time: on average 4,765.3 microseconds, with
a standard deviation of 1,412.2, for the letter dataset. For the MNIST digit dataset, the recognition times of
the four approaches are much closer to each other, and are a bit more than half of those for the letter dataset.
This is because the resulting neural network for the digit dataset has smaller input and output vectors, and thus
is quicker to evaluate. The time needed to recognize a letter or a digit is below ten milliseconds, which is excellent
even for an online gaming experience [28].
4.3. Training and fine-tuning
Prior to recognition, the neural networks, both DBN and MLP, must first be trained or fine-tuned on the
training dataset. The fourth column of Table 4 gives the computational time spent by each of the four methods to
complete training and pre-training (if any). Because MLP has no pre-training phase, its training time is much
shorter than that of DBN. The time DBN spends on pre-training alone is around 5 hours for the letter dataset and 10
hours for the MNIST digit dataset. Evidently, this pre-training phase accounts for most of DBN's training
time, and Rprop is very helpful in reducing the remaining (fine-tuning) time.
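The pre-training phase referred to here is the greedy layer-wise procedure of Hinton et al. [13]: each RBM is trained on the hidden activations of the layer below it. A compact sketch follows (CD-1 weight updates only; biases and Gibbs sampling are omitted for brevity, and this is not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.1):
    """One CD-1 contrastive-divergence update per epoch (biases omitted)."""
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
    for _ in range(epochs):
        h = sigmoid(data @ W)                       # positive phase
        v_recon = sigmoid(h @ W.T)                  # reconstruction
        h_recon = sigmoid(v_recon @ W)              # negative phase
        W += lr * (data.T @ h - v_recon.T @ h_recon) / len(data)
    return W

def pretrain_dbn(data, layer_sizes):
    """Greedy layer-wise pre-training: each RBM is trained on the hidden
    activations of the RBM below it."""
    weights, x = [], data
    for n_hidden in layer_sizes:
        W = train_rbm(x, n_hidden)
        weights.append(W)
        x = sigmoid(x @ W)                          # feed activations upward
    return weights
```

Each RBM must make a full pass over the dataset per epoch, which explains why pre-training dominates DBN's total training time.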
Figures 9 and 10 show how the accuracy of each method changes as the number of epochs increases during
training. To produce these figures, right after each epoch of training (on the training datasets), each method
is paused, and the resulting neural network is used to recognize the letters or digits in the testing datasets.
We track these changes for up to 300 and 500 epochs for the letter and digit datasets, respectively.
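This pause-and-evaluate procedure amounts to interleaving one training epoch with a full pass over the held-out testing set. Schematically, with train_one_epoch and predict as placeholders for any trainer and classifier:

```python
def accuracy_curve(model, train_one_epoch, predict, test_set, max_epochs):
    """Run training epoch by epoch; after each epoch, freeze the model and
    measure its accuracy on the (untouched) testing dataset."""
    curve = []
    for epoch in range(1, max_epochs + 1):
        train_one_epoch(model)                     # one pass over training data
        hits = sum(predict(model, x) == y for x, y in test_set)
        curve.append((epoch, 100.0 * hits / len(test_set)))
    return curve
```

The resulting (epoch, accuracy) pairs are exactly what is plotted in Figures 9 and 10.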
Int J Elec & Comp Eng, Vol. 8, No. 6, December 2018: 4693 – 4704
[Two-panel plot: Accuracy (%) vs. epoch for MLP-Bp, MLP-Rprop, DBN-Bp, and DBN-Rprop; left panel covers epochs 0–300, right panel zooms in on epochs 0–30.]
Figure 9. Accuracy vs. epoch for the letter (testing) dataset
[Two-panel plot: Accuracy (%) vs. epoch for MLP-Bp, MLP-Rprop, DBN-Bp, and DBN-Rprop; left panel covers epochs 0–500, right panel zooms in on epochs 0–70.]
Figure 10. Accuracy vs. epoch for the MNIST digit (testing) dataset
In Figure 9 (right), both MLP and DBN fine-tuned with Rprop require 35 epochs (cf. the Epoch column of Table 4)
to reach their highest accuracy (cf. the Acc. column of Table 4) on the letter dataset. Out of the 8,000 letter
samples in the testing dataset, DBN fine-tuned with Rprop misrecognizes only 23 samples at epoch 35. In Figure 10
(right), MLP and DBN fine-tuned with Rprop require 70 epochs to reach their best accuracy on the MNIST digit
dataset. Out of 10,000 digit samples, DBN with Rprop misrecognizes 367 samples at epoch 70.
5. CONCLUSION
We have proposed an approach, based on DBN fine-tuned with Rprop, to capture and recognize letters drawn
in the air by the user with the Leap Motion controller. We have also conducted experiments to evaluate the
performance of the proposed approach. They indicate that, compared to MLP or DBN with Backpropagation, DBN
fine-tuned with Rprop achieves the best accuracy: 99.71% for the letter dataset and 96.30% for the digit dataset.
This accuracy is reached in only 35 and 70 epochs, respectively. The recognition time for a letter is in the order
of 5,000 microseconds, which is responsive enough even for an online gaming experience. Rprop has been shown to
make the fine-tuning process significantly faster and the resulting recognition markedly more accurate.
We have selected the neural network architectures 784-400-100-26 and 400-400-100-10 because they remain
computationally feasible on a multi-core (typical gaming) computer while still achieving excellent accuracy.
We are now working on lowering the computational requirements while maintaining the overall accuracy.
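The 784-400-100-26 architecture above is a plain feed-forward stack. A minimal forward pass for illustration (random weights; sigmoid hidden units and a softmax output are assumed here, since the activations are not restated in this section):

```python
import numpy as np

rng = np.random.default_rng(0)

def build_network(layer_sizes):
    """One weight matrix per layer pair, e.g. layer_sizes = [784, 400, 100, 26]."""
    return [rng.normal(0.0, 0.01, (a, b))
            for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(weights, x):
    """Forward pass: sigmoid hidden layers, softmax over the output classes."""
    for W in weights[:-1]:
        x = 1.0 / (1.0 + np.exp(-(x @ W)))   # sigmoid hidden activations (assumed)
    z = x @ weights[-1]
    e = np.exp(z - z.max())                  # numerically stable softmax
    return e / e.sum()
```

The predicted letter is simply the argmax of the 26-way output distribution.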
REFERENCES
[1] C. Qu, D. Zhang, and J. Tian, “Online Kinect handwritten digit recognition based on dynamic time warping and
support vector machine,” Journal of Information and Computational Science, vol. 12, no. 1, pp. 413–422, 2015.
[2] F. A. Huang, C. Y. Su, and T. T. Chu, “Kinect-based mid-air handwritten digit recognition using multiple seg-
ments and scaled coding,” in 2013 International Symposium on Intelligent Signal Processing and Communication
Systems, 2013, pp. 694–697.
[3] Z. Ye, X. Zhang, L. Jin, Z. Feng, and S. Xu, “Finger-writing-in-the-air system using Kinect sensor,” in 2013
IEEE International Conference on Multimedia and Expo Workshops (ICMEW), 2013, pp. 1–4.
[4] S. Vikram, L. Li, and S. Russell, “Handwriting and gestures in the air, recognizing on the fly,” in The ACM
SIGCHI Conference on Human Factors in Computing Systems (CHI 2013), 2013, pp. 1179–1184.
[5] C. Agarwal, D. P. Dogra, R. Saini, and P. P. Roy, “Segmentation and recognition of text written in 3D using Leap
Motion interface,” in 3rd IAPR Asian Conference on Pattern Recognition (ACPR 2015), 2016, pp. 539–543.
[6] Y. Chen, Z. Ding, Y. L. Chen, and X. Wu, “Rapid recognition of dynamic hand gestures using Leap Motion,” in
2015 IEEE International Conference on Information and Automation (ICIA 2015), 2015, pp. 1419–1424.
[7] M. Abdar, S. R. N. Kalhori, T. Sutikno, I. M. I. Subroto, and G. Arji, “Comparing performance of data mining
algorithms in prediction heart diseases,” International Journal of Electrical and Computer Engineering (IJECE),
vol. 5, no. 6, pp. 1569–1576, 2015.
[8] W. Wiharto, H. Kusnanto, and H. Herianto, “Hybrid system of tiered multivariate analysis and artificial neural
network for coronary heart disease diagnosis,” International Journal of Electrical and Computer Engineering
(IJECE), vol. 7, no. 2, p. 1023, 2017.
[9] L. Li, H. Xia, L. Li, and Q. Wang, “Traffic prediction based on SVM training sample divided by time,” TELKOM-
NIKA Indonesian Journal of Electrical Engineering, vol. 11, no. 12, pp. 7446–7452, 2013.
[10] L. Deng and D. Yu, Deep Learning: Methods and Applications. NOW Publishers, 2014.
[11] D. C. Cireşan, U. Meier, L. M. Gambardella, and J. Schmidhuber, “Deep, big, simple neural nets for handwritten
digit recognition,” Neural Computation, vol. 22, no. 12, pp. 3207–3220, 2010.
[12] R. McCartney, J. Yuan, and H.-P. Bischof, “Gesture recognition with the Leap Motion controller,” in Proceedings
of the Int. Conf. on Image Processing, Computer Vision, and Pattern Recognition (IPCV), 2015, pp. 3–9.
[13] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation,
vol. 18, no. 7, pp. 1527–1554, 2006.
[14] R. Salakhutdinov, “Learning deep generative models,” Annual Review of Statistics and Its Application, vol. 2,
pp. 361–385, 2015.
[15] Y. Hua, J. Guo, and H. Zhao, “Deep belief networks and deep learning,” in Proceedings of 2015 International
Conference on Intelligent Computing and Internet of Things (ICIT), 2015, pp. 1–4.
[16] F. Weichert, D. Bachmann, B. Rudak, and D. Fisseler, “Analysis of the accuracy and robustness of the leap
motion controller,” Sensors, vol. 13, no. 5, pp. 6380–6393, 2013.
[17] Leap Motion Inc., “API overview,” https://developer.leapmotion.com/documentation/csharp/devguide/Leap Overview.html,
2017.
[18] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Computation, vol. 14,
no. 8, pp. 1771–1800, 2002.
[19] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, “Greedy layer-wise training of deep networks,” in Pro-
ceedings of the 19th International Conference on Neural Information Processing Systems (NIPS’06). MIT
Press, 2006, pp. 153–160.
[20] Y. Bengio, “Learning deep architectures for AI,” Foundations and Trends in Machine Learning, vol. 2, no. 1, pp.
1–127, 2009.
[21] D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vincent, and S. Bengio, “Why does unsupervised pre-
training help deep learning?” Journal of Machine Learning Research, vol. 11, pp. 625–660, 2010.
[22] A. Setiawan, “Pengenalan karakter menggunakan DBN untuk tulisan tangan di udara yang ditangkap oleh Leap
Motion controller,” Master thesis, Universitas Gadjah Mada, Yogyakarta, Indonesia, 2016.
[23] M. F. Cohen, J. Shade, S. Hiller, and O. Deussen, “Wang tiles for image and texture generation,” ACM Transac-
tions on Graphics, vol. 22, no. 3, pp. 287–294, 2003.
[24] W. Dong, N. Zhou, and J.-C. Paul, “Tile-based interactive texture design,” in Third International Conference on
Technologies for E-Learning and Digital Entertainment. Springer, 2008, pp. 675–686.
[25] G. E. Hinton, “A practical guide to training restricted Boltzmann machines,” in Neural Networks: Tricks of the
Trade: Second Edition, G. Montavon, G. B. Orr, and K.-R. Müller, Eds. Springer, 2012, pp. 599–619.
[26] M. Riedmiller and H. Braun, “A direct adaptive method for faster backpropagation learning: the Rprop algo-
rithm,” in IEEE International Conference on Neural Networks, 1993, pp. 586–591.
[27] Y. Lecun and C. Cortes, “The MNIST database of handwritten digits,” http://yann.lecun.com/exdb/mnist/, 2017.
[28] M. Claypool and K. Claypool, “Latency and player actions in online games,” Communications of the ACM, vol. 49,
no. 11, pp. 40–45, 2006.