Due to the rapidly increasing need for human-robot interaction (HRI), more intelligent robots are in demand. However, the vast majority of robots can only follow strict instructions, which severely restricts their flexibility and versatility. A critical shortcoming that degrades the HRI experience is that robots cannot understand human intentions. This study aims to improve robotic intelligence by training a robot to understand human intentions. Unlike previous studies that recognize human intentions from completed, distinctive actions, this paper introduces a method to predict human intentions before a single action is completed. An experiment in which a participant throws a ball towards designated targets is conducted to verify the effectiveness of the method. The proposed deep-learning-based method demonstrates the feasibility of applying convolutional neural networks (CNNs) in this novel setting. Experimental results show that the proposed CNN-vote method outperforms three traditional machine learning techniques, achieving the highest testing accuracy while requiring relatively less data.
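The abstract does not spell out the CNN-vote mechanism, but a natural reading is frame-level majority voting over the CNN's per-frame predictions. A minimal sketch under that assumption (function name, shapes, and numbers are illustrative, not from the paper):

```python
import numpy as np

def cnn_vote(frame_probs):
    """Majority vote over per-frame class predictions.

    frame_probs: array of shape (n_frames, n_classes) holding the
    CNN's softmax output for each frame of one throwing trial.
    Returns the class index receiving the most frame-level votes.
    """
    votes = np.argmax(frame_probs, axis=1)       # one predicted class per frame
    counts = np.bincount(votes, minlength=frame_probs.shape[1])
    return int(np.argmax(counts))                # most frequent class wins

# Example: 5 frames, 3 candidate targets; three frames vote for class 1.
probs = np.array([[0.2, 0.7, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.5, 0.2],
                  [0.9, 0.05, 0.05]])
target = cnn_vote(probs)
```

Voting across frames smooths out individual frames on which the partial motion is still ambiguous, which is one plausible reason such a scheme would beat single-frame classifiers.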
Image Captioning Generator using Deep Machine Learning (ijtsrd)
Technology's scope has evolved into one of the most powerful tools for human development in a variety of fields. AI and machine learning have become among the most powerful tools for completing tasks quickly and accurately without the need for human intervention. This project demonstrates how deep machine learning can be used to create a caption, or a sentence, for a given picture. This can be used by visually impaired persons, by automobiles for self-identification, and in various applications for quick and easy verification. In this model, a Convolutional Neural Network (CNN) is used to extract the image features, and a Long Short-Term Memory (LSTM) network is used to organize them into meaningful sentences. The Flickr 8k and Flickr 30k datasets were used for training. Sreejith S P | Vijayakumar A, "Image Captioning Generator using Deep Machine Learning", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5, Issue-4, June 2021. URL: https://www.ijtsrd.com/papers/ijtsrd42344.pdf Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/42344/image-captioning-generator-using-deep-machine-learning/sreejith-s-p
Deep-Learning-based Human Intention Prediction with Data Augmentation (gerogepatton)
Data augmentation has been broadly applied in training deep-learning models to increase the diversity of data. This study investigates the effectiveness of different data augmentation methods for deep-learning-based human intention prediction when only limited training data is available. A human participant pitches a ball to nine potential targets in our experiment. We expect to predict which target the participant pitches the ball to. Firstly, the effectiveness of 10 data augmentation groups is evaluated on a single-participant data set using RGB images. Secondly, the best data augmentation method (i.e., random cropping) on the single-participant data set is further evaluated on a multi-participant data set to assess its generalization ability. Finally, the effectiveness of random cropping on fusion data of RGB images and optical flow is evaluated on both single- and multi-participant data sets. Experiment results show that: 1) Data augmentation methods that crop or deform images can improve the prediction performance; 2) Random cropping can be generalized to the multi-participant data set (prediction accuracy is improved from 50% to 57.4%); and 3) Random cropping with fusion data of RGB images and optical flow can further improve the prediction accuracy from 57.4% to 63.9% on the multi-participant data set.
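Random cropping, the best-performing augmentation above, can be sketched in a few lines of NumPy. This is a generic implementation, not the paper's exact pipeline; a real augmentation pass would typically resize the crop back to the network's input size, which is omitted here to keep the sketch dependency-free:

```python
import numpy as np

def random_crop(img, crop_h, crop_w, rng=None):
    """Randomly crop an (H, W, C) image to (crop_h, crop_w, C).

    Each training epoch sees a different sub-window of the image,
    which increases the effective diversity of a small data set.
    """
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_h + 1)        # random top-left corner
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w]

img = np.zeros((240, 320, 3), dtype=np.uint8)    # placeholder RGB frame
patch = random_crop(img, 224, 224, np.random.default_rng(0))
```

Cropping both shifts the subject within the frame and discards context, so the model cannot rely on a fixed body position, one plausible reason it generalizes better to new participants.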
ARTIFICIAL INTELLIGENCE & ROBOTICS – SYNTHETIC BRAIN IN ACTION (ijaia)
Rapid technological growth has made Artificial Intelligence (AI) and the application of robots common in human lives. The advancements undertaken to create designs with human similarities or adaptations to society are elaborated in detail. The increasing manufacture and use of robots for industrial purposes is related to their operating mechanisms. The experiments and laboratory testing of these devices are analysed in the form of tables to show the statistical side of the technology. This report explains the technological aspects and laboratory experiments that have been advanced to increase knowledge of these digital technologies. This study aims to present an overview of two developing technologies, artificial intelligence (AI) and robotics, and their potential applications. Product variety is a primary characteristic of each of these specialties. In addition, they may be described as disruptive, facilitating, and transdisciplinary.
MOTION PREDICTION USING DEPTH INFORMATION OF HUMAN ARM BASED ON ALEXNET (gerogepatton)
This document describes a study that uses a convolutional neural network model based on AlexNet to predict the drop point of a ball thrown by experimenters, analyzing depth-map images of their arm motion captured by a Kinect sensor. The researchers collected depth-map data from 400 throwing trials, trained and tested their CNN model on this data, and made modifications to improve prediction accuracy, such as adjusting the network scale and loss function and adding regularization. Their best model achieved over 70% accuracy in predicting the drop area of test examples from the motion seen in the first 10 frames of depth maps capturing the throw.
Motion Prediction Using Depth Information of Human Arm Based on Alexnet (gerogepatton)
This document describes a study that uses a convolutional neural network based on AlexNet to predict the drop point of a ball thrown by experimenters, analyzing depth information captured from their arm motion. The study involved collecting depth-map data using a Kinect sensor as experimenters threw balls into different areas. The AlexNet model was modified and trained on this data, with improvements made such as adding regularization to the loss function to reduce overfitting. Experimental results showed the modified model could predict the drop point with improved accuracy compared to the original AlexNet.
The study of attention estimation for child-robot interaction scenarios (journalBEEI)
One of the biggest challenges in human-agent interaction (HAI) is the development of an agent, such as a robot, that can understand its partner (a human) and interact naturally. To realize this, a system (agent) should be able to observe a human well and estimate his/her mental state. Towards this goal, in this paper, we present a method of estimating a child's attention, one of the more important human mental states, in a free-play scenario of child-robot interaction (CRI). To realize attention estimation in such a CRI scenario, we first developed a system that could sense a child's verbal and non-verbal multimodal signals such as gaze, facial expression, proximity, and so on. Then, the observed information was used to train a model based on a Support Vector Machine (SVM) to estimate the human's attention level. We investigated the accuracy of the proposed method by comparing it with a human judge's estimation, and obtained some promising results which we discuss here.
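The SVM step above can be sketched with scikit-learn. The feature names, values, and two-level labels below are illustrative stand-ins, not the study's actual multimodal features or annotation scheme:

```python
import numpy as np
from sklearn.svm import SVC

# Toy training set: each row is one observation window with illustrative
# features [gaze_on_robot_ratio, smile_intensity, proximity_m];
# labels are coarse attention levels (0 = low, 1 = high).
X = np.array([[0.9, 0.8, 0.5],
              [0.8, 0.6, 0.6],
              [0.1, 0.2, 2.5],
              [0.2, 0.1, 2.0]])
y = np.array([1, 1, 0, 0])

clf = SVC(kernel="rbf").fit(X, y)                # train the attention classifier
pred = clf.predict([[0.85, 0.7, 0.55]])          # classify a new observation
```

In practice the labels would come from human judges (as the paper's evaluation suggests), and the features would be fused from the gaze, expression, and proximity sensing described above.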
A verbal fluency test was administered to two groups of students, one group of art students and one group of non-art students. Brain scans using voxel-based morphometry found that art students had more gray matter in the parietal lobe, a region linked to visual imagery and object manipulation. This suggests art training is associated with differences in brain structure related to visual-spatial skills.
Artificial intelligence (AI) has become a loosely used term as a result of adopting an overly generalized representation. The main problem lies in definitions of “intelligence,” which often misrepresent the practical notions the term is meant to indicate.
Human visual search does not maximize the post saccadic probability of identi... (Giuseppe Fineschi)
Researchers have conjectured that eye movements during visual search are selected to minimize the number of saccades. The optimal Bayesian eye movement strategy minimizing saccades does not simply direct the eye to whichever location is judged most likely to contain the target, but makes use of the entire retina as an information-gathering device during each fixation. Here we show that human observers do not minimize the expected number of saccades in planning saccades in a simple visual search task composed of three tokens. In this task, the optimal eye movement strategy varied depending on the spacing between tokens (in the first experiment) or the size of tokens (in the second experiment), and changed abruptly once the separation or size surpassed a critical value. None of our observers changed strategy as a function of separation or size. Human performance fell far short of ideal, both qualitatively and quantitatively.
A Novel Efficient Medical Image Segmentation Methodology (aciijournal)
Image segmentation plays a crucial role in many medical applications. The threshold-based approach is the most common and effective method for medical image segmentation, but it has shortcomings such as high complexity, poor real-time capability, and premature convergence. To address these issues, an improved evolution strategy is proposed for medical image segmentation. Two populations evolve concurrently: one focuses on local search, seeking solutions near the current optimum, while the other, implemented based on chaotic theory, focuses on global search so as to maintain the diversity of individuals and escape local maxima, thereby overcoming the problem of premature convergence. The encoding scheme, fitness function, and evolution operators are also designed. The experimental results validated the effectiveness and efficiency of the proposed approach.
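The abstract does not give the encoding or operators, but the two-population idea can be illustrated on the threshold-selection problem it targets. The sketch below assumes a single-threshold encoding with an Otsu-style between-class-variance fitness, Gaussian mutation for the local population, and a logistic map for the chaotic global population; all of these specifics are assumptions for illustration:

```python
import numpy as np

def between_class_variance(hist, t):
    """Otsu-style fitness for threshold t on a 256-bin histogram."""
    p = hist / hist.sum()
    w0, w1 = p[:t].sum(), p[t:].sum()
    if w0 == 0 or w1 == 0:
        return 0.0
    mu0 = (np.arange(t) * p[:t]).sum() / w0
    mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
    return w0 * w1 * (mu0 - mu1) ** 2

def two_pop_es(hist, gens=30, size=10, rng=None):
    rng = rng or np.random.default_rng(0)
    best = int(rng.integers(1, 255))
    chaos = rng.random(size)                     # state of the chaotic population
    for _ in range(gens):
        # Local population: Gaussian perturbations around the current best.
        local = np.clip(best + rng.normal(0, 8, size), 1, 255).astype(int)
        # Global population: logistic map keeps individuals spread out.
        chaos = 4.0 * chaos * (1.0 - chaos)
        globl = np.clip((chaos * 254 + 1).astype(int), 1, 255)
        for t in np.concatenate([local, globl]):
            if between_class_variance(hist, t) > between_class_variance(hist, best):
                best = int(t)
    return best

# Bimodal toy histogram: dark tissue peak near 50, bright peak near 200.
hist = np.zeros(256)
hist[40:60] = 100
hist[190:210] = 100
t = two_pop_es(hist)
```

The chaotic population keeps proposing thresholds far from the current best, which is what lets the search escape a local maximum inside one intensity peak.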
The document describes a project to develop an interface using a Kinect sensor and the Unity game engine to motivate neurorehabilitation through a game. The game aims to quantify irregular hand movements in patients by having them grab and move objects. Prior work showed games can aid rehabilitation, but marker-based systems have downsides. The project captures motion markerlessly, using Kinect to track arm movements that navigate a ball into targets of varying sizes for adaptive difficulty. Future work includes analyzing data to track patient progress.
HSV Brightness Factor Matching for Gesture Recognition System (CSCJournals)
The main goal of gesture recognition research is to establish a system which can identify specific human gestures and use them to command a machine. In this paper, we introduce a new method for gesture recognition based on computing the local brightness of each block of the gesture image. The gesture image is divided into 25x25 blocks, each of 5x5 pixels, and the local brightness of each block is calculated, so each gesture produces 25x25 feature values. Our experiments show that more than 60% of these features are zero-valued, which leads to minimal storage space. The brightness value is calculated from the HSV (Hue, Saturation, Value) color model used for the segmentation operation. The recognition rate achieved is 91% using 36 training gestures and 24 different testing gestures. This paper focuses on hand gestures instead of whole-body movement, since hands are the most flexible part of the body and can convey the most meaning. We build a gesture recognition system that can communicate with the machine in a natural way, without any mechanical devices and without using the normal input devices (keyboard and mouse); mathematical equations serve as the translator between the gestures and the telerobotic system.
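The block-brightness feature extraction above is easy to reproduce. With 25x25 blocks of 5x5 pixels, the V channel is presumably 125x125; that image size is inferred, not stated:

```python
import numpy as np

def block_brightness(value_channel):
    """Local brightness features: split a 125x125 V (brightness) channel
    from the HSV image into 25x25 blocks of 5x5 pixels, and take the
    mean brightness of each block as one feature (625 features total)."""
    v = value_channel.reshape(25, 5, 25, 5)      # (block_row, y, block_col, x)
    return v.mean(axis=(1, 3))                   # per-block mean, shape (25, 25)

v = np.zeros((125, 125))
v[:5, :5] = 1.0                                  # only the top-left block is bright
feats = block_brightness(v)
```

On a segmented gesture image most blocks fall on the dark background, which is consistent with the paper's observation that over 60% of the features are zero.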
A STOCHASTIC STATISTICAL APPROACH FOR TRACKING HUMAN ACTIVITY (Zac Darcy)
This document summarizes a research paper on tracking human activity using a stochastic statistical approach. The paper proposes using covariance matrices to represent image regions for feature extraction, rather than traditional histogram-based methods. An improved mathematical model is developed using covariance matrices that is capable of more accurate and faster human/object tracking compared to existing histogram and other approaches. The accuracy of the new mathematical detection model is approximately 94.3% compared to 89.1% for conventional models, based on evaluation using publicly available datasets. The approach uses integral images to quickly compute covariances over regions of interest for efficient tracking.
A novel enhanced algorithm for efficient human tracking (IJICTJOURNAL)
This paper introduces a novel enhanced algorithm for efficient human tracking based on background subtraction. The algorithm operates in four steps: 1) identifying a fixed background and removing noise, 2) subtracting the background from movable objects, 3) filtering the image to remove shadows and noise, and 4) separating and tracking movable objects using bubble routing. Experimental results found that the proposed algorithm improved motion and trajectory estimation of objects in terms of speed and accuracy compared to previous methods. The algorithm provides a fast and accurate way to track humans which has applications in video surveillance, human action recognition, and human-computer interaction.
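The first three steps of the four-step algorithm above (fixed background, subtraction, filtering) can be sketched with NumPy; the "bubble routing" of step 4 is not specified, so a simple foreground centroid stands in for the per-object tracking. All thresholds and sizes are illustrative:

```python
import numpy as np

def track_moving_object(frames, thresh=30):
    """Steps 1-3 of the pipeline on a grayscale frame stack:
    estimate a fixed background (per-pixel median over the stack),
    subtract it from the current frame, and threshold to suppress
    noise; the centroid of the resulting mask stands in for step 4."""
    frames = np.asarray(frames, dtype=np.int16)
    background = np.median(frames, axis=0)       # step 1: fixed background
    diff = np.abs(frames[-1] - background)       # step 2: background subtraction
    mask = diff > thresh                         # step 3: noise/shadow threshold
    ys, xs = np.nonzero(mask)
    centroid = (ys.mean(), xs.mean()) if len(ys) else None
    return mask, centroid

# Static scene; an 8x8 bright square appears only in the last frame.
frames = [np.zeros((64, 64)) for _ in range(9)]
last = np.zeros((64, 64))
last[10:18, 20:28] = 255
frames.append(last)
mask, centroid = track_moving_object(frames)
```

The per-pixel median is robust to transient moving objects in the history, which is why it serves as a reasonable "fixed background" estimate for step 1.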
This document provides an overview of mind reading computer technology. It discusses how computational models of mind reading can infer mental states from facial signals using techniques like facial affect detection and emotional classification. The technology works by measuring blood volume and oxygen levels in the brain using functional near-infrared spectroscopy sensors. Current applications include predicting driver drowsiness or anger, controlling animations, and enabling silent web searches. While the technology shows promise, challenges remain in scaling the techniques for conversational speech recognition and addressing privacy and ethical implications.
I developed a Convolutional Neural Network using Python. This particular CNN can identify the correct individual based solely on a photo, using facial recognition.
This document discusses a study that proposes a framework for classifying brain tumors using an ensemble of deep features extracted from pre-trained convolutional neural networks (CNNs) and machine learning (ML) classifiers. The framework uses 13 pre-trained CNNs to extract deep features from magnetic resonance imaging (MRI) brain images, which are then evaluated, and the top 3 feature sets are selected using 9 ML classifiers. The selected features are concatenated to create an ensemble feature, which is classified using the ML classifiers. The study evaluates this approach on 3 brain MRI datasets with different numbers of classes. Experimental results show that ensembling deep features can improve performance significantly, and support vector machines generally perform best, especially on larger datasets.
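The concatenate-then-classify step of that framework can be sketched as follows. The random arrays stand in for deep features from three pre-trained CNNs; the dimensions, sample count, and synthetic labels are illustrative only:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-ins for deep features from three selected pre-trained CNNs:
# one feature vector per MRI image (dimensions are illustrative).
n = 40
feats_a = rng.normal(size=(n, 8))
feats_b = rng.normal(size=(n, 16))
feats_c = rng.normal(size=(n, 32))
labels = (feats_a[:, 0] + feats_b[:, 0] + feats_c[:, 0] > 0).astype(int)

# The ensemble feature is a concatenation of the selected feature sets,
# then classified with an SVM (the study's best-performing classifier).
ensemble = np.concatenate([feats_a, feats_b, feats_c], axis=1)
clf = SVC(kernel="linear").fit(ensemble, labels)
preds = clf.predict(ensemble)
```

Concatenation lets the classifier weigh complementary representations from different backbone networks, which is the intuition behind the reported performance gain.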
Scene Description From Images To Sentences (IRJET Journal)
This document presents an approach for generating sentences to describe images using distributed intelligence. It involves detecting objects in images using YOLO detection, finding relative positions of objects, labeling background scenes, generating tuples of objects/scenes/relations, extracting candidate sentences from Wikipedia containing tuple elements, searching images for each sentence and selecting the sentence whose images most closely match the input image. The approach is compared to the Babytalk model using BLEU and ROUGE scores, showing comparable performance. Future work to improve object detection and use larger knowledge sources is discussed.
1) The document discusses methods for counting people in crowded environments using computer vision techniques. It divides the main approaches into low-level analysis, foreground segmentation models, motion models, shape models, and multi-target tracking.
2) The author's approach uses a shape model based on head and shoulder detection combined with a uniform motion model from optical flow to generate probability maps for head detections. Detections are associated across frames and validated via tracking to reduce false positives and provide a person count.
3) Experimental results on standard video sequences demonstrate the method provides person counts comparable to state-of-the-art while also enabling tracking of individuals in crowded scenes.
The document provides information on various topics related to artificial intelligence including:
- Examples of intelligence such as solving puzzles, performing complex math problems quickly, and following rules.
- Definitions of AI from early researchers such as John McCarthy who coined the term, and descriptions of AI as the study of intelligent behavior in machines.
- Key areas of AI research and applications such as game playing, reasoning, learning, robotics, and machine learning.
- Approaches to problem solving in AI like state space search, knowledge representation, and using heuristics to guide searches.
BRAIN TUMOR’S DETECTION USING DEEP LEARNING (IRJET Journal)
This document describes a study on detecting brain tumors using deep learning. The researchers created a brain tumor dataset from MR images and used techniques like CNNs, NNs, and k-fold cross-validation to detect brain tumors in the images. K-fold validation improved the accuracy of the CNN model by testing it using various evaluation metrics. The proposed framework achieved good results, showing that k-fold validation is more effective than other approaches at focusing independently on the areas needed to identify brain tumors.
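Assuming the validation scheme mentioned above is standard k-fold cross-validation, it can be sketched with scikit-learn (the data here is a placeholder, not the MRI dataset, and the study's model and metrics are not reproduced):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)                 # 10 placeholder samples
y = np.array([0, 1] * 5)

# 5-fold split: each sample lands in the test fold exactly once, so
# every evaluation metric is computed on data the model never trained on.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
test_counts = np.zeros(10, dtype=int)
for train_idx, test_idx in kf.split(X):
    # model.fit(X[train_idx], y[train_idx]) and evaluation would go here
    test_counts[test_idx] += 1
```

Averaging the metric over the k test folds gives a less optimistic and less variable accuracy estimate than a single train/test split, which matches the improvement the study attributes to this validation.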
The document discusses an artificial intelligence course, outlining 5 units that will cover topics like problem solving through search algorithms, propositional logic, knowledge representation, planning, and learning from uncertainty. The goals of the course are for students to understand concepts like state space representation, heuristic search techniques, and applying AI techniques to problems involving games, machine learning and more. The course will examine algorithms, knowledge representation methods, and applications of AI in areas such as games, theorem proving and machine learning.
Vision Based Gesture Recognition Using Neural Networks Approaches: A Review (Waqas Tariq)
The aim of gesture recognition research is to create systems that easily identify gestures and use them for device control or to convey information. In this paper, we discuss research done in the area of hand gesture recognition based on Artificial Neural Network approaches. Several hand gesture recognition studies that use neural networks are discussed, comparisons between these methods are presented, advantages and drawbacks of the discussed methods are included, and the implementation tools for each method are presented as well.
soft computing BTU MCA 3rd SEM unit 1 .pptx (naveen356604)
This document discusses hard computing and soft computing. Hard computing uses deterministic algorithms and mathematical models to produce accurate and predictable results, while soft computing can handle imprecision, uncertainty, and ambiguity. Soft computing techniques include fuzzy logic, neural networks, genetic algorithms, probabilistic reasoning, and evolutionary computation. These techniques aim to mimic human-like reasoning by tolerating uncertainty, learning and adapting, and integrating multiple methods. Examples of evolutionary computation algorithms provided are genetic algorithms, genetic programming, evolutionary strategies, differential evolution, and particle swarm optimization. Neural networks, ant colony optimization, and fuzzy logic are also summarized.
Identifier of human emotions based on convolutional neural network for assist... (TELKOMNIKA JOURNAL)
This paper proposes a solution for the problem of continuous, real-time prediction of the emotional state of a human user from the identification of characteristics in facial expressions. In robots whose main task is the care of people (children, the sick, or the elderly), it is important to maintain a close man-machine relationship and a rapid response of the robot to the actions of the person under care. We propose to increase the level of intimacy of the robot, and its response to specific situations of the user, by identifying in real time the emotion reflected by the person's face. This solution is integrated with the research group's algorithms for tracking people, for use on an assistant robot. The strategy involves two processing stages: the first detects faces using HOG and a linear SVM, while the second identifies the emotion in the face using a CNN. The strategy was fully tested in the laboratory on our robotic platform, demonstrating high performance with low resource consumption. Through various controlled laboratory tests with different people, who forced a certain emotion on their faces, the scheme was able to identify the emotions with a success rate of 92%.
Deep learning for pose-invariant face detection in unconstrained environment (IJECEIAES)
In the recent past, convolutional neural networks (CNNs) have seen a resurgence and have performed extremely well on vision tasks. Visually, the model resembles a series of layers, each of which is processed by a function to form the next layer. It is argued that a CNN first models low-level features such as edges and joints and then expresses higher-level features as compositions of these low-level features. The aim of this paper is to detect multi-view faces using a deep convolutional neural network (DCNN). Implementation, detection, and retrieval of faces are obtained with the help of direct visual matching technology. Further, the probabilistic measure of the similarity of the face images is computed using Bayesian analysis. The experiment detects faces with ±90 degree out-of-plane rotations. A fine-tuned AlexNet is used to detect pose-invariant faces. For this work, we extracted training examples from the AFLW (Annotated Facial Landmarks in the Wild) dataset, which involves 21K images with 24K face annotations.
Levelised Cost of Hydrogen (LCOH) Calculator Manual (Massimo Talia)
The aim of this manual is to explain the methodology behind the Levelised Cost of Hydrogen (LCOH) calculator. Moreover, this manual also demonstrates how the calculator can be used for estimating the expenses associated with hydrogen production in Europe using low-temperature electrolysis, considering different sources of electricity.
More Related Content
AN APPLICATION OF CONVOLUTIONAL NEURAL NETWORKS ON HUMAN INTENTION PREDICTION

International Journal of Artificial Intelligence & Applications (IJAIA) Vol.10, No.5, September 2019
DOI: 10.5121/ijaia.2019.10501

Lin Zhang1, Shengchao Li2, Hao Xiong2, Xiumin Diao2 and Ou Ma1

1 Department of Aerospace Engineering and Engineering Mechanics, University of Cincinnati, Cincinnati, Ohio, USA
2 School of Engineering Technology, Purdue University, West Lafayette, Indiana, USA
ABSTRACT

Due to the rapidly increasing need for human-robot interaction (HRI), more intelligent robots are in demand. However, the vast majority of robots can only follow strict instructions, which seriously restricts their flexibility and versatility. A critical fact that degrades the experience of HRI is that robots cannot understand human intentions. This study aims at improving robotic intelligence by training a robot to understand human intentions. Different from previous studies that recognized human intentions from distinctive actions, this paper introduces a method to predict human intentions before a single action is completed. An experiment of throwing a ball towards designated targets is conducted to verify the effectiveness of the method. The proposed deep-learning-based method proves the feasibility of applying convolutional neural networks (CNN) under a novel circumstance. Experiment results show that the proposed CNN-vote method outcompetes three traditional machine learning techniques. In the current context, the CNN-vote predictor achieves the highest testing accuracy with relatively less data needed.
KEYWORDS
Human-robot Interaction; Intention Prediction; Convolutional Neural Networks
1. INTRODUCTION
Robots are getting involved in many aspects of our lives, e.g. medication, education, housekeeping, rescuing, etc. They are expected to work as collaborators and/or assistants of human beings [1]–[4]. In such a context, human-robot interaction (HRI) becomes inevitable. Understanding human intentions is essential for HRI, and researchers have reported how HRI tasks can benefit from estimating human intentions [5]–[7]. Among all the intention indicators, human behaviours or motions are rich in intention information. Indeed, human intention inference through motions has been introduced in several HRI instances. Kim et al. proposed to use the cycle of action and a neural network module to classify 4 categories of human intentions [8]. Song et al. suggested using a probabilistic method to recognize 4 classes of human intentions based on their hand movement. They even provided optimized strategies for a robot that was involved in the same task [9]. In such studies, one type of human motion can only result in one intention. Thus the human intentions can be easily distinguished by recognizing body movement. In other scenarios, various intentions are hidden behind a single motion type, and are therefore more difficult to estimate. Vasquez et al. proposed hidden Markov models (HMMs) to predict human intentions during walking [10]. Ziebart et al. employed an optimal control approach to predict pointing intentions of computer users [11]. Under some extreme circumstances, human intention needs to be predicted ahead of time. Wang et al. introduced a
table tennis robot with probabilistic algorithms, which was able to predict the target of the ball according to the human player's hits [12]. In this research, predicting a table tennis player's intention requires multiple high-speed, high-resolution cameras and a fast-response robotic manipulator. Unfortunately, these apparatuses are not budget friendly and hence the experiment cannot be easily replicated. In recent years, deep learning technologies have been developing drastically. Among them, convolutional neural networks (CNN) show dominating power in image pattern recognition tasks [13]. CNN has also played important roles in recognizing human actions [14], [15] and in human-robot interaction tasks [16], [17]. However, most previous studies equate recognizing human intentions with distinguishing human actions; limited studies were able to find the subtle changes in body motion that lead to a different intention.
In this research, we propose a CNN-based solution, CNN-vote, which predicts the human intention behind a single action. In our previous research, we demonstrated that binary intentions behind a human action can be predicted by machine learning (ML) algorithms [18]. However, when the set of human intentions expanded, traditional ML experienced difficulties with such a task. With CNN's powerful pattern recognition capability, we hypothesize that CNN-vote can predict one of nine human intentions from the motion of tossing a ball. We propose to train our CNN-vote predictor using RGB images obtained from an RGB-D camera, labeling the images with the human pitchers' intended targets. The performance of the CNN-vote human intention predictor and its comparison with ML algorithms will be introduced later.
The rest of this article is arranged as follows. In section 2, the research methods are introduced, including the experiment details, the principle of CNN-vote, and the mechanisms of the three involved ML algorithms. The experiment results are revealed in section 3. Some interesting and inspiring facts are discussed in section 4. In the Appendix, more experiment results are demonstrated for the sake of comparison and analysis.
2. METHOD
2.1. A Thought Experiment
Let us imagine a pitch-and-block game in a 2D plane as shown in Figure 1. A human pitcher throws a ball towards an unlimited-length rod hinged by a revolute joint in front of him/her. The goal of the hinged rod is to prevent the ball from passing the horizontal line. We have the following assumptions: 1) the ball and the rod are both moving in the same 2D plane; 2) the ball is moving at a constant speed 𝑣 and the rod is driven at a constant angular velocity 𝜔; 3) the initial distance between the ball and the horizontal line is 𝑑; 4) the angle between the ball's movement direction and the vertical line is 𝛼, and the angle between the rod's initial position and the vertical line is 𝜃 = 0; 5) the rod starts to rotate when the distance between the ball and the horizontal line is 𝑑𝑓 ∈ [0, 𝑑].

The condition of a successful block is revealed in (1). When 𝑣 and 𝛼 are predetermined, an appropriate 𝑑𝑓 should be decided to satisfy this equation so that a successful block is guaranteed. When 𝑣 is increased, an unchanged 𝑑𝑓 may no longer satisfy (1). In this case we will have to increase 𝑑𝑓 to maintain (1), which means the faster the ball is thrown, the earlier the rod needs to be moved. In other words, we need to predict the ball pitcher's intention of throwing the ball either to his/her left or right before the minimal 𝑑𝑓 is reached.
International Journal of Artificial Intelligence & Applications (IJAIA) Vol.10, No.5, September 2019
𝜔 · 𝑑_𝑓 / (𝑣 cos 𝛼) ≥ 𝛼 (1)
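The blocking condition can be illustrated numerically. The sketch below computes the smallest 𝑑_𝑓 that still allows a block, under the assumption that the rod must sweep through the angle 𝛼 before the ball, closing vertically at speed 𝑣 cos 𝛼, crosses the horizontal line; the numeric values of 𝑣, 𝜔, and 𝛼 are hypothetical:

```python
import math

def min_d_f(v, omega, alpha):
    """Smallest release distance d_f that still allows a block,
    assuming the rod must sweep through angle alpha (rad) before
    the ball, moving at speed v, crosses the horizontal line."""
    # The ball's vertical closing speed is v*cos(alpha); the rod
    # needs alpha/omega seconds to reach the blocking position.
    return v * math.cos(alpha) * alpha / omega

# A faster ball requires an earlier rod start, i.e. a larger d_f:
slow = min_d_f(v=5.0, omega=2.0, alpha=math.pi / 6)
fast = min_d_f(v=10.0, omega=2.0, alpha=math.pi / 6)
assert fast > slow
```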
In our previous research [18], we successfully predicted binary human intentions with three ML
algorithms (kNN, SVM, and MLP). However, as the number of human intentions increased,
predictors based on traditional ML algorithms ran into difficulties.
Figure 1 Thought experiment: game of pitch-and-block
2.2. CNN-vote human intention predictor
We would like to introduce a convolutional neural network (CNN) based human intention
predictor. In this research 9 intentions are assigned to the action of pitching a ball. The 9
intentions are associated with 9 targets in front of the pitcher. Before each pitch, the human
pitcher announces the target he/she intends to hit. The images of the pitcher's motion are
recorded and labelled with his/her intention, and the CNN-vote human intention predictor is
trained on the labelled images. The data collection and predictor training procedure is shown in
Figure 2. The CNN-vote predictor is composed of two parts: AlexNet and a voter. AlexNet is
responsible for extracting features from the images according to the specific intention [19].
Since an intention is predicted from several images, the voter selects the most popular
intention among the classified frames.
Figure 2 Workflow of using learning-based methods to predict the human pitcher's intention
2.2.1 AlexNet Structure
The architecture of the CNN-vote predictor is illustrated in Figure 3, which reveals the role of
AlexNet. AlexNet takes in one image at a time and outputs the predicted human intention. The
overall network contains eight layers: the first five are convolutional layers and the last three
are fully-connected dense layers. The input of the network is a color image with a dimension of
224 × 224 × 3 pixels. The output of the last dense layer is fed to a 9-way softmax function which
produces a distribution over the 9 intention labels. The first layer filters the 224 × 224 × 3
input image with 96 kernels of size 11 × 11 × 3 with a stride of 4. The output of the first
convolutional layer is response-normalized and overlapping-pooled. The second convolutional
layer takes the output of the first as its input and filters it with 256 kernels of size 5 × 5 × 96
with a stride of 1. The output of the second convolutional layer is also response-normalized and
overlapping-pooled with the same settings as the first layer. The third convolutional layer uses
384 kernels of size 3 × 3 × 256 with a stride of 1, the fourth 384 kernels of size 3 × 3 × 384 with
a stride of 1, and the fifth 256 kernels of size 3 × 3 × 384 with a stride of 1. There is no
intervening normalization or pooling operation after the third, fourth, or fifth convolutional
layer. The first and second fully-connected layers have 4096 neurons each. The last layer has 9
neurons, equal to the number of possible human intentions. A total of 58,318,217 parameters
are included in our CNN-vote predictor.
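The parameter count follows from the layer specifications above. The short script below, assuming the standard AlexNet 6 × 6 × 256 feature map entering the first dense layer, reproduces the stated total:

```python
# (kernel_h, kernel_w, in_channels, out_channels) for the five conv layers
convs = [
    (11, 11, 3, 96),
    (5, 5, 96, 256),
    (3, 3, 256, 384),
    (3, 3, 384, 384),
    (3, 3, 384, 256),
]
# (inputs, outputs) for the three dense layers; 6*6*256 assumes the
# standard AlexNet feature-map size after the last pooling layer
denses = [(6 * 6 * 256, 4096), (4096, 4096), (4096, 9)]

# weights + one bias per output channel / neuron
total = sum(h * w * cin + 1 for h, w, cin, cout in convs for _ in range(cout))
total += sum(i * o + o for i, o in denses)
print(total)  # 58318217, matching the count stated above
```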
2.2.2 Voter
For each pitching trial, N frames of color images, {𝑥⃗_1, …, 𝑥⃗_i, …, 𝑥⃗_N}, serve as input data to
AlexNet frame by frame. These frames are classified by AlexNet as {𝑦_1, …, 𝑦_i, …, 𝑦_N}, where
𝑦_i ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9} is the classified label of the ith frame. Each number indicates a
corresponding target, and the target layout is shown in Figure 3. Due to the low threshold of
body movement detection, the initiation of the pitch is detected early in the trial. Thus, a
heuristic discount weight γ is applied to the classification results, and the prediction of the
trial is drawn as
ŷ = argmax_{c ∈ {1, …, 9}} Σ_{i=1}^{N} γ^(N−i) · 1(𝑦_i = c) (2)
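A minimal sketch of the voter follows. It assumes the discount weights later frames more heavily (γ^(N−i) for the ith of N frames), on the reasoning that frames closer to the release are more distinctive; the exact weighting scheme is a design choice:

```python
from collections import defaultdict

def vote(frame_labels, gamma):
    """Pick the trial-level intention from per-frame AlexNet labels.
    Frame i (1-indexed) contributes weight gamma**(N - i), so with
    gamma < 1 the later, more distinctive frames count more."""
    n = len(frame_labels)
    scores = defaultdict(float)
    for i, label in enumerate(frame_labels, start=1):
        scores[label] += gamma ** (n - i)
    return max(scores, key=scores.get)

print(vote([3, 3, 5, 5], gamma=0.5))  # 5: the later frames dominate
print(vote([3, 3, 3, 5], gamma=1.0))  # 3: reduces to a plain majority vote
```

With γ = 1 the voter degenerates to an unweighted majority vote over the N frames.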
Figure 3 Architecture of CNN-vote human intention predictor
2.3. Experiment
2.3.1. Experiment settings
The whole experiment includes a human participant as the pitcher, a Microsoft Kinect V2
RGB-D camera as the sensor, a whiteboard as the target, and a computer as the data receiver
and processor. The experiment configuration is shown in Figure 4. A reference coordinate
system is set up with its origin fixed to the color camera of the Kinect V2 sensor. The y axis
points upward vertically and the z axis points outward horizontally, so the x axis is determined
by the right-hand rule. The target area on the whiteboard is 1.2 meters wide and 1.2 meters
high, with its center located at (0, 0.7, -0.7) meters. Each target is a square patch with a side
length of 0.4 meters. Only one human participant is involved in all the experiments, serving as
both the ball pitcher and the target recorder. This is acceptable because the method trains a
robot to understand one specific person's intentions; for a different person, the predictor has to
be re-trained. The sensor is placed 0.8 meters above the floor, and the pitcher stands 4.2
meters in front of the sensor along the positive z axis. An in-house Matlab program retrieves
data from the pitching trials, and the obtained data is pre-processed.
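For illustration, the 3 × 3 target layout can be mapped to coordinates in the camera frame. The row-major numbering used below (top-left first, as seen facing the whiteboard) is an assumption for this sketch; the paper's actual target indexing is the one shown in Figure 3:

```python
def target_center(target_id):
    """Center (x, y, z) in meters of target 1..9 on the whiteboard.
    The 1.2 m x 1.2 m area centered at (0, 0.7, -0.7) splits into a
    3x3 grid of 0.4 m squares; row-major numbering is assumed."""
    row, col = divmod(target_id - 1, 3)
    x = (col - 1) * 0.4          # -0.4, 0.0, or 0.4
    y = 0.7 - (row - 1) * 0.4    # top row is the highest
    return (x, y, -0.7)

print(target_center(5))  # (0.0, 0.7, -0.7): the central target
```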
Figure 4 Experiment configuration
2.3.2. Experiment protocols
To warm up, the pitcher practices throwing the ball to intended targets several times before
the recorded trials, and has to secure at least one successful pitch for every target. The pitcher
also practices the standing pose used before and after the pitch. In each recorded trial,
1. The computer randomly picks a target and informs the pitcher of it before the trial
starts.
2. The pitcher listens to the auditory cue and initiates the throw. The pitcher has to stand
still before initiating the throw. The initial pose is not mandated and can be chosen at
the pitcher's preference.
3. The pitcher throws the ball toward the announced target within 5 seconds. The pitcher
is not allowed to perform any irrelevant action during or after the throw; returning to
the initial pose is recommended.
4. The recorded data is saved unless the pitcher or the computer operator is not satisfied
with the trial. Steps 1 through 4 are repeated until the data collection is finished.
In some trials, the pitcher hits a target that was not intended. We save both the intended target
and the actually hit target; however, in this research we only investigate the trials in which the
intended target matches the actually hit target. Because a large amount of data is required, the
pitcher is allowed to split the experiment into multiple sessions to avoid fatigue.
2.4. Data Processing
A total of 292 pitching trials were recorded, each containing 5 seconds of data. The Kinect V2
camera works at a frequency of 30 Hz, hence each trial contains 150 data points. A data point is
either a 75-dimensional (25 × 3) vector when it is in the form of joint positions, or a tensor of
shape 224 × 224 × 3 when it is in the form of a color image. The joint position data represents
the 3-D coordinates of the 25 joints of the skeleton tracking model [20]. Every data point is
assigned one of nine possible labels according to the pitching result of the trial.
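The raw sizes follow directly from these numbers; the snippet below, purely illustrative, checks the frame count and shows how a flat 75-value joint vector regroups into 25 (x, y, z) triples:

```python
frames_per_trial = 5 * 30            # 5 s at 30 Hz -> 150 frames
assert frames_per_trial == 150

# A joint-position data point: 25 joints x 3 coordinates = 75 values
flat = list(range(75))               # stand-in for one recorded frame
joints = [flat[j * 3 : j * 3 + 3] for j in range(25)]  # 25 (x, y, z) triples
assert len(joints) == 25 and len(joints[0]) == 3
```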
The whole dataset is divided into two groups: 256 trials were used for training and the
remaining 36 trials were reserved for testing. In the training dataset, intention #1 has 28 trials;
intention #2 has 30 trials; intention #3 has 26 trials; intention #4 has 31 trials; intention #5 has 28 trials;
intention #6 has 27 trials; intention #7 has 27 trials; intention #8 has 31 trials, and intention #9
has 28 trials. In the test dataset, each intention has 4 trials.
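The split above can be sanity-checked in a few lines:

```python
# Per-intention trial counts as listed in the text
train_counts = {1: 28, 2: 30, 3: 26, 4: 31, 5: 28, 6: 27, 7: 27, 8: 31, 9: 28}
test_counts = {i: 4 for i in range(1, 10)}   # 4 test trials per intention

assert sum(train_counts.values()) == 256     # training trials
assert sum(test_counts.values()) == 36       # testing trials
assert sum(train_counts.values()) + sum(test_counts.values()) == 292  # all recorded trials
```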
Every pitch generates 5 seconds of data comprising 150 frames, a large portion of which is
irrelevant (before and after the pitch). Hence, we need to extract the effective data from the
150 frames, and we propose a heuristic way to do so. First, we detect the pitch initiation in
every trial from the change of the skeleton model. Suppose a pitcher's joint-position vector at
frame t (where 0 ≤ t ≤ 149) is 𝑝⃗_t; then 𝑝⃗_0 stands for the initial joint positions. By calculating
the L2 distance between every 𝑝⃗_t and 𝑝⃗_0, we can track how the pitcher's body moves relative
to his initial pose and obtain a body-movement curve as shown in Figure 5. This curve is
scanned with a 20-frame window: if the curve increases monotonically and all values in the
window are greater than a threshold of 1, the first frame of the window is defined as the
cutting-start frame. We cut out the data from the cutting-start frame up to the 40th frame after
it, and form eight datasets based on subsets of these 40 frames: the first 5, 10, 15, 20, 25, 30,
35, and all 40 frames of data. A good predictor is expected to predict the intention of the
pitching trial with high accuracy while recruiting few frames of data. The kNN and SVM
predictors are trained with the Python module scikit-learn [21]; the MLP and CNN-vote
predictors are trained with the Python library TensorFlow [22].
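The initiation-detection heuristic can be sketched as follows; the movement curve is the L2 distance of each frame's joint vector from the first frame, and the window size and threshold follow the values above (the synthetic curve is illustrative only):

```python
import math

def movement_curve(joint_frames):
    """L2 distance of each 75-D joint vector from the initial frame."""
    p0 = joint_frames[0]
    return [math.dist(p, p0) for p in joint_frames]

def cutting_start(curve, window=20, threshold=1.0):
    """First frame of the earliest window in which the curve increases
    monotonically and every value exceeds the threshold."""
    for t in range(len(curve) - window + 1):
        w = curve[t : t + window]
        if all(a < b for a, b in zip(w, w[1:])) and min(w) > threshold:
            return t
    return None  # no pitch detected in this trial

# A flat pre-pitch phase followed by steadily growing body movement:
curve = [0.0] * 30 + [1.05 + 0.1 * k for k in range(40)]
print(cutting_start(curve))  # 30: the frame where the pitch begins
```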
Figure 5 Pitch initiation detection and dataset segmentation
3. EXPERIMENTAL RESULTS
We trained four types of predictors using two types of data on eight sub-segmented datasets.
Figure 6 illustrates the performance of all predictors with respect to the different data formats
and the various segmented datasets. The CNN-vote predictor achieved its best testing accuracy
of 0.7222 when trained with 35 frames of color image data, and it won all the competitions
against the other predictors no matter what kind of data they relied on. Moreover, CNN-vote
claimed a 0.5833 test accuracy with only 5 frames of color images, which was surprising and
ran against the trend that a more accurate predictor needs more data. The kNN, SVM and MLP
predictors showed better performance when fed with joint data than with color image data. The
SVM predictor achieved an accuracy above 0.5 when 30 or more frames of joint data were fed.
We can take a closer look at the CNN-vote predictor in Figure 7(a). The much higher training
accuracies indicate overfitting; in fact, judging from all the other predictors in the Appendix,
overfitting was a global issue in this research. Fortunately, our later work showed that data
augmentation suppresses the overfitting while slightly boosting the performance of the
CNN-vote predictor [23]. Figure 7(b) illustrates the confusion matrix of the CNN-vote
predictions on the test dataset. We found that even the mis-predicted intentions were adjacent
to the true intentions, whereas, as shown in the Appendix, the second-best predictor (SVM)
made some mis-predictions two blocks away from the true intentions.
Figure 6 Performance of all types of predictors using different input data format, trained and tested on
different datasets. Orange bars indicate predictors trained with color image data, white bars indicate
predictors trained with joint data.
(a) CNN-vote: training vs. testing accuracies; (b) CNN-vote: confusion matrix
Figure 7 Performance of CNN-vote human intention predictor.
4. CONCLUSION
This research proposed a novel application of convolutional neural networks, CNN-vote, which
predicts a human pitcher's intention of throwing a ball to a specific target among various
targets. The CNN-vote predictor outcompetes the traditional ML predictors and achieves the
highest test accuracy with relatively little information fed. Although CNNs are frequently
applied to classify images or videos containing various human activities, this study takes a new
angle, introducing CNNs to further understand a single human action by subdividing it to learn
the subtle pattern changes behind the action.
As AlexNet only investigates spatial features in images, temporal features of the whole action
should be considered in the future. Deep neural networks combining CNNs with recurrent
neural networks (RNNs) have been reported with encouraging results [24]. We have not
introduced RNNs into this study at the current stage due to limited computational resources;
however, a deeper predictor combining a CNN with an RNN is already in our blueprint. As more
advanced deep learning architectures, such as VGG-19 [25] and Inception-V4 [26], were
invented after AlexNet was first published in 2012, we look forward to upgrading AlexNet to a
more advanced architecture. In addition, the overfitting problem requires more data to be
collected, data augmentation, and further tuning of the predictors. Besides human intentions,
human activities are also worth predicting [27]–[29], and we expect the framework of our
human intention predictor to be applied to such fields as well. Although our current
action-detection techniques are rudimentary, we are making progress towards building a more
intelligent robot and will keep improving these techniques in the next stages of research.
REFERENCES
[1] J. Forlizzi and C. DiSalvo, “Service robots in the domestic environment,” in Proceeding of the 1st
ACM SIGCHI/SIGART conference on Human-robot interaction - HRI ’06, 2006, p. 258.
[2] J. Bodner, H. Wykypiel, G. Wetscher, and T. Schmid, “First experiences with the da Vinci™
operating robot in thoracic surgery,” Eur. J. Cardio-Thoracic Surg., vol. 25, no. 5, pp. 844–851,
May 2004.
[3] M. J. Micire, “Evolution and field performance of a rescue robot,” J. F. Robot., vol. 25, no. 1–2, pp.
17–30, Jan. 2008.
[4] F. Mondada et al., “The e-puck, a Robot Designed for Education in Engineering,” in Robotics,
2009, vol. 1, no. 1, pp. 59–65.
[5] K. Wakita, J. Huang, P. Di, K. Sekiyama, and T. Fukuda, “Human-Walking-Intention-Based Motion
Control of an Omnidirectional-Type Cane Robot,” IEEE/ASME Trans. Mechatronics, vol. 18, no. 1,
pp. 285–296, Feb. 2013.
[6] K. Sakita, K. Ogawam, S. Murakami, K. Kawamura, and K. Ikeuchi, “Flexible cooperation between
human and robot by interpreting human intention from gaze information,” in 2004 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566),
2004, vol. 1, pp. 846–851.
[7] Z. Wang, A. Peer, and M. Buss, “An HMM approach to realistic haptic human-robot interaction,” in
World Haptics 2009 - Third Joint EuroHaptics conference and Symposium on Haptic Interfaces for
Virtual Environment and Teleoperator Systems, 2009, pp. 374–379.
[8] S. Kim, Z. Yu, J. Kim, A. Ojha, and M. Lee, “Human-Robot Interaction Using Intention
Recognition,” in Proceedings of the 3rd International Conference on Human-Agent Interaction,
2015, pp. 299–302.
[9] D. Song et al., “Predicting human intention in visual observations of hand/object interactions,” in
2013 IEEE International Conference on Robotics and Automation, 2013, pp. 1608–1615.
[10] D. Vasquez, T. Fraichard, O. Aycard, and C. Laugier, “Intentional motion on-line learning and
prediction,” Mach. Vis. Appl., vol. 19, no. 5–6, pp. 411–425, Oct. 2008.
[11] B. Ziebart, A. Dey, and J. A. Bagnell, “Probabilistic pointing target prediction via inverse optimal
control,” in Proceedings of the 2012 ACM international conference on Intelligent User Interfaces
- IUI ’12, 2012, p. 1.
[12] Z. Wang et al., “Probabilistic movement modeling for intention inference in human–robot
interaction,” Int. J. Rob. Res., vol. 32, no. 7, pp. 841–858, 2013.
[13] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
[14] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, “Large-Scale
Video Classification with Convolutional Neural Networks,” in 2014 IEEE Conference on Computer
Vision and Pattern Recognition, 2014, pp. 1725–1732.
[15] X. Wang, L. Gao, P. Wang, X. Sun, and X. Liu, “Two-Stream 3-D convNet Fusion for Action
Recognition in Videos With Arbitrary Size and Length,” IEEE Trans. Multimed., vol. 20, no. 3, pp.
634–644, Mar. 2018.
[16] P. Barros, C. Weber, and S. Wermter, “Emotional expression recognition with a cross-channel
convolutional neural network for human-robot interaction,” in 2015 IEEE-RAS 15th International
Conference on Humanoid Robots (Humanoids), 2015, pp. 582–587.
[17] A. H. Qureshi, Y. Nakamura, Y. Yoshikawa, and H. Ishiguro, “Show, attend and interact:
Perceivable human-robot social interaction through neural attention Q-network,” in 2017 IEEE
International Conference on Robotics and Automation (ICRA), 2017, pp. 1639–1645.
[18] L. Zhang, X. Diao, and O. Ma, “A Preliminary Study on a Robot’s Prediction of Human Intention,”
7th Annu. IEEE Int. Conf. CYBER Technol. Autom. Control. Intell. Syst., pp. 1446– 1450, 2017.
[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional
Neural Networks,” Adv. Neural Inf. Process. Syst. (NIPS 2012), pp. 1097–1105, 2012.
[20] J. Shotton et al., “Real-time human pose recognition in parts from single depth images,” in CVPR
2011, 2011, vol. 411, pp. 1297–1304.
[21] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” J. Mach. Learn. Res., vol. 12,
pp. 2825–2830, 2011.
[22] M. Abadi et al., “TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed
Systems,” 2016.
[23] S. Li, L. Zhang, and X. Diao, “Improving Human Intention Prediction Using Data Augmentation,”
in HRI 2018 WORKSHOP ON SOCIAL HUMAN-ROBOT INTERACTION OF HUMAN-CARE
SERVICE ROBOTS, 2018.
[24] J. Donahue et al., “Long-term recurrent convolutional networks for visual recognition and
description,” in Proceedings of the IEEE conference on computer vision and pattern recognition,
2015, pp. 2625–2634.
[25] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image
Recognition,” arXiv preprint arXiv:1409.1556, Sep. 2014.
[26] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4, Inception-ResNet and the Impact
of Residual Connections on Learning,” arXiv preprint arXiv:1602.07261, Feb. 2016.
[27] P. Munya, C. A. Ntuen, E. H. Park, and J. H. Kim, “A BAYESIAN ABDUCTION MODEL FOR
EXTRACTING THE MOST PROBABLE EVIDENCE TO SUPPORT SENSEMAKING,” Int. J.
Artif. Intell. Appl., vol. 6, no. 1, p. 1, 2015.
[28] J. A. Morales and D. Akopian, “Human activity tracking by mobile phones through hebbian
learning,” Int. J. Artif. Intell. Appl., vol. 7, no. 6, pp. 1–16, 2016.
[29] C. Lee and M. Jung, “PREDICTING MOVIE SUCCESS FROM SEARCH QUERY USING
SUPPORT VECTOR REGRESSION METHOD,” Int. J. Artif. Intell. Appl. (IJAIA)., vol. 7, no. 1,
2016.
AUTHORS
Lin Zhang is a Senior Research Associate in the Intelligent Robotics and Autonomous
Systems Lab at the University of Cincinnati. His major research interest is
reinforcement learning and its applications in robotics.
Shengchao Li is a Ph.D. student in the DeePURobotics Lab at Purdue University. His
major research interests are deep neural networks and image processing.
Hao Xiong is a Ph.D. candidate in the DeePURobotics Lab at Purdue University. His
major research interests are the dynamics and control of cable-driven robots and their
application in rehabilitation robots.
Xiumin Diao is an Assistant Professor supervising and directing the DeePURobotics Lab
at Purdue University. His research interests are the dynamics and control of cable-driven
robots and intelligent robotics.
Ou Ma is a Professor supervising and directing the Intelligent Robotics and Autonomous
Systems Lab at the University of Cincinnati. His research interests include multibody
dynamics and control, impact-contact dynamics, intelligent control of robotics,
autonomous systems, and human-robot interaction.