○Akira Taniguchi, Yoshinobu Hagiwara, Tadahiro Taniguchi, and Tetsunari Inamura, "Online Spatial Concept and Lexical Acquisition with Simultaneous Localization and Mapping", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2017), 2017.
Video: https://youtu.be/hVKQCdbRQVM
[DL輪読会]Neural Radiance Flow for 4D View Synthesis and Video Processing (NeRF... - Deep Learning JP
Neural Radiance Flow (NeRFlow) is a method that extends Neural Radiance Fields (NeRF) to model dynamic scenes from video data. NeRFlow simultaneously learns two fields - a radiance field to reconstruct images like NeRF, and a flow field to model how points in space move over time using optical flow. This allows it to generate novel views from a new time point. The model is trained end-to-end by minimizing losses for color reconstruction from volume rendering and optical flow reconstruction. However, the method requires training separate models for each scene and does not generalize to unknown scenes.
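The joint objective described above is easy to picture in code. Below is a toy sketch of our own (not the NeRFlow implementation, which renders colors by volume rendering along camera rays): two small networks map space-time points to color and scene flow, and their reconstruction losses are summed for end-to-end training.

```python
# Toy sketch of NeRFlow's joint objective (our own simplification): a
# radiance field supervises color reconstruction and a flow field supervises
# scene motion. The real model renders colors by volume rendering along
# camera rays, which is omitted here.
import torch

radiance_field = torch.nn.Sequential(          # (x, y, z, t) -> RGB
    torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3))
flow_field = torch.nn.Sequential(              # (x, y, z, t) -> 3-D velocity
    torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3))

points = torch.rand(128, 4)        # sampled space-time points
rgb_target = torch.rand(128, 3)    # colors from the training frames
flow_target = torch.rand(128, 3)   # supervision from estimated optical flow

rgb_loss = torch.mean((radiance_field(points) - rgb_target) ** 2)
flow_loss = torch.mean((flow_field(points) - flow_target) ** 2)
(rgb_loss + 0.1 * flow_loss).backward()   # joint end-to-end training step
```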
HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES - sipij
Human action recognition is still a challenging problem, and researchers are investigating it using different techniques. We propose a robust approach to human action recognition, achieved by extracting stable spatio-temporal features in the form of pairwise local binary patterns (P-LBP) and the scale-invariant feature transform (SIFT). These features are used to train an MLP neural network during the training stage, and the action classes are inferred from the test videos during the testing stage. The proposed features capture the motion of individuals well, and their consistency and accuracy remain high on a challenging dataset. The experimental evaluation is conducted on a benchmark dataset commonly used for human action recognition. In addition, we show that our approach outperforms the individual features, i.e., using only spatial or only temporal features.
This document reviews object detection techniques for mobile robot navigation in dynamic indoor environments. It begins with an abstract that outlines the purpose of object detection for mobile robots and provides an overview of different techniques. It then reviews object detection approaches in two main categories: local feature-based techniques that use features like color, shape and templates, and deep learning-based techniques that use neural networks for object proposals or one-shot detection. Key algorithms discussed include SIFT, SURF, R-CNN, Fast R-CNN, Faster R-CNN, YOLO and SSD. The challenges of object detection and applications for mobile robot navigation are also mentioned.
[DL輪読会]PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection - Deep Learning JP
This paper proposes a new method called PV-RCNN for 3D object detection from point clouds. It introduces two key modules: 1) A voxel-to-keypoint scene encoding module that extracts feature vectors for keypoints by combining features from voxel CNNs and point networks. 2) A RoI grid pooling module that computes feature vectors for regions of interest (RoIs) from the keypoint features to refine detections. Experiments on KITTI and Waymo datasets demonstrate that PV-RCNN achieves state-of-the-art performance for 3D object detection from point clouds.
Akira Taniguchi, Tadahiro Taniguchi, and Tetsunari Inamura, "Spatial Concept Acquisition for a Mobile Robot that Integrates Self-Localization and Unsupervised Word Discovery from Spoken Sentences", IEEE Transactions on Cognitive and Developmental Systems, Vol. 8, No. 4, pp. 285-297, 2016.
Implementing Kohonen's SOM with missing data in OTB - melaneum
The document discusses implementing Kohonen's self-organizing map (SOM) algorithm to handle missing and erroneous data in time series. It describes the SOM properties and training process. It also provides an example of a MODIS time series over Brittany, France with missing, erroneous, and clean data points to which the modifications would be applied. Finally, it discusses the benefits of implementing the SOM in a generic programming approach.
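The core modification is small enough to show directly. A minimal sketch of our own (not the OTB code), assuming missing values are encoded as NaN: the best-matching unit is found, and weights are updated, using only the observed input components.

```python
# Sketch of the core SOM modification for missing data (our own
# illustration, not the OTB code): the best-matching unit is found, and
# weights are updated, using only the observed input components.
import numpy as np

def best_matching_unit(weights, x):
    """weights: (n_units, dim); x may contain NaN for missing values."""
    observed = ~np.isnan(x)                      # mask of valid components
    diff = weights[:, observed] - x[observed]    # distance on valid dims only
    return np.argmin(np.sum(diff ** 2, axis=1))

rng = np.random.default_rng(0)
weights = rng.random((25, 4))                    # 5x5 map flattened, 4-D data
x = np.array([0.2, np.nan, 0.7, np.nan])         # a sample with gaps
bmu = best_matching_unit(weights, x)
obs = ~np.isnan(x)
weights[bmu, obs] += 0.5 * (x[obs] - weights[bmu, obs])  # update observed dims
```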
International Journal of Engineering Research and Applications (IJERA) is an open-access, online, peer-reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nanotechnology & Science, Power Electronics, Electronics & Communication Engineering, Computational Mathematics, Image Processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design, etc.
160205 NeuralArt - Understanding Neural Representation - Junho Cho
The document summarizes three papers on neural representations presented at a seminar:
1. Texture synthesis using convolutional neural networks (CNNs) to generate new texture samples matching a source texture based on gram matrices of CNN feature maps (a sketch of the gram-matrix computation follows this list).
2. Reconstructing images from feature maps of CNNs trained on object recognition to understand neural representations.
3. A neural algorithm of artistic style that combines the content of one image and style of another using CNN representations of content and style.
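For item 1, the gram-matrix statistic is simple to state in code. A minimal sketch of our own, assuming a feature map of shape channels x height x width:

```python
# Gram matrix of a CNN feature map: channel-by-channel correlations that
# summarize texture while discarding spatial layout.
import numpy as np

def gram_matrix(feature_map):
    """feature_map: (channels, height, width) activations of one CNN layer."""
    c, h, w = feature_map.shape
    f = feature_map.reshape(c, h * w)   # flatten the spatial dimensions
    return f @ f.T / (h * w)            # (channels, channels) correlations

features = np.random.rand(64, 32, 32)   # stand-in for a conv-layer output
g = gram_matrix(features)               # texture synthesis matches these stats
```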
A survey on moving object tracking in video - ijitjournal
The ongoing research on object tracking in video sequences has attracted many researchers. Detecting objects in video and tracking their motion to identify their characteristics has been emerging as a demanding research area in the domain of image processing and computer vision. This paper presents a literature review of state-of-the-art tracking methods, categorizes them into different categories, and identifies useful tracking methods. Most of the methods include object segmentation using background subtraction. The tracking strategies use different methodologies such as mean-shift, the Kalman filter, and the particle filter. The performance of the tracking methods varies with respect to background information. In this survey, we discuss the feature descriptors used in tracking to describe the appearance of the objects being tracked, as well as object detection techniques. We classify the tracking methods into three groups, provide a detailed description of representative methods in each group, and examine their positive and negative aspects.
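Of the tracking strategies named above, the Kalman filter is the most compact to demonstrate. A minimal 1-D constant-velocity sketch of our own, with illustrative parameter values:

```python
# Minimal 1-D constant-velocity Kalman filter, the kind many trackers use to
# smooth an object's position between detections. Parameter values are toys.
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition for (pos, vel)
H = np.array([[1.0, 0.0]])               # we observe the position only
Q = np.eye(2) * 1e-3                     # process noise covariance
R = np.array([[0.1]])                    # measurement noise covariance

x = np.zeros((2, 1))                     # initial state estimate
P = np.eye(2)                            # initial state covariance

for z in [1.0, 2.1, 2.9, 4.2]:           # noisy position measurements
    x, P = F @ x, F @ P @ F.T + Q                     # predict
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)      # Kalman gain
    x = x + K @ (np.array([[z]]) - H @ x)             # update with measurement
    P = (np.eye(2) - K @ H) @ P
    print(x[0, 0], x[1, 0])              # filtered position and velocity
```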
This document introduces PIRF-Nav, an online incremental appearance-based localization and mapping system for dynamic environments. PIRF-Nav uses Position-invariant Robust Features (PIRFs) to represent places, which are extracted from image sequences and are robust against changes like illumination and camera position. PIRF-Nav can perform simultaneous localization and mapping incrementally and in real-time without needing an offline dictionary generation process. It achieves higher recall rates than previous methods at 100% precision even with significant dynamic changes in environments. The document outlines the basic concept and processing steps of PIRF-Nav.
This document describes a new method for analyzing infant spontaneous motor patterns using a Kinect sensor and tracking algorithm. The Kinect is used to record 3D video of infants' limbs in motion without any body markers. Custom software then tracks limb positions over time and calculates kinematic measures like velocity and movement units. Initial results show the method can accurately capture and quantify limb movements and correlations between limbs. The goal is to use this non-invasive tracking to study developmental changes in infants' movement patterns from 2-24 weeks of age.
The document discusses motion planning for robot manipulators. It introduces the canonical problem of motion planning, which is to find a collision-free path between an initial and final configuration while avoiding obstacles. It describes how the configuration space represents all possible configurations of the robot as points in a space. Examples are given of how the configuration space represents different types of robots, such as mobile robots and manipulators. Planning techniques for solving motion planning problems in configuration space are then discussed.
The document summarizes Junho Cho's presentation on image translation using generative adversarial networks (GANs). It discusses several papers on this topic, including pix2pix, which uses conditional GANs to perform supervised image-to-image translation on paired datasets; Domain Transfer Network (DTN), which uses an unsupervised method to perform cross-domain image generation; and CycleGAN and DiscoGAN, which can perform unpaired image-to-image translation using cycle-consistent adversarial networks. The presentation provides an overview of each method and shows examples of their applications to tasks such as semantic segmentation, style transfer, and domain adaptation.
This is a slide for IEEE International Conference on Computational Photography (ICCP) 2016 in Northwestern University.
See for details: http://omilab.naist.jp/project/LFseg/
A robot may need to use a tool to solve a complex problem. Currently, tool use must be pre-programmed by a human. However, this is a difficult task and can be helped if the robot is able to learn how to use a tool by itself. Most of the work in tool use learning by a robot is done using a feature-based representation. Despite many successful results, this representation is limited in the types of tools and tasks that can be handled. Furthermore, the complex relationship between a tool and other world objects cannot be captured easily. Relational learning methods have been proposed to overcome these weaknesses [1, 2]. However, they have only been evaluated in a sensor-less simulation to avoid the complexities and uncertainties of the real world. We present a real world implementation of a relational tool use learning system for a robot. In our experiment, a robot requires around ten examples to learn to use a hook-like tool to pull a cube from a narrow tube.
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM) WITH CONDITIONAL RANDOM FIELDS (... - kevig
This study investigates the effectiveness of knowledge named entity recognition in Online Judges (OJs). OJs are lacking in topic classification and are limited to problem IDs only; therefore, a lot of time is consumed in finding programming problems, and more specifically knowledge entities. A Bidirectional Long Short-Term Memory (BiLSTM) with Conditional Random Fields (CRF) model is applied for the recognition of knowledge named entities in the solution reports. For the test run, more than 2000 solution reports are crawled from the Online Judges and processed for the model output. The stability of the model is also assessed with the higher F1 value. The results obtained through the proposed BiLSTM-CRF model are more effective (F1: 98.96%) and efficient in lead time.
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM) WITH CONDITIONAL RANDOM FIELDS (... - ijnlc
This study investigates the effectiveness of knowledge named entity recognition in Online Judges (OJs). OJs are lacking in topic classification and are limited to problem IDs only; therefore, a lot of time is consumed in finding programming problems, and more specifically knowledge entities. A Bidirectional Long Short-Term Memory (BiLSTM) with Conditional Random Fields (CRF) model is applied for the recognition of knowledge named entities in the solution reports. For the test run, more than 2000 solution reports are crawled from the Online Judges and processed for the model output. The stability of the model is also assessed with the higher F1 value. The results obtained through the proposed BiLSTM-CRF model are more effective (F1: 98.96%) and efficient in lead time.
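A minimal sketch of the BiLSTM backbone behind both listings above (our own illustration in PyTorch; the CRF layer with transition scores and Viterbi decoding that these papers add on top is omitted for brevity):

```python
# Minimal BiLSTM tagger skeleton in PyTorch (our own illustration). A full
# BiLSTM-CRF adds a CRF layer (transition scores plus Viterbi decoding) on
# top of these per-token emission scores; that layer is omitted for brevity.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, n_tags, emb_dim=50, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True,
                            batch_first=True)
        self.emit = nn.Linear(2 * hidden, n_tags)   # per-token tag scores

    def forward(self, token_ids):                   # (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))
        return self.emit(h)                         # (batch, seq_len, n_tags)

model = BiLSTMTagger(vocab_size=1000, n_tags=5)
scores = model(torch.randint(0, 1000, (2, 7)))     # emission scores
print(scores.shape)                                 # torch.Size([2, 7, 5])
```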
SENTIMENT ANALYSIS IN MYANMAR LANGUAGE USING CONVOLUTIONAL LSTM NEURAL NETWORK - ijnlc
In recent years, there has been increasing use of social media among people in Myanmar, and writing reviews on social media pages about products, movies, and trips is also popular. Moreover, most people look for review pages about a product they want to buy before deciding whether to buy it. Extracting and receiving useful reviews of interesting products is very important and time-consuming for people. Sentiment analysis is one of the important processes for extracting useful reviews of products. In this paper, a Convolutional LSTM neural network architecture is proposed to analyse the sentiment classification of cosmetic reviews written in the Myanmar language. The paper also intends to build a cosmetic-review dataset for deep learning and a sentiment lexicon in the Myanmar language.
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network - kevig
In recent years, there has been increasing use of social media among people in Myanmar, and writing reviews on social media pages about products, movies, and trips is also popular. Moreover, most people look for review pages about a product they want to buy before deciding whether to buy it. Extracting and receiving useful reviews of interesting products is very important and time-consuming for people. Sentiment analysis is one of the important processes for extracting useful reviews of products. In this paper, a Convolutional LSTM neural network architecture is proposed to analyse the sentiment classification of cosmetic reviews written in the Myanmar language. The paper also intends to build a cosmetic-review dataset for deep learning and a sentiment lexicon in the Myanmar language.
The document presents a lifelong federated reinforcement learning (LFRL) architecture for navigation in cloud robotic systems. LFRL allows robots to fuse their experience and transfer knowledge so they can effectively use prior knowledge and quickly adapt to new environments. It proposes a knowledge fusion algorithm to upgrade a shared model on the cloud by fusing private models from robots. It also introduces effective transfer learning methods to help robots rapidly adapt to new environments. Experiments show LFRL improves the efficiency of reinforcement learning for robot navigation. A cloud robotic navigation website is also presented to demonstrate LFRL.
PROLOG USED TO REPRESENT AND REASON QUALITATIVELY OVER A SPACE DOMAIN - ijaia
Spatial reasoning is a relevant topic in artificial intelligence, with applications in geographical information systems, robotics, content-based image retrieval, and traffic engineering. Additionally, formal representation of knowledge allows it to be processed by a computer. Prolog is a programming language used in artificial intelligence that is useful for representing knowledge and performing searches by asking questions of the knowledge base. Prolog can be used to develop a variety of applications, such as checking consistency or performing any kind of reasoning. This article proposes the use of Prolog as a representation model and a reasoning engine to describe the topological relations between several objects in a geographic space, using the RCC model. This approach simplifies program construction and allows us to focus on the spatial problem.
IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv... - IRJET Journal
This document presents research on developing an automatic lip reading system using convolutional neural networks. The system takes in video frames of a speaker's face without audio and classifies the words or phrases being spoken. The researchers preprocessed the data by detecting faces in video frames and cropping them. They then trained a CNN model on concatenated frames. Their model achieved 80.44% accuracy on the test set in classifying 10 words and 10 phrases from 17 speakers. The researchers concluded the model could be improved by addressing overfitting to unseen speakers with a larger dataset and regularization techniques.
Advanced Robotics Projects For Undergraduate Students - Emily Smith
This document summarizes advanced robotics projects that have been or could be successfully implemented by undergraduate students in a one or two semester course. It explores what makes a good undergraduate advanced project, examples of projects done by students, and the benefits of such projects. Key frameworks like Pyro are discussed that allow complex projects by providing tools for robot communication and control architectures. Example successful student projects discussed include a tour guide robot, replications of developmental robotics models, and original research.
This document summarizes research on sign language recognition systems. It discusses previous work on image-based sign language recognition using approaches like colored gloves, geometric feature extraction, and orientation histograms. It then describes the proposed system, an Android application that uses hand gesture recognition with real-time text and speech conversion. Key steps include gesture extraction using background subtraction and blob detection, gesture matching, and text-to-speech conversion. The system allows users to define their own sign language database to facilitate communication across different sign languages.
IRJET - ASL Language Translation using ML - IRJET Journal
This document presents a survey of technologies for hand sign language recognition and translation to text using machine learning. It discusses using CNN models to identify hand gestures in real-time from video input and translate the gestures to words rather than individual letters for better communication between deaf and hearing people. The system architecture involves hand detection, gesture recognition using a CNN model, and a login system for users. Previous approaches discussed include using sequential pattern mining and hidden Markov models on extracted motion features from video frames. The goal is to build an effective communication medium between deaf and hearing individuals.
Language-Based Actions for Self-Driving Robot - IRJET Journal
This document describes a framework for a self-driving robot to follow natural language commands. The framework uses sequence modeling to learn the meaning of sentences describing a path and identify relevant objects and prepositions. It then uses this information in the cognizance phase to generate a path and move the robot to accomplish the navigational goal described in the input sentence. The researchers created a virtual environment using Unity game engine to simulate the robot and collect training data on floor plans, sentences, and robot paths. They preprocessed this data and used hidden Markov models and a probabilistic graphical model to represent the temporal segments of sentences and learn the relationships between objects for sequence modeling.
The document describes an intelligent query processing system for the Malayalam language. It presents a model for developing such a system, focusing on time inquiries for different transportation modes. The system performs shallow syntactic and semantic analysis of queries. It determines the query type and required result slots. SQL queries are generated to retrieve answers from the database. The system architecture includes morphological analysis, shallow parsing, query frame identification, SQL generation, and answer retrieval. It was evaluated on 70 queries with 87.5% precision.
Arabic named entity recognition using deep learning approach - IJECEIAES
Most Arabic Named Entity Recognition (NER) systems depend massively on external resources and handmade feature engineering to achieve state-of-the-art results. To overcome such limitations, we propose, in this paper, a deep learning approach to tackle the Arabic NER task. We introduce a neural network architecture based on bidirectional Long Short-Term Memory (LSTM) and Conditional Random Fields (CRF) and experiment with various commonly used hyperparameters to assess their effect on the overall performance of our system. Our model gets two sources of information about words as input, pre-trained word embeddings and character-based representations, eliminating the need for any task-specific knowledge or feature engineering. We obtained a state-of-the-art result on the standard ANERcorp corpus with an F1 score of 90.6%.
Feature Extraction and Analysis of Natural Language Processing for Deep Learn... - Sharmila Sathish
This document discusses using deep learning techniques for multi-modal feature extraction. It proposes a multi-modal neural network with independent sub-networks for each data mode. It also discusses using a bi-directional GRU network for English word segmentation to effectively solve long-distance dependency issues while reducing training and prediction time compared to bi-directional LSTM. Experimental results showed the proposed multi-modal fusion model can effectively extract low-dimensional fused features from original high-dimensional multi-modal data.
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE - Journal For Research
Natural language processing (NLP) techniques are among the most used techniques in the field of computer applications, and NLP has become a vast and advanced area. Language is the means of communication among humans, and in the present scenario, when everything is computerized, communication between computers and humans has become a necessity. To fulfill this necessity, NLP has emerged as the means of interaction that narrows the gap between machines (computers) and humans. It evolved from the study of linguistics; early systems were evaluated with the Turing test but were limited to small sets of data. Later on, various algorithms were developed, along with concepts from AI (artificial intelligence), for the successful execution of NLP. In this paper, the main emphasis is on the different NLP techniques that have been developed so far, their applications, and a comparison of those techniques on different parameters.
French machine reading for question answering - Ali Kabbadj
This paper proposes to unlock the main barrier to machine reading and comprehension of French natural-language texts, opening the way for machines to find a precise answer to a question buried in a mass of unstructured French text, or to create a universal French chatbot. Deep learning has produced extremely promising results for various tasks in natural language understanding, particularly topic classification, sentiment analysis, question answering, and language translation. But to be effective, deep learning methods need very large training datasets. Until now these techniques could not actually be used for French question-answering (Q&A) applications, since there was no large French Q&A training dataset. We produced a large (100,000+) French Q&A training dataset by translating and adapting the English SQuAD v1.1 dataset, together with GloVe French word and character embedding vectors from a French Wikipedia dump. We trained and evaluated three different Q&A neural network architectures in French and obtained French Q&A models with F1 scores around 70%.
The document describes a lecture on deep learning for information processing and artificial intelligence given by Li Deng at Tianjin University in China from July 2-5, 2013. The lecture covered the basics of deep learning, including restricted Boltzmann machines, deep belief networks, deep neural networks, and applications to speech recognition, language modeling, and other domains. It also provided references to related tutorials, books, and research groups working on deep learning techniques.
Anatomical Survey Based Feature Vector for Text Pattern Detection - IJEACS
The vital objective of artificial intelligence is to discover and understand human competences, one of which is the capability to distinguish several text objects within one or more images exhibited on any canvas, including prints, videos, or electronic displays. Multimedia data has increased rapidly in past years, and textual information present in multimedia contains important information about the image or video content. However, the commonly exercised human intelligence of detecting and differentiating text within an image needs to be replicated technologically for computers. Hence, in this paper a feature set based on an anatomical study of the human text detection system is proposed.
This presentation surveys how NASA built its first machine-learning-enabled rover to send to Mars. Hope you like it! If any improvements or removal of copyrighted content are needed, feel free to get in touch.
Similar to [IROS2017] Online Spatial Concept and Lexical Acquisition with Simultaneous Localization and Mapping (20)
Simultaneous Estimation of Self-position and Word from Noisy Utterances and S... - Akira Taniguchi
Akira Taniguchi, Tadahiro Taniguchi, and Tetsunari Inamura, "Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information", 13th IFAC/IFIP/IFORS/IEA Symposium on Analysis, Design, and Evaluation of Human-Machine Systems (IFAC HMS2016), Aug. 2016, Kyoto, Japan.
Simultaneous Localization, Mapping and Self-body Shape Estimation by a Mobile... - Akira Taniguchi
The document proposes a method called SLAM-SBE that allows a mobile robot to simultaneously estimate its location, map its environment, and determine its own body shape using only sensory-motor information. The method extends existing SLAM techniques by including a node for self-body information represented by an occupancy grid. An experiment shows that a robot is able to recursively update its estimated body shape over time based on its position in the environmental map. The estimated shapes were similar but not identical to the robot's actual shapes, with some errors due to sensor limitations. Future work could improve accuracy and apply the approach to 3D or multi-joint robotic systems.
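The occupancy-grid representation this summary builds on has a standard log-odds update; a minimal sketch of our own (the increment values are illustrative, not from the paper):

```python
# Standard log-odds occupancy-grid update (our own minimal sketch; the
# increment values are illustrative, not from the paper).
import numpy as np

grid = np.zeros((10, 10))          # log-odds per cell; 0 means p = 0.5
L_OCC, L_FREE = 0.85, -0.4         # log-odds increments per observation

def update_cell(grid, i, j, occupied):
    """Fold one observation of cell (i, j) into the grid."""
    grid[i, j] += L_OCC if occupied else L_FREE
    return grid

grid = update_cell(grid, 3, 4, occupied=True)
prob = 1.0 - 1.0 / (1.0 + np.exp(grid))   # convert log-odds to probability
print(prob[3, 4])                         # ~0.70 after one 'occupied' hit
```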
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf - Malak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx - SitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Dive into the realm of operating systems (OS) with Pravash Chandra Das, a seasoned Digital Forensic Analyst, as your guide. 🚀 This comprehensive presentation illuminates the core concepts, types, and evolution of OS, essential for understanding modern computing landscapes.
Beginning with the foundational definition, Das clarifies the pivotal role of OS as system software orchestrating hardware resources, software applications, and user interactions. Through succinct descriptions, he delineates the diverse types of OS, from single-user, single-task environments like early MS-DOS iterations, to multi-user, multi-tasking systems exemplified by modern Linux distributions.
Crucial components like the kernel and shell are dissected, highlighting their indispensable functions in resource management and user interface interaction. Das elucidates how the kernel acts as the central nervous system, orchestrating process scheduling, memory allocation, and device management. Meanwhile, the shell serves as the gateway for user commands, bridging the gap between human input and machine execution. 💻
The narrative then shifts to a captivating exploration of prominent desktop OSs, Windows, macOS, and Linux. Windows, with its globally ubiquitous presence and user-friendly interface, emerges as a cornerstone in personal computing history. macOS, lauded for its sleek design and seamless integration with Apple's ecosystem, stands as a beacon of stability and creativity. Linux, an open-source marvel, offers unparalleled flexibility and security, revolutionizing the computing landscape. 🖥️
Moving to the realm of mobile devices, Das unravels the dominance of Android and iOS. Android's open-source ethos fosters a vibrant ecosystem of customization and innovation, while iOS boasts a seamless user experience and robust security infrastructure. Meanwhile, discontinued platforms like Symbian and Palm OS evoke nostalgia for their pioneering roles in the smartphone revolution.
The journey concludes with a reflection on the ever-evolving landscape of OS, underscored by the emergence of real-time operating systems (RTOS) and the persistent quest for innovation and efficiency. As technology continues to shape our world, understanding the foundations and evolution of operating systems remains paramount. Join Pravash Chandra Das on this illuminating journey through the heart of computing. 🌟
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers - akankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf - Chart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
A Comprehensive Guide to DeFi Development Services in 2024 - Intelisync
DeFi represents a paradigm shift in the financial industry. Instead of relying on traditional, centralized institutions like banks, DeFi leverages blockchain technology to create a decentralized network of financial services. This means that financial transactions can occur directly between parties, without intermediaries, using smart contracts on platforms like Ethereum.
In 2024, we are witnessing an explosion of new DeFi projects and protocols, each pushing the boundaries of what’s possible in finance.
In summary, DeFi in 2024 is not just a trend; it’s a revolution that democratizes finance, enhances security and transparency, and fosters continuous innovation. As we proceed through this presentation, we'll explore the various components and services of DeFi in detail, shedding light on how they are transforming the financial landscape.
At Intelisync, we specialize in providing comprehensive DeFi development services tailored to meet the unique needs of our clients. From smart contract development to dApp creation and security audits, we ensure that your DeFi project is built with innovation, security, and scalability in mind. Trust Intelisync to guide you through the intricate landscape of decentralized finance and unlock the full potential of blockchain technology.
Ready to take your DeFi project to the next level? Partner with Intelisync for expert DeFi development services today!
Building Production Ready Search Pipelines with Spark and Milvus - Zilliz
Spark is a widely used ETL tool for processing, indexing, and ingesting data into the serving stack for search. Milvus is a production-ready, open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to the Milvus vector database for search serving.
GraphRAG for Life Science to increase LLM accuracy - Tomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
[IROS2017] Online Spatial Concept and Lexical Acquisition with Simultaneous Localization and Mapping
1. Online Spatial Concept and Lexical Acquisition with Simultaneous Localization and Mapping
IROS2017@Vancouver
Akira Taniguchi *, Yoshinobu Hagiwara *, Tadahiro Taniguchi *, Tetsunari Inamura **
* Ritsumeikan University, Japan. (E-mail: a.taniguchi@em.ci.ritsumei.ac.jp)
** National Institute of Informatics / The Graduate University for Advanced Studies, Japan.
2. Research background
Robots coexisting with humans and operating in various environments are required to adaptively learn spatial concepts (place categories and a lexicon) while incrementally generating an environmental map.
◦ Spatial concepts are such that their target domain may be unclear and may differ according to the user and environment.
◦ Therefore, it is difficult to manually design spatial concepts in advance, and it is desirable for robots to autonomously learn spatial concepts based on their own experiences.
[Figure: a robot wondering 'Which area is the same place?', 'What scenery can I see?', and 'What is the name of this place?']
3. Spatial concept
Spatial concepts based on multimodal information:
◦ Word information (place names)
◦ Place information (position distributions)
◦ Image information (visual features)
[Figure: example places labeled 'Meeting room', 'Laboratory', and 'Elevator hall'.]
4. Previous method: SpCoA [Taniguchi 16]
Nonparametric Bayesian spatial concept acquisition method. The main features:
• This model can learn unknown words from continuous speech signals.
• This model can learn an appropriate number of spatial concepts, depending on the data (using a nonparametric Bayesian approach).
• This model can learn many-to-many correspondences between names and places by relating several places to several names via spatial concepts.
Learning result in Japanese: /sofamae/ /hoNdana/ /qgeNkaN/ /kiqchiN/ /daidokoro/ /terebimae/ /gomibakoo/ /tereburunoatari/
[Taniguchi 16] Taniguchi, A. et al., "Spatial Concept Acquisition for a Mobile Robot that Integrates Self-Localization and Unsupervised Word Discovery from Spoken Sentences," IEEE TCDS, Vol. 8, No. 4, pp. 285-297 (2016)
5. Previous method: SpCoA [Taniguchi 16]
Nonparametric Bayesian spatial concept acquisition method:
◦ Batch learning: the robot learns the spatial concepts after collecting sufficient data while moving around the environment.
◦ Environmental map: this method cannot learn spatial concepts in unknown environments without a map.
◦ Over-segmentation problem: it is caused by word segmentation of phoneme-recognition results that include errors.
[Figure: the robot hears 'This place is the laboratory.' but recognizes only the phoneme sequence |dis|pu|rai|su|iz|a|ra|bora|to|ri| and cannot interpret it ('???').]
[Taniguchi 16] Taniguchi, A. et al., "Spatial Concept Acquisition for a Mobile Robot that Integrates Self-Localization and Unsupervised Word Discovery from Spoken Sentences," IEEE TCDS, Vol. 8, No. 4, pp. 285-297 (2016)
6. Research purpose
Mobile robots should learn spatial concepts, a lexicon, and an environmental map incrementally through interaction with the environment and humans, even in an unknown environment without prior knowledge.
7. The proposed method: SpCoSLAM
This model integrates multimodal place categorization, lexical acquisition, and SLAM as one Bayesian generative model.
[Figure: graphical model of SpCoSLAM over self-positions x_{t-1}, x_t, x_{t+1}, control data u_t, sensor data z_t, the map m, concept indices C_t, position-distribution indices i_t, Gaussian position distributions, image features f_t, word sequences S_t, word distributions W_l, the language model LM, the acoustic model AM, and speech signals y_t, with infinite (nonparametric) plates. Gray nodes indicate observed variables.]
8. The proposed method: SpCoSLAM
This model integrates multimodal place categorization, lexical acquisition, and SLAM as one Bayesian generative model.
[Figure: the same graphical model, annotated by component: simultaneous localization and mapping (SLAM); position distributions (Gaussian distributions); nonparametric Bayesian multimodal place categorization over the index of place, image features, and words; lexical acquisition (speech recognition and word segmentation). Gray nodes indicate observed variables. A generative-story sketch follows.]
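The component annotations above outline the model's pieces. As a hedged sketch only (symbol roles are inferred from the figure captions; the exact priors and dependencies are given in the paper), the generative story for one teaching event is roughly, in LaTeX:

% Hedged sketch; x_t is shared between the SLAM chain and the Gaussian
% position distribution, and the indices are drawn from nonparametric priors.
\begin{align*}
  x_t &\sim p(x_t \mid x_{t-1}, u_t)                   && \text{SLAM motion model} \\
  z_t &\sim p(z_t \mid x_t, m)                         && \text{SLAM measurement model} \\
  i_t &\sim p(i_t \mid C_t)                            && \text{index of place given the concept} \\
  x_t &\sim \mathcal{N}(\mu_{i_t}, \Sigma_{i_t})       && \text{position distribution} \\
  f_t &\sim p(f_t \mid C_t)                            && \text{image feature} \\
  S_t &\sim p(S_t \mid W_{C_t})                        && \text{words naming the place}
\end{align*}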
9. FastSLAM and SpCoSLAM
Simultaneous Localization And Mapping (SLAM)
◦ FastSLAM realizes an online algorithm for efficient self-localization and mapping using a Rao-Blackwellized particle filter (RBPF) [Grisetti 05] (the underlying factorization is sketched below).
Online learning algorithm of SpCoSLAM
◦ The online learning algorithm is derived by introducing sequential update equations for estimating the parameters of the spatial concepts into the RBPF-based formulation of FastSLAM.
[Grisetti 05] G. Grisetti, C. Stachniss, and W. Burgard, "Improving grid-based SLAM with Rao-Blackwellized particle filters by adaptive proposals and selective resampling," in Proceedings of ICRA, 2005.
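For reference, FastSLAM's Rao-Blackwellization rests on the standard factorization of the joint posterior (this is textbook FastSLAM, not specific to this work):

p(x_{0:t}, m \mid z_{1:t}, u_{1:t}) = p(m \mid x_{0:t}, z_{1:t}) \, p(x_{0:t} \mid z_{1:t}, u_{1:t})

Following the speaker notes, SpCoSLAM's joint posterior factorizes analogously; schematically (with \Theta the spatial-concept parameters and LM the language model; the exact conditioning sets, abbreviated here as "\cdot", are in the paper):

p(x_{0:t}, m, \Theta, LM, C_{1:t}, i_{1:t} \mid z_{1:t}, u_{1:t}, y_{1:t}, f_{1:t}) \approx p(LM \mid \cdot)\, p(m \mid x_{0:t}, z_{1:t})\, p(\Theta \mid \cdot)\, p(x_{0:t}, C_{1:t}, i_{1:t} \mid \cdot)

The last factor is the part estimated by the particle filter.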
10. FastSLAM and SpCoSLAM
[Figure: side-by-side graphical models. FastSLAM relates self-position and the map to control data and sensor data via a Rao-Blackwellized particle filter (RBPF). SpCoSLAM adds latent variables, model parameters, hyperparameters, a language model, an acoustic model, speech signals, and image features; both the language model and the model parameters are updated online.]
11. Online learning algorithm of SpCoSLAM
1. Speech recognition
2. Calculating the proposal distribution of FastSLAM 2.0
3. Word segmentation, sampling latent variables, and calculating weights
4. Mapping
5. Estimation of the parameters of spatial concepts
6. Updating a language model
7. Resampling of particles
(A per-step sketch of this loop is given below.)
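To make the control flow concrete, here is a minimal Python sketch of one update step following steps 1–7 above. Every helper (speech_recognizer, sample_pose_proposal, segment_words, sample_latents, compute_weight, resample) and every particle field is a hypothetical stand-in, not the authors' implementation:

def spcoslam_step(particles, u_t, z_t, speech=None, image=None):
    # 1. Speech recognition with the current language model (if an
    #    utterance arrived); returns a lattice of candidate syllables.
    lattice = speech_recognizer(speech) if speech is not None else None

    for p in particles:
        # 2. Draw the new pose from the FastSLAM 2.0 proposal
        #    (odometry motion model refined by scan matching).
        p.x = sample_pose_proposal(p.x, u_t, z_t, p.map)

        if lattice is not None:
            # 3. Segment the lattice into words with this particle's
            #    language model, then sample the latent indices C_t, i_t.
            p.words = segment_words(lattice, p.lm)
            p.C, p.i = sample_latents(p, image)

        # ...the importance weight combines SLAM and concept likelihoods.
        p.weight = compute_weight(p, z_t, image)

        # 4. Mapping: fold the new scan into this particle's occupancy grid.
        p.map.update(p.x, z_t)

        # 5. Sequential update of the spatial-concept parameters
        #    (sufficient statistics of the Gaussian and multinomial parts).
        p.concepts.update(p.x, p.C, p.i, p.words, image)

        # 6. Update this particle's language model from its segmented words.
        if lattice is not None:
            p.lm.update(p.words)

    # 7. Resample particles in proportion to their weights.
    return resample(particles)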
12. Experiment I: Online learning
We performed experiments on online learning of spatial concepts in a novel environment.
Conditions:
Middleware: Robot Operating System (ROS) Indigo
Speech recognition system: Julius dictation-kit-v4.3.1-linux (GMM-HMM decoding), Japanese syllable dictionary
Word segmentation system: latticelm [1] (WFST-based word segmentation system; WFST: Weighted Finite-State Transducer)
Image feature extractor: Caffe (CNN model of Places205-AlexNet; see the sketch below)
Dataset: Robotics Data Set Repository (Radish) [2], albert-b-laser-vision by Cyrill Stachniss; rosbag file (odometry, depth, and image data)
Speech data: 50 sentences including 10 types of phrases
[1] latticelm: http://www.phontron.com/latticelm/
[2] The robotics data set repository (Radish): http://radish.sourceforge.net/
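As a concrete illustration of the image-feature row, a minimal (untested) Caffe snippet for extracting a Places205-AlexNet feature from one camera frame might look as follows; the file paths and the choice of the fc7 layer are assumptions, not taken from the paper:

import caffe

# Placeholder paths; the deploy prototxt and caffemodel come from the
# public Places205-AlexNet release.
net = caffe.Net('places205_alexnet_deploy.prototxt',
                'places205_alexnet.caffemodel',
                caffe.TEST)

# Standard Caffe preprocessing: HWC -> CHW, RGB -> BGR, [0,1] -> [0,255].
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_channel_swap('data', (2, 1, 0))
transformer.set_raw_scale('data', 255.0)

img = caffe.io.load_image('frame.png')  # one camera frame from the rosbag
net.blobs['data'].data[...] = transformer.preprocess('data', img)
net.forward()
feature = net.blobs['fc7'].data[0].copy()  # used as the image feature f_t (assumed layer)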
13. Experiment I: Online learning
Video: https://youtu.be/hVKQCdbRQVM
14. Experiment I: Online learning
[Figure: maps with the learned position distributions at step 15 (distributions 1–6), step 30 (1–8), and step 50 (1–10).]
Examples of estimated words for individual position distributions:
• Position distribution 6 — correct: /ikidomari/ (the end of the corridor); estimated word: /ikidomaekidayao/
• Position distribution 1 — correct: /kyouyuuseki/ (sharing desk); estimated word: /kyooyuusehi/
• Position distribution 8 — correct: /puriNtaabeya/ and /daidokoro/ (printer room, kitchen); estimated words: /upuriNpabeyatarero/ and /izaridokourodayo/
Words are estimated as the word with the highest probability value under each position distribution.
15. Experiment I: Online learning
We compare the performance of four methods (number of particles: 30):
(A) SpCoSLAM (the proposed method)
(B) Online SpCoA based on RBPF
(C) Online SpCoA
(D) SpCoA (batch learning) [Taniguchi 16]
Methods (B), (C), and (D), which are based on SpCoA, did not update the language model and did not use image features.
[Taniguchi 16] Taniguchi, A. et al., "Spatial Concept Acquisition for a Mobile Robot that Integrates Self-Localization and Unsupervised Word Discovery from Spoken Sentences", IEEE Transactions on Cognitive and Developmental Systems, Vol. 8, No. 4, pp. 285–297 (2016)
16. Experiment I: Online learning
We compare the performance of SpCoSLAM and the SpCoA-based methods.
[Figure: the graphical models of SpCoSLAM and the SpCoA-based method side by side; the SpCoA-based model lacks the language-model update and the image-feature nodes.]
SpCoA did not update the language model and did not use image features.
17. Evaluation I: The estimated number of spatial concepts
The figures show the number of spatial concepts and the number of position distributions during online learning.
The ground truth was determined by a user based on the teaching data.
SpCoSLAM was closer to the ground truth than the other methods.
18. Evaluation II: Word segmentation in the lexical acquisition
The figure shows the number of segmented words.
SpCoSLAM alleviated the over-segmentation problem by updating the language model sequentially: its segmentation was closer to the phrase segmentation, whereas the SpCoA-based baselines over-segmented.
Morpheme: the morphological segmentation (using MeCab).
Phrase: the phrase segmentation (segmenting words only before and after the name of the place).
19. Evaluation II: Word segmentation in the lexical acquisition
[Table: examples of word segmentation results for SpCoSLAM and SpCoA, shown in Japanese and in English; slashes mark word segment points.]
20. Experiment II: Place recognition using a speech signal
When the user says "Go to **.", the target position is estimated as follows (see the reconstruction below):
We calculated the place recognition rate (PRR), i.e., the rate of positions estimated within the correct area in the test data.
SpCoSLAM showed the highest overall evaluation values among the online methods.
[Figure: PRR results per method; legend entries include "SpCoA (0.5)".]
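The equation itself did not survive extraction. A hedged reconstruction, consistent with the generative model above (marginalizing the concept index C and the position-distribution index i, with S the recognized word sequence), might read, in LaTeX:

\hat{x} = \operatorname*{arg\,max}_{x} \sum_{C} \sum_{i} \mathcal{N}\!\left(x \mid \mu_{i}, \Sigma_{i}\right) p(i \mid C)\, p(S \mid W_{C})\, p(C)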
21. Conclusion
We proposed an online learning method for spatial concepts and an environmental map by a mobile robot.
The proposed method integrates spatial concept acquisition into SLAM with an RBPF-based approach.
In the experiments, the robot performed online learning in a novel environment without a pre-existing lexicon or map.
SpCoSLAM achieved the best place recognition from speech signals among the online learning methods.
SpCoSLAM alleviated the over-segmentation problem in lexical acquisition by updating the language model sequentially.
THANK YOU FOR YOUR KIND ATTENTION.
Editor's Notes
Each talk is 15 minutes: 12 minutes of presentation + 3 minutes of discussion.
I’m Akira Taniguchi, in Ritsumeikan University, Japan.
I’d like to present about our research, “Online Spatial Concept and Lexical Acquisition with Simultaneous Localization and Mapping”.
First, research background.
Robots coexisting with humans and operating in various environments are required to adaptively learn the spatial concepts (place categories and a lexicon) while incrementally generating an environmental map.
However, spatial concepts are such that their target domain may be unclear and may differ according to the user and environment.
Therefore, it is difficult to manually design spatial concepts in advance, and it is desirable for robots to autonomously learn spatial concepts based on their own experiences.
We define spatial concept as the place category based on multimodal information.
Spatial concept includes word, place, and image information, like this.
Next, I will introduce our previous method Nonparametric Bayesian spatial concept acquisition method.
As the main features, (read the slide).
However, this method has some problems.
First is batch learning: this method cannot learn spatial concepts in unknown environments without a map, so the robot needs a map generated in advance by SLAM.
Second is the over-segmentation problem, which is caused by word segmentation of phoneme-recognition results that include errors.
Next is Research purpose.
The goal of this study is to develop mobile robots that learn spatial concepts, a lexicon, and an environmental map incrementally from interaction with the environment and humans, even in an unknown environment without prior knowledge.
We propose an unsupervised Bayesian generative model and an online learning algorithm that can perform simultaneous learning of the spatial concepts and an environmental map from multimodal information.
This model integrates multimodal place categorization, lexical acquisition, and SLAM as one Bayesian generative model.
This figure shows the graphical model representation of SpCoSLAM.
The blue part is SLAM, the red part is the position distributions represented by a Gaussian mixture, and the green part is multimodal place categorization over place, image features, and words.
The orange part is lexical acquisition: speech recognition and word segmentation.
---
SpCoSLAM:
Integrating SpCoA (place categorization and lexical acquisition) and SLAM (mapping) as one model
Using scene-image features
Updating the language model based on place information
It can incrementally learn spatial concepts in unknown environments and unsearched regions without maps.
It can mutually complement the uncertainty of information by using multimodal information.
SLAM, multimodal LDA, GMM, speech recognition, and word segmentation are expressed as one model.
(It might be good to read this while showing the next slide.)
The formulation of SLAM is the probability distribution of self-position x and map m given control data u and sensor data z.
FastSLAM has realized an online algorithm for efficient self-localization and mapping using a Rao-Blackwellized particle filter (RBPF).
And, this is the formulation of SpCoSLAM.
The online learning algorithm introduces sequential update equations for estimating the parameters of the spatial concepts into the RBPF-based formulation of FastSLAM.
The joint posterior distribution can be factorized into the probability distributions for updating a language model, mapping, and updating model parameters, and the joint distribution of self-position and latent variables; this last part is estimated by the particle filter.
I introduce the overview of the online learning algorithm.
First is speech recognition.
Second is Calculating the proposal distribution of FastSLAM.
3rd is Word segmentation, sampling latent variables, and calculating weights.
4th is mapping.
5th is Estimation of parameters of spatial concepts.
6th is Updating a language model.
7th is Resampling of particles.
Blue areas are the same as in FastSLAM.
Orange areas are the original parts of this work.
---
Please check the proceedings for details of the algorithm and formulation.
In Experiment I (online learning), we performed experiments on online learning of spatial concepts in a novel environment.
This table shows the experimental conditions.
(read the table)
This video is visualization of online learning.
The lower right is the robot camera image.
The black dot is the robot position.
These circles represent the position distributions of the spatial concepts.
This result shows the robot can learn the spatial concepts while mapping.
---
maximum 1:15
This is a summary of the online learning results.
The figure shows the position distributions in the map (at steps 15, 30, and 50).
At the bottom are examples of the estimated words for each position distribution.
For example, in position distribution 1, the correct word is kyouyuuseki in Japanese. It’s sharing desk in English.
The estimated word is kyooyuusehi.
---
The upper part of this figure shows, at step t, an example of the image corresponding to each position distribution, the correct phoneme sequence of the name of the place, and the word with the highest estimated probability.
As a result, the figure shows how the spatial concepts are acquired while the map is built sequentially.
We compare the performance of four methods as follows:
(A) SpCoSLAM (The proposed method)
(B) Online SpCoA based on RBPF
(C) Online SpCoA
(D) SpCoA (Batch learning)
Methods (B), (C), and (D) based on SpCoA did not perform the update of a language model and did not use image features.
We compare the performance of SpCoSLAM and SpCoA-based methods.
SpCoA did not perform the update of a language model and did not use image features.
Evaluation I
Figures show the average of the number of spatial concepts and the number of position distributions in 10 trials by online learning.
The ground truth was determined by a user based on the teaching data.
SpCoSLAM was closer to the ground truth than the other methods.
Evaluation II
Figure shows the average value of the number of segmented words.
SpCoA-based methods over-segment.
However, SpCoSLAM was closer to the phrase segmentation.
Phrase segmentation is segmenting words only before and after the name of the place.
---
The morphological segmentation (purple line) was suitably segmented into Japanese morphemes using MeCab, which is an off-the-shelf Japanese morphological analyzer that is widely used for natural language processing.
The phrase segmentation (yellow line) was the number of words in the case of segmenting words only before and after the name of the place, i.e., we assume that a phrase other than the name of the place is one word.
This table shows examples of word segmentation results; a slash marks a word segment point.
SpCoSLAM improved the over-segmentation problem by updating the language model sequentially.
Experiment II : Place recognition using a speech signal
When the user says “Go to **.”, the estimation of a target position was calculated by this equation.
We calculated the place recognition rate (PRR), i.e., the rate of positions estimated within the correct area in the test data.
SpCoSLAM showed the highest overall evaluation values among the online methods.
---
The experimental results show that the robot was able to more accurately learn the relationships between words and the position in the map incrementally by using SpCoSLAM.
(If short on time) This is the conclusion. Thank you for your kind attention.
(If time allows) This is the conclusion. (read the slide) Thank you for your kind attention.