The document provides an overview of a vision-based place recognition project for autonomous robots. The project uses computer vision techniques such as feature extraction and classification to allow robots to recognize their position without continuous human guidance. It discusses how the project relates to fields such as computer vision and robotic localization. The framework includes steps such as sensing images, extracting features, training classifiers, and recognizing places based on image features.
A study on the importance of image processing and its applications (eSAT Publishing House)
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
This document discusses computer vision and robot vision. It describes early work using artificial neural networks to allow a robot to steer a vehicle based on camera images (ALVINN system). The document outlines the two main stages of robot vision: image processing and scene analysis. Image processing transforms raw images, e.g. through averaging, edge enhancement, and region finding algorithms. Scene analysis extracts task-specific information by interpreting lines, curves, and applying model-based approaches to reconstruct scenes from primitive 3D objects. Stereo vision obtains depth information through triangulation using two camera images.
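The depth step in the last sentence reduces to the triangulation relation Z = fB/d. Below is a minimal sketch of that idea in Python with OpenCV; the focal length, baseline, and image file names are illustrative assumptions, not values from the summarized document.

```python
# Depth from stereo disparity via triangulation: Z = f * B / d,
# where f is the focal length (pixels), B is the camera baseline (metres),
# and d is the disparity (pixels) between the two views.
import cv2
import numpy as np

def depth_from_disparity(disparity, focal_px=700.0, baseline_m=0.12):
    """Convert a disparity map to a depth map; zero disparity -> invalid."""
    depth = np.zeros_like(disparity, dtype=np.float32)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Hypothetical rectified image pair of identical size.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
# StereoBM returns fixed-point disparities scaled by 16.
disparity = stereo.compute(left, right).astype(np.float32) / 16.0
print(depth_from_disparity(disparity).max())
```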
Computer Based Human Gesture Recognition With Study Of Algorithms (IOSR Journals)
This document discusses computer-based human gesture recognition algorithms. It begins with an introduction to gesture recognition and its uses in human-computer interaction. It then describes two main approaches to gesture recognition: appearance-based and 3D model-based. For appearance-based recognition, it discusses active appearance models and histogram-of-motion words. For 3D model-based recognition, it discusses using 3D image data to achieve invariance to viewpoint. It also discusses representing gestures as sequences of motion primitives to achieve viewpoint independence. Finally, it discusses skeletal algorithms that represent body pose as joint configurations and angles.
GOAR: GIS Oriented Mobile Augmented Reality for Urban Landscape Assessment (Tomohiro Fukuda)
This slide was presented at CMC2012 (2012 4th International Conference on Communications, Mobility, and Computing).
Abstract. This research presents the development of a mobile AR system which realizes geometric consistency using GIS, a gyroscope and a video camera mounted in a smartphone for urban landscape assessment. A low-cost AR system with high flexibility is developed. Geometric consistency between a video image and 3DCG is verified. In conclusion, the proposed system was evaluated as feasible and effective.
This document discusses object detection techniques for images. It begins with an introduction to image processing and object detection. Object detection aims to find instances of real-world objects in images and is used in applications like surveillance and automotive safety. The document then reviews several papers on related topics, including object removal from videos, visual surveillance systems for tracking targets, and augmented reality object compositing. It proposes using skull detection and region growing techniques to detect particular objects while removing shadows and other background objects for improved detection. Region growing would detect the object, while skull detection removes it from the image to analyze the object separately. In summary, the document reviews object detection methods and proposes a hybrid approach using skull detection and region growing.
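Region growing, one half of the proposed hybrid, is simple enough to sketch. The following is a minimal 4-connected grower under assumed inputs (a grayscale array, a seed pixel, and an intensity tolerance); it is an illustration of the general technique, not the paper's implementation.

```python
# A minimal 4-connected region-growing sketch: starting from a seed pixel,
# absorb neighbours whose intensity is within `tol` of the running region mean.
import numpy as np
from collections import deque

def region_grow(img, seed, tol=10):
    """img: 2-D grayscale array; seed: (row, col); returns a boolean mask."""
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    mask[seed] = True
    region_sum, region_n = float(img[seed]), 1
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                if abs(float(img[ny, nx]) - region_sum / region_n) <= tol:
                    mask[ny, nx] = True
                    region_sum += float(img[ny, nx])
                    region_n += 1
                    queue.append((ny, nx))
    return mask
```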
A STUDY OF VARIATION OF NORMAL OF POLYGONS CREATED BY POINT CLOUD DATA FOR A... (Tomohiro Fukuda)
This slide was presented at CAADRIA2011 (The 16th International Conference on Computer Aided Architectural Design Research in Asia).
Abstract: Acquiring current 3D space data of cities, buildings, and rooms rapidly and in detail has become indispensable. When the point cloud data of an object or space scanned by a 3D laser scanner is converted into polygons, the result is an accumulation of small polygons. When an object or space is a closed flat plane, it is necessary to merge the small polygons into one polygon to reduce the volume of data. In that case, each normal vector of the small polygons theoretically has the same angle; in practice, however, these angles are not the same. Therefore, the purpose of this study is to clarify the variation of the angle of a small polygon group that should become one polygon, based on actual data. As a result of experimentation, the small polygons created from the point cloud data scanned with the 3D laser scanner do not collapse into one polygon, even when the group of small polygons is a closed flat plane lying in the same plane. When the standard deviation of the extracted number of polygons is assumed to be less than 100, the variation of the angle of the normal vector is roughly 7 degrees.
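The quantity studied, the spread of the normal-vector angles within a group of small polygons, can be approximated in a few lines. The sketch below assumes a triangulated mesh given as a vertex array plus triangle indices; it is a generic illustration, not the study's code.

```python
# Sketch: estimate how much the normals of small triangles deviate from the
# mean normal of the group -- the quantity whose spread the study measures.
import numpy as np

def normal_angle_spread(vertices, triangles):
    """vertices: (N, 3) points; triangles: (M, 3) vertex indices."""
    v = vertices[triangles]                        # (M, 3, 3) triangle corners
    n = np.cross(v[:, 1] - v[:, 0], v[:, 2] - v[:, 0])
    n /= np.linalg.norm(n, axis=1, keepdims=True)  # unit normals
    mean = n.mean(axis=0)
    mean /= np.linalg.norm(mean)
    # Angle of each small-polygon normal to the mean normal, in degrees.
    angles = np.degrees(np.arccos(np.clip(n @ mean, -1.0, 1.0)))
    return angles.std(), angles.max()
```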
Availability of Mobile Augmented Reality System for Urban Landscape Simulation (Tomohiro Fukuda)
This slide was presented at CDVE2012 (The 9th International Conference on Cooperative Design, Visualization, and Engineering).
Abstract. This research presents the availability of a landscape simulation method for mobile AR (Augmented Reality), comparing it with photo montage and VR (Virtual Reality), which are the main existing methods. After a pilot experiment with 28 subjects in Kobe city, a questionnaire about the three landscape simulation methods was administered. In the results of the questionnaire, the mobile AR method was rated well for reproducibility of a landscape, operability, and cost. An evaluation of better than equivalent was obtained in comparison with the existing methods. The suitability of mobile augmented reality for landscape simulation was found to be high.
Matching Sketches with Digital Face Images using MCWLD and Image Moment Invar... (iosrjce)
Face recognition is an important problem in many application domains. Matching sketches with digital face images is important in solving crimes and capturing criminals. It is a computer application for automatically identifying a person from a still image. Law enforcement agencies are progressively using composite sketches and forensic sketches for catching criminals. This paper presents two algorithms that efficiently retrieve the matched results. The first method uses the multiscale circular Weber's local descriptor to encode more discriminative local micro patterns from local regions. The second method uses image moments; it extracts discriminative shape, orientation, and texture features from local regions of a face. The discriminating information from both the sketch and the digital image is compared using an appropriate distance measure. The contributions of this research paper are: i) comparison of the multiscale circular Weber's local descriptor with image moments for matching sketches to digital images, and ii) analysis of these algorithms on viewed face sketch, forensic face sketch, and composite face sketch databases.
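As a rough illustration of the image-moment half of that comparison, Hu's seven moment invariants (available in OpenCV) can stand in for the paper's moment features; the file names below are hypothetical, and MCWLD itself is not shown.

```python
# Hu's seven moment invariants as a stand-in for image-moment features:
# log-scaled to compress their dynamic range, then compared with a
# Euclidean distance (one plausible "appropriate distance measure").
import cv2
import numpy as np

def hu_features(gray):
    hu = cv2.HuMoments(cv2.moments(gray)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

sketch = cv2.imread("sketch.png", cv2.IMREAD_GRAYSCALE)  # hypothetical pair
photo = cv2.imread("photo.png", cv2.IMREAD_GRAYSCALE)
distance = np.linalg.norm(hu_features(sketch) - hu_features(photo))
print(f"moment distance: {distance:.3f}")
```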
SOAR: SENSOR ORIENTED MOBILE AUGMENTED REALITY FOR URBAN LANDSCAPE ASSESSMENT (Tomohiro Fukuda)
This slide was presented at CAADRIA2012 (The 17th International Conference on Computer Aided Architectural Design Research in Asia).
Abstract. This research presents the development of a sensor oriented mobile AR system which realizes geometric consistency using GPS, a gyroscope and a video camera mounted in a smartphone for urban landscape assessment. A low-cost AR system with high flexibility is realized. Consistency of the viewing angle of the video camera and the CG virtual camera, and geometric consistency between the video image and 3DCG, are verified. In conclusion, the proposed system was evaluated as feasible and effective.
IRJET- A Vision based Hand Gesture Recognition System using Convolutional... (IRJET Journal)
This document describes a vision-based hand gesture recognition system using convolutional neural networks. The system captures images of hand gestures using a camera, pre-processes the images, and classifies the gestures using a CNN model. The CNN architecture includes convolutional layers, max pooling layers, dropout layers, and fully connected layers. The system was trained on a dataset of images representing 7 different hand gestures. Testing achieved over 90% accuracy in recognizing the gestures. This vision-based approach allows for natural human-computer interaction without physical devices.
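A network of the shape described (convolution, max pooling, dropout, fully connected layers, seven output classes) can be written down directly in Keras. The layer sizes and the 64x64 grayscale input below are assumptions for illustration, not the paper's exact architecture.

```python
# A small CNN classifier for 7 gesture classes, following the layer types
# named in the summary; all sizes are illustrative assumptions.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 1)),        # assumed grayscale input
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(7, activation="softmax"),  # one unit per gesture
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```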
Gesture Based Interface Using Motion and Image Comparison (ijait)
This paper gives a new approach for moving the mouse pointer and implementing its functions using a real-time camera. Here we propose to change the hardware design. Most of the existing technologies mainly depend on changing mouse parts, for example changing the position of the tracking ball or adding more buttons. We use a camera, a colored substance, image comparison technology and motion detection technology to control mouse movement and implement its functions (right click, left click, scrolling and double click).
Optimally Learnt, Neural Network Based Autonomous Mobile Robot Navigation System (IDES Editor)
Neural network based systems have been used in past years for robot navigation applications because of their ability to learn human expertise and to utilize this knowledge to develop autonomous navigation strategies. In this paper, neural based systems are developed for mobile robot reactive navigation. The proposed systems transform sensors' input to yield wheel velocities. A novel algorithm is proposed for optimal training of the neural network. To ascertain the efficacy of the proposed system, the developed neural system's performance is compared to other neural and fuzzy based approaches. Simulation results show the effectiveness of the proposed system in all kinds of obstacle environments.
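The core mapping, sensor readings in and wheel velocities out, fits a small regression network. The sketch below uses scikit-learn with synthetic placeholder data and an assumed eight-sensor layout; the paper's own optimal training algorithm is not reproduced.

```python
# Sketch of the reactive mapping described: range-sensor readings in,
# left/right wheel velocities out. The training data here is a synthetic
# placeholder standing in for recorded human-driving examples.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 3.0, size=(500, 8))      # 8 assumed range sensors (m)
# Toy teaching signal: slow the wheel on the side with a nearby obstacle.
left = 1.0 - 0.5 * (X[:, :4].min(axis=1) < 0.5)
right = 1.0 - 0.5 * (X[:, 4:].min(axis=1) < 0.5)
y = np.column_stack([left, right])

net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000).fit(X, y)
print(net.predict(X[:1]))                     # [v_left, v_right]
```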
DISTRIBUTED AND SYNCHRONISED VR MEETING USING CLOUD COMPUTING: Availability a... (Tomohiro Fukuda)
This slide was presented at CAADRIA2012 (The 17th International Conference on Computer Aided Architectural Design Research in Asia).
Abstract. The mobility of people's activities and cloud computing technologies are becoming advanced in the modern age of information and globalisation. This study describes the availability of discussing spatial design while sharing a 3-dimensional virtual space with stakeholders in a distributed and synchronised environment. First, a townscape design support system based on a cloud computing type VR system is constructed. Next, an experiment on distributed and synchronised discussion of townscape design is executed with subjects who are specialists in the townscape design field. After the experiment, both a qualitative mental evaluation and a quantitative evaluation were carried out. The conclusions are as follows: 1. Users who use VR frequently and who use videoconferencing consider the difference from face-to-face discussion to be small. 2. A Moiré pattern may occur in a gradation picture. 3. The availability of distributed and synchronised discussions with cloud computing type VR is high.
IEEE EED2021 AI use cases in Computer Vision (SAMeh Zaghloul)
AI Use Cases in Computer Vision
An introduction and overview of AI use cases in Computer Vision, answering a basic question: "How do machines see?". It covers neural networks, object detection and recognition, content-based image retrieval, object tracking, image restoration, scene reconstruction, and Computer Vision tools, frameworks, pretrained models, and public train/test datasets.
It includes real-project examples of using Computer Vision for Egyptian Hieroglyph alphabet recognition and face recognition/matching, in addition to a hands-on interactive session on object/image tagging/annotation of videos/images to prepare a model training dataset.
IRJET- Detection and Recognition of Text for Dusty Image using Long Short... (IRJET Journal)
The document describes a proposed system for detecting and recognizing text in dusty images. The system has two main modules: 1) preprocessing and enhancement of the dusty input image, and 2) detection and recognition of text regions. The preprocessing and enhancement uses median filtering, color space conversion to Lab color space, and contrast limited adaptive histogram equalization (CLAHE) to improve the appearance and visibility of text in the dusty image. The second module then detects text regions using maximally stable extremal regions (MSER) and recognizes characters using a long short-term memory (LSTM) neural network, which is well-suited for this challenging task.
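The named pre-processing chain maps onto standard OpenCV calls. The sketch below covers the first module (median filter, Lab conversion, CLAHE) plus MSER candidate detection; the LSTM recognition stage is omitted, and the input path is hypothetical.

```python
# Pre-processing chain as summarized: median filter, Lab conversion, CLAHE
# on the L channel, then MSER detection of candidate text regions.
import cv2

img = cv2.imread("dusty.png")                  # hypothetical dusty input
img = cv2.medianBlur(img, 3)
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = cv2.merge([clahe.apply(l), a, b])   # equalize luminance only
gray = cv2.cvtColor(cv2.cvtColor(enhanced, cv2.COLOR_LAB2BGR),
                    cv2.COLOR_BGR2GRAY)
regions, boxes = cv2.MSER_create().detectRegions(gray)
print(f"{len(regions)} candidate text regions")
```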
This document describes a face recognition technique that uses a hybrid of principal component analysis (PCA) and an artificial neural network. PCA is used to extract global features of the entire face and local features of the eyes, nose, and mouth regions. These features are used as inputs to an artificial neural network for training and testing. The technique aims to leverage both global and local features for face recognition while reducing computation time compared to local-feature-only approaches.
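A minimal version of this hybrid, PCA features feeding a small neural classifier, can be sketched with scikit-learn. The data here is a random placeholder, and only whole-face features are shown; the described technique additionally uses local eye, nose, and mouth features.

```python
# PCA (eigenface-style global features) feeding an artificial neural
# network classifier; dataset and identity labels are placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

X = np.random.rand(100, 64 * 64)   # placeholder: 100 flattened face images
y = np.random.randint(0, 10, 100)  # placeholder: 10 identities

model = make_pipeline(
    PCA(n_components=50),          # global features of the whole face
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000),
)
model.fit(X, y)
print(model.predict(X[:5]))
```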
Computer vision analyzes visual data like images and videos to understand and interpret them similarly to humans. It works by training models on large datasets to recognize patterns and classify objects. Applications include face recognition for login, medical imaging analysis, and computer vision in autonomous vehicles. The future of computer vision may involve combining it with natural language processing for image captioning and visual assistance applications.
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und... (Kalle)
The portability of an eye tracking system encourages us to develop a technique for estimating 3D point-of-regard. Unlike conventional methods, which estimate the position in the 2D image coordinates of the mounted camera, such a technique can represent richer gaze information of the human moving in the larger area. In this paper, we propose a method for estimating the 3D point-of-regard and a visualization technique of gaze trajectories under natural head movements for the head-mounted device. We employ visual SLAM technique to estimate head configuration and extract environmental information. Even in cases where the head moves dynamically, the proposed method could obtain 3D point-of-regard. Additionally, gaze trajectories are appropriately overlaid on the scene camera image.
IJCER (www.ijceronline.com) International Journal of computational Engineerin... (ijceronline)
This document discusses a proposed spatial data mining system called SD-Miner. SD-Miner consists of three main parts: a graphical user interface, a data mining module, and a data storage module. The data mining module provides major spatial data mining functionalities like spatial clustering, classification, characterization, and spatio-temporal association rule mining. SD-Miner aims to allow intuitive and effective use of both spatial and non-spatial data mining functionalities. It also provides spatial data mining functions as libraries for convenient use in applications.
Human Segmentation Using Haar-Classifier (IJERA Editor)
Segmentation is an important process in many aspects of multimedia applications. Fast and accurate segmentation of moving objects in video sequences is a basic task in many computer vision and video investigation applications. Human detection in particular is an active research area in computer vision. Segmentation is very useful for tracking and recognizing objects in a moving clip. The motion segmentation problem is studied and the most important techniques are reviewed. We illustrate some common methods for segmenting moving objects, including background subtraction, temporal segmentation and edge detection. Contour and threshold methods are also commonly used to segment objects in a moving clip. These methods are widely exploited for moving object segmentation in many video surveillance applications, such as traffic monitoring and human motion capture. In this paper, a Haar classifier is used to detect humans in a moving video clip, covering features such as face detection, eye detection, and full-body, upper-body and lower-body detection.
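OpenCV ships stock Haar cascades for several of the listed detectors, so the frame-by-frame detection loop is short. The video path below is hypothetical, and the cascade parameters are common defaults rather than the paper's tuned values.

```python
# Haar-cascade detection with OpenCV's bundled cascades for faces and
# full bodies, applied frame by frame over a video clip.
import cv2

face = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
body = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_fullbody.xml")

cap = cv2.VideoCapture("clip.mp4")       # hypothetical video clip
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    bodies = body.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
    for (x, y, w, h) in list(faces) + list(bodies):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cap.release()
```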
IRJET - A Review on Face Recognition using Deep Learning Algorithm (IRJET Journal)
This document provides an overview of face recognition using deep learning algorithms. It discusses how deep learning approaches like convolutional neural networks (CNNs) have achieved high accuracy in face recognition tasks compared to earlier methods. CNNs can learn discriminative face features from large datasets during training to generalize to new images, handling variations in pose, illumination and expression. The document reviews popular CNN architectures and training approaches for face recognition. It also discusses other traditional face recognition methods like PCA and LDA, and compares their performance to deep learning methods.
Markerless motion capture for 3D human model animation using depth camera (TELKOMNIKA JOURNAL)
3D animation is created using keyframe-based systems in 3D animation software such as Blender and Maya. Due to the long time interval and the need for high expertise in 3D animation, motion capture devices are used as an alternative, and the Microsoft Kinect v2 sensor is one of them. This research analyses the capabilities of the Kinect sensor in producing 3D human model animations using motion capture and a keyframe-based animation system, in reference to a live motion performance. The quality, time interval and cost of both animation results were compared. The experimental result shows that the motion capture system with the Kinect sensor consumed less time (only 2.6%) and cost (30%) in the long run (10 minutes of animation) compared to the keyframe-based system, but it produced lower quality animation. This was due to the lack of body detection accuracy when there is obstruction. Moreover, the sensor's constant assumption that the performer's body faces forward made it unreliable for a wide variety of movements. Furthermore, the standard test defined in this research covers most body parts' movements, so it can be used to evaluate other motion capture systems.
The document provides an overview of computer vision including:
- It defines computer vision as using observed image data to infer something about the world.
- It briefly discusses the history of computer vision from early projects in 1966 to David Marr establishing the foundations of modern computer vision in the 1970s.
- It lists several related fields that computer vision draws from including artificial intelligence, information engineering, neurobiology, solid-state physics, and signal processing.
- It provides examples of applications of computer vision such as self-driving vehicles, facial recognition, augmented reality, and uses in smartphones, the web, VR/AR, medical imaging, and insurance.
The document discusses human activity recognition from video data using computer vision techniques. It describes recognizing activities at different levels from object locations to full activities. Basic activities like walking and clapping are the focus. Key steps involve tracking segmented objects across frames and comparing motion patterns to templates to identify activities through model fitting. The DEV8000 development kit and Linux are used to process video and recognize activities in real-time. Applications discussed include surveillance, sports analysis, and unmanned vehicles.
Performance analysis on color image mosaicing techniques on FPGA (IJECEIAES)
Today, surveillance systems and other monitoring systems consider the capture of image sequences in a single frame. The captured images can be combined to get a mosaiced image or a combined image sequence, but the captured images may have quality issues such as brightness, alignment (correlation), resolution, and manual image registration issues. An existing technique like cross correlation can offer good image mosaicing but faces brightness issues. Thus, this paper introduces two different methods for mosaicing, i.e., (a) Sliding Window Module (SWM) based Color Image Mosaicing (CIM) and (b) Discrete Cosine Transform (DCT) based CIM on a Field Programmable Gate Array (FPGA). The SWM based CIM is adopted for corner detection of two images and performs automatic image registration, while the DCT based CIM handles both the local and global alignment of images using a phase correlation approach. Finally, the performance of these two methods is analyzed by comparing parameters such as PSNR, MSE, device utilization and execution time. From the analysis, it is concluded that the DCT based CIM offers better results than the SWM based CIM.
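The phase-correlation alignment at the heart of the DCT-based method has a direct software analogue in OpenCV, sketched below for two overlapping grayscale tiles with hypothetical file names; the FPGA pipeline itself is not reproduced.

```python
# Phase correlation: estimate the translation between two overlapping
# images before blending them into a mosaic. Inputs must be float arrays
# of identical size.
import cv2
import numpy as np

a = cv2.imread("tile_a.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
b = cv2.imread("tile_b.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
(shift_x, shift_y), response = cv2.phaseCorrelate(a, b)
print(f"estimated offset: ({shift_x:.1f}, {shift_y:.1f}), "
      f"confidence {response:.2f}")
```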
1) The document presents a real-time static hand gesture recognition system for the Devanagari number system using two feature extraction techniques: Discrete Cosine Transform (DCT) and Edge Oriented Histogram (EOH).
2) The system captures an image using a webcam, performs pre-processing, extracts the region of interest, then extracts features using DCT or EOH before matching against a training database to recognize the gesture (a DCT sketch follows this list).
3) An experiment tested 20 images and found DCT achieved a higher recognition accuracy of 18 gestures compared to 15 for EOH.
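A minimal version of the DCT path might look like the following: keep a low-frequency block of 2-D DCT coefficients as the feature vector and classify by nearest neighbour. The ROI size, block size, and matching rule are assumptions for illustration, not the paper's exact pipeline.

```python
# DCT feature extraction plus nearest-neighbour matching against a
# pre-built training database of gesture feature vectors.
import cv2
import numpy as np

def dct_features(roi, k=8):
    """Low-frequency k*k block of the 2-D DCT of a grayscale ROI."""
    roi = cv2.resize(roi, (64, 64)).astype(np.float32)
    return cv2.dct(roi)[:k, :k].flatten()

def recognize(roi, train_feats, train_labels):
    """train_feats: (N, k*k) array; train_labels: length-N sequence."""
    d = np.linalg.norm(train_feats - dct_features(roi), axis=1)
    return train_labels[int(np.argmin(d))]
```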
Text and Object Recognition using Deep Learning for Visually Impaired People (ijtsrd)
This document presents a system to aid visually impaired people using object and text detection with deep learning. Object detection is performed using a convolutional neural network trained on datasets like MS-COCO and PASCAL VOC. Text detection uses a fully convolutional neural network. Detected objects and text are converted to speech using a text-to-speech synthesizer to help visually impaired users understand their surroundings. The system achieves real-time object detection and can detect multiple objects and text in an image with reasonable accuracy depending on lighting and other conditions.
The document is a seminar report on the topic of computer vision. It provides an overview of computer vision, discussing its history, applications in various fields like robotics, medicine, security, transportation, and human-computer interaction. It also describes the challenges of computer vision as an inverse problem and provides examples of computer vision systems used in domains like industrial automation, image databases, and more.
Computer vision is a field of artificial intelligence that uses computer hardware and software to analyze visual images and videos. The goal is to make useful decisions based on sensed images by understanding objects and scenes. Computer vision combines knowledge from fields like computer science, electrical engineering, mathematics, biology and cognitive science. It focuses on extracting useful information from images like detecting and identifying faces, recovering 3D geometry, and tracking motion. Computer vision has applications in manufacturing, city planning, entertainment, forensics and more.
The document provides an overview of a vision-based place recognition system for autonomous robots. It discusses the framework of such a system, including sensing, pre-processing, feature extraction, training, classification, and post-processing. Local feature extraction is a key component, involving local feature detection to identify interest points and local feature descriptors to build representations around those points. The system aims to recognize places using visual cues in order to enable robot localization.
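As a concrete, hedged sketch of that framework, the loop below uses ORB interest points and descriptors (one plausible choice of local feature detector and descriptor) and brute-force matching to pick the best stored place for a query image; file names, feature counts, and the match threshold are all illustrative assumptions.

```python
# Minimal place-recognition loop: extract local features per stored place
# image, then match a query view against each place and keep the best.
import cv2

orb = cv2.ORB_create(nfeatures=500)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def describe(path):
    """Detect interest points and compute binary descriptors for an image."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    keypoints, descriptors = orb.detectAndCompute(img, None)
    return descriptors

places = {                                   # hypothetical place database
    "corridor": describe("corridor.png"),
    "lab": describe("lab.png"),
}

def recognize_place(query_path, min_matches=25):
    q = describe(query_path)
    best, best_score = None, 0
    for name, d in places.items():
        score = len(matcher.match(q, d))     # cross-checked matches
        if score > best_score:
            best, best_score = name, score
    return best if best_score >= min_matches else "unknown place"

print(recognize_place("query.png"))
```

In a fuller system the descriptor matching stage would be replaced by the trained classifier the summary mentions, but the sensing, feature detection, and feature description stages keep the same shape.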
Computer vision is a field that uses methods to process, analyze and understand images and visual data from the real world in order to produce decisions or symbolic information. The goal of computer vision is to automatically extract, analyze and understand useful information from single images or sequences of images to represent real-world objects, similar to how humans use their eyes and brain for vision. Computer vision involves image acquisition, processing, analysis, and comprehension stages to sense images, improve image quality, examine scenes to identify features, and understand objects and their relationships.
Color based image processing, tracking and automation using MATLAB (Kamal Pradhan)
Image processing is a form of signal processing in which the input is an image, such as a photograph or video frame. The output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. This project aims at processing real-time images captured by a webcam for motion detection, color recognition, and system automation using MATLAB programming.
In color based image processing we work with colors instead of objects. Color provides powerful information for object recognition. A simple and effective recognition scheme is to represent and match images on the basis of color histograms.
Tracking refers to detection of the path of the color. Once the color based processing is done, the color becomes the object to be tracked; this can be very helpful for security purposes.
Automation means an automated system, i.e., any system that does not require human intervention. In this project the mouse is automated so that it works with our gestures and performs the desired tasks.
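A small OpenCV sketch of the colour-based processing and tracking steps: build a hue histogram from a reference patch, then back-project it to locate that colour in a frame. File names and thresholds are hypothetical, and since the original project is in MATLAB, this is a Python analogue of the same idea.

```python
# Colour histogram back-projection: represent a reference colour as a hue
# histogram, then locate (track) that colour in a new frame.
import cv2
import numpy as np

patch = cv2.imread("reference_patch.png")        # hypothetical colour swatch
hsv_patch = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)
hist = cv2.calcHist([hsv_patch], [0], None, [180], [0, 180])
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

frame = cv2.imread("frame.png")                  # hypothetical camera frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
_, mask = cv2.threshold(backproj, 50, 255, cv2.THRESH_BINARY)
ys, xs = np.nonzero(mask)
if len(xs):
    print("colour centroid:", int(xs.mean()), int(ys.mean()))
```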
Computer vision is the automation of human visual perception to allow computers to analyze and understand digital images. The goal is to emulate the human visual system through techniques like deep learning. Computer vision involves image acquisition, processing, and analysis to interpret images beyond just recording them. It has applications in areas like object detection, facial recognition, medical imaging, and self-driving cars. While it provides advantages like unique customer experiences, it also raises privacy concerns regarding how the data used is collected and stored.
This document proposes a multi-view object tracking system using deep learning to track objects from multiple camera views. It uses the YOLO v3 algorithm to map segmented object groups between camera views to share knowledge. A two-pass regression framework is also presented for multi-view object tracking. Key steps include preprocessing images, extracting features, detecting and tracking objects between views using blob matching, and counting objects over time by maintaining tracks. The approach aims to improve object counting accuracy by exploiting information from multiple camera views.
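Blob matching between frames (and, once boxes are mapped into a common frame, between views) typically reduces to an intersection-over-union test. The helper below is a generic sketch of that test, not the paper's two-pass regression framework.

```python
# Intersection-over-union between two axis-aligned boxes, the usual
# criterion for matching a detection to an existing track.
def iou(box_a, box_b):
    """Boxes as (x1, y1, x2, y2); returns overlap ratio in [0, 1]."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0
```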
Face Recognition Based on Image Processing in an Advanced Robotic System (IRJET Journal)
This document describes a face recognition system used to control a robotic system. The system works in two stages: first, face recognition is used to unlock the system by validating a user's face. Then, different navigation images are used to control the robot's motion. Face recognition is implemented using support vector machine (SVM), histogram of oriented gradients (HOG), and k-nearest neighbors (KNN) algorithms in MATLAB. The process is based on machine learning concepts where the system is trained in a supervised manner to recognize faces and control the robot.
In tech vision-based_obstacle_detection_module_for_a_wheeled_mobile_robot (Sudhakar Spartan)
This document describes a vision-based obstacle detection module for a wheeled mobile robot. It uses a stereoscopic vision system to detect obstacles and build and update a map of the environment. The system extracts the ground surface from images using hue and detects obstacle edges using luminance. It then performs stereo matching on corresponding obstacle edge points to calculate depth and detect obstacles. The module is implemented using an FPGA to achieve high-performance real-time obstacle detection for safe robot navigation and dynamic map updating.
A Review On AI Vision Robotic Arm Using Raspberry Pi (Angela Shin)
This document summarizes a research project that designed an artificial intelligence (AI) vision robotic arm using a Raspberry Pi microcontroller. The robotic arm has 6 degrees of freedom and is intended to perform multifunctional tasks like detecting, identifying, grasping, and repositioning objects. A computer vision system with a camera is used to recognize objects and their spatial positions to control the robotic arm's movement. The vision system runs on the Raspberry Pi's computing power to recognize objects in real time based on software commands. The study aims to automate the various axes of the manipulator to lift, carry and place objects as desired using integrated electric motors and a vision-based control system.
This document provides an overview of computer vision, including its history, challenges, and promising applications. It discusses how computer vision aims to model the human visual system from a biological perspective and build autonomous vision systems from an engineering perspective. The author outlines computer vision's difficult goal of matching the human visual system. Promising future applications mentioned include image databases, vision-based interfaces, virtual agents, and facial expression analysis.
1. Computer vision involves processes like image acquisition, image processing, classification, recognition, and decision making to perform visual perception tasks automatically.
2. It uses techniques to estimate object features in images, measure geometry-related features of objects, and interpret geometric information.
3. Computer vision aims to emulate human vision, which involves the eye receiving an image, the brain interpreting it to understand the observed object and use that understanding for decisions.
This document provides an overview of computer imaging, which can be separated into digital image processing and computer vision. Digital image processing involves examining image data to solve problems and typically outputs images for human consumption, covering topics like image restoration, enhancement, and compression. Computer vision is intended to analyze images for computer use, outputting attributes rather than images, and covers topics like segmentation, recognition, and 3D reconstruction. The document outlines several applications of computer imaging in fields like medicine, security, and robotics, and discusses the current state of the art in areas like object recognition, medical imaging, and vision-based human-computer interaction.
Computer graphics involves the creation and manipulation of images through programming. There are four major operations in computer graphics: imaging, modeling, rendering, and animation. Computer graphics is used in many applications including computer-aided design, presentation graphics, computer art, entertainment, education and training, visualization, image processing, and graphical user interfaces.
Computer vision is the study and application of methods that allow computers to understand image content. The goal is to extract specific information from images for human or automated use, such as detecting cancerous cells or controlling an industrial robot. Computer vision relies on digital images as input and can involve single images, multiple images, videos, or 3D volumes. While early systems were programmed for specific tasks, machine learning is increasingly used. Computer vision draws from fields like artificial intelligence, signal processing, neurobiology, and mathematics. It involves tasks like recognition, motion analysis, and scene reconstruction. Typical computer vision systems include image acquisition, preprocessing, feature extraction, and registration. Applications include facial recognition, mobile robots, and more.
Ain Shams University,
Faculty of Computer & Information Sciences,
Cairo, Egypt
Vision-Based Place Recognition for Autonomous Robot
Survey (1): Project Overview
Ahmed Abd El-Fattah, Ahmed Saher, Mourad Aly and Yasser Hassan
Dr. Mohammed Abd El-Megged and Dr. Safaa Amin
Ain Shams University, Faculty of Computer and Information Science, Cairo, Egypt
Abstract
This survey provides an overview of the project: What does the project's title mean? Where does the
project fit within SLAM? Where is the project located within the field of computer science?
The survey also covers the project's architecture and taxonomies of feature detectors, feature descriptors,
and classification algorithms.
Table of Contents
Abstract
1. Introduction
2. Project's position in computer science
3. Simultaneous Localization and Mapping "SLAM"
   3.1 Localization
4. Vision-Based Place Recognition for Autonomous Robots
   4.1 Autonomous Robots
   4.2 Vision Based
   4.3 Place Recognition
5. Framework of a Vision-Based Place Recognition System
   5.1 Sensing
   5.2 Pre-Processing
   5.3 Feature Extraction
   5.4 Training
   5.5 Optimization
   5.6 Classification
   5.7 Post-Processing
6. Feature Extraction
   6.1 Local Feature Detection Algorithms
      6.1.1 Good Feature Properties
   6.2 Local Feature Descriptor Algorithms
7. Classification
   7.1 Supervised Learning
      7.1.1 Supervised Learning Steps
      7.1.2 Problems in Supervised Learning
   7.2 Various Classifiers
      7.2.1 Similarity (Template Matching)
      7.2.2 Probabilistic Classifiers
      7.2.3 Decision Boundary Based Classifiers
   7.3 Classifier Evaluation
a. Conclusion
b. Future Work
c. References
1. Introduction
The project surveyed in this document is part of a SLAM project. SLAM is an acronym for Simultaneous
Localization and Mapping, a technique used by robots and autonomous vehicles to build a map of an
unknown environment (without prior knowledge), or to update a map of a known environment (with prior
knowledge from a given map), while at the same time keeping track of their current location. Both this
document and the project focus on vision-based place recognition for autonomous robots. Such robots are
able to recognize their position autonomously, which means they can perform desired tasks in unstructured
environments without continuous human guidance.
2. Project's position in computer science
Computer science may be divided into several fields; our problem revolves mainly around the field of
computer vision. Computer vision is the science and technology of machines that see, where "see" in this
case means that the machine is able to extract from an image the information necessary to solve some task.
In other words, computer vision is the construction of explicit, meaningful descriptions of physical objects
from their images. The output of computer vision is a description, an interpretation, or some quantitative
measurement of the structures in the 3D scene [1]. Image processing and pattern recognition are among
the many techniques computer vision employs to achieve its goals, as shown in Fig. 2.0.1.
[Figure: Venn diagram relating computer vision to pattern recognition, signal processing, image processing, physics, artificial intelligence, and mathematics.]
Fig. 2.0.1. Image processing and pattern recognition techniques that computer vision employs to achieve its goals; the project lies in the green area.
Our project is a robot vision application, which applies computer vision techniques to robotics.
Specifically, it studies machine vision in the context of robot control and navigation [1].
3. Simultaneous Localization and Mapping "SLAM"
Simultaneous localization and mapping (SLAM) is a technique used by robots and autonomous vehicles to
build a map of an unknown environment (without prior knowledge), or to update a map of a known
environment (with prior knowledge from a given map), while at the same time keeping track of their
current location. Mapping is the problem of integrating the information gathered by a set of sensors
into a consistent model and depicting that information in a given representation. It can be described by
the first characteristic question: What does the world look like? Central aspects of mapping are the
representation of the environment and the interpretation of sensor data. In contrast, localization is the
problem of estimating the place (and pose) of the robot relative to a map. In other words, the robot has to
answer the second characteristic question: Where am I? Typically, solutions comprise tracking, where the
initial place of the robot is known, and global localization, in which no or only some prior knowledge about
the surroundings of the starting position is given [2]. Our project focuses on the localization problem, as it
needs a previously generated map and the current input images to be able to localize the robot's position.
3.1 Localization
Localization is a fundamental problem in mobile robotics. Most mobile robots must be able to locate
themselves in their environment in order to accomplish their tasks. Localization methods fall into three
classes: geometric, topological, and hybrid, as shown in Fig. 3.1.1.
[Figure: taxonomy of localization methods.]
Fig. 3.1.1. Localization methods: geometric, topological, and hybrid.
Geometric approaches typically use a two-dimensional grid as a map representation. They attempt to keep
track of the robot’s exact position with respect to the map’s coordinate system. Topological approaches use
an adjacency graph as a map representation. They attempt to determine the node of the graph that
corresponds to the robot’s location. Hybrid methods combine geometric and topological approaches [3].
Most of the recent work in the field of mobile robot localization focuses on geometric localization. In
general, these geometric approaches are based on either map matching or landmark detection. Most map
matching systems rely on an extended Kalman filter (EKF) that combines information from intrinsic
sensors with information from extrinsic sensors to determine the current position. Good statistical models of
the sensors and their uncertainties must be provided to the Kalman filter.
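As a minimal illustration of the Kalman-filter idea behind map matching, the following Python sketch (not taken from any cited system, and simplified to one dimension) fuses an odometry prediction from an intrinsic sensor with a position measurement from an extrinsic sensor; all noise values are illustrative assumptions:

    import numpy as np

    x = np.array([0.0])        # estimated robot position (1-D for simplicity)
    P = np.array([[1.0]])      # estimate covariance
    Q = np.array([[0.1]])      # odometry (process) noise, an assumed value
    R = np.array([[0.5]])      # range-sensor (measurement) noise, an assumed value
    H = np.array([[1.0]])      # measurement model: the sensor observes position directly

    def kalman_step(x, P, u, z):
        # Predict: move by odometry reading u; uncertainty grows by Q.
        x_pred = x + u
        P_pred = P + Q
        # Update: blend the prediction with measurement z via the Kalman gain.
        K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
        x_new = x_pred + K @ (z - H @ x_pred)
        P_new = (np.eye(1) - K @ H) @ P_pred
        return x_new, P_new

    x, P = kalman_step(x, P, u=np.array([1.0]), z=np.array([1.2]))
    print(x, P)  # position pulled toward the measurement, covariance reduced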
Landmark localization systems rely on either artificial or natural features of the environment. Artificial
landmarks are easier to detect reliably than natural landmarks. However, artificial landmarks require
modifications of the environment, such that systems based on natural landmarks are often preferred.
Various features have been used as natural landmarks: corners, doors, overhead lights, air diffusers in
ceilings, and distinctive buildings. Because most landmark-based localization systems are tailored for
specific environments, they can rarely be applied easily to different environments [3].
4. Vision-Based Place Recognition for Autonomous Robots
4.1. Autonomous Robots
Autonomous robots are intelligent machines capable of performing tasks in the world by themselves,
without explicit human control. A Mobile Autonomous Robot (MAR) is a microprocessor-based,
programmable mobile robot, which can sense and react to its environment.
A fully autonomous robot has the ability to:
- Gain information about the environment.
- Work for an extended period of time without human intervention.
- Move all or part of itself throughout its operating environment without human assistance.
- Avoid situations that are harmful to people, property, or itself, unless those are part of its design
  specifications.
By contrast, remote-control (RC) robots are controlled by a human and cannot react to the environment
by themselves [4].
4.2. Vision Based
A robust localization system requires an extrinsic sensor, which provides rich information in order to allow
the system to reliably distinguish between adjacent locations. For this reason, we use a passive color vision
camera as our extrinsic sensor. Because many places can easily be distinguished by their color appearance,
we expect that color images provide sufficient information without the need for range data from additional
sensors such as stereo cameras, sonar, or a laser rangefinder. Other systems use a different extrinsic sensor,
such as a range measurement device (sonar or a laser scanner). Nowadays, the range measurement devices
usually used are laser scanners. Laser scanners are very precise and efficient, and their output does not
require much computation to process. On the downside, they are also very expensive: a SICK scanner costs
about 5,000 USD. Laser scanners also have problems with certain surfaces, including glass, where they can
give very bad readings, and they cannot be used underwater, since the water disrupts the light and the range
is drastically reduced. Another option is sonar. Sonar was used intensively some years ago; sonars are very
cheap compared to laser scanners, but their measurements are not very good by comparison and they often
give bad readings [5].
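As a rough illustration of how color appearance alone can distinguish places, the following Python sketch (assuming OpenCV is available; the file names are hypothetical) compares two views by their normalized color histograms:

    import cv2

    def color_signature(path, bins=8):
        img = cv2.imread(path)
        # 3-D histogram over the B, G, R channels, normalized so that
        # image size does not affect the comparison.
        hist = cv2.calcHist([img], [0, 1, 2], None, [bins] * 3,
                            [0, 256, 0, 256, 0, 256])
        return cv2.normalize(hist, hist).flatten()

    h1 = color_signature("corridor.jpg")
    h2 = color_signature("kitchen.jpg")
    # A correlation close to 1.0 suggests the two views show the same place.
    print(cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL))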
4.3. Place Recognition
The robotics community has mostly conducted research in scene and/or place recognition to solve the
problem of mobile robot navigation. Leonard and Durrant-Whyte summarized the general problem of
mobile robot navigation by three questions: Where am I? Where am I going? And how should I get there?
However, most of the research conducted in this area tries to answer the first question, that is, positioning
the robot in its environment.
The problem of place recognition faces many challenges, illustrated in Fig. 4.3.1.
1. Objects, scenes, and/or places are largely variable in their visual appearance. Their appearance can
change dramatically due to occlusion, cluttered background, noise, and different illumination and
imaging conditions.
2. Recognition algorithms perform differently in indoor and outdoor environments.
3. Recognition algorithms perform differently in different environments.
4. Due to the very limited resources of a mobile robot, it is difficult to find a solution that is both
resource-efficient and accurate [5].
Fig. 4.3.1. The different dynamics common to real-world indoor environments [5].
The appearance of the room changes dramatically due to variation in illumination caused by different
weather conditions (1st row), and due to different viewpoints (2nd row). Additional variability caused by
human activities is also apparent: a person appears to be working in 4.3.1(a) and 4.3.1(c); the dustbin is
full in 4.3.1(a) whereas it is empty in 4.3.1(b) [5].
5. Framework of a Vision-Based Place Recognition System
Any supervised recognition system contains all or some of the modules shown in Fig. 5.0.1.
Fig. 5.0.1. Framework of a vision-based place recognition system [5].
In Fig. 5.0.1, the main modules of the system are shown with yellow rectangles, which constitute the
overall operations in the training and recognition processes. Data flow among the different modules is
shown with arrowheads. Light gray rectangles describe the type of data generated at every stage of the two
processes. Finally, the most fundamental modules, present in almost every pattern recognition system, are
framed with a solid line [5].
The first three operations (sensing, feature extraction, training) are common to both the training and the
recognition processes, and will therefore be discussed first.
5.1. Sensing
The basic purpose of a sensor is to sense the environment and to store that information in a digital format.
Two types of optical sensors commonly used for vision-based place recognition and localization of
mobile robots are the regular digital camera and the omni-directional camera. Regular cameras are
the most common due to their nominal cost and good performance. Omni-directional cameras, on
the other hand, provide a horizontal field of view of 360°, which simplifies the recognition task [5].
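As a minimal sketch of the sensing stage (assuming a regular USB camera and OpenCV; the device index and file name are illustrative), one frame is sensed and stored in a digital format for the later stages:

    import cv2

    cap = cv2.VideoCapture(0)    # device index 0: the default camera
    ok, frame = cap.read()       # sense one image of the environment
    if ok:
        cv2.imwrite("place_0001.png", frame)  # store it for the pipeline
    cap.release()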
5.2. Pre-Processing
Before any further processing, digital image processing techniques are employed to enhance the acquired
images and to address certain problems intrinsic to digital imaging: tone reproduction, resolution, color
balance, channel registration, bit depth, noise, clipping, compression, and sharpening. When an image
contains several individual patterns, each pattern is separated for an effective recognition process, an
operation known as segmentation. Another problem arises when an image pattern consists of several
disconnected parts; in that situation, the disconnected parts must be properly combined to form a coherent
entity, an operation known as grouping. After the pre-processing stage in the training process, the image
instances acquired by the optical sensor are stored in temporary storage before any further processing is
performed. In the recognition process, by contrast, the acquired image instance is used directly for feature
extraction when the purpose is to provide real-time recognition [5].
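A minimal sketch of this stage follows; the choice of a median filter for noise removal and Otsu thresholding for segmentation is an illustrative assumption, not the method of the surveyed thesis:

    import cv2

    img = cv2.imread("place_0001.png", cv2.IMREAD_GRAYSCALE)
    denoised = cv2.medianBlur(img, 5)   # suppress sensor noise
    # Segment the enhanced image into foreground and background regions.
    _, segmented = cv2.threshold(denoised, 0, 255,
                                 cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    cv2.imwrite("place_0001_segmented.png", segmented)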
5.3. Feature Extraction
Feature extraction is the process that extracts such features from the input image in order to give it a new
representation. A desirable property of the new representation is that it should be insensitive to the
variability that can occur within a class (within-class variability), and should emphasize pattern properties
that differ between classes (between-class variability). In other words, good features describe
distinguishing, discriminative properties of different patterns. Desirable properties of the extracted features
are invariance to translation, viewpoint change, scale change, illumination variations, and the effects of
small changes in the environment [5].
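As one concrete sketch of feature extraction, the following uses SIFT (one of the descriptors discussed in section 6.2) to turn an image into a set of local, largely scale- and rotation-invariant feature vectors; the file name is hypothetical:

    import cv2

    img = cv2.imread("place_0001.png", cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    # keypoints: where the interest points lie; descriptors: one 128-D
    # vector per keypoint, giving the image its new representation.
    keypoints, descriptors = sift.detectAndCompute(img, None)
    print(len(keypoints), descriptors.shape)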
5.4. Training
Training is the process by which an appropriate learning method is trained on a representative set of
samples of the underlying problem in order to produce a classifier. The choice of an appropriate learning
method depends on the learning paradigm and the problem at hand. In the thesis [5], a Support Vector
Machine (SVM) classifier is employed as the learning method. Once trained, an SVM yields a model
composed of support vectors (a selected set of training instances that summarize the whole feature space)
and corresponding weight coefficients.
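A minimal sketch of the training stage, assuming scikit-learn; X and y are random stand-ins for real feature vectors labeled with the places they were captured in:

    import numpy as np
    from sklearn.svm import SVC

    X = np.random.rand(100, 128)           # 100 feature vectors (e.g., from SIFT)
    y = np.random.randint(0, 4, size=100)  # labels for 4 indoor places

    clf = SVC(kernel="rbf")
    clf.fit(X, y)
    # After training, the model is summarized by its support vectors and
    # their weight coefficients, as described above.
    print(clf.support_vectors_.shape, clf.dual_coef_.shape)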
5.5. Optimization
Optimization is a major design concern in any recognition system that aims for robustness, computational
efficiency, and low memory consumption. It is particularly crucial for systems that aim to work in real-time
and/or to perform continuous learning. To achieve these goals, the learning method can be improved by
optimizing the learned model. After the optimization stage, the model is ready to serve as a knowledge
model for the SVM classifier in the recognition of indoor places [5].
5.6. Classification
Classification is the stage in the system architecture where the learned classifier incorporates the
knowledge model for the actual recognition of indoor places. At this stage, an input image instance, in the
form of extracted features, is provided to the classifier. The classifier then assigns this input image instance
to one of the predefined classes depending on the knowledge stored in the knowledge model. The
performance of any learning-based recognition system depends heavily on [5]:
- The quality of the classifier.
- The performance of pre-processing operations such as noise removal and segmentation.
- The quality of the features provided to the classifier by feature extraction.
- The complexity of the decision function of the classifier.
5.7. Post-Processing
Up to this stage, a single level of classification has made the recognition decision about a particular input
image instance, but it does not have to be the final decision layer of the recognition system. This is mainly
because in many cases performance and robustness can be greatly improved by incorporating additional
mechanisms. Such mechanisms can exploit multiple sources of information or simply process the data
produced by a single classifier. An example of the latter case would be incorporating the information of a
place for object recognition. As already mentioned, recognition performance can be improved by
incorporating multiple cues; in this way, the recognition system comprises multiple classifiers. In the
thesis [5], post-processing is not used, because only a single cue, produced by a single classifier, is
employed.
6. Feature Extraction
In the case of visual data, feature representations can be derived from the whole image (global features)
or computed locally from its salient parts (local features).
Local Features: A local feature is an image pattern that differs from its immediate neighborhood. It is
usually associated with a change in one or several image properties simultaneously; the properties
commonly considered are intensity, color, and texture. Measurements are taken from a region centered on
a local feature and converted into descriptors, which can then be used for various applications. A set of
local features can serve as a robust image representation that allows recognizing objects or scenes without
the need for segmentation.
Global Features: In the field of image retrieval, many global features have been proposed to describe
image content, with color histograms and variations thereof as a typical example. Global features cannot
distinguish foreground from background, and mix information from both parts together.
The thesis [5] evaluates the performance of local image features for indoor place recognition under
varying imaging and illumination conditions. The basic idea behind local features is to represent the
appearance of the input image only around a set of characteristic points known as interest points or key
points. The process of local feature extraction consists mainly of two stages: interest point detection and
descriptor building.
Interest Point Detection:
The purpose of an interest point detector is to identify a set of characteristic points in the input image that
have the maximum likelihood of being detected again, even in the presence of various transformations (for
instance, scaling and rotation); more stable interest points mean better performance.
Local Feature Descriptor:
For each of these interest points, a local feature descriptor is built to distinctively describe the local region
around the interest point. To determine the resemblance between two images using such a representation,
the local descriptors from both images are matched; the degree of resemblance is usually a function of the
number of properly matched descriptors between the two images. In addition, a sufficient number of
feature regions is typically required to cover the target object, so that it can still be recognized under
partial occlusion. This is achieved by the following feature extraction pipeline (a code sketch follows the
list):
1. Find a set of distinctive key points.
2. Define a region around each key point in a scale- or affine-invariant manner.
3. Extract and normalize the region content.
4. Compute a descriptor from the normalized region.
5. Match the local descriptors.
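A minimal sketch of this pipeline follows, using SIFT together with a brute-force matcher and Lowe's ratio test; this pairing is an assumed but common choice, and the file names are hypothetical:

    import cv2

    img1 = cv2.imread("view_a.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("view_b.png", cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)  # steps 1-4 in one call
    kp2, des2 = sift.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(des1, des2, k=2)    # step 5: match descriptors
    # Keep a match only if it is clearly better than the second-best
    # candidate; the surviving matches measure the images' resemblance.
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    print(len(good), "properly matched descriptors")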
6.1. Local Feature Detection Algorithms
[Figure: taxonomy tree of feature detectors, covering edge-based detectors (Harris, SUSAN), corner detectors (Harris, Harris-Laplace, Harris-Affine, Hessian, Hessian-Affine, Hessian-Laplace), blob detectors (SURF, salient regions, Hessian-Affine, DoG, MSER), and intensity-based region detectors (superpixels).]
Fig. 6.1.1. Classification of feature detectors into corner, region, and blob methods; the gray rectangle with a green border refers to a detector listed as a corner detector that may also act as a blob detector.
6.1.1 Good feature properties:
Repeatability: Given two images of the same object or scene, taken under different viewing
conditions, a high percentage of the features detected on the scene part visible in both images
should be found in both images.
Distinctiveness: The intensity patterns underlying the detected features should show a lot of
variation, such that features can be distinguished and matched.
Locality: The features should be local, so as to reduce the probability of occlusion and to allow
simple model approximations of the geometric and photometric deformations between two images
taken under different viewing conditions (e.g., based on a local planarity assumption).
Quantity: The number of detected features should be sufficiently large, such that a reasonable
number of features are detected even on small objects. However, the optimal number of features
depends on the application. Ideally, the number of detected features should be adaptable over a
large range by a simple and intuitive threshold. The density of features should reflect the
information content of the image to provide a compact image representation.
Accuracy: The detected features should be accurately localized, both in image location and with
respect to scale and possibly shape.
Efficiency: Preferably, the detection of features in a new image should be fast enough to allow for
time-critical applications.
Repeatability, arguably the most important property of all, can be achieved in two different ways:
either by invariance or by robustness.
Invariance: When large deformations are to be expected, the preferred approach is to model these
mathematically if possible, and then develop methods for feature detection that are unaffected by
these mathematical transformations. Examples of such transformations include:
- Image noise
- Changes in illumination
- Uniform scaling
- Rotation
- Minor changes in viewing direction
Robustness: In the case of relatively small deformations, it often suffices to make feature detection
methods less sensitive to such deformations.
6.2. Local Feature Descriptor Algorithms
[Figure: taxonomy of feature descriptors.]
Fig. 6.2.1. Classification of feature descriptors: Scale Invariant Feature Transform (SIFT), Shape Contexts, Image Moments, Jet Descriptors, Gradient Location and Orientation Histogram (GLOH), and Geometric Blur.
7. Classification
In machine learning and pattern recognition, classification refers to an algorithmic procedure for assigning
a given piece of input data to one of a given number of categories. An algorithm that implements
classification, especially in a concrete implementation, is known as a classifier.
In simpler words, classification means to resolve the class of an object, e.g., a ground vehicle vs. an
aircraft [8].
Machine learning may be divided into two main learning types:
[Figure: diagram of the two main types of machine learning.]
Figure 7.0.1: Types of machine learning, namely supervised learning (classification) and unsupervised learning (clustering).
A supervised learning procedure (classification) learns to classify new instances based on a training set of
instances that have been properly labeled by hand with the correct classes. An unsupervised procedure
(clustering), in contrast, groups data into classes based on some measure of inherent similarity (e.g., the
distance between instances, considered as vectors in a multi-dimensional vector space).
As our problem mainly revolves around supervised learning, a detailed explanation is provided for the
supervised learning methodology only.
7.1 Supervised Learning:
Supervised learning is the machine-learning task of inferring a function from supervised training data. A
supervised learning algorithm analyzes the training data and produces an inferred function, which is called
a classifier (if the output is discrete) or a regression function (if the output is continuous).
7.1.1 Supervised Learning Steps:
In order to solve a certain problem using supervised learning, it is necessary to follow the steps in
Figure 7.1.1.
[Figure: flow diagram of the supervised learning workflow.]
Figure 7.1.1: Steps of solving a supervised learning problem: determine the type of learning examples; gather the training set; determine the input feature representation; determine the structure of the learned function; complete the design (run the learning algorithm on the training set); evaluate the accuracy of the learned function.
These steps may be described as follows:
1. Determine the type of training examples. Before doing anything else, the engineer should decide
what kind of data is to be used as an example. For instance, this might be a single handwritten
character, an entire handwritten word, or an entire line of handwriting.
2. Gather a training set. The training set needs to be representative of the real-world use of the
function. Thus, a set of input objects is gathered and corresponding outputs are also gathered, either
from human experts or from measurements.
3. Determine the input feature representation of the learned function. The accuracy of the learned
function depends strongly on how the input object is represented. Typically, the input object is
transformed into a feature vector, which contains a number of features that are descriptive of the
object. The number of features should not be too large, because of the curse of dimensionality; but
should contain enough information to accurately predict the output.
4. Determine the structure of the learned function and corresponding learning algorithm. For example,
the engineer may choose to use support vector machines or decision trees.
5. Complete the design. Run the learning algorithm on the gathered training set. Some supervised
learning algorithms require the user to determine certain control parameters. These parameters may
be adjusted by optimizing performance on a subset (called a validation set) of the training set, or
via cross-validation.
6. Evaluate the accuracy of the learned function. After parameter adjustment and learning, the
performance of the resulting function should be measured on a test set that is separate from the
training set [9].
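As a minimal sketch of steps 5 and 6 (assuming scikit-learn, with synthetic data standing in for real features), control parameters are tuned by cross-validation on the training set and the final accuracy is measured on a held-out test set:

    import numpy as np
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    X = np.random.rand(200, 16)
    y = np.random.randint(0, 3, size=200)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

    # Step 5: adjust the control parameter C via 5-fold cross-validation.
    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
    search.fit(X_train, y_train)
    print(search.best_params_)

    # Step 6: evaluate on a test set separate from the training set.
    print(search.score(X_test, y_test))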
However, while the previous steps may solve a supervised learning problem effectively, some problems
may arise.
7.1.2 Problems in supervised learning
1. Bias-Variance tradeoff.
2. Function complexity and amount of training data.
3. Dimensionality of the input space.
4. Noise in the output values.
5. Factors to consider.
a. Heterogeneity of the data.
b. Redundancy in the data.
7.2 Various classifiers
This section reviews several types of classifiers commonly used in supervised learning. Classifiers can be
roughly separated into three main groups (approaches).
[Figure: taxonomy of classifier types.]
Figure 7.2.1: Some of the commonly used classifiers in the supervised learning process: similarity-based classifiers (Template Matching, K-Nearest Neighbor (KNN), Branch and Bound (BnB)), probabilistic classifiers (Naive Bayes), and decision-boundary classifiers (Decision Trees, Artificial Neural Network / Multilayer Perceptron (ANN-MLP), Support Vector Machines (SVM)).
7.2.1 Similarity (Template Matching)
This is the most intuitive method: template matching performs a normalized cross-correlation between a
template image (an object in the training set) and a new image to be classified. Template matching is the
easiest method to understand and implement. However, it is well known to be an expensive operation when
used to classify against a large set of images.
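A minimal sketch of similarity-based classification via OpenCV's normalized cross-correlation; the file names are hypothetical:

    import cv2

    image = cv2.imread("new_view.png", cv2.IMREAD_GRAYSCALE)
    template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

    # Slide the template over the image, scoring each position by
    # normalized cross-correlation.
    result = cv2.matchTemplate(image, template, cv2.TM_CCORR_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    # max_val near 1.0 means a strong match; note that the cost grows
    # with the number of stored templates, the method's main drawback.
    print(max_val, max_loc)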
7.2.2 Probabilistic Classifiers
Algorithms of this nature use statistical inference to find the best class for a given instance. Unlike other
algorithms, which simply output a "best" class, probabilistic algorithms output a probability of the instance
being a member of each of the possible classes; the best class is normally selected as the one with the
highest probability. Such an algorithm has numerous advantages over non-probabilistic classifiers: it can
output a confidence value associated with its choice, and correspondingly it can abstain when its
confidence in any particular output is too low. Because they output probabilities, probabilistic classifiers
can be more effectively incorporated into larger machine-learning tasks, in a way that partially or
completely avoids the problem of error propagation [11].
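A minimal sketch of a probabilistic classifier that abstains at low confidence, assuming scikit-learn and synthetic data; the 0.6 threshold is an arbitrary illustration:

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    X = np.random.rand(60, 8)
    y = np.random.randint(0, 3, size=60)
    clf = GaussianNB().fit(X, y)

    probs = clf.predict_proba(np.random.rand(1, 8))[0]  # one probability per class
    best = int(np.argmax(probs))
    # Abstain rather than output a low-confidence class label.
    print(best if probs[best] >= 0.6 else "abstain", probs)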
7.2.3 Decision Boundary Based Classifiers
Decision boundary based classifiers are based on the concept of decision planes that define decision
boundaries. A decision plane is a plane that separates a set of objects having different class
memberships [10].
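As a minimal sketch of the decision-plane concept (assuming scikit-learn, with synthetic two-class data), a linear SVM learns a hyperplane w·x + b = 0, and the sign of the decision function tells which side a new instance falls on:

    import numpy as np
    from sklearn.svm import LinearSVC

    # Two synthetic classes centered at (+2, +2) and (-2, -2).
    X = np.vstack([np.random.randn(50, 2) + 2, np.random.randn(50, 2) - 2])
    y = np.array([0] * 50 + [1] * 50)

    clf = LinearSVC().fit(X, y)
    print(clf.coef_, clf.intercept_)            # the plane's w and b
    print(clf.decision_function([[1.5, 1.5]]))  # signed distance to the plane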
7.3 Classifier Evaluation
Classifier performance depends greatly on the characteristics of the data to be classified. There is no single
classifier that works best on all given problems. Various empirical tests have been performed to compare
classifier performance and to find the characteristics of data that determine classifier performance.
Determining a suitable classifier for a given problem is, however, still more an art than a science.
Many measures can be used to evaluate the quality of a classification system, such as precision and recall,
and the receiver operating characteristic (ROC) [9].
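A minimal sketch of these measures, assuming scikit-learn, for a binary problem with made-up predictions:

    from sklearn.metrics import precision_score, recall_score, roc_auc_score

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]      # ground-truth labels
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]      # hard predictions
    y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.95, 0.3]  # confidence scores

    print("precision:", precision_score(y_true, y_pred))
    print("recall:   ", recall_score(y_true, y_pred))
    print("ROC AUC:  ", roc_auc_score(y_true, y_score))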
a. Conclusion
In brief, this document covered the main components, algorithms, and methodologies used by most
research in the field of place recognition. It presented the basic steps of place recognition, with brief
details on each step, giving insight into the process of place recognition and how it usually works.
The remaining problem is how to select the proper feature detectors and the proper classifiers to solve
the place recognition problem.
b. Future Work
We are working on survey number 2, which will contain an additional study of each algorithm mentioned
here, together with the proposed approaches to solve our problem. After finishing survey #2, we will be
able to reduce the effort spent on the research phase and start the design and implementation phase in
parallel with the research phase.
c. References
[1] Computer vision. [Online]. Available: http://en.wikipedia.org/wiki/Computer_vision
[2] Simultaneous localization and mapping. [Online]. Available: http://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping
[3] I. Ulrich and I. Nourbakhsh, "Appearance-Based Place Recognition for Topological Localization." [Online]. Available: http://www.cis.udel.edu/~cer/arv/readings/paper_ulrich.pdf
[4] Autonomous robot. [Online]. Available: http://en.wikipedia.org/wiki/Autonomous_robot
[5] M. M. Ullah, "Vision-Based Indoor Place Recognition using Local Features." [Online]. Available: http://www.csc.kth.se/~pronobis/projects/msc/ullah2007msc.pdf
[6] T. Tuytelaars and K. Mikolajczyk, "Local Invariant Feature Detectors." [Online]. Available: http://campar.in.tum.de/twiki/pub/Chair/TeachingWs09MATDCV/FT_survey_interestpoints08.pdf
[7] S. Maji, "Comparison of Local Feature Descriptors." [Online]. Available: http://www.eecs.berkeley.edu/~yang/courses/cs294-6/maji-presentation.pdf
[8] V. C. Chen, "Evaluation of Bayes, ICA, PCA and SVM Methods for Classification," Radar Division, US Naval Research Laboratory.
[9] Supervised learning. [Online]. Available: http://en.wikipedia.org/wiki/Supervised_learning
[10] Support vector machines. [Online]. Available: http://www.statsoft.com/textbook/support-vector-machines/
[11] Classification (machine learning). [Online]. Available: http://en.wikipedia.org/wiki/Classification_(machine_learning)