Slides used for the thesis defense of the PhD candidate Sergio Orts-Escolano.
The research described in this thesis was motivated by the need for a robust model capable of representing 3D data obtained with 3D sensors, which are inherently noisy. In addition, time constraints have to be considered, as these sensors can provide a 3D data stream in real time. This thesis proposed the use of Self-Organizing Maps (SOMs) as a 3D representation model. In particular, we proposed the use of the Growing Neural Gas (GNG) network, which has been successfully used for clustering, pattern recognition, and topology representation of multi-dimensional data. Until now, Self-Organizing Maps have been primarily computed offline, and their application to 3D data has mainly focused on noise-free models, without considering time constraints. We propose a hardware-accelerated implementation that leverages the computing power of modern GPUs, taking advantage of the paradigm known as General-Purpose Computing on Graphics Processing Units (GPGPU). The proposed methods were applied to different problems and applications in computer vision, such as the recognition and localization of objects, visual surveillance, and 3D reconstruction.
PCA is used to reduce the dimensionality of image data by finding the principal components that account for the most variance in the data. It works by constructing a feature space from the eigenvectors of the image covariance matrix. Images are represented as vectors in this lower-dimensional feature space rather than the original high-dimensional pixel space. The eigenvectors corresponding to the largest eigenvalues are the principal components or "eigenfaces" that best represent the variation between images. New images can be classified by projecting them into this eigenface feature space.
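As a rough illustration of this pipeline, the sketch below builds an eigenface basis with NumPy and projects an image into it; the image size, number of components, and variable names are illustrative assumptions rather than details taken from the slides.

```python
import numpy as np

def build_eigenfaces(images, n_components=20):
    """Build an eigenface basis from flattened, equally sized grayscale images.

    images: array of shape (n_images, height*width), one image per row.
    Returns the mean face and the top principal components (eigenfaces).
    """
    mean_face = images.mean(axis=0)
    centered = images - mean_face
    # SVD of the centered data yields the eigenvectors of the image covariance
    # matrix without forming the huge (pixels x pixels) covariance explicitly.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:n_components]          # each row of vt is one eigenface

def project(image, mean_face, eigenfaces):
    """Represent an image as a weight vector in the eigenface feature space."""
    return eigenfaces @ (image - mean_face)

# Hypothetical usage with random data standing in for a real face database.
faces = np.random.rand(100, 64 * 64)
mean_face, eigenfaces = build_eigenfaces(faces)
weights = project(faces[0], mean_face, eigenfaces)
```

A new image can then be classified by comparing its weight vector with those of the training images, for example using Euclidean distance.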
Facial emotion detection on babies' emotional faces using Deep Learning - Takrim Ul Islam Laskar
Phase 1:
Face detection.
Facial landmark detection.
Phase 2:
Neural network training and testing.
Validation and implementation.
Phase 1 has been completed successfully.
This document summarizes face recognition techniques. It discusses three levels of facial details, from gross to micro features, and how they require different image resolutions. It also outlines the major components of a face recognition system: image acquisition, face detection, and face matching. Finally, it describes common image formats like 2D photos and 3D scans, and detection methods like Viola-Jones that use Haar-like features and AdaBoost training.
This document discusses using biometrics and neural networks for face recognition. It describes using facial feature coordinates like nose width and eye positions as inputs to train a neural network to identify people from images. The author explains normalizing the data, training the network through supervised learning, and testing it to model the function relating facial inputs to identity outputs. Common face recognition algorithms mentioned include PCA with Mahalanobis distance and half-face or eigen-eyes approaches. The goal is to create a basic trainable system for face verification using Neuroph Studio.
This document discusses interpretability and explainable AI (XAI) in neural networks. It begins by providing motivation for why explanations of neural network predictions are often required. It then provides an overview of different interpretability techniques, including visualizing learned weights and feature maps, attribution methods like class activation maps and guided backpropagation, and feature visualization. Specific examples and applications of each technique are described. The document serves as a guide to interpretability and explainability in deep learning models.
This document provides an overview of Kalyan Acharjya's proposed work on face recognition for his M.Tech dissertation. It discusses conducting literature research on existing face recognition techniques, identifying challenges in real-time applications, and exploring standard face image databases. The presentation covers topics such as how face recognition works, applications, and concludes with plans to modify existing algorithms and compare results to related work to enhance recognition rates.
The document discusses face recognition using principal components analysis (PCA). It provides three key points:
1. PCA is used to reduce the dimensionality of face image data to 2D or 3D by finding patterns in high-dimensional data and visualizing it. This allows for face recognition by representing each face as a set of weights of significant eigenvectors.
2. A training set is used to form the PCA coordinate system and represent each training face as weights of eigenvectors. A test face is then recognized as the closest training face based on Euclidean distance between their representations in the PCA space.
3. PCA allows for data compression, noise reduction, and classification of faces by projecting high-dimensional image data onto a lower-dimensional subspace spanned by the leading eigenvectors.
The document discusses the eigenface approach for face recognition. It provides an overview of eigenfaces, how they are calculated from a training set of faces, and how they can be used to identify faces by projecting faces onto the eigenface space. Major steps include calculating the eigenfaces from a training set, projecting new images into eigenface space to get weight coefficients, and comparing the weights to known individuals' weights or thresholds to classify faces as known or unknown. Advantages are ease of implementation and little preprocessing required, while limitations include sensitivity to head scale and only applicable to frontal views under controlled conditions.
The document discusses face recognition technology as a biometric authentication method. It describes how face recognition works by detecting nodal points on faces and creating unique faceprints. The advantages are that face recognition is convenient, socially acceptable and inexpensive compared to other biometrics. However, face recognition has difficulties with identical twins and environmental/appearance changes reducing accuracy over time. The document also outlines applications in security, law enforcement, banking, and commercial access control.
Predicting Emotions through Facial Expressions - Twinkle Singh
This document describes a facial expression recognition system with two parts: face recognition and facial expression recognition. It discusses using principal component analysis (PCA) and linear discriminant analysis (LDA) for face recognition, and PCA to extract eigenfaces for facial expression recognition. The system first performs face detection, then extracts facial expression data and classifies the expression. MATLAB is used as the implementation tool because it allows faster programming.
Face recognition technology uses machine learning algorithms to identify or verify a person's identity from digital images or video frames. The process involves detecting faces, applying preprocessing techniques like filtering and scaling, training classifiers using labeled face images, and then classifying new faces. Common machine learning algorithms used include K-nearest neighbors, naive Bayes, decision trees, and locally weighted learning. The proposed system detects faces, builds a tabular dataset from pixel values, trains classifiers, and evaluates performance on a test set. Software applies techniques like detection, alignment, normalization, and matching to encode faces for comparison. Face recognition has advantages like convenience and low cost, and applications in security, banking, and more.
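As a minimal sketch of the pipeline this summary describes (a tabular dataset of pixel values, several classifiers, evaluation on a held-out test set), the snippet below uses scikit-learn; the random data, image size, and choice of k are assumptions made only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: each row is a flattened, preprocessed face crop; labels are person IDs.
X = np.random.rand(200, 32 * 32)
y = np.random.randint(0, 10, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

for name, clf in [("k-NN", KNeighborsClassifier(n_neighbors=3)),
                  ("naive Bayes", GaussianNB()),
                  ("decision tree", DecisionTreeClassifier(random_state=0))]:
    clf.fit(X_train, y_train)                      # train on labeled face images
    print(name, "test accuracy:", clf.score(X_test, y_test))
```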
The document discusses human action recognition using spatio-temporal features. It proposes using optical flow and shape-based features to form motion descriptors, which are then classified using Adaboost. Targets are localized using background subtraction. Optical flows within localized regions are organized into a histogram to describe motion. Differential shape information is also captured. The descriptors are used to train a strong classifier with Adaboost that can recognize actions in testing videos.
Deep learning on face recognition (use case, development and risk) - Herman Kurnadi
1) Face recognition using deep learning methods has achieved high accuracy, nearing and sometimes surpassing human-level performance on some datasets.
2) The document outlines the key steps in face recognition systems using deep learning: face detection, alignment, feature extraction, and recognition. It discusses several influential deep learning models that have improved accuracy.
3) Applications discussed include security, health, and marketing/retail uses. Concerns about bias and privacy are also mentioned.
This document discusses facial recognition techniques using principal component analysis (PCA). It explains that PCA is used to reduce a large set of face image variables to a smaller set of principal components, or "eigenfaces", that contain most of the information. The document outlines how PCA is applied to a training set of face images to calculate the eigenfaces, which form an orthonormal basis set that can be used to reconstruct face images. It notes some challenges like variations in lighting and expressions but overall finds eigenface-based facial recognition to be a robust technique for security applications.
This document provides an overview of a course on computer vision called CSCI 455: Intro to Computer Vision. It acknowledges that many of the course slides were modified from other similar computer vision courses. The course will cover topics like image filtering, projective geometry, stereo vision, structure from motion, face detection, object recognition, and convolutional neural networks. It highlights current applications of computer vision like biometrics, mobile apps, self-driving cars, medical imaging, and more. The document discusses challenges in computer vision like viewpoint and illumination variations, occlusion, and local ambiguity. It emphasizes that perception is an inherently ambiguous problem that requires using prior knowledge about the world.
The document discusses computer vision and its history. Computer vision involves using algorithms to understand and analyze visual images and video data. It aims to help computers understand scenes, locate objects, and determine their properties similarly to human vision. Computer vision has many applications such as face detection and recognition, optical character recognition, analyzing sports footage, and enabling technologies like autonomous vehicles and robots. The field involves understanding problems like image formation, filtering, matching, alignment, and categorization. OpenCV is also introduced as a popular open-source library for computer vision applications.
This document discusses facial recognition technology. It begins with an introduction to biometrics and the need for facial recognition. It then describes the process of facial recognition, including data capture, extraction of features, comparison, and matching. The key components of a facial recognition system and how it works are also outlined. Advantages include convenience and ease of use, while disadvantages relate to issues with lighting, pose, and privacy concerns. The document concludes by describing applications of facial recognition technology in government, security, banking, and other commercial sectors.
A completed modeling of local binary pattern operator - Win Yu
This document presents the completed local binary pattern (CLBP) operator for texture classification. CLBP generalizes and completes the local binary pattern (LBP) by using a local difference sign-magnitude transform to encode the missing texture information not captured by LBP. The CLBP operator fuses three codes - CLBP_C for the center pixel, CLBP_S for the signs of differences, and CLBP_M for the magnitudes. Experiments on the Outex texture database show CLBP achieves much better classification accuracy than LBP and other state-of-the-art methods.
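To make the sign/magnitude decomposition concrete, here is a rough NumPy sketch of the basic 3x3 LBP sign code together with the local difference magnitudes; it is a simplified illustration of the underlying idea, not the CLBP operator as defined in the paper.

```python
import numpy as np

def lbp_3x3(image):
    """Basic 8-neighbour LBP: threshold each neighbour against the centre pixel."""
    img = np.asarray(image, dtype=np.float64)
    center = img[1:-1, 1:-1]
    # Offsets of the 8 neighbours, ordered clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    magnitudes = np.zeros(center.shape)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        diff = neighbour - center
        codes |= (diff >= 0).astype(np.uint8) << bit   # sign component (the LBP / CLBP_S idea)
        magnitudes += np.abs(diff)                     # magnitude information used by CLBP_M
    return codes, magnitudes / len(offsets)

codes, mags = lbp_3x3(np.random.rand(16, 16))
```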
The document summarizes an OpenCV based image processing attendance system. It discusses using OpenCV to detect faces in images and recognize faces by comparing features to a database. The key steps are face detection using Viola-Jones detection, face recognition using eigenfaces generated by principal component analysis to project faces into "face space", and measuring similarity by distance between projections.
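The detection stage of such a system can be prototyped in a few lines with OpenCV's bundled Haar cascade; the file names and parameters below are common defaults, shown only as an illustrative sketch rather than the system described in the document.

```python
import cv2

# Load the frontal-face Haar cascade that ships with OpenCV (Viola-Jones style detector).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("classroom_photo.jpg")            # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Multi-scale detection over the grayscale image.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("detected_faces.jpg", img)
```

The cropped face regions would then be projected into the eigenface space, as outlined above, and matched by distance to enrolled identities.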
[Paper] Anti-spoofing for face recognition - Susang Kim
This document summarizes a research paper on face anti-spoofing using deep learning models. It discusses using auxiliary supervision from additional data sources like depth maps and remote photoplethysmography (rPPG) signals to improve spoof detection performance. The proposed method uses a CNN to extract image features and an RNN to model rPPG signals. It evaluates the approach on the Spoof in the Wild database containing live and spoof videos, and compares error rates to other databases. The document provides background on anti-spoofing, defines relevant terms like rPPG and error metrics, and references related works and datasets.
Deep learning for image super resolution - Prudhvi Raj
Using deep convolutional networks, the machine can learn an end-to-end mapping between low- and high-resolution images. Unlike traditional methods, this approach jointly optimizes all layers. A lightweight CNN structure is used, which is simple to implement and provides a favorable trade-off compared with existing methods.
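A lightweight three-layer CNN in this spirit can be sketched in PyTorch as below; the 9-1-5 kernel pattern follows a commonly cited configuration, but the exact sizes should be read as assumptions rather than the network from the slides.

```python
import torch
import torch.nn as nn

class SRCNNLike(nn.Module):
    """Small end-to-end CNN mapping an interpolated low-res image to a high-res estimate."""
    def __init__(self, channels=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4),  # patch extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=1),                   # non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=5, padding=2),  # reconstruction
        )

    def forward(self, x):
        return self.body(x)

# Hypothetical usage: a batch of bicubically upscaled grayscale patches.
model = SRCNNLike()
lowres = torch.rand(8, 1, 33, 33)
highres = model(lowres)   # same spatial size; all layers are optimized jointly (e.g. MSE loss)
```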
This document presents information on face detection techniques. It discusses image segmentation as a preprocessing step for face detection. Some common segmentation methods are thresholding, edge-based segmentation, and region-based segmentation. Face detection can be classified as implicit/pattern-based or explicit/knowledge-based. Implicit methods use techniques like templates, PCA, LDA, and neural networks, while explicit methods exploit cues like color, motion, and facial features. One method discussed is human skin color-based face detection, which filters for skin-colored regions and finds facial parts within those regions. Advantages include speed and independence from training data, while disadvantages include sensitivity to lighting and accessories.
This document discusses face recognition and the PCA algorithm for face recognition. It begins with an introduction to face recognition and its uses. It then explains the PCA algorithm for face recognition in 6 steps: 1) converting images to vectors, 2) normalizing the vectors, 3) calculating eigenvectors from the normalized vectors, 4) selecting important eigenvectors, 5) representing faces as combinations of eigenvectors, and 6) recognizing faces. It discusses the strengths and weaknesses of face recognition and lists several applications such as access control, law enforcement, and banking.
1. The document discusses face recognition using an eigenface approach, which uses principal component analysis to extract features from a database of faces to generate eigenfaces that can be used to identify unknown faces.
2. The eigenface approach takes into account the entire face for recognition and is relatively insensitive to small changes in faces. It is faster, simpler, and has better learning capabilities compared to other approaches.
3. Some limitations are that accuracy is affected if lighting and face position vary greatly, it only works with grayscale images, and noisy or partially occluded faces decrease recognition performance.
This document provides an overview of facial recognition technology. It discusses the history of facial recognition, how the technology works, its implementation which involves image acquisition, processing, distinctive characteristic location and template matching. It also outlines the strengths and weaknesses of facial recognition as well as its applications in areas like border control, computer security, and banking. While facial recognition provides advantages like convenience and easy use, it also has disadvantages such as being impacted by changes in user appearance.
The document describes different image-based rendering (IBR) techniques, including pure IBR, which uses only images to render scenes without 3D modeling, and hybrid IBR, which uses images to guide the reconstruction of 3D models. Methods covered include cylindrical panoramas, concentric mosaics, plenoptic stitching, light fields, relief textures, depth-based reconstruction, and non-linear optimization. IBR has been used in films...
An Open Source solution for Three-Dimensional documentation: archaeological a... - Giulio Bigliardi
The modern techniques of Structure from Motion (SfM) and Image-Based Modelling (IBM) open new perspectives in the field of archaeological documentation, providing a simple and accurate way to record three-dimensional data.
The software Python Photogrammetry Toolbox (PPT) is an Open Source solution that implements a pipeline to perform 3D reconstruction from a set of pictures. It takes pictures as input and automatically performs the 3D reconstruction for the images for which 3D registration is possible. It is composed of Python scripts that automate the different steps of the workflow. The entire process is reduced to two commands: calibration and dense reconstruction. The user can run it from a graphical interface or from the terminal. Calibration is performed with Bundler, while dense reconstruction is done through CMVS/PMVS.
Despite the automation, the user can control the final result by choosing two initial parameters: the image size and the feature detector. Acting on the first parameter reduces the computation time and decreases the density of the point cloud. Acting on the feature detector influences the final result: PPT can work both with SIFT (a patent of the University of British Columbia, freely usable only for research purposes) and with VLFEAT (released under the GPL v.2 license). The use of VLFEAT ensures a more accurate result, though it increases the computation time.
Python Photogrammetry Toolbox, released under the GPL v.3 license, is a classical example of a FLOSS project in which instruments and knowledge are shared. The community works on the development of the software, sharing code modifications, feedback and bug-checking.
Crime Scene Diagramming and Reconstruction by Det. Mike Anderson - PPI_Group
From the 3D Laser Scanning for Forensic Scene Mapping Seminar 2014 in Portland and Seattle hosted by The PPI Group and co-sponsored by FARO Technologies. Presentation by Detective Mike Anderson of the Unified Police Department of Greater Salt Lake Utah.
Lecture 01 - Frank Dellaert - 3D reconstruction and mapping: a factor graph ... - mustafa sarac
Frank Dellaert presented an overview of visual SLAM, bundle adjustment, and factor graphs. Visual SLAM uses visual odometry to estimate camera poses incrementally from frame to frame. Bundle adjustment refines the camera pose estimates using non-linear optimization over all camera poses and 3D landmarks jointly. Factor graphs provide a graphical representation of the optimization problem in bundle adjustment.
Build Your Own 3D Scanner: The Mathematics of 3D Triangulation - Douglas Lanman
The document introduces the topics that will be covered in the course, including:
1) The mathematics of 3D triangulation using line-plane and line-line intersections to reconstruct points in 3D space from 2D images (a minimal line-plane intersection sketch follows this list).
2) Camera and light source calibration which is needed to map between image points and 3D rays.
3) Reconstruction and visualization of 3D point clouds scanned with swept-plane light sources.
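For instance, the ray-plane intersection at the heart of swept-plane triangulation (point 1) can be computed as follows; the parametrization (ray origin q, direction v, plane normal n, offset d) is a standard convention assumed here purely for illustration.

```python
import numpy as np

def ray_plane_intersection(q, v, n, d):
    """Intersect the ray p(t) = q + t*v (t >= 0) with the plane n.p + d = 0.

    Solving n.(q + t*v) + d = 0 for t gives t = -(n.q + d) / (n.v).
    Returns the 3D intersection point, or None if the ray is parallel to the plane.
    """
    q, v, n = (np.asarray(a, dtype=float) for a in (q, v, n))
    denom = n.dot(v)
    if abs(denom) < 1e-12:
        return None
    t = -(n.dot(q) + d) / denom
    return q + t * v if t >= 0 else None

# Hypothetical example: a camera ray from the origin hitting the plane z = 2.
point = ray_plane_intersection(q=[0, 0, 0], v=[0.1, 0.0, 1.0], n=[0, 0, 1], d=-2.0)
```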
3D Programming and Virtual Reality Modeling for the Internet with VRML 2.0 - Stephenson Prieto
This document summarizes a course on 3D programming and virtual reality modeling for the Internet with VRML. It explains basic concepts such as nodes, primitive shapes, appearance changes, transformations, and text handling in VRML. The goal of the course is to introduce the VRML language for creating interactive three-dimensional virtual worlds for the web.
Acoustic Trail is an application that offers hands-free, eyes-free sensory guidance for outdoor navigation using acoustic and haptic stimuli. The application allows users to create and share custom routes, and provides route information and points of interest efficiently without requiring mobile network coverage. It also includes the SafeTrails feature for real-time tracking of large groups moving through natural areas.
Build Your Own 3D Scanner:
Course Notes
http://mesh.brown.edu/byo3d/
SIGGRAPH 2009 Courses
Douglas Lanman and Gabriel Taubin
This course provides a beginner with the necessary mathematics, software, and practical details to leverage projector-camera systems in their own 3D scanning projects. An example-driven approach is used throughout; each new concept is illustrated using a practical scanner implemented with off-the-shelf parts. The course concludes by detailing how these new approaches are used in rapid prototyping, entertainment, cultural heritage, and web-based applications.
The document discusses augmented reality techniques including virtual reality, augmented reality, tangible user interfaces, and diminished reality. It covers topics such as tracking technologies using computer vision algorithms, depth cameras, optical flow, and markers. Examples of augmented reality hardware like Google Glass, Oculus Rift, and Microsoft Kinect are also mentioned. The document emphasizes the importance of realistic augmentation through geometric coherence and light coherence between virtual and real objects. Interaction and collaboration techniques using augmented reality are briefly discussed.
This poster presents an overview of 3D GIS Capabilities. It can be used for discussions about 3D GIS workflows (from 3D data acquisition to 3D object creation to 3D analysis, modeling and visualization), and the possible applications of 3D GIS in urban and landscape environments
Octopus Imaging Software is one of the most versatile and best-performing packages for the processing of tomography data acquired in almost any geometry. Octopus Imaging Software provides an intuitive interface, an extensive Software Development Kit, and high-performance routines on various hardware configurations. Combined with unique features such as single-slice evaluation, which allows the user to tune the reconstruction parameters without processing the complete volume, Octopus Imaging Software is an ideal solution for both novice and advanced users. We offer three packages: Octopus Reconstruction, Octopus Visualization and Octopus Analysis.
3D Scanning Technology Overview: Kinect Reconstruction Algorithms Explained - VoxelMetric
Primesense depth cameras are the new standard in 3D scanning technology. The sensors have been mass-produced, and thus sold for a much lower price since the debut of Microsoft Kinect, which uses Primesense infrared LightCoding structured light technology. In this slide deck, we will describe the basics of Primesense-based 3D scanning technology from a physical and computational viewpoint.
Build Your Own 3D Scanner: 3D Scanning with Swept-Planes - Douglas Lanman
Build Your Own 3D Scanner:
3D Scanning with Swept-Planes
http://mesh.brown.edu/byo3d/
SIGGRAPH 2009 Courses
Douglas Lanman and Gabriel Taubin
This course provides a beginner with the necessary mathematics, software, and practical details to leverage projector-camera systems in their own 3D scanning projects. An example-driven approach is used throughout; each new concept is illustrated using a practical scanner implemented with off-the-shelf parts. The course concludes by detailing how these new approaches are used in rapid prototyping, entertainment, cultural heritage, and web-based applications.
This document discusses the use of 3D CT scans to study the anatomy of the middle and inner ear. It explains that 3D CT images are useful for examining congenital malformations and disorders of the inner and middle ear. The document compares 3D CT to 2D CT scans, noting that 3D images allow insight into temporal bone anatomy by allowing sectioning and rotation in any plane. It also states that 3D CT reconstruction based on spiral CT image data provides a better understanding of ear anatomy and pathology than 2D scans.
Inside Matters - 3D X-Ray Microscopy - Services - Leiv Hendrickx
Inside Matters provides R&D departments worldwide with insights into the microstructure of their products in order to improve and accelerate their innovations, using 3D X-Ray Microscopy. Thanks to our expert knowledge and experience, we offer our services to a wide variety of sectors and industries. We are able to help you reconstruct, visualise and analyse 3D X-Ray Microscopy images. We assist you in determining and preparing the perfect sample, perform the scans on 4 different CT scanners, and carry out a very wide range of measurements and analyses. To support the high level of service we offer, we develop our own imaging software called Octopus Imaging Software.
This document proposes a new technique called "Pixie Dust" that uses an acoustic potential field generated by phased arrays to levitate and animate small objects for graphical display and interaction. It summarizes the theory behind acoustic levitation using phased arrays, demonstrates the implementation of an acoustic potential field generator, and evaluates the workspace and speed capabilities. Potential applications explored include projection screens, spatial displays, and vector graphics displays. Future work areas discussed are wave synthesis, multi-layer displays, and production processes.
Build Your Own 3D Scanner: 3D Scanning with Structured Lighting - Douglas Lanman
Build Your Own 3D Scanner:
3D Scanning with Structured Lighting
http://mesh.brown.edu/byo3d/
SIGGRAPH 2009 Courses
Douglas Lanman and Gabriel Taubin
This course provides a beginner with the necessary mathematics, software, and practical details to leverage projector-camera systems in their own 3D scanning projects. An example-driven approach is used throughout; each new concept is illustrated using a practical scanner implemented with off-the-shelf parts. The course concludes by detailing how these new approaches are used in rapid prototyping, entertainment, cultural heritage, and web-based applications.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/qualcomm/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit-mangen
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Michael Mangen, Product Manager for Camera and Computer Vision at Qualcomm, presents the "High-resolution 3D Reconstruction on a Mobile Processor" tutorial at the May 2016 Embedded Vision Summit.
Computer vision has come a long way. Use cases that were previously not possible in mass-market devices are now more accessible thanks to advances in depth sensors and mobile processors. In this presentation, Mangen provides an overview of how we are able to implement high-resolution 3D reconstruction – a capability typically requiring cloud/server processing – on a mobile processor. This is an exciting example of how new sensor technology and advanced mobile processors are bringing computer vision capabilities to broader markets.
From Sense to Print: Towards Automatic 3D Printing from 3D Sensing Devices - toukaigi
The document describes a system called "From Sense to Print" that automatically generates 3D printed models from 3D sensor data without manual intervention. The system uses a Kinect sensor to reconstruct objects, KinectFusion for reconstruction, and sends the resulting 3D models to a 3D printer. It proposes a semantic segmentation algorithm to process the reconstructed data into a printable form by scaling it to the printer size. Initial results from a prototype using these components are presented along with limitations of the current approach.
This document describes an FPGA-based human detection system with an embedded platform. Key points:
- The system uses HOG features, SVM classification, and AdaBoost algorithms for human detection in images and video (a software HOG+SVM sketch follows this list).
- FPGA circuits are designed to accelerate the computationally intensive HOG feature extraction, including modules for gradient calculation, histogram accumulation, and more.
- The full system is implemented on an embedded platform to achieve a real-time human detection system running at 15 frames per second.
- Experimental results show the FPGA-based system has similar detection accuracy to a PC-based software implementation but significantly faster speed, suitable for real-time embedded applications.
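For comparison with the hardware pipeline, a purely software HOG-plus-SVM detector can be assembled from OpenCV's built-in people detector; the parameters below are common defaults used only for illustration, and this is of course not the FPGA design itself.

```python
import cv2

# HOG descriptor with OpenCV's pretrained pedestrian SVM.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("street_scene.jpg")       # hypothetical input frame
rects, weights = hog.detectMultiScale(frame, winStride=(8, 8), padding=(8, 8), scale=1.05)

for (x, y, w, h) in rects:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
cv2.imwrite("people_detected.jpg", frame)
```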
Desktop Software for Unmanned Aerial Systems (UAS) - Kamal Shahi
The document compares various desktop software used for processing data from unmanned aerial vehicles (UAVs). It provides a table comparing the major features and functionality of Pix4Dmapper, Agisoft Metashape, WebODM, and QGIS. These include outputs generated, ease of use, cost, support and limitations. It also provides guidance on choosing the best software by defining needs, researching options, trying demonstrations, considering costs, and getting recommendations from experts. Selecting the right software depends on the required processing capabilities, accuracy, compatibility and other factors listed.
Real-Time Object Detection using machine learning - Pratik Pratyay
This document discusses the development of a real-time object detection system using computer vision techniques. It aims to recognize and label moving objects in video streams from monitoring cameras with high accuracy and in a short amount of time. The system will use a hybrid model of convolutional neural networks and support vector machines for feature extraction and classification of objects from camera feeds into predefined classes. It is intended to help analyze surveillance video by only flagging clips that contain objects of interest like people or vehicles, reducing wasted storage and review time.
Wireless network implementation is a viable option for building network infrastructure in rural communities. Rural people lack the network infrastructure needed for information services and socio-economic development. The aim of this study was to develop a wireless network infrastructure architecture to deliver network services to rural dwellers. A user-centered approach was applied in the study, and a wireless network infrastructure was designed and deployed to cover five rural locations. Data was collected and analyzed to assess the performance of the network facilities. The results show that the system has been performing adequately without any downtime, with an average of 200 users per month, and the quality of service has remained high. The transmit/receive rate of 300 Mbps was three times as fast as the standard Ethernet transmit/receive specification, with an average throughput of 1 Mbps. The multiple-input/multiple-output (MIMO) point-to-multipoint network design increased the network throughput and the quality of service experienced by the users.
3D reconstruction is a technique used in computer vision which has a wide range of applications in areas like object recognition, city modelling, virtual reality, physical simulations, video games and special effects. Previously, specialized hardware was required to perform a 3D reconstruction. Such systems were often very expensive and were only available for industrial or research purposes. With the rising availability of high-quality, low-cost 3D sensors, it is now possible to design inexpensive complete 3D scanning systems. The objective of this work was to design an acquisition and processing system that can perform 3D scanning and reconstruction of objects seamlessly. In addition, the goal of this work also included making the 3D scanning process fully automated by building and integrating a turntable alongside the software. This means the user can perform a full 3D scan with only the press of a few buttons in our dedicated graphical user interface. Three main steps were followed to go from acquisition of point clouds to the finished reconstructed 3D model. First, our system acquires point cloud data of a person or object using an inexpensive camera sensor. Second, the acquired point cloud data are aligned and converted into a watertight mesh of good quality. Third, the reconstructed model is exported to a 3D printer to obtain a proper 3D print of the model.
The document describes the development of a low-cost 3D scanning system using an integrated turntable. Key points:
1) The system uses an inexpensive Kinect sensor and open-source Point Cloud Library to acquire 3D point cloud data of an object placed on an automated turntable.
2) The turntable is designed to be low-cost, using a modified twist board powered by a DC motor controlled via an Arduino microcontroller.
3) The software synchronizes point cloud acquisition with turntable rotation to automatically capture data from multiple angles and register them into a single aligned point cloud for surface reconstruction.
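The registration step in point 3 can be approximated in a few lines with Open3D's ICP, using the known turntable angle as the initial guess; the correspondence threshold, rotation angle, and file names are assumptions for illustration, and Open3D stands in here for the Point Cloud Library actually used by the system.

```python
import numpy as np
import open3d as o3d

# Hypothetical clouds captured at two consecutive turntable positions.
source = o3d.io.read_point_cloud("scan_000.pcd")
target = o3d.io.read_point_cloud("scan_001.pcd")

# Initial guess: the known turntable rotation between captures (e.g. 20 degrees about Z).
angle = np.deg2rad(20.0)
init = np.array([[np.cos(angle), -np.sin(angle), 0.0, 0.0],
                 [np.sin(angle),  np.cos(angle), 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0]])

# Refine the alignment with point-to-point ICP (2 cm correspondence threshold).
result = o3d.pipelines.registration.registration_icp(
    source, target, 0.02, init,
    o3d.pipelines.registration.TransformationEstimationPointToPoint())

source.transform(result.transformation)   # bring this view into the common frame
```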
COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED... - ijcsit
3D reconstruction is a technique used in computer vision which has a wide range of applications in areas like object recognition, city modelling, virtual reality, physical simulations, video games and special effects. Previously, specialized hardware was required to perform a 3D reconstruction. Such systems were often very expensive and were only available for industrial or research purposes. With the rising availability of high-quality, low-cost 3D sensors, it is now possible to design inexpensive complete 3D scanning systems. The objective of this work was to design an acquisition and processing system that can perform 3D scanning and reconstruction of objects seamlessly. In addition, the goal of this work also included making the 3D scanning process fully automated by building and integrating a turntable alongside the software. This means the user can perform a full 3D scan with only the press of a few buttons in our dedicated graphical user interface. Three main steps were followed to go from acquisition of point clouds to the finished reconstructed 3D model. First, our system acquires point cloud data of a person or object using an inexpensive camera sensor. Second, the acquired point cloud data are aligned and converted into a watertight mesh of good quality. Third, the reconstructed model is exported to a 3D printer to obtain a proper 3D print of the model.
This document summarizes a project on real-time object detection using computer vision techniques. It discusses using a system that can recognize objects in a video stream from a camera and label them with bounding boxes and labels. It notes that most video surveillance footage is uninteresting unless there are moving objects. The project aims to address this by building an accurate, fast object detection system that can run on resource-constrained devices. It proposes using a hybrid CNN-SVM model trained on a large dataset to recognize objects and discusses the training and detection phases of the system.
The document describes a study that develops a program to obtain 3D point clouds from digital images without requiring specialized camera setup. The program analyzes images pixel-by-pixel within a user-defined area of interest. Reference points in the image area allow processing a limited region, reducing computation time. The program uses color values and edge detection to identify object pixels and represent them as point clouds. Sample images and their converted 3D models demonstrate the technique.
Development of 3D convolutional neural network to recognize human activities ... - journalBEEI
This document describes the development of a 3D convolutional neural network (CNN) model to recognize human activities using moderate computation capabilities. The model is trained on the KTH dataset, which contains activities like walking, running, jogging, handwaving, handclapping, and boxing. The proposed model uses 3D CNN layers and max pooling layers to extract both spatial and temporal features from video frames. Testing achieved an accuracy of 93.33% for activity recognition. The number of model parameters and operations are also calculated to show the model can perform human activity recognition with reasonable computational requirements suitable for devices with moderate capabilities.
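A toy version of such a spatio-temporal network can be written with PyTorch's Conv3d layers; the clip length, resolution, and channel sizes below are illustrative guesses rather than the published architecture.

```python
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    """Small 3D CNN: convolution/pooling over (time, height, width), then a classifier."""
    def __init__(self, num_classes=6):        # six KTH action classes
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),
        )
        self.classifier = nn.Linear(32 * 4 * 15 * 20, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Hypothetical input: a batch of 16-frame grayscale clips at 60x80 resolution.
clips = torch.rand(2, 1, 16, 60, 80)
logits = Tiny3DCNN()(clips)                 # shape (2, 6)
```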
Semantic Segmentation on Satellite Imagery - Rahul Bhojwani
This is an Image Semantic Segmentation project targeted on Satellite Imagery. The goal was to detect the pixel-wise segmentation map for various objects in Satellite Imagery including buildings, water bodies, roads etc. The data for this was taken from the Kaggle competition <https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection>.
We implemented the FCN, U-Net and SegNet deep learning architectures for this task.
CrowdMap: Accurate Reconstruction of Indoor Floor Plan from Crowdsourced Sens... - Si Chen
CrowdMap is a system that uses crowdsourced sensor-rich videos to accurately reconstruct indoor floor plans. It improves upon previous inertial sensor-only methods by leveraging the visual information in videos. The system architecture includes modules for data collection, indoor path modeling, room layout modeling, and floor plan modeling. It is implemented on mobile and cloud platforms and evaluations show it can generate more accurate floor plans than structure from motion techniques. Future work will focus on extracting more context from room panoramas and addressing user incentive and privacy issues.
IRJET - 3D Object Recognition of Car Image Detection - IRJET Journal
This document summarizes research on 3D object recognition of car images using depth data from a Kinect sensor. The researchers used point cloud analysis techniques including VFH, CRH descriptors and ICP algorithms to match objects in 3D space. The approach involved preprocessing the point cloud to isolate individual objects, extracting descriptors, matching objects to models in a database, and verifying matches. Preliminary results showed the approach could successfully recognize objects like soda cans but performance was best at distances under 1 meter from the sensor. The goal is to enable applications like gesture controls and height estimation using 3D object detection.
Tracking Chessboard Corners Using Projective Transformation for Augmented Rea... - CSCJournals
Augmented reality has been a topic of intense research for several years across many applications. It consists of inserting a virtual object into a real scene. The virtual object must be accurately positioned in a desired place. Some measurements (calibration) are thus required, and a set of correspondences between points on the calibration target and the camera images must be found. In this paper, we present a tracking technique based on both the detection of chessboard corners and a least squares method; the objective is to estimate the perspective transformation matrix for the current view of the camera. This technique does not require any information about or computation of the camera parameters; it can be used in real time without any initialization, and the user can change the camera focal length without any fear of losing alignment between the real and virtual objects.
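An OpenCV-flavoured variant of this idea, detecting chessboard corners and fitting the plane-to-image projective transform by least squares, looks roughly like the sketch below; the board size and image names are assumptions, and the paper's own tracking method may differ in detail.

```python
import cv2
import numpy as np

pattern_size = (9, 6)                       # interior corners of a hypothetical chessboard
img = cv2.imread("camera_frame.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

found, corners = cv2.findChessboardCorners(gray, pattern_size)
if found:
    # Refine corner locations to sub-pixel accuracy.
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))

    # Reference corner positions on the planar target (in arbitrary board units).
    obj_pts = np.array([[x, y] for y in range(pattern_size[1])
                        for x in range(pattern_size[0])], dtype=np.float32)

    # Least-squares/RANSAC estimate of the perspective transformation for this view.
    H, mask = cv2.findHomography(obj_pts, corners.reshape(-1, 2), cv2.RANSAC)
```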
Satellite image processing is an intricate task that requires vast computation and data processing, which cannot be handled by a single computer. Furthermore, the processing of the massive amount of data accumulated by the satellite is a huge challenge for the end user. Hence, grid computing is the essential platform to provide high computing performance at the user end. This article reviews the grid services used for satellite image processing and significant data processing.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Similar to A Three-Dimensional Representation method for Noisy Point Clouds based on Growing Self-Organizing Maps accelerated on GPUs
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack - shyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Communications Mining Series - Zero to Hero - Session 1 - DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Full-RAG: A modern architecture for hyper-personalization
A Three-Dimensional Representation method for Noisy Point Clouds based on Growing Self-Organizing Maps accelerated on GPUs
1. A Three-Dimensional Representation method for Noisy Point Clouds based on Growing Self-Organizing Maps accelerated on GPUs
Author: Sergio Orts Escolano
Supervisors: Dr. José García Rodríguez, Dr. Miguel Ángel Cazorla Quevedo
Doctoral programme in technologies for the information society
4. Introduction
Motivation
Motivation
Most computer vision problems require the use of an effective way of representation
• Graphs, regions of interest (ROI), B-Splines, Octrees, histograms, …
• Key step for later processing stages: feature extraction, feature matching, classification, keypoint detection, …
4/79
5. Introduction
Motivation
Motivation (II)
3D data captured from the real world
• It is implicitly comprised of complex structures and nonlinear models
The advent of low cost 3D sensors
• E.g. Microsoft Kinect, Asus Xtion, PrimeSense Carmine, …
• RGB and Depth (RGB-D) streams (25 frames per second (fps))
• High levels of noise and outliers
Only a few works in 3D computer vision deal with real-time constraints
• 3D data processing algorithms are computationally expensive
Finding a 3D model with different features:
• Rapid adaptation
• Good quality of representation (topology preservation)
• Flexibility (non-stationary data)
• Robust to noisy data and outliers
5/79
6. Introduction
Motivation
Motivation (III)
3D models of objects and scenes have been extensively used in computer graphics
• Suitable structure for rendering and display
• Common graphics representations include quadric surfaces [Gotardo et al., 2004], B-spline surfaces [Gregorski et al., 2000], and subdivision surfaces
• Not general enough to handle the variety of features (flexibility, adaptation, noise-awareness, …) that are present in computer vision problems
(Left) Point cloud captured from a manufactured object (builder helmet). (Right) 3D mesh generated from the captured point cloud (post-processed)
6/79
7. Introduction
Framework
Framework
Regional research project (GV/2011/034)
• Title: “Visual surveillance systems for the identification and characterization of anomalous behaviour”. Project financed by the Valencian Government in Spain
Regional research project (GRE09-16)
• Title: “Visual surveillance systems for the identification and characterization of anomalous behaviour in restricted environments under temporal constraints”. Project financed by the University of Alicante in Spain
National research project (DPI2009-07144)
• Title: “Cooperative Simultaneous Localization and Mapping (SLAM) in large scale environments”
Research stay at IPAB – University of Edinburgh (BEFPI/2012/056)
• Title: “Real-time 3D feature estimation and keypoint detection of scenes using GPGPUs”
7/79
8. Introduction
Goals
Goals
Proposal and validation of a 3D representation model and a data fitting algorithm for noisy point clouds
• Deals with noisy data and outliers
• Flexible
• Dynamic (non-stationary data)
• Topology preserving
An accelerated hardware implementation of the proposed technique
• Considerable speed-up with regard to CPU implementations
• Real-time frame rates
8/79
9. Introduction
Goals
Goals (II)
Validation of the proposed method on different real computer vision problems handling 3D data:
• Robot vision: 6DoF Egomotion
• 3D Object recognition
• Computer-aided design/manufacturing (CAD/CAM)
Integration of 3D data processing algorithms in complex computer vision systems
• Filtering, downsampling, normal estimation, feature extraction, keypoint detection, matching, …
• The use of a GPU as a general purpose processor
9/79
10. Introduction
Proposal
Proposal
Growing Self-Organizing Maps (GSOM) for 3D data representation
• Low cost 3D sensors: noisy data
• Time-constrained conditions
• Applications: 3D computer vision problems
Hardware-based implementation of the proposed GSOM method
• General Purpose computing on Graphics Processing Units (GPGPU) paradigm
Integrate the entire pipeline of 3D computer vision systems using the GPGPU paradigm
10/79
11. Index
Introduction
3D Representation using Growing Self-Organizing Maps
• Review
• 3D Growing Neural Gas network
• Experiments: Input space adaptation & normal estimation
• Extensions of the GNG algorithm
Improving keypoint detection from noisy 3D observations
GPGPU Parallel Implementations
Applications
Conclusions
11/79
12. 3D Representation GSOM
Review
Review
SOMs were originally proposed for data clustering and pattern recognition purposes [Kohonen, 1982, Vesanto and Alhoniemi, 2000, Dittenbach et al., 2001]
As the original model had some drawbacks due to the pre-established topology of the network, growing approaches were proposed in order to deal with this problem
The Growing Neural Gas network has been successfully applied to the representation of 2D shapes in many computer vision problems [Stergiopoulou and Papamarkos, 2006, García-Rodríguez et al., 2010, Baena et al., 2013]
There already exist approaches that use traditional SOMs for 3D data representation [Yu, 1999, Junior et al., 2004]:
• Difficulties to correctly approximate concave structures
• High computational cost
• Synthetic data
12/79
13. 3D Representation GSOM
Review
Review (II)
Moreover, there exist some limitations and unexplored topics in the application of SOM-based methods to 3D representation:
• The majority of these works do not consider the high computational cost of the learning step
• They do not guarantee a response within strict time constraints
• They assumed perfect point clouds that were noise-free
• Data fusion (geometric information + colour information) has not been considered
• They do not deal with point cloud sequences, only single-shot data
13/79
14. 3D Representation GSOM
GNG network
3D Growing Neural Gas Network
Obtaining a reduced and compact representation of 3D data
• Self Organizing Maps – Growing Neural Gas
Growing Neural Gas Algorithm (GNG) [Fritzke, 1995]
• Incremental training algorithm
• Links between the units in the network are established through Competitive Hebbian Learning (CHL)
• Topology Preserving Graph
• Flexibility, growth, rapid adaption and good quality of representation
GNG representation is comprised of nodes (neurons) and connections (edges)
• Wire-frame model
Initial, intermediate and final states of the GNG learning algorithm
14/79
15. 3D Representation GSOM
GNG algorithm
GNG algorithm
Input data is defined in ℝ^d
• For 3D representation, d = 3
Adaption: reconfiguration module
• Random patterns are presented to the network
Growth: it starts with two neurons and new neurons are inserted
Flexibility: neurons and connections may be removed during the learning stage
This process is repeated until an ending condition is fulfilled:
• Number of neurons/patterns
• Adaptation error threshold
Highly Parallelizable (a minimal CPU sketch of the learning loop follows this slide)
15/79
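To make the learning loop concrete, below is a minimal CPU sketch in Python/NumPy of one possible GNG implementation following [Fritzke, 1995], using the parameter values listed later in the experimental setup (εw, εn, amax, α, β, λ). It is illustrative only: isolated-node removal is omitted, nothing is parallelized, and it is not the thesis code.

```python
# Minimal, didactic GNG learning loop (slow CPU version, not the thesis code).
import numpy as np

def gng_fit(points, max_nodes=100, lam=200,
            eps_w=0.1, eps_n=0.001, a_max=250, alpha=0.5, beta=0.0005):
    rng = np.random.default_rng(0)
    # start with two nodes placed on random input samples
    W = [points[rng.integers(len(points))].astype(float) for _ in range(2)]
    error = [0.0, 0.0]
    edges = {}                                   # (i, j) with i < j -> age

    def neighbours(i):
        return [j for (a, b) in edges for j in (a, b)
                if i in (a, b) and j != i]

    for step in range(1, max_nodes * lam):
        x = points[rng.integers(len(points))]    # random input pattern
        # 1. find the winner and the second winner
        d = [np.sum((x - w) ** 2) for w in W]
        s1, s2 = np.argsort(d)[:2]
        # 2. age edges emanating from the winner, accumulate its error
        for e in list(edges):
            if s1 in e:
                edges[e] += 1
        error[s1] += d[s1]
        # 3. adapt the winner and its topological neighbours (reconfiguration)
        W[s1] = W[s1] + eps_w * (x - W[s1])
        for j in neighbours(s1):
            W[j] = W[j] + eps_n * (x - W[j])
        # 4. Competitive Hebbian Learning: connect the two winners
        edges[tuple(sorted((s1, s2)))] = 0
        # 5. remove edges older than a_max (flexibility)
        edges = {e: a for e, a in edges.items() if a <= a_max}
        # 6. every lam patterns, insert a node between the highest-error
        #    node q and its highest-error neighbour f (growth)
        if step % lam == 0 and len(W) < max_nodes:
            q = int(np.argmax(error))
            nb = neighbours(q)
            if nb:
                f = max(nb, key=lambda j: error[j])
                W.append(0.5 * (W[q] + W[f]))
                r = len(W) - 1
                edges.pop(tuple(sorted((q, f))), None)
                edges[tuple(sorted((q, r)))] = 0
                edges[tuple(sorted((f, r)))] = 0
                error[q] *= alpha
                error[f] *= alpha
                error.append(error[q])
        # 7. globally decay the accumulated errors
        error = [e * (1.0 - beta) for e in error]
    return np.array(W), list(edges)

# Example: fit 100 nodes to a small synthetic point cloud
cloud = np.random.default_rng(1).normal(size=(2000, 3))
nodes, wires = gng_fit(cloud, max_nodes=100, lam=200)
```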
16. 3D Representation GSOM
Experiments: Data
Experiments: Data acquisition
Algorithm independent of the data source
We managed 3D data that can come from different sensors
• Laser unit, an LMS-200 Sick mounted on a sweeping unit
o Outdoor environments. Its range is 80 metres with an error of 1 millimetre per metre
• Time-of-Flight camera
o SR4000 camera. It has a range of 5-10 metres
o The accuracy varies depending on the characteristics of the observed scene, such as object reflectivity and ambient lighting conditions
o Generation of point clouds during real-time acquisition
• Range camera: structured light, Microsoft Kinect device
o RGB-D information. Indoor environments. Its range is from 0.8 to 6 metres
o Generation of point clouds during real-time acquisition
16/79
17. 3D Representation GSOM
Experiments: Data
Experiments: Data acquisition
3D sensors used for experiments. From left to right: Sick laser unit LMS-200, Time-Of-Flight SR4000 camera and Microsoft Kinect
Mobile robots used for experiments. Left: Magellan Pro unit used for indoors. Right: PowerBot used for outdoors.
17/79
18. 3D Representation GSOM
Experiments: Data
Experiments: Data Sets
Some public data sets have been used to validate the proposed method:
• The well-known Stanford 3D scanning repository. It contains complete models that have been previously processed (noise free)
• A dataset captured using the Kinect sensor. Released by the Computer Vision Laboratory of the University of Bologna [Tombari et al., 2010a]
• Our own dataset obtained using the three previously mentioned 3D sensors
18/79
19. 3D Representation GSOM
Experiments: Data
Experiments: Data Sets
Blensor software: it allowed us to generate synthetic scenes and to obtain partial views of the generated scene as if a Kinect device was used
• It provided us with ground truth information for experiments
Simulated scene
Simulated scene + Gaussian noise
19/79
20. 3D Representation GSOM
Experiments
Experiments
GNG method has been applied to 3D data representation
• Input space adaptation
• Noise removal properties
Extensions of the GNG-based algorithm
• Colour-GNG
• Sequences management
• 3D Surface reconstruction
20/79
21. 3D Representation GSOM
Exp: GNG 3D representation
Experiments: GNG 3D representation
Applying GNG to laser data
Applying GNG to Kinect data
Applying GNG to SR4000 data
21/79
22. 3D Representation GSOM
Exp: GNG 3D representation
Experiments: GNG 3D representation
Applying GNG to Kinect data
22/79
23. 3D Representation GSOM
Exp: Input space Adaptation
Experiments: Input space adaptation
The GNG method obtains better adaptation to the input space than other filtering methods like the Voxel Grid technique
• It obtains a lower adaptation error (Mean Squared Error (MSE)); a minimal sketch of this metric follows this slide
• Tested on CAD models and simulated scenes
Lower error
Input space adaptation MSE for different models (metres). Voxel Grid versus GNG. Numbers in bold provide the best results.
23/79
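The adaptation error reported in the table can be read as the mean squared distance between every input point and its closest representative node. The snippet below is a minimal sketch of that metric (the exact evaluation protocol of the thesis may differ); adaptation_mse and its arguments are illustrative names.

```python
# Mean squared point-to-nearest-node distance, as an adaptation error proxy.
import numpy as np
from scipy.spatial import cKDTree

def adaptation_mse(points, nodes):
    """Mean squared distance (e.g. metres^2) from each input point to its
    closest representative node."""
    dist, _ = cKDTree(nodes).query(points, k=1)
    return float(np.mean(dist ** 2))

# e.g. compare adaptation_mse(cloud, gng_nodes) with
#      adaptation_mse(cloud, voxel_grid_centroids)
```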
24. 3D Representation GSOM
Exp: Input space Adaptation
Experiments: Input space adaptation (II)
Noisy model σ = 0.4
GNG representation
Original CAD model
Voxel grid representation
Filtering quality using 10,000 nodes. GNG vs Voxel Grid comparison
24/79
25. 3D Representation GSOM
Exp: Normal estimation
Experiments: Normal estimation
Surface normals are important properties of a geometric surface, and are heavily used in many areas such as computer vision and computer graphics
Normal or curvature estimation can be affected by the presence of noise
Normal estimation over noisy input data
The representation obtained using the GNG method allows computing more accurate normal information (a sketch of the underlying estimator follows this slide)
25/79
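For reference, the sketch below shows the standard PCA-based normal estimator that such pipelines rely on: the normal at a point is taken as the eigenvector of the local covariance matrix with the smallest eigenvalue. This is an illustrative implementation, not the thesis code; running it on GNG nodes instead of the raw cloud is what produces the smoother normals shown on the next slide.

```python
# PCA-based normal estimation over k-nearest neighbours (illustrative).
import numpy as np
from scipy.spatial import cKDTree

def estimate_normals(points, k=16):
    tree = cKDTree(points)
    normals = np.empty_like(points, dtype=float)
    for i, p in enumerate(points):
        _, idx = tree.query(p, k=k)              # k nearest neighbours
        nbrs = points[idx] - points[idx].mean(axis=0)
        cov = nbrs.T @ nbrs / k                  # 3x3 local covariance
        eigvals, eigvecs = np.linalg.eigh(cov)
        normals[i] = eigvecs[:, 0]               # smallest-eigenvalue direction
    return normals
```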
26. 3D Representation GSOM
Exp: Input space Adaptation
Experiments: Normal estimation (II)
Top: normal estimation on a filtered point cloud produced by the GNG method. Bottom: normal estimation on a raw point cloud.
Normals are considered more stable as their distribution is smooth and they show fewer abrupt changes in their directions
26/79
27. 3D Representation GSOM
Extensions: Colour-GNG
Extension: Colour-GNG
Modern 3D sensors provide colour information (e.g. Kinect, Carmine, Asus Xtion, …)
GNG is extended considering colour information during the learning step (a minimal sketch of the modified winner search follows this slide)
• Input data is defined in ℝ^d where d = 6
• Colour information is considered during the weight adaptation step but it was not included in the CHL (winning neurons) process
o We still focus on topology preservation
• The winning neuron step only computes the Euclidean distance using the x, y, z components
• No post-processing steps are required as the neurons' colour is obtained during the learning process
27/79
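A hedged sketch of the Colour-GNG idea: nodes live in ℝ^6 (x, y, z, r, g, b), the winner is selected using only the geometric part of the distance, and the adaptation step moves the full 6-D weight vector, so colour is learned without any post-processing. The function and the neighbours_of callback are illustrative names, not the thesis API.

```python
# One Colour-GNG adaptation step: geometric winner search, 6-D weight update.
import numpy as np

def colour_gng_step(W, x, eps_w=0.1, eps_n=0.001, neighbours_of=None):
    """W: (n, 6) float node weights; x: one sample as (x, y, z, r, g, b)."""
    d_geo = np.sum((W[:, :3] - x[:3]) ** 2, axis=1)   # CHL distance uses xyz only
    s1 = int(np.argmin(d_geo))
    W[s1] += eps_w * (x - W[s1])                      # adapt the full 6-D weight
    if neighbours_of is not None:                     # hypothetical topology callback
        for j in neighbours_of(s1):
            W[j] += eps_n * (x - W[j])
    return s1
```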
28. 3D Representation GSOM
Extensions: Colour-GNG
Extension: Colour-GNG (II)
(a), (b), (c) show original point clouds. (d), (e), (f) show downsampled point clouds using the proposed method
28/79
29. 3D Representation GSOM
Extensions: Colour-GNG
Extension: Colour-GNG (III)
Mario figure is down-sampled using the Colour-GNG method
Results are similar to those obtained with the colour interpolation post-processing step
29/79
30. 3D Representation GSOM
Extensions: Sequences
Extension: Sequences management
Extension of the GNG for processing sequences of point clouds
It is not required to restart the learning
It provides a speed-up in the runtime as neurons are kept between point clouds
This extension was applied in a mobile robotics application
An improved workflow to manage point cloud sequences using the GNG algorithm
30/79
31. 3D Representation GSOM
Extension: 3D Reconstruction
Extension: 3D Surface Reconstruction
Three-dimensional surface reconstruction is not considered in the original GNG algorithm as it only generates wire-frame models
[Holdstein and Fischer, 2008, Do Rego et al., 2010, Barhak, 2002] have already considered the creation of 3D triangular faces by modifying the original GNG algorithm
• Post-processing steps are required to avoid gaps and holes in the final mesh
We extended the CHL, developing a method able to produce full 3D meshes during the learning stage
• No post-processing steps are required
• A new learning scheme was developed
31/79
32. 3D Representation GSOM
Extension: 3D Reconstruction
Extension: 3D Surface Reconstruction (II)
Avoid non-manifold and overlapping edges
• With more than 2 neighbours, it is checked whether the face to be created already exists
• A face is created whenever the already existing edges or the new ones form a triangle
The neuron insertion process was also modified
Considered situations for edge and face creation during the extended CHL (a small sketch of the face-creation check follows this slide)
32/79
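The face-creation rule can be sketched as follows (an interpretation of the slide, not the thesis implementation): after CHL connects the two winning neurons, a triangle is emitted whenever the new edge closes a cycle of three edges and that face does not already exist, which avoids duplicated and overlapping faces.

```python
# Emit triangular faces when a new CHL edge closes a 3-cycle (illustrative).
def maybe_create_faces(edges, faces, s1, s2):
    """edges: set of frozenset node pairs; faces: set of frozenset node triples."""
    edges.add(frozenset((s1, s2)))
    # nodes connected to both winners close a triangle with the new edge
    common = {k for e in edges for k in e
              if k not in (s1, s2)
              and frozenset((s1, k)) in edges
              and frozenset((s2, k)) in edges}
    for k in common:
        face = frozenset((s1, s2, k))
        if face not in faces:                 # avoid duplicated / overlapping faces
            faces.add(face)
```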
33. 3D Representation GSOM
Extension: 3D Reconstruction
Extension: 3D Surface Reconstruction (III)
Left: the triangle formed by these 3 neurons is close to a right triangle. Right: the edge connecting s1 and ni is removed because of the angle formed
Edge removal constraint based on the Thales sphere
Left: neuron insertion between the neuron q with the highest error and its neighbour f with the highest error. Right: four new triangles and two edges are created considering r, q and f.
Face creation during the insertion of new neurons
33/79
34. 3D Representation GSOM
Extension: 3D Reconstruction
Extension: 3D Surface Reconstruction (IV)
Different views of reconstructed models using an existing GNG-based method [Do Rego et al., 2010] for surface reconstruction
Post-processing steps were avoided, causing gaps and holes in the final 3D reconstructed models
34/79
35. 3D Representation GSOM
Extension: 3D Reconstruction
Extension: 3D Surface Reconstruction (V)
Reconstructed models using our extended GNG method for face reconstruction and without applying post-processing steps
35/79
36. 3D Representation GSOM
Extension: 3D Reconstruction
Extension: 3D Surface Reconstruction (VI)
Top: 3D model of a person (Kinect sensor). Bottom: digitized foot (foot digitizer)
Left: noisy point clouds captured using the Kinect sensor. Right: 3D reconstruction using the proposed method.
36/79
37. Index
Introduction
3D Representation using Growing Self-Organizing Maps
Improving keypoint detection from noisy 3D observations
• Review
• Improving keypoint detection
• Correspondences matching
• Results
GPGPU Parallel Implementations
Applications
Conclusions
37/79
38. Improving Keypoint detection
Review
Review
Filtering and down-sampling have become essential steps in 3D data processing
General System Overview
Motivation: dealing with noisy data obtained from 3D sensors such as the Microsoft Kinect or lasers
Result: improving 3D keypoint detection and therefore the registration problem
We propose the use of the GNG algorithm for downsampling and filtering 3D data
Beneficial attributes will be demonstrated through the 3D registration problem
38/79
40. Improving Keypoint detection
Keypoint detection
3D Keypoint detection
Applying keypoint detection algorithms to filtered point clouds
State-of-the-art 3D keypoint detectors
• Different techniques are used to test and measure the improvement achieved using the GNG method to filter and downsample input data
40/79
41. Improving Keypoint detection
Keypoint detection
3D Keypoint detection (II)
3D Keypoint detectors (a small sketch of the covariance-based responses follows this slide)
• SIFT3D: uses depth as the intensity value in the original SIFT algorithm
• Harris3D: uses surface normals of 3D points
• Tomasi3D: performs eigenvalue decomposition over the covariance matrix
• Noble3D: evaluates the ratio between the determinant and the trace of the covariance matrix
41/79
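As a small illustration (assuming the standard formulations rather than the exact thesis settings), both the Tomasi3D and Noble3D responses can be computed from the 3x3 covariance matrix of a local neighbourhood: Tomasi3D keeps the smallest eigenvalue, while Noble3D uses det(C)/trace(C).

```python
# Covariance-based keypoint responses (illustrative formulations).
import numpy as np

def local_covariance(neighbourhood):
    centred = neighbourhood - neighbourhood.mean(axis=0)
    return centred.T @ centred / len(neighbourhood)

def tomasi3d_response(C):
    return float(np.linalg.eigvalsh(C)[0])          # smallest eigenvalue

def noble3d_response(C):
    tr = float(np.trace(C))
    return float(np.linalg.det(C)) / tr if tr > 0 else 0.0
```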
42. Improving Keypoint detection
Feature descriptors
3D Feature descriptors
Feature descriptors are calculated over detected keypoints to perform feature matching
• PFH and FPFH: based on a histogram of the differences of angles between the normals of the neighbouring points
• SHOT and CSHOT: a spherical grid centred on the point divides the neighbourhood so that in each grid bin a weighted histogram of normals is obtained
FPFH
CSHOT
42/79
43. Improving Keypoint detection
Feature matching
Feature matching (II)
Correspondences between keypoints are validated through the RANSAC algorithm, rejecting inconsistent correspondences (a minimal sketch follows this slide)
43/79
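A minimal sketch of RANSAC-based correspondence rejection as described above (thresholds and iteration counts are assumptions): each iteration estimates a rigid transform from three random correspondences via SVD (Kabsch) and keeps the largest set of correspondences consistent with it.

```python
# RANSAC filtering of tentative keypoint correspondences (illustrative).
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                  # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def ransac_filter(src_kp, dst_kp, iters=500, thresh=0.05,
                  rng=np.random.default_rng(0)):
    """src_kp[i] is tentatively matched to dst_kp[i]; returns inlier indices."""
    best = np.array([], dtype=int)
    for _ in range(iters):
        sample = rng.choice(len(src_kp), size=3, replace=False)
        R, t = rigid_transform(src_kp[sample], dst_kp[sample])
        err = np.linalg.norm((src_kp @ R.T + t) - dst_kp, axis=1)
        inliers = np.flatnonzero(err < thresh)
        if len(inliers) > len(best):
            best = inliers
    return best
```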
44. Improving Keypoint detection
Results
Results: Feature matching
Correspondences matching computed on different input data
Top: raw point clouds
Middle: reduced representation using the GNG (20,000 neurons)
Bottom: reduced representation using the GNG (10,000 neurons)
RANSAC is used to reject wrong matches
Raw 3D data
GNG 20,000 nodes
GNG 10,000 nodes
44/79
45. Improving Keypoint detection
Results
Results: Transformation errors
Lowest max errors
Lowest transformation error
Mean, median, minimum and maximum RMS*2 errors of the estimated transformations using different keypoint detectors (metres).
*1 Uniform Sampling
*2 Root Mean Square transformation error
45/79
46. Index
Introduction
3D Representation using Growing Self-Organizing Maps
Improving keypoint detection from noisy 3D observations
GPGPU Parallel Implementations
• Graphics Processing Unit
• GPU-based implementation of the GNG algorithm
• GPU-based tensor extraction algorithm
• Conclusions
Applications
Conclusions
46/79
47. GPGPU Implementations
GPUs
Graphics Processing Unit
GPUs have democratized High Performance Computing (HPC)
• Massively parallel processors on a commodity PC
• Great FLOP/€ ratio compared with other solutions
However, this is not for free
• New programming model
• Algorithms need to be re-thought and re-implemented
The Growing Neural Gas algorithm is computationally expensive
• Most computer vision applications are time-constrained
• A GPGPU implementation is proposed
47/79
48. GPGPU Implementations
GPUs
Graphics Processing Unit (II)
More transistors for data processing
GPUs are comprised of streaming multi-processors
High GPU memory bandwidth
GPGPU: General Purpose computing on Graphics Processing Units
The key hardware feature is that the cores are SIMT
• Single Instruction, Multiple Threads
G80 CUDA NVIDIA architecture
48/79
49. GPGPU Implementations
GPU Implementation GNG
GPU Implementation GNG
Stages of the GNG algorithm that are highly parallelizable
• Calculate distance to neurons for every pattern
• Search winning neurons
• Delete neurons and edges
• Search neuron with max error
Other improvements
• Avoid memory transfers between CPU and GPU
• Hierarchy of memories
Highly parallelizable stages
49/79
50. GPGPU Implementations
GPU Implementation GNG
Parallel Min/Max Reduction
A parallel Min/Max reduction that computes the Min/Max of large arrays of values (neurons)
Strategy used to find the Min/Max winning neurons
Reduces the linear number of steps n of the sequential version to a logarithmic number of steps log(n)
Provides better performance for a large number of neurons
Example of the Parallel Reduction Algorithm
Proposed version: 2MinParallelReduction (a CPU sketch of the idea follows this slide)
• Extended version to obtain the 2 minimum values in the same number of steps
50/79
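The idea behind 2MinParallelReduction can be sketched on the CPU as a tree reduction in which every node carries its two smallest candidates, so both the winner and the second winner emerge after log2(n) combination steps. On the GPU each step is one pass where threads combine pairs in parallel; here the inner loop stands in for those threads. This is an illustration of the pattern, not the CUDA kernel.

```python
# Log-step reduction that returns the indices of the two smallest distances.
import numpy as np

def two_min_reduction(distances):
    # each entry: (smallest, second smallest, index of smallest, index of second)
    items = [(d, np.inf, i, -1) for i, d in enumerate(distances)]
    while len(items) > 1:
        if len(items) % 2:                       # pad odd levels with a neutral element
            items.append((np.inf, np.inf, -1, -1))
        nxt = []
        for a, b in zip(items[0::2], items[1::2]):   # one "GPU step": combine pairs
            cand = sorted([(a[0], a[2]), (a[1], a[3]),
                           (b[0], b[2]), (b[1], b[3])])
            nxt.append((cand[0][0], cand[1][0], cand[0][1], cand[1][1]))
        items = nxt
    _, _, i_best, i_second = items[0]
    return i_best, i_second

# e.g. winner, runner_up = two_min_reduction(np.sum((W - x) ** 2, axis=1))
```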
51. GPGPU Implementations
Experimental Setup
Experimental setup
Main GNG parameters:
• ~0-20,000 neurons and a maximum λ (entries per iteration) of 1,000-2,000
• Other parameters have been fixed based on previous works [García-Rodriguez et al., 2012]
o εw = 0.1, εn = 0.001
o amax = 250, α = 0.5, β = 0.0005
Hardware
• GPUs: CUDA capable devices used in experiments
• CPU: single-thread and multi-thread implementations were tested
o Intel Core i3 540 3.07 GHz
51/79
53. GPGPU Implementations
Exp: GNG Runtime
Experiments: GNG learning runtime
GPU and CPU GNG runtime, and speed-up for different devices
53/79
54. GPGPU Implementations
Exp: Hybrid version
Experiments: Hybrid version
The CPU implementation was faster for small network sizes
• We developed a hybrid implementation
• The GPU version automatically starts computing when it is detected that its computing time is lower than the one obtained by the CPU
Example of CPU and Hybrid GNG runtime for different devices
54/79
55. GPGPU Implementations
GPU Feature extraction
GPU-based Tensor extraction algorithm
Time-constrained 3D feature extraction
• Most feature descriptors cannot be computed online due to their high computational complexity
o 3D Tensor - [Mian et al., 2006b]
o Geometric Histogram - [Hetzel et al., 2001]
o Spin Images - [Andrew Johnson, 1997]
• Highly parallelizable
• Geometrical properties
• Invariant to linear transformations
An accelerated GPU-based implementation of an existing 3D feature extraction algorithm is proposed
• Accelerate the entire pipeline of RGB-D based computer vision systems
55/79
56. GPGPU Implementations
GPU Feature extraction
GPU-based Tensor extraction algorithm (II)
The surface area of the mesh intersecting each bin of the grid is the value of the tensor element
As many threads as voxels are launched in parallel, where each GPU thread represents a voxel (bin) of the grid
Each thread computes the area of intersection between the mesh and its corresponding voxel using Sutherland-Hodgman's polygon clipping algorithm [Foley et al., 1990] (a CPU sketch of this per-voxel computation follows this slide)
56/79
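A hedged CPU sketch of the per-voxel work (the thesis version launches one GPU thread per voxel): a mesh triangle is clipped against the six half-spaces of an axis-aligned voxel with Sutherland-Hodgman, and the area of the clipped, still planar polygon is the contribution of that triangle to the corresponding tensor bin. Function names are illustrative.

```python
# Triangle-voxel intersection area via successive half-space clipping.
import numpy as np

def clip_halfspace(poly, axis, bound, keep_less):
    """One Sutherland-Hodgman pass against an axis-aligned plane."""
    inside = lambda p: (p[axis] <= bound) if keep_less else (p[axis] >= bound)
    out = []
    for i in range(len(poly)):
        a, b = poly[i], poly[(i + 1) % len(poly)]
        if inside(a):
            out.append(a)
        if inside(a) != inside(b):                       # edge crosses the plane
            t = (bound - a[axis]) / (b[axis] - a[axis])
            out.append(a + t * (b - a))
    return out

def triangle_area_in_voxel(tri, vmin, vmax):
    """Area of the part of triangle 'tri' (3x3 array) inside the box [vmin, vmax]."""
    poly = [np.asarray(v, dtype=float) for v in tri]
    for axis in range(3):
        poly = clip_halfspace(poly, axis, vmax[axis], keep_less=True)
        if not poly:
            return 0.0
        poly = clip_halfspace(poly, axis, vmin[axis], keep_less=False)
        if not poly:
            return 0.0
    # fan triangulation of the (convex, planar) clipped polygon
    cross_sum = sum(np.cross(poly[i] - poly[0], poly[i + 1] - poly[0])
                    for i in range(1, len(poly) - 1))
    return 0.5 * float(np.linalg.norm(cross_sum))
```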
58. Index
Introduction
3D Representation using Growing Self-Organizing Maps
Improving keypoint detection from noisy 3D observations
GPGPU Parallel Implementations
Applications
• Robotics
• Computer Vision
• CAD/CAM
Conclusions
58/79
59. Applications
Exp: performance
Applications
Different case studies where the GNG-based method proposed in this PhD thesis was applied to different areas
• Robot Vision
o 6DoF Pose Registration
• Computer Vision
o 3D object recognition under cluttered conditions
• CAD/CAM
o Rapid Prototyping in Shoe Last Manufacturing
59/79
60. Applications
Robotics
Robotics: 6DoF pose registration
The main goal of this application is to perform six degrees of freedom (6DoF) pose registration in semi-structured environments
• Man-made indoor and outdoor environments
We combined our accelerated GNG-based algorithm with the method proposed in [Viejo and Cazorla, 2013]
• Planar patches extraction
It provides a good starting point for Simultaneous Localization and Mapping (SLAM)
GNG was applied directly to raw 3D data
60/79
61. Applications
Robotics
Robotics: 6DoF pose registration (II)
Without GNG
GNG
Left: planar patches extracted from the SR4000 camera. Right: filtered data using the GNG network: more planar patches are extracted
61/79
62. Applications
Robotics
Robotics: 6DoF pose registration (III)
Robot trajectory
Without GNG
GNG
Planar-based 6DoF pose registration results
The left image shows map building results without using GNG, while the results shown on the right are obtained after computing a GNG mesh
62/79
63. Applications
3D Object recognition
3D Object Recognition
The main goal of this application is the recognition of objects under time constraints and cluttered conditions
The GPU-based implementation of the semi-local surface feature (tensor) is successfully used to recognize objects in cluttered scenes
A library of models is constructed offline, storing all extracted 3D tensors in an efficient way using a hash table
• Multiple views
63/79
64. Applications
3D Object recognition
3D Object Recognition (II)
Object recognition is performed on scenes with different levels of occlusion
Objects are occluded by other objects, both stored and not stored in the library
The averaged recognition rate was 84%, with 16% wrong matches and 0% false negatives
64/79
65. Applications
3D Object recognition
3D Object Recognition (III)
GPU-based 3D feature implementation is successfully used in a 3D object recognition application
Parallel matching is performed on the GPU: correlation function
Implemented prototype took around 800 ms with a GPU implementation to perform 3D object recognition of the entire scene
Scene 2
65/79
66. Applications
CAD: Rapid Prototyping
Rapid Prototyping in Shoe Last Manufacturing
With the advent of CAD/CAM and rapid acquisition devices it is possible to digitize old raised shoe lasts for reuse in the shoe last design software
Process to reconstruct existing shoe lasts and compute the topology preservation error with regard to the original CAD design
66/79
67. Applications
CAD: Rapid Prototyping
Rapid Prototyping in Shoe Last Manufacturing (II)
The main goal of this research is to obtain a grid of points that is adapted to the topology of the footwear shoe last from a sequence of sections with disorganized points acquired by sweeping an optical laser digitizer
Typical sequence of sections of a shoe last. Noisy data obtained from the digitizer
67/79
68. Applications
CAD: Rapid Prototyping
Rapid Prototyping in Shoe Last Manufacturing (III)
Voxel Grid versus GNG: mean error along different sections of the shoe last
68/79
69. Applications
CAD: Rapid Prototyping
Rapid Prototyping in Shoe Last Manufacturing (IV)
Input space
GNG nodes
VG nodes
GNG vs VG topological preservation comparison
3D Reconstruction GNG
3D Reconstruction VG
69/79
70. Index
Introduction
3D Representation using Growing Self-Organizing Maps
Improving keypoint detection from noisy 3D observations
GPGPU Parallel Implementations
Applications
Conclusions
• Contributions
• Future work
• Publications
70/79
71. Conclusions
Contributions
Contributions
Contributions made in the topic of research:
• Proposal of a new method to create compact, reduced and efficient 3D representations from noisy data
‐ Development of a GNG-based method capable of dealing with different sensors
‐ Extension of the GNG algorithm to consider colour information
‐ Extension of the GNG algorithm for 3D surface reconstruction
‐ Sequences management
‐ Integration of the proposed method in 3D keypoint detection algorithms, improving their performance
‐ GPU-based implementation to accelerate the learning process of the GNG and NG algorithms
‐ A hybrid implementation of the GNG algorithm that takes advantage of the CPU and GPU processors
71/79
72. Conclusions
Contributions
Contributions (II)
• Integration of 3D data processing algorithms in complex computer vision systems:
‐ Normal estimation has been ported to the GPU, considerably decreasing its runtime
‐ Point cloud triangulation has been ported to the GPU, accelerating its runtime
‐ A GPU time-constrained implementation of a 3D feature extraction algorithm
• Application of the proposed method in various real computer vision applications:
‐ Robotics: localization and mapping: 6DoF pose registration
‐ Computer vision: 3D object recognition under cluttered conditions
‐ CAD/CAM: rapid prototyping in shoe last manufacturing
72/79
73. Conclusions
Future work
Future work
Other improvements on the GPU implementation of the GNG algorithm:
• Using multi-GPU to manage several neural networks simultaneously
• Distributed computing
• Testing new architectures: Intel Xeon Phi [Fang et al., 2013a]
• Generating random patterns using the GPU
More applications of the accelerated GNG algorithm will be studied in the future
• Clustering multi-dimensional data: Big Data
• Medical Image Reconstruction
Extension of the real-time implementation of the 3D tensor
• Visual features extracted from RGB information
• Improve the implicit keypoint detector used by the 3D tensor
73/79
74. Conclusions
Publications
Publications
• 4 JCR Journal papers
o “Real-time 3D semi-local surface patch extraction using GPGPU”
S. Orts-Escolano, V. Morell, J. Garcia-Rodriguez, M. Cazorla, R.B. Fisher; Journal of Real-Time Image Processing, December 2013; ISSN: 1861-8219; Impact Factor: 1.156 (JCR 2012)
o “GPGPU implementation of growing neural gas: Application to 3D scene reconstruction”
S. Orts, J. García Rodríguez, D. Viejo, M. Cazorla, V. Morell; J. Parallel Distrib. Comput. 72(10), pp. 1361-1372 (2012); ISSN: 0743-7315; Impact Factor: 1.135 (JCR 2011)
o “3D-based reconstruction using growing neural gas landmark: application to rapid prototyping in shoe last manufacturing”
A. Jimeno-Morenilla, J. García-Rodriguez, S. Orts-Escolano, M. Davia-Aracil; The International Journal of Advanced Manufacturing Technology, May 2013, Vol. 69, pp. 657-668; ISSN: 0268-3768; Impact Factor: 1.205 (JCR 2012)
o “Autonomous Growing Neural Gas for applications with time constraint: Optimal parameter estimation”
J. García Rodríguez, A. Angelopoulou, J. M. García Chamizo, A. Psarrou, S. Orts-Escolano, V. Morell-Giménez; Neural Networks 32, pp. 196-208 (2012); ISSN: 0893-6080; Impact Factor: 1.927 (JCR 2012)
74/79
75. Conclusions
Publications
Publications (II)
• International conferences
o “Point Light Source Estimation based on Scenes Recorded by a RGB-D camera”
B. Boom, S. Orts-Escolano, X. Ning, S. McDonagh, P. Sandilands, R.B. Fisher; British Machine Vision Conference, BMVC 2013, Bristol, UK. Rank B
o “Point Cloud Data Filtering and Downsampling using Growing Neural Gas”
S. Orts-Escolano, V. Morell, J. Garcia-Rodriguez and M. Cazorla; International Joint Conference on Neural Networks, IJCNN 2013, Dallas, Texas. Rank A
o “Natural User Interfaces in Volume Visualisation Using Microsoft Kinect”
A. Angelopoulou, J. García Rodríguez, A. Psarrou, M. Mentzelopoulos, B. Reddy, S. Orts-Escolano, J.A. Serra; International Conference on Image Analysis and Processing, ICIAP 2013, Naples, Italy: 11-19. Rank B
o “Improving Drug Discovery using a neural networks based parallel scoring functions”
H. Perez-Sanchez, G. D. Guerrero, J. M. Garcia, J. Pena, J. M. Cecilia, G. Cano, S. Orts-Escolano and J. Garcia-Rodriguez; International Joint Conference on Neural Networks, IJCNN 2013, Dallas, Texas. Rank A
75/79
76. Conclusions
Publications
Publications (III)
• International conferences
o “Improving 3D Keypoint Detection from Noisy Data Using Growing Neural Gas”
J. García Rodríguez, M. Cazorla, S. Orts-Escolano, V. Morell; International Work-Conference on Artificial Neural Networks, IWANN 2013, Puerto de la Cruz, Tenerife, Spain: 480-487. Rank B
o “3D Hand Pose Estimation with Neural Networks”
J. A. Serra, J. García Rodríguez, S. Orts-Escolano, J. M. García Chamizo, A. Angelopoulou, A. Psarrou, M. Mentzelopoulos, J. Montoyo-Bojo, E. Domínguez; International Work-Conference on Artificial Neural Networks, IWANN 2013, Puerto de la Cruz, Tenerife, Spain: 504-512. Rank B
o “3D Gesture Recognition with Growing Neural Gas”
J. A. Serra-Perez, J. Garcia-Rodriguez, S. Orts-Escolano, J. M. Garcia-Chamizo, A. Angelopoulou, A. Psarrou, M. Mentzelopoulos, J. Montoyo Bojo; International Joint Conference on Neural Networks, IJCNN 2013, Dallas, Texas. Rank A
o “Multi-GPU based camera network system keeps privacy using Growing Neural Gas”
S. Orts-Escolano, J. García Rodríguez, V. Morell, J. Azorín López, J. M. García Chamizo; International Joint Conference on Neural Networks (IJCNN) 2012, Brisbane, Australia, June: 1-8. Rank A
76/79
77. Conclusions
Publications
Publications (IV)
• International conferences
o “A study of registration techniques for 6DoF SLAM”
V. Morell, M. Cazorla, D. Viejo, S. Orts-Escolano, J. García Rodríguez; International Conference of the Catalan Association for Artificial Intelligence, CCIA 2012, University of Alacant, Spain: 111-120. Rank B
o “Fast Autonomous Growing Neural Gas”
J. García Rodríguez, A. Angelopoulou, J. M. García Chamizo, A. Psarrou, S. Orts, V. Morell; International Joint Conference on Neural Networks, IJCNN 2011, San Jose, California: 725-732. Rank A
o “Fast Image Representation with GPU-Based Growing Neural Gas”
J. García Rodríguez, A. Angelopoulou, V. Morell, S. Orts, A. Psarrou, J. M. García Chamizo; International Work-Conference on Artificial Neural Networks, IWANN 2011, Torremolinos-Málaga, Spain: 58-65. Rank B
o “Video and Image Processing with Self-Organizing Neural Networks”
J. García Rodríguez, E. Domínguez, A. Angelopoulou, A. Psarrou, F. J. Mora-Gimeno, S. Orts, J. M. García Chamizo; International Work-Conference on Artificial Neural Networks, IWANN 2011, Torremolinos-Málaga, Spain: 98-104. Rank B
77/79
78. Conclusions
Publications
Publications (V)
• National conferences
o “Procesamiento de múltiples flujos de datos con Growing Neural Gas sobre Multi-GPU” (Processing multiple data streams with Growing Neural Gas on multi-GPU)
S. Orts-Escolano, J. García-Rodríguez, V. Morell-Giménez; Jornadas de Paralelismo JP, Elche, Spain, 2012
• Book chapters
o “A Review of Registration Methods on Mobile Robots”
V. Morell-Gimenez, S. Orts-Escolano, J. García Rodríguez, M. Cazorla, D. Viejo; Robotic Vision: Technologies for Machine Learning and Vision Applications. IGI Global
o “Computer Vision Applications of Self-Organizing Neural Networks”
J. García-Rodríguez, J. M. García-Chamizo, S. Orts-Escolano, V. Morell-Gimenez, J. Serra-Perez, A. Angelopoulou, M. Cazorla, D. Viejo; Robotic Vision: Technologies for Machine Learning and Vision Applications. IGI Global
• Poster presentations
o “6DoF pose estimation using Growing Neural Gas Network”
S. Orts, J. Garcia-Rodriguez, D. Viejo, M. Cazorla, V. Morell, J. Serra; 5th International Conference on Cognitive Systems, CogSys 2012, TU Vienna, Austria
o “GPU Accelerated Growing Neural Gas Network”
S. Orts, J. Garcia, V. Morell; Programming and Tuning Massively Parallel Systems, PUMPS 2011, Barcelona, Spain. (Honorable Mention by NVIDIA)
78/79
79. This presentation is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
79/79
80. A Three-Dimensional Representation method for Noisy Point Clouds based on Growing Self-Organizing Maps accelerated on GPUs
Author: Sergio Orts Escolano
Supervisors: Dr. José García Rodríguez, Dr. Miguel Ángel Cazorla Quevedo
Doctoral programme in technologies for the information society