Deep learning has made great strides on problems that can be learned with direct supervision, in which a large dataset with high-quality annotation is provided (e.g., ImageNet). However, collecting such a large dataset is expensive and time-consuming. In this talk, I will discuss our recent work on learning from computer simulation to tackle real-world problems. I will first present our novel Simulated+Unsupervised (S+U) domain adaptation algorithm, which fully leverages the flexibility of data simulators and bidirectional mappings between synthetic and real data. We show that our approach achieves improved performance on the gaze estimation task, outperforming the prior approach by Shrivastava et al. Next, I will introduce our work on building data-driven vehicle collision prediction algorithms. Today's vehicle collision prediction algorithms are rule-based and have not benefited from the recent developments in deep learning. This is because it is almost impossible to collect a large amount of collision data from the real world. To address this challenge, we collect a large accident dataset using the popular video game GTA and train end-to-end classification algorithms that identify dangerous objects in a given image. Furthermore, we devise a novel domain adaptation method with which we can further improve the performance of our algorithm in the real world.
This paper aims at accurate 3D hand pose estimation from a single depth map. 3D hand pose estimation is a key technology for applications such as HCI and AR. Many methods have been proposed to improve accuracy, but accuracy remains limited by the similar appearance of fingers, occlusions, and the complexity of diverse finger motions. To overcome these limitations, this paper changes the input and output representations used by previous methods. Unlike most prior work, which takes a 2D depth image as input and directly regresses the 3D coordinates of hand joints, the proposed model takes a 3D voxelized depth map as input and outputs 3D heatmaps. An encoder-decoder 3D CNN is used for this purpose, and thanks to the changed input and output representations, the proposed model achieves the highest accuracy on three widely used 3D hand pose estimation datasets and one 3D human pose estimation dataset. It also won the HANDS 2017 challenge held at ICCV 2017.
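As a rough illustration of the output representation described above, joint locations can be read off a predicted 3D heatmap volume by taking the per-joint argmax over the voxel grid (the actual model also maps voxel indices back to world coordinates; the shapes and names below are illustrative, not the paper's code):

```python
import numpy as np

def heatmap_to_coords(heatmaps):
    """Extract per-joint voxel coordinates from 3D heatmaps.

    heatmaps: array of shape (J, D, H, W), one volume per joint.
    Returns an array of shape (J, 3) with the (z, y, x) index of each peak.
    """
    J = heatmaps.shape[0]
    coords = np.empty((J, 3), dtype=np.int64)
    for j in range(J):
        coords[j] = np.unravel_index(np.argmax(heatmaps[j]), heatmaps[j].shape)
    return coords

# toy check: a single joint whose heatmap peaks at voxel (2, 3, 4)
hm = np.zeros((1, 8, 8, 8))
hm[0, 2, 3, 4] = 1.0
print(heatmap_to_coords(hm))  # [[2 3 4]]
```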
CPlaNet: Enhancing Image Geolocalization by Combinatorial Partitioning of Maps - NAVER Engineering
Image geolocalization is the task of identifying the location depicted in a photo based only on its visual information. This task is inherently challenging since many photos have only a few, possibly ambiguous cues to their geolocation. Recent work has cast this task as a classification problem by partitioning the earth into a set of discrete cells that correspond to geographic regions. The granularity of this partitioning presents a critical trade-off: using fewer but larger cells results in lower location accuracy, while using more but smaller cells reduces the number of training examples per class and increases model size, making the model prone to overfitting. To tackle this issue, we propose a simple but effective algorithm, combinatorial partitioning, which generates a large number of fine-grained output classes by intersecting multiple coarse-grained partitionings of the earth. Each coarse-grained classifier votes for the fine-grained classes that overlap with its own cells. This technique allows us to predict locations at a fine scale while maintaining sufficient training examples per class. Our algorithm achieves state-of-the-art performance in location recognition on multiple benchmark datasets.
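The voting scheme can be sketched on a toy 1-D "earth"; the partition sizes, classifier scores, and cell layout below are invented for illustration:

```python
import numpy as np

# Two coarse partitionings of a 1-D "earth" of 12 locations. Each
# classifier scores its own coarse cells; a fine-grained class is the
# intersection of one cell from each partitioning, and its score is the
# sum of the votes of the coarse cells that contain it.
locations = np.arange(12)
cells_a = locations // 4        # partitioning A: 3 cells of size 4
cells_b = locations // 3        # partitioning B: 4 cells of size 3

scores_a = np.array([0.1, 0.7, 0.2])        # classifier A over its cells
scores_b = np.array([0.1, 0.2, 0.6, 0.1])   # classifier B over its cells

# each location's fine-grained class inherits the sum of its coarse scores
fine_scores = scores_a[cells_a] + scores_b[cells_b]
best = locations[np.argmax(fine_scores)]
print(best)  # 6: the location covered by both top-scoring coarse cells
```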
Lec-07: Feature Aggregation and Image Retrieval System [notes]
Image retrieval system performance metrics, precision, recall, true positive rate, false positive rate; Bag of Words (BoW) and VLAD aggregation.
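The listed metrics follow directly from the four confusion-matrix counts; a small sketch (the counts are invented):

```python
def retrieval_metrics(tp, fp, fn, tn):
    """Precision, recall (= true positive rate), and false positive rate
    from the four confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # true positive rate
    fpr = fp / (fp + tn)           # false positive rate
    return precision, recall, fpr

# e.g. a query returns 8 images of which 6 are relevant; 10 relevant
# images exist in a database of 100
p, r, fpr = retrieval_metrics(tp=6, fp=2, fn=4, tn=88)
print(round(p, 3), round(r, 3), round(fpr, 3))  # 0.75 0.6 0.022
```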
Digital Image Processing: Image Enhancement in the Spatial Domain - Mostafa G. M. Mostafa
This document discusses various image enhancement techniques in the spatial domain, including point operations, histogram equalization, and spatial filtering. Point operations are transformations such as thresholding, negatives, and power-law (gamma) corrections that manipulate individual pixel intensities. Histogram equalization improves contrast by spreading out the most frequent intensity values. Spatial filtering techniques such as smoothing, sharpening, and edge detection use small filters to modify each pixel value based on its neighborhood.
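Histogram equalization, for instance, can be sketched in a few lines of NumPy; this is a minimal version of the classic CDF-remapping formula, not the slide deck's code:

```python
import numpy as np

def equalize(img):
    """Histogram equalization for an 8-bit grayscale image: map each
    intensity through the normalized cumulative histogram (CDF)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # classic formula: scale the CDF to the full [0, 255] range
    lut = np.clip(np.round((cdf - cdf_min) / (img.size - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[img]

# a low-contrast image confined to [100, 130] spreads toward [0, 255]
rng = np.random.default_rng(0)
img = rng.integers(100, 131, size=(64, 64), dtype=np.uint8)
out = equalize(img)
print(out.min(), out.max())  # 0 255
```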
This document outlines an assignment for a computer vision course. Students are asked to implement 4 vision algorithms: 2 using OpenCV and 2 using MATLAB. The algorithms are the log-polar transform, background subtraction, histogram equalization, and contrast stretching. Students must also answer 3 short questions about orthographic vs perspective projection, efficient filtering, and sensors beyond cameras for computer vision.
1. The document presents an approach to enhance the realism of synthetic images rendered by game engines. A convolutional network is trained to modify rendered images using intermediate representations from the rendering process.
2. The network is trained with an adversarial objective to provide strong supervision at multiple perceptual levels. A new strategy is proposed for sampling image patches during training to address differences in scene layout distributions between datasets.
3. The approach significantly enhances photorealism over recent image-to-image translation methods and baselines, as shown in controlled experiments. It can add realistic details like gloss, vegetation, and road textures while keeping enhancements consistent with the input image content.
Google Research Siggraph Whitepaper | Total Relighting: Learning to Relight Portraits for Background Replacement - Alejandro Franceschi
Abstract:
Given a portrait and an arbitrary high dynamic range lighting environment, our framework uses machine learning to composite the subject into a new scene, while accurately modeling their appearance in the target illumination condition. We estimate a high quality alpha matte, foreground element, albedo map, and surface normals, and we propose a novel, per-pixel lighting representation within a deep learning framework.
This document provides an overview of digital image fundamentals and operations. It defines what a digital image is, how it is represented as a matrix, and common image types like RGB, grayscale, and binary. Pixels, resolution, neighborhoods, and basic relationships between pixels are discussed. The document also covers different types of image operations including point, local, and global operations as well as examples like arithmetic, logical, and geometric transformations. Finally, it introduces concepts of linear and nonlinear operations and announces the topic of the next lecture on image enhancement in the spatial domain.
Image Retrieval with Fisher Vectors of Binary Features (MIRU'14) - Yusuke Uchida
Recently, the Fisher vector representation of local features has attracted much attention because of its effectiveness in both image classification and image retrieval. Another trend in the area of image retrieval is the use of binary features such as ORB, FREAK, and BRISK. Given the significant accuracy improvements that Fisher vectors of continuous feature descriptors have brought to both image classification and retrieval, applying the Fisher vector to binary features should yield the same benefits for binary-feature-based image retrieval and classification. In this paper, we derive a closed-form approximation of the Fisher vector of binary features, which are modeled by a Bernoulli mixture model. Experiments show that the Fisher vector representation improves image retrieval accuracy by 25% compared with a bag-of-binary-words approach.
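The gradient that underlies such a Fisher vector can be sketched for a Bernoulli mixture model; this unnormalized version omits the Fisher-information normalization and the paper's closed-form approximation, and the toy data and parameters are invented:

```python
import numpy as np

def bmm_fisher_vector(X, w, mu):
    """Gradient embedding of binary descriptors under a Bernoulli
    mixture model (BMM).

    X:  (N, D) binary descriptors, w: (K,) mixture weights,
    mu: (K, D) Bernoulli parameters. Returns the gradient of the
    average log-likelihood with respect to mu, flattened to (K*D,).
    """
    # responsibilities gamma_{nk}, computed in the log domain
    log_lik = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T  # (N, K)
    log_post = np.log(w) + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # d/d mu_{kd} of the mean log-likelihood:
    # sum_n gamma_{nk} (x_{nd} - mu_{kd}) / (mu_{kd} (1 - mu_{kd}))
    grad = (gamma.T @ X - gamma.sum(0)[:, None] * mu) / (mu * (1 - mu))
    return (grad / len(X)).ravel()

rng = np.random.default_rng(0)
X = (rng.random((100, 16)) > 0.5).astype(float)
w = np.array([0.5, 0.5])
mu = np.full((2, 16), 0.5)
fv = bmm_fisher_vector(X, w, mu)
print(fv.shape)  # (32,): one gradient entry per (component, dimension)
```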
Lec-17: Sparse Signal Processing & Applications [notes]
Sparse signal processing, recovery of sparse signal via L1 minimization. Applications including face recognition, coupled dictionary learning for image super-resolution.
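L1-based sparse recovery can be illustrated with plain ISTA (iterative soft-thresholding) on a synthetic lasso problem; this is a generic sketch under invented dimensions and parameters, not the formulation from the notes:

```python
import numpy as np

def ista(A, b, lam=0.1, steps=5000):
    """Recover a sparse x from b = A x via L1-regularized least squares
    (lasso), using iterative soft-thresholding (ISTA)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        g = x - A.T @ (A @ x - b) / L      # gradient step on the quadratic
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0)  # soft-threshold
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60))          # 30 measurements, 60 unknowns
x_true = np.zeros(60)
x_true[[5, 23, 47]] = [2.0, -3.0, 2.5]     # 3-sparse ground truth
b = A @ x_true
x_hat = ista(A, b, lam=0.1)
print(np.round(x_hat[[5, 23, 47]], 1))     # should be close to 2, -3, 2.5
```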
This document provides a survey of single scalar point multiplication algorithms for elliptic curves over prime fields. It discusses the background of elliptic curve cryptography and point multiplication. Point multiplication is the dominant operation in ECC and can be computed using on-the-fly techniques or precomputation if the point is fixed. The efficiency of point multiplication depends on the recoding method used to represent the scalar and the composite elliptic curve operations employed. Various recoding methods and point multiplication algorithms are analyzed, including binary, signed binary using NAF representation, and window methods.
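The binary (double-and-add) method mentioned above can be sketched on a toy curve; the curve parameters below are invented for illustration and far too small for real cryptography:

```python
# Left-to-right binary double-and-add on a short Weierstrass curve
# y^2 = x^3 + a x + b over F_p (affine coordinates; None = point at infinity).
P_MOD, A, B = 97, 2, 3

def add(p1, p2):
    """Elliptic-curve point addition (handles doubling and inverses)."""
    if p1 is None: return p2
    if p2 is None: return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                       # P + (-P) = O
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return x3, (lam * (x1 - x3) - y1) % P_MOD

def scalar_mult(k, pt):
    """Compute k*pt by scanning the bits of k from most significant to
    least: double at every step, add the base point when the bit is 1."""
    acc = None
    for bit in bin(k)[2:]:
        acc = add(acc, acc)               # double
        if bit == '1':
            acc = add(acc, pt)            # add
    return acc

G = (0, 10)  # on the curve: 10^2 = 100 = 3 = 0^3 + 2*0 + 3 (mod 97)
print(scalar_mult(2, G) == add(G, G))                                  # True
print(add(scalar_mult(2, G), scalar_mult(3, G)) == scalar_mult(5, G))  # True
```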
FORGERY (COPY-MOVE) DETECTION IN DIGITAL IMAGES USING BLOCK METHOD - editorijcres
AKHILESH KUMAR YADAV, DEENBANDHU SINGH, VIVEK KUMAR
Department of Computer Science and Engineering
Babu Banarasi Das University, Lucknow
akhi2232232@gmail.com, deenbandhusingh85@gmail.com, vivek.kumar0091@gmail.com
ABSTRACT- Digital images can be easily modified using powerful image editing software. Distinguishing innocent manipulations, such as sharpening, from malicious ones, such as removing or adding parts of an image, is the topic of this paper. We focus on the detection of a special type of forgery, the Copy-Move forgery, in which a part of the original image is copied and pasted to a desired location within the same image. The proposed method compresses the image using the DWT (discrete wavelet transform), divides it into blocks, computes a feature vector for each block, sorts the vectors lexicographically, and identifies duplicated blocks after sorting. The method is robust to manipulations and attacks such as scaling, rotation, Gaussian noise, smoothing, and JPEG compression.
INDEX TERMS- Copy-Move forgery, Wavelet Transform, Lexicographical Sorting, Region Duplication Detection.
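The block-matching pipeline (blocks, feature vectors, lexicographic sorting) can be sketched as follows; this toy version skips the DWT compression step and uses raw pixel blocks as features, so it only detects exact copies:

```python
import numpy as np

def find_duplicate_blocks(img, bs=8):
    """Slide a bs x bs window over the image, use each block's raw
    pixels as its feature vector, sort the vectors lexicographically,
    and report identical neighbors as copy-move candidates."""
    h, w = img.shape
    blocks = []
    for y in range(h - bs + 1):
        for x in range(w - bs + 1):
            blocks.append((img[y:y+bs, x:x+bs].ravel().tolist(), (y, x)))
    blocks.sort()                         # lexicographic sort of features
    pairs = []
    for (f1, p1), (f2, p2) in zip(blocks, blocks[1:]):
        if f1 == f2 and p1 != p2:         # identical blocks at two places
            pairs.append((p1, p2))
    return pairs

# plant a copied region in an otherwise random image
rng = np.random.default_rng(1)
img = rng.integers(0, 256, (32, 32))
img[20:28, 20:28] = img[2:10, 2:10]       # the copy-move forgery
print(find_duplicate_blocks(img))  # [((2, 2), (20, 20))]
```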
An Adaptive Model for Blind Image Restoration Using a Bayesian Approach - Cemal Ardil
This document describes an adaptive model for blind image restoration using a Bayesian approach. It discusses image degradation and restoration techniques such as inverse filtering, Kalman filtering, and regularization. It then introduces Bayesian inference for image processing, describing how the posterior distribution of an image can be estimated given observed noisy data. The paper formulates the image restoration problem using a Bayesian model, where the observed image is considered the sum of the true image and noise. It describes estimating the true image distribution given the observed data by calculating the relative probabilities of possible true images. The goal is to find the most suitable estimate of the true image to enhance a degraded observed image and reduce blur and noise.
This document reviews different techniques for thinning images, including the Zhang and Suen algorithm and neural networks. It provides an overview of existing thinning approaches, such as iterative algorithms, and proposes a new approach using neural networks. The proposed approach aims to perform thinning invariant to rotations while being less sensitive to noise than existing methods. It evaluates techniques based on execution time, thinning rate, and other performance measures. The document concludes that neural networks may provide better results than existing techniques in terms of metrics like PSNR and MSE, while also reducing execution time for skeletonization.
NUMBER PLATE IMAGE DETECTION FOR FAST MOTION VEHICLES USING BLUR KERNEL ESTIMATION - paperpublications3
This document discusses a proposed method for detecting number plates on images of fast moving vehicles that have been blurred due to motion. It begins with an introduction to image processing and digital images. It then discusses estimating the blur kernel caused by vehicle motion in order to model it as a linear uniform blur with parameters for angle and length. Existing related works on image deblurring are reviewed. The proposed system estimates the blur kernel parameters using sparse representation and Radon transform methods, allows deblurring the image, and then uses artificial neural networks to identify numbers and characters in the deblurred image. The system is evaluated on real blurred images and shown to improve license plate recognition compared to previous methods.
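The linear uniform blur model with angle and length parameters corresponds to a simple line-shaped point spread function; the sketch below rasterizes such a kernel (the rasterization details are a choice made here, not the paper's method):

```python
import numpy as np

def motion_blur_kernel(length, angle_deg):
    """Linear uniform motion-blur PSF: a normalized line of the given
    length at the given angle, rasterized into a square kernel."""
    k = np.zeros((length, length))
    c = (length - 1) / 2                  # kernel center
    t = np.deg2rad(angle_deg)
    for i in range(length):
        r = i - c                         # signed distance along the line
        x = int(round(c + r * np.cos(t)))
        y = int(round(c + r * np.sin(t)))
        k[y, x] = 1.0
    return k / k.sum()                    # weights sum to 1

k = motion_blur_kernel(9, 0)  # horizontal blur: ones along the middle row
print(k[4])                   # nine equal weights of 1/9
```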
In this deck, Pieter Abbeel from UC Berkeley describes his group research into making robots learn.
Watch the video: https://wp.me/p3RLHQ-hf7
APPLYING R-SPATIOGRAM IN OBJECT TRACKING FOR OCCLUSION HANDLING - sipij
Object tracking is one of the most important problems in computer vision. The aim of video tracking is to extract the trajectories of a target or object of interest, i.e., to accurately locate a moving target in a video sequence and discriminate the target from non-targets in the feature space of the sequence. Feature descriptors can therefore have significant effects on such discrimination. In this paper, we use the basic idea shared by many trackers, which consists of three main components: object modeling, object detection and localization, and model updating. However, there are major improvements in our system. Our fourth component, occlusion handling, utilizes the r-spatiogram to detect the best target candidate. While a spatiogram stores moments of the pixel coordinates, the r-spatiogram computes region-based compactness on the distribution of the given feature in the image, capturing richer features to represent the objects. The proposed research develops an efficient and robust way to keep tracking the object throughout video sequences in the presence of significant appearance variations and severe occlusions. The proposed method is evaluated on the Princeton RGBD tracking dataset, considering sequences with different challenges, and the obtained results demonstrate its effectiveness.
This document describes a novel image learning system that can recognize objects using Google Image Search. The system works as follows:
1. The user provides the name of an object and number of images to download from Google Image Search. The system downloads these images.
2. SIFT descriptors are extracted from the downloaded images and used to train a linear SVM classifier.
3. The trained classifier can then recognize that object in new images, adding them to the system's capabilities. Cache storage of features improves recognition speed on subsequent uses.
4. Experiments showed the system achieved high accuracy for distinctive objects but lower accuracy for similar objects, similar to human ability, demonstrating it can dynamically learn new objects from online images.
Performance analysis of transformation and bogdonov chaotic substitution base... - IJECEIAES
In this article, a combined Pseudo Hadamard transformation and modified Bogdonav chaotic generator based image encryption technique is proposed. Pixel position transformation is performed using the Pseudo Hadamard transformation, and pixel value variation is applied using Bogdonav chaotic substitution. The Bogdonav chaotic generator produces random sequences, and very low correlation is observed between adjacent elements in the sequence. The cipher image obtained from the transformation stage is then subjected to substitution using the Bogdonav chaotic sequence to break the correlation between adjacent pixels. The cipher image is subjected to various security tests under noisy conditions, and a very high degree of similarity between the original and decrypted images is observed after the deciphering process.
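The pixel-position stage can be illustrated with a 2-D pseudo-Hadamard coordinate map; this is a generic sketch of such a transform on a square image, not necessarily the exact variant used in the article, and it omits the chaotic substitution stage entirely:

```python
import numpy as np

def pht_scramble(img):
    """Pixel-position scrambling with a 2-D pseudo-Hadamard map:
    (x, y) -> (x + y mod N, x + 2y mod N). The matrix [[1, 1], [1, 2]]
    has determinant 1, so the map is invertible modulo any N."""
    n = img.shape[0]                      # assume a square N x N image
    ys, xs = np.indices((n, n))
    new_x = (xs + ys) % n
    new_y = (xs + 2 * ys) % n
    out = np.empty_like(img)
    out[new_y, new_x] = img[ys, xs]       # bijective scatter, no collisions
    return out

def pht_unscramble(img):
    """Inverse map, using the inverse matrix [[2, -1], [-1, 1]]."""
    n = img.shape[0]
    ys, xs = np.indices((n, n))
    new_x = (2 * xs - ys) % n
    new_y = (ys - xs) % n
    out = np.empty_like(img)
    out[new_y, new_x] = img[ys, xs]
    return out

img = np.arange(64).reshape(8, 8)
assert np.array_equal(pht_unscramble(pht_scramble(img)), img)
print("round trip ok")
```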
Abstract: Primarily due to progress in super-resolution imagery, methods of segment-based image analysis for generating and updating geographical information are becoming more and more important. This work presents an image segmentation based on colour features with K-means clustering. The work is divided into two stages. First, the colour separation of the satellite image is enhanced using decorrelation stretching, and then the regions are grouped into a set of five classes using the K-means clustering algorithm. Spatial information centered on every pixel is first collected, and two filtering procedures are added to suppress the effect of pseudo-edges. In addition, a spatial-information weight is constructed and clustered with K-means, and the regularization strength in every region is controlled by the clustering center value. The experimental results, on both simulated and real datasets, demonstrate that the proposed method can effectively reduce the pseudo-edges of the total variation regularization.
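The clustering stage can be sketched with a plain K-means on per-pixel colour features; this minimal version has no decorrelation stretching or spatial weighting (which the paper adds on top), and the toy "image" is invented:

```python
import numpy as np

def kmeans(pixels, k=5, iters=20, seed=0):
    """Plain k-means on per-pixel color features: alternate assigning
    each pixel to its nearest center and recomputing each center as the
    mean of its assigned pixels."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), k, replace=False)].astype(float)
    for _ in range(iters):
        d = ((pixels[:, None, :] - centers[None]) ** 2).sum(-1)  # (N, k)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(0)
    return labels, centers

# toy "image": two well-separated color blobs of 200 pixels each
rng = np.random.default_rng(1)
pixels = np.vstack([rng.normal(20, 2, (200, 3)),
                    rng.normal(200, 2, (200, 3))])
labels, centers = kmeans(pixels, k=2)
print(len(np.unique(labels)))  # 2
```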
Tracking and counting human in visual surveillance systemiaemedu
This document summarizes a proposed system for tracking and counting humans in a visual surveillance system. The system first performs background subtraction on input video frames to detect foreground objects. It applies this process to both grayscale and binary image formats, and selects the better-performing format. Features are then extracted from detected blobs, including size, centroid, and color. Tracking rectangles are drawn around whole detected objects to group separated body parts. Counting is done by detecting when pixel values change from background to foreground. Experimental results on videos with various challenges like shadows, illumination changes, and occlusions showed the system could accurately track and count humans, with near-perfect accuracy even for occluded or broken objects, though it introduced some minor errors.
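The counting step relies on pixels changing from background to foreground; thresholded background subtraction, the first stage of the pipeline, can be sketched as follows (the toy frame and threshold are invented):

```python
import numpy as np

def foreground_mask(frame, background, thresh=25):
    """Background subtraction sketch: pixels whose absolute difference
    from the background model exceeds a threshold are foreground."""
    return np.abs(frame.astype(int) - background.astype(int)) > thresh

# static background with one bright "person" blob in the current frame
bg = np.full((40, 40), 50, dtype=np.uint8)
frame = bg.copy()
frame[10:30, 15:25] = 200                 # a 20 x 10 foreground blob
mask = foreground_mask(frame, bg)
print(mask.sum())  # 200 foreground pixels (the 20 x 10 blob)
```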
Using synthetic data for computer vision model trainingUnity Technologies
During this webinar Unity’s computer vision team provides an overview of computer vision, walks through current real-world data workflows, and explains why companies are moving toward synthetically generated data as an alternate data source for model training.
Watch the webinar: https://resources.unity.com/ai-ml/cv-webinar-dec-2021
Artem Baklanov - Votes Aggregation Techniques in Geo-Wiki Crowdsourcing Game:... - AIST
The document describes techniques used to improve the quality of crowdsourced data from the GEO-Wiki project. It discusses preprocessing steps like blur detection, duplicate detection, and vote aggregation algorithms. Blur detection removed 2% of low-quality images, while duplicate detection based on perceptual hashing removed 6% of redundant votes. Benchmarking algorithms on expert-annotated data showed that majority voting performed comparably to more complex algorithms when there were many accurate volunteers and few spammers. Preprocessing improved results by reducing workload and increasing statistical significance.
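Majority voting, the baseline that performed comparably to the more complex aggregation algorithms, is a one-liner per item (the item names and labels below are invented):

```python
from collections import Counter

def majority_vote(votes):
    """Aggregate crowdsourced labels per item by taking the most
    common label among that item's votes."""
    return {item: Counter(vs).most_common(1)[0][0]
            for item, vs in votes.items()}

votes = {"img1": ["forest", "forest", "crop"],
         "img2": ["urban", "urban", "urban", "water"]}
print(majority_vote(votes))  # {'img1': 'forest', 'img2': 'urban'}
```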
Tracking and counting human in visual surveillance systemiaemedu
This document summarizes a proposed system for tracking and counting humans in a visual surveillance system. The system first performs background subtraction on input video frames to detect foreground objects. It applies this process to both grayscale and binary image formats, and selects the better-performing format. Features are then extracted from detected blobs, including size, centroid, and color. Tracking rectangles are drawn around whole detected objects to group separated body parts. Counting is done by detecting when pixel values change from background to foreground. Experimental results on videos with various challenges like shadows, illumination changes, and occlusions showed the system could accurately track and count humans, with near-perfect accuracy even for occluded or broken objects, though it introduced some minor errors.
Using synthetic data for computer vision model trainingUnity Technologies
During this webinar Unity’s computer vision team provides an overview of computer vision, walks through current real-world data workflows, and explains why companies are moving toward synthetically generated data as an alternate data source for model training.
Watch the webinar: https://resources.unity.com/ai-ml/cv-webinar-dec-2021
Artem Baklanov - Votes Aggregation Techniques in Geo-Wiki Crowdsourcing Game:...AIST
The document describes techniques used to improve the quality of crowdsourced data from the GEO-Wiki project. It discusses preprocessing steps like blur detection, duplicate detection, and vote aggregation algorithms. Blur detection removed 2% of low-quality images, while duplicate detection based on perceptual hashing removed 6% of redundant votes. Benchmarking algorithms on expert-annotated data showed that majority voting performed comparably to more complex algorithms when there were many accurate volunteers and few spammers. Preprocessing improved results by reducing workload and increasing statistical significance.
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET Journal
This document provides an unabridged review of supervised machine learning regression and classification techniques. It begins with an introduction to machine learning and artificial intelligence. It then describes regression and classification techniques for supervised learning problems, including linear regression, logistic regression, k-nearest neighbors, naive bayes, decision trees, support vector machines, and random forests. Practical examples are provided using Python code for applying these techniques to housing price prediction and iris species classification problems. The document concludes that the primary goal was to provide an extensive review of supervised machine learning methods.
Atari Game State Representation using Convolutional Neural Networksjohnstamford
I recently gave a talk to some MSc Machine Learning students at De Montfort University about the project I did for my MSc. The work included looking at feature extraction from game screens using the Arcade Learning Environment and Convolutional Neural Networks (CNN).
The work was planned to investigate if the costly nature Q-Learning could be offset by the use of a trained system using 'expert' data. The system uses the same technology as used by Deepmind in their 2013 paper.
An Automatic Attendance System Using Image processingtheijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
The papers for publication in The International Journal of Engineering& Science are selected through rigorous peer reviews to ensure originality, timeliness, relevance, and readability.
Drones generate vast amounts of data, which is usually in the form of images or video streams. Identification of objects of interest, counting them, or detecting change over time, are some of the tasks that are monotonous and labor intensive.
FlytBase AI platform offers a complete solution to automate such tasks. It has been designed and optimised specifically for drone applications.
The document describes a student capstone project that analyzes traffic data through collecting tweets, traffic camera images, and weather data to provide users with a traffic analysis for Cincinnati. Various techniques are used like sentiment analysis of tweets, image processing of traffic cameras to count cars, and analyzing weather patterns. A web application was created to display the current traffic analysis on a scale and provide additional context to help users understand the reported traffic status.
IRJET - Multi-Label Road Scene Prediction for Autonomous Vehicles using Deep ...IRJET Journal
This document proposes a multi-label neural network for road scene recognition in autonomous vehicles. It introduces a large-scale dataset called Driving Scene, which contains over 110,000 images across 52 road scene classes. The challenges of this dataset include multi-class prediction, data imbalance with varying image resolutions.
The proposed neural network architecture incorporates both single-label and multi-label classification to address data imbalance. It utilizes a deep data integration strategy based on AdaBoost to focus training on minority classes and misclassified samples. Additionally, lane detection and lane departure warning systems are included to provide more context for autonomous driving. The network is trained and evaluated on the Driving Scene dataset to recognize road scenes.
Iaetsd multi-view and multi band face recognitionIaetsd Iaetsd
The document discusses multi-view and multi-band face recognition using wavelet transforms. It begins with an abstract describing the challenges of face recognition due to variations in lighting, expression, and aging. It then introduces a multi-band face recognition algorithm using wavelet transforms to extract features from multiple video bands. The experimental results show wavelet transforms take less response time and are more suitable for feature extraction and face matching with high accuracy. It discusses preprocessing images, feature extraction using PCA and wavelet transforms, feature matching, and concludes wavelet transforms help with feature extraction and face matching with high accuracy and less response time.
Crude-Oil Scheduling Technology: moving from simulation to optimizationBrenno Menezes
Scheduling technology either commercial or homegrown in today’s crude-oil refining industries relies on a complex simulation of scenarios where the user is solely responsible for making many different decisions manually in the search for feasible solutions over some limited time-horizon i.e., trial-and-error heuristics. As a normal outcome, schedulers abandon these solutions and then return to their simpler spreadsheet simulators due to: (i) time-consuming efforts to configure and manage numerous scheduling scenarios, and (ii) requirements of updating premises and situations that are constantly changing. Moving to solutions based in optimization rather than simulation, the lecture describes the future steps in the refactoring of the scheduling technology in PETROBRAS considering in separate the graphic user interface (GUI) and data communication developments (non-modeling related), and the modeling and process engineering related in an automated decision-making with built-in problem representation facilities and integrated data handling features among other techniques in a smart scheduling frontline.
Eugene Khvedchenya. State of the art Image Augmentations with Albumentations.Lviv Startup Club
This document discusses image augmentation techniques for computer vision tasks. It defines image augmentation as modifying existing training images to increase the size and diversity of the dataset. This helps neural networks become more invariant to perturbations and acts as a regularizer. The document recommends Albumentations, an open-source Python library for image augmentation that is part of the PyTorch ecosystem. It provides examples of how to apply different pixel-level, spatial, and domain-specific augmentations using Albumentations. The document also highlights several Kaggle and other competitions where augmentations from Albumentations helped achieve first place by addressing distribution shifts between the training and test sets.
Real time video analytics with InfoSphere Streams, OpenCV and RStephan Reimann
Unstructured data are a fast growing area and a source for many innovative Big Data & Analytics solutions. Often the first idea of unstructured data seems to be that it's probably text data, even though that is just a small part. A lot of that "new data" is sensor data and especially multimedia (audio, video). Even though this part is growing extremly fast, it is very rarely used in analytics today. And even less in a real time context.
In order to experience what does it mean and how does it feel (and if it is possible to make sense of it) to work with this new data in real time, Wilfried Hoge and I have created a demo that shows our own experience and explains important concepts & implementation. approaches. The demo we created shows a drill equipment as it is used to build tunnels and how to analyze the output on the conveyor belt visually with machine learning approaches.
Partial Object Detection in Inclined Weather ConditionsIRJET Journal
This document provides a comprehensive analysis of imbalance problems in object detection. It presents a taxonomy to classify different types of imbalances and discusses solutions proposed in literature. The analysis highlights significant gaps including existing imbalances that require further attention, as well as entirely new imbalances that have never been addressed before. A survey of imbalance problems caused by weather conditions and common object imbalances is conducted. Methods for addressing imbalances include data augmentation using GANs and balancing training based on class performance.
1. The document discusses various techniques that have been proposed for face detection and attendance systems, including Haar classifiers, improved support vector machines, and local binary patterns algorithms.
2. It reviews several papers that have implemented different methods for face recognition for attendance systems, such as using HOG features and PCA for dimensionality reduction along with SVM classification.
3. The document also summarizes a paper that proposed a context-aware local binary feature learning method for face recognition that exploits contextual information between adjacent image bits.
Integrated Model Discovery and Self-Adaptation of RobotsPooyan Jamshidi
Machine learn models efficiently under budget constraints to adapt to perturbations such as environmental changes or changes in the internal resources.
Modern software-intensive systems are composed of components that are likely to change their behaviour over time (e.g., adding/removing components).
For software to continue to operate under such changes, the assumptions about parts of the system made at design time may not hold at runtime due to uncertainty.
Mechanisms must be put in place that can dynamically learn new models of these assumptions and use them to make decisions about missions, configurations, etc.
IRJET- Facial Expression Recognition using GPA AnalysisIRJET Journal
This document discusses a method for facial expression recognition using geometric feature analysis (GPA). The method involves preprocessing an input face image, extracting the skin pixels and facial features, and then using a support vector machine (SVM) classifier trained on geometric features to recognize the expression. Specifically, it performs skin mapping using a gray level co-occurrence matrix to isolate the face, extracts features like the eyes, nose and lips, and then inputs geometric relationships between these features into the SVM to classify the expression based on previous training data. The goal is to develop an automated system for facial expression recognition using digital image processing techniques.
Similar to Learning from Computer Simulation to Tackle Real-World Problems (20)
ZUIX is a design system created by Zigbang's CTO team to standardize design across all of Zigbang's services. It uses React Native for responsive, multi-platform components and includes tools like Storybook for development and a design review infrastructure for validation. The deployment process involves code reviews, CI/CD pipelines, and publishing to a npm registry. Training and documentation is provided through tools like Google Classroom and Notion. The team aims to further develop ZUIX by improving the design review tools, adding end-to-end testing, and analyzing component usage. The goal is to solve Zigbang's unique challenges through an agile, collaborative approach between designers and developers.
This document discusses Kakao's search platform front-end project. It describes the architecture of an integrated search service using microservices and the need for a design system due to fragmented UIs. It introduces the KST (Kakao Search Template) project for creating a design system including 200+ UI blocks and templates. The KST Builder, Logger, and Dashboard are discussed for managing templates, logging usage, and monitoring coverage. Maintaining a consistent design system is important for operating diverse search services and platforms.
This document discusses Banksalad Product Language (BPL), which is a method used at Banksalad to standardize UI text, elements, and components. It allows designers and developers to use consistent terms, while abstracting UI elements to different levels suitable for their roles. Examples of standardized elements are provided, as well as external resources that discuss concepts like tree shaking that are relevant to BPL. While BPL has benefits, the document considers whether there may be better approaches than BPL.
This document summarizes a presentation about using Stitches, a React styling library, and Storybook for component design.
The presentation introduces Stitches as the styling library used for its support of React, easy usage, and themes. Key features of Stitches discussed include creating styled components, variants, and comparisons to other libraries.
Storybook is presented as a way to improve communication between designers and developers by allowing visualization of components alongside their stories. Clean communication through a shared Storybook is emphasized.
Reflections on initially creating a design system note the benefits of consistency and speed but also identify areas for improvement like documentation, process alignment, and understanding each other's roles. Establishing trust and understanding between
비행기 설계를 왜 통일 해야 할까?
디자인 시스템을 하는 이유
비행기들이 다 용도가 다르다...어떻게 설계하지?
맥락이 다른 페이지와 패턴
경유지까지 아직 멀었다... 언제 수리하지?
디자인 시스템을 적용하는 시점
엔지니어랑 얘기해서 정비해야하는데...어떻게 수리하지?
디자인 시스템을 적용하는 프로세스
비행기 설계가 바뀐걸 어떻게 알리지?
디자인 시스템의 전파
The document discusses Kotlin coroutines and how they can be used to write asynchronous code in a synchronous, sequential way. It explains what coroutines are, how they work internally using continuation-passing style (CPS) transformation and state machines, and compares them to callbacks. It also outlines some of the benefits of using coroutines, such as structured concurrency, light weight execution, built-in cancellation, and simplifying asynchronous code. Finally, it provides examples of how to use common coroutine builders like launch, async, and coroutineScope in a basic Android application with ViewModels.
This document contains the transcript from a presentation given by Wonsuk Lim from Naver on tips for debugging and analyzing Android applications. Some key tips discussed include fully utilizing the Android emulator's capabilities like 2-finger touch control, clipboard sharing between the emulator and host PC, and mocking locations. Advanced settings for the emulator like foldable and camera emulation are also covered. The presenter recommends ways to configure developer options and use tools like LeakCanary, the Android profiler, and Stetho for testing app stability. Methods for understanding the Android framework by reviewing system services and managers via AIDL files and logcat dumps are presented. Finally, reverse engineering tools like APK Extractor and decompilers are introduced.
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsDianaGray10
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations, for
seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
MySQL InnoDB Storage Engine: Deep Dive - MydbopsMydbops
This presentation, titled "MySQL - InnoDB" and delivered by Mayank Prasad at the Mydbops Open Source Database Meetup 16 on June 8th, 2024, covers dynamic configuration of REDO logs and instant ADD/DROP columns in InnoDB.
This presentation dives deep into the world of InnoDB, exploring two ground-breaking features introduced in MySQL 8.0:
• Dynamic Configuration of REDO Logs: Enhance your database's performance and flexibility with on-the-fly adjustments to REDO log capacity. Unleash the power of the snake metaphor to visualize how InnoDB manages REDO log files.
• Instant ADD/DROP Columns: Say goodbye to costly table rebuilds! This presentation unveils how InnoDB now enables seamless addition and removal of columns without compromising data integrity or incurring downtime.
Key Learnings:
• Grasp the concept of REDO logs and their significance in InnoDB's transaction management.
• Discover the advantages of dynamic REDO log configuration and how to leverage it for optimal performance.
• Understand the inner workings of instant ADD/DROP columns and their impact on database operations.
• Gain valuable insights into the row versioning mechanism that empowers instant column modifications.
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
The Department of Veteran Affairs (VA) invited Taylor Paschal, Knowledge & Information Management Consultant at Enterprise Knowledge, to speak at a Knowledge Management Lunch and Learn hosted on June 12, 2024. All Office of Administration staff were invited to attend and received professional development credit for participating in the voluntary event.
The objectives of the Lunch and Learn presentation were to:
- Review what KM ‘is’ and ‘isn’t’
- Understand the value of KM and the benefits of engaging
- Define and reflect on your “what’s in it for me?”
- Share actionable ways you can participate in Knowledge - - Capture & Transfer
From Natural Language to Structured Solr Queries using LLMsSease
This talk draws on experimentation to enable AI applications with Solr. One important use case is to use AI for better accessibility and discoverability of the data: while User eXperience techniques, lexical search improvements, and data harmonization can take organizations to a good level of accessibility, a structural (or “cognitive” gap) remains between the data user needs and the data producer constraints.
That is where AI – and most importantly, Natural Language Processing and Large Language Model techniques – could make a difference. This natural language, conversational engine could facilitate access and usage of the data leveraging the semantics of any data source.
The objective of the presentation is to propose a technical approach and a way forward to achieve this goal.
The key concept is to enable users to express their search queries in natural language, which the LLM then enriches, interprets, and translates into structured queries based on the Solr index’s metadata.
This approach leverages the LLM’s ability to understand the nuances of natural language and the structure of documents within Apache Solr.
The LLM acts as an intermediary agent, offering a transparent experience to users automatically and potentially uncovering relevant documents that conventional search methods might overlook. The presentation will include the results of this experimental work, lessons learned, best practices, and the scope of future work that should improve the approach and make it production-ready.
AppSec PNW: Android and iOS Application Security with MobSFAjin Abraham
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsScyllaDB
ScyllaDB monitoring provides a lot of useful information. But sometimes it’s not easy to find the root of the problem if something is wrong or even estimate the remaining capacity by the load on the cluster. This talk shares our team's practical tips on: 1) How to find the root of the problem by metrics if ScyllaDB is slow 2) How to interpret the load and plan capacity for the future 3) Compaction strategies and how to choose the right one 4) Important metrics which aren’t available in the default monitoring setup.
Session 1 - Intro to Robotic Process Automation.pdfUiPathCommunity
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program:
https://bit.ly/Automation_Student_Kickstart
In this session, we shall introduce you to the world of automation, the UiPath Platform, and guide you on how to install and setup UiPath Studio on your Windows PC.
📕 Detailed agenda:
What is RPA? Benefits of RPA?
RPA Applications
The UiPath End-to-End Automation Platform
UiPath Studio CE Installation and Setup
💻 Extra training through UiPath Academy:
Introduction to Automation
UiPath Business Automation Platform
Explore automation development with UiPath Studio
👉 Register here for our upcoming Session 2 on June 20: Introduction to UiPath Studio Fundamentals: https://community.uipath.com/events/details/uipath-lagos-presents-session-2-introduction-to-uipath-studio-fundamentals/
Essentials of Automations: Exploring Attributes & Automation ParametersSafe Software
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
"Scaling RAG Applications to serve millions of users", Kevin GoedeckeFwdays
How we managed to grow and scale a RAG application from zero to thousands of users in 7 months. Lessons from technical challenges around managing high load for LLMs, RAGs and Vector databases.
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
"What does it really mean for your system to be available, or how to define w...Fwdays
We will talk about system monitoring from a few different angles. We will start by covering the basics, then discuss SLOs, how to define them, and why understanding the business well is crucial for success in this exercise.
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Learning from Computer Simulation to Tackle Real-World Problems
1. Learning from Computer Simulations to Tackle Real-World Problems
Hoon Kim
Joint work with Kangwook Lee, Gyeongjo Hwang, and Changho Suh
KAIST
2. Hoon Kim
Korea Advanced Institute of Science and Technology (KAIST)
• M.S. in the School of Electrical Engineering (Mar. 2017 – present)
• Advisor: Prof. Changho Suh
• B.S. in the School of Electrical Engineering (Feb. 2012 – Feb. 2017)
• Double major in Computer Science
• Interested in utilizing simulators to tackle real-world problems!
About myself
3. 1.2 million images / 1000 object classes
→ Such datasets are not always viable in the real world
• Deep learning has achieved exceptional performance in many tasks
• Training sets need to be large, diverse, and accurately annotated
Motivation
4. Application 1: "Predicting Eye Gaze Vector"
→ Difficult to annotate the eye gaze vector
E.g., eye gaze dataset with labels (θ̂, φ̂)
Motivation
5. Application 2: "Predicting Accidents"
→ Difficult to acquire the data itself in the real world
E.g., accident dataset (accident / non-accident)
Motivation
6. Recently, many researchers have proposed utilizing simulators to tackle data scarcity
→ Generate an arbitrary type & amount of data
→ No labeling cost
However, synthetic data may not be realistic enough,
resulting in poor performance in the real world
Motivation
UnityEyes – eye simulator | AirSim – drone simulator | robot arm simulator | CARLA – driving simulator
7. Overview of this talk
1. Simulated+Unsupervised Learning with Adaptive Data Generation and Bidirectional Mapping
- International Conference on Learning Representations (ICLR) 2018
2. Crash to not Crash: Learn to Identify Dangerous Vehicles Using a Simulator
- Association for the Advancement of Artificial Intelligence (AAAI) 2019
8. Overview of this talk
1. Simulated+Unsupervised Learning with Adaptive Data Generation and Bidirectional Mapping
- International Conference on Learning Representations (ICLR) 2018
2. Crash to not Crash: Learn to Identify Dangerous Vehicles Using a Simulator
- Association for the Advancement of Artificial Intelligence (AAAI) 2019
9. Source domain (simulator): labeled data (X_s, Y_s), e.g., images with labels (−5, 5), (25, 15), …
Target domain (real world): unlabeled data X_t
Given labeled source domain data (X_s, Y_s) and unlabeled target domain data (X_t),
→ train a predictor that works well in the target domain
Problem Formulation
10. [Shrivastava et al., CVPR 2017] proposed "Simulated+Unsupervised learning":
(1) Forward-translate synthetic images with the refiner G_{X→Y}
(2) Train the predictor F on the translated images
(3) Predict using the original images [example eye images with predicted gaze labels (θ, φ)]
They showed:
1. Refining synthetic images with a GAN makes them look real while preserving the labels
2. Training models on these refined images leads to significant improvements in accuracy
Domain Adaptation from Simulation to Real World
11. Comparison of a gaze estimator trained on Synthetic vs Refined Synthetic
Training Data | % of images within 7° error
Synthetic Data | 62.3
Synthetic Data 4x | 64.9
Domain Adaptation from Simulation to Real World
12. Comparison of a gaze estimator trained on Synthetic vs Refined Synthetic
Training Data | % of images within 7° error
Synthetic Data | 62.3
Synthetic Data 4x | 64.9
Refined Synthetic Data | 69.4
Refined Synthetic Data 4x | 87.2
Can we do better?
Domain Adaptation from Simulation to Real World
13. Two limitations of the existing approach
1. [S] How can we specify the synthetic label distribution?
→ Can we utilize the flexibility of the simulator to generate better synthetic data?
2. [U] Not taking advantage of the asymmetry between the real and synthetic datasets
• The amount of available synthetic data is much larger than that of real data
• Synthetic data is usually much cleaner (less noisy) than real data
→ Can we devise a better unsupervised domain adaptation algorithm?
Domain Adaptation from Simulation to Real World
14. 1. Adaptive Data Generation
• Match the distributions of synthetic/real labels
2. Learn Bi-Directional Mapping
• Learn a mapping between synthetic/real data with a modified CycleGAN
3. Backward Translation
• Make predictions on real test images translated into the synthetic domain
Our proposed algorithm - Overview
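For reference, the bidirectional mapping in Step 2 builds on CycleGAN; its standard objective (Zhu et al., 2017) combines two adversarial losses with a cycle-consistency loss. The paper's modifications (e.g., exploiting the data asymmetry) are not shown here:

```latex
\mathcal{L}(G, F, D_X, D_Y) =
  \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
+ \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)
+ \lambda\,\mathcal{L}_{\mathrm{cyc}}(G, F),
\qquad
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x}\!\left[\lVert F(G(x)) - x \rVert_1\right]
+ \mathbb{E}_{y}\!\left[\lVert G(F(y)) - y \rVert_1\right]
```

Here G maps synthetic to real and F maps real back to synthetic; the cycle term is what makes the learned mapping bidirectional.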
15. Our proposed algorithm - Overview
[Pipeline diagram]
Training: the simulator generates synthetic data with labels and trains the predictor P; the predictor pseudo-labels the real data (w/o labels), and the pseudo-labels are fed back to update the simulator parameters (Adaptive Data Generation). A real-to-synthetic mapping is learned between the two domains (Learn Bi-Directional Mapping).
Testing: test data is mapped to synthetic-like test data via the real-to-synthetic mapping, and the predictor P outputs the prediction results (Backward Translation).
16. Our proposed algorithm - Overview
[Same pipeline diagram as slide 15]
17. → Label distributions can be different in the synthetic & real datasets
→ Fortunately, the synthetic label distribution can be easily modified
How the simulator works: specify label distribution → draw a random label instance → generate an image
Q. How can we systematically adapt the label distribution?
Step 1 – Adaptive Data Generation
18. Overview of our adaptive data generation process
(1) Set the distribution of Z: P_Z = P(θ, φ)
(2) Generate synthetic data (X, Z), e.g., (θ, φ) = (12.5, 7.8), (−37.9, 7.1), (7.4, 27.2), (6.93, 12.8)
(3) Train a predictor F on (X, Z)
(4) Pseudo-label the real data: (Y, F(Y)), e.g., (θ̂, φ̂) = (−0.5, 1.2), (−10.6, 1.0), (−11.0, 12.8), (−16.2, 6.4)
(5) Estimate the distribution of F(Y): P_{F(Y)} = P(θ̂, φ̂), and feed it back into (1)
Step 1 – Adaptive Data Generation
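The five steps above form an iterative loop. Below is a minimal sketch, assuming one-dimensional Gaussian labels matched by their first two moments; the real labels, the simulator, and the predictor are toy stand-ins (the predictor's pull toward its training mean is what drives the iteration, not the paper's actual model):

```python
import random
import statistics

random.seed(0)

# Unknown real label distribution that the simulator should match.
real_labels = [random.gauss(5.0, 2.0) for _ in range(2000)]

# (1) Initial synthetic label distribution P_Z (a poor first guess).
mu, sigma = -10.0, 1.0

def predictor(y, train_mu, alpha=0.7):
    # Stand-in for F trained on synthetic data (X, Z): its predictions are
    # pulled toward the training label mean, since a predictor generalizes
    # only partially outside its training distribution.
    return alpha * y + (1 - alpha) * train_mu

for it in range(5):
    # (2)-(3) Generate synthetic data and train F on it (abstracted away).
    # (4) Pseudo-label the real data with F.
    pseudo = [predictor(y, mu) for y in real_labels]
    # (5) Estimate the distribution of F(Y) and feed it back into (1).
    mu, sigma = statistics.mean(pseudo), statistics.stdev(pseudo)
    print(f"iteration {it}: mu={mu:.2f}, sigma={sigma:.2f}")
```

Each iteration moves the synthetic label distribution toward the (unknown) real one, mirroring the shrinking distances reported on the next slide.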
19. → Experimental results of the adaptive data generation algorithm
| Iteration 0 | Iteration 1 | Iteration 2 |
| 8.54 | 3.00 | 1.04 |
| 0.31 | 0.02 | 0.00 |
Step 1 – Adaptive Data Generation
20. Step 2 – Learn Bi-Directional Mapping
[Pipeline overview diagram, with the Learn Bi-Directional Mapping stage highlighted]
24. Step 3 – Backward Translation
[Pipeline overview diagram, with the Backward Translation stage highlighted]
25. Step 3 – Backward Translation
Forward Translation: the training/test methodology proposed by [Shrivastava et al., 2017]
Backward Translation: our proposed training/test methodology
Error with Forward Translation: 8.4 → 7.71
Error with Backward Translation: 8.4 → 7.60
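A toy illustration of why backward translation helps at test time. Here the domain gap is modeled as a simple additive shift, and the real-to-synthetic mapping is assumed already learned; both are hypothetical stand-ins for the CycleGAN-style mapping in the deck:

```python
# Toy setup: a "real" input is the same underlying scene as a "synthetic"
# one, but shifted by an unknown domain gap.
SHIFT = 3.0  # domain gap (unknown to the predictor)

def predictor_trained_on_synthetic(x):
    # A predictor that is exact on synthetic inputs: label = 2 * x.
    return 2.0 * x

def real_to_synthetic(x_real):
    # Stand-in for the learned real-to-synthetic mapping.
    return x_real - SHIFT

x_synthetic = 1.5
x_real = x_synthetic + SHIFT        # same scene, rendered in the real domain
true_label = 2.0 * x_synthetic

# Forward-style use: predict directly on the real input (domain mismatch).
naive = predictor_trained_on_synthetic(x_real)
# Backward translation: map the real input back to the synthetic domain first.
backward = predictor_trained_on_synthetic(real_to_synthetic(x_real))
```

The naive prediction is off by the domain gap, while the backward-translated prediction recovers the true label, because the predictor only ever sees inputs from the domain it was trained on.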
26. Results
Comparison with the state of the art on the MPIIGaze dataset
(Error: mean angle error in degrees)
Method | Trained on | Tested with | Error
KNN w/ UT Multiview (Zhang et al., 2015) | R | R | 16.2
CNN w/ UT Multiview (Zhang et al., 2015) | R | R | 13.9
CNN w/ UnityEyes (Shrivastava et al., 2017) | S | R | 11.2
KNN w/ UnityEyes (Wood et al., 2016) | S | R | 9.9
CNN w/ UnityEyes (Shrivastava et al., 2017) | S→R | R | 7.8
Ours w/ feature loss | S | R→S | 7.6
27. Conclusion
• In this work, we proposed a new S+U learning algorithm
• Fully exploits the flexibility of simulators
• Improved bidirectional mapping
• Utilizes the data asymmetry between real/synthetic datasets
• Achieved the state-of-the-art result on MPIIGaze dataset
28. Overview of this talk
1. Simulated+Unsupervised Learning with Adaptive Data Generation and Bidirectional Mapping
- International Conference on Learning Representations (ICLR) 2018
2. Crash to not Crash: Learn to Identify Dangerous Vehicles Using a Simulator
- Association for the Advancement of Artificial Intelligence (AAAI) 2019
29. Collision Prediction System (CPS)
• Goal of CPS: to predict vehicle collisions ahead of time
• Existing CPSs
- rely on rule-based algorithms
- require expensive devices (LIDAR, communication chips, …)
• Our goal is to build a CPS that
- relies on data-driven algorithms
- requires only commodity devices (a camera sensor)
Single Camera → Deep Learning Model → Collision Probability
30. Problem Formulation
Single Camera → Deep Learning Model → Collision Probability
1. Object Tracking: detects/tracks vehicles in the input image(s)
2. Object Classification: classifies whether or not the given vehicle is dangerous
31. Problem Formulation
Single Camera → Deep Learning Model → Collision Probability
1. Object Tracking: detects/tracks vehicles in the input image(s)
2. Object Classification: classifies whether or not the given vehicle is dangerous
→ Our main interest!
33. Challenges in building a data-driven CPS: Data Scarcity
• Existing driving datasets (KITTI, Cityscapes) contain no accident data
• Collecting accident data in the real world is very expensive
• We can take advantage of driving simulators
Grand Theft Auto V | CARLA – driving simulators
34. How do we collect accident dataset from a simulator?
Q1. How do we reproduce accident scenes in a simulator?
35. How do we collect accident dataset from a simulator?
Q1. How do we reproduce accident scenes in a simulator?
→ We modify driver behavior via simulator hacking
36. • ScriptHookV – a third-party script library that allows access to the game engine
• On top of ScriptHookV, we implemented the following helper functions
• Start_Cruising(s) – makes the player's car cruise along the lane at a constant speed s
• Get_Nearby_Vehicles() – returns the list of vehicles in the current scene of the game
• Set_Vehicle_Out_Of_Control(v) – makes the vehicle v drive out of control
• Is_My_Car_Colliding() – checks whether my car is colliding or not
• Our accident simulator generates accident data in a random manner by setting nearby
vehicles out of control while cruising and saving the past 20 frames when a collision occurs
Data Collection
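A sketch of the collection loop built from the helper functions above. The Python stand-ins here are hypothetical (the real helpers run inside the GTA V engine via ScriptHookV); the point is the rolling 20-frame buffer that is saved the moment a collision is detected:

```python
import collections

# Hypothetical stand-ins for the ScriptHookV helpers listed on the slide.
def start_cruising(speed):
    pass  # would make the player's car cruise at a constant speed

def get_nearby_vehicles():
    return ["vehicle_a", "vehicle_b"]  # would query the game engine

def set_vehicle_out_of_control(vehicle):
    pass  # would randomize the vehicle's driving behavior

def is_my_car_colliding(step):
    return step == 37  # pretend a collision happens at step 37

def capture_frame(step):
    return {"frame": step}  # would hold the rendered image + vehicle states

def collect_one_scene(n_frames=20):
    start_cruising(speed=40)
    for v in get_nearby_vehicles():
        set_vehicle_out_of_control(v)
    # Keep only the most recent 20 frames (0.1 s apart in the deck).
    buffer = collections.deque(maxlen=n_frames)
    for step in range(1000):
        buffer.append(capture_frame(step))
        if is_my_car_colliding(step):
            return list(buffer)  # the 20 frames leading up to the collision
    return None  # no collision: use as a non-accident scene or discard

scene = collect_one_scene()
```

The `deque(maxlen=20)` makes the "save the past 20 frames" behavior automatic: older frames fall off the front as new ones arrive.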
37. Original GTA V → normal driving scenes
Hacked GTA V (driver behavior modified via simulator hacking) → accident driving scenes
Data Collection
38. Data Collection - GTACrash
• 3661 from original GTA V (Non-accident scenes)
• 7720 from hacked GTA V (Accident scenes)
• Each scene consists of 20 frames sampled at 0.1 s intervals
• Each vehicle's states (position, velocity, etc.) are also collected
Dataset is available @ https://sites.google.com/view/crash-to-not-crash
39. How can we collect accident dataset from a simulator?
Q2. How can we make them more realistic?
40. How can we collect accident dataset from a simulator?
Q2. How can we make them more realistic?
→ Feature (image): style transfer via GAN
→ Label: label adaptation via a motion model
41. [Synthetic train data vs. real test data]
The domain difference between train and test images results in poor test performance
Domain Adaptation – Feature Adaptation
42. → We apply the backward translation approach (Lee, Kim, and Suh 2018)
[Synthetic train data | real test data | backward-translated test data]
Map images to the synthetic domain at test time
Domain Adaptation – Feature Adaptation
43. How can we collect accident dataset from a simulator?
Q2. How can we make them more realistic?
→ Feature (image): style transfer via GAN
→ Label: label adaptation via a motion model
45. • Vehicle path prediction is a well-studied field
• Constant Turn Rate and Acceleration (CTRA) model – Schubert et al. 2008
• state-of-the-art path prediction performance in the real world
• requires vehicle states such as position, velocity, acceleration, and turn rate
• By accessing the game's internal states, we can obtain all the required information
→ Our proposed label adaptation:
1. Compute the predicted path of each vehicle as per the CTRA model
2. Assign a positive label if and only if the predicted path intersects with that of the ego-vehicle
Vehicles' internal states (position, velocity, acceleration, turn rate) → CTRA-based label adaptation → refined label
Label Adaptation
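A minimal sketch of the label adaptation rule above. The CTRA model is Euler-integrated here for clarity (Schubert et al. give closed-form state equations); the horizon, step size, and intersection threshold are illustrative assumptions, not the paper's values:

```python
import math

def ctra_path(x, y, heading, speed, accel, turn_rate, horizon=2.0, dt=0.05):
    """Predict a vehicle's path under Constant Turn Rate and Acceleration
    by Euler-integrating the CTRA state equations."""
    path = []
    t = 0.0
    while t < horizon:
        path.append((x, y))
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        speed += accel * dt       # constant acceleration
        heading += turn_rate * dt  # constant turn rate
        t += dt
    return path

def adapted_label(ego_state, other_state, threshold=2.0):
    """Positive label iff the two predicted paths intersect, i.e. come
    within `threshold` meters of each other at the same time step."""
    ego_path = ctra_path(*ego_state)
    other_path = ctra_path(*other_state)
    return any(math.dist(p, q) < threshold for p, q in zip(ego_path, other_path))

# Ego vehicle cruising along +x at 10 m/s.
ego = (0.0, 0.0, 0.0, 10.0, 0.0, 0.0)
# A vehicle cutting across the ego lane from the left (heading -y).
cutting_in = (7.5, 5.0, -math.pi / 2, 10.0, 0.0, 0.0)
# A vehicle driving parallel in the next lane, constant 5 m offset.
parallel = (0.0, 5.0, 0.0, 10.0, 0.0, 0.0)
```

With these states, the crossing vehicle's predicted path meets the ego path and gets a positive (dangerous) label, while the parallel vehicle stays 5 m away and gets a negative one.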
47. Experiments
Q. Which approach will perform better on real data?
| Trained with small real dataset | Trained with large synthetic dataset
Pros | Dataset is unbiased | Large amount
Cons | Small amount | Dataset is biased
48. Experiments - YouTubeCrash
• 100 Non-accident Scenes / 122 Accident Scenes
• Manual annotation of bounding box and vehicle IDs
• One half of YouTubeCrash is used as the training set for the baseline
• The other half is used as the test set for both the baseline and our approach
Accidents Non-Accidents
50. Experiments – Classification Algorithms
Alg1. 5-layer NN with bounding box sizes: bounding-box sizes over the frame sequence (e.g., 0.32, 0.02, …) → fully connected network → safe or dangerous
Alg2. CNN with bounding boxes: frame sequence → CNN → safe or dangerous
Alg3. CNN with RGBs & bounding boxes: frame sequence → CNN → safe or dangerous
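As a hypothetical sketch of Alg1's input representation (not the authors' network), the per-vehicle feature is simply the bounding-box size over the past 20 frames; a crude expansion-rate cue stands in for the learned 5-layer NN:

```python
def bbox_size(box):
    """Area of a bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)

def size_features(track, n_frames=20):
    """Alg1-style input: the bounding-box size over the past 20 frames of
    one tracked vehicle (zero-padded if the track is shorter)."""
    sizes = [bbox_size(b) for b in track[-n_frames:]]
    return [0.0] * (n_frames - len(sizes)) + sizes

def expansion_rate(features):
    """Crude stand-in for the learned classifier: relative growth of the
    box over the window. A rapidly expanding box suggests a closing vehicle."""
    nonzero = [s for s in features if s > 0]
    return nonzero[-1] / nonzero[0] if len(nonzero) >= 2 else 1.0

# An approaching vehicle: the box grows frame over frame.
approaching = [(0, 0, 10 + i, 10 + i) for i in range(20)]
# A vehicle at constant distance: the box size stays fixed.
steady = [(0, 0, 10, 10)] * 20

feat_a = size_features(approaching)
feat_s = size_features(steady)
```

In the deck, this 20-value size sequence is what the fully connected network consumes to output safe/dangerous; Alg2 and Alg3 instead feed image crops into a CNN.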
51. Experiments – Test AUC on YouTubeCrash
Alg / Training data | Trained with real data | Trained with synthetic data
Size of training data | 61 accidents, 50 non-accidents | 7000 accidents, 3500 non-accidents
Alg1 | 0.8766 |
Alg2 | 0.8708 |
Alg3 | 0.8730 |
52. Experiments – Test AUC on YouTubeCrash
Alg / Training data | Trained with real data | Trained with synthetic data
Size of training data | 61 accidents, 50 non-accidents | 7000 accidents, 3500 non-accidents
Alg1 | 0.8766 | 0.8812
Alg2 | 0.8708 | 0.8992
Alg3 | 0.8730 | 0.9086
Performance improvement with all tested algorithms
53. Experiments – Test AUC on YouTubeCrash
Alg / Training data | Trained with real data | Trained with synthetic data
Size of training data | 61 accidents, 50 non-accidents | 7000 accidents, 3500 non-accidents
Alg1 | 0.8766 | 0.8812
Alg2 | 0.8708 | 0.8992
Alg3 | 0.8730 | 0.9086
(increasing model complexity from Alg1 to Alg3)
A complex model can be trained only with large data
54. Experiments – Test AUC on YouTubeCrash
Alg / Training data | Trained with real data | Trained with synthetic data
Size of training data | 61 accidents, 50 non-accidents | 7000 accidents, 3500 non-accidents
Alg1 | 0.8766 | 0.8812
Alg2 | 0.8708 | 0.8992
Alg3 | 0.8730 | 0.9086 → 0.9164 (with domain adaptation)
Domain adaptation further improves performance
55. Experiments – Missed Detection vs Time-To-Collision
[Plot: missed detection rate vs. TTC, from easy to difficult]
As TTC increases, it is more difficult to identify dangerous vehicles
56. Experiments – Missed Detection vs Time-To-Collision
Alg 1 – Trained with real data
57. Experiments – Missed Detection vs Time-To-Collision
18.5%
improvement
Alg 1 – Trained with real data
Alg 3 – Trained with synthetic data
58. Experiment Results – Missed Detection vs Time-To-Collision
Alg 1 – Trained with real data
Alg 3 – Trained with synthetic data
When TTC > 0.8, a complex model
is necessary for identifying distant
dangerous vehicles
61. Conclusion
• In this work, we propose an efficient deep learning-based CPS
- Customized synthetic data generator
- Novel label adaptation method
- Models trained with our synthetic data outperform those trained with real data
• Our datasets and demo videos are available at
https://sites.google.com/view/crash-to-not-crash