Today I'd like to introduce ObjectLab, a new approach to overcoming the limitations of object detection datasets. Because most real-world training datasets contain annotation errors, object detection systems remain fragile in many respects.
1. The problem of label errors in object detection
Object detection systems depend heavily on labeled data. In practice, however, label errors are common, and they can seriously degrade a detection system's accuracy.
2. Introducing ObjectLab
ObjectLab proposes intuitive algorithms for detecting a variety of annotation errors, including overlooked bounding boxes, badly located boxes, and incorrect class label assignments.
3. How it works
ObjectLab uses a trained object detection model to estimate the label quality of each image. Mislabeled images can then be automatically prioritized for review and correction.
4. Results and impact
In this way, a better object detection model can be trained without changing any existing modeling code. Across several object detection datasets including COCO, and across multiple models including Detectron-X101 and Faster-RCNN, ObjectLab consistently detected annotation errors with much better precision/recall than other label quality scores.
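As a minimal sketch of this review workflow (the helper name and scores below are hypothetical, not ObjectLab's actual algorithm), images can be ranked by a per-image label quality score and the lowest-scoring ones sent for human review first:

```python
# Minimal sketch: prioritize images for label review by quality score.
# In practice the scores would come from comparing a trained detector's
# predictions against the ground-truth annotations; here they are just
# given as inputs. Illustrative only.

def prioritize_for_review(image_ids, quality_scores, k=3):
    """Return the k image ids with the lowest label quality scores."""
    ranked = sorted(zip(quality_scores, image_ids))  # ascending by score
    return [img for _, img in ranked[:k]]

# Hypothetical scores in [0, 1]: lower = more likely mislabeled.
ids = ["img_001", "img_002", "img_003", "img_004", "img_005"]
scores = [0.91, 0.12, 0.55, 0.08, 0.97]
print(prioritize_for_review(ids, scores, k=2))  # → ['img_004', 'img_002']
```

This is the sense in which no modeling code changes: the detector is used as-is, and only the training labels get triaged and fixed.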
2. INTRO: Data-centric AI
Daochen Zha, Zaid Pervaiz Bhat, Kwei-Herng Lai, Fan Yang, & Xia Hu. (2023). Data-centric AI: Perspectives and Challenges.
● Past research: train a 'model' on a specific task and evaluate its performance.
● Data-centric AI: evaluate which 'data' improved performance when the model was trained on it, and what counts as 'good data'.
3. INTRO: Data-centric AI
4. INTRO: Data-centric AI
The process of cleaning a dataset and transforming it into a trainable form.
ex. Data Cleaning: methods that remove noise and errors from the data, such as imputing missing values, removing duplicates, and fixing inconsistent samples.
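As a toy illustration of the data-cleaning step just described (the data and helper name are my own, not from the slides), a minimal pass might impute missing values and drop duplicate rows:

```python
# Toy data-cleaning sketch: impute missing 'height' values with the column
# mean and drop exact-duplicate rows. Illustrative only.

def clean(rows):
    """rows: list of dicts whose numeric 'height' may be None."""
    # 1) Impute missing 'height' with the mean of observed values.
    observed = [r["height"] for r in rows if r["height"] is not None]
    mean_h = sum(observed) / len(observed)
    imputed = [{**r, "height": r["height"] if r["height"] is not None else mean_h}
               for r in rows]
    # 2) Drop exact duplicates, keeping the first occurrence.
    seen, deduped = set(), []
    for r in imputed:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            deduped.append(r)
    return deduped

rows = [{"id": 1, "height": 10.0}, {"id": 2, "height": None},
        {"id": 1, "height": 10.0}]
print(clean(rows))  # duplicate of id 1 dropped, id 2 imputed to 10.0
```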
6. INTRO: Problem Statements
Badly Located Error
● The GT bbox does not cover the whole object, or its location is inaccurate
● For class #60 (=table), the prediction's bbox covers the entire table, while the GT bbox covers only part of it.
"annotators poorly outlined only half of the dining table (class #60) which the model localized much better (with confidence 0.964), leading to a low Badly-Located score in ObjectLab."
7. INTRO: Problem Statements
Swapped Error
● The GT bbox is in the right location, but its class label is wrong.
● The red GT bbox labels the glass at the top as class #45 (bowl), while ObjectLab's correction changes it to the correct class #41 (cup).
"the glass object on the right is incorrectly annotated as a bowl (class #45), while the model predicted cup (class #41) with confidence 0.962, leading to a low Swapped-score in ObjectLab."
Badly Located Error: the GT bbox does not cover the whole object, or its location is inaccurate
8. INTRO: Problem Statements
Overlooked Error
"annotators missed the fire hydrant (class #10 in COCO) which the model detected with confidence 0.998, leading to a low Overlooked-score in ObjectLab."
● A bbox that should exist in the GT is missing.
● In the GT on the left there is no bbox on the fire hydrant, while ObjectLab's result places a correct bbox on it.
Badly Located Error: the GT bbox does not cover the whole object, or its location is inaccurate
Swapped Error: the GT bbox is in the right location, but its class is wrong
9. INTRO: Problem Statements
Badly Located / Swapped / Overlooked Errors → ObjectLab → Dataset without Labeling Errors
You do not need to change your models! → Just use any type of detection model
10. INTRO: Problem Statements
Badly Located / Swapped / Overlooked Errors → ObjectLab → Dataset without Labeling Errors
(the detection model's predictions are obtained via 5-fold cross-validation)
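The 5-fold cross-validation step is what keeps the scoring honest: every image is scored by a model that never saw it during training. A minimal sketch of the fold assignment (hypothetical helper name, plain round-robin split; real pipelines shuffle or stratify):

```python
def kfold_indices(n, k=5):
    """Assign sample indices 0..n-1 to k disjoint folds. Label scores
    for the samples in fold i come from a model trained on the other
    k-1 folds, so no image is scored by a model trained on it."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)  # round-robin assignment
    return folds
```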
11. Related Works: TIDE
→ A General Toolbox for Identifying Object Detection Errors
Daniel Bolya, Sean Foley, James Hays, & Judy Hoffman. (2020). TIDE: A General Toolbox for Identifying Object Detection Errors.
● Error types are entangled with one another, making it hard to measure how much each type affects mAP, so mAP is hard to use for analyzing a detector's errors
● Optimizing mAP alone can hide the relative importance of the error types, which differs by application (ex. in tumor detection, classification accuracy matters more than box localization)
12. Related Works: TIDE
TIDE
● Classifies errors into six types
○ Measuring each error's contribution enables analysis of the error's cause
● Contribution
○ Concisely summarizes the error types so they can be compared at a glance
○ Fully isolates each error type's contribution so that no confounding variables can affect the conclusions
○ Distinguishes the causes of errors, enabling finer-grained analysis as desired
13. Related Works: TIDE
TIDE
14. Related Works: Confident Learning Object Detection
Northcutt, C. G., Athalye, A., and Mueller, J. Pervasive label errors in test sets destabilize machine learning benchmarks. In Proceedings of the 35th Conference on Neural Information Processing Systems Track on Datasets and Benchmarks, December 2021a.
Detecting Swapped Dataset
● Assumption: which class a sample gets mispredicted as is determined by how similar the prior latent vectors are!
(figure: class-pair similarity, ranging from confusing to obvious)
15. Related Works: Label Quality Score
Model-agnostic label quality scoring to detect real-world label errors. ICML DataPerf Workshop, 2022.
● LED (Label Error Detection): identifying which images are mislabeled
● With a Swin Transformer model, confidence-weighted entropy and self-confidence scores gave the best results.
● Least-confidence and entropy scores performed worst.
Importance of Label Quality Scores
** A higher score == the method detects label errors better
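As a rough illustration of the score families compared above, here are generic self-confidence and (normalized) entropy scores. This is a sketch of the general idea under the usual definitions, not the exact implementation benchmarked in the workshop paper:

```python
import math

def self_confidence(probs, given_label):
    """The model's predicted probability of the annotated class.
    A low value suggests a likely label error."""
    return probs[given_label]

def normalized_entropy(probs):
    """Entropy of the predicted class distribution, scaled to [0, 1].
    High entropy means the model is unsure about the image."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))
```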
18. Methods: ObjectLab Algorithm
ObjectLab's label score: an equal-weight (⅓ each) combination of three sub-scores
- Badly-Located score: for errors where the GT bbox location is inaccurate
- Swapped score: for errors where the GT bbox location is correct but the class is wrong
- Overlooked score: for errors where a GT bbox is missing
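Taking the ⅓ weight on the slide at face value, the per-image label score is an equal-weight average of the three sub-scores. A sketch (the function name is hypothetical, not cleanlab's API):

```python
def objectlab_label_score(badly_located, swapped, overlooked):
    """Per-image label quality: equal-weight (1/3) average of the three
    sub-scores. A lower score flags a likely annotation error, so images
    can be ranked by this score for review."""
    return (badly_located + swapped + overlooked) / 3.0
```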
20. Methods: Similarity Function
: a function for computing similarity between pairs of bboxes from the same image
(figure: a bbox pair B_any, B_any, illustrating the badly located error case)
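Spatial overlap is the backbone of any bbox similarity. The paper's exact similarity function may fold in more terms, but the overlap ingredient is standard IoU; a minimal sketch:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0
```

A GT box that only covers half the object (the badly located case) yields a low IoU against the model's prediction, which drives the Badly-Located score down.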
21. Methods: Badly Located Box Scores
(figure: predicted B_table vs. GT B_table)
22. Methods: Softmin Pooling
(figure: predicted B_dog vs. GT B_bear)
23. Methods: Softmin Pooling
(figure: two predicted person boxes, B1 (p1=0.98) and B2 (p2=0.99), matched against one GT B_person)
24. Methods: Softmin Pooling
Softmin
** (presenter's question) Exactly what score value was used as the cutoff?
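Softmin pooling turns the per-box scores into a single per-image score while letting the worst boxes dominate, unlike a plain mean. A sketch of a temperature-controlled softmin (the temperature value here is an assumption for illustration):

```python
import math

def softmin_pool(scores, temperature=0.1):
    """Smooth minimum of a list of scores: weights concentrate on the
    lowest (worst) entries, so one badly scored box can drag down the
    whole image's label quality score."""
    weights = [math.exp(-s / temperature) for s in scores]
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)
```

As temperature → 0 this approaches the hard minimum; as it grows it approaches the mean.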
26. Experiments: Dataset and Models
COCO-bench Dataset (5 classes: {person, chair, cup, car, traffic light})
Compares three annotation sets:
COCO annotation (original) vs. Ma et al. annotation (independent) vs. Sama annotation (independent)
(figure: Wrong Annotation! counts: 2,171 / 251 images)
27. Experiments: Dataset and Models
SYNTHIA-AL Dataset
- A Car (#0) is mislabeled as Bicycle (#3)
- The bbox of the Car (#0) in the middle is inaccurately located
- The bbox of the last Car is missing
28. Experiments: Dataset and Models
COCO-full Dataset: Badly Located Error
(figure: a badly located train bbox; a badly located person bbox)
29. Experiments: Dataset and Models
COCO-full Dataset: Swapped Error
(figure: swaps between Cake <-> Donut; swaps between Bowl <-> Cup)
30. Experiments: Dataset and Models
COCO-full Dataset: Overlooked Error
(figure: overlooked sports-ball bboxes; an overlooked person bbox)
31. Experiments: Metrics
"ObjectLab results, we estimate that in COCO 2017 around: 3% have a Badly Located error, 0.7% have a Swapped error, and 5% of images have an Overlooked error."
32. Implications of label errors in test data
1. Smaller models get an unseen regularization benefit: small models perform better on the corrected data.
2. Larger models score well by learning the systematic patterns of the label errors themselves.
References: Label Errors in Test Dataset
: larger models score higher on the original test set but drop on the corrected data
33. References: The Effect of Improving Annotation Quality
Ma, J., Ushiku, Y., & Sagara, M. (2022). The Effect of Improving Annotation Quality on Object Detection Datasets: A Preliminary Study. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (pp. 4849-4858).
Old = the annotation given in the original dataset; New = the version with annotation errors correctly fixed
{TRAIN} / {TEST} combinations:
● (old/old) is often the best
34. References: The Effect of Improving Annotation Quality
{TRAIN} / {TEST} combinations:
● (new/new) is often the best
(Old = the annotation given in the original dataset; New = the version with annotation errors correctly fixed)
35. Conclusions
1. ObjectLab is a general toolkit that detects annotation errors and corrects them, without any change to the model architecture.
2. While there is research on training well from noisy datasets, this approach instead corrects the dataset's errors so that training or testing can be done on a clean dataset.
3. A small amount of error in a dataset can improve model robustness by keeping the task from becoming too easy, but a large amount of error hinders training.
4. Third-party data annotation vendors produce label error rates of 7%~80%
→ this looks useful whenever you have to build your own data.