The performance of deep neural networks improves with more annotated data. The problem is that the budget for annotation is limited. One solution to this is active learning, where a model asks human to annotate data that it perceived as uncertain. A variety of recent methods have been proposed to apply active learning to deep networks but most of them are either designed specific for their target tasks or computationally inefficient for large networks. In this paper, we propose a novel active learning method that is simple but task-agnostic, and works efficiently with the deep networks. We attach a small parametric module, named “loss prediction module,” to a target network, and learn it to predict target losses of unlabeled inputs. Then, this module can suggest data that the target model is likely to produce a wrong prediction. This method is task-agnostic as networks are learned from a single loss regardless of target tasks. We rigorously validate our method through image classification, object detection, and human pose estimation, with the recent network architectures. The results demonstrate that our method consistently outperforms the previous methods over the tasks
Covering important topics of Classical Machine Learning in 16 hours, in preparation for the following 10 weeks of Deep Learning courses at Taiwan AI academy from 2018/02-2018/05. Topics include regression (linear, polynomial, gaussian and sigmoid basis functions), dimension reduction (PCA, LDA, ISOMAP), clustering (K-means, GMM, Mean-Shift, DBSCAN, Spectral Clustering), classification (Naive Bayes, Logistic Regression, SVM, kNN, Decision Tree, Classifier Ensembles, Bagging, Boosting, Adaboost) and Semi-Supervised learning techniques. Emphasis on sampling, probability, curse of dimensionality, decision theory and classifier generalizability.
파이썬(Python) 으로 나만의 딥러닝 API 만들기 강좌 (Feat. AutoAI ) Yunho Maeng
#Python #딥러닝 #API #ibmdeveloperday2019
여러분들의 성원에 보답하기 위해 IBM Developer Day에서 발표한 세션 자료를 공개합니다! 그 어느때 보다 발표자료를 요청한 분들이 많아 놀랐습니다~ 그럼 다음에 또 뵙겠습니다 :)
Github https://github.com/yunho0130/devday_python_api
세션 영상 https://youtu.be/Z7bTfnuLXck
Representational Continuity for Unsupervised Continual LearningMLAI2
Continual learning (CL) aims to learn a sequence of tasks without forgetting the previously acquired knowledge. However, recent CL advances are restricted to supervised continual learning (SCL) scenarios. Consequently, they are not scalable to real-world applications where the data distribution is often biased and unannotated. In this work, we focus on unsupervised continual learning (UCL), where we learn the feature representations on an unlabelled sequence of tasks and show that reliance on annotated data is not necessary for continual learning. We conduct a systematic study analyzing the learned feature representations and show that unsupervised visual representations are surprisingly more robust to catastrophic forgetting, consistently achieve better performance, and generalize better to out-of-distribution tasks than SCL. Furthermore, we find that UCL achieves a smoother loss landscape through qualitative analysis of the learned representations and learns meaningful feature representations. Additionally, we propose Lifelong Unsupervised Mixup (Lump), a simple yet effective technique that interpolates between the current task and previous tasks' instances to alleviate catastrophic forgetting for unsupervised representations.
[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper reviewtaeseon ryu
105번째 논문리뷰,
오늘 소개 드릴 논문은 2020 CVPR에서 발표된 Meta-Transfer Learning for Zero-Shot Super-Resolution 라는 논문입니다!
제목에서 유추가 가능하신것 처럼 학습 데이터없이 저해상도 사진을 고해상도 사진으로 바꿔주는 Zero Shot Super Resolution을 위한 Meta Transfer Learning을 소개합니다. Internal Learning에 적합한 General한 초기 parameter를 찾는것에 기반하여 한번의 Gradient Update만으로 최적의 성능을 보여주는것 방법에 대해서 소개합니다.
논문에 대한 자세한 리뷰를 이미지 처리팀 김선옥 님이 자세한 리뷰 도와주셨습니다!
https://youtu.be/lEqbXLrUlW4
The performance of deep neural networks improves with more annotated data. The problem is that the budget for annotation is limited. One solution to this is active learning, where a model asks human to annotate data that it perceived as uncertain. A variety of recent methods have been proposed to apply active learning to deep networks but most of them are either designed specific for their target tasks or computationally inefficient for large networks. In this paper, we propose a novel active learning method that is simple but task-agnostic, and works efficiently with the deep networks. We attach a small parametric module, named “loss prediction module,” to a target network, and learn it to predict target losses of unlabeled inputs. Then, this module can suggest data that the target model is likely to produce a wrong prediction. This method is task-agnostic as networks are learned from a single loss regardless of target tasks. We rigorously validate our method through image classification, object detection, and human pose estimation, with the recent network architectures. The results demonstrate that our method consistently outperforms the previous methods over the tasks
Covering important topics of Classical Machine Learning in 16 hours, in preparation for the following 10 weeks of Deep Learning courses at Taiwan AI academy from 2018/02-2018/05. Topics include regression (linear, polynomial, gaussian and sigmoid basis functions), dimension reduction (PCA, LDA, ISOMAP), clustering (K-means, GMM, Mean-Shift, DBSCAN, Spectral Clustering), classification (Naive Bayes, Logistic Regression, SVM, kNN, Decision Tree, Classifier Ensembles, Bagging, Boosting, Adaboost) and Semi-Supervised learning techniques. Emphasis on sampling, probability, curse of dimensionality, decision theory and classifier generalizability.
파이썬(Python) 으로 나만의 딥러닝 API 만들기 강좌 (Feat. AutoAI ) Yunho Maeng
#Python #딥러닝 #API #ibmdeveloperday2019
여러분들의 성원에 보답하기 위해 IBM Developer Day에서 발표한 세션 자료를 공개합니다! 그 어느때 보다 발표자료를 요청한 분들이 많아 놀랐습니다~ 그럼 다음에 또 뵙겠습니다 :)
Github https://github.com/yunho0130/devday_python_api
세션 영상 https://youtu.be/Z7bTfnuLXck
Representational Continuity for Unsupervised Continual LearningMLAI2
Continual learning (CL) aims to learn a sequence of tasks without forgetting the previously acquired knowledge. However, recent CL advances are restricted to supervised continual learning (SCL) scenarios. Consequently, they are not scalable to real-world applications where the data distribution is often biased and unannotated. In this work, we focus on unsupervised continual learning (UCL), where we learn the feature representations on an unlabelled sequence of tasks and show that reliance on annotated data is not necessary for continual learning. We conduct a systematic study analyzing the learned feature representations and show that unsupervised visual representations are surprisingly more robust to catastrophic forgetting, consistently achieve better performance, and generalize better to out-of-distribution tasks than SCL. Furthermore, we find that UCL achieves a smoother loss landscape through qualitative analysis of the learned representations and learns meaningful feature representations. Additionally, we propose Lifelong Unsupervised Mixup (Lump), a simple yet effective technique that interpolates between the current task and previous tasks' instances to alleviate catastrophic forgetting for unsupervised representations.
[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper reviewtaeseon ryu
105번째 논문리뷰,
오늘 소개 드릴 논문은 2020 CVPR에서 발표된 Meta-Transfer Learning for Zero-Shot Super-Resolution 라는 논문입니다!
제목에서 유추가 가능하신것 처럼 학습 데이터없이 저해상도 사진을 고해상도 사진으로 바꿔주는 Zero Shot Super Resolution을 위한 Meta Transfer Learning을 소개합니다. Internal Learning에 적합한 General한 초기 parameter를 찾는것에 기반하여 한번의 Gradient Update만으로 최적의 성능을 보여주는것 방법에 대해서 소개합니다.
논문에 대한 자세한 리뷰를 이미지 처리팀 김선옥 님이 자세한 리뷰 도와주셨습니다!
https://youtu.be/lEqbXLrUlW4
Our fall 12-Week Data Science bootcamp starts on Sept 21st,2015. Apply now to get a spot!
If you are hiring Data Scientists, call us at (1)888-752-7585 or reach info@nycdatascience.com to share your openings and set up interviews with our excellent students.
---------------------------------------------------------------
Come join our meet-up and learn how easily you can use R for advanced Machine learning. In this meet-up, we will demonstrate how to understand and use Xgboost for Kaggle competition. Tong is in Canada and will do remote session with us through google hangout.
---------------------------------------------------------------
Speaker Bio:
Tong is a data scientist in Supstat Inc and also a master students of Data Mining. He has been an active R programmer and developer for 5 years. He is the author of the R package of XGBoost, one of the most popular and contest-winning tools on kaggle.com nowadays.
Pre-requisite(if any): R /Calculus
Preparation: A laptop with R installed. Windows users might need to have RTools installed as well.
Agenda:
Introduction of Xgboost
Real World Application
Model Specification
Parameter Introduction
Advanced Features
Kaggle Winning Solution
Event arrangement:
6:45pm Doors open. Come early to network, grab a beer and settle in.
7:00-9:00pm XgBoost Demo
Reference:
https://github.com/dmlc/xgboost
Three key techniques for ensemble methods are summarized:
1. Bagging grows decision trees from multiple bootstrapped samples and averages the results to form a single classifier. It creates base trees independently with no weighting.
2. Boosting creates base trees successively by fitting to the residuals of previous trees. It up-weights misclassified points and combines trees with learning rates, requiring tuning to prevent overfitting.
3. Gradient boosting is a generalized boosting technique that can use different loss functions like deviance, making it more robust than AdaBoost to outliers. It requires setting the number of trees and hyperparameters like learning rate.
This is the slide from my talk at FULokoja Ingressive meetup.
XGBoost is a decision-tree-based ensemble Machine Learning algorithm that uses a gradient boosting framework. In prediction problems involving unstructured and structured data (images, text, etc.) artificial neural networks tend to outperform all other algorithms or frameworks. However, when it comes to small-to-medium structured/tabular data, decision tree-based algorithms are considered best-in-class right now. XGBoost model has the best combination of prediction performance and processing time compared to other algorithms.
This slide is to introduce BFC which plays a big role in GPU memory management ( allocation/deallocation ). It also mentions stream executor and how it works in output tensor's memory allocation in GPU.
The document discusses various techniques for tuning the learning rate when training neural networks, including adaptive learning rate methods, learning rate annealing, cyclical learning rates, and using a learning rate finder. It provides examples of implementing learning rate schedules like step decay, linear decay, and polynomial decay in Keras. Cyclical learning rates and the one cycle policy are also covered, with the one cycle policy combining learning rate and momentum scheduling.
Automated machine learning lectures given at the Advanced Course on Data Science & Machine Learning. AutoML, hyperparameter optimization, Bayesian optimization, Neural Architecture Search, Meta-learning, MAML
Binomial heaps are a data structure that combines multiple binomial trees. Each binomial tree obeys the min-heap property and there is at most one tree per degree. Common operations on binomial heaps include inserting and deleting nodes, finding the minimum, and merging heaps. These operations make use of properties like each tree's degree and structure to perform the operation in O(log n) time.
In this talk, Dmitry shares his approach to feature engineering which he used successfully in various Kaggle competitions. He covers common techniques used to convert your features into numeric representation used by ML algorithms.
What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...Edureka!
This Edureka "What is Deep Learning" video will help you to understand about the relationship between Deep Learning, Machine Learning and Artificial Intelligence and how Deep Learning came into the picture. This tutorial will be discussing about Artificial Intelligence, Machine Learning and its limitations, how Deep Learning overcame Machine Learning limitations and different real-life applications of Deep Learning.
Below are the topics covered in this tutorial:
1. What Is Artificial Intelligence?
2. What Is Machine Learning?
3. Limitations Of Machine Learning
4. Deep Learning To The Rescue
5. What Is Deep Learning?
6. Deep Learning Applications
To take a structured training on Deep Learning, you can check complete details of our Deep Learning with TensorFlow course here: https://goo.gl/VeYiQZ
Revisiting the Calibration of Modern Neural NetworksSungchul Kim
The document discusses calibration in modern neural networks. It finds that some recent model families, like MLP-Mixers and Vision Transformers (ViTs), are both highly accurate and well-calibrated on in-distribution and out-of-distribution image datasets. Temperature scaling can improve calibration in older models but the newest architectures remain better calibrated. Model size and pretraining amount do not fully explain differences in calibration between families. Architecture appears to be a major factor, with non-convolutional models tending to have better calibration properties.
Slide explaining the distinction between bagging and boosting while understanding the bias variance trade-off. Followed by some lesser known scope of supervised learning. understanding the effect of tree split metric in deciding feature importance. Then understanding the effect of threshold on classification accuracy. Additionally, how to adjust model threshold for classification in supervised learning.
Note: Limitation of Accuracy metric (baseline accuracy), alternative metrics, their use case and their advantage and limitations were briefly discussed.
[한국어] Neural Architecture Search with Reinforcement LearningKiho Suh
모두의연구소에서 발표했던 “Neural Architecture Search with Reinforcement Learning”이라는 논문발표 자료를 공유합니다. 머신러닝 개발 업무중 일부를 자동화하는 구글의 AutoML이 뭘하려는지 이 논문을 통해 잘 보여줍니다.
이 논문에서는 딥러닝 구조를 만드는 딥러닝 구조에 대해서 설명합니다. 800개의 GPU를 혹은 400개의 CPU를 썼고 State of Art 혹은 State of Art 바로 아래이지만 더 빠르고 더 작은 네트워크를 이것을 통해 만들었습니다. 이제 Feature Engineering에서 Neural Network Engineering으로 페러다임이 변했는데 이것의 첫 시도 한 논문입니다.
Controlled dropout: a different dropout for improving training speed on deep ...Byung Soo Ko
Controlled dropout is a technique that improves the training speed of feed-forward neural networks (FFNNs) and convolutional neural networks (CNNs). It works by identifying redundant computations during dropout and omitting them to reduce operations. Experiments show training speed improvements of over 50% for FFNNs on CPU and around 20% for CNNs on GPU. The improvement is greater when there are more hidden layers, as controlled dropout prevents more redundant operations. This technique can significantly reduce training time for deep neural networks with minimal impact on accuracy.
Bitmap indexing is a technique used for columns with low cardinality (few unique values) that are frequently queried in large databases. It stores the values of these columns in bitmaps, where each row is represented by a bit that is 1 if the column value matches and 0 if it doesn't. When a query with conditions on these columns is executed, the relevant bitmaps are ANDed to find the matching rows faster than a regular index scan. The syntax to create a bitmap index is CREATE BITMAP INDEX index_name ON table_name (column_name). Bitmap indexing improves performance for inserts, deletes, updates and retrievals but is only suitable for large tables and the index creation process can be time-consuming.
Our fall 12-Week Data Science bootcamp starts on Sept 21st,2015. Apply now to get a spot!
If you are hiring Data Scientists, call us at (1)888-752-7585 or reach info@nycdatascience.com to share your openings and set up interviews with our excellent students.
---------------------------------------------------------------
Come join our meet-up and learn how easily you can use R for advanced Machine learning. In this meet-up, we will demonstrate how to understand and use Xgboost for Kaggle competition. Tong is in Canada and will do remote session with us through google hangout.
---------------------------------------------------------------
Speaker Bio:
Tong is a data scientist in Supstat Inc and also a master students of Data Mining. He has been an active R programmer and developer for 5 years. He is the author of the R package of XGBoost, one of the most popular and contest-winning tools on kaggle.com nowadays.
Pre-requisite(if any): R /Calculus
Preparation: A laptop with R installed. Windows users might need to have RTools installed as well.
Agenda:
Introduction of Xgboost
Real World Application
Model Specification
Parameter Introduction
Advanced Features
Kaggle Winning Solution
Event arrangement:
6:45pm Doors open. Come early to network, grab a beer and settle in.
7:00-9:00pm XgBoost Demo
Reference:
https://github.com/dmlc/xgboost
Three key techniques for ensemble methods are summarized:
1. Bagging grows decision trees from multiple bootstrapped samples and averages the results to form a single classifier. It creates base trees independently with no weighting.
2. Boosting creates base trees successively by fitting to the residuals of previous trees. It up-weights misclassified points and combines trees with learning rates, requiring tuning to prevent overfitting.
3. Gradient boosting is a generalized boosting technique that can use different loss functions like deviance, making it more robust than AdaBoost to outliers. It requires setting the number of trees and hyperparameters like learning rate.
This is the slide from my talk at FULokoja Ingressive meetup.
XGBoost is a decision-tree-based ensemble Machine Learning algorithm that uses a gradient boosting framework. In prediction problems involving unstructured and structured data (images, text, etc.) artificial neural networks tend to outperform all other algorithms or frameworks. However, when it comes to small-to-medium structured/tabular data, decision tree-based algorithms are considered best-in-class right now. XGBoost model has the best combination of prediction performance and processing time compared to other algorithms.
This slide is to introduce BFC which plays a big role in GPU memory management ( allocation/deallocation ). It also mentions stream executor and how it works in output tensor's memory allocation in GPU.
The document discusses various techniques for tuning the learning rate when training neural networks, including adaptive learning rate methods, learning rate annealing, cyclical learning rates, and using a learning rate finder. It provides examples of implementing learning rate schedules like step decay, linear decay, and polynomial decay in Keras. Cyclical learning rates and the one cycle policy are also covered, with the one cycle policy combining learning rate and momentum scheduling.
Automated machine learning lectures given at the Advanced Course on Data Science & Machine Learning. AutoML, hyperparameter optimization, Bayesian optimization, Neural Architecture Search, Meta-learning, MAML
Binomial heaps are a data structure that combines multiple binomial trees. Each binomial tree obeys the min-heap property and there is at most one tree per degree. Common operations on binomial heaps include inserting and deleting nodes, finding the minimum, and merging heaps. These operations make use of properties like each tree's degree and structure to perform the operation in O(log n) time.
In this talk, Dmitry shares his approach to feature engineering which he used successfully in various Kaggle competitions. He covers common techniques used to convert your features into numeric representation used by ML algorithms.
What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...Edureka!
This Edureka "What is Deep Learning" video will help you to understand about the relationship between Deep Learning, Machine Learning and Artificial Intelligence and how Deep Learning came into the picture. This tutorial will be discussing about Artificial Intelligence, Machine Learning and its limitations, how Deep Learning overcame Machine Learning limitations and different real-life applications of Deep Learning.
Below are the topics covered in this tutorial:
1. What Is Artificial Intelligence?
2. What Is Machine Learning?
3. Limitations Of Machine Learning
4. Deep Learning To The Rescue
5. What Is Deep Learning?
6. Deep Learning Applications
To take a structured training on Deep Learning, you can check complete details of our Deep Learning with TensorFlow course here: https://goo.gl/VeYiQZ
Revisiting the Calibration of Modern Neural NetworksSungchul Kim
The document discusses calibration in modern neural networks. It finds that some recent model families, like MLP-Mixers and Vision Transformers (ViTs), are both highly accurate and well-calibrated on in-distribution and out-of-distribution image datasets. Temperature scaling can improve calibration in older models but the newest architectures remain better calibrated. Model size and pretraining amount do not fully explain differences in calibration between families. Architecture appears to be a major factor, with non-convolutional models tending to have better calibration properties.
Slide explaining the distinction between bagging and boosting while understanding the bias variance trade-off. Followed by some lesser known scope of supervised learning. understanding the effect of tree split metric in deciding feature importance. Then understanding the effect of threshold on classification accuracy. Additionally, how to adjust model threshold for classification in supervised learning.
Note: Limitation of Accuracy metric (baseline accuracy), alternative metrics, their use case and their advantage and limitations were briefly discussed.
[한국어] Neural Architecture Search with Reinforcement LearningKiho Suh
모두의연구소에서 발표했던 “Neural Architecture Search with Reinforcement Learning”이라는 논문발표 자료를 공유합니다. 머신러닝 개발 업무중 일부를 자동화하는 구글의 AutoML이 뭘하려는지 이 논문을 통해 잘 보여줍니다.
이 논문에서는 딥러닝 구조를 만드는 딥러닝 구조에 대해서 설명합니다. 800개의 GPU를 혹은 400개의 CPU를 썼고 State of Art 혹은 State of Art 바로 아래이지만 더 빠르고 더 작은 네트워크를 이것을 통해 만들었습니다. 이제 Feature Engineering에서 Neural Network Engineering으로 페러다임이 변했는데 이것의 첫 시도 한 논문입니다.
Controlled dropout: a different dropout for improving training speed on deep ...Byung Soo Ko
Controlled dropout is a technique that improves the training speed of feed-forward neural networks (FFNNs) and convolutional neural networks (CNNs). It works by identifying redundant computations during dropout and omitting them to reduce operations. Experiments show training speed improvements of over 50% for FFNNs on CPU and around 20% for CNNs on GPU. The improvement is greater when there are more hidden layers, as controlled dropout prevents more redundant operations. This technique can significantly reduce training time for deep neural networks with minimal impact on accuracy.
Bitmap indexing is a technique used for columns with low cardinality (few unique values) that are frequently queried in large databases. It stores the values of these columns in bitmaps, where each row is represented by a bit that is 1 if the column value matches and 0 if it doesn't. When a query with conditions on these columns is executed, the relevant bitmaps are ANDed to find the matching rows faster than a regular index scan. The syntax to create a bitmap index is CREATE BITMAP INDEX index_name ON table_name (column_name). Bitmap indexing improves performance for inserts, deletes, updates and retrievals but is only suitable for large tables and the index creation process can be time-consuming.
Slides based on "Introduction to Machine Learning with Python" by Andreas Muller and Sarah Guido for Hongdae Machine Learning Study(https://www.meetup.com/Hongdae-Machine-Learning-Study/) (epoch #2)
홍대 머신 러닝 스터디(https://www.meetup.com/Hongdae-Machine-Learning-Study/) (epoch #2)의 "파이썬 라이브러리를 활용한 머신러닝"(옮긴이 박해선) 슬라이드 자료.
[PyCon KR 2018] 땀내를 줄이는 Data와 Feature 다루기Joeun Park
서울 코엑스에서 진행된 파이콘 한국 2018에서 8월 19일에 발표한 내용입니다.
데이터 전처리와 Feature Engineering에 대해 다룹니다.
[파이콘 한국 2018 프로그램 | 땀내를 줄이는 Data와 Feature 다루기](https://www.pycon.kr/2018/program/47)
이 발표내용은 8월 17일 금요일에 진행되었던 다음 2개의 튜토리얼을 바탕으로 작성되었습니다.
* [공공데이터로 파이썬 데이터 분석 입문하기(3시간) — 파이콘 한국 2018](https://www.pycon.kr/2018/program/tutorial/6)
* [청와대 국민청원 데이터로 파이썬 자연어처리 입문하기(3시간) — 파이콘 한국 2018](https://www.pycon.kr/2018/program/tutorial/7)
2. Contents
1. What is Active Learning?
2. Scenarios -- Learner와 Oracle의 관계 프로세스를 정의하는 방법론
3. Query Strategy Frameworks -- 데이터를 쿼리하는 방법론
4. Related Research Areas
5. Additional Discussions
3. What is Active Learning?
모델(learner)이 학습 과정에서 특정 데이터에 대해서, 모른다!! 라고 이야기하고, 학습자(oracle) 이 이를
알려주는 일련의 과정을 통해, 상대적으로 적은 training datasets 양에 대해, 좋은 성능을 내는 방법론
4. Example of Active Learning
a : 두 개의 Gaussian dist 를 따르는 데이터 셋이 있다고 가정해보자.
b : Random sampling 으로, unlabeled data 에 대해 labeling을 하고, prediction (70%)
c : Query strategy(LC)에 따라, sampling 을 해서, 이에 대한 labeling 후의 prediction (90%)
아무 데이터에 대해 라벨링을 해서 decision boundary 를 그리고, prediction을 하는 것보다, decision boundary를 그릴 때,
도움이 될 즉, informative 한 unlabeled datasets을 query 하여, labeling하는 것이 더욱 효율적이다!!
5. Comparison between Random vs Query strategy
Active learning algorithm은 주로, learning curve로 성능 평가를 하게 된다.
decision boundary를 그리는데, 유용한 데이터(informative data)를 선별적으로
(selective) 뽑아서 라벨링하기 때문에, 학습 속도가 더욱 빨라진다.
6. Scenarios
용어 정리 :
- Learner : 데이터를 학습 및 쿼리하는 객체 ( 모델 )
- Oracle : 데이터에 레이블을 매기는 객체 ( 전문가 )
Scenarios는 Learner 와 Oracle 사이에 datasets 의 흐름에 대한 방법론을 이야기한다.
7. Membership Query Synthesis
queries that the learner generates de novo, rather than those sampled from some underlying natural distribution.
Learner : “이런 데이터는 라벨을 매기기가 너무 애매해.. 화살표에 있는 데이터 [-0.1,-0.54, 0.13, 0.35 … ] 는 어떤 것 같아?”
즉, 쿼리를 보낼 때, 실제로 가지고 있는 데이터에서 샘플링해서
보내는 것이 아니라, Leaner 가 input space에서 생성한
(generates de novo) 데이터를 쿼리로 보내서 Oracle에게
라벨링을 요청하는 구조.
Oracle이 human annotator일 경우, (대다수의 경우) 요청한 쿼리를 해석할 수 없다.
ex) handwritten characters
8. Stream-based Selective Sampling
each unlabeled instance is typically drawn one at a time from the data source, and the learner must decide whether to query or discard it.
해당 방법론의 키워드는 "draw” 와 "query strategy” 이다.
Learner : "데이터를 하나씩(sequentially) 뽑아서 보자!"
informative 에 대해 measure하는 방법론에 대해서는 다음 챕터에서 다루도록 하자!
9. Pool-based Sampling
evaluates and ranks the entire collection before selecting the best query.
Learner : “데이터를 보자..(unlabeled data pool)
상위 2개를 뽑기로 했으니까 Oracle한테
Data1, Data2 를 보내면 되겠다!!
Data Informative score
Data 1 0.9
Data 2 0.4
Data 3 0.7
Data 4 0.1
Steam-base 와 Pool-base의 차이점은 전자는 sequentially 하게 개별 데이터의 정보량을 파악해
쿼리하는 것이고, Pool-base는 데이터를 모아서 pool로 보고, 이 중 정보력있는 k개를 쿼리하는 것!
10. Query Strategy Frameworks
Scenarios 에서 우리는, Learner 가 적은 양의 L을 통해 학습, U를 prediction하는 과정에서 informative 한 instance를
쿼리하는 일련의 구조를 살펴보았다.
query strategy는 특정 instance에 대해 얼마나 informative한지를 measure하는 다양한 방법론에 대한 이야기를 하게 된다.
- Uncertainty Sampling
- least certain
- margin sampling
- entropy sampling
- Query-By-Committee
- Expected Model Change
- Expected Error Reduction
- Variance Reduction
- Density-Weighted Methods
11. Uncertainty Sampling
an active learner queries the instances about which it is least certain how to label. -- nearest 0.5
Least Certain
softmax 값이 [0.2, 0.3 ,0.5] 가 하나 있고, [0.1, 0.2 ,0.7] 이 나온 두 개의 인스턴스가 있다고 하자. 각각의 y_hat 이 0.5, 0.7 이기 때문에, 더 uncertain 한 것은 첫 번째 instance !!!
Margin Sampling
위의 예시의 경우, 각각의 margin 값은 0.5-0.3 = 0.2, 0.7 - 0.2 = 0.5 이에 따라 더 uncertain 한 것은 첫 번째 instance !!!
Entropy
위의 예시의 경우, 각각의 entropy 값은, 각각 1.49 , 1.16 으로 더 uncertain 한 것은 첫 번째 instance !!!
12. entropy 방법은 하나의 라벨에 대한 probability space가 유난히 작을 경우, informative score가
낮아진다. 반면에 LC 와 margin sampling은 그 값이 높아진다.
entropy 방법의 경우 log-loss를 최소화하는 문제에 적합하고, LC와 margin sampling의 경우,
classification error를 최소화하는 문제에 적합하다. Q. 그렇다면 NLLLoss 를 사용하는 Classification의
경우에는??
13. Query-By-Committee
같은 데이터를 학습했지만, 다르게 학습한(다른 모델) Multiple Learner가 존재
Multiple Learner를 Committee라 하고, informative instance는 Committee 간의 prediction 차이가 가장 많이 나는 것으로 정의
Vote Entropy
- V(y_{i}) 는 i번째 라벨에 대해 손을 들어준 Committee의 숫자를 의미, C로 나눠줌으로써Ratio 단위가 된다.
Kullback-Leibler (KL) divergence
두 분포의 차이를 measure
- P_theta(c)(y_{i}|x) 는 특정 Committee가 i번째 라벨에 대해 내린 결정 (i.e softmax)
- P_C(y_{i|}|x) 는 전체 Committee가 i 번째 라벨에 대해 내린 결정
14. Expected Model Change
모델을 가장 많이 변화시키는 인스턴스를 informative instance로 정의
어렸을 때, 이걸 알았다면 내 인생은 많이 변했을텐데… -> 이 인스턴스의 라벨을 안다면 내 모델의 성능은 많이 좋아졌을텐데...
“expected gradient length” (EGL)
인스턴스의 실제 라벨을 모르기 때문에, information measure를 expectation 값으로 changed learner’s gradient 의 Euclidean 값으로 한다.
- instance의 라벨이 1이였다면, (i=1) learner의 gradient값의 Euclidean 값을, 라벨 분포로 곱해 모든 라벨을 다 더해준다.
instance에 labeling을 하기 전 즉, 기존의 Labeled instance에 대한 learner 의 gradient 는 0이라고 한다. (모델이 충분히 converge했기 때문)
이에 따라, L U <x,y_{i}> 가 아닌, <x,y_{i}> 의 gradient 만을 계산해, 연산의 효율성을 꾀한다.
15. Expected Error Reduction
이전의 방법론은 특정 인스턴스가 모델을 얼마나 변화시킬 지에 집중했다면, 해당 방법론은 해당 인스턴스를 사용함으로써, 모델의 에러가 얼마나 감소할 지에
집중한다.
보다 구체적으로 이야기하자면, unlabeled instance에서 특정 instance를 쿼리한 후, retraining 한 모델이 remaining unlabeled instance를 prediction했을 때, 최대한
certain 해져야 한다.
0-1 loss
log loss
(entropy)
0-1 loss 만 봐도, sigma 가 라벨(i)에 대해서, unlabeled(U)에 대해서 존재한다.
이 때, U가 매우 크기 때문에, computational burden 이 매우 큰 방법론이다.
16. Variance Reduction
모델의 loss function 를 최소화하는 문제는 비용도 많이 들고, 닫힌 해가 아닐 수 있다.
그렇다면, 모델의 Variance 를 줄여보는 인스턴스를 찾는 것은 어떨까?
Error(X) = noise(X) + bias(X) + variance(X)
noise 는 데이터가 가지는 본질적인 한계치 -- irreducible
Bias 는 모델이 잘 못맞추는 것이다. 모델이 별로거나, 데이터가 적거나의
문제이다. (underfitting) -- irreducible
Variance 너무 잘 맞추려다 보니, 그것만 잘 맞추게 되는 민감도의 문제이다.
(overfitting) -- reducible
17. Variance Reduction
neural networks 의 output variance’s approximation (MacKay, 1992)
우변의 양 끝은 prediction 이 parameter에 대해 얼마나 민감한 지에 대한 것
우변의 가운데 term은 현재 모델의 squared error이다.(L로 학습한) 해당 부분을 Fisher information matrix라 하고, F로 정의 (cov of LL)
- 이상적인 경우, loss function 은 convex 하기 때문에, gradient of x 는 locally linear 하고, variance 형태인 F는 Gaussian 을
따르게 되기 때문에, Closed form 이 된다. 이에 따라, 모델의 re-train 이 불필요하게 된다.
- closed form : 무한대(infinity)로 발산하거나 조건부적인 결과를 제공하는 경우가아닌 경우
Fisher information 은 random variable X를 모델링하는 매개 변수 θ에 따라 발생되는 정보량을 의미한다.
output variance에서 F의 inverse 가 들어가게 되고, 사실상 Fisher information 은 maximizing 해야 한다. -> 모델의 정보량을 최대화
결과적으로, 모델의 variance 를 크게 만드는 instance 를 라벨링함으로써 모델의 variance를 줄여주는 instance x를 찾는 방법론
18. Density-Weighted Methods
오른쪽 plot을 보면, decision boundary 형성에 중요한 데이터는 색이 있는
instance지만, A가 decision boundary 위에 존재, 가장 informative하게 된다.
즉, informative 할 수는 있지만, 전체 input space에 대한 대표성(representative)이
없다고 할 수 있다.
이에 따라, query하려는 instance와 U에 있는 remaining instance간의 similarity를 계산하고, 이를 다른 query strategy와 가중곱을
해주어 query instance가 다른 U 데이터들에 대해 대표성을 띄는 것을 보장해준다.
19. Related Research Areas
- semi-supervised learning :
unlabeled data로 self-training 을 한 후에, supervised learning을 진행하는 과정이 active learning 과 유사한 관계에 있다.
다만 semi-supervised learning은 적은 양의 L로 학습한 모델로 most confident 한 label을 L set 으로 추가시켜 re training 을 시키는
구조이지만 , least confident 한 instance 를 query 하는 active learning과는 complement 한 관계성을 가지고 있다고 할 수 있다.
20. Related Research Areas
- reinforcement learning :
강화학습에서 learner는 "행동"을 통해 주어진 환경과 상호 작용하고 환경으로부터받는
"보상"과 관련하여 행동의 최적 정책을 찾으려고 한다.
"행동" 에 따른 결과를 query strategy 에 따른 informative score라고 하고, 이에 따른
oracle의 labeling을 "보상" 과 같은 응답 체계로 치환하게 되면, active learning과
유사한 구조를 띄게 된다.
21. Additional Discussion
- Deep Active Learning for Text Classification
- Bang An , Wenjun Wu , Huimin Han
- Active Discriminative Text Representation Learning
- Ye Zhang , Matthew Lease , Byron C.
- AILA: Attentive Interactive Labeling Assistant for Document Classification through
Attention-based Deep Neural Networks
- Minsuk Choi 1 , Cheonbok Park 1 , Soyoung Yang 1 , Yonggyu Kim 2 , Jaegul Choo 1 , and Sungsoo (Ray) Hong
22. Deep Active Learning for Text Classification
Text classification 에는 RNN based neural network 를 stack 한 형식을 사용하였다.
Query strategy 는 least confident 방법을 사용하되, softmax 값의 평균값에서 얼마나 벗어났는지를 측정하였다. (아래의 공식)
traditional AL 은 instance 를 sequential 하게 보거나, unlabeled pool 을 보고, k 개를 query 하는 방식을 사용하였다.high computational
expense !!
즉, stream base 는 속도가 너무 느리고, pool base 는 memory cost 가 너무 큰 단점이 존재한다.
이에 따라, batch mode 로 instance 를 query 하는 방법론인 batch-model for active learning 을 사용,
학습의 속도와 computational expense 를 줄이는 효과를 보인다.
red : with batch
green : traditional AL
grey : random sampling
23. Active Discriminant Text Representation Learning
text classification 에 CNN architecture를 사용
모델의 lowest layer인 embedding layer 의 gradient 를 가장 많이 변화시키는 instance를 query하는 EGL
approach 사용 (왼쪽의 첫 번째 formula)
긴 문장에 대해서 성능을 높이기 위해, (1) gradient (2) uncertainty 를 모두 고려할 수 있는 EGL-entropy-beta
model 제안 (왼쪽의 두 번째 formula) 가중치 lambda 는 iteration 이 높아질 수록, 두 번째 term이 smooth 하게
커지게 하는 beta dist 를 따름
24. AILA
기존의 AL은 Oracle 이 라벨만을 부여했지만, 해당
모델에서는 Attention 을 할 단어나 부분을 annotate 해주게
된다. 따라서, loss function이 label 뿐만 아니라, attention 에
대해서도 적용이 되는 구조를 가지게 되고, 이에 따라, query
도 label 과 attention 에 대해 두 가지를 하게 된다.
25. Further Study
- Batch model Active Learning
- Batch Mode Active Learning and Its Application to Medical Image Classification (2006), Steven C. H. Hoi et.al
- Optimal size of Labeled data
- etc ..