The performance of deep neural networks improves with more annotated data, but the annotation budget is limited. One solution is active learning, where a model asks a human to annotate the data it perceives as uncertain. A variety of recent methods have been proposed to apply active learning to deep networks, but most of them are either designed specifically for their target tasks or computationally inefficient for large networks. In this paper, we propose a novel active learning method that is simple but task-agnostic and works efficiently with deep networks. We attach a small parametric module, named "loss prediction module," to a target network and train it to predict the target losses of unlabeled inputs. This module can then suggest data on which the target model is likely to produce a wrong prediction. The method is task-agnostic since networks are learned from a single loss regardless of the target task. We rigorously validate our method on image classification, object detection, and human pose estimation with recent network architectures. The results demonstrate that our method consistently outperforms previous methods across these tasks.
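As a rough illustration of the idea, here is a minimal sketch (ours, not the paper's exact architecture; the layer widths are assumptions) of a loss prediction module that pools intermediate features of the target network and regresses a scalar predicted loss:

import torch
import torch.nn as nn

class LossPredictionModule(nn.Module):
    """Sketch: pool intermediate feature maps from the target network,
    project each to a common width, and regress a single scalar (the
    predicted loss). Hypothetical layer sizes."""
    def __init__(self, feature_channels, hidden=128):
        super().__init__()
        self.projections = nn.ModuleList(
            [nn.Linear(c, hidden) for c in feature_channels])
        self.head = nn.Linear(hidden * len(feature_channels), 1)

    def forward(self, feature_maps):
        pooled = [f.mean(dim=(2, 3)) for f in feature_maps]  # global average pooling
        embedded = [torch.relu(p(x)) for p, x in zip(self.projections, pooled)]
        return self.head(torch.cat(embedded, dim=1)).squeeze(1)

The active-learning query is then simply to annotate the unlabeled samples with the highest predicted loss (e.g., scores.topk(k).indices).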
NAVER: Learning to Rank Question-Answer Pairs Using HRDE-LTC (NAVER Engineering)
The automatic question answering (QA) task has long been considered a primary objective of artificial intelligence.
Among the QA sub-systems, we focused on the answer-ranking part. In particular, we investigated a novel neural network architecture with an additional data clustering module to improve performance in ranking answer candidates that are longer than a single sentence. This work can be used not only for the QA ranking task, but also to evaluate the relevance of the next utterance given a dialogue generated by a dialogue model.
In this talk, I'll present our research results (NAACL 2018) and their potential use cases (e.g., fake news detection). Finally, I'll conclude by discussing some issues with previous research and by introducing recent approaches in academia.
Knowledge distillation aims at transferring "knowledge" acquired in one model (teacher) to another model (student) that is typically smaller.
Previous approaches can be expressed as a form of training the student to mimic the output activations of individual data examples represented by the teacher.
We introduce a novel approach, dubbed relational knowledge distillation (Relational KD), that transfers relations among data examples represented by the teacher.
As concrete realizations of Relational KD, we propose distance-wise and angle-wise distillation losses that penalize structural differences in relations.
Experiments conducted on different benchmark tasks show that Relational KD improves the performance of the educated student networks by a significant margin, and even outperforms the teacher's performance.
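For concreteness, the distance-wise variant can be sketched as follows (a minimal reading of the abstract; the mean-distance normalization and the Huber loss are our assumptions about the details):

import torch
import torch.nn.functional as F

def rkd_distance_loss(student_emb, teacher_emb):
    """Sketch of a distance-wise relational loss: match the (mean-normalized)
    pairwise distance structure of teacher and student embeddings rather
    than the embeddings themselves."""
    def pdist(e):
        d = torch.cdist(e, e, p=2)
        return d / d[d > 0].mean()      # normalize by the mean non-zero distance
    with torch.no_grad():
        t = pdist(teacher_emb)
    s = pdist(student_emb)
    return F.smooth_l1_loss(s, t)       # Huber loss over the relation matrices

Because only relations are matched, teacher and student embedding dimensions are free to differ.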
Deep neural networks (DNNs) have recently shown promising performance in various areas. Although DNNs are very powerful, their large number of parameters requires substantial storage and memory bandwidth, which hinders them from being applied to actual embedded systems. Many researchers have sought model compression methods that reduce the size of a network with minimal performance degradation. Among them, knowledge transfer trains a student network under the guidance of a stronger teacher network. In this paper, we propose a method to overcome the limitations of conventional knowledge transfer methods and improve the performance of a student network. An auto-encoder is used in an unsupervised manner to extract compact factors, which are defined as compressed feature maps of the teacher network. When using the factors to train the student network, we observed that the student network performs better than with other conventional knowledge transfer methods, because the factors contain paraphrased, compact information of the teacher network that is easy for the student network to understand.
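Under our reading of the abstract, the pipeline can be sketched as follows (layer shapes and the L1 matching loss are illustrative assumptions):

import torch.nn as nn
import torch.nn.functional as F

class Paraphraser(nn.Module):
    """Sketch of the unsupervised 'paraphraser': an auto-encoder over
    teacher feature maps whose bottleneck output is the compact factor.
    Hypothetical layer sizes."""
    def __init__(self, in_ch, factor_ch):
        super().__init__()
        self.encoder = nn.Conv2d(in_ch, factor_ch, 3, padding=1)
        self.decoder = nn.Conv2d(factor_ch, in_ch, 3, padding=1)

    def forward(self, teacher_feats):
        factor = F.leaky_relu(self.encoder(teacher_feats))
        recon = self.decoder(factor)    # trained with a reconstruction loss
        return recon, factor

def factor_transfer_loss(student_factor, teacher_factor):
    # Match L2-normalized factors, so only their direction must agree.
    s = F.normalize(student_factor.flatten(1), dim=1)
    t = F.normalize(teacher_factor.flatten(1), dim=1)
    return F.l1_loss(s, t)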
These are the slides used at an Arithmer seminar given by Dr. Masaaki Uesaka at Arithmer Inc.
They summarize recent methods for quality assurance of machine learning models.
The Arithmer Seminar is held weekly, where professionals from within our company give lectures on their respective expertise.
Arithmer began at the University of Tokyo Graduate School of Mathematical Sciences. Today, our research in modern mathematics and AI systems provides solutions to tough, complex issues. At Arithmer, we believe it is our job to realize the functions of AI by improving work efficiency and producing more useful results for society.
[poster] A Compare-Aggregate Model with Latent Clustering for Answer Selection (Seoul National University)
CIKM 2019
In this paper, we propose a novel method for the sentence-level answer-selection task, one of the fundamental problems in natural language processing. First, we explore the effect of additional information by adopting a pretrained language model to compute the vector representation of the input text and by applying transfer learning from a large-scale corpus. Second, we enhance the compare-aggregate model by proposing a novel latent clustering method to compute additional information within the target corpus and by changing the objective function from listwise to pointwise. To evaluate the performance of the proposed approaches, experiments are performed with the WikiQA and TRECQA datasets. The empirical results demonstrate the superiority of our proposed approach, which achieves state-of-the-art performance on both datasets.
Slides for an Arithmer Seminar given by Dr. Daisuke Sato (Arithmer) at Arithmer Inc.
The topic is "explainable AI".
The "Arithmer Seminar" is held weekly, where professionals from within and outside our company give lectures on their respective expertise.
The slides were made by a lecturer from outside our company and are shared here with his/her permission.
Arithmer began at the University of Tokyo Graduate School of Mathematical Sciences. Today, our research in modern mathematics and AI systems provides solutions to tough, complex issues. At Arithmer, we believe it is our job to realize the functions of AI by improving work efficiency and producing more useful results for society.
Comparing Incremental Learning Strategies for Convolutional Neural Networks (Vincenzo Lomonaco)
In the last decade, Convolutional Neural Networks (CNNs) have been shown to perform incredibly well in many computer vision tasks, such as object recognition and object detection, being able to extract meaningful high-level invariant features. However, partly because of their complex training and tricky hyper-parameter tuning, CNNs have been scarcely studied in the context of incremental learning, where data are available in consecutive batches and retraining the model from scratch is unfeasible. In this work we compare different incremental learning strategies for CNN-based architectures, targeting real-world applications.
If you are interested in this work, please cite:
Lomonaco, V., & Maltoni, D. (2016, September). Comparing Incremental Learning Strategies for Convolutional Neural Networks. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition (pp. 175-184). Springer International Publishing.
For further information visit my website: http://www.vincenzolomonaco.com/
In this presentation we discuss the hypotheses behind MaxEnt models, describe the role of feature functions, and cover their applications to Natural Language Processing (NLP). The training of the classifier is discussed in a later presentation.
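For reference, a conditional MaxEnt model has the standard log-linear form p(y|x) = exp(Σ_k λ_k f_k(x, y)) / Z(x). The toy sketch below (the feature function and weight are illustrative, not from the talk) shows how feature functions enter the model:

import math

def maxent_prob(x, y, labels, features, weights):
    """Minimal conditional MaxEnt classifier: feature functions f_k(x, y)
    fire on (input, label) pairs and are combined log-linearly."""
    def score(label):
        return sum(w * f(x, label) for f, w in zip(features, weights))
    z = sum(math.exp(score(l)) for l in labels)   # partition function Z(x)
    return math.exp(score(y)) / z

# Example feature: fires when a capitalized word is tagged NOUN.
features = [lambda x, y: 1.0 if x[:1].isupper() and y == "NOUN" else 0.0]
print(maxent_prob("Paris", "NOUN", ["NOUN", "VERB"], features, [1.5]))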
Deep Learning for Computer Vision: A Comparison Between Convolutional Neural Networks and Hierarchical Temporal Memory (Vincenzo Lomonaco)
In recent years, Deep Learning techniques have been shown to perform well on a large variety of problems in both Computer Vision and Natural Language Processing, reaching and often surpassing the state of the art on many tasks. The rise of deep learning is also revolutionizing the entire field of Machine Learning and Pattern Recognition, pushing forward the concepts of automatic feature extraction and unsupervised learning in general.
However, despite its strong success in both science and business, deep learning has its own limitations. It is often questioned whether such techniques are merely brute-force statistical approaches and whether they can only work in the context of High Performance Computing with vast amounts of data. Another important question is whether they are really biologically inspired, as claimed in certain cases, and whether they can scale well in terms of "intelligence".
The dissertation focuses on trying to answer these key questions in the context of Computer Vision and, in particular, Object Recognition, a task that has been heavily revolutionized by recent advances in the field. Practically speaking, these answers are based on an exhaustive comparison between two very different deep learning techniques on the aforementioned task: Convolutional Neural Networks (CNN) and Hierarchical Temporal Memory (HTM). They stand for two different approaches and points of view within the broad umbrella of deep learning and are the best choices for understanding and pointing out the strengths and weaknesses of each.
CNN is considered one of the most classic and powerful supervised methods used today in machine learning and pattern recognition, especially in object recognition. CNNs are well received and accepted by the scientific community and are already deployed at large corporations like Google and Facebook to solve face recognition and image auto-tagging problems.
HTM, on the other hand, is an emerging paradigm and a new, mainly unsupervised method that is more biologically inspired. It tries to gain insights from the computational neuroscience community in order to incorporate concepts like time, context and attention during the learning process, which are typical of the human brain.
In the end, the thesis aims to show that in certain cases, with a smaller quantity of data, HTM can outperform CNN.
One-shot learning is an object categorization problem in computer vision. Whereas most machine-learning-based object categorization algorithms require training on hundreds or thousands of images and very large datasets, one-shot learning aims to learn information about object categories from one, or only a few, training images.
Distilling Linguistic Context for Language Model Compression (GeonDoPark1)
A computationally expensive and memory-intensive neural network lies behind the recent success of language representation learning. Knowledge distillation, a major technique for deploying such vast language models in resource-scarce environments, transfers the knowledge learned on individual word representations without restrictions. In this paper, inspired by recent observations that language representations are relatively positioned and carry more semantic knowledge as a whole, we present a new knowledge distillation objective for language representation learning that transfers contextual knowledge via two types of relationships across representations: Word Relation and Layer Transforming Relation. Unlike other recent distillation techniques for language models, our contextual distillation does not place any restrictions on architectural differences between teacher and student. We validate the effectiveness of our method on challenging benchmarks of language understanding tasks, not only with architectures of various sizes, but also in combination with DynaBERT, the recently proposed adaptive size pruning method.
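As one way to read the 'Word Relation' component (the paper's exact relation function and loss may differ), pairwise token similarities can be matched instead of the token vectors themselves, which also removes any constraint that teacher and student share a hidden size:

import torch.nn.functional as F

def word_relation_loss(student_hidden, teacher_hidden):
    """Sketch: match (batch, seq_len, seq_len) cosine-similarity matrices
    over token representations of shape (batch, seq_len, dim). In practice
    the teacher relations should be detached from the graph."""
    def relation(h):
        h = F.normalize(h, dim=-1)
        return h @ h.transpose(1, 2)
    return F.mse_loss(relation(student_hidden), relation(teacher_hidden))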
Sentiment analysis typically refers to using natural language processing, text analysis, and computational linguistics to extract affect- and emotion-based information from text data. Our work explores how we can use deep neural networks in transfer learning and joint dual-input learning settings to effectively classify sentiment and detect hate speech in Hindi and Bengali data.
Reference Scope Identification of Citances Using Convolutional Neural Networks (Saurav Jha)
In the task of summarizing a scientific paper, a lot of information about a reference paper stands to be gained from the papers that cite it. Automatically identifying the reference scope (the span of cited text) in a reference paper corresponding to citances (sentences in the citing papers that cite it) has great significance in preparing a structured summary of the reference paper. We treat this task as a binary classification problem by extracting feature vectors from pairs of citances and reference sentences. These features are lexical, corpus-based, surface and knowledge-based. We extend the feature set employed for reference-citance pair identification in the current state-of-the-art system. Using these features, we present a novel classification approach for this task that employs a deep Convolutional Neural Network along with two boosting ensemble algorithms. We outperform the existing state-of-the-art in distinguishing between cited and non-cited spans of text in the reference paper.
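To make the setup concrete, here are a few illustrative surface/lexical features for a citance-reference pair (the paper's full feature set, including corpus-based and knowledge-based features, is much richer):

def pair_features(citance, ref_sentence):
    """Toy lexical/surface features for one citance-reference sentence pair."""
    c = set(citance.lower().split())
    r = set(ref_sentence.lower().split())
    overlap = len(c & r)
    return [
        overlap,                          # raw word overlap
        overlap / max(len(c | r), 1),     # Jaccard similarity
        abs(len(c) - len(r)),             # length difference
    ]

Vectors like these, stacked per pair, are what a downstream classifier (a CNN plus boosting ensembles in the paper) consumes.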
LAK21: Data-Driven Redesign of Tutoring Systems (Yun Huang)
These are the slides for our paper at the LAK '21 conference:
Yun Huang, Nikki G. Lobczowski, J. Elizabeth Richey, Elizabeth A. McLaughlin, Michael W. Asher, Judith M. Harackiewicz, Vincent Aleven, and Kenneth R. Koedinger. 2021. A General Multi-method Approach to Data-Driven Redesign of Tutoring Systems. In LAK21: 11th International Learning Analytics and Knowledge Conference (LAK21), April 12-16, 2021, Irvine, CA, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3448139.3448155
Abstract: Analytics of student learning data are increasingly important for continuous redesign and improvement of tutoring systems and courses. There is still a lack of general guidance on converting analytics into better system design, and on combining multiple methods to maximally improve a tutor. We present a multi-method approach to data-driven redesign of tutoring systems and its empirical evaluation. Our approach systematically combines existing and new learning analytics and instructional design methods. In particular, our methods involve identifying difficult skills and creating focused tasks for learning these difficult skills effectively following content redesign strategies derived from analytics. In our past work, we applied this approach to redesigning an algebraic modeling unit and found initial evidence of its effectiveness. In the current work, we extended this approach and applied it to redesigning two other tutor units in addition to a second iteration of redesigning the previously redesigned unit. We conducted a one-month classroom experiment with 129 high school students. Compared to the original tutor, the redesigned tutor led to significantly higher learning outcomes, with time mainly allocated to focused tasks rather than original full tasks. Moreover, it reduced over- and under-practice, yielded a more effective practice experience, and selected skills progressing from easier to harder to a greater degree. Our work provides empirical evidence of the effectiveness and generality of a multi-method approach to data-driven instructional redesign.
Model-Based User Interface Optimization, Part IV: ADVANCED TOPICS - At SICSA Summer School (Aalto University)
Tutorial on Model-Based User Interface Optimization. Part IV: ADVANCED TOPICS.
Presented by Antti Oulasvirta (Aalto University) at SICSA Summer School on Computational Interaction in 2015 in Glasgow. Note: This one-day lecture is divided into multiple parts.
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstractive Summarization (Deren Lei)
Deep reinforcement learning (RL) has been a commonly used strategy for the abstractive summarization task, addressing both the exposure bias and the non-differentiable task issues. However, the conventional ROUGE-L reward simply looks for exact n-gram matches between candidates and annotated references, which inevitably makes the generated sentences repetitive and incoherent. In this paper, we explore the practicability of utilizing distributional semantics to measure the degree of matching. Our proposed distributional semantics reward has a distinct advantage in capturing the lexical and compositional diversity of natural language.
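The core idea can be sketched as swapping the n-gram-overlap reward for an embedding similarity; how the embeddings are produced, and the exact similarity the paper uses, are abstracted away here:

import torch.nn.functional as F

def distributional_semantic_reward(candidate_emb, reference_emb):
    """Sketch: reward a sampled summary by the semantic similarity of its
    embedding to the reference embedding, instead of exact n-gram matches."""
    return F.cosine_similarity(candidate_emb, reference_emb, dim=-1)

# In policy-gradient training this reward replaces ROUGE-L, e.g.:
# loss = -(reward - baseline) * log_prob_of_sampled_summary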
Deep Learning for Information Retrieval: Models, Progress, & Opportunities (Matthew Lease)
Talk given at the 8th Forum for Information Retrieval Evaluation (FIRE, http://fire.irsi.res.in/fire/2016/), December 10, 2016, and at the Qatar Computing Research Institute (QCRI), December 15, 2016.
Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One (Seunghyun Hwang)
Review: Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
- by Seunghyun Hwang (Yonsei University, Severance Hospital, Center for Clinical Data Science)
Review: Structure Boundary Preserving Segmentation for Medical Image with Ambiguous Boundary (Dongmin Choi)
Paper title: Structure Boundary Preserving Segmentation for Medical Image with Ambiguous Boundary (CVPR 2020)
Paper link : https://openaccess.thecvf.com/content_CVPR_2020/papers/Lee_Structure_Boundary_Preserving_Segmentation_for_Medical_Image_With_Ambiguous_Boundary_CVPR_2020_paper.pdf
Connector Corner: Automate Dynamic Content and Events by Pushing a Button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there's more:
In a second workflow supporting the same use case, you'll see:
Your campaign sent to target colleagues for approval
If the "Approve" button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But if the "Reject" button is pushed, colleagues will be alerted via a Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Search and Society: Reimagining Information Access for Radical Futures (Bhaskar Mitra)
The field of Information Retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs, while dismantling the artificial separation between work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Epistemic Interaction - Tuning Interfaces to Provide Information for AI Support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don't know what they don't know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients' needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
UiPath Test Automation Using UiPath Test Suite Series, Part 4 (DianaGray10)
Welcome to the UiPath Test Automation using UiPath Test Suite series, part 4. In this session, we will cover a Test Manager overview along with the SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... (Ramesh Iyer)
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes real work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
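As a taste of that binding, a minimal sketch (assuming a standard pypowsybl installation; the workshop notebook may differ) loads a bundled test network and runs an AC power flow:

import pypowsybl as pp

network = pp.network.create_ieee14()     # bundled IEEE 14-bus test network
results = pp.loadflow.run_ac(network)    # run an AC power flow
print(results[0].status)                 # convergence status of the main component
print(network.get_buses()[["v_mag", "v_angle"]].head())  # resulting bus voltages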
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Key Trends Shaping the Future of Infrastructure (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
This talk covers the key trends across hardware, cloud and open source, exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
PHP Frameworks: I Want to Break Free (IPC Berlin 2024) (Ralf Eggert)
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk encourages a more independent approach to using PHP frameworks, moving towards more flexible and future-proof PHP development.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell us all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details of how to best design a sturdy architecture within ODC.
Review: Adaptive Consistency Regularization for Semi-Supervised Transfer Learning
1. Adaptive Consistency Regularization for Semi-Supervised Transfer Learning
Abuduweili et al. (CVPR 2021)
Dongmin Choi, Yonsei University Translational Artificial Intelligence Lab
2. Introduction
Semi-Supervised Learning (SSL)
• Effectively leverages both labeled and unlabeled data
• Three main approaches:
1) consistency-based regularization
2) entropy minimization
3) pseudo-labeling
3. Introduction
Transfer Learning
• The power of pre-trained models:
1) excellent transferability
2) generalization capacity
• Zhou et al.:
1) the benefits of SSL are smaller when training starts from a pre-trained model
2) combining SSL and transfer learning can bridge the domain gap
[Zhou et al., When Semi-Supervised Learning Meets Transfer Learning: Training Strategies, Models and Datasets, arXiv 2018]
4. Introduction
A Semi-Supervised Transfer Learning Framework
• Extends consistency regularization in SSL to the inductive transfer learning setting
• Two essential components:
1) Adaptive Knowledge Consistency (AKC): transfers knowledge from the pre-trained model
2) Adaptive Representation Consistency (ARC): utilizes unlabeled examples to adjust the representation
5. Related Work
Domain Adaptation
• Tackles the sample selection bias between training and test data
• Generates domain-invariant representations over the training set
• (Slide note: content to be added)
6. Related Work
Semi-Supervised Learning
• Consistency-based regularization
- Hypothesis: the decision boundary should not pass through high-density areas
→ two close inputs are expected to have the same label
[van Engelen et al., A survey on semi-supervised learning, Machine Learning 2020]
7. Related Work
Semi-Supervised Learning
• Π-model
[Laine & Aila, Temporal Ensembling for Semi-Supervised Learning, ICLR 2017]
- Targets can be noisy
- Temporal ensembling instead aggregates prior network evaluations
8. Related Work
Semi-Supervised Learning
• Mean Teacher
[Tarvainen & Valpola, Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results, NIPS 2017]
- Averages model weights instead of label predictions
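The weight-averaging step amounts to an exponential moving average of the student's weights after each training step; a minimal sketch (alpha is the smoothing coefficient):

import torch

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    """Mean Teacher update: teacher weights track an exponential moving
    average of the student weights."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1 - alpha)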
9. Related Work
Semi-Supervised Learning
• FixMatch
[Sohn et al., FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence, NeurIPS 2020]
- Consistency regularization + pseudo-labeling
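A minimal sketch of FixMatch's unlabeled-data objective (the weak/strong augmentation pairing and the confidence threshold follow the paper's recipe; the helper itself is ours):

import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, weak_batch, strong_batch, threshold=0.95):
    """Pseudo-label confident predictions on weakly augmented inputs, then
    enforce consistency on strongly augmented views of the same inputs."""
    with torch.no_grad():
        probs = torch.softmax(model(weak_batch), dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = (conf >= threshold).float()   # keep only confident pseudo-labels
    loss = F.cross_entropy(model(strong_batch), pseudo, reduction="none")
    return (mask * loss).mean()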
13. The Proposed Framework
1) Adaptive Knowledge Consistency (AKC)

$L_{AKC} = \frac{1}{B_l + B_u} \sum_{x_i \in L \cup U} \lambda_i^s \,\mathrm{KL}\big(F_{\theta_0}(x_i),\, F_\theta(x_i)\big)$

Sample importance: $\lambda_i^s = I\big(H(p_i^s) \le \tau_1\big)$
- An entropy function: $H(p_i^s) = -\sum_{c=1}^{C_s} p_{i,c}^s \log p_{i,c}^s$
- $I$: a hard entropy-gate function (turns the calculated entropy into a binary sample importance)
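A minimal sketch of the AKC term above (as a simplification we assume the target model keeps a head over the source classes, so both models emit logits over the same label set; the threshold value is illustrative):

import torch
import torch.nn.functional as F

def akc_loss(source_logits, target_logits, tau1=0.5):
    """Entropy-gated KL consistency between the frozen pre-trained (source)
    model and the target model: only samples the source model is confident
    about (low entropy) contribute."""
    p_src = torch.softmax(source_logits, dim=1)
    entropy = -(p_src * torch.log(p_src.clamp_min(1e-8))).sum(dim=1)
    gate = (entropy <= tau1).float()                      # lambda_i^s
    kl = F.kl_div(F.log_softmax(target_logits, dim=1), p_src,
                  reduction="none").sum(dim=1)            # per-sample KL
    return (gate * kl).mean()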
14. The Proposed Framework
2) Adaptive Representation Consistency (ARC)
- Uses the Maximum Mean Discrepancy (MMD) to measure the distance (let's skip the details!)
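For reference, the (biased) RBF-kernel MMD estimate between two batches of feature vectors, the distance ARC uses to align labeled and unlabeled representations, can be sketched as follows (kernel choice and bandwidth are illustrative):

import torch

def mmd_rbf(x, y, sigma=1.0):
    """Biased empirical estimate of squared MMD with an RBF kernel."""
    def kernel(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()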
22. Conclusion
Two regularization methods: AKC and ARC
• Competitive among state-of-the-art SSL methods
• Best performance among several baseline methods on various transfer learning benchmarks
• Can be used in more general transfer learning and (semi-)supervised learning frameworks