SlideShare a Scribd company logo
1 of 60
Download to read offline
Jay Heo1, Junhyeon Park1, Hyewon Jeong1, Kwang joon Kim2,
Juho Lee3, Eunho Yang1 3, Sung Ju Hwang1 3
Cost-Effective Interactive Attention Learning
with Neural Attention Processes
KAIST1, Yonsei University College of Medicine2, AITRICS3
Model Interpretability
Main Network InferenceInput Data
Training
The complex nature of deep neural networks has led to the recent surge of interests
in interpretable models which provide model interpretations.
Model Interpretability
Main Network InferenceInput Data Model Interpretation
Interpretation tool
Inference
The complex nature of deep neural networks has led to the recent surge of interests
in interpretable models which provide model interpretations.
Model Interpretability
The complex nature of deep neural networks has led to the recent surge of interests
in interpretable models which provide model interpretations.
Main Network InferenceInput Data Model Interpretation
Interpretation tool
Inference
Provide explanations for model’s decision.
Challenge: Incorrect & Unreliable Interpretation
Not all machine-generated interpretations are correct or human-understandable.
• Correctness and reliability of a learning model heavily depends on quality and
quantity of training data.
• Neural networks tend to learn non-robust features that help with predictions, but
are not human-perceptible.
Model Interpretation
Quality of training data
Quantity of training data
Challenge: Incorrect & Unreliable Interpretation
Not all machine-generated interpretations are correct or human-understandable.
• Correctness and reliability of a learning model heavily depends on quality and
quantity of training data.
• Neural networks tend to learn non-robust features that help with predictions, but
are not human-perceptible.
Model Interpretation
Quality of training data
Quantity of training data
Whether a model learn
too many non-robust
features during training?
Challenge: Incorrect & Unreliable Interpretation
Not all machine-generated interpretations are correct or human-understandable.
• Correctness and reliability of a learning model heavily depends on quality and
quantity of training data.
• Neural networks tend to learn non-robust features that help with predictions, but
are not human-perceptible.
Model Interpretation
1. Is it correct?
2. Is it understandable
enough to trust?
Quality of training data
Quantity of training data
Whether a model learn
too many non-robust
features during training?
Interactive Learning Framework
Propose an interactive learning framework which iteratively update the model by
interacting with the human supervisors who adjust the provided interpretations.
• Actively use human supervisors as a channel for human-model communications.
Attentional Networks Human Annotator
Learning
Model
Physician
Interpretation
(Attentions)
Model’s decision
0.80.60.3
Low
uncertainty
High
uncertainty
High
uncertainty
: Attention
Interactive Learning Framework
Propose an interactive learning framework which iteratively update the model by
interacting with the human supervisors who adjust the provided interpretations.
• Actively use human supervisors as a channel for human-model communications.
Attentional Networks Human Annotator
Learning
Model
Physician
Interpretation
(Attentions)
Model’s decision
Annotation
AnnotateRetrain
0.80.60.3
Low
uncertainty
High
uncertainty
High
uncertainty
: Attention
Challenge: Model Retraining Cost
To reflect human feedback, the model needs to be retrained, which is costly.
• Retraining the model with scarce human feedback may result in the model overfitting
Physician
Learning Model
Annotated Examples
0.80.60.3
Low
uncertainty
High
uncertainty
High
uncertainty
: Attention
Interpretation
Challenge: Model Retraining Cost
To reflect human feedback, the model needs to be retrained, which is costly.
• Retraining the model with scarce human feedback may result in the model overfitting
Physician
Learning Model
Annotated Examples
0.80.60.3
Low
uncertainty
High
uncertainty
High
uncertainty
: Attention
Interpretation
Scarce feedback
Challenge: Model Retraining Cost
To reflect human feedback, the model needs to be retrained, which is costly.
• Retraining the model with scarce human feedback may result in the model overfitting
Physician
Learning Model
Annotated Examples
0.80.60.3
Low
uncertainty
High
uncertainty
High
uncertainty
: Attention
Interpretation
Scarce feedback
Retraining
Challenge: Model Retraining Cost
To reflect human feedback, the model needs to be retrained, which is costly.
Physician
Learning Model
Annotated Examples
0.80.60.3
Low
uncertainty
High
uncertainty
High
uncertainty
: Attention
Interpretation
Scarce feedback
Retraining
!
Overfitting
• Retraining the model with scarce human feedback may result in the model overfitting
Challenge: Expensive Human Supervision Cost
Obtaining human feedback on datasets with large numbers of training instances and
features is extremely costly.
• Obtaining feedback on already correct or previously corrected interpretations is
wasteful.
Annotator
Annotate
𝑴 𝒂𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏
(𝒕)
∈{0, 1}
Big data
Challenge: Expensive Human Supervision Cost
Obtaining human feedback on datasets with large numbers of training instances and
features is extremely costly.
• Obtaining feedback on already correct or previously corrected interpretations is
wasteful.
Annotator
Annotate
Big data
Annotate
Challenge: Expensive Human Supervision Cost
Obtaining human feedback on datasets with large numbers of training instances and
features is extremely costly.
• Obtaining feedback on already correct or previously corrected interpretations is
wasteful.
Annotator
Annotate
Big data
Costly
Annotate
!
Interactive Attention Learning Framework
Domain experts interactively evaluate learned attentions and provide feedbacks to
obtain models that generate human-intuitive interpretations.
Learning
Model
Physician
Attention
Mechanism
LDL
Respiration
Cholesterol
Creatine
BMI
Diabetes
Heart
Failure
Hypertension
Deliver
attention
Annotate attention
Attention0.80.60.3
: Attention
𝑴 𝒂𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏
(𝒕)
∈ {0, 1}
Influence
Function
MC Dropout
Annotation Mask
Deep
Interpretation
Tool
Correlation & Causal relationship Analysis
Manipulate
Explain
Neural
Attention
Processes
Counterfactual
Estimation
Counterfactual
Estimation
a
c
b
Granger Causality
Interactive Attention Learning Framework
Domain experts interactively evaluate learned attentions and provide feedbacks to
obtain models that generate human-intuitive interpretations.
Learning
Model
Physician
Attention
Mechanism
1. Neural Attention
Processes
LDL
Respiration
Cholesterol
Creatine
BMI
Diabetes
Heart
Failure
Hypertension
Deliver
attention
Annotate attention
Attention0.80.60.3
: Attention
𝑴 𝒂𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏
(𝒕)
∈ {0, 1}
Influence
Function
MC Dropout
Annotation Mask
Deep
Interpretation
Tool
Correlation & Causal relationship Analysis
Manipulate
Explain
Neural
Attention
Processes
Counterfactual
Estimation
Counterfactual
Estimation
a
c
b
Granger Causality
Interactive Attention Learning Framework
Domain experts interactively evaluate learned attentions and provide feedbacks to
obtain models that generate human-intuitive interpretations.
Learning
Model
Physician
Attention
Mechanism
LDL
Respiration
Cholesterol
Creatine
BMI
Diabetes
Heart
Failure
Hypertension
Deliver
attention
Annotate attention
Attention0.80.60.3
: Attention
𝑴 𝒂𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏
(𝒕)
∈ {0, 1}
Influence
Function
MC Dropout
Annotation Mask
Deep
Interpretation
Tool
Correlation & Causal relationship Analysis
Manipulate
Explain
Neural
Attention
Processes
Counterfactual
Estimation
1. Neural Attention
Processes
2. Cost-effective Reranking
Counterfactual
Estimation
a
c
b
Granger Causality
Neural Attention Processes (NAP)
NAP naturally reflects the information from the annotation summarization z via amor
tized inference.
Domain
Expert
Annotation
Neural Attention Processes
• NAP learns to summarize delivered annotations to a latent vector, and gives th
e summarization as an additional input to the attention generating network.
Neural Attention Processes (NAP)
NAP naturally reflects the information from the annotation summarization z via amor
tized inference.
Domain
Expert
Annotation
Neural Attention Processes
• NAP learns to summarize delivered annotations to a latent vector, and gives th
e summarization as an additional input to the attention generating network.
Neural Attention Processes (NAP)
!
Attention
Context Points
"!
!! "! #!
!!
New Observations
!!"# "!"# !!"#
NAP minimizes retraining cost by incorporating new labeled instances without retrai
ning and overfitting.
First Round (s=1)
• NAP doesn’t require retraining for further new observations, in that NAP
automatically adapt to them at the cost of a forward pass through a network 𝑔!.
Neural Attention Processes (NAP)
!
Attention
Context Points
"!
!! "! #!
!!
New Observations
!!"# "!"# !!"#
NAP minimizes retraining cost by incorporating new labeled instances without retrai
ning and overfitting.
First Round (s=1)
• NAP doesn’t require retraining for further new observations, in that NAP
automatically adapt to them at the cost of a forward pass through a network 𝑔!.
Neural Attention Processes (NAP)
!
Attention
Context Points
"!
!! "! #!
!!
New Observations
!!"# "!"# !!"#
NAP minimizes retraining cost by incorporating new labeled instances without retrai
ning and overfitting.
First Round (s=1)
• NAP doesn’t require retraining for further new observations, in that NAP
automatically adapt to them at the cost of a forward pass through a network 𝑔!.
Neural Attention Processes (NAP)
!
Attention
Context Points
"!
!! "! #!
!!
New Observations
!!"# "!"# !!"#
NAP minimizes retraining cost by incorporating new labeled instances without retrai
ning and overfitting.
First Round (s=1)
• NAP doesn’t require retraining for further new observations, in that NAP
automatically adapt to them at the cost of a forward pass through a network 𝑔!.
Neural Attention Processes (NAP)
NAP minimizes retraining cost by incorporating new labeled instances without retrain
ing and overfitting.
• NAP is trained in a meta-learning fashion for few-shot function estimation, where it
is trained to predict the attention mask of other labeled samples, given a randomly
selected labeled set as context.
!
Attention
"! !!
New Observations
!!"# "!"# !!"#
Context Points
!!
"! #!!!"$ "!"$ !!"$
Further Rounds (s=2,3,4)
Neural Attention Processes (NAP)
NAP minimizes retraining cost by incorporating new labeled instances without retrain
ing and overfitting.
• NAP is trained in a meta-learning fashion for few-shot function estimation, where it
is trained to predict the attention mask of other labeled samples, given a randomly
selected labeled set as context.
!
Attention
"! !!
New Observations
!!"# "!"# !!"#
Context Points
!!
"! #!!!"$ "!"$ !!"$
Further Rounds (s=2,3,4)
Neural Attention Processes (NAP)
NAP minimizes retraining cost by incorporating new labeled instances without retrain
ing and overfitting.
• NAP is trained in a meta-learning fashion for few-shot function estimation, where it
is trained to predict the attention mask of other labeled samples, given a randomly
selected labeled set as context.
!
Attention
"! !!
New Observations
!!"# "!"# !!"#
Context Points
!!
"! #!!!"$ "!"$ !!"$
Further Rounds (s=2,3,4)
Cost-Effective Instance & Features Reranking (CER)
Address the expensive human labeling cost by reranking the instances, features, and
timesteps (for time-series data) by their negative impacts.
Attentional
Network
Domain
Expert
Instance-wise and Feature-wise Reranking
Cost-Effective Instance & Features Reranking (CER)
Address the expensive human labeling cost by reranking the instances, features, and
timesteps (for time-series data) by their negative impacts.
Attentional
Network
Domain
Expert
Instance-wise and Feature-wise Reranking
𝑷
𝑲
…
…
…TrainTrainValid Valid
Estimate
𝑰(𝒖!) / 𝐕𝐚𝐫(𝒖!)
Instance-level Reranking
Select Re-rank &
select
Cost-Effective Instance & Features Reranking (CER)
Address the expensive human labeling cost by reranking the instances, features, and
timesteps (for time-series data) by their negative impacts.
Attentional
Network
Domain
Expert
Instance-wise and Feature-wise Reranking
𝑷
𝑲
…
…
…TrainTrainValid Valid
Estimate
𝑰(𝒖!) / 𝐕𝐚𝐫(𝒖!)
Instance-level Reranking
Train
Feature-level Reranking
Select Re-rank &
select
Estimate
𝑰(𝒖",$
(&)
) / 𝐕𝐚𝒓 (𝒖",$
(&)
) / 𝜓(𝒖",$
(&)
)
Feature
…
…
Feature
𝑭
Re-rank &
select
CER: 1. Influence Score
Use the influence function (Koh & Liang, 2017) to approximate the impact of
individual training points on the model prediction.
Fish
Dog
Dog
“Dog”
Training
Training data Test Input
[Koh and Liang 17] Understanding Black-box Predictions via Influence Functions, ICML 2017
CER: 2. Uncertainty Score
Measure the negative impacts using the uncertainty which can be measured by Mon
te-Carlo sampling (Gal & Ghahramani, 2016).
Uncertainty-aware Attention Mechanism
• Less expensive approach to measure the negative impacts.
• Assume that instances having high-predictive uncertainties are potential
candidate to be corrected.
Uncertainty
[Jay Heo*, Haebeom Lee*, Saehun Kim,, Juho Lee, Gwangjun Kim , Eunho Yang, Sung Joo Hwang] Uncertainty-aware Attention Mechanism for Reliable prediction and Interpretation, Neurips 2018.
CER: 2. Uncertainty Score
Measure the negative impacts using the uncertainty which can be measured by Mon
te-Carlo sampling (Gal & Ghahramani, 2016).
Measure instance-wise &
feature-wise uncertainty
0.80.60.3
Low
uncertainty
High
uncertainty
High
uncertainty
µ σ, )(N
SpO2
Pulse
Respiration
• Less expensive approach to measure the negative impacts.
• Assume that instances having high-predictive uncertainties are potential
candidate to be corrected.
Uncertainty-aware Attention Mechanism
[Jay Heo*, Haebeom Lee*, Saehun Kim,, Juho Lee, Gwangjun Kim , Eunho Yang, Sung Joo Hwang] Uncertainty-aware Attention Mechanism for Reliable prediction and Interpretation, Neurips 2018.
CER: 3. Counterfactual Score
How would the prediction change if we ignore a certain feature by manually turning
on/off the corresponding attention value?
• Not need for retraining since we can simply set its attention value to zero.
• Used to rerank the features with regards to their importance.
Counterfactual Estimation Interface
CER: 3. Counterfactual Score
How would the prediction change if we ignore a certain feature by manually turning
on/off the corresponding attention value?
• Not need for retraining since we can simply set its attention value to zero.
• Used to rerank the features with regards to their importance.
Counterfactual Estimation Interface
CER: 3. Counterfactual Score
How would the prediction change if we ignore a certain feature by manually turning
on/off the corresponding attention value?
• Not need for retraining since we can simply set its attention value to zero.
• Used to rerank the features with regards to their importance.
Counterfactual Estimation Interface
CER: 3. Counterfactual Score
How would the prediction change if we ignore a certain feature by manually turning
on/off the corresponding attention value?
• Not need for retraining since we can simply set its attention value to zero.
• Used to rerank the features with regards to their importance.
Counterfactual Estimation Interface
Experimental setting – Datasets
Use electronic health records, real estate sales transaction records, and exercise squat
posture correction records for classification and regression tasks.
1. EHR Datasets
1) Cerebral Infarction
2) Cardio Vascular Disease
3) Heart Failure
Binary Classification task
Experimental setting – Datasets
Use electronic health records, real estate sales transaction records, and exercise squat
posture correction records for classification and regression tasks.
1. EHR Datasets
1) Cerebral Infarction
2) Cardio Vascular Disease
3) Heart Failure
2. Real-estate dataset
1) Housing price forecasting
Binary Classification task Regression task
Experimental setting – Datasets
Use electronic health records, real estate sales transaction records, and exercise squat
posture correction records for classification and regression tasks.
1. EHR Datasets
1) Cerebral Infarction
2) Cardio Vascular Disease
3) Heart Failure
2. Real-estate dataset 3. Squat Posture set
1) Housing price forecasting
1) Squat posture correction
Binary Classification task Regression task Multi-label
Classification task
Attention Evaluation Interface – Risk Prediction
Domain experts evaluate the delivered attention 𝛼 𝑎𝑛𝑑 𝜷 via attention annotation
mask 𝑴 𝒕
𝜶
∈ 0, 1 $×& and 𝑴 𝒕
𝜷
∈ 0, 1 $×(×& .
Attention Annotation Platform
(Risk Prediction Task with Counterfactual Estimation)
Real estate in NYC
Domain experts evaluate the delivered attention 𝛼 𝑎𝑛𝑑 𝜷 via attention annotation
mask 𝑴 𝒕
𝜶
∈ 0, 1 $×& and 𝑴 𝒕
𝜷
∈ 0, 1 $×(×& .
Attention Annotation Platform
(Real estate Price Forecasting in New York City)
Action Posture Correction Task
Domain experts evaluate the delivered attention 𝛼 𝑎𝑛𝑑 𝜷 via attention annotation
mask 𝑴 𝒕
𝜶
∈ 0, 1 $×& and 𝑴 𝒕
𝜷
∈ 0, 1 $×(×& .
Attention Annotation Platform
(Action Posture Correction Task)
Experiment Results
Conducted experiments on three risk prediction tasks, one fitness squat, and one real
estate forecasting task.
EHR Fitness
Squat
Real Estate
ForecastingHeart Failure Cerebral Infarction CVD
One-time
Training
RETAIN 0.6069 ± 0.01 0.6394 ± 0.02 0.6018 ± 0.02 0.8425 ± 0.03 0.2136 ± 0.01
Random-RETAIN 0.5952 ± 0.02 0.6256 ± 0.02 0.5885 ± 0.01 0.8221 ± 0.05 0.2140 ± 0.01
IF-RETAIN 0.6134 ± 0.03 0.6422 ± 0.02 0.5882 ± 0.02 0.8363 ± 0.03 0.2049 ± 0.01
Random
Re-ranking
Random-UA 0.6231 ± 0.03 0.6491 ± 0.01 0.6112 ± 0.02 0.8521 ± 0.02 0.2222 ± 0.02
Random-NAP 0.6414 ± 0.01 0.6674 ± 0.02 0.6284 ± 0.01 0.8525 ± 0.01 0.2061 ± 0.01
IAL
(Cost-effective)
AILA 0.6363 ± 0.03 0.6602 ± 0.03 0.6193 ± 0.02 0.8425 ± 0.01 0.2119 ± 0.01
IAL-NAP 0.6612 ± 0.02 0.6892 ± 0.03 0.6371 ± 0.02 0.8689 ± 0.01 0.1835 ± 0.01
Experiment Results
Conducted experiments on three risk prediction tasks, one fitness squat, and one real
estate forecasting task.
EHR Fitness
Squat
Real Estate
ForecastingHeart Failure Cerebral Infarction CVD
One-time
Training
RETAIN 0.6069 ± 0.01 0.6394 ± 0.02 0.6018 ± 0.02 0.8425 ± 0.03 0.2136 ± 0.01
Random-RETAIN 0.5952 ± 0.02 0.6256 ± 0.02 0.5885 ± 0.01 0.8221 ± 0.05 0.2140 ± 0.01
IF-RETAIN 0.6134 ± 0.03 0.6422 ± 0.02 0.5882 ± 0.02 0.8363 ± 0.03 0.2049 ± 0.01
Random
Re-ranking
Random-UA 0.6231 ± 0.03 0.6491 ± 0.01 0.6112 ± 0.02 0.8521 ± 0.02 0.2222 ± 0.02
Random-NAP 0.6414 ± 0.01 0.6674 ± 0.02 0.6284 ± 0.01 0.8525 ± 0.01 0.2061 ± 0.01
IAL
(Cost-effective)
AILA 0.6363 ± 0.03 0.6602 ± 0.03 0.6193 ± 0.02 0.8425 ± 0.01 0.2119 ± 0.01
IAL-NAP 0.6612 ± 0.02 0.6892 ± 0.03 0.6371 ± 0.02 0.8689 ± 0.01 0.1835 ± 0.01
• Random-UA, which is retrained with human attention-level supervision on randomly selected.
samples, performs worse than Random-NAP.
Experiment Results
Conducted experiments on three risk prediction tasks, one fitness squat, and one real
estate forecasting task.
EHR Fitness
Squat
Real Estate
ForecastingHeart Failure Cerebral Infarction CVD
One-time
Training
RETAIN 0.6069 ± 0.01 0.6394 ± 0.02 0.6018 ± 0.02 0.8425 ± 0.03 0.2136 ± 0.01
Random-RETAIN 0.5952 ± 0.02 0.6256 ± 0.02 0.5885 ± 0.01 0.8221 ± 0.05 0.2140 ± 0.01
IF-RETAIN 0.6134 ± 0.03 0.6422 ± 0.02 0.5882 ± 0.02 0.8363 ± 0.03 0.2049 ± 0.01
Random
Re-ranking
Random-UA 0.6231 ± 0.03 0.6491 ± 0.01 0.6112 ± 0.02 0.8521 ± 0.02 0.2222 ± 0.02
Random-NAP 0.6414 ± 0.01 0.6674 ± 0.02 0.6284 ± 0.01 0.8525 ± 0.01 0.2061 ± 0.01
IAL
(Cost-effective)
AILA 0.6363 ± 0.03 0.6602 ± 0.03 0.6193 ± 0.02 0.8425 ± 0.01 0.2119 ± 0.01
IAL-NAP 0.6612 ± 0.02 0.6892 ± 0.03 0.6371 ± 0.02 0.8689 ± 0.01 0.1835 ± 0.01
• Random-UA, which is retrained with human attention-level supervision on randomly selected.
samples, performs worse than Random-NAP.
• IAL-NAP significantly outperforms Random-NAP, showing that the effect of attention annotation
cannot have much effect on the model when the instances are randomly selected.
Experiment Results
Results of Ablation study with proposed IAL-NAP combinations for instance- and feat
ure-level reranking on all tasks.
IAL-NAP Variants EHR Fitness
Squat
Real Estate
ForecastingInstance-level Feature-level Heart Failure Cerebral Infarction CVD
Influence Function Uncertainty 0.6563 ± 0.01 0.6821 ± 0.02 0.6308 ± 0.02 0.8712 ± 0.01 0.1921 ± 0.01
Influence Function Influence Function 0.6514 ± 0.02 0.6825 ± 0.01 0.6329 ± 0.03 0.8632 ± 0.01 0.1865 ± 0.02
Influence Function Counterfactual 0.6592 ± 0.02 0.6921 ± 0.03 0.6379 ± 0.02 0.8682 ± 0.01 0.1863 ± 0.02
Uncertainty Counterfactual 0.6612 ± 0.01 0.6892 ± 0.03 0.6371 ± 0.02 0.8689 ± 0.02 0.1835 ± 0.02
• For instance-level scoring, influence and uncertainty scores work similarly
Ablation study with IAL-NAP combinations
Experiment Results
Results of Ablation study with proposed IAL-NAP combinations for instance- and feat
ure-level reranking on all tasks.
IAL-NAP Variants EHR Fitness
Squat
Real Estate
ForecastingInstance-level Feature-level Heart Failure Cerebral Infarction CVD
Influence Function Uncertainty 0.6563 ± 0.01 0.6821 ± 0.02 0.6308 ± 0.02 0.8712 ± 0.01 0.1921 ± 0.01
Influence Function Influence Function 0.6514 ± 0.02 0.6825 ± 0.01 0.6329 ± 0.03 0.8632 ± 0.01 0.1865 ± 0.02
Influence Function Counterfactual 0.6592 ± 0.02 0.6921 ± 0.03 0.6379 ± 0.02 0.8682 ± 0.01 0.1863 ± 0.02
Uncertainty Counterfactual 0.6612 ± 0.01 0.6892 ± 0.03 0.6371 ± 0.02 0.8689 ± 0.02 0.1835 ± 0.02
• For instance-level scoring, influence and uncertainty scores work similarly, while the counterfactual
score was the most effective for feature-wise reranking.
Ablation study with IAL-NAP combinations
Experiment Results
Results of Ablation study with proposed IAL-NAP combinations for instance- and feat
ure-level reranking on all tasks.
IAL-NAP Variants EHR Fitness
Squat
Real Estate
ForecastingInstance-level Feature-level Heart Failure Cerebral Infarction CVD
Influence Function Uncertainty 0.6563 ± 0.01 0.6821 ± 0.02 0.6308 ± 0.02 0.8712 ± 0.01 0.1921 ± 0.01
Influence Function Influence Function 0.6514 ± 0.02 0.6825 ± 0.01 0.6329 ± 0.03 0.8632 ± 0.01 0.1865 ± 0.02
Influence Function Counterfactual 0.6592 ± 0.02 0.6921 ± 0.03 0.6379 ± 0.02 0.8682 ± 0.01 0.1863 ± 0.02
Uncertainty Counterfactual 0.6612 ± 0.01 0.6892 ± 0.03 0.6371 ± 0.02 0.8689 ± 0.02 0.1835 ± 0.02
• For instance-level scoring, influence and uncertainty scores work similarly, while the counterfactual
score was the most effective for feature-wise reranking.
• The combination of uncertainty-counterfactual is the most cost-effective solution since it avoids ex
pensive computation of the Hessians.
Ablation study with IAL-NAP combinations
Effect of Neural Attention Processes
Retraining time to retrain examples of human annotation and Mean Response Time
of human annotations on the risk prediction tasks.
(a) Heart Failure (b) Cerebral Infarction (c) CVD
Effect of Neural Attention Processes
Retraining time to retrain examples of human annotation and Mean Response Time
of human annotations on the risk prediction tasks.
(a) Heart Failure (b) Cerebral Infarction (c) CVD
Effect of Neural Attention Processes
Retraining time to retrain examples of human annotation and Mean Response Time
of human annotations on the risk prediction tasks.
(a) Heart Failure (b) Cerebral Infarction (c) CVD
Effect of Cost-Effective Reranking
Change of accuracy with 100 annotations across four rounds (S) between IAL-NAP (bl
ue) vs Random-NAP (red).
(b) Cerebral Infarction(a) Heart Failure (c) CVD (d) Squat
• IAL-NAP uses a smaller number of annotated examples (100 examples) than Rando
m-NAP (400 examples) to improve the model with the comparable accuracy (auc: 0.
6414).
Qualitative Analysis – Risk Prediction
Further analyze the contribution of each feature for a CVD patient (label=1).
A patient records
for Cardio Vascular
Disease
• At s=3, IAL allocated more attention weights on the important feature (Smoking), which the
initial training model missed to attend.
• Age
• Smoking : Whether smoke or
not
• SysBP : Systolic Blood Pressure
• HDL : High-density Lipoprotein
• LDL : Low-density Lipoprotein
(a) Pretrained (b) s=1 (c) s=2
Qualitative Analysis – Risk Prediction
Further analyze the contribution of each feature for a CVD patient (label=1).
A patient records
for Cardio Vascular
Disease
• At s=3, IAL allocated more attention weights on the important feature (Smoking), which the
initial training model missed to attend.
à Clinicians guided the model to learn it since smoking is a key factor to access CVD.
• Age
• Smoking : Whether smoke or
not
• SysBP : Systolic Blood Pressure
• HDL : High-density Lipoprotein
• LDL : Low-density Lipoprotein
(a) Pretrained (b) s=1 (c) s=2
Summary
Propose a novel interactive learning framework which iteratively updates the model
by interacting with the human supervisor via the generated attentions.
• Unlike conventional active learning, IAL allows the human annotators to “actively” interpret,
manipulate the model’s behavior, and see its effect. IAL allows for online learning without retrai
ning the main network, by training a “separate” attention generator.
Summary
Propose a novel interactive learning framework which iteratively updates the model
by interacting with the human supervisor via the generated attentions.
• Unlike conventional active learning, IAL allows the human annotators to “actively” interpret,
manipulate the model’s behavior, and see its effect. IAL allows for online learning without retrai
ning the main network, by training a “separate” attention generator.
• Neural Attention Processes is a novel attention mechanism which can generate attention on unla
beled instances given few labeled samples, and can incorporate new labeled instances without
retraining and overfitting.
Summary
Propose a novel interactive learning framework which iteratively updates the model
by interacting with the human supervisor via the generated attentions.
• Unlike conventional active learning, IAL allows the human annotators to “actively” interpret,
manipulate the model’s behavior, and see its effect. IAL allows for online learning without retrai
ning the main network, by training a “separate” attention generator.
• Neural Attention Processes is a novel attention mechanism which can generate attention on unla
beled instances given few labeled samples, and can incorporate new labeled instances without
retraining and overfitting.
• Our reranking strategy re-ranks the instance and features, which substantially reduces the annot
ation cost and time for high-dimensional inputs.
Thanks

More Related Content

What's hot

Deep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningDeep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningShubhmay Potdar
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural networkFaria Priya
 
Artificial Neural Networks - ANN
Artificial Neural Networks - ANNArtificial Neural Networks - ANN
Artificial Neural Networks - ANNMohamed Talaat
 
Issues while working with Multi Layered Perceptron and Deep Neural Nets
Issues while working with Multi Layered Perceptron and Deep Neural NetsIssues while working with Multi Layered Perceptron and Deep Neural Nets
Issues while working with Multi Layered Perceptron and Deep Neural NetsDEBJYOTI PAUL
 
Human uncertainty makes classification more robust, ICCV 2019 Review
Human uncertainty makes classification more robust, ICCV 2019 ReviewHuman uncertainty makes classification more robust, ICCV 2019 Review
Human uncertainty makes classification more robust, ICCV 2019 ReviewLEE HOSEONG
 
Slides ppt
Slides pptSlides ppt
Slides pptbutest
 
Advance deep learning
Advance deep learningAdvance deep learning
Advance deep learningaliaKhan71
 
Artificial Neural Network Paper Presentation
Artificial Neural Network Paper PresentationArtificial Neural Network Paper Presentation
Artificial Neural Network Paper Presentationguestac67362
 
Genetic algorithm for hyperparameter tuning
Genetic algorithm for hyperparameter tuningGenetic algorithm for hyperparameter tuning
Genetic algorithm for hyperparameter tuningDr. Jyoti Obia
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksBICA Labs
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural networkGauravPandey319
 
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskSaurabh Saxena
 
Applications in Machine Learning
Applications in Machine LearningApplications in Machine Learning
Applications in Machine LearningJoel Graff
 
Artificial Neural Network(Artificial intelligence)
Artificial Neural Network(Artificial intelligence)Artificial Neural Network(Artificial intelligence)
Artificial Neural Network(Artificial intelligence)spartacus131211
 
Regression and Artificial Neural Network in R
Regression and Artificial Neural Network in RRegression and Artificial Neural Network in R
Regression and Artificial Neural Network in RDr. Vaibhav Kumar
 
Forecasting of Sales using Neural network techniques
Forecasting of Sales using Neural network techniquesForecasting of Sales using Neural network techniques
Forecasting of Sales using Neural network techniquesHitesh Dua
 

What's hot (20)

Deep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningDeep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter Tuning
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
Artificial Neural Networks - ANN
Artificial Neural Networks - ANNArtificial Neural Networks - ANN
Artificial Neural Networks - ANN
 
Neural network
Neural networkNeural network
Neural network
 
ANN load forecasting
ANN load forecastingANN load forecasting
ANN load forecasting
 
IROS 2017 Slides
IROS 2017 SlidesIROS 2017 Slides
IROS 2017 Slides
 
Issues while working with Multi Layered Perceptron and Deep Neural Nets
Issues while working with Multi Layered Perceptron and Deep Neural NetsIssues while working with Multi Layered Perceptron and Deep Neural Nets
Issues while working with Multi Layered Perceptron and Deep Neural Nets
 
Human uncertainty makes classification more robust, ICCV 2019 Review
Human uncertainty makes classification more robust, ICCV 2019 ReviewHuman uncertainty makes classification more robust, ICCV 2019 Review
Human uncertainty makes classification more robust, ICCV 2019 Review
 
Slides ppt
Slides pptSlides ppt
Slides ppt
 
Advance deep learning
Advance deep learningAdvance deep learning
Advance deep learning
 
Neural networks
Neural networksNeural networks
Neural networks
 
Artificial Neural Network Paper Presentation
Artificial Neural Network Paper PresentationArtificial Neural Network Paper Presentation
Artificial Neural Network Paper Presentation
 
Genetic algorithm for hyperparameter tuning
Genetic algorithm for hyperparameter tuningGenetic algorithm for hyperparameter tuning
Genetic algorithm for hyperparameter tuning
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural Networks
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
 
Applications in Machine Learning
Applications in Machine LearningApplications in Machine Learning
Applications in Machine Learning
 
Artificial Neural Network(Artificial intelligence)
Artificial Neural Network(Artificial intelligence)Artificial Neural Network(Artificial intelligence)
Artificial Neural Network(Artificial intelligence)
 
Regression and Artificial Neural Network in R
Regression and Artificial Neural Network in RRegression and Artificial Neural Network in R
Regression and Artificial Neural Network in R
 
Forecasting of Sales using Neural network techniques
Forecasting of Sales using Neural network techniquesForecasting of Sales using Neural network techniques
Forecasting of Sales using Neural network techniques
 

Similar to Cost-effective Interactive Attention Learning with Neural Attention Process

Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Julien SIMON
 
DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101Felipe Prado
 
Survey on contrastive self supervised l earning
Survey on contrastive self supervised l earningSurvey on contrastive self supervised l earning
Survey on contrastive self supervised l earningAnirudh Ganguly
 
[PR12] understanding deep learning requires rethinking generalization
[PR12] understanding deep learning requires rethinking generalization[PR12] understanding deep learning requires rethinking generalization
[PR12] understanding deep learning requires rethinking generalizationJaeJun Yoo
 
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...Impetus Technologies
 
Machine Duping 101: Pwning Deep Learning Systems
Machine Duping 101: Pwning Deep Learning SystemsMachine Duping 101: Pwning Deep Learning Systems
Machine Duping 101: Pwning Deep Learning SystemsClarence Chio
 
Learning to learn unlearned feature for segmentation
Learning to learn unlearned feature for segmentationLearning to learn unlearned feature for segmentation
Learning to learn unlearned feature for segmentationNAVER Engineering
 
AI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
AI Class Topic 6: Easy Way to Learn Deep Learning AI TechnologiesAI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
AI Class Topic 6: Easy Way to Learn Deep Learning AI TechnologiesValue Amplify Consulting
 
Model Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep LearningModel Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep LearningPramit Choudhary
 
[DSC Adria 23]Davor Horvatic Human-Centric Explainable AI In Time Series Anal...
[DSC Adria 23]Davor Horvatic Human-Centric Explainable AI In Time Series Anal...[DSC Adria 23]Davor Horvatic Human-Centric Explainable AI In Time Series Anal...
[DSC Adria 23]Davor Horvatic Human-Centric Explainable AI In Time Series Anal...DataScienceConferenc1
 
05012013150050 computerised-paper-evaluation-using-neural-network
05012013150050 computerised-paper-evaluation-using-neural-network05012013150050 computerised-paper-evaluation-using-neural-network
05012013150050 computerised-paper-evaluation-using-neural-networknimmajji
 
Paper id 71201913
Paper id 71201913Paper id 71201913
Paper id 71201913IJRAT
 
Deep learning for real life applications
Deep learning for real life applicationsDeep learning for real life applications
Deep learning for real life applicationsAnas Arram, Ph.D
 
Muhammad Usman Akhtar | Ph.D Scholar | Wuhan University | School of Co...
Muhammad Usman Akhtar  |  Ph.D Scholar  |  Wuhan  University  |  School of Co...Muhammad Usman Akhtar  |  Ph.D Scholar  |  Wuhan  University  |  School of Co...
Muhammad Usman Akhtar | Ph.D Scholar | Wuhan University | School of Co...Wuhan University
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Yuta Niki
 

Similar to Cost-effective Interactive Attention Learning with Neural Attention Process (20)

Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)
 
DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101
 
ANN.ppt[1].pptx
ANN.ppt[1].pptxANN.ppt[1].pptx
ANN.ppt[1].pptx
 
Survey on contrastive self supervised l earning
Survey on contrastive self supervised l earningSurvey on contrastive self supervised l earning
Survey on contrastive self supervised l earning
 
[PR12] understanding deep learning requires rethinking generalization
[PR12] understanding deep learning requires rethinking generalization[PR12] understanding deep learning requires rethinking generalization
[PR12] understanding deep learning requires rethinking generalization
 
AI: Learning in AI
AI: Learning in AI AI: Learning in AI
AI: Learning in AI
 
AI: Learning in AI
AI: Learning in AI AI: Learning in AI
AI: Learning in AI
 
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
 
Machine Duping 101: Pwning Deep Learning Systems
Machine Duping 101: Pwning Deep Learning SystemsMachine Duping 101: Pwning Deep Learning Systems
Machine Duping 101: Pwning Deep Learning Systems
 
Learning to learn unlearned feature for segmentation
Learning to learn unlearned feature for segmentationLearning to learn unlearned feature for segmentation
Learning to learn unlearned feature for segmentation
 
AI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
AI Class Topic 6: Easy Way to Learn Deep Learning AI TechnologiesAI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
AI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
 
Model Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep LearningModel Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep Learning
 
[DSC Adria 23]Davor Horvatic Human-Centric Explainable AI In Time Series Anal...
[DSC Adria 23]Davor Horvatic Human-Centric Explainable AI In Time Series Anal...[DSC Adria 23]Davor Horvatic Human-Centric Explainable AI In Time Series Anal...
[DSC Adria 23]Davor Horvatic Human-Centric Explainable AI In Time Series Anal...
 
05012013150050 computerised-paper-evaluation-using-neural-network
05012013150050 computerised-paper-evaluation-using-neural-network05012013150050 computerised-paper-evaluation-using-neural-network
05012013150050 computerised-paper-evaluation-using-neural-network
 
Paper id 71201913
Paper id 71201913Paper id 71201913
Paper id 71201913
 
Deep learning for real life applications
Deep learning for real life applicationsDeep learning for real life applications
Deep learning for real life applications
 
Muhammad Usman Akhtar | Ph.D Scholar | Wuhan University | School of Co...
Muhammad Usman Akhtar  |  Ph.D Scholar  |  Wuhan  University  |  School of Co...Muhammad Usman Akhtar  |  Ph.D Scholar  |  Wuhan  University  |  School of Co...
Muhammad Usman Akhtar | Ph.D Scholar | Wuhan University | School of Co...
 
Deep learning
Deep learningDeep learning
Deep learning
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)
 
Endsem AI merged.pdf
Endsem AI merged.pdfEndsem AI merged.pdf
Endsem AI merged.pdf
 

More from MLAI2

Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...
Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...
Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...MLAI2
 
Online Hyperparameter Meta-Learning with Hypergradient Distillation
Online Hyperparameter Meta-Learning with Hypergradient DistillationOnline Hyperparameter Meta-Learning with Hypergradient Distillation
Online Hyperparameter Meta-Learning with Hypergradient DistillationMLAI2
 
Online Coreset Selection for Rehearsal-based Continual Learning
Online Coreset Selection for Rehearsal-based Continual LearningOnline Coreset Selection for Rehearsal-based Continual Learning
Online Coreset Selection for Rehearsal-based Continual LearningMLAI2
 
Representational Continuity for Unsupervised Continual Learning
Representational Continuity for Unsupervised Continual LearningRepresentational Continuity for Unsupervised Continual Learning
Representational Continuity for Unsupervised Continual LearningMLAI2
 
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual LearningSequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual LearningMLAI2
 
Skill-Based Meta-Reinforcement Learning
Skill-Based Meta-Reinforcement LearningSkill-Based Meta-Reinforcement Learning
Skill-Based Meta-Reinforcement LearningMLAI2
 
Edge Representation Learning with Hypergraphs
Edge Representation Learning with HypergraphsEdge Representation Learning with Hypergraphs
Edge Representation Learning with HypergraphsMLAI2
 
Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Genera...
Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Genera...Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Genera...
Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Genera...MLAI2
 
Mini-Batch Consistent Slot Set Encoder For Scalable Set Encoding
Mini-Batch Consistent Slot Set Encoder For Scalable Set EncodingMini-Batch Consistent Slot Set Encoder For Scalable Set Encoding
Mini-Batch Consistent Slot Set Encoder For Scalable Set EncodingMLAI2
 
Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint L...
Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint L...Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint L...
Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint L...MLAI2
 
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-LearningMeta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-LearningMLAI2
 
Accurate Learning of Graph Representations with Graph Multiset Pooling
Accurate Learning of Graph Representations with Graph Multiset PoolingAccurate Learning of Graph Representations with Graph Multiset Pooling
Accurate Learning of Graph Representations with Graph Multiset PoolingMLAI2
 
Contrastive Learning with Adversarial Perturbations for Conditional Text Gene...
Contrastive Learning with Adversarial Perturbations for Conditional Text Gene...Contrastive Learning with Adversarial Perturbations for Conditional Text Gene...
Contrastive Learning with Adversarial Perturbations for Conditional Text Gene...MLAI2
 
Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Le...
Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Le...Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Le...
Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Le...MLAI2
 
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMLAI2
 
Adversarial Self-Supervised Contrastive Learning
Adversarial Self-Supervised Contrastive LearningAdversarial Self-Supervised Contrastive Learning
Adversarial Self-Supervised Contrastive LearningMLAI2
 
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...MLAI2
 
Neural Mask Generator : Learning to Generate Adaptive Word Maskings for Langu...
Neural Mask Generator : Learning to Generate Adaptive WordMaskings for Langu...Neural Mask Generator : Learning to Generate Adaptive WordMaskings for Langu...
Neural Mask Generator : Learning to Generate Adaptive Word Maskings for Langu...MLAI2
 
Adversarial Neural Pruning with Latent Vulnerability Suppression
Adversarial Neural Pruning with Latent Vulnerability SuppressionAdversarial Neural Pruning with Latent Vulnerability Suppression
Adversarial Neural Pruning with Latent Vulnerability SuppressionMLAI2
 
Generating Diverse and Consistent QA pairs from Contexts with Information-Max...
Generating Diverse and Consistent QA pairs from Contexts with Information-Max...Generating Diverse and Consistent QA pairs from Contexts with Information-Max...
Generating Diverse and Consistent QA pairs from Contexts with Information-Max...MLAI2
 

More from MLAI2 (20)

Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...
Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...
Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...
 
Online Hyperparameter Meta-Learning with Hypergradient Distillation
Online Hyperparameter Meta-Learning with Hypergradient DistillationOnline Hyperparameter Meta-Learning with Hypergradient Distillation
Online Hyperparameter Meta-Learning with Hypergradient Distillation
 
Online Coreset Selection for Rehearsal-based Continual Learning
Online Coreset Selection for Rehearsal-based Continual LearningOnline Coreset Selection for Rehearsal-based Continual Learning
Online Coreset Selection for Rehearsal-based Continual Learning
 
Representational Continuity for Unsupervised Continual Learning
Representational Continuity for Unsupervised Continual LearningRepresentational Continuity for Unsupervised Continual Learning
Representational Continuity for Unsupervised Continual Learning
 
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual LearningSequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
 
Skill-Based Meta-Reinforcement Learning
Skill-Based Meta-Reinforcement LearningSkill-Based Meta-Reinforcement Learning
Skill-Based Meta-Reinforcement Learning
 
Edge Representation Learning with Hypergraphs
Edge Representation Learning with HypergraphsEdge Representation Learning with Hypergraphs
Edge Representation Learning with Hypergraphs
 
Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Genera...
Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Genera...Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Genera...
Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Genera...
 
Mini-Batch Consistent Slot Set Encoder For Scalable Set Encoding
Mini-Batch Consistent Slot Set Encoder For Scalable Set EncodingMini-Batch Consistent Slot Set Encoder For Scalable Set Encoding
Mini-Batch Consistent Slot Set Encoder For Scalable Set Encoding
 
Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint L...
Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint L...Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint L...
Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint L...
 
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-LearningMeta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
 
Accurate Learning of Graph Representations with Graph Multiset Pooling
Accurate Learning of Graph Representations with Graph Multiset PoolingAccurate Learning of Graph Representations with Graph Multiset Pooling
Accurate Learning of Graph Representations with Graph Multiset Pooling
 
Contrastive Learning with Adversarial Perturbations for Conditional Text Gene...
Contrastive Learning with Adversarial Perturbations for Conditional Text Gene...Contrastive Learning with Adversarial Perturbations for Conditional Text Gene...
Contrastive Learning with Adversarial Perturbations for Conditional Text Gene...
 
Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Le...
Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Le...Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Le...
Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Le...
 
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
 
Adversarial Self-Supervised Contrastive Learning
Adversarial Self-Supervised Contrastive LearningAdversarial Self-Supervised Contrastive Learning
Adversarial Self-Supervised Contrastive Learning
 
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
 
Neural Mask Generator : Learning to Generate Adaptive Word Maskings for Langu...
Neural Mask Generator : Learning to Generate Adaptive WordMaskings for Langu...Neural Mask Generator : Learning to Generate Adaptive WordMaskings for Langu...
Neural Mask Generator : Learning to Generate Adaptive Word Maskings for Langu...
 
Adversarial Neural Pruning with Latent Vulnerability Suppression
Adversarial Neural Pruning with Latent Vulnerability SuppressionAdversarial Neural Pruning with Latent Vulnerability Suppression
Adversarial Neural Pruning with Latent Vulnerability Suppression
 
Generating Diverse and Consistent QA pairs from Contexts with Information-Max...
Generating Diverse and Consistent QA pairs from Contexts with Information-Max...Generating Diverse and Consistent QA pairs from Contexts with Information-Max...
Generating Diverse and Consistent QA pairs from Contexts with Information-Max...
 

Recently uploaded

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 

Recently uploaded (20)

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Cost-effective Interactive Attention Learning with Neural Attention Process

  • 1. Jay Heo1, Junhyeon Park1, Hyewon Jeong1, Kwang joon Kim2, Juho Lee3, Eunho Yang1 3, Sung Ju Hwang1 3 Cost-Effective Interactive Attention Learning with Neural Attention Processes KAIST1, Yonsei University College of Medicine2, AITRICS3
  • 2. Model Interpretability Main Network InferenceInput Data Training The complex nature of deep neural networks has led to the recent surge of interests in interpretable models which provide model interpretations.
  • 3. Model Interpretability Main Network InferenceInput Data Model Interpretation Interpretation tool Inference The complex nature of deep neural networks has led to the recent surge of interests in interpretable models which provide model interpretations.
  • 4. Model Interpretability The complex nature of deep neural networks has led to the recent surge of interests in interpretable models which provide model interpretations. Main Network InferenceInput Data Model Interpretation Interpretation tool Inference Provide explanations for model’s decision.
  • 5. Challenge: Incorrect & Unreliable Interpretation Not all machine-generated interpretations are correct or human-understandable. • Correctness and reliability of a learning model heavily depends on quality and quantity of training data. • Neural networks tend to learn non-robust features that help with predictions, but are not human-perceptible. Model Interpretation Quality of training data Quantity of training data
  • 6. Challenge: Incorrect & Unreliable Interpretation Not all machine-generated interpretations are correct or human-understandable. • Correctness and reliability of a learning model heavily depends on quality and quantity of training data. • Neural networks tend to learn non-robust features that help with predictions, but are not human-perceptible. Model Interpretation Quality of training data Quantity of training data Whether a model learn too many non-robust features during training?
  • 7. Challenge: Incorrect & Unreliable Interpretation Not all machine-generated interpretations are correct or human-understandable. • Correctness and reliability of a learning model heavily depends on quality and quantity of training data. • Neural networks tend to learn non-robust features that help with predictions, but are not human-perceptible. Model Interpretation 1. Is it correct? 2. Is it understandable enough to trust? Quality of training data Quantity of training data Whether a model learn too many non-robust features during training?
  • 8. Interactive Learning Framework Propose an interactive learning framework which iteratively update the model by interacting with the human supervisors who adjust the provided interpretations. • Actively use human supervisors as a channel for human-model communications. Attentional Networks Human Annotator Learning Model Physician Interpretation (Attentions) Model’s decision 0.80.60.3 Low uncertainty High uncertainty High uncertainty : Attention
  • 9. Interactive Learning Framework Propose an interactive learning framework which iteratively update the model by interacting with the human supervisors who adjust the provided interpretations. • Actively use human supervisors as a channel for human-model communications. Attentional Networks Human Annotator Learning Model Physician Interpretation (Attentions) Model’s decision Annotation AnnotateRetrain 0.80.60.3 Low uncertainty High uncertainty High uncertainty : Attention
  • 10. Challenge: Model Retraining Cost To reflect human feedback, the model needs to be retrained, which is costly. • Retraining the model with scarce human feedback may result in the model overfitting Physician Learning Model Annotated Examples 0.80.60.3 Low uncertainty High uncertainty High uncertainty : Attention Interpretation
  • 11. Challenge: Model Retraining Cost To reflect human feedback, the model needs to be retrained, which is costly. • Retraining the model with scarce human feedback may result in the model overfitting Physician Learning Model Annotated Examples 0.80.60.3 Low uncertainty High uncertainty High uncertainty : Attention Interpretation Scarce feedback
  • 12. Challenge: Model Retraining Cost To reflect human feedback, the model needs to be retrained, which is costly. • Retraining the model with scarce human feedback may result in the model overfitting Physician Learning Model Annotated Examples 0.80.60.3 Low uncertainty High uncertainty High uncertainty : Attention Interpretation Scarce feedback Retraining
  • 13. Challenge: Model Retraining Cost To reflect human feedback, the model needs to be retrained, which is costly. Physician Learning Model Annotated Examples 0.80.60.3 Low uncertainty High uncertainty High uncertainty : Attention Interpretation Scarce feedback Retraining ! Overfitting • Retraining the model with scarce human feedback may result in the model overfitting
  • 14. Challenge: Expensive Human Supervision Cost Obtaining human feedback on datasets with large numbers of training instances and features is extremely costly. • Obtaining feedback on already correct or previously corrected interpretations is wasteful. Annotator Annotate 𝑴 𝒂𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 (𝒕) ∈{0, 1} Big data
  • 15. Challenge: Expensive Human Supervision Cost Obtaining human feedback on datasets with large numbers of training instances and features is extremely costly. • Obtaining feedback on already correct or previously corrected interpretations is wasteful. Annotator Annotate Big data Annotate
  • 16. Challenge: Expensive Human Supervision Cost Obtaining human feedback on datasets with large numbers of training instances and features is extremely costly. • Obtaining feedback on already correct or previously corrected interpretations is wasteful. Annotator Annotate Big data Costly Annotate !
  • 17. Interactive Attention Learning Framework Domain experts interactively evaluate learned attentions and provide feedbacks to obtain models that generate human-intuitive interpretations. Learning Model Physician Attention Mechanism LDL Respiration Cholesterol Creatine BMI Diabetes Heart Failure Hypertension Deliver attention Annotate attention Attention0.80.60.3 : Attention 𝑴 𝒂𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 (𝒕) ∈ {0, 1} Influence Function MC Dropout Annotation Mask Deep Interpretation Tool Correlation & Causal relationship Analysis Manipulate Explain Neural Attention Processes Counterfactual Estimation Counterfactual Estimation a c b Granger Causality
  • 18. Interactive Attention Learning Framework Domain experts interactively evaluate learned attentions and provide feedbacks to obtain models that generate human-intuitive interpretations. Learning Model Physician Attention Mechanism 1. Neural Attention Processes LDL Respiration Cholesterol Creatine BMI Diabetes Heart Failure Hypertension Deliver attention Annotate attention Attention0.80.60.3 : Attention 𝑴 𝒂𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 (𝒕) ∈ {0, 1} Influence Function MC Dropout Annotation Mask Deep Interpretation Tool Correlation & Causal relationship Analysis Manipulate Explain Neural Attention Processes Counterfactual Estimation Counterfactual Estimation a c b Granger Causality
  • 19. Interactive Attention Learning Framework Domain experts interactively evaluate learned attentions and provide feedbacks to obtain models that generate human-intuitive interpretations. Learning Model Physician Attention Mechanism LDL Respiration Cholesterol Creatine BMI Diabetes Heart Failure Hypertension Deliver attention Annotate attention Attention0.80.60.3 : Attention 𝑴 𝒂𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 (𝒕) ∈ {0, 1} Influence Function MC Dropout Annotation Mask Deep Interpretation Tool Correlation & Causal relationship Analysis Manipulate Explain Neural Attention Processes Counterfactual Estimation 1. Neural Attention Processes 2. Cost-effective Reranking Counterfactual Estimation a c b Granger Causality
  • 20. Neural Attention Processes (NAP) NAP naturally reflects the information from the annotation summarization z via amor tized inference. Domain Expert Annotation Neural Attention Processes • NAP learns to summarize delivered annotations to a latent vector, and gives th e summarization as an additional input to the attention generating network.
  • 21. Neural Attention Processes (NAP) NAP naturally reflects the information from the annotation summarization z via amor tized inference. Domain Expert Annotation Neural Attention Processes • NAP learns to summarize delivered annotations to a latent vector, and gives th e summarization as an additional input to the attention generating network.
  • 22. Neural Attention Processes (NAP) ! Attention Context Points "! !! "! #! !! New Observations !!"# "!"# !!"# NAP minimizes retraining cost by incorporating new labeled instances without retrai ning and overfitting. First Round (s=1) • NAP doesn’t require retraining for further new observations, in that NAP automatically adapt to them at the cost of a forward pass through a network 𝑔!.
  • 23. Neural Attention Processes (NAP) ! Attention Context Points "! !! "! #! !! New Observations !!"# "!"# !!"# NAP minimizes retraining cost by incorporating new labeled instances without retrai ning and overfitting. First Round (s=1) • NAP doesn’t require retraining for further new observations, in that NAP automatically adapt to them at the cost of a forward pass through a network 𝑔!.
  • 24. Neural Attention Processes (NAP) ! Attention Context Points "! !! "! #! !! New Observations !!"# "!"# !!"# NAP minimizes retraining cost by incorporating new labeled instances without retrai ning and overfitting. First Round (s=1) • NAP doesn’t require retraining for further new observations, in that NAP automatically adapt to them at the cost of a forward pass through a network 𝑔!.
  • 25. Neural Attention Processes (NAP) ! Attention Context Points "! !! "! #! !! New Observations !!"# "!"# !!"# NAP minimizes retraining cost by incorporating new labeled instances without retrai ning and overfitting. First Round (s=1) • NAP doesn’t require retraining for further new observations, in that NAP automatically adapt to them at the cost of a forward pass through a network 𝑔!.
  • 26. Neural Attention Processes (NAP) NAP minimizes retraining cost by incorporating new labeled instances without retrain ing and overfitting. • NAP is trained in a meta-learning fashion for few-shot function estimation, where it is trained to predict the attention mask of other labeled samples, given a randomly selected labeled set as context. ! Attention "! !! New Observations !!"# "!"# !!"# Context Points !! "! #!!!"$ "!"$ !!"$ Further Rounds (s=2,3,4)
  • 27. Neural Attention Processes (NAP) NAP minimizes retraining cost by incorporating new labeled instances without retrain ing and overfitting. • NAP is trained in a meta-learning fashion for few-shot function estimation, where it is trained to predict the attention mask of other labeled samples, given a randomly selected labeled set as context. ! Attention "! !! New Observations !!"# "!"# !!"# Context Points !! "! #!!!"$ "!"$ !!"$ Further Rounds (s=2,3,4)
  • 28. Neural Attention Processes (NAP) NAP minimizes retraining cost by incorporating new labeled instances without retrain ing and overfitting. • NAP is trained in a meta-learning fashion for few-shot function estimation, where it is trained to predict the attention mask of other labeled samples, given a randomly selected labeled set as context. ! Attention "! !! New Observations !!"# "!"# !!"# Context Points !! "! #!!!"$ "!"$ !!"$ Further Rounds (s=2,3,4)
  • 29. Cost-Effective Instance & Features Reranking (CER) Address the expensive human labeling cost by reranking the instances, features, and timesteps (for time-series data) by their negative impacts. Attentional Network Domain Expert Instance-wise and Feature-wise Reranking
  • 30. Cost-Effective Instance & Features Reranking (CER) Address the expensive human labeling cost by reranking the instances, features, and timesteps (for time-series data) by their negative impacts. Attentional Network Domain Expert Instance-wise and Feature-wise Reranking 𝑷 𝑲 … … …TrainTrainValid Valid Estimate 𝑰(𝒖!) / 𝐕𝐚𝐫(𝒖!) Instance-level Reranking Select Re-rank & select
  • 31. Cost-Effective Instance & Features Reranking (CER) Address the expensive human labeling cost by reranking the instances, features, and timesteps (for time-series data) by their negative impacts. Attentional Network Domain Expert Instance-wise and Feature-wise Reranking 𝑷 𝑲 … … …TrainTrainValid Valid Estimate 𝑰(𝒖!) / 𝐕𝐚𝐫(𝒖!) Instance-level Reranking Train Feature-level Reranking Select Re-rank & select Estimate 𝑰(𝒖",$ (&) ) / 𝐕𝐚𝒓 (𝒖",$ (&) ) / 𝜓(𝒖",$ (&) ) Feature … … Feature 𝑭 Re-rank & select
  • 32. CER: 1. Influence Score Use the influence function (Koh & Liang, 2017) to approximate the impact of individual training points on the model prediction. Fish Dog Dog “Dog” Training Training data Test Input [Koh and Liang 17] Understanding Black-box Predictions via Influence Functions, ICML 2017
  • 33. CER: 2. Uncertainty Score Measure the negative impacts using the uncertainty which can be measured by Mon te-Carlo sampling (Gal & Ghahramani, 2016). Uncertainty-aware Attention Mechanism • Less expensive approach to measure the negative impacts. • Assume that instances having high-predictive uncertainties are potential candidate to be corrected. Uncertainty [Jay Heo*, Haebeom Lee*, Saehun Kim,, Juho Lee, Gwangjun Kim , Eunho Yang, Sung Joo Hwang] Uncertainty-aware Attention Mechanism for Reliable prediction and Interpretation, Neurips 2018.
  • 34. CER: 2. Uncertainty Score Measure the negative impacts using the uncertainty which can be measured by Mon te-Carlo sampling (Gal & Ghahramani, 2016). Measure instance-wise & feature-wise uncertainty 0.80.60.3 Low uncertainty High uncertainty High uncertainty µ σ, )(N SpO2 Pulse Respiration • Less expensive approach to measure the negative impacts. • Assume that instances having high-predictive uncertainties are potential candidate to be corrected. Uncertainty-aware Attention Mechanism [Jay Heo*, Haebeom Lee*, Saehun Kim,, Juho Lee, Gwangjun Kim , Eunho Yang, Sung Joo Hwang] Uncertainty-aware Attention Mechanism for Reliable prediction and Interpretation, Neurips 2018.
  • 35. CER: 3. Counterfactual Score How would the prediction change if we ignore a certain feature by manually turning on/off the corresponding attention value? • Not need for retraining since we can simply set its attention value to zero. • Used to rerank the features with regards to their importance. Counterfactual Estimation Interface
  • 36. CER: 3. Counterfactual Score How would the prediction change if we ignore a certain feature by manually turning on/off the corresponding attention value? • Not need for retraining since we can simply set its attention value to zero. • Used to rerank the features with regards to their importance. Counterfactual Estimation Interface
  • 37. CER: 3. Counterfactual Score How would the prediction change if we ignore a certain feature by manually turning on/off the corresponding attention value? • Not need for retraining since we can simply set its attention value to zero. • Used to rerank the features with regards to their importance. Counterfactual Estimation Interface
  • 38. CER: 3. Counterfactual Score How would the prediction change if we ignore a certain feature by manually turning on/off the corresponding attention value? • Not need for retraining since we can simply set its attention value to zero. • Used to rerank the features with regards to their importance. Counterfactual Estimation Interface
  • 39. Experimental setting – Datasets Use electronic health records, real estate sales transaction records, and exercise squat posture correction records for classification and regression tasks. 1. EHR Datasets 1) Cerebral Infarction 2) Cardio Vascular Disease 3) Heart Failure Binary Classification task
  • 40. Experimental setting – Datasets Use electronic health records, real estate sales transaction records, and exercise squat posture correction records for classification and regression tasks. 1. EHR Datasets 1) Cerebral Infarction 2) Cardio Vascular Disease 3) Heart Failure 2. Real-estate dataset 1) Housing price forecasting Binary Classification task Regression task
  • 41. Experimental setting – Datasets Use electronic health records, real estate sales transaction records, and exercise squat posture correction records for classification and regression tasks. 1. EHR Datasets 1) Cerebral Infarction 2) Cardio Vascular Disease 3) Heart Failure 2. Real-estate dataset 3. Squat Posture set 1) Housing price forecasting 1) Squat posture correction Binary Classification task Regression task Multi-label Classification task
  • 42. Attention Evaluation Interface – Risk Prediction Domain experts evaluate the delivered attention 𝛼 𝑎𝑛𝑑 𝜷 via attention annotation mask 𝑴 𝒕 𝜶 ∈ 0, 1 $×& and 𝑴 𝒕 𝜷 ∈ 0, 1 $×(×& . Attention Annotation Platform (Risk Prediction Task with Counterfactual Estimation)
  • 43. Real estate in NYC Domain experts evaluate the delivered attention 𝛼 𝑎𝑛𝑑 𝜷 via attention annotation mask 𝑴 𝒕 𝜶 ∈ 0, 1 $×& and 𝑴 𝒕 𝜷 ∈ 0, 1 $×(×& . Attention Annotation Platform (Real estate Price Forecasting in New York City)
  • 44. Action Posture Correction Task Domain experts evaluate the delivered attention 𝛼 𝑎𝑛𝑑 𝜷 via attention annotation mask 𝑴 𝒕 𝜶 ∈ 0, 1 $×& and 𝑴 𝒕 𝜷 ∈ 0, 1 $×(×& . Attention Annotation Platform (Action Posture Correction Task)
  • 45. Experiment Results Conducted experiments on three risk prediction tasks, one fitness squat, and one real estate forecasting task. EHR Fitness Squat Real Estate ForecastingHeart Failure Cerebral Infarction CVD One-time Training RETAIN 0.6069 ± 0.01 0.6394 ± 0.02 0.6018 ± 0.02 0.8425 ± 0.03 0.2136 ± 0.01 Random-RETAIN 0.5952 ± 0.02 0.6256 ± 0.02 0.5885 ± 0.01 0.8221 ± 0.05 0.2140 ± 0.01 IF-RETAIN 0.6134 ± 0.03 0.6422 ± 0.02 0.5882 ± 0.02 0.8363 ± 0.03 0.2049 ± 0.01 Random Re-ranking Random-UA 0.6231 ± 0.03 0.6491 ± 0.01 0.6112 ± 0.02 0.8521 ± 0.02 0.2222 ± 0.02 Random-NAP 0.6414 ± 0.01 0.6674 ± 0.02 0.6284 ± 0.01 0.8525 ± 0.01 0.2061 ± 0.01 IAL (Cost-effective) AILA 0.6363 ± 0.03 0.6602 ± 0.03 0.6193 ± 0.02 0.8425 ± 0.01 0.2119 ± 0.01 IAL-NAP 0.6612 ± 0.02 0.6892 ± 0.03 0.6371 ± 0.02 0.8689 ± 0.01 0.1835 ± 0.01
  • 46. Experiment Results Conducted experiments on three risk prediction tasks, one fitness squat, and one real estate forecasting task. EHR Fitness Squat Real Estate ForecastingHeart Failure Cerebral Infarction CVD One-time Training RETAIN 0.6069 ± 0.01 0.6394 ± 0.02 0.6018 ± 0.02 0.8425 ± 0.03 0.2136 ± 0.01 Random-RETAIN 0.5952 ± 0.02 0.6256 ± 0.02 0.5885 ± 0.01 0.8221 ± 0.05 0.2140 ± 0.01 IF-RETAIN 0.6134 ± 0.03 0.6422 ± 0.02 0.5882 ± 0.02 0.8363 ± 0.03 0.2049 ± 0.01 Random Re-ranking Random-UA 0.6231 ± 0.03 0.6491 ± 0.01 0.6112 ± 0.02 0.8521 ± 0.02 0.2222 ± 0.02 Random-NAP 0.6414 ± 0.01 0.6674 ± 0.02 0.6284 ± 0.01 0.8525 ± 0.01 0.2061 ± 0.01 IAL (Cost-effective) AILA 0.6363 ± 0.03 0.6602 ± 0.03 0.6193 ± 0.02 0.8425 ± 0.01 0.2119 ± 0.01 IAL-NAP 0.6612 ± 0.02 0.6892 ± 0.03 0.6371 ± 0.02 0.8689 ± 0.01 0.1835 ± 0.01 • Random-UA, which is retrained with human attention-level supervision on randomly selected. samples, performs worse than Random-NAP.
  • 47. Experiment Results Conducted experiments on three risk prediction tasks, one fitness squat, and one real estate forecasting task. EHR Fitness Squat Real Estate ForecastingHeart Failure Cerebral Infarction CVD One-time Training RETAIN 0.6069 ± 0.01 0.6394 ± 0.02 0.6018 ± 0.02 0.8425 ± 0.03 0.2136 ± 0.01 Random-RETAIN 0.5952 ± 0.02 0.6256 ± 0.02 0.5885 ± 0.01 0.8221 ± 0.05 0.2140 ± 0.01 IF-RETAIN 0.6134 ± 0.03 0.6422 ± 0.02 0.5882 ± 0.02 0.8363 ± 0.03 0.2049 ± 0.01 Random Re-ranking Random-UA 0.6231 ± 0.03 0.6491 ± 0.01 0.6112 ± 0.02 0.8521 ± 0.02 0.2222 ± 0.02 Random-NAP 0.6414 ± 0.01 0.6674 ± 0.02 0.6284 ± 0.01 0.8525 ± 0.01 0.2061 ± 0.01 IAL (Cost-effective) AILA 0.6363 ± 0.03 0.6602 ± 0.03 0.6193 ± 0.02 0.8425 ± 0.01 0.2119 ± 0.01 IAL-NAP 0.6612 ± 0.02 0.6892 ± 0.03 0.6371 ± 0.02 0.8689 ± 0.01 0.1835 ± 0.01 • Random-UA, which is retrained with human attention-level supervision on randomly selected. samples, performs worse than Random-NAP. • IAL-NAP significantly outperforms Random-NAP, showing that the effect of attention annotation cannot have much effect on the model when the instances are randomly selected.
  • 48. Experiment Results Results of Ablation study with proposed IAL-NAP combinations for instance- and feat ure-level reranking on all tasks. IAL-NAP Variants EHR Fitness Squat Real Estate ForecastingInstance-level Feature-level Heart Failure Cerebral Infarction CVD Influence Function Uncertainty 0.6563 ± 0.01 0.6821 ± 0.02 0.6308 ± 0.02 0.8712 ± 0.01 0.1921 ± 0.01 Influence Function Influence Function 0.6514 ± 0.02 0.6825 ± 0.01 0.6329 ± 0.03 0.8632 ± 0.01 0.1865 ± 0.02 Influence Function Counterfactual 0.6592 ± 0.02 0.6921 ± 0.03 0.6379 ± 0.02 0.8682 ± 0.01 0.1863 ± 0.02 Uncertainty Counterfactual 0.6612 ± 0.01 0.6892 ± 0.03 0.6371 ± 0.02 0.8689 ± 0.02 0.1835 ± 0.02 • For instance-level scoring, influence and uncertainty scores work similarly Ablation study with IAL-NAP combinations
  • 49. Experiment Results Results of Ablation study with proposed IAL-NAP combinations for instance- and feat ure-level reranking on all tasks. IAL-NAP Variants EHR Fitness Squat Real Estate ForecastingInstance-level Feature-level Heart Failure Cerebral Infarction CVD Influence Function Uncertainty 0.6563 ± 0.01 0.6821 ± 0.02 0.6308 ± 0.02 0.8712 ± 0.01 0.1921 ± 0.01 Influence Function Influence Function 0.6514 ± 0.02 0.6825 ± 0.01 0.6329 ± 0.03 0.8632 ± 0.01 0.1865 ± 0.02 Influence Function Counterfactual 0.6592 ± 0.02 0.6921 ± 0.03 0.6379 ± 0.02 0.8682 ± 0.01 0.1863 ± 0.02 Uncertainty Counterfactual 0.6612 ± 0.01 0.6892 ± 0.03 0.6371 ± 0.02 0.8689 ± 0.02 0.1835 ± 0.02 • For instance-level scoring, influence and uncertainty scores work similarly, while the counterfactual score was the most effective for feature-wise reranking. Ablation study with IAL-NAP combinations
  • 50. Experiment Results Results of Ablation study with proposed IAL-NAP combinations for instance- and feat ure-level reranking on all tasks. IAL-NAP Variants EHR Fitness Squat Real Estate ForecastingInstance-level Feature-level Heart Failure Cerebral Infarction CVD Influence Function Uncertainty 0.6563 ± 0.01 0.6821 ± 0.02 0.6308 ± 0.02 0.8712 ± 0.01 0.1921 ± 0.01 Influence Function Influence Function 0.6514 ± 0.02 0.6825 ± 0.01 0.6329 ± 0.03 0.8632 ± 0.01 0.1865 ± 0.02 Influence Function Counterfactual 0.6592 ± 0.02 0.6921 ± 0.03 0.6379 ± 0.02 0.8682 ± 0.01 0.1863 ± 0.02 Uncertainty Counterfactual 0.6612 ± 0.01 0.6892 ± 0.03 0.6371 ± 0.02 0.8689 ± 0.02 0.1835 ± 0.02 • For instance-level scoring, influence and uncertainty scores work similarly, while the counterfactual score was the most effective for feature-wise reranking. • The combination of uncertainty-counterfactual is the most cost-effective solution since it avoids ex pensive computation of the Hessians. Ablation study with IAL-NAP combinations
  • 51. Effect of Neural Attention Processes Retraining time to retrain examples of human annotation and Mean Response Time of human annotations on the risk prediction tasks. (a) Heart Failure (b) Cerebral Infarction (c) CVD
  • 52. Effect of Neural Attention Processes Retraining time to retrain examples of human annotation and Mean Response Time of human annotations on the risk prediction tasks. (a) Heart Failure (b) Cerebral Infarction (c) CVD
  • 53. Effect of Neural Attention Processes Retraining time to retrain examples of human annotation and Mean Response Time of human annotations on the risk prediction tasks. (a) Heart Failure (b) Cerebral Infarction (c) CVD
  • 54. Effect of Cost-Effective Reranking Change of accuracy with 100 annotations across four rounds (S) between IAL-NAP (bl ue) vs Random-NAP (red). (b) Cerebral Infarction(a) Heart Failure (c) CVD (d) Squat • IAL-NAP uses a smaller number of annotated examples (100 examples) than Rando m-NAP (400 examples) to improve the model with the comparable accuracy (auc: 0. 6414).
  • 55. Qualitative Analysis – Risk Prediction Further analyze the contribution of each feature for a CVD patient (label=1). A patient records for Cardio Vascular Disease • At s=3, IAL allocated more attention weights on the important feature (Smoking), which the initial training model missed to attend. • Age • Smoking : Whether smoke or not • SysBP : Systolic Blood Pressure • HDL : High-density Lipoprotein • LDL : Low-density Lipoprotein (a) Pretrained (b) s=1 (c) s=2
  • 56. Qualitative Analysis – Risk Prediction Further analyze the contribution of each feature for a CVD patient (label=1). A patient records for Cardio Vascular Disease • At s=3, IAL allocated more attention weights on the important feature (Smoking), which the initial training model missed to attend. à Clinicians guided the model to learn it since smoking is a key factor to access CVD. • Age • Smoking : Whether smoke or not • SysBP : Systolic Blood Pressure • HDL : High-density Lipoprotein • LDL : Low-density Lipoprotein (a) Pretrained (b) s=1 (c) s=2
  • 57. Summary Propose a novel interactive learning framework which iteratively updates the model by interacting with the human supervisor via the generated attentions. • Unlike conventional active learning, IAL allows the human annotators to “actively” interpret, manipulate the model’s behavior, and see its effect. IAL allows for online learning without retrai ning the main network, by training a “separate” attention generator.
  • 58. Summary Propose a novel interactive learning framework which iteratively updates the model by interacting with the human supervisor via the generated attentions. • Unlike conventional active learning, IAL allows the human annotators to “actively” interpret, manipulate the model’s behavior, and see its effect. IAL allows for online learning without retrai ning the main network, by training a “separate” attention generator. • Neural Attention Processes is a novel attention mechanism which can generate attention on unla beled instances given few labeled samples, and can incorporate new labeled instances without retraining and overfitting.
  • 59. Summary Propose a novel interactive learning framework which iteratively updates the model by interacting with the human supervisor via the generated attentions. • Unlike conventional active learning, IAL allows the human annotators to “actively” interpret, manipulate the model’s behavior, and see its effect. IAL allows for online learning without retrai ning the main network, by training a “separate” attention generator. • Neural Attention Processes is a novel attention mechanism which can generate attention on unla beled instances given few labeled samples, and can incorporate new labeled instances without retraining and overfitting. • Our reranking strategy re-ranks the instance and features, which substantially reduces the annot ation cost and time for high-dimensional inputs.