Cost-Effective Interactive Attention Learning with Neural Attention Processes

We propose a novel interactive learning framework which we refer to as Interactive Attention Learning (IAL), in which the human supervisors interactively manipulate the allocated attentions, to correct the model's behavior by updating the attention-generating network. However, such a model is prone to overfitting due to scarcity of human annotations, and requires costly retraining. Moreover, it is almost infeasible for the human annotators to examine attentions on tons of instances and features. We tackle these challenges by proposing a sample-efficient attention mechanism and a cost-effective reranking algorithm for instances and features. First, we propose Neural Attention Process (NAP), which is an attention generator that can update its behavior by incorporating new attention-level supervisions without any retraining. Secondly, we propose an algorithm which prioritizes the instances and the features by their negative impacts, such that the model can yield large improvements with minimal human feedback. We validate IAL on various time-series datasets from multiple domains (healthcare, real-estate, and computer vision) on which it significantly outperforms baselines with conventional attention mechanisms, or without cost-effective reranking, with substantially less retraining and human-model interaction cost.

1. Cost-Effective Interactive Attention Learning with Neural Attention Processes
Jay Heo¹, Junhyeon Park¹, Hyewon Jeong¹, Kwang Joon Kim², Juho Lee³, Eunho Yang¹ ³, Sung Ju Hwang¹ ³
¹KAIST, ²Yonsei University College of Medicine, ³AITRICS
2-4. Model Interpretability
The complex nature of deep neural networks has led to a recent surge of interest in interpretable models that provide model interpretations.
[Diagram: a main network is trained on input data and performs inference; an interpretation tool then provides explanations for the model's decisions.]
5-7. Challenge: Incorrect & Unreliable Interpretation
Not all machine-generated interpretations are correct or human-understandable.
• The correctness and reliability of a learning model depend heavily on the quality and quantity of its training data, so a model may learn too many non-robust features during training.
• Neural networks tend to learn non-robust features that help with predictions but are not human-perceptible, which raises two questions about a model interpretation: 1) Is it correct? 2) Is it understandable enough to trust?
8-9. Interactive Learning Framework
We propose an interactive learning framework that iteratively updates the model by interacting with human supervisors who adjust the provided interpretations.
• Actively use human supervisors as a channel for human-model communication.
[Diagram: the attentional network delivers its decision and its attention-based interpretation (attention weights 0.3, 0.6, 0.8 with low/high uncertainty) to a human annotator such as a physician, who annotates the attentions; the model is then retrained on this feedback.]
10-13. Challenge: Model Retraining Cost
To reflect human feedback, the model needs to be retrained, which is costly.
• Retraining the model on scarce human feedback may also cause it to overfit.
[Diagram: a physician provides scarce feedback on a few annotated examples; retraining on this feedback leads to overfitting.]
14-16. Challenge: Expensive Human Supervision Cost
Obtaining human feedback on datasets with large numbers of training instances and features is extremely costly.
• Obtaining feedback on interpretations that are already correct, or were corrected previously, is wasteful.
[Diagram: an annotator provides binary attention annotation masks M_attention^(t) ∈ {0, 1} over a large dataset, which is costly.]
17-19. Interactive Attention Learning (IAL) Framework
Domain experts interactively evaluate the learned attentions and provide feedback, yielding models that generate human-intuitive interpretations. IAL has two main components: 1. Neural Attention Processes and 2. Cost-effective Reranking (a skeleton of the interaction loop follows below).
[Diagram: the attention mechanism delivers attentions over clinical features (LDL, respiration, cholesterol, creatinine, BMI) for diabetes, heart failure, and hypertension to a physician, who manipulates them via a binary annotation mask M_attention^(t) ∈ {0, 1}; candidate attentions are selected with influence functions, MC dropout, and counterfactual estimation, and explained via correlation and causal-relationship analysis (Granger causality).]
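A runnable skeleton of this interaction loop, for orientation only: the scorer, the expert, and the attention generator are placeholder lambdas that merely mark where the real components (CER reranking, the annotation interface, NAP) would plug in. None of the names below come from the authors' code.

    import torch

    torch.manual_seed(0)
    X = torch.randn(400, 8)                                               # toy dataset: 400 instances, 8 features
    score_negative_impact = lambda x: torch.rand(())                      # stand-in for influence / uncertainty scoring
    expert_annotate = lambda x, attn: torch.randint(0, 2, (8,)).float()   # stand-in for the physician's binary mask
    generate_attention = lambda x, context: torch.rand(8)                 # stand-in for NAP, conditioned on context

    context = []                                                          # expert-annotated (instance, mask) pairs
    for s in range(4):                                                    # interaction rounds
        scores = torch.stack([score_negative_impact(x) for x in X])
        for i in torch.argsort(scores, descending=True)[:25]:             # budgeted: top-25 instances per round
            attn = generate_attention(X[i], context)                      # attention conditioned on feedback so far
            mask = expert_annotate(X[i], attn)                            # expert corrects the delivered attention
            context.append((X[i], mask))                                  # absorbed as NAP context, no retraining
    print(f"{len(context)} attention annotations collected over 4 rounds")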
20-21. Neural Attention Processes (NAP)
NAP naturally reflects the information from the annotation summarization z via amortized inference.
• NAP learns to summarize the delivered annotations into a latent vector and feeds this summarization as an additional input to the attention-generating network.
22-25. Neural Attention Processes (NAP)
NAP minimizes retraining cost by incorporating newly labeled instances without retraining or overfitting.
• NAP does not require retraining for new observations: it adapts to them automatically at the cost of a single forward pass through a network g.
[Diagram: first round (s=1); the attention for new observations is generated from the context points, i.e. the instances annotated so far.]
26-28. Neural Attention Processes (NAP)
NAP minimizes retraining cost by incorporating newly labeled instances without retraining or overfitting.
• NAP is trained in a meta-learning fashion for few-shot function estimation: given a randomly selected labeled set as context, it learns to predict the attention masks of the other labeled samples (a minimal sketch follows below).
[Diagram: further rounds (s=2, 3, 4); the context points now include the observations annotated in earlier rounds.]
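A minimal neural-process-style sketch of this idea, assuming a simple deterministic, mean-pooled context encoding: the expert-annotated attentions form a context set summarized into a vector z, and the attention generator is conditioned on z, so new annotations are absorbed by a forward pass instead of retraining. Layer sizes and class names are illustrative, not the paper's architecture.

    import torch
    import torch.nn as nn

    class NeuralAttentionProcessSketch(nn.Module):
        def __init__(self, d_feat, d_hidden=64):
            super().__init__()
            # encodes one (instance, annotated attention mask) context pair
            self.context_enc = nn.Sequential(nn.Linear(2 * d_feat, d_hidden), nn.ReLU(),
                                             nn.Linear(d_hidden, d_hidden))
            # generates per-feature attention in [0, 1], conditioned on the context summary z
            self.decoder = nn.Sequential(nn.Linear(d_feat + d_hidden, d_hidden), nn.ReLU(),
                                         nn.Linear(d_hidden, d_feat), nn.Sigmoid())

        def summarize(self, ctx_x, ctx_mask):
            r = self.context_enc(torch.cat([ctx_x, ctx_mask], dim=-1))
            return r.mean(dim=0, keepdim=True)                  # z: permutation-invariant context summary

        def forward(self, x, ctx_x, ctx_mask):
            z = self.summarize(ctx_x, ctx_mask).expand(x.size(0), -1)
            return self.decoder(torch.cat([x, z], dim=-1))

    # attentions for 5 new instances, conditioned on 20 expert-annotated examples:
    nap = NeuralAttentionProcessSketch(d_feat=8)
    ctx_x, ctx_mask = torch.randn(20, 8), torch.randint(0, 2, (20, 8)).float()
    attn = nap(torch.randn(5, 8), ctx_x, ctx_mask)              # shape (5, 8); no retraining involved

During meta-training, a random subset of the annotated pairs would serve as context and the model would be trained to reconstruct the masks of the held-out pairs, mirroring the few-shot setup described above.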
29-31. Cost-Effective Instance & Feature Reranking (CER)
Address the expensive human labeling cost by reranking instances, features, and timesteps (for time-series data) by their negative impacts, and presenting only the top-ranked ones to the domain expert (see the skeleton below).
• Instance-level reranking: estimate an instance score I(u_i) / Var(u_i) on train/validation splits, then re-rank and select the top instances.
• Feature-level reranking: for each selected instance, estimate a per-feature score I(u_{i,d}^(t)) / Var(u_{i,d}^(t)) / ψ(u_{i,d}^(t)), then re-rank and select the features to annotate.
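The two-stage procedure has roughly the following shape. Both scorers here are hypothetical placeholders; the concrete influence, uncertainty, and counterfactual scores are sketched under the next slides.

    import torch

    def rerank(model, X, instance_score, feature_score, n_instances=100, n_features=5):
        inst = torch.stack([instance_score(model, x) for x in X])
        top_instances = torch.argsort(inst, descending=True)[:n_instances]   # most negative impact first
        selections = []
        for i in top_instances:
            feat = feature_score(model, X[i])                                 # one score per feature
            top_features = torch.argsort(feat, descending=True)[:n_features]
            selections.append((int(i), top_features.tolist()))                # what the expert will inspect
        return selections

    # toy call with random scorers, just to show the shapes involved
    X = torch.randn(500, 8)
    picks = rerank(model=None, X=X,
                   instance_score=lambda m, x: torch.rand(()),
                   feature_score=lambda m, x: torch.rand(8),
                   n_instances=3, n_features=2)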
32. CER: 1. Influence Score
Use the influence function (Koh & Liang, 2017) to approximate the impact of individual training points on the model's predictions (see the sketch below).
[Diagram: influence of individual training images (fish vs. dog) on a "dog" test prediction.]
[Koh and Liang, Understanding Black-box Predictions via Influence Functions, ICML 2017]
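A toy sketch of the Koh & Liang influence score on a small logistic-regression model, forming the Hessian explicitly; that is only feasible for a handful of parameters, and larger models typically need Hessian-vector-product approximations instead. Dataset, damping constant, and variable names are illustrative, not taken from the paper.

    import torch

    torch.manual_seed(0)
    X_tr, y_tr = torch.randn(200, 5), torch.randint(0, 2, (200,)).float()
    X_va, y_va = torch.randn(50, 5), torch.randint(0, 2, (50,)).float()

    def loss_fn(w, X, y):
        return torch.nn.functional.binary_cross_entropy_with_logits(X @ w, y)

    # fit a tiny logistic regression by gradient descent
    w = torch.zeros(5, requires_grad=True)
    opt = torch.optim.SGD([w], lr=0.1)
    for _ in range(500):
        opt.zero_grad()
        loss_fn(w, X_tr, y_tr).backward()
        opt.step()

    # Hessian of the training loss at the fitted parameters (explicit; fine for 5 parameters)
    H = torch.autograd.functional.hessian(lambda v: loss_fn(v, X_tr, y_tr), w.detach())
    H_inv = torch.linalg.inv(H + 1e-3 * torch.eye(5))                 # small damping for numerical stability

    g_val = torch.autograd.grad(loss_fn(w, X_va, y_va), w)[0]         # gradient of the validation loss
    influence = []
    for i in range(len(X_tr)):
        g_i = torch.autograd.grad(loss_fn(w, X_tr[i:i + 1], y_tr[i:i + 1]), w)[0]
        influence.append(-g_val @ H_inv @ g_i)                        # positive score: upweighting z_i hurts validation loss
    ranking = torch.argsort(torch.stack(influence), descending=True)  # most harmful training points first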
33-34. CER: 2. Uncertainty Score
Measure negative impacts using predictive uncertainty, which can be estimated by Monte-Carlo dropout sampling (Gal & Ghahramani, 2016), as in the uncertainty-aware attention mechanism.
• A less expensive approach than the influence function for measuring negative impacts.
• Assumes that instances with high predictive uncertainty are candidates for correction.
• The same procedure yields instance-wise and feature-wise uncertainties, e.g. attention distributions N(μ, σ) over features such as SpO2, pulse, and respiration (a sketch follows below).
[Jay Heo*, Hae Beom Lee*, Saehoon Kim, Juho Lee, Kwang Joon Kim, Eunho Yang, Sung Ju Hwang, Uncertainty-Aware Attention for Reliable Interpretation and Prediction, NeurIPS 2018]
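A minimal sketch of MC-dropout predictive uncertainty used as an instance-level score: keep dropout stochastic at evaluation time, run several forward passes, and rank instances by the variance of their predictions. The toy model and all names are illustrative assumptions, not the paper's network.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(0.3), nn.Linear(64, 1))

    def mc_dropout_uncertainty(model, x, n_samples=30):
        model.train()                                        # keep dropout layers stochastic at evaluation time
        with torch.no_grad():
            preds = torch.stack([torch.sigmoid(model(x)).squeeze(-1) for _ in range(n_samples)])
        return preds.mean(0), preds.var(0)                   # predictive mean and variance per instance

    x = torch.randn(128, 16)                                 # a batch of instances to be reranked
    mean, var = mc_dropout_uncertainty(model, x)
    ranking = torch.argsort(var, descending=True)            # most uncertain (most suspect) instances first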
35-38. CER: 3. Counterfactual Score
How would the prediction change if we ignored a certain feature by manually turning its attention value on or off?
• No retraining is needed, since we can simply set the feature's attention value to zero.
• Used to re-rank the features by their importance (a sketch follows below).
[Screenshot: counterfactual estimation interface.]
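A minimal sketch of the counterfactual score: zero out the attention assigned to one feature and measure how much the prediction moves. The attention-gated toy model below is a stand-in for the paper's attentional network, not its actual code.

    import torch
    import torch.nn as nn

    class AttentionGatedToy(nn.Module):
        def __init__(self, d):
            super().__init__()
            self.attn = nn.Sequential(nn.Linear(d, d), nn.Sigmoid())   # soft attention over features
            self.out = nn.Linear(d, 1)

        def forward(self, x, attn_mask=None):
            a = self.attn(x)
            if attn_mask is not None:                        # multiplicative mask switches features off
                a = a * attn_mask
            return torch.sigmoid(self.out(a * x)).squeeze(-1)

    model = AttentionGatedToy(d=10)
    x = torch.randn(1, 10)                                   # one instance
    with torch.no_grad():
        base = model(x)
        psi = []
        for d in range(10):
            mask = torch.ones(1, 10)
            mask[0, d] = 0.0                                 # counterfactual: ignore feature d
            psi.append((base - model(x, mask)).abs().item()) # prediction shift when the feature is dropped
    ranking = sorted(range(10), key=lambda d: -psi[d])       # most influential features first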
39-41. Experimental Setting: Datasets
We use electronic health records, real-estate sales transaction records, and exercise squat posture correction records, covering classification and regression tasks.
1. EHR datasets (binary classification): 1) cerebral infarction, 2) cardiovascular disease (CVD), 3) heart failure.
2. Real-estate dataset (regression): housing price forecasting.
3. Squat posture dataset (multi-label classification): squat posture correction.
42-44. Attention Evaluation Interface
Domain experts evaluate the delivered attentions α and β via binary attention annotation masks M_t^α (over timesteps) and M_t^β (over features at each timestep).
[Screenshots: the attention annotation platform for (42) the risk prediction task with counterfactual estimation, (43) real-estate price forecasting in New York City, and (44) the action posture correction task.]
45-47. Experiment Results
We conducted experiments on three risk prediction tasks (EHR), one fitness squat task, and one real-estate forecasting task. Lower is better for Real Estate Forecasting (a regression error); higher is better for the other columns.

                        Heart Failure   Cerebral Infarction   CVD             Fitness Squat   Real Estate Forecasting
One-time Training
  RETAIN                0.6069 ± 0.01   0.6394 ± 0.02         0.6018 ± 0.02   0.8425 ± 0.03   0.2136 ± 0.01
  Random-RETAIN         0.5952 ± 0.02   0.6256 ± 0.02         0.5885 ± 0.01   0.8221 ± 0.05   0.2140 ± 0.01
  IF-RETAIN             0.6134 ± 0.03   0.6422 ± 0.02         0.5882 ± 0.02   0.8363 ± 0.03   0.2049 ± 0.01
Random Re-ranking
  Random-UA             0.6231 ± 0.03   0.6491 ± 0.01         0.6112 ± 0.02   0.8521 ± 0.02   0.2222 ± 0.02
  Random-NAP            0.6414 ± 0.01   0.6674 ± 0.02         0.6284 ± 0.01   0.8525 ± 0.01   0.2061 ± 0.01
IAL (Cost-effective)
  AILA                  0.6363 ± 0.03   0.6602 ± 0.03         0.6193 ± 0.02   0.8425 ± 0.01   0.2119 ± 0.01
  IAL-NAP               0.6612 ± 0.02   0.6892 ± 0.03         0.6371 ± 0.02   0.8689 ± 0.01   0.1835 ± 0.01

• Random-UA, which is retrained with human attention-level supervision on randomly selected samples, performs worse than Random-NAP.
• IAL-NAP significantly outperforms Random-NAP, showing that attention annotation has little effect on the model when the annotated instances are selected at random.
48-50. Experiment Results
Ablation study of IAL-NAP with different instance-level and feature-level reranking scores on all tasks.

Instance-level        Feature-level         Heart Failure   Cerebral Infarction   CVD             Fitness Squat   Real Estate Forecasting
Influence Function    Uncertainty           0.6563 ± 0.01   0.6821 ± 0.02         0.6308 ± 0.02   0.8712 ± 0.01   0.1921 ± 0.01
Influence Function    Influence Function    0.6514 ± 0.02   0.6825 ± 0.01         0.6329 ± 0.03   0.8632 ± 0.01   0.1865 ± 0.02
Influence Function    Counterfactual        0.6592 ± 0.02   0.6921 ± 0.03         0.6379 ± 0.02   0.8682 ± 0.01   0.1863 ± 0.02
Uncertainty           Counterfactual        0.6612 ± 0.01   0.6892 ± 0.03         0.6371 ± 0.02   0.8689 ± 0.02   0.1835 ± 0.02

• For instance-level scoring, the influence and uncertainty scores perform similarly, while the counterfactual score was the most effective for feature-wise reranking.
• The uncertainty (instance-level) + counterfactual (feature-level) combination is the most cost-effective, since it avoids the expensive computation of Hessians.
51-53. Effect of Neural Attention Processes
Retraining time required to incorporate the human-annotated examples, and mean response time of the human annotations, on the risk prediction tasks.
[Plots: (a) Heart Failure, (b) Cerebral Infarction, (c) CVD.]
54. Effect of Cost-Effective Reranking
Change in accuracy with 100 annotations across four rounds (s) for IAL-NAP (blue) vs. Random-NAP (red).
[Plots: (a) Heart Failure, (b) Cerebral Infarction, (c) CVD, (d) Squat.]
• IAL-NAP needs fewer annotated examples (100) than Random-NAP (400) to improve the model to a comparable accuracy (AUC 0.6414).
55-56. Qualitative Analysis: Risk Prediction
We further analyze the contribution of each feature for a CVD patient (label = 1). Features shown: Age; Smoking (whether the patient smokes); SysBP (systolic blood pressure); HDL (high-density lipoprotein); LDL (low-density lipoprotein).
[Plots: the patient's feature attentions for (a) the pretrained model, (b) s=1, (c) s=2.]
• At s=3, IAL allocated more attention to the important feature (Smoking), which the initially trained model had failed to attend to.
• Clinicians guided the model to learn this, since smoking is a key factor in assessing CVD risk.
57-59. Summary
We propose a novel interactive learning framework that iteratively updates the model by interacting with the human supervisor via the generated attentions.
• Unlike conventional active learning, IAL lets human annotators "actively" interpret and manipulate the model's behavior and see the effect. IAL allows online learning without retraining the main network by training a separate attention generator.
• Neural Attention Processes (NAP) is a novel attention mechanism that can generate attentions on unlabeled instances given a few labeled samples, and can incorporate newly labeled instances without retraining or overfitting.
• Our reranking strategy re-ranks instances and features, substantially reducing annotation cost and time for high-dimensional inputs.
60. Thanks
