This document proposes two interpretability evaluation frameworks: ROAR (RemOve And Retrain) and KAR (Keep And Retrain). ROAR evaluates a feature-importance estimator by removing the features a trained model's saliency map ranks most important from every input, regenerating the training and test sets, and retraining the model; a sharp drop in retrained accuracy indicates the estimator identified genuinely informative features. KAR is the complement: it keeps only the salient features, removes the rest, and retrains. The document first reviews common interpretability methods such as gradients, integrated gradients, and ensembling methods, then explains the ROAR and KAR mechanisms with example results, concluding that the frameworks can validate feature-importance estimates and model reliability.
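The masking step at the core of both procedures can be sketched as follows. This is a minimal illustration, not the document's implementation: the function name `roar_mask`, the per-feature-mean fill value, and the array shapes are all assumptions made for the example.

```python
import numpy as np

def roar_mask(X, saliency, fraction, keep=False):
    """Mask features per example based on a saliency ranking.

    With keep=False (ROAR), the top `fraction` most salient features are
    replaced by an uninformative value (here, the per-feature mean over
    the dataset). With keep=True (KAR), everything *except* the top
    features is replaced instead.

    X        -- (n_examples, n_features) input matrix
    saliency -- (n_examples, n_features) importance scores
    """
    fill = X.mean(axis=0)                      # per-feature mean, shape (n_features,)
    k = int(round(fraction * X.shape[1]))      # number of features ranked "salient"
    order = np.argsort(-saliency, axis=1)      # columns sorted by descending saliency
    # ROAR masks the top-k columns; KAR masks the remaining columns.
    cols = order[:, k:] if keep else order[:, :k]
    rows = np.arange(X.shape[0])[:, None]
    X_masked = X.astype(float).copy()
    X_masked[rows, cols] = fill[cols]          # overwrite with uninformative values
    return X_masked
```

In a full evaluation, this masking would be applied at several fractions to both the training and test sets, with the model retrained from scratch on each masked training set; the retraining step is what distinguishes ROAR/KAR from simply occluding inputs at test time.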