Retrained Model Evaluation - Research Metrics Explorer

Contextuality

Desiderata

Fidelity

Explanation Type

References:

Guo et al. (2019), Hooker et al. (2019), Cheng et al. (2020), Han et al. (2020), Schiller et al. (2020), Hemamou et al. (2021), Shah et al. (2021), Li et al. (2022), Khalane et al. (2023), Raval et al. (2023)

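A key limitation of perturbation-based evaluation methods is that altering features can introduce a distribution shift: the model is queried on inputs unlike those it was trained on, so a drop in performance may reflect this shift rather than the loss of relevant information. To mitigate this, two related strategies retrain models on the perturbed datasets.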
The first approach is best known as Remove and Retrain (ROAR), introduced by [Hooker et al. (2019)], with related variants proposed by [Han et al. (2020)] and [Shah et al. (2021)]. It builds on the perturbation strategies of Metric “Guided Perturbation Fidelity”, but ROAR removes features (e.g., the most important ones according to the explanans) from the training data and then retrains the model from scratch on the altered dataset. In Hooker et al.'s image experiments, features are removed by replacing the selected pixels with an uninformative value (the per-channel mean). The drop in performance of this retrained model, compared with the original model, is then used to infer the quality of the explanation: a faithful explanans should identify features whose removal severely degrades performance.
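The ROAR procedure can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not Hooker et al.'s implementation: a nearest-centroid classifier stands in for the model, attributions are assumed to be given per instance as one score per feature, and the k highest-scoring features of each training and test instance are replaced with the per-feature training mean before retraining. All function names are illustrative, and the sketch uses a single k without the repeated runs and baselines a full benchmark would need.

```python
from statistics import mean

# Hypothetical stand-in for "the model": a nearest-centroid classifier,
# chosen only because it retrains instantly and needs no libraries.
def fit_centroids(X, y):
    """Return one mean feature vector per class label."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def accuracy(centroids, X, y):
    """Fraction of instances whose nearest centroid (squared Euclidean) has the true label."""
    def predict(x):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

def remove_top_k(X, attributions, k, fill):
    """Replace each instance's k highest-attributed features with an uninformative fill value."""
    removed = []
    for x, a in zip(X, attributions):
        top = set(sorted(range(len(x)), key=lambda j: a[j], reverse=True)[:k])
        removed.append([fill[j] if j in top else v for j, v in enumerate(x)])
    return removed

def roar_drop(X_train, y_train, X_test, y_test, attr_train, attr_test, k):
    """Accuracy of the original model minus accuracy of the model retrained on ablated data."""
    fill = [mean(col) for col in zip(*X_train)]  # per-feature training mean
    original = accuracy(fit_centroids(X_train, y_train), X_test, y_test)
    X_train_ablated = remove_top_k(X_train, attr_train, k, fill)
    X_test_ablated = remove_top_k(X_test, attr_test, k, fill)
    retrained = fit_centroids(X_train_ablated, y_train)  # retrain from scratch
    return original - accuracy(retrained, X_test_ablated, y_test)

# Toy data: feature 0 separates the classes, feature 1 is noise.
X_train = [[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]]
y_train = [0, 0, 1, 1]
X_test = [[0.05, 0.5], [0.95, 0.5]]
y_test = [0, 1]

faithful = roar_drop(X_train, y_train, X_test, y_test, [[1.0, 0.0]] * 4, [[1.0, 0.0]] * 2, k=1)
unfaithful = roar_drop(X_train, y_train, X_test, y_test, [[0.0, 1.0]] * 4, [[0.0, 1.0]] * 2, k=1)
print(faithful, unfaithful)  # 0.5 0.0
```

On this toy data, an explanation that ranks the informative feature highest causes a 0.5 accuracy drop after retraining, while one that ranks the noise feature highest causes none, which is the pattern ROAR reads as evidence of (un)faithfulness.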
The second strategy leverages explanations for knowledge distillation. Here, various authors train a new model on a dataset reduced to only the features deemed important by the explanation [Guo et al. (2019), Cheng et al. (2020), Schiller et al. (2020), Hemamou et al. (2021), Li et al. (2022), Khalane et al. (2023), Raval et al. (2023)].
Evaluation follows the general structure of Metric “Guided Perturbation Fidelity”, measuring how much predictive performance is retained when only the relevant features are used.
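The distillation-style variant can likewise be sketched under explicit assumptions. Here per-instance attributions are aggregated into a global ranking by mean absolute value, only the top-k features are kept, and a fresh model is trained on the reduced data; the score is the fraction of the full model's test accuracy that the reduced model retains. The 1-nearest-neighbour classifier, the aggregation rule, the accuracy ratio, and all function names are hypothetical choices for illustration; the cited works use their own models and metrics.

```python
from statistics import mean

def nn_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a 1-nearest-neighbour classifier (squared Euclidean distance)."""
    def predict(x):
        nearest = min(range(len(X_train)),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
        return y_train[nearest]
    return sum(predict(x) == t for x, t in zip(X_test, y_test)) / len(y_test)

def select_features(attributions, k):
    """Indices of the k features with the largest mean absolute attribution (hypothetical aggregation)."""
    scores = [mean(abs(v) for v in col) for col in zip(*attributions)]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def keep_columns(X, keep):
    """Project every instance onto the selected feature indices."""
    return [[x[j] for j in keep] for x in X]

def retained_performance(X_train, y_train, X_test, y_test, attributions, k):
    """Ratio of the reduced model's accuracy to the full model's accuracy, plus the kept features."""
    full = nn_accuracy(X_train, y_train, X_test, y_test)
    keep = select_features(attributions, k)
    reduced = nn_accuracy(keep_columns(X_train, keep), y_train,
                          keep_columns(X_test, keep), y_test)
    return (reduced / full if full > 0 else 0.0), keep

# Toy data: feature 0 separates the classes, feature 1 is noise.
X_train = [[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]]
y_train = [0, 0, 1, 1]
X_test = [[0.05, 0.5], [0.95, 0.5]]
y_test = [0, 1]
faithful_attr = [[0.9, 0.1], [0.8, -0.2], [0.7, 0.1], [0.9, 0.0]]
unfaithful_attr = [[0.1, 0.9], [0.2, 0.8], [0.1, 0.7], [0.0, 0.9]]

print(retained_performance(X_train, y_train, X_test, y_test, faithful_attr, k=1))    # (1.0, [0])
print(retained_performance(X_train, y_train, X_test, y_test, unfaithful_attr, k=1))  # (0.5, [1])
```

Keeping the feature the faithful explanation ranks highest retains all of the original accuracy, whereas keeping the noise feature retains only half; unlike ROAR, a higher score here indicates a better explanation.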
Although ROAR and distillation-based setups are valuable for benchmarking explanation methods in a setting less affected by distribution shift, they do not directly assess the original explanans for a specific instance, because the model being evaluated is a retrained one rather than the model that was explained. Instead, they evaluate the utility of the explanation method by measuring the average informativeness of the features it selects.