References:
Krishnan and Wu (2017), Guo et al. (2020)
To evaluate the fidelity of ExEs, the model is retrained on a dataset modified according to the explanation, and the effect of these modifications on the model's predictions is used to assess explanatory quality.
[Guo et al. (2020)] remove the identified influential training points from the dataset, retrain the model, and measure the change in loss for the explanandum. If the removed instances were truly helpful, model performance should degrade. Conversely, if they were misleading, removal should lead to an improvement. Thus, a greater change in loss signals a more correct and complete set of influential instances.
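This removal-and-retrain check can be sketched as follows. The "influential" set below is a placeholder (nearest neighbours of the explanandum); in practice it would come from whatever ExE method is being evaluated, and the model and loss are illustrative choices, not those of Guo et al.

```python
# Hedged sketch of the removal-and-retrain fidelity check.
# The influence scores are a stand-in: any ExE method that
# ranks training points can supply the `influential` indices.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

x_test, y_test = X[:1], y[:1]          # the explanandum
X_train, y_train = X[1:], y[1:]

def loss_on_explanandum(Xtr, ytr):
    model = LogisticRegression().fit(Xtr, ytr)
    proba = model.predict_proba(x_test)
    return log_loss(y_test, proba, labels=[0, 1])

# Placeholder "explanation": treat the 20 training points
# nearest to the explanandum as the influential instances.
dists = np.linalg.norm(X_train - x_test, axis=1)
influential = np.argsort(dists)[:20]

base = loss_on_explanandum(X_train, y_train)
keep = np.setdiff1d(np.arange(len(X_train)), influential)
ablated = loss_on_explanandum(X_train[keep], y_train[keep])

# A larger absolute change in loss after removal suggests a
# more correct and complete set of influential instances.
print(f"loss change after removal: {ablated - base:+.4f}")
```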
In an alternative approach, [Krishnan and Wu (2017)] retain the identified influential instances but randomly flip the labels of all remaining training examples before retraining. If the explanation captures all relevant information, the prediction should remain stable. Hence, lower prediction variance after retraining indicates a more complete and accurate explanans.
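The label-flip stability check can be sketched in the same way. Again the "influential" set is a placeholder stand-in for an ExE method's output, and the model, flip probability, and number of retraining trials are illustrative assumptions rather than the exact setup of Krishnan and Wu.

```python
# Hedged sketch of the label-flip stability check: keep the
# influential instances' labels, randomly flip the rest,
# retrain repeatedly, and measure prediction variance.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)

x_test = X[:1]                         # the explanandum
X_train, y_train = X[1:], y[1:]

# Placeholder explanation: 30 nearest neighbours of x_test.
dists = np.linalg.norm(X_train - x_test, axis=1)
influential = set(np.argsort(dists)[:30])

preds = []
for trial in range(10):
    y_mod = y_train.copy()
    for i in range(len(y_mod)):
        # flip each non-influential label with probability 0.5
        if i not in influential and rng.random() < 0.5:
            y_mod[i] = 1 - y_mod[i]
    model = LogisticRegression().fit(X_train, y_mod)
    preds.append(model.predict_proba(x_test)[0, 1])

# Low variance across retrains suggests the explanans captured
# the information the prediction actually depends on.
print(f"prediction variance: {np.var(preds):.6f}")
```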

