Prediction Validity - Research Metrics Explorer

Prediction Validity

Contextuality

Desiderata

Fidelity

Explanation Type

ExE

References:

Wachter et al. (2017), Karlsson et al. (2018), Dhurandhar et al. (2019), Guidotti et al. (2019), Mahajan et al. (2019), Dandl et al. (2020), Le et al. (2020), Molnar (2020), Mothilal et al. (2020), Nguyen and Martínez (2020), Pedapati et al. (2020), Pawelczyk et al. (2021), Rasouli and Yu (2021), Ma et al. (2022), Tan et al. (2022), Vermeire et al. (2022), Guidotti (2024), Verma et al. (2024)


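By definition, a counterfactual must receive a different prediction from the original input. For untargeted counterfactuals, this simply requires $\hat{y}_z \neq \hat{y}_x$, while targeted counterfactuals must satisfy $\hat{y}_z = y^*_z$, where $\hat{y}_x$ and $\hat{y}_z$ denote the model's predictions for the original input $x$ and the counterfactual $z$, and $y^*_z$ is the desired target class [Wachter et al. (2017), Molnar (2020), Guidotti (2024)].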
Most authors assess the fraction of generated counterfactuals that meet this condition [Karlsson et al. (2018), Dhurandhar et al. (2019), Guidotti et al. (2019), Mahajan et al. (2019), Le et al. (2020), Mothilal et al. (2020), Pedapati et al. (2020), Pawelczyk et al. (2021), Ma et al. (2022), Tan et al. (2022), Vermeire et al. (2022), Verma et al. (2024)]. To capture more nuance, some authors instead propose measuring the model's confidence in the target class for the counterfactual [Rasouli and Yu (2021)], or using a continuous loss between the predicted and target class probabilities, such as the $L_1$ distance [Dandl et al. (2020)].
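
The three measurements described above can be sketched as follows. This is a minimal illustration and not the implementation of any cited paper: the function names (`validity_rate`, `target_confidence`, `l1_to_target`) are hypothetical, predictions are assumed to be integer class labels, `proba_z` is assumed to be an array of per-class probabilities for each counterfactual, and the target distribution for the $L_1$ loss is assumed to be a one-hot vector on the target class.

```python
import numpy as np


def validity_rate(pred_x, pred_z, target=None):
    """Fraction of counterfactuals whose prediction satisfies the validity condition.

    Untargeted (target is None): the prediction must differ from the original.
    Targeted: the prediction must equal the requested target class.
    """
    pred_x = np.asarray(pred_x)
    pred_z = np.asarray(pred_z)
    if target is None:
        valid = pred_z != pred_x
    else:
        valid = pred_z == np.asarray(target)
    return float(np.mean(valid))


def target_confidence(proba_z, target):
    """Predicted probability of the target class for each counterfactual."""
    proba_z = np.asarray(proba_z, dtype=float)
    target = np.asarray(target)
    return proba_z[np.arange(len(target)), target]


def l1_to_target(proba_z, target):
    """L1 distance between predicted probabilities and a one-hot target (assumed form)."""
    proba_z = np.asarray(proba_z, dtype=float)
    one_hot = np.eye(proba_z.shape[1])[np.asarray(target)]
    return np.abs(proba_z - one_hot).sum(axis=1)


# Toy example with four inputs and a binary classifier (made-up values).
pred_x = [0, 0, 1, 1]
pred_z = [1, 0, 0, 0]
target = [1, 1, 0, 1]
proba_z = [[0.2, 0.8], [0.6, 0.4], [0.9, 0.1], [0.7, 0.3]]

untargeted = validity_rate(pred_x, pred_z)
targeted = validity_rate(pred_x, pred_z, target)
confidence = target_confidence(proba_z, target)
l1 = l1_to_target(proba_z, target)
```

The binary rate treats a counterfactual that barely crosses the decision boundary the same as one predicted with high confidence, which is the nuance the continuous measures are meant to recover.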
For factual explanations, the class should remain unchanged, i.e., $\hat{y}_z = \hat{y}_x$, which can likewise be verified either as a binary check [Dhurandhar et al. (2019)] or with loss-based similarity measures [Nguyen and Martínez (2020)].
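
For the factual case, a comparable sketch is given below. The function names are hypothetical, and the mean absolute gap between predicted probabilities is only an assumed stand-in for a loss-based similarity measure; the cited work may define its measure differently.

```python
import numpy as np


def factual_validity_rate(pred_x, pred_z):
    """Fraction of factual explanations that keep the original predicted class."""
    return float(np.mean(np.asarray(pred_x) == np.asarray(pred_z)))


def mean_abs_probability_gap(proba_x, proba_z):
    """Per-instance mean absolute difference between predicted class probabilities.

    Assumed stand-in for a loss-based similarity measure; 0 means identical outputs.
    """
    proba_x = np.asarray(proba_x, dtype=float)
    proba_z = np.asarray(proba_z, dtype=float)
    return np.abs(proba_x - proba_z).mean(axis=1)


# Toy example (made-up values): the third explanation changes the predicted class.
rate = factual_validity_rate([0, 1, 1], [0, 1, 0])
gap = mean_abs_probability_gap([[0.9, 0.1]], [[0.7, 0.3]])
```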

(Counter-)Factual Relevance

Sufficiency