References:
Ancona et al. (2017), Shrikumar et al. (2017), Sundararajan et al. (2017), Alvarez-Melis and Jaakkola (2018a), Arya et al. (2019), Cheng et al. (2019), Yeh et al. (2019), Zhang et al. (2019b), Bhatt et al. (2020), Elkhawaga et al. (2023)
Some of the most influential XAI metrics evaluate how well an explanans aligns with observed model behavior when the input is perturbed. Together, these metrics assess whether the explanation faithfully captures how the model responds to its inputs.
This includes Sensitivity-n from [Ancona et al. (2017)], Infidelity from [Yeh et al. (2019)], and Faithfulness presented by [Alvarez-Melis and Jaakkola (2018a)] and [Arya et al. (2019)]. In these approaches, input perturbations are introduced either randomly across all features [Yeh et al. (2019)], by zeroing features individually or in small groups [Alvarez-Melis and Jaakkola (2018a), Arya et al. (2019), Cheng et al. (2019), Bhatt et al. (2020)], or by manipulating subsets of fixed size [Ancona et al. (2017)]. The model's change in prediction is then compared to the explanans, using different strategies: directly against the raw attribution vector, against a version scaled by perturbation magnitude, or by summing attributions of the changed features. The deviation is measured using standard metrics like Pearson correlation [Ancona et al. (2017), Alvarez-Melis and Jaakkola (2018a), Arya et al. (2019), Cheng et al. (2019), Bhatt et al. (2020)] or mean squared error [Yeh et al. (2019)].
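The perturbation-and-correlate scheme above can be sketched in a few lines. This is a minimal illustration, not an implementation of any specific cited metric: it uses a toy linear model, gradient-times-input attributions (which are exact for linear models), single-feature zeroing as the perturbation, and Pearson correlation as the deviation measure. All names here (`model`, `attribution`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)            # weights of a toy linear model

def model(x):
    return float(w @ x)

x = rng.normal(size=5)
attribution = w * x               # gradient * input; exact for a linear model

# Zero each feature individually and record the resulting output change.
deltas = []
for i in range(len(x)):
    x_pert = x.copy()
    x_pert[i] = 0.0
    deltas.append(model(x) - model(x_pert))
deltas = np.array(deltas)

# Compare the prediction changes to the raw attribution vector
# via Pearson correlation.
corr = np.corrcoef(attribution, deltas)[0, 1]
```

For this linear model the output change from zeroing feature *i* equals the attribution of feature *i*, so the correlation is exactly 1; for nonlinear models and approximate attribution methods, the correlation quantifies how faithfully the explanans tracks model behavior.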
Complementing these are completeness-based approaches such as Completeness [Sundararajan et al. (2017)] and Summation-to-Delta [Shrikumar et al. (2017)], which assume that the explanans must fully account for the model's behavior. These compare the difference in output between an instance and a baseline (e.g., a fully perturbed input) with the sum of the attributions across changed features. Ideally, the two should match exactly. For practical purposes, however, the relative deviation between attribution sum and output delta can be used as a softer criterion.
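The completeness idea can be sketched in the same toy setting, again assuming a linear model where gradient-times-input attributions are exact. The relative deviation at the end is the softer criterion mentioned above; the all-zeros baseline stands in for a fully perturbed input.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=4)            # weights of a toy linear model

def model(x):
    return float(w @ x)

x = rng.normal(size=4)
baseline = np.zeros_like(x)       # fully perturbed input

attribution = w * (x - baseline)  # exact attributions for a linear model

# Completeness: the attribution sum should match the output delta
# between the instance and the baseline.
output_delta = model(x) - model(baseline)
attr_sum = float(attribution.sum())

# In practice, report the relative deviation as a softer criterion.
rel_dev = abs(attr_sum - output_delta) / max(abs(output_delta), 1e-12)
```

Here the match is exact up to floating-point error; for approximate attribution methods, `rel_dev` measures how far the explanans falls short of fully accounting for the model's behavior.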

