Output Faithfulness
Contextuality: II
Desiderata: Fidelity
Explanation Type: WBS
References:
Andrews et al. (1995), Craven and Shavlik (1995), Stefanowski and Vanderpooten (2001), Barakat et al. (2010), Augasta and Kathirvalavakumar (2012), Zilke et al. (2016), Bastani et al. (2017), Krishnan and Wu (2017), Lakkaraju et al. (2017), Guo et al. (2018a), Laugel et al. (2018), Peake and Wang (2018), Plumb et al. (2018), Tan et al. (2018), Wu et al. (2018b), Zhang et al. (2018b), Chen et al. (2019b), Guidotti et al. (2019), Kanehira and Harada (2019), Lakkaraju et al. (2019), Zhou et al. (2019), Anders et al. (2020), Hatwell et al. (2020), Lakkaraju et al. (2020), Panigutti et al. (2020), Pedapati et al. (2020), Rajapaksha et al. (2020), Rawal and Lakkaraju (2020), Amparore et al. (2021), Chen et al. (2021), Moradi and Samwald (2021), Pornprasit et al. (2021), Bayrak and Bach (2023a), Bo et al. (2024)
A surrogate model should closely mimic the behavior of the original black-box model. Therefore, a common evaluation strategy is to compare the outputs of the surrogate and black-box models using a similarity or performance measure.
This can be assessed with standard measures such as accuracy, F1-score, MSE, or Lp distances [Andrews et al. (1995), Craven and Shavlik (1995), Barakat et al. (2010), Augasta and Kathirvalavakumar (2012), Zilke et al. (2016), Bastani et al. (2017), Krishnan and Wu (2017), Lakkaraju et al. (2017), Guo et al. (2018a), Laugel et al. (2018), Plumb et al. (2018), Tan et al. (2018), Zhang et al. (2018b), Guidotti et al. (2019), Kanehira and Harada (2019), Lakkaraju et al. (2019), Zhou et al. (2019), Anders et al. (2020), Lakkaraju et al. (2020), Panigutti et al. (2020), Pornprasit et al. (2021), Bayrak and Bach (2023a)], or with similarity measures such as SSIM, Pearson correlation, or KL divergence [Chen et al. (2019b), Anders et al. (2020)]. Arbitrary loss functions may also be used [Amparore et al. (2021)].
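As a minimal illustration of this evaluation strategy, fidelity can be computed as the agreement rate between black-box and surrogate predictions, i.e. accuracy with the black-box outputs as the reference labels. The function name and the toy predictions below are illustrative, not taken from any cited work:

```python
import numpy as np

def output_fidelity(black_box_preds, surrogate_preds):
    """Fraction of inputs on which the surrogate agrees with the
    black box: fidelity measured as accuracy, with the black-box
    predictions serving as reference labels."""
    bb = np.asarray(black_box_preds)
    sg = np.asarray(surrogate_preds)
    return float(np.mean(bb == sg))

# Hypothetical predictions on five test inputs:
bb = [1, 0, 1, 1, 0]
sg = [1, 0, 0, 1, 0]
print(output_fidelity(bb, sg))  # → 0.8
```

For regression surrogates, the same comparison would use MSE or an Lp distance between the two models' outputs instead of the agreement rate.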
For imbalanced datasets, Moradi and Samwald (2021) calculate fidelity per class, using either the true labels or the black-box predictions as reference. For individual rules, fidelity may be expressed as the rule's precision [Stefanowski and Vanderpooten (2001), Hatwell et al. (2020)].
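The per-class variant can be sketched as follows: agreement is computed separately within each class predicted by the black box, so that a dominant majority class cannot mask poor mimicry of minority classes. This is a simplified sketch of the idea, not the exact procedure of Moradi and Samwald (2021):

```python
import numpy as np

def per_class_fidelity(black_box_preds, surrogate_preds):
    """Agreement rate computed separately for each class assigned
    by the black box (used here as the reference labels)."""
    bb = np.asarray(black_box_preds)
    sg = np.asarray(surrogate_preds)
    return {int(c): float(np.mean(sg[bb == c] == c)) for c in np.unique(bb)}

# Imbalanced toy example: overall fidelity is 0.6, but the
# surrogate never reproduces the minority class 1.
bb = [0, 0, 0, 0, 1]
sg = [0, 0, 0, 1, 0]
print(per_class_fidelity(bb, sg))  # → {0: 0.75, 1: 0.0}
```

Swapping the true labels in for `black_box_preds` gives the other reference choice mentioned above.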
When evaluating local surrogates, Local Output Fidelity is defined by computing fidelity within a neighborhood of the input instance [Laugel et al. (2018), Plumb et al. (2018), Guidotti et al. (2019), Rajapaksha et al. (2020)], or within a synthetic neighborhood [Panigutti et al. (2020), Pornprasit et al. (2021)]. For unseen data, Lakkaraju et al. (2020) validate explanations by comparing outputs to those of the nearest training instances.
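One way to realize the synthetic-neighborhood variant is to perturb the instance with Gaussian noise and measure agreement on those samples. The Gaussian sampling scheme, the `sigma` parameter, and the toy threshold models below are illustrative assumptions, not the specific neighborhood constructions of the cited works:

```python
import numpy as np

def local_output_fidelity(black_box, surrogate, x, n_samples=500,
                          sigma=0.1, seed=0):
    """Fidelity restricted to a synthetic Gaussian neighborhood of x.

    `black_box` and `surrogate` are callables mapping a batch of
    inputs (n_samples, n_features) to class predictions; `sigma`
    controls the neighborhood size.
    """
    rng = np.random.default_rng(seed)
    neighborhood = x + sigma * rng.standard_normal((n_samples, x.shape[0]))
    return float(np.mean(black_box(neighborhood) == surrogate(neighborhood)))

# Toy models: the black box thresholds the first feature at 0.0,
# the surrogate at 0.05 — they disagree only in a narrow band.
bb = lambda X: (X[:, 0] > 0.0).astype(int)
sg = lambda X: (X[:, 0] > 0.05).astype(int)

x_far = np.array([1.0, 0.0])   # far from the disagreement band
print(local_output_fidelity(bb, sg, x_far))   # near 1.0
x_near = np.array([0.0, 0.0])  # on the disagreement band
print(local_output_fidelity(bb, sg, x_near))  # noticeably below 1.0
```

This illustrates why local fidelity can diverge from global fidelity: the same surrogate is near-perfect around instances far from its disagreement region but degrades for instances close to it.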