Mutual Coherence
Contextuality: II
Desiderata: Plausibility (Fidelity)
Explanation Type: FA (ExE) (CE) (WBS) (NLE)
References:
Selvaraju et al. (2017), Guo et al. (2018b), Ancona et al. (2019), Fernando et al. (2019), Fusco et al. (2019), Jain and Wallace (2019), Zhang et al. (2019a), Marques-Silva et al. (2020), Nguyen and Martínez (2020), Wang et al. (2020b), Warnecke et al. (2020), Graziani et al. (2021), Malik et al. (2021), Rajbahadur et al. (2021), Krishna et al. (2022), Mercier et al. (2022), Jin et al. (2023), Duan et al. (2024), Tekkesinoglu and Pudas (2024)
Several authors propose to evaluate their explanations by comparing them against other XAI methods that are assumed to produce trustworthy or well-understood outputs. Although this has only been reported for FAs, the approach is applicable to any explanation type, provided a suitable reference XAI method is available and an appropriate similarity metric can be defined.
A common reference is the Shapley value framework (see [Shapley (1953)]), often operationalized via SHAP (see [Lundberg (2017)]) due to its solid theoretical grounding. FAs are frequently compared to SHAP estimates using similarity measures [Ancona et al. (2019), Zhang et al. (2019a), Malik et al. (2021), Jin et al. (2023), Tekkesinoglu and Pudas (2024)]. In the context of saliency maps, Image Occlusion (see [Zeiler and Fergus (2013)]) is similarly treated as a proxy ground truth [Selvaraju et al. (2017)]. Comparisons against further explanation methods are also reported in the literature [Guo et al. (2018b), Fernando et al. (2019), Fusco et al. (2019), Marques-Silva et al. (2020), Nguyen and Martínez (2020)].
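In practice, such a comparison reduces to computing a similarity score between two attribution vectors for the same instance. A minimal sketch, using cosine similarity as the metric; the attribution scores below are hypothetical illustrative values, not output of any specific method:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature attribution vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical attributions for one instance from two methods.
fa_scores = [0.8, 0.1, -0.3, 0.05]    # e.g. a gradient-based FA
shap_scores = [0.7, 0.2, -0.25, 0.0]  # e.g. SHAP estimates

sim = cosine_similarity(fa_scores, shap_scores)  # close to 1.0 = high agreement
```

Any other similarity measure (rank correlation, top-k overlap, etc.) can be dropped in place of the cosine.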
Other works compute similarity across multiple explanation methods, assuming that high agreement between methods reflects convergence toward a reliable explanans [Jain and Wallace (2019), Wang et al. (2020b), Warnecke et al. (2020), Graziani et al. (2021), Rajbahadur et al. (2021), Krishna et al. (2022), Mercier et al. (2022)]. Any suitable similarity measure may be applied (see Similarity Measures for an overview).
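The multi-method agreement idea can be sketched as an average pairwise rank correlation across all methods. The attribution vectors below are hypothetical, and Spearman's formula is applied without tie handling for brevity:

```python
from itertools import combinations

def rank(values):
    """Rank positions of values (0 = smallest); ties not handled for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def spearman(a, b):
    """Spearman rank correlation (no-ties formula)."""
    ra, rb = rank(a), rank(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def mean_pairwise_agreement(explanations):
    """Average pairwise rank correlation over all method pairs."""
    pairs = list(combinations(explanations, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)

# Hypothetical attribution vectors from three methods on one instance.
methods = [
    [0.9, 0.2, -0.1, 0.4],
    [0.8, 0.1, -0.2, 0.5],
    [0.3, 0.7, 0.0, 0.6],
]
score = mean_pairwise_agreement(methods)  # high score = methods converge
```

A high score is read as convergence toward a reliable explanans, with the caveats discussed below.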
However, this evaluation approach is not without criticism. [Kumar et al. (2020)] caution that validating one explanation method using another may propagate shared biases or assumptions, providing limited evidence for actual correctness.