References:
Bastani et al. (2017), Honegger (2018), Guidotti et al. (2019), Rajapaksha et al. (2020), Warnecke et al. (2020), Amparore et al. (2021), Graziani et al. (2021), Margot and Luta (2021), Velmurugan et al. (2021b), Dai et al. (2022), Vermeire et al. (2022)
This metric evaluates how consistent the explanantia remain when they are generated repeatedly for the same input and model. It is only relevant for nondeterministic explanation methods, since a deterministic method trivially produces identical explanantia on every run.
Some authors assess this axiomatically by counting the fraction of identical explanantia produced across repeated runs [Honegger (2018), Vermeire et al. (2022)]. Others calculate distances or similarity scores between different runs and report the aggregated variation, using general or task-specific measures such as the variance of feature weights or of feature presence [Guidotti et al. (2019), Rajapaksha et al. (2020), Warnecke et al. (2020), Amparore et al. (2021), Graziani et al. (2021), Velmurugan et al. (2021b), Dai et al. (2022)].
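As a concrete illustration, the following minimal Python sketch computes both variants for a nondeterministic explainer that returns a feature-weight vector. The names `explain_fn` and `stability_report`, and the use of exact equality for the axiomatic check, are illustrative assumptions rather than procedures taken from the cited works.

```python
import numpy as np

def identical_fraction(explanations):
    """Axiomatic stability: fraction of repeated runs whose explanans
    is identical to the first run (here: exact weight-vector match)."""
    reference = explanations[0]
    matches = sum(np.array_equal(e, reference) for e in explanations[1:])
    return matches / (len(explanations) - 1)

def mean_weight_variance(explanations):
    """Variation-based stability: per-feature variance of the weights
    across runs, aggregated by averaging over features."""
    weights = np.stack(explanations)  # shape: (runs, features)
    return weights.var(axis=0).mean()

def stability_report(explain_fn, x, runs=10):
    """Run a (hypothetical) nondeterministic explainer `runs` times on
    the same input x and report both stability scores."""
    explanations = [explain_fn(x) for _ in range(runs)]
    return {
        "identical_fraction": identical_fraction(explanations),
        "mean_weight_variance": mean_weight_variance(explanations),
    }
```

For deterministic explainers the report degenerates to an identical fraction of 1.0 and zero variance, which is why the metric is informative only in the nondeterministic case.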
Although most work focuses on local instance-wise explanations, the same principle applies to global explanations, where the similarity between globally constructed explanantia is measured [Bastani et al. (2017), Margot and Luta (2021)].
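For global explanantia that can be represented as sets (for instance, the selected features or extracted rules), the same idea can be sketched as an aggregated pairwise set similarity across repeated runs. The Jaccard measure and the `global_stability` helper below are illustrative choices, not drawn from the cited papers.

```python
import itertools

def jaccard(a, b):
    """Jaccard similarity between two sets of features (or rules)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def global_stability(global_explanations):
    """Mean pairwise Jaccard similarity across repeated global runs."""
    pairs = itertools.combinations(global_explanations, 2)
    scores = [jaccard(a, b) for a, b in pairs]
    return sum(scores) / len(scores)

# Example: three runs of a hypothetical global explainer, each
# returning the set of features it selected.
runs = [{"age", "income"}, {"age", "income"}, {"age", "debt"}]
print(global_stability(runs))  # ≈ 0.56
```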
While we did not identify examples of this metric applied to every explanation type in the literature, the core idea is general and can easily be extended to any XAI algorithm.

