Significance Check - Research Metrics Explorer

Significance Check

Contextuality

Desiderata

Fidelity

Explanation Type

FA, (ExE), CE, (WBS), (NLE)

References:

Samek et al. (2016), Adel et al. (2018), Chen et al. (2018b), Kim et al. (2018), Chen et al. (2019b), DeYoung et al. (2019), Gu et al. (2019), Wickramanayake et al. (2019), Nam et al. (2020), Hemamou et al. (2021), Park and Wallraven (2021), Pornprasit et al. (2021), Agarwal et al. (2022c), Hameed et al. (2022), Bommer et al. (2024)


Statistical significance testing is a common strategy for verifying whether an explanation is meaningfully different from random or naïve baseline explanations. This approach is frequently applied to FAs [Samek et al. (2016), Chen et al. (2018b), Chen et al. (2019b), DeYoung et al. (2019), Gu et al. (2019), Wickramanayake et al. (2019), Nam et al. (2020), Hemamou et al. (2021), Park and Wallraven (2021), Pornprasit et al. (2021), Agarwal et al. (2022c), Hameed et al. (2022), Bommer et al. (2024)].
For CEs, statistical tests are typically used to assess whether the extracted concepts carry significantly more information than randomly sampled concepts. This is done by comparing average relevance or activation scores between the extracted and the random concepts [Adel et al. (2018), Kim et al. (2018)].
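The comparison of average scores described above can be sketched with a one-sided permutation test. This is only an illustration under stated assumptions: the choice of a permutation test, the function name `permutation_test`, and the score values are hypothetical and not taken from the cited works, which may use a different test.

```python
import random
from statistics import mean


def permutation_test(true_scores, random_scores, n_permutations=5000, seed=0):
    """One-sided permutation test: is mean(true_scores) larger than
    mean(random_scores)?

    Returns the observed mean difference and a p-value estimated by
    repeatedly shuffling the pooled scores and re-splitting them.
    """
    rng = random.Random(seed)
    observed = mean(true_scores) - mean(random_scores)
    pooled = list(true_scores) + list(random_scores)
    n_true = len(true_scores)

    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_true]) - mean(pooled[n_true:])
        if diff >= observed:
            exceed += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical relevance scores, made up purely for illustration.
concept_scores = [0.71, 0.64, 0.80, 0.69, 0.75, 0.66, 0.78, 0.73]
random_concept_scores = [0.52, 0.47, 0.58, 0.49, 0.55, 0.44, 0.51, 0.50]

diff, p = permutation_test(concept_scores, random_concept_scores)
print(f"mean difference = {diff:.3f}, p = {p:.4f}")
```

A small p-value suggests that the extracted concepts score higher than random ones by more than chance would explain. A permutation test is used here because it makes no distributional assumptions; a t-test would be a reasonable alternative when the scores are roughly normal.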
In a more advanced variant, [Hemamou et al. (2021)] train a classifier to distinguish between real and synthetic (random) explanations. Classification accuracy well above chance level indicates that the generated explanantia contain statistically meaningful structure.
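The classifier-based idea can be sketched as follows. This is a minimal sketch, not the procedure of the cited work: the nearest-centroid classifier, the exact binomial test against a 50% chance level, the helper names, and the simulated attribution vectors are all assumptions chosen for brevity.

```python
import math
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classifier_test(real, synthetic, train_fraction=0.5, seed=0):
    """Fit a nearest-centroid classifier on a stratified training split and
    return (number of correct held-out predictions, held-out sample count)."""
    rng = random.Random(seed)

    def split(rows):
        rows = list(rows)
        rng.shuffle(rows)
        k = int(len(rows) * train_fraction)
        return rows[:k], rows[k:]

    real_train, real_test = split(real)
    syn_train, syn_test = split(synthetic)
    c_real, c_syn = centroid(real_train), centroid(syn_train)

    def predicts_real(x):
        return sq_dist(x, c_real) < sq_dist(x, c_syn)

    correct = sum(predicts_real(x) for x in real_test)
    correct += sum(not predicts_real(x) for x in syn_test)
    return correct, len(real_test) + len(syn_test)


def binomial_p_value(correct, n, chance=0.5):
    """Exact one-sided p-value: P(X >= correct) for X ~ Binomial(n, chance)."""
    return sum(
        math.comb(n, k) * chance**k * (1 - chance) ** (n - k)
        for k in range(correct, n + 1)
    )


# Simulated data (an assumption): "real" attributions concentrate on the
# first two features, "synthetic" ones are unstructured noise.
rng = random.Random(42)
dim = 6
real = [
    [rng.gauss(1.0, 0.3) if j < 2 else rng.gauss(0.0, 0.3) for j in range(dim)]
    for _ in range(40)
]
synthetic = [[rng.gauss(0.33, 0.3) for _ in range(dim)] for _ in range(40)]

correct, n_test = classifier_test(real, synthetic)
p_value = binomial_p_value(correct, n_test)
print(f"held-out accuracy = {correct / n_test:.2f}, p = {p_value:.3g}")
```

Evaluating on held-out samples matters here: accuracy measured on the training data would overstate how separable real and random explanations are, which would make the significance claim too optimistic.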
Significance tests are mostly reported for FAs and CEs, but the general idea could, in principle, be adapted to other explanation types as well. Doing so may require explanation-specific reformulations of the test.

Mutual Coherence

(Counter-)Factual Relevance