References:
Adebayo et al. (2018), Sanchez-Lengeling et al. (2020)
Explanations should highlight meaningful structures in the data, not artifacts of memorization. To verify this, the training data labels are randomized, forcing the model to fit noise rather than learn semantically relevant features [Adebayo et al. (2018)]. Label randomization can be applied to the full training set [Adebayo et al. (2018)] or a subset only [Sanchez-Lengeling et al. (2020)].
This test compares explanantia generated by a model trained on the randomized data to those from a model trained on correctly labeled data. A strong explanation method should yield low similarity between the two, as the latter explanantia reflect meaningful decision features, while the former do not. Similarity can be computed using SSIM, correlation [Adebayo et al. (2018)], rank-based metrics such as Kendall's Tau [Sanchez-Lengeling et al. (2020)], or other suitable measures (see Similarity Measures).
Although only reported for feature attributions (FAs), this approach can be extended to any explanation type, provided that a suitable similarity measure between explanantia is available.
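The test can be sketched end to end on toy data: train one model on the true labels and one on randomized labels, compute a simple attribution for each input, and compare the two attribution rankings with Kendall's Tau. This is a minimal illustration, not the protocol of the cited papers: the data, the logistic-regression models, and the gradient-style `saliency` attribution (weights times input) are all assumptions chosen to keep the example self-contained.

```python
import numpy as np
from scipy.stats import kendalltau
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: 200 samples, 10 features; only the first 3 features carry signal.
X = rng.normal(size=(200, 10))
y_true = (X[:, :3].sum(axis=1) > 0).astype(int)
y_rand = rng.permutation(y_true)  # randomized labels, as in the test above

def saliency(model, x):
    # Gradient-style attribution for logistic regression: weight * input.
    return model.coef_[0] * x

m_true = LogisticRegression().fit(X, y_true)  # fits meaningful structure
m_rand = LogisticRegression().fit(X, y_rand)  # forced to fit noise

# Per-input rank similarity between the two sets of explanantia;
# a sound attribution method should yield low similarity here.
taus = []
for x in X:
    tau, _ = kendalltau(saliency(m_true, x), saliency(m_rand, x))
    taus.append(tau)
print(f"mean Kendall's Tau: {np.mean(taus):.2f}")
```

Swapping `kendalltau` for SSIM or plain correlation reproduces the other similarity choices mentioned above; only the comparison function changes, not the test itself.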

