Output Similarity - Research Metrics Explorer

Output Similarity

Contextuality

Desiderata

Plausibility

Explanation Type

ExE

References:

Plumb et al. (2020)

To ensure plausibility, counterfactuals should yield an output distribution similar to that of real instances of the target class. [Plumb et al. (2020)] evaluate whether each counterfactual z matches the output activation of at least one training sample of the target class, i.e.:

\exists x' \in \mathcal{X}_{y^*_z} ~:~ \delta\big(\theta(x'), \theta(z)\big) < \epsilon

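Reading the notation from the sentence above, \theta(\cdot) is the model's output activation, \delta is a distance between outputs, \epsilon is a tolerance, and \mathcal{X}_{y^*_z} is the set of training samples belonging to the target class of z. A counterfactual that satisfies this condition behaves, at the output level, like at least one real instance of that class, which indicates that it aligns with typical model behavior for the class.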
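The check can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used by Plumb et al. (2020): the function name `output_similarity`, the choice of Euclidean distance for \delta, and the assumption that the model outputs have already been computed for a single target class are all assumptions made here.

```python
import numpy as np


def output_similarity(z_outputs, x_outputs, epsilon):
    """Flag counterfactuals whose model output is close to a real sample's output.

    z_outputs: model outputs theta(z) for the counterfactuals, shape (n_cf, d).
    x_outputs: model outputs theta(x') for training samples of the target
               class, shape (n_train, d).
    epsilon:   tolerance on the distance between outputs.

    Assumption: delta is the Euclidean distance; the source does not fix it.
    Returns a boolean array of shape (n_cf,).
    """
    z_out = np.asarray(z_outputs, dtype=float)
    x_out = np.asarray(x_outputs, dtype=float)

    # With no target-class samples, no counterfactual can be matched.
    if x_out.shape[0] == 0:
        return np.zeros(z_out.shape[0], dtype=bool)

    # Pairwise distances between every counterfactual and every training
    # sample, shape (n_cf, n_train).
    dists = np.linalg.norm(z_out[:, None, :] - x_out[None, :, :], axis=-1)

    # The condition is existential: one sufficiently close sample is enough.
    return dists.min(axis=1) < epsilon


# Two counterfactuals checked against two target-class training samples.
z_outputs = [[0.1, 0.9], [0.5, 0.5]]
x_outputs = [[0.12, 0.88], [0.0, 1.0]]
passed = output_similarity(z_outputs, x_outputs, epsilon=0.05)
```

Averaging `passed` over all counterfactuals would give the fraction that satisfy the condition. When counterfactuals have different target classes, the function would need to be called once per class with the matching subset of training outputs.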
