Autoencoder Plausibility
Contextuality: I · Desiderata: Plausibility · Explanation Type: ExE
References:
Looveren and Klaise (2021)
Autoencoders are trained to capture the underlying structure of the dataset. Based on this idea, [Looveren and Klaise (2021)] propose evaluating the plausibility of counterfactual examples via reconstruction errors from class-specific and general-purpose autoencoders.
Two specific scores are introduced:

IM1 (Class-specific reconstruction comparison): This metric compares how well a counterfactual $z$ is reconstructed by an autoencoder trained on the target class ($\Phi_{y^*_z}$) versus one trained on the originally predicted class ($\Phi_{\hat{y}_x}$): $\mathrm{IM1} = \frac{\|z - \Phi_{y^*_z}(z)\|_2^2}{\|z - \Phi_{\hat{y}_x}(z)\|_2^2 + \epsilon}$.
A low IM1 score implies that the counterfactual is more representative of the target class than of its original class, indicating class-specific plausibility.
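As a concrete sketch, IM1 can be computed with NumPy once the two class-specific autoencoders are available; here `ae_target` and `ae_original` are hypothetical stand-ins for trained reconstruction models, used only to illustrate the formula:

```python
import numpy as np

def im1(z, ae_target, ae_original, eps=1e-8):
    """IM1: ratio of squared reconstruction errors under the
    target-class and original-class autoencoders (lower is better)."""
    num = np.sum((z - ae_target(z)) ** 2)    # ||z - Phi_{y*_z}(z)||_2^2
    den = np.sum((z - ae_original(z)) ** 2)  # ||z - Phi_{hat y_x}(z)||_2^2
    return num / (den + eps)

# Toy example: the target-class AE reconstructs z almost perfectly,
# while the original-class AE reconstructs it poorly, so IM1 is near 0.
z = np.array([1.0, 2.0, 3.0])
score = im1(z, ae_target=lambda x: 0.99 * x, ae_original=lambda x: 0.5 * x)
```

In practice `ae_target` and `ae_original` would be the forward passes of autoencoders trained only on samples of the respective class.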

IM2 (General manifold plausibility): This score compares the reconstructions from a general-purpose autoencoder $\Phi_\mathcal{Y}$ and a class-specific one: $\mathrm{IM2} = \frac{\|\Phi_\mathcal{Y}(z) - \Phi_{y^*_z}(z)\|_2^2}{\|z\|_1 + \epsilon}$.
A low IM2 score implies that the counterfactual aligns well with both the general data manifold and the target class distribution, thus indicating higher plausibility.
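IM2 follows the same pattern, measuring the disagreement between the general and the class-specific reconstructions; `ae_global` and `ae_target` are again hypothetical placeholders for trained models:

```python
import numpy as np

def im2(z, ae_global, ae_target, eps=1e-8):
    """IM2: squared distance between the global and target-class
    reconstructions, scaled by ||z||_1 (lower is better)."""
    num = np.sum((ae_global(z) - ae_target(z)) ** 2)  # ||Phi_Y(z) - Phi_{y*_z}(z)||_2^2
    return num / (np.sum(np.abs(z)) + eps)            # ||z||_1 + eps

# Toy example: the two AEs nearly agree on z, so IM2 is small.
z = np.array([1.0, 2.0, 3.0])
score = im2(z, ae_global=lambda x: 0.98 * x, ae_target=lambda x: 0.97 * x)
```

The $\|z\|_1$ denominator makes the score scale-invariant with respect to the magnitude of the counterfactual, so IM2 values are comparable across instances.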

While these metrics provide valuable insight into how realistic and semantically valid counterfactuals are, they come with caveats. Training multiple autoencoders (e.g., one per class) introduces significant computational overhead. Furthermore, reconstruction quality may vary across classes, and [Hvilshøj et al. (2021)] showed that autoencoders can be sensitive to small perturbations, potentially undermining the consistency of the plausibility assessment.