Quantification of Unexplainable Features
Contextuality: III
Desiderata: (Fidelity), Continuity
Explanation Type: FA, (CE)
References: Zhang et al. (2019a), Chen et al. (2022)
To assess the continuity of explanations, [Zhang et al. (2019a)] use the explanans to identify the unimportant features of an input, perturb those features, and then generate a new explanans for the perturbed input. The similarity between the original and perturbed explanantia serves as a measure of continuity: a high similarity suggests that irrelevant changes do not affect the explanation, indicating robustness. In addition, it implies high completeness, since the relevant features must have been captured well enough for the model's behavior to persist despite the noise.
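The procedure can be illustrated with a minimal sketch. It is not the implementation of [Zhang et al. (2019a)]: the function names, the choice of Gaussian noise, the fraction of features treated as unimportant, and the use of cosine similarity are all assumptions made for illustration, and `explain` stands for any hypothetical function that maps an input vector to an attribution vector of the same shape.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two attribution vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def continuity_under_unimportant_perturbation(
    explain, x, frac_unimportant=0.5, noise_scale=0.1, n_trials=20, seed=0
):
    """Mean similarity between the original explanans and explanantia of
    inputs whose least important features were perturbed.

    All parameters are illustrative assumptions, not values from the source.
    """
    rng = np.random.default_rng(seed)
    original = explain(x)
    n_perturb = max(1, int(frac_unimportant * x.size))
    # Features with the smallest absolute attribution count as "unimportant".
    unimportant = np.argsort(np.abs(original))[:n_perturb]

    similarities = []
    for _ in range(n_trials):
        x_perturbed = x.copy()
        # Assumed perturbation: additive Gaussian noise on unimportant features only.
        x_perturbed[unimportant] += rng.normal(0.0, noise_scale, size=n_perturb)
        similarities.append(cosine_similarity(original, explain(x_perturbed)))
    return float(np.mean(similarities))


if __name__ == "__main__":
    # Toy linear model f(x) = w @ x with a weight-times-input attribution.
    w = np.array([2.0, -1.0, 0.0, 0.0])
    score = continuity_under_unimportant_perturbation(lambda v: w * v, np.ones(4))
    print(f"continuity score: {score:.3f}")
```

In this toy setup the perturbed features carry zero weight, so the explanans does not change and the score stays at its maximum; an explainer whose attributions shift under such noise would score lower.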
A special case of this approach is the Attack Capture Rate proposed by [Chen et al. (2022)], which applies to FAs in NLP (particularly rationale extraction). Here, insertion attacks add distractor phrases to the input, which ideally should not influence the rationale. The metric measures how often the inserted tokens appear in the extracted explanans, so a lower rate is better. Because this variant relies on artificial attacks, its generalizability beyond NLP is limited.
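As a rough illustration, the capture rate can be computed from token positions alone. The sketch below assumes that each example provides the set of token positions selected as the rationale and the set of positions occupied by the inserted distractor. It averages the per-example fraction of inserted tokens that land in the rationale, which is one plausible reading of "how often" and not necessarily the exact definition used by [Chen et al. (2022)]. The function name and data layout are hypothetical.

```python
def attack_capture_rate(rationales, attack_positions):
    """Average fraction of inserted (attack) token positions that appear in
    the extracted rationale. Lower values mean the distractor is ignored.

    rationales:       list of iterables of token positions in each rationale
    attack_positions: list of iterables of token positions of inserted tokens
    """
    if len(rationales) != len(attack_positions):
        raise ValueError("rationales and attack_positions must have equal length")

    rates = []
    for rationale, attack in zip(rationales, attack_positions):
        attack = set(attack)
        if not attack:
            continue  # no inserted tokens in this example, nothing to capture
        captured = attack & set(rationale)
        rates.append(len(captured) / len(attack))
    return sum(rates) / len(rates) if rates else 0.0


if __name__ == "__main__":
    # Example 1: one of two inserted tokens is in the rationale.
    # Example 2: the rationale ignores the inserted tokens entirely.
    rationales = [{0, 1, 5}, {2, 3}]
    attacks = [{5, 6}, {7, 8}]
    print(attack_capture_rate(rationales, attacks))
```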
This metric may be extended to CE by perturbing the concept-layer activations that correspond to unimportant concepts, or by mapping concepts back to input features via concept-based localization (e.g., see [Lucieri et al. (2020)]) and perturbing those features instead.