Quantification of Unexplainable Features - Research Metrics Explorer

Quantification of Unexplainable Features

Contextuality: III

Desiderata: (Fidelity) Continuity

Explanation Type: FA (CE)

References: Zhang et al. (2019a), Chen et al. (2022)

To assess the continuity of explanations, Zhang et al. (2019a) use the explanans to identify the unimportant features of an input, perturb those features, and then generate a new explanans for the perturbed input. The similarity between the original and perturbed explanantia serves as a measure of continuity: high similarity suggests that irrelevant changes do not affect the explanation, which indicates robustness. It also implies high completeness, because the features marked as relevant must have been captured well enough to maintain the model's behavior despite the noise.
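The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of Zhang et al. (2019a): the toy linear model, the gradient-times-input attribution, the Gaussian noise, the cosine similarity, and the names `attribution`, `continuity_score`, `k_unimportant`, and `noise_scale` are all assumptions chosen for the example.

```python
import numpy as np

def attribution(weights, x):
    # Assumed explainer: gradient-times-input for a linear model f(x) = w . x.
    return weights * x

def continuity_score(explain, x, k_unimportant, noise_scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    original = explain(x)
    # Treat the k features with the smallest absolute attribution as unimportant.
    unimportant = np.argsort(np.abs(original))[:k_unimportant]
    x_perturbed = x.copy()
    x_perturbed[unimportant] += rng.normal(0.0, noise_scale, size=k_unimportant)
    # Generate a new explanans for the perturbed input.
    perturbed = explain(x_perturbed)
    # Assumed similarity measure: cosine similarity between the two explanantia.
    denom = np.linalg.norm(original) * np.linalg.norm(perturbed)
    return float(original @ perturbed / denom) if denom > 0 else 0.0

weights = np.array([2.0, -1.5, 0.01, 0.02])
x = np.array([1.0, 2.0, 0.5, -0.3])
score = continuity_score(lambda v: attribution(weights, v), x, k_unimportant=2)
print(f"continuity (cosine similarity): {score:.4f}")
```

A score close to 1 means the explanation barely changed under the perturbation. In practice the explainer, the perturbation scheme, and the similarity measure would each be replaced by whatever fits the explanation method under evaluation.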
A special case of this approach is the Attack Capture Rate proposed by Chen et al. (2022), which applies to FAs in NLP, particularly rationale extraction. Here, insertion attacks add distractor phrases to the input, which ideally should not influence the rationale. The metric measures how often the inserted tokens appear in the extracted explanans. Because this variant relies on artificial attacks, its generalizability beyond NLP is limited.
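As a rough illustration of such a rate, the sketch below counts the attacked inputs whose rationale overlaps the inserted tokens. The exact definition in Chen et al. (2022) may differ (for example, it could be computed per token rather than per input); the function name `attack_capture_rate`, the representation of rationales as sets of token positions, and the sample data are hypothetical.

```python
def attack_capture_rate(rationales, inserted_positions):
    """Fraction of attacked inputs whose rationale contains an inserted token.

    Both arguments are sequences of sets of token positions, one per input
    (an assumed representation for this sketch).
    """
    if len(rationales) != len(inserted_positions):
        raise ValueError("need one set of inserted positions per rationale")
    if not rationales:
        return 0.0
    captured = sum(
        1
        for rationale, inserted in zip(rationales, inserted_positions)
        if set(rationale) & set(inserted)
    )
    return captured / len(rationales)

# Hypothetical data: token positions selected as rationale for three attacked
# inputs, and the positions occupied by the inserted distractor phrase.
rationales = [{0, 1, 5}, {2, 3}, {4, 7, 8}]
inserted = [{5, 6}, {8, 9}, {0, 1}]
rate = attack_capture_rate(rationales, inserted)
print(f"attack capture rate: {rate:.2f}")
```

Since the distractors should ideally not influence the rationale, a lower rate indicates a more robust explanation under this reading of the metric.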
This metric may be extended to CE either by perturbing the parts of the concept layer that correspond to unimportant concepts, or by mapping concepts back to input features via concept-based localization (see, e.g., Lucieri et al. (2020)).
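The first option, perturbing unimportant concepts directly, might look like the sketch below. It assumes a model whose prediction head reads from an explicit concept layer. The tanh head, the gradient-times-activation concept attribution, the top-k overlap used as the similarity measure, and all names and values are illustrative assumptions rather than part of any cited method.

```python
import numpy as np

def concept_attribution(concepts, W, v):
    # Assumed head: y = v . tanh(W @ concepts); attribution = gradient * activation.
    hidden = np.tanh(W @ concepts)
    grad = W.T @ (v * (1.0 - hidden ** 2))
    return grad * concepts

def top_k_overlap(a, b, k):
    # Share of the k most important concepts that both explanations agree on.
    top_a = set(np.argsort(-np.abs(a))[:k].tolist())
    top_b = set(np.argsort(-np.abs(b))[:k].tolist())
    return len(top_a & top_b) / k

rng = np.random.default_rng(0)
W = np.array([[1.0, 0.0, 0.05, 0.0],
              [0.0, 1.0, 0.0, 0.05]])
v = np.array([1.0, -1.0])
concepts = np.array([0.8, 0.6, 0.1, 0.1])

original = concept_attribution(concepts, W, v)
# Perturb only the two concepts with the smallest absolute attribution.
unimportant = np.argsort(np.abs(original))[:2]
perturbed_concepts = concepts.copy()
perturbed_concepts[unimportant] += rng.normal(0.0, 0.05, size=2)
perturbed = concept_attribution(perturbed_concepts, W, v)

overlap = top_k_overlap(original, perturbed, k=2)
print(f"top-2 concept overlap: {overlap:.2f}")
```

The second option, concept-based localization, would instead translate the unimportant concepts into input regions and perturb those regions, which requires a localization method and is not shown here.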

Prediction Neighborhood Continuity

Neighborhood Continuity