Categorization Scheme
This page describes the categorization scheme introduced in Section 4.1 of our paper. The scheme is structured along three dimensions: Desiderata, Explanation Type, and Contextuality.
Overview
The following table provides a high-level overview of all values in each dimension. Click on a value to jump to its description below.
| Dimension | Values |
|---|---|
| Desiderata | Parsimony, Plausibility, Coverage, Fidelity, Continuity, Consistency, Efficiency |
| Explanation Type | Feature Attributions (FA), Concept Explanations (CE), Example Explanations (ExE), White-Box Surrogates (WBS), Natural Language Explanations (NLE) |
| Contextuality | Level I: Explanans-Centric, Level II: Model Observation, Level III: Input Intervention, Level IV: Model Intervention, Level V: A Priori Constrained |
Desiderata
Desiderata describe criteria for what constitutes a good explanation. We propose a set of seven functionality-grounded desiderata.
Parsimony
The explanation should keep the explanans concise to support interpretability.
[Nauta et al. (2023)] introduce the property of Compactness, arguing that a briefer explanans is easier to understand. Similarly, they use Covariate Complexity to assess how complex the features are that constitute the explanans, where a few high-level concepts support interpretability better than a very granular explanans. Both of these aspects are summarized under Parsimony by [Markus et al. (2021), Zhou et al. (2021)], who prefer simpler explanantia over longer or more complex ones. The scheme used in the Quantus library [Hedström et al. (2023), Bommer et al. (2024)] defines a group called Complexity, which specifically tests for concise explanantia built from as few features as possible, so that they are easier to understand. Related interpretability desiderata from other authors are defined less explicitly but similarly favor simpler explanantia: proposing that explanantia should be simple and short [Andrews et al. (1995), Johansson et al. (2004), Guidotti et al. (2018)] [Alvarez-Melis and Jaakkola (2018a), Jesus et al. (2021), Alangari et al. (2023b)], promoting small explanantia that focus only on relevant parts [Robnik-Šikonja and Bohanec (2018), Carvalho et al. (2019), Molnar (2020)], and expecting an explanans with concentrated information to facilitate human understanding [Belaid et al. (2022)].
Following the proposed definitions, we include Parsimony as one of our desiderata. It expects explanantia to be as brief and concise as possible, to ensure that rationales can be understood quickly and easily. We focus Parsimony exclusively on this aspect, as other associated properties are either covered by separate desiderata (such as the truthfulness of the explanation) or excluded entirely because they are not functionality-grounded (such as the general understandability of the explanation).
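Parsimony of a feature-attribution explanans can be probed quantitatively, for instance via the entropy of the attribution mass, in the spirit of Quantus-style complexity measures. The following sketch is illustrative only; the function name and normalization are our own assumptions, not taken from any library:

```python
import numpy as np

def attribution_entropy(attributions: np.ndarray) -> float:
    """Shannon entropy of the normalized absolute attributions.

    Lower values indicate a more parsimonious explanans: the
    attribution mass is concentrated on few features. Illustrative
    helper, not the metric of any specific framework.
    """
    a = np.abs(attributions).astype(float)
    p = a / a.sum()
    # Drop zero entries: lim p->0 of p*log(p) is 0.
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# A sparse explanans (one dominant feature) scores lower than a
# diffuse one that spreads importance over all features equally.
sparse = attribution_entropy(np.array([0.97, 0.01, 0.01, 0.01]))
diffuse = attribution_entropy(np.array([0.25, 0.25, 0.25, 0.25]))
```

The uniform explanans attains the maximum entropy log(n), which makes the score easy to normalize if a bounded metric is preferred.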
Plausibility
The explanation should shape the explanans to align with human expectations.
We include the Plausibility desideratum, which encompasses the idea that explanations should align with human knowledge and intuition. On one hand, this includes human expectations towards the result (explanans), e.g., “The model focuses on what a human would focus on”. On the other hand, the XAI methods' behavior (explanation) should also be aligned with human intuition, e.g., “The outputs for individual inputs should differ”.
Coverage
The explanation should provide an explanans for every explanandum.
To add more clarity to these definitions, we include Coverage with an alternative definition. It measures the share of explananda that are covered by the explanation, i.e., whether there exists an explanans for every data input or output.
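Coverage as defined here reduces to a simple ratio. A minimal sketch, assuming a hypothetical explainer that returns `None` when no explanans is available:

```python
def coverage(explananda, explain):
    """Fraction of explananda for which the explanation yields an
    explanans; `None` signals "no explanans available". Both names
    are illustrative, not taken from the paper."""
    covered = sum(explain(e) is not None for e in explananda)
    return covered / len(explananda)

# Hypothetical explainer that can only handle non-negative inputs.
partial_explainer = lambda e: {"explanans": e} if e >= 0 else None
cov = coverage([1.0, -2.0, 3.0], partial_explainer)
```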
Fidelity
The explanation should make the explanans reflect the model's true reasoning.
Correctness refers to whether the explanation truthfully represents the internal logic and decision process of the black-box model. It is one of the most frequently emphasized desiderata across the reviewed frameworks. Without correctness, even the most interpretable or simple explanation may provide no meaningful insight. Terms like Faithfulness, Truthfulness, and Fidelity are often used interchangeably in literature to describe this idea. Correctness encompasses both local fidelity for individual explanantia and global alignment across the dataset [Robnik-Šikonja and Bohanec (2018), Carvalho et al. (2019), Molnar (2020)]. The general consensus is that an explanation should reveal what truly drives the model's outputs [Alvarez-Melis and Jaakkola (2018a), Markus et al. (2021), Zhou et al. (2021), Belaid et al. (2022), Alangari et al. (2023b), Nauta et al. (2023)]. It is commonly assessed by how well the explanation reflects or mimics the model's behavior [Andrews et al. (1995), Johansson et al. (2004), Guidotti et al. (2018)].
Completeness, in contrast, describes how much of the model's reasoning is captured by the explanation. According to the Co-12 properties by [Nauta et al. (2023)], an explanation should ideally include the full scope of the model's rationale. Some authors treat Completeness as a sub-aspect of Fidelity [Markus et al. (2021), Zhou et al. (2021)], while others define Fidelity itself as the capacity to capture all of the information embodied in the model [Andrews et al. (1995), Johansson et al. (2004), Guidotti et al. (2018)].
Although it is theoretically possible to have an explanation that is partially correct but incomplete (e.g., providing a heatmap that highlights only one of several relevant features), or complete but partially incorrect (e.g., including all the right features alongside irrelevant ones), neither scenario is desirable. If key features are missing or irrelevant ones are included, the explanans ultimately misrepresents the model's behavior. While Correctness and Completeness can be distinguished conceptually, they are tightly interwoven in practice and difficult to evaluate in isolation. Since our desiderata are intended to capture orthogonal evaluation dimensions, and these two cannot be meaningfully disentangled, we combine them under the unified criterion of Fidelity.
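One common way to probe Fidelity empirically is a deletion test: remove the features the explanans marks as important and measure the resulting prediction drop. The sketch below uses a toy linear model and hypothetical names; it is one illustrative instance of this idea, not the evaluation protocol of any cited framework:

```python
import numpy as np

def deletion_drop(model, x, attributions, k, baseline=0.0):
    """Set the k features with the highest absolute attribution to
    `baseline` and return the drop in the model's score. A larger
    drop suggests the explanans named features the model truly
    relies on. All names here are illustrative."""
    x_pert = x.copy()
    top_k = np.argsort(-np.abs(attributions))[:k]
    x_pert[top_k] = baseline
    return float(model(x) - model(x_pert))

# Toy linear model: only the first two features matter.
w = np.array([2.0, 1.0, 0.0, 0.0])
model = lambda x: float(w @ x)
x = np.ones(4)
faithful_drop = deletion_drop(model, x, np.array([0.9, 0.5, 0.0, 0.0]), k=2)
unfaithful_drop = deletion_drop(model, x, np.array([0.0, 0.0, 0.9, 0.5]), k=2)
```

The faithful explanans (pointing at the weighted features) causes the full score to vanish, while the unfaithful one changes nothing, separating the two cleanly in this toy setting.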
Continuity
The explanation should ensure that similar explananda yield similar explanantia.
[Nauta et al. (2023)] introduce the term Continuity as the smoothness of the explanation, i.e., similar explananda should yield similar explanantia. Others refer to this idea as Stability [Robnik-Šikonja and Bohanec (2018), Carvalho et al. (2019), Molnar (2020)], describing it as resilience against slight variations in input features that do not alter the model's prediction. The term Robustness is used by [Alvarez-Melis and Jaakkola (2018a), Jesus et al. (2021), Alangari et al. (2023b)] to describe the same behavior, referring to it as a key requirement for trustworthy XAI.
The Quantus toolkit reflects the prevalence of this concept, providing a “Robustness” metric category [Hedström et al. (2023), Bommer et al. (2024)], which assesses the similarity of explanantia under minor changes in input. Finally, [Belaid et al. (2022)] cover the same idea under the term Stability. In addition, they assess Fragility, which they define as the resilience of explanations against malicious manipulation, such as adversarial attacks.
Our Continuity desideratum covers both of the mentioned properties. It includes the smoothness of explanations with respect to "naïve" changes in the explanandum that ideally do not affect the model's behavior, as well as the resilience of explanations against malicious manipulation attempts. Note that this includes changes to the input data as well as to the model. We adopt the term Continuity instead of Stability or Robustness to reduce possible confusion with model robustness.
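The "naïve perturbation" half of Continuity can be estimated by sampling small input perturbations and recording the largest resulting change in the explanans, loosely modeled on max-sensitivity-style metrics. All names below are our own illustrative choices:

```python
import numpy as np

def explanation_sensitivity(explain, x, radius=0.01, n_samples=10, seed=0):
    """Sample small uniform perturbations of the explanandum x and
    return the largest L2 change in the resulting explanans. Small
    values indicate a smooth, continuous explanation. Loosely
    modeled on max-sensitivity-style metrics; illustrative only."""
    rng = np.random.default_rng(seed)
    base = explain(x)
    worst = 0.0
    for _ in range(n_samples):
        noise = rng.uniform(-radius, radius, size=x.shape)
        worst = max(worst, float(np.linalg.norm(explain(x + noise) - base)))
    return worst

# A gradient "explainer" of a linear model returns the same weights
# everywhere, so its sensitivity is exactly zero.
w = np.array([1.0, -2.0, 0.5])
sens = explanation_sensitivity(lambda x: w, np.zeros(3))
```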
Consistency
The explanation should produce stable explanantia across repeated evaluations.
Consistency is introduced by [Nauta et al. (2023)] as a direct measure of the determinism of an XAI algorithm. Similarly, one part of the definition of Stability by [Robnik-Šikonja and Bohanec (2018), Carvalho et al. (2019), Molnar (2020)] considers variations in explanations based on non-determinism. The oldest formulation of Consistency is given by [Andrews et al. (1995)] and considers explanation methods to be consistent when they produce equivalent results under repetition.
However, several frameworks additionally consider the similarity of explanantia generated from different models trained on the same data [Robnik-Šikonja and Bohanec (2018), Carvalho et al. (2019), Molnar (2020), Nauta et al. (2023)]. Yet, different models can produce the same prediction while relying on entirely different internal reasoning. This is especially true as there are often multiple valid reasons for the same event, also known as the Rashomon Effect [Breiman (2001), Leventi-Peetz and Weber (2022)].
We include Consistency using the initial formulations, i.e., explanations should be deterministic or self-consistent, always presenting the same explanans for identical explananda. While the model-comparison definition is present in one of the identified metrics, we do not explicitly add it to our Consistency desideratum, as we do not believe that different models, even when trained on the same data, must necessarily yield identical explanantia.
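Determinism in this sense is directly testable: invoke the explanation repeatedly on the same explanandum and check that every explanans agrees. A minimal sketch with hypothetical explainers:

```python
import numpy as np

def is_consistent(explain, x, n_runs=5, tol=1e-8):
    """Invoke the explanation repeatedly on the same explanandum and
    check that every explanans agrees up to a numerical tolerance.
    All names are illustrative."""
    first = explain(x)
    return all(np.allclose(explain(x), first, atol=tol)
               for _ in range(n_runs - 1))

# A sampling-based explainer with a fixed seed is deterministic;
# the same sampler with fresh randomness on every call is not.
seeded = lambda x: np.random.default_rng(42).normal(size=x.shape)
unseeded = lambda x: np.random.default_rng().normal(size=x.shape)
x = np.zeros(3)
```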
Efficiency
The explanation should compute the explanans efficiently and be broadly applicable.
The first aspect, broad applicability, is introduced as Portability and Translucency throughout the literature [Robnik-Šikonja and Bohanec (2018), Carvalho et al. (2019), Molnar (2020)]. Portability describes the variety of models for which an explanation can be used, while Translucency describes whether the explanation algorithm needs access to the internals of the model. Similarly, [Johansson et al. (2004)] measure Generality, given by the restrictions or overhead necessary to apply an explanation to specific models, and [Belaid et al. (2022)] refer to Portability as the diverse set of models to which the explanation can be applied.
The second aspect is computational cost: Algorithmic Complexity [Robnik-Šikonja and Bohanec (2018), Carvalho et al. (2019), Molnar (2020)] considers the time it takes to generate an explanans. Naturally, the necessary time depends not only on the inherent complexity of the explanation algorithm, but also on its Scalability, i.e., its ability to efficiently handle larger models and input spaces [Johansson et al. (2004)]. Using the "Stress test", [Belaid et al. (2022)] explicitly evaluate runtime behavior with respect to increasing input size.
We subsume both of these aspects under a general desideratum called Efficiency. It includes the algorithmic or computational properties of the explanation, which might influence the choice of a specific XAI algorithm over another.
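The computational side of Efficiency can be probed with a simple stress test that times explanation generation for growing input sizes. The sketch below uses a stand-in occlusion-style explainer; nothing here reflects a specific XAI method's API:

```python
import time
import numpy as np

def runtime_profile(explain, input_sizes, repeats=3):
    """Time the explanation for growing input sizes and return a
    mapping from size to the best-of-`repeats` wall-clock seconds.
    A stress-test-style probe; all names are illustrative."""
    profile = {}
    for n in input_sizes:
        x = np.ones(n)
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            explain(x)
            times.append(time.perf_counter() - start)
        profile[n] = min(times)
    return profile

# Stand-in occlusion-style explainer: one "model call" per feature,
# so its runtime grows with the input size.
occlusion = lambda x: np.array([x.sum() - xi for xi in x])
profile = runtime_profile(occlusion, [10, 100])
```

Taking the minimum over repeats reduces noise from scheduling and warm-up, which matters when comparing two explanation algorithms on equal footing.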
Explanation Type
We categorize VXAI metrics based on the accepted input. Apart from a few exceptions, all metrics are agnostic to the underlying black-box model or data format (e.g., tabular or image). Therefore, we do not consider this dimension separately.
- Feature Attributions (FA)
- Concept Explanations (CE)
- Example Explanations (ExE)
- White-Box Surrogates (WBS)
- Natural Language Explanations (NLE)
Notably, similar to the formulation of desiderata, our categorization scheme based on explanation types can be extended horizontally. We note that there is an overlap between different categories, as they represent both the final given explanans and the explanation process. LIME [Ribeiro et al. (2016)] is a typical example, as its explanation fits a local WBS to generate a corresponding FA explanans. Similarly, there is a connection between CEs and other types, as WBSs can be leveraged to generate counterfactuals [Pornprasit et al. (2021)], as can FAs [Ge et al. (2021b), Albini et al. (2022)]. Rather than being a limitation, this overlap benefits our framework: it enables metrics designed for one explanation type to be applied to others, facilitating broader reuse and comparison.
Contextuality
We propose to distinguish metrics based on their evaluation context, which defines how strongly they depend on or intervene in the underlying model or data. We identify five levels, each introducing progressively deeper contextual interaction.
- Level I: Explanans-Centric
- Evaluates only the explanans in relation to the raw input instance, fully independent of the model.
- Level II: Model Observation
- Relies on access to model outputs or internal activations to assess behavior.
- Level III: Input Intervention
- Perturbs input data and observes resulting changes in predictions or explanantia.
- Level IV: Model Intervention
- Alters the model itself, e.g., by retraining or parameter randomization.
- Level V: A Priori Constrained
- Requires specific data, architectures, or experimental setups.