References:
Alvarez-Melis and Jaakkola (2018b), Alvarez-Melis and Jaakkola (2018a), Chu et al. (2018), Honegger (2018), Yeh et al. (2019), Zhang et al. (2019a), Artelt and Hammer (2020), Fan et al. (2020), Lakkaraju et al. (2020), Artelt et al. (2021), Bajaj et al. (2021), Situ et al. (2021), Yin et al. (2021), Agarwal et al. (2022b), Atanasova et al. (2022), Fouladgar et al. (2022), Agarwal et al. (2023), Bayrak and Bach (2023b), Tekkesinoglu and Pudas (2024)
Neighborhood Continuity evaluates whether explanantia remain similar for similar explananda. The underlying assumption is that small changes in the input should not lead to disproportionately large differences in the explanation; this Continuity in turn supports user trust.
The main distinction between proposed metrics lies in how “similar instances” are defined. Some classic metrics include Local Lipschitz Stability from [Alvarez-Melis and Jaakkola (2018b), Alvarez-Melis and Jaakkola (2018a)], Sensitivity from [Yeh et al. (2019)], and RIS/RRS/ROS from [Agarwal et al. (2022b)].
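To make the idea concrete, the following is a minimal Monte Carlo sketch of a Local Lipschitz Stability estimate in the spirit of Alvarez-Melis and Jaakkola (2018b); the function names and sampling scheme are illustrative assumptions, not a reproduction of the original implementation:

```python
import numpy as np

def local_lipschitz_estimate(explain_fn, x, radius=0.1, n_samples=100, seed=0):
    """Monte Carlo estimate of Local Lipschitz Stability at input x.

    explain_fn maps an input vector to an attribution vector. Larger values
    indicate that the explanans changes sharply within the neighborhood,
    i.e. weaker continuity. (Hypothetical sketch, not a library API.)
    """
    rng = np.random.default_rng(seed)
    e_x = explain_fn(x)
    worst = 0.0
    for _ in range(n_samples):
        # Perturbation on the sphere of the given radius (avoids zero distance).
        delta = rng.normal(size=x.shape)
        delta *= radius / np.linalg.norm(delta)
        x_prime = x + delta
        ratio = (np.linalg.norm(explain_fn(x_prime) - e_x)
                 / np.linalg.norm(x_prime - x))
        worst = max(worst, ratio)
    return worst
```

For a linear explainer the estimate recovers the true Lipschitz constant; in practice the explainer is a feature-attribution method and the estimate is a lower bound on the local constant.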
The choice of neighborhood depends on the domain and explanation use case. Some approaches define neighborhoods using fixed-radius input distances or k-nearest neighbors [Chu et al. (2018), Situ et al. (2021), Fouladgar et al. (2022), Tekkesinoglu and Pudas (2024)], while others restrict similarity to instances sharing the same predicted label [Honegger (2018), Fan et al. (2020)].
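A k-nearest-neighbor neighborhood of the kind used by these dataset-based approaches can be sketched as follows (a hypothetical helper under an L2 distance assumption):

```python
import numpy as np

def knn_neighborhood(X, i, k=5):
    """Indices of the k nearest neighbors of instance i in dataset X
    under L2 distance, excluding the instance itself."""
    dists = np.linalg.norm(X - X[i], axis=1)
    dists[i] = np.inf  # exclude the query instance
    return np.argsort(dists)[:k]
```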
To generate similar inputs, perturbation-based strategies are often used (see Perturbation Strategies). Perturbations may be bounded in magnitude [Alvarez-Melis and Jaakkola (2018b), Alvarez-Melis and Jaakkola (2018a), Yeh et al. (2019), Bajaj et al. (2021), Agarwal et al. (2022b), Atanasova et al. (2022)] or derived from domain-specific semantics [Zhang et al. (2019a), Yin et al. (2021), Fouladgar et al. (2022)]. Some metrics require that perturbed inputs preserve the original prediction [Alvarez-Melis and Jaakkola (2018b), Alvarez-Melis and Jaakkola (2018a), Agarwal et al. (2022b)], or maintain similar logits [Agarwal et al. (2023)].
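A prediction-preserving perturbation strategy of this kind can be sketched as below; the sampler and the rejection filter are illustrative assumptions rather than any specific paper's procedure:

```python
import numpy as np

def sample_label_preserving_neighbors(model_predict, x, radius=0.05,
                                      n_candidates=200, seed=0):
    """Draw magnitude-bounded perturbations of x and keep only those whose
    predicted label matches the original prediction.

    model_predict maps a batch of inputs (n, d) to predicted class labels.
    (Hypothetical signature for illustration.)
    """
    rng = np.random.default_rng(seed)
    deltas = rng.normal(size=(n_candidates, x.size))
    deltas /= np.linalg.norm(deltas, axis=1, keepdims=True)
    scales = radius * rng.uniform(size=(n_candidates, 1))
    candidates = x + scales * deltas
    original_label = model_predict(x[None, :])[0]
    keep = model_predict(candidates) == original_label
    return candidates[keep]
```

The rejection step implements the prediction-preservation requirement; a logit-similarity variant would instead filter on a distance between output distributions.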
Explanans similarity is typically calculated using distance or correlation-based metrics (see Similarity Measures) and may be normalized by input distance [Alvarez-Melis and Jaakkola (2018b), Alvarez-Melis and Jaakkola (2018a), Agarwal et al. (2022b)], or based on distances in model activation space [Agarwal et al. (2022b), Agarwal et al. (2023)].
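A minimal sketch of such an input-normalized similarity, in the spirit of the relative-stability (RIS-style) ratios, is shown below; the function name and the epsilon guard are assumptions:

```python
import numpy as np

def relative_explanation_change(e_x, e_xp, x, xp, eps=1e-12):
    """Relative change of the explanans divided by the relative change of
    the input. Values well above 1 mean the explanation moves much more
    than the input did. (Illustrative sketch, not a library metric.)"""
    num = np.linalg.norm(e_xp - e_x) / (np.linalg.norm(e_x) + eps)
    den = np.linalg.norm(xp - x) / (np.linalg.norm(x) + eps)
    return num / (den + eps)
```

Activation-space variants replace the denominator with a distance between the model's internal representations of x and xp.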
Variations: [Fan et al. (2020)] compare similarity for nearby vs. distant inputs, while [Zhang et al. (2019a)] restrict comparisons to unchanged features to reduce noise from perturbations.
Although originally proposed for FAs, this concept is broadly transferable to other explanation types. For instance, [Lakkaraju et al. (2020)] evaluate WBSs by comparing surrogate models trained on perturbed data, using model-specific similarity measures such as coefficient mismatch for linear models.

