References:
Ribeiro et al. (2016), Antwarg et al. (2019), Dhurandhar et al. (2019), Jia et al. (2019), Zhou et al. (2019), Crabbe et al. (2020), Jia et al. (2020), Guidotti (2021), Velmurugan et al. (2021a), Dai et al. (2022), Brandt et al. (2023), Carmichael and Scheirer (2023)
In white-box models, the internal logic is fully accessible and interpretable, providing a ground-truth rationale against which generated explanantia can be directly evaluated. Typical white-box models used for comparison include linear regressors [Crabbe et al. (2020), Dai et al. (2022)], feature-additive models [Carmichael and Scheirer (2023)], small neural networks with manually set parameters [Antwarg et al. (2019), Brandt et al. (2023)], and symbolic models such as decision trees [Ribeiro et al. (2016), Dhurandhar et al. (2019)].
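As a minimal illustration (a hypothetical sketch, not taken from any of the cited works), the following Python snippet hand-sets the coefficients of a linear regressor so that only three features influence the output; the `white_box` function and all variable names are illustrative assumptions. For a linear model, each feature's additive contribution to the prediction serves as the ground-truth attribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# White-box model: a linear regressor with hand-set coefficients.
# Only the first three features influence the output, so the model's
# reasoning (the ground-truth explanans) is fully known in advance.
coef = np.array([2.0, -1.5, 0.5, 0.0, 0.0, 0.0])

def white_box(X):
    """Transparent linear model: y = X @ coef."""
    return X @ coef

X = rng.normal(size=(100, coef.size))
y = white_box(X)

# For a linear model, the ground-truth local attribution of feature j
# at input x is coef[j] * x[j], its additive contribution to y.
ground_truth_attr = X * coef
```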
Given that the model's reasoning is known, explanation methods can be assessed by comparing their outputs against this ground-truth explanans, either with general similarity metrics [Guidotti (2021), Jia et al. (2019)] or with error metrics such as the mean squared error (MSE) [Crabbe et al. (2020), Dai et al. (2022), Brandt et al. (2023)].
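Continuing the sketch above, one way such a comparison might look in practice: a hypothetical attribution vector `estimated` (standing in for the output of any explanation method) is scored against the known ground truth with cosine similarity and MSE. The input values and noise scale are illustrative assumptions.

```python
import numpy as np

# Ground-truth attributions for one input of the hand-set linear model,
# and a hypothetical explanation method's (noisy) estimate of them.
coef = np.array([2.0, -1.5, 0.5, 0.0, 0.0, 0.0])
x = np.array([1.0, 2.0, -1.0, 0.5, -0.5, 1.5])
ground_truth = coef * x

rng = np.random.default_rng(1)
estimated = ground_truth + rng.normal(scale=0.1, size=coef.size)

# General similarity metric: cosine similarity between the two vectors.
cos_sim = float(estimated @ ground_truth
                / (np.linalg.norm(estimated) * np.linalg.norm(ground_truth)))

# Error metric: mean squared error between explanation and ground truth.
mse = float(np.mean((estimated - ground_truth) ** 2))

print(f"cosine similarity: {cos_sim:.3f}, MSE: {mse:.4f}")
```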
Further, if the white-box model relies on only a constrained subset of features, explanation fidelity can also be measured via accuracy, precision, or recall between the explanans and the truly influential features [Ribeiro et al. (2016), Zhou et al. (2019), Jia et al. (2020), Velmurugan et al. (2021a)].
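Under the same hand-set model, one plausible operationalization of this feature-set comparison is to treat the explanation's top-ranked features as a set and score it against the features with nonzero coefficients. The top-3 selection rule and the index set are assumptions chosen for illustration.

```python
import numpy as np

# Hand-set linear model again: only the first three features matter.
coef = np.array([2.0, -1.5, 0.5, 0.0, 0.0, 0.0])
truly_influential = set(np.flatnonzero(np.abs(coef) > 0))

# Suppose an explanation method names these top-3 features for some
# input (illustrative indices; one of them is wrong on purpose).
explained = {0, 1, 5}

tp = len(explained & truly_influential)  # correctly identified features
precision = tp / len(explained)          # fraction of named features that truly matter
recall = tp / len(truly_influential)     # fraction of influential features that were named

print(f"precision: {precision:.2f}, recall: {recall:.2f}")
```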
Although originally proposed for FAs, this approach may be extended to ExEs and NLEs, as the white-box model enables verification of whether the provided explanantia are consistent with the known reasoning. While user inspection is straightforward, adapting this check into an automatic, functionality-grounded evaluation remains challenging but potentially feasible.

