Explanans Size
Contextuality: I
Desiderata: Parsimony
Explanation Type: FA, ExE, (CE), WBS, (NLE)
References:
Craven and Shavlik (1995), Stefanowski and Vanderpooten (2001), Nauck (2003), Alonso et al. (2008), Augasta and Kathirvalavakumar (2012), Lakkaraju et al. (2016), Samek et al. (2016), Zilke et al. (2016), Lakkaraju et al. (2017), Guidotti et al. (2018), Hara and Hayashi (2018), Rustamov and Klosowski (2018), Wang et al. (2018b), Wang (2018), Wang et al. (2018a), Wu et al. (2018b), Wu et al. (2018a), Deng (2019), Evans et al. (2019), Fong et al. (2019), Guidotti et al. (2019), Ignatiev et al. (2019), Lakkaraju et al. (2019), Polato and Aiolli (2019), Pope et al. (2019), Shakerin and Gupta (2019), Slack et al. (2019), Topin and Veloso (2019), Verma and Ganguly (2019), Wang (2019), Yoo and Sael (2019), Bhatt et al. (2020), Chalasani et al. (2020), Molnar et al. (2020), Nguyen and Martínez (2020), Panigutti et al. (2020), Rajapaksha et al. (2020), Rawal and Lakkaraju (2020), Stepin et al. (2020), Warnecke et al. (2020), Wu et al. (2020), Liu et al. (2021b), Margot and Luta (2021), Moradi and Samwald (2021), Poppi et al. (2021), Rosenfeld (2021), Samek et al. (2021), Dai et al. (2022), Funke et al. (2022), Huang et al. (2023b), Stevens and Smedt (2024)
The size of an explanans, $|e|$, is a common indicator of its complexity. Smaller or more compact explananda are generally easier to understand and more plausible to human users. How size is measured depends on the explanation type and context.

A generally applicable method is to compute the file size (in bytes) of the explanans, based on the assumption that sparse explananda can be more easily compressed [Samek et al. (2016), Samek et al. (2021)].
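
The file-size idea can be illustrated with a minimal sketch. It assumes the explanans is a numeric relevance vector and uses zlib compression of the raw float bytes as a stand-in for whatever file format a given study uses. The function name `compressed_size_bytes` and the example vectors are hypothetical.

```python
import random
import struct
import zlib


def compressed_size_bytes(relevances, level=9):
    """Return the size in bytes of the zlib-compressed relevance vector."""
    # Serialize the values as 64-bit floats, then compress the raw bytes.
    raw = struct.pack(f"{len(relevances)}d", *relevances)
    return len(zlib.compress(raw, level))


# Hypothetical attribution vectors with 1000 features each.
sparse = [0.0] * 1000
sparse[10], sparse[500] = 0.9, -0.4  # only two relevant features

rng = random.Random(0)
dense = [rng.uniform(-1.0, 1.0) for _ in range(1000)]  # every feature relevant

sparse_size = compressed_size_bytes(sparse)
dense_size = compressed_size_bytes(dense)
print(sparse_size, dense_size)  # the sparse vector compresses to far fewer bytes
```

The absolute byte counts depend on the serialization and compressor, so such values are only comparable between explananda stored in the same way.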

For WBS, size is typically measured via structural properties of the surrogate model: tree-based models are assessed by depth, number of nodes, or number of leaves [Craven and Shavlik (1995), Alonso et al. (2008), Guidotti et al. (2018), Hara and Hayashi (2018), Wu et al. (2018a), Evans et al. (2019), Slack et al. (2019), Yoo and Sael (2019), Molnar et al. (2020), Rawal and Lakkaraju (2020), Wu et al. (2020)], while rule-based systems are evaluated using the number of rules or predicates per rule [Craven and Shavlik (1995), Stefanowski and Vanderpooten (2001), Nauck (2003), Alonso et al. (2008), Augasta and Kathirvalavakumar (2012), Lakkaraju et al. (2016), Zilke et al. (2016), Lakkaraju et al. (2017), Wang (2018), Wu et al. (2018b), Deng (2019), Lakkaraju et al. (2019), Polato and Aiolli (2019), Shakerin and Gupta (2019), Wang (2019), Panigutti et al. (2020), Rajapaksha et al. (2020), Stepin et al. (2020), Margot and Luta (2021), Moradi and Samwald (2021), Rosenfeld (2021)].
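
As a rough illustration of these structural measures, the sketch below computes depth, node count, and leaf count for a small binary tree, and rule and predicate counts for a small rule list. The `Node` class, the convention that a lone leaf has depth 0, and the example rules are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary tree node; a node without children is a leaf."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def tree_size(node):
    """Return (depth, n_nodes, n_leaves); a single leaf has depth 0."""
    if node.left is None and node.right is None:
        return 0, 1, 1
    depth, nodes, leaves = 0, 1, 0
    for child in (node.left, node.right):
        if child is not None:
            d, n, l = tree_size(child)
            depth = max(depth, d + 1)
            nodes += n
            leaves += l
    return depth, nodes, leaves


def rule_set_size(rules):
    """Return (n_rules, predicates_per_rule) for rules given as predicate lists."""
    return len(rules), [len(rule) for rule in rules]


# Surrogate tree: the root has one leaf child and one internal child with two leaves.
tree = Node(Node(), Node(Node(), Node()))
depth, n_nodes, n_leaves = tree_size(tree)

# Surrogate rule list: each rule is a conjunction of predicates.
rules = [
    ["age > 30", "income <= 50k"],
    ["age <= 30"],
    ["owns_home", "age > 30", "income > 50k"],
]
n_rules, predicates_per_rule = rule_set_size(rules)
```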

For explanation-graphs, the number of nodes and edges can serve as a proxy for size [Rustamov and Klosowski (2018), Topin and Veloso (2019)]. Conversely, the (relative) number of instances covered per rule can also express parsimony [Deng (2019), Guidotti et al. (2019)].
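
Both quantities are simple counts, as the following sketch suggests. It assumes the explanation graph is given as a list of (source, target) pairs and that a rule can be evaluated as a Boolean function on an instance. The helper names and toy data are hypothetical.

```python
def graph_size(edges):
    """Return (n_nodes, n_edges) for a graph given as (source, target) pairs.

    Nodes without any edge do not appear in the edge list and are not counted.
    """
    unique_edges = set(edges)
    nodes = {v for edge in unique_edges for v in edge}
    return len(nodes), len(unique_edges)


def rule_coverage(rule, instances):
    """Return the fraction of instances for which the rule's condition holds."""
    if not instances:
        return 0.0
    return sum(1 for x in instances if rule(x)) / len(instances)


# Toy explanation graph with three nodes and three directed edges.
edges = [("age", "income"), ("income", "approve"), ("age", "approve")]
n_nodes, n_edges = graph_size(edges)

# Toy rule "age > 30" evaluated on four instances.
instances = [{"age": 25}, {"age": 40}, {"age": 55}, {"age": 31}]
coverage = rule_coverage(lambda x: x["age"] > 30, instances)
```

Higher coverage per rule means fewer rules are needed to explain the same data, which is why coverage runs in the opposite direction to the count-based measures.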

In FA, size is typically based on the number of relevant features. This may be computed via:
• The $L_0$ norm [Wang et al. (2018a), Wang et al. (2018b), Fong et al. (2019), Polato and Aiolli (2019), Verma and Ganguly (2019), Nguyen and Martínez (2020), Poppi et al. (2021), Rosenfeld (2021), Stevens and Smedt (2024)],
• A count of features exceeding a relevance threshold [Ignatiev et al. (2019), Pope et al. (2019), Warnecke et al. (2020), Liu et al. (2021b), Dai et al. (2022)],
• Normalization by input dimensionality, e.g., in graph settings [Pope et al. (2019)],
• Integration over multiple thresholds to form a size curve [Warnecke et al. (2020)], or
• Threshold-free statistics like entropy [Samek et al. (2016), Bhatt et al. (2020), Funke et al. (2022)] and Gini index [Chalasani et al. (2020)].
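
The options listed above can be sketched for a single relevance vector as follows. This is a simplified illustration, not the exact procedure of any cited work: the size curve is approximated by averaging the relative count over a few thresholds, entropy is computed on the normalized absolute relevances, and the Gini index uses the common sorted-magnitude formulation. All function names are hypothetical.

```python
import math


def l0_norm(r):
    """Number of non-zero relevance values."""
    return sum(1 for v in r if v != 0)


def count_above(r, threshold):
    """Number of features whose absolute relevance exceeds the threshold."""
    return sum(1 for v in r if abs(v) > threshold)


def relative_count(r, threshold):
    """Thresholded count normalized by the input dimensionality."""
    return count_above(r, threshold) / len(r)


def size_curve_area(r, thresholds):
    """Mean relative count over several thresholds (assumed stand-in for a size curve)."""
    return sum(relative_count(r, t) for t in thresholds) / len(thresholds)


def entropy_bits(r):
    """Shannon entropy (in bits) of the normalized absolute relevances."""
    total = sum(abs(v) for v in r)
    if total == 0:
        return 0.0
    probs = [abs(v) / total for v in r if v != 0]
    return -sum(p * math.log2(p) for p in probs)


def gini_index(r):
    """Gini index of the absolute relevances: 0 if uniform, near 1 if concentrated."""
    mags = sorted(abs(v) for v in r)
    total, n = sum(mags), len(mags)
    if total == 0:
        return 0.0
    weighted = sum((m / total) * ((n - k + 0.5) / n)
                   for k, m in enumerate(mags, start=1))
    return 1.0 - 2.0 * weighted


r = [0.0, 0.5, 0.0, -0.5]  # toy relevance vector with two relevant features
```

Note the opposite directions: a smaller explanans has a lower count and lower entropy, but a higher Gini index.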

While not common in the literature, for CE, similar counting mechanisms may be applied. One may count concepts exceeding a relevance threshold or use the total number of tested concepts (which is not tied to input size).

In ExE, size naturally corresponds to the number of examples used in the explanans [Nguyen and Martínez (2020), Huang et al. (2023b)].

For NLE, the number of words or sentences provides a straightforward measure of size.
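
A minimal sketch of such a count is shown below. It assumes naive regular-expression tokenization, which will miscount sentences around abbreviations or decimal numbers. The function name `nle_size` is hypothetical.

```python
import re


def nle_size(text):
    """Return (n_words, n_sentences) using naive regex-based splitting."""
    words = re.findall(r"\b\w+\b", text)
    # Split on runs of sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return len(words), len(sentences)


explanation = "The loan was denied. Income is below the threshold!"
n_words, n_sentences = nle_size(explanation)
```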

In general, size can be measured per-instance or aggregated over the dataset. For instance, one can report the number of predicates in the rule explaining a single instance [Augasta and Kathirvalavakumar (2012)], or aggregate the number of predicates across a global rule set with statistics such as the average [Lakkaraju et al. (2016), Lakkaraju et al. (2017), Lakkaraju et al. (2019)], sum [Margot and Luta (2021), Moradi and Samwald (2021)], or maximum [Moradi and Samwald (2021)]. Similarly, rule counts can be aggregated over classes (e.g., average number per class [Nauck (2003)]). Optional adjustments include applying a tolerance margin (e.g., $\max(0, |e| - k)$ [Rosenfeld (2021)]) or aggregating over multiple class-specific explananda per instance [Pope et al. (2019)].
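
The aggregation options and the tolerance margin can be sketched as follows, assuming a rule set stored as lists of predicates and a hypothetical tolerance `k` below which size is not penalized. The rule contents and class names are illustrative only.

```python
from statistics import mean


def penalized_size(size, k):
    """Size beyond a tolerance of k elements, i.e. max(0, |e| - k)."""
    return max(0, size - k)


# Hypothetical global rule set; each rule is a list of predicates.
rule_set = [
    ["age > 30", "income <= 50k"],
    ["age <= 30"],
    ["owns_home", "age > 30", "income > 50k"],
]
sizes = [len(rule) for rule in rule_set]

# Aggregate predicate counts over the whole rule set.
summary = {"mean": mean(sizes), "sum": sum(sizes), "max": max(sizes)}

# Apply a tolerance margin of k = 2 predicates to each rule.
penalized = [penalized_size(s, k=2) for s in sizes]

# Average number of rules per class (hypothetical class-wise rule counts).
rules_per_class = {"approve": 2, "deny": 4}
avg_rules_per_class = mean(list(rules_per_class.values()))
```

Which aggregate is appropriate depends on the question: the mean describes a typical explanans, the sum describes the overall model, and the maximum bounds the worst case a user may face.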