Model Parameter Randomization Test - Research Metrics Explorer

Model Parameter Randomization Test

Contextuality

Desiderata: Fidelity

Explanation Type: FA (ExE) (CE) (WBS) (NLE)

References:

Adebayo et al. (2018), Kindermans et al. (2019), Sixt et al. (2020), Binder et al. (2023), Bommer et al. (2024), Hedström et al. (2024)

To verify that explanations truly reflect the black-box model's learned reasoning, the model's internal parameters are systematically randomized and the resulting changes in the explanantia are analyzed. The rationale is that an explanation that remains unchanged under randomization does not depend on what the model has learned, so it is likely generic and uninformative about the model's decision logic [Adebayo et al. (2018)].
Most commonly, the weights of a neural network are randomized all at once, one layer at a time, or cumulatively in top-down or bottom-up order. The similarity between the original and randomized explanantia is then computed [Adebayo et al. (2018), Hedström et al. (2024)]; low similarity indicates that the explanation depends on the learned parameters. Similarity may be evaluated using SSIM and correlation for heatmaps [Adebayo et al. (2018), Binder et al. (2023), Bommer et al. (2024)], or any other suitable similarity metric (see Similarity Measures). [Hedström et al. (2024)] additionally average the results across noisy versions of the input (e.g., obtained via input perturbations) to stabilize local explanations.
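The cumulative (top-down) variant described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any of the cited papers: the model is a small bias-free ReLU MLP written in NumPy, the explanation is the absolute input gradient, similarity is Spearman rank correlation, and randomized layers are simply redrawn from a standard normal. The helper names `saliency`, `spearman`, and `cascading_randomization` are hypothetical.

```python
import numpy as np

def saliency(weights, x):
    """Gradient of the scalar output of a bias-free ReLU MLP with respect to x."""
    activation, masks = x, []
    for W in weights[:-1]:
        pre = W @ activation
        masks.append(pre > 0)              # remember which ReLUs fired
        activation = np.maximum(pre, 0.0)
    grad = weights[-1].ravel()             # d(output) / d(last hidden activation)
    for W, mask in zip(reversed(weights[:-1]), reversed(masks)):
        grad = (grad * mask) @ W           # chain rule: through the ReLU, then the linear map
    return grad

def spearman(a, b):
    """Spearman rank correlation (ties are broken arbitrarily by argsort)."""
    rank_a = np.argsort(np.argsort(a)).astype(float)
    rank_b = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

def cascading_randomization(weights, x, rng):
    """Randomize layers from the output towards the input; one similarity per step."""
    original = np.abs(saliency(weights, x))
    current = [W.copy() for W in weights]
    similarities = []
    for layer in reversed(range(len(current))):
        # Assumption: redraw from a standard normal, ignoring the original init scheme.
        current[layer] = rng.standard_normal(current[layer].shape)
        similarities.append(spearman(original, np.abs(saliency(current, x))))
    return similarities

rng = np.random.default_rng(0)
weights = [
    rng.standard_normal((16, 10)),   # input -> hidden 1
    rng.standard_normal((16, 16)),   # hidden 1 -> hidden 2
    rng.standard_normal((1, 16)),    # hidden 2 -> scalar output
]
x = rng.standard_normal(10)
scores = cascading_randomization(weights, x, rng)
print(scores)  # one score per randomization step
```

Scores that stay close to 1 as more layers are randomized would suggest that the explanation is largely insensitive to the model parameters. Because the weights here are random to begin with, the sketch only demonstrates the mechanics; a meaningful test needs a trained model.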
For gradient-based methods, [Sixt et al. (2020)] propose inserting a random activation vector at a specific layer, to break the causal connection without modifying the network weights.
A complementary strategy proposed by [Kindermans et al. (2019)] keeps the weights fixed but applies a controlled shift to the inputs and compensates for it in the input-layer biases, so that internal computations and outputs remain unchanged. Since the model's behavior is invariant by design, any change in the explanans indicates unwanted sensitivity of the explanation method. This shift-invariance test can be quantified using standard similarity metrics between pre- and post-shift explanantia.
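The bias compensation behind the shift-invariance test can be illustrated with a toy example. The sketch below assumes a one-hidden-layer ReLU network with hand-picked weights, a constant input shift, and two simple explanation methods (plain gradient and gradient times input); `make_model` and `max_abs_change` are hypothetical helpers, and the maximum absolute difference stands in for whichever similarity metric is actually used.

```python
import numpy as np

def make_model(W1, b1, w2):
    """One-hidden-layer ReLU network returning (output, input gradient)."""
    def output_and_gradient(x):
        pre = W1 @ x + b1
        output = float(w2 @ np.maximum(pre, 0.0))
        gradient = (w2 * (pre > 0)) @ W1   # backpropagate through the active ReLUs
        return output, gradient
    return output_and_gradient

def max_abs_change(a, b):
    """Largest element-wise difference between two explanations."""
    return float(np.max(np.abs(a - b)))

# Hand-picked toy parameters (assumptions for illustration only).
W1 = np.array([[1.0, -2.0], [0.5, 1.0]])
b1 = np.array([0.5, 0.1])
w2 = np.array([1.0, -1.0])
x = np.array([1.0, 0.2])
shift = np.array([2.0, 2.0])

model = make_model(W1, b1, w2)
# Compensate the input shift in the bias:
# W1 @ (x + shift) + (b1 - W1 @ shift) == W1 @ x + b1
shifted_model = make_model(W1, b1 - W1 @ shift, w2)

out, grad = model(x)
out_shifted, grad_shifted = shifted_model(x + shift)

gradient_change = max_abs_change(grad, grad_shifted)
grad_x_input_change = max_abs_change(grad * x, grad_shifted * (x + shift))

print(out, out_shifted)                      # the two outputs agree up to rounding
print(gradient_change, grad_x_input_change)  # change in each explanation
```

In this toy setup the plain gradient is unaffected by the shift, whereas gradient times input changes even though the two models compute the same function on corresponding inputs.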
Although originally proposed for FAs, the approach is applicable to any explanation type for which suitable similarity metrics can be defined.
