quantus.evaluation module

This module provides functionality for evaluating different explanation methods against several evaluation criteria (metrics).

quantus.evaluation.evaluate(metrics: typing.Dict, xai_methods: typing.Dict[str, typing.Callable] | typing.Dict[str, typing.Dict] | typing.Dict[str, numpy.ndarray], model: quantus.helpers.model.model_interface.ModelInterface, x_batch: numpy.ndarray, y_batch: numpy.ndarray, s_batch: numpy.ndarray | None = None, agg_func: typing.Callable = <function <lambda>>, explain_func_kwargs: dict | None = None, call_kwargs: typing.Dict | typing.Dict[str, typing.Dict] | None = None, return_as_df: bool | None = None, verbose: bool | None = None, progress: bool | None = None, *args, **kwargs) → dict | None

Evaluate different explanation methods using specified metrics.

Parameters:

metrics: dict

A dictionary of initialized evaluation metrics. See quantus.AVAILABLE_METRICS. Example: {'Robustness': quantus.MaxSensitivity(), 'Faithfulness': quantus.PixelFlipping()}

xai_methods: dict

A dictionary specifying the explanation methods to evaluate, which can be structured in three ways:

Dict[str, Dict] for built-in Quantus methods (using quantus.explain):

Example: xai_methods = {
    'IntegratedGradients': {
        'n_steps': 10,
        'xai_lib': 'captum'
    },
    'Saliency': {
        'xai_lib': 'captum'
    }
}

- See quantus.AVAILABLE_XAI_METHODS_CAPTUM for supported captum methods.
- See quantus.AVAILABLE_XAI_METHODS_TF for supported tensorflow methods.
- See https://github.com/chr5tphr/zennit for supported zennit methods.
- Read more about the explanation function arguments here: <https://quantus.readthedocs.io/en/latest/docs_api/quantus.functions.explanation_func.html#quantus.functions.explanation_func.explain>

Dict[str, Callable] for custom methods:

Example: xai_methods = {
    'custom_own_xai_method': custom_explain_function
}

or

xai_methods = {
    "InputXGradient": {
        "explain_func": quantus.explain,
        "explain_func_kwargs": {},
    }
}

Here, you can provide your own callable that mirrors the inputs and outputs of the quantus.explain() method.
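
A custom callable could look like the sketch below. It assumes the callable is invoked with the model, a batch of inputs, and the target labels (plus optional keyword arguments), and that it returns one attribution array per input, mirroring quantus.explain(). The name random_explain_function, the seed keyword, and the random-attribution baseline are illustrative and not part of Quantus.

```python
import numpy as np


def random_explain_function(model, inputs, targets, **kwargs):
    """Hypothetical baseline: random attributions with the same shape as the inputs.

    Assumes the (model, inputs, targets, **kwargs) call convention of
    quantus.explain; `model` and `targets` are ignored by this baseline.
    """
    rng = np.random.default_rng(kwargs.get("seed", 0))
    inputs = np.asarray(inputs)
    return rng.uniform(low=-1.0, high=1.0, size=inputs.shape)


xai_methods = {"RandomBaseline": random_explain_function}

# Shape check on dummy data: 4 single-channel 8x8 inputs with integer labels.
x_batch = np.zeros((4, 1, 8, 8))
y_batch = np.array([0, 1, 2, 3])
a_batch = xai_methods["RandomBaseline"](model=None, inputs=x_batch, targets=y_batch)
```

A random baseline like this one is mainly useful as a sanity check: a metric that scores it as well as a real explanation method is probably not discriminating much.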

Dict[str, np.ndarray] for pre-calculated attributions:

Example: xai_methods = {
    'LIME': precomputed_numpy_lime_attributions,
    'GradientShap': precomputed_numpy_shap_attributions
}

- Note that some Quantus metrics, e.g., quantus.MaxSensitivity() in the robustness category, include "re-explaining" the input and output pair as part of the evaluation metric logic. If you include such metrics in quantus.evaluate(), pre-calculated attributions cannot be used.
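
Before passing pre-calculated attributions, it can help to confirm that each array lines up with the input batch. The sketch below assumes that the first axis of every attribution array indexes the same samples as x_batch; the helper check_batch_alignment and the dummy arrays are hypothetical and not part of Quantus.

```python
import numpy as np

# Dummy inputs: 4 samples, 3 channels, 8x8 pixels.
x_batch = np.zeros((4, 3, 8, 8))

# Stand-ins for attributions computed elsewhere (e.g., loaded from disk).
precomputed = {
    "LIME": np.ones((4, 8, 8)),
    "GradientShap": np.ones((4, 3, 8, 8)),
}


def check_batch_alignment(xai_methods, x_batch):
    """Return names of array entries whose batch size differs from x_batch."""
    return [
        name
        for name, value in xai_methods.items()
        if isinstance(value, np.ndarray) and value.shape[0] != x_batch.shape[0]
    ]


mismatched = check_batch_alignment(precomputed, x_batch)
```

Entries that are callables or dictionaries are skipped by the isinstance check, so the same helper also works on a mixed xai_methods dictionary.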

It is also possible to pass a combination of the above.

>>> xai_methods = {
...     'IntegratedGradients': {
...         'n_steps': 10,
...         'xai_lib': 'captum'
...     },
...     'Saliency': {
...         'xai_lib': 'captum'
...     },
...     'custom_own_xai_method': custom_explain_function,
...     'LIME': precomputed_numpy_lime_attributions,
...     'GradientShap': precomputed_numpy_shap_attributions
... }

model: Union[torch.nn.Module, tf.keras.Model]

A torch or tensorflow model that is subject to explanation.

x_batch: np.ndarray

A np.ndarray containing the input data to be explained.

y_batch: np.ndarray

A np.ndarray containing the output labels corresponding to x_batch.

s_batch: np.ndarray, optional

A np.ndarray containing segmentation masks that match the input.

agg_func: Callable

Indicates how to aggregate scores, e.g., pass np.mean.
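
As a minimal illustration of what an aggregation function does, the sketch below assumes a metric returns one score per sample and that agg_func is applied to that list of scores. The score values are made up.

```python
import numpy as np

# Made-up per-sample scores, as a metric might return for a batch of three inputs.
scores = [0.2, 0.4, 0.9]

mean_score = np.mean(scores)      # average over the batch
median_score = np.median(scores)  # alternative that is less sensitive to outliers

# If a metric can yield NaN for some samples, a NaN-aware aggregator avoids
# propagating NaN into the aggregated result.
scores_with_nan = [0.2, float("nan"), 0.9]
nan_safe_mean = np.nanmean(scores_with_nan)
```

Any callable that maps a sequence of scores to a summary value can be used in the same way.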

explain_func_kwargs: dict, optional

Keyword arguments to be passed to explain_func on call. Pass None if using Dict[str, Dict] type for xai_methods.

call_kwargs: Dict[str, Dict], optional

Keyword arguments for the call of the metrics. Keys are names for argument sets, and values are argument dictionaries.
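
The structure of call_kwargs can be sketched as follows. The set names "logits" and "probabilities", the softmax argument, and the stand-in toy_metric_call are assumptions chosen for illustration; consult the metric's own call signature for the arguments it accepts.

```python
# Each key names an argument set; each value holds the keyword arguments
# that are passed to the metric call for that set.
call_kwargs = {
    "logits": {"softmax": False},
    "probabilities": {"softmax": True},
}


def toy_metric_call(softmax=False):
    """Hypothetical stand-in for a metric call; reports the option it received."""
    return f"softmax={softmax}"


# One call per named argument set, keyed by the set name.
outcomes = {name: toy_metric_call(**kwargs) for name, kwargs in call_kwargs.items()}
```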

verbose: bool, optional

Indicates whether to print evaluation progress.

progress: bool, optional

Deprecated. Indicates whether to print evaluation progress. Use verbose instead.

return_as_df: bool, optional

Indicates whether to return the results as a pd.DataFrame. Only works if call_kwargs is not passed.

args: optional

Deprecated arguments for the call.

kwargs: optional

Deprecated keyword arguments for the call of the metrics.

Returns:

results: dict

A dictionary with the evaluation results.
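
Putting the parameters together, a call might look like the sketch below. It uses only the metric and method names that appear in the examples above and assumes that quantus and captum are installed and that model is a trained PyTorch model. The wrapper run_evaluation is hypothetical; quantus is imported inside it so that defining the function does not require the dependency.

```python
import numpy as np


def run_evaluation(model, x_batch, y_batch):
    """Sketch of a quantus.evaluate call; needs quantus and captum at call time."""
    import quantus  # imported lazily; required only when the function is called

    metrics = {
        "Robustness": quantus.MaxSensitivity(),
        "Faithfulness": quantus.PixelFlipping(),
    }
    xai_methods = {
        "Saliency": {"xai_lib": "captum"},
        "IntegratedGradients": {"n_steps": 10, "xai_lib": "captum"},
    }
    return quantus.evaluate(
        metrics=metrics,
        xai_methods=xai_methods,
        model=model,
        x_batch=x_batch,
        y_batch=y_batch,
        agg_func=np.mean,  # aggregate per-sample scores into one value
        verbose=False,
    )
```

Because MaxSensitivity re-explains the inputs, this sketch passes only method configurations and no pre-calculated attribution arrays.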