quantus.evaluation module
This module provides functionality for evaluating multiple explanation methods against multiple evaluation metrics.
- quantus.evaluation.evaluate(metrics: typing.Dict, xai_methods: typing.Dict[str, typing.Callable] | typing.Dict[str, typing.Dict] | typing.Dict[str, numpy.ndarray], model: quantus.helpers.model.model_interface.ModelInterface, x_batch: numpy.ndarray, y_batch: numpy.ndarray, s_batch: numpy.ndarray | None = None, agg_func: typing.Callable = <function <lambda>>, explain_func_kwargs: dict | None = None, call_kwargs: typing.Dict | typing.Dict[str, typing.Dict] | None = None, return_as_df: bool | None = None, verbose: bool | None = None, progress: bool | None = None, *args, **kwargs) → dict | None
Evaluate different explanation methods using specified metrics.
- Parameters:
- metrics: dict
A dictionary of initialized evaluation metrics. See quantus.AVAILABLE_METRICS. Example: {'Robustness': quantus.MaxSensitivity(), 'Faithfulness': quantus.PixelFlipping()}
- xai_methods: dict
A dictionary specifying the explanation methods to evaluate, which can be structured in three ways:
Dict[str, Dict] for built-in Quantus methods (using quantus.explain):
Example:
xai_methods = {
    'IntegratedGradients': {
        'n_steps': 10,
        'xai_lib': 'captum'
    },
    'Saliency': {
        'xai_lib': 'captum'
    }
}
See quantus.AVAILABLE_XAI_METHODS_CAPTUM for supported captum methods.
See quantus.AVAILABLE_XAI_METHODS_TF for supported tensorflow methods.
See https://github.com/chr5tphr/zennit for supported zennit methods.
Read more about the explanation function arguments here: <https://quantus.readthedocs.io/en/latest/docs_api/quantus.functions.explanation_func.html#quantus.functions.explanation_func.explain>
Dict[str, Callable] for custom methods:
Example:
xai_methods = {
    'custom_own_xai_method': custom_explain_function
}
or
xai_methods = {
    'InputXGradient': {
        'explain_func': quantus.explain,
        'explain_func_kwargs': {},
    }
}
Here, you can provide your own callable that mirrors the inputs and outputs of quantus.explain().
Dict[str, np.ndarray] for pre-calculated attributions:
Example:
xai_methods = {
    'LIME': precomputed_numpy_lime_attributions,
    'GradientShap': precomputed_numpy_shap_attributions
}
Note that some Quantus metrics, e.g., quantus.MaxSensitivity() in the robustness category, re-explain the input and output pair as part of their evaluation logic. If you pass such metrics to quantus.evaluate(), pre-calculated attributions cannot be used.
It is also possible to pass a combination of the above.
>>> xai_methods = {
...     'IntegratedGradients': {
...         'n_steps': 10,
...         'xai_lib': 'captum'
...     },
...     'Saliency': {
...         'xai_lib': 'captum'
...     },
...     'custom_own_xai_method': custom_explain_function,
...     'LIME': precomputed_numpy_lime_attributions,
...     'GradientShap': precomputed_numpy_shap_attributions
... }
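As a rough illustration of the custom-callable option above, the sketch below defines a function with the same call shape as quantus.explain, assumed here to be (model, inputs, targets, **kwargs) returning a np.ndarray of attributions shaped like the inputs. The name custom_explain_function comes from the example above; its body is purely hypothetical: it ignores the model and uses normalised absolute input magnitude as a stand-in attribution.

```python
import numpy as np


def custom_explain_function(model, inputs, targets, **kwargs):
    """Hypothetical explainer with an explain-like call shape (an assumption).

    Ignores `model` and `targets`; returns one attribution map per input,
    with the same shape as `inputs`.
    """
    inputs = np.asarray(inputs, dtype=float)
    # Stand-in attribution: absolute input magnitude.
    flat = np.abs(inputs).reshape(len(inputs), -1)
    # Normalise each sample so its attributions sum to 1 (guard against 0).
    totals = flat.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0
    return (flat / totals).reshape(inputs.shape)


# Toy batch: 4 single-channel 8x8 inputs with integer labels.
x_batch = np.random.default_rng(0).normal(size=(4, 1, 8, 8))
y_batch = np.array([0, 1, 0, 1])

a_batch = custom_explain_function(model=None, inputs=x_batch, targets=y_batch)

# The callable is passed by reference, keyed by a name of your choice.
xai_methods = {"custom_own_xai_method": custom_explain_function}
```

Returning attributions with the same leading batch dimension as x_batch is the property that matters most here, since the metrics pair each input with its attribution.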
- model: Union[torch.nn.Module, tf.keras.Model]
A torch or tensorflow model that is subject to explanation.
- x_batch: np.ndarray
A np.ndarray containing the input data to be explained.
- y_batch: np.ndarray
A np.ndarray containing the output labels corresponding to x_batch.
- s_batch: np.ndarray, optional
A np.ndarray containing segmentation masks that match the input.
- agg_func: Callable
The function used to aggregate the scores returned by each metric, e.g., np.mean.
- explain_func_kwargs: dict, optional
Keyword arguments to be passed to explain_func on call. Pass None if using Dict[str, Dict] type for xai_methods.
- call_kwargs: Dict[str, Dict], optional
Keyword arguments for the call of the metrics. Keys are names for argument sets, and values are argument dictionaries.
- verbose: bool, optional
Indicates whether to print evaluation progress.
- progress: bool, optional
Deprecated. Indicates whether to print evaluation progress. Use verbose instead.
- return_as_df: bool, optional
Indicates whether to return the results as a pd.DataFrame. Only works if call_kwargs is not passed.
- args: optional
Deprecated arguments for the call.
- kwargs: optional
Deprecated keyword arguments for the call of the metrics.
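To make the interplay of agg_func and call_kwargs concrete, here is a minimal sketch that does not use Quantus itself. It assumes that each named argument set in call_kwargs produces one list of per-sample scores, and that agg_func reduces that list to a single value. fake_metric and some_argument are hypothetical placeholders, not Quantus names.

```python
import numpy as np

# Keys name the argument sets; values are keyword-argument dictionaries.
# "some_argument" is a placeholder, not a real Quantus argument.
call_kwargs = {
    "set_a": {"some_argument": 1},
    "set_b": {"some_argument": 2},
}


def fake_metric(some_argument):
    """Stand-in for a metric call: returns one score per sample."""
    return [0.1 * some_argument, 0.3 * some_argument, 0.5 * some_argument]


# Any callable that maps a sequence of scores to one value can be used.
agg_func = np.mean

# One aggregated score per named argument set.
aggregated = {
    name: float(agg_func(fake_metric(**kwargs)))
    for name, kwargs in call_kwargs.items()
}
```

Swapping np.mean for np.median or a custom lambda changes only how the per-sample scores are summarised, not how they are computed.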
- Returns:
- results: dict
A dictionary with the evaluation results.
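The layout of the returned dictionary is not spelled out above. The sketch below assumes a nesting of explanation method name, then metric name, then aggregated score, which is an assumption rather than documented behaviour. The numbers are placeholders and flatten_results is a hypothetical helper.

```python
def flatten_results(results):
    """Turn an assumed {method: {metric: score}} mapping into rows."""
    rows = []
    for method, per_metric in results.items():
        for metric, score in per_metric.items():
            rows.append((method, metric, score))
    return rows


# Placeholder values for illustration only; not real evaluation output.
results = {
    "Saliency": {"Robustness": 0.12, "Faithfulness": 0.47},
    "IntegratedGradients": {"Robustness": 0.09, "Faithfulness": 0.53},
}

rows = flatten_results(results)
for method, metric, score in rows:
    print(f"{method:<20} {metric:<13} {score:.2f}")
```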