Abstract
Saliency maps are a popular method for creating post-hoc explanations of image classification model outputs. These methods produce estimates of the relevance of each pixel to the classification output score, which can be displayed as a saliency map highlighting important pixels. Despite a proliferation of such methods, little effort has been made to quantify the quality of the saliency maps they produce. This is surprising given recent evidence that many methods produce maps that are invariant to the neural network's weights, and thus cannot be faithful saliency representations (Adebayo et al., 2018). We therefore investigate possible metrics for evaluating the fidelity of saliency methods (i.e. saliency metrics) based on perturbing high-relevance pixels. We develop the theory of how we expect such metrics to perform and of what precisely they measure. We find that there is little consistency in the literature in how such metrics are calculated, and show that these inconsistencies can have a significant effect on the measured fidelity. Further, we note that the fidelity of a single saliency method measured over a set of images is highly variable and influenced by image class, which leads to inconsistent rankings of saliency methods across individual images. Overall, our results show that current approaches to measuring saliency map fidelity suffer serious drawbacks that are difficult to overcome, and that quantitative comparisons between saliency methods based on such metrics are flawed.
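To make the class of metrics discussed above concrete, the following is a minimal sketch of a deletion-style perturbation metric: pixels are occluded in order of decreasing saliency, and a faithful map should cause the target-class score to drop quickly. This is an illustrative assumption about one common scheme, not the paper's exact protocol; the names predict_fn and baseline_value, and the fixed step schedule, are hypothetical choices.

```python
import numpy as np

def deletion_metric(predict_fn, image, saliency, target_class,
                    steps=20, baseline_value=0.0):
    """Deletion-style fidelity score for one image (illustrative sketch).

    predict_fn: maps an image array (H, W, C) to a vector of class scores.
    saliency:   per-pixel relevance array of shape (H, W).
    Returns the mean drop in the target-class score as the most relevant
    pixels are progressively occluded (an AOPC-style aggregate).
    """
    h, w = saliency.shape
    order = np.argsort(saliency.ravel())[::-1]      # most relevant pixels first
    pixels_per_step = max(1, (h * w) // steps)

    perturbed = image.copy()
    base_score = predict_fn(image)[target_class]
    drops = []
    for step in range(steps):
        idx = order[step * pixels_per_step:(step + 1) * pixels_per_step]
        rows, cols = np.unravel_index(idx, (h, w))
        perturbed[rows, cols, :] = baseline_value   # occlude this batch of pixels
        drops.append(base_score - predict_fn(perturbed)[target_class])
    return float(np.mean(drops))
```

Note that design choices left open in this sketch, such as the occlusion value, the number of perturbation steps, and whether scores are renormalised after perturbation, are exactly the kinds of calculation inconsistencies the abstract reports as materially affecting measured fidelity.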