Discriminating Spatial and Temporal Relevance in Deep Taylor Decompositions for Explainable Activity Recognition

Abstract	Current techniques for explainable AI have been applied with some success to image processing. The recent rise of research in video processing has called for similar work in deconstructing and explaining spatio-temporal models. While many techniques are designed for 2D convolutional models, others are inherently applicable to any input domain. One such body of work, deep Taylor decomposition, propagates relevance from the model output distributively onto its input and thus is not restricted to image processing models. However, by exploiting a simple technique that removes motion information, we show that deep Taylor decomposition is not effective as-is for representing relevance in non-image tasks. We instead propose a discriminative method that produces a naïve representation of both the spatial and temporal relevance of a frame as two separate objects. This new discriminative relevance model exposes relevance in the frame attributed to motion that was previously ambiguous in the original explanation. We observe the effectiveness of this technique on a range of samples from the UCF-101 action recognition dataset, two of which are demonstrated in this paper.
Authors	Liam Hiley (Cardiff), Harrison Taylor (Cardiff), Alun Preece (Cardiff), Yulia Hicks (Cardiff), David Marshall (Cardiff)
Date	Aug-2019
Venue	IJCAI 2019 Workshop on Explainable Artificial Intelligence (XAI)