Explaining Temporal Information in Activity Recognition for Situational Understanding

Abstract Current techniques for explainable AI have been applied with some success to image processing. The recent rise of research in video processing calls for similar work in deconstructing and explaining spatio-temporal models. While many techniques are designed for 2D convolutional models, others are inherently applicable to any input domain. One such body of work, deep Taylor decomposition, propagates relevance from the model output distributively onto its input and is therefore not restricted to image processing models. However, by exploiting a simple technique that removes motion information, we show that deep Taylor decomposition is not effective as-is for representing relevance in non-image tasks. We instead propose a discriminative method that produces a naive representation of the spatial and temporal relevance of a frame as two separate objects. This new discriminative relevance model exposes the relevance in the frame that is attributed to motion, which was ambiguous in the original explanation. We observe the effectiveness of this technique on a range of samples from the UCF-101 action recognition dataset, two of which are demonstrated in this paper.
  • Liam Hiley (Cardiff)
  • Harrison Taylor (Cardiff)
  • Alun Preece (Cardiff)
  • Yulia Hicks (Cardiff)
  • David Marshall (Cardiff)
  • Supriyo Chakraborty (IBM US)
  • Prudhvi Gurram (ARL)
Date Sep-2019
Venue Annual Fall Meeting of the DAIS ITA, 2019