Interpretability of Deep Learning Models: A Survey of Results

Abstract Deep neural networks have achieved near-human accuracy on a variety of classification and prediction tasks involving image, text, speech, and video data. However, these networks are still largely treated as black-box function approximators that map a given input to a classification output. The next step in this human-machine evolutionary process - incorporating these networks into mission-critical processes such as medical diagnosis, planning, and control - requires a level of trust to be established in the machine. To establish this trust, neural networks should provide greater visibility into, and human-understandable justifications for, their decisions, leading to better insights about their inner workings. We call such models interpretable deep networks. A multitude of dimensions together constitute interpretability. In addition, the interpretation itself can be provided either in terms of the low-level network parameters or in terms of the input features used by the model. In this paper, we outline some of the dimensions that are useful for model interpretability and categorize prior work along those dimensions. In the process, we perform a gap analysis of what still needs to be done to improve model interpretability.
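
To make the distinction between parameter-level and feature-level interpretation concrete, the sketch below shows one common way to explain a prediction in terms of input features: gradient-based saliency. The model, input, and attribution method here are illustrative assumptions, not the specific techniques surveyed in the paper.

  import torch
  import torch.nn as nn

  # Hypothetical small classifier over 10-dimensional inputs (an assumption
  # for illustration only; not a model from the paper).
  model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
  model.eval()

  # A single input example; gradients with respect to it yield the saliency.
  x = torch.randn(1, 10, requires_grad=True)
  logits = model(x)
  predicted_class = logits.argmax(dim=1).item()

  # Gradient of the predicted class score w.r.t. the input: features with
  # large gradient magnitude are those the prediction is most sensitive to.
  logits[0, predicted_class].backward()
  saliency = x.grad.abs().squeeze(0)
  print(saliency)

Parameter-level interpretation, by contrast, would examine quantities internal to the network (such as learned weights or unit activations) rather than attributions over the input features.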
Authors
  • Supriyo Chakraborty (IBM US)
  • Richard Tomsett (IBM UK)
  • Ramya Raghavendra (IBM US)
  • Dan Harborne (Cardiff)
  • Moustafa Alzantot (UCLA)
  • Federico Cerutti (Cardiff)
  • Mani Srivastava (UCLA)
  • Alun Preece (Cardiff)
  • Simon Julier (UCL)
  • Raghuveer Rao (ARL)
  • Troy Kelley (ARL)
  • Murat Sensoy
  • Chris Willis (BAE)
  • Prudhvi Gurram (ARL)
Date Sep-2017
Venue 1st Annual Fall Meeting of the DAIS ITA, 2017