Mitigating Adversarial Examples in Neural Networks

Abstract While deep neural network models have exhibited astonishing results on a variety of tasks over the past few years, researchers have recently shown that these models can be easily fooled and attacked using adversarial examples. Adversarial examples originate from benign examples that the machine learning model classifies correctly: an attacker adds a small amount of noise, usually imperceptible to a human but deliberately crafted so that the classifier misclassifies the example once this adversarial noise is added. In this paper, we study the characteristics of adversarial noise and present approaches for increasing the robustness of models against adversarial-example attacks.
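
To make the attack described above concrete, the sketch below crafts an adversarial example with the Fast Gradient Sign Method (FGSM, Goodfellow et al., 2015), a standard illustration of this kind of perturbation; it is not necessarily the attack or defense studied in this paper, and the toy model, label, and epsilon value are assumptions chosen only for demonstration.

```python
# Minimal FGSM sketch (assumed setup, not the paper's method): perturb an input
# by a small, imperceptible amount in the direction that increases the loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy classifier standing in for any image model (assumed architecture).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model.eval()

x = torch.rand(1, 1, 28, 28)          # benign example (random stand-in for an image)
with torch.no_grad():
    y = model(x).argmax(dim=1)        # treat the model's own prediction as the correct label
epsilon = 0.1                          # perturbation budget, kept small to stay unnoticeable

# Gradient of the loss with respect to the input.
x.requires_grad_(True)
loss = F.cross_entropy(model(x), y)
loss.backward()

# Single step in the sign of the gradient, clipped back to the valid pixel range.
x_adv = (x + epsilon * x.grad.sign()).clamp(0.0, 1.0).detach()

print("clean prediction:      ", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```

Defenses of the kind discussed in the paper (e.g., training on such perturbed examples) aim to keep the two predictions above identical even after the noise is added.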
Authors
  • Moustafa Alzantot (UCLA)
  • Supriyo Chakraborty (IBM US)
  • Mani Srivastava (UCLA)
Date Sep-2017
Venue 1st Annual Fall Meeting of the DAIS ITA, 2017