Abstract |
Deep learning models achieve state-of-the-art performance in many fields, but their vulnerability to adversarial examples threatens their widespread deployment in practical settings. Moreover, adversarial inputs generated for one classifier have been shown to transfer to other classifiers trained on similar data, which makes attacks possible even when the model parameters are not revealed to the adversary. This property of transferability has not yet been systematically studied, leaving a gap in our understanding of the robustness of neural networks to adversarial inputs. In this work, we study the effect of network architecture, initialization, and input, weight, and activation quantization on transferability. Our experiments reveal that transferability is significantly hampered by input quantization and by architectural mismatch between source and target, is unaffected by initialization, and that the effect of weight and activation quantization is architecture-dependent. To quantify transferability, we propose a simple metric that is a function of the attack strength. We demonstrate the utility of this metric by using it to design a methodology for building ensembles with improved adversarial robustness. Finally, we show that an ensemble of carefully chosen input-quantized networks achieves better adversarial robustness than is possible with a single network. |