Abstract
The enormous inference cost of deep neural networks can be mitigated by network compression. Pruning connections is one of the predominant approaches to network compression. However, existing pruning techniques suffer from one or more of the following limitations: 1) They increase the time and energy consumed by the compute-heavy training stage because of the added pruning and fine-tuning steps, 2) They prune layer-wise based on local information about a particular layer's statistics, ignoring the effect of error propagation through the network, 3) They lack an efficient means to determine the global importance of channels, 4) Because they rely on unstructured pruning, they may yield no energy advantage on mainstream platforms (GPUs and TPUs), requiring specialized hardware to reap the benefits. To address these issues, we present a simple yet effective methodology for gradual channel pruning while training, using a data-driven metric referred to as the feature relevance score. The proposed technique eliminates the need for additional retraining by pruning the least important channels in a structured manner at fixed intervals during the regular training phase. Pruning is guided by feature relevance scores, which efficiently evaluate the contribution of each channel to the discriminative power of the network. We demonstrate the effectiveness of the proposed methodology on architectures such as VGG and ResNet using the CIFAR-10, CIFAR-100, and ImageNet datasets, achieving significant model compression while trading off less than 1% accuracy.
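As a rough illustration of the overall procedure (not the paper's exact method), the sketch below prunes a small fraction of channels at fixed intervals during a regular training loop. The per-channel relevance score is approximated here by the mean absolute activation over a batch, which is only a stand-in for the feature relevance score defined in the paper; pruning is emulated by zeroing the corresponding filter weights rather than physically removing channels, and the helper names `score_channels` and `prune_step` are hypothetical.

```python
# Minimal sketch of gradual channel pruning while training (illustrative only).
# Assumptions: relevance of a channel ~ mean |activation| over a scoring batch;
# pruning is emulated by zeroing filter weights instead of removing them.
import torch
import torch.nn as nn

def score_channels(layer_output: torch.Tensor) -> torch.Tensor:
    """Per-channel relevance proxy: mean |activation| over batch and spatial dims."""
    return layer_output.abs().mean(dim=(0, 2, 3))

def prune_step(conv: nn.Conv2d, scores: torch.Tensor, prune_frac: float) -> None:
    """Zero out the lowest-scoring output channels of a convolutional layer."""
    n_prune = int(prune_frac * conv.out_channels)
    if n_prune == 0:
        return
    idx = torch.argsort(scores)[:n_prune]          # least relevant channels
    with torch.no_grad():
        conv.weight[idx] = 0.0
        if conv.bias is not None:
            conv.bias[idx] = 0.0

# Toy model and training loop: prune a small fraction at fixed intervals.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

for step in range(1, 201):
    x = torch.randn(8, 3, 32, 32)                  # stand-in for a real data batch
    y = torch.randint(0, 10, (8,))
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

    if step % 50 == 0:                             # fixed pruning interval
        with torch.no_grad():
            acts = model[1](model[0](x))           # post-ReLU activations of the conv layer
        prune_step(model[0], score_channels(acts), prune_frac=0.1)
```

Because this sketch only masks weights, subsequent gradient updates could revive pruned channels; a structured implementation would instead rebuild the layer with fewer filters, which is what yields the reported compression and inference savings.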