In this paper, we consider federated learning (FL) with an adaptive degree of sparsity and non-i.i.d. local datasets. To reduce the communication overhead, we first present a fairness-aware gradient sparsification (GS) method which ensures that different clients provide a similar amount of updates. Then, with the goal of minimizing the overall training time, we propose a novel online learning algorithm that automatically determines the degree of sparsity. Experiments with real datasets confirm the benefits of our proposed approaches.
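
To make the idea of gradient sparsification concrete, the following is a minimal sketch of generic top-k sparsification, not the method proposed in this paper. It assumes that each client keeps only its k largest-magnitude gradient entries and that the server averages the sparse updates. Using the same k for every client is one simple, assumed way to have clients contribute a similar amount of updates. The names `top_k_sparsify` and `aggregate` are hypothetical, and the online selection of k is not shown.

```python
import numpy as np


def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad and zero out the rest."""
    flat = grad.ravel()
    k = min(k, flat.size)
    sparse = np.zeros_like(flat)
    if k > 0:
        # Indices of the k entries with the largest absolute value.
        idx = np.argpartition(np.abs(flat), flat.size - k)[flat.size - k:]
        sparse[idx] = flat[idx]
    return sparse.reshape(grad.shape)


def aggregate(client_grads, k):
    """Average the sparsified gradients, using the same k for every client (assumption)."""
    sparse = [top_k_sparsify(g, k) for g in client_grads]
    return np.mean(sparse, axis=0)
```

In this sketch a smaller k reduces the data each client sends per round but discards more of the gradient, which is the trade-off that an adaptive choice of the degree of sparsity would aim to balance.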