Approaches to address the Data Skew Problem in Federated Learning

Abstract	Federated learning creates an AI model from multiple data sources without moving large amounts of data to a central environment. It can be very useful in a tactical coalition environment, where each coalition partner collects data individually but network connectivity is inadequate to move that data to a central location. However, the data collected in such settings is often dirty and imperfect: it can be imbalanced, and in some cases entire classes are missing at some coalition partners. Under these conditions, traditional federated learning approaches can produce highly inaccurate models. In this paper, we propose approaches that can yield good machine learning models even in environments where the data may be highly skewed, and we study their performance under different conditions.
Authors	Dinesh Verma (IBM US), Graham White (IBM UK), Simon Julier (UCL), Stephen Pasteris (UCL), Supriyo Chakraborty (IBM US), Greg Cirincione (ARL)
Date	Apr-2019
Venue	SPIE - Defense + Commercial Sensing 2019
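The abstract describes two ingredients: partners whose local data is label-skewed (some classes absent entirely), and a federated step that combines locally trained models without moving the data. The sketch below illustrates only that setting, under stated assumptions. It is not the authors' proposed approach, which the abstract does not detail. FedAvg (sample-count-weighted parameter averaging) stands in here as one representative "traditional" approach; the abstract does not name one. The function names label_skewed_partition, missing_classes and fed_avg, the partner_classes structure, and the toy labels are all hypothetical, chosen for illustration.

```python
from typing import Dict, List, Sequence, Set


def label_skewed_partition(labels: Sequence[int],
                           partner_classes: List[Set[int]]) -> List[List[int]]:
    """Split sample indices among coalition partners (hypothetical helper).

    Partner p only observes the classes in partner_classes[p]. Samples of a
    class visible to several partners are dealt round-robin among them;
    samples of a class that no partner observes are dropped.
    """
    parts: List[List[int]] = [[] for _ in partner_classes]
    seen: Dict[int, int] = {}  # per-class count of samples assigned so far
    for idx, y in enumerate(labels):
        owners = [p for p, cls in enumerate(partner_classes) if y in cls]
        if not owners:
            continue  # no partner ever collects this class
        k = seen.get(y, 0)
        parts[owners[k % len(owners)]].append(idx)
        seen[y] = k + 1
    return parts


def missing_classes(labels: Sequence[int], part: List[int],
                    all_classes: Set[int]) -> Set[int]:
    """Classes that never appear in one partner's local data."""
    return all_classes - {labels[i] for i in part}


def fed_avg(local_weights: List[List[float]], sizes: List[int]) -> List[float]:
    """FedAvg-style aggregation (assumed baseline, not the paper's method).

    Each parameter is averaged across partners, weighted by local sample
    count. Only model parameters are exchanged; raw data stays local.
    """
    total = sum(sizes)
    if total == 0:
        raise ValueError("no training samples across partners")
    dim = len(local_weights[0])
    return [sum(w[i] * n for w, n in zip(local_weights, sizes)) / total
            for i in range(dim)]


if __name__ == "__main__":
    labels = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
    # Hypothetical coalition: partner 0 never sees class 2,
    # partner 1 never sees class 0; both see class 1.
    partner_classes = [{0, 1}, {1, 2}]
    parts = label_skewed_partition(labels, partner_classes)
    for p, part in enumerate(parts):
        print(p, part, missing_classes(labels, part, {0, 1, 2}))
    # Toy two-parameter "models" from each partner, combined by FedAvg.
    print(fed_avg([[1.0, 0.0], [0.0, 1.0]], [len(parts[0]), len(parts[1])]))
```

In this toy run each partner ends up with six samples and is missing one class outright. Weighting by sample count decides how much each partner's parameters count, but it does nothing to compensate for classes a partner never observed, which is the kind of skew the abstract says can make traditional approaches inaccurate.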