Distributed Coreset Construction for Efficient Machine Learning in Coalitions

Watch the video

Military / Coalition Issue

As the use of machine learning models rapid grows, there is a growing issue of how to update and replace those models as improvements are made over time, whether due to increased training, better algorithms, or adding new categories of “targets” to identify. Existing models tend to be fairly large in size creating yet more contention for the limited bandwidth at the edge of networks where the sensors that require the models are generally located.

Core idea and key achievements

This demonstration shows how that by parameter clustering and optimisation a core-set of parameters can be automatically selected to represent the whole model that uses only a fraction of the data size, yet retains high accuracy, and that this can be achieved across a distributed set of sensors.

Implications for Defence

This solution potentially allows machine learning model retraining on devices while in the field, for example to achieve better accuracy in local conditions. By using this technology model sizes, and thus bandwidth required for distribution, can be reduced by 90% and still achieve accuracy within 6% of optimal, or a reduction of 70% and be within 1% of optimal. This could potentially allow over the air updates, or more regular updates, to occur to ensure best fit to the operational conditions.

Readiness & alternative Defence uses

This technology has been show to work with existing image recognition machine learning model algorithms and while most applicable to defence, can also be used in any situation where you may have a locally distributed compute capability across a network of smaller devices, but that the communications links back to a base may be insufficient to allow reasonable data transfer of ML models.

This work has been continued by IBM Research US, and may be incorporated into a future product offering.

Resources and references


Penn State , UCL, ARL, IBM US, IBM UK