Distributed Coreset Construction for Efficient Machine Learning in Coalitions

Abstract Coresets are summaries of datasets that preserve properties of the original dataset while significantly reducing the dataset size. They can be used for machine learning to obtain models approximately as good as those trained on the full data, while using only a fraction of the data. This is a promising approach for learning in the coalition setting: bandwidth is low and the network is unreliable, so minimizing data transfer is crucial. Here we demonstrate several algorithms for constructing coresets with data that is distributed across many edge devices, possibly operated by different coalition partners. Our demo illustrates the key concepts while grounding the work in the context of a coalition operation.
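The abstract's core idea, that each device sends a small weighted summary instead of its full data, can be illustrated with a minimal sketch. The snippet below is not the paper's algorithm; it uses plain uniform sampling with reweighting (the simplest coreset construction) on simulated per-device data, and the function name `local_coreset` is a hypothetical helper introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_coreset(points, m, rng):
    """Uniformly sample m of n points; weight each by n/m so that
    weighted statistics on the sample approximate those of the full set."""
    n = len(points)
    idx = rng.choice(n, size=m, replace=False)
    weights = np.full(m, n / m)
    return points[idx], weights

# Simulate three edge devices, each holding 1000 local data points.
datasets = [rng.normal(loc=k, scale=1.0, size=(1000, 2)) for k in range(3)]

# Each device builds a 50-point local coreset and sends only that
# (points plus weights) to the aggregator -- a 20x reduction in transfer.
samples, weights = zip(*(local_coreset(d, 50, rng) for d in datasets))
coreset = np.vstack(samples)        # 150 points total
w = np.concatenate(weights)         # total weight equals 3000

# The weighted coreset mean approximates the mean of all 3000 points.
full_mean = np.vstack(datasets).mean(axis=0)
coreset_mean = (w[:, None] * coreset).sum(axis=0) / w.sum()
```

Real coreset constructions, including those demonstrated in this work, use importance (sensitivity-based) sampling rather than uniform sampling to obtain guarantees for specific learning objectives, but the communication pattern is the same: small weighted summaries flow from devices to the model trainer.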
Authors
  • Hanlin Lu (PSU)
  • Stephen Pasteris (UCL)
  • Richard Tomsett (IBM UK)
  • Ting He (PSU)
  • Mark Herbster (UCL)
  • Shiqiang Wang (IBM US)
  • Kevin Chan (ARL)
Date Sep-2020
Venue 4th Annual Fall Meeting of the DAIS ITA, 2020