To provide analytics for military operations, machine learning models need to be trained on data from multiple sources at the tactical edge. Due to bandwidth and data privacy constraints in tactical coalitions, it is often impractical to transmit all the data to a central location. Distributed machine learning enables model training from decentralized datasets without sharing raw data. In this paper, we first analyze the theoretical convergence bound of gradient-descent-based distributed learning with arbitrary data distribution at nodes. Using this bound, we propose a control algorithm that learns system characteristics and determines the best trade-off between local update and global parameter aggregation in real time, to minimize the learning loss for a given resource budget. The performance of the proposed algorithm is evaluated via extensive experiments with real datasets, both on a prototype system and in a simulated environment. The results show that our proposed approach gives near-optimal performance for various machine learning models and system settings.
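The local-update / global-aggregation pattern that the abstract refers to can be sketched as follows. This is a minimal illustration with a quadratic (least-squares) loss, a fixed number of local steps `tau`, and equal-weight averaging across nodes; the function names, the fixed step size, and the static choice of `tau` are illustrative assumptions here, not the paper's control algorithm, which adapts the trade-off in real time under a resource budget.

```python
import numpy as np

def local_update(w, X, y, lr, tau):
    """Run tau local gradient-descent steps on a least-squares loss."""
    for _ in range(tau):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def distributed_train(datasets, dim, tau, rounds, lr=0.1):
    """Alternate tau local updates per node with global parameter averaging."""
    w_global = np.zeros(dim)
    for _ in range(rounds):
        # each node starts from the current global model (no raw data shared)
        local_models = [local_update(w_global.copy(), X, y, lr, tau)
                        for X, y in datasets]
        # global aggregation: equal-weight average of the local models
        w_global = np.mean(local_models, axis=0)
    return w_global

# toy demo: two nodes, each holding its own local dataset
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
datasets = []
for _ in range(2):
    X = rng.normal(size=(50, 2))
    y = X @ w_true + 0.01 * rng.normal(size=50)
    datasets.append((X, y))

w = distributed_train(datasets, dim=2, tau=5, rounds=40)
```

Larger `tau` spends more of the resource budget on cheap local computation and less on costly parameter exchange, at the price of local models drifting apart between aggregations; choosing `tau` to balance these effects is the trade-off the paper's control algorithm optimizes.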