A Compositional Reinforcement Learning Framework for Workflow Generation
Military / Coalition Issue
Core idea and key achievements
Standard machine learning (ML) techniques can model complex systems and structures given a sufficiently large dataset for training. However, collecting labelled data at sufficient scale is highly time- and resource-consuming, and in dynamic systems, where the underlying structure of the data can change over time, classical ML can struggle to adapt. Reinforcement Learning (RL) techniques do not require labelled data and are designed to interact directly with environments, and so can adapt over time. However, standard RL can struggle significantly in multi-task scenarios, where multiple goals must be learned by the same agent. Multi-task RL (MTRL) is a new sub-field that seeks to address this by producing algorithms that learn multiple skills more efficiently than training a separate agent per skill. Most MTRL algorithms fail to harness the inherent hierarchical structure present in many domains, where some goals can be broken down into smaller sets of skills. Our MTRL algorithm exploits this underlying structure to increase the efficiency of learning and produces vector representations of possible workflows in parallel with learning. Our MTRL algorithm has the following characteristics:
- Learns in sparse-reward environments, where only a small set of possible workflows yields reward.
- Learns a compositional structure of workflows within an environment, allowing for planning and execution of extended workflows.
- Produces vector representations of workflows, which can be used for downstream analysis and interpretability.
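The workflow representations above can be made concrete with a small sketch. Following the compositional-plan-vector idea in the cited paper, a workflow is embedded as the sum of the embeddings of its constituent skills, so subtracting the embedding of the completed prefix from that of the full workflow yields a vector describing what remains to be done. The skill names and random embeddings below are illustrative assumptions, not the project's actual learned representations:

```python
import numpy as np

# Hypothetical skill vocabulary; in the real algorithm these embeddings are
# learned jointly with the policy, while here they are random placeholders
# used only to demonstrate the additive-composition property.
rng = np.random.default_rng(0)
SKILLS = ["discover", "select", "plan", "execute"]
skill_vec = {s: rng.normal(size=8) for s in SKILLS}

def embed(workflow):
    """Embed a workflow (a sequence of skills) as the sum of its skill vectors."""
    return sum((skill_vec[s] for s in workflow), np.zeros(8))

full = ["discover", "select", "plan", "execute"]
done = ["discover", "select"]

# A policy can be conditioned on the "task remaining" vector:
remaining = embed(full) - embed(done)

# By additivity, it equals the embedding of the remaining suffix.
assert np.allclose(remaining, embed(["plan", "execute"]))
```

This additivity is what allows planning and execution of extended workflows: chaining skills corresponds to adding vectors, and progress through a workflow corresponds to subtracting them.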
Implications for Defence
The direct military relevance of this demo is learning goal-oriented workflows over distributed coalition microservices, facilitating self-organization and coordination in service discovery, selection, planning, and execution. Wider applications of the underlying multi-task reinforcement learning algorithms could include robotics and AI-driven military simulations for modelling tactics and strategy.
Readiness & alternative Defence uses
Technological readiness level: 2-3. The algorithm is described in pseudocode in a paper currently under review, and an implementation is available as source code.
Resources and references
- D’Arcy, Millar, et al. “Reinforcement Learning using Compositional Plan Vectors and Trajectory Experience Replay”
Cardiff University, IBM UK