A Compositional Reinforcement Learning Framework for Workflow Generation

Military / Coalition Issue

Dynamic management of large, distributed coalition systems for tactical use is critical to mission success. The internal structure and topology of these systems, as well as the performance and resource availability of individual microservices, can change frequently, so workflows designed to achieve specific tasks must adjust accordingly. Reinforcement Learning (RL) can be used to plan workflows in highly dynamic systems with minimal labelled data. However, in these microservice systems only a small fraction of the available actions is useful, and the workflows to be discovered can be highly complex, which causes standard RL algorithms to struggle.

Core idea and key achievements

Standard machine learning (ML) techniques can model complex systems and structures given a sufficiently large training dataset. However, collecting labelled data at sufficient scale is time- and resource-intensive, and in dynamic systems, where the underlying structure of the data can change over time, classical ML can struggle to adapt. Reinforcement Learning (RL) techniques do not require labelled data and are designed to interact directly with environments, so they can adapt over time. However, standard RL can struggle significantly in multi-task scenarios, where the same agent must learn multiple goals. Multi-task RL (MTRL) is a new sub-field that seeks to address this by producing algorithms that learn multiple skills more efficiently than training a separate agent for each skill. Most MTRL algorithms fail to harness the inherent hierarchical structure present in many domains, where some goals can be broken down into smaller sets of skills. Our MTRL algorithm takes advantage of this underlying structure to increase the efficiency of learning and to produce vector representations of possible workflows in parallel with learning. It has the following characteristics:

Learns in sparse-reward environments (only a small subset of possible workflows is useful).
Learns a compositional structure of workflows within an environment, allowing for planning and execution of extended workflows.
Produces vector representations of workflows, which can later be used for analysis and interpretability.
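
The characteristics above can be illustrated with a minimal toy sketch. It is not the algorithm from the paper: the service names, the hand-written count-vector embedding, and every helper function are hypothetical, chosen only to show what a compositional vector representation and a sparse reward might look like. In a learned system the embedding would be trained rather than fixed, and this simple count vector deliberately ignores step order.

```python
from itertools import product

# Hypothetical microservice names, for illustration only.
SERVICES = ("detect", "classify", "fuse", "report")


def embed(workflow):
    """Return a count vector with one dimension per service in SERVICES."""
    return tuple(workflow.count(s) for s in SERVICES)


def add(u, v):
    """Element-wise sum of two workflow vectors."""
    return tuple(a + b for a, b in zip(u, v))


def subtract(u, v):
    """Element-wise difference of two workflow vectors."""
    return tuple(a - b for a, b in zip(u, v))


def sparse_reward(workflow, goal_vec):
    """Reward is 1 only when the finished workflow matches the goal vector."""
    return 1 if embed(workflow) == goal_vec else 0


def remaining_steps(done, goal_vec):
    """Services still needed: the positive entries of goal minus progress."""
    diff = subtract(goal_vec, embed(done))
    return [s for s, n in zip(SERVICES, diff) for _ in range(max(n, 0))]


sense = ["detect", "classify"]
act = ["report"]
goal_vec = add(embed(sense), embed(act))

# Composition: the vector of the joined workflow equals the sum of its parts.
assert embed(sense + act) == goal_vec

# Sparsity: count how many of the 4**3 length-3 workflows earn any reward.
rewarded = sum(
    sparse_reward(list(w), goal_vec) for w in product(SERVICES, repeat=3)
)
print(rewarded, "of", len(SERVICES) ** 3)  # prints: 6 of 64

# Planning: subtracting progress from the goal leaves the steps still to run.
print(remaining_steps(["detect"], goal_vec))  # prints: ['classify', 'report']
```

Even in this small example, fewer than one in ten candidate workflows earns any reward, which hints at why exploration is hard in larger systems. The subtraction step shows how a vector representation could support planning an extended workflow from partial progress.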

Implications for Defence

The direct military relevance of this demo lies in learning goal-oriented workflows of distributed coalition microservices, which facilitates self-organization and coordination for service discovery, selection, planning, and execution. Wider applications of the underlying multi-task reinforcement learning algorithms could include robotics and AI-driven military simulations for modelling tactics and strategy.

Readiness & alternative Defence uses

Technology readiness level (TRL): 2-3. The algorithm is described as pseudocode in a paper currently under review and is also available as source code.

Resources and references

D’Arcy, Millar, et al. “Reinforcement Learning using Compositional Plan Vectors and Trajectory Experience Replay”
https://pypi.org/project/gym-craftingworld/
https://gym-craftingworld.readthedocs.io/en/latest/

Organisations

Cardiff University, IBM UK