Reinforcement Learning using Compositional Plan Vectors and Trajectory Experience Replay

Abstract	Agents in real-world environments need to master domains that involve a wide variety of skills. These tasks often have inherent compositionality, whereby they can only be completed by performing subtasks in a specific order. This structure can be exploited to learn tasks more efficiently by allowing multiple tasks to be learned concurrently, rather than independently. A recent and promising approach uses compositional plan vectors (CPV), where subtasks are represented as vectors, and sequences of subtasks are represented as the sum of those vectors. The original CPV formulation uses an imitation learning paradigm, which requires expert demonstrations for each individual task learned. Consequently, it fails to generalize to new tasks for which expert demonstrations are not available. This is a well-known limitation of imitation learning in general rather than of the CPV approach specifically. To overcome this, we generalize the original CPV formulation using a novel algorithm for deep reinforcement learning with trajectory experience replay (CPV-TER). Through extensive experimental evaluation, we demonstrate that this approach allows for more efficient learning in multi-task environments that involve compositional tasks, with a significantly shorter training time and up to three times the performance of other standard multi-task RL algorithms.
Authors	Laura D'Arcy (Cardiff); Declan Millar (IBM UK); Padraig Corcoran (Cardiff); Ian Taylor (Cardiff); Alun Preece (Cardiff)
Date	Sep-2021
Venue	Technical Report
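
Illustration	The abstract describes subtasks as vectors and a sequence of subtasks as the sum of those vectors. The Python sketch below is only a toy illustration of that additive idea, not the CPV-TER algorithm. The subtask names, the hand-picked 3-dimensional embeddings, and the helper functions are all hypothetical; in a real system the embeddings would be learned. The `remaining` helper additionally assumes that unfinished work can be read off by subtraction, which follows from additivity but is not stated in the abstract.

```python
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical, hand-picked subtask embeddings (assumption: a real system
# would learn these rather than fix them by hand).
SUBTASK_VECTORS: Dict[str, Vector] = {
    "chop_tree": [1.0, 0.0, 0.0],
    "make_bread": [0.0, 1.0, 0.0],
    "build_house": [0.0, 0.0, 1.0],
}

DIM = len(next(iter(SUBTASK_VECTORS.values())))


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two vectors of equal length."""
    return [x + y for x, y in zip(a, b)]


def subtract(a: Vector, b: Vector) -> Vector:
    """Element-wise difference of two vectors of equal length."""
    return [x - y for x, y in zip(a, b)]


def plan_vector(subtasks: Sequence[str]) -> Vector:
    """Represent a sequence of subtasks as the sum of their vectors."""
    total = [0.0] * DIM
    for name in subtasks:
        total = add(total, SUBTASK_VECTORS[name])
    return total


def remaining(task: Sequence[str], done: Sequence[str]) -> Vector:
    """Assumed helper: the plan vector minus the vector of completed subtasks."""
    return subtract(plan_vector(task), plan_vector(done))


if __name__ == "__main__":
    task = ["chop_tree", "build_house"]
    print(plan_vector(task))               # vector for the whole task
    print(remaining(task, ["chop_tree"]))  # what is still left to do
```

With these toy embeddings, the vector left after completing `chop_tree` equals the `build_house` vector, which is the property the additive representation is meant to provide.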