A Policy-Constrained Reinforcement Learning Framework

Abstract	Reinforcement learning (RL) is a powerful machine learning technique in which agents explore a given environment to identify the action, or sequence of actions, that leads to the maximum reward from a given state under a properly defined reward function. In many domains, however, such as IoT, safety and security are critical, because some state transitions can pose safety or security threats to the user or to the environment at large. In this paper we introduce an RL environment that supports exploration restrictions based on safety and security policies. Initial experiments show that the framework is effective and efficient.
Authors	Anand Mudgerikar (Purdue), Elisa Bertino (Purdue), Dinesh Verma (IBM US), Jorge Lobo, Alessandra Russo (Imperial)
Date	Sep-2020
Venue	4th Annual Fall Meeting of the DAIS ITA, 2020