Abstract
While reinforcement learning has achieved considerable success in recent years, state-of-the-art models are often still limited by the size of state and action spaces. Model-free reinforcement learning approaches use some form of state representation, and recent work has explored embedding techniques for actions, both with the aim of achieving better generalisation and applicability. However, these approaches consider only states or only actions, ignoring the interaction between them when generating embedded representations. In this work, we propose a new approach for jointly embedding states and actions that combines aspects of model-free and model-based reinforcement learning. We use a model of the environment to obtain embeddings for states and actions and present an algorithm that uses these embeddings to learn a policy. The embedded representations obtained through our approach enable better generalisation over both states and actions by capturing similarities in embedding space, thereby improving convergence speed. We evaluate the efficacy of our approach on a small grid world as an initial example. In future work we plan to apply this methodology to SDC, a domain that tends to suffer from state-action space explosion.
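
To make the idea concrete, the following is a minimal sketch, not the paper's actual implementation, of the two stages the abstract describes: fitting a model of the environment whose state and action embeddings are learned jointly, and then exposing the similarity structure in embedding space that a policy can exploit. The 5x5 grid world, network sizes, and all identifiers (JointEmbeddingModel, true_step, etc.) are illustrative assumptions.

# Hedged sketch under assumed details: jointly embed states and actions by
# fitting an environment model, then inspect the learned embedding space.
import torch
import torch.nn as nn

N_STATES, N_ACTIONS, EMB_DIM = 25, 4, 8   # assumed 5x5 grid world, 4 moves

class JointEmbeddingModel(nn.Module):
    """Predicts the successor state from combined state/action embeddings."""
    def __init__(self):
        super().__init__()
        self.state_emb = nn.Embedding(N_STATES, EMB_DIM)
        self.action_emb = nn.Embedding(N_ACTIONS, EMB_DIM)
        self.dynamics = nn.Sequential(
            nn.Linear(2 * EMB_DIM, 32), nn.ReLU(), nn.Linear(32, N_STATES))

    def forward(self, s, a):
        z = torch.cat([self.state_emb(s), self.action_emb(a)], dim=-1)
        return self.dynamics(z)            # logits over successor states

def true_step(s, a):
    """Deterministic grid-world transition used to generate training data."""
    r, c = divmod(s, 5)
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][a]
    return min(max(r + dr, 0), 4) * 5 + min(max(c + dc, 0), 4)

model = JointEmbeddingModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Stage 1: fit the environment model; state and action embeddings are shaped
# jointly because both feed the same transition predictor.
for _ in range(300):
    s = torch.randint(0, N_STATES, (64,))
    a = torch.randint(0, N_ACTIONS, (64,))
    s_next = torch.tensor([true_step(si.item(), ai.item())
                           for si, ai in zip(s, a)])
    loss = nn.functional.cross_entropy(model(s, a), s_next)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2 (in the full method): learn a policy, e.g. a Q-function, over these
# embeddings so that similar state-action pairs generalise to one another.
# Here we only inspect the similarity structure induced by the model.
emb = model.state_emb.weight.detach()
sim = nn.functional.cosine_similarity(emb[0], emb[1], dim=0)
print(f"cosine similarity of adjacent states 0 and 1: {sim.item():.2f}")

The sketch stands in for the general recipe only; in the approach described above the policy-learning stage would operate directly on the embedded state-action representations rather than on the raw discrete indices.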