Strategyproof Reinforcement Learning for Online Resource Allocation

Abstract	We consider an online resource allocation problem where tasks with specific values, sizes and resource requirements arrive dynamically over time, and have to be either serviced or rejected immediately. Reinforcement learning is a promising approach for this, but existing work on reinforcement learning has neglected that task owners may misreport their task requirements or values strategically when this is to their benefit. To address this, we apply mechanism design and propose a novel mechanism based on reinforcement learning that aims to maximise social welfare, is strategyproof and individually rational (i.e., truthful reporting and participation are incentivised). In experiments, we show that our algorithm achieves results that are typically within 90% of the optimal social welfare, while outperforming approaches that use fixed pricing (by up to 86% in specific cases).
Authors	Sebastian Stein (Southampton) Mateusz Ochal (Southampton) Ioana-Adriana Moisoiu (Southampton) Enrico Gerding (Southampton) Raghu Ganti (IBM US) Ting He (PSU) Tom La Porta (PSU)
Date	May-2020
Venue	2020 International Conference on Autonomous Agents and Multiagent Systems (AAMAS) [link]