The deployment of Internet of Things (IoT) devices combined with cyber-physical systems is resulting in complex environments comprising various devices that interact with each other and with users through apps running on computing platforms such as mobile phones, tablets, and desktops. In addition, rapid advances in Artificial Intelligence are enabling those devices to autonomously modify their behaviors through techniques such as reinforcement learning (RL). Ensuring safety and security in such environments is therefore critical. In this paper, we introduce Jarvis, a constrained RL framework for IoT environments that determines optimal device actions with respect to user-defined goals, such as energy optimization, while at the same time ensuring safety and security. Jarvis is scalable and context independent: it is applicable to any IoT environment with minimal human effort. We instantiate Jarvis for a smart home environment and evaluate its performance using both simulated and real-world data. In terms of safety and security, Jarvis detects 100% of the 214 manually crafted security violations collected from prior work and correctly distinguishes 99.2% of the user-defined benign anomalies and malfunctions from safety violations. To measure functionality benefits, Jarvis is evaluated on real-world smart home datasets with respect to three user-required functionalities: energy use minimization, energy cost minimization, and temperature optimization. Our analysis shows that Jarvis provides significant advantages over normal device behavior in terms of functionality, and over general unconstrained RL frameworks in terms of safety and security.
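To make the core idea of constrained RL concrete, the following is a minimal sketch of Q-learning with action masking: at every step, the agent may only select and bootstrap from actions that satisfy a safety predicate. The device actions, states, and the `is_safe` policy below are hypothetical illustrations, not the actual policies or API of Jarvis.

```python
import random
from collections import defaultdict

# Hypothetical smart-home actions; stand-ins for real device commands.
ACTIONS = ["heater_on", "heater_off", "door_unlock", "door_lock"]

def is_safe(state, action):
    # Example user-defined constraint (illustrative): never unlock
    # the door while the home is in the "away" state.
    return not (action == "door_unlock" and state == "away")

def safe_actions(state):
    # The constraint-satisfying action set for the current state.
    return [a for a in ACTIONS if is_safe(state, a)]

def choose_action(q, state, epsilon=0.1):
    # Epsilon-greedy selection restricted to safe actions only,
    # so unsafe actions are never executed, even during exploration.
    allowed = safe_actions(state)
    if random.random() < epsilon:
        return random.choice(allowed)
    return max(allowed, key=lambda a: q[(state, a)])

def update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    # Standard Q-learning update, with the bootstrap max taken
    # over the safe action set of the next state.
    best_next = max(q[(next_state, a)] for a in safe_actions(next_state))
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

q = defaultdict(float)
```

In this formulation the reward encodes the user goal (e.g., low energy use), while safety and security are enforced as hard constraints on the action space rather than as reward penalties, so a violation cannot be traded off against functionality.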