Reinforcement Learning (RL) is an effective technique for building ‘smart’ SDN controllers because of its model-free nature and its ability to learn policies online without requiring extensive training data. However, because RL agents are geared to maximize functionality and explore the environment without constraints, security can be breached. In this paper, we propose Jarvis-SDN, an RL framework that constrains exploration by taking security into account. In Jarvis-SDN, the RL agent learns ‘intelligent policies’ that maximize functionality, but not at the cost of security. Standard network-flow-based attack signatures obtained from intrusion detection system (IDS) datasets cannot be used as policies because they do not conform to the state model of the RL framework and thus yield poor accuracy and a high false-positive rate. To address this issue, the security policies for constraining exploration in Jarvis-SDN are learnt in a semi-supervised manner, in the form of ‘partial attack signatures’, from packet captures of IDS datasets and are then encoded in the objective function of the RL-based optimization framework. These signatures are learnt using Deep Q-Networks (DQN). Our analysis shows that DQN-based attack signatures perform better than classical machine learning techniques, such as decision trees, random forests and deep neural networks (DNN), for common network attacks. We instantiate our framework for an SDN controller with the goal of intelligent rate control to further analyze the effectiveness of the attack signatures.
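To make the idea of encoding security in the objective function concrete, the following is a minimal illustrative sketch and not the Jarvis-SDN formulation itself. It assumes a simple penalized reward in which a functionality term is reduced by a weighted attack-signature score. The names `Transition`, `constrained_reward`, `pick_action` and `penalty_weight`, the action labels, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Transition:
    """One hypothetical controller step: a candidate action and its estimated effects."""
    action: str
    functionality_gain: float  # assumed normalized, e.g. throughput improvement in [0, 1]
    attack_score: float        # assumed signature match strength in [0, 1]


def constrained_reward(t: Transition, penalty_weight: float = 5.0) -> float:
    """Functionality reward minus a weighted security penalty (assumed form)."""
    return t.functionality_gain - penalty_weight * t.attack_score


def pick_action(candidates: Sequence[Transition], penalty_weight: float = 5.0) -> str:
    """Greedy choice of the action with the highest penalized reward."""
    best = max(candidates, key=lambda t: constrained_reward(t, penalty_weight))
    return best.action


# Two hypothetical rate-control actions with made-up scores.
candidates = [
    Transition("raise_rate_limit", functionality_gain=0.9, attack_score=0.4),
    Transition("keep_rate_limit", functionality_gain=0.5, attack_score=0.0),
]
secure_choice = pick_action(candidates)                          # security penalty active
unconstrained_choice = pick_action(candidates, penalty_weight=0.0)  # penalty disabled
```

With the penalty active, the agent in this sketch prefers the action that does not match an attack signature even though its functionality gain is lower; with the penalty set to zero it reverts to maximizing functionality alone, which is the unconstrained behaviour the abstract warns about.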