A Security-Constrained Reinforcement Learning Framework for Software Defined Networks

Military / Coalition Issue

For military coalitions, it is critical that partners manage networking in a security-aware manner, so that attacks do not undermine resilience, while still ensuring that communication flows are delivered in a timely manner.

Core idea and key achievements

The use of machine learning (ML) techniques in the control plane of software defined networks (SDNs) provides enhanced approaches to traffic engineering, such as maximizing quality of service (QoS). Generally, QoS is determined by the interplay among various network functionalities such as rate control, routing, load balancing, and resource management.

This interplay can become very complex. ML techniques can model such complexity, given sufficient representative data to train upon. However, the diversity and scale of current networks, together with the diversity of traffic behavior, make it hard to gather data that captures enough behaviors for training. This poses a challenge to classical ML. Reinforcement learning (RL), on the other hand, learns optimal policies online from the system state using a model-free approach. Such policies are more likely to transfer to a new environment, which makes RL more suitable for network control. RL-based frameworks have thus already been proposed for specific network functions, such as routing, traffic rate control, and load balancing. However, current uses of RL for network control focus on optimizing a single functionality, which makes these solutions difficult to deploy in real networks. This is particularly problematic for security.

For example, learning a policy which maximizes the throughput of the network (functionality 1: optimal rate control for QoS) can unknowingly facilitate the propagation of a high-throughput Denial of Service (DoS) attack. To address such shortcomings, we have developed a framework, Jarvis-SDN, that extends current RL techniques with security constraints.

  • In Jarvis-SDN, the RL agent learns ‘intelligent policies’ which maximize functionality, but not at the cost of security. The goal is to make sure that the RL agent does not allow malicious-looking or suspicious flows to propagate into the network at the same rate as benign flows.
  • The security policies for constraining exploration are learnt in a semi-supervised manner, in the form of ‘partial attack signatures’, from packet captures in intrusion detection system (IDS) datasets. They are then encoded in the objective function of the RL-based optimization framework. These signatures are learnt using a Deep Q-Network (DQN).
  • Our analysis shows that DQN-based attack signatures perform better than classical machine learning techniques, such as decision trees, random forests and deep neural networks (DNN), for common attacks.
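
The idea of encoding security in the objective function can be illustrated with a minimal Python sketch. It is not the Jarvis-SDN implementation: it swaps the DQN for a small tabular, bandit-style learner, and it replaces the learnt partial attack signatures with a fixed, hypothetical suspicion score per flow class. The names RATES, reward, penalty_weight, train and greedy_rate, and all numeric values, are assumptions made for illustration only.

```python
import random

# Hypothetical action set: fraction of the requested bandwidth granted to a flow.
RATES = [0.25, 0.5, 1.0]

def reward(rate, suspicion, penalty_weight=2.0):
    """Throughput term minus a security penalty that scales with suspicion.

    `suspicion` stands in for the output of a learnt attack signature
    (0.0 = looks benign, 1.0 = looks malicious). Assumed form, not the
    objective used by Jarvis-SDN.
    """
    return rate - penalty_weight * suspicion * rate

def train(episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Learn a rate per flow class with epsilon-greedy, one-step updates."""
    rng = random.Random(seed)
    # Two illustrative states: 0 = benign-looking flow, 1 = suspicious-looking flow.
    suspicion_by_state = {0: 0.05, 1: 0.9}
    states = list(suspicion_by_state)
    q = {(s, a): 0.0 for s in states for a in range(len(RATES))}
    for _ in range(episodes):
        s = rng.choice(states)
        if rng.random() < epsilon:
            a = rng.randrange(len(RATES))  # explore
        else:
            a = max(range(len(RATES)), key=lambda i: q[(s, i)])  # exploit
        r = reward(RATES[a], suspicion_by_state[s])
        q[(s, a)] += alpha * (r - q[(s, a)])  # move estimate toward observed reward
    return q

def greedy_rate(q, state):
    """Return the rate the learnt policy would grant to a flow in `state`."""
    best = max(range(len(RATES)), key=lambda i: q[(state, i)])
    return RATES[best]

if __name__ == "__main__":
    q = train()
    print("benign flow rate:", greedy_rate(q, 0))
    print("suspicious flow rate:", greedy_rate(q, 1))
```

Under these assumptions, the penalty term makes the highest rate the best choice for benign-looking flows and the lowest rate the best choice for suspicious-looking ones, so throughput is maximized without forwarding suspicious traffic at the same rate as benign traffic.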

Implications for Defence

Our approach allows network metrics of interest to be optimized while at the same time ensuring security.

Readiness & alternative Defence uses

Provides a conceptual framework for the use of RL techniques for security of SDN. One instantiation has been implemented for an SDN controller with the goal of intelligent rate control with protection from DoS attacks.

Resources and references


Purdue, IBM US, Imperial