Traditionally, it has been difficult for network operators to guarantee consistent policies for in-flight packets during configuration transitions, leading to potentially insecure or undefined network behavior during the transition period. A transition may briefly allow malicious traffic to propagate, cause packets to be dropped, or degrade bandwidth. For example, a flow of packets through a network would be disrupted if one switch using an older configuration forwarded packets to another switch holding a conflicting, newer configuration.
In “Abstractions for Network Update” [1], Reitblatt et al. exploit a core abstraction of software-defined networks, logical centralization, to execute network updates that provably leave no window of undefined behavior during reconfiguration. The authors present Kinetic, a system that performs consistent updates, guaranteeing both per-packet and per-flow consistency during configuration transitions. Starting from first principles, they develop a formal model that provides provable consistency guarantees for the policies seen by any packet or flow in flight during a network configuration update. The core mechanism built on this framework is a 2-phase update process that tags packets with configuration version numbers. Later work by the same authors builds on Frenetic, a network programming language [2].
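The 2-phase mechanism can be illustrated with a minimal, self-contained sketch; the class and function names below are ours for illustration, not the actual Kinetic/NOX API:

```python
# Minimal sketch of the 2-phase update with version tagging. The class
# and function names are illustrative, not the actual Kinetic/NOX API.

class Switch:
    def __init__(self, name):
        self.name = name
        self.rules = {}            # (version, match) -> action
        self.stamp_version = None  # version stamped onto packets at ingress

    def install(self, version, match, action):
        self.rules[(version, match)] = action

    def remove_version(self, version):
        self.rules = {(v, m): a for (v, m), a in self.rules.items()
                      if v != version}

def two_phase_update(ingress, internal, new_config, old_v, new_v):
    # Phase 1: pre-install the new configuration on internal switches,
    # guarded by the new version tag. Packets still stamped with old_v
    # match only the old rules, so in-flight traffic sees one policy.
    for sw, match, action in new_config:
        sw.install(new_v, match, action)
    # Phase 2: flip ingress switches to stamp new_v onto arriving packets
    # (carried in the VLAN tag in the real system). From this point on,
    # every packet traverses entirely-old or entirely-new rules.
    for sw in ingress:
        sw.stamp_version = new_v
    # Once all packets tagged old_v have drained, reclaim the old rules.
    for sw in internal:
        sw.remove_version(old_v)
```

The key invariant is that no packet ever matches a mix of old and new rules: the version tag it receives at ingress selects exactly one configuration end to end.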
Motivation: Why are consistent updates important?
Networks are subject to dynamic configuration changes for a variety of reasons, including granting and revoking privileges, load balancing, and adding or removing end hosts and switches. However, reasoning about highly complex, concurrent systems quickly becomes intractable due to the combinatorial explosion of possible states. Static analysis tools can verify steady-state network policies, but cannot verify the dynamics of configuration update transitions.
The authors of [1] argue that the burden of ensuring consistent network policy should not rest on the parties responsible for designing and deploying network policies. Rather, the configuration process should be seamlessly abstracted from the network policy designer and operator, allowing focused effort on what counts: the steady-state behavior of the network as a whole. This pattern of configuration abstraction is prevalent in many computer systems today, such as compilers and version control, and the field of networking would benefit greatly from development in this area.
The formal model in [1] shows that the Kinetic framework provides provable per-packet consistency guarantees during update transitions. The authors’ implementation uses the open-source NOX controller to drive the OpenFlow API atop a simulated network in Mininet, and the accompanying experimental results provide performance metrics for one realization of the formal model. For a 192-host network, the reported results show that the overhead of the operation can be quite high, depending on the application and topology. In routing applications, the 2-phase process requires several hundred thousand operations and incurs a maximum switch overhead of 80-100% across all experiments. Multicast applications require several orders of magnitude fewer operations, but typically incur a higher maximum switch overhead.
The authors also show that operation count and overhead can often be significantly reduced by introducing optimizations that target only the relevant paths (a subset of the entire network) for a configuration update. For routing applications, these subset optimizations typically reduce the required number of operations and overhead by 50%. Multicast performance is also improved, but not as significantly.
In this project, we have chosen to reproduce the results presented in Table 2 of the original paper. This table summarizes many experiments, reporting the number of modified switch rules, or operations (Ops), and the maximum switch overhead for each individual run. The Kinetic update framework is run for two applications (routing and multicast) across three topologies (fattree, waxman, smallworld) under three kinds of configuration change (hosts, routes, and both). The full set of experiments is then run again with subset optimizations (described above) to examine the performance gain under different scenarios. For all experiments, the size of the network was fixed at 192 hosts and 48 switches.
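The experiment matrix above can be sketched as a simple driver loop; the harness that actually executes each run is elided, and the names below are illustrative only:

```python
import itertools

# Illustrative enumeration of the experiment matrix from the original
# paper's Table 2; the actual Kinetic/Mininet harness is elided.
APPS = ["routing", "multicast"]
TOPOLOGIES = ["fattree", "waxman", "smallworld"]
CHANGES = ["hosts", "routes", "both"]

def enumerate_experiments():
    """Yield one tuple per run: (app, topology, change, subset-optimized)."""
    for app, topo, change, optimized in itertools.product(
            APPS, TOPOLOGIES, CHANGES, [False, True]):
        yield app, topo, change, optimized
```

This gives 2 applications x 3 topologies x 3 change types, each with and without subset optimization, for 36 runs in total.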
The advantage of the consistent update process outlined in [1] is a clean abstraction of the network configuration state; however, this comes at a cost. Providing consistency guarantees to in-flight packets can be an expensive operation, with overhead on individual switches reaching 100% in several experiments. Table 2 of the original paper presents the bulk of the experimental results on which performance discussions rest, so it is important that these results be reproducible.
Our results, in Tables 1 and 2 below, closely match those reported in [1]. The number of operations for most individual experiments deviates slightly from Table 2 of the original Consistent Updates paper, but remains on the same order of magnitude. Moreover, our results are deterministic and repeatable across multiple executions. We posit that the discrepancies could be due to different seed values for the random number generators used to instantiate topologies, or to different versions of the underlying software (Mininet or NOX-classic).
[Table 1: per-experiment results, reporting Ops and Max Overhead without subset optimization, and Ops, Ops %, and Max Overhead with subset optimization]
Our initial implementation hurdles were due to deprecated dependencies in the Kinetic framework. Since the original publication of the paper, NOX development has split into a high-performance C++ implementation and a Python research implementation called POX. The authors’ implementation was developed when NOX still supported a Python interface, using SWIG to generate Python bindings for its C++ libraries. We solved this roadblock by reverting to a legacy version of NOX, called NOX-classic, that retains the original Python hooks, running on Ubuntu 10.04 and preloading the required symbol tables (via LD_PRELOAD). As part of our contribution, we created OS-independent run scripts that set the necessary environment variables and launch the update code. For future research, it would be interesting to port the update code to newer controllers such as POX.
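A sketch of such a run script in Python follows; all paths, the library name, and the application name are placeholders for illustration, not the real NOX-classic layout:

```python
import os
import subprocess

# Sketch of an OS-independent run script for the legacy controller.
# All paths and names below are illustrative placeholders.

def build_env(nox_root):
    """Environment NOX-classic needs: preloaded symbols plus Python path."""
    env = dict(os.environ)
    # Preload symbols so the SWIG-generated Python bindings resolve
    # against the C++ NOX libraries at load time.
    env["LD_PRELOAD"] = os.path.join(nox_root, "src", ".libs", "libnox.so")
    env["PYTHONPATH"] = os.path.join(nox_root, "src")
    return env

def run_nox(nox_root="/opt/nox-classic", app="updates"):
    """Launch nox_core with the update application (placeholder names)."""
    return subprocess.Popen(
        [os.path.join(nox_root, "src", "nox_core"), app],
        env=build_env(nox_root))
```

Centralizing the environment setup in one script is what makes the runs repeatable across machines without hand-editing shell profiles.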
The thesis of the paper holds: it is possible to execute consistent updates in a software-defined network. However, it would be helpful to develop metrics for the latency of network updates. While the operation count (Ops) is a proxy for the computational complexity of a configuration update, it is not a strong proxy for total update latency. An OpenFlow controller can trigger updates on many nodes in parallel, so we expect the total elapsed time for an update to propagate to be a function of network size, topology, and the magnitude of the configuration change. We address operation counts and runtime over a range of network sizes as part of our extended explorations in this project.
There are two explorations using the Kinetic framework that we have pursued outside of the reported results of the original paper.
First, we investigate the number of required update operations over a range of network sizes to examine the scalability of the framework. The original paper reports operation counts for a fixed number of hosts (192) and switches (48). In this analysis we repeat the experiments while varying the number of switches from 6 to 96, maintaining a constant fanout of 6 end hosts per switch so that the network grows uniformly. Figure 1 plots the number of update operations against the number of end hosts for configuration changes involving end hosts only. As the network grows, we find that the number of consistent update operations scales approximately quadratically for routing applications and linearly for multicast applications. The quadratic relationship reflects the nature of the 2-phase update: adding or removing a host may alter the forwarding tables of multiple switches along the relevant paths.
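The sweep over network sizes can be sketched as follows; the step size of 6 switches is our illustrative assumption, and the harness that runs each size is elided:

```python
# Sketch of the scaling sweep: grow the network uniformly by varying the
# switch count with a fixed fanout of 6 hosts per switch. The step size
# is an illustrative assumption; the experiment harness itself is elided.
FANOUT = 6

def sweep_sizes(min_switches=6, max_switches=96, step=6):
    """Return (switches, hosts) pairs for each network size in the sweep."""
    sizes = []
    for n_switches in range(min_switches, max_switches + 1, step):
        n_hosts = FANOUT * n_switches  # network grows uniformly
        sizes.append((n_switches, n_hosts))
    return sizes
```

Keeping the fanout fixed ensures that host count and switch count grow together, so a single size axis (hosts) suffices for the plots.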
Using the number of update operations as a proxy for runtime, our results indicate that consistent update overhead and runtime increase with network size, but depend on the network application (routing vs. multicast) and therefore on the granularity of the switch forwarding rules. In this experiment, multicast forwarding rules use prefixes to denote groups of hosts, leading to fewer and more general forwarding rules. Routing lookup tables, on the other hand, are heavier due to the increased specificity of their forwarding rules.
We also investigate the runtime of the consistent update by logging the elapsed time between the start and finish of each experiment, measured after the network has been initialized in Mininet; results are shown in Figure 2. As discussed earlier, the operation count is a proxy for update complexity but not for the latency to complete a configuration update. The authors state in the original paper that update runtime was not measured because Mininet does not offer performance fidelity or resource isolation between the switches and the controller. For this exploration, we timestamp the start and end of the consistent update application to obtain an estimate of update latency.
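A minimal sketch of this timing harness, with `apply_consistent_update` standing in for the real update call:

```python
import time

# Minimal sketch of how we bracket the consistent-update phase with
# timestamps; apply_consistent_update is a stand-in for the real call.
def timed_update(apply_consistent_update, *args):
    """Return (result, elapsed_seconds), excluding Mininet setup time."""
    start = time.time()  # taken only after Mininet initialization
    result = apply_consistent_update(*args)
    elapsed = time.time() - start
    return result, elapsed
```

Because the timestamp is taken after Mininet setup, the measurement captures only the update itself, not topology construction.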
Figure 3 shows that the number of operations per second also appears to scale linearly with the number of hosts, which suggests that Mininet on a powerful instance (c1.xlarge) can sustain the increased load from additional hosts. Since elapsed time is the ratio of total operations to throughput, a quadratic operation count combined with linear throughput implies a runtime that grows roughly linearly with the number of hosts; Figure 2 plots the measured runtimes.
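Since elapsed time is the ratio of total operations to operation throughput, a toy calculation (with arbitrary coefficients) makes the combined growth rates concrete:

```python
# Toy check of the asymptotics: if ops(n) ~ a*n^2 and rate(n) ~ b*n,
# then time(n) = ops(n) / rate(n) ~ (a/b)*n, i.e. linear in n.
# The coefficients below are arbitrary illustrations, not fitted values.
a, b = 0.5, 20.0

def ops(n):      return a * n * n   # total update operations
def rate(n):     return b * n       # operations per second
def runtime(n):  return ops(n) / rate(n)
```

Doubling the host count doubles the implied runtime under these growth assumptions.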
For future work, a valuable exploration would be to investigate 2-phase update mechanisms that do not modify packet contents. The current implementation tags packets with versioning information in the VLAN tag of the Ethernet frame. However, overloading a protocol field can lead to undesirable behavior, since other devices may use that field for different functionality.
We have configured a custom Amazon Machine Image (AMI) running Ubuntu 10.04 that closely matches the original authors’ environment, avoiding the dependency issues we encountered during implementation. The custom AMI is configured with the necessary libraries and packages (Boost, NOX-classic, Mininet 1.0). Running our experiment is as simple as creating a new instance from the AMI, cloning the consistent update code from our GitHub repository, and running a simple shell script, as described in the README.
We chose to deploy a pre-configured AMI to facilitate rapid reproduction of the results with minimal effort for end users. This choice was born of the considerable frustration we encountered while configuring dependencies to create the AMI itself; others should not have to repeat that painful process.
Because the results are deterministic, the size of the instance on which the AMI is installed affects only the time required to reproduce them, not the results themselves. Our experience indicates that larger instances (c1.medium, c1.xlarge) are best for rapidly generating the full set of experimental results, although a local VM suffices for smaller tests.
[Table 2: per-experiment results, reporting Ops and Max Overhead without subset optimization, and Ops, Ops %, and Max Overhead with subset optimization]
Please see the README in the GitHub repository.
EC2 does not allow easy import of VMware-created Linux images. We had an early working version of the code on a local VM, but ran into dependency issues getting the same software to work on EC2.
Some of the Python code spawns background processes but does not gracefully handle exceptions, which can cause foreground processes to hang indefinitely. This made some of the code tricky to debug out of the box, though we understand the code is intended for research purposes.
[1] M. Reitblatt, N. Foster, J. Rexford, C. Schlesinger, and D. Walker. Abstractions for Network Update. In ACM SIGCOMM, 2012.
[2] Frenetic: a network programming language. http://frenetic-lang.org/