Team: John Hiesey and Daniel Sommermann.
Key Result(s): Increasing the initial congestion window improves performance, but under lossy network conditions, there is an upper limit to how high this window should be initialized.
 Nandita Dukkipati et al., An Argument for Increasing TCP’s Initial Congestion Window, ACM SIGCOMM, July 2010
(Paper at: https://developers.google.com/speed/articles/tcp_initcwnd_paper.pdf)
 Nandita Dukkipati et al., Increasing TCP initial window, 78th IETF, Maastricht
(Slides at: http://www.ietf.org/proceedings/78/slides/iccrg-3.pdf)
Contacts: John Hiesey (firstname.lastname@example.org), Daniel Sommermann (email@example.com)
In this post, we measure the effect of increasing the size of the TCP initial congestion window. In doing so, we attempt to replicate and extend some of the results of Dukkipati et al. In the original paper, the authors propose increasing TCP’s initial congestion window to decrease the latency of HTTP requests from a browser to a Web server. Often, HTTP creates a connection, requests a small object (on the order of a few to tens of kilobytes), and closes the connection afterwards. The TCP slow start algorithm increases the congestion window size, which represents the maximum amount of data that can be outstanding before an ACK, by one segment size per ACK received, starting with the initial congestion window size (init_cwnd). Since Web connections tend to be short, they typically close before leaving the slow start phase, so init_cwnd has a large effect on how many round trip times an HTTP request/response requires. The original paper describes, in some depth, why increasing init_cwnd can improve the performance of the Web.
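To make the round-trip argument concrete, the following minimal sketch counts the round trips an idealized slow start needs to deliver a response. It assumes a 1460-byte segment size, no loss, no delayed ACKs, and a receive window that never limits the sender, and it ignores the handshake and the request itself. The function name slow_start_rounds is ours and does not come from the original paper.

```python
import math


def slow_start_rounds(size_bytes, init_cwnd, mss=1460):
    """Count round trips needed to deliver size_bytes in idealized slow start.

    Assumptions (ours): no loss, one ACK per segment, and the window
    grows by one segment per ACK, so it doubles every round trip.
    """
    segments_left = math.ceil(size_bytes / mss)
    cwnd = init_cwnd
    rounds = 0
    while segments_left > 0:
        segments_left -= cwnd  # send one full window during this round trip
        cwnd *= 2              # one extra segment per ACK doubles the window
        rounds += 1
    return rounds


for iw in (3, 10):
    print(iw, slow_start_rounds(30_000, iw))
# prints "3 3" then "10 2"
```

Under these assumptions, a 30 kB response takes three round trips with init_cwnd = 3 and two with init_cwnd = 10, which is the kind of saving the original paper argues for.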
We have decided to replicate a number of the experiments from the original paper, as well as to investigate some potential problems with larger initial congestion windows. To do so, we utilize Mininet, a network emulator that runs on top of a single Linux host. By building topologies with controlled link bandwidth, latency, and other properties, we can evaluate the effect of changing init_cwnd in a controlled environment.
First, we will measure the effect that increasing the initial congestion window size has on the latency of small, fixed-size web requests for links with various RTTs and bandwidths. We will explore whether these results agree with those in the original paper, and determine whether an even larger congestion window size may help to a greater degree. In addition, we will explore the limitations Mininet imposes on these measurements.
Second, we will attempt to evaluate the potential negative impact of larger init_cwnd. The Dukkipati et al. paper discusses the potential for larger initial congestion windows to cause more frequent packet loss due to buffers overflowing. Unfortunately, the authors of that paper only attempted to measure this effect using a small amount of data. We attempt to quantify this effect in Mininet by opening many TCP connections over a bottleneck link and evaluating the goodput of this network for various init_cwnd sizes.
For all of our experiments we use the general topology shown below, implemented in Mininet. The only link with realistic parameters is the bottleneck link between the two routers (labeled r1 and r2) in the center of the diagram. The links connecting hosts to their router are modeled with infinite bandwidth and zero latency. In addition, we use a simple identity traffic matrix in our experiments: each client on the left communicates with exactly one server on the right, so client 1 requests from server 1, client 2 from server 2, and so on. The clients on the left run a curl command to download a file from their respective servers. Our dependent variable is the latency reported by curl.
To replicate the main result of Dukkipati et al., we use only one client/server pair (N = 1) and set the delay of the bottleneck link to various values to change the end-to-end RTT, our independent variable. For each RTT we compare the latency of transferring files of various sizes for the two initial congestion window sizes (3, the default in Linux when the original paper was written, and 10, the proposed new value). The bottleneck link for experiments 1-3 is 100 Mbps, while for experiment 4 it is 0.5 Mbps. In experiment 1 the single client requests a 30 kB file from the single server. In the rest of the experiments, the clients request exponentially distributed file sizes between 300 bytes and 30 kB to model the common traffic conditions observed in the Google paper.
In our second experiment, we use the same topology from the first experiment, but instead of comparing merely two initial congestion window sizes (3 and 10), we compare many different congestion window sizes (3, 6, 10, 16, 26, 42) to find an upper bound of acceptable initial congestion window sizes. Here we measure the absolute latency as opposed to the “improvement.”
For our third experiment, we repeat the second experiment but set the loss rate of the link to 5%.
In the fourth experiment, we replace the lossy link with a congested one by using N > 1 client/server pairs in the topology above. The lossy link of the third experiment was intended to approximate the conditions of this experiment. Here the clients make requests modeled as a Poisson process with a lambda parameter of 1 (roughly 1 request per second). The bottleneck link bandwidth in this case is 0.5 Mbps. We again average the latency experienced by the clients and compare absolute latencies across initial congestion window sizes.
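As an illustration of the request model, a Poisson process with rate lambda can be generated by drawing exponentially distributed gaps between consecutive requests. The sketch below is a minimal example of that idea, not the code used in our experiments. The function name and the duration_s parameter are hypothetical.

```python
import random


def poisson_arrival_times(rate_per_s, duration_s, rng):
    """Return request start times (seconds) for a Poisson process.

    Inter-arrival gaps of a Poisson process with rate `rate_per_s`
    are exponentially distributed with mean 1 / rate_per_s.
    """
    times = []
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_per_s)
    return times


rng = random.Random(244)  # fixed seed so a run can be repeated
schedule = poisson_arrival_times(1.0, 60.0, rng)
print(len(schedule), "requests scheduled in 60 s")
```

With lambda = 1, a client issues about one request per second on average, but individual gaps vary, so requests from different clients occasionally cluster and stress the bottleneck queue.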
For these experiments to work, we must use the ip command to change the initial congestion window on the Mininet hosts. The command one can use to change the initial congestion window on Linux is:
ip route change default via <gateway> dev eth0 initcwnd <iw>
Note that Linux kernel versions 2.6.39 through 3.2 have a bug that prevents this command from taking effect. Thus, for our experiments we used Linux kernel version 3.3 compiled for AWS in an Ubuntu 12.04 LTS distribution. We were not able to use a stock 12.04 AMI since 12.04 ships with kernel version 3.2. The AMI we used is available in the US – East (Virginia) region and has ID “ami-baa505d3”. All of our experiments were run on EC2 using this AMI. For more specific instructions on reproducing our results, see the section at the end of this post.
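When sweeping over several window sizes, it can help to generate the ip command above programmatically and to check the kernel version before trusting the setting. The following sketch shows one way to do both. The helper names and the gateway address 10.0.0.254 are hypothetical, and the affected version range is simply the one we state above (2.6.39 through 3.2).

```python
import re


def initcwnd_command(gateway, iw, dev="eth0"):
    """Build the ip command that sets the initial congestion window."""
    if iw < 1:
        raise ValueError("initial congestion window must be at least 1")
    return f"ip route change default via {gateway} dev {dev} initcwnd {iw}"


def kernel_has_initcwnd_bug(release):
    """Return True if a release string such as '3.2.0-23-generic'
    falls in the affected range 2.6.39 through 3.2 (inclusive)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    version = tuple(int(part) for part in match.groups() if part is not None)
    return (2, 6, 39) <= version < (3, 3)


# 10.0.0.254 is a placeholder gateway address, not one from our setup.
for iw in (3, 6, 10, 16, 26, 42):
    print(initcwnd_command("10.0.0.254", iw))
```

On a real host, the release string could come from platform.release() in the Python standard library. The check is only a guard against silently running on an affected kernel.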
In our first experiment, we were able to reproduce fairly closely the results that were obtained in the original paper. The original figure is shown above our figure.
While we found the absolute improvement matches the original findings well, the percentage improvement was slightly off. The overall shape matches well with the original findings and confirms there is motivation for looking into increasing the initial congestion window.
We then ran the experiment with many different initial congestion window sizes and compared their absolute latency.
As expected, when there is zero loss, there is no downside to increasing the initial congestion window arbitrarily, although we do see diminishing returns. When adding loss into the bottleneck link we get the results shown below:
For small RTTs, smaller initial congestion windows perform better than larger ones. For larger RTTs, large initial cwnds have a general advantage, although we see some evidence that very high initial congestion windows may not perform quite as well as an initial congestion window closer to 16.
Replacing the lossy link with a congested link, with many clients requesting from many servers over the bottleneck, we get the results below. To save time, we tested fewer initial cwnd values:
These results match closely with the previous experiment, as we would expect, since a lossy link is an approximation of a congested link with dropped packets. The same conclusions can be drawn here: for small RTTs, smaller initial cwnds perform better. For high RTTs, larger initial cwnds perform better.
Generally speaking, we were able to verify that a default initial congestion window of 10 is a good value that works well under a variety of RTTs and loss conditions. For some very high RTT connections, an even higher initial congestion window can help further. However, for RTTs below 200 ms (which one would imagine accounts for the majority of connections), it is better to choose a lower initial congestion window.
We found that our experiments ran significantly longer when we set the link loss to a nonzero value, which makes sense given that we are dealing with high latency links. Retransmissions dominate the running time of our code in these cases. We also discovered in the course of the project that a bug is present in the Linux kernel (specifically in the dst module) between versions 2.6.39 and 3.2 inclusive that prevents changing the initial congestion window. Thus, we had to use version 3.3 of the kernel. The version of openvswitch that the current version of Mininet uses does not support running on post-3.2 kernels. Therefore, we had to bundle a custom version of Mininet in our EC2 AMI that makes use of an in-development version of openvswitch compatible with newer Linux kernels. We spent a considerable amount of time looking through Linux kernel source code before finding out that the newer versions of the kernel don’t have the bug. Nonetheless, we had to learn how to compile the Linux kernel and enable Xen for EC2 compatibility, which is useful knowledge. In general, this project taught us a lot about how the Linux kernel’s implementation of congestion windows works, both past and present.
Using Mininet itself was easy and went quickly. As with all in-development projects, the lack of documentation and API lists made for more trial-and-error coding, but since our topology was relatively simple, this aspect of the project was fairly easy.
Instructions to Replicate This Experiment:
- Sign up for Amazon EC2
- Create a new instance of the AMI with ID “ami-baa505d3” in the US – East (Virginia) region. (We recommend using a c1.medium instance, even though the experiments are not CPU bound.)
- SSH into this AMI, and run:
git clone git://github.com/jhiesey/cs244-proj3.git
- cd into cs244-proj3 and run:
git submodule init; git submodule update
(run this only once per machine boot)
- Examine the produced result.png files for the results. They are located in the generated “latency-<timestamp>-<experiment>” subdirectories.