Daniel Chiu & James Hong
In their original paper, Dukkipati et al. lay out an argument for increasing TCP's initial congestion window size. Their motivation is to improve the average latency of HTTP responses: web pages have grown in size, and small initial congestion windows incur avoidable latency overhead in the form of extra RTTs. (In fact, as the authors point out, many modern web browsers open multiple concurrent TCP connections to transfer data more quickly.) By increasing the initial TCP congestion window, servers can transfer web page data much more quickly, reducing the need for inefficient hacks that circumvent TCP's congestion control. Specifically, the authors argue that a larger initial congestion window of 10 segments is large enough to fit 90% of responses from several Google pages and services, as well as most web responses from commonly trafficked websites.
In their experiments, Dukkipati et al. found that by increasing the initial congestion window to at least 10 segments (~15KB), they achieved an average latency improvement of 11.7% (68ms) in their average Google data center, AvgDC (a data center with a median bandwidth of 1.2Mbit/s and median RTT of 70ms), and an 8.7% (72ms) improvement in slow data centers, SlowDC (data centers with a median bandwidth of 500Kbit/s and median RTT of 70ms). They also performed experiments bucketed by bandwidth, round-trip time, and bandwidth-delay product, and produced graphs showing the absolute and percentage latency improvement for different buckets of values (shown to the right). The authors found that average response latency improved across all buckets for all properties, with the largest benefits for high-RTT and high-BDP networks and the smallest benefits for low-RTT paths.
2. Subset Goal and Motivation
We first replicate the top two graphs from figure 5 (above), which demonstrate the average and percentage improvements in web search response latency from an initial congestion window of 10 packets under a variety of bandwidth and RTT conditions. These two plots capture the crux of the authors' argument: a larger initial congestion window delivers consistent, meaningful improvements in web server response latency.
In addition, we replicate figure 2 (below), which shows response latency for a range of initial congestion window sizes. This plot shows diminishing returns as the window size grows past 10 packets, and helps justify the paper's recommendation.
3. Subset Methodology
Our experiment is set up as follows. We define a Mininet topology consisting of two hosts, acting as server and client, connected through a switch. For this experiment, we disable random packet drops and set the buffer size to be sufficiently large. We vary the bandwidth at the bottleneck link and set link delays according to the desired RTT. By default, our topology runs with an RTT of 70ms and a bandwidth of 1.2Mbit/s, the median values for AvgDC in the paper. We use the CUBIC congestion control algorithm at both the client and the server, though this is configurable. In our topology, RTT and bandwidth are independent, whereas in the paper they are highly correlated based on traffic at AvgDC; we discuss the consequences of this correlation in our results section.
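The topology described above can be sketched as follows. This is a simplified reconstruction, not our exact experiment code; the host/switch names and the queue size are illustrative, and the target RTT is split evenly across the four one-way link traversals a packet makes per round trip.

```python
def tclink_params(rtt_ms=70, bw_mbps=1.2):
    """Mininet TCLink keyword arguments for one host-switch link.

    A round trip crosses each of the two links twice, so the one-way
    delay per link is RTT / 4. The queue is made large enough that the
    buffer never binds, and no random loss is configured.
    """
    return {"bw": bw_mbps,
            "delay": "%gms" % (rtt_ms / 4.0),
            "max_queue_size": 1000}  # "sufficiently large" buffer

def build_topo(rtt_ms=70, bw_mbps=1.2):
    # Imported lazily so tclink_params stays usable without Mininet.
    from mininet.topo import Topo  # requires a Mininet install to run

    class StarTopo(Topo):
        def build(self):
            switch = self.addSwitch("s0")
            for name in ("server", "client"):
                host = self.addHost(name)
                self.addLink(host, switch, **tclink_params(rtt_ms, bw_mbps))

    return StarTopo()
```

When launching the network, the shaping parameters take effect by passing `link=TCLink` to the `Mininet(...)` constructor.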
On the server node, we run a simple web server based on Python's BaseHTTPServer. Our code allows for two types of request handlers: one that serves statically sized pages and another that serves pages whose sizes are drawn from a 7-piece piecewise function modeling the probability distribution of Google Web Search response sizes shown in figure 1, red line (below). To set the initial congestion window on the server, we use ip route and append the initcwnd flag to the appropriate routing-table entry. We must also set the initrwnd (advertised receive window) of the client using ip route; the default value varies between versions of Linux, Windows, and OS X. Since we assume the majority of Google users run Windows, we set this to approximate the Windows default value of 65,535 bytes, or approximately 43 packets. For our main simulation and for the distribution of Google web search results shown in the paper, the congestion window will almost never need to grow this large.
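The ip route incantations above look roughly like the following sketch. The interface and subnet names are hypothetical placeholders (Mininet assigns its own), and the commands must run as root inside the respective Mininet host.

```python
def initcwnd_cmd(subnet="10.0.0.0/8", dev="server-eth0", cwnd=10):
    # Rewrite the server's routing-table entry for the experiment subnet
    # so new connections start with `cwnd` segments in flight.
    return "ip route change %s dev %s initcwnd %d" % (subnet, dev, cwnd)

def initrwnd_cmd(subnet="10.0.0.0/8", dev="client-eth0", rwnd=43):
    # 43 segments of 1460 bytes approximates the 65,535-byte Windows
    # default advertised receive window.
    return "ip route change %s dev %s initrwnd %d" % (subnet, dev, rwnd)

# In a Mininet experiment these would be issued on the hosts, e.g.:
#   server.cmd(initcwnd_cmd())
#   client.cmd(initrwnd_cmd())
```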
4. Subset Results
Overall, our results simulating figure 5 match the latency patterns reported in the original paper for typical RTT ranges and bottleneck bandwidths (the original paper provides the distribution of requests that fall under each bandwidth and RTT category). Shown below, our plots for figure 5 differ from the plots presented by Dukkipati et al. in two significant ways. For the following plots, we configure the web server to serve 9KB responses.
First, for low bandwidths (512Kbit/s and below), we see small or no absolute/percentage improvements in latency from increasing the congestion window, whereas Dukkipati et al. report large improvements. This is because in our topology we hold the minimum RTT constant at 70ms while varying the bandwidth. In the paper, bandwidths are estimated by subnet and are highly correlated with RTT; subnets with lower estimated bandwidths also had much higher minimum RTTs, on the order of 4-5 times the 70ms average. Even though the initial congestion window is set to 10 packets, it is physically impossible to put 9KB of data on the wire before ACKs begin coming back when the bandwidth is below 1Mbit/s and the RTT is held at 70ms; this can be verified with tcpdump. As bandwidth increases, the latency improvement approaches one full RTT (a 33% improvement), as expected for 9KB responses.
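A quick back-of-envelope check makes this concrete (our own arithmetic, ignoring headers and slow start): when the serialization time of a 9KB response exceeds the 70ms RTT, ACKs for the first segments return before the 10-segment window can be exhausted, so the initial window never binds.

```python
def serialization_ms(size_bytes, bw_mbps):
    # Time to push size_bytes onto the wire at the given bottleneck rate.
    return size_bytes * 8 / (bw_mbps * 1000.0)

RTT_MS = 70.0
# At 512Kbit/s, 9KB takes ~141ms to serialize -- twice the RTT -- so
# initcwnd is irrelevant; at 1.2Mbit/s (~60ms) the window starts to matter.
for bw in (0.256, 0.512, 1.2, 3.0):
    t = serialization_ms(9000, bw)
    print("%5.3f Mbit/s: %6.1f ms (%s one RTT)" % (
        bw, t, "exceeds" if t > RTT_MS else "under"))
```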
Second, for very large minimum RTTs (greater than or equal to 1s), we found that with the default Linux TCP implementation there was no improvement from setting initcwnd to 10 via ip route. This turned out to be because the RTT exceeded the client's initial retransmission timeout (RTO) of 1s for the SYN packet. The server would receive duplicate SYNs, reply with duplicate SYN/ACKs, and the client would reply with duplicate ACKs, causing TCP at the server to drop the congestion window to 1 before sending data; we confirmed this with tcpdump and tcpprobe. After recompiling the kernel with a larger initial RTO, we were able to reproduce the latency vs. RTT plot from figure 5 exactly (shown below). Ultimately, this is not a major problem, as less than 3% of requests in the Google AvgDC experiment had RTTs greater than 1s. Put into perspective, the RTT for radio communications between the Earth and the Moon is roughly 2.6 seconds. We suspect that Google clients issuing requests over paths with RTTs above 1s may have been configured with larger initial RTOs to begin with. Barring this issue, the improvement we observe in latency is one RTT and approaches a 33% decrease, as before.
To reproduce figure 2 (below), we vary the initial congestion window while holding RTT and bandwidth constant at 70ms and 1.2Mbit/s, respectively. Response sizes are drawn from our piecewise simulation of the Google Web Search response distribution. Shown below, our plot closely resembles figure 2 from Dukkipati et al. and shows that, with a distribution of response sizes centered heavily around 9KB, there are diminishing benefits to an initial congestion window larger than 10 packets.
Finally, we include a plot of latency improvement vs. response size for initial window sizes of 3 and 10 (above). As expected, there is no difference when the response fits within 3 packets. When the response is too large for 3 packets, the server with an initial congestion window of 10 completes the response one RTT faster.
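The one-RTT gap follows from an idealized slow-start model (our simplification: the window doubles every round trip, with no delayed-ACK or pacing effects, and the handshake counted separately):

```python
import math

def slow_start_rounds(response_bytes, init_cwnd, mss=1460):
    """RTTs needed to deliver a response under idealized slow start.

    Each round the sender transmits its full window, then doubles it.
    """
    segments = int(math.ceil(response_bytes / float(mss)))
    rounds, sent, cwnd = 0, 0, init_cwnd
    while sent < segments:
        sent += cwnd
        cwnd *= 2
        rounds += 1
    return rounds

# A 9KB response is 7 segments: 2 rounds from initcwnd=3 (3 then 6
# segments), but only 1 round from initcwnd=10 -- one RTT saved.
```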
We had the most difficulty reproducing the paper's results for low bandwidths and extraordinarily large RTTs, for the reasons explained above in the Subset Results section. We also do not have the information or resources to reproduce the exact request distribution in the original paper. Instead, we varied one variable at a time while keeping the others fixed; this led to consistent and explainable results, but frustrated our efforts to match the paper's graphs exactly. We did not attempt to reproduce the third plot of figure 5, latency vs. BDP, because BDP depends on RTT and bandwidth, over which we did not have a joint distribution.
When implementing and testing our code, we encountered several challenges also noted by groups that attempted to replicate figure 5 in previous years. For example, Jung and Quinonez report that sporadic ARP broadcasts caused irregular delays. This was the case in our simulation as well, and it becomes especially problematic when link delays are long. We eventually traced the problem to overzealous default ARP cache timeouts of 60s, which we raised to 3,600s. In a similar vein, when verifying connectivity with ping for the first time, the first ping will time out for very large RTTs because the ARP cache must first be populated; subsequent pings succeed.
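For reference, the ARP-cache workaround amounts to raising the neighbor-table timeouts on each host before the experiment runs. The specific sysctl knobs below are our assumption about the relevant settings; the exact names can vary by kernel version.

```python
# Assumed neighbor-table sysctls: gc_stale_time defaults to 60s, which
# matches the irregular-delay period we observed.
ARP_SYSCTLS = {
    "net.ipv4.neigh.default.gc_stale_time": 3600,          # seconds
    "net.ipv4.neigh.default.base_reachable_time_ms": 3600000,
}

def arp_timeout_cmds(sysctls=ARP_SYSCTLS):
    # Command strings to run (as root) on each Mininet host, e.g.
    # host.cmd(c) for c in arp_timeout_cmds().
    return ["sysctl -w %s=%d" % kv for kv in sorted(sysctls.items())]
```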
Overall, we found that Dukkipati et al.'s argument for a larger initial congestion window of 10 segments holds up well in our simulated Mininet topology. With an initial congestion window of 10 packets, average-sized responses completed one RTT sooner. The differences between our results and the paper's can largely be attributed to the positive correlation between RTT and bandwidth for real web clients (the authors mention that most low-bandwidth traffic originates from dial-up modems and mobile networks) and to variation in client TCP configurations (operating system, version differences, etc.). For average-sized web responses, setting an initial congestion window of 10 packets appears to be a simple and low-risk way to achieve faster response latency.
- Launch a new EC2 instance. Under Community AMIs, search for cs244-dwchiu-jamesh93-pa3 and create an instance with this image. A c3.large instance should be more than sufficient.
- If the Community AMI cannot be found (it may still be waiting to be indexed by Amazon), it should suffice to create an instance from the cs244-Spr15-Mininet AMI, also available under Community AMIs.
- You will then need to clone the repository for our code available here: https://firstname.lastname@example.org/chiubaka/cs-244-pa-3.git
- Once your EC2 instance is up and running, SSH into it as the user: ubuntu.
- In the home folder for the ubuntu user, there is a folder called cs-244-pa-3. Navigate into this folder to find the experiment files.
- Run git pull to make sure you have the most up-to-date version of the experiment files.
- Run sudo ./run.sh to run the experiments. Results will be output to a timestamped results folder in the current directory.
- Please note that there is an exception raised when the script cleans up the Mininet topology. This exception does not affect functionality of the code and seems to be a bug related to Mininet’s interaction with the VM.
- Compare the resulting images to the images available in the example-results directory. Results for RTTs at or above 1000ms may not match, because the kernel's initial RTO is set to 1000ms by default and interferes with the experiment. (The AMI runs the unmodified kernel, since it is not realistic for an admin tuning initcwnd at a server to also configure client settings.)
 Dukkipati, Nandita, Tiziana Refice, Yuchung Cheng, Jerry Chu, Tom Herbert, Amit Agarwal, Arvind Jain, and Natalia Sutin. “An argument for increasing TCP’s initial congestion window.” Computer Communication Review 40, no. 3 (2010): 26-33.
 Sajal. “Tuning Initcwnd for Optimum Performance.” CDN Planet. October 11, 2011. Accessed May 29, 2015. http://www.cdnplanet.com/blog/tune-tcp-initcwnd-for-optimum-performance/.
 Jung, Raejoon, and Stephen Quinonez. “CS 244 ’14: AN ARGUMENT FOR INCREASING TCP’S INITIAL CONGESTION WINDOW.” REPRODUCING NETWORK RESEARCH. June 3, 2014.
 “Communication Delay.” Accessed May 29, 2015. http://www.spaceacademy.net.au/spacelink/commdly.htm.