A Long Journey for Transmission Control Protocol (TCP)……
— FROM RFC675 TO RFC7414
Benjamin Schwartz (bschwart at stanford.edu)
Xiaoxi Zhu (zxx313 at stanford.edu)
Transmission Control Protocol (TCP) was not designed for today’s Internet, but it is almost exclusively used as a transport layer. There have been continuous efforts and innovation from industry and academy to achieve better performance. Most of the works target at specific behaviour/subsets of TCP algorithm – fast open, time stamp, packet pacing, window scaling etc.
The paper – Proportional Rate Reduction for TCP – focuses on TCP retransmission algorithm to handle packet loss. The proposed algorithm, Proportional Rate Reduction (PRR), spreads out packet retransmission with incoming ACK (if multiple consecutive packets are dropped). Therefore, it helps to prevent TCP from entering Retransmission Time Out (RTO) and increases overall network throughput.
The paper is motivated by a study of TCP retransmission patterns with live traffic data collected from Google’s web servers. The statistics demonstrate different retransmission patterns in the two selected Data Centers, where the dominating traffic patterns are Web applications (short response) and video streaming (long lived flows). Nevertheless, packet loss is inevitable and impacts the network throughput. PRR helps to alleviate the pain by reducing time spent in recovery and maintaining a large congestion window after recovery.
What the Authors Found
The paper investigates the efficiency of the algorithm in two ways. First, an experiment with a single flow shows the expected behaviour when responding to loss created with Googles’s open source network testing tool Packetdrill (https://code.google.com/p/packetdrill/). It demonstrates the detailed algorithm implementation, which has the following desired properties:
- No idle period after retransmission of first lost packet – this is achieved by implementing proportional ACK clocking (similar to Linux Kernel rate-halving).
- Convergence of congestion window size (cwnd) to slow start threshold (ssthresh) after recovery – this property does not hold for the Linux Kernel if there is heavy loss.
- Continue sending new incoming data even after temporary stall – this helps to improve the overall traffic throughput.
- More accurate estimation of delivered data during recovery – it decouples data transmission in recovery stage from pipe size estimation, which is more prone to error.
Second, the algorithm is explored at scale by implementing it in Google’s production network and relevant network statistics are collected. The data proves that PRR algorithm outperforms Linux TCP implementation and RFC3517 in the following two ways:
- Average time spent in recovery stage is reduced
- Congestion window size is larger after recovery stage.
Our Interest & Goal
Our team is particularly interested in exploring the fundamentals and details of the proposed TCP algorithm – scheduling of retransmission, cwnd re-size, new packet transmission, etc – during recovery stage. Therefore, we have decided to work on reproducing the two plots in Fig.2 of the paper, which detail the TCP flow status of the Linux implementation (bottom) vs PRR (top).
The different behaviours of two TCP algorithms (Linux vs PRR) is tested and demonstrated with specific traffic pattern – a consecutive of four packets are dropped at the beginning of one full congestion window. Fast retransmission is triggered when duplicated ACKs are received. In addition, both algorithm support TCP option of selective acknowledgement (SACK).
The major difference observed from the plot is as follows:
- Linux implementation uses the rate-halving mechanism – retransmit one packet upon receipt of two duplicate ACKs. PRR spaces out packet retransmission more evenly.
- After recovery stage, cwnd of Linux is small as a result of traffic stall (no new packet arrives for transmission) during recovery stage. However, cwnd of PRR is approximately half of the cwnd size before entering recovery.
We setup two virtual machines (VM) on separate Amazon EC2 instances, one with Ubuntu 11.10/Linux kernel 3.0 and the other running Ubuntu 12.04/Linux kernel 3.2. Starting from Linux Kernel 3.2, PRR is incorporated in the TCP implementation.
A simple Mininet topology is created: two hosts connected by a L2 switch via 100Mbps link. The Open vSwitch is controlled by a customized POX controller. The controller inspects incoming packets, and drops TCP packets to simulate packet loss.
Netcat is used to send file from one host to the other over a TCP connection. The TCP flow is recorded in pcap file with tcpdump listening on the sender’s Ethernet interface. We post-process the pcap file to generate the plot.
Detailed instructions on how to setup the testing environment and reproduce the results can be found in README section.
The very first result
In this experiment, the switch drops the first five packets immediately after the TCP three-way handshake is established, while the initial congestion window size is ten(10) packets. The plots of TCP traffic given below result from four test runs – two linux TCP implementations – Kernel 3.0 without PRR vs Kernel 3.2 with PRR using both a smaller file (32KB) and larger file (64KB)
Two significant differences of the algorithm can be noticed:
- Kernel 3.0 without PRR immediately retransmits multiple packets upon receipt of duplicate ACKs. Subsequent retransmission is paced by half-rating – retransmit one packet for every two ACKs received. Kernel 3.2 with PRR re-transmits one packet upon receipt of SACK. Subsequent retransmission is paced out using the algorithm proposed in paper – controlled by current TCP flow status, cwnd, ssthresh, pipe, packet-in-flight, etc. The pace out behavior is not obvious in this case due to the small cwnd size. We setup another experiment to verify this point (coming up in later sections).
- Kernel 3.0 without PRR reduces its cwnd size more drastically during the recovery stage. Kernel 3.2 with PRR maintains a significantly larger cwnd size after recovery stage. Therefore, the overall transmission time is shortened/throughput increased if PRR is implemented.
What if the network is misbehaving…
In this experiment, we examine the TCP behaviour under heavy packet loss. In the first full cwnd with 10 packets, packet 1-4 and packet 6-8 are dropped. The plots are shown as below:
Under heavy loss, the superior performance of TCP with PRR can be clearly observed. Linux 3.0 without PRR does not transmit any new packet before lost packets are fully recovered. Even worse, cwnd is reduced to 1 due to the heavy loss. Linux 3.2 with PRR is able to advance its cwnd and send new packets after the first batch of dropped packets are retransmitted and acknowledged. cwnd size is close to half of the initial value immediately after recovery. As a result, the time spent to deliver a 32KB file under heavy loss is only half of that when no PRR is implemented.
Playing with the Experiment
In the final experiment, the switch waits after one full cwnd of packets are successfully transmitted and acknowledged. It then drops the first four packet at the beginning of the second cwnd of packets. Both small files and larger files are tested with two TCP algorithms. The plots are given as below:
It can be observed that retransmission of lost packet is more evenly spaced out in the case where Linux 3.2 with PRR is transmitting a 64KB file —- now the packet loss occurs at the beginning of a 20-packet cwnd. In addition, the cwnd advances to allow new packet transmission after all lost packets are retransmitted. As a result, the transmission time is significantly lower compared to TCP without PRR.
A Bit of Reflection…
In general, the experiments conclude that the two TCP algorithms (without and with PRR) demonstrate the same behaviour as that presented in the paper. However, there are some minor differences , which might arise out from the experiment setup.
The major difference is such that the authors generate test traffic using Packetdrill while we use netcat plus a lossy switch. Packetdrill is able to directly writie packet to raw socket and extract internal TCP flow status. It has more granular and accurate control over the timing of packet transmission/arrival. One example is that the SACK from receiver is evenly spread out so that the retransmission pattern can be clearly examined. While in our experiment, the ACK packets arrive in batches.
Secondly, we use two Linux Kernel 3.0 vs Kernel 3.2 to compare the TCP algorithms without and with PRR. However, it is not clear to us if there are other modifications/improvements in TCP implementation between these two versions. We have spotted two discrepencies. One is that in Kernel 3.0, cwnd is linearly increased instead of doubled even after successful transmission of one full cwnd of packets. The other is the Kernel 3.2 implements TCP Segmentation Offload (TSO) to push large packets (packet size exceeding MTU) at the end of cwnd. However, the two differences do not have a big impact on the experiment results.
Fun Along the Road…
We have encountered some complications during system and experiment setup. A quick discussion on the challenges and solutions/walk-arounds is here. We hope that this could be helpful for those who are interested in reproducing the results and exploring further.
- Initially, we was planning to modify the Linux kernel code to implement the PRR algorithm. However, considering the amount of work required to recompile and install the customized kernel, we decided to use existing kernels which has PRR incorporated. We setup the testing environment ion two EC2 instances, with one running Ubuntu 11.10/Linux kernel 3.0 and the other running Ubuntu 12.04/Linux kernel 3.2. The complication arises is that Ubuntu 11.10 is no longer supported. Some effort was devoted to learning how to connect to old sources and re-configuring apt. Additionally, we had to use the sched=’rt’ option for the CPULimitedHost because only Ubuntu 12.04+ is supported for the default configuration.
- In order to reproduce similar plots as those presented in Fig.2 of the paper, we need to generate traffic flow with exactly the same packet drops each time the experiment is run. Though mininet.link has the attribute of loss rate, the drops are random. The author (Nandita) has guided us to use Packetdrill – which generates raw socket packets. However, we have encountered some issues in setting up the tool on Ubuntu. The walk around is to use a POX controller with Open vSwitch in Mininet. The controller code is modified so that it generate losses based on packet count —- creating a deterministic packet drop for every run. Nevertheless, we highly recommend to experiment with Packetdrill if anyone is interested in reproducing this paper’s result.
- Various parameters of TCP status is needed to generate the plot. Linux socket implementation has an internal structure – tcp_info that records those parameters. However, the data structure is not accessible from external. There is an open source tool named tcpsnoop that can extract out TCP status information. However, the tool needs to bind to raw socket to obtain those information so that it can only work on the receiver side. After analyzing the tcp header, we figured that most of the parameters (sack, snd.una, retransmission, cwnd, etc) can be derived from header information. The final solution is to use tcpdump on the sender’s port and capture all TCP packets. The pcap file is then decoded with dpkt into readable data. We then use and python and matlibplot to parse the data and generate the graph.
Now, you can play with it!
A WORD BEFORE YOU JUMP INTO IT—
Since TCP is implemented in the Linux kernel, we needed two kernel implementations to generate our target plots. We chose kernels 3.0 and 3.2 for this purpose because they were readily available in Ubuntu releases. Modifying a kernel requires significant knowledge, especially considering the size of the TCP stack. For these reasons, we chose to run the experiment on two EC2 hosts, with one running a kernel prior to PRR implementation, and with one running the kernel just after PRR implementation. While there are other changes in the kernels, we were able to discern behavior that matches the expected impacts of enabling PRR.
Since PRR is implemented at the kernel level, any network simulator and any topology should be able to demonstrate its effects. We chose a very simple topology with Mininet since we have experience with these setups.
GUIDELINES FOR EC2:
- Launch 2 EC2 instances in us-east-1 (N. Virginia) region.
- One instance will have PRR enabled and the other will not.
- You can use any instance size larger than the t1/t2 family. (We used m3.medium)
- Please make sure your security group has port 22 open to your IP address.
- Additionally, it is easy to view the graphs if you open port 8000 to TCP
- traffic for the python simple HTTP server.
(Steps 1A and 1B are alternatives to each other)
STEP 1A (FOR A LIMITED TIME ONLY – AMIS COST TO HOST):
- These have already been setup for this project:
- No PRR: ami-3d9d7d56
- PRR: ami-f99d7d92
- When you launch these instances, you can run this command:
- cd cs244-project3
STEP 1B (FULL INSTALL FROM BASE UBUNTU):
- The Amazon Machine Images listed below can be found under community AMIs:
- Instance 1: ami-13ba2d7a
- Instance 2: ami-00615068
- Ssh to each of the instances and copy the following text into a file using the text editor of your choice (e.g. nano, vim). Then run it as a script.
- If you named the file setup.sh then run the command: bash setup.sh
- Run the command:
- sudo ./run.sh
- Start the python HTTP server:
- python -m SimpleHTTPServer
- View the plots by navigating to:
- Navigate to the tcp-recovery-date-time directory
- You can click on any of the .png files to view a graph
————— Start Script – Start Copying After This Line ——————
#!/bin/bash no_prr=ami-13ba2d7a prr=ami-00615068 echo "Detected Ubuntu $(lsb_release -r) with kernel $(uname -r)" if [ "$no_prr" == "$(ec2metadata --ami-id)" ]; then echo "This is the AMI for No PRR" # This release is no longer supported sudo sed -i -e 's/us-east-1.ec2.archive.ubuntu.com\|security.ubuntu.com/old-releases.ubuntu.com/g' /etc/apt/sources.list sudo apt-get update sudo apt-get -y install linux-headers-3.0.0-30-virtual elif [ "$prr" == "$(ec2metadata --ami-id)" ]; then echo "This is the AMI for PRR" sudo apt-get update else echo "This is an unrecognized AMI - this script may or may not work" sudo apt-get update fi # Install dependencies sudo apt-get -y install git sudo apt-get -y build-dep python-matplotlib sudo apt-get -y install python-pip sudo pip install dpkt sudo pip install --upgrade numpy sudo easy_install -U distribute sudo pip install matplotlib sudo pip install termcolor # Install Mininet cd /home/ubuntu git clone git://github.com/mininet/mininet cd mininet git checkout -b 2.2.1 2.2.1 cd .. mininet/util/install.sh -a sudo mn --test pingall # Setup experiment directory git clone email@example.com:bschwart/cs244-project3.git cd cs244-project3 chmod +x run.sh
————— Stop Script – Stop Copying After This Line ——————