CS244 ’17 TCP Fast Open


Team: Jason Chen (cheson@) and Naoki Eto (naokieto@)

TCP Fast Open Goal:

The TCP Fast Open (TFO) paper by Radhakrishnan et al. seeks to remove the latency that the three-way handshake adds when initializing TCP connections, cutting the delay before data can be exchanged from 1 RTT to 0 RTTs.

Motivation:

Many connections are short-lived, lasting only a few RTTs, so the handshake is a significant share of their total cost. Modern webpages are trending larger while being composed of many relatively small web objects and resources. To load a single page, the browser must initiate tens or even hundreds of independent TCP connections with various servers and content providers.

There is a 1 RTT delay introduced for each vanilla TCP connection before clients and servers can actually exchange meaningful data. TFO seeks to reduce that delay to 0 RTTs such that requests can be sent along with the initial SYN packet from client to server, and servers can immediately send response data back with their SYN-ACK packets.

This idea seems simple, but having servers send data before a connection is fully established allows malicious users to conduct amplified DDoS attacks by making GET requests with spoofed source IP addresses. The paper contributes a security cookie mechanism that makes TFO possible in practice.
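The cookie idea can be illustrated with a small sketch. As we understand the paper, the server derives the cookie from the client's IP address and a server-side secret, so a request with a spoofed source address cannot present a valid cookie. The version below substitutes a truncated HMAC-SHA256 from the Python standard library for the paper's cipher, and the names `SERVER_SECRET`, `make_cookie`, and `is_valid_cookie` are hypothetical.

```python
import hashlib
import hmac
import os

# Hypothetical server-side secret; a real server would rotate it periodically.
SERVER_SECRET = os.urandom(16)


def make_cookie(client_ip: str, secret: bytes = SERVER_SECRET) -> bytes:
    """Derive a cookie bound to the client's IP (HMAC stand-in, not the paper's cipher)."""
    digest = hmac.new(secret, client_ip.encode("ascii"), hashlib.sha256).digest()
    return digest[:8]  # truncated so that it would fit in a TCP option


def is_valid_cookie(client_ip: str, cookie: bytes, secret: bytes = SERVER_SECRET) -> bool:
    """Accept data carried in a SYN only if the cookie matches the source IP."""
    return hmac.compare_digest(make_cookie(client_ip, secret), cookie)


# The cookie is handed out once, after a regular three-way handshake.
cookie = make_cookie("203.0.113.7")
print(is_valid_cookie("203.0.113.7", cookie))   # the same client returning later
print(is_valid_cookie("198.51.100.9", cookie))  # a different (spoofed) source address
```

Because the cookie depends on the source address, an attacker who spoofs a victim's IP would also need a cookie issued for that IP, which it can only obtain by completing a handshake from that address.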

The paper provides the following figure to demonstrate TFO:

[Figure from the paper illustrating TFO]

Results:

The authors evaluated their TFO implementation in two experiments. First, they measured the improvement in download times by a TFO-enabled Chrome browser visiting four popular sites. They found that the average page load time (PLT) showed an improvement of between 4% and 41% when using TFO to download pages as opposed to non-TFO.

[Screenshot of the paper’s page load time results]

The authors also measured the performance impact of running TFO on servers, using a client that generated HTTP/1.0 requests at a constant rate to fetch 5 KB web pages. At different connection rates (connections per second), they recorded the CPU utilization on the server and found that TFO-enabled servers performed on par with vanilla TCP servers.

[Screenshot of the paper’s server CPU utilization results]

Subset Goal and Motivation:

We aim to reproduce the results from Table 1. The download time experiment was well-documented in the paper and seemed feasible to reproduce using existing tools in Python and Mininet. More importantly, we believe that this experiment embodies what is most impactful about TFO. It is ultimately clients that will engage with and benefit from TFO the most, and it is important for us to verify that the described benefits of TFO still hold true for clients today.

Subset Results:

* Note that PLT is Page Loading Time *

Page                 RTT (ms)   PLT: Non-TFO (s)   PLT: TFO (s)   Improvement
amazon.com           20         17.07              16.17          5%
amazon.com           100        32.05              25.43          21%
amazon.com           200        58.56              44.97          23%
nytimes.com          20         14.13              12.02          15%
nytimes.com          100        36.3               24.26          33%
nytimes.com          200        69.13              44.67          35%
wsj.com              20         29.02              27.93          4%
wsj.com              100        45.89              39.02          15%
wsj.com              200        80.08              62.99          21%
TCP Wikipedia page   20         1.72               1.53           11%
TCP Wikipedia page   100        4.01               2.88           28%
TCP Wikipedia page   200        7.7                5.43           29%


Our PLTs are significantly larger than the ones in the paper for amazon.com, nytimes.com, and wsj.com. This is reasonable given that the paper was published in 2011, when these sites likely had much slimmer pages. However, our PLTs for the TCP Wikipedia page were roughly the same as those from the paper. Because the TCP Wikipedia page is probably less dynamic than the other three pages, we believe the significantly larger PLTs for the three websites are due to the larger number of resources on those pages today, which can be seen in the Network section of Google Chrome’s Developer Tools.

The PLTs for TCP with TFO enabled are lower than the PLTs for vanilla TCP, which matches the pattern from the paper. Further, the pattern of small percentage improvement for 20 ms RTT flows and larger percentage improvement for 100 and 200 ms RTT flows is also matched.
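The Improvement column can be recomputed from the two PLT columns. The sketch below assumes that improvement means the relative reduction in PLT, (non-TFO minus TFO) divided by non-TFO; with that assumption, rounding to the nearest percent matches all 12 rows of the table above. The values are copied from the table, and the names `RESULTS` and `improvement_percent` are ours for illustration.

```python
# (page, RTT in ms, non-TFO PLT in s, TFO PLT in s, reported improvement in %),
# copied from the results table.
RESULTS = [
    ("amazon.com", 20, 17.07, 16.17, 5),
    ("amazon.com", 100, 32.05, 25.43, 21),
    ("amazon.com", 200, 58.56, 44.97, 23),
    ("nytimes.com", 20, 14.13, 12.02, 15),
    ("nytimes.com", 100, 36.3, 24.26, 33),
    ("nytimes.com", 200, 69.13, 44.67, 35),
    ("wsj.com", 20, 29.02, 27.93, 4),
    ("wsj.com", 100, 45.89, 39.02, 15),
    ("wsj.com", 200, 80.08, 62.99, 21),
    ("TCP Wikipedia page", 20, 1.72, 1.53, 11),
    ("TCP Wikipedia page", 100, 4.01, 2.88, 28),
    ("TCP Wikipedia page", 200, 7.7, 5.43, 29),
]


def improvement_percent(non_tfo_s: float, tfo_s: float) -> float:
    """Relative PLT reduction from enabling TFO, in percent."""
    return 100.0 * (non_tfo_s - tfo_s) / non_tfo_s


for page, rtt_ms, non_tfo_s, tfo_s, reported in RESULTS:
    computed = improvement_percent(non_tfo_s, tfo_s)
    print(f"{page:20s} {rtt_ms:4d} ms  computed {computed:5.1f}%  reported {reported}%")
```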

These results are taken from results.txt in the repository.

Challenges:

There were definitely challenges with the setup. When installing mget, we did not initially realize that we had to create a .mgetrc file before mget could be used. Writing a script to install Mininet in a way that worked with our tfo.py file was another challenge, and for a while we ran into numerous errors regarding cgroups.

Critique:

In all of our experiments, we found that TFO reduces latency by removing an RTT from the connection handshake, but we do not suggest that TFO be applied universally to all TCP connections. Two downsides of TFO mentioned in the paper remain unresolved. First, some middleboxes continue to filter out SYN packets that carry data. Second, TFO accepts duplicate SYN packets, so it breaks applications that rely on TCP’s guarantee that only one connection is made even when SYN packets are duplicated or delayed. This tradeoff can lead to significant security issues if servers enable TFO without being aware of the design choice.

Extensions:

[Figure: histogram of average RTTs to the top 200 Alexa-ranked websites]

As an extension, we wrote a script to ping the top 200 Alexa ranked websites and record the RTTs averaged over 4 ping requests. The histogram above displays the results. Note that the sites with an RTT of 0-1 ms all came from Google domains, which we hypothesize relates to aggressive caching by Google. We ignore these in our analysis.
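The measurement itself lives in the rtt.sh shell script; the Python sketch below is not that script, only an illustration of how per-reply RTTs could be averaged from ping output. It assumes the Linux iputils output format (`time=... ms` on each reply line), and the sample output, the 192.0.2.1 address, and the function names are hypothetical.

```python
import re
import subprocess

# Matches the per-reply RTT, e.g. "time=11.2 ms" (or "time<1 ms" on some systems).
TIME_RE = re.compile(r"time[=<]([\d.]+)\s*ms")


def average_rtt_ms(ping_output: str):
    """Average the per-reply RTTs in ping output; None if there were no replies."""
    samples = [float(value) for value in TIME_RE.findall(ping_output)]
    return sum(samples) / len(samples) if samples else None


def ping_host(host: str, count: int = 4) -> str:
    """Run the system ping (Linux iputils syntax assumed) and return its stdout."""
    proc = subprocess.run(["ping", "-c", str(count), host],
                          capture_output=True, text=True)
    return proc.stdout


# Hypothetical sample output, so the parsing can be checked without network access.
SAMPLE = """\
64 bytes from 192.0.2.1: icmp_seq=1 ttl=56 time=11.2 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=56 time=10.8 ms
64 bytes from 192.0.2.1: icmp_seq=3 ttl=56 time=11.0 ms
64 bytes from 192.0.2.1: icmp_seq=4 ttl=56 time=11.4 ms
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
"""
print(average_rtt_ms(SAMPLE))
```

Returning `None` for a host that never replies keeps unreachable sites out of the histogram instead of recording them as 0 ms.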

We measured this to verify whether the paper’s choice of 20 ms, 100 ms, and 200 ms as RTT delays in their simulations was reasonable in the real world. We found their choices to be representative of the actual RTTs to more than half of the top 200 sites, and that the number of sites drops off significantly for RTTs greater than 200 ms.

However, perhaps due to technological advancements since 2011, many sites now have RTTs in the single-digit milliseconds, so we reran the experiment with an RTT of 10 ms. The results are shown below.

Page                 RTT (ms)   PLT: Non-TFO (s)   PLT: TFO (s)   Improvement
amazon.com           10         16.4               16.07          2%
nytimes.com          10         12.16              11.14          8%
wsj.com              10         28.03              27.54          1%
TCP Wikipedia page   10         1.52               1.44           5%


While there is still some improvement in overall latency, what is important to understand here is that as RTT delays fall, the benefits of TFO also fall. This can be explained by the fact that with smaller RTTs, server processing time accounts for more of the overall latency, and that is outside the influence of TFO.
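A toy model can make this trend concrete. Assume that a page load costs a fixed, RTT-independent amount of time (server processing, rendering) plus some number of round trips on the critical path, and that TFO removes a fixed number of those round trips. Every parameter value below is hypothetical and is not fitted to our measurements; only the shape of the trend is meant to carry over.

```python
def modeled_improvement(rtt_s: float, fixed_s: float,
                        round_trips: int, saved_round_trips: int) -> float:
    """Percent PLT reduction in a toy model: PLT = fixed time + round trips * RTT."""
    baseline = fixed_s + round_trips * rtt_s
    with_tfo = fixed_s + (round_trips - saved_round_trips) * rtt_s
    return 100.0 * (baseline - with_tfo) / baseline


# Hypothetical page: 1 s of RTT-independent work, 30 round trips on the critical
# path, 10 of which are handshakes that TFO would eliminate.
for rtt_ms in (10, 20, 100, 200):
    pct = modeled_improvement(rtt_ms / 1000.0, fixed_s=1.0,
                              round_trips=30, saved_round_trips=10)
    print(f"RTT {rtt_ms:3d} ms: {pct:.1f}% modeled improvement")
```

In this model the improvement tends to zero as the RTT shrinks, because the fixed term dominates, and it approaches the fraction of round trips saved as the RTT grows.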

Platform:

We use Google Cloud’s VM instances because they were suggested by the TAs and are quite reliable. The entire setup is reproducible, though there will be a few small differences in PLTs due to differing loading times of Mininet. The setup parameter that affects reproducibility the most is the OS version: the scripts for installing the dependencies for mget will not work on anything other than Ubuntu 14.04. We used Mininet because we were already familiar with it from PA 1, and mget because previous teams had reported trouble using wget with TFO. We used Ubuntu 14.04 because we ran into early trouble installing Mininet and mget on Ubuntu 16.04. In addition, Ubuntu 14.04 has Linux kernel version 4.4, which is higher than TFO’s requirement of Linux kernel version 3.7.
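Before running the experiment on a different machine, it may help to check the kernel requirement stated above. The sketch below is not part of our scripts; it compares the running kernel's release string against version 3.7 and reads the `net.ipv4.tcp_fastopen` sysctl from `/proc`. The function names are hypothetical.

```python
import platform
import re

TFO_MIN_KERNEL = (3, 7)  # minimum kernel version stated in the text above
# Bitmask sysctl: bit 0x1 enables TFO for clients, bit 0x2 for servers.
TFO_SYSCTL_PATH = "/proc/sys/net/ipv4/tcp_fastopen"


def kernel_version(release: str):
    """Extract (major, minor) from a release string such as '4.4.0-78-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def kernel_supports_tfo(release: str) -> bool:
    """True if the release string parses and is at least the minimum version."""
    version = kernel_version(release)
    return version is not None and version >= TFO_MIN_KERNEL


def tfo_sysctl_value(path: str = TFO_SYSCTL_PATH):
    """Return the tcp_fastopen bitmask, or None if the file cannot be read."""
    try:
        with open(path) as sysctl_file:
            return int(sysctl_file.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    release = platform.release()
    print("kernel release:", release)
    print("meets TFO minimum:", kernel_supports_tfo(release))
    print("tcp_fastopen sysctl:", tfo_sysctl_value())
```

An experiment that runs both the client and the server on one machine needs both bits set, so a sysctl value of 3 is what to look for.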

Instructions to reproduce:

Go to Google Compute Engine and click ‘CREATE INSTANCE’. The instance should have the following properties:

Machine Type

  • 1 vCPU with 3.75 GB memory

Boot Disk

  • Ubuntu 14.04 LTS

Check the following under Firewall

  • Allow HTTP Traffic
  • Allow HTTPS Traffic

Click ‘Create’.

Next, you will want to SSH into the instance. Once you have done so, run the following commands:

Then, set up your current directory and run with the following commands:

  • cd project3; sudo python run.py

Wait 15-30 minutes

Check out the results in results.txt

=== Extension ===

If you wish to run the extension as well, simply run the command:

  • sudo ./rtt.sh

Wait 10-20 minutes

Check out the RTT results for each site in rtts.txt

[end]

One response to “CS244 ’17 TCP Fast Open”

  1. Reproducibility Score: 5

    The tables (both results.txt and rtts.txt) were extremely easy to reproduce. The instructions were clear, and the script ran unattended and completed in about 20 minutes. Both the main results and the extension results that we found correlated extremely well with the findings on this post. Additionally, the analysis and explanations for the findings (including the longer page times vs. the 2011 results) were thorough and made sense. The only note we’d like to make is that some sites in the extension had no results for round-trip-time; we attempted to manually ping some of these sites but they were unreachable. Interesting project and great job overall!

    -Hemanth and Alex
