CS244 ’15 TCP Fast Open

TCP Fast Open

Team: Kevin Miller and Reid Watson

Key Result: We confirmed that TCP Fast Open offered a non-trivial performance improvement on a basic real-world web browsing task.

TCP Fast Open:

The standard TCP 3-way handshake is relatively efficient, and introduces only 2 packets of overhead into a standard TCP flow. For a flow which transmits any sizable amount of data, 2 packets is a fairly negligible amount of overhead. However, consider a TCP flow which sends only 1 packet of request and 1 packet of response in an entire connection. This would result in the following exchange of packets:

  1. Client sends a SYN packet to server
  2. Server sends a SYN-ACK packet to client
  3. Client sends an ACK packet to server along with request payload
  4. Server sends an ACK packet to client along with response payload
  5. Connection teardown occurs

In this extreme scenario, only half of the packets we transmitted carried actual payload data, while the rest were needed just to establish the connection! Additionally, the client had to wait an entire network round trip before it could send any data to the server application. This means that the TCP handshake hurts latency and throughput when a large number of short-lived TCP connections are used.

The authors of the TCP Fast Open paper wanted a way to get around the 3-way handshake in certain circumstances. If the client were able to send data along with the first SYN packet, the server could start preparing and sending a response a full round trip sooner. This “head start” wouldn’t hurt long-lasting connections, but would significantly speed up short, frequent connections. This system, called “TCP Fast Open”, is described in more detail in the paper.
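As a rough illustration of the head start, the round trips can be counted directly. The sketch below is an idealized model, not a measurement: it assumes a request and response that each fit in one packet, zero server processing time, and a client that already holds a valid Fast Open cookie from an earlier connection. The function name is ours.

```python
def time_to_response_ms(rtt_ms, fast_open):
    """Idealized time from opening the connection until the response arrives.

    Assumes a one-packet request and response, no server think time, and
    (for Fast Open) a cookie obtained on an earlier connection.
    """
    handshake_rtts = 0 if fast_open else 1  # SYN / SYN-ACK before any data
    request_rtts = 1                        # request out, response back
    return (handshake_rtts + request_rtts) * rtt_ms


for rtt in (20, 100, 200):
    regular = time_to_response_ms(rtt, fast_open=False)
    tfo = time_to_response_ms(rtt, fast_open=True)
    print(f"RTT {rtt} ms: {regular} ms without TFO, {tfo} ms with TFO")
```

Under these assumptions a one-request connection saves exactly one RTT. Real page loads save a smaller fraction, because transfer time and server work are unaffected.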

Network Topology: To reproduce the results, we created a simple mininet topology of a host connected to a server with unlimited bandwidth. We experimented with the bandwidth listed in the paper (4 Mb/s), but ended up with 10x slower RTTs due to issues with mininet. All of the files are small enough that bandwidth should not be the limiting factor anyway.

Using TCPFO: Enabling TCP Fast Open requires writing a value to the file “/proc/sys/net/ipv4/tcp_fastopen”. Previous years’ blog posts were helpful for figuring out exactly what value to set (519) to get TCP Fast Open working. TCP Fast Open also requires some minimal changes to the client-side application. We used ‘mget’ as the client-side download program because previous years had reported issues with wget and TCP Fast Open, and we did not want to run into the same problems. For the server, we used a SimpleHttpServer (copied over from pa-1) with a simple change that sets the TCP Fast Open socket option when the appropriate flag is passed in.
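For reference, the socket-level changes look roughly like the following on Linux. This is a minimal sketch and not the code from our repository: the function names are hypothetical, the numeric fallbacks are the Linux values of `TCP_FASTOPEN` and `MSG_FASTOPEN` for Python builds that do not export them, and both ends assume the sysctl has already been set to allow Fast Open.

```python
import socket

# Linux values, used only if this Python build does not export the constants.
TCP_FASTOPEN = getattr(socket, "TCP_FASTOPEN", 23)
MSG_FASTOPEN = getattr(socket, "MSG_FASTOPEN", 0x20000000)


def make_listener(host, port, use_tfo, backlog=16):
    """Return a listening TCP socket, optionally accepting data in the SYN."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    if use_tfo:
        # The option value caps the queue of pending Fast Open connections.
        sock.setsockopt(socket.IPPROTO_TCP, TCP_FASTOPEN, backlog)
    sock.listen(backlog)
    return sock


def fetch(host, port, request, use_tfo):
    """Send one small request and return everything the server sends back.

    Reads until the server closes the connection, so it assumes a server
    that does not use keep-alive.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if use_tfo:
            # sendto() with MSG_FASTOPEN replaces connect() + send(): the
            # kernel puts the payload in the SYN when it holds a cookie and
            # falls back to a normal handshake otherwise.
            sock.sendto(request, MSG_FASTOPEN, (host, port))
        else:
            sock.connect((host, port))
            sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks)
    finally:
        sock.close()
```

The first Fast Open connection to a server typically only obtains a cookie, so any speedup shows up from the second connection onward.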

Testing Script: Our testing script generates the topology, starts a webserver, and has the client download pages from it. Our webserver currently hosts the contents of the pages listed in Table 1 of the paper, as well as their assets.
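The timing step of such a script can be sketched in a few lines. This is an illustrative stand-in rather than our actual script: `time_command` and `mean_time` are hypothetical names, and the command being timed is a placeholder for wherever the download client would be invoked.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def mean_time(argv, runs=3):
    """Average several timings to smooth out noise between runs."""
    return sum(time_command(argv) for _ in range(runs)) / runs


# Placeholder command; a real run would invoke the download client here.
print(f"{mean_time([sys.executable, '-c', 'pass']):.3f} s")
```

Averaging over a few runs matters on shared machines, where load from other processes can shift individual timings by several milliseconds.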

Performance: The paper reported performance improvements around 5 to 15 percent on a typical web-browsing workload, while our replicated results included much larger performance improvements:

[Figure: Result Times]

There are a few important things to consider when analyzing these results:

  1. The general trend of TCP fast open performance seemed accurate.  The performance improvements we saw were even more significant than those observed in the original paper.
  2. Our results do not account for HTTP keep-alive. This introduces a significant slowdown, since every request pays for a full connection handshake and teardown. In typical web traffic, HTTP keep-alive avoids most of these setups and teardowns, saving a significant amount of time. The effect is evident in our larger page load times (PLT).
  3. The pages may have changed between the paper’s run and ours.  The paper likely used a slightly different version of the web pages than is available today.

To help validate the results despite the issues with HTTP keep-alive, we decided to run the same experiment with a different structure of resources. Automatically determining the structure of resources in a website is difficult, and most tools for scraping websites for offline use discard that information. To create a simplified model of real-world website structure, we used the paper’s assertion that a standard website contains 44 resources loaded from 7 (simulated) domains. The results seen in this setup matched the results of the paper much more closely:

RTT (ms)   PLT: non-TFO (s)   PLT: TFO (s)   Improvement
20         0.55               0.50           9%
100        2.72               2.42           11%
200        5.63               4.83           14%
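The Improvement column is the relative reduction in PLT. The short sketch below recomputes it from the table's values, taking the larger PLT in each row as the non-TFO baseline; the function name is ours.

```python
def improvement_percent(baseline_s, improved_s):
    """Relative reduction in page load time, as a percentage of the baseline."""
    return 100.0 * (baseline_s - improved_s) / baseline_s


# (RTT in ms, slower PLT in s, faster PLT in s), taken from the table.
rows = [(20, 0.55, 0.50), (100, 2.72, 2.42), (200, 5.63, 4.83)]

for rtt_ms, slower, faster in rows:
    print(f"RTT {rtt_ms} ms: {improvement_percent(slower, faster):.0f}%")
```

Rounded to whole percentages, this reproduces the 9%, 11%, and 14% figures.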

Lessons Learned: Through reproducing the results in the TCP Fast Open paper, we learned how hard it is to perform even a simple task when appropriate documentation is lacking. Setting up the topology in mininet was extremely straightforward and worked out quite nicely, but getting TCP Fast Open to work and matching the paper's results was very difficult. There is no clear documentation on how to enable TCP Fast Open for our Python server, or on what value to write to /proc/sys/net/ipv4/tcp_fastopen. Thankfully, previous years had answered some of these questions, but it was still quite frustrating.
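For future reproducers, the value 519 is easier to read as a bitmask (0x207). The sketch below decodes it using the flag meanings as we understand them from the Linux kernel's ip-sysctl documentation. The descriptions are our paraphrase and the names are hypothetical, so check the documentation for your kernel version, since the set of flags has changed over time.

```python
# Flag meanings paraphrased from the kernel's ip-sysctl documentation
# (an assumption to verify against your kernel version).
TFO_FLAGS = {
    0x001: "client: allow data in the opening SYN",
    0x002: "server: accept data in the SYN before the handshake completes",
    0x004: "client: send data in the SYN even without a cookie",
    0x200: "server: accept data in the SYN without a cookie option",
}


def decode_tcp_fastopen(value):
    """Return descriptions of the known flag bits set in a sysctl value."""
    return [desc for bit, desc in sorted(TFO_FLAGS.items()) if value & bit]


print(hex(519))
for line in decode_tcp_fastopen(519):
    print("  " + line)
```

Read this way, 519 enables both the client and server sides and relaxes the cookie requirement on each, which is convenient for a closed test topology.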

We also learned that reproducing web experiments is extremely difficult since websites change often and doing the exact same experiment 6 months apart can give drastically different results.

Replication: To easily reproduce our results, launch an AWS instance using our AMI: ami-add4eb9d. Then type:

  • cd cs244-pa3
  • sudo ./run.sh

This will generate the results from the table. If you would prefer to use your own machine, you can clone our repo: https://bitbucket.org/Gasparila/cs244-pa3

  • git clone https://bitbucket.org/Gasparila/cs244-pa3.git
  • cd cs244-pa3
  • sudo ./run.sh

Note that you must first install mininet and mget in order to run the script. It is extremely difficult to automate installing mget, so we opted not to make an install script.

One response to “CS244 ’15 TCP Fast Open”

  1. Hi guys:

    Awesome project; we ran the numbers and got very similar results (slight difference of a few ms but we suspect that’s expected since it’s based on AWS machine load for mininet). Just a note to the graders/future reproducers: allow the code to run to completion and it’ll output the full table at the end.

    The sensitivity analysis on the dynamic webpage sizes was interesting and it highlights the importance of how researchers should include all of their datasets at publication for reproducibility in a distant future. As an aside, it could be cool in the future to use an archive service and see what the actual difference in the web page sizes was then and today.


    Kevin & Omar
