The Chrome web browser keeps HTTP connections open in the hope of reusing them and avoiding the cost of setting up new TCP connections, but even then about ⅓ of HTTP requests go over new TCP connections. CDNs complicate things further because different parts of a page are served by different servers. TCP Fast Open (TFO) exchanges data during the initial TCP handshake, which removes one RTT per TCP connection and reduces flow completion time, so websites load faster.
Our first goal was to replicate Table 1 in the TFO paper, which shows page load times, both with and without TFO, for various common web pages. For this we fetched amazon.com, nytimes.com, wsj.com and the Wikipedia page on TCP, with all of their dependencies, and measured the page load time with and without TFO. We tried this with round-trip times of 20ms, 100ms, and 200ms, which we simulated with Mininet. The setup also uses a custom Chrome shell and Python’s simple HTTP server (serving a locally cached page), all running on an Amazon EC2 virtual machine with a custom Linux kernel with TFO support. We succeeded in reproducing the results from the paper.
Our results and code can be obtained from our GitHub TFO repository. They can be reproduced by using our Amazon EC2 AMI, named 474470461255/cs244-13-tcp-fastopen, in the US-west-2 (Oregon) location: AMI ami-c41085f4, launched as c1.xlarge, with security group quicklaunch-1 (to enable SSH) and username “ubuntu”. Once logged on, run sudo run.sh, located under ~/tcp-fastopen/. Make sure the VNC server is on display 1; it should be by default.
We are running Linux kernel 3.7.9, which has built-in support for TFO, on our Amazon EC2 instance. After bootup, all we have to do to enable TFO support is to write “3” to /proc/sys/net/ipv4/tcp_fastopen.
We added scripts to enable and disable TFO on demand, which we call from within our run script to show results both with and without TFO.
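Enabling and disabling TFO boils down to writing a bitmask to the sysctl file. Here is a minimal Python sketch, assuming the standard Linux meaning of the flag bits (1 enables TFO for clients, 2 for servers, so 3 enables both). The function names are hypothetical and chosen for illustration, the scripts in our repository may look different, and writing to /proc requires root.

```python
from pathlib import Path

# Sysctl entry named in the text above.
TFO_SYSCTL = "/proc/sys/net/ipv4/tcp_fastopen"

# Linux flag bits: 1 = client side, 2 = server side, so 3 = both.
TFO_CLIENT = 0x1
TFO_SERVER = 0x2


def set_tfo(enabled, sysctl=TFO_SYSCTL):
    """Write 3 (client and server TFO) or 0 (TFO off) and return the value written."""
    value = (TFO_CLIENT | TFO_SERVER) if enabled else 0
    Path(sysctl).write_text("%d\n" % value)
    return value


def get_tfo(sysctl=TFO_SYSCTL):
    """Read the current bitmask back as an integer."""
    return int(Path(sysctl).read_text().strip())
```

The `sysctl` parameter exists only so the helpers can be pointed at a scratch file when trying them out without root.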
Chrome Shell Details
A not-so-obvious use of the Chrome shell is as a scraper: before running anything we feed it a file containing a list of URLs to fetch (e.g. Paper.pages). We then call the Chrome shell in a special page saving mode (with fetch.py) which saves all of the pages fetched under a Paper/ directory, with one sub-directory per host contacted (e.g. Paper/www.amazon.com/), and one file per page retrieved (including path, query and ref parts of the URL). We had to work around filename and path limitations by replacing names that are too long with their base 16 MD5 equivalent.
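The mapping from a URL to a saved file can be sketched as follows. This is an illustration under stated assumptions rather than the code in fetch.py or the shell: the 255-byte limit per path component, the index.html fallback for an empty path, and the function names are all hypothetical. Only the idea of swapping an over-long name for its base 16 MD5 digest comes from the description above.

```python
import hashlib
import posixpath
from urllib.parse import urlsplit

# Assumption: a typical Linux limit on the length of one path component.
MAX_NAME_BYTES = 255


def safe_name(name, limit=MAX_NAME_BYTES):
    """Return name unchanged, or its hex MD5 digest if it is too long."""
    raw = name.encode("utf-8")
    if len(raw) <= limit:
        return name
    return hashlib.md5(raw).hexdigest()


def local_path(url, root="Paper"):
    """Map a URL to root/<host>/<path components>, keeping query and ref."""
    parts = urlsplit(url)
    # Assumption: an empty path is stored under a default file name.
    components = [c for c in parts.path.split("/") if c] or ["index.html"]
    if parts.query:
        components[-1] += "?" + parts.query
    if parts.fragment:
        components[-1] += "#" + parts.fragment
    return posixpath.join(root, parts.netloc, *(safe_name(c) for c in components))
```

Because the digest depends only on the original name, the server side can recompute it when a requested file is not found under its original name.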
We then reuse the Chrome shell to simulate fetches, as from a regular client, fully locally with and without TFO. One critical difference with how we run this shell is that it does DNS lookups through a remapping file, so that all queries are routed to a local server instead of going to the Internet. We achieve this by starting one web server per cached host inside the Mininet environment (see Server Details below), and then printing out the 3-tuple of original hostname, local IP address, and local port number. This file is then used by the Chrome shell, whose requests in this mode are only ever fulfilled from localhost.
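The remapping file described above could be written and read along these lines. This is only a sketch: the whitespace-separated, one-line-per-host layout, the function names, and the example addresses are assumptions made for illustration, not necessarily the format our shell reads.

```python
def format_mapping(entries):
    """Render (hostname, ip, port) 3-tuples, one per line."""
    return "".join("%s %s %d\n" % (host, ip, port) for host, ip, port in entries)


def parse_mapping(text):
    """Parse the file back into {hostname: (ip, port)}."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        host, ip, port = line.split()
        mapping[host] = (ip, int(port))
    return mapping


# Hypothetical example: two cached hosts served from local addresses.
example = format_mapping([
    ("www.amazon.com", "10.0.0.2", 8000),
    ("static.example.com", "10.0.0.3", 8000),
])
```

A client that consults `parse_mapping(example)` before connecting never needs to send a DNS query to the Internet.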
The Chrome shell also outputs detailed data in both scraper and client mode. Chrome usually records aggregated information that doesn’t leak any private information. Our shell outputs the parts of this information that we found useful. For our experiments we wanted to record more information, so we added:
- DNS lookups through time. This is mostly a sanity check, and exposes some of Chrome’s internal object lifetime and frequent DNS queries, which mostly hit the internal cache.
- Pages/resource fetches through time. These are used to validate the effects of TFO, and are used to create load plots.
- MIME type and charset for each page/resource. These are used by the servers to serve pages and resources with the appropriate MIME type: doing otherwise would cause some pages to be downloaded and not parsed by Chrome.
This data is recorded when the shell is in scraper mode as well as in client mode, and the two can be compared: the former is in Paper.fetchlog/, and the latter in the client/ directory. One caveat in this comparison is that we currently simulate constant latency to each server, but it would be easy to change our Mininet setup to simulate latencies that match the ones collected in scraper mode.
Other changes to the shell and to how we run it:
- Between each load we delete Chrome’s cache from the disk to prevent it from serving from cache.
- We changed the shell’s user agent so that it looks like a real client. Some sites otherwise serve a mobile version.
- We added a command-line option to enable TFO for the shell. This ignores /proc/sys/net/ipv4/tcp_fastopen because Mininet’s current virtual environment doesn’t reproduce that of the kernel. A real TFO-enabled Chrome wouldn’t ignore this entry.
- We changed the shell to quit the message loop when it becomes idle (when no requests are in-flight). This is somewhat of an unclean teardown, and sometimes leaks objects.
Rebuilding the Shell
Our Amazon EC2 Linux AMI as well as our repository have a pre-built Chrome shell, which can be recompiled as follows:
- Install the required dependencies from the Linux build instructions.
- Get the code from the chromium git repository, and check out git hash e299150c1203b73d35fb6b1f231af5d740942484.
- Apply the diff provided in our repository.
- make test_shell -j12 BUILDTYPE=Release (or Debug). This takes about 20 minutes to build on a powerful x86 machine.
- strip -s ./out/Release/test_shell (or Debug).
Note that this version of Chrome is somewhat buggy and sometimes crashes. We haven’t narrowed down the issue to our changes or the specific Chrome version, though we’ve tried rebasing our changes at a different hash and experienced worse stability. We obtain results in enough cases that we decided that this issue wasn’t worth fixing.
Server Details
We implemented the HTTP server using Python’s SimpleHTTPServer request handler, and a class derived from SocketServer.TCPServer that:
- Includes the setsockopt calls to initialize TFO.
- Remaps pages that aren’t found to their base 16 MD5 equivalent filename, as with the Chrome shell above.
- Uses the MIME type and charset saved by the Chrome shell in scraper mode.
- Saves logs for the servers in the serverlogs/ directory.
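The first item in the list above, a TCPServer subclass that turns on TFO for its listening socket, can be sketched as follows. This is written against Python 3's socketserver and http.server modules rather than the Python 2 modules named above, and it is not the code from our repository. The queue length of 5, the fallback constant 23 (the Linux value of TCP_FASTOPEN), the class and function names, and the choice to silently fall back to a regular server when the kernel rejects the option are all assumptions.

```python
import functools
import http.server
import socket
import socketserver

# Use Python's constant when available; 23 is the Linux value of TCP_FASTOPEN.
TCP_FASTOPEN = getattr(socket, "TCP_FASTOPEN", 23)


class TFOServer(socketserver.TCPServer):
    allow_reuse_address = True
    tfo_queue_len = 5  # assumption: max pending TFO requests on the listener

    def server_bind(self):
        super().server_bind()
        try:
            # Ask the kernel to accept data carried in the SYN on this socket.
            self.socket.setsockopt(socket.IPPROTO_TCP, TCP_FASTOPEN,
                                   self.tfo_queue_len)
            self.tfo_enabled = True
        except OSError:
            # Kernel or platform without server-side TFO: serve normally.
            self.tfo_enabled = False


def make_server(directory, host="127.0.0.1", port=0):
    """Serve files from directory; port 0 lets the OS pick a free port."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=directory)
    return TFOServer((host, port), handler)
```

Calling `make_server("Paper/www.amazon.com").serve_forever()` would serve one cached host. The socket option alone is not enough: the server bit in /proc/sys/net/ipv4/tcp_fastopen must also be set for the kernel to actually use TFO.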
Our current test rig is Mininet, which we mainly use to provide IP addresses to each server, and connectivity between the servers and the client. This connectivity has added latencies which we currently hardcode as in the paper, but we could reproduce the latencies measured when the Chrome shell was originally run in scraper mode.
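A Mininet topology with a hardcoded RTT could look like the sketch below. The star layout (one client and every server attached to a single switch), the host names, and the function names are assumptions for illustration and may not match our run script. With two links between the client and any server, each link is crossed four times per round trip, so each link gets a quarter of the target RTT.

```python
def link_delay_ms(rtt_ms, links_on_path=2):
    """One-way delay per link so that a round trip over the path takes rtt_ms."""
    return rtt_ms / (2.0 * links_on_path)


def build_net(num_servers, rtt_ms):
    """Build (but do not start) a star network with the requested RTT."""
    # Imported here so the delay helper works without Mininet installed.
    from mininet.link import TCLink
    from mininet.net import Mininet
    from mininet.topo import Topo

    delay = "%gms" % link_delay_ms(rtt_ms)

    class StarTopo(Topo):
        def build(self):
            switch = self.addSwitch("s1")
            self.addLink(self.addHost("client"), switch, delay=delay)
            for i in range(num_servers):
                self.addLink(self.addHost("server%d" % i), switch, delay=delay)

    # TCLink is what applies the per-link delay option.
    return Mininet(topo=StarTopo(), link=TCLink)
```

For a 100ms RTT this gives 25ms per link. Replaying the latencies measured in scraper mode would mean passing a different delay for each server link instead of a single value.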
Our initial setup didn’t use Mininet at all: we simply started the Python simple HTTP servers and delayed responses to simulate latency. We started all servers on localhost, and mapped them to different ports.
There are a few issues with our experiment reproducibility.
The current web pages are different from those that existed when the original experiments were conducted. We contacted one of the paper's authors, who does not have a cached version of these pages. One possible remedy is to check web.archive.org for significant differences between the pages now and at the time the paper was written. We ignored this issue because the effect of TFO is visible on the pages that we fetched, and our numbers are in the ballpark of the results obtained in the paper.
The results correlate well with those of the original TFO paper, showing meaningful page load time reductions (roughly 4-24% depending on the website and RTT). We first generated a table analogous to Table 1 in the original paper, which shows page load time (PLT) with and without TFO at each network RTT setting. This can be seen below:
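| Page | RTT (ms) | PLT, no TFO (ms) | PLT, TFO (ms) | Improv. |
| --- | --- | --- | --- | --- |
| http://www.wsj.com | 100 | 6688.445 | 5807.929 | 13.16% |
| http://www.wsj.com | 20 | 4255.73 | 4073.823 | 4.27% |
| http://www.amazon.com | 100 | 3366.808 | 2547.842 | 24.32% |
| http://www.amazon.com | 20 | 1450.884 | 1319.65 | 9.04% |
| http://en.wikipedia.org/wiki/Transmission_Control_Protocol | 100 | 4909.855 | 4085.305 | 16.79% |
| http://en.wikipedia.org/wiki/Transmission_Control_Protocol | 200 | 8403.108 | 7015.135 | 16.51% |
| http://en.wikipedia.org/wiki/Transmission_Control_Protocol | 20 | 2599.715 | 2301.599 | 11.46% |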
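The improvement column is the relative reduction in page load time. A short sketch of that calculation, using the wsj.com numbers at 100ms RTT from the table (the function name is ours, for illustration):

```python
def improvement_percent(plt_without_tfo, plt_with_tfo):
    """Relative page load time reduction from enabling TFO, in percent."""
    return 100.0 * (plt_without_tfo - plt_with_tfo) / plt_without_tfo


# wsj.com at 100ms RTT, values taken from the table above.
wsj_100 = improvement_percent(6688.445, 5807.929)
print("%.2f%%" % wsj_100)  # prints 13.16%
```

The units cancel out, so the result is the same whether the load times are expressed in seconds or milliseconds.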
Additionally, we generated timing diagrams showing the difference between fetching websites with TFO enabled and disabled. These demonstrate that the advantage of TFO grows as RTTs increase on the network. An example diagram is shown below, for fetching the Wall Street Journal website at 100ms RTT. Other plots can be seen in the GitHub repository.
As a further result we’ve experimented with the Alexa Top 100 US websites. This requires a significant amount of data, which we haven’t had the time to explore and have therefore omitted from our results.
As a further improvement we can collect the CDN layout for these websites from different access points around the world (i.e. run the Chrome shell in scraper mode from different computers), and experiment with different CDN layouts to figure out potential improvements. CDNs require additional TCP handshakes; consolidating them would therefore reduce the effect of TFO.
Additional work can be done to evaluate CPU utilization and to replicate Figure 6 in the paper as well.
An optimization that we could investigate is synchronizing the invalidation of TFO cookies between a web site and its associated CDN servers, that is, having TFO cookies per domain instead of per IP, and measuring the benefits. By following a common schedule, we could maximize the time during which all of the web site’s resources are fetched with TFO enabled. The authors of the paper explicitly wanted to avoid such handshakes and synchronization to reduce implementation complexity.
We could also explore allowing Chrome to open more than 6 concurrent connections per domain. This might cause issues with some routers, in which case Chrome could probably detect the problem and back off, but in general it could better showcase TFO’s results.