Team: Elizabeth Walkup, Yanlei Zhao
GitHub Repository: https://github.com/botfish/cs244-pa3/
SPDY is a Google project that aims to make HTTP faster by adding features such as prioritization and multiplexing of many transfers over a single TCP connection. It is used by major websites like Google, Facebook, and Twitter, and is supported on the client side by Chrome, Firefox, and Internet Explorer 11. However, papers examining SPDY's performance have been varied and inconsistent in their findings – the authors of “How Speedy is SPDY?” set out to do a more in-depth study that looks at all causes of delay (including packet loss and browser computation) when comparing plain HTTP and SPDY.
The primary motivation for this study is that SPDY is the basis of HTTP/2, a major revision of the web protocol whose key focus is performance. If HTTP/2 is going to be the next dominant web protocol, it is important to understand the performance characteristics of SPDY, its basis.
The original authors found that SPDY provides a significant improvement over HTTP when dependencies in the page load process and the effects of browser computation are ignored. Most of SPDY's benefits stem from its use of a single TCP connection, but the same feature is detrimental under high packet loss.
We chose to reproduce Figure 3 and Figure 7 from the paper:
We chose Figure 3 because it highlights one of the key reasons why findings on SPDY are so mixed – web protocol performance depends on many external factors in the network and in the web pages themselves. Figure 3's experiments cover variations in web page characteristics and one network parameter (packet loss rate). Figure 7 was chosen because its results are among the most significant in terms of performance: based on the authors' graph, SPDY has a huge advantage over HTTP in reducing the number of retransmissions, almost eliminating them.
Our results for the Object Number and Retransmissions experiments match the original graphs fairly well. We had fewer retransmissions overall than the authors did, probably because we were using a zero-loss Mininet link on a single machine rather than a real link. We also had slightly fewer website samples (167 vs. 200). But the result is the same – SPDY clearly reduces the number of retransmissions. The Object Size experiment had the same overall result (SPDY performing better), but with a greater margin between SPDY and HTTP.
Where our results differ is the Loss experiment. The original paper found that SPDY performed worse than HTTP at high loss rates, but we found that it actually performed better. In fact, as the loss rate increased, the improvement given by SPDY increased:
While this contradicts the paper’s results, it is in line with the results found by Google’s SPDY whitepaper. So it is possible that the original paper had some other environmental factors that were not listed. This highlights SPDY’s problem of inconsistent results, as mentioned in the introduction.
One of the essential tools we needed for this project was Epload, a page load emulator written by the authors of the original paper. But the tool was not in working condition when we got it, so we had to debug it. It called a function that did not appear to exist anywhere – one of the authors said this was probably an environment configuration problem. We tried many different versions and setups, but in the end we simply removed the offending line of code, which solved most of our problems (although we had to make some adjustments to Apache to accommodate the change). This process cost us about three or four days.
Another challenge was generating websites with varying object number and object size to reproduce Figure 3. To work with Epload, one must generate a dependency graph for each website, describing every object in that website and its dependency relations. The paper's authors have a dependency graph generator of their own, but it is still in beta and not released to the public, so we had to write our own. Our generator is quite simple and naive, which could account for some of the differences in our results.
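The core of our generator can be sketched in a few lines. The JSON layout below is a simplified, hypothetical stand-in for Epload's real dependency-graph format (which also models computation activities and timing); it only illustrates the naive structure we used, where every synthetic object depends directly on the root HTML page.

```python
# Minimal sketch of a naive dependency-graph generator in the spirit of ours.
# The schema here is illustrative, not Epload's actual format.
import json

def make_graph(num_objects, object_size):
    """Root HTML object plus num_objects children that all depend on it."""
    objs = [{"id": "obj_0", "path": "/index.html", "size": 1024, "deps": []}]
    for i in range(1, num_objects + 1):
        objs.append({
            "id": "obj_%d" % i,
            "path": "/obj_%d.bin" % i,  # hypothetical file names
            "size": object_size,
            "deps": ["obj_0"],          # naive: everything hangs off the root
        })
    return {"objects": objs}

graph = make_graph(num_objects=10, object_size=10 * 1024)
print(json.dumps(graph)[:60])
```

Varying `num_objects` and `object_size` gives the sweep needed for the Figure 3 experiments.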
As mentioned by the authors of the original paper, web object dependencies and computation have an overwhelming effect on the performance of SPDY. Yet they neither provide the dependency relations of their experiments nor specify how their synthetic dependency relations are built. During our experiments, we found that changing how objects depend on each other could easily cause the performance results to differ by a factor of 10.
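A back-of-the-envelope calculation (our own reasoning, not from the paper) shows why the graph shape alone can swing results by an order of magnitude: if each dependency level costs roughly one round trip, a serial chain of N objects needs N round trips, while a flat fan-out from the root needs only about two.

```python
# Illustrative lower bound on page load time: one RTT per dependency level.
RTT_MS = 20   # assumed round-trip time
N = 30        # number of objects on the page

chain_levels = N        # obj_i depends on obj_{i-1}: fully serial
fanout_levels = 2       # fetch the root, then everything else in parallel

chain_time = chain_levels * RTT_MS    # 600 ms
fanout_time = fanout_levels * RTT_MS  # 40 ms
print(chain_time / fanout_time)       # 15.0 – same page, different graph
```

The same objects, arranged differently, differ by 15x in this toy model – consistent with the factor-of-10 swings we observed.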
We found that the paper’s results for SPDY hold for all but the Loss experiment, as noted before. We tried many things to make the graphs more similar:
- Different versions of mod_spdy 
- Different TCP send/receive/congestion window sizes, ranging from 1 packet to 10 packets
- Using TCP Reno vs TCP Cubic
- Varying the size of the Amazon EC2 instance
- Different domain sharding options in Epload 
- Varying computation and delay times in the dependency graphs
- Varying object dependencies in the dependency graphs
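The TCP tuning in the list above can be sketched as follows. This is our own illustrative helper, not a script from the repository; the gateway address is hypothetical, and on Mininet hosts we would issue the equivalent commands via the host's shell (running them requires root).

```python
# Sketch of the host-level TCP knobs swept between runs.
def tuning_commands(cc='cubic', init_cwnd=10):
    """Return shell commands selecting a congestion control algorithm
    and an initial congestion window."""
    return [
        # pick the congestion control algorithm (e.g. 'reno' or 'cubic')
        'sysctl -w net.ipv4.tcp_congestion_control=%s' % cc,
        # the initial window is set per-route with ip route, not a sysctl;
        # 10.0.0.1 is a placeholder gateway
        'ip route change default via 10.0.0.1 initcwnd %d' % init_cwnd,
    ]

for cmd in tuning_commands('reno', 3):
    print(cmd)
```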
Despite our efforts, we were not able to reproduce the paper’s results any better than what is shown here. It is possible that there are factors that the original paper did not specify that cause this difference. Since the paper was only published last year, we do not believe the differences are due to it being outdated, although it does use some older versions of software. Overall, it seems clear that SPDY has a substantial advantage over plain HTTP.
We chose to use Mininet on Amazon EC2 instances as the basis for this project. EC2 was chosen for its reproducibility. This project requires a good amount of specific setup, both on server and client side. And since SPDY is a changing, ongoing project, we have very specific version requirements for SPDY, Apache, and the underlying operating system. With EC2, we can just create a snapshot of everything, and let whoever is interested in reproducing the results type “sudo ./run.sh” without the tedious and time consuming setup stage. Mininet was chosen because we require a controlled network environment for running our subset experiments. We want to be able to control the round trip time, the bandwidth, and the packet loss rate, and Mininet provides an easy way to do this repeatedly with only a few lines of code.
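The kind of controlled topology this describes takes only a few lines of Mininet's Python API. The parameter values below are illustrative rather than our exact configuration, the `curl` fetch is a hypothetical stand-in for Epload driving the requests, and actually running the function requires root and a Mininet installation (imports are deferred so the sketch reads without one).

```python
def build_and_test(bw_mbps=10, delay='10ms', loss_pct=1):
    """Build a two-host Mininet with a shaped link and time one fetch.
    Requires root and Mininet; imports are deferred for that reason."""
    from mininet.net import Mininet
    from mininet.link import TCLink

    net = Mininet(link=TCLink)
    server = net.addHost('server')
    client = net.addHost('client')
    # bandwidth in Mbps, one-way delay (so ~20 ms RTT), loss in percent
    net.addLink(server, client, bw=bw_mbps, delay=delay, loss=loss_pct)
    net.start()
    # hypothetical fetch; in our runs Epload issued the requests instead
    out = client.cmd('curl -s -o /dev/null -w "%%{time_total}" http://%s/'
                     % server.IP())
    net.stop()
    return out
```

Sweeping `loss_pct` over a script like this is all the Loss experiment's network side requires.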
We chose to use an old version of Ubuntu (12.10) close to the version in the paper. This required a few more setup steps, because the package repositories needed to be adjusted and, in some cases, old versions of software needed to be found. We chose to use an older OS because we wanted to avoid the problems of CS 244 Project 2, where seemingly unknown network changes in the underlying OS caused a huge difference in results.
We would certainly like to see Mininet give better support for running Apache web servers. It took some tinkering to get the server to serve pages through the correct Mininet host.
On the server side, the Apache SPDY module we used (mod_spdy) was transferred to the Apache Software Foundation after the paper was written, and it does not appear to be actively maintained, so we could not test with a newer version of Apache. It would be nice if the project were picked up again.
- Start an EC2 instance of our AMI image. The image name is “ReproducingSPDY” and the AMI ID is “ami-dd96a9ed”. We used a c3.large instance.
- Start the instance and SSH into it using “ssh ubuntu@DNS-name-of-instance”
- cd cs244-pa3
- sudo ./run.sh
- The results will be in the “result” folder in cs244-pa3
For more involved instructions on setting everything up from scratch, refer to the README in the repository (https://github.com/botfish/cs244-pa3/blob/master/README.md).
- Wang, Xiao Sophia, et al. “How Speedy is SPDY?” Proc. of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI). 2014.
- HTTP2 Project. https://http2.github.io/
- SPDY Whitepaper. https://www.chromium.org/spdy/spdy-whitepaper.
- Epload. http://wprof.cs.washington.edu/spdy/tool/.
- mod_spdy. https://code.google.com/p/mod-spdy/.