Reproducing “The Cost of the ‘S’ in HTTPS” originally by David Naylor et al., 2014
Reproduction by Benton Case and Zak Whittington, Spring 2017
“The Cost of the ‘S’ in HTTPS,” published in December 2014, attempts to quantify the prevalence and added costs of HTTPS web traffic. At the time of writing, the conventional wisdom was that HTTPS was growing rapidly in popularity, and that HTTPS usage was imposing some cost on both clients and servers. Given that HTTPS was growing so ubiquitous, the authors wanted to provide a comprehensive analysis of the specific types of costs it could induce. They took quantitative measurements where possible, and also included qualitative discussion of topics they were unable to directly measure. Their primary takeaway was that extra latency imposed by HTTPS is non-negligible and noticeable to users—especially over 3G cell networks—while most other effects are negligible or ambiguous. Our results contradict the original finding: we found HTTPS had negligible effects on load time for the majority of websites, and was even faster than HTTP for almost a third of sites tested, suggesting the landscape of web encryption has changed dramatically in the last 2-3 years.
We chose to reproduce this research because HTTPS has become substantially more popular since this paper was written, and we would like to update their results. About 55% of traffic by volume is encrypted today, up from 38% in 2014. Today, about 75% of time spent browsing is on HTTPS sites according to Google’s Chrome statistics. Moreover, in the time since the paper was published, the characteristics of HTTPS traffic have changed: in 2014 almost all HTTPS traffic was still “small, privacy-sensitive objects,” whereas today HTTPS is increasingly seen as the default protocol for all types of traffic, including video streaming. In 2014, YouTube was just beginning to roll out HTTPS video streaming; today, essentially all YouTube traffic is encrypted (97%), Netflix, which makes up approximately a third of all web traffic by volume, is serving “most streams” over TLS, and many other streaming services are following suit. Given these new traffic patterns, we expect that changes in certificate verification, improvements in network speed and latency, and modern browser implementations have affected the overhead costs of HTTPS.
The paper considered many forms of potential cost, only some of which we will seek to reproduce. Broadly, they considered usage trends, webpage load time, TLS handshake data overhead, and cell phone battery usage. The authors collected per-flow logs from a vantage point monitoring 25,000 customers of a major European ISP for two years from 2012 to 2014, which they used to create their usage trends and TLS handshake overhead analysis. We are unable to reproduce these results due to lack of access to data of this scope, but we also feel it is unnecessary to update these results. Since encryption has become a topic of interest to the general public, several large tech companies including Google and Mozilla have begun publishing public data on HTTPS usage, likely with a higher degree of accuracy than anything we, or even the original authors, could muster.
Instead, we will focus on reproducing what we consider to be the most interesting quantitative result from their paper: added latency over mobile and fiber networks. The original paper found that a significant portion of sites were noticeably (>0.5 seconds) slower over HTTPS. As the paper argued, in a world where users expect sites to load in under two seconds and a one second delay could cost a company billions, delays of this magnitude are non-negligible. The original authors used a headless browser called PhantomJS to measure the page load time of the Alexa Top 500 sites over HTTP and HTTPS, averaged over 20 loads each, repeated over 3G and fiber connections. They presented this data in a cumulative distribution function (see Figure 5 below), demonstrating that for about 90% of websites over 3G, and for 40% of websites over fiber, extra latency was more than 500ms. This methodology is simple but compelling, suggesting that HTTPS is noticeably slower than HTTP to the end user, especially over 3G.
We also used PhantomJS, and ran our tests over fiber and 4G networks. The fiber tests were conducted from a computer running on the Stanford network, and the mobile tests were run from a MacBook tethered to an iPhone connected over a T-Mobile 4G LTE network. We also revived the original authors’ code and confirmed that it generated similar results.
Input Data: Like the original paper, we used the Alexa Top 500 sites as our initial sample.
Filtering: Though the original paper did not explicitly mention this, according to email communications with author David Naylor, they only tested on sites that served responses over both HTTP and HTTPS. We filtered similarly, dramatically reducing the number of eligible sites. We removed a site from our sample if it ever redirected an HTTP request to an HTTPS URL, if it failed to respond to an HTTPS request with a 200 OK, or if it failed to respond at all. We suspect that some sites failed to respond because they detected we were using PhantomJS instead of a normal browser.
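The filtering rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not the filter from our repository: the `ProbeResult` record and the `is_eligible` function are hypothetical names, and we assume each probe reports the final status code and the final URL after redirects.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class ProbeResult:
    """Outcome of loading one URL (hypothetical record, for illustration only)."""
    status: Optional[int]      # final HTTP status code, or None if no response
    final_url: Optional[str]   # URL after following redirects, or None


def is_eligible(http: ProbeResult, https: ProbeResult) -> bool:
    """Return True if a site should be timed over both protocols."""
    # Drop sites that did not answer the plain-HTTP request at all.
    if http.status is None or http.final_url is None:
        return False
    # Drop sites whose HTTP request was redirected to an HTTPS URL.
    if urlsplit(http.final_url).scheme == "https":
        return False
    # Drop sites that did not answer the HTTPS request with a 200 OK.
    if https.status != 200:
        return False
    return True
```

A site passes only when the HTTP load stays on HTTP and the HTTPS load returns 200; everything else is excluded from timing.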
Timing: Our timing script was written from scratch. It sends 4 synchronous, consecutive requests to each website, first over HTTP and then over HTTPS, and measures the time from sending the initial request until the page (including all its resources) is completely loaded. Every measurement is recorded in a measurements file, only some of which is used to generate the plots.
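The shape of that timing loop can be sketched as follows. This is a simplified Python illustration, not our PhantomJS script: `time_site` and `load_page` are hypothetical names, `load_page` stands in for whatever blocks until the page and all its resources have loaded, and we assume four loads per protocol.

```python
import time
from typing import Callable, Dict, List


def time_site(host: str,
              load_page: Callable[[str], None],
              repeats: int = 4) -> Dict[str, List[float]]:
    """Time consecutive page loads over HTTP, then over HTTPS (in seconds)."""
    timings: Dict[str, List[float]] = {"http": [], "https": []}
    for scheme in ("http", "https"):        # HTTP first, then HTTPS
        for _ in range(repeats):            # one load at a time, never in parallel
            start = time.perf_counter()
            load_page(f"{scheme}://{host}/")  # assumed to block until fully loaded
            timings[scheme].append(time.perf_counter() - start)
    return timings
```

Passing the loader in as a parameter keeps the loop independent of the browser used, so the same structure would apply to a different headless browser.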
Plotting: Our plotting script was written from scratch, runs in Python on the output generated by the timing script, and uses matplotlib to generate CDF plots.
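As a rough illustration of what goes into such a CDF, the sketch below reduces each site's loads to a single difference and ratio, then computes the empirical CDF points. It uses only the standard library and leaves the drawing out; the function names `per_site_cost` and `empirical_cdf` are hypothetical and not taken from our plotting script.

```python
from statistics import median
from typing import Callable, List, Sequence, Tuple


def per_site_cost(http: Sequence[float],
                  https: Sequence[float],
                  summarize: Callable[[Sequence[float]], float] = median
                  ) -> Tuple[float, float]:
    """Return (difference, ratio) of summarized HTTPS vs. HTTP load times."""
    h = summarize(http)
    s = summarize(https)
    return s - h, s / h


def empirical_cdf(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (value, fraction of sites <= value) pairs in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


# Made-up differences (seconds) for four sites, purely for illustration.
points = empirical_cdf([0.2, -0.1, 0.6, 0.0])
```

The resulting points could be handed to matplotlib as a step plot, with the per-site difference or ratio on the x-axis and the fraction of sites on the y-axis. Swapping `median` for `statistics.mean` gives the mean-based variant.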
Mobile: Our script for mobile was nearly identical to the script for fiber, but it also sent an HTTP user agent header mimicking a Galaxy Nexus phone running the mobile version of Chrome, and reported the display size of a Galaxy Nexus. We also tested over 4G rather than 3G, since 4G is now used by over 85% of Americans.
Protocol Compatibility: While the original paper talked about the prevalence of HTTPS browsing, it used a different dataset with different metrics to do so; it did not discuss the prevalence of HTTP(S) websites within the Alexa Top 500 Sites. Our script filtered to only time sites that served both HTTPS and HTTP without HTTPS redirects, resulting in a total of 182 viable sites. We expect some sites may actually have served either HTTP or HTTPS, but chose not to respond to requests that originated from a headless Phantom browser.
|Table 1: Breakdown by Category of Alexa Top 500 Sites’ Available Protocols|
|Category||Number of Sites|
|Both HTTP and HTTPS||182|
|Only Redirects to HTTPS||194|
|Neither HTTP nor HTTPS||27|
Fiber: Our results over fiber differ significantly from the original paper’s results. We expect these differences are due to the changing landscape of the internet as opposed to deficiencies in methodology. It is clear that the HTTPS landscape has changed dramatically. HTTPS has exploded in popularity: twice as many sites serve only HTTPS as serve only HTTP. In many cases, HTTPS is actually faster than HTTP these days, which was so rare in 2014 that it wasn’t discussed as a possibility in the original paper. Our CDFs are presented below, along with some salient takeaways.
Mobile: These results differ from the original paper’s results to an even greater degree, in that HTTPS often saved a significant amount of download time compared to HTTP. In about 30% of sites, load times were noticeably faster over HTTPS than over HTTP, and there was no significant difference on most of the remaining sites.
Over both fiber and mobile, we found that for the vast bulk of sites, HTTPS adds no distinguishable latency over HTTP, and is even faster than HTTP in many cases, directly contradicting the findings of the original paper.
Our Reproduced Results
- The original paper’s conclusion is that HTTPS introduces non-negligible latency; our findings contradict that claim
- Only roughly 15% of sites are noticeably (>500ms or 1.3x) slower over HTTPS today, versus 40% in the original paper
- ~80% of sites saw no difference between HTTPS and HTTP load times today, versus 55% in the original paper
- HTTPS is actually faster than HTTP for about 30% of sites today, versus a negligible (<3%) number of sites in 2014
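The "noticeably slower" threshold above (>500ms or 1.3x) can be written as a small classifier. This is a sketch under assumptions: the `classify` function is a hypothetical name, and the symmetric "faster" rule (more than 500ms faster, or a ratio below 1/1.3) is our illustrative choice here, since the text only defines the threshold for slower sites.

```python
def classify(http_s: float,
             https_s: float,
             abs_threshold_s: float = 0.5,
             ratio_threshold: float = 1.3) -> str:
    """Label a site by how its HTTPS load time compares with its HTTP load time."""
    diff = https_s - http_s       # positive when HTTPS is slower
    ratio = https_s / http_s      # above 1 when HTTPS is slower
    if diff > abs_threshold_s or ratio > ratio_threshold:
        return "slower"
    # Assumed mirror image of the "slower" rule (not stated in the text).
    if diff < -abs_threshold_s or ratio < 1 / ratio_threshold:
        return "faster"
    return "no difference"
```

Combining an absolute and a relative threshold means a fast site that goes from 1.0 to 1.35 seconds still counts as slower, even though the added delay is under 500ms.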
The simplest version of our reproduction scripts is designed to run seamlessly on the CS244 VM. They can also be run relatively easily on any machine; see the git readme file for more details.
Step 1: Download the CS244 Ubuntu VM
Step 2: From within the VM, clone our public github repo
git clone https://github.com/Zak244/244-HTTPS.git https-cost
Step 3: Run Setup and Timing Scripts
cd https-cost
./setup
./run_tests
The output should be four plots: ratio-mean.png, ratio-median.png, difference-mean.png, and difference-median.png. The scripts will rely on the Alexa data gathered on May 30, 2017 by default.
If interested, more options for reproduction are detailed in the git readme file.
We had to limit the scope of our research due to resource constraints; we lacked access to many of the datasets and technologies used by the original team. However, one of the core findings of the paper was that HTTPS introduced non-negligible latency compared to HTTP, and our results were very different from those. We encountered no significant challenges reproducing this section of the work, and we have several ideas for possible future extensions.
We interpreted the original paper as making broad claims about what the average internet user could expect from HTTPS vs HTTP browsing. However, one downside to their research is that it uses an obscure headless browser called PhantomJS, which is detectable by servers, and may thus be getting unrealistic results. This is particularly problematic for mobile sites. The original paper did not discuss any attempts to make the PhantomJS browser appear to be a mobile browser; they seem to have run the same tests using a 3G USB modem from their laptop. Today, many websites use browser detection heuristics to serve specialized mobile content, so we tried to make our Phantom browser look like a mobile browser using all the (somewhat limited) flexibility available to us in the Phantom interface. We suggest an extension that runs similar tests using a headless browser that more realistically emulates popular browsers like Chrome and Firefox, in both desktop and mobile environments.
https://www.theverge.com/2016/8/1/12341686/youtube-google-traffic-https-encryption-protected
https://www.google.com/transparencyreport/https/metrics/?hl=en
https://www.fastcompany.com/1825005/how-one-secondcould-cost-amazon-16-billion-sales
https://www.oneupweb.com/blog/need-speed-truth-page-load-time/
https://www.w3counter.com/globalstats.php
https://opensignal.com/reports/2017/02/usa/state-of-the-mobile-network