Keith Major email@example.com
Mark Xue firstname.lastname@example.org
Goals: “An Experimental Study of TLS Forward Secrecy Deployments” [HABJ] evaluated the state of TLS forward secrecy deployments in 2014 by surveying the top million websites indexed by Alexa. The authors also performed experiments simulating server and client TLS sessions to demonstrate that Elliptic Curve Cryptography could provide forward secrecy together with an increase in performance.
Motivation: If the private key of a server providing TLS with RSA key exchange is compromised, the attacker can recover session keys and the plaintext of all previous communications. This provides incentive to store intercepted ciphertexts in case the private key is later compromised. Holding large quantities of encrypted sessions also increases the potential reward for compromising a server’s private RSA key. Forward secrecy protocols remove these incentives by negotiating an ephemeral key that cannot be recovered from the ciphertext with the server’s private key.
Results: [HABJ] observed that over 74% of hosts using TLS supported forward secrecy using either ephemeral Diffie-Hellman (DHE) or ephemeral Elliptic Curve Diffie-Hellman (ECDHE). However, 82.9% of hosts that supported DHE used DH parameters weaker than their signature key strengths. They tabulated key strengths for authentication and key exchange, but did not provide the correlation that produced the 82.9% number. Their experimental findings showed that ECDHE-RSA was nearly as fast as RSA-RSA, and that using ECC for key exchange and authentication (ECDHE-ECDSA) provides a significant performance increase from RSA-RSA. On the client, other than a one-time issue when a client first sees an ECDSA signature, there is no performance degradation from RSA.
Subset Goal: We chose to reevaluate their scan results, in particular the comparison of DH parameters with authentication key size. Also, we sought to graphically plot the relationship between authentication and key exchange key sizes, in order to better display the paper’s main result that 82.9% of surveyed sites supporting DH used DH parameters weaker than their authentication key size.
Subset Motivation: We believed the server and client performance tests would be highly dependent on our choice of platform, and so would be limited in their relevance and in how accurately they duplicated the original conditions. Duplicating the performance evaluation might illuminate how libraries and hardware optimizations have changed the relative performance of different cipher suites since then. However, we decided that an update to the TLS deployment survey would directly illuminate how impactful this paper has been, if indeed ECC is more widely adopted and DHE misconfigurations are less common. There are already several SSL scanners, such as SSL Labs’ SSL Pulse and Hubert Kario’s yearly survey, that check for known vulnerabilities and outdated TLS versions. However, they do not analyze the specific key strength mismatch identified in this paper.
Forward Secrecy Statistics    2017               HABJ'14
------------------------------------------------------------
Forward Secret Hosts          448357 (98.65%)    >74%
Total TLS Hosts:              454489
Weak DH Parameters            55260 (23.44%)     82.9%
Total DHE hosts               235753             283,647
We found that support for forward secrecy is nearly universal among TLS hosts, now over 98%, mostly due to widespread adoption (over 97%) of ECDHE. The number of DHE hosts has declined slightly but, encouragingly, the portion whose DH parameters are weaker than their public authentication key has declined from 82.9% to 23.44%. The distribution of these keys is plotted above.
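The weak-parameter share in the table above comes from comparing, for each DHE host, the DH parameter size against the authentication key size. The following is a minimal sketch of that comparison, not the scanner's actual code: the `hostKeys` type, its field names, and the sample hosts are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// hostKeys is a hypothetical record: the size in bits of the DH parameters a
// host offered for DHE and the size in bits of its authentication key.
type hostKeys struct {
	Host     string
	DHBits   int
	AuthBits int
}

// weakDHShare returns how many hosts use DH parameters strictly smaller than
// their authentication key, and that count as a percentage of all hosts given.
func weakDHShare(hosts []hostKeys) (weak int, pct float64) {
	for _, h := range hosts {
		if h.DHBits < h.AuthBits {
			weak++
		}
	}
	if len(hosts) == 0 {
		return 0, 0
	}
	return weak, 100 * float64(weak) / float64(len(hosts))
}

func main() {
	// Made-up sample data, not taken from the scan.
	sample := []hostKeys{
		{"a.example", 1024, 2048}, // weaker than the certificate key
		{"b.example", 2048, 2048}, // matched
		{"c.example", 2048, 4096}, // weaker than the certificate key
		{"d.example", 4096, 2048}, // stronger than the certificate key
	}
	weak, pct := weakDHShare(sample)
	fmt.Printf("%d of %d DHE hosts weak (%.2f%%)\n", weak, len(sample), pct)
}
```

A host with matched sizes (for example 2048-bit DH with a 2048-bit RSA key) is not counted as weak under this definition; only a strictly smaller DH size is.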
Key exchange method support on TLS servers

Method    Hosts               HABJ'14            IMC'07
--------------------------------------------------------
RSA       336,146 (73.96%)    473,688 (99.9%)    99.86%
DHE       235,570 (51.83%)    283,647 (59.8%)    57.57%
ECDHE     443,203 (97.52%)    85,070 (17.9%)
Total Hosts: 454,489
Key Exchange Protocols (Table II)

The original paper found that nearly all sites supported RSA key exchange, and only 17.9% supported ECDHE. We found that support has shifted significantly in favor of ECDHE (97.52%), with RSA key exchange falling to 73.96% and a slight decline in sites supporting DHE.
Diffie-Hellman parameter size support for DHE key exchange

Size (bits)   Hosts               HABJ'14
------------------------------------------
≤768          183 (0.08%)         97,494 (34.3%)
1024          51,486 (21.84%)     281,714 (99.3%)
2018          2 (0.00%)           0
2046          1 (0.00%)           0
2048          165,847 (70.35%)    859 (0.3%)
2096          1 (0.00%)           0
2236          16 (0.01%)          0
2432          3 (0.00%)           0
2560          1 (0.00%)           0
3072          168 (0.07%)         0
4086          1 (0.00%)           0
4092          1 (0.00%)           0
4094          1 (0.00%)           0
4096          18,069 (7.66%)      14 (0.0%)
8192          9 (0.00%)           0
Total DHE Enabled Servers: 235753
DH Key parameters (Table III)

Whereas the original paper found that most (99.7%) sites used DH parameters of 1024 bits or less, we found that over 70% of sites now use 2048-bit DH parameters.
Elliptic curves used for ECDHE key exchange

Curve              Hosts               HABJ'14
------------------------------------------------------
sect233r1          0 (0.00%)           3,123 (3.6%)
secp521r1          0 (0.00%)           73 (0.0%)
sect163r2          0 (0.00%)           26 (0.0%)
secp224r1          0 (0.00%)           3 (0.0%)
secp384r1          9,579 (2.16%)       86 (0.1%)
secp521r1          8,033 (1.81%)       73 (0.0%)
secp256k1          1 (0.00%)           0
secp192r1          0 (0.00%)           1 (0.0%)
secp256r1          426,379 (96.20%)    81,789 (96.1%)
sect571r1          91 (0.02%)          316 (0.3%)
brainpoolP512r1    6 (0.00%)           0
Total EC Key Exchange Servers: 443,203
There continues to be a significant lack of diversity in the choice of elliptic curve, which is dominated by secp256r1.
Authentication method support on TLS servers

Method        Hosts               HABJ'14            IMC'07
------------------------------------------------------------
RSA           388,908 (85.57%)    473,780 (99.9%)    ≥99.86%
Anonymous     0 (0.00%)           7,750 (0.0%)       0.02%
DSA           0 (0.00%)           22 (0.0%)
ECDSA/ECDH    91,044 (20.03%)     3 (0.0%)
There has been a significant increase in the use of ECC signing for certificates. The lack of anonymous authentication found may be due to OpenSSL deprecating those ciphers.
RSA key sizes of TLS server certificates

Size (bits)    Hosts               HABJ'14            IMC'13    IMC'07
---------------------------------------------------------------------------
≤ 512          4 (0.00%)           350 (0.0%)         0.1%      3.94%
513 - 1023     0 (0.00%)           20 (0.0%)          0.0%      1.42%
1024           395 (0.10%)         87,760 (18.5%)     10.5%     88.35%
1025 - 2047    0 (0.00%)           20 (0.0%)          0.7%      0.01%
2048           362,845 (93.30%)    374,294 (79.0%)    86.4%     6.14%
2049 - 4095    244 (0.06%)         251 (0.0%)         0.0%      0.00%
4096           25,411 (6.53%)      11,093 (2.3%)      2.3%      0.19%
≥ 4097         12 (0.00%)          22 (0.0%)          0.0%      0.00%
Not unexpectedly, 1024-bit RSA certificates have largely disappeared, falling from 18.5% of hosts to 0.10%.
Challenges: Our biggest conceptual challenge was reconciling differences in what various OpenSSL builds support, and deciding whether we would need a fork to run our scan. We determined, for example, that most up-to-date builds reject DH parameter sizes below 1024 bits. We decided that we did not need that level of fidelity and would benefit more from a reproducible setup.
We spent much of our time learning the OpenSSL API, finding out how to extract the cipher suite and key properties we were interested in, and verifying that the APIs were giving us the data we expected. badssl.com proved to be a very good resource. We were also quite concerned with conducting this survey responsibly, and addressed this by limiting our requests by host and virtual machine.
Critique: The widespread adoption of ECC certainly validates the thesis of the paper. The significant decrease in misconfigured DHE parameters (82.9% to 23%) is evidence that the paper brought light to this issue and that the majority of domains responded by updating their configuration. That current OpenSSL builds reject DHE parameters smaller than 1024 bits shows that the cryptographic community is very responsive in making changes to the library to support clients and servers alike.
Extensions: TLS: We found 454,489 sites that responded to either a TLS 1.2 or 1.1 handshake, 99.4% of which supported TLS 1.2. The original paper found 473,802 hosts, but did not specify which versions of TLS they scanned for. Although there has been a recent push for more widespread adoption of HTTPS, more than half of the survey population still does not support the most recent version of TLS.
We were surprised to discover that the distribution of key strengths and the choice of algorithms were not very different between the top 100,000 and the whole data set. We had expected that the top 10% might be better configured or more likely to support TLS, but the numbers were much closer than we expected. Most surprising, though, is that 2048-bit DH parameter support declines with increasing popularity, matched by an increase in 1024-bit support.
Platform: The authors used a modified version of sslscan, a C utility. Their modifications were not available, and sslscan has since been extensively expanded to test a broad suite of SSL vulnerabilities. However, it is statically built against a custom fork of OpenSSL that retains many unsafe features now deprecated. Since we were testing only for supported (but misconfigured) settings, and would have had to reimplement the server key size collection as the authors did, we chose instead to rewrite our small required feature set in Go, building against the system libssl. Building against the system libssl made reproducibility much easier, and we deployed successfully on a variety of systems.
Our choice of Go proved quite productive. Although there was a learning curve to interoperating with OpenSSL’s C APIs, Go has a very good bridge to C while still allowing us to benefit from its type and memory safety. The concurrency primitives also helped us easily parallelize our scanning and analysis, and rate-limit our scanning.
We chose a pipeline that would allow for easy inspection at each step. The scan results are saved as a JSON file for each host. We then load the individual JSON files into a sqlite database for the final data analysis. This two-step process allowed us to transparently archive the collected data in a human-readable format that we could easily troubleshoot during the scan phase, while running fast queries on the database once complete. The JSON files are provided as a tar archive instead of natively in the git repository; indexing a million files is fairly taxing on git (as well as on filesystem monitoring daemons such as Dropbox or iCloud Drive).
SSL Scan
========

This is a lightweight ssl scanner based on https://github.com/rbsec/sslscan, written for CS244 to reproduce the scan results of [An Experimental Study of TLS Forward Secrecy Deployments](http://www.w2spconf.com/2014/papers/TLS.pdf)

It is highly recommended that you perform this on a cloud VM, not on your home network or farmshare.

Create a VM Instance in Google Cloud
====================================

* Ubuntu 17.04
* 1 vCPU, 3.75GB RAM
* 20 GB required to run full 1 million host analysis or static analysis of prescanned archive
* 10GB default sufficient for random subsample run of up to 500,000 hosts

Dependencies
============

This requires the following libraries (and for $GOPATH to be set)

    sudo apt-get install libssl-dev golang sqlite3 mercurial
    export GOPATH=$HOME/go

This has been confirmed to work with OpenSSL 1.0.2d and Go 1.5.1 on Ubuntu 15.10. Testing performed with OpenSSL 1.0.2g and Go 1.7.4 on Ubuntu 17.04.

Known issues with Go < 1.2

Requires OpenSSL >= 1.0.2 (for SSL_get_server_tmp_key).

Fetch Code
===========

    go get github.com/mmx1/sslScanGo

will fetch the source and its dependencies.

Scripts
=========

Three Main Scripts:

1) analyzeStatic.sh
2) runRand.sh
3) runall.sh

Overview: Each script will populate a database with the data and analyze the data by creating output files for each main result from the paper (HABJ). After creating the output files, the script will start an HTTP server on port 80. Use the VM's external facing IP address to access the files.

Differences:

1) analyzeStatic.sh -> will not make any queries but will use the archived data we captured to create our blog post, so you can see the same results. This script takes about 30 minutes.
2) runRand.sh -> collects data from a 20000 domain random sample of the 1 million websites. This script is meant to show a representation of the work in a reasonable amount of time. This script takes 8 hours.
3) runall.sh -> collects data from the top 1 million websites, which is used to create the results in the blog post. This data is archived for use with analyzeStatic.sh. This script will take 12 DAYS.

Recommend only running 1 & 2 for reproducing the results:

1) Run script analyzeStatic.sh to see results in blog post. (30 min)

    $GOPATH/src/github.com/mmx1/sslScanGo/scripts/analyzeStatic.sh

WARNING: You should not run this script in a directory monitored by a cloud service such as Dropbox or iCloud Drive, or add the data folder to a git repository; it will create a directory with a million files (total size ~20MB).

2) Run runRand.sh to collect and analyze sample of the data (8 hours)

    $GOPATH/src/github.com/mmx1/sslScanGo/scripts/runRand.sh > output.txt & disown

3) Run runall.sh to collect and analyze all 1 million domains (12 days)

    $GOPATH/src/github.com/mmx1/sslScanGo/scripts/runall.sh > output.txt & disown

* Disowning the process will allow you to safely logout, come back, and inspect the tail of the progress file.

    tail -f output.txt

* When complete, the script may hang on a few outstanding hosts. If so, you can kill the scanner and manually trigger the populator and analyzer:

    $GOPATH/bin/sslScanGo -populate && $GOPATH/bin/sslScanGo -analyze

Which will finish in about 2 minutes.

To view the results:

    sudo python -m SimpleHTTPServer 80

View by going to: http://externalIPAddress

Specific Usage
=====

Run

    go get github.com/mmx1/sslScanGo

to fetch the source and its dependencies. And run:

    $GOPATH/bin/sslScanGo

The default for sslScanGo is to run the scanner on the entire top-1m.csv file.

Run the database conversion with:

    $GOPATH/bin/sslScanGo -populate

This will read from ./data/ and output ./scanDb.sqlite in the folder where the original go code is located. The data/ directory should contain only JSON files produced by sslScanGo.

Note: Go requires a specific layout for where the code is and where the executable is. This is why the GOPATH system variable is so important.

To run the queries on the database:

    $GOPATH/bin/sslScanGo -analyze

executes the query code and outputs the files below.

Output Files
============

1) BigResult.txt => main result of the paper comparing hosts that utilize DHE key exchange for the TLS handshake with the number of hosts that utilize weak DHE parameters (i.e. key exchange bits < authentication key bits)
2) mainResult.png => plot of key exchange key strength vs authentication key strength
3) TableI.txt => List of errors from querying the domains
4) TableII.txt => What the hosts utilize for key exchange (RSA, DHE, ECDHE)
5) TableIII.txt => Number of hosts for each key size of DHE
6) TableIV.txt => Enumerating the curves used for EC key exchange suites
7) TableV.txt => Enumerating authentication algorithms
8) TableVI.txt => Enumerating authentication key strengths