CS244 ’15: Misbehaving TCP Receivers Can Cause Internet-Wide Congestion Collapse


Kenny Root and Kevin Zhang

Introduction

The TCP congestion control algorithm was written with cooperation in mind. It relies on both the sender and receiver behaving correctly and equitably. What happens when the rules aren’t followed? There are techniques that allow a misbehaving TCP receiver to manipulate the sender into sending packets at an unfair rate. Specifically, this paper discusses an attack dubbed the optimistic acknowledgement or “opt-ack” attack which conceals data losses to reduce round trip time and increase the total throughput at the cost of data integrity. This allows opt-ack to be used for denial of service attack.

Original problem

Sherwood, et al.,[1] considered a receiver whose sole interest is exploiting opt-ack to mount a distributed denial of service attack against not just individual machines but entire networks. Their intuition was that TCP algorithms were regular enough so that the receiver could easily to guess what the sender’s state was after the initial TCP handshake was completed. This works even when the attacker’s bandwidth is a fraction of those they are attacking.

TCP uses windowing to allow multiple packets to be in flight at once to hide network latencies. Many operating systems use just a handful of different algorithms such as TCP Reno[2] and follow a standard procedure for opening up the window. The opt-ack attack exploits this regularity by simulating the victim’s TCP state to stay in sync without having to actually receive the data that the victim is sending. To the victim this is indistinguishable from having a large bandwidth link as it appears all the packets are being delivered correctly. This causes the victim to flood their own network link sending more and more packets as the TCP congestion algorithm allows the TCP window to open up.

To prevent TCP connection hijacking, the initial sequence numbers for each side are typically randomized[3]. The opt-ack attack begins by connecting to a victim and requesting a resource that causes the victim to begin sending a large amount of data. For instance, an attacker could request a large file transfer via FTP, a large web page asset via HTTP, or simply the “chargen” service present on many UNIX servers. Once the victim and attacker are synchronized at the TCP level and the victim begins to send the requested data, the attacker starts its attack by simulating the victim’s state and sending fake acknowledgements, simply called ACK in TCP terminology, to induce the victim to send as fast as possible.

Goals

The original paper demonstrated a danger from the opt-ack (one attacker, many victims) through analysis and experiments. Different scenarios were tried by adjusting the maximum segment size (mss) and window scale multiplier (wscale)[4]. The authors calculated that a combination of these two could easily lead to flooding any connection available to a victim. For instance, an mss of 88 and a wscale of 14 the attacker could theoretically induce 1.6 terabits a second of traffic from a single victim. The mss can be tuned down to send more packets per second therefore inducing more TCP overhead and stretching the attack out a bit more. That is: for the same amount of data sent from the victim’s application server, more traffic will appear on the network due to the TCP packetization overhead.

The paper goes on to discuss how to prevent these attacks. They judged each suggestion based on efficiency, robustness, deployability, simplicity, and where it needed to be implemented. For instance the least practical suggestion was changing the TCP specification to include secure cookies which would require either a “flag day” where everyone changes over or incremental deployment where servers would remain vulnerable during the transition.

There were several other suggestions but the decided-upon defense was chosen based on being easily implemented on the victim’s side without breaking current TCP implementations. The idea was randomly skipped segments where the server chooses an ACK window to skip (a “hole”) at random and sends the next packet with sequence corresponding to directly after the hole. A legitimate client would ACK the segment just before the hole and wait for the missing segment to be retransmitted. An attacker cannot be sure if the packet was simply dropped due to the flooding attack or never sent at all, so the attacker would indicate they received the missing segment successfully. This immediately alerts the victim of the attack and the victim drops the connection.

Results

Figure 2 from the original paper

The original paper’s Figure 2.

A simulated network was set up, the opt-ack attack implemented in the user space ns-2[5] program, and several attacks were carried out. The results showed their predicted performance was almost exactly what happened in ideal conditions (99.9% of the predictions). However they were only able to simulate around 512 victims due to CPU speed and other factors. The results showed that the attacked servers were affected adversely so the amount of time it took for a legitimate client to downloading a 100MB file  increased 17.42x during a local LAN-based attack.

The paper also went on to implementing the suggested opt-ack defense of randomly skipping segments. Their implementation successfully defended against the opt-ack attack while maintaining an approximately 99.5% efficiency with normal users.

Our reproduction

We decided to reimplement the opt-ack attack from the ground up in a more portable way than being implemented inside of ns-2. We used mininet[6] as the virtual network simulator to set up the simulated environment for which we’d reproduce Figure 2 and Figure 3 from the original paper. We chose Java for its portability and used a library called RockSaw[7] which allows you to use raw sockets from Java. Although there are few users of the RockSaw library that would could find, there was a former student at Northeastern University named An Dang who implemented an HTTP GET using RockSaw[8]. Dang’s TCP packet parsing code was used as the basis for our version of the opt-ack attack.

Our initial implementation would effectively stop after a couple seconds. Upon examining the packet dumps, we could see our sent attack ACKs overrunning the victim’s current sequence number. We suspected that the pseudocode presented in Section 2 of the original paper did not contain all the techniques implemented in the final algorithm. We searched around and were rewarded by finding an “extended” version of the paper which contained an extra appendix describing the original authors’ experience implementing the opt-ack attack. The appendix detailed a situation which mirrored our issue exactly and also included details about the changes made to mitigate the problem.

Figure 2

Reproduction of Figure 2 in the opt-ack paper.

Reproduction of Figure 3 in the Opt-Ack Attack paper.

Reproduction of Figure 3 in the Opt-Ack Attack paper.

Our version of the opt-ack attack written in Java doesn’t achieve the same level of performance as the original authors’ version written in C, but it still induces the attack effectively despite the Java language not being well-suited to the task of manipulating bits at the level needed to parse headers effectively. The amount of traffic induced scales with the number of victims similarly as well. During our test on an Amazon EC2 instance (c3.large), at 64 victims and higher the attack begins to stop scaling up. A c3.2xlarge instance could scale up to 128 victims, but started faltering at 256 victims.

We believe the scaling up is due to a combination of frequent transitions to and from the JNI layer for the raw socket and also our Java program’s inefficiency manipulating bit-level data structures that are essential for dealing with performance at this level. Further tuning could be done to optimize the Java program, but we instead recommend future implementors stick to the standard and proven C language for networking programs.

The performance of the Java program does not indicate a problem with the opt-ack attack; it is a very powerful asymmetric attack that is still possible today. During our experiments we tried the attack against TCP Reno, TCP Tahoe, BIC, and CUBIC with success. Implementation of the defenses drops efficiency of the connection by a negligible amount, but it’s probably more than server operators are willing to do. However if even a few attackers started using the opt-ack attack, the economics of implementing the defense would instantly be palatable.

Reimplementation source code

If you wish to examine the source code that made this attack possible, see Opt-Ack Attack in Java project at BitBucket. There you will find a README with all the information you need to re-create this attack in your own environment. Please note that this source code is made available for academic research purposes only.

Acknowledgements

We would like to thank An Dang for making his source code available, the mininet project maintainers for maintaining code that made our virtual network environment setup easy, Professor Nick McKeown, and our very patient TA Lisa Yan for taking time to meet with us about this project.

References

    1. Sherwood, Rob, Bobby Bhattacharjee, and Ryan Braud. “Misbehaving TCP Receivers Can Cause Internet-wide Congestion Collapse.” Proceedings of the 12th ACM Conference on Computer and Communications Security – CCS ’05 (2005): n. pag. Web. <https://www.cs.umd.edu/~capveg/optack/optack-extended.pdf>.
    2. Allman, M., V. Paxon, and W. Stevens. “RFC 2581 – TCP Congestion Control.” The Internet Society, Apr. 1999. Web. 25 May 2015. <https://tools.ietf.org/html/rfc2581>.
    3. Gont, F., and S. Bellovin. “RFC 6528 – Defending against Sequence Number Attacks.” The Internet Society, Feb. 2012. Web. 25 May 2015. <https://tools.ietf.org/html/rfc6528>.
    4. Jacobson, V., R. Braden, and D. Borman. “RFC 1323 – TCP Extensions for High Performance.” The Internet Society, May 1992. Web. 27 May 2015. <https://tools.ietf.org/html/rfc1323>.
    5. The Network Simulator – ns-2. Web. 24 May 2015. <http://www.isi.edu/nsnam/ns/>.
    6. Mininet. Web. 21 May 2015. <http://mininet.org/>.
    7. RockSaw Raw Socket Library. Web. 19 May 2015. <https://www.savarese.org/software/rocksaw/>.
    8. Dang, An. “Dangan249/RawSocket.” GitHub. Web. 30 May 2015. <https://github.com/dangan249/RawSocket>.

One response to “CS244 ’15: Misbehaving TCP Receivers Can Cause Internet-Wide Congestion Collapse

  1. Reproducibility score: 4/5

    Graphs: The bottom graph (figure 3) matches perfectly. The top graph, however, is much more saw-toothy in our reproduction. The only line that matched the smooth curve seen above is the 256 victim line. The rest exhibited logarithmic sized saw-tooth swings.

    Instructions: We suggest adding the machine parameters (cpu and memory) that were used for running experiments in addition to specifying the Amazon EC2 AMI image. Similarly to the blog authors, the experiment did not complete on some of the smaller machines we tried, -medium and -large, for example, and threw switch shut-down errors on the bigger machines. These errors did not stop the experiments, but the messages were not convincing that everything was running correctly. Everything finally finished on the c3.2xlarge instance for us and produced results that did not perfectly match the authors’.

Leave a comment