Alexander Schaub <firstname.lastname@example.org>
Trey Deitch <email@example.com>
We are focused on reproducing results from the paper “Misbehaving TCP Receivers Can Cause Internet-Wide Congestion Collapse” by Sherwood, Bhattacharjee, and Braud from CCS ’05, available from http://www.cs.umd.edu/~capveg/optack/optack-ccs05.pdf.
The original paper tries to show that misbehaving TCP clients can cause massive congestion. The authors explain how much traffic a single misbehaving node can induce, depending on the TCP configuration of the server. “Misbehaving” here means sending acknowledgments for packets before those packets are actually received, ACKing the whole congestion window on each RTT and causing rapid increase of the congestion window to the limit of what the network can support. They also present and analyze the performance of several proposals that aim to counter this attack.
The attack described by the authors is simple to mount and yet has very serious consequences, as whole networks can be DoSed with very little effort. To our knowledge, none of the proposed mitigations have been implemented yet, and therefore TCP remains vulnerable to this problem.
The original paper found that a malicious client could cause a bandwidth utilization at the victim’s servers as high as 1700 times the bandwidth of the client’s connection for typical server configurations, and over 30 million times the bandwidth of his connection in the worst case. Existing methods for detecting this kind of attack do not work well, because unlike ordinary DoS attacks, incoming traffic at the server remains low, and even if every server limited its maximum congestion window, an attacker could still open several TCP flows to different servers and overwhelm the network using this same attack. Setting a hard congestion window limit would also penalize legitimate users who happened to have high-bandwidth connections.
Concerning possible mitigations, the paper found that selectively dropping packets at the server without reducing the congestion window for these forced drops, and ending the connection if such a dropped packet was ACKed, prevented this attack from succeeding while maintaining over 99% of the performance of TCP.
Our Reproduction Work
Motivation and Goals
Our goal was to mount this attack and reproduce the high bandwidth utilization at a victim server caused by a client with a relatively low bandwidth connection. More precisely, we aimed to reproduce (parts of) Figure 7 from the original paper, which shows how much traffic a single user, connected to multiple servers, can cause when maliciously using optimistic ACKing. This seems to be the most impressive result of this paper and is a very convincing proof-of-concept of the described attack. Note that in this experiment, the attacker is connected to a switch with a T1 line, i.e. a 1.544Mb/s link, but is able to induce up to 5GB/s or 40Gb/s of traffic.
As the authors of the paper explain, “the attacker injects ACKs into the network before the corresponding segments have even reached the attacker, so remaining synchronized with the victim can be non-trivial. Maintaining this synchronization of sequence numbers is crucial to the attack.” In order to avoid overrunning the victim, we use a pacing thread that reads packets received from the victim, detects retransmissions, and adjusts our estimation of the clients congestion window accordingly. Once the attack is running, most traffic from the victim will be lost, but this code may still help prevent overruns, especially when the attack has just begun.
Also, the algorithm presented in the paper hides a crucial detail: after sending an ACK to every victim in a round-robin fashion, the attacker needs to wait for a specific amount of time before sending the next round of ACKs, and determining this duration is crucial to the success of the experiment. If the attacker does not wait, as we did not in the early iterations of our program, the attacker will systematically overrun the victims, and as a result will not able to generate much traffic. We solved this problem by choosing an adaptive “target bandwidth,” and by waiting as long as it would take for the ACKed data to get sent over a link with this target bandwidth. The target bandwidth is initialized to a small value, but increases quickly up to near the maximum theoretical capacity of the victim’s 100Mb/s link. Using this technique, we found that the attacker would rarely overrun the victims, and when it would, the attacker’s ingress link was so saturated that the packets with repeated sequence numbers, indicating an overrun, would not get to the attacker before the victim’s socket timed out.
Finally, we had to battle against numerous implementation challenges. For instance, on a standard OS (but strangely not on the Mininet VM used for PA1), using raw sockets to initiate a TCP handshake will cause the OS to send RST packets because it does not know about the raw sockets. We fixed this using iptables rules to filter out any TCP packets containing RST flags sent to the victims. Also, when the experiment is taking place, most of the traffic to the attacker is dropped by the switch. Thus, the duration of the experiment was first limited by the ARP cache timeout, because the attacker would never receive an answer to its ARP requests. Configuring Mininet to use static ARP tables allowed us to get past this limitation. As a result, in the simulation, we are able to sustain the attack for long periods of time.
We decided to try and reproduce the results with Mininet. The original paper describes the attacker and victims as connected in a “star topology,” with the malicious client being connected to the switch with a 1.544Mb/s link, and the victims being connected to the switch with 100Mb/s links. To reproduce the results, we therefore created a simple topology, with the victims and the attacker all connected to the same switch, and the links having the corresponding capacities. The delay, as in the paper, was set to 10ms, the wscale factor equal to 4, and the maximum segment (mss) size equal to 1460.
We were able to produce a very similar graph to figure 7, up to 32 victims. After that, we started to run up against the performance limitations of our machine.
We used a c3.large Amazon instance to produce these results. Upgrading to a c3.2xlarge instance yielded slightly better results for the 64 victim case, confirming the suspicion that we are in fact limited by the CPU power of the simulation, and not by actual networking constraints. We were not able to start the experiment with 128 or more victims, but this only shows the scalability limits of our implementation of the simulation. Using the faster c3.2xlarge instance and adjusting the socket timeouts, we were able to start an experiment with up to 100 victims, but the aggregate bandwidth was hardly larger than with 64 victims, so these results are not shown here.
To obtain the results, we measured the number of bytes that were sent out through a standard TCP socket at the server, just after the data was accepted by the socket. Thus, the timing of the measurements is not completely accurate, because the data takes some amount of time to be placed on the wire. However, the average bandwidth reported over larger time periods is correct. The fluctuations on the graphs with less victims could actually be artifacts of our measurement method. Also, we probably slightly underestimate the induced traffic, as we do not take packetization and protocol overheads into account. We also ignore potential retransmissions.
Nonetheless, the overall thesis of the original paper clearly holds up today, since we were able to reproduce the original attack quite closely. Furthermore, we have seen no evidence that any of the mitigations proposed by the authors of the original paper have been widely deployed, so we suspect this attack could still be effective in practice today.
We chose to implement this experiment using Mininet and Python scripts, while the authors used the ns2 packet simulator and implemented the attack in C. This might explain the better scalability of their experiment compared to ours. However, implementing the attack in Python is probably much easier, and up to the scalability limit we reported above, the performance closely matches the author’s finding. Since Python allows the use of raw sockets and comes with efficient methods to manipulate byte-level structures, no extra libraries were necessary.
We did need to send and parse raw TCP packets, but we were able to do this relatively easily by tweaking one of the numerous Python scripts available on the Internet to perform those tasks. We briefly considered using scapy, a more comprehensive packet manipulation library, but we decided that we did not need most of its capabilities and could do without its overhead.
Lastly, the experiment is heavily CPU-dependent : on a local VM, we were not able to achieve more than 8MB/s per victim even with a small number of victims, and the attack did not scale very well past 12 victims, with the maximum aggregate traffic being observed at around 24 victims.
Reproducing the results
All the instructions can be found here : https://bitbucket.org/AlexCid/opt-ack. In short, reproducing the results requires to
- Launch a c3.large instance on AWS with the Mininet CS244-Spr15 AMI (instructions on how to do this can be found here : http://web.stanford.edu/class/cs244/ec2setup.html)
- Clone the repository :
git clone https://AlexCid@bitbucket.org/AlexCid/opt-ack.git cd opt-ack
- Install the gnuplot-nox packet to be able to generate the graph :
sudo apt-get update sudo apt-get install gnuplot-nox
- And then run the provided shell script.
Finally, the generated graph can be found at ~/opt-ack/results/output.png. The document can be retrieved by either opening port 8080 on the Amazon instance and launch a Python http server, or connecting to the Amazon instance using an sftp client to retrieve the graph.