When many of the Internet’s protocols were being designed and implemented, the Internet was a small and cooperative place, composed primarily of hosts managed by engineers who were united by the common cause of connecting computers effectively, reliably, and fairly. Among these protocols was TCP, still one of the most used on the Internet today. Although it has changed somewhat over the years in response to Internet congestion and demands for ever-higher transfer rates, TCP remains dependent on the cooperation of the sender and receiver to ensure, among other things, a fair distribution of throughput between flows competing for the same bottleneck link.
In “TCP Congestion Control with a Misbehaving Receiver,” Savage et al. demonstrate three attacks, each of which a greedy receiver might employ to induce a sender to transmit data arbitrarily fast. In real terms, a user could deploy these attacks on their personal computer’s TCP stack and expect to load webpages (especially small ones) much faster. This would create an incentive for many users to do the same, a deeply problematic outcome, since the congestion control algorithms these attacks exploit are key to preventing Internet-wide congestion collapse. In this blog post, we attempt to reproduce two of Savage et al.’s attacks, termed ACK division and DupACK spoofing.
ACK division exploits the incongruity between TCP’s error control protocol, implemented with byte-granular seqnos and acknos, and its congestion control protocol, which is defined in terms of segments. According to RFC 2581, “[d]uring slow start, TCP increments cwnd by at most [one segment] for each ACK received that acknowledges new data.” Assuming slow start begins by sending a cwnd of one segment, then a cooperative receiver responds by sending a single, cumulative ACK, causing cwnd to double. Under ACK division, however, the receiver sends M separate ACKs (each acknowledging 1/M of the initial cwnd’s bytes), which causes the sender to inflate its cwnd much faster.
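The effect is easiest to see in a toy model. The sketch below is not taken from Savage et al. or from our code; it assumes an idealized slow start with no loss, no receiver-window limit, and an SMSS of 1460 bytes, and the function name `slow_start_cwnd` is purely illustrative.

```python
SMSS = 1460  # assumed sender maximum segment size, in bytes


def slow_start_cwnd(rtts, acks_per_segment=1, initial_cwnd=SMSS):
    """Return cwnd (bytes) at the start and after each RTT of idealized slow start.

    Every ACK that covers new data grows cwnd by one SMSS (RFC 2581),
    no matter how few bytes that ACK actually acknowledges.
    """
    cwnd = initial_cwnd
    history = [cwnd]
    for _ in range(rtts):
        segments_in_flight = cwnd // SMSS
        # A cooperative receiver sends 1 ACK per segment; an ACK-dividing
        # receiver sends M ACKs, each covering 1/M of the segment.
        acks_received = segments_in_flight * acks_per_segment
        cwnd += acks_received * SMSS
        history.append(cwnd)
    return history


honest = slow_start_cwnd(3)                       # doubles each RTT
divided = slow_start_cwnd(3, acks_per_segment=4)  # grows by (1 + M) each RTT
```

Under these assumptions a cooperative receiver doubles cwnd every RTT, while a receiver dividing each segment into M ACKs multiplies it by 1 + M.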
DupACK spoofing, on the other hand, exploits the fast retransmit and fast recovery features of TCP congestion control. Upon receipt of three duplicate ACKs, a typical sender retransmits unacknowledged data immediately. According to RFC 2581, the sender should also “[s]et cwnd to ssthresh plus 3*[sender’s maximum segment size]” and “increment cwnd by [the sender’s maximum segment size]” for each additional duplicate ACK received. This behavior is readily exploited by DupACK spoofing, which deliberately sends multiple copies of the same ACK to induce a flood of traffic from the sender. Both attacks are illustrated below.
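As a rough illustration of the RFC 2581 rules quoted above, the following sketch computes the sender's cwnd after a burst of duplicate ACKs. It is a simplified model rather than any real stack's code: the function name `cwnd_after_dupacks` and the 1460-byte SMSS are assumptions, and it ignores the deflation that happens once new data is finally acknowledged.

```python
SMSS = 1460  # assumed sender maximum segment size, in bytes


def cwnd_after_dupacks(cwnd, flight_size, dupacks, smss=SMSS):
    """Return cwnd (bytes) after `dupacks` duplicate ACKs arrive back to back."""
    if dupacks < 3:
        # Below the fast-retransmit threshold: cwnd is left alone.
        return cwnd
    # Third duplicate ACK: halve the flight size (with a floor of two
    # segments) and enter fast recovery.
    ssthresh = max(flight_size // 2, 2 * smss)
    cwnd = ssthresh + 3 * smss
    # Every further duplicate ACK inflates cwnd by one more segment.
    cwnd += (dupacks - 3) * smss
    return cwnd
```

In this model a receiver that sends 103 copies of the same ACK against a one-segment flight leaves the sender with a cwnd of 105 segments, since nothing in these rules ties a duplicate ACK to a segment that actually left the network.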
Savage et al. implemented these attacks as modifications to the TCP stack in the Linux 2.2 kernel and used them to request a webpage, producing the following plots, which show remarkable gains in transfer rate (ACK division is on the left, DupACK spoofing on the right):
Although the focus of this blog post is on duplicating the results of Savage et al. in attacking TCP, it’s worth knowing that Savage et al. also propose modifications to TCP to limit the potency of these attacks: (1) to eliminate the incongruity between byte- and segment-level acknowledgment, incrementing cwnd by a maximum segment size only when byte-level ACKs acknowledging a whole segment are received, and (2) to add a singular nonce and reply-nonce to the TCP header to allow senders to identify duplicate ACKs and not inflate cwnd in response to them. With these modifications, Savage et al. assert, misbehaving receivers are unable to induce a sender to send any faster than they would otherwise.
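One way to read modification (1) is sketched below: the sender keeps a running count of newly acknowledged bytes and only grows cwnd once a full segment's worth has been covered. This is our own illustrative interpretation, covering slow start only; the class name `ByteCountingSender` and its fields are hypothetical and do not come from the paper or from any real TCP stack.

```python
SMSS = 1460  # assumed sender maximum segment size, in bytes


class ByteCountingSender:
    """Toy slow-start sender that grows cwnd per acknowledged segment, not per ACK."""

    def __init__(self, cwnd=SMSS, smss=SMSS):
        self.cwnd = cwnd
        self.smss = smss
        self.snd_una = 0       # highest cumulatively acknowledged byte
        self.acked_credit = 0  # newly acked bytes not yet converted into cwnd growth

    def on_ack(self, ackno):
        if ackno <= self.snd_una:
            return self.cwnd  # duplicate or stale ACK: no new data, no growth
        self.acked_credit += ackno - self.snd_una
        self.snd_una = ackno
        # Convert only whole segments of acknowledged data into cwnd growth.
        full_segments, self.acked_credit = divmod(self.acked_credit, self.smss)
        self.cwnd += full_segments * self.smss
        return self.cwnd
```

With this rule, four ACKs that each cover a quarter of a segment grow cwnd by exactly as much as one ACK covering the whole segment, which removes the incentive for ACK division.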
Although Savage et al. implemented their attacks by compiling a custom kernel, we determined early on that compiling and debugging our own would take far too long. Thankfully, the open-source LwIP (Lightweight TCP/IP) project has published a Linux port of their user-space TCP/IP stack (it is most popular among embedded programmers), which we were able to modify for our purposes. As in Savage et al., our changes were modest: we added or modified fewer than 100 lines of code, concentrated in tcp_out.c. The LwIP source code also came with a TCP echo server, which we hacked into a sink for TCP packets by disabling echo replies. Linked against our modified LwIP library, this served as our misbehaving receiver.
On the sender side, we built a simple Python script to saturate the link between sender and receiver. The script merely sends a stream of ‘A’ characters for a specified length of time (for our purposes, 5 seconds worked nicely).
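A script of this kind might look roughly like the sketch below. It is not the original script: the function name `flood`, the 4096-byte chunk size, and the way the socket is obtained are all assumptions made for illustration.

```python
import socket
import time


def flood(sock, duration_s, chunk_size=4096):
    """Send b'A' bytes on a connected socket for duration_s seconds.

    Returns the total number of bytes handed to the kernel.
    """
    payload = b"A" * chunk_size
    deadline = time.monotonic() + duration_s
    sent = 0
    while time.monotonic() < deadline:
        # send() may accept fewer bytes than offered; count what it took.
        sent += sock.send(payload)
    return sent
```

Against a real receiver, one would open the connection with something like `socket.create_connection((host, port))`, where `host` and `port` are placeholders for the receiver's address, and then call `flood(sock, 5.0)`.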
To run the experiment, we configured Mininet to connect the sender and receiver with a 1000 Mb/s link and a 100 ms RTT (a high latency was chosen so that the characteristic sequence number plots of slow start would be more visible). Since LwIP operates behind its own device tap0, we configured the sending Mininet host to route packets destined for tap0 through the receiving Mininet host. We ran tcpdump on the receiving Mininet host to observe the sequence numbers of packets sent by the sender.
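To get a feel for why these link parameters make slow start visible, the back-of-the-envelope sketch below computes the bandwidth-delay product and the number of RTTs an ideal, loss-free slow start would need to fill the pipe. The helper names, the 1460-byte SMSS, and the 10-segment initial window used as a default are assumptions for illustration.

```python
def bdp_bytes(bandwidth_mbps, rtt_ms):
    """Bandwidth-delay product in bytes for a link of the given rate and RTT."""
    return bandwidth_mbps * 1_000_000 / 8 * rtt_ms / 1000


def rtts_to_fill(bdp, initial_cwnd_segments=10, smss=1460):
    """RTTs of ideal cwnd doubling needed before cwnd reaches the BDP."""
    cwnd = initial_cwnd_segments * smss
    rtts = 0
    while cwnd < bdp:
        cwnd *= 2
        rtts += 1
    return rtts


pipe = bdp_bytes(1000, 100)    # 12.5 MB in flight to saturate the link
ramp_up = rtts_to_fill(pipe)   # RTTs of doubling, at 100 ms each
```

Under these assumptions the pipe holds 12.5 MB and a well-behaved sender needs about 10 RTTs (roughly one second) of doubling to fill it, so a 5-second transfer leaves ample room for the slow-start curve to show up in a sequence number plot.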
Figure 1: Seqno (bytes) vs elapsed time for the ACK division attack.
Figure 2: Seqno (bytes) vs elapsed time for the DupACK spoofing attack.
Figure 3: tcpdump of the ACK division attack.
To verify the desired behavior of our attacks, we analyzed the raw tcpdump output to check sequence numbers and congestion windows (Figure 3), and plotted the sequence number progression of each TCP flow over time (Figures 1 and 2).
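The sequence number plots can be derived from tcpdump's text output with a small parser along the following lines. This is a sketch under assumptions, not our exact analysis code: it expects tcpdump to have been run with `-tt` so that each line begins with an epoch timestamp, and the function name `parse_seqnos` is illustrative.

```python
import re

# Matches data segments such as:
#   1.000000 IP 10.0.0.1.5000 > 10.0.0.2.7: Flags [P.], seq 97:185, ack 1, win 229, length 88
# Pure ACKs and SYNs carry no "seq start:end" range and are skipped.
SEQ_LINE = re.compile(r"^(?P<ts>\d+\.\d+) IP .*? seq (?P<start>\d+):(?P<end>\d+),")


def parse_seqnos(lines):
    """Return (elapsed_seconds, end_seqno) pairs for each data segment."""
    first_ts = None
    points = []
    for line in lines:
        match = SEQ_LINE.match(line)
        if match is None:
            continue
        ts = float(match.group("ts"))
        if first_ts is None:
            first_ts = ts  # measure elapsed time from the first data segment
        points.append((ts - first_ts, int(match.group("end"))))
    return points
```

The resulting pairs can be handed to any plotting tool to produce a seqno versus elapsed time curve.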
The ACK division attack performed as intended on the receiving host, sending multiple (valid) acknowledgements to the sender for each received segment. This is illustrated in Figure 3: the transmitted segment 97:185 is acknowledged by ACKs for seqnos 97, 116, 135, 154, and 173. However, this did not achieve the desired effect of inflating the congestion window. The tcpdump output indicates the congestion window remained constant at 2144 outstanding segments, and Figure 1 shows that the transmission rate during the ACK division attack was virtually identical to the baseline.
One possible explanation for this behavior is that the TCP flow was never in the slow-start phase to begin with, preventing the one-SMSS-per-ACK behavior required for the attack to succeed. Previous groups tried to rule this out by randomizing the sender and receiver IPs for each experiment to ensure the TCP flow was fresh each trial; we achieved the same effect by restarting the virtual machine that ran the tests.
Our implementation of the DupACK attack also worked as intended on the receiver, returning multiple duplicate acknowledgements in an attempt to dupe the sender into fast recovery. This did not produce the overall desired effect: it appears the sender ignored the flurry of DupACKs and kept its congestion window unchanged, instead of entering fast recovery and inflating it. In Figure 2, the DupACK attack produces virtually the same traffic patterns as the baseline experiment.
More than anything else, our biggest takeaway from completing this project is that, while quicker and easier to debug than a custom kernel or a loadable kernel module, working with LwIP was no easy task. While we are inclined to complain about the project’s weak documentation and infrequent commenting, we are nonetheless thankful that our day-to-day programming need not break the socket abstraction. As much as LwIP was a pain point, however, we feel incredibly indebted to Mininet, and the ease with which we were able to debug routing and connectivity issues through the command line interface.
In future work, we would like to augment observation of the sender’s TCP sequence numbers by directly instrumenting the Linux kernel TCP stack and learning exactly how the kernel’s congestion control algorithm is interpreting our ACK division and DupACK spoofing. It is curious that we were unable to observe the cwnd grow from its initial size of 10 segments in the baseline or in either of the attack scenarios, and a simple kprobe or loadable congestion control module would give us more direct insight than the sequence numbers themselves.
TCP Congestion Control with a Misbehaving Receiver, Stefan Savage, Neal Cardwell, David Wetherall, and Tom Anderson, ACM SIGCOMM Computer Communication Review 29(5):71-78, October 1999.
LwIP: A lightweight user-space TCP/IP stack.
Mininet: For network virtualization.