Team: Andrew Haven and James Whitbeck.
Key Result: A misbehaving TCP receiver can cause the sender’s congestion window to grow faster than expected, while retaining end-to-end reliability, by changing the way the receiver sends ACKs.
TCP Congestion Control with a Misbehaving Receiver, Stefan Savage, Neal Cardwell, David Wetherall, and Tom Anderson, ACM SIGCOMM Computer Communication Review 29(5):71-78, October 1999.
lwIP: A lightweight TCP/IP stack
Mininet: Rapid Prototyping for Software Defined Networks
Contacts: Andrew Haven (email@example.com), James Whitbeck (firstname.lastname@example.org)
During a TCP connection, both ends cooperate to determine the correct transfer rate. The sender limits its window to avoid congesting the network, while the receiver limits its window so that its buffer doesn’t overflow. TCP is also designed so that competing flows each get a fair share of the bandwidth. It is well understood that a misbehaving sender can decide to send faster than normal and get a higher share of the bandwidth. What Savage et al. show in the TCP Daytona paper is that a misbehaving receiver can also cause the sender to send at a much higher rate. A real-world example could involve a web client downloading content from a server much faster than other clients, even though the server probably wants to serve everyone equally.
The paper demonstrates three attacks in which a misbehaving receiver can take advantage of how the sender treats ACKs, as specified in RFC 2581, to cause the congestion window to grow arbitrarily large, while retaining the reliability properties of TCP.
- The first attack, ACK division, acknowledges each segment in several small pieces instead of with a single ACK. Some TCP implementations increase the window size for each ACK received, regardless of how many bytes that ACK covers. By sending acknowledgements for as little as one byte at a time, the receiver can trick the sender into artificially increasing its window size.
- The second attack, ACK duplication, sends many duplicate ACKs, which force the sender into fast recovery mode and cause it to increase its window size and send new segments. This works because fast recovery, after halving the congestion window, increases it by one MSS for each duplicate ACK received. By flooding the sender with duplicate ACKs, the receiver can more than make up for the halving of the congestion window, and cause it to grow arbitrarily large.
- The final attack, optimistic ACKing, sends ACKs before the data is received. This may damage end-to-end reliability, but it can make the sender's window grow very quickly and arbitrarily large. The premature ACKs cause the sender to assume the corresponding data has been received, even if it is still in flight. This attack is particularly dangerous because missing ACKs are TCP’s signal for congestion. Optimistic ACKs can therefore mask this signal and cause serious network issues.
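To make the first two attacks concrete, here is a minimal sketch of how a naive sender's congestion window would respond, assuming the RFC 2581 rules described above. The segment size, the function names, and the use of the pre-loss window as a stand-in for the amount of data in flight are illustrative assumptions; this is not code from the paper or from our lwIP modifications.

```python
MSS = 1460  # bytes; an assumed segment size, chosen only for illustration


def slow_start_cwnd(initial_cwnd, segments_acked, acks_per_segment=1):
    """Window size after a naive sender processes ACKs during slow start.

    The modeled sender adds one MSS per ACK received, no matter how few
    bytes the ACK covers. A well-behaved receiver sends one ACK per
    segment; ACK division sends `acks_per_segment` of them.
    """
    cwnd = initial_cwnd
    for _ in range(segments_acked * acks_per_segment):
        cwnd += MSS
    return cwnd


def fast_recovery_cwnd(cwnd_before_loss, duplicate_acks):
    """Window size during fast recovery after `duplicate_acks` duplicates.

    Assumption: the pre-loss window approximates the data in flight.
    The third duplicate ACK halves the window and adds three MSS; every
    further duplicate ACK inflates the window by one more MSS.
    """
    if duplicate_acks < 3:
        return cwnd_before_loss  # fast retransmit has not been triggered
    ssthresh = max(cwnd_before_loss // 2, 2 * MSS)
    return ssthresh + 3 * MSS + (duplicate_acks - 3) * MSS
```

For example, `slow_start_cwnd(MSS, 1, acks_per_segment=10)` models a receiver that splits the acknowledgement of one segment into ten pieces, and `fast_recovery_cwnd(10 * MSS, 20)` models a receiver that sends twenty duplicates after the window reached ten segments. In both cases the modeled window ends up far larger than it would with an honest receiver.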
Using Mininet on Amazon’s EC2, we investigate whether Linux is still vulnerable to these attacks. In particular, we attempt to replicate Figures 4 and 5 of the paper, which illustrate the first two attacks. The third attack, optimistic ACKing, is trickier because TCP can no longer be relied upon for reliability, so we leave replicating Figure 6 to future work.
We contacted Stefan Savage about the paper, but unfortunately the code he used has been lost. We therefore re-implemented the attacks ourselves.
To be able to modify the TCP stack without having to recompile the kernel, we used a user-space TCP stack called lwIP. It is used mainly in embedded systems, but a port to Unix was available. Our modifications all reside within the tcp_send_empty_ack function in tcp_out.c.
We set up a Mininet topology with two hosts, a receiver and a sender, and a single link between the two. The bandwidth was set to 1000 Mbps and the delay to 100 ms. We didn’t want the link to be a bottleneck, and the delay is there so that slow start can be observed more easily. The receiver runs a program called tcpsink that sits behind a virtual TAP interface. The TCP stack on that interface is lwIP. tcpsink just listens for connections and discards all received data. The sender runs a small Python script that saturates the link with TCP traffic. We run tcpdump on the sender to capture all the outgoing data packets, and we plot the sequence numbers.
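The sender side of this setup can be sketched roughly as follows. This is a hedged illustration rather than our actual script: the function names `blast` and `run_sink`, the chunk size, and the byte counter are assumptions, and `run_sink` is only a plain-socket stand-in for tcpsink so that the sketch can be tried on a single machine without lwIP.

```python
import socket
import threading  # used by callers to run the sink alongside the sender


def blast(host, port, total_bytes, chunk_size=64 * 1024):
    """Open one TCP connection and push `total_bytes` of filler data.

    Writing in large chunks keeps the socket buffer full, so the send
    rate is governed by TCP's windows rather than by the application.
    """
    payload = b"\x00" * chunk_size
    sent = 0
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            n = min(chunk_size, total_bytes - sent)
            sock.sendall(payload[:n])
            sent += n
    return sent


def run_sink(server_sock, counter):
    """Accept one connection and discard everything it sends.

    `counter` is a one-element list used to report the byte count back
    to the caller when this function runs in a separate thread.
    """
    conn, _ = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:  # peer closed the connection
                break
            counter[0] += len(data)
```

To try it locally, bind a listening socket to `127.0.0.1`, start `run_sink` in a `threading.Thread`, and call `blast` with the same port. In the experiment itself the sink role is played by tcpsink on top of lwIP.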
The code was developed on Ubuntu 12.04, running the Linux kernel version 3.2.0. The code is available for replication on an EC2 instance running Ubuntu 12.04.
We plotted the sequence numbers for all data sent from the sender. We did not plot ACKs because the large number of ACKs overwhelmed tcpdump, as is explained in the next section.
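The step from a capture to plottable points can be sketched as below. It assumes tcpdump text output produced with `-tt` (epoch timestamps) and `-nn`, in the usual `Flags [...], seq START:END` layout. The regular expression and the function name are illustrative assumptions, not our plotting code.

```python
import re

# Matches lines such as:
# 1338500000.123456 IP 10.0.0.1.45678 > 10.0.0.2.5001: Flags [P.], seq 1:1449, ack 1, win 115, length 1448
SEQ_RE = re.compile(
    r"^(\d+\.\d+) IP \S+ > \S+: Flags \[[^\]]*\], seq (\d+)(?::(\d+))?"
)


def parse_data_segments(lines):
    """Return (timestamp, seq_start, seq_end) for every line carrying a seq field.

    Pure ACKs have no `seq` field in tcpdump's output and are skipped,
    which matches plotting only the sender's sequence numbers.
    """
    points = []
    for line in lines:
        match = SEQ_RE.match(line)
        if match is None:
            continue
        timestamp = float(match.group(1))
        start = int(match.group(2))
        # SYN-style lines print a single sequence number with no range.
        end = int(match.group(3)) if match.group(3) else start
        points.append((timestamp, start, end))
    return points
```

The resulting list can be passed to any plotting tool, with the timestamp on the x-axis and the sequence numbers on the y-axis.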
We found that ACK division still causes more data to be sent compared to a non-misbehaving receiver. The increase we find is, however, not as sharp as in the original paper. It could be that the original vulnerability is no longer exploitable but that ACK division still causes the sender to increase its window by some other path. We did not investigate the cause of the behavior and leave it to future work.
On the other hand, we were not able to reproduce the speed-up with ACK duplication. The duplicate ACKs do lead to many retransmissions, but the excess duplicates do not make the sender send more new data. This attack therefore actually leads to lower goodput.
Getting lwIP to work was quite a challenge. The lwIP project has very little documentation, and its code has few comments. Moving the TCP stack into user space made us realize how much complexity there is in a modern networking stack, which we sometimes take for granted in the kernel.
On a related topic, we found the CLI in Mininet an invaluable tool in establishing connectivity between the sender and the process running the lwIP stack. It allowed us to test and modify network parameters on the fly, like listing active interfaces and inserting routes into routing tables, from a single command-line interface.
Another challenge was to accurately capture all the traffic using tcpdump. Quite often, tcpdump could not process the packets fast enough and would therefore drop them from its queue. This was indicated by the message “X packets dropped by kernel”. Using the “-nn” flag, which prevents all lookup of host names and port names, helped but was not sufficient. As a result, the graphs could be missing whole ranges of sequence numbers. This occurred mainly when lots of ACKs were being sent during the attack experiments, and it happened a lot more on EC2 than on our local development VMs. As a consequence, we do not capture ACKs as the original paper does and only show sequence numbers. We have made the number of ACK divisions and duplicates an adjustable test parameter, so if these values are set low enough, capturing ACKs can be re-enabled in tcpdump.
We also had a surprisingly hard time observing TCP slow start. We started by opening a connection over the loopback interface, and the traces showed that the window was constant throughout, not slow-starting or even doing AIMD. It seems that the kernel might forgo slow start on a local interface. We were more successful using Mininet, and we could see slow start at first. But after running the experiment a few times, the slow-start behavior would disappear and the window would be constant from the beginning. After much head scratching, we tried changing IP addresses on every run, and that solved the issue. Apparently, the kernel keeps some state about the path between two IP addresses, even after the connection is closed.
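The workaround of changing addresses on every run can be sketched as follows. The base network, the function name, and the idea of deriving the pair from a run counter are illustrative assumptions; the write-up does not pin down which kernel state is responsible, although Linux's per-destination TCP metrics cache is one plausible candidate.

```python
import ipaddress


def addresses_for_run(run_index, base="10.0.0.0/16"):
    """Return a (sender_ip, receiver_ip) pair that differs on each run.

    Each run gets its own /24 inside `base`, so state cached for the
    addresses of an earlier run is not reused. The index wraps around
    once all /24 subnets have been handed out.
    """
    network = ipaddress.ip_network(base)
    subnets = list(network.subnets(new_prefix=24))
    subnet = subnets[run_index % len(subnets)]
    hosts = list(subnet.hosts())
    return str(hosts[0]), str(hosts[1])
```

With the default base network, run 0 would use 10.0.0.1 and 10.0.0.2, and run 1 would use 10.0.1.1 and 10.0.1.2.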
While we showed that dividing and duplicating ACKs had an impact on throughput, it would be interesting to study exactly how the TCP stack treats these extra ACKs. In the case of ACK duplication, one could try to make the stack vulnerable again, or compare it to the version tested in the original paper (2.2.10).
Additionally, we did not explore the third attack, optimistic ACKing, which would be a good avenue for future work. It will necessarily involve addressing the reliability issues with this attack.
Instructions to Replicate This Experiment
Details for replicating the experiment are available on GitHub. They involve launching an EC2 c1.xlarge instance. The output is a pair of graphs, one for ACK division and one for ACK duplication.