Seeing RED


Team: Justin Costa-Roberts and Carl Case.

Key Result: We were unable to reproduce the benefits of random early detection (RED) shown in the original paper describing the idea [1].

Sources:

[1] Sally Floyd and Van Jacobson. Random Early Detection Gateways for Congestion Avoidance.

[2] Van Jacobson and Michael J. Karels. Congestion Avoidance and Control.

[3] Bram Cohen. TCP Sucks.

[4] Kathleen Nichols and Van Jacobson. Controlling Queue Delay.

Contacts: Justin Costa-Roberts (unjustin@stanford.edu), Carl Case (cbcase@stanford.edu)

Introduction:

Jacobson and Karels conclude the seminal TCP paper “Congestion Avoidance and Control” with a look to the future [2]:

Only in gateways, at the convergence of flows, is there enough information to control sharing and fair allocation. Thus, we view the gateway ‘congestion detection’ algorithm as the next big step.

That next step came a few years later when Sally Floyd and Van Jacobson published “Random Early Detection Gateways for Congestion Avoidance” [1]. Random Early Detection (RED) is a queue management scheme for gateways (switches) in which those gateways begin to drop packets *before* the packet buffer is all the way full. These early drops feed back to the sender through TCP’s congestion control, prompting it to back off while congestion is still imminent rather than already severe.
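At its core, RED keeps an exponentially weighted moving average of the queue length and, when that average sits between a minimum and a maximum threshold, drops each arriving packet with a probability that grows with the average. A minimal sketch in Python (with illustrative parameter values rather than any recommended settings, and omitting the paper’s correction for idle periods) looks like this:

import random

# Illustrative values only -- not the paper's recommended settings.
W_Q    = 0.002   # weight for the moving average of the queue length
MIN_TH = 5       # minimum threshold (packets)
MAX_TH = 15      # maximum threshold (packets)
MAX_P  = 0.02    # maximum drop probability as avg approaches MAX_TH

avg = 0.0        # moving average of the queue length
count = -1       # packets accepted since the last drop

def red_arrival(queue_len):
    """Return True if RED drops the arriving packet, given the current queue length."""
    global avg, count
    avg = (1 - W_Q) * avg + W_Q * queue_len    # update the moving average
    if avg < MIN_TH:
        count = -1
        return False                           # below the thresholds: never drop
    if avg >= MAX_TH:
        count = 0
        return True                            # above the thresholds: always drop
    # Between the thresholds: drop with a probability that rises with avg and
    # with the number of packets accepted since the last drop.
    count += 1
    p_b = MAX_P * (avg - MIN_TH) / (MAX_TH - MIN_TH)
    p_a = p_b / max(1.0 - count * p_b, 1e-9)
    if random.random() < p_a:
        count = 0
        return True
    return False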

RED is an elegant algorithm, and it has found renewed importance as network operators and users increasingly face “buffer bloat,” the introduction of larger and more numerous buffers at the edges of the Internet. These are present in broadband access modems and the OS kernel itself. The trouble is that packet buffers much larger than the bandwidth-delay product of the outgoing link serve no useful purpose but instead increase latency and harm the congestion control features of TCP.

To see why, note that the purpose of a packet buffer is to handle a *short-term* mismatch between input and output rates that can occur from bursty traffic or link multiplexing. Any long-term mismatch is congestion, and no amount of buffering is sufficient. Since the sender will be allowed to send more data after one RTT (when acks come back), any packets still in the buffer after one RTT create a “standing buffer,” i.e., a non-zero-length queue that stays the same length as packets are inserted at the same rate they leave. This introduces latency, as packets must sit in the buffer even though the buffer is serving no purpose. Furthermore, these large buffers turn congestion into long delays rather than dropped packets, so TCP’s congestion control mechanisms, which depend on packet drops to detect congestion, become almost useless. This is of no help to the end user, who suffers either way: from queueing delay or from retransmissions.

RED promises to fix these issues, as early drops will keep the queue from growing too large. Unfortunately, RED has seen little adoption in the Internet. Bram Cohen, the inventor of BitTorrent, facetiously points to the difficulty in selling a router with smaller buffers using RED, as the pitch is essentially “Has less memory! Drops more packets!” [3]

More seriously, RED is an algorithm with a number of parameters, and tuning RED is a non-trivial task. Writing in 2012, Nichols and Jacobson say [4]:

Although RED was simple and can be effective at reducing persistent queues, little guidance was available to set its configuration parameters and it functioned poorly in a number of cases. This led to a general reluctance to use it.

Our hope is that by returning to the original work on RED, we might first validate the presentation given there and, as a side effect, show Mininet’s usefulness in reproducing “old” networking results. Second, the figures in the original paper are notably hard to interpret, so we expect that working through reproducing them will give helpful insight into what exactly they mean. And third, we hope to gain some insight into RED ourselves and see just how easy it can be to explore RED’s parameter space with a tool like Mininet.

We recreate two comparisons of the standard tail-drop queue to RED (three figures total). The first comparison is in Figure 5 of the original paper, which plots throughput against average queue length for both tail-drop and RED. It shows that to achieve a given throughput, RED uses less buffering: it has a shorter queue on average. The second comparison is in Figures 12 and 14, which show the bottleneck link utilization, average queue length, and share of the bottleneck assigned to a high-delay host for tail-drop and RED, respectively. They show that RED provides greater fairness for the high-delay host while maintaining shorter queues and achieving higher bottleneck link utilization.

Methods:

Traffic Generation:

In all the experiments we reproduce, the authors use multiple FTP servers as the traffic sources, all sending to a single sink host. To reproduce the experiments, we need the following capabilities from the FTP servers:
– Set the application payload to 1000 bytes
– Modify the TCP maximum congestion window
– Calculate the percentage of data at the sink that came from a particular host

We wrote a small piece of C code to provide this functionality, the quote-ftp-client and quote-ftp-server. The name comes from the fact that the original version sent the string “ftp” repeatedly. These applications use the sockets API to directly provide the requirements listed above:
– We disable Nagle’s algorithm and send() 1000 bytes at a time
– We use setsockopt() on the client side to set the receive window size
– Each host can send a different byte repeatedly, and the client tracks how many bytes of each it sees

We first explored implementing these features without writing custom code. Unfortunately, the standard method for setting TCP congestion windows (sysctl net.core.rmem_default) does not respect network namespaces, so it is all or nothing in Mininet. Additionally, we cannot simply compare throughput at each of the server’s interfaces to get relative throughput, as this does not take into account which packets are dropped in the queue.
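Our actual traffic generator is a small C program, but stripped to its essentials it amounts to a handful of socket calls. The sketch below expresses the same three requirements using Python’s socket module; the address, port, and window size are placeholders rather than the values the experiments use:

import socket
import time

PAYLOAD = b"A" * 1000   # each sender repeats a distinct byte; the sink tallies them

def run_sender(sink_addr=("10.0.0.2", 5001), seconds=3):   # placeholder address/port
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)   # disable Nagle's algorithm
    s.connect(sink_addr)
    end = time.time() + seconds
    while time.time() < end:
        s.send(PAYLOAD)                                       # 1000 bytes per send()
    s.close()

def run_sink(port=5001, window_bytes=8 * 1000):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Capping SO_RCVBUF before listening bounds the advertised receive window,
    # which in turn bounds each sender's effective congestion window.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, window_bytes)
    s.bind(("", port))
    s.listen(1)
    conn, _ = s.accept()
    counts = {}
    while True:
        data = conn.recv(4096)
        if not data:
            break
        counts[data[0]] = counts.get(data[0], 0) + len(data)  # bytes seen per sender byte value
    return counts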

Both experiments require us to set a maximum congestion window size. The ability to do so is particularly important in the second experiment, which uses a small congestion window and a high-delay link to effect bursty TCP behavior. To sanity-check our traffic generator, we created a simple topology that connected a sender to a receiver with a switch in between:

delay:              16ms            2ms
          (sender) ------ (switch) ------ (receiver)
bandwidth:         45Mbps          45Mbps

The link from the sender to the switch was given a high delay and the sender’s congestion window was capped at eight packets. The sender then opened up a TCP connection with the receiver and sent data for three seconds. We collected data every 10ms for the final two seconds to test the burstiness. Figure 1 shows the results of the test.

Figure 1: Validating the Bursty Traffic Generator
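For reference, setting up this validation topology takes only a few lines of Mininet. The sketch below is not our exact script: it assumes a reasonably recent Mininet API and uses TCLink to impose the bandwidth and delay on each link:

from mininet.net import Mininet
from mininet.link import TCLink
from mininet.topo import Topo

class ValidationTopo(Topo):
    "sender --(45 Mbps, 16 ms)-- switch --(45 Mbps, 2 ms)-- receiver"
    def build(self):
        sender   = self.addHost('sender')
        receiver = self.addHost('receiver')
        switch   = self.addSwitch('s1')
        self.addLink(sender, switch, bw=45, delay='16ms')
        self.addLink(switch, receiver, bw=45, delay='2ms')

net = Mininet(topo=ValidationTopo(), link=TCLink)
net.start()
# ... run quote-ftp between the two hosts and sample the connection every 10 ms ...
net.stop()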

Topologies:

We reproduce Floyd and Jacobson’s Figure 6 and Figure 11 topologies (see Figures 2 and 3, below) for our first and second experiments, respectively.

Figure 2: The first topology we reproduce

Figure 3: The second topology we reproduce

The topologies themselves were trivial to create in Mininet. The difficulty was in imposing the RED discipline on the sink-facing switch interfaces. We made several alterations to the Mininet link module (link.py). First, we removed the hard-coded RED parameters and added functionality that lets a topology specify arbitrary RED parameters. In testing the effect of our changes, we found that latencies on our RED-enabled links were in effect zero because of a bug in the Mininet code that had the hard-coded RED qdisc handle clobbering the hard-coded netem qdisc handle, never letting the netem delay take effect. Our second change was therefore to resolve this naming conflict. Once we had fixed this issue, we found that average queue length in the switch was many times higher than the RED limit we had set. Realizing that the netem limit was winning out over the RED limit, our third change was to reverse the order in which we added RED and netem to the qdisc tree, so that RED’s limit would be applied first (this required non-trivial refactoring of the qdisc creation code). Finally, we realized that we were applying RED to every interface in the topology, so we made it possible to specify that a queueing discipline be applied to just one interface: our sink-facing switch interface.
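To make the final arrangement concrete, the commands issued for the sink-facing interface end up equivalent to something like the sketch below. The handle numbers and RED parameter values are illustrative rather than the exact ones in our modified link.py, and the real qdisc tree also carries Mininet’s HTB rate limiter; the point is simply that RED and netem receive distinct handles and RED sits above netem, so RED’s limit and drop decisions apply before the netem delay:

def add_red_then_netem(intf, delay='2ms'):
    """Attach RED above netem on a Mininet TCIntf (illustrative handles and parameters)."""
    # RED at the top of this sub-tree: its limit and early drops apply on enqueue.
    intf.cmd('tc qdisc add dev %s root handle 6: red '
             'limit 100000 min 5000 max 15000 avpkt 1000 '
             'burst 10 bandwidth 45Mbit probability 0.02' % intf.name)
    # netem beneath RED adds the link delay without masking RED's limit.
    intf.cmd('tc qdisc add dev %s parent 6:1 handle 10: '
             'netem delay %s limit 1000' % (intf.name, delay))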

Results:

Figures 4, 5, and 6 show our results (on the right) next to the original results (on the left). Figure 4 is our reproduction of Floyd and Jacobson’s Figure 5. For whatever reason, they elected to put throughput on the abscissa and average queue size on the ordinate, though queue size is the manipulated variable and throughput is the response. Their plot shows that at low average queue sizes, RED clearly maintains higher throughput than DropTail. Our graph shows the two disciplines matching each other across average queue sizes.

Figure 4

Figure 5 shows how Node 5’s share of system throughput, average queue size, and average bottleneck link utilization vary as the DropTail buffer size is varied. Figure 6 shows how the same measures vary with the RED minimum threshold.

Our results are positive on the first graph, showing Node 5’s share of the bottleneck link throughput. This host is the one with a higher delay and small window. Thus, under DropTail with small buffers, it is liable to suffer whole-window losses and receive an abnormally small share of the throughput. Our results match the original ones: DropTail gives a larger share to Node 5 as the buffer gets larger, while RED gives that node a fair share across the whole range of drop thresholds. The effect is small, however, whereas it is very pronounced in the original paper.

The other two plots show average queue length and bottleneck link utilization over time. These results are disappointing. RED and DropTail perform similarly, and neither result closely resembles the original paper’s.

As in Figure 4, DropTail performs disappointingly well with small buffer sizes. There is certainly an increase in throughput at Node 5 as we increase the buffer size, but it is not nearly as dramatic as the increase Floyd and Jacobson see. RED, however, performs much as we would expect, exhibiting no noticeable increase in throughput share at Node 5 as we increase its minimum threshold parameter.

Figure 5

Figure 6

Lessons Learned:

Unfortunately, our result is essentially a negative one. We could not reproduce Floyd and Jacobson’s plots. The lessons we learned as a result are of three kinds. First is the lesson about RED. The received wisdom is that RED is tricky to get right and it often doesn’t work out of the box. Our results are in line with this received wisdom; even setting the parameters as near to Floyd and Jacobson’s as we can on modern Linux, the performance comes out differently. This does not mean we couldn’t get RED to work through brute-force parameter search, but that would not be a reproduction of the original results.

Second is the lesson about queueing disciplines in Linux. They are complicated enough that the baseline Mininet system did not use them correctly. In particular, as soon as there are multiple queueing disciplines, the interactions among them lead to positively bizarre behavior that can be tough to debug.

Third is the lesson about reproducing “old” research. We don’t think the negative result is our fault or the fault of Mininet or Linux or even the original paper. Rather, 19 years is a long time in computers, and both the underlying hardware and software have changed. We believe, for example, that the DropTail qdisc implementation in Linux is highly optimized, as it is the default used the world over; in contrast, the tc-red implementation sees little use. As a result, our divergent result could be entirely explained by implementation differences across time.

Instructions to Replicate This Experiment:

1. Begin by creating a new AWS instance. Use the provided machine image and setup instructions given for this class.

2. SSH to the newly created host. First update Mininet:

> cd ~/mininet
> git pull --rebase
> sudo make develop

3. Pull our project code:

> cd ~
> git clone git://github.com/cbcase/seeing-red.git

4. Replace mininet’s link.py file with our modified version:

> cd ~/mininet/mininet
> mv link.py link-original.py
> ln -s ~/seeing-red/link.py link.py

5. Build the necessary C code:

> cd ~/seeing-red/quote-ftp
> make

6. Run the experiment (takes a little while…):

> cd ~/seeing-red
> sudo python red_simulation.py --sim1 --sim2 --plot

7. Look at the results:
– sim1/sim1plot.png is a reproduction of Figure 5
– sim2/dtplot.png and sim2/redplot.png are reproductions of Figures 12 and 14, respectively

2 responses to “Seeing RED”

  1. We were able to reproduce the results shown here, and it was simple to carry it out with the instructions provided.
    In fact, Figure 14 more closely resembled the original graph with respect to the initial spike.

  2. “We believe, for example, that the DropTail qdisc implementation in Linux is highly optimized, as it is the default used the world over;”

    1) It is not that drop tail is optimized at all. It is too dumb to optimize. 🙂 It is that TCPs have evolved considerably in light of drop-tail queues everywhere in the past 19 years. CUBIC, HyStart, initial window changes, proportional rate reduction, pacing, TSQ, etc. have all made significant differences. One problem that RED solved, however, remains: TCP global synchronization, which you can see on short queues (but is hard to see on long ones).

    2) I would have liked it if you documented what kernel version you were running. There *were* two bugs in Linux RED fixed in the past 4 years; one of them was: commit 1ee5fa1e9970a16036e37c7b9d5ce81c778252fc
    Author: Eric Dumazet
    Date: Thu Dec 1 11:06:34 2011 +0000

    3) Thank you very much for revisiting this paper and making your results reproducible. This kind of work is desperately needed in such a young field, where all the variables are in flux, and yet human minds draw further conclusions from results that have since been invalidated.
    (I keep hoping someone will revisit the famous self-similarity paper with modern traffic one day.)
