By Brad Girardeau and Samantha Steele
Loss-based congestion control strategies have dominated throughout the history of TCP. These use packet losses as a proxy for congestion, but as buffer sizes and line rates have increased, performance has suffered. When buffers are large, a loss arrives too late to effectively signal congestion, causing the sender to fill the bottleneck buffer and leading to long RTTs from bufferbloat. When buffers are small, intermittent losses may signal congestion prematurely, preventing throughput from reaching the link speed.
BBR: Congestion-Based Congestion Control describes a better approach to congestion control based on a rigorous definition of congestion. A flow should ideally have data inflight equal to the product of the bandwidth of the bottleneck link and the round trip propagation time, called the bandwidth-delay product: BDP = RTprop x BtlBw. At this point, a connection completely saturates the bottleneck link while maintaining an empty buffer. Sustained operation with more data inflight is then considered congestion. As shown in Figure 1 from the paper, loss-based algorithms operate far to the right of this point. Research by Leonard Kleinrock showed this point was optimal but also proved an impossibility result: RTprop and BtlBw cannot be measured simultaneously because they adhere to an uncertainty principle. Measuring RTprop requires operating to the left of the BDP point, while measuring BtlBw requires operating to the right.
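To make the BDP concrete, here is a small worked calculation (our own example, not from the paper) for a 10 Mbit/s bottleneck with a 40 ms RTprop, the same link parameters we use later for Figure 5:

```shell
# BDP = RTprop x BtlBw, converted to bytes.
btlbw_bps=10000000   # bottleneck bandwidth: 10 Mbit/s in bits per second
rtprop_ms=40         # round-trip propagation time in milliseconds
bdp_bytes=$(( btlbw_bps * rtprop_ms / 1000 / 8 ))
echo "BDP = ${bdp_bytes} bytes"
# 50000 bytes, i.e. roughly 33 full-size (1500-byte) packets inflight
# keeps the link saturated with an empty buffer.
```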
Figure 1: The effect of the amount of data inflight on delivery rate and round-trip time
BBR is a new congestion control algorithm that sidesteps this impossibility result by using a control system that continually estimates RTprop and BtlBw in sequence. With this mechanism, BBR is able to operate approximately at the ideal point, maximizing throughput while minimizing RTT. BBR has now been merged into the Linux kernel and will be increasingly used to improve network performance going forward. We reproduced key measurements from the paper about BBR’s operation to validate these performance gains.
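On a kernel that ships the BBR module (v4.9 and later), switching a host to BBR is a small sysctl change; note that the in-tree implementation also expects the fq qdisc for pacing, a requirement we return to below. This is a minimal sketch, not our full setup:

```shell
# Load the BBR module and select it as the congestion control.
sudo modprobe tcp_bbr
sudo sysctl -w net.core.default_qdisc=fq
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr

# Confirm the active algorithm.
sysctl net.ipv4.tcp_congestion_control
```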
The paper describes a series of experiments demonstrating that BBR achieves significantly better performance than the state-of-the-art TCP CUBIC congestion control. Experiments measuring BBR in isolation show that it successfully operates at high throughput with essentially no standing queue, resulting in significantly lower RTTs than CUBIC. To test BBR in a live environment, the protocol has been deployed in the Google B4 WAN. This resulted in a 133x throughput increase from 15Mbps over CUBIC to 2Gbps with BBR. BBR has also been deployed on YouTube. Streaming RTTs decreased by 53 percent globally and 80 percent in the developing world. Importantly, experiments show that BBR achieves this performance while adhering to TCP fairness — flows converge to a fair share of a bottleneck link in the presence of both BBR and CUBIC flows.
We reproduced Figures 5 and 6 of the paper (shown in the Results section below), which summarize two key findings:
- BBR operates near the optimal point of high throughput with low RTTs [Figure 5]
- BBR flows converge to a fair share of a bottleneck link [Figure 6 and extension]
We chose the first result because it clearly shows BBR’s performance advantage over CUBIC, where BBR does not increase latency by filling the bottleneck buffer.
The second result matters for demonstrating that BBR can be feasibly deployed in the wild. Over time, all flows should receive a fair share of a link. Another important consideration is whether this fairness holds when BBR and CUBIC flows compete with each other, which the paper claims is the case. As an extension, we tested this finding, mimicking an experiment suggested by follow up work to the paper.
We successfully reproduced the essential features of Figures 5 and 6 and an extension of Figure 6. For each experiment, we performed tests in two different environments: using separate VMs and using emulated hosts in Mininet. This allowed us to test reproducing the figures in the manner they were created by the authors (using two VMs) and in an environment more suited to easy reproduction (Mininet). When testing between two VMs, we use both netperf and iperf to create flows — netperf is used by the authors while iperf is used in the Mininet environment.
Figure 5 compares the RTT over time of a TCP CUBIC flow (red) and a BBR flow (green). The flows are run separately over a 10 Mbit/s bottleneck link with a 40 ms round trip time and 210 ms of buffer (250 ms minus the 40 ms RTprop).
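A bottleneck like this can be emulated with tc by chaining netem (for delay) and tbf (for rate and buffer). The following is a hypothetical sketch rather than our exact scripts (which are in the repository); eth0 is a placeholder interface name:

```shell
# 40 ms of round-trip delay added on one direction of the path.
sudo tc qdisc add dev eth0 root handle 1: netem delay 40ms

# Rate-limit to 10 Mbit/s with ~210 ms of buffering at line rate:
# 0.210 s * 10e6 bit/s / 8 = ~262500 bytes of queue limit.
sudo tc qdisc add dev eth0 parent 1:1 handle 2: tbf \
    rate 10mbit burst 32kbit limit 262500
```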
We reproduced the following figures, which almost exactly match the figure in the paper, especially when using the same environment as the authors (netperf between VMs).
Netperf flows between VMs
Iperf flows between VMs
Iperf flows in Mininet
Sometimes there are differences in the exact timing and characteristics of the CUBIC flow’s packet loss recovery episodes in the figures, but these do not affect their support of the paper’s key finding that BBR is able to operate with almost no queue, while CUBIC incurs a much higher RTT.
Figure 6 tracks the throughput of 5 BBR flows, one started every 2 seconds. The flows are run over a 100Mbits/s link with 10ms RTT.
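The client side of this experiment can be sketched as a small loop that starts one iperf flow every 2 seconds against a running `iperf -s` server; SERVER and the flow duration are placeholders, not the exact values from our scripts:

```shell
SERVER=10.0.0.2   # placeholder address of the iperf server

# Start 5 flows, staggered by 2 seconds, logging per-second throughput.
for i in 1 2 3 4 5; do
    iperf -c "$SERVER" -t 50 -i 1 > "flow$i.txt" &
    sleep 2
done
wait   # block until all flows finish
```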
We reproduced the following results, which are similar to the original graph, particularly in the essential element of flows converging after 30-40 seconds. We were not able to consistently reproduce the exact convergence pattern of the original graph (and there is some variability across our own runs as well), because BBR appears sensitive to the exact timing of packet drops across the various flows, but the overall synchronization and convergence remain consistent. Inconsistencies sometimes arise when one flow happens to observe the other flows’ ProbeRTT state at an advantageous time: it can then gain more than its fair share of bandwidth and keep it until its own next ProbeRTT phase.
Iperf flows between two VMs
Netperf flows between VMs
Iperf flows in Mininet
The paper asserts that BBR is able to compete with TCP, but does not support the claim with experiments. As an extension, we added an experiment comparing the throughput of one CUBIC flow and one BBR flow on a 10 Mbit/s, 40 ms link. We found that BBR is largely able to hold its own, although the result depends on buffer size. In the top figure, the buffer size is roughly 6*BDP. The BBR flow is initially starved when CUBIC fills the buffer; however, its ProbeRTT pacing phase allows it to compete, pushing towards a fair share of the bottleneck. As shown on the bottom, CUBIC dominates for very large buffer sizes (more than 18*BDP) because it fills the buffer so aggressively. This is consistent with other fairness experiments that the authors have subsequently released.
Running BBR in Mininet
Since reproducibility is a key goal of this project, we wanted to run our experiments in a virtual network like Mininet. However, the implementation of BBR in the current kernel release requires the fq queueing discipline to pace packets at the bottleneck rate (rather than in response to ACKs), which is not supported by Mininet. We initially tried to override the Mininet network interface configuration with a manual tc command to use fq, but we were not able to get the required pacing working. Instead, we decided to both use a two VM setup (similar to the authors) and investigate a recent kernel patch that removed the need for fq.
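For reference, the override we attempted looked roughly like the following (h1-eth0 is a placeholder for a Mininet host interface); it installs fq on the interface but did not give us the pacing behavior BBR needs:

```shell
# Replace Mininet's configured qdisc with fq on a host interface.
sudo tc qdisc replace dev h1-eth0 root fq

# Inspect what is actually installed.
tc qdisc show dev h1-eth0
```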
The kernel patch introduces built-in TCP pacing for BBR to support situations in which fq cannot be used. This patch is currently only included in the most recent net-next 4.12-rc2 Linux kernel. We installed this kernel from source, but found it too unstable to work with, often requiring a hard reboot. Next, we tried to apply the patch to the stable 4.9 kernel that we had previously been using. Because the Linux TCP module was refactored between the 4.9 version and the patch’s base commit, this required some modifications to compile, and the result allowed BBR to run on Mininet with much more reliable behavior. However, a periodic networking bug that had been fixed in the refactor caused the kernel to crash under high load. To avoid this, we applied the patch to the Linux v4.11 kernel, allowing us to run BBR over Mininet stably.
Understanding Linux Queueing
We used Linux traffic control mechanisms (tc) to emulate bottleneck links on the VMs. When initial experiments did not match up with the paper, we suspected our configurations were at fault. We found general advice on the BBR development group to use the netem qdisc on the receiver to control both ingress and egress, while using only fq on the client. No specific commands were provided, so we first had to investigate Linux queueing more thoroughly, starting with the excellent resource: https://www.coverfire.com/articles/queueing-in-the-linux-network-stack/. This gave a detailed picture of how the whole system works, which we then supplemented with guides on specific queueing disciplines.
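Our reading of that advice translates into something like the following sketch (the exact commands are ours, not the development group's; eth0 and ifb0 are placeholders). Because tc qdiscs only shape egress traffic, ingress shaping on the receiver requires redirecting incoming packets through an ifb device:

```shell
# Receiver egress: delay outgoing ACKs by half the target RTT.
sudo tc qdisc add dev eth0 root netem delay 20ms

# Receiver ingress: redirect incoming packets to ifb0 so they can
# be shaped by an egress qdisc there.
sudo modprobe ifb
sudo ip link set dev ifb0 up
sudo tc qdisc add dev eth0 handle ffff: ingress
sudo tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 \
    action mirred egress redirect dev ifb0
sudo tc qdisc add dev ifb0 root netem delay 20ms rate 10mbit

# Sender (client): only fq, which provides BBR's pacing.
sudo tc qdisc replace dev eth0 root fq
```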
We were still having difficulty reproducing the convergence in Figure 6. We asked for feedback on the BBR development group, where we quickly received helpful advice. One key fix was turning off segmentation offloading, which interacts unpredictably with qdiscs that work at a per-packet level: with offloading enabled, packets from some flows can become much larger than those from others.
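Disabling the offloads is a one-line ethtool change per interface (eth0 is a placeholder):

```shell
# Turn off TCP segmentation, generic segmentation, and generic
# receive offload so the qdisc sees real MTU-sized packets.
sudo ethtool -K eth0 tso off gso off gro off

# Verify the current offload settings.
ethtool -k eth0 | grep -E 'tcp-segmentation|generic-segmentation|generic-receive'
```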
Overall, our results supported the key ideas in the paper. Figure 5 confirmed that BBR does not create standing queues, outperforming CUBIC on latency. Figure 6 showed that BBR flows synchronize and converge to a fair share of the link.
Our experience reproducing Figure 6 showed that BBR does not converge as cleanly or quickly as indicated in some cases. BBR synchronizes flows around its ProbeRTT state, when clients briefly send almost no data to allow buffers to clear for a good RTT estimate. This occurs every 10 seconds. Sometimes one flow takes much more bandwidth than others because it sees the other flows enter ProbeRTT and clear the link at an advantageous time. If flows don’t start exactly every 2 seconds, they sometimes take an extra period or two to converge. For example, we saw some groups of flows take 50-60 seconds to converge instead of 30 in Figure 6. This is not a fundamental flaw but points to an area where performance could be improved.
We chose Google Cloud Computing VMs as our platform and are running (for reasons described above) a patched v4.11 Linux kernel. For the virtual network environment, we used Mininet because it is easy to set up with simple topologies and can use the BBR support in the patched Linux kernel. For emulating bottleneck links in the two VM setup, we used standard Linux traffic shaping techniques set up with tc.
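On the patched kernel, a comparable Mininet environment can be brought up directly from the mn command line; this is a quick illustrative invocation rather than our scripted topology. With Mininet's default two-host, one-switch topology, a 10 ms delay on each link yields roughly a 40 ms RTT:

```shell
# Two hosts behind one switch, tc-shaped links:
# 10 Mbit/s per link, 10 ms delay per link (~40 ms RTT end to end).
sudo mn --link tc,bw=10,delay=10ms
```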
This setup is reproducible, though it requires slightly more effort than usual: a kernel must be installed from source and two VMs are used. The process is scripted to make it as easy as possible. As discussed in the Critique, there is some natural variability in the figures (especially Figure 6), but the key elements of the figures stay consistent.
The following instructions to reproduce our setup and figures have been tested on Mac OS X and should also work on Linux. On Windows, you will need to manually run the gcloud commands found in the create_vms.sh and run_experiments.sh scripts.
- Clone the repository: https://github.com/bgirardeau/bbr-replication
- Complete instructions can also be found in the README
- Install Google Cloud from https://cloud.google.com/sdk/downloads
- Initialize and log in to Google Cloud:
- Follow instructions after running gcloud auth login
- From inside the repository directory:
- Setup the experiment VMs by running bash create_vms.sh (there are 1-2 interactive prompts at the start before it runs for ~30 minutes)
- Run the experiments and download the figures by running bash run_experiments.sh (may take ~20 minutes)
- View the downloaded figures in the figures folder