Gary Miguel and Angad Singh.
A single fully optimized MPTCP connection over 3G and WiFi can fully utilize both links, given a receive buffer of 600KB or more. We also found that high jitter significantly disrupts MPTCP connections, making them perform worse than regular TCP.
1. Original paper (PDF): How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP. Costin Raiciu, Christoph Paasch, Sebastien Barre, Alan Ford, Michio Honda, Fabien Duchene, Olivier Bonaventure, and Mark Handley. NSDI 2012.
Related website (with video): How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP
2. Paper used for background on 3G link latency (PDF):
TCP/IP Performance over 3G Wireless Links with Rate and Delay Variation. Mun Choon Chan and Ramachandran Ramjee. MOBICOM 2002.
Multiple network paths between two communicating hosts are increasingly common as more computing moves to data centers and mobile devices. Data centers have multiple paths through different core and aggregation switches, and many mobile devices can connect to several links at once.
The paper presents Multipath TCP (MPTCP), a transport protocol that can utilize multiple network paths for a single connection.
We aimed to replicate Figure 4 from the paper which tests various versions of MPTCP and regular TCP performance while varying the size of the receive buffer.
This figure considers the case where an MPTCP connection goes over two links, one with much higher latency and lower bandwidth than the other. While the receive buffer required for regular TCP is Bandwidth × RTT, the receive buffer required for an MPTCP connection over multiple links is the sum over all subflows i of Bandwidth_i × RTT_max, where RTT_max is the largest RTT among the subflows.
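As a rough sanity check, the formula can be evaluated for the WiFi and 3G link parameters used in the paper (8Mbps and 2Mbps), assuming RTT_max ≈ 150ms, i.e. the base round trip of the 3G path with no queueing delay:

```shell
# Rough estimate of the required MPTCP receive buffer:
# sum of subflow bandwidths times the largest subflow RTT.
# Link parameters are from the paper; the 150ms RTT_max is the 3G
# path's base RTT and ignores queueing delay.
wifi_bw=8000000   # bits per second
g3_bw=2000000     # bits per second
rtt_max_ms=150    # base RTT of the slower (3G) path, in milliseconds

buf_bytes=$(( (wifi_bw + g3_bw) / 8 * rtt_max_ms / 1000 ))
echo "${buf_bytes} bytes"   # 187500 bytes, i.e. roughly 188KB
```

Once the 3G link's deep buffer inflates its RTT under load, RTT_max grows well beyond the base value, which is why in practice a much larger buffer (600KB or more) is needed to fully utilize both links.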
Figure 4(a) from the paper shows that the initial implementation of MPTCP performed worse than regular TCP for certain receive buffer sizes. This is because the packets sent over the lower-latency link can fill up the receive window, with gaps left for the packets sent over the high-latency link preventing the receiver from delivering the data to the application. MPTCP’s current implementation includes optimizations, dubbed “Opportunistic Retransmit” and “Penalizing slow subflows” (M1 and M2), which attempt to fix this. Opportunistic Retransmit will retransmit, over the low-latency link, packets that have already been sent on the high-latency link when a full receive window prevents the sender from sending any new data. Penalizing slow subflows means that when an opportunistic retransmission is necessary, the higher-latency subflow’s congestion window is halved to try to prevent getting into this situation again.
In the figure, the three plots show the effects of varying the receive window size, as well as optimizations M1 and M2 on the throughput of a dual-homed host. The host is connected to two emulated links: Wi-Fi and 3G. We want to see if we can replicate these results using Mininet. We also want to go beyond what the authors did to see the effect of link latency variance (AKA jitter) on throughput. We think this will bring the results closer to reality, since wireless link technologies often experience high jitter in the real world.
We used a slightly customized version of Mininet to emulate the topology as described below. You can see the differences on Github. We use a modified Linux 3.2.0 kernel (64-bit) in Ubuntu 12.04. We had to compile our own patched version of MPTCP to re-enable the ability to disable the MPTCP optimizations M1 and M2. This patch is available in our git repository. We used iperf and bwm-ng to measure bandwidth. Our source code is on Bitbucket.
For each configuration and window size, we ran iperf for 27 seconds, ignoring the first 5 and last 2 seconds of each run. We got our throughput values from bwm-ng and our goodput values from iperf. At first we thought we would have to run it for much longer to get consistent results, but this turned out to be long enough.
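A sketch of how one such run might be scripted by hand (the flags shown here are illustrative, not the exact invocations from our scripts):

```shell
# On the server host: start the iperf sink.
iperf -s &

# On either host: sample interface throughput once per second with bwm-ng,
# writing CSV so it can be post-processed later.
bwm-ng -o csv -t 1000 -u bits > throughput.csv &

# On the client host: a 27-second run; the first 5 and last 2 seconds
# are discarded during post-processing, not by iperf itself.
iperf -c 10.0.1.2 -t 27 -f m
```

iperf reports goodput (application-level data delivered), while bwm-ng reports raw link throughput, so the two measurements can differ when MPTCP retransmits data across subflows.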
We run our tests for 3 different topologies: the one shown in the diagram below, one with only a 3G path, and one with only a WiFi path. We emulate WiFi and 3G links using the parameters given in the paper:
WiFi connection – 10ms delay, 8Mbps bandwidth, and an 80ms buffer.
3G connection – 75ms delay, 2Mbps bandwidth, and a 2-second buffer.
In all topologies, the server is connected to the switches with a 100Mbps link with 0.1ms delay.
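As a sketch of how such a link can be emulated by hand with tc (the interface name and qdisc handles are illustrative; Mininet issues equivalent commands internally), the 3G link’s delay, rate, and buffer might be configured as:

```shell
# Illustrative emulation of the 3G link on interface eth1 (name assumed).
# netem adds the 75ms one-way delay; tbf enforces the 2Mbps rate with a
# buffer that can hold up to 2 seconds of traffic at that rate.
tc qdisc add dev eth1 root handle 1: netem delay 75ms
tc qdisc add dev eth1 parent 1: handle 10: tbf rate 2mbit burst 3200b latency 2000ms
```

The tbf “latency” argument expresses the buffer as a maximum queueing time rather than a byte or packet count, which matters later when we compare TBF against HTB.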
In order to enable communication over multiple paths, we set up two subnets, as shown in the diagram.
We assign IP addresses to all four interfaces to form two separate subnets, one going through each switch. We use the “ip route” tool to set up two routing tables on each host, one for each subnet.
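A minimal sketch of this policy-routing set-up on one host follows; the IP addresses, interface names, and table numbers are illustrative, not the exact values from our scripts:

```shell
# Two interfaces on two subnets; each gets its own routing table so that
# traffic sourced from a given address leaves via the matching interface.
# WiFi subnet (addresses and names assumed):
ip rule add from 10.0.1.2 table 1
ip route add 10.0.1.0/24 dev h1-eth0 scope link table 1

# 3G subnet (addresses and names assumed):
ip rule add from 10.0.2.2 table 2
ip route add 10.0.2.0/24 dev h1-eth1 scope link table 2
```

Without per-source rules like these, the kernel would route all traffic through a single interface regardless of which subflow’s source address a packet carries, and MPTCP could not use both paths.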
Receive Window modification
We set the minimum, default, and maximum values for the TCP receive window to the same value using the sysctl utility. For example:
sysctl -w net.ipv4.tcp_rmem="10240 10240 10240"
MPTCP optimizations enable/disable
Currently, M1 and M2 optimizations are enabled by default in the MPTCP kernel. We received a patch from Christoph Paasch (maintainer of the MPTCP kernel) that adds sysctl variables with which we can selectively enable or disable the optimizations.
We tested the fully optimized (M1+M2) MPTCP and regular TCP with no jitter, with low jitter and with high jitter. The link characteristics for each setting are summarized in the table below.
| Base latency | Low jitter | High jitter |
We used netem to add jitter according to a Pareto distribution to approximate wireless link latencies. This distribution approximates the fat-tailed latency distribution seen in real wireless networks. We measured the RTT of our 3G link with our high-jitter netem settings and recorded the results in the file 3g_pareto_RTT.ods, which is in our git repository.
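For example, adding jitter to the 3G link’s netem qdisc might look like the following; the 20ms jitter magnitude here is illustrative, not our exact setting:

```shell
# Add jitter around the 75ms base delay, drawn from netem's built-in
# Pareto distribution table. The 20ms jitter value is illustrative only.
tc qdisc change dev eth1 root netem delay 75ms 20ms distribution pareto
```

netem ships with precomputed distribution tables (uniform, normal, pareto, paretonormal); the Pareto table gives the occasional very long delay that makes the emulated latency fat-tailed like a real wireless link.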
We created 6 graphs, labeled a-e and a-htb. Graphs a-c are the same as figures 4(a)-4(c) in the paper, while d and e show the performance of TCP and the fully optimized MPTCP with low and high jitter, and a-htb shows the results of using the HTB queuing discipline for the same settings as graph a. All other measurements were taken using the TBF queuing discipline.
Our results in graph c are very similar to figure 4(c) from the paper. However, in our results the required receive buffer for MPTCP to exceed TCP’s performance is slightly higher than in the paper. Given the many differences between our experimental set-ups, we don’t think this difference is significant. However, the fact that performance is slightly worse than in graph b for small buffer sizes indicates that perhaps the “penalizing slow subflows” algorithm is too aggressive for this particular topology and version of MPTCP.
In both graphs a and b we see no dip in goodput at around 100-200KB of receive window. In graph a, the performance stays closer to TCP than in the paper, and in graph b our goodput looks slightly better. Some of these differences may be due to a difference in how we’re setting the TCP receive window. Christoph Paasch told us that for their experiment they only set the maximum size, while we set the minimum, default and maximum to the same value. It’s possible that the OS chose something less than the maximum in some trials of their experiment. There are also many other possible causes for the differences: we are using newer MPTCP code and a much newer version of Linux, and we don’t have access to the code that they used to generate their graphs, so we can’t be sure what other differences there are between our set ups.
Our results in graphs a-htb differ from the results in graph a. Ignoring the dip at 100KB (for more on that see the section on jitter), they look closer to figure 4(a) in the paper. We discuss this further in “Lessons Learned”.
Graphs d and e show that MPTCP performed quite erratically with jittery links. Especially with high jitter (graph e), the connection sometimes fails to transfer any significant amount of data. Even when it did transfer data, it performed significantly worse than regular TCP.
As for the drops to zero, we saw this happen without any jitter as well, but only a few times in our hundreds of runs. We tried running each trial 3 times and taking the median result, and we still saw this failure. When this happened, the standard output of iperf was also corrupt or incomplete. The fact that it’s correlated with high jitter makes us think this could have something to do with iperf failing to properly initiate a connection in our topology. There may be some strange interactions between iperf and MPTCP going on with such high and variable latency. We are in contact with Christoph regarding this issue and he is interested in following up on it.
A few key details were not specified in the paper. The paper didn’t specify how the researchers recorded throughput and goodput. For example, they didn’t say how much time they recorded over or what tools they used to record. They didn’t even mention what tools they used to simulate or emulate their network.
Their largest omission, in terms of its effect on our ability to replicate their results, was how exactly they emulated the specified buffer sizes on their links. At first we used Mininet’s “max_queue_len” mechanism, which by default sets the “limit” argument of the netem qdisc (with HTB doing the rate limiting), and we saw much worse performance in graphs a and b than the authors did. After contacting Christoph Paasch regarding this, he told us that the authors used TBF (specifically through the “latency” argument to the tbf qdisc).
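To illustrate the difference between the two set-ups (interface name, handles, and the packet-count figure are assumptions, not values taken from Mininet’s source), netem’s “limit” expresses the buffer as a packet count, whereas tbf’s “latency” bounds how long a packet may wait:

```shell
# Set-up 1 - HTB for rate limiting, buffer as a netem packet-count limit.
# ~333 full-size (1500-byte) packets is roughly 2 seconds at 2Mbps,
# but the effective delay depends on actual packet sizes.
tc qdisc add dev eth1 root handle 1: htb default 10
tc class add dev eth1 parent 1: classid 1:10 htb rate 2mbit
tc qdisc add dev eth1 parent 1:10 netem delay 75ms limit 333

# Set-up 2 (alternative, replacing the above) - TBF with the buffer
# expressed directly as a maximum queueing latency.
# tc qdisc add dev eth1 root tbf rate 2mbit burst 3200b latency 2000ms
```

Because one bounds packets and the other bounds time, the two disciplines can produce quite different queueing delays under load, which is consistent with the RTT differences we measured below.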
To double-check our original parameters, we used iperf to saturate the 3G link, and then used ping to check the round trip time between client and server. RTT rose to around 2 seconds, which is exactly what we would expect. We tried using TBF as the authors did, and we found the RTT under load was much lower, so there are definitely significant differences between the two. After discussions with Christoph and the TAs, we decided to do most of our experiments with TBF, in order to try to better approximate the original experiment.
Given that the maximum queueing delay described in the paper differs from what we measured under TBF, we think that the authors’ description of the link characteristics did not match what they actually tested, though our knowledge of the different queuing disciplines is quite poor and without access to their original code we can’t be sure. In any case the problem is merely academic, as our tests with both HTB and TBF show that the current implementation of MPTCP (which contains both M1 and M2) performs quite well.
We did get a fair amount of variance in our results. That is, the exact buffer sizes at which MPTCP performance would surpass TCP performance varied from run to run. We could improve this by doing multiple trials and graphing the mean or median result, but our experiment already takes about an hour to run, so we accepted some variability in return for a reasonable running time.
Unexpected time sinks
Compiling the kernel several times was very time consuming, especially manually setting parameters in “make menuconfig” to enable MPTCP. Figuring out how to set up multiple routing tables took us a long time, since we knew very little about Linux’s iproute tools. Finally, just running the experiment takes a long time, because we have to test so many configurations and window sizes.
Suitability of Mininet
Mininet did almost everything we needed. We needed to make a small change to it in order to set the delay using the “latency” argument to the TBF qdisc. Also, as a suggestion, the API of the “delay” parameter in a link’s configuration could be improved. Currently it is treated as a string that is passed to netem. However, this is at odds with other parameters like “bw”, which is expected to be a numeric type. We were at a loss as to why setting delay to 1 and 1000 had the same results until we realized we had to set the parameter to the string value ‘1ms’ or ‘1000ms’. There should be separate configuration parameters for delay, jitter, and delay distribution, and Mininet should enforce that they are of the expected types.
EC2 and scaling
Running on EC2 didn’t present any major issues, once we got our custom kernel installed. A c1.medium instance has more than enough resources to handle this experiment, which after all involves only two hosts and two fairly low-bandwidth links. Scaling the experiment to higher link speeds (for example to emulate modern 802.11n and LTE networks) should be possible on one EC2 instance, but we have not tried it.
Instructions to Replicate This Experiment:
Create an EC2 instance
- Log into https://console.aws.amazon.com/ec2/home?region=us-east-1.
- Click on “instances” in the left-hand navigation menu.
- Click “Launch instance”, at the top.
- Under “Choose Launch Configuration”, choose “More Amazon Machine Images”.
- Search for “angad-garymm” and select it. The AMI id is “ami-6c00a105”. Click “Continue”.
- Click “Edit details”.
- For Type, select c1.medium.
- Under Security, for Security Group, select quicklaunch-1.
- Click “Launch”.
- (copied from class EC2 instructions): The Python SimpleHTTPServer module is an easy way to access files created on your instance over the web. By default it starts on port 8000. To add a rule to allow requests to this port:
- click on NETWORK & SECURITY -> Security Groups on the left-side Navigation Pane.
- click on quicklaunch-1 on the top-right pane.
- On the bottom right pane, click on the Inbound tab. Create a new Custom TCP rule for port 8000, then click the + button to add the rule.
- Click Apply Rule Changes.
Get the code and run it
- SSH into your instance with user name “ubuntu”.
- git clone https://bitbucket.org/angadsg/mininet-mptcp.git (TODO: git repo SHA1 version)
- cd mininet-mptcp/src
- sudo ./record_experiment.sh
- Allow about an hour for it to run.
View the results
- Start an HTTP server on the EC2 virtual machine:
- python -m SimpleHTTPServer
- The graphs will be in PNG files under src/<some timestamp>/