CS244 ‘14: Sprout

Khaled AlTurkestani (khaled@stanford.edu) Chirag Sangani (csangani@stanford.edu)


Sprout is an end-to-end transport protocol that provides high throughput and low latency over cellular networks, taking into account the high variability of cellular network bandwidth. The problem with current transport protocols such as TCP is their slow reactivity to changing network bandwidth. As a result, we often observe either a queue buildup in the network when there is a sharp bandwidth drop, or an underutilization of the link when the bandwidth increases. To demonstrate the performance gains of using Sprout as a transport protocol, the paper compares its performance to different video conferencing applications such as Skype, Google Hangouts, and FaceTime, because they exemplify real-time interactive applications that benefit from delay and throughput gains.

Sprout takes advantage of the fact that queue delays are self-inflicted by user flows since most cellular networks keep per-user queues. Its metric for congestion control is the observed packet delay by the receiver. Using that delay, the sender statistically forecasts near-future bandwidth with 95% probability and adjusts its sending rate accordingly.
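The idea of a cautious forecast can be illustrated with a much-simplified sketch. It assumes a single, known Poisson delivery rate over the forecast horizon, whereas Sprout maintains a distribution over a time-varying rate, so this is not the algorithm from the paper. The function name, its parameters, and the example numbers are hypothetical.

```python
import math


def cautious_forecast(rate_per_ms, horizon_ms, confidence=0.95):
    """Largest packet count k such that P(deliveries >= k) >= confidence.

    Simplifying assumption: deliveries over the horizon are Poisson with a
    fixed, known rate. Sprout itself infers a distribution over the rate.
    """
    mean = rate_per_ms * horizon_ms
    if mean <= 0:
        return 0

    def pmf(k):
        # Computed in log space so that large means do not underflow to zero.
        return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

    allowed_risk = 1.0 - confidence
    k = 0
    cdf = pmf(0)
    # Advance until the chance of seeing k or fewer deliveries exceeds the risk.
    while cdf <= allowed_risk:
        k += 1
        cdf += pmf(k)
    return k


# Hypothetical link: 0.1 packets/ms on average, forecast over 100 ms.
print(cautious_forecast(0.1, 100, confidence=0.95))
```

Under these assumptions a sender expecting 10 deliveries on average would budget for only 5, and a higher confidence value shrinks the budget further.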


We believe that this paper is trying to solve a very interesting and relevant problem that is likely to become more pertinent in the near future. As society moves towards a more mobile world, cellular networks are increasingly likely to be the “last-mile” link to a user running interactive applications. Furthermore, more interactive applications that demand low latency will be created for or migrated to mobile devices. Therefore, it is worthwhile to investigate transport protocols that are tailored to cellular networks.

Original Results

The original authors evaluated the performance of Sprout, Skype, FaceTime, Hangouts, and a number of TCP variants, running on AT&T LTE, Verizon LTE, Verizon 3G (1xEV-DO), T-Mobile 3G (UMTS), and Sprint (not in the paper). The average throughput, self-inflicted delay, and utilization were calculated for each application on each network. Sprout consistently provided the lowest self-inflicted latency while maintaining moderate throughput.

The authors evaluated the effect of Sprout's confidence parameter on its performance, and found that a higher confidence value led to lower delays, but also lower throughput. They also measured Sprout's resilience to link loss: there was no appreciable change in delay for up to 10% loss, while throughput dropped by almost 50%.

Finally, the authors found that running a fat flow and an interactive flow simultaneously over a Sprout tunnel provided isolation to the interactive flow, compared to running them without the tunnel.

Results of Interest

We decided to reproduce a subset of Figure 7 from the paper. This figure reports average throughput and self-inflicted delay for a number of applications on various networks. For reasons of easy reproducibility (as explained later), we chose to reproduce the results for Sprout, TCP Cubic, and TCP Vegas. In addition, we evaluated TCP Reno, though it was not part of the original results. The networks for which we reproduced these results are all the original networks listed above.


Figure 7 (from the original paper):  Throughput and delay of all the applications and protocols tested in the paper. 

Subset Motivation

The core contribution of the paper is the claim that Sprout provides better latency and comparable or better throughput for interactive applications as compared to existing techniques. The validity of this claim directly informs the merit of Sprout. Other results are ancillary to this main result, and are void if the main result does not hold.

We chose the subset of applications for convenience of reproducibility: as explained later, while the original evaluation test bench allowed for greater flexibility, it is harder to reproduce. Our technique allows us to reproduce our chosen subset of results with minimal effort.

Reproduced Results

Figure 1 below shows our reproduced results for running Sprout and three TCP variants over the various networks discussed in the paper (in addition to Sprint). Our results match those of the original authors to a great extent. Sprout consistently performs better than any of the TCP variants in terms of self-inflicted latency, but generally has moderate to poor throughput. We hypothesize that the variation in absolute values is caused by differences in the testbench, but there may be other reasons. We investigate this in the next section.


Figure 1. Throughput and delay for Sprout, TCP Reno, TCP Vegas, and TCP Cubic on network traces of different networks.


Our first extension explores a core assumption of Sprout: that the link behaves like a doubly-stochastic process in which the underlying rate λ of the Poisson process itself varies according to Brownian motion. As a qualitative analysis, we generated histograms of packet arrival intervals for the different networks considered, shown in Figure 2.

Figure 2. Histograms of packet arrivals for the different network traces used in the experiment.
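A histogram of this kind could be computed along the following lines. This is a minimal sketch, assuming the trace is available as a sorted list of millisecond timestamps with one entry per delivered packet; the function name and bin width are hypothetical, and this is not necessarily how the plots in this post were produced.

```python
from collections import Counter


def interarrival_histogram(timestamps_ms, bin_width_ms=1):
    """Count packet inter-arrival gaps, grouped into bins of bin_width_ms."""
    gaps = [later - earlier
            for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]
    # Each gap is assigned to the lower edge of the bin it falls into.
    return Counter((gap // bin_width_ms) * bin_width_ms for gap in gaps)


# Hypothetical six-packet trace (timestamps in milliseconds).
histogram = interarrival_histogram([0, 1, 1, 3, 6, 7])
for gap_ms in sorted(histogram):
    print(gap_ms, histogram[gap_ms])
```

The resulting counts can be passed to any plotting library as bar heights.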

As a reference for comparison, here is a histogram for packet arrival intervals for a hypothetical network, where the intervals between arriving packets is distributed exponentially:

Figure 3. Histogram of packet arrivals assuming an exponential distribution.

It appears that arrival intervals might actually be distributed exponentially. Does Sprout work equally well with such network behavior? To answer this question, we generated a number of hypothetical networks in which the arrival intervals follow a Gaussian, Poisson, uniform, or exponential distribution. We then measured the performance of Sprout and the three TCP variants on these hypothetical networks. The results are presented in Figure 4.
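One way to generate such hypothetical networks is sketched below. It is an illustration only: the trace format (one integer millisecond timestamp per delivery), the function names, and the distribution parameters (for example, a Gaussian standard deviation of a quarter of the mean) are assumptions, not the values used in the experiment.

```python
import math
import random


def poisson_sample(rng, mean):
    """Draw from a Poisson distribution (Knuth's method, fine for small means)."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def synthetic_trace(distribution, mean_gap_ms, count, seed=0):
    """Return `count` delivery timestamps whose gaps follow `distribution`."""
    rng = random.Random(seed)
    samplers = {
        "exponential": lambda: rng.expovariate(1.0 / mean_gap_ms),
        # Negative draws are clipped to zero so that time never runs backwards.
        "gaussian": lambda: max(0.0, rng.gauss(mean_gap_ms, mean_gap_ms / 4)),
        "uniform": lambda: rng.uniform(0.0, 2.0 * mean_gap_ms),
        "poisson": lambda: float(poisson_sample(rng, mean_gap_ms)),
    }
    draw = samplers[distribution]
    now, trace = 0.0, []
    for _ in range(count):
        now += draw()
        trace.append(int(round(now)))
    return trace


# Hypothetical example: ten deliveries with a 5 ms mean gap.
print(synthetic_trace("exponential", mean_gap_ms=5.0, count=10))
```

Fixing the seed keeps a generated trace identical across runs, which helps when comparing protocols on the same hypothetical network.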


Figure 4. Throughput and delay of Sprout, TCP Reno, TCP Vegas, and TCP Cubic assuming exponential, Gaussian, Poisson, and uniform packet arrival rates.

We see similar results as before: Sprout continues to provide superior latency performance, while its throughput is moderate to poor. Thus, we can conclude that Sprout works well under different kinds of network behavior distributions.

Finally, we noticed non-negligible variation in the average throughput of every application across simulation runs that used the same network traces. Consequently, we performed a sensitivity analysis: we ran each application 5 times on AT&T LTE and Verizon LTE and measured the mean and standard deviation of throughput and self-inflicted latency. Figure 5 shows our results.


Figure 5. Sensitivity analysis of Sprout, TCP Reno, TCP Vegas, and TCP Cubic over the AT&T LTE and Verizon LTE traces.
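The per-application statistics behind such an error-bar plot can be computed as in the sketch below. The helper name and the sample throughput values are made up for illustration; they are not measurements from our runs.

```python
import statistics


def summarize_runs(samples):
    """Return (mean, sample standard deviation) for repeated measurements."""
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical throughput readings (kbps) from five runs of one application.
throughput_kbps = [4100.0, 3900.0, 4300.0, 3700.0, 4000.0]
mean, spread = summarize_runs(throughput_kbps)
print(f"mean={mean:.1f} kbps, stdev={spread:.1f} kbps")
```

The mean gives the height of each bar and the standard deviation the length of its error bar.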

We observed that, while the self-inflicted latency is fairly stable, the throughput varies significantly. However, the average throughput for all applications is similar.

We conclude that Sprout’s real benefit is improved latency; its throughput gains over TCP are marginal at best. However, this improvement in delay may still be very beneficial, especially since Sprout does not sacrifice significant throughput relative to TCP.


While reproducing the results presented in the paper, we were faced with two conflicting goals: to remain true to the original methodology so as to minimize the possibility of deviation in results, and to devise a testing methodology that allows anybody to reproduce our results with minimal effort. The original test methodology required specific hardware. After considering various alternatives, we settled on emulating the original setup using Mininet.

Our original intention was to recreate all the data presented in the paper, including the network characterization traces. A brief explanation of the purpose of these traces follows:

To evaluate network behavior in a repeatable and consistent manner, the original authors created traces that characterized a cellular network. Two devices at the opposite ends of a cellular network would saturate both uplink and downlink of the cellular network. Each device would then record the exact timestamp on which any packet was received. This trace of timestamps would be treated as “ground truth”: it is assumed that these are the only times when a packet could have crossed the network. This trace was then fed into a cellular network emulator, namely Cellsim, that sat between a client and a server as a transparent bridge. Any traffic between the client and the server would pass through Cellsim, and the bridge would let packets through only at those times that correspond to a timestamp entry in the trace. In effect, the traffic between the client and the server would experience throughput and delays as if it were traveling over the cellular network.
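The trace-driven behavior described above can be sketched as a small simulation. This is not Cellsim's code: it assumes one packet is released per trace timestamp, ignores packet sizes and propagation delay, and uses hypothetical function names and inputs.

```python
from collections import deque


def emulate_link(opportunities_ms, arrivals_ms):
    """Return (arrival, departure) pairs for packets released by the trace.

    A packet may leave only at a timestamp recorded in the trace, and each
    timestamp releases at most one queued packet (first in, first out).
    """
    pending = deque(sorted(arrivals_ms))  # packets not yet at the bridge
    queue = deque()                       # packets waiting inside the bridge
    delivered = []
    for slot in sorted(opportunities_ms):
        while pending and pending[0] <= slot:
            queue.append(pending.popleft())
        if queue:
            delivered.append((queue.popleft(), slot))
        # An opportunity with an empty queue is simply wasted.
    return delivered


def mean_queueing_delay_ms(delivered):
    """Average time packets spent waiting in the emulated bridge."""
    return sum(out - arrived for arrived, out in delivered) / len(delivered)


# Hypothetical trace with four delivery opportunities and a burst of packets.
result = emulate_link([10, 20, 30, 40], [0, 0, 0, 35])
print(result, mean_queueing_delay_ms(result))
```

The burst of three packets at time 0 drains one opportunity at a time, which is the kind of self-inflicted queueing delay the experiments measure.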

Naturally, these cellular traces are a core component of the experiment. The original authors provided us with the traces collected by them, but we desired to collect our own traces. Unfortunately, we were unable to do so – we suspect that this was due to differences in network configuration on end hosts. Ultimately, we settled on using the traces provided by the original authors.

Mininet posed another limitation: running it on a Linux machine made it difficult to reliably automate performance measurements for Skype, FaceTime, and Hangouts. To keep the experiment easy to reproduce, we settled for measuring the performance of different TCP variants.

Aside from the aforementioned difficulties, we had little trouble in setting up the testbench or interpreting results – there were no hidden assumptions or omitted details that prevented us from reproducing any result.


The basic idea behind Sprout is that networks behave according to a doubly-stochastic Poisson process with a varying λ. We can’t help but question this assumption, which is given little justification beyond the fit of 17 minutes’ worth of data for one cellular network. As we showed previously, this fit could also apply to an exponential, or even a Gaussian process with the right parameters. However, we showed that Sprout’s performance does not suffer if the network follows a different distribution. We are curious, though, about how Sprout would perform if it assumed a different model for network behavior.

We do have one concern about the testing methodology, namely the manner in which throughput is calculated. For transport protocols, one usually assumes throughput to mean data throughput available to the application, also called goodput. However, in the methodology employed by the paper, throughput is calculated in terms of bytes seen by the transparent bridge (Cellsim) between the client and the server. Since the bridge has no knowledge of the transport protocol, it measures all bytes, including packet headers, retransmissions, corrupted packets, etc. It is unclear how this overall throughput corresponds to application goodput for each protocol, or if such a relationship can even be easily established.
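A rough back-of-the-envelope calculation shows how the two quantities can diverge. The sketch below is illustrative only: it assumes fixed-size packets, a 40-byte header (20 bytes of IPv4 plus 20 bytes of TCP without options), and a made-up retransmission fraction. It is not a relationship established in the paper.

```python
def goodput_estimate(bridge_bytes, packet_bytes, header_bytes, retransmit_fraction):
    """Estimate application goodput from the bytes counted at the bridge.

    Assumes every packet has the same size and header overhead, and that a
    fixed fraction of the packets seen by the bridge are retransmissions.
    """
    payload_share = (packet_bytes - header_bytes) / packet_bytes
    return bridge_bytes * payload_share * (1.0 - retransmit_fraction)


# Hypothetical numbers: 1.5 MB seen by the bridge, 5% retransmitted packets.
print(goodput_estimate(1_500_000, 1500, 40, 0.05))
```

Because the header overhead and retransmission behavior differ between protocols, the same bridge-level throughput can correspond to different goodput for each of them.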

Instructions for Reproduction

Create an EC2 instance on Amazon AWS using ami-23f38013 (EC2 region Oregon). A high-power, high-memory instance such as m3.large or c3.xlarge is highly recommended. Once the instance is running, log in to it as the user “ubuntu”. No password is needed; EC2 relies on public-key cryptography for authentication. Tutorials for creating and logging into an EC2 instance can be found easily.

After logging in:

$ cd ReproducingSprout

This is the root directory for the experiment. Generated raw results will appear in the “results” directory, and the graphs will appear in the “plots” directory.

First, make sure you have the latest code. To do so, run:

$ git pull

Once you know you have the latest code, you can run the experiment using the following command:

$ sudo ./run.sh

The expected runtime is close to 19 hours. If you encounter any errors, please contact the authors of this article for assistance.

The source code for this project is available with an open-source MIT license at https://github.com/csangani/ReproducingSprout.


One response to “CS244 ‘14: Sprout”

  1. Peer evaluation comments:

    We ran into a few issues when trying to reproduce these results, but the team was very responsive through email and quickly resolved these issues (i.e., their EC2 image was private at first and not available to the public, and there was a bug in matplotlib, the python library they used to create their plots). After applying 2 patches with ‘git pull’, we ran the experiment again and after a runtime of ~19 hours, the results we obtained matched well with the expected output plots.

    We were very impressed with the sensitivity analysis they carried out. It is very extensive and covers important assumptions that are not obvious in the original research. The team also did a very good job in choosing the type of the graphs, i.e., histogram and error bar graph. This made it easy for us to interpret what is going on under the hood. Very thorough work and great job.

    Score: 3/5, according to the rubric:

    (3 – Code ran to completion but the results on the blog were not produced with the default run configuration)

    This needs some explanation. The default code does not produce the plots correctly, but after working with the team and updating the codebase, we were able to reproduce the plots perfectly. Given the runtime of the experiment (19hrs), we imagine debugging was not easy. So we think it’s best that we report what we encountered and let the TAs decide the final score. 🙂

    — Wei Shi and Sumitra Narayanan
