Alfalfa:
A Videoconferencing Protocol for Cellular Wireless Networks
Team: Leo Alterman (leeoo@stanford.edu), Alex Quach (alexq@stanford.edu)
Original Paper: K. Winstein et al., Alfalfa: A Videoconferencing Protocol for Cellular Wireless Networks
Introduction
In their original paper, Keith Winstein and his collaborators describe a protocol to stream live video over cellular wireless networks. Their work has two parts: a predictive transport-layer algorithm for congestion and flow control (“Sprout”) built around the metric of packet interarrival time, and a streaming video codec that can nimbly adapt the stream’s bitrate. This builds on their previous work developing a set of transport-layer primitives for building internet-reliant applications on top of cellular wireless links (which they call SSP, the State Synchronization Protocol).
Figure 1: The physical topology used in the authors’ experiments (which we reproduced in Mininet)
The authors performed their experiments using a physical topology of three desktop computers connected to the internet through a single gigabit ethernet switch, as laid out in Figure 1. One of the computers acts as a transparent bridge between one of the hosts and the rest of the network, running custom “CellSim” software that throttles the throughput of one of its interfaces over time to simulate the fluctuating capacity of a cellular wireless link.
Within this topology they ran instances of Sprout, Skype, Facetime, and Google Hangouts to observe each client’s behavior when faced with a link of unpredictable capacity. This is most directly illustrated in the paper’s first graph (Figure 1 in the original paper, reproduced as Figure 2 below), which plots Skype’s and Sprout’s throughput and per-packet delay on a link whose capacity trace was originally observed on Verizon’s 4G wireless network in Cambridge, MA.
The motivation of the experiment was to demonstrate that Sprout behaves better than traditional streaming solutions over links with highly fluctuating capacity. There is a constant balance between maximizing subjective video quality (abstracted in the paper and here as just raw throughput) and congesting the network, which significantly increases the per-packet delay (very bad for soft real-time applications like videoconferencing). An ideal algorithm would utilize precisely the true capacity of the link, minimizing delay and maximizing video bitrate.
Figure 2: The graphs from the Alfalfa paper we seek to reproduce
As Figure 2 shows (taken from the original paper), Sprout’s predictive algorithm tends to track link capacity fairly well while keeping delay under 100 ms. Skype performs similarly well in the common case, but in times of significant fluctuation, such as 20 seconds into the trace, Skype overshoots the capacity, congesting the queue and driving up delay by an order of magnitude. Sprout, on the other hand, cuts throughput drastically in order to maintain its 100 ms threshold.
Methods
Our key result is a reproduction of the authors’ experiment entirely on a single computer within a Mininet topology, demonstrating that the same results can be obtained without the need to set up three physical desktops and a gigabit switch. We attempted to follow the authors’ experimental setup as closely as possible: our Mininet topology contains three hosts connected as Figure 1 describes, running the clients and CellSim software more-or-less unmodified from the original experiment (we elaborate on this in the following sections).
Our simulation starts a client on each of the client hosts, begins a call between the two, and then feeds a wireless trace (the same one used by the authors) into CellSim. At this point, it measures the TX throughput on the CellSim-throttled host’s interface while CellSim records the delay experienced by each outgoing packet. Once the experiment duration elapses, we dump both logs to our output directory and rerun with the next client.
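For concreteness, here is a minimal sketch of that driver loop written against the Mininet Python API. The helper names, interface names, and the CellSim invocation are illustrative assumptions, not our exact code:

# Minimal sketch of the experiment driver; names are illustrative.
import time
from mininet.net import Mininet
from mininet.topo import Topo

class AlfalfaTopo(Topo):
    """Two client hosts joined through a CellSim bridge, as in Figure 1."""
    def build(self):
        h1 = self.addHost('h1')        # client behind the simulated cell link
        cs = self.addHost('cellsim')   # transparent bridge running CellSim
        h2 = self.addHost('h2')        # client on the wired side
        self.addLink(h1, cs)
        self.addLink(cs, h2)

def tx_bytes(host, iface):
    """Read the kernel's TX byte counter for one host interface."""
    return int(host.cmd('cat /sys/class/net/%s/statistics/tx_bytes' % iface))

def run_one(trace, duration=300):
    net = Mininet(topo=AlfalfaTopo())
    net.start()
    h1, cs, h2 = net.get('h1', 'cellsim', 'h2')
    # CellSim logs per-packet queueing delay to stderr (see Metrics below).
    cs.cmd('./cellsim %s 2> cellsim-delay.log &' % trace)
    # ... start a client on h1 and h2 and place the call here ...
    samples = []
    for _ in range(duration):
        time.sleep(1)
        samples.append(tx_bytes(h1, 'h1-eth0'))  # throughput = per-second delta
    net.stop()
    return samples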
The virtual topology is connected to the internet using an iptables-based software NAT originally written by Glen Gibb [1]. This is necessary to allow Skype to authenticate with the central Skype service. Once a connection is established, the call becomes completely peer-to-peer. We confirmed this by severing the topology’s internet link while a Skype call was running in Mininet, which did not interrupt service.
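Stripped of its bookkeeping, the NAT amounts to enabling IP forwarding and adding a masquerade rule in the root namespace. A rough Python equivalent (the interface names are assumptions for illustration; Gibb’s actual script is more thorough):

import subprocess

# Roughly what the iptables-based NAT boils down to. 'eth0' is the VM's
# uplink and 'root-eth0' the Mininet-facing interface; both names are
# illustrative.
def enable_nat(uplink='eth0', internal='root-eth0'):
    rules = [
        'sysctl -w net.ipv4.ip_forward=1',
        'iptables -t nat -A POSTROUTING -o %s -j MASQUERADE' % uplink,
        'iptables -A FORWARD -i %s -o %s -j ACCEPT' % (internal, uplink),
        'iptables -A FORWARD -i %s -o %s -m state'
        ' --state ESTABLISHED,RELATED -j ACCEPT' % (uplink, internal),
    ]
    for rule in rules:
        subprocess.check_call(rule.split())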
Metrics
Though the measurement method and meaning of ‘link throughput’ were obvious to us, we had trouble measuring ‘per-packet delay’ and understanding the original paper’s use of the ‘self-inflicted latency’ metric. Initially, we tried to measure the one-way latency between clients as the difference between packet send and arrival times across our topology. For Sprout this involved modifying the client source to dump packet sequence numbers and timestamps, while for Skype we set up tcpdump to record this information (thinking that we could correlate the tcpdumps on both ends to find packet send and arrival times by their sequence numbers). We discovered after implementing this, however, that Skype doesn’t actually use TCP to stream calls. Instead, it uses its own protocol encapsulated within UDP, which left us with no easy way of matching packets across clients (and hence determining their travel times).
At this point we contacted the paper’s authors to find out how they measured this data for their own graphs. They explained that the metric is actually a measure of the queueing delay each packet experiences within CellSim, and that CellSim writes this information to stderr. We were able to pipe this output into a file and then produce delay graphs like those seen in the original paper.
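Turning that log into a delay time series is then a small parsing job. A sketch, assuming each stderr line carries a timestamp followed by a per-packet delay (the actual field layout depends on the CellSim build):

# Parse CellSim's stderr log into (timestamp, delay) samples. We assume
# the first two whitespace-separated fields are a timestamp and a delay
# in milliseconds; adjust to match the actual log format.
def parse_delays(path):
    samples = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue
            try:
                samples.append((float(fields[0]), float(fields[1])))
            except ValueError:
                continue  # skip any non-data lines
    return samples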
CellSim
CellSim acts as an invisible bridge between the client and the rest of the topology, forwarding, delaying, and dropping packets according to the wireless trace. The traces take the form of timestamps recording when packets were sent and received on the original wireless link. These were recorded while the authors were saturating the wireless link in both directions with 1500-byte MTU packets, so each event in the trace adds 1500 bytes to an “okay to send” counter. All events contained within a small sampling period are aggregated, and CellSim allows that many bytes of data to flow through the bridge. The counter is reset to zero at the end of each sampling period, and any packets not allowed through the link are queued until there is spare capacity.
Later in the experiment, we modified CellSim to let us specify the number of bytes each trace event should add to the counter. Note that we did not change the MTU parameter of any of the hosts: the only effect this parameter has on the network is that it effectively scales the bandwidth that the wireless trace allows to flow through CellSim. The consequences of this are explored in our Results section.
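To make the counter mechanism concrete, here is a toy Python model of the delivery logic with our bytes-per-event knob; this is a simplified illustration, not CellSim’s actual source:

from collections import deque

def deliver(trace_periods, packet_sizes, bytes_per_event=1500):
    """Toy model of CellSim's counter logic. trace_periods holds the
    number of trace events in each sampling period; packet_sizes holds
    queued packet sizes in FIFO order. Each event credits
    bytes_per_event bytes, so bytes_per_event=100 scales link capacity
    to 100/1500 = 1/15 of the original trace."""
    queue = deque(packet_sizes)
    delivered = []
    for n_events in trace_periods:
        budget = n_events * bytes_per_event   # "okay to send" counter
        sent = 0
        while queue and queue[0] <= budget:
            size = queue.popleft()
            budget -= size
            sent += size
        delivered.append(sent)
        # Leftover budget is discarded: the counter resets each period.
    return delivered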
Local VM
We set up our experiment inside of a Xubuntu 12.10 virtual machine run in VMWare Player locally on a laptop with a 4-core 2.2 GHz Intel i7 processor, 8 GB of RAM, and a hardwired ethernet connection to the internet. We were originally planning to run our simulation on EC2, using the CS244 AMI. However, we were unable to run Skype in that environment due to EC2’s lack of hardware graphics acceleration (necessary for video decoding).
After spending a fair amount of time trying to get this to work, we decided to drop EC2 and use a local VM instead. This has the added benefit of better isolation, because we have more control over the environment in which the VM runs. Unfortunately, it also means the experiment VM might be run on less powerful computers, which could lower the maximum streaming rates of the clients and affect experiment results. To compensate, we included a parameter in CellSim that lets us control the streaming rates of the clients by scaling the wireless trace capacity. We explore this in our Results section.
Skype
To completely automate the experiment, we control Skype using its exposed D-Bus backend. It turned out to be incredibly difficult to control multiple Skype clients on the same host at the same time, since both instances clash inside one D-Bus namespace. Our solution was to launch each Skype client within an isolated bus and write connection strings into the experiment directory so we could send commands to these buses. In the end, Skype was launched by a shell script launched by an application launched by a Python script inside a virtual Mininet host created by our master experiment Python script (inside a VM). We were told by one of our mentors, Dom Cobb, that we had to go deeper, but we felt our solution worked well enough as is.
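The core of the trick looks roughly like the sketch below (profile handling and error checking omitted; the controller attachment in the trailing comment uses the Skype4Py API we install during setup):

import os, subprocess

def launch_isolated_skype(profile_dir):
    """Start a private D-Bus session and a Skype instance bound to it."""
    env = dict(os.environ)
    # dbus-launch prints DBUS_SESSION_BUS_ADDRESS and DBUS_SESSION_BUS_PID.
    for line in subprocess.check_output(['dbus-launch']).splitlines():
        key, _, value = line.partition('=')
        env[key] = value
    env['HOME'] = profile_dir   # separate HOME => separate Skype profile
    return subprocess.Popen(['skype'], env=env), env

# A controller process attaches over the same bus, e.g.:
#   os.environ['DBUS_SESSION_BUS_ADDRESS'] = env['DBUS_SESSION_BUS_ADDRESS']
#   import Skype4Py
#   skype = Skype4Py.Skype()
#   skype.Attach()
#   skype.PlaceCall('other_test_account')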
We also installed a video loopback driver in the VM to feed Skype a regular video test pattern generated by gstreamer. This ensured experiment results were not skewed by video content and were reproducible across runs.
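Concretely, the feed amounts to loading the loopback module and pointing a gstreamer test source at it, along these lines (the /dev/video0 path assumes the loopback module created the first video device):

import subprocess

# Load the loopback camera and feed it a deterministic SMPTE test
# pattern. Requires root for modprobe.
subprocess.check_call(['modprobe', 'v4l2loopback'])
subprocess.Popen([
    'gst-launch-1.0', 'videotestsrc', 'pattern=smpte', 'is-live=true', '!',
    'video/x-raw,format=YUY2,width=640,height=480,framerate=30/1', '!',
    'v4l2sink', 'device=/dev/video0',
])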
Sprout
Sprout required a less complicated setup to automate. We used an example client from the Sprout source tree that establishes a connection between two endpoints and transfers dummy packets between them, because the authors’ Alfalfa application doesn’t yet support streaming actual video. Testing Alfalfa itself would be irrelevant to the experiment: the point is to test Sprout’s predictive capabilities under fluctuating links (which are not affected by what is actually being sent), and to measure throughput and delay, not application-level metrics like video quality.
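Driving the example client from the Mininet hosts then reduces to a couple of shell commands. In this sketch the binary name and argument order are assumptions based on the examples directory of the alfalfa tree; check your checkout before relying on them:

import time

def run_sprout_pair(h1, h2, duration=300):
    # Receiver first, then sender; sproutbt2 and its arguments are our
    # best guess at the example binary, not a documented interface.
    h2.cmd('./sproutbt2 > /dev/null 2>&1 &')
    time.sleep(1)  # give the receiver a moment to bind
    h1.cmd('./sproutbt2 %s > /dev/null 2>&1 &' % h2.IP())
    time.sleep(duration)
    for h in (h1, h2):
        h.cmd('pkill -f sproutbt2')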
Results
Initially we ran the experiment with all parameters set as they were in the hardware topology used in the original paper, which produced the graphs below in Figure 3. Note that Sprout takes around 40 seconds at the beginning of its run to precompute Gaussian values, which is why it appears inactive at the beginning of each graph.
Figure 3: Reproduction of results with 1500-byte events
We see that Sprout hugs the capacity graph fairly well, whereas Skype falls completely short, staying at 500 Kbps at best. Our Sprout result agreed with the results in the original paper, but our Skype results did not. We theorized that this was due to insufficient CPU time in the VM to let both Skype clients stream video simultaneously at full capacity, which would explain why the graph clips well below link capacity.
In order to test this hypothesis, we modified CellSim as described in our Methods section and set the trace scalar to 100 bytes per event instead of 1500, cutting the effective link capacity by a factor of 15 and moving its average below Skype’s maximum achievable rate on our CPU. After making this change and running the experiment again, we found that our graphs matched the results in the original paper much more closely, without any hint of the confounding factor introduced by limited CPU resources. These results are shown in Figure 4.
Figure 4: Reproduction of results using 100-byte events
Analysis
Our final graphs in Figure 4 display much of the behavior seen in the original Alfalfa paper, but differ in some notable ways. First, Skype tends to have higher throughput than Sprout during stable periods (e.g., around 150 seconds into the trace) and even appears to track link capacity much the way Sprout does. Sprout still recovers more quickly following fluctuations (e.g., around 75 seconds) and maintains a lower per-packet delay than Skype on average.
One explanation for this is that since the paper was written, the Skype client has gotten better at dealing with variable-capacity environments. This could be tested by obtaining an older version of the Skype client and rerunning the experiment with it, but unfortunately we couldn’t locate a suitable version in time for this report.
Another explanation is that our modification of CellSim affects the quality of the simulation. Since we’re scaling the CellSim trace rather than shifting it, the troughs and peaks of the capacity graph move proportionally closer together, so we’re observing a less bursty link. Because of the way the trace data is encoded, shifting it up or down is significantly harder than scaling it, but a good next step to test this theory would be to carefully translate the trace down to the right order of magnitude. It could also be that Skype’s and Sprout’s behaviors are simply different at this lowered order of magnitude. To say one way or the other would require a more in-depth analysis of both clients.
Our graphs also differ in the order of magnitude of the latency results. In our runs both Skype and Sprout spike above the 100 ms line fairly often. Though their relative behaviors are still showcased in these graphs, their absolute performance is quite different from what we saw in the original paper. We suspect this is also caused by limited CPU resources, but we don’t have an immediately obvious fix like we did with the throughput graphs and trace scaling.
Reproducing results
Instructions for Automatic Setup
Our entire experiment VM can be downloaded as a 2 GB tarball from here. To spin it up, install the free VMWare Player from here (available for Windows, OS X, and Linux). The extracted VM will take up 6 GB of host disk space and must be run on a 64-bit processor with suitable virtualization features enabled (VT-x if you live in Intelville or AMD-V if you’re a resident of AMDonton).
From the home window select “Open a Virtual Machine,” navigate to the folder extracted from the tarball, and select the file “CS244 PA3 Skypebox.vmx”. The first time you do this, VMWare Player will ask if the VM was moved or copied; select copied. Once the VM boots up, log in to the user Ubuntu with the password alfalf4. Before proceeding, ensure that the VM has internet access by pinging Google, and if it does not, troubleshoot appropriately.
Next, launch a terminal window, navigate to the directory cs244_pa3, and execute sudo ./run.sh. The experiment should be entirely automated from this point forward and last for about 20 minutes, running Skype and Sprout for 5 minutes each under a 1500-byte-scaled CellSim and again for 5 minutes each under a 100-byte-scaled CellSim. After everything is complete, four graphs similar to those found in our results section should be located in the ./alfalfa-out-[most_recent_timestamp] directory.
Should anything terrible happen, you can abort the experiment by hitting Ctrl+C and, if the two Skype clients are still open, manually right-clicking their indicator icons at the top of the screen and selecting ‘Quit’. Then run sudo mn -c to clean up the topology.
Instructions for Manual Setup
The easiest way to reproduce our results is using our provided VM (see the section above). Just for reference, though, here is how to set the environment up from scratch:
- Clone our git repository
- Install the following apt packages: v4l2loopback-dkms gstreamer1.0-tools gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad git-buildpackage debhelper protobuf-compiler libprotobuf-dev libncurses5-dev autotools-dev dh-autoreconf pkg-config libutempter-dev zlib1g-dev libssl-dev libboost-math-dev
- Install Skype from the provided Skype.deb package
- Install our modified skype4py library by navigating to the project’s skype4py directory and running ./setup.py install
- Build our modified version of CellSim by navigating to the project’s cellsim/sender directory and running make
- Build Sprout by navigating to the project’s alfalfa directory, configuring with ./autogen.sh; ./configure --enable-examples, and then running make