CS244 ’15: CONFUSED, TIMID, AND UNSTABLE: PICKING A VIDEO STREAMING RATE IS HARD.


Team: Jason Clavelli, Kim Truong

Paper: Te-Yuan Huang, Nikhil Handigol, Brandon Heller, Nick McKeown, and Ramesh Johari. Confused, timid, and unstable: picking a video streaming rate is hard. Proceedings of the 2012 ACM Internet Measurement Conference (IMC ’12), November 14-16, 2012, Boston, Massachusetts, USA.

Introduction

One of the biggest changes to consumer internet services in recent years is the rise of video streaming. Services like Netflix, Hulu, and Vudu have gone from virtually unknown a decade ago to as much as 50% of the download traffic on the internet. When it comes to service quality, streaming services have an advantage in that the file size is not fixed: while a normal 1 MB web page has to load 1 MB in order to display properly, a video stream can adjust its quality to the available bandwidth. This leads to a natural tradeoff between good quality and uninterrupted playback. If the video quality is too high, the available bandwidth cannot support it; if it is too low, the user experiences unnecessarily poor quality. The paper we are working with explores a problem that occurs when streaming over HTTP, which is common practice today: if a flow competes with the video stream, the competing flow takes up most of the bandwidth, forcing the video stream down to the lowest possible quality. In this project, our main goal is to reproduce the graph from the paper that demonstrates this problem (Figure 4a).

Goals

The objective of the original paper is to pick a video streaming rate that gives viewers the best experience in the presence of competing traffic flows. Currently, clients for popular video streaming services like Hulu and Netflix dynamically pick a video rate based on their estimate of the available bandwidth. This strategy works well in isolation, but in the presence of competing flows the estimate is skewed most of the time. This skewed bandwidth estimate leads to poor video streaming quality.

Motivation

In real-world scenarios, viewers often stream video while performing a background task that competes with the stream. For example, a viewer could be backing up a hard drive to the cloud or downloading a large file while streaming video. It is well known that the bulk of today’s Internet traffic comes from video streaming, and viewers are liable to run other network-related background tasks without understanding how the two activities interact. As a result, it is very important to minimize the impact of such background traffic on video streaming quality. Currently, however, clients pick rates that are much lower than their fair share of the available bandwidth (as little as about 1/7th of their fair share).

Results

The main results of the original paper confirmed that the problem is real: all three services suffer from it. Results near the end also confirmed that the authors’ recommendations would be effective; less conservative rate selection, better filtering, and bigger segments all resulted in better video rates. More interestingly, though they did not validate it, the authors suggested not estimating bandwidth at all when deciding on the video rate. Instead, clients should aggressively pick the best rate that avoids buffer underrun and leave bandwidth management to TCP.
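To make that last recommendation concrete, here is a minimal sketch of a buffer-based rate picker, assuming an illustrative rate ladder and the 240-second buffer cap mentioned later in this post; this is our reading of the suggestion, not the paper's implementation.

RATES_KBPS = (235, 375, 560, 750, 1050, 1400, 1750)  # illustrative rate ladder

def pick_rate(buffered_seconds, buffer_cap=240):
    # Map buffer occupancy onto the ladder: a fuller buffer affords a
    # higher rate, while a nearly empty buffer forces the lowest rate
    # to avoid underrun. No bandwidth estimate is consulted; TCP alone
    # determines how fast segments actually arrive.
    fill = min(buffered_seconds / buffer_cap, 1.0)
    index = min(int(fill * len(RATES_KBPS)), len(RATES_KBPS) - 1)
    return RATES_KBPS[index]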

Subset Goal

We chose to reproduce Figures 4a and 5 first and foremost. Both figures show the effect of a competing flow on video playback rate. Figure 4a performs these measurements with automatic playback-rate adjustment turned on, while Figure 5 does so with playback-rate adjustment turned off.

Subset Motivation

Figures 4a and 5 underpin the rate selection problem. Together, they demonstrate that automatic playback-rate adjustment is part of the problem: lowering the playback rate in response to a perceived drop in bandwidth is what allows the competing flow to take over.

Subset Results

Our results generally match theirs. We see two main differences:

  1. Our video flow throughput does not spiral down past 1 Mbps. In Figure 4a of the paper, the authors observed that as the video rate decreases, the flow becomes more vulnerable to perceiving lower throughput. Our throughput, however, does not show this vulnerability: at lower video rates, the smaller segments have consistently shorter download times, so the throughput does not spiral downward. It merely decreases as the video rate decreases and increases when the rate increases.
    Figure 1: Our attempt to reproduce the paper’s Figure 4a. This is the video flow throughput with automatic rate selection in the presence of a competing flow.

    Figure 1a: The paper’s Figure 4a (Service A). Video flow throughput with automatic rate selection in the presence of a competing flow.

    Even though our results do not match the paper’s Service A results (Figure 4a), they do closely match the paper’s simulated results (Figure 20). Like the paper’s custom client, ours uses a 10-sample moving average filter (a sketch of this filter appears after this list).

    Figure 2: This is the paper’s Figure 20. It shows the paper’s custom client similar to Service A with a 10-sample moving average filter.

  2. Our throughput is not affected when the video rate is artificially forced high. In Figure 5 of the paper, the authors demonstrated that if the client maintains the highest video rate regardless of the lower perceived throughput, the throughput eventually goes back up and obtains its fair share of the bandwidth. In our experiment, when we force the video rate up to a fixed 1750 kbps at 450 seconds, the throughput does not seem to be affected: it remains as consistently low as it was before the rate was fixed high.
    Figure 3: Our attempt to reproduce the paper’s Figure 5. This is the video flow throughput with the video rate manually set to 1750 kbps at 450 seconds. The throughput remains roughly the same as before the rate was fixed to 1750 kbps.

    Figure 3a: This is the paper’s Figure 5 (Service A). The video flow throughput when the rate is artificially fixed at the highest rate, 1750 kbps, at 450 seconds. The throughput increases.
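The sketch below shows the kind of filter-then-pick logic described above, assuming a 10-sample moving average over per-segment throughput and an illustrative rate ladder; our actual client differs in details.

from collections import deque

RATES_KBPS = (235, 375, 560, 750, 1050, 1400, 1750)  # illustrative ladder

class ThroughputFilter:
    # 10-sample moving average over per-segment throughput samples (kbps).
    def __init__(self, window=10):
        self.samples = deque(maxlen=window)

    def update(self, segment_bits, download_seconds):
        self.samples.append(segment_bits / download_seconds / 1000.0)
        return sum(self.samples) / len(self.samples)

def pick_rate(estimate_kbps):
    # Conservative selection: the highest rate at or below the filtered
    # estimate, falling back to the lowest rate if none fits.
    below = [r for r in RATES_KBPS if r <= estimate_kbps]
    return max(below) if below else RATES_KBPS[0]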

Our results agree with the paper’s on the following points:

  1. Our attempt to reproduce Figure 6a matches the paper’s. Our TCP throughput varies between 3 Mbps and the full 5 Mbps until the buffer is full. Once the buffer is full, the throughput swings between the full 5 Mbps and 0 Mbps as the client enters the periodic ON-OFF sequence: the client turns OFF while the buffer drains and turns back ON to request new segments when the buffer has space (a sketch of this request pacing appears after this list).
    Figure 4: This is our attempt to reproduce Figure 6a. We used a smaller buffer, which fills at 10 seconds, to better show that the client enters the ON-OFF sequence once the buffer is filled. The yellow line is the TCP throughput, which is at the full bandwidth of 5 Mbps before the buffer fills up.

    Figure 4a: This is the paper’s Figure 6a. The TCP throughput swings between the full 5 Mbps and 0 Mbps as the buffer takes time to drain before the client can request new segments.

  2. Our attempt to reproduce Figure 6b matches the paper’s. Their request interval averages 1.5 seconds before the buffer is full and 4 seconds after, with the buffer filling at around 170 seconds. Our request interval also averages 1.5 seconds before and 4 seconds after, with the buffer filling at around 150 seconds. Because our results are simulated, there is no variability in the request interval before the buffer is full: the request rate is constant at 1750 kbps and Mininet yields the same RTT every time. After the buffer is filled, the request rate changes, and as a result there is more variability.
    Figure 5: Our attempt to reproduce the paper’s Figure 6b. The request interval averages 1.5 seconds before and 4 seconds after the buffer is filled. Notice the lack of variability in the request interval before the buffer is filled; that is likely a simulation artifact, since the RTT and rate hold constant values. After the buffer is filled, the request rate changes, which likely introduces the variability.

    Figure 5a: This is the paper’s Figure 6b. The request interval before the buffer is filled has variability because the results are from an actual experiment of Service A.

  3. Similar to the paper’s results for Service A, the congestion window of our video flow enters slow start during the ON-OFF sequence.
    Figure 6: This is our attempt to reproduce the paper’s Figure 8a. Our congestion window of the video flow drops to 1 MSS, which indicates slow start.

    Figure 6a: This is the paper’s Figure 8a. The congestion window (cwnd) of the video flow from Service A indicates slow start because cwnd drops to 1 segment, similar to our results.
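As promised above, here is a minimal sketch of the request pacing that produces the ON-OFF pattern, assuming 4-second segments and a 240-second buffer cap; fetch_segment and buffered_seconds are hypothetical stand-ins for our client's download call and buffer state.

import time

SEGMENT_SECONDS = 4   # each request fetches 4 seconds of video
BUFFER_CAP = 240      # the buffer holds at most 240 seconds of video

def request_loop(fetch_segment, buffered_seconds):
    # Before the buffer fills, segments are fetched back-to-back, so
    # requests are spaced only by the ~1.5 s download time. Once the
    # buffer is full, a new request can go out only after playback
    # drains one segment, stretching the interval to ~4 s and leaving
    # TCP idle in between (the OFF periods).
    while True:
        while buffered_seconds() + SEGMENT_SECONDS > BUFFER_CAP:
            time.sleep(0.1)   # OFF: wait for playback to drain
        fetch_segment()       # ON: blocking download of one segment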

Challenges

These are our main challenges:

  1. Setup: Conceptually, it was difficult to understand how to integrate the TCP socket streaming with Mininet. There are also multiple processes operating in parallel that all need to be tied together: the client requests video segments at varying intervals, the server sends the segments, the playback buffer drains, and the competing flow vies for the bottleneck bandwidth. We ended up implementing the client, server, and playback buffer on separate threads and using locks to enforce control flow; for example, we made sure the client does not request 4 seconds of video while the buffer is draining, and vice versa. To integrate the threads with Mininet, we spawned the client and server on separate Mininet hosts using popen (a sketch of this structure appears after this list).
  2. Available Information: In general, the paper is clear and explicit about implementation details. However, two details were less clear. First, the authors mention that “the playback buffer is set to 240 seconds…”. It was not obvious that the buffer holds exactly 240 seconds of video at the current rate; that is, regardless of its capacity in bytes, the buffer only ever holds up to 240 seconds. Second, Figure 9 shows the playback rate in relation to the available bandwidth. The graph’s y-axis is labelled “Available Bandwidth/Request Rate (kb/s)”, which reads like a ratio, since both bandwidth and request rate are in kb/s, yet the axis carries a unit (kb/s). We resolved the ambiguity by simply interpreting the y-axis as kb/s, and verified this by checking that our video flow throughput results matched theirs.
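Here is a stripped-down sketch of that structure, with hypothetical names: a lock guards the shared buffer count so the client's refill and the playback drain never interleave, and the client and server processes are launched on Mininet hosts via popen.

import threading
import time

buffer_lock = threading.Lock()
seconds_buffered = 0.0

def drain_buffer():
    # Playback thread: drain 1 second of video per real second. The
    # lock keeps this decrement from interleaving with the client's
    # increment after a segment download.
    global seconds_buffered
    while True:
        time.sleep(1.0)
        with buffer_lock:
            seconds_buffered = max(0.0, seconds_buffered - 1.0)

threading.Thread(target=drain_buffer, daemon=True).start()

# On the Mininet side, each script runs on its own host, e.g.
#   h1, h2 = net.get('h1', 'h2')
#   server = h1.popen(['python', 'server.py'])
#   client = h2.popen(['python', 'client.py', h1.IP()])
# (script names are hypothetical)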

Critique

An interesting aspect of the original paper was the fact that there were two different environments for the experiments: the Service A environment and the custom environment. The purpose of the custom environment in the original paper was to be a platform for implementing and measuring the paper’s recommendations, but we found the unmodified custom client particularly interesting. Specifically, the difference between the measurements from the authors’ custom client and those of the actual Netflix (Service A) client was substantial: Netflix dropped to 235 kbps (the lowest possible rate), while the custom client almost never dropped below 1100 kbps. The paper attributed this difference to “some subtle differences between Service A’s algorithm and ours.”

The overall results from our custom client matched those of the authors almost perfectly. See Figure 1 and Figure 2 above. Consequently, our results differed from the real Netflix results just as the authors’ results did.

This difference led us to question the paper’s explanation for the dropoff: our client did not experience the downward spiral, and the authors’ custom client seemed to experience it only very briefly, dipping to a low of 1000 kbps before recovering. The throughput of the authors’ client never went as low as 560-375 kbps, while the Netflix results stayed at 560-375 kbps for over 300 seconds.

To figure this out, we looked at the paper’s other graphs (obtained from the Netflix environment), which demonstrate the sources of the dropoff. Our intent here was to find a difference between the Netflix client and our custom client in one of these graphs, which would isolate a potential cause for the difference in overall results.

First, we reproduced the paper’s Figure 6a (See Figure 4a), which shows the change in TCP throughput once the buffer fills up. The throughput is consistently near 100% until the buffer fills up, at which point it enters the periodic on-off phase. Along with this, we reproduced the paper’s Figure 6b (See Figure 5), which is essentially another way of looking at the results of Figure 6a. This figure shows that the interval between requests increases to approximately four seconds once the buffer fills, which is the reason for TCP’s on-off phase. Our results matched the results from the Netflix environment, which led us to conclude that these details were not the sources of the difference between Netflix and the custom clients.

Next, we attempted to reproduce Figure 8, which compares the congestion window of the video flow and the competing flow. This is meant to show that the short bursts of TCP traffic in the video flow are unable to get the congestion window up to a reasonable size, which lets the competing flow overpower it. We were unable to reproduce this figure exactly because of difficulties in getting tcp_probe to monitor multiple ports at once. However, we were able to monitor the congestion window of the video flow and verify that it enters slow start and never maintains a high window. This led us to conclude that the drop in cwnd caused by the on-off phase was not the source of the difference in results.

Finally, we tried to reproduce Figure 5, which shows what happens when the video is forced back up to 1750 kbps (the maximum rate) after it drops for some time. Normally, the drop in video rate is what lowers the perceived throughput, which in turn causes the video rate to drop even lower, and so on. This figure is supposed to show that when the video rate is forced back up to the full rate, the throughput stabilizes at a point where it can sustain this higher rate. In other words, forcing the rate back up to 1750 kbps breaks the downward cycle and improves the video rate.

However, we were not able to reproduce this result. In our attempt, there was no noticeable difference before and after we forced the rate up to 1750 kbps. This suggests that there is an inconsistency between the Netflix client and the custom client in the effect of segment size on bandwidth. To verify this further, we tried downloading segments at each rate to see the effect that the video rate has on bandwidth. Our results (shown below, in kbps) show that the difference in bandwidth between the playback rates is not significant: the difference between the highest rate and the lowest rate is around 10%.

[4038.4, 4207.8, 4365.2, 4441.8, 4543.1, 4594.6, 4594.7, 4645.3, 4692.7]
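For reference, this is roughly how such a per-rate measurement can be timed; fetch is a hypothetical stand-in for the client's blocking segment download, and 4-second segments are assumed.

import time

def measure_throughput_kbps(fetch, rate_kbps, segment_seconds=4):
    # Download one segment at the given video rate and return the
    # observed throughput in kbps. Higher rates mean bigger segments,
    # so any effect of segment size on throughput shows up here.
    segment_bits = rate_kbps * 1000 * segment_seconds
    start = time.time()
    fetch(segment_bits)
    return segment_bits / (time.time() - start) / 1000.0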

Platform

We used Mininet running on an EC2 c1.xlarge instance to reproduce the results. The topology is simple and the link speeds are reasonable, so there were no resource issues running the simulations on EC2. No parameters affected reproducibility, and we could reproduce the authors’ simulated results very closely.

Our setup used a 4 ms RTT with a 120 kbit buffer size.
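A plausible Mininet reconstruction of this setup is sketched below; the host/switch layout is our guess, with 1 ms of delay per link direction giving the 4 ms RTT, a 5 Mbps bottleneck as in the figures, and a 10-packet queue (10 x 1500 B = 120 kbit) matching the maxq default in run.sh.

from mininet.net import Mininet
from mininet.topo import Topo
from mininet.link import TCLink

class StreamingTopo(Topo):
    # h1 hosts the video server (and, we assume, the competing flow's
    # source); h2 is the video client. Both links share the 5 Mbps
    # bottleneck rate, so the switch-to-client link is the bottleneck.
    def build(self):
        h1 = self.addHost('h1')
        h2 = self.addHost('h2')
        s1 = self.addSwitch('s1')
        self.addLink(h1, s1, cls=TCLink, bw=5, delay='1ms', max_queue_size=10)
        self.addLink(s1, h2, cls=TCLink, bw=5, delay='1ms', max_queue_size=10)

if __name__ == '__main__':
    net = Mininet(topo=StreamingTopo(), link=TCLink)
    net.start()
    # client and server processes would be spawned here via popen
    net.stop()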

README

1. Launch an EC2 c1.xlarge instance on aws.amazon.com. Follow instructions here: http://web.stanford.edu/class/cs244/ec2.html.

2. SSH into the c1.xlarge instance.

3. Clone the git repository there.

$ git clone https://@bitbucket.org/clavelli/cs_244_assignment_3.git

4. Inside the folder, run the script.

$ sudo ./run.sh

NOTE: The following parameters can be tweaked in the script (the number in parentheses is the default):

  • max queue size at bottleneck link: maxq (10)
  • duration of test: t (800)
  • start time of competing flow: c (300)
  • number of video seconds that playback buffer holds: p (240)

5 responses to “CS244 ’15: CONFUSED, TIMID, AND UNSTABLE: PICKING A VIDEO STREAMING RATE IS HARD.”

  1. One logistical note: It would probably be good to list the name of a particular AMI image to use.

    The natural thing to do, not knowing anything, would be to use the “CS244-Spr15-Mininet” (ami-cba48cfb) image from the CS 244 instructions you linked to, and this turned out to significantly change the results. Since this image can’t be used with a c1.xlarge, it was run with c3.2xlarge. It is interesting to note that the Figure 4a produced by this setup is closer to the original paper’s graph than yours.

    Since the OS of this image (Ubuntu 14.04) was not out at the time the paper was written (the paper doesn’t say what OS they used), it probably wasn’t reasonable to use this image, but it suggests that there may be some OS-layer network factors or even just instance size factors that can significantly impact the results.

    When running using an image with Ubuntu 12.10 (CS244-Win13-Mininet, ami-7eab204e), the graphs were much closer, though two of the graphs (the ones for Figure 6a and Figure 6b of the paper) were not graphed over the same time scale shown in the blog post. You mentioned that this was due to some last-minute issues with the graphing code, so it is just graphing the entire experiment rather than a select portion. The portions that the graphs share appear to match well.

    Overall grade: 4/5

    • As you mentioned, the AMI was listed in the link from step 1 of the README. Step 2 said to use the c1.xlarge.

      Our main objective was to reproduce the paper’s Figures 4a and 5. If you run on c1.xlarge per the README, you get nearly the same graph as posted. You’re right that the results on c3.2xlarge looked closer to the paper’s 4a. We should have tried that at the end, but there was an issue early on with using c3.2xlarge, so we switched to c1.xlarge and did not get to check back. The rest of the graphs were part of an investigation to understand why we did not see the downward spiral; they were not part of our original reproduction target, so we did not focus on perfecting them. Additionally, as you noted, we ran into last-minute issues graphing them while polishing our main graphs, Figures 1 and 3. We didn’t realize it would count against us to try to reproduce more graphs if we could not present them perfectly; otherwise, we could easily have left them out, as they were not required.

      • According to the email from Jason, you guys used the older AMI image, CS244-Win13-Mininet (ami-7eab204e), which is not mentioned anywhere in the link for Step 1 (that was given to us via a Piazza post for Project 2). The AMI image listed in the link literally cannot be run as a c1.xlarge due to some virtualization differences – Amazon wouldn’t even let me try.

        As far as the graph grades go, we were just going by the literal definitions of the grades given on Piazza, and that seemed the best fit. It didn’t really say what was optional or not. I’m sure the TA’s final grade will take that into account more appropriately than ours.

