CS244 ’17: BitTyrant: Do incentives build robustness in BitTorrent?


by Andy Bartolo (bartolo where stanford dot edu) and Dinislam Tebuev (tebuevd where stanford dot edu)

 

Introduction

In the early 2000s, Bram Cohen’s BitTorrent protocol took the world by storm. His seminal 2003 paper, “Incentives Build Robustness in BitTorrent,” described the principles for a global distributed filesharing system. (In June of 2004, P2P solutions provider CacheLogic reported that BitTorrent traffic comprised one-third of all traffic on the Internet!) Most remarkable about Cohen’s protocol was that, in contrast to a typical client-server setup, BitTorrent download speeds and availability actually increased with an increasing number of downloaders (leechers). The key insight was that downloaders could also serve as uploaders, and, furthermore, they could coordinate amongst themselves to intelligently replicate the rarest parts of a file, increasing robustness. At the heart of the protocol was a “tit-for-tat” strategy, which encouraged peers to upload chunks to those peers who uploaded different chunks back to them.

In 2007, a group led by Tom Anderson at the University of Washington decided to take another look at BitTorrent’s tit-for-tat strategy. Their paper, aptly titled “Do incentives build robustness in BitTorrent?,” suggests that BitTorrent peers may benefit from strategically choosing their active sets (the peers with whom they’re currently exchanging data) and the upload rate with which members of the active set are served. Their modification of the BitTorrent protocol, dubbed BitTyrant, employs three mechanisms to improve download speeds for a strategic peer: reciprocation ratio maximization, active set maximization, and dynamic equal-split adjustment. These are described in depth in Section 2, “BitTyrant Principles.” The overall goal of our work is to replicate and validate the download speed benefits claimed by Anderson et. al.

Today, Netflix and other streaming services are increasingly eating into BitTorrent’s core use case. However, its tit-for-tat strategy is still very useful for deploying multimedia content at scale. For example, startups Peer5 and PeerCDN have created embeddable JavaScript libraries that seamlessly offload a portion of server+CDN requests to other peers who are also downloading the same content. Beyond just reducing costs for providers, Peer5 and PeerCDN can also improve QoS for consumers, e.g., in a setting where many peers within an intranet are downloading the same content, but bandwidth to the Internet at large is relatively constrained. Thus, finding an optimal tit-for-tat strategy is of significant importance for the future of web and mobile content delivery.

 

BitTyrant Principles

BitTyrant modifies the stock BitTorrent tit-for-tat protocol in the following three ways:

  1. Maximize reciprocation bandwidth ratio per connection – it’s to our advantage to “shop around” for peers that give us the greatest ratio of their upload bandwidth in exchange for our upload bandwidth. This gets the best “bang for the buck” for our precious (particularly on an asymmetric ADSL/Cable connection) upload bandwidth.
  2. Active set maximization – it’s to our advantage to increase the size of our active set until the reduced upload rate from existing peers outweighs the benefit of adding another peer. Computational/memory overhead from maintaining a large active set must also be taken into account.
  3. Dynamically reducing equal-split tit-for-tat – we may be able to start off with a high upload-for-upload ratio, and then reduce it, without the peer noticing (reducing its upload bandwidth to us).

One primary observation that the authors make is that, even in stock BitTorrent, altruism (an informal term that indicates a peer’s willingness to upload) is not linearly rewarded. That is, as peers provide greater upload capacity, they receive a sublinear increase in available download capacity.

BitTyrant performs its unchoke exploration phase at 10-second intervals, the same as stock BitTorrent, but dynamically sizes its active set and dynamically adjusts the upload bandwidth it provides to each peer. Here, BitTyrant is simply solving the classical knapsack problem.

The first critical metric for BitTyrant is up, the minimum upload contribution to some other peer P that causes P to upload back to the first peer. BitTyrant adjusts up down by 10% for every three rounds that it remains unchoked, and increases up by 20% after transitioning from unchoked to choked. The second critical metric is active set size.  To guess an optimal active set size, BitTyrant updates a table of equal-split distributions, which tell it when it is “spreading itself too thin” (i.e., adding another peer would decrease its overall download rate).

 

BitTyrant (claimed) Results and Benefits
(Subset Goal and Motivation)

We chose to focus on download completion time, as this is the primary metric of importance for BitTorrent downloads. However, there are other metrics which bear some interest, e.g., 1.) throughput robustness to peer churn, with many nodes entering and leaving the network at the same time, and 2.) availability/survivability, i.e., the point at which there are just enough pieces of the file throughout the network for it to be recovered in full. Nonetheless, the limited time frame we had to experiment led us to focus on completion time.

Anderson et. al.’s 2007 paper found that, indeed, BitTyrant gives benefits for the aforementioned two metrics.  Consider the following graph (Figure 10 in the Anderson paper), which provides information on speedups for a single strategic (BitTyrant) peer amongst many non-strategic peers:

Picture1.pngFig. 1.

This graph tells us that less than 10% of real-world swarms saw a performance degradation when using BitTyrant’s download algorithm, and 30% saw a speedup of 2X or more.

These tests were run with a single BitTyrant client in a swarm of 300 to 800 real-world peers. However, we will be simulating a swarm scenario in a cloud computing machine, but the scale (hundreds of standard BitTorrent nodes with a single BitTyrant node) will be replicated.

The graph below shows the benefit of the BitTyrant algorithm across a range of client upload speed caps. Again, this is for a similar single strategic peer setup, but with a synthetic PlanetLab swarm (still one BitTyrant node amongst hundreds of standard BitTorrent nodes). This is the scenario that we chose to replicate (Section 5.1 of the Anderson paper).

Picture2.png
Fig. 2.

 

Our Platform

Unfortunately, the modified Azureus BitTyrant client from the Anderson paper is difficult to work with, primarily because it requires an Xorg GUI to function, and is generally not very scriptable. Since the original client is effectively abandonware, we sought a more modern, scriptable implementation of the original BitTyrant algorithm. Deluge is a popular C++/Python-based BitTorrent client, which uses libTorrent as its backend. libTorrent supports a BitTyrant-inspired choke/unchoke and active set algorithm (see https://github.com/arvidn/libtorrent/blob/master/ChangeLog).

We used Deluge to simulate BitTyrant and non-BitTyrant Azureus, and used Transmission to simulate the 350 peer nodes that are also simultaneously downloading the file. Transmission is an extremely lightweight torrent client whose light CLI facilitated us spinning up 350 instances of it. Resource usage for the swarm was surprisingly low, with around 4GB of RAM in use for the entire simulation. We ascribe this to the efficient C/C++ implementations of Transmission and Deluge.

Because both Deluge and Transmission treat instances with the same IP, but different ports, as separate peers. This allowed us to eschew Mininet entirely, and simply rely upon the BitTorrent tracker to coordinate communication amongst all the nodes participating in the swarm. Furthermore, both Deluge and Transmission allow setting upload caps via command-line, obviating the need to do this via a metered Mininet link.

One particular parameter of interest was the distribution of upload caps for the set of 350 peers. In their PlanetLab simulation, Anderson et. al. use their estimate of a realistic distribution of upload bandwidth limitations for real-world peers. In a nutshell, these upload caps determine how fast the swarm can replicate data amongst its members. Unfortunately, this data was not available in numeric form, and was only given in the paper as the following CDF (the blue line):

Screen Shot 2017-06-05 at 7.58.00 AM.pngFig. 3.

In order to provide our own 350 simulated peers with such a distribution, we used simple binning to approximate the CDF. The result isn’t perfect, but it gives a good representation of the broad categories of upload caps present.

fig2Fig. 4.

Unfortunately, we didn’t have time to evaluate results across a range of swarm sizes (we only evaluated 350), but this would be the next parameter we’d adjust were we to continue our experiments. Our guess is that relative benefits would remain similar, as a single BitTyrant client can only connect to a limited subset of the 350 nodes at a time anyway.

 

Our Results

The original BitTyrant paper used 5MB files within the swarm, and saw download speeds on the order of minutes. We tried using 5MB files, but found that in the high-upload-cap scenarios, the downloads completed within seconds — i.e., too quickly to really assess the effects of the BitTyrant vs non-BitTyrant algorithms. Instead, we used a 30MB file with an initial seeder count of 5.

Recall that the 350 peers continue to use the above “binned” upload cap distribution. In the trials below, we consider the performance of one single additional peer, the “peer in question” (so 350 + 1 total in the swarm). This peer was evaluated with both BitTyrant and non-BitTyrant algorithms, across a range of upload caps, with caps identical to those in Fig. 2. (16, 64, 128, 300, 512, and 1024 KBps).

Here are the results of three distinct trials, using Deluge in BitTyrant and non-BitTyrant mode:

fig1-1.pngfig1-3.png

fig1-2.png

Fig. 5-7.

We can see that, when faced with a low amount of upload bandwidth to work with, BitTyrant actually performs worse than the stock algorithm. However, when given a high upload bandwidth, BitTyrant does perform better than the stock algorithm. Some reasons for this are provided below.

 

Critique

Of course, it’s possible that the libTorrent implementation of BitTyrant isn’t a faithful one. However, we have reason to believe that the benefit of BitTyrant is indeed proportional to its upload bandwidth cap.

Because it aggressively maximizes its active set, it’s likely that BitTyrant, in an effort to connect to as many peers as possible, is splitting its limited upload bandwidth into streams too small to be useful (i.e., the equal-split distribution table used to size the active set is inaccurate). With such a small bandwidth to each peer, the overhead of communicating with the peer outweighs the benefit of the data that it can download.

Another possible explanation for this is active set thrashing. Again, recall that even though BitTyrant chokes and unchokes at the same 10-second intervals as the stock algorithm, it reduces its contribution dynamically, which may cause other peers to disconnect from it, even if it would like to remain connected to them. Overall, this results in more churn in the active set.

We also tried downloading a much larger file (a 700MB Debian CD-ROM), and saw similar behavior, with BitTyrant taking longer to complete for the smaller upload caps, and with a slight speed advantage for high caps.

The original paper maxed out the single peer’s upload cap at 1024 KBps. In the modern day, even mobile networks can reach uplink speeds in excess of 25 Mbps, so this assumption certainly doesn’t always hold (although it could in the developing world). In the future, we’d be interested in evaluating the utility of BitTyrant with “modern” upload caps.

Another consideration for modern-day BitTorrent (and for P2P CDN offload services based on it), is that there will be a number of “supernodes” that can upload at an extremely high rate. For example, these might be dedicated seedboxes hosted in datacenters with very high upload bandwidths, or, for an offload service, the CDN’s servers itself. The point is that BitTyrant has a more aggressive strategy for seeking out these extremely altruistic nodes, and, thus, it will be more likely to find them. Therefore, we do recommend the BitTyrant algorithm in swarms that contain supernodes.

 

Challenges

The single greatest challenge we faced was trying to adapt the abandonware Azureus client to our needs, and, finally, giving up and moving over to the libTorrent implementation. The Deluge interface is relatively nice to work with, but we had to significantly retool our Python data collection scripts in order to use it.

One reason we firmly determined to move away from Azureus was that we could not get VNC or X forwarding working on Google Cloud, despite following Google’s semi-official blog post at https://medium.com/google-cloud/linux-gui-on-the-google-cloud-platform-800719ab27c5. This was what really forced us to move over to libTorrent, which resulted in a cleaner design, but cost us many, many hours.

Deluge requires a plugin in order to manipulate gritty low-level libTorrent config options (like enabling BitTyrant). This plugin was only available through the web interface, not CLI, so our replication steps require that we change it there.

Other challenges included ensuring that all 350 instances of the peer swarm had separate configuration and download directories, so that they wouldn’t step on each other. For each iteration, we had to clean up after each of these instances, ensuring that they began a fresh download session, and didn’t just continue to seed.

We also encountered an apparent bug in Deluge’s “run-script-upon-download-completion” feature. This feature simply did not work, and adjusting permissions and running different commands/scripts had no effect. However, Deluge does support moving a file to a different directory once it’s finished. We verified that the file was, in fact, moved. We ended up implementing a kludge to monitor the completed directory for a new file, and then taking the occurrence of that as the completion time of the download.

 

How to Replicate:

The experiment and data + graph generation run entirely on Google Cloud. Accessing the testbed is very easy: simply shell into the instance by calling ‘ssh cs244@104.196.234.28′. First, email the authors of this blog post to have them start up the cloud instance. They’ll reply to your email with the ssh password.

Once you’ve shelled into the instance, perform the following steps:

  1. Navigate to folder ‘experiment’ within the cs244 user’s home directory.
  2. Run ‘./run.py regular’. This will take a few minutes, and will output a data dump to the ~/logs directory. Don’t worry if you see an error about deluge already running.
  3. Now, you’ll need to change the algorithm from stock to BitTyrant in the Deluge WebUI. Run ./deluge-webui, then, from your laptop, connect to http://104.196.234.28 (port 80). The default password for the interface is “deluge”.
  4. In the top bar of the WebUI, click “Preferences”, then click “ltConfig” under Categories. Scroll down to the entry labeled “choking_algorithm” and change it from “0” to “3”. Click Apply, then OK. You can now kill the ./deluge-webui script in your terminal.
  5. Run ‘./run.py bittyrant’. Again, this will take a few minutes. Once it’s complete, you have all the data points you need.
  6. Run ‘./generate-plots.py’. This takes the data from the logs directory and produces graphs in the ~/plots directory.
  7. Navigate to the ~/plots directory, and run ‘sudo python -m SimpleHTTPServer 80’. Finally, access http://104.196.234.28 (port 80) again. The generated .pngs will be served out of the directory.
  8. Email us if you have any questions!

 

Code + Dependencies for a clean install

If you wish to set up the testbed from scratch, you can use the scripts hosted at http://github.com/andrewbartolo/cs244-pa3. The setup requires at least the following dependencies:

transmission, transmission-cli, and transmission-daemon
python-libtorrent
deluge, deluged, deluge-web, and deluge-console
slimit (a Deluge dependency)
deluge-ltconfig (https://github.com/ratanakvlun/deluge-ltconfig), for enabling BitTyrant mode
numpy
matplotlib

Advertisements

2 responses to “CS244 ’17: BitTyrant: Do incentives build robustness in BitTorrent?

  1. Peer evaluation comment from Koki and Michael:

    Reproducibility score: 5/5

    Overall the instructions were very clear and the scripts ran to completion without any problem. There were just couple minor things that we want to note here in case someone else encounters with the same issues.

    – When we first accessed deluge from webUI, we could not find “ItConfig” under “Categories”. This problem went away when we closed the browser and logged in again so we guess that was nothing really serious.
    – The first time we ran “./run.py”, the script hung and we had to kill it manually (after killing it, terminal prompted us password for sudo). The second time we just ran it with sudo and the script ran to completion. Later we tried again without sudo and this time it worked for us, so we could not really tell if the first time was some sort of accident or not.

    Aside from those minor issues above (which could just be us), the reproduction was successful and the results matched with the original results in the post fairly well. The plots generated are attached below as a reference:

    fig1.png
    https://drive.google.com/file/d/0B6mfhZTiRt4ENklaaFg2Z1JMT0E/view?usp=sharing

    fig2.png
    https://drive.google.com/file/d/0B6mfhZTiRt4EekxERGVnWUpWQzg/view?usp=sharing

  2. Pingback: CS244 ’17: BitTyrant: Do incentives build robustness in BitTorrent? — Reproducing Network Research – Ignore Beat Fail·

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s