We examine Modeling and Performance Analysis of BitTorrent-Like Peer-to-Peer Networks, one of the most highly cited papers on the BitTorrent protocol. The paper’s major contribution was showing that the influx and outflow of seeds and peers can be captured by a simple fluid model. The authors conducted their first experiments in a ‘simulated BitTorrent-like network’. We reproduce those experiments in a self-contained BitTorrent network emulated in Mininet. Because the paper’s experiments were run in a simulation, we test whether actual BitTorrent clients running in an emulated network still follow the fluid model.
The original paper set out to develop a model describing the performance, scalability, and efficiency of BitTorrent, a Peer-to-Peer (P2P) protocol designed for file sharing. In particular, the authors wanted a model that characterizes the number of seeders and peers in the network for a particular torrent file, since these numbers determine how well the file sharing performs.
This problem is important because BitTorrent was a radical change from existing file transfer protocols. Characterizing its efficiency and performance in different scenarios was therefore key to understanding how well the system behaves and how it compares to traditional file transfer. Although people had a general understanding that BitTorrent likely scales well and is efficient, a model that describes BitTorrent’s behavior is useful both for understanding the system and for exposing the strengths and weaknesses of the protocol for future research.
The authors of the paper performed three experiments to validate their fluid model against the behaviors of the BitTorrent network.
A private BitTorrent-like network is simulated, with all the client nodes and tracker servers controlled fully by the authors. All network characteristics such as the bandwidth of each peer, rate of download aborts, and the influx/outflow of seeds/peers are determined by a Markov model, with the initial parameters left constant. The rate at which downloaders and seeders leave the system is the same in this experiment. The authors found that the normalized number of seeders and downloaders followed the behavior predicted by their simple fluid model very closely.
The second experiment is exactly the same as the first, except that the rate at which seeds leave the system is increased. This results in the uploading bandwidth becoming the main bottleneck. The fluid model fits the results of the first and second experiments well, with the best fit coming from simulations where the arrival rate of new peers is low.
A seed is a node with a fully downloaded file willing to share fragments with peers. The authors introduced a seed into a real life BitTorrent network and studied the influx/outflow of seeds and peers. The number of seeds in the network at any given time was characterized well by the fluid model. The number of downloaders in the network at any given time was characterized less well, but within the bounds suggested by the 95% confidence interval.
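For reference, the simple fluid model that these three experiments validate is usually written as the pair of differential equations below. This is a restatement from memory of the paper's notation rather than a quotation, so the symbol names should be checked against the paper: x(t) is the number of downloaders, y(t) the number of seeds, λ the arrival rate of new peers, θ the rate at which downloaders abort, γ the rate at which seeds leave, c and μ the per-peer download and upload bandwidth (in files per unit time), and η the effectiveness of file sharing between downloaders.

```latex
\begin{aligned}
\frac{dx}{dt} &= \lambda - \theta\,x(t) - \min\{\,c\,x(t),\ \mu\,(\eta\,x(t) + y(t))\,\} \\
\frac{dy}{dt} &= \min\{\,c\,x(t),\ \mu\,(\eta\,x(t) + y(t))\,\} - \gamma\,y(t)
\end{aligned}
```

The min term is the total rate at which downloads complete: it is limited either by the aggregate download bandwidth of the downloaders or by the aggregate upload bandwidth of downloaders and seeds, whichever is smaller.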
Subset Goal/Motivation and Extensions
Our goal is to see if an emulated (rather than simulated) BitTorrent network follows the fluid model as shown in experiments 1 and 2 in the paper. We chose these experiments to reproduce because Mininet allows us to achieve the real-world behavior of experiment 3 while maintaining control of the network as in experiments 1 and 2. We originally planned on directly reproducing experiment 3, but we realized that we would need to offer a very popular Torrent file on the internet, which is something that we don’t have. We also considered examining the tracker for such files on the internet, but it did not seem possible to get the complete server logs for these, which is what we would need to recreate the graph in experiment 3.
We chose to use Mininet on an EC2 machine for our experiments. We chose Mininet because we were familiar with the system, and it seemed relatively easy to set up a BitTorrent system inside Mininet. We chose EC2 because it allowed us to emulate larger topologies than our laptops could handle. We were able to rerun our experiments several times and see similar behavior. We think the power of the machine that is emulating the network will affect the reproducibility of our results. If the machine is not very powerful, then it may not be able to sustain the throughput required for our experiments.
Our experiment involved setting up a topology in Mininet, creating a tracker, and then launching multiple BitTorrent clients in the Mininet topology to emulate a real BitTorrent network. To pick our topology, we drew inspiration from the backbone network of the US. We wanted to model a network where there was a cluster of clients in San Francisco, Los Angeles, and Seattle.
US AT&T Backbone Network
To model this scenario, we created 3 switches in Mininet, each representing a “city” or backbone node. We also made the delay of one of the links twice as long as the other, to model the increased distance between San Francisco and Seattle. A diagram of our setup is reproduced below.
Figure 2. Mininet Topology
We run our experiment until 200 unique clients have entered the system. Like the authors, we model client arrivals as a Poisson process, and we model the departure of seeders and downloaders as a Poisson process as well. We do this by drawing the interarrival and interdeparture times from an exponential distribution. We poll our tracker server every 5 seconds to obtain the current number of downloaders and seeders in the network. Then, we run the simple fluid model with the same parameters and compare its predictions with the counts reported by our tracker.
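The arrival schedule described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the experiments: the function name `poisson_event_times` and the seed value are hypothetical, and the rate of 0.1 arrivals per second is chosen only as an example. It relies on the standard fact that the gaps between events of a Poisson process with rate r are independent exponential variables with mean 1/r.

```python
import random


def poisson_event_times(rate, count, seed=None):
    """Return `count` event times (in seconds) of a Poisson process.

    `rate` is the expected number of events per second. Gaps between
    consecutive events are drawn from an exponential distribution.
    """
    rng = random.Random(seed)  # private generator so runs are repeatable
    t = 0.0
    times = []
    for _ in range(count):
        t += rng.expovariate(rate)  # exponential gap with mean 1/rate
        times.append(t)
    return times


# Example: schedule 200 client arrivals at an assumed rate of 0.1 per second.
arrivals = poisson_event_times(rate=0.1, count=200, seed=1)
print(f"scheduled {len(arrivals)} arrivals, last one at {arrivals[-1]:.0f} s")
```

The same helper could be reused for departures by drawing one exponential lifetime per client with the relevant departure rate. With a rate of 0.1 per second, 200 arrivals take about 2000 seconds on average.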
While implementing our BitTorrent experiment, our biggest conceptual challenge was understanding the type of experiment the authors performed. We initially thought that by “simulation of a BitTorrent-like network,” the authors meant they ran BitTorrent clients in a simulated network. However, after discussing the paper with Keith, we realized that the authors did not actually run BitTorrent clients, but rather simulated their behavior. A pleasant side-effect of this realization was that our experiment was somewhat novel. We were able to see if an emulated network with actual BitTorrent clients (which we could control) matched the fluid model, which is an experiment that is closer to the behavior of real BitTorrent networks.
The largest implementation challenge we had was finding a suitable BitTorrent client. We needed a client that we could control via the command line, and that we could run multiple instances of on a single machine without them sharing state. Initially, we tried the Transmission client. However, because Transmission runs as a daemon, each new client that attempted to download the torrent talked to the same daemon, which would see that the torrent was already downloaded and refuse the download. As a result, we narrowed our search to clients that did not run as daemons. We then tried aria2c, which almost worked except that the client would not act as the ‘initial seeder.’ That is, it would not offer to seed the torrent without having downloaded it first, which is exactly what the first peer in the network that offers the file has to do. We finally settled on ctorrent, which met all of our constraints.
Differences In Experiment Implementation
The authors were able to get 500-600 downloaders and seeders in their simulation. However, since we are running a BitTorrent network emulated in Mininet, our EC2 machine can only sustain ~15 downloaders and seeders at a time, each with 2Mb/s of bandwidth. The authors ran the experiment for 3.5 days, while we ran ours for about 40 minutes due to time constraints.
The first experiment involved setting the rate at which downloaders and seeders leave the system to be the same. We set the arrival rate λ=0.1, and the departure rates of both the seeders and downloaders to 0.01. From the clients’ torrent logs, we measured each client to have a download speed of about 2Mb/s and an upload speed of about 0.4Mb/s. We shared a 30.4Mb torrent file among the peers. We polled the tracker for the number of seeders and downloaders every 5 seconds. We reran the experiment multiple times, and present two runs below:
Figure 3. Experiment 1 Run 1
Figure 4. Experiment 1 Run 2
As you can see, while the fluid model follows the seeders moderately well, it has more difficulty modelling the number of downloaders, similar to the paper’s results in Experiment 3. In particular, we found that the fluid model consistently underestimated the number of downloaders in the system. We believe that one reason is the variance in the random arrivals/departures, and possibly variance in the performance of the emulated Mininet network/EC2 machine. That is, there are periods where the number of seeders drops in both graphs, either due to a cluster of random departures of seeds, or due to intermittent low bandwidth in the network preventing new seeds from joining. These periods seem to “queue up” downloaders, leading to an increase in the number of downloaders that the fluid model doesn’t predict. Examples are visible in Figure 3 at 1000 seconds and in Figure 4 at 250 seconds.
The second experiment differs from the first only in that the seed departure rate is increased from 0.01 to 0.05.
Figure 5. Experiment 2
As in Experiment 1, the fluid model predicts seeders more closely than it does downloaders, and the number of downloaders increases rapidly when the number of seeders is low (see the period between 1250 and 1500 seconds).
Fluid Model Predictions
In both our experiments, we found that the fluid model predicted the number of seeders well, even in the face of high variability. However, after the first large drop of seeders, the fluid model fails to predict the number of downloaders well.
A large drop in seeders in the network results in a large increase of downloaders. With fewer seeds, the aggregate seed upload rate is much smaller, so downloaders stay in the system longer. Because the arrival rate of downloaders remains the same, the number of downloaders in the system grows. Unfortunately, the fluid model is not sensitive to sudden changes in the number of seeders, and this leads to a large prediction error: its downloader prediction remains constant when in reality the number of downloaders increases. The authors’ simulated BitTorrent system shows the fluid model fitting a lot better because there is little variability in the number of seeders and downloaders. This allows their network to reach the steady state characterized by the fluid model.
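To make the steady-state behaviour concrete, here is a minimal sketch that integrates the simple fluid model with forward Euler steps. It is an illustration under stated assumptions, not the script used to produce the figures. The equations are the fluid model as commonly stated (x downloaders, y seeds). The function name `fluid_model`, the starting state of one seed and no downloaders, the choice η = 1, and the conversion of the measured 2Mb/s and 0.4Mb/s into files per second by dividing by the 30.4Mb file size are all assumptions made for this example.

```python
def fluid_model(lam, theta, gamma, c, mu, eta=1.0, t_end=2000.0, dt=0.1):
    """Forward-Euler integration of the simple fluid model.

    x = number of downloaders, y = number of seeds.
    Returns a list of (t, x, y) samples, one per time step.
    """
    x, y, t = 0.0, 1.0, 0.0  # assumed start: no downloaders, one initial seed
    samples = [(t, x, y)]
    for _ in range(int(round(t_end / dt))):
        # Download completions are limited by download or upload capacity.
        completions = min(c * x, mu * (eta * x + y))
        dx = lam - theta * x - completions
        dy = completions - gamma * y
        x = max(x + dt * dx, 0.0)  # populations cannot go negative
        y = max(y + dt * dy, 0.0)
        t += dt
        samples.append((t, x, y))
    return samples


FILE_SIZE = 30.4        # Mb, size of the shared file
c = 2.0 / FILE_SIZE     # assumed download capacity in files per second
mu = 0.4 / FILE_SIZE    # assumed upload capacity in files per second

for label, gamma in (("experiment 1", 0.01), ("experiment 2", 0.05)):
    t, x, y = fluid_model(lam=0.1, theta=0.01, gamma=gamma, c=c, mu=mu)[-1]
    print(f"{label}: x = {x:.2f} downloaders, y = {y:.2f} seeds at t = {t:.0f} s")
```

Under these assumed conversions, the integration settles near 1.3 downloaders and 8.7 seeds for the first parameter setting and near 3.6 downloaders and 1.3 seeds for the second, after which both curves stay flat. Because the model is deterministic, it has no way to react to a random burst of seed departures, which is the insensitivity described above.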
Comparison with Third Experiment
In their 3rd experiment, the authors ran their fluid model against an unpopular real world torrent. We notice that the variance in the number of seeds and downloaders matches the variance observed in the BitTorrent networks we ran. We also notice that the fluid model suffers from the same insensitivity to this variance, and does not predict the number of seeders and downloaders very well.
In particular, we bring attention to the first large drop in seeders at t = 800min. Left unchanged, the fluid model would not have predicted the subsequent number of downloaders very well. However, for the time between 800min and 1300min, the authors let λ and γ change linearly, justifying their choice by citing their tracker logs. After the modification is made, their fluid model conveniently fits the data a lot better.
However, we see the same time-varying behavior in our experiments. This is in spite of the fact that our experiments hold λ and γ parameters constant, since we controlled precisely when downloaders arrived and when seeders left. We therefore wonder if there is some other time-varying factor missing from the fluid model.
In conclusion, we reproduced experiments 1 and 2 from the paper, except that we ran our experiments on a BitTorrent network emulated in Mininet. We observed results that were closer to the authors’ findings in experiment 3, where they ran the fluid model against a real world BitTorrent network. We believe that the discrepancy arises because the fluid model is not very sensitive to variability in seeders/downloaders, leading to high prediction errors in systems where there are large fluctuations. The authors’ simulated results contain low variability, whereas our Mininet BitTorrent network and the real world BitTorrent network have significantly higher variability. The authors were able to get their fluid model to match their real-world trace by time-varying certain parameters. However, even though we controlled these parameters (and held them constant) in our experiments, we were not able to match our observed downloaders with what the fluid model predicts. Therefore, we wonder if there is a separate time-varying parameter missing from the fluid model.
To see the graphs, you should enable X-forwarding when ssh’ing in to the EC2 machine. Actually running our experiments is simple. Go into the torrent_sim directory and run:
Once you do this, Mininet will create a Topology of 200 clients; THIS CAN TAKE A WHILE. Please be patient while it does so. Then, it will run the experiment until all clients have arrived into the network. This will take approximately 30-40 minutes. You can see the progress so far in your stdout. You can also poll the current number of downloaders and seeds in the network by running
tail -f results/stats
Once the experiment ends, it will run the fluid model. This is where you need X-forwarding. It will display 2 graphs; one for the seeders and one for the downloaders. That’s it!