Bryan Cheng (bbch at stanford)
Calvin Tuong (ctuong at stanford)
Singla et al. present Jellyfish, a data center topology with a large amount of randomness built into it. In the paper, it is specifically compared to a fat-tree topology. The authors’ goal is to show that Jellyfish can support at least as many servers as a fat-tree with the same capacity and equipment, and that it is much more flexible, allowing the network to be easily expanded. The flexibility comes from the fact that Jellyfish is random – there isn’t any structure that must be maintained when extending the network (i.e. adding more servers or switches). Additionally, Jellyfish is expected to have high bandwidth: in a random graph, the average path length between servers is shorter than in a fat-tree, and shorter path lengths lead to higher throughput.
As we’ve seen in class, data center optimization is only getting more and more important with the growth of companies like Google, Facebook, etc., which pour an enormous amount of money into their data centers. We have seen examples of optimizations in the transport protocol, such as DCTCP. We have also seen VL2, which is an optimization on data center topologies. The authors of the Jellyfish paper specifically note that organizations and companies have trouble expanding their existing data centers, and this is a motivating factor behind their design.
The algorithm to create a Jellyfish topology is extremely simple. Given a set of hosts and switches, connect each host to a distinct switch. The connections between switches are determined by randomly choosing two switches that aren’t already directly connected and have free ports and then connecting them. This is done until no more links can be added. At the end, assuming we have enough switches and ports to connect the graph, there may be switches that have one or more free ports. If a switch s has more than one free port, we can randomly choose an existing link (s_1, s_2), remove it, and replace it by two links (s, s_1) and (s, s_2). This removes two free ports from s, and the process can be repeated until s has at most one free port. (This process is known as incremental expansion.) This creates a topology in which the switches are completely randomly connected. Adding new nodes to this topology can be done easily by following a process similar to incremental expansion.
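The construction described above can be sketched in a few lines of Python. This is a simplified stand-alone sketch, not the paper’s or our actual code; the function name `jellyfish` and its parameters are our own:

```python
import random

def jellyfish(num_switches, ports_per_switch, hosts_per_switch, seed=0):
    """Build a random Jellyfish-style switch graph. Each switch reserves
    hosts_per_switch ports for servers; the rest are wired randomly."""
    rng = random.Random(seed)
    free = {s: ports_per_switch - hosts_per_switch for s in range(num_switches)}
    links = set()  # undirected links stored as sorted (a, b) tuples

    def candidates():
        # non-adjacent switch pairs where both endpoints have a free port
        return [(a, b) for a in range(num_switches)
                for b in range(a + 1, num_switches)
                if free[a] > 0 and free[b] > 0 and (a, b) not in links]

    # randomly link eligible pairs until none remain
    pairs = candidates()
    while pairs:
        a, b = rng.choice(pairs)
        links.add((a, b))
        free[a] -= 1
        free[b] -= 1
        pairs = candidates()

    # incremental expansion: a switch s with >= 2 free ports replaces a
    # random existing link (s1, s2) by the two links (s, s1) and (s, s2)
    for s in range(num_switches):
        while free[s] >= 2:
            stealable = [(s1, s2) for (s1, s2) in links
                         if s not in (s1, s2)
                         and tuple(sorted((s, s1))) not in links
                         and tuple(sorted((s, s2))) not in links]
            if not stealable:
                break
            s1, s2 = rng.choice(stealable)
            links.remove((s1, s2))
            links.add(tuple(sorted((s, s1))))
            links.add(tuple(sorted((s, s2))))
            free[s] -= 2
    return links
```

With the parameters we use later (20 switches, 4 ports each, 1 host port per switch), this yields a random 3-regular-ish switch graph.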
Given this algorithm, it is easy to see why Jellyfish is much more flexible than a fat-tree. Additionally, path lengths in the random graph tend to be shorter than in a fat-tree, where flows must travel up and down the tree to fully cross it. These characteristics suggest that Jellyfish could offer some performance improvements over a fat-tree. Because the topology is random, it is important to test different performance metrics to make sure that, on average, there actually is an improvement over a fat-tree.
Results from the Paper
Overall, the authors found that Jellyfish performed just as well as or better than a traditional fat-tree topology in all of their different metrics. These metrics include average path length, average throughput, fairness, and number of servers supported, among other things. At the same time, Jellyfish provides much more ease of expansion, which, the authors argue, makes it a viable choice of topology in real data centers.
Our goal in this project is to replicate the results of figure 9 and part of table 1 from the paper. Figure 9 shows the path diversity provided by ECMP routing and k-shortest-paths routing in a Jellyfish topology. Specifically, it shows that with k-shortest-paths, links are generally part of many more distinct paths than with ECMP. This is an important result because higher path diversity makes it possible to utilize more of the network’s capacity and to avoid congestion. The figure shows that k-shortest-paths is a much better routing algorithm to use on Jellyfish than ECMP (the authors also confirm this by running packet simulation tests on Jellyfish with both algorithms). This result allows the authors to conduct their other experiments with k-shortest-paths instead of ECMP on Jellyfish.
Table 1 shows the results of running some packet simulations on fat-tree and Jellyfish. For this project, we look specifically at the case when TCP is used as the transport protocol and not at MPTCP. (Aside: while the MPTCP case might be more relevant because the average throughput is higher (and thus MPTCP is what would probably be used in practice), our replication of table 1 came in addition to our initial replication of figure 9, so we did not have time to explore and configure MPTCP.) The table reveals that using k-shortest-paths on top of TCP with Jellyfish gives us the same average throughput as running ECMP on fat-tree. This is one step forward in showing that Jellyfish achieves about the same performance as fat-tree.
We replicate figure 9 by running an analysis on a Jellyfish topology with 16 servers, 20 switches, and 4 ports per switch. We also have a parameter p which decides approximately what percentage of all node pairs we should be routing between. Based on p, we randomly decide which pairs of servers should have traffic routed between them. Then, we compute paths between these pairs based on the routing algorithm that we are analyzing (either k-shortest-paths, with k = 8, or ECMP). We look at the links on each path and count the number of paths that a given link appears on. Finally, we sort this list of links in order of path diversity (i.e. in order of increasing number of paths that the link appears on).
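The per-link counting step of this analysis can be sketched as follows. This is a simplified illustration; `count_link_paths` is a hypothetical helper, not our actual script:

```python
from collections import Counter

def count_link_paths(paths):
    """For each undirected link, count the number of distinct paths it
    appears on, and return the counts sorted in increasing order --
    the ranking plotted along the x-axis of figure 9."""
    counts = Counter()
    for path in paths:
        # each consecutive pair of switches on a path is one link
        for a, b in zip(path, path[1:]):
            counts[tuple(sorted((a, b)))] += 1
    return sorted(counts.values())
```

Feeding in the k-shortest paths (k = 8) versus the ECMP paths computed for the same sampled server pairs, and plotting the two sorted lists, reproduces the shape of figure 9.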
Each of the graphs that we generated shows that k-shortest-paths offers greater path diversity than ECMP when used with Jellyfish. These results agree with the results presented in the paper. However, in the paper, k-shortest-paths shows a clear advantage over ECMP while in our results, it only seems to be slightly better. We believe that this is because our topology is significantly smaller (the reason for this is discussed in the Challenges section). Because there are fewer paths overall in a smaller topology, we should expect the differences between the two routing algorithms to be smaller. Additionally, with smaller p, the differences should be even smaller because the total number of paths is very small.
To conduct our throughput test, we start an iperf server at every host, so each host is both a sender and a receiver. We use a random permutation to assign to each server another server to send to. Each server/client pair then establishes the appropriate number of flows (depending on the experiment) between them. We let the flows run and measure the throughput at each host. We run the experiment 3 times and average the throughputs over the runs for each host. (Note: the paper runs each experiment 5 times, but our total runtime is already very long with 3 runs each; increasing to 5 runs would push our runtime to almost 2 hours.) The output is the average throughput over all servers and runs.
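The pairing step can be sketched as follows (illustrative only; the function name and host labels are our own, not the experiment scripts):

```python
import random

def permutation_traffic(hosts, seed=0):
    """Map each host to a distinct destination so that every host both
    sends and receives exactly one flow set, and no host sends to itself."""
    rng = random.Random(seed)
    dests = list(hosts)
    while True:
        rng.shuffle(dests)
        # accept only a derangement: no host paired with itself
        if all(h != d for h, d in zip(hosts, dests)):
            return dict(zip(hosts, dests))
```

Each (host, destination) entry in the returned mapping then becomes one iperf client/server pair.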
|             | Fat-Tree, ECMP | Jellyfish, ECMP | Jellyfish, k-shortest-paths |
|-------------|----------------|-----------------|-----------------------------|
| TCP 1 Flow  | 36.9%          | 38.5%           | 30.4%                       |
| TCP 2 Flows | 34.1%          | 41.3%           | 30.2%                       |
| TCP 8 Flows | 25.5%          | 38.6%           | 25.5%                       |
Our results demonstrate that Jellyfish with ECMP is typically better than fat-tree, while Jellyfish with k-shortest-paths performs roughly the same as fat-tree. In the paper’s results, for 1 TCP flow, ECMP on Jellyfish was the best by about 20%, while for 8 TCP flows, k-shortest-paths on Jellyfish and ECMP on fat-tree were about 25% better. However, the authors conducted these experiments with 780 servers for Jellyfish and 686 servers for fat-tree. Due to the limitations of Mininet, and despite running it on an xlarge instance, we could only simulate a small fraction of that scale.
For the 1 flow case, our results are proportionally similar to the paper’s. For the 8 flow case, however, we believe the discrepancy comes from our small number of servers. Contrary to the result in the paper, we see that ECMP achieves slightly better throughput than k-shortest-paths, even when 8 TCP flows are used. As the authors state in the paper, the benefits of k-shortest-paths come from the much greater path diversity it offers over ECMP. But intuitively, and as our path diversity results above also show, k-shortest-paths does not have much of a path diversity advantage over ECMP when the topology is as small as ours. In our small simulation there are only a limited number of paths either way, so we cannot fully take advantage of the topology’s properties. For example, with k-shortest-paths (k = 8) in a full-size data center, the 8 shortest paths might all be about the same length, but in our topology their lengths could vary by a factor of 2-3. In fact, there might not even be 8 distinct paths between two hosts, so the advantages of k-shortest-paths cannot be fully realized. Thus, we don’t expect ECMP and k-shortest-paths to perform very differently even with 8 flows; if we could increase our topology size, we would expect the differences to show.
One last observation we made was that the average throughput for 1 and 2 flows was about the same, but with 8 flows, the throughput drops off in all cases. One possible reason is that even an xlarge instance on EC2 could not handle the computational intensity of running 16 sets of 8 flows. If this were the case, it would simply be a limitation of our environment.
We ran into several major challenges in our replication of these results. The first is that the paper is generally unclear about the exact parameters used to run the experiments and generate the results. For example, for figure 9, the authors state that they used “random permutation traffic at the server-level,” but they do not mention how many node pairs were included in these random permutations or their exact methodology for generating this traffic. To resolve this, we run several experiments varying a parameter p, which represents the probability that a given node pair will have traffic routed between it. By using p = 0.25, 0.5, 0.75, and 1, we obtain results for when roughly one-fourth, half, and three-fourths of the node pairs have traffic between them, and also for when we enumerate all possible node pairs.
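Our interpretation of p can be sketched as follows (a hypothetical helper reflecting our assumption; the paper’s actual traffic generation method may differ):

```python
import random

def sample_pairs(nodes, p, seed=0):
    """Include each unordered node pair independently with probability p;
    p = 1 enumerates every possible pair."""
    rng = random.Random(seed)
    return [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]
            if rng.random() < p]
```

On average this selects a fraction p of the n(n-1)/2 possible pairs, which is what the p = 0.25, 0.5, 0.75, and 1 experiments rely on.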
Another instance of this challenge is the specification of the topology sizes used in their experiments. While they state that they used a Jellyfish topology with 686 or 780 servers, the number of switches and the number of ports per switch do not seem to be specified. The other (related) challenge is that we could not possibly run Mininet with that many servers anyway. We therefore scaled down our topologies, running with 16 servers, 20 switches, and 4 ports per switch for Jellyfish, and k = 4 for fat-tree (though the size of the fat-tree is never actually specified in the paper either). Given our restricted setup, we believe that we replicated the results to the best of our ability.
We chose to use Mininet and RipL-POX. RipL-POX was a clear choice because it was designed specifically for running experiments on data center topologies, and fat-tree and ECMP were already built into its code. Because RipL-POX runs together with Mininet, Mininet was the natural companion; after the first two assignments, we also found Mininet easy to use, and it fit well with our need to run TCP simulations over our topology.
Instructions for Setting Up and Running the Experiments
1. Create a new instance in US West (Oregon)
2. Choose the “CS244-Win13-Mininet” AMI
3. Change the instance type to “c1.xlarge”
- Note that if you would like to view the graphs by spawning an HTTP server on the instance, you have to open port 8000 (or some other known port)!
4. Launch the instance
5. ssh into your new instance
6. Run “git clone https://github.com/ctuong/CS244.git”
7. Run “cd CS244”
8. Run “sudo sh install.sh”
9. Run “sudo sh run.sh”
- The experiments may take about an hour to run, so you might want to use something like screen or tmux to keep it going
10. Graphs result-p.png will be created, where p = 0.25, 0.5, 0.75, and 1
11. result-throughputx.txt will be created, where x = 1, 2, and 8
Code is available at https://github.com/ctuong/CS244
Feedback on Mininet and RipL-POX
We found the software useful for performing these simulations. It was inconvenient, though, that all of these software packages are disjoint: we had to start Mininet and then separately start POX to run our tests. Baking the packages into one standalone bundle that could be launched all at once would make it easier to get started.
For Mininet, we would also like more documentation, in particular an explanation of what each function parameter means and what the default values of the various parameters are.