Sweet Song (shihui) and Jason Zhao (jlzhao)
Jellyfish is a novel topology for organizing data center networks. Unlike traditional approaches such as fat-trees and other well-defined structures, Jellyfish wires top-of-rack switches into a random graph. The authors show that this approach is more cost-efficient and supports more servers with the same amount of equipment. Furthermore, Jellyfish topologies maintain the same overall throughput as fat-tree networks thanks to the short average path length between nodes.
Motivation & Goals of the Paper
Data centers are an integral part of the Internet architecture, but they serve a unique and tailored function. With the transition to colossal deployments of mass-produced, low-cost commodity hardware, data center development is now a priority for industry giants such as Google, Amazon, and Microsoft. Because many data centers are privately built and operated, a network architect has far more control over a data center than over a wide area network. Singla et al. present Jellyfish as an enhancement to current data center topologies. How servers are interconnected is a fundamental design decision, and done well it can drastically improve a data center's performance and cost. The authors studied the current state of the art and propose a new graph structure that is cheaper to construct while reaching similar, if not better, levels of performance. They support their design with a series of theoretical bounds and network simulations.
Jellyfish uses a random regular graph (RRG) to wire up top-of-rack switches in a data center network. The construction algorithm simply adds links between two random non-neighboring switches until no open ports remain. If open ports remain but no new link can be added, an existing link is removed and the algorithm proceeds. The resulting graph exhibits a low average path length between servers. Consider the following figure:
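The wiring step described above can be sketched as follows. This is our own illustrative implementation, not the paper's code (and not our genGraph.py); function and variable names are invented for clarity.

```python
import random

def norm(a, b):
    """Store undirected links as sorted tuples."""
    return (a, b) if a < b else (b, a)

def random_regular_graph(num_switches, ports, seed=None):
    """Sketch of the RRG wiring step: repeatedly link two random
    non-neighboring switches with free ports; when stuck with free
    ports but no legal pair, break an existing link and rewire its
    endpoints to the stuck switch."""
    rng = random.Random(seed)
    free = {s: ports for s in range(num_switches)}
    edges = set()

    def linkable_pairs():
        open_sw = [s for s in free if free[s] > 0]
        return [norm(a, b) for i, a in enumerate(open_sw)
                for b in open_sw[i + 1:] if norm(a, b) not in edges]

    while True:
        pairs = linkable_pairs()
        if pairs:
            a, b = rng.choice(pairs)
            edges.add((a, b))
            free[a] -= 1
            free[b] -= 1
            continue
        # No legal pair left. If some switch p still has >= 2 free
        # ports, remove a random link (x, y) not touching p and
        # rewire both x and y to p.
        stuck = [s for s in free if free[s] >= 2]
        if not stuck:
            break
        p = rng.choice(stuck)
        choices = [(x, y) for (x, y) in edges
                   if p not in (x, y)
                   and norm(p, x) not in edges
                   and norm(p, y) not in edges]
        if not choices:
            break
        x, y = rng.choice(choices)
        edges.remove((x, y))
        edges.add(norm(p, x))
        edges.add(norm(p, y))
        free[p] -= 2
    return edges
```

The pair recomputation is quadratic per link, which is fine at the scale of a few hundred switches but would need an incremental data structure for much larger graphs.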
The main advantages of Jellyfish are efficiency and flexibility. Most node pairs in a fat-tree network are 6 hops apart, whereas in Jellyfish most are 4-5 hops apart. Not only are paths shorter in Jellyfish, the RRG also offers more distinct paths between end hosts. This improves the fault tolerance of the overall network, though it requires different routing algorithms to realize the full benefit. Incremental growth via the edge-swap construction suits the foreseeable expansion of data centers. Since the proposed design is purely topological, the authors argue that Jellyfish can be readily deployed.
Result to Reproduce
We aim to reproduce the following figure from the original paper. We chose this experiment because it probes one of Jellyfish's core assumptions: path diversity. Jellyfish claims superiority over traditional data center networks because links are efficiently shared among many paths, and we want to verify that this is the case.
The figure shows that, to fully exploit Jellyfish's path diversity, traditional multipath routing algorithms such as equal-cost multipath (ECMP) routing are not sufficient. Under ECMP, over 50% of links appear on 2 or fewer distinct paths between servers, whereas with k-shortest-path routing only around 6% do. With newer transport protocols such as multipath TCP (MPTCP), having more distinct paths can significantly increase the sending server's throughput. We hope to replicate this experiment and verify that ECMP does in fact produce fewer distinct paths than k-shortest-path routing.
We first generated a random regular graph with n switches and k ports per switch. To closely approximate the topology stipulated in the paper's figure, our python script genGraph.py builds a graph of 220 switches with 12 ports each. Then genTraffic.py randomly assigns 686 servers to the switches with uniform probability. The traffic script then creates a random permutation traffic matrix in which each server sends to exactly one other server and receives from exactly one other server. This traffic matrix is passed to genPaths.py to calculate the number of available paths for each flow. To compute the paths, we used BFS to find the k shortest paths between each pair of endpoints. The same algorithm computes the available paths for ECMP, where the cost metric is simply the hop count. We then aggregated the number of times each link is used under three algorithms: 8-shortest paths, 8-way ECMP, and 64-way ECMP. Finally, the links were ranked by the number of distinct paths they are on, and the result was plotted with gnuplot.
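One simple way to realize the path computation above is a best-first search over loop-free partial paths, expanding them in order of hop count; ECMP then just keeps the minimum-hop subset. This is a hedged sketch with invented names, and genPaths.py may differ in detail.

```python
import heapq
from itertools import count

def k_shortest_paths(adj, src, dst, k):
    """Enumerate up to k shortest simple paths (by hop count) from
    src to dst. adj maps each node to a list of neighbors."""
    tie = count()  # tie-breaker so heapq never compares path lists
    heap = [(0, next(tie), [src])]
    found = []
    while heap and len(found) < k:
        cost, _, path = heapq.heappop(heap)
        node = path[-1]
        if node == dst:
            found.append(path)
            continue
        for nbr in adj[node]:
            if nbr not in path:  # keep paths simple (loop-free)
                heapq.heappush(heap, (cost + 1, next(tie), path + [nbr]))
    return found  # sorted by increasing hop count

def ecmp_paths(paths, ways):
    """ECMP keeps only minimum-hop paths, capped at `ways`.
    Assumes `paths` is sorted by length, as returned above."""
    if not paths:
        return []
    best = len(paths[0])
    return [p for p in paths if len(p) == best][:ways]
```

The enumeration is exponential in the worst case, but with small k and the short path lengths typical of an RRG it is fast enough for this experiment.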
To our delight, our plot looks almost identical to the original. The step functions in our output mimic those in the paper's figure, with some rank differences, and the range of our graph closely matches the original's. The minor differences in where the steps occur can be attributed to randomness in graph generation. Interestingly, since the original topology is not fully specified, the actual graph can vary considerably depending on the characteristics of the switches used. For example, with fewer switches and more ports per switch, the number of distinct paths decreases because there are fewer nodes in the network. Our extension to this experiment varies these graph characteristics: the number of switches and ports.
The topology presented in Jellyfish lived up to the authors' claims: a random regular graph can use the same number of switches as a traditional fat-tree and achieve high path diversity between nodes. The paper gives a concise description of the network and the traffic matrix, which allowed us to reproduce the results. However, the authors never analyzed the optimal split between switches and servers. One could densify the network with more switch-to-switch connections and fewer switches, or construct a sparser network with fewer switches and more servers per switch. Both designs lead to path-sharing characteristics different from the authors' original setup, which we explore in the next section. Another concern is how these links will be utilized efficiently. The authors propose MPTCP as the transport protocol to use all of these additional paths simultaneously, but if a link is shared by many paths, as is the case in Jellyfish, contention on all links will be high, and MPTCP may unfairly punish all flows, since they share many of the same links and there is much less isolation than in traditional topologies.
After reproducing the original figure, we studied the effect of graph density on path diversity. We generated a series of random topologies while holding the number of servers constant at 686, the value used in the original paper. We started with graphs of only 100 16-port switches and scaled up to 300 16-port switches. For each case, we regenerated the paper's figure using our new graph. In addition, we computed the minimum, maximum, and average number of distinct paths per link. This shows how density affects path diversity, which directly relates to route quality under MPTCP.
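The per-link statistics above reduce to a simple aggregation: count how many of the computed paths traverse each link, then summarize. This is an illustrative sketch with our own names, not the project's actual script.

```python
from collections import Counter

def link_path_counts(paths):
    """Count, for each link, how many of the given paths traverse
    it. `paths` is an iterable of node-list paths; links are
    treated as undirected (sorted endpoint tuples)."""
    counts = Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            counts[tuple(sorted((a, b)))] += 1
    return counts

def summarize(counts):
    """Min/max/avg number of distinct paths per link."""
    vals = list(counts.values())
    return {"min": min(vals), "max": max(vals),
            "avg": sum(vals) / float(len(vals))}
```

Ranking `counts` by value and plotting the ranked series reproduces the step-function style of the original figure.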
The following figures were selected to show how the original graph varies across topologies. With a dense graph and a smaller number of switches, the number of distinct paths between nodes starts off fairly high and increases at a steady pace. In the sparsest graph, by contrast, many links end up completely unused, wasting network resources.
In addition, examining the min/max/avg plot of the number of distinct paths a link is on, one can clearly see that as the graph gets sparser, links carry fewer and fewer paths. With the 220-switch graph, the average distinct path count is around 6; this parameter could be used to tune how many subflows MPTCP should open when establishing a connection. In general, the average distinct path count appears to scale linearly with server density, though we would need more experiments to establish the exact relationship with confidence.
The vagueness of the topology behind the plot was a hindrance during our development. In particular, the paper does not state the number of switch-to-switch ports per switch. Working backwards from the ~2600 link ranks shown in Figure 9, we concluded through multiple experiments that the simulation most likely used 12, which in turn gave us 220 as the number of switches. Given the 686 servers in the figure, we arrived at 4 server ports per switch. While this configuration was our main comparison against the result we were reproducing, we also included many other parameter settings to demonstrate the diversity of the Jellyfish topology.
The machines used in this experiment ran Fedora 18 Linux and OS X Mavericks (on a MacBook Pro). However, the result is reproducible on any machine with Python 2.7+, a POSIX shell, and gnuplot installed.
Instructions for Experiment Reproduction
- Run “git clone https://bitbucket.org/jlz27/cs244_proj3.git”
- Run “cd cs244_proj3”
- Run “sh runExperiment.sh” (this should take about 10 minutes to run over 10 different graph topologies)
- The results for each topology are in folders labeled by their parameters, e.g. 686_100_8_8/686_100_8.png
- The density graph is in the top-level directory, in a file called density.png
To run an individual experiment using a custom number of servers and switches, use “sh generateGraph.sh <# of servers> <# of switches> <# ports for switches> <# ports for servers>”
Make sure that the number of switches multiplied by the number of server ports per switch is at least the number of servers; otherwise the topology is impossible to construct.
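That constraint amounts to a one-line check, sketched here with illustrative names (generateGraph.sh itself may or may not validate its arguments):

```python
def topology_feasible(num_servers, num_switches, server_ports_per_switch):
    """True if the total server-facing ports can host every server."""
    return num_switches * server_ports_per_switch >= num_servers
```

For example, the 686-server, 220-switch, 4-server-port configuration above passes (220 * 4 = 880 >= 686), while 100 switches with 6 server ports each would not (600 < 686).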