By Bo Yoo and Mauricio Narvaez

Introduction

The amount of traffic in data centers has been increasing continuously. To accommodate this traffic, data centers must be able to grow incrementally while avoiding bandwidth bottlenecks. One topology proposed to provide high bandwidth is the full-bisection-bandwidth fat-tree, a hierarchical structure in which nodes closer to the root carry more weight, in this case more neighbors. The rigid structure required by the fat-tree restricts the granularity of data center growth, since the number of servers is determined by the number of ports available per switch. To address this, the authors of the paper Jellyfish: Networking Data Centers Randomly [1] propose a new data center network topology: Jellyfish.

The Original Paper

Goals and Motivation

There is a market need for data centers to grow their capacity incrementally to handle increasing traffic. A rigid structure such as the fat-tree inhibits this, since the network layout is dictated by the number of available switch ports. The goal of the paper is to introduce a new data center network topology that allows additional servers to be incorporated smoothly as the network grows, while still using the network efficiently. The authors propose a degree-bounded random graph among top-of-rack (ToR) switches, called Jellyfish, and show that this flexible structure allows easy incremental growth while remaining highly efficient.

Results

In the paper, the authors first show that Jellyfish outperforms other proposed data center network topologies. Jellyfish supports 27% more servers at full capacity than a fat-tree of the same size, and it is highly failure-resilient. Its network capacity is over 91% of that of the best-known degree-diameter graph, a bandwidth-efficient topology that gives the best throughput but does not satisfy the incremental-growth criterion. The authors also report that Jellyfish generally has a shorter average path length between servers, which means this random network is highly connected and therefore efficient. Bisection bandwidth is often used as a measure of network capacity: it is the lowest bandwidth spanning any two equal-sized halves of the network. As shown in Figure 1 below (Figure 2.a in the original paper), Jellyfish networks can support more servers than a fat-tree under the same conditions.

Figure 1. Number of servers Jellyfish and fat-tree networks can handle by the normalized bisection bandwidth

Jellyfish’s flexible structure also allows smooth incremental expansion. The idea is that when a rack of servers is added, random links in the existing network are removed, the two newly freed ports are connected to the new ToR switch, and this is repeated as needed. The authors explain that because path lengths increase slowly, the bandwidth per server remains high even after large expansions. Expansion is therefore easy and can be done incrementally without significant bandwidth loss, even for larger data centers.
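A minimal sketch of this expansion step, under the assumption that the network is stored as a dict mapping each switch to its neighbor set (the function and parameter names are ours, not the paper's):

```python
import random

def expand(adj, new_switch, free_ports):
    """Attach new_switch by splitting random existing links.

    adj maps each switch id to its set of neighbors. Removing a link
    (u, v) frees one port on each endpoint; both endpoints are rewired
    to new_switch, consuming two of its free_ports per removed link.
    """
    adj[new_switch] = set()
    while free_ports - len(adj[new_switch]) >= 2:
        # candidate switches: not yet adjacent to new_switch, with at
        # least one link to another such switch
        candidates = [s for s in adj
                      if s != new_switch and s not in adj[new_switch]
                      and adj[s] - adj[new_switch] - {new_switch}]
        if not candidates:
            break
        u = random.choice(candidates)
        v = random.choice(list(adj[u] - adj[new_switch] - {new_switch}))
        adj[u].discard(v)           # remove the existing link u-v ...
        adj[v].discard(u)
        adj[u].add(new_switch)      # ... and rewire both freed ports
        adj[v].add(new_switch)
        adj[new_switch].update({u, v})
    return adj
```

Each removed link frees exactly two ports, so every iteration adds two links to the new switch; total degree elsewhere in the network is preserved.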

Our Project

Goals and Motivation

Our goal for this project is first to implement a degree-bounded random graph Jellyfish network and simulate data center traffic on it. We then plan to recreate Figure 2 below (Figure 9 in the original paper). This figure shows that because of Jellyfish’s highly interconnected structure, the equal-cost multi-path (ECMP) routing algorithm does not allow full utilization of the links; instead, the authors show that k-shortest-path routing works much better with Jellyfish. We chose this particular figure because, by looking at network utilization under different routing schemes, we can study these two routing algorithms in detail, compare them, and see how one outperforms the other in the context of the Jellyfish structure.

We also want to explore the network properties of Jellyfish. The authors of the paper discuss the high bisection bandwidth and short average shortest paths, but we plan to measure additional network properties that can explain Jellyfish’s high throughput. We will generate multiple Jellyfish topologies and measure the average shortest path length, the diameter, and the global clustering coefficient (which measures how close the network is to a full clique). Since we expect Jellyfish to be highly interconnected, we expect to see a low diameter and a high clustering coefficient across multiple iterations. We will also plot the distribution of betweenness centrality for all nodes in Jellyfish. For Jellyfish to have no bottleneck switch and utilize the network efficiently, betweenness should be roughly uniform, with no node of unusually high betweenness centrality such as node H in Figure 3 below. A network like Figure 3 could still report a decent bisection bandwidth, because the bottleneck node with high betweenness centrality would be hidden inside one of the two equal-sized subnetworks. Looking at the betweenness centrality distribution of the whole network will let us see whether nodes like H exist in Jellyfish.

Figure 2. Figure we plan to reproduce as the main portion of our project

Figure 3. An example of a network with a high-betweenness-centrality node, H. Betweenness centrality measures how many shortest paths include a particular node, i.e., how essential that node is in connecting the network together.

Platform

From the project groups who did this project during previous offerings of the course, we learned that Mininet is not a good platform for this project, since it cannot handle enough servers to correctly simulate the traffic routing algorithms shown in Figure 2 above (the main goal of our project). We therefore implemented everything in Python: we generate random networks and then simulate network traffic to analyze the network structure and the routing algorithms.

Subset Results

We implemented a random network topology generator to create Jellyfish network structures. The algorithm is very simple: it takes just three parameters, switch count, ports per switch, and rack height (how many ports on each switch are used for servers instead of other switches). It initializes all the nodes and adds random links until it fails 100 times in a row (because one of the two chosen nodes is saturated), at which point we assume the network is essentially saturated. Figure 4 below shows two example graphs we generated with the default parameters, along with their network properties.
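A sketch of this generator, assuming the same three parameters and a dict-of-neighbor-sets representation (names are ours):

```python
import random

def generate_jellyfish(num_switches, ports_per_switch, rack_height,
                       max_failures=100):
    """Build a degree-bounded random graph over ToR switches.

    Each switch reserves rack_height ports for servers, leaving
    ports_per_switch - rack_height ports for switch-to-switch links.
    Random links are added until max_failures consecutive attempts
    fail (duplicate link or a saturated endpoint).
    """
    degree_cap = ports_per_switch - rack_height
    adj = {s: set() for s in range(num_switches)}
    failures = 0
    while failures < max_failures:
        u, v = random.sample(range(num_switches), 2)  # two distinct switches
        if v in adj[u] or len(adj[u]) >= degree_cap or len(adj[v]) >= degree_cap:
            failures += 1
            continue
        adj[u].add(v)
        adj[v].add(u)
        failures = 0  # reset the streak after every success
    return adj
```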

Figure 4. Random graphs generated with 686 nodes. Here we show the topology of two independent iterations. The network properties and the graphs were generated in Cytoscape from an edge list we output. The color of each node corresponds to its degree: blue nodes have lower degrees (min = 13) and red nodes have higher degrees (max = 29).

Generating Figure 9

Figure 9 in the Jellyfish paper counts the number of paths every link appears on and plots rank vs. count (where links are ranked from fewest to most paths). It shows this data for paths generated with 8-way ECMP, 64-way ECMP, and 8-shortest-path routing. The paths come from random permutation traffic: each server sends data to one random other server. We simplified this by sending traffic from switch n to switches n+1, n+2, …, n+rackHeight, which assumes each server behind a switch contacts a server behind a different switch. The network is already random and a switch’s number is completely unrelated to the structure of the network, so we can pick any rackHeight other switches and still get representative data. To generate the set of paths, we implemented Yen’s algorithm for k shortest paths and ran it once per pair of switches, with k = 64. We then extracted the ECMP path sets by taking the first group of paths with the same length (capped at 8 and 64), and the 8-shortest-path set by taking the first 8 paths in the original list. We then went through each set of paths, counting in a dictionary (mapping link id to number of occurrences) how many times each individual link appeared. Finally, we sorted the dictionary entries by value, generated the (rank of link, number of paths) points, and plotted them with matplotlib to produce the graph below. Note that each run generates a new random graph, so the exact values may differ, but all runs show the same trend:
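The link-counting and ranking step can be sketched as follows (assuming each path is a list of switch ids; the function names are ours):

```python
from collections import Counter

def link_counts(paths):
    """Count how many paths traverse each undirected link."""
    counts = Counter()
    for path in paths:
        for u, v in zip(path, path[1:]):          # consecutive hops
            counts[frozenset((u, v))] += 1        # direction-agnostic key
    return counts

def ranked_points(counts):
    """(rank, #paths) points, ranked from least- to most-used link."""
    return list(enumerate(sorted(counts.values())))
```

The `ranked_points` output can be fed directly to `matplotlib.pyplot.plot` to reproduce the rank-vs-count curves.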

Figure 5. Our recreation of Figure 9 from the Jellyfish paper, using 686 servers with 24 ports per switch and a rack height of 5. The Jellyfish paper didn’t specify which parameters were used, so we found the ones that produced the most similar graph by trial and error.

Challenges and Critiques

The largest challenge in reproducing the paper’s results was its lack of detail. The paper doesn’t include concrete numbers for rack height, switch size, etc., so we had to find them by trial and error to produce the most similar graph. It also doesn’t explain where the paths used in this calculation came from, beyond saying “random permutation traffic”, which took a while to decipher. Finally, we couldn’t find any concrete description of exactly how ECMP is calculated. The papers we found were analyses of ECMP and its shortcomings, or vague descriptions that referenced it; there is no official “ECMP paper” outlining exactly what it is, and everything simply said it uses multiple paths of equal length. In our calculations, we interpreted ECMP-N as including up to N paths whose length equals that of the shortest path. Any additional equal-length paths are excluded, and if fewer exist, ECMP simply returns fewer paths. Even if there is 1 path of length 3 and 9 paths of length 4, our ECMP calculation returns only the single length-3 path.
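Our ECMP interpretation can be made concrete with a short sketch (assuming the candidate list is already sorted by length, as Yen's algorithm produces; the name is illustrative):

```python
def ecmp_paths(k_shortest, n):
    """Our ECMP-n interpretation: keep up to n paths tied for shortest.

    k_shortest is a list of paths (node lists) sorted by length.
    If only one path has the minimum length, ECMP returns just that
    path, even if many paths are only one hop longer.
    """
    if not k_shortest:
        return []
    shortest_len = len(k_shortest[0])
    return [p for p in k_shortest if len(p) == shortest_len][:n]
```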

Extensions

To extend the project further, we decided to explore the network properties of the random (Jellyfish) networks in detail. In particular, we calculated and plotted the global/local clustering coefficients, diameter, average shortest path lengths, and betweenness centralities of Jellyfish networks [2]. Because of the constraint on the number of connections each switch can make (set by the number of ports on the switches), randomly generated networks under these hardware constraints consistently have similar topological structure, which can explain why Jellyfish performs consistently even though it is created randomly. We ran all the experiments below with our default parameters (686 switches, a rack height of 12, and a switch size of 48).

The global clustering coefficient measures how close the network is to a full clique, i.e., a fully connected network: out of all connected triplets of nodes, it measures what fraction form closed triangles. The local clustering coefficient is a per-node measurement of how likely a node and its neighbors are to cluster together into a full clique: for each node, out of all the edges that could exist among its neighbors (the nodes directly connected to it), it measures how many actually exist. Originally we thought the network structure would look very close to a clique, since that is the topology that gives the best throughput. Interestingly, as shown in Figure 6 below, the global clustering coefficients of Jellyfish networks are consistent but quite low, around 0.05. The local clustering coefficient distributions peak in the 0.01-0.03 range, and no node has a local clustering coefficient above 0.2. Considering that a full-clique-like structure would have a clustering coefficient of 1, Jellyfish networks are actually quite different from cliques.
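Both coefficients can be computed directly from the adjacency sets; a sketch (ours, not necessarily the exact code we ran), assuming the network is a dict mapping each node to its neighbor set:

```python
def local_clustering(adj, node):
    """Fraction of possible edges among node's neighbors that exist."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def global_clustering(adj):
    """Closed triplets over all connected triplets, network-wide."""
    closed = triplets = 0
    for node in adj:
        nbrs = list(adj[node])
        k = len(nbrs)
        triplets += k * (k - 1) // 2          # triplets centered at node
        closed += sum(1 for i in range(k) for j in range(i + 1, k)
                      if nbrs[j] in adj[nbrs[i]])   # neighbor pairs that close
    return closed / triplets if triplets else 0.0
```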

Diameter and average shortest path length are other metrics of how well connected a network is. Instead of considering only direct links, they look at the shortest paths between any two given nodes; the diameter is the longest shortest path in the network. To explain Jellyfish’s performance, we expected the diameter to be consistently small, so that any two nodes can communicate without many hops, and the average shortest path length to be even smaller, so that in the majority of cases two nodes are connected by even fewer hops. As shown in Figure 6 below, this is indeed the case for Jellyfish: in 5 separate instances, all of the networks have a diameter of 4, while the average shortest path length is lower, at just above 2.
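Both metrics follow from all-pairs BFS over the switch graph; a small sketch (assuming the network is a dict mapping each node to its neighbor set, and is connected):

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter_and_avg(adj):
    """Longest and mean shortest-path length over all ordered pairs."""
    lengths = [d for s in adj
               for n, d in bfs_distances(adj, s).items() if n != s]
    return max(lengths), sum(lengths) / len(lengths)
```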

We briefly discussed betweenness centrality in the Goals and Motivation section, but here we define how we calculated it more formally. Betweenness centrality measures how often a node participates in a shortest path between two other nodes in the network. To calculate it, we first obtain the shortest paths (actual paths, not just lengths) between all pairs of nodes; the betweenness centrality of a node is then the fraction of those paths that include the node as neither source nor target. We then normalize the values and plot the distribution over 5 iterations. Due to the normalization we do see some nodes with extreme centrality values, but the distribution is skewed toward lower values and peaks sharply around 0.25-0.45, meaning there are more nodes with smaller betweenness centralities, as shown in Figure 7. This is what we expected to see: many nodes have similar betweenness centrality, and only a few have very low or high values. Many shortest paths in the network do not have to pass through any particular node, which reduces the chance of a bottleneck and congestion at that node, and this helps explain Jellyfish’s performance.
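A simplified sketch matching this description, using one BFS shortest path per pair (note that the standard definition averages over all shortest paths between each pair, e.g. via Brandes' algorithm; names are ours):

```python
from collections import deque

def shortest_path(adj, src, dst):
    """One BFS shortest path from src to dst as a node list, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:                 # reconstruct by walking predecessors
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None

def betweenness(adj):
    """Fraction of pairwise shortest paths each node sits on as an interior hop."""
    counts = {n: 0 for n in adj}
    n_paths = 0
    for s in adj:
        for t in adj:
            if s >= t:
                continue             # each unordered pair once
            path = shortest_path(adj, s, t)
            if path is None:
                continue
            n_paths += 1
            for node in path[1:-1]:  # exclude source and target
                counts[node] += 1
    return {n: c / n_paths for n, c in counts.items()} if n_paths else counts
```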

Although we considered including the network properties of a full clique as a baseline, the values are trivial and were not worth the computing power to generate. A full clique has global and local clustering coefficients of 1 (for every node and for the entire network), a diameter of 1, and an average shortest path length of 1; betweenness centrality is not meaningful because the shortest path between any two nodes is the direct link, which never includes a third node.

Figure 6. Global clustering coefficient (top left), local clustering coefficient (top right), diameter (bottom left), and average shortest path length (bottom right) over 5 iterations of Jellyfish networks. The top left, bottom left, and bottom right plots show the global clustering coefficient, diameter, and average shortest path length values, respectively; the top right plot is a distribution plot of local clustering coefficients, where each bar color represents one iteration.

Figure 7. Distribution plot of betweenness centrality. Each color represents one iteration.

**README**

To run our code on a Google Cloud VM and open the images, you must allow X11 forwarding from your local computer. The following directions are intended for Mac users. If you want to skip X11 forwarding (you won’t be able to display the images from the terminal), simply use the “SSH” button provided by Google Cloud Platform when you create an instance.

1. Open a VM instance in Google Cloud Platform Console:

We used 8 vCPUs, 52 GB of memory, and the Debian GNU/Linux 8 (jessie) image, with defaults for everything else. If you are not using X11 forwarding, open the instance using the SSH button on the VM page and skip to step 8.

2. Install Google Cloud SDK in your local terminal:

**curl https://sdk.cloud.google.com | bash**

** exec -l $SHELL**

3. Generate a key-pair for authentication. In your local terminal:

**ssh-keygen -t rsa -f ~/.ssh/my-ssh-key-jellyfish -C jellyfish** #no passphrase

**chmod 400 ~/.ssh/my-ssh-key-jellyfish**

Go to the metadata page in Google Cloud Platform Console -> SSH Keys -> Edit -> Add item. Where it says “enter entire key data”, enter the output of the following command:

**cat ~/.ssh/my-ssh-key-jellyfish.pub**

Then save.

4. Make sure you have X11 downloaded in your local computer (https://support.apple.com/kb/DL1605?locale=en_US)

**vim ~/.ssh/config**

Add these three lines:

**Host ***

** ForwardX11 yes**

** ForwardX11Trusted yes**

Restart the shell

5. Open your Google Cloud instance using the following command:

**ssh -i ~/.ssh/my-ssh-key-jellyfish jellyfish@[EXTERNAL IP ADDRESS]** #can be found on the Google Cloud Platform Page

6. Download xauth and edit /etc/ssh/sshd_config:

**sudo apt-get install xauth**

** sudo vim /etc/ssh/sshd_config**

Add these two lines:

**X11UseLocalhost no**

** AddressFamily inet**

On the same page make sure this line is uncommented:

**X11Forwarding yes**

**sudo /etc/init.d/ssh restart** # restart the SSH daemon

**exit**

7. Relog-in to the instance using -X flag:

**ssh -X -C -i ~/.ssh/my-ssh-key-jellyfish jellyfish@[EXTERNAL IP ADDRESS]**

8. Install dependencies:

**sudo apt-get update**

** sudo apt-get install git-core**

** sudo apt-get install python-matplotlib**

** sudo apt-get install build-essential checkinstall libx11-dev libxext-dev zlib1g-dev libpng12-dev libjpeg-dev libfreetype6-dev libxml2-dev**

# the next one is only if you are using X11 forwarding

** sudo apt-get install imagemagick**

9. Clone our repository

**git clone https://bitbucket.org/maunarvaez/cs244-hw3.git**

** cd cs244-hw3**

10. To see details of how to run our code, run:

**sh run.sh h**

To run our code with default parameters without running the network analysis:

**sh run.sh d**

To run our code with default parameters and run the network analysis [full analysis]:

**sh run.sh n**

Our script with the full network analysis takes 4-5 hours to run, so we recommend running it in the background. To do so, use the following command instead; it forwards all standard output to a ‘log.txt’ file. The betweenness centrality calculation is the bottleneck step; the rest of the script runs in under 5 minutes.

**screen python main.py -a True -f True -i 5 -t 100** # then, inside screen, press Ctrl-A then Ctrl-D to detach

To list all running screen sessions:

**screen -ls**

To reattach to a screen:

**screen -r [pid]** # no need to enter the pid if only one session is running

To run our code with smaller parameters and run the network analysis [fast]; this takes around 1 minute:

**sh run.sh s**

To run our code with your own parameters, follow the instructions printed when you run:

**sh run.sh h**

or

**python main.py -h**

11. Output explanations:

If you run our script without doing the network analyses it will only generate one figure:

figure_9.png # our recreated version of figure 9 in the original Jellyfish paper

If you run our script with the network analyses it will generate these text files and figures:

figure_9.png # our recreated version of figure 9 in the original Jellyfish paper

global_clusc_results.txt #numeric results of global clustering coefficients

Global_CC.png #graph of global clustering coefficient results

local_clusc_*_results.txt #numeric results from local clustering coefficient, * represent network number (iterations)

Local_CC.png #graph of local clustering coefficient results

diameter_results.txt #numeric results of diameter

Diameter.png #graph of diameter results

avg_shortest_paths_results.txt #numeric results of average shortest paths

Avg_Shortest_Paths.png #graph of average shortest paths results

betwc_*_results.txt #numeric results of betweenness centrality

Betweenness.png #graph of betweenness centrality results

If you are using X11 forwarding, you can use the following command to view the figures

**display [filename] **#e.g. figure_9.png

Citations

[1] A. Singla, C. Hong, L. Popa, and P. B. Godfrey. Jellyfish: Networking Data Centers Randomly. In Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12), pages 225–238, San Jose, CA, 2012. USENIX.

[2] F. E. Faisal and T. Milenkovic. Dynamic networks reveal key players in aging. Bioinformatics, 30(12):1721–1729, 2014.

Reproducibility Score: 5/5

Reproducing the graphs was straightforward and took the promised 5-6 hours. Though there are small differences in values—probably due to the randomness in generating a network topology for the experiment—the reproduced plots show exactly the same trends as those in the blog post.

The Jellyfish paper and its motivations are summarized clearly. The challenges in reproduction were addressed cleverly, and it was fortunate that there were previous reproductions to warn away from using Mininet for the project.

The expansion attempted to find patterns in the generated Jellyfish networks that could be observed from graph properties to explain its performance. For next steps (expansion to the expansion), it would be interesting to see if the generated Jellyfish networks have edge expansions close to d/2 as found in the Xpander paper.

Plots we reproduced: