Team: Stephen Chu, Karthik Jagadeesh. Thank you to Lisa Yan for her guidance.
Paper: Jellyfish: Networking Data Centers Randomly https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/singla
Source code: https://bitbucket.org/sachu/jelly
Source: Jellyfish NSDI ’12 Presentation
Jellyfish network composed of 432 servers attached to a random graph of 180 switches
Jellyfish’s main goal is to contribute a new randomized network architecture that (1) allows easier incremental expansion and (2) provides higher overall throughput than the traditional fat tree topology. The authors claim that a network of randomly connected switches provides a more fluid structure to build on while still serving greater or equal throughput.
Source: Jellyfish NSDI ’12 Presentation
The Jellyfish contribution is important because it inspires a reset in thinking about how to plan the architecture of our future ever-expanding networks. We place a premium on architectural flexibility as our networks grow and component costs decrease. While the Jellyfish network seems impractical to deploy today because of the heavy operational costs in cabling and monitoring, the paper suggests that progress towards the physical automation of data centers will enable us to explore new network architectures.
The primary results in the Jellyfish paper are promising:
- Constrained to equal fixed equipment cost, the Jellyfish network supports 25% more servers operating at full throughput (using MPTCP and 8-shortest-paths routing).
- Jellyfish's mean path lengths are shorter than the fat tree's, allowing higher overall throughput.
- Compared to LEGUP, Jellyfish provides double the bisection bandwidth at 60% lower cost because it does not need to reserve free ports for future expansion.
- Jellyfish’s total cost does not exceed the equivalent fat tree cost if cabling optimizations are used. Examples include placing the switches centrally to reduce link length and localizing a fraction of connections.
Reproduction of Figure 9
We attempted to reproduce the paper’s Figure 9, which demonstrates that ECMP does not sufficiently take advantage of the path diversity provided by Jellyfish. Instead, the routing strategy 8-shortest-paths produces many more distinct paths across the switch-to-switch links.
We picked Figure 9 because it exposes a clear limitation of Jellyfish in today’s networks: traditional routing strategies such as ECMP cannot extract the path diversity benefits of the randomized network, which are important for load balancing and utilizing full network throughput. Jellyfish calls for the investigation of newer routing strategies for further optimization of traffic flows, and it comes at a good time because the recent developments of centralized SDN controllers have eased the implementation, deployment, and testing of new routing strategies.
Certain ambiguities of the Figure 9 experiment also motivated us to attempt to reproduce the results:
- What was the degree of each node in the randomized switch graph?
- What was the server capacity of each switch?
- How do different traffic patterns, such as all-to-all, affect path diversity in Jellyfish?
To reproduce Figure 9, we started by implementing the randomized switch graph. We tested that our graph construction produces a single connected component with the maximum possible number of edges between switches. Our network construction follows the paper closely:
Building the random graph of switches
(1) While possible, randomly connect a pair of switches that both have open ports.
(2) If at most one open switch port remains, end the algorithm: the graph is satisfied and we have created the maximum number of links.
(3) Otherwise, some switch still has at least two open ports. Break an existing link between two fully connected switches and connect both of its endpoints to the open switch.
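A minimal Python sketch of these three steps as we understand them (function and parameter names are ours, not taken from the repository):

```python
import random

def build_random_graph(num_switches, ports_per_switch, seed=0):
    """Sketch of the random switch-graph construction described above.
    Undirected links are stored as sorted (a, b) tuples."""
    rng = random.Random(seed)
    free = {s: ports_per_switch for s in range(num_switches)}
    edges = set()

    def linkable(a, b):
        return (a != b and free[a] > 0 and free[b] > 0
                and (min(a, b), max(a, b)) not in edges)

    # Step 1: while possible, randomly connect two open switches.
    while True:
        open_sw = [s for s in free if free[s] > 0]
        pairs = [(a, b) for i, a in enumerate(open_sw)
                 for b in open_sw[i + 1:] if linkable(a, b)]
        if not pairs:
            break  # step 2: no pair of open switches can still be linked
        a, b = rng.choice(pairs)
        edges.add((min(a, b), max(a, b)))
        free[a] -= 1
        free[b] -= 1

    # Step 3: splice any switch with >= 2 open ports into an existing link.
    for s in list(free):
        while free[s] >= 2:
            candidates = [(x, y) for (x, y) in edges
                          if s not in (x, y)
                          and (min(s, x), max(s, x)) not in edges
                          and (min(s, y), max(s, y)) not in edges]
            if not candidates:
                break
            x, y = rng.choice(candidates)
            edges.remove((x, y))               # break the existing link
            edges.add((min(s, x), max(s, x)))  # reconnect both endpoints
            edges.add((min(s, y), max(s, y)))  # through the open switch
            free[s] -= 2
    return edges
```

Note that each splice removes one link and adds two, so the step strictly increases the link count toward the maximum of (switches × switch ports) / 2.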
Random assignment of servers to switches
Randomly assign each server to a switch that has open capacity for a server.
Generating random permutation of traffic
Randomly assign each server (sender) to send traffic to another server (receiver) that is not already receiving traffic.
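These two steps can be sketched as follows (names are ours; the retry-shuffle is a simple way to get a pairing with no server sending to itself):

```python
import random

def assign_servers(num_servers, num_switches, servers_per_switch, seed=0):
    """Randomly attach each server to a switch with an open server port."""
    rng = random.Random(seed)
    capacity = {sw: servers_per_switch for sw in range(num_switches)}
    attachment = {}
    for srv in range(num_servers):
        sw = rng.choice([s for s in capacity if capacity[s] > 0])
        attachment[srv] = sw
        capacity[sw] -= 1
    return attachment

def random_permutation_traffic(num_servers, seed=0):
    """Each server sends to exactly one other server and each server
    receives from exactly one: reshuffle until no server is paired
    with itself."""
    rng = random.Random(seed)
    receivers = list(range(num_servers))
    while any(s == r for s, r in enumerate(receivers)):
        rng.shuffle(receivers)
    return {s: r for s, r in enumerate(receivers)}
```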
Afterwards, we calculate the shortest paths for each (sender, receiver) pair. For each of 8-shortest-paths, 64-way ECMP, and 8-way ECMP we maintain a separate dictionary of (link:count) pairs, where the link is a directed switch-to-switch edge and count is the number of times that edge is traversed by the generated traffic.
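A sketch of the path counting (helper names are ours; the DFS enumeration is only practical for small graphs, and a full run should use Yen's K-shortest-paths algorithm instead):

```python
from collections import Counter

def k_shortest_paths(adj, src, dst, k):
    """All simple paths from src to dst by DFS, keeping the k with the
    fewest hops. adj maps each switch to a list of neighbor switches."""
    paths = []
    def dfs(node, path):
        if node == dst:
            paths.append(list(path))
            return
        for nxt in adj[node]:
            if nxt not in path:
                path.append(nxt)
                dfs(nxt, path)
                path.pop()
    dfs(src, [src])
    paths.sort(key=len)
    return paths[:k]

def count_link_paths(adj, pairs, k):
    """Counter mapping each directed switch-to-switch link to the number
    of distinct paths, over all (sender, receiver) pairs, that cross it."""
    counts = Counter()
    for src, dst in pairs:
        for path in k_shortest_paths(adj, src, dst, k):
            for a, b in zip(path, path[1:]):
                counts[(a, b)] += 1
    return counts
```

For the ECMP variants, the same counting applies but only over paths tied for the minimum hop count, capped at 8 or 64 paths per pair.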
Our reproduction of Figure 9 results
Our most similar plot was generated using the following simulation parameters:
- 686 servers (same as in the paper)
- 212 switches, each with 13 switch-to-switch ports and 23 switch-to-server ports
We also experimented with different combinations of numbers of switches, switch-to-switch ports, and switch-to-server ports. Additional plots can be found in the code repository.
Our reproduction was very similar in shape to the original Figure 9, giving us confidence in the authors’ claim that K-shortest-paths can use significantly more of the path diversity than ECMP. However, there are some differences between the plots that we will discuss in the evaluation section.
Evaluation of Reproduction
Challenges and Critique
The main challenge in reproducing Figure 9, aside from testing the correctness of the random graph generation, was determining the parameters of the authors' simulation. The paper states that 686 servers were used, but it omits the number of switches and how many ports each switch devotes to other switches versus to servers. We therefore replicated Figure 9 with realistic estimates of the environment. Figure 9 shows roughly 2750 links, which we matched with 212 switches, each with 13 directed edges to other switches (212 x 13 = 2756). We then gave each switch 23 open server ports, for a realistic total of 36 ports per switch.
The primary difference between the original and our replication is the higher number of distinct paths observed in the right tail (higher-rank links) of our plot. This may be an artifact of the algorithm's randomness, but repeated runs of our simulation always yielded links traversed by at least 20 distinct paths. Another possible cause is cabling localization and centralization performed by the authors, although they do not state that any such optimizations were applied for Figure 9. The paper notes that cabling optimizations reduce the number of servers at full throughput by several percentage points, and perhaps they lower path diversity as well.
Our replication of the authors' figure shows that ECMP does not take advantage of the additional path diversity provided by Jellyfish, and that this argument is easily reproducible. In both the random-permutation and all-to-all traffic experiments, 8-shortest-paths outperforms ECMP in the number of distinct paths crossing the links. We are now confident that the Jellyfish architecture cannot use ECMP routing to maximize throughput. This result is important because it shows we cannot simply reuse traditional routing strategies in new network architectures.
Platform and Implementation
We chose to implement the simulations from scratch in Python 2.7 for simplicity. We decided not to leverage network simulators such as Mininet because (1) our random graph construction, traffic generation, and K-shortest-paths routing would not take advantage of those simulators' functionality, and (2) previous Jellyfish reproductions cited the inability to scale to 600+ servers using Mininet.
How to Run the Reproduction
Running the simulation involves cloning the repository and executing paths.py (written in Python 2.7). The program prints a counter up to 686 (one per server) and plots a replication of Figure 9 from the Jellyfish paper. The default parameters are 686 servers and 212 switches, with each switch having 13 ports connecting to other switches and 23 ports connecting to servers. The output is the plot in figure9.png.
>> git clone https://bitbucket.org/sachu/jelly.git
>> cd jelly; python paths.py
Extension: N X N traffic simulation
Given the random network of edges generated in the previous steps, we assign each server to send traffic to every other server. The goal of this test is to measure a worst-case load for the network. Given N servers, this produces N x (N - 1) distinct sender-receiver pairs. We then apply the same three routing strategies to measure how well the Jellyfish network handles this level of load.
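Generating the all-to-all pairs is straightforward; the point is the quadratic growth in the number of flows (function name is ours):

```python
from itertools import permutations

def all_to_all_pairs(num_servers):
    """Every ordered (sender, receiver) pair with sender != receiver,
    i.e. N * (N - 1) flows, which grows quadratically in N."""
    return list(permutations(range(num_servers), 2))
```

For the 300 servers we used, this is 300 x 299 = 89,700 flows, versus just 300 flows in the permutation-traffic experiment.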
The new plot from this simulation closely resembles the previous ones but is much smoother due to the higher traffic volume and the higher number of connections (N X N) between the servers. The 64-way ECMP and 8-way ECMP curves are very close to each other; in fact, for smaller networks with fewer paths the two curves are identical. As before, ECMP does not perform as well as 8-shortest-paths, which substantiates the authors' claim that even at high loads K-shortest-paths exploits much more path diversity.
Due to running-time constraints we did not run this simulation with the original parameters. Instead we used 300 servers, 100 switches, 10 switch-to-switch ports, and 16 server ports per switch. Because the number of servers is roughly halved, we proportionally reduced the other parameters to keep the size of the underlying network reasonable.
As expected, the big challenge in running the N X N simulation of this extension is the quadratic increase in the number of connections between servers: N servers produce N x (N - 1) flows. Moreover, as the number of servers grows, we must also increase the number of switches in the underlying network and the number of ports on each switch, and the time to compute the shortest paths grows accordingly. It quickly becomes infeasible to simulate N X N server communication, due to both the quadratic growth in the amount of communication and the cost of computing shortest paths.
 LEGUP: Using Heterogeneity to Reduce the Cost of Data Center Network Upgrades http://conferences.sigcomm.org/co-next/2010/CoNEXT_papers/14-Curtis.pdf