CS244 ’13: Jellyfish, Networking Data Centers Randomly


Project Name: Jellyfish vs FatTree Topologies

Team Members: Lynn Cuthriell (lcuth@stanford.edu) / Patrick Costello (pcostell@stanford.edu)

Sources:

Jellyfish: Networking Data Centers Randomly

Code:

https://github.com/pcostell/cs244-jellyfish

What problem is this paper trying to solve?

“Jellyfish: Networking Data Centers Randomly” tries to solve the problem of incremental expansion of data center networks. Data center networks must be high-bandwidth and support a large number of servers, and they frequently grow as a result of changes in use. Current popular topologies, such as FatTree, are heavily structured and difficult to expand incrementally.

The proposed solution is Jellyfish – a random, unstructured topology that is high capacity while still allowing for easy incremental expansion. Jellyfish adds links randomly between switches to form a random regular graph. This method allows switches and hosts to be added individually to form networks of arbitrary size.
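
To make the construction concrete, here is a minimal sketch of that procedure in Python (our own illustration, not the authors' code; all names are ours). Each switch reserves a fixed number of ports for switch-to-switch links; random pairs of switches with free ports are connected until no legal pair remains, and any switch left with two or more spare ports is wired in by breaking an existing link.

import random

def build_jellyfish(num_switches, ports_per_switch):
    # ports_per_switch counts only the ports reserved for
    # switch-to-switch links; host ports are handled separately.
    free = {s: ports_per_switch for s in range(num_switches)}
    links = set()

    def can_link(a, b):
        return a != b and (a, b) not in links and (b, a) not in links

    while True:
        # Connect a random pair of switches that both have free ports
        # and are not already linked.
        candidates = [s for s in free if free[s] > 0]
        pairs = [(a, b) for a in candidates for b in candidates if can_link(a, b)]
        if not pairs:
            break
        a, b = random.choice(pairs)
        links.add((a, b))
        free[a] -= 1
        free[b] -= 1

    # Incorporate any switch left with two or more free ports by removing
    # a random existing link (x, y) and connecting the switch to both
    # x and y instead.
    for s in list(free):
        while free[s] >= 2 and links:
            x, y = random.choice(list(links))
            if not (can_link(s, x) and can_link(s, y)):
                break
            links.remove((x, y))
            links.update({(s, x), (s, y)})
            free[s] -= 2
    return links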

Why is this problem important/interesting?

Data centers are incredibly important in today’s computing environment and are a significant investment for companies. Efficient use of resources is critical. However, hampered by inflexible topologies, companies who wish to increase the capacity of their data center must either expand in large chunks or oversubscribe certain switches. Both options are inefficient and costly.

If Jellyfish permits incremental expansion, reduces the cost of equipment, and maintains throughput, it could be a significant improvement in the design of data centers.

What did the original authors find?

The original authors found that, in addition to allowing incremental expansion, Jellyfish is more cost-efficient than a fat-tree, supporting 25% more servers with the same switching equipment. They also found that Jellyfish topologies achieve the same or higher per-server throughput than comparable fat-tree topologies, along with a lower mean path length.

These results are contingent upon the use of k-shortest-paths routing rather than ECMP, because the authors discovered that ECMP does not provide enough path diversity for Jellyfish.
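
The difference between the two routing schemes is easy to see in code. Below is a small sketch using networkx (purely our illustration, not necessarily how the paper or our scripts compute routes): ECMP spreads a flow only across equal-length shortest paths, while k-shortest-paths also admits slightly longer paths, which is where the extra diversity comes from.

from itertools import islice
import networkx as nx

def ecmp_paths(graph, src, dst, k=8):
    # ECMP-style path set: only minimum-length paths, capped at k.
    return list(islice(nx.all_shortest_paths(graph, src, dst), k))

def k_shortest_paths(graph, src, dst, k=8):
    # k-shortest-paths: the k lowest-cost simple paths, which may differ
    # in length (Yen's algorithm via networkx).
    return list(islice(nx.shortest_simple_paths(graph, src, dst), k))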

What subset of results did we choose to reproduce?

We chose to reproduce the graph of link path counts and the table of per-server throughput with different topologies and types of TCP (with the exception of MPTCP). Both are shown below:

[Figures from the original paper: link path counts (Figure 9) and the per-server throughput table (Table 1)]
Why that particular choice?

We chose to reproduce these results because we felt that they were the most important sections of the paper. The incremental expansion is inherent to the design and doesn’t need reproducing – what’s critical is how that incremental expansion affects benchmarks that matter in data centers. Of all the benchmarks the authors measured – path length, cost-effectiveness, resilience to failure, number of servers supported, per-server throughput, and flow-fairness – throughput is the most critical. Loss of throughput would negatively impact the ability to scale with more hardware, the main goal of Jellyfish, so throughput is the most important metric to verify.

The link-counts graph is also good to verify, because it confirms the importance of using k-shortest-paths over ECMP. The authors introduce k-shortest-paths because their topology does not work well with the more standard ECMP routing scheme. We thought it would be important to verify that this relatively significant change is the right one.

How well do the results match up?

Our results for the link path counts were very similar to the results seen in the paper. k-shortest-paths provides significantly more path diversity than ECMP, even with our smaller topology of 20 switches and 16 hosts. The reproduced graph is shown below:

[Reproduced graph: link_utilization.png]
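
Concretely, the quantity plotted here is, for each switch-to-switch link, the number of distinct paths (across all routed source/destination pairs) that traverse it, with links ranked by that count along the x-axis. A trivial sketch of that bookkeeping (our illustration; the actual script is in the repository linked above):

from collections import Counter

def paths_per_link(all_paths):
    # Count, for each undirected link, how many of the given routing
    # paths traverse it. all_paths is the union of the path sets
    # (ECMP or k-shortest) over all source/destination pairs.
    counts = Counter()
    for path in all_paths:
        for a, b in zip(path, path[1:]):
            counts[frozenset((a, b))] += 1
    # Sorting by count gives the "rank of link" used on the x-axis.
    return sorted(counts.values())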

Our results for per-server throughput show that Jellyfish performs better than FatTree on most runs, with the performance gap widening as the number of flows increases. The reproduced table is shown below:

[Reproduced table: per-server throughput results]

At first glance, it appears that ECMP performs better than k-shortest-paths on Jellyfish. However, we believe this is because our network is small, with little possible link diversity even in the ideal case. Our data shows that as the number of flows increases, ECMP gets worse while k-shortest-paths improves. Presumably, on a large enough network, this trend would be enough to make k-shortest-paths the clear winner on Jellyfish. This supports the authors' claim that link diversity, facilitated by k-shortest-paths, improves Jellyfish's performance.

What challenges did we face in implementation?

The biggest challenge we faced was interfacing with riplpox. We spent the majority of our time debugging problems in the connection between our pox controller and mininet. Beyond that, we had difficulty reproducing the experiment at full scale because mininet (or mininet and riplpox together) had problems handling our larger switch simulations. Because of this, we chose to replicate the experiments at a smaller scale.

Does the thesis hold?

The paper presented two main theses. The first is that Jellyfish is a good alternative to FatTree because it increases link diversity, thus decreasing congestion. The second is that this topology would be easy to build and to scale continuously with more or better hardware.

Our analysis is that the paper's first thesis holds. We were able to verify that Jellyfish usually has better per-server throughput than FatTree and that using k-shortest-paths gives Jellyfish even better link diversity than ECMP. Our analysis of per-server throughput does not show better results with k-shortest-paths than with ECMP. However, it does show that as we scale the topologies with more flows, k-shortest-paths improves while ECMP gets worse. We hypothesize that if we had been able to expand our experiment to more nodes, k-shortest-paths throughput would have surpassed that of ECMP.

As for the second thesis of the paper, that a Jellyfish data center could be achievable, we are less convinced. The authors spend a fair amount of the paper discussing the best way to design a data center to use Jellyfish. Because Jellyfish randomly assigns links, the cable management for such a system could get very complicated. Further, the layout of the data center could have a huge impact on latency, since server locality (such as between switches in a FatTree pod) is not guaranteed.

How reproducible is the setup?

Running our experiment requires only a small number of additional programs, along with the following two scripts:

sudo ./setup.sh
sudo ./run.sh

The first script will install the python modules for ripl, riplpox, and our module, jellyfish. The second script will run the experiment. The experimental data is output in two files:

link_utilization.png
throughput.txt

The first file is a graph showing the number of paths used by the different protocols on jellyfish. This graph corresponds to Figure 9 in the original experiment. The second file contains the table that corresponds to Table 1 in the original experiment.

Please see the README in our Github for more instructions about how to reproduce our results.

Feedback on mininet/ec2 hurdles/pox/riplpox/ripl or any other third-party code you used, or any other suggestions/rants?

Riplpox, while claiming to be a “generic” tool for data center routing, seems like it could be better designed to allow for any sort of fixed network routing. Jellyfish is a data center topology; however, it did not interact well with riplpox, which depends on many characteristics of a standard hierarchical topology, such as switch layering.

We spent many hours searching for a bug that turned out to be the result of accidentally adding the reverse of a link to the mininet topology. Rather than failing, printing a warning, or gracefully ignoring the second link, mininet instead overwrote ports on the switches attached to that link. This was very frustrating, as everything appeared to work fine (including pingall), yet nothing would go through over iperf. This happens because riplpox does not send the first packet of a flow over the network; instead, it uses that packet to install routes through the network and then delivers it directly to the destination. Because this allows pingall to complete, we believed our network was fully connected, even though it wasn't.
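
In hindsight, the guard we needed was simple: deduplicate undirected links before handing them to mininet. A minimal sketch of that check against mininet's Topo API (the helper name is ours, and this is an illustration rather than our exact code):

from mininet.topo import Topo

def add_jellyfish_links(topo, links):
    # Add each undirected link exactly once; adding both (a, b) and
    # (b, a) silently overwrote switch ports for us instead of failing.
    added = set()
    switches = {}
    for a, b in links:
        for name in (a, b):
            if name not in switches:
                switches[name] = topo.addSwitch('s%s' % name)
        key = frozenset((a, b))
        if key in added:
            continue  # skip the reverse duplicate
        added.add(key)
        topo.addLink(switches[a], switches[b])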

 

One response to “CS244 ’13: Jellyfish, Networking Data Centers Randomly”

  1. It was extremely easy to reproduce the results. The directions were clear, and the output matched almost exactly. Score: 5

    – Calvin and Bryan
