CS244 ’17: Xpander: Towards Optimal-Performance Datacenters


Team: Praveen Kumar and Matthew Volk

Paper: Xpander: Towards Optimal-Performance Datacenters. CoNEXT ’16

1. Introduction

Today’s datacenters suffer from sub-optimal performance as measured by many metrics, in particular: (1) network utilization, (2) resilience to failures, and (3) incremental expandability. The key concept of Xpander [1] is to connect the network devices so that the topology becomes an expander graph. The benefits automatically come from the graph theoretic properties of expander graphs. The paper proposes an algorithm called 2-lifting to generate such expander graphs for a given number of nodes and degree of each node, which is limited by the port count on each switch. The paper also delves into the operational considerations such as cable costs and floor planning that would be needed to deploy this in a datacenter.

The goal of the paper is to build an optimal datacenter topology in terms of performance while being scalable and deployable.

The key motivation for this work is that despite extensive efforts, today’s datacenters exhibit sub-optimal performance in terms of the metrics mentioned above. Drawing from the recent work on random topologies, the authors explore the key graph theoretic properties that lead to optimal performance.

The primary result of the paper is that Xpander achieves near-optimal performance in terms of throughput and bisection bandwidth guarantees. Further, the paper also claims that Xpander is close to optimal in terms of other metrics as well, such as robustness to traffic variations, resiliency to failures, incremental expandability, and path lengths.


2. Replication Goals and Motivation

Our first goal is to reproduce the generation of Xpander graphs and compare with Jellyfish [2] in terms of basic properties such as length of shortest paths. Following this initial test, we focus on performance in terms of throughput and resilience to failures. We explain each of these in detail in the following sections, along with key challenges faced and insights.

2.1 Generating Xpander and Jellyfish Graphs

First, we wrote simple scripts to generate the topologies required by the experiments in the paper. The following figure shows sample Xpander graphs that we generated with 2-lifting and d=3.

Screen Shot 2017-05-31 at 5.10.57 PM

Sample Xpander topologies generated (n is the number of switches)


2.2 Average Shortest Path Lengths

We compute the shortest paths for all node pairs and compare the average shortest path lengths with Figure 7 from the paper (see the graph below).

Screen Shot 2017-06-01 at 5.49.13 PM

Average path lengths as reported in the paper (Figure 7)

Average path lengths as reported in the paper (Figure 7)

Motivation: This graph is fundamental to the paper, because it provides a justification as to why Xpander graphs work so well: the average shortest path between any two hosts in an Xpander grows with number of servers like that of the Jellyfish network, near the optimal curve.

Reproduced Result: Computing average shortest paths was straightforward, and the following figure shows our results compared with average shortest path lengths for optimal topology [3]. On computing the shortest paths over these graphs, we observe the same growth pattern for Xpander and Jellyfish as in the paper, and the results match those mentioned in the paper.

sp_(32,_8)

(a) Reproduced result: k = 32. There are 8 servers placed under each switch.

sp_(48,_12)

(b) Reproduced result: k = 48. There are 12 servers placed under each switch


2.3 Throughput

Next, we wanted to compare the performance in terms of throughput.

Motivation: Figures 4a and 5 in the paper present their primary contributions: that throughput as a function of the number of servers in the network performs near optimally and can match the performance of Jellyfish networks. As such, we feel these are important graphs for us to reproduce.

2.3.1 All-to-all throughput

Results for all-to-all throughput from paper (Figure 4)

Results for all-to-all throughput from paper (Figure 4)

Method: First, we compare performance with all-to-all (uniform) traffic through the network, measuring throughput as a function of the number of servers. For all-to-all traffic matrix, the maximum throughput is defined as the maximum value of α, such that a uniform traffic matrix with all elements = α can be routed through the network without violating the capacity constraint of any link.

Assumption: The paper does not specify how they compute the optimal routing strategy to measure the maximum achievable throughput. So, we follow the following approach: Generate a traffic matrix with unit demand between every pair of nodes. Generate a concurrent multi-commodity flow routing problem that minimizes the maximum link utilization (MLU) and formulate this as a linear program. Find the minimum possible MLU and set α = 1 / MLU. This means if the uniform traffic matrix with each element α is used, then the MLU will be 1 when flows are routed optimally.

Challenges: The number of commodities grows as the square of number of nodes (n), and the number of variables in the LP grows cubic to the number of nodes, as the number of edges = n * d. As a result, the LP becomes really large, and it took a few hours to solve for even small number (around 150) of servers. Thus, we present the corresponding results with smaller number of servers. They match the expected results in this subset.

Screen Shot 2017-06-03 at 4.31.54 PM.png

Replication of Figure 4a

2.3.2 Throughput with k-shortest paths

The previous results showed theoretical limit on the performance. However, in a realistic setting, the paper proposes using k-shortest paths for routing and shows that the performance for Xpander and Jellyfish are almost similar:

Results for K-Shortest & MPTCP with K = #Subflows. 6 servers placed under each 36-port switch. From paper (Figure 5)

Results for K-Shortest & MPTCP with K = #Subflows. 6 servers placed under each 36-port switch. From paper (Figure 5)

Challenges: The description in the paper isn’t clear on what the metric means. What is the throughput normalized with? We tried a few possible interpretations of the graph, and were unable to determine what was precisely being shown in the graph. As a result, we decided to perform the following experiment.

Method: One key observation from these graphs is that Xpander is competitive with Jellyfish when using k-shortest paths. To validate this, we generate Xpander and Jellyfish graphs with similar node counts, and measure throughput with all-to-all traffic matrix similar to the previous experiment. But, instead of using optimal routing, we using k-shortest paths.
Assumption: For a source-destination pair, the traffic is split evenly among all the k paths.

Results: The following graphs shows our results for all-to-all throughput using k-shortest paths. The three graphs correspond to: a) average throughput for a node-pair is defined as α such that a uniform all-to-all α traffic matrix can be routed, b) the average total throughput achieved by a sender to all receivers, and c) the aggregate (sum over all node-pairs) throughput achieved by the network. The values are normalized by the corresponding upper bound as discussed earlier.

(a) Normalized throughput per node-pair

(a) Normalized throughput per node-pair

(b) Normalized throughput per sender

(b) Normalized throughput per sender

(c) Normalized aggregate throughput

(c) Normalized aggregate throughput

Discussion: The paper seems to lack critical information pertaining to the experiment they did and the precise metric presented. Since, we are not sure of the precise metric on the y-axis in the paper, the trend shown in our results doesn’t match those in the paper.Thus, we are not able to replicate the results.

However, through our modified experiments, we are indeed able to confirm the similarity between the performance of Xpander and Jellyfish. Our assumption of equal distribution of traffic over the k-shortest paths may deviate from MPTCP’s performance, but this has been kept consistent for both the topologies.

2.4 Incremental Expansion

Two stretch goals are to test that the incremental expansion works as simply as the authors claim and to see if we can reproduce the near-perfect throughput results they reported. The first of these will be somewhat qualitative, but we hope to make expansion as simple as changing hyperparameters in a script. The latter will be compared against the graph in Fig. 8.

2.4.1 Generating Graphs Incrementally

As described in the papers, incrementally expanding graphs can be achieved quite simply, requiring no more than 8 lines of Python in each case. Below are some incrementally-generated Jellyfish and Xpander graphs.

Screen Shot 2017-06-03 at 4.16.01 PM

Jellyfish graphs generated incrementally from a 6-node seed

 

Screen Shot 2017-06-03 at 4.16.17 PM

Xpander graphs generated incrementally from a 6-node seed

2.4.2 Incremental All-to-All Throughput

Screen Shot 2017-05-03 at 11.08.57 AMFigure 8 from Xpander paper: Throughput on Incrementally-Expanded Graphs

Screen Shot 2017-06-03 at 4.22.42 PM.png

Attempt to Recreate A subset of Figure 8

Challenges: As before, the description in the paper isn’t clear on what the metric means. What is the throughput normalized with? We tried a few possible interpretations of the graph, and were unable to determine what was precisely being shown in the graph.

Method: One key observation from these graphs is that Xpander is competitive with Jellyfish when generated incrementally. To validate this, we generate Xpander and Jellyfish graphs with similar node counts, and measure throughput with all-to-all traffic matrix similar to the previous experiment. We use optimal routing.

Results: The above graphs shows our results for all-to-all throughput using optimal routing on incrementally-generated graphs. Again, it is unclear what the authors wanted to normalize by, and computing throughput using optimal routing for more than 40 nodes took hours to run, so we were unable to generate the same data as they were due to time and resource constraints. Because the normalization applied to both are the same, we see that Xpander performs within a factor of three of Jellyfish in this constrained dataset, though our results here are far from conclusive.


2.5 Resilience to failures

Motivation: An important result mentioned in the paper is the resiliency of Xpander topology being competitive to Jellyfish. Thus, we attempt to replicate the following figure.

Resiliency to failures. Result from the paper (Figure 6)

Resiliency to failures. Result from the paper (Figure 6)

Method: Similar to the paper, we create Xpander and Jellyfish topologies of similar sizes and measure throughput with an all-to-all uniform traffic matrix while varying the probability of link failures.

Result: The following figure shows our results with k-shortest paths routing, and the throughput vs failure rate measured is similar to as mentioned in the paper.

Replicated results for throughput vs link failure rate.

Replicated results for throughput vs link failure rate.


3. Instructions to replicate

3.1 From source

  1. Set up a Google cloud instance, following the instructions here:
    https://cloud.google.com/compute/docs/quickstart-linux
    Note that you will want a machine with at least four processors for reasonable performance as well as (ideally) at least 48GB of RAM. We recommend using either Debian or Ubuntu, though any Linux distribution should work.
  2. The source code for our experiments is at https://github.com/prvnkumar/xpander.
  3. Install the dependencies by running download_deps.sh. See requirements below if not running on a supported architecture.
  4. Start the virtual environment using source venv/bin/activate from the outermost directory of the repo.
  5. To regenerate the graphs, just run the script run.sh. This may take an hour or two depending on the hardware.
  6. The data and graphs will be generated in the output/, graphs/ and incr_graphs/ directories, respectively.

3.2 Requirements

If you want to run on a slightly different architecture, you will need the following:

  • Python 2.7.X (tested with 2.7.9)
  • Matplotlib (version 2.0.2, important!)
  • Mininet
  • PuLP (a Python linear equation solver)
  • Networkx
  • Standard Python libraries (math, numpy, random)

4. Critique

Overall, we found that the results about basic properties of Xpander graphs seem to hold pretty well, and are competitive with Jellyfish. For results based on optimal routing, we found it difficult for the linear programs to scale to the size mentioned in the paper. For some of the results, we found that the paper lacks critical details which are necessary for replicating them.

5. References

[1] Valadarsky, Asaf, et al. “Xpander: Towards Optimal-Performance Datacenters.” Proceedings of the 12th International on Conference on emerging Networking EXperiments and Technologies. ACM, 2016.

[2] Singla, Ankit, et al. “Jellyfish: Networking Data Centers, Randomly.” NSDI. Vol. 12. 2012.

[3] Singla, Ankit, Philip Brighten Godfrey, and Alexandra Kolla. “High Throughput Data Center Topology Design.” NSDI. 2014.

Advertisements

One response to “CS244 ’17: Xpander: Towards Optimal-Performance Datacenters

  1. Reproducibility score 5:
    The code ran to completion and the graphs matched with the results reported in the blog post. The blog post was thorough and replicated several parts of the original paper. The visualizations of the built Xpander topologies were also helpful.
    The dependency installation script didn’t precisely work. The script checks “if [-d “$mininet”]; then …” and then installs mininet and the virtualenv. In other words, the script only installs dependencies if the mininet directory already exists, which means the dependencies never get installed. Eventually we had to install the dependencies by hand.
    The graphs were all present and in a folder. However, it would be useful to specify (for example, in the README) which file corresponds to each figure in the blog post.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s