CS244 ’18: Beyond Fat-Trees and into the “Xpanse”

Gabriele Fisher, Reynis Vazquez-Guzman

Original Paper: Simon Kassing, Asaf Valadarsky, Gal Shahaf, Michael Schapira, and Ankit Singla. 2017. Beyond Fat-trees Without Antennae, Mirrors, and Disco-balls. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM ’17). ACM, New York, NY, USA, 281–294. https://doi.org/10.1145/3098822. 3098836

Network topologies continue to be an active area of research, with academics and industry engineers alike proposing a multitude of novel topologies both static and dynamic. From randomly generated graphs to dynamic topologies using lasers and disco balls, the field now has many contenders for the best data center network topology. For our final project, we were drawn to Kassig et al.’s paper “Beyond Fat-Trees without Antennae, Mirrors, and Disco Balls” because of its efforts to sift through the competitive playing field of topologies and identify best (and realistically usable) topologies.

The paper is separated into two primary arguments. The first is a casual argument that dynamic topologies are not cost effective to deploy and maintain. Second, they argue that the static Xpander topology offers the highest ratio of switches-to-hosts, and makes for the cheapest yet easily extensible network. Unfortunately, Xpander with equal-cost multi-path (ECMP) routing performs worse than fat-trees using ECMP. The authors propose a hybrid routing algorithm called HYB and demonstrate that Xpander-HYB can better approximate the performance of fat-tree ECMP. They use empirical evidence to demonstrate that a Xpander graph can use this hybrid routing algorithm to optimize throughput and flow completion time. We focus on this second argument, and seek to replicate a subset of its experiments.

HYB is a hybrid of equal-cost multi-path (ECMP) and Valiant load-balancing (VLB) as an optimized routing scheme. This algorithm, which the authors name HYB, uses flow size (packets sent so far) to decide whether to switch from ECMP to VLB. The authors choose to make this threshold Q=100KB in this paper. They argue that this load-balancing scheme is well-suited for skewed workloads across the top of rack switches in a data center. We seek to replicate a subset of experiments using Xpander HYB, Xpander ECMP, and fat-tree ECMP running under random permutations of network traffic.

Full Report.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s