CS 244 ’18: Evaluating F10, a Fault-Tolerant Data Center Network

Diveesh Singh, Jean-Luc Watson

Original Paper: Liu, Vincent, et al. “F10: A Fault-Tolerant Engineered Network.” NSDI. 2013.

In this project, we reproduce the results published in “F10: A Fault-Tolerant Engineered Network” by Liu et al., namely that a switch topology co-designed with fault recovery protocols can robustly maintain connectivity even after many switch failures. Modern data centers form the backbone of cloud services and thus require high availability at minimum cost. A common switch topology that mostly addresses these concerns is the “FatTree”: the resulting network is highly scalable, cost-efficient, and contains many redundant links that can provide fault tolerance when a switch or link fails. Unfortunately, the F10 authors identify a number of scenarios in which a FatTree, designed without serious consideration for fault tolerance, delivers suboptimal network performance. Specifically, because the topology is symmetric, a switch that needs to route down through a failed child cannot detour through any of its other children, because they would in turn attempt to route through the same faulty switch. This forces the affected data center to fall back on expensive, long rerouting paths, and such a system may not be able to respond quickly enough to prevent connection loss.
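To make the symmetry problem concrete, here is a minimal Python sketch that counts strictly downward paths in a k-ary FatTree. The wiring convention (core switch (i, j) connects to aggregation switch i in every pod), the tuple-based switch names, and the helper functions are illustrative assumptions, not code from the F10 paper or from our report.

```python
from itertools import product


def build_fat_tree_down_links(k):
    """Map each switch to its children (downward links only).

    Assumed wiring: a k-ary FatTree with k pods, k/2 aggregation and
    k/2 edge switches per pod, and (k/2)^2 core switches. Core switch
    (i, j) connects to aggregation switch i in every pod.
    """
    half = k // 2
    down = {}
    for i, j in product(range(half), repeat=2):
        down[("core", i, j)] = [("agg", pod, i) for pod in range(k)]
    for pod, i in product(range(k), range(half)):
        down[("agg", pod, i)] = [("edge", pod, e) for e in range(half)]
    return down


def count_down_paths(down, src, dst, failed=frozenset()):
    """Count downward-only paths from src to dst that avoid failed switches."""
    if src in failed:
        return 0
    if src == dst:
        return 1
    return sum(
        count_down_paths(down, child, dst, failed)
        for child in down.get(src, [])
    )


down = build_fat_tree_down_links(4)
core = ("core", 0, 0)
target = ("edge", 2, 1)

# Healthy network: the core switch has exactly one way down to the target.
healthy_paths = count_down_paths(down, core, target)

# Fail the single aggregation switch on that path: no downward route remains.
broken_paths = count_down_paths(down, core, target, failed={("agg", 2, 0)})

print(healthy_paths, broken_paths)
```

Under these assumptions, the core switch has a single downward path into the destination pod, so one aggregation-switch failure leaves it with none, and traffic must be rerouted by some other switch further from the failure.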

Full Report.
