Team: RJ Walsh and Emin Topalovic.
Key Result(s): Tail-drop queues, coupled with an asymmetric flow of traffic to a single output, cause flows arriving on ports carrying fewer flows to lose out on throughput. This issue occurs in a demonstration topology as well as in a fat tree. We further show that changing the routing style does not necessarily alleviate the issue.
[1] Exposing Unfairness in Data Center Networks. Pawan Prakash, Advait Dixit, Y. Charlie Hu, and Ramana Kompella. NSDI 2012.
We explore the problem of TCP outcast as presented in [1]. Outcast arises under two specific conditions. The first is the use of tail-drop queuing in routers. The second is an asymmetry in the number of flows arriving at different ports of a router while sharing the same destination. Together these cause port blackout: a series of back-to-back packets arriving at one port is dropped in its entirety. So when two different ports send to one output port, one of the senders gets unlucky and has its packets dropped due to port blackout.
The asymmetry exacerbates this problem, since the port with few flows can lose the entire tail of a single flow's congestion window. This has more catastrophic effects than on the other port, which may drop the same number of packets but spread across many flows. That case is less consequential because, from any one flow's view, only a few packets are dropped, leading TCP to react less severely.
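To make the blackout mechanism concrete, here is a toy model (our own illustration, not code from the paper or from our experiment): a tail-drop queue that is momentarily full and not draining drops every arrival, so a small back-to-back burst from one port can be wiped out entirely.

```python
def tail_drop(arrivals, capacity):
    """Toy model of a tail-drop queue during a congestion episode.

    The queue is assumed not to drain during the burst, so once it
    fills, every later arrival is dropped. `arrivals` is the
    interleaved packet order at the output queue, tagged by input port.
    """
    queue, dropped = [], []
    for port in arrivals:
        (queue if len(queue) < capacity else dropped).append(port)
    return queue, dropped

# Port A's large burst fills the queue just before port B's small burst
# arrives back-to-back, so B loses its entire burst (a "port blackout").
accepted, dropped = tail_drop(["A"] * 6 + ["B"] * 3, capacity=4)
print(accepted)  # ['A', 'A', 'A', 'A']
print(dropped)   # ['A', 'A', 'B', 'B', 'B']
```

Port A also loses packets, but they amount to a small fraction of its burst, while port B loses everything it sent in the window.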
| Router | Queuing policy |
|--------|----------------|
| Brocade FastIron GS series | Tail drop |
| Cisco Catalyst 3750-E | Weighted tail drop |
| Cisco Nexus 5000 | Tail drop |

Table 1. Popular COTS routers and their queuing policies, from [1].
The conditions for outcast are actually fairly easy to meet, and many data centers meet them. The use of commodity off-the-shelf (COTS) routers means the tail-drop requirement is often satisfied, as most COTS routers default to tail drop for queue management. Table 1, reproduced from [1], shows the queuing policies of common COTS routers.
The multi-rooted tree topology of data centers contributes to the asymmetry requirement. If we consider a common topology, the fat tree, we see exactly this condition arising: senders are likely to originate in different parts of the network and thus arrive on different input ports of the switch directly before the receiver. [1] shows that the number of nodes 2n hops away grows exponentially with n, so the number of nodes in disparate locations is high. Figure 1 demonstrates this happening in a fat-tree topology.
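As a back-of-the-envelope check of this claim (our own counting for the standard k-ary fat tree, not a calculation from the paper), the number of hosts at each hop distance from a given host can be computed directly:

```python
def hosts_at_distance(k):
    """Hosts reachable from one host in a k-ary fat tree, by hop count.

    2 hops: siblings on the same edge switch;
    4 hops: hosts in the same pod under other edge switches;
    6 hops: hosts in other pods, reached via the core.
    """
    return {
        2: k // 2 - 1,
        4: (k // 2) * (k // 2 - 1),
        6: (k - 1) * (k // 2) ** 2,
    }

print(hosts_at_distance(4))  # {2: 1, 4: 2, 6: 12}
```

For k=4, a host sees 1 neighbor at 2 hops, 2 at 4 hops, and 12 at 6 hops: most senders are far away and enter the receiver's edge switch through the uplink ports, which is exactly the asymmetry outcast needs.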
In this work we recreate the experiments of [1] showing outcast occurring. We do this by setting up a port-asymmetric arrival pattern at tail-drop switches, where n-1 of n flows arrive at one port and one flow arrives at another. We show the adverse effect of this asymmetry on the flow arriving at a port by itself, first on a contrived topology to demonstrate the effect, and then on a realistic fat-tree topology. Finally, we consider the effect of non-static routing on outcast by using hashed routing to route the flows in the fat tree.
The primary goal will be to replicate Figure 2 from [1].
We use Mininet to recreate the topology in Figure 3, with one sender per flow; since outcast depends on the number of flows rather than the number of hosts, this fits the requirements well. We have n-1 senders send to an aggregate switch, which feeds one port of the switch before the receiver, and we use another port of that switch for a single host sending a single flow. This creates the asymmetry: n-1 flows enter port 0 of the receiver switch while one flow enters port 1. We plot the throughput of each sender.
We also use Mininet to build a k=4 fat-tree topology. We set up one host to be the receiver and 15 other hosts as senders. One of the sender hosts shares a switch in the same pod as the receiver and so will have its own port on the switch. The other senders will be coming in through the remaining two ports on the switch connecting to the receiver.
We let the senders run for 60 seconds, plotting the throughput of each sender at the aggregate or receiver switch. In our data, S1 is the receiver switch and S2 is the aggregate switch. We ran 2, 6, and 12 flows in competition with the one flow sending directly to S1.
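The per-sender throughput we plot is just the discrete derivative of cumulative byte counters sampled at the switch over the run. A minimal sketch of that conversion (a hypothetical helper, not the actual script in our repository):

```python
def throughput_mbps(samples, interval):
    """Convert cumulative byte counters, sampled every `interval`
    seconds, into a throughput time series in Mbps."""
    return [(b - a) * 8 / interval / 1e6
            for a, b in zip(samples, samples[1:])]

# A sender moving 1.25 MB per second for two seconds, then going idle:
counters = [0, 1_250_000, 2_500_000, 2_500_000]
print(throughput_mbps(counters, 1.0))  # [10.0, 10.0, 0.0]
```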
Our results closely mimic those of [1]. We see that as the number of competing flows increases, the throughput of the single flow drops dramatically, while the flows arriving within the larger set share bandwidth fairly equally.
Figure 4: Outcast with the topology of Figure 3, with 2, 6, and 12 competing flows.
Effects of Topology and Routing
We next consider the effect of topology on outcast. We form a k=4 fat-tree topology and have 15 of the 16 hosts send to the remaining host. We plot only a subset of the hosts that do not share the receiver's pod switch, since the graph becomes harder to read with all 15 and the omitted hosts behave the same as the selected ones.
We first use spanning tree routing, which ensures that there is one path leading to the pod switch before the receiving host. This recreates the earlier conditions: 14 of the 15 sending hosts (those not connected to the same pod switch as the receiver) all send through one port, while one sending host sends through another, competing for the one output link. We get the results in Figure 5. These show an extreme occurrence of outcast, in which the throughput of the isolated sender does not even register. This makes sense, as there are now even more competing hosts than in the previous example, where the throughput was already quite low.
We also tested the effect of routing, considering hash-based routing, to see whether distributing flows across the incoming ports would lessen the effect. Our results can be seen in Figure 6. We observe no improvement in the throughput of the outcast node. One possible explanation is that the distribution is simply not enough, given that there are only two ports to spread the load over: even with perfectly uniform hashing, roughly seven hosts land on each port to compete with the one. We do, however, see other flows getting starved in the hashing case. One way to explain this is that instead of all flows but one sharing a port, all flows but, say, two share a port, creating an asymmetry between the 13-flow bulk and the two flows now by themselves. Unfortunate hashing could produce this, but that is speculation and further study would be necessary. Finally, the drop in throughput around the thirty-second mark could be due to RipL's response to timed-out paths, behavior that is currently untested and may lead to lost flows. The spikes at the end are harder to explain, but could arise from special behavior during tear-down, though again this is speculation.
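The pigeonhole argument behind "only two ports to spread the load over" can be sketched directly. Here `ecmp_port` is a hypothetical stand-in for hashed routing, not RipL's actual implementation:

```python
import hashlib

def ecmp_port(flow, n_ports):
    """Pick a port for a flow by hashing its identifier (an
    illustrative stand-in for hash-based routing)."""
    digest = hashlib.md5(repr(flow).encode()).digest()
    return digest[0] % n_ports

# 14 remote senders, hashed across the receiver switch's two uplink ports.
flows = [(f"h{i}", "receiver") for i in range(14)]
counts = [0, 0]
for f in flows:
    counts[ecmp_port(f, 2)] += 1

# With only two ports, at least ceil(14/2) = 7 flows must share one
# port regardless of how well the hash spreads them.
print(counts)
```

So even ideal hashing leaves the lone flow competing against a bulk of at least seven flows on one port, which is consistent with the lack of improvement we observed.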
Lessons Learned: The main surprise was a change in Mininet between the two instances. For the final project, a new instance type was created that used DCTCP and allowed us to leverage RipL for the fat-tree topology. However, this version of Mininet changed the router's drop discipline, which is one of the key components of the outcast problem. As a result, we were unable to reproduce our linear-topology results on this instance.
The assignment was relatively straightforward for most of the work we did. Since we already had experience from assignment 2, we were able to leverage most of the code we had written there for this assignment as well. The most challenging aspect was making the experiment work with the fat-tree topology, which required us to introduce RipL and POX, additional libraries we had not yet worked with. Ultimately the configuration felt fragile and opaque, requiring a lot of conceptual overhead to integrate. One example: RipL's behavior after path timeout is untested, something we did not know and which could have contributed to inconsistent results.
Finally, we saw minor performance issues on smaller instances. When we first started on a t1.micro instance, the tasks often failed to finish successfully, but switching to a c1.medium instance type solved this issue for us.
Instructions to replicate this experiment: There are two components to this experiment. We found that both gave good results on a c1.medium sized instance.
For the Linear topology (Figure 3):
- This should be run on the cs244 AWS instance in the Virginia East datacenter. Create an instance using this template to get the desired configuration.
- Note: newer versions of Mininet have changed the queue drop mechanism away from tail drop. This experiment will NOT work with those versions.
- Clone the repository at https://github.com/cybergasm/CS244-Outcast
- As root, run ./outcast-sweep.sh. This should take about 5 minutes to do runs for n=2, 6, and 12 based on the topology described above.
For the FatTree topology:
- This should be run on the cs244/dctcp AWS instance in the Oregon West datacenter. Create an instance using this template to get the desired configuration.
- The project requires RipL and POX to run. Clone the repository at https://github.com/brandonheller/riplpox and follow the setup instructions in INSTALL.
- Clone the repository at https://github.com/cybergasm/CS244-Outcast
- Disable MPTCP and DCTCP if your machine supports them:
- sysctl -w net.ipv4.tcp_dctcp_enable=0
- sysctl -w net.ipv4.tcp_ecn=0
- As root, run ./fattree-sweep.sh. This should take about 5 minutes to do a run for a Fat Tree topology of k=4
- To use spanning tree as the routing algorithm, run ./fattree-sweep.sh -r st
- In case you get stuck on '0_0_2 waiting to listen to port 5001', just restart the experiment, as this is pox+mininet+ripl strange state land.