DCell Topology: Why Not to Use a Custom POX Controller’s Packet-In Handler
Team: Jessica Fisher and Alex Valderrama
Key Result(s): The DCell topology can be simulated in Mininet, but its routing policies should not be implemented in a custom POX controller, because doing so causes numerous issues with the Mininet topology.
This paper presents a topology called DCell as a solution for data center networking (DCN). The three goals for DCN are scalability, fault tolerance, and high network capacity. For scalability, it must be possible to add servers to the network without affecting the servers that are already connected. Since data centers are constantly growing, stopping the entire data center and reorganizing the network in order to add more servers is infeasible. With hundreds of thousands, if not millions, of servers within a single data center network, server and link failures are quite common, so any proposed topology must handle these failures gracefully; hence the fault tolerance goal. Finally, high network capacity is crucial because many big data computations involve many-to-many communication patterns, which lead to high bandwidth usage.
The prevailing DCN practice when this paper was written in 2008 was to connect all of the servers using a tree hierarchy of switches, core switches, or core routers. This can scale, though it sometimes requires buying additional expensive, high-end switches as the hierarchy grows. The tree hierarchy also offers few redundant paths between two nodes, so if a link or switch goes down, traffic between certain pairs of nodes stops completely. Lastly, a hierarchy of switches does not provide high bisection bandwidth, which can cause problems for bandwidth-intensive applications. Because of these limitations, the authors proposed the DCell topology, which is recursively defined, along with specialized routing algorithms.
One of the important goals for the DCell topology was fault tolerance. To gauge the fault tolerance of DCell, the authors ran an experiment in which two links within a level-one DCell were removed and then reintroduced 8 seconds later. Later in the same experiment, a server node was shut down to assess the impact of node failures. The authors found that TCP throughput was affected for only a few seconds before returning to its base level after both types of failure. The goal of our experiment is to replicate these observations, and specifically Figure 11 in the paper, which shows TCP throughput during this fault tolerance experiment. We chose this result because fault tolerance is the main goal of the DCell topology, so measuring TCP throughput during node and link failures lets us check whether DCell handles topology failures gracefully.
To replicate the results from this experiment, we built the DCell topology within Mininet at the DCell-1 level, which means there are 5 DCell-0 clusters, where each DCell-0 cluster has 4 hosts and 1 master switch. In the DCell topology, hosts need to route directly to other hosts, but a Mininet host is not capable of the complicated routing intelligence this requires, so we added a layer of abstraction. In our DCell topology, each host is connected only to a switch, which takes the place of the host in the DCell topology. This allows us to control the routing behavior of all of the switches in the network (20 switches and 5 master switches). To control the routing behavior, we use POX, which allows us to write switch controllers. These switch controllers hook into the Mininet switches in the following way: when a switch gets a packet that doesn’t match any of its routing table entries, our controller’s packetIn handler is called and thus gets to control where the packet is sent. All of the topology’s routing logic is implemented in this packetIn handler.
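The wiring described above can be sketched in a few lines of plain Python. This is only an illustration, not the code from our repository: the function name `dcell1_links` and the master switch labels `m0` to `m4` are made up for the example, and it assumes four servers per DCell-0 (which matches the 20 host-facing switches) together with the DCell paper's rule that, for every pair of clusters i < j, server (i, j-1) is linked to server (j, i).

```python
from itertools import combinations

def dcell1_links(n=4):
    """Return (intra, inter) link lists for a DCell-1 made of DCell-0s with n servers."""
    cells = n + 1  # a DCell-1 contains n + 1 DCell-0 clusters
    # Every server (cell, index) hangs off its cluster's master switch.
    intra = [((c, i), "m%d" % c) for c in range(cells) for i in range(n)]
    # For every pair of clusters i < j, link server (i, j-1) to server (j, i).
    inter = [((i, j - 1), (j, i)) for i, j in combinations(range(cells), 2)]
    return intra, inter

intra, inter = dcell1_links(4)
print(len(intra), "server-to-master links,", len(inter), "inter-cluster links")
```

In a Mininet topology class, each pair in these lists would presumably become one `addLink` call, with every server represented by a host plus the switch that stands in for it.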
Unfortunately, while there are a lot of surface-level tutorials for POX, there are not many in-depth guides on how to combine Mininet and POX in order to control packets. The packet controller requires a lot of code just to achieve low-level functionality. It is also hard, from within the POX controller, to figure out which switch the controller is communicating with. It took us hours of searching to work out how to get a switch's name and port information when the switch attaches to the topology, so that we could record that information for later use in our routing logic. Often we had to read the POX source code to understand where to get the information we required and how to actually call the methods that the tutorials used without much explanation. We then had to implement ARP responses so that the hosts could discover IP address to MAC address pairings, which ping needs in order to work correctly. Once we figured this out, we spent hours trying to understand why ping worked but iperf did not. It turns out that the switches send only the first 128 bytes of a packet to the controller, so long TCP packets were truncated, the controller forwarded the truncated packets, and thus the TCP handshake wasn’t working properly. After we changed this setting, we were able to get the built-in Mininet iperf command to work and report the throughput along the path within the topology that we were trying to replicate (from host (0,0) to host (4,3)).
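For reference, the hops on the path from host (0,0) to host (4,3) follow directly from the DCell-1 wiring rule. The sketch below is not our controller code: `dcell1_route` is a hypothetical helper, it handles only the failure-free case at level 1, and it assumes the paper's rule that the single link between clusters i < j joins server (i, j-1) and server (j, i).

```python
def dcell1_route(src, dst):
    """Hop list between two DCell-1 servers, each given as (cell, index)."""
    (a, _), (c, _) = src, dst
    if a == c:
        # Same DCell-0: a single hop through the shared master switch.
        return [src] if src == dst else [src, dst]
    # Find the two endpoints of the one link that joins cluster a and cluster c.
    if a < c:
        exit_node, entry_node = (a, c - 1), (c, a)
    else:
        exit_node, entry_node = (a, c), (c, a - 1)
    path = [src]
    if exit_node != src:
        path.append(exit_node)   # reach the server that owns the inter-cluster link
    path.append(entry_node)      # cross into the destination cluster
    if entry_node != dst:
        path.append(dst)         # final hop inside the destination cluster
    return path

print(dcell1_route((0, 0), (4, 3)))  # [(0, 0), (0, 3), (4, 0), (4, 3)]
```

Each intra-cluster hop here goes through that cluster's master switch, so the number of switches a packet actually crosses in the Mininet model is larger than the number of entries in the list.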
However, the reported throughput was only about 0.17% of the expected throughput (1.7MB/sec instead of 1000MB/sec). We were unable to determine exactly why the measured throughput was so much lower than what the link bandwidth along the path should allow, but we hypothesize that the latency of the custom POX controller handler is the cause. Routing table lookups take almost no time, but in this simulation a packet arriving at a switch would not match any routing table entry, so the switch had to call the custom controller’s packetIn handler. That call adds overhead on top of the routing logic that has to run to decide where the packet should go.
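A quick back-of-the-envelope calculation shows what the measured figure implies per packet. This is a rough sketch, not a measurement: the 1500-byte packet size is an assumption, and it treats packets as if they were handled strictly one after another.

```python
measured = 1.7        # MB/sec, as reported by iperf
expected = 1000.0     # MB/sec, the figure expected from the link bandwidth
packet_bytes = 1500   # assumption: full-size Ethernet payloads

ratio = measured / expected
packets_per_sec = measured * 1e6 / packet_bytes
ms_per_packet = 1000.0 / packets_per_sec

print("fraction of expected: %.2f%%" % (ratio * 100))
print("packets per second:   %.0f" % packets_per_sec)
print("time per packet:      %.2f ms" % ms_per_packet)
```

Under these assumptions the whole path has a budget of a bit under a millisecond per packet, which is plausible if every packet takes several round trips to a Python controller, one per switch on the path.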
However, we’re not sure how else one could react to link failures and change the routing table entries for the topology, as the ideal simulation requires. It would be possible to install the original path between two nodes and then replace it after a fixed amount of time to simulate a link failing, but this doesn’t capture the real point of the result, which was to measure how well the routing algorithms detect failures.
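To make "changing the route after a link failure" concrete, here is a small stand-alone sketch. It uses plain breadth-first search rather than the paper's own fault-tolerant routing scheme, it folds each master switch into a single server-to-server hop, and the function names and the four-servers-per-cluster setting are assumptions made for the example.

```python
from collections import deque
from itertools import combinations

def dcell1_graph(n=4):
    """Adjacency sets for DCell-1 servers; master switches are folded into one hop."""
    cells = n + 1
    adj = {(c, i): set() for c in range(cells) for i in range(n)}
    for c in range(cells):
        # Servers in the same DCell-0 reach each other through the master switch.
        for u, v in combinations([(c, i) for i in range(n)], 2):
            adj[u].add(v)
            adj[v].add(u)
    for i, j in combinations(range(cells), 2):
        # One direct link per pair of clusters: (i, j-1) <-> (j, i).
        u, v = (i, j - 1), (j, i)
        adj[u].add(v)
        adj[v].add(u)
    return adj

def shortest_path(adj, src, dst, failed=()):
    """Breadth-first search that skips every link listed in `failed`."""
    dead = {frozenset(link) for link in failed}
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj[node]):
            if nxt not in parent and frozenset((node, nxt)) not in dead:
                parent[nxt] = node
                queue.append(nxt)
    return None  # destination unreachable

adj = dcell1_graph(4)
print(shortest_path(adj, (0, 0), (4, 3)))
print(shortest_path(adj, (0, 0), (4, 3), failed=[((0, 3), (4, 0))]))
```

With the direct link between clusters 0 and 4 marked as failed, the search detours through a third cluster at the cost of one extra hop. What this sketch leaves out is exactly the hard part discussed above: noticing that the link has failed in the first place.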
Overall, it seems that choosing Mininet and POX as the platform, and implementing the routing logic within the controller’s packetIn handler, were design decisions that made it impossible to reproduce the results from Figure 11 in the paper. We were able to create the topology and run iperf along specific paths, but the delay introduced by the controller handler made it infeasible to reproduce the results.
Instructions to Replicate This Experiment:
To replicate this experiment you will first need to copy an Amazon EC2 Instance provided by Stanford’s CS244 class. For instructions on how to do so see this link: http://www.stanford.edu/class/cs244/ec2.html
Once your instance is set up, log in to your home directory and clone this repository (the URL is given in the last line below).
From the newly created cs244-dcell/ directory run the following commands:
- mv cs244-pa3 ../
- mv pox/ext/DCellController.py ../pox/ext/DCellController.py
- cd ../cs244-pa3
- chmod 777 run.sh
- sudo ./run.sh
Code is available at: https://bitbucket.org/avald/cs244-dcell