CS244 ‘13: Hedera


Team:  Behnam Montazeri, Milad Sharif

Key Result(s):

1) Compared Hedera's Global First Fit scheduler on a fat-tree topology against ECMP and a non-blocking network.

2) The Mininet results for the non-blocking topology agree very well with those reported in [1].

3) The Mininet results for ECMP routing over a fat-tree topology agree well with [1] in terms of the trend; however, the actual bisection bandwidth values are lower than those reported in [1].

4) Our Mininet results for Hedera's Global First Fit algorithm show higher bisection bandwidth than ECMP (except for the random traffic patterns); however, the values do not exactly match those in [1].

Source(s):

[1] M. Al-Fares et al. Hedera: Dynamic Flow Scheduling for Data Center Networks (NSDI ’10)

[2] M. Al-Fares et al. A Scalable, Commodity Data Center Network Architecture (SIGCOMM ’08)

Contacts: Behnam Montazeri (behnamm@stanford.edu), Milad Sharif (msharif@stanford.edu)

Introduction:

Typical data center topologies consist of multi-rooted trees with many equal-cost paths between any pair of hosts. However, existing forwarding protocols are optimized for single-path networks. To take advantage of this path multiplicity, Equal-Cost Multi-Path (ECMP) forwarding is widely used in data center networks. ECMP statically selects one of the equal-cost paths by hashing each flow, which ensures that all packets belonging to a flow take the same path. However, as shown in [1], static forwarding can lose aggregate throughput when the hashes of two or more elephant flows collide on a downstream link.

These issues motivated the authors of [1] to design Hedera as a complement to ECMP in data center networks. Hedera is a centralized scheduler that dynamically schedules flows in a multi-stage network by collecting flow information from the switches. In their implementation, Hedera polls the edge switches for flow information and performs scheduling every few seconds.

The authors of [1] present both simulation and physical testbed results for several schedulers, including Global First Fit and Simulated Annealing, and compare the bisection bandwidth they achieve with that of ECMP and an ideal “non-blocking” network for 20 different traffic patterns. The physical testbed benchmark for a k=4 fat-tree and the simulation results for k=32 are shown in Fig. 9 and Fig. 10 of [1], respectively. The authors report up to 100% improvement of Hedera over static load balancing, so we decided to replicate the experiment by emulating the same k=4 fat-tree topology in Mininet and reproducing the results of Fig. 9. Due to lack of time, we only implemented the Global First Fit scheduler.

Implementation:

To replicate the results for Hedera, we broke the implementation down into a few steps:

  • Creating the topologies: we used Mininet to implement the non-blocking and fat-tree topologies. We limited the bandwidth of each link to 10 Mbps (as opposed to the 1 Gbps links used in [1]), set the buffer size of each switch to 100 packets per port, and allocated about 3% of the EC2 instance's CPU to each of the 16 hosts (a sketch of how these limits are applied follows this list).
  • OpenFlow controllers: we used POX to build the OpenFlow controllers for ECMP routing and the Hedera scheduler. We implemented the controllers on top of ripl, a Python library for simple data center controllers, and added new functionality to ease the implementation of Hedera; the Global First Fit placement logic is sketched below.
  • Generating different traffic patterns: we started with the load generator used by the authors of the paper, but we ended up using iperf to generate flows with exactly the same traffic patterns as in the paper (see the iperf sketch below).
  • Demand estimation: we collect flow statistics every few seconds and estimate the actual demand of the large flows from the collected stats using the recursive algorithm in [1] (a compact sketch of the estimator follows this list).
  • Bandwidth monitoring to produce the output results: to measure the bisection bandwidth we used bwm-ng, a reliable tool for monitoring per-interface bandwidth.
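
The sketch below illustrates how the link and CPU limits above can be applied in Mininet. It is not our actual topology code (our scripts use ripl's fat-tree topology; see the repository); the class and variable names here are our own, chosen only for illustration.

    # Simplified k=4 fat-tree for illustration; our experiments use ripl's topology.
    from itertools import count
    from mininet.topo import Topo
    from mininet.net import Mininet
    from mininet.node import CPULimitedHost, RemoteController
    from mininet.link import TCLink

    class SketchFatTreeTopo(Topo):
        "k=4 fat-tree: 4 core switches, 4 pods of 2 agg + 2 edge switches, 16 hosts."
        def __init__(self, k=4, **opts):
            Topo.__init__(self, **opts)
            link_opts = dict(bw=10, max_queue_size=100)  # 10 Mb/s links, 100-packet buffers
            sw_ids, host_ids = count(1), count(1)
            cores = [self.addSwitch('s%d' % next(sw_ids)) for _ in range((k / 2) ** 2)]
            for pod in range(k):
                aggs = [self.addSwitch('s%d' % next(sw_ids)) for _ in range(k / 2)]
                edges = [self.addSwitch('s%d' % next(sw_ids)) for _ in range(k / 2)]
                for i, agg in enumerate(aggs):
                    for j in range(k / 2):                        # aggregation <-> core
                        self.addLink(agg, cores[i * (k / 2) + j], **link_opts)
                    for edge in edges:                            # aggregation <-> edge
                        self.addLink(agg, edge, **link_opts)
                for edge in edges:
                    for _ in range(k / 2):                        # edge <-> hosts
                        # cpu=0.03 gives each host roughly 3% of the instance's CPU
                        host = self.addHost('h%d' % next(host_ids), cpu=0.03)
                        self.addLink(edge, host, **link_opts)

    # TCLink enforces the bw/queue limits, CPULimitedHost enforces the cpu= fraction,
    # and routing is left to an external (POX) controller.
    net = Mininet(topo=SketchFatTreeTopo(), link=TCLink,
                  host=CPULimitedHost, controller=RemoteController)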
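
The placement step of Global First Fit, as we understand it from [1], boils down to the following sketch. It is not a copy of our controller code: paths, reserved, and capacity are illustrative names, where paths is the list of candidate equal-cost paths for a flow (each a list of links) and reserved tracks the bandwidth already reserved on each link.

    # Global First Fit (sketch): place a large flow on the first path that can hold it.
    def global_first_fit(flow_demand, paths, reserved, capacity=10e6):
        for path in paths:
            # does every link on this path have room for the flow's estimated demand?
            if all(reserved.get(link, 0.0) + flow_demand <= capacity for link in path):
                for link in path:
                    reserved[link] = reserved.get(link, 0.0) + flow_demand
                return path                      # first fitting path wins
        return None                              # no room anywhere: leave the flow on its ECMP path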
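
Below is a rough sketch of how a traffic pattern can be driven with iperf from a Mininet script. The function name and the stride-1 host pairing are only an example; the patterns we actually ran are the ones scripted in the repository.

    # Example: one long-lived TCP flow per host, to the host `stride` positions away.
    def run_stride(net, stride=1, duration=60):
        hosts = net.hosts
        for h in hosts:
            h.popen('iperf -s')                               # start a receiver on every host
        clients = []
        for i, src in enumerate(hosts):
            dst = hosts[(i + stride) % len(hosts)]
            clients.append(src.popen('iperf -c %s -t %d' % (dst.IP(), duration)))
        for c in clients:
            c.wait()                                          # wait for all senders to finish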
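
For reference, here is a compact sketch of the demand estimator described in [1]; our controller code is structured differently, and the flow representation below is simplified. Each flow carries a demand expressed as a fraction of the host NIC capacity. The estimator alternately distributes each sender's leftover capacity over its unconverged flows and caps each oversubscribed receiver at capacity 1.0, until the demands stop changing.

    # Sketch of Hedera's demand estimation [1]; each flow is a dict with
    # 'src', 'dst', 'demand' and 'converged'.
    def estimate_demands(flows, hosts):
        for f in flows:
            f['demand'], f['converged'] = 0.0, False
        changed = True
        while changed:
            changed = False
            # Sender pass: split each sender's spare capacity equally among its
            # flows whose demands have not yet converged.
            for h in hosts:
                out = [f for f in flows if f['src'] == h]
                unconv = [f for f in out if not f['converged']]
                if not unconv:
                    continue
                spare = 1.0 - sum(f['demand'] for f in out if f['converged'])
                share = spare / len(unconv)
                for f in unconv:
                    if f['demand'] != share:
                        f['demand'], changed = share, True
            # Receiver pass: if a receiver is offered more than its capacity,
            # exempt the small flows and split the remaining capacity equally
            # among the rest, which then count as converged.
            for h in hosts:
                into = [f for f in flows if f['dst'] == h]
                if not into or sum(f['demand'] for f in into) <= 1.0:
                    continue
                exempt, limited = 0.0, list(into)
                share = 1.0 / len(limited)
                while any(f['demand'] < share for f in limited):
                    exempt += sum(f['demand'] for f in limited if f['demand'] < share)
                    limited = [f for f in limited if f['demand'] >= share]
                    share = (1.0 - exempt) / len(limited)
                for f in limited:
                    if f['demand'] != share:
                        f['demand'], changed = share, True
                    f['converged'] = True
        return flows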

The hardest part for us was monitoring the flow statistics to estimate the traffic demands. To obtain the flow statistics, the controller sends OpenFlow stats requests to all of the switches. For reasons we could not pin down, the controller was unable to correctly collect the flow stats when we used the load generator from [1], so we generated the traffic with iperf instead. The other issue we hit was that every time we updated the routing tables in the switches, some flows did not move to the new routes: when routes are updated, the switches may keep using the old entries. We ended up deleting the old routes before installing the new ones.
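
The two OpenFlow details above look roughly like this in POX (our actual controller is built on ripl and structured differently; the function names here are only illustrative):

    import pox.openflow.libopenflow_01 as of

    def poll_flow_stats(connection):
        # ask a switch for its per-flow counters; POX delivers the answer as a
        # FlowStatsReceived event, from which we read byte counts to estimate demands
        connection.send(of.ofp_stats_request(body=of.ofp_flow_stats_request()))

    def reroute(connection, match, out_port):
        # delete any existing entry for this flow first, otherwise the switch may
        # keep forwarding along the old route after the new one is installed
        connection.send(of.ofp_flow_mod(command=of.OFPFC_DELETE, match=match))
        msg = of.ofp_flow_mod(match=match)
        msg.actions.append(of.ofp_action_output(port=out_port))
        connection.send(msg)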

Results:

The physical testbed results from the Hedera paper for a k=4 fat-tree topology and the Mininet results for the different schedulers on the same topology are shown in Fig. 1 and Fig. 2, respectively. The non-blocking results are the highest bisection bandwidth the schedulers can achieve. As can be seen in Fig. 1, Hedera's Simulated Annealing and Global First Fit algorithms produce near-optimal bisection bandwidth, much better than ECMP for most traffic patterns. Our Mininet results for the non-blocking network and ECMP closely match those from the hardware testbed except for a few patterns (stride 1 and randx). The Hedera GFF scheduler shows some improvement over ECMP (except for the random traffic pattern); however, for most of the traffic patterns the improvement is not as significant as reported in [1]. We suspect the discrepancies are mostly due to the stochastic behavior of Mininet on EC2 instances; we observed up to 25% variation for some of the traffic patterns.

Fig. 1. Physical testbed results

Fig. 2. Mininet results

To check how dependent the results are on the EC2 platform, we also repeated the experiment on an xlarge instance; Fig. 3 shows the results. As expected, the performance of the Mininet emulation depends on the instance, and the bisection bandwidth varies noticeably with CPU availability. We also observed that the results depend on the link speed and queue size: increasing the link speed decreases the simulation performance (as expected), and changing the queue size can affect the results. The plots here were generated with 10 Mb/s links and 100-packet queues.

Fig. 3. Mininet results on an xlarge EC2 instance

Critique:

We used exactly the same traffic patterns as the authors of [1]. Each pattern consists of a few long-lived flows that send data throughout the entire experiment. In most of our runs the Global First Fit scheduler only changes the routes a few times. Therefore, the experiment does not fully exercise the main contribution of the paper, dynamic flow scheduling, and the scheduling frequency did not have a significant effect on the results.

Moreover, in a real data center most flows are small. Since Hedera only schedules elephant flows, the results would be more meaningful if the effect of small flows were included.

Instructions to Replicate This Experiment:

Create an Ubuntu 12.04 instance running Mininet 2.0

cd ~

git clone https://msharif@bitbucket.org/msharif/hedera.git

cd hedera

Follow the instructions in the README

2 responses to “CS244 ‘13: Hedera”

  1. Reproducibility score: 5/5

    Great work overall, results were fully reproducible! We noticed slight variations in plot values, but your blog post explains the variability in EC2 instance performance.

    Some feedback on deployment:

    Point users to a preconfigured AMI, such as the class VM with Mininet pre-installed (CS244-13-Mininet ami-7eab204e), or provide an ‘install.sh’ script in your repository that takes care of downloading and installing Mininet and POX. This will make deployment a bit simpler.

    In the README, consider moving the notes about dependencies and pythonpath / symbolic links to the beginning of the document since they are critical to running the experiment correctly.

    Change the hard-coded dependency for POX in hedera.py (line 78) to use an environment variable or POX Python path in case the user has installed POX in a different location than their home directory.

    Also, an estimate of runtime would be helpful for new users.

  2. Hello,
    Could someone help me understand what the bisection bandwidth is and how it is calculated in this implementation?

    Regards
