CS244 ’13 DCTCP


Team:

Prachetaa Raghavan, Arjun Gopalan

Key results:

DCTCP’s low and steady queue occupancy was verified and replicated using Mininet, and the results are plotted alongside TCP for comparison. DCTCP’s throughput characteristics are also analyzed: it achieves full throughput as long as the queue marking threshold K is set above a reasonable minimum.

Sources:

[1] Alizadeh, Mohammad, Albert Greenberg, David A. Maltz, Jitendra Padhye, Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, and Murari Sridharan. “Data Center TCP (DCTCP).” ACM SIGCOMM Computer Communication Review 40, no. 4 (2010): 63-74.

[2] Handigol, Nikhil, Brandon Heller, Vimalkumar Jeyakumar, Bob Lantz, and Nick McKeown. “Reproducible network experiments using container-based emulation.” In Proceedings of the 8th international conference on Emerging networking experiments and technologies, pp. 253-264. ACM, 2012.

[3] https://reproducingnetworkresearch.wordpress.com/2012/06/06/dctcp-and-queues/?csspreview=true

[4] S. Floyd. RED: Discussions of setting parameters.

[5] http://lartc.org/howto/lartc.adv-qdisc.red.html

Contacts:

Prachetaa Raghavan (pracheta@stanford.edu)
Arjun Gopalan (arjung@stanford.edu)

Introduction:

DCTCP is a TCP-based protocol that seeks to achieve low latency for foreground flows and high throughput for background flows while maintaining a queue occupancy that is an order of magnitude lower than that of traditional TCP. These issues are particularly relevant in data center networks, where one of the major problems DCTCP tries to address is the latency requirements of applications. DCTCP uses ECN to achieve these goals, but the key difference in DCTCP’s use of ECN is the way the sender responds to ECN echo bits. TCP senders respond to congestion in a fixed manner: if they see a packet drop, they halve the window. In DCTCP, the ECN-enabled sender estimates the amount of congestion in the network based on the fraction of packets marked over the last RTT, and updates its congestion window in proportion to this estimate. The beauty of this is that the sender responds to congestion within one RTT of the queue occupancy exceeding K, the queue marking threshold. This keeps the queue length small at all times, and it also means that DCTCP cuts its congestion window less aggressively than TCP.
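As a rough illustration of the update rule described above, the sketch below tracks the fraction of ECN-marked packets per RTT and scales the window by the running congestion estimate. This is our own simplified Python rendering for exposition, not the kernel implementation; the class and parameter names are ours, and g = 1/16 is the gain suggested in [1].

    G = 1.0 / 16  # EWMA gain 'g' from the paper

    class DctcpSender:
        def __init__(self, cwnd=10.0):
            self.cwnd = cwnd    # congestion window, in packets
            self.alpha = 0.0    # running estimate of the fraction of marked packets

        def on_rtt_end(self, acked_pkts, ecn_marked_pkts):
            """Called once per RTT with counts of ACKed and ECN-marked packets."""
            frac = ecn_marked_pkts / float(max(acked_pkts, 1))
            # alpha <- (1 - g) * alpha + g * F, the per-RTT congestion estimate
            self.alpha = (1 - G) * self.alpha + G * frac
            if ecn_marked_pkts > 0:
                # Scale the window in proportion to the congestion estimate,
                # instead of TCP's fixed halving on any sign of congestion.
                self.cwnd = max(1.0, self.cwnd * (1 - self.alpha / 2))
            else:
                self.cwnd += 1.0  # additive increase: one packet per RTT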

The paper includes extensive analysis and measurements. In addition to measuring queue occupancy and throughput, the authors measure completion times for different kinds of traffic and analyze the impact of dynamic buffering. We were interested in replicating Figure 13 and Figure 14 from [1] (shown below for reference) because those graphs directly compare the queue occupancy and throughput of DCTCP with TCP. These two metrics are relatively easy to measure and are more fundamental than metrics such as completion times: low queue occupancy and high utilization are the main contributions of the paper, and they are exactly what we wanted to verify. Figure 13 plots the CDF of queue occupancy for both DCTCP and TCP, and Figure 14 plots DCTCP throughput against the marking threshold K. As a first step, we wanted to validate our experimental setup, so we decided to replicate Figure 1 from [1] (again, shown below for reference), which plots the congestion window over a time interval. As an extra step, we wanted to measure the impact of N, the number of hosts, on DCTCP’s throughput at the bottleneck link; for this we plot Figure 14 again for different values of N.

[Figure 1, Figure 13, and Figure 14 from [1], shown here for reference]

Implementation Overview:

Our experiment runs on Mininet using a DCTCP-enabled kernel. The topology is a star with N hosts, where N-1 of them send long-lived flows to the Nth host (the receiver) through a bottleneck link. All links run at 100 Mbps. Although the paper uses link speeds of 1 Gbps and 10 Gbps for some experiments, Mininet does not allow accurate measurements beyond 100 Mbps [2]. We are nevertheless able to qualitatively verify and compare DCTCP’s performance.
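For reference, a minimal sketch of such a star topology in Mininet’s Python API is shown below. The class, host names, and delay value are illustrative assumptions rather than our exact script, which additionally configures RED/ECN on the bottleneck interface.

    from mininet.topo import Topo
    from mininet.net import Mininet
    from mininet.link import TCLink

    class StarTopo(Topo):
        "N hosts hanging off one switch; host hN is the receiver."
        def build(self, n=3, bw=100, delay='0.25ms', max_queue=200):
            switch = self.addSwitch('s1')
            for i in range(n):
                host = self.addHost('h%d' % (i + 1))
                # Every link is 100 Mbps, so the link to the receiver
                # becomes the bottleneck when N-1 senders are active.
                self.addLink(host, switch, bw=bw, delay=delay,
                             max_queue_size=max_queue)

    if __name__ == '__main__':
        net = Mininet(topo=StarTopo(n=3), link=TCLink)
        net.start()
        # h1 and h2 would run long-lived iperf flows to h3 here.
        net.stop()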

Setup validation:

In order to replicate Figure 1, we created a topology with two senders sharing a bottleneck link to a third host. The RTT between a sender and the receiver is 1 ms, K = 20 packets, and the maximum queue size is 200 packets. The replicated figure below shows that the setup works as expected, with DCTCP maintaining a consistently low congestion window. We also observed that as we increased the RTT while keeping K fixed, the variation in queue size for DCTCP increased.

[Replicated figure: dctcp_tcp_queue]
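One simple way to obtain a per-flow congestion-window trace for such a plot is to poll `ss -ti` on the sender from inside Mininet and parse the cwnd field. The sketch below illustrates the idea; the helper name and polling rate are ours, not our exact monitoring script.

    import re
    import time

    def sample_cwnd(sender, dst_ip, interval=0.1, duration=30):
        """Poll `ss -ti` on a Mininet host and collect (time, cwnd) samples."""
        samples = []
        start = time.time()
        while time.time() - start < duration:
            out = sender.cmd('ss -ti dst %s' % dst_ip)
            match = re.search(r'cwnd:(\d+)', out)
            if match:
                samples.append((time.time() - start, int(match.group(1))))
            time.sleep(interval)
        return samples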

Figure 14:

The figure below shows how DCTCP’s throughput varies with the marking threshold K. The topology is the same as above, except that the RTT is now 4 ms; the higher RTT makes the variation in throughput with K easier to see, since a very small K may not yield interesting results. To prevent the bottleneck queue from underflowing, K must be greater than RTT * C / 7, where C is the bottleneck bandwidth. As the paper shows, throughput increases with increasing K. This is because a packet’s ECN bit is marked only when the instantaneous queue occupancy at the bottleneck link reaches K, and marked packets cause the sender to reduce its window. It is therefore logical that increasing K leads to higher throughput, as long as the queue has not reached its maximum occupancy. We observe a similar pattern, which validates the graph shown in the paper. We choose a ‘burst’ value of 100 because RED requires burst to be larger than K [5], and this value allows the EWMA (exponentially weighted moving average) calculation to work for all of the K values in the graph. However, a large burst smooths the moving average more heavily, so it takes longer to converge; we therefore wait for the experiment to stabilize before measuring throughput.

[Replicated Figure 14: k_sweep]
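To make the threshold concrete, the snippet below evaluates the K > C * RTT / 7 guideline for our 100 Mbps, 4 ms setting and prints one possible way of mapping K onto Linux RED parameters via tc. The interface name, parent handle, and the choice of min/max values are illustrative assumptions, not our exact configuration.

    # Guideline from the paper: K (in packets) must exceed C * RTT / 7.
    C_BPS = 100e6        # bottleneck bandwidth: 100 Mbps
    RTT_S = 0.004        # round-trip time: 4 ms
    PKT_BYTES = 1500     # assumed packet size

    bdp_bytes = C_BPS * RTT_S / 8
    k_min = bdp_bytes / 7 / PKT_BYTES
    print('minimum K is about %.1f packets' % k_min)   # roughly 5 packets

    # Illustrative RED/ECN configuration for a marking threshold of K packets.
    # Setting min and max close together approximates DCTCP's hard threshold;
    # the device name and parent handle below are placeholders.
    K, BURST, MAX_QUEUE = 20, 100, 200
    print('tc qdisc change dev s1-eth3 parent 1:1 red '
          'limit %d min %d max %d avpkt %d burst %d bandwidth 100mbit ecn'
          % (MAX_QUEUE * PKT_BYTES, K * PKT_BYTES, (K + 1) * PKT_BYTES,
             PKT_BYTES, BURST))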

Throughput vs K, for varying N:

The figure below is an addition beyond the results in the paper. We wanted to observe how throughput varies with the number of senders, all connected to the same switch and sharing the same bottleneck link to the receiver. We use the same topology as for Figure 14 but keep increasing the number of senders connected to the switch. As we increase the number of hosts from 3 to 10, the throughput also increases: with 10 hosts, even for a very small K (such as 5), the throughput is around 95 Mbps.

The reason is that as the number of hosts sharing the bottleneck link increases, even when some flows back off in response to ECN marks, the remaining flows can keep the link utilized. Consequently, we observe a significant increase in throughput as we increase the number of hosts. When K reaches 20, all of the lines merge, showing that for K >= 20 the effect of K on throughput is insignificant because each flow on its own can keep the link utilized, consistent with our replication of Figure 14. As mentioned in the paper, the value of K actually required for 100% throughput may be a little higher than the theoretical minimum owing to factors such as TCP stack implementation, MTU settings, and network adapter configuration. In addition, the EC2 environment may introduce other minor variations.

[Throughput vs. K for varying N: n_and_k_sweep]

Figure 13:

The graph plots the CDF of queue occupancy with 2 flows and with 20 flows sharing the same bottleneck link, with the bottleneck switch buffer able to hold 200 packets. The RTT is 2 ms and K is set to 20. The test is run for a long time, for both DCTCP and TCP, because we needed enough samples of queue occupancy to get a good CDF. We observe that DCTCP keeps the queue occupancy low throughout, and as the number of flows increases, the queue occupancy grows only marginally. For TCP, on the other hand, the queue occupancy is high, and as the number of flows increases from 2 to 20 it grows substantially, in both cases converging to the full buffer of 200 packets.

[Replicated Figure 13: cdf_flows]
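The measurement itself can be done by periodically reading the backlog counter that `tc -s qdisc show` reports for the bottleneck interface and then sorting the samples into an empirical CDF. The sketch below shows the idea; the interface name and sampling rate are illustrative, not our exact script.

    import re
    import subprocess
    import time

    def sample_backlog(iface='s1-eth3', interval=0.05, duration=60):
        """Sample the qdisc backlog (in packets) on `iface`."""
        samples = []
        end = time.time() + duration
        while time.time() < end:
            out = subprocess.check_output(
                ['tc', '-s', 'qdisc', 'show', 'dev', iface]).decode()
            # Output contains a line like: "backlog 30000b 20p requeues 0"
            match = re.search(r'backlog\s+\S+\s+(\d+)p', out)
            if match:
                samples.append(int(match.group(1)))
            time.sleep(interval)
        return samples

    def empirical_cdf(samples):
        """Return (x, y) points of the empirical CDF of the samples."""
        xs = sorted(samples)
        ys = [(i + 1) / float(len(xs)) for i in range(len(xs))]
        return xs, ys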

Conclusions:

We have been able to qualitatively verify and compare DCTCP’s queue and throughput characteristics on Mininet. The topology involved is a simple star with N-1 senders sharing a single bottleneck link to the Nth host. Since DCTCP has already been replicated to a good extent, as in [2] and [3], we knew to watch out for the nuances in setting up and running our experiment. The environment is as important as the experiment itself: the accuracy of the results depends heavily on timing, and measurements should be taken only in steady state, with no extraneous factors affecting the experiment. The EC2 environment can be unpredictable at times, and we did encounter some strange behavior when plotting DCTCP throughput as a function of K, the queue marking threshold.

Platform choice:

We chose Mininet as the platform for this assignment for a number of reasons: an EC2 instance with a DCTCP-enabled kernel already exists; earlier experiments reproducing DCTCP results have used Mininet and explored its limitations and ease of reproduction; and the graphs we are interested in involve a simple topology and only the monitoring of throughput and queue occupancy, both familiar from earlier assignments. That said, tuning RED parameters greatly affects the reproducibility of the experiment. As alluded to earlier, one has to understand the workings and requirements of RED, because ECN marking is implemented through the RED queueing discipline in Linux. The calculations and settings from the paper were not directly transferable to the Linux setting, and it did require us to try out different possibilities.

Challenges and open issues:

As mentioned earlier, we did experience some strange and unpredictable behavior on EC2. We noticed that this happened when the EC2 servers were busy, which is usually before 6 PM. However, since our experiment did not require complex interactions among many components, it was not hard for us to identify how long we needed to wait to reach steady state.

Future work for DCTCP replication would be to reproduce Figures 23 and 24 from [1], which measure flow completion times for different classes of traffic and have not yet been reproduced. This would be interesting because we could then verify whether the completion times for mice flows remain small and whether or not they are affected by the few elephant flows in a data center.

Instructions to replicate the experiment:

  1. Log in to your AWS account, go to the AWS Management Console, and then to EC2.

  2. Search for the AMI named “CS244_DCTCP_WIN13” and launch the instance with the quicklaunch security setting and your existing *.pem key, as either a c1.xlarge or a c1.medium instance (c1.medium should suffice).

  3. Log in to the instance and pull the code from our repository using the command below:

    git clone https://prachetaa@bitbucket.org/prachetaa/cs244_dctcp_win13.git

  4. Run “sudo ./run_extended.sh” to replicate all figures (takes about 50 minutes), or run “sudo ./run.sh” to replicate all figures except the throughput vs. N graph (takes about 30 minutes).
  5. Do not close the terminal running the experiments, as they stop if you close it. When the experiments complete, the graphs are emailed to you and the instance is stopped.

NOTE: The shell scripts ask you for your email address; please provide it without any spaces, in the format xyz@zyx.com. This is so we can email the graphs back to you without the pain of copying them to your local machine. The script also asks whether you want the instance shut down automatically once the graphs are reproduced; please answer 1 or 0, and if you answer 1 we will shut the instance down when done. However, if anything goes wrong along the way we may be unable to email the results or shut down the instance, so please verify yourself that the instance has been shut down. We hope the experiment is easy to replicate.


2 responses to “CS244 ’13 DCTCP”

  1. The experiment was easy to set up and run. We ran it twice and both runs finished successfully. The graphs produced looked quite like those reported in the blog post. Our reproducibility score is 5/5.

  2. Pingback: CS244 ’16 DCTCP | Reproducing Network Research
