DCTCP


Team: Nikhil Handigol, Brandon Heller, Vimal Jeyakumar, and Bob Lantz.

Key Result(s): DCTCP consistently maintains a small queue occupancy while maintaining high throughput.

Source(s):

  1. M. Alizadeh, A. Greenberg, D.A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan. Data center TCP (DCTCP). In Proceedings of the ACM SIGCOMM 2010 conference on SIGCOMM, pages 63–74. ACM, 2010.
  2. DCTCP patches. http://www.stanford.edu/~alizade/Site/DCTCP.html.
  3. M. Alizadeh, A. Javanmard, and B. Prabhakar. Analysis of DCTCP: stability, convergence, and fairness. In Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, pages 73–84. ACM, 2011.
  4. K. Ramakrishnan and S. Floyd. A proposal to add explicit congestion notification (ECN) to IP. 1999.

Contacts: Nikhil Handigol (nikhilh@stanford.edu), Brandon Heller (brandonh@stanford.edu), Vimal Jeyakumar (jvimal@stanford.edu), Bob Lantz (rlantz@cs.stanford.edu)

Introduction

Data Center TCP (DCTCP) was proposed at SIGCOMM 2010 [1] as a modification to TCP’s congestion control algorithm that aims to achieve high throughput and low latency simultaneously. DCTCP leverages the Explicit Congestion Notification (ECN) [4] feature in commodity switches to detect and react not only to the presence of network congestion but also to its extent, which is signaled by the sequence of ECN marks stamped by the switch.
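To make "reacting to the extent of congestion" concrete, here is a minimal sketch of the sender-side estimator described in [1]: the sender keeps a running estimate alpha of the fraction of ECN-marked packets and cuts its window in proportion to it. The function names are hypothetical, the gain g = 1/16 is an assumed value, and this is an illustration of the idea rather than the code in the kernel patch [2].

```python
def update_alpha(alpha, marked_pkts, total_pkts, g=1.0 / 16):
    """Update the running estimate of the marked fraction, once per window.

    alpha <- (1 - g) * alpha + g * F, where F is the fraction of packets
    in the last window that carried an ECN mark.
    """
    frac = marked_pkts / total_pkts if total_pkts else 0.0
    return (1.0 - g) * alpha + g * frac


def reduce_cwnd(cwnd, alpha):
    """Scale the window by (1 - alpha / 2), never going below one packet.

    alpha near 0 (few marks) barely changes cwnd; alpha = 1 (every packet
    marked) halves it, which matches standard TCP's reaction.
    """
    return max(1.0, cwnd * (1.0 - alpha / 2.0))


if __name__ == "__main__":
    alpha, cwnd = 0.0, 100.0
    # Hypothetical trace: (marked, total) packets seen in successive windows.
    for marked, total in [(0, 100), (10, 100), (50, 100), (100, 100)]:
        alpha = update_alpha(alpha, marked, total)
        if marked:
            cwnd = reduce_cwnd(cwnd, alpha)
        print(f"marked={marked:3d}/{total}  alpha={alpha:.4f}  cwnd={cwnd:.2f}")
```

Because the cut scales with alpha, light marking produces small window reductions, which is what lets the queue stay short without giving up throughput.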

One experiment in the paper is a micro-benchmark that verifies whether DCTCP consistently maintains a small queue occupancy while sustaining high throughput. To test this, the authors used an ECN-capable switch that supported monitoring the metric of interest: the output queue occupancy. This support is readily available in a software queue. To replicate the experiment, we used the publicly available DCTCP patch [2] in our setup.

Figure 1. The topology used for the DCTCP experiment.

In both Mininet-HiFi and hardware, we created a simple topology of three hosts, A, B and C, connected to a single 100Mb/s switch. In Mininet-HiFi, we configured ECN through Linux Traffic Control’s RED queuing discipline and set a marking threshold of 20 packets. Hosts A and B each start one long-lived TCP flow to host C. We monitor the instantaneous output queue occupancy of the switch interface connected to host C.
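As a rough illustration of how a 20-packet marking threshold maps onto RED's byte-based parameters, the sketch below builds a `tc` command string. The helper name, the interface name, the `root` placement, and every numeric value other than the 20-packet threshold and 100Mb/s rate are assumptions made for illustration; the configuration actually used in the experiment is in the repository referenced in the replication instructions.

```python
def red_ecn_command(dev, threshold_pkts=20, avpkt=1500, rate_mbit=100,
                    limit_bytes=1_000_000, max_gap_bytes=5000):
    """Build a tc command that asks RED to ECN-mark above a packet threshold.

    RED's min/max are expressed in bytes, so the packet threshold is
    multiplied by an assumed average packet size. Placing max close to min
    and using probability 1 approximates the step-like marking DCTCP expects.
    """
    min_bytes = threshold_pkts * avpkt        # 20 * 1500 = 30000 bytes
    max_bytes = min_bytes + max_gap_bytes     # assumed small gap above min
    return (
        f"tc qdisc add dev {dev} root red "
        f"limit {limit_bytes} min {min_bytes} max {max_bytes} "
        f"avpkt {avpkt} burst {threshold_pkts} "
        f"bandwidth {rate_mbit}Mbit probability 1 ecn"
    )


if __name__ == "__main__":
    # "s1-eth3" is a hypothetical name for the switch port facing host C.
    print(red_ecn_command("s1-eth3"))
```

When the link is also rate limited, the RED qdisc would normally be attached beneath the rate limiter rather than at `root`, so treat the placement here as a simplification.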

(a) TCP: Instantaneous queue occupancy.

(b) DCTCP: Instantaneous queue occupancy.

(c) TCP: CDF of instantaneous queue occupancy.

(d) DCTCP: CDF of instantaneous queue occupancy.

Figure 2: Reproduced results for DCTCP [1] with Mininet-HiFi and an identically configured hardware setup. Figure 2(b) shows that the queue occupancy with Mininet-HiFi stays within 2 packets of hardware.

Figure 2(c) shows the queue behavior in Mininet-HiFi running DCTCP alongside that of an identically configured hardware setup. Both TCP and DCTCP running in Mininet-HiFi exhibit queue occupancies over time that are nearly identical to those on hardware. The main takeaway is that Mininet-HiFi can emulate this simple micro-benchmark, and perhaps could even be used to test the convergence and stability of DCTCP’s dynamics [3] over a wide range of topologies and RTTs.

Verifying fidelity:

Since DCTCP’s dynamics depend on queue occupancy at the switch, the outcome of this experiment relies on accurate link emulation in the network. An ideal 100Mb/s link takes 120μs to transmit a 1500-byte packet, and an ideal 1Gb/s link takes 12μs. To verify that this was indeed the case in Mininet-HiFi, we monitored the dequeue times of every link. To avoid startup/teardown effects, we collect samples after the experiment stabilizes and compute the percentage deviation of the observed inter-dequeue time deltas from the ideal (120μs for 100Mb/s). That is, if x_i is a sample, the percentage deviation is 100 × |x_i − 120|/120. Figure 3 visualizes these deviations as a complementary CDF for the experiment with links rate limited to 100Mb/s and 1Gb/s.
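The deviation metric and the complementary CDF can be sketched in a few lines. The function names and the sample values below are made up for illustration; they are not measurements from the experiment.

```python
from bisect import bisect_right


def percent_deviation(samples_us, ideal_us):
    """Percentage deviation of each inter-dequeue delta from the ideal."""
    return [100.0 * abs(x - ideal_us) / ideal_us for x in samples_us]


def ccdf(values):
    """Return (v, fraction of samples strictly greater than v) per distinct v."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (n - bisect_right(ordered, v)) / n)
            for v in sorted(set(ordered))]


if __name__ == "__main__":
    # Hypothetical inter-dequeue deltas in microseconds on a 100Mb/s link.
    samples = [120.0, 132.0, 108.0, 240.0]
    deviations = percent_deviation(samples, ideal_us=120.0)
    for value, frac in ccdf(deviations):
        print(f"deviation > {value:5.1f}%  for {frac:.2%} of samples")
```

Reading a point (v, f) on the resulting curve as "a fraction f of packets deviated by more than v percent" is how the statements about Figure 3 should be interpreted.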

Figure 3. Complementary CDF of inter-dequeue time deviations from ideal.

We find that htb emulates 100Mb/s quite well: in the worst case, the inter-dequeue times are within 10% of what an ideal link would exhibit, for 0.1% of all packets observed in a 3s window of time. However, when all links operate at 1Gb/s, we lose fidelity in multiple respects. As shown in Figure 3, the inter-dequeue time deviations are far (100%) from ideal for over 10% of all packets in the same 3s window of time. Though not shown, the average bottleneck link utilization (over a period of 1s) drops to about 80% of what was observed on hardware, as the CPU is completely saturated.

Limits of Mininet-HiFi:

The DCTCP paper included experimental results for 10Gb/s bottleneck links. Emulating a 10Gb/s link would require full-sized packets to be dequeued every 1.2μs, which stretches the limits of today’s hardware support for timers. In particular, we found that the best timer resolution offered by Linux’s ‘High Resolution Timers’ (hrtimer) subsystem was about 1.8μs. (This value depends on the frequency of the hardware clock and the overhead of programming it.)
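The arithmetic behind this limit is easy to check. The sketch below compares the per-packet transmission time at each link rate with the 1.8μs timer resolution reported above; the function names are hypothetical, and treating "transmission time at least as large as the timer resolution" as the feasibility test is a simplifying assumption that ignores CPU load and batching.

```python
def serialization_time_us(packet_bytes, rate_bps):
    """Time in microseconds for an ideal link to transmit one packet."""
    return packet_bytes * 8 * 1_000_000 / rate_bps


def timer_can_pace(packet_bytes, rate_bps, timer_resolution_us):
    """True if one timer tick is short enough to schedule every dequeue."""
    return serialization_time_us(packet_bytes, rate_bps) >= timer_resolution_us


if __name__ == "__main__":
    for label, rate in [("100Mb/s", 100_000_000),
                        ("1Gb/s", 1_000_000_000),
                        ("10Gb/s", 10_000_000_000)]:
        t = serialization_time_us(1500, rate)
        ok = timer_can_pace(1500, rate, timer_resolution_us=1.8)
        print(f"{label:>8}: {t:6.1f} us per 1500-byte packet, paceable={ok}")
```

Note that passing this check is necessary but not sufficient: the 1Gb/s case clears the timer limit yet still lost fidelity in the experiment because the CPU saturated.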

Instructions to Replicate This Experiment:

git clone https://bitbucket.org/nikhilh/mininet_tests.git
cd mininet_tests/dctcp

Follow the instructions in the README file there:

https://bitbucket.org/nikhilh/mininet_tests/src/ad08368cf347/dctcp/README

3 responses to “DCTCP”

  1. I could not find (by grep) where ECN is set on the switch in your code. Can you tell me which file it is in?

  2. I applied the patch to the Linux 3.2 kernel successfully and ran the DCTCP programs you provided on GitHub, but they give the same results as other TCP variants. sysctl -a | grep tcp_dctcp shows net.ipv4.tcp_dctcp_enable=0 and net.ipv4.tcp_dctcp_shift_g = 4.
    I then enabled it and ran the program again, but it still does not give the expected results. What should I check to confirm that DCTCP is installed successfully?
