CS 244 ’20: Shenango Reproduction


Datacenter applications must be scheduled to provide μs-scale tail latencies and high throughput, even for highly dispersive workloads [10]. Modern data plane operating systems meet these requirements by spin-polling the NIC on dedicated CPU cores and using kernel-bypass networking to skip the bloated kernel network stack [6, 8–11]. However, many of these systems simply pin threads to cores and leave the scheduling to CFS, the Linux Completely Fair Scheduler [6, 8, 9, 11]. This approach has two downsides. First, it provisions cores for peak load, which wastes hardware resources whenever the load is below peak. Second, CFS was designed for millisecond-scale scheduling, not microsecond-scale scheduling. As a result, CFS cannot multiplex several latency-critical μs-scale applications with low-priority background work finely enough to deliver good performance while also responding quickly to bursts in network traffic.
