CS 244 ’24: Replicating Results from “Sidekick: In-Network Assistance for Secure End-to-End Transport Protocols”


Team Members
Kamran Ahmed, Griffin Miller, Hari Vallabhaneni

Source
Sidekick: In-Network Assistance for Secure End-to-End Transport Protocols

GitHub
https://github.com/cs244-team/sidekick

Introduction

Performance-enhancing proxies (PEPs) have been used to improve the performance of TCP connections. However, these proxies are tightly coupled to the underlying protocol, making it difficult to modify and enhance the protocol without disrupting the PEPs’ functionality. As new transport protocols that provide end-to-end encryption (e.g., QUIC) are developed and become more popular, PEPs are fundamentally incompatible with these opaque protocols because the proxies can no longer access critical plaintext information like sequence numbers. A new system was needed to address this incompatibility and provide the same benefits as a PEP without compromising protocol security.

The paper we have selected to replicate results from, “Sidekick: In-Network Assistance for Secure End-to-End Transport Protocols,” introduces a system that provides in-network assistance for fully opaque protocols. Yuan et al.’s main contribution is a selective acknowledgment mechanism called a quACK (“quick ACK”) that allows middleboxes to efficiently acknowledge packets without access to cleartext sequence numbers and without the need to modify the transport protocol; more broadly, sidekick protocols can be used by in-network intermediaries to send information along adjacent connections to assist endpoints of opaque transport protocols.

The paper’s quACK mechanism relies on power sums of packet identifiers to arrive at a solution that uses reasonable link overhead, is cumulative over all packets sent, and is efficient to decode. The original paper examines some “strawman” quACK solutions that fail to provide all of these properties. Sending an acknowledgment back for every packet is the simplest solution, but it uses significant link overhead and is not cumulative: because quACKs can themselves be dropped, the sender could misinterpret a dropped quACK as a dropped packet and spuriously retransmit a packet that the proxy received. A cumulative solution with low link overhead is to hash all the received packets together and send this hash along with the count received. Unfortunately, the sender must then brute-force the hash over candidate sets of missing packets, so the decode time becomes unreasonable for a non-trivial number of packets.
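To make the decode cost of the hash strawman concrete, here is a minimal sketch. The choice of SHA-256, the sorted-concatenation digest, and the function names are our own assumptions for illustration, not the paper's construction. The sender has to try every candidate set of dropped packets until the digest matches.

```python
import hashlib
from itertools import combinations
from math import comb


def digest(ids):
    """Order-independent digest of a set of 4-byte packet IDs (assumed scheme)."""
    h = hashlib.sha256()
    for packet_id in sorted(ids):
        h.update(packet_id.to_bytes(4, "big"))
    return h.digest()


def candidate_count(num_sent, num_missing):
    """Number of subsets the sender may have to try in the worst case."""
    return comb(num_sent, num_missing)


def brute_force_missing(sent, target_digest, received_count):
    """Try every possible set of dropped packets until the digest matches."""
    num_missing = len(sent) - received_count
    for dropped in combinations(sent, num_missing):
        dropped_set = set(dropped)
        remaining = [i for i in sent if i not in dropped_set]
        if digest(remaining) == target_digest:
            return dropped_set
    return None
```

With 5 packets sent and 2 missing, only 10 candidates exist, but `candidate_count(1000, 10)` already exceeds 10^23, which is why this approach does not scale.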

Sidekick’s quACK solution models the problem of efficiently finding missing packets in terms of solving a system of polynomial power sum equations. QuACKing works as follows (condensed from the paper):

  • In place of cleartext sequence numbers, packet IDs take the form of four consecutive bytes at a fixed location in the fully encrypted payload. The probability of a collision will be very low for pseudorandom data, so packet IDs are effectively unique.
  • As long as there is some threshold t on the number of packets missing at one time, the proxy and sender each calculate t running power sums of packet IDs (for each power from 1 to t, raise packet IDs to that power and then sum together), and the proxy sends its power sums back to the sender in a quACK, at a set interval based on time or number of packets received.
  • The sender can efficiently use the two power sums to calculate the “set difference” between the packet IDs it has sent and the packet IDs received by the proxy and use its local mapping from packet IDs to real packets/sequence numbers to retransmit any missing packets.
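The steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's or our repository's implementation: the names `PowerSums` and `decode_missing` are hypothetical, arithmetic is done modulo the largest prime below 2^32, IDs are assumed distinct and smaller than that prime, and the missing IDs are found by evaluating the recovered polynomial at every ID the sender has sent.

```python
P = 4_294_967_291  # largest prime below 2**32 (assumed modulus)


class PowerSums:
    """Running power sums of packet IDs for powers 1..threshold, modulo P."""

    def __init__(self, threshold):
        self.sums = [0] * threshold
        self.count = 0

    def insert(self, packet_id):
        x = packet_id % P
        power = 1
        for k in range(len(self.sums)):
            power = power * x % P  # x^(k+1)
            self.sums[k] = (self.sums[k] + power) % P
        self.count += 1


def decode_missing(sender, proxy, sent_ids):
    """Return the sent IDs that the proxy's power sums do not account for."""
    m = sender.count - proxy.count  # number of missing packets
    if m == 0:
        return []
    if m > len(sender.sums):
        raise ValueError("more packets missing than the threshold allows")
    # Power sums of the missing IDs only.
    d = [(s - r) % P for s, r in zip(sender.sums, proxy.sums)][:m]
    # Newton's identities: k * e_k = sum_{i=1..k} (-1)^(i-1) * e_(k-i) * d_i
    e = [1]
    for k in range(1, m + 1):
        acc = 0
        for i in range(1, k + 1):
            term = e[k - i] * d[i - 1] % P
            acc = (acc + term) % P if i % 2 == 1 else (acc - term) % P
        e.append(acc * pow(k, -1, P) % P)  # modular inverse, Python 3.8+
    # Coefficients of prod (x - missing_j), highest degree first.
    coeffs = [(e[j] if j % 2 == 0 else -e[j]) % P for j in range(m + 1)]

    def evaluate(x):
        acc = 0
        for c in coeffs:  # Horner's rule
            acc = (acc * x + c) % P
        return acc

    return [pid for pid in sent_ids if evaluate(pid % P) == 0]
```

The sender calls `insert` for every packet it sends, the proxy calls it for every packet it sees, and the sender runs `decode_missing` whenever a quACK arrives. The quACK itself only needs to carry the t sums and the count, regardless of how many packets have been sent.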

The authors demonstrate that introducing a proxy that can transmit quACKs to a data sender can improve performance on a variety of benchmarks without compromising protocol security. The authors provide additional contributions, including integrating Sidekick with an HTTP/3 file upload application and PACUBIC (a Path-Aware version of CUBIC congestion control) to approximate the congestion-control behavior when using a PEP-assisted split TCP CUBIC connection.

Claim

For our study, we specifically focus on replicating the evaluation of Sidekick with a low-latency audio stream over a path whose two links are asymmetric in latency and loss. The paper motivates this problem with a real-world example of a train passenger on an audio call with a friend, connected to a Wi-Fi access point over a low-latency, high-loss link. After reaching the access point, the audio travels to the friend via a high-latency, but lower-loss, cellular data link. Due to the high loss in the first link and the high latency in the second link, the friend may experience playback delays in the de-jitter buffer that reassembles the audio stream. Negative acknowledgments (NACKs) from the friend may take roughly one end-to-end RTT to reach the sender, measured from the point in time at which a packet is lost on the first link.

Since loss mainly occurs on the first link, introducing a proxy that can quickly inform the sender of lost packets would reduce the de-jitter latency at the receiver, ultimately improving the audio stream quality for the friend. With this motivating example, the authors demonstrate that Sidekick can reduce the de-jitter latency by 50% in emulation and 90% in real-world experiments. This demonstrates not only that a Sidekick proxy can efficiently encode quACKs, but also that a data sender can quickly deduce which packets were lost based on the information contained in the quACKs. Therefore, a proxy that wishes to provide in-network assistance can do so even if the underlying protocol provides end-to-end encryption.

We follow the paper’s implementation of this low-latency media application and evaluate the data receiver’s de-jitter latency in both emulation and real-world environments. We chose this motivating example because it is a common scenario in which proxy assistance can be compared directly against the base protocol without it.

Methodology

In this section, we briefly outline the paper’s methodology. We describe the low-latency media application that is used to evaluate Sidekick, the Sidekick proxy itself, and the emulation and real-world experiments. We then describe our methodology for replicating the results of the paper and the differences in our experimental setup.

Paper Methodology

Yuan et al. implement a low-latency media sender and receiver in Rust to examine de-jitter latency in emulation (Yuan et al. Figure 4a) and real-world experimentation (Yuan et al. Figure 8a). The sender sends an audio stream of 240-byte packets containing random data at 20 ms intervals (96 Kbit/s) to the receiver over UDP. The receiver maintains a de-jitter buffer to store packets so that it can play them in order. The de-jitter latency is the time between when a packet is received and when it can be played back. The sender also includes a sequence number in each packet so that the receiver can detect lost packets and send a negative acknowledgment (NACK) to the sender, one per RTT. The sender retransmits lost packets based on these NACKs (Figure 1).

Figure 1. Low-latency media protocol and experimental setup. The receiver maintains a de-jitter buffer and sends NACKs if it receives a non-consecutive sequence number. In this example, packet 3 is lost along the first hop. Upon receiving packet 4, the receiver will send a NACK to the sender for packet 3. Upon receiving this NACK, the sender will retransmit packet 3.
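The receiver behavior described above can be sketched as follows. This is a simplified illustration rather than the original Rust code or our C++ code: the class name `DejitterBuffer` is hypothetical, times are in milliseconds, and the once-per-RTT NACK retry is omitted so that each gap is NACKed only when it is first detected.

```python
class DejitterBuffer:
    """In-order playback buffer that records per-packet de-jitter latency."""

    def __init__(self):
        self.next_to_play = 0   # lowest sequence number not yet played
        self.highest_seen = -1  # highest sequence number received so far
        self.arrivals = {}      # seq -> arrival time of buffered packets
        self.latencies = {}     # seq -> time spent waiting in the buffer

    def on_packet(self, seq, now):
        """Record a packet; return the sequence numbers to NACK."""
        nacks = []
        if seq < self.next_to_play or seq in self.arrivals:
            return nacks  # duplicate or already played
        self.arrivals[seq] = now
        if seq > self.highest_seen:
            # A jump in sequence numbers reveals the packets in between as lost.
            nacks = list(range(self.highest_seen + 1, seq))
            self.highest_seen = seq
        # Play every packet that is now contiguous with what was already played.
        while self.next_to_play in self.arrivals:
            arrived = self.arrivals.pop(self.next_to_play)
            self.latencies[self.next_to_play] = now - arrived
            self.next_to_play += 1
        return nacks
```

In the Figure 1 scenario, if packet 4 arrives at 80 ms and the retransmitted packet 3 arrives at 132 ms, packet 4 waits 52 ms in the buffer while packet 3 plays immediately. The arrival times here are made up for illustration.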

The Sidekick proxy is also implemented in Rust, and it sniffs packets with a raw socket. The proxy and sender both maintain cumulative power sums of packet identifiers. The proxy sends these power sums, along with the total number of packets received and the last packet identifier received, in a quACK to the sender every two packets. The sender uses the difference between its power sums and the proxy’s power sums to quickly determine the “set difference” of which packets were lost and need to be retransmitted, without needing to know every packet identifier that was received by the proxy. We refer the reader to the introduction of this post and Section 3 of the original paper for more details on the quACK problem and the power sum polynomial solution used to encode quACKs. Figure 2 illustrates the integration of the Sidekick proxy with the base protocol.

Figure 2. Low-latency media protocol with Sidekick proxy assistance. The proxy maintains cumulative power sums and sends them in quACKs along an adjacent Sidekick connection.

The experimental setup is outlined in Section 6.1 and Scenario 1 of Table 5 in the original paper, but we provide the reader with a brief overview here. In emulation, the sender is connected to the receiver over two asymmetric links: a lossy Wi-Fi link and a lossless cellular link. Importantly, the first link not only exhibits loss but also has a lower delay than the second link (Table 1).

In real-world experiments, the sender is connected to a Wi-Fi access point that directly runs the Sidekick proxy. This access point is connected to the Internet via a cellular modem. The data receiver is located in the nearest AWS region. The sender sends packets to the receiver as described above, and the de-jitter latency at the receiver is measured and compared to the latency without the Sidekick proxy interposing on the base connection. More details on the experimental setup and specific link properties are presented in Tables 2-3 below.

Our Methodology

We used the same emulation parameters in Mininet as described in the paper. For real-world experimentation, the differences between our implementation and setup and those of the original study are shown in Table 2. We chose to implement the system in C++ rather than Rust, and we had access to different hardware resources.

Additionally, we weren’t able to perfectly match the link properties for the real-world experiments (Table 3). Regardless, we were eager to see if the results would be replicable even with slightly different hardware and software, as this would suggest that Sidekick is beneficial even with varying network conditions and hardware configurations.

As described in the paper, the low-latency media protocol encrypts and decrypts an audio stream to illustrate that the proxy has no access to cleartext sequence numbers. We implemented this encryption in our study, but after further review of the original implementation, we found that its audio stream was not encrypted. We don’t expect this difference to significantly impact the results, but it is worth noting.

For our real-world experimentation, we placed our Wi-Fi access point and proxy (Raspberry Pi tethered to an LTE hotspot) in a study room in Stanford’s Gates Computer Science Building with the data sender (MacBook Pro) located some distance away. We experimented with different locations and distances to match the 3.6% loss and low latency as closely as possible. The data receiver was running on a GCP server located in Oregon with 0% loss and lower RTTs than noted in the paper.

Instructions to Replicate Results

See Tables 1-3 for reference on software and hardware configurations. Our replication code is available at https://github.com/cs244-team/sidekick. We include detailed instructions on how to run the media application, Sidekick proxy, and testing scripts in our repository.

Results

In this section, we present the main results of our replication study and discuss their implications.

We replicated the low-latency media application results of the original paper in both emulation (Yuan et al. Figure 4a) and real-world scenarios (Yuan et al. Figure 8a). Our replication results are presented in Figure 3 below.

Figure 3. Replication results for de-jitter latency in a low-latency media application. Yuan et al.’s findings are presented on the left with our replication results on the right. The end-to-end baseline media protocol without proxy assistance is denoted as “Simple E2E”. The “Sidekick([N]x)” lines show the de-jitter latency with [N] times the quACKing interval: “Sidekick” sends a quACK every 2 packets received, “Sidekick(2x)” every 4 packets, and “Sidekick(4x)” every 8 packets.

In emulation, we found a similar reduction in the 99th percentile de-jitter latency. Yuan et al. found that a Sidekick proxy can reduce this delay from 52 ms to 25 ms, a 52% reduction. We measured a 51% reduction, from 57 ms to 28 ms. We also examined how the frequency of quACKing influences de-jitter latency at the receiver. We found that sending quACKs every 4 packets (Sidekick(2x)) and every 8 packets (Sidekick(4x)) resulted in higher de-jitter latency than sending quACKs every 2 packets (Sidekick). This is consistent with the original findings: the data sender receives less frequent feedback on lost packets, which delays retransmissions. As the paper notes, as long as the quACKing interval is less than the end-to-end RTT, the receiver can benefit from the Sidekick proxy, which is in line with our findings.

In real-world experiments, we find a substantial improvement in de-jitter latency with Sidekick, though a smaller one than in the original study. Yuan et al. found that Sidekick can reduce the 99th percentile de-jitter latency from 2.3 s to 204 ms, a 91% reduction. In our experiments, we found that Sidekick can reduce P99 de-jitter latency by 79%, from 337 ms to 72 ms.
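The relative reductions quoted for emulation and real-world experiments follow directly from the reported P99 values. The sketch below shows the arithmetic. The helper names are ours, and the nearest-rank percentile is one common definition that we assume here, not necessarily the one used to produce the figures.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def reduction_percent(baseline_ms, assisted_ms):
    """Relative reduction in latency, rounded to the nearest whole percent."""
    return round(100 * (baseline_ms - assisted_ms) / baseline_ms)


# P99 de-jitter latencies (ms) reported in this post.
print(reduction_percent(52, 25))     # Yuan et al., emulation
print(reduction_percent(57, 28))     # ours, emulation
print(reduction_percent(2300, 204))  # Yuan et al., real world
print(reduction_percent(337, 72))    # ours, real world
```

These calls give 52, 51, 91, and 79 percent respectively, matching the figures quoted in the text.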

We have scaled the axes to match the original paper’s graph for easier comparison, but we note that the latency values are drastically different in the real-world scenario. This is likely due to the differences in hardware and network conditions as described in our methodology, but the relative improvement in tail latency is consistent with the original findings. In Figure 4, we scale the axes of our graph to match our 100th percentile latency, as is done in the original figure.

Figure 4. Re-scaled comparison of real-world scenario de-jitter latency. Since Yuan et al.’s de-jitter latencies with and without proxy assistance are much larger than ours, Figure 3 doesn’t show the full extent of our results when matching axes. Here, we present a re-scaled view of our real-world findings.

Ultimately, we find that Sidekick can significantly reduce the receiver’s de-jitter latency in both emulation and real-world environments. This demonstrates its effectiveness in improving the performance of low-latency media applications.

Discussion

Overall, we find that Sidekick is effective in reducing de-jitter latency in both emulation and real-world scenarios, consistent with the original paper’s findings. We were able to successfully replicate the results of the original paper even with differences in hardware and software, suggesting that these results may generalize to different network conditions and setups. Unfortunately, we did not have the time to explore the additional contributions of the paper, such as PACUBIC and ACK reduction via Sidekick, but we believe that these would be interesting avenues for future work. Nevertheless, we think that the versatility of sidekick protocols in improving the performance of secure end-to-end transport protocols will make them valuable tools for optimizing performance in a variety of scenarios.

During our initial experimentation in Gates, we noticed that we would experience significant packet loss between the sender and Wi-Fi access point. This was likely due to high traffic in the hallway and physical obstructions between the sender and access point. We decided to return at a later time when there were fewer people around to reduce this interference, and try to closely match the 3.6% loss rate described in the paper. This highlights the importance of considering physical network conditions when conducting real-world experiments, as these factors can significantly impact experimental results. This also echoes the original discussion in Section 6.6 of how a Wi-Fi hop can be variable when interacting with other devices and physical objects.

We believe that our replication study provides strong evidence for the effectiveness of Sidekick in reducing de-jitter latency in low-latency media applications. We hope that our results will encourage further research into sidekick protocols in different scenarios and environments. We are excited for future work that builds upon the original study and explores the development of in-network assistance mechanisms for improving the performance of secure end-to-end protocols.

We would like to thank Gina Yuan for clarifying a discrepancy between the 99th percentile de-jitter latency described in Section 6.2 and the graph in Figure 4a. This confirmed our interpretation of the graph of approximately 25ms for Sidekick and 52ms without proxy assistance. This also aligns more closely with our own emulation data of 28ms and 57ms, respectively.
