Authors: Harrison Ho, Andrew Duffy
The Goals of Mosh
Mosh is a remote terminal application that adds several features beyond SSH: support for intermittent connectivity, roaming, and speculative local echo of user keystrokes (Winstein and Balakrishnan, 2012). Mosh aims to be a better remote terminal for mobile clients, and also to demonstrate that the State Synchronization Protocol (SSP) is superior to TCP for interactive mobile applications. The authors show that SSP achieves much lower keystroke response latency than SSH over TCP. The remote shell serves as a proxy: it suggests that other interactive remote applications could also use SSP to improve performance for users on networks with high or highly variable delay, such as cellular networks.
Most technical computer users have used SSH, directly or indirectly; it’s the de facto protocol for spawning and interacting with a remote shell. The SSH user experience has two failings. First, it performs poorly on mobile networks. Second, it makes no provision for mobility: it runs over TCP, which ties each flow to a fixed “4-tuple” (source and destination addresses and ports), so switching networks kills the connection. SSH is one of the most widely used such applications, but many interactive remote applications could benefit from the ideas behind Mosh and SSP; telnet, IRC, and the X Window System come to mind, all of which currently operate over TCP.
The authors collected traces of individuals performing “typical, real-world sessions” on a remote host, then replayed the traces over several networks, including a Sprint 3G cellular internet connection, a Verizon LTE service, and a trans-oceanic wired link. Finally, to test resilience to packet loss, they replayed the traces over a test network with an artificial RTT of 100 ms and a 50% round-trip packet loss probability. The graph below shows the results of replaying those traces over the Sprint 3G cellular connection.
In all of these scenarios, the authors found that Mosh offered lower mean and median keystroke latency than SSH. About 70% of the time, Mosh was confident enough to display its prediction, giving a nearly instant keystroke response. For keystrokes that Mosh could not predict, the latency distribution was similar to SSH’s. While 0.9% of keystrokes resulted in erroneous predictions, Mosh corrected them within one RTT. Finally, Mosh was resilient to high packet loss, yielding lower median and mean latency than SSH even without the benefit of predictive local echo. The table below summarizes the statistics reported in the paper.
| Network | Remote Terminal | Median Latency (ms) | Mean Latency (ms) | Standard Deviation (ms) |
|---|---|---|---|---|
| Verizon LTE in Cambridge, MA (high delay network) | Mosh | < 5 | 1700 | 2600 |
| Verizon LTE in Cambridge, MA (high delay network) | SSH | 5360 | 5030 | 2140 |
| Simulated High Packet Loss Network, 100 ms RTT | Mosh | 222 | 329 | 1630 |
| Simulated High Packet Loss Network, 100 ms RTT | SSH | 416 | 16800 | 52200 |
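Summary statistics like these can be computed from a list of per-keystroke latencies. The following is a minimal sketch that assumes a hypothetical input format of one latency in milliseconds per line (the original replay scripts may log differently). It reports the population standard deviation; which variant the paper used has not been checked.

```shell
# latency_stats FILE: print median, mean, and standard deviation of the
# latencies in FILE (hypothetical format: one latency in ms per line).
latency_stats() {
  sort -n "$1" | awk '
    { x[NR] = $1; sum += $1; sumsq += $1 * $1 }
    END {
      if (NR == 0) { print "no samples" > "/dev/stderr"; exit 1 }
      # Median: middle element, or mean of the two middle elements.
      if (NR % 2) median = x[(NR + 1) / 2]
      else median = (x[NR / 2] + x[NR / 2 + 1]) / 2
      mean = sum / NR
      # Population variance; clamp tiny negative values from rounding.
      var = sumsq / NR - mean * mean
      if (var < 0) var = 0
      printf "median=%.1f mean=%.1f sd=%.1f\n", median, mean, sqrt(var)
    }'
}
```

For example, `latency_stats mosh_latencies.txt` (hypothetical file name) prints one line with the three statistics. A single large outlier inflates the mean and standard deviation far more than the median. The SSH row under high loss shows the same pattern: a median of 416 ms but a mean of 16800 ms.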
We set out to reproduce Figure 2, which shows the keystroke latency distributions of SSH and Mosh over a cellular network with moderate average delay. We also wanted to reproduce the three tables in the later part of Section 4, which report latency statistics on high-delay and high-loss networks.
The result in Figure 2 captures the main motivation for using Mosh over SSH: lower keystroke latency in remote shells for mobile clients. The shape of the curve also shows which specific benefits Mosh provides and how it achieves them. 70% of the keystrokes typed in Mosh are displayed with an almost negligible response time, as shown by the nearly vertical red line starting at a keystroke response time of 0. The remaining 30% of keystrokes have a distribution closer to that of SSH, reflecting Mosh’s fallback behavior when it cannot immediately predict the response to a keystroke.
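A curve like Figure 2 is an empirical CDF of keystroke response times, and its key readings (the fraction of “nearly instant” keystrokes, or a percentile such as the 90th) can be computed directly instead of read off a plot. Below is a minimal sketch assuming a hypothetical input of one latency in milliseconds per line. The “nearly instant” threshold is left to the caller, and the nearest-rank percentile is one of several common conventions.

```shell
# Hypothetical input format: one keystroke latency in ms per line.

# latency_cdf FILE: print "latency cumulative_fraction" pairs (empirical CDF).
latency_cdf() {
  sort -n "$1" | awk '
    { x[NR] = $1 }
    END { for (i = 1; i <= NR; i++) printf "%s %.3f\n", x[i], i / NR }'
}

# fraction_below FILE THRESHOLD_MS: fraction of latencies strictly below THRESHOLD_MS.
fraction_below() {
  awk -v t="$2" '
    { n++; if ($1 + 0 < t + 0) k++ }
    END { if (n) printf "%.3f\n", k / n }' "$1"
}

# percentile FILE P: nearest-rank P-th percentile (0 < P <= 100), i.e. the
# smallest sample such that at least P% of all samples are <= it.
percentile() {
  sort -n "$1" | awk -v p="$2" '
    { x[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = p * NR / 100
      r = int(rank); if (r < rank) r++   # ceiling of the rank
      if (r < 1) r = 1
      print x[r]
    }'
}
```

For example, `fraction_below mosh_latencies.txt 5` (hypothetical file name, with 5 ms as an assumed cutoff) gives the height of the near-vertical segment at the left of the Mosh curve. `percentile mosh_latencies.txt 90` gives the 90th-percentile latency.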
We chose to replicate the tables displaying median, mean, and standard deviation of latency for a few reasons. First, the tables are descriptive and illustrate the latency benefits of using Mosh over SSH. Second, the tables demonstrate Mosh’s benefits and flexibility in different situations, specifically for long distance remote connections and for lossy connections. Users would not necessarily switch to Mosh unless they knew that it provided concrete benefits in different conditions when connecting to remote servers.
| Network | Remote Terminal | Configured Minimum RTT (ms) | Median Latency (ms) | Mean Latency (ms) | Standard Deviation (ms) |
|---|---|---|---|---|---|
| High Loss, No Prediction | Mosh | 100 | 139 | 348 | 1734 |
| High Loss, No Prediction | SSH | 100 | 509 | 1426 | 5761 |
We’ve reproduced our plots for all three main results below. Because of our simulation approach and the amount of jitter in the simulation, our results differ somewhat from the paper’s, but overall the trends hold and indicate that Mosh behaves as the authors anticipated.
In our reproduction of Figure 2, the results closely followed the paper: roughly 70% of keystrokes were displayed instantly thanks to local echo, with a median of 1.5 ms on a network with an RTT of 500 ms. SSH, which must wait for the server’s echo over TCP, had a median response latency just above the 500 ms round-trip time. Mosh’s advantage does narrow in the upper tail: read visually off the chart, SSH’s 90th-percentile latency appears to be lower than Mosh’s. This may be attributable to our simulation environment. Whereas the original authors used real 3G networks in the Cambridge area, we used a network with fixed parameters and variable jitter based on a trace of the Verizon LTE network provided by the Mahimahi authors, recorded at an unknown date.
Above is a representation of the response times on a link with 2.5 s of one-way delay (a 5 s minimum RTT), similar to the wireless Verizon LTE network mentioned in the paper. Here far fewer keystrokes could be speculatively echoed immediately: under 20%, compared with 70% in the previous test. This raised Mosh’s median latency to 1 s, rather than the nearly instant median reported in the paper. This suggests that Mosh’s echo ability is linked to the delay on the link, that something about the terminal replay scripts written by the Mosh authors is otherwise unfaithful, or that there is a bug in Mahimahi. It is unclear why the fraction of instantly echoed keystrokes differed between the two traces. Both the high-delay and high-loss scenarios used a keytrace that was much shorter than, but characteristically similar to, the one used for reproducing Figure 2.
Finally, we performed an experiment with predictive echo disabled and a minimum RTT of 100 ms. This is reflected in the image above: Mosh does not display any keystrokes instantly. Yet even without predictive echo, Mosh still surpasses SSH at the highest percentiles. The reason is that SSP runs over UDP rather than TCP, a reliable in-order protocol. Mosh therefore does not have to wait for lost data to be retransmitted before showing newer output; it can skip stale screen states and jump ahead to displaying the most current one.
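The head-of-line-blocking argument can be made concrete with a toy model. This is a deliberate simplification, not SSP’s actual wire format: packet i carries screen state i, some packets are lost, and retransmissions are not simulated. A TCP-style receiver can only advance through a contiguous run of received packets. An SSP-style receiver, assumed here to be able to apply any update it receives, jumps straight to the newest state.

```shell
# toy_displayed_state: read lines "SEQ RECEIVED" (RECEIVED is 1 or 0) in send
# order and print, after each packet, the newest screen state that a TCP-style
# receiver (in-order delivery only) and an SSP-style receiver (newest state
# wins) could display. Toy model: no retransmissions are simulated.
toy_displayed_state() {
  awk '
    {
      seq = $1; got[seq] = $2
      # TCP-style: advance only through a contiguous run of received packets.
      while (got[tcp + 1] == 1) tcp++
      # SSP-style: any received update moves the display to the newest state.
      if ($2 == 1 && seq > ssp) ssp = seq
      printf "after %d: tcp=%d ssp=%d\n", seq, tcp, ssp
    }'
}
```

With `printf '1 1\n2 0\n3 1\n4 1\n' | toy_displayed_state`, the last line is `after 4: tcp=1 ssp=4`. The TCP-style view stays stuck at state 1 until packet 2 is retransmitted, while the SSP-style view already shows state 4.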
The main thesis we wished to explore is whether Mosh over SSP offers better keystroke response latency than SSH over TCP. Indeed, we find that across a variety of network types, Mosh offers lower mean and median keystroke latency than SSH, as shown by the cumulative distributions of keystroke response times above. This is mainly due to Mosh’s predictions: in our reproduction of Figure 2, over 70% of keystrokes in Mosh are displayed almost instantly. Mosh also achieves lower mean and median latency under high delay. In the high-delay chart above, with a 5-second RTT, nearly 20% of keystrokes in Mosh were displayed almost instantly, and Mosh showed lower median and mean latency than SSH. Our results also demonstrate Mosh’s resilience to high packet loss: even without local predictive echo, Mosh has lower median and mean keystroke latency, showing the State Synchronization Protocol’s resilience to packet loss.
Overall, we find that the results of the paper still hold. Apart from disabling prediction for the high-loss experiment, we did not need to modify Mosh to achieve results similar to the paper’s.
The only assumption the authors make is that users perform “typical, real-world sessions”, which produce the reported delay distributions. We take “real-world sessions” to mean a balance of typing and navigation keystrokes, and we used keytraces with this balance to replicate the authors’ results. However, as we show in the next section, non-standard sessions with a large share of navigation can degrade Mosh’s performance, bringing its latency closer to that of SSH.
In the paper, the authors mention that Mosh can immediately predict most “typing” keystrokes but fails to predict “navigation” keystrokes, such as pressing a key to advance to the next page in less. We tested Mosh on a keytrace consisting mostly of navigation, to examine its performance on unpredictable keystrokes, using a minimum RTT of 200 ms. This “adversarial” keytrace runs less on a large file and repeatedly scrolls through it with the Page Down key. As a result, Mosh cannot use its predictive local echo to display keystrokes preemptively, even though prediction is enabled.
| Terminal Emulator | Median Latency (ms) | Mean Latency (ms) | Standard Deviation (ms) |
|---|---|---|---|
Here, Mosh performs slightly worse than SSH, with higher median and mean latency. However, this difference is not statistically significant. The example shows that when predictive local echo fails, Mosh’s latency is similar to SSH’s.
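One way to check whether a difference in mean latency is significant is Welch’s t statistic, which does not assume equal variances. The sketch below is illustrative only and is not necessarily the test behind the claim above. It assumes two hypothetical files of per-keystroke latencies, one value in milliseconds per line.

```shell
# welch_t FILE_A FILE_B: Welch's t statistic for the difference in mean latency
# between two samples (hypothetical format: one latency in ms per line each).
welch_t() {
  awk '
    FNR == 1 { f++ }
    { n[f]++; s[f] += $1; ss[f] += $1 * $1 }
    END {
      if (f != 2 || n[1] < 2 || n[2] < 2) {
        print "need two files with at least 2 samples each" > "/dev/stderr"; exit 1
      }
      for (i = 1; i <= 2; i++) {
        m[i] = s[i] / n[i]
        # Unbiased sample variance (n - 1 in the denominator).
        v[i] = (ss[i] - n[i] * m[i] * m[i]) / (n[i] - 1)
      }
      se = sqrt(v[1] / n[1] + v[2] / n[2])
      if (se == 0) { print "zero variance in both samples" > "/dev/stderr"; exit 1 }
      printf "t=%.3f\n", (m[1] - m[2]) / se
    }' "$1" "$2"
}
```

For large samples, |t| below roughly 1.96 means the difference is not significant at the 5% level under a normal approximation. Latency distributions are heavy-tailed, so a rank-based test such as the Mann-Whitney U test may be a better fit. This sketch computes only the simpler Welch statistic.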
We used Mahimahi, a network emulator that can simulate conditions such as high delay and high loss and can replay recorded traces of cellular networks. We chose Mahimahi for its ease of use and the flexibility of its parameters: the logic for setting up the network requires only a few lines of shell scripting. For the remaining logic, we relied on the authors’ original terminal replay scripts to record and play back keytraces.
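To illustrate how little shell the network setup needs, here is a hypothetical sketch (not taken from our repository) that prints the nested Mahimahi shells for each scenario described in this post. `mm-delay` takes a one-way delay in milliseconds, so each value is half the stated RTT. The high-loss rate of 0.29 per direction is an assumption: with independent losses, (1 - 0.29)^2 ≈ 0.50, or roughly 50% round-trip loss. `TRACE_DIR` is an assumed install location for Mahimahi’s bundled Verizon LTE traces. The function only prints commands, so they can be inspected before running.

```shell
# mm_prefix SCENARIO: print the nested Mahimahi shells for one scenario.
# Running the printed command opens a shell inside the emulated network.
# TRACE_DIR is an assumed location for the bundled traces; override if needed.
TRACE_DIR=${TRACE_DIR:-/usr/share/mahimahi/traces}

mm_prefix() {
  case "$1" in
    fig2)       # ~500 ms minimum RTT plus Verizon LTE trace variability
      echo "mm-delay 250 mm-link $TRACE_DIR/Verizon-LTE-short.up $TRACE_DIR/Verizon-LTE-short.down" ;;
    highdelay)  # 2.5 s one-way delay, i.e. a 5 s minimum RTT
      echo "mm-delay 2500" ;;
    highloss)   # 100 ms RTT; ~29% loss per direction is ~50% round-trip loss
      echo "mm-delay 50 mm-loss uplink 0.29 mm-loss downlink 0.29" ;;
    navigation) # 200 ms minimum RTT for the less/Page Down keytrace
      echo "mm-delay 100" ;;
    *)
      echo "unknown scenario: $1" >&2; return 1 ;;
  esac
}
```

For example, `eval "$(mm_prefix highloss)"` would start a shell with the high-loss settings. Inside that shell, running `mosh --predict=never "$SSHIP"` corresponds to the no-prediction experiment.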
The entire setup is highly reproducible: we distilled it into a few scripts to minimize the complexity of reproducing the results. The main factor affecting reproducibility is the type of trace. As the extension above shows, a keytrace consisting almost entirely of navigation can make Mosh’s performance similar to SSH’s. High-delay networks also caused our measurements to diverge somewhat: with a minimum RTT of 5 seconds, only 20% of keystrokes were predicted instantly, compared with the 70% described in the Mosh paper. Still, simply running our scripts yields consistent results thanks to the reliability of the Mahimahi network emulator, and their outputs should be reproducible across different machines.
The steps to reproduce our results are straightforward, and documented on our GitHub project, available at https://github.com/a10y/stm-data.
Go there and follow the README, which we also reproduce here for simplicity:
- Go to Google Cloud Console and provision a new VM with 1 vCPU and a Boot Disk of Ubuntu 16.04 LTS
- SSH into the machine, and run the following commands:
```shell
git clone https://github.com/a10y/stm-data.git
cd ./stm-data
./setup.sh                # Say yes at all the prompts
# Find the IP address using ifconfig, export it to this variable
export SSHIP=FIND_THE_IP
ssh "$SSHIP"              # Enter "yes" to accept the host key, then exit
# Run the reproduction scripts
./run.sh
```
We still suggest following the README as it is more detailed, but these instructions above are sufficient to generate the 4 plots.
The authors’ original terminal replay scripts are somewhat finicky: they sometimes hung when running mosh on certain inputs. This did not always happen, but it seemed linked to full-screen terminal applications such as vim or emacs that take over the PTY. Because of this, we used traces that avoided full-screen applications and instead consisted largely of shell commands and scripting. To add variety and make prediction non-trivial without such applications, we interspersed history navigation, tab completion, and reverse search throughout the traces.
We were pleased with our Mahimahi experience and found it simpler than Mininet for basic tests of artificial loss and delay. Mininet, while a useful tool, seemed cumbersome for running a few shell commands inside a simulated network with specific characteristics, though we did not attempt to use it.
 Winstein, K., & Balakrishnan, H. (2012, June). Mosh: An Interactive Remote Shell for Mobile Clients. In USENIX Annual Technical Conference (pp. 177-182).