Accelerating Remote X Performance

Introduction

In the 2007 NERSC survey, several users complained about poor network performance for interactive, GUI-based applications or stated that they did not use NERSC resources for such applications because of poor network performance. The goal of this project was to identify and evaluate alternatives to X11 tunneling/compression via ssh that alleviate X11 performance problems, which are mainly caused by the high network latency to remote locations.

Alternatives under consideration

The X11 protocol was designed for local area network connections. Because of this fact, several design decisions lead to poor performance over wide area network connections. In particular: (i) the X protocol is very verbose, requiring comparatively large amounts of data to be sent over the network and (ii) many operations require a "handshake" between client and server leading to long "wait" times on high-latency links before an operation can be completed. This section gives a brief review of technologies that we considered as alternatives to ssh X11 tunneling, which alleviates only (i) if compression is enabled. The following solutions (i) decrease verbosity by means of compression and (ii) minimize the impact of high RTT latency.
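
To make point (ii) concrete, the following minimal sketch estimates how long a client waits when an operation needs a number of synchronous round trips. The count of 100 round trips and the 0.2 ms LAN latency are hypothetical figures chosen for illustration, not measurements, and the model ignores bandwidth and server processing time.

```python
def handshake_wait_seconds(round_trips, rtt_ms):
    """Lower bound on the wait when each round trip must finish before the next starts."""
    return round_trips * rtt_ms / 1000.0


# Hypothetical operation that needs 100 synchronous round trips.
for rtt_ms in (0.2, 10, 80):
    print(f"RTT {rtt_ms:>5} ms -> {handshake_wait_seconds(100, rtt_ms):.2f} s")
```

Under these assumptions the same operation takes 0.02 s on a LAN-like link but 8 s at 80 ms RTT, which is why reducing the number of round trips matters more than compression on high-latency links.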

dxpc

The Differential X Protocol Compressor (dxpc) is "an X protocol compressor designed to improve the speed of X11 applications run over low-bandwidth links (such as dial up PPP connections)" (http://www.vigor.nu/dxpc/). It "understands" the X protocol and "intelligently" compresses X network traffic. Dxpc is designed to cope with low-bandwidth links. We did not consider it as a candidate solution because it does not employ any strategies to cope with high-latency links, which seem to be the main cause of poor remote performance. It is included in this overview because it serves as the basis for (Free)NX.

(Free)NX

NX is a commercial product by NoMachine Inc. Its compression of X11 traffic (nxproxy) is based on dxpc technology. In addition, NX sets up an environment that makes it possible to cope with high-latency links by "shortcutting" X requests and the corresponding replies. NoMachine released the base implementation of the server as open source, and clients are freely available from their web page. Based on this source code, FreeNX provides users with scripts that simplify setup of the server on a machine. Many Linux distributions (Fedora Core, OpenSUSE) include NX server packages based on FreeNX.

Virtual Network Computing (VNC)

VNC shares the desktop of a machine with different clients. It uses RFB (the Remote Frame Buffer protocol) to control another computer remotely. Unlike X11, RFB transfers frame buffer contents instead of individual commands to draw graphical objects. This mitigates latency problems, as it requires less synchronization between remote machines. VNC desktop sharing is available for a wide range of operating systems/desktops, including X11, Windows and MacOS. The most common way to share an X desktop via VNC is Xvnc, which starts a new X server with a virtual display.

Security Considerations

(Free)NX

(Free)NX uses the X11 protocol and ssh to make a machine remotely accessible. Due to this implementation, it should not open new security vulnerabilities. One possible concern is that (Free)NX requires creating a new user ("nx"), which is used to establish the X11 connection to a remote machine. The nx client authenticates itself as this user via an ssh key. Currently available nx clients require that this key does not have a pass phrase. If this key becomes compromised, it will create a vulnerability. However, the nx user does not provide access to a regular shell and ssh features, such as port forwarding, are now disabled by default for that key. In any case, it will be necessary to distribute that key to NERSC users in a secure way. (It may be advisable to give each user a separate ssh key to the nx account to limit this vulnerability.)

VNC

VNC is a new service (i.e., it is not currently in use on NERSC machines) and may create additional vulnerabilities. It does its own authorization independent of site policy. It requires that the user choose a new password for sharing the desktop. This password is transmitted and stored in an insecure way. This problem can (and should) be mitigated by tunneling the VNC connection via ssh and making the VNC port inaccessible from the outside network. In a way, this password is comparable to the X11 magic cookie, which is similarly insecure.
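
As a rough sketch of the tunneling approach, the helper below builds an ssh command line that forwards a local port to a VNC display on the remote machine. The user name and host name are hypothetical placeholders. The only facts assumed are the common convention that VNC display :N listens on TCP port 5900+N and the standard ssh options -L (local port forwarding) and -N (no remote command).

```python
import shlex


def vnc_tunnel_command(user, host, display=1, local_port=None):
    """Build an ssh command that forwards a local port to a remote VNC display."""
    remote_port = 5900 + display  # convention: VNC display :N uses TCP port 5900+N
    if local_port is None:
        local_port = remote_port
    return [
        "ssh",
        "-N",  # forwarding only, do not run a remote command
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]


# Hypothetical user and host names, for illustration only.
print(shlex.join(vnc_tunnel_command("alice", "login.example.org", display=1)))
```

With the tunnel running, the VNC viewer connects to localhost on the forwarded port rather than to the remote machine directly, so the VNC password and session travel inside the ssh connection. The VNC server itself still has to be restricted to local connections for the port to be unreachable from outside.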

Test Methodology

To test the various solutions, it was necessary to simulate various network conditions. Our basic setup consisted of three machines: one machine simulating the NERSC resource/computer, one machine serving as the user machine, and a third machine configured as a network bridge and used to simulate various network bandwidth and latency configurations. This setup is similar to the one proposed by a study on X11 network performance (http://keithp.com/~keithp/talks/usenix2003/html/net.html). In this setup, the alternatives (ssh, NX, VNC) were evaluated under different network parameters (latency, bandwidth).

Network Simulation


We used the NIST network simulator to simulate the network. The simulator allows users to specify delays for packets between machines, as well as to limit the bandwidth. It supports more elaborate settings, such as simulating varying latency distributions, but we did not use them in this context. After specifying latency and bandwidth constraints, we verified network conditions using iperf. We did so as a sanity check and also to measure the actual bandwidth over a TCP/IP connection, since latency alone influences TCP/IP performance.

Expected Latency Range

We worked with the NERSC Network and Security group to obtain estimates on ping times and available bandwidth between NERSC and the work sites of various NERSC users. In particular, we collected detailed information about the network connection between NERSC and the Oak Ridge National Laboratory (ORNL) as well as the Princeton Plasma Physics Laboratory (PPPL) by asking remote users to send us statistics obtained using the "NERSC Web100 based Network Diagnostic Tool (NDT)". The results suggest that round trip (RTT) latencies to UCLA, ORNL and PPPL are approximately 10ms, 66ms and 80ms respectively.

Measuring GUI Application Performance

We developed a timer application that allows us to measure the responsiveness of GUI-based applications. This application runs on an X-based system and uses the XTEST and XDamage extensions. XTEST supports simulating mouse button press events, and XDamage makes it possible to receive notifications about screen updates. Our timer application uses XTEST to simulate mouse button clicks and records a time stamp when it sends a mouse button press event. It subsequently monitors screen updates in a specified region of the screen and waits until no framebuffer changes occur for a user-specified time. In this way, it measures the time between a mouse button press, e.g., on a menu, and the last screen update that occurs in response to the event, providing an objective measure of application responsiveness. The timer tool allowed us to consider regular applications that NERSC users are likely to use instead of having to resort to synthetic test applications. We also performed tests to evaluate subjective user experience for operations that are difficult to measure, e.g., moving a window around on the desktop.
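
The timing rule described above can be sketched independently of X11. The following is not the timer application itself. It is a minimal illustration of the "wait until the screen has been quiet for a given time" logic, assuming the click time and the screen-update (damage) time stamps have already been collected as plain numbers in seconds. The function name and parameters are hypothetical.

```python
def response_time(click_time, damage_times, quiet):
    """Time from the click to the last screen update before a quiet period.

    Updates are followed as long as each one arrives within `quiet` seconds
    of the previous update (or of the click). Returns None if no update
    arrives within `quiet` seconds of the click.
    """
    last_update = None
    deadline = click_time + quiet
    for t in sorted(damage_times):
        if t < click_time:
            continue  # update happened before the simulated click
        if t > deadline:
            break  # screen stayed unchanged for the quiet period
        last_update = t
        deadline = t + quiet
    if last_update is None:
        return None
    return last_update - click_time


# Hypothetical time stamps: three updates follow the click, a fourth is unrelated.
print(response_time(0.0, [0.05, 0.12, 0.30, 2.0], quiet=0.5))
```

In this example the update at 2.0 s arrives long after the 0.5 s quiet period has expired, so it is not attributed to the click and the measured response time is 0.3 s. The choice of the quiet period trades measurement time against the risk of cutting off a slow redraw.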

Other Criteria

While the reason for performing this study was complaints about X11 performance over high-latency networks, other considerations also needed to be taken into account when deciding on a solution. These criteria were:

Timing Results

We performed tests for two example sites. The UCLA test case served to measure benefits for off-site users who are still relatively close to NERSC. The PPPL test case, on the other hand, is close to the worst-case conditions encountered by users in the continental United States. In both test instances, we only simulated RTT latencies (10ms for UCLA and 80ms for PPPL). Even though we did not impose any bandwidth limits, the flow control used in TCP/IP will limit bandwidth depending on RTT latency. We used the iperf tool to measure available bandwidth on our test networks. The resulting measurements for UCLA are:
ghweber@hpcrd7:~> iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 10.0.0.1 port 5001 connected with 10.0.1.2 port 33694
[  4]  0.0-10.0 sec    387 MBytes    324 Mbits/sec

ghweber@gunther3:~> iperf -c gunther1
------------------------------------------------------------
Client connecting to gunther1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.1.2 port 33694 connected with 10.0.0.1 port 5001
[  3]  0.0-10.0 sec    387 MBytes    324 Mbits/sec

For PPPL the resulting measurements are:

ghweber@hpcrd7:~> iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 10.0.0.1 port 5001 connected with 10.0.1.2 port 54314
[  4]  0.0-10.1 sec  46.8 MBytes  38.8 Mbits/sec

ghweber@gunther3:~> iperf -c gunther1
------------------------------------------------------------
Client connecting to gunther1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.1.2 port 54313 connected with 10.0.0.1 port 5001
[  3]  0.0-10.0 sec  46.4 MBytes  38.9 Mbits/sec
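
One way to read the two iperf results is through the bandwidth-delay product, i.e. the amount of data that must be in flight to sustain a given rate at a given RTT. The sketch below applies that standard formula to the measured rates and the simulated RTTs. The interpretation that a roughly constant effective window explains both numbers is an assumption, not something the measurements establish directly.

```python
def bytes_in_flight(throughput_mbit_s, rtt_ms):
    """Bandwidth-delay product: unacknowledged data needed to sustain the rate."""
    bits = throughput_mbit_s * 1e6 * (rtt_ms / 1000.0)
    return bits / 8.0


# Measured server-side rates and the simulated round-trip times.
for site, mbit_s, rtt_ms in (("UCLA", 324, 10), ("PPPL", 38.8, 80)):
    print(f"{site}: {bytes_in_flight(mbit_s, rtt_ms) / 1000:.0f} kB in flight")
```

Both cases come out at roughly 400 kB (405 kB and 388 kB), which would be consistent with throughput being capped by a similar effective TCP window in both runs, so that the achievable rate drops roughly in proportion to the RTT.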

We tested the three alternatives under these network conditions using MATLAB as the test application. In sequence, we established a connection, started MATLAB (without splash screen), opened an edit window, and cycled through some pull-down menus of the edit window. Tables 1 and 2 show the results of our measurements. Both show significant benefits of VNC and NX over simple ssh X11 tunneling. As expected, these benefits are more pronounced over the high-latency link, where response times improve by roughly an order of magnitude in most cases.

Table 1: Test results for simulated connection to UCLA.

Action                                  SSH     VNC     FreeNX
Establish connection                    n/a     ≈11s    ≈16s
Start Matlab (-nosplash)                9.6s    4.9s    5s
Open edit window                        2.9s    1.3s    1.2s
Activate File menu                      0.6s    0.1s    0.1s
Activate Edit menu                      0.6s    0.1s    0.1s
Activate Text menu                      0.5s    0.2s    0.1s
Close edit window, redraw main window   1.5s    0.4s    0.3s
Close matlab                            0.5s    0.6s    0.6s

Table 2: Test results for simulated connection to PPPL.

Action                                  SSH     VNC     FreeNX
Establish connection                    n/a     ≈5.7s   ≈10.2s
Start Matlab (-nosplash)                39.5s   4.6s    5.4s
Open edit window                        14.9s   1.3s    1.12s
Activate File menu                      3.7s    0.3s    0.2s
Activate Edit menu                      7.6s    0.4s    0.2s
Activate Text menu                      5.1s    0.4s    0.3s
Close edit window                       7.3s    1.4s    1.8s
Close matlab                            2.1s    1.54s   1.1s
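
To quantify the improvement on the high-latency link, the following sketch computes the ratio of the ssh time to the VNC and FreeNX times for each action in Table 2. The numbers are copied from the table. The "Establish connection" row is omitted because no ssh time was recorded for it.

```python
# Table 2 timings in seconds: (action, ssh, vnc, freenx)
TABLE2 = [
    ("Start Matlab (-nosplash)", 39.5, 4.6, 5.4),
    ("Open edit window", 14.9, 1.3, 1.12),
    ("Activate File menu", 3.7, 0.3, 0.2),
    ("Activate Edit menu", 7.6, 0.4, 0.2),
    ("Activate Text menu", 5.1, 0.4, 0.3),
    ("Close edit window", 7.3, 1.4, 1.8),
    ("Close matlab", 2.1, 1.54, 1.1),
]


def speedups(rows):
    """Return (action, ssh/vnc, ssh/freenx) ratios for each row."""
    return [(action, ssh / vnc, ssh / nx) for action, ssh, vnc, nx in rows]


for action, vs_vnc, vs_nx in speedups(TABLE2):
    print(f"{action:<26} VNC {vs_vnc:5.1f}x   FreeNX {vs_nx:5.1f}x")
```

The ratios range from about 1.4x (closing MATLAB via VNC) to 38x (activating the Edit menu via FreeNX), with the menu operations showing the largest gains.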

Conclusions

Based on the measurements, NX and VNC perform very similarly. For now, we have chosen to deploy (Free)NX, mainly because setting up an NX connection requires considerably fewer steps on the remote user's side. Thus, NX is more convenient and easier to use, making it more likely to be adopted. Even though NX clients are available for most significant platforms (Linux, MacOS X, Windows and Solaris), VNC clients are more widely available, including Java solutions. In the long term, it may be beneficial to offer both alternatives to NERSC users. In particular, we need to reevaluate these solutions if graphics hardware becomes available for analytics use at NERSC.