
Underneath the Linux desktop


2. The X server

On Linux and most other Unix-based systems, the central part of the graphical user interface is the X server. The X server is an application that draws and manages a graphic display and various input devices on behalf of applications. Historically, the X server would always have been a piece of networked equipment, and it is a server in the true sense of the word — it provides services to various clients which would have been remote from it. These days, the client applications and the X server itself will typically be hosted on the same machine, although the network capabilities continue to exist, and are very visible at the programming level, if not to the end user. Strictly speaking, an application that uses the X server for interaction with the user is called a 'client', but the term client is also used for the display window allocated to such an application, and this ambiguous terminology does sometimes cause confusion.

In the Linux world, almost everybody uses the X server from the open-source Xorg project. In fact, Xorg is rapidly displacing proprietary X servers even from commercial Unix systems like Solaris. Xorg can be made to run on almost any hardware with a graphical display, but it does have the disadvantage of being rather memory-hungry. For very small systems, lightweight alternatives are available, but their standards-compliance tends to be rather questionable.

Xorg is designed to run with real hardware. In the examples that follow, I suggest using a virtual X server like Xvnc, as will be explained. In practice, any standards-compliant X server should behave in the same way.

The function of the X server

The X server is a relatively simple application with relatively simple capabilities. It controls one or more hardware screens, and divides them up into regions called windows (with a lower-case 'w') as requested by applications. Applications can draw text and graphic elements in their windows, and the X server keeps track of which output goes to which window. Windows have a stacking order maintained by the server, so that windows that overlap do not try to draw to the same piece of screen.

The windows managed by the server may belong to any of many different applications, which may be running on different network hosts (although, as I've said, this is unusual in desktop systems).

The X server will manage input devices (mouse, keyboard, touchscreen, etc.) and send their data to applications. Simplistically, the input will go to the application that the X server recognizes as having input focus, but the true situation is a bit more complex than that. The X system is pretty democratic: applications can interact relatively freely with each other's windows, and this interaction includes getting access to input that isn't necessarily directed to the application with input focus. It is almost essential that this be the case, because the whole X desktop arrangement relies on the close collaboration of a number of different applications.

You can, in principle, run graphical applications on Linux using only an X server, without the whole raft of other desktop stuff. It's worth trying this if you have never seen how it looks. In many cases, it's easiest to try experiments of this sort using a virtual X server, rather than the one that runs your Linux desktop. A useful virtual X server is Xvnc, part of the VNC system. On Linux, VNC works by providing an implementation of a full X server, which has no screen hardware, but simulates a screen in memory. Most Linux developers and many Linux users are familiar with the use of VNC for remote desktop access, but there's nothing to stop the server and viewer being used on the same host, to simulate an X session without disturbing the main desktop. Start the server like this:

$ Xvnc :3 securitytypes=none &
On some systems you might have to override an existing .Xauthority file by pointing Xvnc at a dummy location:
$ XAUTHORITY=/tmp/dummy Xvnc :3 securitytypes=none &
The ':3' in this command line is the X display number. The constraint is that it must not already be in use by another X server on the same host; choosing a number higher than that of any existing server is a simple way to ensure this. It's not a problem to determine this number by trial and error: the Xvnc server won't start if it thinks there is a conflict. Starting Xvnc as above tells it not to use any authentication, which of course might be a problem outside of the desktop environment, but simplifies things here.
You can start the Xvnc server using scripts supplied with the VNC tools, but these typically start a complete desktop session, which is definitely not what we need here. We just want an X server, and nothing else.
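
If trial and error seems too crude, the display numbers already taken can usually be worked out by inspection. By long-standing convention, display :N corresponds to the Unix-domain socket /tmp/.X11-unix/XN and, where TCP is enabled, to port 6000+N. The following Python sketch relies on that convention; the function names are purely illustrative, and the socket directory is an assumption that may not hold on every system.

```python
import os
import re

X11_SOCKET_DIR = "/tmp/.X11-unix"  # conventional location; an assumption
X11_TCP_BASE = 6000                # display :N conventionally uses port 6000+N


def socket_path(display_number):
    """Return the conventional Unix-domain socket path for display :N."""
    return "%s/X%d" % (X11_SOCKET_DIR, display_number)


def tcp_port(display_number):
    """Return the conventional TCP port for display :N."""
    return X11_TCP_BASE + display_number


def displays_in_use(socket_dir=X11_SOCKET_DIR):
    """List display numbers that have a socket entry in socket_dir."""
    try:
        names = os.listdir(socket_dir)
    except OSError:
        return []  # directory missing or unreadable: report nothing
    numbers = []
    for name in names:
        match = re.fullmatch(r"X(\d+)", name)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)


def first_free_display(in_use):
    """Return the lowest display number that is not in in_use."""
    n = 0
    while n in in_use:
        n += 1
    return n


used = displays_in_use()
print("displays with sockets:", used)
print("first free display: :%d" % first_free_display(used))
```

A stale socket left behind by a crashed server would still be reported as in use, so this is a hint rather than a guarantee; the server's own refusal to start remains the final word.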

With the Xvnc server running, you can connect a VNC viewer to it, like this:

$ vncviewer localhost:3 & 
If you run the X server this way, you'll typically see a grey featureless screen, which you can move the mouse around on. Clicking the mouse probably won't do anything, because there are no clients (applications) running on this screen. But you can start a client, e.g., xclock from the prompt:
$ xclock -display :3
And you'll see the xclock window, without any decorations. If you start a few other applications, you might be able to see that you can direct input to them by clicking the mouse in the window, but you won't be able to change the stacking order — the X server won't do this unless an application tells it to.

X in the raw: a couple of applications running against the X server without all the usual desktop paraphernalia

In practice, 'naked' use of the X server like this isn't very practicable for an end user. The X system has always recognized that there are specific kinds of X client application that collaborate with the X server to make the end-user experience more agreeable. The most significant of these is the window manager. As its name suggests, the job of the window manager is to provide an interface by which users can change the size, position and stacking order of other applications' windows. A window manager will typically decorate the windows, that is, provide a border or controls for the user to manipulate. It will invariably do this by creating windows of its own, and arranging them under, or around, the managed application windows. But there are other applications that have an important role in a modern desktop system, and which may well not even be visible: settings managers, desktops, compositing managers, etc. There's also a trend towards splitting up the window management functionality into different applications, so we now have, for example, interchangeable window decorators, which just handle the visual elements of window management.

X windows and their relationship

In the X server, windows form a hierarchy of parent-child relationships. In general, the significant feature of a child window is that its position is specified relative to the position of its parent, and not relative to the screen. This makes life a lot easier for programmers, who only have to keep track of the absolute screen position of a few windows, rather than many. But, unlike in the Microsoft Windows user interface, a child window in X does not automatically appear above its parent in the stacking order. It's entirely possible for the window manager, or the application, to position a child window beneath its parent.
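
The effect of parent-relative positioning can be shown with a toy model. The Python sketch below is not X code; the Window class and its fields are invented for illustration, and it ignores details such as border widths that affect the real calculation.

```python
class Window:
    """A toy stand-in for an X window: a position relative to a parent."""

    def __init__(self, name, x, y, parent=None):
        self.name = name
        self.x = x            # offset from the parent's origin
        self.y = y
        self.parent = parent  # None for the root window

    def absolute_position(self):
        """Walk up to the root, summing the relative offsets."""
        x, y = self.x, self.y
        node = self.parent
        while node is not None:
            x += node.x
            y += node.y
            node = node.parent
        return (x, y)


root = Window("root", 0, 0)
frame = Window("frame", 100, 50, parent=root)    # e.g. a top-level window
button = Window("button", 10, 20, parent=frame)  # a child inside it

print(button.absolute_position())  # (110, 70)

# Moving the parent moves the child on screen, although the child's
# own coordinates have not changed.
frame.x, frame.y = 300, 200
print(button.absolute_position())  # (310, 220)
```

Only the frame's position had to be updated; everything beneath it in the hierarchy followed automatically.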

At the top of the window hierarchy is the root window. All windows are, directly or indirectly, child windows of the root window. The root window is not, in general, controlled by a client, although many different clients will interact with it. The root window belongs to the X server. The very ancient utility xsetroot allows administrative control over the root window. For example:

$ xsetroot -solid black
Should set the background colour. If you do this on a modern Linux desktop like Gnome, you may be surprised to find it has absolutely no visible effect whatsoever. That's because on such a system you never normally see the root window, for reasons I will explain later. But if you do it on the Xvnc server started earlier:
$ xsetroot -display :3 -solid black
you should see it take effect.

Even though you might not see it, the root window is exceptionally important, for this reason: it's the one and only window that all applications know will exist. And this makes it a crucial element of inter-process communication. X desktop components communicate by means of the X root window — either by sending messages to it, or by setting properties on it.

Window properties

Any X window can have properties set on it, and not just by its owner application. Properties are essentially name-value data pairs, where the value can take a range of different formats. Any application can query the properties of another application's window, using the X server as a central repository of small data elements. For example, a modern window manager will set properties on the windows it is managing, to indicate to other components what kinds of things it is willing to do on those windows. It might be willing to resize them under user control for example, or it might not. Why might other applications need to know that? In a modern X desktop, it isn't just the window manager that has an interest in window states — taskbars, pagers, and other desktop components all have a role to play, and they rely on the window manager signalling what it thinks it can usefully do with a window.

Particularly important in desktop organization is the ability to read and write properties on the root window. For example, the window manager will maintain on the root window a list of windows it is currently managing. Other applications (typically application switchers or desktops) will read that list to determine which applications they can ask the window manager to control.

Window properties are named by atoms. Atoms are a key concept in X programming, although they rarely trouble the end user. An atom is, essentially, a number with a name. Any application can ask the X server to give a number to a name. If the name is already known to the X server, then it will simply give the application the existing number. If it isn't, it will assign a number to that name and use it for all future communication. Without a system of atoms it would be necessary to pass text strings of arbitrary length between the client and the server, which could potentially create problems for network bandwidth. With atoms, each window property is identified by a number at most four bytes long.
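
The behaviour of atoms described above amounts to a two-way lookup table held by the server. Here is a toy model in Python; the AtomTable class is invented for illustration, and the numbers it hands out are arbitrary (a real server starts with a set of predefined atoms already allocated). The property names used, WM_NAME and _NET_CLIENT_LIST, are real ones that desktop components commonly use.

```python
class AtomTable:
    """Toy model of the server's mapping between names and atom numbers."""

    def __init__(self):
        self._by_name = {}
        self._by_number = {}
        self._next = 1  # illustrative starting value only

    def intern(self, name, only_if_exists=False):
        """Return the number for name, allocating one if necessary."""
        if name in self._by_name:
            return self._by_name[name]
        if only_if_exists:
            return 0  # in the real protocol, 0 means "no such atom"
        number = self._next
        self._next += 1
        self._by_name[name] = number
        self._by_number[number] = name
        return number

    def name_of(self, number):
        """Translate an atom number back into its name."""
        return self._by_number[number]


atoms = AtomTable()
client_list = atoms.intern("_NET_CLIENT_LIST")
wm_name = atoms.intern("WM_NAME")

# A second application asking for the same name gets the same number.
print(atoms.intern("_NET_CLIENT_LIST") == client_list)  # True
print(atoms.name_of(wm_name))                           # WM_NAME
```

After the initial exchange, the client and server need only pass the small integer between them, whatever the length of the name.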

Window messages

The X server communicates with applications by means of messages. X applications typically spend most of their lives waiting for messages. These messages may come from user interaction (key strokes, mouse gestures) or from the server itself. Very importantly for our present purposes, messages can come from other applications. In addition, the X system is democratic, and an application can listen for messages destined for windows that belong to other applications.

Again, the root window is crucial here. An application cannot be sure that any other application will be running, and have a window on screen. But it can be sure that the root window exists. A great many desktop operations work by sending messages to the root window. Desktop components — window managers, panels, etc — register an interest in receiving these messages, and are notified by the X server when other applications post them.

You can't pass a huge amount of data as a window message, which can be a nuisance for the programmer. The reason for this is that in a traditional client-server programming model, all X window messages have to pass across the network, which only has limited capacity. So the X specification limits the message size to a few tens of bytes, regardless of format: in the core protocol, a message sent from one client to another carries 20 bytes of data inside a 32-byte packet.
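
To make the size limit concrete: in the core protocol every event occupies 32 bytes, and the ClientMessage event that applications send to one another leaves 20 of those for data, as five 32-bit values, ten 16-bit values or twenty bytes. The Python sketch below packs such an event by hand. The window ID and atom number are made-up values, and little-endian byte order is assumed (a real client chooses the byte order when it connects).

```python
import struct

CLIENT_MESSAGE = 33  # event code for ClientMessage in the core protocol


def pack_client_message(window, message_type, data32, sequence=0):
    """Pack a format-32 ClientMessage event (little-endian assumed)."""
    if len(data32) > 5:
        raise ValueError("a ClientMessage holds at most five 32-bit values")
    values = list(data32) + [0] * (5 - len(data32))
    # Layout: code, format, sequence number, window, type atom, 20 data bytes
    return struct.pack("<BBHII5I", CLIENT_MESSAGE, 32, sequence,
                       window, message_type, *values)


# Hypothetical window ID and atom number, for illustration only.
event = pack_client_message(window=0x2A00007, message_type=321,
                            data32=[1, 2, 3])
print(len(event))  # 32
```

Anything larger than this has to be passed some other way, typically by storing it in a window property and using the message only to announce that it is there.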

An important category of messages for our present discussion is the request. A request is generated when an application tries to do something which another application has registered an interest in. For example, an application can make a function call to set the size and position of its window. In the 'naked' X server example described above, when the application makes such a call, the X server will simply honour it, and adjust the window. But if another application has registered an interest in that window, whether the window's owner likes it or not, the X server will not honour the request. Instead, it will send a request message to the application that has registered an interest, and in the message it will state the requested size and position. In practice, that application is usually a window manager. In this way, the window manager can decide whether to honour the application's request to change its window, or modify or ignore that request.
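
The redirection mechanism is easier to follow as a small simulation. The Python sketch below is a toy model, not X code: the ToyServer and ToyWindowManager classes and their methods are invented for illustration. In the real protocol the corresponding pieces are the SubstructureRedirect event mask, which only one client at a time may select on a given window, and the ConfigureRequest event.

```python
class ToyServer:
    """Minimal model of how a resize request is honoured or redirected."""

    def __init__(self):
        self.geometry = {}      # window id -> (x, y, width, height)
        self.redirector = None  # the one client that intercepts requests

    def register_redirect(self, client):
        """Let client intercept requests; only one client may do so."""
        if self.redirector is not None:
            raise RuntimeError("another client already intercepts requests")
        self.redirector = client

    def configure(self, requester, window, geom):
        """Handle a request from requester to move or resize window."""
        if self.redirector is None or requester is self.redirector:
            self.geometry[window] = geom  # honour it directly
        else:
            # Do not touch the window; pass the request on instead.
            self.redirector.on_configure_request(self, window, geom)


class ToyWindowManager:
    """A manager that never lets a window be wider than max_width."""

    def __init__(self, max_width):
        self.max_width = max_width

    def on_configure_request(self, server, window, geom):
        x, y, width, height = geom
        server.configure(self, window, (x, y, min(width, self.max_width), height))


server = ToyServer()
app = object()  # stands in for an ordinary client application

server.configure(app, "w1", (0, 0, 800, 600))
print(server.geometry["w1"])  # (0, 0, 800, 600): no manager, honoured as-is

server.register_redirect(ToyWindowManager(max_width=640))
server.configure(app, "w1", (0, 0, 1024, 768))
print(server.geometry["w1"])  # (0, 0, 640, 768): the manager modified it
```

The application makes exactly the same call in both cases; it is the presence of the registered manager that changes the outcome.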

X programming and toolkits

It is entirely possible to write applications that interact with the X server using very low-level operations. In fact, if you're masochistic enough, you can write an application that does X operations by sending bytes down the wire to the X server (of course, in a desktop system it won't be a real wire, but it will behave as if it were). In reality, low-level X programming is quite unusual these days, except for special purposes, unless you're developing desktop components themselves.
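
For a taste of what "sending bytes down the wire" involves, the Python sketch below builds the bytes of a single core-protocol request, InternAtom, which asks the server for the number that goes with a name. It only constructs the request; actually sending it would first require the connection setup handshake, which is not shown. Little-endian byte order is assumed, and the function name is illustrative.

```python
import struct

INTERN_ATOM = 16  # major opcode of InternAtom in the core protocol


def pack_intern_atom(name, only_if_exists=False):
    """Encode an InternAtom request (little-endian byte order assumed)."""
    raw = name.encode("latin-1")
    pad = (-len(raw)) % 4  # requests are padded to a multiple of 4 bytes
    length_in_words = 2 + (len(raw) + pad) // 4
    # Header: opcode, only-if-exists flag, request length in 4-byte
    # units, length of the name, two unused bytes.
    header = struct.pack("<BBHHH", INTERN_ATOM, int(only_if_exists),
                         length_in_words, len(raw), 0)
    return header + raw + b"\x00" * pad


request = pack_intern_atom("WM_NAME")
print(len(request))  # 16: an 8-byte header, 7 name bytes, 1 padding byte
```

Every operation on the server has an encoding of this kind, which is why almost nobody works at this level by choice.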

Unlike Microsoft Windows, the X system does not mandate any user interface. All it provides is screen regions for the applications to draw on. This means that, in principle, an application completely controls its own user interface appearance and behaviour. In practice, the amount of work required to draw and manage a complete user interface makes such an approach unproductive. Almost all modern X applications are created using toolkits (libraries) of one sort or another, and there are dozens of alternatives to choose from. The toolkit will deal with drawing user interface elements such as menus and buttons, and may undertake many other operations on behalf of the application. Of particular importance for our present purposes is the Motif toolkit. Motif was, and is, proprietary, and has no real presence in the Linux world. But the Motif way of doing things became a sort of informal standard, and remains important for desktop systems.

Of the toolkits in common use on Linux, GTK is based on a C programming model, although it can be used with other programming languages. It has its own object-oriented programming model and does not require the use of an object-oriented programming language. Qt, on the other hand, is implemented in C++, and is difficult to use outside of that language. Both are substantial software libraries, and create large memory burdens for applications, compared with low-level X programming. But, except in embedded systems, memory is cheap these days compared to developer time.

The existence of multiple X programming toolkits is a continued annoyance in the Linux world, both for developers and users, especially users who have grown used to the dictatorship of Microsoft. Consider the (apparently) simple problem of displaying a text label on a button. All X applications will have access to the same fonts, but what font to use, and what size? Sometimes it makes sense to have such decisions under the control of the application, but most often it does not. Individual toolkits invariably have centralized configuration repositories, so that all applications that are based on the same toolkit and create a button will create it with a text label of the same size and appearance (and, of course, the same applies to other user interface elements). Very often the toolkits will contain end-user configuration tools to simplify the process of choosing look-and-feel. But getting consistency between toolkits is almost impossible. So a user can spend time finding a way to make the text in (say) the Web browser look right, but then still find that the text in the menus is completely different, and the text in the window captions different again. In many Linux systems these three sources of text will have to be configured in three different ways.

And none of this is helped by the fact that some X toolkits are, well, very ugly indeed, or nasty to use, or both.

For better or worse, in the Linux world the choice of toolkits, for almost all mainstream desktop applications, has contracted to GTK and Qt, although other systems still have an almost cultic following. But there are still plenty of applications based on other toolkits, including old or ugly ones, and applications that don't use toolkits at all (most window managers). It's likely that this will continue to be a headache for users and system administrators for some time to come.


Copyright © 1994-2013 Kevin Boone. Updated Jul 30 2010