Underneath the Linux desktop
2. The X server
On Linux and most other Unix-based systems, the central part of the graphical
user interface is the X server. The X server is an application that
draws and manages a graphic display and various input devices on behalf of
applications. Historically, the X server would always have been a piece of
networked equipment, and it is a server in the true sense of the word — it
provides services to various clients which would have been remote from it.
These days, the client applications and the X server itself will typically be
hosted on the same machine, although the network capabilities continue to
exist, and are very visible at the programming level, if not to the end user.
Strictly speaking, an application that uses the X server for interaction with
the user is called a 'client', but the term client is also used for the display
window allocated to such an application, and this ambiguous terminology does
sometimes cause confusion.
In the Linux world, almost everybody uses the X server from the open-source
Xorg project. In fact, Xorg is rapidly displacing proprietary X servers even
from commercial Unix systems like Solaris. Xorg can be made to run on
almost any hardware with a graphical display, but it does have the
disadvantage of being rather memory-hungry. For very small systems,
lightweight alternatives are available, but their standards-compliance
tends to be rather questionable.
Xorg is designed to run with real hardware. In the examples that follow,
I suggest using a virtual X server like Xvnc, as will be explained.
In practice, any standards-compliant X server should behave in the same
way.
The function of the X server
The X server is a relatively simple application with relatively simple
capabilities. It controls one or more hardware screens, and divides them up into
regions called windows (with a lower-case 'w') as requested by
applications. Applications can draw text and graphic elements in their windows,
and the X server keeps track of which output goes to which window. Windows have
a stacking order maintained by the server, so that windows that overlap
do not try to draw to the same piece of screen.
The windows managed by the server may belong to any one of many different
applications, which may be running on different network hosts (although, as I've
said, this is unusual in desktop systems).
The X server will manage input devices (mouse, keyboard, touchscreen, etc.) and send their data to applications. Simplistically, the input will go to the application that the X server recognizes as having input focus, but the true situation is a bit more complex than that. The X system is pretty democratic: applications can interact relatively freely with each other's windows, and this interaction includes getting access to input that isn't necessarily directed to the application with input focus. It is almost essential that this be the case, because the whole X desktop arrangement relies on the close collaboration of a number of different applications.
You can, in principle, run graphical applications on Linux using only an X
server, without the whole raft of other desktop stuff. It's worth trying this
if you have never seen how it looks. In many cases, it's easiest to try
experiments of this sort using a virtual X server, rather than the one that
runs your Linux desktop. A useful virtual X server is Xvnc, part
of the VNC system. The way VNC works on Linux is to provide an implementation
of a full X server, which has no screen hardware, but simulates a screen in
memory. Most Linux developers and many
Linux users are familiar with the use of VNC for remote desktop access,
but there's nothing to stop the server and viewer being used on the
same host, to simulate an X session without disturbing the main
desktop. To start the server:
$ Xvnc :3 securitytypes=none &
On some systems you might have to override an existing .Xauthority file
by pointing Xvnc at a dummy location:
$ XAUTHORITY=/tmp/dummy Xvnc :3 securitytypes=none &
The ':3' in this command line is the X display number; the constraint is that
it must not already be in use by another X server on the same host. It's not
a problem to determine this number by trial-and-error: the Xvnc server
won't start if it thinks there is a conflict. Starting Xvnc as above
tells it not to use any authentication, which of course might be a
problem outside of the desktop environment, but simplifies things
here.
You can start the Xvnc server using scripts supplied with the VNC tools, but these typically start a complete desktop session, which is definitely not what we need here. We just want an X server, and nothing else.
With the Xvnc server running, you can connect a VNC viewer to it:
$ vncviewer localhost:3 &
If you run the X server this way, you'll typically see a grey featureless
screen, which you can move the mouse around on. Clicking the mouse probably
won't do anything, because there are no clients (applications) running on this
screen. But you can start a client, e.g.,
xclock from the prompt:
$ xclock -display :3
And you'll see the xclock window, without any decorations. If you start a few other applications, you might be able to see that you can direct input to them by clicking the mouse in the window, but you won't be able to change the stacking order — the X server won't do this unless an application tells it to.
In practice, 'naked' use of the X server like this isn't very practicable for
an end user. The X system has always recognized that there are specific kinds
of X client application that collaborate with the X server to make the end-user
experience more agreeable. The most significant of these is the window
manager. As its name suggests, the job of the window manager is to provide
an interface by which users can change the size, position and stacking order of
other applications' windows. A window manager will typically decorate
the windows, that is, provide a border or controls for the user to manage them. It will
invariably do this by creating windows of its own, and arranging them under, or
around, the managed application windows. But there are other applications that
have an important role in a modern desktop system, and which may well not even
be visible: settings managers, desktops, compositing managers, etc. There's
also a trend towards splitting up the window management functionality into
different applications so we now have, for example, interchangeable window
decorators, which just handle the visual elements of window management.
X in the raw: a couple of applications running against the X server without all the usual desktop paraphernalia
X windows and their relationship
In the X server, windows form a hierarchy of parent-child relationships. In
general, the significant feature of a child window is that its position is
specified relative to the position of its parent, and not relative to the
screen. This makes life a lot easier for programmers, who only have to keep
track of the absolute screen position of a few windows, rather than many. But,
unlike in the Microsoft Windows user interface, a child window in X does
not automatically appear above its parent in the stacking order. It's
entirely possible for the window manager, or the application, to position a
child window beneath its parent.
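The relative-positioning idea is easy to model. What follows is only a toy sketch in Python, not real Xlib code: the `Window` class and the window names are my own inventions for illustration. It shows why a programmer only needs to track the position of a top-level window, and can leave the children alone when that window moves.

```python
class Window:
    """Toy stand-in for an X window: x and y are relative to the parent."""

    def __init__(self, name, x, y):
        self.name = name
        self.x = x
        self.y = y
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def absolute_position(self):
        """Walk up to the root, summing the relative offsets on the way."""
        x, y, win = 0, 0, self
        while win is not None:
            x, y = x + win.x, y + win.y
            win = win.parent
        return x, y


root = Window("root", 0, 0)
frame = root.add_child(Window("frame", 100, 50))
button = frame.add_child(Window("button", 10, 20))

print(button.absolute_position())   # (110, 70)

# Moving the parent moves the child with it; the child's own x and y
# are untouched.
frame.x, frame.y = 300, 200
print(button.absolute_position())   # (310, 220)
```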
At the top of the window hierarchy is the root window. All windows are,
directly or indirectly, child windows of the root window. The root window is
not, in general, controlled by a client, although many different clients will
interact with it. The root window belongs to the X server. The very ancient
xsetroot utility allows administrative control over the root
window. For example:
$ xsetroot -solid black
should set the background colour.
If you do this on a modern Linux desktop like Gnome, you may be surprised to
find it has absolutely no visible effect whatsoever. That's because on such a
system you never normally see the root window, for reasons I will explain
later. But if you do it on the Xvnc server started earlier:
$ xsetroot -display :3 -solid black
you should see it take effect.
Even though you might not see it, the root window is exceptionally important, for this reason: it's the one and only window that all applications know will exist. And this makes it a crucial element of inter-process communication. X desktop components communicate by means of the X root window — either by sending
messages to it, or by setting properties on it.
Any X window can have properties set on it, and not just by its owner
application. Properties are essentially name-value data pairs, where the value
can take a range of different formats. Any application can query the properties
of another application's window, using the X server as a central repository of
small data elements. For example, a modern window manager will set
properties on the
windows it is managing, to indicate to other components what kinds of things it
is willing to do on those windows. It might be willing to resize them under
user control for example, or it might not. Why might other applications need
to know that? In a modern X desktop, it isn't just the window manager that
has an interest in window states — taskbars, pagers, and other desktop
components all have a role to play, and they rely on the window manager
signalling what it thinks it can usefully do with a window.
Particularly important in desktop organization is the ability to read and write
properties on the root window. For example, the window manager will maintain on
the root window a list of windows it is currently managing. Other applications
(typically application switchers or desktops) will read that list to determine
which applications they can ask the window manager to control.
Window properties are named by atoms. Atoms are a key concept in X programming, although they rarely trouble the end user. An atom is, essentially, a number with a name. Any application can ask the X server to give a number to a name. If the name is already known to the X server, then it will simply give the application the existing number. If it isn't, it will assign a number to that name and use it for all future communication. Without a system of atoms it would be necessary to pass text strings of arbitrary length between the client and the server, which could potentially create problems for network bandwidth. With atoms, each window property is identified by a number at most four bytes long.
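The give-a-number-to-a-name behaviour can be sketched in a few lines of Python. This is a toy model of the idea only, not the server's actual implementation: the `AtomTable` class is hypothetical, the numbering starts at 1 purely for illustration (a real server has a set of predefined atoms), and the two property names are just examples of the kind of names a desktop might use.

```python
class AtomTable:
    """Toy model of the server's atom table: one small integer per name."""

    def __init__(self):
        self._by_name = {}
        self._by_id = {}
        self._next_id = 1  # illustrative starting value only

    def intern(self, name):
        """Return the existing number for a name, or assign a new one."""
        if name not in self._by_name:
            self._by_name[name] = self._next_id
            self._by_id[self._next_id] = name
            self._next_id += 1
        return self._by_name[name]

    def name_of(self, atom):
        return self._by_id[atom]


atoms = AtomTable()
a = atoms.intern("_NET_WM_NAME")
b = atoms.intern("_NET_CLIENT_LIST")
c = atoms.intern("_NET_WM_NAME")    # same name again: same number comes back

print(a == c, a != b)               # True True
print(atoms.name_of(b))             # _NET_CLIENT_LIST

# Properties on a window can then be keyed by the number, not the string.
window_properties = {a: "xclock"}
print(window_properties[atoms.intern("_NET_WM_NAME")])   # xclock
```

After the first exchange, only the small integer needs to travel between client and server, however long the name is.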
The X server communicates with applications by means of messages. X applications typically spend most of their lives waiting for messages. These messages may come from user interaction (key strokes, mouse gestures) or from the server itself. Very importantly for our present purposes, messages can come from other applications. In addition, the X system is democratic, and an application can listen for messages destined for windows that belong to other applications.
Again, the root window is crucial here. An application cannot be sure that any other application will be running, and have a window on screen. But it can be sure that the root window exists. A great many desktop operations work by sending messages to the root window. Desktop components — window managers, panels, etc — register an interest in receiving these messages, and are notified by the X server when other applications post them.
You can't pass a huge amount of data as a window message, which can be a nuisance for the programmer. The reason for this is that in a traditional client-server programming model, all X window messages have to pass across the network, which only has limited capacity. So the X specification limits the message payload to 20 bytes, regardless of format.
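To see how tight the limit is, here is a sketch that packs a client-to-client message the way the core protocol lays it out on the wire: a fixed 32-byte event, of which 20 bytes are payload. The helper name `pack_client_message`, the window ID and the atom number are all made up for illustration, and I've assumed little-endian byte order and the 32-bit data format. A real program would let Xlib or XCB do this.

```python
import struct

CLIENT_MESSAGE = 33  # event code for ClientMessage in the core protocol


def pack_client_message(window, message_type, data32):
    """Pack a format-32 ClientMessage event as 32 little-endian bytes."""
    values = list(data32)
    if len(values) > 5:
        raise ValueError("a format-32 client message carries at most 5 values")
    values += [0] * (5 - len(values))   # pad the payload out to 20 bytes
    # Layout: code, format, sequence number, window, type atom, 5 x 32-bit data
    return struct.pack("<BBHII5I", CLIENT_MESSAGE, 32, 0,
                       window, message_type, *values)


# Hypothetical window ID and message-type atom, purely for illustration.
event = pack_client_message(0x400001, 300, [1, 2])

print(len(event))        # 32
print(len(event[12:]))   # 20
```

Anything bigger than those 20 bytes has to be passed some other way, typically by setting a property on a window and sending a message that says where to look.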
An important category of messages for our present discussion are
requests. A request is generated when an application tries to do
something which another application has registered an interest in. For example,
an application can make a function call to set the size and position of its
window. In the 'naked' X server example described above, when the application
makes such a call, the X server will simply honour it, and adjust the window.
But if another application has registered an interest in that window, whether
the window's owner likes it or not, the X server will not honour the
request. Instead, it will send a request message to the application that
has registered an interest, and in the message it will state the requested size
and position. In practice, that application is usually a window manager. In
this way, the window manager can decide whether to honour the application's
request to change its window, or modify or ignore that request.
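The redirection logic can be sketched as a toy model. None of the names below are Xlib calls: `ToyServer` and `ToyWindowManager` are invented for illustration, and the manager's policy is an arbitrary example. In the real protocol the mechanism is, to my knowledge, the SubstructureRedirect event mask, usually selected on the root window, with the redirected request arriving as a ConfigureRequest event.

```python
class ToyServer:
    """Toy model of request redirection; not the real X protocol or Xlib."""

    def __init__(self):
        self.geometry = {}        # window name -> (x, y, width, height)
        self.redirect_to = None   # at most one client may hold the redirect

    def select_redirect(self, manager):
        if self.redirect_to is not None:
            raise RuntimeError("another client already redirects requests")
        self.redirect_to = manager

    def configure(self, window, geom, from_manager=False):
        """A client asks to move or resize a window."""
        if self.redirect_to is not None and not from_manager:
            # Do not honour the request; pass it to the interested client.
            self.redirect_to.on_configure_request(self, window, geom)
        else:
            self.geometry[window] = geom


class ToyWindowManager:
    """Hypothetical policy: never place a window above or left of the screen."""

    def on_configure_request(self, server, window, geom):
        x, y, w, h = geom
        server.configure(window, (max(x, 0), max(y, 0), w, h),
                         from_manager=True)


server = ToyServer()
server.configure("xclock", (-50, 10, 200, 200))   # naked server: honoured as-is
print(server.geometry["xclock"])                  # (-50, 10, 200, 200)

server.select_redirect(ToyWindowManager())
server.configure("xclock", (-50, 10, 200, 200))   # now goes via the manager
print(server.geometry["xclock"])                  # (0, 10, 200, 200)
```

The application makes exactly the same call in both cases. It is the presence of the second client that changes what the server does with it.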
X programming and toolkits
It is entirely possible to write applications that interact with the X server
using very low-level operations. In fact, if you're masochistic enough, you can
write an application that does X operations by sending bytes down the wire to
the X server (of course, in a desktop system it won't be a real wire, but it
will behave as if it were). In reality, low-level X programming is quite unusual
these days, except for special purposes, unless you're developing desktop
components.
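Sending bytes down the wire can be taken quite literally. As a minimal sketch, this is how the very first message of an X session, the connection setup request, could be assembled by hand in Python. The function name `setup_request` is my own. I've assumed a little-endian client and no authorization data, which a real server may well refuse.

```python
import struct


def setup_request(auth_name=b"", auth_data=b""):
    """Build the connection setup request a client sends first."""

    def pad4(b):
        # Variable-length fields are padded to a multiple of four bytes.
        return b + b"\x00" * (-len(b) % 4)

    header = struct.pack(
        "<BxHHHHxx",
        0x6C,             # 'l': this client uses little-endian byte order
        11, 0,            # protocol version 11.0
        len(auth_name),   # length of the authorization protocol name
        len(auth_data),   # length of the authorization data
    )
    return header + pad4(auth_name) + pad4(auth_data)


request = setup_request()
print(len(request), request.hex())   # 12 6c000b000000000000000000
```

On a typical Linux system, display :3 would conventionally be reachable through the Unix socket /tmp/.X11-unix/X3. Writing these twelve bytes to it should produce a reply describing the server, or a refusal if it wants authorization.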
Unlike Microsoft Windows, the X system does not mandate any user interface. All it provides is screen regions for the applications to draw on. In principle, this means that an application completely controls the appearance and behaviour of its own user interface. In practice, the amount of work required to draw and manage a complete user interface makes that approach unproductive. Almost all modern X applications are created using toolkits (libraries) of one sort or another, and there are dozens of alternatives to choose from. The toolkit will deal with drawing user interface elements such as menus and buttons, and may undertake many other operations on behalf of the application. Of particular importance for our present purposes is the Motif toolkit. Motif was, and is, proprietary, and has no real presence in the Linux world. But the Motif way of doing things became a sort of informal standard, and remains important for desktop systems.
GTK is based on a C programming model, although it can be used with other
programming languages. It has its own object-oriented programming model and
does not require the use of an object-oriented programming language. Qt, on the
other hand, is implemented in C++, and is difficult to use outside of that
language. Both are substantial software libraries, and create large memory
burdens for applications, compared with low-level X programming. But, except in
embedded systems, memory is cheap these days compared to developer time.
The existence of multiple X programming toolkits is a continued annoyance
in the Linux world, both for developers and users,
especially users who have grown used to the dictatorship of
Microsoft. Consider the (apparently) simple problem of displaying a
text label on a button. All X applications will have access to the
same fonts, but what font to use, and what size? Sometimes it makes sense
to have such decisions under the control of the application, but most
often it does not. Individual toolkits invariably have centralized
configuration repositories, so that all applications based on
the same toolkit, that create
a button, will create it with a text label of the same size and
appearance (and, of course, the same applies to other user interface
elements). Very often the toolkits will contain end-user configuration
tools to simplify the process of choosing look-and-feel. But getting
consistency between toolkits is almost impossible. So
a user can spend time finding a way to make the text in (say) the Web browser
look right, but then still find that the text in the menus is completely
different, and the text in the window captions different again.
In many Linux systems these three sources of text will have to be
configured in three different ways.
And none of this is helped by the fact that some X toolkits are,
well, very ugly indeed, or nasty to use, or both.
For better or worse, in the Linux world the choice of
toolkits, for almost all mainstream desktop applications, has contracted to
GTK and Qt, although other systems still have an almost cultic
following. But there are still plenty of applications based on other
toolkits, including old or ugly ones, and applications that don't use
toolkits at all (most window managers). It's likely that this will continue
to be a headache for users and system administrators for some time to