OpenShift for engineers
This article provides a brief overview of the technical implementation
of the OpenShift cloud hosting platform. It assumes no knowledge of
OpenShift or cloud computing in general. The article is written
mostly for engineers who will develop applications for the OpenShift
platform, or support applications running on that platform. There is a
bit of a bias towards Java-based applications, because that's where
I have most experience. However, the general principles should apply
to other languages and technologies.
Cloud computing — an engineer's perspective
What is this cloud thing, anyway?
Cloud computing is all about providing managed services to subscribers.
The idea is not new in concept — businesses have been managing
computing hardware and software for their customers for decades. What is
new is that recent enormous increases in Internet bandwidth mean that
a group of services can be hosted in some central location, perhaps distant
not only from the end users of those services, but also from the
businesses that will
provision them with applications and data.
Cloud services do not usually provide complete, packaged applications.
There are notable exceptions, of course — Google Docs is a service
that provides a set of document management applications, with related
document storage. Most cloud services, however, provide some sort of
hosting environment, on which a subscriber can deploy custom applications.
It has become common to divide cloud service offerings
into two basic types: infrastructure-as-a-service (IaaS) and
platform-as-a-service (PaaS). OpenShift is, broadly, a PaaS offering.
Usually, IaaS offerings provide the subscriber with some well-defined
share of an operating system, hardware, and network infrastructure.
To the subscriber, the service
might be nothing more complex than an operating system user account,
with some way to
log into it (SSH, perhaps) and some way to upload software to it (sFTP, perhaps).
As a subscriber you'll be
able to run an application that listens for clients on some TCP port,
and the service
provider's infrastructure will connect that port to some hostname
with Internet presence. As a subscriber to a service of this
type you might be able to see some
sign of other subscribers' activity. You might be aware that
your home directory is
/home/user1234, for example,
and draw the reasonable conclusion that there are at least 1233 other
subscribers. You might see that there are operating system processes
that have no connection with your own activities. In a
well-designed IaaS offering you won't be able to disrupt these processes,
or mess about with other users' files,
but you'll know they're there.
A more sophisticated IaaS offering might take the form of a genuine
virtual machine. As a subscriber you might have what appears to be complete,
exclusive access to an operating system and hardware. You might see
that there is only one network interface on the system, and no processes
except those you create. You might get administrator ('root')
access to the virtual machine. True virtual machines make for better
IaaS services than simple schemes like operating system user accounts,
because subscribers are completely isolated from the
activities of other subscribers, and can develop applications with little
regard for the implementation of the service itself. However, these
services have very high overheads, and do not necessarily make good
platforms on top of which to offer Platform-as-a-Service schemes.
A compromise between the simplest, account-based services, and full
virtual machines, is some sort of lightweight container, such as
LXC or Docker. OpenShift currently uses plain Linux user accounts and
SELinux security policies to isolate one application from another, but it is
likely that it will transition to Docker containers eventually.
A PaaS offering provides more than just an operating system and a network
interface — it provides a managed runtime environment that can host
applications of a certain type. Where development is to be in Java,
the runtime environment may consist of some sort of Java-based
servlet container (Tomcat, for example) or application server
(JBoss EAP), or a container for
OSGi bundles (Fuse, Karaf). Naturally, for Java development the service
will have to provide a Java Virtual Machine and probably other
Java development tools as well.
PaaS offerings are not limited to Java, of course —
other programming languages have their
own particular ways of bundling and deploying executables.
The user of a PaaS service will have a defined way to supply executables
to the service. For Java, that might simply be a Web interface
by which to
upload a JAR or WAR file. In OpenShift, most subscribers will upload
either compiled code or source code to a git repository hosted on
OpenShift. The OpenShift infrastructure will read the git repository
and provision the service with the executable code.
In general, users of a PaaS service get some infrastructure too —
underneath your Java Virtual Machine, or Perl interpreter, or whatever,
there will have to be an operating system, with a filesystem and all the
usual bits and pieces. However, subscribers are generally shielded
from the operating system in services of this sort, and access to the
underlying operating system is not generally encouraged by service
operators. As far as practicable, OpenShift subscribers are expected
to interact with the service through an application's git repository,
and through specific OpenShift management tools.
Public and private cloud services
An interesting feature of cloud technology is that the same infrastructure
can be used to provide a public service to general subscribers, or
an internal service within a particular business. Red Hat, for example,
provides the public OpenShift Online service, to which anyone can subscribe.
OpenShift Online is based on the OpenShift Origin project, which is
an open-source PaaS implementation. But OpenShift Origin can also be
used internally by organizations, perhaps to simplify and centralize
their IT infrastructure. Red Hat provides a supported commercial
offering, OpenShift Enterprise, based on Origin, that businesses
can use to implement their own clouds.
The OpenShift platform
OpenShift is a PaaS provider — it provides a set of runtime environments
onto which subscribers can deploy code developed using particular
technologies and programming languages.
Brokers and nodes
An OpenShift platform installation consists of brokers and nodes.
A broker provides the administrative interface to the service, by
allowing subscribers to create, modify, and delete application
containers called gears.
The concept of a gear is a central one, and I will describe it in
much more detail shortly.
OpenShift provides both a Web-based interface and a command-line tool
(rhc) by which subscribers interact with the broker.
A node is any (real or virtual) machine on which subscriber's applications
are deployed. In general, a subscriber's interactions will be mostly
with the specific node or nodes which host the application's gears; access
to the broker is generally only needed to set up new applications, or
to make administrative changes to existing ones.
A set of gears managed by a particular subscriber is known as a domain.
There is a loose correspondence with a DNS domain, as by default each
gear will have a DNS name based on the subscriber's username.
The specific application runtime environment is provided by a
cartridge — another important concept which is described later. This
basic architecture is shown below.
Gears
A gear is the basic unit of hosting in OpenShift. At present, a gear
is essentially a Linux user account, with a set of SELinux security
policies. It's not a true virtual machine, or even a lightweight
container like LXC. This simplistic architecture is justified by
the fact that subscribers do not usually interact directly with
gears — their technical implementation should be invisible. To the
subscriber a gear is a unit of resource — a certain amount of
disk space, a certain amount of RAM, etc.
Although a gear is a user account on a particular
OpenShift node, it is not a subscriber account.
That is, it is not the case that a particular subscriber
has a particular user account on a particular node. Instead, each
application gets its own user account. A particular subscriber may own
a large number of applications, each with its own gear, and therefore
its own user account. The gear infrastructure therefore isolates
not only one subscriber from another, but one application from other
applications from the same subscriber. Readers who are familiar
with Android application development may be interested to know that this
is almost exactly the same model that Android uses to isolate apps
from one another.
When a gear is assigned to a particular node by the OpenShift broker,
it will be allocated a machine-generated user ID and a network
interface. Because all gear accounts are unprivileged,
applications can bind only to IP ports numbered above 1024. However,
the OpenShift infrastructure takes care of mapping user-friendly
port numbers and Internet hostnames to the gear's ports.
By convention, for example, Web browsers will use the HTTP port
80 or HTTPS port 443, and these will be proxied to (by default)
port 8080 on the gear's network interface.
Subscribers can log into gears using SSH, and will see a pretty
conventional Linux filesystem. Each gear has a directory under
/home, and the usual directories
/proc, etc., are present. The actual layout of files
in the gear directory will depend on the cartridge that was chosen
to populate the gear (see below).
OpenShift uses the Cgroups ('Control Groups') system to allocate
resources to gears. When a gear is created it will be allocated
a particular amount of resource. For simplicity, the subscriber is
typically offered the choice of 'small', 'medium', or 'large',
each of which corresponds to a particular allocation of
disk, CPU, RAM, etc. The exact specifications for each gear size
will depend on the installation; the free public OpenShift service
offers three 'small' gears, with 1Gb storage and 512Mb RAM per gear.
Although a gear is a unit of resource, subscribers generally can not
pool resources from multiple gears to the same application. It
isn't possible, for example, to use the three small gears of the
free service to create one application with 3Gb storage and 1.5Gb RAM.
However, it's possible to replicate the application on multiple gears
using OpenShift's built-in scaling mechanism, and share
client load between the gears.
OpenShift gears use client certificates (SSH keys) for authentication,
not username/password credentials. Since each gear has a separate user
identity in Linux, per-subscriber passwords would be unhelpful. Part of
the preparation to use OpenShift is to create a keypair for SSH, and
upload the public key to the subscriber's profile.
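The keypair can be created with standard OpenSSH tools. A minimal sketch (the key filename and directory here are arbitrary, and the rhc upload command, shown commented out, needs a subscription):

```shell
# Generate a passphrase-less RSA keypair for use with OpenShift;
# the directory and filename are arbitrary.
KEYDIR=$(mktemp -d)
ssh-keygen -q -t rsa -b 2048 -N "" -f "$KEYDIR/openshift_key"
# Upload the public key to the subscriber's profile (needs an account):
#   rhc sshkey add mykey "$KEYDIR/openshift_key.pub"
ls "$KEYDIR"
```

The private key then stays on the workstation, and is offered automatically by SSH and git when connecting to the gear.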
Applications
Within a particular subscriber's account, an application is the unit
of administration. Using the OpenShift console or
command-line tool, subscribers create applications; the infrastructure
creates and populates the associated gears, according to the type of
cartridge selected.
An application does not correspond to an operating system process
although, very often, an application will indeed consist of one process.
A Tomcat-based application may, for example, have one instance of the
Java runtime, executing the Tomcat container. There are no restrictions on
the number of processes that an application can create, other than
the obvious one — memory; but there are only a limited number of TCP ports
available to connect the gear with the outside world.
Since the subscriber has SSH and sFTP access to the gear, the application
can, in principle, be absolutely anything, so long as it will run on
the Linux platform within the resources allocated. However, OpenShift
is a PaaS offering, and developers are expected to base their applications
on the facilities that cartridges provide.
Cartridges
Cartridges are what make OpenShift a platform cloud, rather than an
infrastructure cloud. When an application is created, using the console or the
command-line tool, it will always be based on some sort of cartridge.
In brief, the cartridge is a specification for OpenShift to populate
the gear. If a subscriber creates an application based on the
Tomcat cartridge, for example, OpenShift will create a gear on a particular
node, and the cartridge will install Tomcat in that gear.
As well as populating the gear, the cartridge serves as a mediator between
the OpenShift infrastructure and the developer. The diagram below illustrates
this with respect to a Tomcat cartridge, but the same basic
principles apply to most OpenShift cartridges.
A key feature of most cartridges is their support for build-on-deploy
provisioning. All cartridges are provided with a git repository, and
many cartridges will accept source code directly into this repository.
In the Tomcat example, the source code is expected to be based on a
Maven project (Maven is an automated build tool for Java
applications). If a Maven project is pushed to the repository,
code in the cartridge will invoke Maven to compile and package
it, and then install the compiled code in Tomcat. There are also
ways to push precompiled code and have that deployed, for situations
where the use of Maven would be inappropriate.
From a Tomcat developer's perspective, therefore, OpenShift is pretty
transparent. Java developers are generally used to working with git
repositories and Maven, so deploying the application on OpenShift
consists of little more than pushing the application's source code
to the OpenShift git repository using an SSH URL.
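As a sketch of that workflow, the following simulates the push locally, with a bare repository standing in for the OpenShift remote (all paths and names here are hypothetical; in real use the SSH URL of the gear's repository is reported by the rhc tools):

```shell
# Simulate the OpenShift remote with a local bare repository.
REMOTE=$(mktemp -d)/app.git
WORK=$(mktemp -d)/myapp
git init -q --bare "$REMOTE"
git init -q "$WORK" && cd "$WORK"
git config user.email dev@example.com
git config user.name Dev
echo '<html>hello</html>' > index.html   # stand-in for a Maven project
git add .
git commit -qm "Initial version"
git remote add openshift "$REMOTE"
git push -q openshift HEAD               # on OpenShift, this push triggers the build
```

On a real gear the push would be followed by the cartridge's build-on-deploy step; locally the bare repository simply records the commit.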
If git is the primary interface between the cartridge and the developer,
"hook scripts" are the primary interface to the OpenShift intfrastructure.
In practice, all cartridges will provide at least two scripts:
setup and control. In fact, only the latter is
mandatory, and there are a whole bunch of others that might be invoked
at certain times in the life of the cartridge if they are provided.
The function of the
setup script should be fairly obvious:
this is invoked by OpenShift as soon as the gear has been created,
and is expected to install whatever software is appropriate to the
cartridge (Apache Tomcat, in this case). The control script
starts and stops the application; in practice it might simply invoke
appropriate scripts or binaries in the software that was installed by setup.
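As a sketch only — real cartridge hooks are more involved, and typically delegate to the installed server's own startup scripts — a control hook might look something like this:

```shell
# Minimal sketch of a cartridge 'control' hook. The action names follow
# the description above; the echo statements stand in for real work.
control() {
  case "$1" in
    start)  echo "starting application" ;;   # e.g. invoke Tomcat's startup script
    stop)   echo "stopping application" ;;   # e.g. invoke Tomcat's shutdown script
    status) echo "application is running" ;;
    *)      echo "usage: control start|stop|status" ;;
  esac
}
control start
```

OpenShift invokes the hook with an action argument; anything the hook needs to know about its environment is supplied through environment variables.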
In implementation, a cartridge is simply a text file, whose name
conventionally ends in .yml. The OpenShift broker will present a list
of known cartridges whose YML files have been registered, but a subscriber can
also provide the URL of a YML file outside the system. OpenShift will retrieve
it and, provided it is formatted correctly, process it as a cartridge.
The YML file will have at least one crucial entry: a source URL. This
is the URL of a bundle of software that will be retrieved and unpacked
in the newly created gear. This software must contain at least the
control script, as described above. Sometimes the
bundle identified by the source URL will not necessarily be the complete
package of software needed to build the gear. In such a case, the
setup script will retrieve additional software as appropriate
to that cartridge type. Note that there are no restrictions on outbound
Internet connections from within gears although, of course, there
are on inbound connections.
The YML file will also specify TCP port mappings needed by the cartridge,
and a bunch of environment variables to which the hook scripts may refer,
and which can be manipulated by the administration tools.
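As an illustration only — the key names below follow the conventions of OpenShift Origin cartridge manifests, but a real manifest contains many more fields, and the URL and names here are invented — a minimal manifest might look something like this:

```yaml
# Illustrative sketch of a cartridge YML file; not a complete manifest.
Name: mycart
Display-Name: My Example Cartridge
Version: '1.0'
Source-Url: https://example.com/mycart.tar.gz   # bundle containing setup/control
Endpoints:
  - Private-IP-Name: IP1              # becomes an environment variable
    Private-Port-Name: HTTP_PORT
    Private-Port: 8080                # proxied from the public HTTP(S) ports
    Public-Port-Name: HTTP_PROXY_PORT
```

The endpoint entries are what drive the port mappings described above, and the names become environment variables visible to the hook scripts.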
An application (and thus a gear) can be populated with more than one
cartridge. In most cases, where multiple cartridges are applied,
there will be a primary cartridge and one or more subsidiary cartridges.
The subsidiary cartridges add features which are typically used by
many different application technologies, such as relational databases.
Working with OpenShift as a developer
This section discusses some of the ways in which developers and deployers
work with OpenShift, which might differ from non-cloud development.
Administration and deployment of applications
OpenShift subscribers can interact with the service in a number of
ways: via the broker's web interface, using the rhc
command-line tool, uploading code or data using git, or logging
into the gear directly over SSH.
The web interface
Most routine administration operations can be carried out using the
web user interface to the OpenShift broker: create or delete an application,
restart an application, check its status, and add features.
The web user interface is clear and easy to use for these simple
operations. The screenshot below shows that I have created an
application called tomcat, that it is running, and
that it was populated by two cartridges — one for Tomcat 6, and
one for the MySQL database.
The command-line tool
The command-line tool
rhc is written in Ruby and is
available for many
different platforms (see
here for details).
On most Red Hat Linux systems, you should just be able to do
$ sudo yum install rubygem-rhc
It's worth bearing in mind that the version of
rhc in the
standard repositories for Linux versions earlier than Fedora 19 is
likely to be quite out-of-date, and you might be better off installing
it manually as described in the documentation referenced earlier.
Having obtained the tool, the usual first step is to run
$ rhc setup
This will create and upload the client certificate needed for authentication,
and record the OpenShift broker location for future use. rhc
defaults to using the public OpenShift service, but this default can
be overridden when running rhc setup.
rhc can be used to create and delete applications,
retrieve log files, and perform
all the same operations as the web console. However,
rhc potentially has the advantage that it can pass arguments
to the cartridge at creation time, and thus configure it more appropriately
to the developer's needs.
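For example — the cartridge names below are illustrative, and the commands need an OpenShift account, so they are assembled and printed here rather than run:

```shell
# Assemble illustrative rhc commands; cartridge names vary by installation
# ('rhc cartridge list' shows what is available).
APP=myapp
CREATE="rhc app create $APP tomcat-6 --gear-size small"
ADD_DB="rhc cartridge add mysql-5.1 -a $APP"
echo "$CREATE"
echo "$ADD_DB"
```

The --gear-size argument is one example of a creation-time option that has no equivalent in the web console's simple workflow.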
One use of
rhc which deserves a special mention
(because there is no equivalent in the web console), is its support for
port tunnelling. Debugging on OpenShift can be tricky because there is
very limited access to the gear that is hosting the application. You
can't attach a Java debugger to it, for example. Running
rhc port-forward will create a tunnel over SSH for all ports
on which the cartridge's application is listening. On the workstation,
rhc will open ports corresponding to all the cartridge
ports. It is therefore possible to tunnel debug traffic and related
communications into the gear.
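Under the hood this is ordinary SSH port forwarding. The sketch below constructs the equivalent manual ssh invocation rather than running it, since a live gear would be needed; the gear hostname and port numbers are hypothetical:

```shell
# Build the manual equivalent of an rhc port-forward tunnel.
GEAR_LOGIN="myapp-mydomain.rhcloud.com"   # hypothetical gear hostname
LOCAL_PORT=8080                            # port opened on the workstation
REMOTE_PORT=8080                           # cartridge's private port on the gear
TUNNEL="ssh -N -L ${LOCAL_PORT}:127.0.0.1:${REMOTE_PORT} ${GEAR_LOGIN}"
echo "$TUNNEL"   # running this command would open the tunnel
```

With the tunnel open, a debugger or database client on the workstation connects to localhost on the forwarded port, and the traffic emerges inside the gear.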
Provisioning using git
As discussed earlier, OpenShift favours git for provisioning the application.
The content pushed to the gear's git repository will vary according to the
cartridge. In the Tomcat case discussed above, the developer was expected
to push a Java Maven project. For other cartridges the developer may
push Perl scripts, or HTML documents, or whatever else is meaningful to
the cartridge. It is the cartridge's responsibility to interpret the
git repository in an appropriate way.
SSH to the gear
The lowest level of access is to start an SSH session on the gear itself,
and manipulate it using Linux utilities.
It is hoped that an application that runs on a desktop system will run
on OpenShift with the appropriate cartridge. Developers should not,
in principle, need to know much about OpenShift, as the cartridge
abstracts away the low-level details.
However, there are a few things to watch out for.
OpenShift applications run in an environment with strict resource controls.
Your application won't be able to starve another of resources, even
temporarily. Proper attention to scaling and sizing is therefore even
more important in OpenShift than in conventional deployment.
Applications are expected to read and write files only within their
specific gear (user account); ideally, they should read and write only
files that can be provisioned using git. It's possible to use
SSH/sFTP to provide arbitrary files to the gear, but this makes it
hard to move an application to a different gear, should that become necessary.
The OpenShift environment will run a particular version of Linux, and
provide particular versions of common utilities like the Java JVM.
Although various Java versions are provided, it's tricky to configure a
Java-based cartridge to use a JVM other than its default. Moreover, all
the JVMs provided are variants of OpenJDK, not the Sun/Oracle product;
a Java application that relies on particular features of the Sun/Oracle
JDK will not work. There are similar limitations to utilities like
Perl and PHP. Ideally, applications should be developed to be reasonably
independent of specific utility versions.
On the public OpenShift service, applications owned by free subscriptions
are subject to idling — they will be shut down after a certain
period of inactivity. Inactivity is generally taken to mean no HTTP(S)
or SSH connections. Even where applications are hosted on a paid subscription,
developers need to be aware that they don't necessarily have control of
the cartridge's lifecycle, and applications should be robust about
unplanned shutdowns (of course, that's not just true in the cloud
environment, but problems of this kind are particularly troublesome here).
Not a virtual machine
An OpenShift gear is not a virtual machine — an application does not
have exclusive access to any system resource. SELinux security policies
(which are quite strict) act to restrict access to resources
that cannot safely be shared.
To take just one example, your application
won't be able to bind a socket to the wildcard address (all local
interfaces), because this interface cannot safely be shared in a
multi-tenancy environment
like OpenShift. Your application can bind only to the IP number which
its gear has been assigned. An application can find its IP number
using an environment variable if necessary, but it's best if applications
don't need to do this, for obvious reasons.
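As a sketch, assuming the environment variable names used by the DIY cartridge (other cartridges export similarly-named variables of their own), an application can discover its assigned address like this:

```shell
# Read the gear's assigned address from the environment; the variable
# names here are those of the DIY cartridge, and outside OpenShift they
# are unset, so fall back to loopback defaults for local testing.
IP="${OPENSHIFT_DIY_IP:-127.0.0.1}"
PORT="${OPENSHIFT_DIY_PORT:-8080}"
echo "binding to $IP:$PORT"
```

Most cartridges arrange for the hosted server to pick these values up automatically, which is why it's better for application code not to depend on them directly.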
There are many subtle differences of this kind between working on
OpenShift and working in a non-cloud environment. While the developer
who works with OpenShift directly will find out about them soon enough,
and find ways to avoid their effects, the developers of third-party
libraries may not have that experience.
Summary
OpenShift is a platform for deploying application code using particular
technologies. So far as possible, low-level details of
the platform and operating system are concealed from the developer, and
development for OpenShift should be broadly the same as for any other
environment. However, there are a number of limitations, and a knowledge
of the technical implementation is often very useful when it comes to
diagnosing problems.