Perception and reality in photographs
Why is it that unedited photographs never seem to match our recollections
of the original scene? Among serious and professional photographers I've
noticed a reluctance to post-produce images and, indeed, it is easy to
end up with an 'overcooked' look. But is the image captured by the camera
an accurate record of our perception of the original scene?
Should it even attempt to be?
I've found that people who are not photographers seem invariably to prefer
images that photographers regard as overcooked — why should that be?
Below are four images of a scene in West Cornwall, all derived from the
same raw image. The original photograph was taken with a Nikon D300 on a
spring day that was bright and sunny,
but with distinct cumulus clouds. Exposure was metered such that the brightest
part of the sky was just short of clipping.
Image 1 shows the image as rendered by Photoshop's raw converter, on default
settings. The sky is broadly as I remember it, although I remember it as
bluer, and with more contrast in the clouds. But, as you might expect
with such a set-up, the foreground is very dull. It looks as if it's about
to pour with rain. My recollection is of lush, green grass with reddish-brown
tufts. I can certainly remember being able to see detail in the stonework and
yet, in the image, this is all in shadow.
Image 2 shows the result of increasing the exposure by 2.5 stops
(or thereabouts) in Photoshop. Now the grass looks more as I remember it,
but of course the sky is completely blown — it's a uniform white.
Again, this image does not match my recollection at all.
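The arithmetic behind a stop adjustment is simple: each stop doubles the linear light values, so +2.5 stops multiplies everything by 2^2.5 ≈ 5.66. A minimal sketch in Python with NumPy (the hard clip at 1.0 is an assumption about how blown highlights arise, not Photoshop's exact pipeline):

```python
import numpy as np

def adjust_exposure(linear, stops):
    """Brighten or darken a linear-light image by a number of stops.

    Each stop doubles (or halves) the recorded light. Values that
    exceed 1.0 are clipped, which is how the sky blows out to a
    uniform white in image 2.
    """
    return np.clip(linear * 2.0 ** stops, 0.0, 1.0)

# A toy "scene": dark foreground (0.02) and bright sky (0.6)
scene = np.array([0.02, 0.6])
print(adjust_exposure(scene, 2.5))  # foreground lifts; sky clips to 1.0
```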
Image 3 shows the result of applying a (simulated) ND graduated filter
with a 3-stop reduction in the sky. I've fiddled with the
overall exposure a bit and positioned the filter so as to minimise
the darkening of foreground features (although some is inevitable).
This image is pretty close to how I recall the scene when I was there.
However, I remember the colours being more vivid — in particular the
blossoms on the heather seemed to be more strikingly yellow.
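A graduated filter of this sort is easy to simulate: darken the rows above a chosen horizon line, with a soft ramp across the transition. A rough sketch, in which the `horizon` and `softness` parameters are hypothetical knobs rather than anything from a real plug-in:

```python
import numpy as np

def nd_grad(linear, stops, horizon, softness):
    """Apply a simulated graduated ND filter to a linear-light image.

    Rows above `horizon` (the sky) are darkened by up to `stops` stops;
    the attenuation fades over `softness` rows so the transition is not
    a hard line. Real raw converters blend more cleverly than this.
    """
    h = linear.shape[0]
    rows = np.arange(h)
    # 1.0 well above the horizon, 0.0 well below, linear ramp between
    t = np.clip((horizon - rows) / softness + 0.5, 0.0, 1.0)
    attenuation = 2.0 ** (-stops * t)  # full -stops reduction in the sky
    return linear * attenuation[:, np.newaxis]
```

Because the mask depends only on the row, any foreground feature that pokes up into the transition zone gets darkened too, which is the "inevitable" side effect mentioned above.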
Image 4 shows the image that best matches my memory of the scene — it is
the same as image 3 but with a 30% increase in vibrance. To my eye it is
not only a much more appealing picture than image 1; it is also a more
accurate one, despite the post-processing.
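Vibrance differs from a plain saturation boost in that it favours muted pixels and largely leaves already-vivid ones alone. One common way to approximate the idea, working on HSV pixels (a guess at the general technique, not Photoshop's actual algorithm):

```python
import numpy as np

def add_vibrance(hsv, amount):
    """Boost saturation, weighted towards less-saturated pixels.

    The boost is scaled by s * (1 - s), so muted pixels move most and
    fully saturated pixels are untouched. This is an illustrative
    approximation, not Photoshop's formula.
    """
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    s = np.clip(s + amount * s * (1.0 - s), 0.0, 1.0)
    return np.stack([h, s, v], axis=-1)
```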
It's common to blame limitations in camera technology, or the photographer,
when photographs don't look as we think they should. The real problem,
however, is that the human eye/brain system is not by any means a perfect
optical instrument. Engineers strive to make cameras and lenses more optically
accurate, with higher pixel density, better linearity, and less
distortion; and yet poor resolution, non-linearity, and distortion are
essential features of the human eye.
How is it that the eye, with all its faults, can see detail in clouds
and in ground shadows despite, perhaps, an 8-stop difference
in light intensity, when the best unedited photograph cannot?
The answer is that it can't — it just pretends to.
When you look at a scene, your eyes are not still, as a camera tends to be.
Your eye will make saccades at a rate of 3-4 per second, even
if you fixate your gaze on a particular point. Each saccade will move
the eye's axis by a fraction of a degree. In addition, your eye will
make less frequent, larger movements. One such movement is the
vergence response, by which the two eyes align on different parts
of the scene successively in order to assess distance. These movements
are not under conscious control and are largely imperceptible. However,
without them we would be unable to see at all. The reason is that the
light-sensitive cells in the retina respond primarily to
changes in light intensity, rather than to absolute intensity.
If the eyeball really were fixated at one point, the image would rapidly
be lost entirely as the retina adapted to the constant light.
It is this adaptation that allows us to discern details in parts of the
scene that are lit very differently, even though the camera cannot. In short,
our perception of a scene is not an optically accurate record at all — it is a composite image in the brain, formed from thousands of slightly
different images, each adjusted according to the prevailing light level.
The scene that we recall truly does not exist as an independent reality — it is a mental construct.
This is why applying a 3-stop ND grad filter creates a photograph that is
a more accurate representation of our perception of the scene, despite
its being a gross optical distortion.
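The effect of this continual adaptation can be mimicked crudely by dividing each point in a scene by the average brightness of its neighbourhood, retinex-style. A toy one-dimensional sketch, not in any sense a model of the retina:

```python
import numpy as np

def local_adaptation(luminance, window):
    """Crudely mimic retinal adaptation on a 1-D luminance profile.

    Each point is divided by the mean luminance of its neighbourhood,
    so relative detail survives in both bright and dark regions, much
    as the composite percept does.
    """
    kernel = np.ones(window) / window
    local_mean = np.convolve(luminance, kernel, mode='same')
    return luminance / (local_mean + 1e-6)
```

Fed a profile with a bright textured region next to a dark textured region, the output carries visible variation in both, even though the raw dark-region variation is a hundred times smaller.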
What about colour? Research suggests that people remember colours as being
more saturated than they really are — the reason for this is
not entirely clear. In addition, people have a pure-colour bias; that is,
yellow with a tinge of green seems to be recalled as pure yellow. The
human eye does not discriminate colours in the same way as a camera
(film or digital) does — it does not measure the intensities of a number
of primary colours. Rather, the eye contains three kinds of colour
receptor (cones), each responding to a range of overlapping wavelengths of light. Some
colours stimulate all three kinds of cone, while some stimulate only one.
Colours that stimulate only one kind of receptor (e.g., close shades of
red) are hard to discriminate. Colours in the green-yellow
region tend to be easier to discriminate, as they stimulate two or even
three cone varieties. This mechanism of colour perception explains why
we see a rainbow as being composed of distinct colours
(red-orange-yellow-etc) when in fact the colours of a rainbow form a
smooth variation in light wavelength.
Human colour perception is extremely complex, but it's clear that the one
thing it isn't is a system for registering exact light wavelengths.
This, perhaps, explains why we tend to remember colours as being nearer
to primary colours than they really are: we favour colours that produce
the most distinct responses across the three cone types.
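The overlap argument can be made concrete with a toy model: treat each cone type as a Gaussian sensitivity curve. The peak wavelengths below are roughly right for the L, M, and S cones, but the Gaussian shape and shared width are illustrative simplifications:

```python
import numpy as np

def cone_responses(wavelength_nm):
    """Toy cone model: three Gaussian sensitivity curves.

    Peaks near 560, 530, and 420 nm loosely match the L, M, and S
    cones; real sensitivity curves are broader and asymmetric.
    """
    peaks = np.array([560.0, 530.0, 420.0])  # L, M, S
    width = 50.0
    return np.exp(-((wavelength_nm - peaks) ** 2) / (2 * width ** 2))

# A deep red (650 nm) stimulates essentially only the L cone,
# while a green-yellow (550 nm) stimulates L and M strongly.
print(cone_responses(650.0))
print(cone_responses(550.0))
```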
Rather than using a simple ND grad filter, a more modern approach to
adjusting for perceptual characteristics is to use a tone mapping
approach. Tone mapping is widely used in high dynamic range (HDR) imaging,
which seeks to create a composite image from a number of frames with different
exposure characteristics. In this example I don't have an HDR image set,
but tone mapping can be used whenever the dynamic range of the image is
greater than that of the display medium. The D300 has a dynamic range
of about 10 stops, as compared to the 7-8 stops of a JPEG image.
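Dynamic range in stops is just the base-2 logarithm of the ratio between the brightest and darkest luminances a medium can distinguish:

```python
import math

def dynamic_range_stops(max_lum, min_lum):
    """Dynamic range in stops: the number of doublings separating the
    brightest and darkest distinguishable luminances."""
    return math.log2(max_lum / min_lum)

# A sensor resolving luminances from 1 to 1024 spans 10 stops,
# matching the D300 figure quoted above.
print(dynamic_range_stops(1024, 1))  # → 10.0
```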
Image 5 shows the result of applying Mantiuk's tone mapping method to
the camera raw data. It is, again, much as I perceived the original
scene, with the advantage that we don't have the darkening of foreground
features caused by the unadaptive profile of the ND grad filter.
In a sense, tone mapping amounts to applying an ND filter selectively,
according to local features of the scene.
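Mantiuk's operator itself works in the contrast domain and is beyond a short sketch, but the classic global Reinhard operator shows the core idea: compress a wide luminance range non-linearly, so that shadows are lifted far more than highlights, much as the adapting eye does. A minimal sketch:

```python
import numpy as np

def reinhard_tonemap(luminance, key=0.18):
    """Global tone mapping with the classic Reinhard operator.

    A simpler stand-in for Mantiuk's contrast-domain method: scale by
    the scene's log-average luminance, then map [0, inf) into [0, 1)
    with x / (1 + x), which compresses highlights most.
    """
    log_avg = np.exp(np.mean(np.log(luminance + 1e-6)))
    scaled = key * luminance / log_avg
    return scaled / (1.0 + scaled)
```

Because the curve is steep near zero and flat at the top, a 10-stop scene can be squeezed into a display's 7-8 stops while detail at both ends survives.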
Image 1: as Photoshop's raw converter has it. The sky looks as I remember
it, but I don't recall it being so, well, dark.
Image 2: exposure adjusted +2.5 stops. Foreground looks right, but
I don't remember the sky being a whitewash like that.
Image 3: with a (simulated) ND grad filter of -3 stops.
Image 4: as image 3, but with vibrance increased by 30%. Now
that's how I remember it.
In short, if the purpose of a photograph is to create an accurate record of the
perceived scene (and this, of course, is arguable), it may be necessary to
introduce substantial distortions of brightness and colour to achieve
this purpose. An unedited photograph is simply too optically accurate
to match human perception.
Image 5: Using the Mantiuk HDR tone mapping operation.