
Perception and reality in photographs

Why is it that unedited photographs never seem to match our recollections of the original scene? Among serious and professional photographers I've noticed a reluctance to post-produce images and, indeed, it is easy to end up with an 'overcooked' look. But is the image captured by the camera an accurate record of our perception of the original scene? Should it even attempt to be? I've found that people who are not photographers seem invariably to prefer images that photographers regard as overcooked — why should that be?

Below are four images of a scene in West Cornwall, all derived from the same raw image. The original photograph was taken with a Nikon D300 on a spring day that was bright and sunny, but with distinct cumulus clouds. Exposure was metered such that the brightest part of the sky was just short of clipping.

Image 1 shows the image as rendered by Photoshop's raw converter on default settings. The sky is broadly as I remember it, although I remember it as bluer, and with more contrast in the clouds. But, as you might expect with such a set-up, the foreground is very dull. It looks as if it's about to pour with rain. My recollection is of lush, green grass with reddish-brown tufts. I can certainly remember being able to see detail in the stonework, and yet in the image this is all in shadow.

Image 2 shows the result of increasing the exposure by 2.5 stops (or thereabouts) in Photoshop. Now the grass looks more as I remember it, but of course the sky is completely blown — it's a uniform white. Again, this image does not match my recollection at all.
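Each stop is a doubling or halving of light, so a +2.5-stop adjustment multiplies linear (pre-gamma) pixel values by 2^2.5 ≈ 5.7. A minimal numpy sketch (the function name and the example pixel values are mine, purely for illustration) shows why this lifts the foreground but blows a sky that was metered just short of clipping:

```python
import numpy as np

def apply_exposure(linear, stops):
    """Adjust linear (pre-gamma) pixel values in [0, 1] by a number
    of stops, clipping anything that exceeds the sensor maximum."""
    return np.clip(linear * (2.0 ** stops), 0.0, 1.0)

# A sky pixel metered just short of clipping is blown by +2.5 stops,
# while a dark foreground pixel is lifted into a usable range.
sky, grass = 0.95, 0.08
print(apply_exposure(np.array([sky, grass]), 2.5))
```

The sky value saturates at 1.0 (uniform white), while the grass rises from 0.08 to roughly 0.45 — exactly the trade-off between images 1 and 2.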

Image 3 shows the result of applying a (simulated) ND graduated filter with a 3-stop reduction in the sky. I've fiddled with the overall exposure a bit and positioned the filter so as to minimise the darkening of foreground features (although some is inevitable). This image is pretty close to how I recall the scene when I was there. However, I remember the colours being more vivid — in particular the blossoms on the heather seemed to be more strikingly yellow.
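A simulated graduated ND filter is nothing more than a spatially varying attenuation: full strength at the top of the frame, fading to nothing below the transition band. The sketch below shows the idea; the parameter names (`horizon`, `softness`) are my own and don't correspond to any particular editor's controls:

```python
import numpy as np

def nd_grad(image, stops=3.0, horizon=0.45, softness=0.15):
    """Darken the top of a linear HxWx3 image by up to `stops` stops,
    blending linearly through a band of height 2*softness (both given
    as fractions of image height). Illustrative, not Photoshop's tool."""
    h = image.shape[0]
    y = np.linspace(0.0, 1.0, h)                      # 0 = top row
    # 1.0 above the transition band, 0.0 below it, linear in between
    weight = np.clip((horizon + softness - y) / (2 * softness), 0.0, 1.0)
    gain = 2.0 ** (-stops * weight)                   # per-row attenuation
    return image * gain[:, None, None]

img = np.full((100, 10, 3), 0.9)      # uniform bright test image
out = nd_grad(img, stops=3.0)
print(out[0, 0, 0], out[-1, 0, 0])    # top darkened 3 stops, bottom unchanged
```

Because the attenuation depends only on the row, anything tall enough to cross the transition band — trees, stonework — gets darkened along with the sky, which is the unavoidable foreground darkening mentioned above.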

Image 4 shows the image that best matches my memory of the scene — it is the same as image 3 but with a 30% increase in vibrance. It is not only — to my eye — a much more appealing picture than image 1, it is also more accurate than image 1, despite the post-processing. Why?

Image 1: as Photoshop's raw converter has it. The sky looks as I remember it, but I don't recall it being so, well, dark.

Image 2: exposure adjusted +2.5 stops. Foreground looks right, but I don't remember the sky being a whitewash like that.

Image 3: with a (simulated) ND grad filter of -3 stops.

Image 4: as image 3, but with vibrance increase 30%. Now that's how I remember it.
It's common to blame limitations in camera technology, or the photographer, when photographs don't look as we think they should. The real problem, however, is that the human eye/brain system is not by any means a perfect optical instrument. Engineers strive to make cameras and lenses more optically accurate, with higher pixel density, better linearity, and less distortion; and yet poor resolution, non-linearity, and distortion are essential features of the human eye.

How is it that the eye, with all its faults, can see detail in clouds and in ground shadows despite, perhaps, an 8-stop difference in light intensity, when the best unedited photograph cannot? The answer is that it can't — it just pretends to.

When you look at a scene, your eyes are not still, as a camera tends to be. Your eye will make saccades at a rate of 3-4 per second, even if you fixate your gaze on a particular point. Each saccade will move the eye's axis by a fraction of a degree. In addition, your eye will make less frequent, larger movements. One such movement is the vergence response, by which the two eyes align on different parts of the scene successively in order to assess distance. These movements are not under conscious control and are largely imperceptible. However, without them we would be unable to see at all. The reason is that the light-sensitive cells in the retina respond primarily to changes in light intensity, rather than to absolute intensity. If the eyeball really were fixated at one point, the image would rapidly be lost entirely as the retina adapted to the constant light.
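The change-sensitive behaviour described above can be caricatured in a few lines: model a receptor as reporting the difference between the current intensity and a running average that adapts towards the input. This is a toy model of my own devising, not a physiological one, but it shows why a perfectly constant input "fades" while a tiny saccade-like jitter keeps the signal alive:

```python
import numpy as np

def receptor_response(intensity, adapt_rate=0.5):
    """Toy change-sensitive receptor: reports the difference between
    the current intensity and an adapting running average."""
    adapted = intensity[0]
    out = []
    for x in intensity:
        out.append(x - adapted)                 # respond to change only
        adapted += adapt_rate * (x - adapted)   # adapt towards the input
    return np.array(out)

steady = np.full(50, 0.7)                     # fixated, constant light
jitter = 0.7 + 0.05 * (np.arange(50) % 2)     # tiny saccade-like flicker
print(abs(receptor_response(steady)).max())   # zero: the image fades
print(abs(receptor_response(jitter)).max())   # nonzero: detail survives
```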

It is this adaptation that allows us to discern details in parts of the scene that are lit very differently, even though the camera cannot. In short, our perception of a scene is not an optically accurate record at all — it is a composite image in the brain, formed from thousands of slightly different images, each adjusted according to the prevailing light level. The scene that we recall truly does not exist as an independent reality — it is a mental construct.

This is why applying a 3-stop ND grad filter creates a photograph that is a more accurate representation of our perception of the scene, despite its being a gross optical distortion.

What about colour? Research suggests that people remember colours as being more saturated than they really are — the reason for this is not entirely clear. In addition, people have a pure-colour bias; that is, yellow with a tinge of green seems to be recalled as pure yellow. The human eye does not discriminate colours in the same way as a camera (film or digital) does — it does not measure the intensities of a number of primary colours. Rather, the eye contains three kinds of colour receptors (cones) that each respond to a range of overlapping wavelengths of light. Some colours stimulate all three kinds of cone, while some stimulate only one. Colours that stimulate only one kind of receptor (e.g., close shades of red) are hard to discriminate. Colours in the green-yellow region tend to be easier to discriminate, as they stimulate two or even three cone varieties. This mechanism of colour perception explains why we see a rainbow as being composed of distinct colours (red, orange, yellow, and so on) when in fact the colours of a rainbow form a smooth variation in light wavelength.
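The overlapping-sensitivity argument can be sketched numerically. The Gaussian curves below are made-up stand-ins for the real, empirically measured cone fundamentals (the peak wavelengths are roughly right; the shapes and width are invented), but they reproduce the effect: two reds 10 nm apart produce almost identical cone response triplets, while two yellow-greens 10 nm apart shift the M and L responses noticeably:

```python
import numpy as np

# Made-up Gaussian sensitivity curves for the S, M, and L cones.
# Real cone fundamentals are similar in spirit but measured, not Gaussian.
PEAKS = {"S": 440.0, "M": 535.0, "L": 565.0}
WIDTH = 40.0

def cone_response(wavelength_nm):
    """Response of each cone type to a monochromatic light."""
    return np.array([np.exp(-((wavelength_nm - p) / WIDTH) ** 2)
                     for p in PEAKS.values()])

def discriminability(w1, w2):
    """Distance between response triplets: larger = easier to tell apart."""
    return float(np.linalg.norm(cone_response(w1) - cone_response(w2)))

# Deep reds barely stimulate any cone but L, and only weakly; the
# yellow-green pair moves both the M and L responses substantially.
print(discriminability(660, 670))   # tiny
print(discriminability(550, 560))   # orders of magnitude larger
```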

Human colour perception is extremely complex, but it's clear that the one thing it isn't is a system for registering exact light wavelengths. This, perhaps, explains why we tend to remember colours as being nearer to primary colours than they really are — we tend to favour colours that most effectively stimulate the different types of cone differentially.

Rather than using a simple ND grad filter, a more modern approach to adjusting for perceptual characteristics is to use a tone mapping approach. Tone mapping is widely used in high dynamic range (HDR) imaging, which seeks to create a composite image from a number of frames with different exposure characteristics. In this example I don't have an HDR image set, but tone mapping can be used whenever the dynamic range of the image is greater than that of the display medium. The D300 has a dynamic range of about 10 stops, as compared to the 7-8 stops of a JPEG image. Image 5 shows the result of applying Mantiuk's tone mapping method to the camera raw data. It is, again, much as I perceived the original scene, with the advantage that we don't have the darkening of foreground features caused by the unadaptive profile of the ND grad filter. In a sense, tone mapping amounts to applying an ND filter selectively, according to local features of the scene.
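Mantiuk's operator itself is an involved, perceptually motivated algorithm, but the core idea of tone mapping — compressing a wide luminance range into display range while lifting shadows far more than highlights — can be sketched with the much simpler global Reinhard operator. This is a stand-in to illustrate the principle, not the method used for image 5:

```python
import numpy as np

def reinhard(luminance, key=0.18):
    """Simple global tone-mapping operator (Reinhard's L/(1+L)).
    `luminance` is linear scene luminance on an arbitrary scale.
    Shown only to illustrate the idea; Mantiuk's method is a far
    more sophisticated, locally adaptive algorithm."""
    # Scale the image so its log-average luminance maps to `key`
    log_avg = np.exp(np.mean(np.log(luminance + 1e-6)))
    scaled = key * luminance / log_avg
    return scaled / (1.0 + scaled)        # compresses highlights below 1

# A scene spanning ~10 stops (ratio 2**10) ends up within display
# range, with order preserved and shadows lifted disproportionately.
scene = np.array([0.01, 0.1, 1.0, 10.24])
print(reinhard(scene))
```

Because the mapping here is a single global curve, it is still "unadaptive" in the sense used above; local operators such as Mantiuk's vary the compression according to neighbouring image content, which is what avoids darkening the foreground features.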

Image 5: Using the Mantiuk HDR tone mapping operation.

In short, if the purpose of a photograph is to create an accurate record of the perceived scene (and this, of course, is arguable), it may be necessary to introduce substantial distortions of brightness and colour to achieve this purpose. An unedited photograph is simply too optically accurate to match human perception.

Copyright © 1994-2013 Kevin Boone. Updated Apr 25 2012