AP Syllabus focus:
‘Monocular cues such as relative clarity, size, texture gradient, linear perspective, and interposition create depth on flat surfaces.’
Depth perception is not limited to two-eyed (binocular) vision. The visual system can infer distance and three-dimensional layout from patterns in a single retinal image, allowing depth judgments even in photos or with one eye closed.
Monocular Depth Cues: Core Idea
Monocular depth cues are visual signals that require only one eye and help the brain interpret depth and distance from a 2D image (such as a painting, photograph, or the image on one retina).
Monocular depth cues: Depth and distance information available to either eye alone, inferred from pictorial and spatial features in a single visual field.
These cues are especially useful when objects are far away (where binocular cues such as retinal disparity become too small to be informative) or when viewing a flat representation of a scene.
Key Monocular Cues Named in the Syllabus
Relative Clarity (Aerial Perspective)
Relative clarity uses the fact that distant objects often look hazy and have reduced contrast because light is scattered by the atmosphere. Closer objects typically appear sharper and more detailed.
Clearer, higher-contrast objects are perceived as closer
Fuzzier, lower-contrast objects are perceived as farther
This cue is strongest outdoors (fog, smog, humidity) but can also appear in any context where the medium reduces contrast.
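The loss of contrast with distance can be sketched numerically. The example below assumes a simple exponential-attenuation model of haze, which is brought in here for illustration and is not part of the syllabus. The function name, the extinction value of 0.2 per km, and the distances are all hypothetical.

```python
import math

def apparent_contrast(intrinsic_contrast, distance_km, extinction_per_km):
    """Contrast that survives after light travels through hazy air.

    Assumes contrast decays exponentially with distance; the extinction
    coefficient stands in for how hazy the air is (illustrative value only).
    """
    return intrinsic_contrast * math.exp(-extinction_per_km * distance_km)

# Two equally high-contrast objects seen through the same haze.
near_hill = apparent_contrast(1.0, distance_km=0.5, extinction_per_km=0.2)
far_hill = apparent_contrast(1.0, distance_km=10.0, extinction_per_km=0.2)

print(f"near hill contrast: {near_hill:.2f}")
print(f"far hill contrast:  {far_hill:.2f}")
```

Under these assumptions the far hill keeps much less of its contrast than the near hill, which is the pattern the visual system reads as greater distance.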
Relative Size
Relative size compares the sizes of objects (or images of objects) to infer distance, assuming the objects are the same actual size. If two similar objects cast different-sized images on the retina, the brain typically interprets the smaller retinal image as coming from the more distant object.
Smaller retinal image (for an assumed-equal object) → perceived farther
Larger retinal image → perceived closer
This cue depends on reasonable assumptions about what “should” be similar in size (for example, two cars or two people).
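The geometry behind relative size can be illustrated with the visual angle an object subtends at the eye. This is a minimal sketch: the function name and the example values (a 1.7 m person at 5 m and at 50 m) are assumptions chosen for illustration.

```python
import math

def visual_angle_deg(object_height_m, distance_m):
    """Angle (in degrees) that an object of a given height subtends at the eye."""
    return math.degrees(2 * math.atan(object_height_m / (2 * distance_m)))

# Two people assumed to be the same actual height, at different distances.
near_person = visual_angle_deg(1.7, 5.0)
far_person = visual_angle_deg(1.7, 50.0)

print(f"near person: {near_person:.1f} degrees")
print(f"far person:  {far_person:.1f} degrees")
```

The person ten times farther away subtends an angle roughly ten times smaller, so if the brain assumes both people are the same height, the smaller image is read as the more distant one.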
Texture Gradient
A texture gradient is the pattern in which textured surfaces (gravel, grass, tiles) appear coarse and detailed up close but finer and denser as they recede into the distance.

The cobblestone surface illustrates texture gradient: nearby stones occupy larger visual angles and show sharper, more separable texture elements, while the same texture compresses into finer, denser detail farther away. This systematic change in apparent texture density is a strong monocular cue that a surface is receding in depth.
Large, distinct texture elements → perceived near
Small, tightly packed texture elements → perceived far
Texture gradients help the brain infer depth across continuous surfaces like roads, fields, or carpeting.
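A short calculation shows why equally sized texture elements look more tightly packed with distance. The sketch below assumes a viewer standing on flat ground looking along a row of identical tiles. The function name, the eye height of 1.6 m, and the tile depth of 0.5 m are hypothetical values.

```python
import math

def tile_angles_deg(eye_height_m, tile_depth_m, n_tiles, start_m=1.0):
    """Visual angle (degrees) taken up by each successive tile on flat ground."""
    angles = []
    for i in range(n_tiles):
        near_edge = start_m + i * tile_depth_m
        far_edge = near_edge + tile_depth_m
        # Angle below the horizon to the near edge minus that to the far edge.
        angle = math.atan(eye_height_m / near_edge) - math.atan(eye_height_m / far_edge)
        angles.append(math.degrees(angle))
    return angles

angles = tile_angles_deg(eye_height_m=1.6, tile_depth_m=0.5, n_tiles=10)
for i, a in enumerate(angles, start=1):
    print(f"tile {i}: {a:.2f} degrees")
```

Each tile takes up less of the visual field than the one before it, even though all tiles are physically the same size. That steady compression is the gradient the brain uses as a sign of a receding surface.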
Linear Perspective
Linear perspective uses the geometric tendency for parallel lines to appear to converge with distance (like railway tracks or hallway edges). The more the lines converge, the greater the perceived distance.
Greater convergence of parallel lines → perceived greater depth
A visible vanishing point strengthens the impression of a receding space
This cue strongly supports depth perception in drawings and photographs because it reliably reflects how 3D space projects onto a 2D plane.
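The convergence of parallel lines follows from how a 3D point projects onto a flat image. The sketch below assumes a simple pinhole projection, in which image position equals focal length times sideways position divided by depth. The function names, the focal length of 1.0, and the rail spacing of 1.5 m are illustrative assumptions.

```python
def project(x_m, depth_m, focal_length=1.0):
    """Pinhole projection: horizontal image position of a point at a given depth."""
    return focal_length * x_m / depth_m

def rail_gap(depth_m, half_gauge_m=0.75):
    """Image-plane gap between two parallel rails at a given depth."""
    return project(half_gauge_m, depth_m) - project(-half_gauge_m, depth_m)

# The rails stay 1.5 m apart in the world, but their image gap shrinks with depth.
gaps = [rail_gap(z) for z in (2, 10, 50, 250)]
for z, gap in zip((2, 10, 50, 250), gaps):
    print(f"depth {z:>3} m: image gap {gap:.3f}")
```

As depth grows, the gap in the image approaches zero, which is the vanishing point seen in drawings and photographs.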
Interposition (Overlap)
Interposition (also called overlap) occurs when one object partially blocks another. The blocking object is perceived as closer, and the blocked object as farther.
If Object A covers part of Object B → A is closer, B is farther
Works even with minimal detail, as long as edges and boundaries are visible
Interposition is a powerful cue because it often provides an unambiguous ordering of which surfaces are in front.
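Interposition gives an ordering (which object is in front) rather than a distance, and that can be sketched as sorting objects by who blocks whom. This is only an illustration of the logic, not a model of how the visual system computes it. The function name and the tree, house, and hill scene are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def depth_order(occlusions):
    """Order objects nearest-first from (front, behind) overlap pairs."""
    blocked_by = {}
    for front, behind in occlusions:
        # Record that 'behind' is partly covered by 'front'.
        blocked_by.setdefault(behind, set()).add(front)
        blocked_by.setdefault(front, set())
    # Objects that nothing covers come first, so the list runs near to far.
    return list(TopologicalSorter(blocked_by).static_order())

# The tree overlaps the house, and the house overlaps the hill.
order = depth_order([("tree", "house"), ("house", "hill")])
print(order)
```

The result lists the tree, then the house, then the hill, but says nothing about how far apart they are. Other cues, such as relative size, are needed for that.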
How These Cues Work Together
In real scenes and images, the brain typically integrates multiple monocular cues at once rather than relying on only one.
Cues can converge (all indicate the same depth ordering), increasing confidence
Cues can sometimes conflict (for example, relative size suggests one ordering while clarity suggests another), making perception more error-prone
Monocular cues are a key reason people can perceive convincing depth in 2D media (art, film, screens), even though no physical depth exists on the display.
FAQ
Monocular cues often work well on digital displays because images can simulate perspective, texture, and haze.
Limits include display resolution, rendering choices, and mismatches between visual cues and head movement.
Motion parallax is a related cue: as you move, nearer objects shift across your visual field faster than distant ones.
This dynamic cue can be available with one eye, but it requires observer movement to generate depth information.
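The speed difference described here (usually called motion parallax) can be estimated with a small calculation. The sketch assumes the observer moves in a straight line and the object is directly to the side at that moment, in which case angular speed is approximately observer speed divided by distance. The function name and the example values are hypothetical.

```python
import math

def angular_speed_deg_per_s(observer_speed_m_s, distance_m):
    """Approximate speed at which an object directly to the side sweeps across the visual field."""
    return math.degrees(observer_speed_m_s / distance_m)

# Walking at 1.5 m/s past a fence post 2 m away and a hill 2000 m away.
fence_post = angular_speed_deg_per_s(1.5, 2.0)
distant_hill = angular_speed_deg_per_s(1.5, 2000.0)

print(f"fence post:   {fence_post:.2f} degrees per second")
print(f"distant hill: {distant_hill:.4f} degrees per second")
```

Under these assumptions the nearby post sweeps past about a thousand times faster than the hill, which is why distant scenery seems almost stationary while you move.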
People with more exposure to pictures, photographs, or “carpentered” environments may rely more on perspective cues.
Less exposure can reduce sensitivity to certain pictorial conventions.
The strength of a pictorial depth cue depends on consistency and precision: coherent vanishing points, realistic gradients, and appropriate contrast patterns.
Small inconsistencies can weaken depth or create ambiguity.
Monocular cues can mislead: unusual lighting, heavy fog, mirror reflections, or forced-perspective setups can distort clarity, size, and convergence information.
When cues conflict strongly, perceived depth can be inaccurate.
Practice Questions
Explain what is meant by interposition as a monocular depth cue. [2 marks]
1 mark: Identifies that one object overlapping/covering another provides depth information.
1 mark: States that the overlapping (blocking) object is perceived as closer and the blocked object as farther.
A student looks at a photograph of a straight road stretching into the distance. Using monocular depth cues, explain how the student can perceive depth in the photo. Refer to at least three named cues. [6 marks]
1 mark each (max 3): Correctly names three cues (e.g., linear perspective, texture gradient, relative clarity, relative size, interposition).
1 mark each (max 3): Correctly explains each named cue in the context of the road photo (e.g., converging edges; texture becomes finer; distant areas look hazier; objects appear smaller when farther away).
