What is Computer Vision? – Part 1: Human Vision

This is the first in a series of posts by Blippar's Strategic Planner Sam Ashken on computer vision for non-technical people. If that’s you, then read on! As a foundation, we’re going to start with a couple of posts on human vision. It’s easier to understand how computers see once you compare it with how people see.

I am the father of an eight-month-old and it has been amazing to watch my son’s eyesight develop. It's a reminder that something which feels so natural to an adult is in fact a skill that improves over time. So what’s actually happening when we see?

Vision is a two-stage process:

• In the first stage, the eyes take in light reflected off the objects all around us, and that light is focused onto the retina, turning the 3D scene in view into a 2D image.
• Then, our brain's visual system “rebuilds” a 3D model of the world from the information in that 2D image.

At first glance, it might seem like the hard part is the first stage: turning light from 3D objects into 2D images. After all, many of us struggle to turn 3D objects into 2D drawings. But this stage is actually relatively straightforward compared with the really hard part of vision: choosing the correct interpretation of the 2D image on the retina and building the correct 3D model from it. In theory, the same 2D image on the retina could have been produced by many different 3D scenes. So how does the visual system know which one to choose?
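For readers curious about the computer side of this series, here is a minimal sketch of why the 2D image is so open to interpretation. It uses a simplified pinhole-camera model, which is an assumption (a real eye has a lens and a curved retina): a point at sideways position X, height Y and distance Z lands on a flat image at f × X / Z and f × Y / Z, where f is an illustrative focal length. Three very different points, one small and close, one large and far away, end up at exactly the same spot in the image.

```python
# Simplified pinhole model (an assumption, not a model of the real eye):
# a 3D point (X, Y, Z) projects to (f * X / Z, f * Y / Z) on a flat image.

def project(point, focal_length=1.0):
    """Project a 3D point (X, Y, Z), where Z > 0 is its distance from the eye, to 2D."""
    x, y, z = point
    if z <= 0:
        raise ValueError("the point must be in front of the eye (z > 0)")
    return (focal_length * x / z, focal_length * y / z)

# Three different points, all lying on the same line of sight from the eye.
near = (0.5, 1.0, 2.0)
middle = (2.5, 5.0, 10.0)
far = (75.0, 150.0, 300.0)

for point in (near, middle, far):
    print(point, "->", project(point))  # every point projects to (0.25, 0.5)
```

Going the other way, from the 2D point (0.25, 0.5) back to 3D, the image alone cannot tell these three points apart. That reverse step is the hard part the visual system has to solve.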

This kind of challenge, where a great deal is open to interpretation, is often called an "ill-defined problem," and the visual system has to solve many of them. Here is a simple example:

Eiffel Tower visual trick

Why does the visual system temporarily get tricked by this sort of image, and how does it ultimately build the correct model of what’s going on here?

Depth perception, which is what photos like the one above temporarily fool, is a classic example of an ill-defined problem that the visual system must solve. To do so, it relies on many tactics. One is stereoscopic vision, which comes from having two eyes.

With stereoscopic vision, your brain has two 2D images, taken from slightly different points of view, to compare. The position of a nearby object shifts more between the two images than the position of a distant object does. This lets the visual system work out which object is closer.
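The same simplified camera model can illustrate the two-eye trick. In this sketch, the two "eyes" are identical pinhole cameras facing straight ahead and separated sideways by a baseline B; the 0.065 m baseline and the focal length f are illustrative assumptions. A point at distance Z shifts between the two images by a "disparity" of f × B / Z, so nearby points shift more than distant ones, and the distance can be recovered as Z = f × B / disparity. This is the textbook geometry used in computer stereo vision, not a claim about how the brain actually does the calculation.

```python
# Two identical pinhole "eyes" facing straight ahead, offset sideways.
# All values are illustrative assumptions.
FOCAL_LENGTH = 1.0   # arbitrary units
BASELINE = 0.065     # metres between the two eyes (illustrative)

def image_x(point_x, depth, eye_x):
    """Horizontal image position of a point as seen by an eye located at eye_x."""
    return FOCAL_LENGTH * (point_x - eye_x) / depth

def disparity(point_x, depth):
    """How far the point shifts between the left-eye and right-eye images."""
    left = image_x(point_x, depth, eye_x=-BASELINE / 2)
    right = image_x(point_x, depth, eye_x=+BASELINE / 2)
    return left - right  # works out to FOCAL_LENGTH * BASELINE / depth

def depth_from_disparity(d):
    """Invert disparity = f * B / Z to estimate the distance Z."""
    return FOCAL_LENGTH * BASELINE / d

for depth in (0.5, 2.0, 10.0):
    d = disparity(0.0, depth)
    print(f"distance {depth} m: disparity {d:.4f}, "
          f"recovered distance {depth_from_disparity(d):.2f} m")
```

In this sketch a point 0.5 m away shifts by 0.13 units between the two images, while a point 10 m away shifts by only 0.0065, which is the "nearby objects vary more" effect described above.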

Another technique uses shadows. In the image below, the balls appear in exactly the same positions above the grid of squares, but the difference in their shadows leads you to two different interpretations of how close or distant they are.

Shadows visual trick

Texture, and the way texture changes with depth, also helps with depth perception. In the photo below, you can see the texture pattern of the road becoming gradually smaller as it recedes into the distance.

Road texture depth perception

And finally, prior knowledge of the relative size of objects is key to depth perception. It is these assumptions about size that the Eiffel Tower image above tricks.
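Prior knowledge of size can be sketched with the same simplified model. Apparent size falls in proportion to distance (image size = f × real size / distance), so if the visual system assumes how big something really is, it can estimate how far away it is. The numbers below are hypothetical, not measurements from the photo: a 300 m tower 3,000 m away and a 0.3 m object 3 m away produce the same image size, and the estimated distance depends entirely on which real size is assumed.

```python
# Apparent size under the same simplified pinhole model (an assumption):
# image_size = f * real_size / distance
FOCAL_LENGTH = 1.0  # arbitrary units

def apparent_size(real_size, distance):
    """Size of an object's image given its real size and distance."""
    return FOCAL_LENGTH * real_size / distance

def estimated_distance(image_size, assumed_real_size):
    """Invert the formula: the distance implied by an assumed real size."""
    return FOCAL_LENGTH * assumed_real_size / image_size

# Hypothetical numbers, chosen so both objects look the same size.
tower_image = apparent_size(real_size=300.0, distance=3000.0)
small_object_image = apparent_size(real_size=0.3, distance=3.0)
print(f"tower image: {tower_image:.3f}, small object image: {small_object_image:.3f}")

# Same image, two different assumptions about real size, two different distances.
print(f"assume 300 m tall -> {estimated_distance(tower_image, 300.0):.1f} m away")
print(f"assume 0.3 m tall -> {estimated_distance(tower_image, 0.3):.1f} m away")
```

In this toy version, a wrong assumption about size produces a confidently wrong distance, which is roughly the kind of mistake the Eiffel Tower photo invites.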

As you can see, the visual system has many ways of solving the ill-defined problem of depth perception. Together they make a very reliable system for building the correct 3D model of reality from 2D images that are open to many interpretations.

In the next post in this series, we'll briefly look at a second ill-defined problem the visual system needs to solve, and then at the role of probabilistic thinking in human vision in general.

Before then, a quick visual theory cliff-hanger! What’s this?

What is this?


Note: A number of the examples in this post are from Chapter 4 “The Mind’s Eye” of Steven Pinker’s How the Mind Works.

Sam Ashken