Notice especially that there are at least as many fibers (actually many more!) coming back from each stage of processing to an earlier stage as there are fibers going forward from each area into the next area higher up in the hierarchy. The classical notion of vision as a stage-by-stage sequential analysis of the image, with increasing sophistication as you go along, is demolished by the existence of so much feedback. What these back projections are doing is anybody’s guess, but my hunch is that at each stage in processing, whenever the brain achieves a partial solution to a perceptual “problem,” such as determining an object’s identity, location, or movement, this partial solution is immediately fed back to earlier stages. Repeated cycles of such an iterative process help eliminate dead ends and false solutions when you look at “noisy” visual images such as camouflaged objects (like the scene “hidden” in Figure 2.7).3 In other words, these back projections allow you to play a sort of “twenty questions” game with the image, enabling you to rapidly home in on the correct answer. It’s as if each of us is hallucinating all the time and what we call perception involves merely selecting the one hallucination that best matches the current input. This is an overstatement, of course, but it has a large grain of truth. (And, as we shall see later, it may help explain aspects of our appreciation of art.)
FIGURE 2.6 David Van Essen’s diagram depicting the extraordinary complexity of the connections between the visual areas in primates, with multiple feedback loops at every stage in the hierarchy. The “black box” has been opened, and it turns out to contain…a whole labyrinth of smaller black boxes! Oh well, no deity ever promised us it would be easy to figure ourselves out.
FIGURE 2.7 What do you see? It looks like random splatterings of black ink at first, but when you look long enough you can see the hidden scene.
The exact manner in which object recognition is achieved is still quite mysterious. How do the neurons firing away when you look at an object recognize it as a face rather than, say, a chair? What are the defining attributes of a chair? In modern designer furniture shops a big blob of plastic with a dimple in the middle is recognized as a chair. It would appear that what is critical is its function (something that permits sitting) rather than whether it has four legs or a backrest. Somehow the nervous system treats the act of sitting as synonymous with the perception of a chair. And if the object is a face, how do you recognize the person instantly even though you have encountered millions of faces over a lifetime and stored away the corresponding representations in your memory banks?
Certain features or signatures of an object can serve as a shortcut to recognizing it. In Figure 2.8a, for example, there is a circle with a squiggle in the middle but you see a pig’s rump. Similarly, in Figure 2.8b you have four blobs on either side of a pair of straight vertical lines, but as soon as I add some features such as claws, you might see it as a bear climbing a tree. These images suggest that certain very simple features can serve as diagnostic labels for more complex objects, but they don’t answer the even more basic question of how the features themselves are extracted and recognized. How is a squiggle recognized as a squiggle? And surely the squiggle in Figure 2.8a can only be a tail given the overall context of being inside a circle. No rump is seen if the squiggle falls outside the circle. This raises the central problem in object recognition; namely, how does the visual system determine relationships between features to identify the object? We still have precious little understanding.
FIGURE 2.8 (a) A pig’s rump. (b) A bear.
The problem is even more acute for faces. Figure 2.9a is a cartoon face. The mere presence of horizontal and vertical dashes can substitute for nose, eyes, and mouth, but only if the relationship between them is correct. The face in Figure 2.9b has exactly the same features as the one in Figure 2.9a, but they’re scrambled. No face is seen, unless you happen to be Picasso. The correct arrangement of the features is crucial.