How do we process multiple-person scenarios?

(Project THEMPO – ERC StG)


Trani, Puglia. Photo by Giovanni Albore

The ability to quickly recognize conspecifics among other entities, and to rapidly understand what they are doing, is a fundamental requirement for a social species such as humans. In the past decades, cognitive neuroscientists have gained important knowledge about these abilities by studying how the human mind/brain recognizes a face or a body and infers the goal and intention behind the current body pose or activity. In real life, however, a person is most often seen among other persons. For instance, when one person is speaking, it is very likely that she is near one or more individuals who are listening. The readiness with which an observer understands that the people in a scene are communicating makes it unlikely that this understanding is achieved through sequential, one-by-one processing of each person’s face, body posture, motor activity, goal, intention, emotion, and so on. Our work starts from the hypothesis that the human brain must be equipped with specialized mechanisms to parse the environment, select the portion in which a social interaction is happening, and process that portion of the environment as a whole. Our main objective is to explain how this is done, in terms of cognitive stages and neural operations. In this effort, we take advantage of cognitive manipulations, neurostimulation methodology (TMS), and advanced analytical approaches to decode information latent in neural patterns recorded with functional neuroimaging (fMRI).


Seeing People


Not all objects in the environment around us have the same chance of attracting our attention. Over the past decades, cognitive neuroscientists have gathered evidence showing that a human face or a human body is special to our mind/brain: it is processed with the highest priority and more efficiently than other objects in the environment. But how does the system behave when, instead of one person, we see two (or more) persons? The processing of complex scenes involving many actors is closer to most real-world situations than the perception of one person in isolation.

In collaboration with Salvador Soto-Faraco, professor of cognitive neuroscience at Pompeu Fabra University in Barcelona, and Timo Stein, professor of psychology at the University of Amsterdam, we have set up a behavioral paradigm, based on stimulus masking, to measure the potency of stimuli in gaining access to attention and awareness. In this paradigm, stimuli are presented at threshold, that is, with visibility so low that the system sometimes fails to process them. The rationale is that, under conditions of low visibility, stimuli that are privileged by the human visual system are more likely than other stimuli to pass the threshold and be recognized correctly. In this way, we have demonstrated that: 1) scenes containing interacting persons (i.e., two persons facing one another) are processed more efficiently than scenes in which the same two persons do not seem to interact (i.e., they face away from each other); 2) the processing mode (called configural) that makes the detection and recognition of individual human bodies and faces more efficient than the processing of other objects is preserved in multiple-person scenarios, but only when the persons are perceived as interacting (e.g., facing each other).

These effects are very fast and automatic, which suggests that they reflect a basic mechanism for organizing the environment and selecting the most relevant information. This mechanism proves sensitive to cues that are frequently associated with the occurrence of social interactions (for example, the relative positioning of two persons, facing or facing away from each other). Thus, this research adds to a growing body of results suggesting that the human perceptual system is tuned to stimuli with social value, which have maximal relevance for our daily life and survival.

Based on this novel phenomenon, which we called the two-body inversion effect, one may argue that even the most crowded environment, like a Bruce Springsteen concert or the Italy-France football World Cup final, does not appear to the human eye as a homogeneous, uniform mass. The relative positioning of individuals in the crowd may be a powerful cue for parsing the scene and selecting the relevant portions: most likely, those in which a social exchange or event is happening.

Papeo L., Stein T., Soto-Faraco S. (2017). The two-body inversion effect. Psychological Science, 28(3), 369-379.