:: perception, attention, events, and the structure of Hollywood film::

Over the past dozen years, my students and I have been interested in popular movies -- their structure, their physical attributes, their narratives, and their relation to events -- and we now have data on a sample of 210 Hollywood movies released from 1915 to 2015: ten movies per year, sampled at five-year intervals. Below, the newer results are listed first; older work follows as you move down the page. Most of this has been summarized in my book, Movies on Our Minds (2021, Oxford University Press).
I became increasingly interested in the structure of a kind of movie sequence that I call a syntagma (generalizing from, but different from, those of Christian Metz). Harvesting data from our previous work, I have discovered that a syntagma is composed of scene-like units (we call them subscenes) that are covered by a unified sound stream (typically nondiegetic music) and that have shorter and flatter shot-duration patterns. Movie viewers segment these sequences far less frequently than they segment scenes, despite the fact that the narrative shifts across subscenes (changes in location, characters, or time) have roughly the same distribution as the shifts between scenes. The syntagma also does not occur uniformly through movies; it is concentrated at the beginning of films (during what I call the prolog), during the second large section (the complication), and during the climax. [Cutting, Attention, Perception, & Psychophysics, 2019]
We revisited our results on temporal fractals in movies, creating not only strings of shot durations but also corresponding strings of shot motion, shot scale, shot clutter, shot luminance, and sound amplitude, as well as strings of scene durations. We replicated and extended our original results (see Cutting, DeLong, & Nothelfer, 2010, below). That is, we found that the shot-duration patterns of movies from 1915 to 2015 increasingly approximate a fractal pattern. More strikingly, we found similar trends for scene durations, motion, and sound amplitude, although not for luminance, clutter, and shot scale. Many human rhythms (resting heartbeats, breaths, and footsteps) are also fractal, and many discussions of film editing suggest parallels to these. It would appear that, for several dimensions of movies, this is not a metaphor. [Cutting, DeLong, & Brunick, Cognitive Research: Principles and Implications, 2018]
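To make "approximation to a fractal pattern" concrete: a 1/f (pink-noise) series has a power spectrum that falls off as 1/f^α with α near 1, whereas white noise gives α near 0 and a random walk (brown noise) gives α near 2. Below is a minimal Python sketch, assuming a raw periodogram of the mean-removed series and a straight-line fit on log-log axes; the function name spectral_exponent is hypothetical, and the published analyses may differ in windowing, detrending, and frequency range.

```python
import numpy as np

def spectral_exponent(series):
    """Estimate alpha in P(f) ~ 1/f**alpha for a 1-D series (e.g., shot durations).

    Sketch only (hypothetical helper): raw periodogram of the mean-removed
    series, then a least-squares line through log10(power) vs log10(frequency).
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                       # remove the zero-frequency component
    power = np.abs(np.fft.rfft(x)) ** 2    # unnormalized periodogram
    freqs = np.fft.rfftfreq(x.size)        # cycles per shot (sample spacing = 1)
    keep = freqs > 0                       # drop f = 0 before taking logs
    slope, _ = np.polyfit(np.log10(freqs[keep]), np.log10(power[keep]), 1)
    return -slope                          # alpha is the negated log-log slope

# Synthetic check: white noise should give alpha near 0, a random walk near 2.
rng = np.random.default_rng(0)
white = rng.normal(size=4096)
print(round(spectral_exponent(white), 2), round(spectral_exponent(np.cumsum(white)), 2))
```

A series whose estimate sits near 1.0 would count as "near fractal" in the sense used here.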
Heartened by these earlier results on temporal fractals, I asked my colleague Karen Pearlman if she would allow me to analyze the final version of her film, Woman With an Editing Bench (2016), along with some previous "drafts," to see whether, in the process of finalizing the structure of the film, some of the measures discussed above approached fractality (that is, a power spectrum of the form 1/f^α with an exponent α of 1.0). Pearlman had written extensively on bodily "rhythms" in film and, since these tend to be fractal as noted above, it seemed possible. And indeed that's what we found. In the "Assembly" version of the film (when the rough shots are put together in their initially intended order), none of the dimensions was near fractal. In the "Fine Cut" version (when the film is ready for some outsiders to watch it and make comments), two of the dimensions were near fractal. And in the released film, three of the dimensions were near fractal. Moreover, one dimension, chroma (the interleaving of color segments representing the narrative and black-and-white segments representing the protagonist's thoughts), was the most important dimension controlled by Pearlman. [Cutting & Pearlman, Projections: The Journal for Movies and Mind, 2019]
We have also wanted to trace the development of Hollywood style as it evolved from the silent era to the present day. Using data from Barry Salt, Tim Smith, and our own studies, we categorize shot scale in the left panel below, then map out the mean shot scale of movies from 1913 to 2015, and then compare those with the gaze congruency of viewers looking at clips of different shot scales. Notice that mean shot scale has converged on something near a medium closeup, which is also the scale showing the most gaze congruence across viewers. Thus filmmakers, whether they know it or not, have optimized the visual stimulus to maximize control of viewer gaze. [Cutting & Armstrong, Cognitive Science, 2018]
In addition, we length-normalized each movie (between 60 and 147 min in duration) into 100 bins and then measured various characteristics within those bins. The result is something of a roller-coaster profile. Below are shown the changes in shot-duration patterns for three groups of 60 movies: those from 1930 to 1955, from 1960 to 1985, and from 1990 to 2015. Note the changes across periods.
The heavy red line is the best-fitting polynomial, the white areas in the right two panels are the 95% confidence limits on that fit, and the pinkish areas are the 95% confidence intervals on the data given the polynomial. Note also that there is no reliable pattern for the earliest group of films. [Cutting, Cognitive Research: Principles & Implications, 2016]
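One way to length-normalize a movie, sketched here in Python under stated assumptions: each shot is assigned to the bin containing its onset time, and the mean shot duration is computed per bin. The helper name binned_profile and the onset-based assignment rule are illustrative choices, not necessarily the procedure used in the paper.

```python
def binned_profile(shot_durations, n_bins=100):
    """Mean shot duration in each of n_bins equal-time bins (hypothetical helper).

    A shot is assigned to the bin containing its onset; a bin that contains no
    shot onset (possible when one long shot spans it) is reported as None.
    """
    total = sum(shot_durations)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    onset = 0.0
    for d in shot_durations:
        b = min(int(onset * n_bins / total), n_bins - 1)  # bin holding this shot's onset
        sums[b] += d
        counts[b] += 1
        onset += d
    return [s / c if c else None for s, c in zip(sums, counts)]

print(binned_profile([1, 1, 1, 1, 4, 4], n_bins=4))  # → [1.0, 2.5, 4.0, None]
```

Profiles built this way have the same 100 positions for every movie regardless of running time, so they can be compared across movies or fit with a smooth curve (for example, a low-order polynomial via numpy.polyfit) after dropping empty bins.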
What is the average movie like? We length-normalized 23 movies released between 1940 and 2010, again dividing each into 100 equal-length sections (or 20 in the case of the upper right panel), and then averaged across the movies to obtain a profile of the variation in a number of parameters of the "average" popular movie: again shot duration (the inverse of transitions), but also dissolves and other noncut transitions, motion, brightness, music, and shot scale. The data are noisy, but the results show clear trends in each dimension across the length of the narrative. [Cutting, Psychonomic Bulletin & Review, 2016]
The most ubiquitous measure of movies has been mean shot duration (typically called average shot length, or ASL). Somewhat mysteriously, mean shot duration has declined essentially linearly (on a log scale) since the beginning of the sound era. Interestingly, a wholly separate function holds for silent-era movies. These data are gathered mostly from Barry Salt, and the plot above includes more than 9300 English-language films. The decline for foreign-language films (not shown) is about 40% as great, but it is still linear.
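Because the decline is linear on a log scale, a least-squares line through log10(ASL) against release year summarizes it with a single slope, and 10 raised to that slope is the proportional change per year. A minimal Python sketch with made-up illustrative numbers (not Salt's data); the function names are hypothetical.

```python
import math

def asl(durations):
    """Average shot length: the mean of the shot durations (seconds)."""
    return sum(durations) / len(durations)

def log_linear_trend(years, asls):
    """Ordinary least squares for log10(ASL) = intercept + slope * year."""
    ys = [math.log10(v) for v in asls]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Illustrative ASLs that halve every 20 years (not real film data).
intercept, slope = log_linear_trend([1950, 1970, 1990], [10.0, 5.0, 2.5])
print(round(10 ** (10 * slope), 3))  # proportional change per decade → 0.707
```

Fitting on the log scale means a constant percentage decline appears as a straight line, which is the sense in which the decline above is "essentially linear."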
For 24 films we categorized every shot into 15 types and determined their mean frequency in those movies (right panel) and their normalized mean duration (left panel). Unsurprisingly, the various kinds of shot/reverse-shot combinations (bracketed above) account for about 50% of all shots in movies. We also wondered whether the overall decline in shot duration is uniform across all 15 categories of shots, and we found that it is. [Cutting & Candan, Projections, 2015]
It is well known that older films tend to have longer-duration shots. Why? One reason, although not the most important, is that older films tend to clutter the image with more people to look at. Once they've placed more characters in view, filmmakers must give viewers more time to look around at them. Looking at 16,000 images from 48 films, I determined that each additional character seems to require, on average, an additional 1.5 sec of shot duration, as shown in the clouds of data points above and to the left. Shorter-duration shots also tend to be shorter in shot scale, that is, more likely to be closeups, as shown in the clouds of data on the right, and it is difficult to have closeups with more people in view. A mediation analysis showed that the number of characters in view (which must be placed on the set before the film is shot) affects shot scale (which must be modified to include all the characters), which in turn affects shot duration. Since films before 1955 averaged 2.5 characters per shot, and those since 1990 averaged only 1.5, the more recent films can have shorter shot durations. [Cutting, 2015, Art & Perception]
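The mediation logic (characters in view → shot scale → shot duration) can be illustrated with the simple product-of-coefficients approach: regress the mediator on the predictor (path a), regress the outcome on both (path b and the direct effect), and take a × b as the indirect effect. This is a minimal Python sketch on simulated data; the text does not say which mediation procedure was used, the numeric shot-scale coding is a hypothetical assumption, and significance testing (for example, bootstrapping the indirect effect) is omitted.

```python
import numpy as np

def ols(y, *predictors):
    """Least-squares coefficients of y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [intercept, slope_1, slope_2, ...]

def mediation(x, m, y):
    """Product-of-coefficients mediation for the path x -> m -> y (hypothetical helper)."""
    a = ols(m, x)[1]              # path a: predictor -> mediator
    direct, b = ols(y, x, m)[1:]  # direct effect of x, and path b (m -> y with x held fixed)
    total = ols(y, x)[1]          # total effect of x on y
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}

# Simulated data only: characters in view, a hypothetical numeric shot-scale code
# (larger = longer shot scale), and shot durations in seconds.
rng = np.random.default_rng(0)
characters = rng.integers(1, 6, size=500).astype(float)
scale = 2.0 + 0.8 * characters + rng.normal(0.0, 0.5, size=500)
duration = 1.0 + 1.5 * scale + rng.normal(0.0, 1.0, size=500)

result = mediation(characters, scale, duration)
print({k: round(float(v), 2) for k, v in result.items()})
```

With ordinary least squares on the same cases, the total effect decomposes exactly into the direct effect plus a × b, so the indirect term can be read as the part of the character effect carried by shot scale.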
Movie scenes typically shift from one to the next when at least one of three variables changes. There can be a change in location, as in Erin Brockovich (2000), where Erin (Julia Roberts) and Ed Masry (Albert Finney) have an argument that starts in the courtyard in front of the law office and then continues inside the office; a change in characters, as in Valentine's Day (2010), where in a single shot the focus passes from Liz (Anne Hathaway), in the middle of telephone sex, to Morley (Jessica Alba), about to check in to the hotel; or a change in time, as in Five Easy Pieces (1970), where Robert Dupea (Jack Nicholson) falls asleep on the couch in front of a fire and is about to be wakened the next morning in the same place. A change in a single dimension (location, characters, or time) is relatively rare. The most common scene shifts occur when location and characters change together, or when all three dimensions change at the same time. Of course, these three dimensions can change in any combination, and these combinations are explored in [Cutting, 2014, Acta Psychologica].
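The three-dimension coding of scene shifts amounts to comparing each scene with the one before it. A minimal Python sketch, assuming each scene is recorded as a dictionary with hypothetical location, characters, and time fields; the scene values below are placeholders, not coded data from the study.

```python
from collections import Counter

DIMENSIONS = ("location", "characters", "time")

def shift_type(prev, curr):
    """Name the dimensions that changed between two consecutive scenes."""
    changed = [d for d in DIMENSIONS if prev[d] != curr[d]]
    return "+".join(changed) or "none"

def tally_shifts(scenes):
    """Count each combination of changes across consecutive scene pairs."""
    return Counter(shift_type(a, b) for a, b in zip(scenes, scenes[1:]))

# Placeholder scene records; characters are frozensets so listing order does not matter.
scenes = [
    {"location": "courtyard", "characters": frozenset({"A", "B"}), "time": "day 1"},
    {"location": "office", "characters": frozenset({"A", "B"}), "time": "day 1"},
    {"location": "hotel", "characters": frozenset({"C"}), "time": "day 1"},
    {"location": "hotel", "characters": frozenset({"C"}), "time": "day 2"},
]
print(tally_shifts(scenes))
```

Tallied over a whole film, the counts show directly how often single-dimension shifts occur relative to combined ones.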
Scenes are traditionally set up using establishing shots. These include more environmental background (they are longer in shot scale) and show the arrangement of the characters within it. We've shown that these shots are also longer in duration than the average shot in a film. We wondered what filmmakers do when they return to a location that they've shown before: what is a re-establishing shot like (which we defined as the return to a previously seen location, time, or character)? We analyzed all of the scenes and subscenes in 24 films for their locations and characters, with an eye to whether the locations shifted, the characters changed, or the time changed since the last scene or subscene. Re-establishing shots are shorter in both shot scale and shot duration than establishing shots, but they are longer in both scale and duration than the mean of the shots that follow them. The former effects are shown in the stills above. Those on the left are from two pizza delivery sequences in Home Alone (1990) that involve a sight gag of the car running over a driveway ornament. For the first arrival of the car (~7 min into the film) the scene is shown in an extreme long shot; for the second arrival (~48 min) the car is shown in a medium long shot. The stills on the right are taken from The Social Network (2010). The top one shows the new Facebook headquarters in an extreme long shot (~98 min), with Eduardo Saverin (Andrew Garfield) being briefed on his stock holdings in the company off camera to the left. After a flashforward to a hearing room, the film returns to the headquarters (~100 min) with Saverin confronting Mark Zuckerberg (Jesse Eisenberg) in a medium shot. These differences in shot scale and duration occur mostly for location shifts and not for shifts of characters or time. Compellingly, these results are also well fit by a model of discourse processing from the psychological literature, the event-indexing model.
[Cutting & Iricinschi, 2014, Cognitive Science]
Popular movies are spatiotemporal arrays of light and motion, but we have known almost nothing about their distributions across whole films. We've known that the center of the screen is most important, but are light and motion most prominent there? Yes on both counts. The composite figures shown above represent grayscale overlays of every frame in two films. The top panels represent the mean luminances with three alterations. The strip to the right of the arrows is reversed top-to-bottom to show a vertical luminance gradient; the squares and the vertical rectangles are swapped to demonstrate a gradient from the center outward. The lower panels show frame overlays of the same films, but this time they are estimates of the distributions of motion, where brightness indicates more motion in the image. I also found that American films noir are not actually darker than other films of their era, but their surrounds are much darker, and that animated films, which have more motion than any other movie genre, have it concentrated in the center of the screen. In sum, content aside, filmmakers have crafted a space for our attention and given us good reasons to look at the center of the screen. [Cutting, 2014, Psychology of Aesthetics, Creativity, and the Arts]
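The two kinds of overlay can be approximated numerically. This Python sketch assumes grayscale frames stacked in a NumPy array of shape (frames, height, width); it averages all frames into a luminance overlay, averages absolute frame-to-frame differences into a motion overlay, and compares the central third of the frame with its surround. The function names and the central-third boundary are illustrative assumptions, not the measures used in the paper.

```python
import numpy as np

def luminance_overlay(frames):
    """Mean gray level at each pixel across all frames: array (height, width)."""
    return frames.astype(float).mean(axis=0)

def motion_overlay(frames):
    """Mean absolute change between successive frames at each pixel."""
    return np.abs(np.diff(frames.astype(float), axis=0)).mean(axis=0)

def center_vs_surround(overlay):
    """Mean of the central third (both axes) versus the mean of everything else."""
    h, w = overlay.shape
    center = np.zeros((h, w), dtype=bool)
    center[h // 3: 2 * h // 3, w // 3: 2 * w // 3] = True
    return overlay[center].mean(), overlay[~center].mean()

# Synthetic check: random frames with a brighter central patch.
rng = np.random.default_rng(0)
frames = rng.random((100, 90, 120))
frames[:, 30:60, 40:80] += 0.5
print(center_vs_surround(luminance_overlay(frames)))
print(center_vs_surround(motion_overlay(frames)))
```

The same center-versus-surround comparison applies to either overlay, which is how one could check whether a film noir's darkness lies mostly in its surround.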
We've also become interested in the narratives of films. Not fully satisfied with the notion of narrative space as used in film studies and in studies of literature, we wished to create an objective, holistic, dynamic representation of the visual narrative of a film. Using the scene-parsing data above, we went through the same 24 films, creating for each a whole-film matrix of scenes and the characters who appear in those scenes. To be concrete, if a given character appears in a given scene, a "1" is entered in that cell; if a character is not in that scene, that cell gets a "0." A co-occurrence measure (Cohen's kappa) was calculated for all pairs of major characters across all scenes in the film, and these values were used as inputs to a nonmetric multidimensional scaling program to create a base map of the film. The film was then divided into acts (see Thompson, 1999), new co-occurrence measures for all characters were calculated for each act, a new scaling solution was computed, and that solution was fit to the base map through Procrustes analysis. The layouts of the acts were then compared and the trajectories of the characters across the space noted, as in the figure for All About Eve (1950) above. Notice in particular the trajectories of Margo (Bette Davis), the established NYC actress beginning to realize that she is no longer appropriate for ingenue parts, and of Eve (Anne Baxter), the aspiring and unscrupulous newcomer. Although these measures are based on the visual narrative alone (not the dialog), they seem to match well with character development in each film. Interestingly, however, in some less successful films they do not. [Cutting, Iricinschi, & Brunick, 2013, Projections]
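For a pair of characters, Cohen's kappa compares how often their 0/1 scene entries agree with the agreement expected by chance from each character's overall presence rate. A minimal Python sketch; the helper names and the six-scene example are hypothetical.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length 0/1 presence vectors (one entry per scene)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n  # share of scenes where they agree
    pa, pb = sum(a) / n, sum(b) / n                    # each character's presence rate
    expected = pa * pb + (1 - pa) * (1 - pb)           # agreement expected by chance
    if expected == 1:  # both always present or always absent: kappa is undefined
        return float("nan")
    return (observed - expected) / (1 - expected)

def kappa_matrix(presence):
    """presence: dict of character -> 0/1 list over scenes; returns names and kappa matrix."""
    names = list(presence)
    return names, [[cohens_kappa(presence[p], presence[q]) for q in names] for p in names]

# Hypothetical six-scene film with three characters.
names, K = kappa_matrix({
    "A": [1, 1, 1, 0, 1, 0],
    "B": [1, 1, 0, 0, 1, 0],
    "C": [0, 0, 1, 1, 0, 1],
})
for name, row in zip(names, K):
    print(name, [round(k, 2) for k in row])
```

The resulting matrix (or 1 minus kappa, as a dissimilarity) is the kind of input a nonmetric multidimensional scaling routine expects, and act-level solutions could then be aligned to the base map with a Procrustes fit such as scipy.spatial.procrustes.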
Film tradition divides movies into units: frames, shots, scenes, and acts. In the research described above we have investigated psychological ramifications of frame differences, shot differences, and differences within and between acts, although we have by no means exhausted what we plan to do in these domains. Our current research focus, however, is scenes. Although scenes seem relatively easy to define, they are much more difficult to discern with certainty in the ongoing stream of most films; even so, viewers agree about 90% of the time on their boundaries. We are merging the psychological field of event perception with that of scene analysis to discover the psychological structure of scenes, or more correctly subscenes. Our results suggest that event segmentation can take place on the basis of fairly low-level visual information: shot transition type, shot durations, motion, luminance, and color. The last of these is shown in the figure above for Inception (2010). Every frame of the movie is represented by a vertical raster, with color represented from reds at the bottom to blues at the top (independent of luminance). One can see the scene changes, particularly between dream levels. Indeed, as much as 50% of the variance in viewer segmentations of 24 films released from 1940 to 2010 can be accounted for by all such physical information. This result contrasts with the emphasis in the psychological literature on events, which suggests that event segmentation is done largely on the basis of understanding actors' intentions (a kind of theory-of-mind analysis). [Cutting, Brunick, & Candan, 2012, JEP:HPP]
Most people recognize that contemporary films have shorter shot durations than those of earlier eras. How else have popular films changed? Several changes are plotted in the figure above, where every dot represents one of 160 films. In addition to shot durations (fig a above) and motion (fig b, where visual activity is the inverse of across-frame pixel correlation), consider two more. First, there has been an increasingly negative correlation between shot durations and the amount of motion in a shot (fig c above). That is, in the studio era (here 1935 to 1960) there was no correlation between the two. More recently, however, shorter shots have come to have proportionately more motion in them. This seems relatively obvious for Hollywood-style action movies (action scenes are filled with short shots and chaotic motion). Second, popular films have gotten darker (fig d). This is partly due to improvements in film stock and digital editing, but it is yet another way in which filmmakers control the attention of viewers. [Cutting, Brunick, DeLong, Iricinschi, & Candan, 2011, i-Perception]
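Visual activity as described here can be sketched as 1 minus the Pearson correlation between the pixel values of two frames, so identical frames score 0 and greater change scores higher. This Python sketch assumes grayscale frames in a NumPy array; taking 1 - r as the "inverse" and the lag parameter (for comparing frames a few apart rather than strictly adjacent ones) are assumptions, not the published procedure.

```python
import numpy as np

def visual_activity(frames, lag=1):
    """1 - Pearson r between the pixels of frame i and frame i + lag (hypothetical helper).

    frames: array (n_frames, height, width); returns n_frames - lag values.
    A frame with no pixel variation (e.g., pure black) makes r undefined (nan).
    """
    flat = frames.reshape(len(frames), -1).astype(float)
    return np.array([
        1.0 - np.corrcoef(flat[i], flat[i + lag])[0, 1]
        for i in range(len(flat) - lag)
    ])

# Synthetic check: a repeated frame followed by its negative.
rng = np.random.default_rng(0)
base = rng.random((48, 64))
print(visual_activity(np.stack([base, base, -base])))
```

Averaging these values within each shot would give the per-shot motion measure needed to correlate shot duration with motion, as in fig c.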
The shot duration series data for our films can be found on the Cinemetrics website. Some of the other statistics for the films can be found here and here.
Most films contain many shots knit together by several types of transitions, and by far the most prevalent is the cut. Over the last 75 years fades and wipes have become increasingly rare. Dissolves have also diminished in frequency but, unlike the others, they remain an important part of the general visual narrative and have shown a small increase in contemporary film. We tracked the usage of dissolves in our 150 films. We found (1) that after a lull between 1970 and 1990, dissolves have become more numerous, although not nearly so common as during the studio era; (2) that shots surrounding single dissolves are fairly long compared to the median shot duration of a given film, suggesting visual preparation for a scene change before a dissolve; and (3) that since their nadir, dissolves have increasingly reappeared in clusters, reflecting a rebirth of the Hollywood montage. In terms of the functions and meanings of these montage sequences in the stream of a film's narrative, we found that contemporary films focused more on setups, altered mental states, and celebrations. Older films focused on these, but also on travel and time gaps of various sizes. [Cutting, Brunick, & DeLong, 2011, Empirical Studies in the Arts]
In her book Storytelling in the New Hollywood (1999), Kristin Thompson suggested that most films are composed of four acts of roughly equal length: the setup, the complicating action, the development, and the climax (often including an epilog). Exploring our sample of 150 Hollywood films, we found that acts shape transitions. That is, aside from the very beginnings and ends of films, fades, dissolves, and other non-cut transitions are more common in the third quarter and less common in the fourth quarter of films. In this manner, shots and transitions within acts provide the viewer with pacing information about the narrative. [Cutting, Brunick, & DeLong, 2011, Projections: The Journal for Movies and Mind] and [Cutting, Brunick, & DeLong, 2012, Projections: The Journal for Movies and Mind]
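Treating the four acts as four equal-length quarters, as the comparison above does, the tally of non-cut transitions per quarter is straightforward. A minimal Python sketch with a hypothetical helper name; unlike the analysis above, it does not set aside the very beginnings and ends of films.

```python
def transitions_per_quarter(transition_times, film_length):
    """Count transitions in each of four equal-length quarters of a film (hypothetical helper)."""
    counts = [0, 0, 0, 0]
    for t in transition_times:
        q = min(int(4 * t / film_length), 3)  # a transition at the very end counts in quarter 4
        counts[q] += 1
    return counts

# Hypothetical non-cut transition times (minutes) in a 100-minute film.
print(transitions_per_quarter([5, 30, 55, 60, 70, 99, 100], 100))  # → [1, 1, 3, 2]
```

Because Thompson's acts are defined by narrative turning points and are only roughly equal in length, equal quarters are a convenient proxy rather than an exact act segmentation.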
In his book The Way Hollywood Tells It (2006), David Bordwell noted three other changes in more recent films: (a) the use of a wider range of lens lengths, (b) the increased use of close-ups, and (c) shots from increasingly mobile cameras. In particular, close-ups systematically generate greater visual change in the framed image, and smaller, more mobile cameras create more movement across the entire image. We dub the combination of actor/object motion with camera movement visual activity. To analyze this in the same 150 films, we correlated closely spaced frames with one another along the length of each film and found increasing amounts of activity in films from 1935 to 2005. As with ASL, however, visual activity is not constant within a film but fluctuates over time. We explored the relationship between visual activity and its duration in the three visually intense films below, and proposed a limit to how much activity can be sustained in films while keeping most of the audience entertained. Moreover, we suggest that ASL and visual activity are two dimensions of increased intensity in continuity editing. [Cutting, DeLong, & Brunick, 2011, Psychology of Aesthetics, Creativity, and the Arts]

Bordwell also documented four changes in popular film since about 1960. These concern the structure and nature of shots, which are designed to capture the viewer's attention and control eye movements. The first change is the progression toward shorter average shot lengths (ASLs). Shorter shots help rivet attention to the narrative and heighten the emotional response of viewers. Diminishing ASLs, however, are not the only change in popular film that concerns shots. Using time-series and power analyses, we have analyzed patterns of shot lengths in 150 Hollywood films from 1935 to 2005. Our results revealed multiscale shot fluctuations: differential waves of shorter and longer shots progressing along the entire length of a film. Importantly, these rhythms have begun to match the waves of attention measured under laboratory conditions. We suggest that contemporary films are gradually developing shot patterns that mimic the attention patterns endogenous to our minds. Like generally shorter shot lengths, this pattern of shot editing may also serve to make films more engrossing. [Cutting, DeLong, & Nothelfer, 2010, Psychological Science; see also Cutting, DeLong, & Brunick, Cognitive Research: Principles & Implications, 2018]
