[From: Studying Contemporary American Film: A Guide to Movie Analysis, by Thomas Elsaesser and Warren Buckland, pp. 101-16; bar charts omitted]
3.4. Statistical Style Analysis: Theory
The statistical style analysis of motion pictures is primarily a systematic version of mise en scène criticism – or, more accurately, mise en shot criticism. We have already seen that Eisenstein invented the term mise en shot to focus attention on the way shots are staged – that is, the way the parameters of the shot translate the actions and events into film. The advantage of statistical style analysis over mise en scène/shot criticism is that it offers a more detached, systematic, and explicit mode of analysis. Statistical style analysis characterizes style in a numerical, systematic manner – that is, it analyzes style by measuring and quantifying it. At its simplest, the process of measuring involves counting elements, or variables, that reflect a film’s style, and then performing statistical tests on those variables.
More specifically, there are three standard aims of statistical style analysis: (1) to offer a quantitative analysis of style, usually for the purpose of recognizing patterns, a task now made feasible with the use of computer technology. In language texts, the quantitative analysis of style and pattern recognition is usually conducted in the numerical analysis of the following variables: word length, or syllables per word, sentence length, the distribution
The first aim, the quantitative analysis of style, involves descriptive statistics, and the second and third (authorship attribution and chronology) involve both descriptive and inferential statistics. As its name implies, descriptive statistics simply describes a text as it is, by measuring and quantifying it in terms of its numerical characteristics. The result is a detailed, internal, molecular description of a text’s (or group of texts’) formal variables. Inferential statistics
3.4.1. The quantitative analysis of style
One of the few film scholars to apply statistical style analysis to film is Barry Salt. In his essay ‘Statistical Style Analysis of Motion Pictures’ (Salt 1974), and later in his book Film Style and Technology (Salt 1992), Salt describes the individual style of directors by systematically collecting data on the formal parameters of their films. Salt then represents the quantity and frequency of these formal parameters in bar graphs,
3.4.2. Authorship attribution
Authorship attribution is a long-standing, traditional subject in New Testament scholarship, study of the Classics, literary scholarship as well as in the legal context (for inferring whether the defendant wrote his or her confession, or whether it was ‘co-authored’ with the police, for example). Statistical style analysis has contributed its computerised statistical methods to these areas with controversial results.
One of the principles behind authorship attribution of written texts is that the stylometrist should not focus on a few unusual stylistic traits of a text, but on the frequency of common words an author uses – particularly minor or function words, whose use is independent of the subject matter or context. These include words such as prepositions (of, to, in) as well as synonymous function words such as kind
At first it may seem odd to distinguish writing style by analyzing an author’s consistent use of frequent function words, which he or she is not conscious of using. But as A.Q. Morton argues, these words offer the stylometrist a common point of comparison between authors: ‘A test of authorship is some habit which is shared by all writers and is used by each at a personal rate, enabling his work
A writer’s style can therefore be measured in terms of a constant use of language features, or a combination of features. Just one example, on Raymond Chandler:
Chandler’s style, like that of any author, consists of the conjunction of its constituent elements … . Much of the action and color in Chandler’s stories is conveyed by dialogue, which comprises, on average, 44% of all the words in a story; for every thousand words of text, there are, on average, approximately 30 verbal exchanges, which last approximately 15 words apiece. For every thousand words of text, Chandler’s
This information identifies Chandler’s style – at least from a quantitative perspective – and can be used as the norm by which to attribute an anonymous story to Chandler.
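The kind of rate measurement described above (how often an author uses a given function word per thousand words of text) can be sketched in a few lines of Python. This is only an illustration: the function name, the simple tokenizer, and the sample sentence are assumptions introduced here, not part of any published stylometric procedure.

```python
import re
from collections import Counter


def rates_per_thousand(text, words):
    """Return occurrences of each word in `words` per 1,000 tokens of `text`."""
    # Hypothetical tokenizer: lower-case runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no words")
    counts = Counter(tokens)
    total = len(tokens)
    return {w: 1000 * counts[w] / total for w in words}


# Invented sample sentence (12 tokens), used only to exercise the function.
sample = "The key to the door was in the pocket of the coat."
rates = rates_per_thousand(sample, ["of", "to", "in"])
print(rates)
```

In practice the rates would be computed over long samples of known authorship and then compared with the rates found in the disputed text.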
If we think of the descriptive possibilities of stylometric authorship studies for film analysis, we note that, as with mise en scène criticism, statistics can be used to make auteur criticism more rigorous – that is, detached, systematic, and explicit. The auteur critic should then focus on the frequency of the common stylistic parameters a director uses – whose use is independent of the subject matter or context – rather
The inferential dimension of authorship attribution has a more limited application to film, but some films such as Poltergeist have disputed authorship (was it directed by Tobe Hooper or Steven Spielberg?). By systematically analyzing the parameters of the shots in Poltergeist, and then comparing the results to samples from Hooper’s and Spielberg’s other films, it may be possible to identify the film’s authorship (defined in terms of mise en
Two cautionary notes are in order. Firstly, the variables chosen to determine a director’s style need to be valid (Salt has covered this problem by collecting data on the variables under a director’s control). Secondly, the results need to be statistically significant, rather than due to chance occurrence. Many statistical tests are in fact tests for significance.
3.4.3. Chronology
The third area of statistical style analysis is chronology. Here again the statistics used can be either descriptive or inferential. A description quantifies and measures the changes in a body of work, usually of a single author. The point here is that an author’s work changes in a predictable manner. An inferential study uses these descriptions of change to place an author’s work into chronological order where that chronology is
In film, chronology studies can be used descriptively to identify a change in style across a director’s work. The most obvious example is charting the change of any shot parameter across a director’s career, such as average shot length, distribution of shot scales, use of camera movement, and so on.
3.5. Statistical Style Analysis: Method
In his Film Quarterly essay ‘Statistical Style Analysis of Motion Pictures’ (Salt 1974), Barry Salt aimed to identify the individual style of a director by systematically collecting data on the formal parameters of films, particularly those formal parameters that are most directly under the director’s control, including:
duration of the shot (including the calculation of average shot length, or ASL)
shot scale
camera movement
angle of shot
strength of the cut (measured in terms of the spatio-temporal displacement from one shot to the next).
Salt collected data from these parameters by laboriously going through the film shot by shot. For most of his analyses, he in fact collected data on all the shots that appear in the first 30 minutes of each film, because this is a representative sample from the film. We shall employ (and test the viability of) this practice in our statistical style analysis of The English Patient in section 3.6. Salt
After analyzing a sample of films from four directors, Salt finds that both shot scale and ASL are significant and defining characteristics of a director’s style. (Calculating the ASL involves dividing the duration of the film by the number of shots.) However, the distribution of shot scale is similar for the four directors he analyses.
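As a minimal sketch of the ASL calculation just described, the running time of the sample is divided by its shot count. The function name is hypothetical, and the figures assume a sample of exactly 30 minutes containing 356 shots (the shot count reported for The English Patient in section 3.6); the exact running time of that sample is an assumption.

```python
def average_shot_length(duration_seconds, shot_count):
    """Average shot length (ASL) in seconds: duration divided by number of shots."""
    if shot_count <= 0:
        raise ValueError("shot_count must be positive")
    return duration_seconds / shot_count


# Assumed: a sample of exactly 30 minutes (1,800 seconds) with 356 shots.
asl = average_shot_length(30 * 60, 356)
print(round(asl, 2))
```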
In a statistical style analysis of Max Ophuls’ films (Salt 1992, Chapter 22), Salt uses standard stylometric tests to analyze the distribution of stylistic parameters in each film. Firstly, he uses histograms, or bar charts, to represent the number of each shot type in each film (the number of close-ups, long shots, etc.). Secondly, he takes equal lengths of film, calculates the expected number of shots and shot types in each
1. Salt recommends intervals of one minute (i.e. 100ft intervals on 35mm film);
2. If calculating shot types one can define the intervals in terms of no. of shots (e.g. 50) and calculate the expected no. of shot types, and the actual no. of shot types;
3. Take the ASL of the whole film, and then analyze it scene by scene (each scene is defined in terms of spatio-temporal unity and in terms of events). Work out the expected no. of shots and shot types for each scene, and count the actual no. of shots. If the ASL is 10 seconds, and the scene lasts 2 minutes, the expected number of shots for that scene is 12.
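The comparison of expected and actual shot counts per scene can be sketched as follows. The expected count is the scene duration divided by the film's ASL, as in the example above (a 2-minute scene at an ASL of 10 seconds gives 12 expected shots). The function names and the two scenes are hypothetical and serve only to illustrate the arithmetic.

```python
def expected_shots(scene_seconds, asl_seconds):
    """Number of shots a scene would contain if cutting were even throughout."""
    if asl_seconds <= 0:
        raise ValueError("asl_seconds must be positive")
    return scene_seconds / asl_seconds


def compare_scenes(scenes, asl_seconds):
    """For each (label, duration_seconds, actual_shots) tuple, return
    (label, expected, actual, actual - expected)."""
    rows = []
    for label, duration, actual in scenes:
        expected = expected_shots(duration, asl_seconds)
        rows.append((label, expected, actual, actual - expected))
    return rows


# Hypothetical scenes: a 120-second scene with 12 shots, a 60-second scene with 9.
rows = compare_scenes([("scene A", 120, 12), ("scene B", 60, 9)], asl_seconds=10)
for row in rows:
    print(row)
```

A positive difference marks a scene cut faster than the film's overall rate; a negative difference marks a scene cut more slowly.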
In his analysis of Letter From an Unknown Woman, Salt notes the following:
For instance, in scene 1 five shots would be expected if the cutting were even throughout every part of the film, but in fact there are only three shots. Contrariwise, in scene no. 5, while only seven shots would be expected, there are actually fourteen. (Salt 1992: 309)
This type of analysis can also be applied to the expected no. of shot types in each scene and the actual no. of shot types. Salt’s analysis of Ophuls’ film Caught shows how this information can be useful in analyzing a film’s style:
Caught is the first Max Ophuls film in which there is a very definite reduction in the amount of variation in Scale of Shot and cutting rate from scene to scene, and this becomes very apparent if a breakdown into 100ft sections is made on a 35mm. print. After the point in the film at which Leonora has married Smith-Ohlrig and been left alone in his mansion, we have for the
Salt is able to determine, not only how the shot lengths and scales are distributed across the whole film, but also how this film compares to Ophuls’ other films (‘Caught is the first Max Ophuls film in which there is a very definite reduction in the amount of variation in Scale of Shot and cutting rate from scene to scene’). Salt develops this historical analysis by considering Ophuls’ later films, and
For example, in La Ronde, with the scene between the Young man and The Chambermaid we get, after the first 11 shots, long strings of up to 10 shots each with the same camera distance in every shot. Most of these are also in the Medium or medium Long Shot scale, and the film continues in the same manner after this scene. At one point there is a string of 15 consecutive
In summary, statistical style analysis is a very precise and accurate tool for determining both the stability and the change in style that takes place across a filmmaker’s career. Statistical style analysis focuses the research on how films are put together, rather than how they are perceived or comprehended.
Barry Salt carried out his statistical analysis by hand, which limited the types of tests he could perform on the data he collected. With the exponential growth in computer technology and software over the last decade, statistical style analysis can now be carried out using computer technology and powerful software programs. In the following analysis of The English Patient, data was still collected by hand, but it was then entered
The following analysis of The English Patient will consist of both the visual and numerical representation of data (particularly bar graphs, and frequency and percent tables). Then a few simple statistical tests will be applied: measure of the mean or average shot length; measure of the standard deviation of shot length; and the skewness of the values for shot length and shot scale. (The results will also be compared to
These tests properly apply only to ratio data (where zero is an absolute value – zero weight, zero time, etc.). Only shot length is, strictly speaking, ratio data. In the shot scale, numbers have been assigned to the categories, which means they constitute a nominal scale (e.g., Very Long Shot is 7, but there is no reason why it couldn’t be 1). However, by using the nominal scale consistently (1 =
Another stylistic issue that can be raised (but won’t be for this exercise) involves entering the number of scenes in the SPSS program, then calculating the average number of shots per scene, and therefore the expected number of shots per scene, and the actual number. Other useful data can be collected on: positional reference (for example, what position do close ups typically take in a film? – the
3.6. Statistical Style Analysis: The English Patient
Data were recorded from the following five parameters of the shot over the first 30 minutes of The English Patient: shot length, shot scale, camera movement, camera direction, and camera angle. For comparative purposes, the same data were recorded from the first 30 minutes of Jurassic Park. Barry Salt has already argued that 30 minutes is a representative sample to analyze. To test this hypothesis, we shall compare the results
The statistical tests applied in this section to the collected data are the simplest ones available on SPSS: calculating the frequency of variables (that is, counting them), representing those frequencies as percentages, calculating the mean, the standard deviation, and the skewness of the results.
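For readers without access to SPSS, the same descriptive measures can be sketched with Python's standard library. The shot lengths below are invented for illustration and are not data from either film. The skewness function uses the bias-adjusted sample formula; which variant a given statistics package reports is an assumption here, so values may differ slightly from package output.

```python
from collections import Counter
from statistics import mean, stdev


def frequency_table(values):
    """Return (value, count, percent) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(v, counts[v], 100 * counts[v] / n) for v in sorted(counts)]


def sample_skewness(values):
    """Bias-adjusted sample skewness (assumed variant; see lead-in)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("all values are identical")
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


# Hypothetical shot lengths in seconds; one long take among many short shots.
shot_lengths = [1, 2, 2, 3, 3, 3, 4, 5, 8, 29]

table = frequency_table(shot_lengths)
avg = mean(shot_lengths)
sd = stdev(shot_lengths)
skew = sample_skewness(shot_lengths)

for value, count, percent in table:
    print(f"{value:>3} s  {count:>3}  {percent:5.1f}%")
print(avg, round(sd, 2), round(skew, 2))
```

A single long take among many short shots pulls the mean above the typical shot length and produces a positive skewness, which is the pattern described for shot length in this section.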
The first 30 minutes of The English Patient (up to the moment where Caravaggio introduces himself to Hana, and they go into the kitchen of the monastery) consists of 356 shots. In terms of shot length, the main values are to be found in Table 1.
The first column indicates shot length values (1 second, 2 seconds, and so on); the second column the number of times this shot length appears in the first 30 minutes of The English Patient (1 second shots appear 41 times, 2 second shots 84 times); and the third column indicates the percentage of shots with each value (1 second shots constitute 11.5% of all the shots in the sample, while
Table 1 only represents shots of length 1 to 10 seconds. There are additional values, up to 129 seconds (the opening credit sequence shot), but the frequency of shot lengths above 10 seconds is usually very small – one or two examples. Shots of length 1 to 10 seconds constitute 92% of all the shots in the sample.
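The share of shots falling within a given range of lengths, such as the 1 to 10 second figure quoted above, can be computed as in the sketch below. The function name and the list of shot lengths are hypothetical and are not data from the film.

```python
def share_in_range(lengths, low, high):
    """Percentage of shots whose length lies between low and high (inclusive)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    inside = sum(1 for x in lengths if low <= x <= high)
    return 100 * inside / len(lengths)


# Hypothetical shot lengths in seconds.
film_a = [1, 1, 2, 2, 3, 3, 4, 6, 9, 40]

short_share = share_in_range(film_a, 1, 3)
ten_second_share = share_in_range(film_a, 1, 10)
print(short_share, ten_second_share)
```

Running the same function over two films with identical ranges gives directly comparable percentages.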
Table 2 shows that the mean (the average) value of shot length of this sample is 5.1. In other words, the average shot length (ASL) of the film is 5 seconds (there is, on average, a cut every five seconds). The standard deviation of shot length is 8, indicating a wide dispersion of values around the mean, while the skewness of values is 10.97, indicating a very strong positive skewness of
The value of this information may not be readily apparent. One of the best ways to make sense of it is to conduct a comparative analysis. The first 30 minutes of Jurassic Park (up to the end of the scene where Grant, Sattler, Malcolm, and Gennaro see a dinosaur egg hatch in the lab) consists of 252 shots, in comparison to The English Patient’s 356, a difference of 104 shots. This
We can make many other comparisons. Jurassic Park’s values for shot length can be found in Tables 3 and 4. The shot lengths in the range 1 to 10 seconds only constitute 80% of all the shots in the sample, suggesting that Spielberg’s film has a wider variety of shot lengths. This is reflected in a skewness value of 2.68 (the mean value is 7 seconds and standard deviation is 6.69).
We can explore this difference in shot length values further. In The English Patient, 52% of the shots fall in the range 1 to 3 seconds. In Jurassic Park, only 35% of the shots fall within this range. We have to include the values up to 5 seconds before Jurassic Park reaches the same percentage (in fact shots falling in the range 1 to 5 seconds constitute 54% of the film’s
With the above tests we are simply scratching the surface of what can be achieved with statistical style analysis. It is also possible to apply the same tests to the results obtained from the other four parameters of the shot. But because this would make the chapter even longer than it already is, we shall instead consider camera movement and shot scale. With the data collected on camera movement, we can
The still camera is by far the most common value (85% of all shots), with only 15% of the shots containing camera movement. This seems to confirm John Seale’s claim that he likes to keep the camera still.
In comparison, Jurassic Park contains the following values for camera movement:
These results may surprise some readers, especially the high percentage of still shots in an action blockbuster. But the percentages are significantly different to The English Patient, since Jurassic Park has 11% more moving shots than The English Patient.
Finally, in terms of shot scale, the distribution in both films conforms to what statisticians call a ‘normal distribution’, with high values in the middle (the mean) and progressively lower values on either side (see Figure 3). The result of these normal distributions is that the standard deviation and skewness values are low. Both directors favour medium close ups (28% in Jurassic Park, and 33% in The English Patient) and
In summary, The English Patient contains a short range of shot lengths averaging out at 5 seconds, heavily biased towards shots of 1-3 seconds, with a very high percentage of still shots. Jurassic Park has a much wider distribution of shot lengths, which average out at 7 seconds, with a bias (but not as much as in The English Patient) towards shots below this value, with a slightly higher percentage of
One final task needs to be carried out to check the viability of the above results – the representative nature of the first 30 minutes of a film. Here we shall simply note major similarities and differences between a statistical style analysis of the first 30 minutes of Jurassic Park, and an analysis of the whole film. (When two figures are quoted, the first one always refers to the 30 minute
The information that the SPSS software has yielded is simply the raw material for writing about the style of The English Patient, and for comparing its style to the style of other films. The above analysis only presents a small sample of data and even fewer tests on the stylistic patterns to be found in the film. The primary difference between this analysis and more conventional mise en scène analysis
References
Farringdon, Jill (1996), Analysing for Authorship: A Guide to the Cusum Technique (Cardiff: University of Wales Press).
Foster, Don (2001), Author Unknown: On the Trail of Anonymous (London: Macmillan).
Kenny, Anthony (1982), The Computation of Style (Oxford: Pergamon Press).
Salt, Barry (1974), ‘Statistical Style Analysis of Motion Pictures’, Film Quarterly, 28, 1: 13-22.
____ (1992), Film Style and Technology: History and Analysis (London: Starword).
Sigelman, Lee, and William Jacoby (1996), ‘The Not-So-Simple Art of Imitation: Pastiche, Literary Style, and Raymond Chandler’, Computers and the Humanities 30, 1: 11-28.