Barry Salt: Graphs and Numbers

Back in 1974, when I initiated the systematic study of film style using statistics, which I called ‘Statistical Style Analysis’ (as I still do), Sight and Sound rejected an article showing my first results, which of course contained graphs and tables of numbers, although they had just published my piece putting forward a general theoretical framework for film analysis, ‘Let a Hundred Flowers Bloom’. Discussing this quite some time later with Ray Durgnat, I said to him that most people’s minds just freeze up when they see a graph. He riposted that it was worse than that, for most people’s minds freeze up when they see a decimal point. Fortunately, ‘The Statistical Style Analysis of Motion Pictures’ was published later that year by Film Quarterly, because the editor had had a bit of a scientific education before he got into writing about movies.

I was already well aware of the problem that most people have with mathematics, so although I briefly indicated in ‘The Statistical Style Analysis of Motion Pictures’ what I thought was the nature of shot length distributions in feature films, I did not go further into the matter until much later, but restricted myself to only the most basic use of statistics when dealing with other areas of film style, because

Nick Redfern seems to be suggesting banning the use of the concept of the Average Shot Length, but he surely can’t be serious. Such an idea seems reminiscent of the Catholic Church continuing its ban on the discussion of the idea of the earth going round the sun, even after the concept was in wide use. As I have shown, in ‘The Metrics in Cinemetrics’ and elsewhere, the combination of the

Another concept being put up for discussion is the idea of naming some members of a distribution as ‘outliers’, and then excluding them from consideration. I have already indicated why this is a bad idea in ‘The Metrics in Cinemetrics’. As another example, I will use the shot lengths for The Grapes of Wrath, as recorded by myself and placed on the Cinemetrics database.

Figure 1

That little bar on the end of the graph represents the three shots in the film of 100 seconds or more. They are actually 100 seconds, 104 seconds, and 160 seconds in length respectively. You might think they are far detached from the rest of the shots in the film, which are all shorter than 70 seconds, but actually the theoretical distribution that best approximates the actual distribution of shots predicts

The series of shot lengths making up any film is also unique, and so is the distribution of lengths resulting from them. They make up the whole population with which our statistical analysis of a film deals, and are not a random sample from some larger population. So any test or method which assumes that they are part of a larger population is being misapplied. However, the shape of the distributions

Figure 2

The coincidence of the three graphs shows that they do indeed have almost the same shape, and the small discrepancies correspond to the general way the shape of shot length distributions changes slightly as we come up to more recent times and faster cutting. In the case of the Lognormal distribution, to which most film shot length distributions approximate if the ASL is less than somewhere around 15 seconds, the

I am very pleased to see that Mike Baxter’s detailed paper endorses the results and positions I have put forward in ‘The Metrics in Cinemetrics’ and elsewhere. The one part of his work I have some small doubts about is his analysis of what he calls ‘lumpy’ distributions. Of the twelve distributions he discusses in this context, not all of them look lumpy to me.

I would agree that when looking at the shot length distribution for Pursuit to Algiers directly one can see lumps:

Figure 3

but it does not look at all like a bimodal distribution to me. I see no second modal peak standing out from amongst the many small lumps. The only one amongst Mike Baxter’s twelve examples quoted that does have a suggestion of a real second maximum peak when we look at the actual distribution is Harvey:

Figure 4

One could take it, perhaps, that there is a second distribution having its mode at 16 seconds, and the cross-over between the two is around 12 seconds, but then how do you tell which shot in the film is in which distribution? As far as I remember the film, the scene dissection was rather clumsy, and in particular the handling of the long takes.

Anyway, kernel density estimates are done by putting the distribution into a very small number of class intervals, so they could be creating something that is not really there on the finer scale of the actual distributions. Look at the shot length distribution of Foreign Correspondent:

Figure 5

That looks fairly smooth to me as shot length distributions go. Maybe the small deficiency in shots of length between seven and eight seconds freaked out Mike Baxter’s KDE calculation for some reason.
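How many lumps a smoothed estimate shows depends heavily on how much smoothing is applied. The following is a minimal sketch of that sensitivity, assuming a Gaussian kernel and two arbitrary bandwidths. The shot lengths are invented for illustration and come from no film discussed here; the function names are hypothetical, and nothing in it reconstructs Mike Baxter’s actual calculation.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function giving a Gaussian kernel density estimate at x."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data
        )

    return density


def count_local_maxima(density, lo, hi, step=0.1):
    """Count interior local maxima of the density on an evenly spaced grid."""
    n_points = int(round((hi - lo) / step)) + 1
    ys = [density(lo + i * step) for i in range(n_points)]
    return sum(
        1 for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] > ys[i + 1]
    )


# Invented shot lengths in seconds (an assumption, not measured data).
shots = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7,
         9, 9, 10, 11, 12, 14, 17, 22]

# Same data, two different amounts of smoothing (bandwidths in seconds).
narrow_peaks = count_local_maxima(gaussian_kde(shots, 0.3), 0.0, 30.0)
wide_peaks = count_local_maxima(gaussian_kde(shots, 5.0), 0.0, 30.0)

print("peaks with bandwidth 0.3 s:", narrow_peaks)
print("peaks with bandwidth 5.0 s:", wide_peaks)
```

With the narrow bandwidth the estimate shows more peaks than with the wide one, although the underlying shot lengths are identical. The number of apparent modes is therefore partly a property of the smoothing chosen and should be checked against the actual distribution.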

Finally, I repeat the knock-down counterexample I gave in ‘The Metrics in Cinemetrics’ to Nick Redfern’s comparison of The Scarlet Empress (1934) and The Lights of New York (1928): my comparison of the distributions for The Lights of New York and The New World (2005). Both of the latter films have median shot lengths of 5.1 seconds, so on this ground alone you might think they have similar distributions, but

Figure 6

Figure 7

The crucial feature is that in The Lights of New York there are a substantial number of shots with length greater than 50 seconds, in fact 12 of them, represented by the tall bar at the right end of the graph, whereas there is only one for The New World. The reason for this substantial number of long takes in The Lights of New York is that it is subject to the technical
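The general point, that two films can share a median shot length and still differ greatly in their long takes, can be sketched with a few lines of Python. The two lists below are invented for illustration and are not the shot lengths of either film. The 50-second threshold follows the text, while the function name and dictionary keys are assumptions of this sketch.

```python
from statistics import mean, median


def summarize(shot_lengths, long_take_threshold=50.0):
    """Return the median, the ASL (mean), and the count of long takes."""
    return {
        "median": median(shot_lengths),
        "asl": mean(shot_lengths),
        "long_takes": sum(
            1 for s in shot_lengths if s > long_take_threshold
        ),
    }


# Invented shot lengths in seconds (an assumption, not measured data).
film_a = [2.0, 3.0, 4.0, 4.5, 5.0, 6.0, 55.0, 70.0, 90.0]
film_b = [2.0, 3.0, 4.0, 4.5, 5.0, 6.0, 7.0, 8.0, 9.0]

summary_a = summarize(film_a)
summary_b = summarize(film_b)

print("film_a:", summary_a)
print("film_b:", summary_b)
```

Both invented lists have a median of 5.0 seconds, but the first has three shots over 50 seconds and the second has none, and the ASL differs accordingly. The median alone cannot register that difference.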

Barry Salt, 2012