Recently, Yuri and I have been focusing our Cinemetrics efforts on two fronts.
The first, and more daunting, issue is that the data used on the website is of varying quality: it is not necessarily precise, accurate, or complete. We seek to define what these three terms mean in the context of the project's goals and the analysis of cutting rates. Unfortunately, this work has been moving very slowly, since it is not a trivial task.
The second issue we've been addressing is the exploration of new analysis techniques. Specifically, I have been working on ways to overcome a limitation of other analysis techniques used on the Cinemetrics data. Previous researchers have focused on one number when making comparisons between films or groups of films: the average shot length (ASL). This is not an unreasonable statistic to use for comparing two films; it provides a broad
Other researchers--Barry Salt is a prominent example--have attempted to describe the relative proportions of shots of different lengths within each film. That is, they have attempted to describe the distribution of shot lengths by comparing histograms of shot lengths for various films to histograms produced by theoretical distributions, such as the lognormal distribution. Although this technique may be useful, it still summarizes data over the whole film. That is, it does
Analyzing how shot lengths change over the course of a film has long been a goal of the Cinemetrics project. To this end, Yuri and Gunars developed their inverted shot length versus shot number graph. This graph has many useful properties, and the use of an inverted y-axis was innovative. They continued to develop this technique by introducing features such as a moving average graph, to smooth noise, and
There are two primary issues with the shot length versus shot number graph. The first is that it is not necessarily intuitive. Because the x-axis is shot number, not time, looking at the middle of the graph does not necessarily mean looking at the middle of the film. In fact, suppose we had a film of 100 shots. The first 99 shots occur within the first minute, and the 100th shot is 99
The ideal solution is to somehow plot shot length versus time. This poses several problems. First: the Cinemetrics data is composed of shot number-timecode pairs. Thus we can calculate shot length and time code for each shot. Simply plotting these data points using a scatter plot is a potential way of visualizing this information. However, this still underemphasizes long shots. In our hypothetical example, we would also have no data
One potential solution I have devised is to generate a shot length versus fraction of film dataset. By partitioning each film into the same number of equal-length segments, we can calculate the average shot length for each segment, regardless of whether or not there was a cut in that segment. For example, we can divide our hypothetical 100-minute film into 100 equal partitions. For each segment, we can calculate
Suppose we have a second film of a different length. We can apply the same method (partition, count shots, then calculate ASL) to each partition in the new film. As long as we use the same number of partitions, the values for the two films are directly comparable. That is, if we use 100 partitions, the 57th partition for each film always represents the time period 57% of the length through
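The partition, count, and average steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code we actually run: the function name `asl_by_partition` is hypothetical, the input is assumed to be a list of the times (in seconds) at which each shot ends, and a shot that straddles a partition boundary is assumed to be counted fractionally, in proportion to how much of it falls inside each partition. That last assumption is one way to get a value for a partition that contains no cut at all.

```python
from typing import List


def asl_by_partition(cut_times: List[float], n_partitions: int = 100) -> List[float]:
    """Average shot length (seconds) in each of n equal-length partitions.

    cut_times: ascending end time of every shot, in seconds; the film is
    assumed to start at 0 and to end at cut_times[-1].
    """
    film_length = cut_times[-1]
    width = film_length / n_partitions      # duration of one partition
    shot_counts = [0.0] * n_partitions      # fractional shots per partition

    start = 0.0
    for end in cut_times:
        length = end - start
        if length > 0:
            first = min(int(start / width), n_partitions - 1)
            last = min(int(end / width), n_partitions - 1)
            for k in range(first, last + 1):
                # Portion of this shot that lies inside partition k.
                lo = max(start, k * width)
                hi = min(end, (k + 1) * width)
                if hi > lo:
                    shot_counts[k] += (hi - lo) / length
        start = end

    # Assumed definition: ASL = partition duration / (fractional) shot count.
    return [width / count for count in shot_counts]


# The hypothetical film: 99 shots in the first minute, then one 99-minute shot.
hypothetical = [60.0 * i / 99 for i in range(1, 100)] + [6000.0]
curve = asl_by_partition(hypothetical, 100)
```

Under these assumptions, the first partition of the hypothetical film has an ASL of roughly 0.6 seconds, and each of the remaining 99 partitions reports the full length of the long shot (5,940 seconds), so the long shot is no longer underrepresented. Running the same function on a second film with the same `n_partitions` gives values that line up partition by partition.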
We do have a few reservations regarding this method. First, if we are comparing two films of dramatically different lengths, say 10 minutes and 100 minutes, then a significant increase in shot length from one partition to the next means very different things in the context of each film. In the shorter film, it means that the change occurred over 6 seconds; in the longer film, it means that the change occurred over
Using this technique, we have generated ASL by partition data for every film in the Cinemetrics database. We are experimenting with comparisons, both between films and between groups of films. In fact, using this technique we can even generate a "curve" that represents shot length versus fraction of film complete for the average film. That is, given a group of films (or the whole database), we can average the ASL
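As a rough sketch of how such an "average film" curve could be computed, the snippet below takes one ASL-by-partition list per film and averages them partition by partition. The function name `average_curve` is hypothetical, and the use of a plain arithmetic mean is an assumption made for illustration; a median or a length-weighted mean would be equally easy to substitute.

```python
from typing import List, Sequence


def average_curve(curves: Sequence[Sequence[float]]) -> List[float]:
    """Partition-wise mean of several ASL-by-partition curves.

    curves: one sequence per film, each holding the ASL (seconds) of every
    partition. All films must use the same number of partitions.
    """
    if not curves:
        raise ValueError("need at least one film")
    n = len(curves[0])
    if any(len(c) != n for c in curves):
        raise ValueError("all films must use the same number of partitions")
    # Assumption: arithmetic mean across films for each partition.
    return [sum(c[k] for c in curves) / len(curves) for k in range(n)]


# Two made-up films, two partitions each (values are illustrative only).
group_curve = average_curve([[2.0, 4.0], [4.0, 8.0]])
```

The check on partition counts reflects the requirement stated earlier: values are only comparable across films when every film is divided into the same number of partitions.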
We have our own reservations regarding this technique and would appreciate any feedback. We believe, however, that it will open Cinemetrics' data to new analysis techniques, particularly those that rely on having sequences of data measured at uniform time intervals--i.e., time series analysis and other advanced techniques. Input from statisticians on the validity of this method would be particularly helpful.