DISCUSSION TOPIC
"INTRO TO STATISTICS"

Posted by: Nick Redfern Date: 2009-07-29

I've just come across an interesting and useful paper intended as an introduction to statistics for medical researchers, but which is clear enough for anyone to understand. It provides some basics on the difference between samples and populations, statistics and parameters, and estimation. This is a good place to start if you've never tried to do statistics before, and it has a list of useful references (check out the references to Altman, Bland, and Moses). It is, of course, free to access.

Ref: Douglas Curran-Everett, Sue Taylor, and Karen Kafadar, Fundamental concepts in statistics: elucidation and illustration, Journal of Applied Physiology 85 (3) 1998: 775-786. Available online: http://jap.physiology.org/cgi/content/full/85/3/775. [If the link does not work then you can access the paper by searching for it through PubMed.]

Replied by:Adelheid Heftberger Date:2009-07-29

Thanks a lot - downloaded it already, the link works fine. Glad you are here in Cinemetrics now.

 

Replied by:Yuri Tsivian Date:2009-08-02

Thanks, Nick. The link worked for me, too. I can't say I got all the details, but the basic message is clear: we need statistics, and we must be wary of misusing them. The encouraging thing is that, as it turns out, not only we at Cinemetrics but also physiologists and biologists are said to be overly cavalier about statistics and its tools. Poor consolation, but still a consolation. Another piece of good news is that there is always hope of help, as when Nick Redfern helped Heidi and me find our way out of the "Heftberger correlation" conundrum. It is nice to know that those of us who collect data are monitored by someone who knows how to make sense of them; see, for instance, Redfern's recent review of the Bordwell/Thompson vs. Salt controversy. Again: I read the essay that Nick Redfern recommends in his post above, and let me quote one of its final passages: "This review, written by a physiologist and two statisticians, embodies one of the most basic notions in all science: collaboration." Collaboration is what we count on, too.

Replied by:Barry Salt Date:2009-08-08

Nick Redfern has helpfully checked 40 shot length records from the Cinemetrics database in his piece "Some brief notes on cinemetrics II" on his website for how well they conform to the Lognormal distribution, and he has found that 22 of them pass the strict test he is using. He had already done three Chaplin films in his piece "Testing normality in cinemetrics" in the same place, and altogether that gives 24 out of 43 films that conform to the Lognormal distribution. That proportion is rather similar to my results in my article "The Numbers Speak", which is on the Cinemetrics site. So we can say that it is probable that more than half of all films with an ASL of less than 20 seconds conform to the Lognormal distribution. It is worth noting that of the remainder, a substantial proportion just miss out on being Lognormal according to both Nick's and my analyses.

This is valuable information, as it leads to questions about why this distribution appears for shot lengths, and I note in general terms what might be the explanation in "The Numbers Speak". This is an area that needs more investigation.

To increase knowledge about films, we want causal explanations for the features of shot length distributions, and you are not going to get them by using non-parametric statistics and throwing away the information above about their Lognormality.

To give another example, decades ago in "Film Style and Technology" (page 219), I identified a major stylistic shift in the work of Fritz Lang when he went to Hollywood, just by looking at the Closeness of Shot histograms shown in that piece, where the change is obvious in a way that is not the case when looking at a simple table of the figures. That is what graphical representations like mine and those of Cinemetrics are for. (Nick Redfern has recently confirmed this change in Lang's work using a statistical test in his "Power functions and the mean relative frequency of shot scales".) In this case possible causal explanations are obvious, and hardly need stating. That is, Fritz Lang looked at Hollywood films when he went there, and knew he had to conform, up to a point, if he was to continue working there. Plus there would be some pressure from Hollywood cameramen to put the camera at a distance that they would ordinarily use in any particular set-up.

In other cases, such as the change in shape of the distributions from Boetticher's "Seven Men From Now" (1956) to his "Ride Lonesome" (1959) on the Cinemetrics database, as indicated by the massive change in their Standard Deviations, and briefly discussed by Yuri and myself, more investigation is needed.

The films that seriously fail to conform to the Lognormal distribution must do so for a reason. That is, the makers must have made at least a semi-conscious effort to break away from the norm of the Lognormal distribution, so identifying them is important.

The Lognormal distribution, as you can read in "The Numbers Speak", has its shape ordinarily defined by two parameters, the median and the shape factor. So the median is a good measure to have in this case, as in others. However, in the case of the Lognormal distribution, these two parameters can be derived from the mean (that is, the ASL) and the standard deviation by inverting the two relations:

$$\text{Mean} = \text{Median} \cdot \exp\left(\tfrac{1}{2}s^{2}\right)$$

$$\text{Standard deviation} = \text{Median} \cdot \sqrt{\exp\left(2s^{2}\right) - \exp\left(s^{2}\right)}$$

where s is the shape factor.

(Sorry about that barbarous notation, but you can't write the equations down neatly in ANSI code. If you want to get further into this, just google "Lognormal distribution".)

In other words, all you need in theory is the standard deviation and the ASL.
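As a minimal sketch of that inversion, assuming the shot lengths follow a Lognormal distribution exactly, the two relations can be solved in Python for the median and the shape factor given only the ASL and the standard deviation. The function name is hypothetical and not part of Cinemetrics.

```python
import math

def lognormal_params_from_asl_sd(asl: float, sd: float) -> tuple[float, float]:
    """Return (median, shape factor s) of the Lognormal distribution
    whose mean is `asl` and whose standard deviation is `sd`.

    From Mean = Median * exp(s^2 / 2) and
    SD = Median * sqrt(exp(2 s^2) - exp(s^2)) it follows that
    SD / Mean = sqrt(exp(s^2) - 1), which can be solved for s.
    """
    if asl <= 0 or sd <= 0:
        raise ValueError("ASL and standard deviation must be positive")
    cv = sd / asl                             # coefficient of variation
    s_squared = math.log(1.0 + cv * cv)       # exp(s^2) = 1 + cv^2
    median = asl / math.exp(s_squared / 2.0)  # invert the mean relation
    return median, math.sqrt(s_squared)

median, s = lognormal_params_from_asl_sd(asl=5.0, sd=5.0)
print(f"median = {median:.2f} s, shape factor = {s:.3f}")
# median = 3.54 s, shape factor = 0.833
```

For a film whose ASL equals its standard deviation, the implied shape factor comes out at about 0.83.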

So for Lognormal distributions the median IS related to the ASL, and hence the ASL is useful, too. In any case, the ASL has been adopted by many other people as a standard measure for film statistics since I invented it 30 years ago. This is partly because it is easy to get. You just have to know how many shots there are in a film, and the film's length, to work it out. That is how I come to have a database of over 9,000 ASLs from complete films.

I consciously chose to call it the Average Shot Length, rather than the more correct Mean Shot Length, because I reckoned that a smaller number of the many rather innumerate people in film studies would be put off by the former name.

You can only get the median by listing all the shot lengths in a film, as in Cinemetrics.
 

Replied by:Yuri Tsivian Date:2009-08-09

Some footnotes to the above posts. Here is where the essay "The Numbers Speak" by Barry Salt, which Barry refers to, is found. If I understand it correctly, Barry and Nick Redfern mostly agree about whether or not this or that film fits the log normal distribution, but differ on which kinds of statistical tests are best suited to Cinemetrics data. Barry opts for parametric statistics, while Nick, in his essay "The distribution of shot lengths in the films of Charles Chaplin", comes to this conclusion:

"Parametric statistical tests assume that sample data is drawn from an underlying distribution. Shot length data for motion pictures is typically not normally distributed, although in some cases it may be log normally distributed. This is not the case for all films (even though the data is positively skewed), and so the assumption of a log normal distribution is not universally valid. Taking into account the variability of shot length distributions, it is recommended that nonparametric tests that make no assumptions about the distribution of data are appropriate in analysing film style."

It will take the rest of us some time and effort to really understand what is at stake here, though with Barry's last post (and with an introductory statistics book in hand) I am beginning to get it. Let me ask a few lay questions. We all more or less agree that ASL (cutting rate) and Standard Deviation (cutting swing) are two variables that are relevant to the cutting style of this or that film. The good news is that the Cinemetrics measurements, as Gunars Civjans designed them, provide us with both. Another piece of good news is that, as Barry Salt says above, both can be used as interrelated parameters from which the median and the shape factor, the two things needed to arrive at the log normal graph, can be automatically derived. Am I still on track, Barry, Nick?

If so, a question: does it mean that, with the two formulae in hand, Gunars could write a program that will, if needed, instantly convert any submission into a graph showing how well, if at all, this film conforms to the log normal distribution? Would it be of much help to have this capability, say, in the Labs section of the site?

More generally, are there other ways to interconnect and graphically represent the cutting rate (ASL) and cutting swing (StDev) of a film? There are films in which their values are equal or close, and there are films and filmmakers that show a tendency for the cutting swing to be lower or higher than the corresponding cutting rate. Ozu is usually on the lower side (see Matt Hauske's interpretation of this), while directors like Welles, Ophuels and Godard are typically on the high end of the cutting swing, sometimes much higher than their corresponding ASL. In my forthcoming post (my talk at the Cognitivist conference in Copenhagen) I will address this issue from the point of view of the history of film style, but I suspect there may be, aside from stylistic choices, some kind of mathematical interdependency between the mean and the standard deviation. Are there any books where I could look this up?

Replied by:Barry Salt Date:2009-08-09

No, there is no general connection between the mean and the standard deviation for statistical distributions in general. That is why we have them both. The idea of a lognormal correlation check is very attractive, but doing it may be a bit complicated.

Replied by:Nick Redfern Date:2009-08-12

It's also useful to know how not to do statistics:

http://www.talkingsquid.net/archives/870 demonstrates how to lie with graphs.

Replied by:Nick Redfern Date:2009-08-13

I've just come across another excellent resource for statistics that is free to access.

OnlineStatistics: An Interactive Multimedia Course of Study has been developed by Rice University and the University of Houston, and provides a comprehensive, clear, and well-thought-out introduction to statistics with worked examples, a glossary explaining statistical terms, and some online calculators. You can even download it if you wish.

It can be accessed at: http://onlinestatbook.com/index.html.

Replied by:Barry Salt Date:2009-08-14

As Yuri has noticed, for films in general the ASL and the Standard Deviation (STD) are usually of a rather similar size. For the Lognormal distribution this ratio of (STD)/(ASL) depends solely on the Shape Factor through a slightly complicated relationship which you can look up. As I noted in "The Numbers Speak", the Shape Factor tends to be around 0.8 for shot length distributions. This is the source of the effect under discussion, but not the reason for the existence of the effect. Working out values, I find that a value of the Shape Factor of 0.83 corresponds to a ratio of (STD)/(ASL) of 1. A Shape Factor of 1.0 to a (STD)/(ASL) ratio of 1.3, a Shape Factor of 0.7 to an (STD)/(ASL) ratio of 0.8, and so on.
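The ratio Salt mentions follows directly from the two Lognormal relations given earlier in the thread: dividing the standard deviation by the mean cancels the median. A short Python sketch (the function name is hypothetical) reproduces his three worked values to rounding:

```python
import math

def cv_from_shape_factor(s: float) -> float:
    """(STD)/(ASL) for a Lognormal distribution with shape factor s.

    Dividing SD = Median * sqrt(exp(2 s^2) - exp(s^2)) by
    Mean = Median * exp(s^2 / 2) leaves sqrt(exp(s^2) - 1);
    the median cancels, so the ratio depends on s alone.
    """
    return math.sqrt(math.exp(s * s) - 1.0)

for s in (0.7, 0.83, 1.0):
    print(f"shape factor {s:.2f} -> STD/ASL = {cv_from_shape_factor(s):.2f}")
# roughly 0.80, 1.00 and 1.31
```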

I think it would be useful and instructive to get the distribution of the (STD)/(ASL) ratio values for the films in the Cinemetrics database, which should be very easy. I predict it will have the Normal distribution shape, but you never know.

Replied by:Nick Redfern Date:2009-08-16

Dividing the standard deviation by mean shot length gives you the coefficient of variation.

 

Based on samples of Hollywood films produced between 1920-1928 (n=20, all silent) and 1929-1931 (n=30, all sound), the coefficient of variation for the silent films is ~0.9 (95% CI: 0.8, 1.0) and for the sound films is ~1.2 (95% CI: 1.1, 1.3).

 

This would indicate that there is greater variation in the shot lengths of the sound films than the silent films.
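Redfern does not say how his intervals were computed. One common option is a percentile bootstrap over films, sketched below in Python under that assumption; the function name is hypothetical and the CV values are invented for illustration, not his data.

```python
import random
import statistics

def bootstrap_mean_ci(values, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Each resample draws len(values) films with replacement and records
    the mean; the interval is read off the sorted resampled means.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    lo = means[int(alpha * n_resamples)]
    hi = means[int((1.0 - alpha) * n_resamples) - 1]
    return statistics.fmean(values), lo, hi

# Hypothetical per-film CVs, for illustration only.
silent_cvs = [0.82, 0.95, 0.88, 1.02, 0.79, 0.91, 0.86, 0.97, 0.84, 0.93]
mean, lo, hi = bootstrap_mean_ci(silent_cvs)
print(f"mean CV = {mean:.2f} (95% CI: {lo:.2f}, {hi:.2f})")
```

Comparing two such intervals (silent vs. sound) is an informal check; a formal comparison would need a two-sample test.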

Replied by:Barry Salt Date:2009-08-22

For the 181 silent features (1913-1929) in the Cinemetrics database, I get 0.97 for the Coefficient of Variation, and for the 1607 sound features in the database, I get a Coefficient of Variation of 1.14. This is a difference, but a smaller one than Nick Redfern gets. This is partly because the years 1929-1931 are dominated by Charles O'Brien's data, which is biased towards musicals. These are always slower cut than the average.

More on this in due course.

Replied by:Yuri Tsivian Date:2009-08-24

Another lay question, but then, the Discussion board is the place for those. Even though the results of Redfern's and Salt's analyses do not coincide in their fractional values, both seem to agree in one respect, namely, that 1 (the whole number) could serve us as a workable "rule-of-thumb" indicator of whether the film in question shows a bias towards a "uniform cutting strategy" (a Coefficient of Variation smaller than 1) or towards a "diverse cutting strategy" (a Coefficient of Variation greater than 1).

This seems to tally with what Salt says above, namely, that a Shape Factor of 0.83 corresponds to a ratio of (STD)/(ASL) of 1, and also with what Redfern says above about the coefficient of variation of Hollywood silent vs. sound films. Isn't this another good reason to posit (STD)=(ASL) (or (STD)/(ASL)=1) as a dividing point in our thinking and talking about film editing? What they say also agrees with the language of crosses (the Greek vs. Latin vs. St George crosses) used earlier on to label the three types of editing at Cinemetrics. We could now jettison these medieval symbols in favor of sounder ones, like the Shape Factor and Coefficient of Variation proposed by Salt and Redfern.

If this makes sense Gunars might think of introducing another index for each submitted film, called the "Swing Factor," for instance: movie "So-And-So," directed, say, by Ophuels, the Swing Factor = 1.8; movie "This-Or-That" directed, say, by Ozu, the Swing Factor = 0.6.

Whether this makes a lot of sense statistically I don't know, but this dividing point sounds to me like a good tool for assessing which style of cutting this or that film leans toward. Any comments?

Replied by:Barry Salt Date:2009-08-31

As I predicted, the distribution of values of the Coefficient of Variation (standard abbreviation Cv) for film shot lengths is fairly close to being a Normal distribution, as shown here.

As I said previously, the mean value of Cv for sound features in the Cinemetrics database is 1.14. The correlation between the actual experimental values from the Cinemetrics database and the theoretical Normal distribution, given by R², is 0.97, which is fairly good.

Now, this distribution suggests to me that in general film-makers are UNCONSCIOUSLY working for some sort of standard mix of long and short shots in their scene dissection, but don’t always hit it. However, there are some who want to put in extra long takes beyond the normal mixture of lengths, and they are represented in the vestigial right tail of the graph, which departs from normality. The numbers of these are small, but who they are is important. I list the films with Cv greater than 1.9 in chronological order:

Night Birds
The Skin Game
Citizen Kane
Macbeth
The Fall of Berlin, Pt. 2
Touch of Evil
Ride Lonesome
Verboten!
Who’s Afraid of Virginia Woolf?
Week End
Catch 22
Husbands
Paris, Texas
Wild at Heart
Amores Perros
Children of Men

You will notice that after two early sound films, there are none from the rest of the ‘thirties. Although nearly all these films have long ASLs, Amores Perros (ASL = 4.9) shows that the association of large Cv with large ASL is not necessary. Conversely, long ASLs do not necessarily produce large values of Cv, as shown by Werckmeister Harmonies, with an ASL of 219 seconds but a Cv of 0.5. That is, a film-maker who is doing nothing but long takes will not create a film with large Cv, though they will probably create one with small Cv.

By the way, I don’t think it is a good idea to give an invented name to something that already has a technical name, if that is what Yuri Tsivian is suggesting. But then, my pet peeve is with film studies people who use their invented term “Non-diegetic music” for what film-makers already call “Underscore”.

Replied by:Barry Salt Date:2009-08-31

Got that list of films with Cv 2.0 or greater wrong. Sorry, here is the correct one.

The Front Page
Rain
Citizen Kane
Touch of Evil
Lady From Shanghai
Macbeth
Forty Guns
Ride Lonesome
Verboten!
Who’s Afraid of Virginia Woolf?
Week End
En Passion
Tout va bien
Electra Glide in Blue
1900
Paris, Texas
Wild at Heart
Amores Perros
Code Unknown
Yo (Me)

Replied by:Nick Redfern Date:2009-11-12

 

NuMBers is a set of tutorials produced by Anglia Ruskin University to teach statistics to biomedical students. It's well designed, clear, and easy to follow. It explains various aspects of using statistical methods that can easily be adapted for film studies, and talks you through the decision-making processes involved in choosing the appropriate method to use. You can save some of the tutorials as PDF files, and there is a glossary for easy reference. If you're just starting out with statistics and not quite sure what to do, this is a good place to begin.

The tutorials can be accessed here: http://web.anglia.ac.uk/numbers/index.html. It is, of course, free to access.

Replied by:Nick Redfern Date:2009-11-25

From the University of Washington comes this very useful guide to presenting statistical results. The style is taken from APA journals.

http://depts.washington.edu/psywc/handouts/pdf/stats.pdf.

An alternative example of using APA style can be found here: http://twopaces.com/Reporting_Statistics_in_APA_Format.pdf.

The American Journal of Physiology has also provided a set of guidelines for researchers using statistics that is useful:

http://ajpgi.physiology.org/cgi/reprint/287/2/G307.

As ever, it is a question of adapting these examples for use in film studies.

 

Replied by:Nick Redfern Date:2010-02-10

I've just come across a most useful resource for people interested in applying statistical methods in film studies. It is a Wiki based on the book Research Methods and Statistics in Psychology: Success in Your Psychology Degree by Jeremy Miles. As the title suggests, it is aimed at Psychology students, but it is clear and straightforward and provides a good overall introduction. It is written for first-year undergraduates, and so it is of a good standard without being overly complicated. It has a series of useful tips throughout, covering research methods and experiment design, statistical analysis, and the presentation of results.

As ever, it is free to access.

The main page of the wiki is here: http://www.researchmethodsinpsychology.com/wiki/index.php?title=Main_Page.

Replied by:Barry Salt Date:2010-04-17

I have asserted earlier in this thread that there is a relatively fixed ratio between the ASL and the Median for movies, and now, with a bit of help from Gunars, I have the proof of this. Considering all the 1520 sound fiction feature films in the Cinemetrics database, I find that the average ratio of their Median to their ASL is 0.620. The values are clustered quite close to this average value, as you can see from their distribution graph here:

[Figure 1]

This distribution has the common Normal (or Gaussian) shape, and its Standard Deviation is 0.124. This is the reason that you can approximately predict what the Median value for any film will be, just from its ASL. Putting this another way, 82% of films have a Median/ASL ratio in the range 0.5 to 0.7.

I had predicted to myself that removing films with an ASL of 15 seconds and higher, which is the region where the fit to the Lognormal distribution begins to break down, would sharpen the above distribution, but in fact doing this makes hardly any change to the shape of the distribution, or to its mean value. However, removing non-American films from the population does sharpen the relation slightly, without changing the Median/ASL ratio much. The 581 American sound films in the Cinemetrics database have a Median/ASL ratio of 0.626, with a standard deviation of 0.109.

When we turn to all the 186 silent fiction features in the Cinemetrics database, the Median/ASL ratio changes appreciably, to a mean value of 0.711 and a standard deviation of 0.082, as in the graph here:

[Figure 2]

But concentrating just on the 92 American silent fiction features in the Cinemetrics database has very little effect on the result we get (Median/ASL ratio = 0.714, with a standard deviation of 0.090).

Since the Median/ASL ratio relates directly to the shape factor s for Lognormal distributions, this means that all the sound films have pretty much the same shape factor, and the silent films mostly have a different shape factor. Why there is an appreciable difference between sound and silent films in their shot length distribution shapes is an interesting question.
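Inverting the mean relation, Median/Mean = exp(-s²/2), gives the shape factor implied by a Median/ASL ratio. A minimal Python sketch, assuming Lognormality (the function name is hypothetical); applying it to an average of per-film ratios, as below, only gives a rough figure for a typical film, because the function is non-linear.

```python
import math

def shape_factor_from_ratio(median_over_asl: float) -> float:
    """Shape factor s implied by a Median/ASL ratio, assuming Lognormality.

    Mean = Median * exp(s^2 / 2) gives Median/Mean = exp(-s^2 / 2),
    so s = sqrt(-2 * ln(Median/Mean)). Valid for 0 < ratio <= 1.
    """
    if not 0.0 < median_over_asl <= 1.0:
        raise ValueError("a Lognormal Median/Mean ratio lies in (0, 1]")
    return math.sqrt(-2.0 * math.log(median_over_asl))

for label, ratio in [("sound features", 0.620), ("silent features", 0.711)]:
    print(f"{label}: Median/ASL = {ratio:.3f} -> s = {shape_factor_from_ratio(ratio):.2f}")
# sound features: s of about 0.98; silent features: s of about 0.83
```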

Replied by:Nick Redfern Date:2010-04-27

The difference between the median/mean ratio for two films or two groups of films (e.g. silent films and sound films) can be explained by the presence of outliers in the data and the influence they have on the mean shot length. This can be demonstrated by looking at the shot length distributions for the two versions of Blackmail, The Lights of New York, and The Scarlet Empress. (The data for these films can be found in the Cinemetrics database.)

Imagine you have two data sets that are identical except for a single value. For example,

A:  1, 2, 3, 4, 5, 6, 7, 8, 9, 10

B:  1, 2, 3, 4, 5, 6, 7, 8, 9, 20

For data set A, the median is 5.5, the mean is 5.5, and so the median/mean ratio is 5.5/5.5 = 1.0. For data set B, the median is 5.5, the mean is 6.5, and the median/mean ratio is 5.5/6.5 = 0.85. The change in the ratio is due to the influence of a single outlying data point, and does not reflect the fact that the two datasets are otherwise identical.
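Redfern's arithmetic can be checked with Python's standard statistics module:

```python
import statistics

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]  # identical to A except for the last value

for name, data in (("A", a), ("B", b)):
    med = statistics.median(data)
    mean = statistics.mean(data)
    print(f"{name}: median = {med}, mean = {mean}, median/mean = {med / mean:.2f}")
# A: median = 5.5, mean = 5.5, median/mean = 1.00
# B: median = 5.5, mean = 6.5, median/mean = 0.85
```

The single changed value moves the mean but leaves the median untouched.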

This is precisely what we see when we look at the two versions of Blackmail. In the table below, we have the mean shot length, the median shot length, and the ratio of the median to the mean.

 

                              Blackmail (silent)    Blackmail (sound)
Median shot length (s)        5.6                   5.1
Mean shot length (s)          8.1                   10.4
Median/mean                   0.69                  0.49

Replied by:Nick Redfern Date:2010-04-27

My last post doesn't seem to have uploaded properly, so this will obviously be very confusing. I'll put the full version on my blog on Thursday as this week's post.

Replied by:Barry Salt Date:2010-09-05

First, a reminder that the Median/ASL ratio is important because it is closely related to the shape of the shot length distribution, insofar as that distribution conforms to the standard Lognormal statistical distribution, because of the mathematical relationship:

$$\text{Mean} = \text{Median} \cdot \exp\left(\tfrac{1}{2}s^{2}\right)$$

where s is the shape factor.

Now, the difference in the Median/ASL ratio between American sound and silent films that Nick Redfern and I have discussed above on this thread proves to be more complicated than I realised.

Closer inspection of the American sound film corpus on the Cinemetrics database shows that the ratio varies a bit with the magnitude of the ASL. To take an extreme case, the mean Median/ASL ratio for the 25 feature films with ASL less than 3 seconds is 0.74, while for the 64 films with ASL less than 4 seconds, it is 0.71, and so on, till we get to the 217 films with ASL less than 7 seconds, by which point the mean ratio of Median to ASL is 0.68.

So what we have here is a correlation between the Median/ASL ratio and the ASL for American sound films. It is not a strong correlation, because when we calculate the correlation coefficient (r) for this relation, for all the American sound films, it comes out at about 0.3.

Turning back to the 84 American silent films in the database, we find that only 17 of them have ASLs greater than 7 seconds, so their Median/ASL ratio of 0.73 may reasonably be compared with that of 0.68 for the group of sound films with ASL less than 7 seconds. However, the remaining difference between silent films and the group of faster cut sound films remains to be addressed.

Up to this point, we have just been groping around with descriptive statistics. It is time to start being real scientists, and look for the cause of the phenomenon. The most obvious difference between sound and silent films in this context is that the silent films have dialogue intertitles. The unexamined convention in film analysis is that a dialogue intertitle should be counted as a shot. But up to the editing stage, American silent films were shot with the actors speaking the lines in the script, without regard for where the dialogue intertitles would subsequently go. So perhaps the intertitles are distorting the "natural" lengths of the shots, and hence the shape of the shot length distributions of silent films. After all, the duration of a dialogue title is limited by the amount of text you can get onto one title card, and these titles were traditionally given a length that would enable the average person to read them through twice.

So the first simple thing to do is look at any silent films that have no dialogue titles. There are very few of these, with much the best known being Der letzte Mann (1925). This has a Median/ASL ratio of 0.63, just like the average sound film. This is encouraging, but it could be a lucky fluke. How are we to bring more silent films into the enquiry? The fairly obvious thing to do is to take the dialogue titles out of a silent film, and then see how it measures up.

So here is the shot length distribution for It (1927), counting the dialogue titles as shots in the usual way, and with the theoretical Lognormal distribution corresponding to the shape factor and median shot length determined from the actual distribution of shot lengths imposed on the histogram as well.

[Chart: number of shots with lengths (in seconds) falling within the given ranges]

I next removed the dialogue titles in a non-linear editing program to create a new version of the film without them. In American silent films of the 'twenties, most dialogue titles occur between shots of the scene taken from different camera positions, so removing them does not disturb the length of the remaining shots. But where the dialogue title has been cut into the middle of a continuous take, which happens to a lesser extent, I ignore the resulting new cut when I measure the length of the shot in my new version. The shot length distribution of the reduced film is shown below.

[Chart: number of shots with lengths (in seconds) falling within the given ranges]

As you can see, the actual shot length distribution is now a somewhat better fit with the theoretical Lognormal, and indeed the correlation coefficient for the fit is now 0.972, whereas for the original film it is 0.957. More to our concerns, the Median/ASL ratio, which was 0.76, is now reduced to 0.68, and the distinction between the shapes of the shot length distributions for sound and silent films is vanishing in this case. Obviously, more examples are needed to verify this effect, but my explanation for the phenomenon looks promising.

Incidentally, all the above has been achieved using ordinary, parametric statistics. Non-parametric statistics are not needed in dealing with shot length distributions, and indeed are useless for investigating the important close correspondence between most of the observed shot length distributions and the theoretical Lognormal distribution. Talk about "outliers" in shot length distributions is completely misleading, as the small scatter of shots with long lengths forming the extended right tail of the distribution are part of what MAKES it a Lognormal distribution. To intentionally throw away this knowledge by insisting that only non-parametric statistics be used is utterly against the spirit of scientific investigation.

Replied by:Yuri Tsivian Date:2010-09-09

This sounds like a promising road to explore. To make this and similar studies easier, Gunars has just added two more functions to the existing measurement results:

a) MSL/ASL ratio and b) CV (coefficient of variation, StDev/ASL) both of which now appear above the diagram in both the simple and advanced modes.
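The quantities now shown above the diagram can all be computed from a list of shot lengths. A minimal Python sketch follows (the function name is hypothetical); whether Cinemetrics uses the sample or the population standard deviation is not stated in this thread, so the use of the sample version is an assumption.

```python
import statistics

def cutting_summary(lengths):
    """ASL, median shot length (MSL), SD, CV and MSL/ASL from shot lengths in seconds."""
    asl = statistics.fmean(lengths)   # equals total running time / number of shots
    msl = statistics.median(lengths)  # needs the full list of shot lengths
    sd = statistics.stdev(lengths)    # sample standard deviation (assumed)
    return {"ASL": asl, "MSL": msl, "StDev": sd,
            "CV": sd / asl, "MSL/ASL": msl / asl}

# Hypothetical shot lengths, for illustration only.
print(cutting_summary([2.0, 3.5, 1.2, 8.0, 4.4, 2.7, 15.3, 3.1]))
```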

Here is a quick idea that might help check and fine-tune the hypothesis Barry Salt proposed in the above reply posted on 2010-09-05. It so happens that some Cinemetrics clients, specifically Torey Liepa and Charles O'Brien, have been submitting data with dialog titles already measured as a separate group. It suffices to sort the database by "Submitted by" to spot a number of American silent films with "dialog titles" marked. The next step is to target a title, extract the "raw data" from it, and copy/paste those to an Excel sheet. What remains is to filter out the "dialog" data, obtain the average and median for the remaining shots, and divide the latter by the former. This will give us the MSL/ASL ratio from which Barry knows how to work out the lognormal distribution.
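Yuri's spreadsheet procedure can be sketched in Python. The "dialog" group label, the (length, group) layout and the function name are assumptions about the exported raw data, not a documented Cinemetrics format. Note that this simple filter leaves the two halves of a take that a title interrupted as two separate shots.

```python
import statistics

def ratios_with_and_without_titles(shots):
    """Return (MSL/ASL with titles counted as shots, MSL/ASL with titles dropped).

    `shots` is a list of (length_in_seconds, group) pairs; the group label
    "dialog" is an assumed tag for dialogue titles in the exported data.
    """
    def msl_over_asl(lengths):
        return statistics.median(lengths) / statistics.fmean(lengths)

    all_lengths = [length for length, _ in shots]
    without_titles = [length for length, group in shots if group != "dialog"]
    return msl_over_asl(all_lengths), msl_over_asl(without_titles)

# Hypothetical raw data, for illustration only.
shots = [(4.0, "shot"), (2.0, "dialog"), (6.0, "shot"), (2.0, "dialog"), (8.0, "shot")]
with_titles, without = ratios_with_and_without_titles(shots)
print(f"with titles: {with_titles:.2f}, without titles: {without:.2f}")
# with titles: 0.91, without titles: 1.00
```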

Here are 3 samples.

Birth of a Nation, The (187 min. version): (7) ASL 7 gives us an MSL/ASL ratio with dialog titles counted as shots = 0.69, and MSL/ASL with dialog titles cut out = 0.69. This lack of difference may be due to the relatively small number of dialogue titles (30) against the array of other shots (around 1,500) and, perhaps, Griffith's proverbially verbose titling style.

Show, The: (7) ASL 4.7 shows MSL/ASL ratio with dialog titles counted as shots = 0.79, and MSL/ASL - with dialog titles cut out = 0.75 which, in tendency, seems to conform with Salt's theory.

Circus, The (2nd try): (7) ASL 5:  MSL/ASL ratio with dialog titles counted as shots = 0.64, and MSL/ASL - with dialog titles cut out = 0.65. This example is interesting because, distinct from the former two, the ASL of the dialog titles (2.5) here is lower than the ASL of the "action" shots (5.3).

I am sure Barry will make more of these numbers than I can, so back to you.

Replied by:Barry Salt Date:2010-09-10

Hmm. Obviously my idea needs more examples. I did check another film, Seventh Heaven, for change in Median/ASL ratio after removing titles, and got a reduction from 0.75 to 0.71. I think your title removal is probably not quite the same as mine. You will note in my submission above that when the title occurs in the middle of the shot, I join the resulting two halves together to make one shot. You can do this working with a copy in an NLE, but working with the Cinemetrics record, the two halves of the shot become two shots. Since, as I say above, such shots are usually a minority, this will probably not make that much difference, but again this needs to be checked. With any luck I will do Little Annie Rooney before long.

Replied by:Barry Salt Date:2010-11-10

Some more examples of the effect of dialogue intertitles on the shot length distributions of silent films are now available. Here are the median/ASL ratios (which determine the shape factor for Lognormal distributions) for three American silent films. They are given for the original form of the film, with dialogue titles treated as shots, and for the film modified by omitting the dialogue titles.

 

MOST IMPORTANT: When this has been done, if the dialogue title (or titles) are cut into the middle of what was obviously one continuous take, then the parts of this take are joined together again to make one continuous shot. This does NOT give the same result as leaving these fragments as separate shots in the analysis.
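The procedure Salt insists on can be sketched in Python. The function name and the "continuation" flag are hypothetical: deciding that two fragments were one continuous take is a judgment made by the person measuring the film, not something the code can infer.

```python
def remove_titles_and_rejoin(shots):
    """Drop dialogue titles and rejoin takes that a title had split in two.

    Each entry is a hypothetical (length, kind) pair, where kind is one of
    "shot", "title" or "continuation"; "continuation" marks a shot that
    resumes the same continuous take as the last shot before the title.
    """
    result = []
    for length, kind in shots:
        if kind == "title":
            continue                  # dialogue titles are omitted
        if kind == "continuation" and result:
            result[-1] += length      # join the parted take back into one shot
        else:
            result.append(length)
    return result

shots = [(5.0, "shot"), (3.0, "title"), (2.5, "continuation"),
         (4.0, "shot"), (3.5, "title"), (6.0, "shot")]
print(remove_titles_and_rejoin(shots))  # [7.5, 4.0, 6.0]
```

A plain filter on the same list would instead give [5.0, 2.5, 4.0, 6.0], which is why the two approaches can yield different Median/ASL ratios.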

 

Film                          Median/ASL (Dialogue titles counted as shots)    Median/ASL (Dialogue titles removed)
It (1927)                     0.76                                             0.68
Seventh Heaven (1927)         0.75                                             0.71
Little Annie Rooney (1925)    0.67                                             0.63

 

Now for American silent films made between 1920 and 1928 in the Cinemetrics database, the ASLs cover the range from 3.3 seconds to 7.6 seconds, and the mean value of the Median/ASL ratio is 0.72 for this group. So what we need for a comparable group of sound films are those that cover the same range of ASLs. When this comparable group has been selected, they are found to have a mean Median/ASL ratio of 0.66. The three silent films I have analysed above are indeed clustered around this lower figure after their dialogue titles are surgically removed, and sutures rejoin the parted shots.

To actually see the difference between the distributions with, and without, dialogue titles, I add the distribution graphs for Seventh Heaven and Little Annie Rooney to that for It shown in my piece a couple of entries back on this thread.

Here is the shot length distribution for Seventh Heaven (1927), counting the dialogue titles as shots in the usual way, and with the theoretical Lognormal distribution corresponding to the shape factor and median shot length determined from the actual distribution of shot lengths imposed on the histogram as well.

Then we have the shot length distribution with the dialogue titles cut out, as previously described.

 

The correlation coefficient, which indicates the goodness of fit of the actual distribution to the theoretical Lognormal distribution using the median and shape factor derived from the actual values, goes from r=0.987 to r=0.981 on removal of the titles. Both these values imply a fairly good fit to the Lognormal distribution for this data, though a whisker better for the original distribution.

 

The similar results found for Little Annie Rooney are as follows:

 

In this case the correlation coefficients for the relation between the actual and theoretical distributions are 0.936 when including the dialogue titles, going to 0.982 when they are removed. This indicates an appreciably better fit to the Lognormal distribution when the dialogue titles are cut out in this particular case.

 

If we look at the distribution of the lengths of dialogue titles on their own for Seventh Heaven, we get the following graph. (The results are similar for Little Annie Rooney and It.)

You can see immediately that this distribution is a quite different shape to that of a Lognormal distribution, as it lacks the extended right tail, and also that it has broader shoulders.

 

Removing these dialogue titles has the appreciable effect on the original distributions that we have observed because in the typical American silent film they make up a substantial proportion of the number of shots in the film. The proportion of dialogue titles in an American silent film is usually around 15% of the shots in the film, though this is not true for slapstick comedy, which uses much less titling than ordinary dramas and comedies (in fact about half as much).

This last point means that the shape of the shot distributions for slapstick comedies will be less different from that of sound films with the same ASL than are those of ordinary silent films from equivalent sound films. This has been observed by Nick Redfern recently, in his analysis of the silent and sound films of Laurel and Hardy, which he refers to on his "Laurel and Hardy data" thread of the Cinemetrics discussion board.

 

Incidentally, this investigation represents the first appearance of a sub-discipline that might be called "experimental film history", as a kind of analogue to the "experimental archaeology" that has come to the fore in archaeology in recent times.

Barry Salt, 2010