[From:   Studying   Contemporary   American   Film:   A   Guide   to   Movie   Analysis,   by   Thomas   Elsaesser   and   Warren   Buckland,   pp.   101-16;   bar   charts   omitted] 
3.4.   Statistical   Style   Analysis:   Theory 
The   statistical   style   analysis   of   motion   pictures   is   primarily   a   systematic   version   of   mise   en   scène   criticism   –   or,   more   accurately,   mise   en   shot   criticism.   We   have   already   seen   that   Eisenstein   invented   the   term   mise   en   shot   to   focus   attention   on   the   way   shots   are   staged   –   that   is,   the   way   the   parameters   of   the   shot   translate   the   actions   and   events   into   film.   The   advantage   of   statistical   style   analysis   over   mise   en   scène/shot   criticism   is   that   it   offers   a   more   detached,   systematic,   and   explicit   mode   of   analysis.   Statistical   style   analysis   characterizes   style   in   a   numerical,   systematic   manner   –   that   is,   it   analyzes   style   by   measuring   and   quantifying   it.   At   its   simplest,   the   process   of   measuring   involves   counting   elements,   or   variables,   that   reflect   a   film’s   style,   and   then   performing   statistical   tests   on   those   variables. 
More   specifically,   there   are   three   standard   aims   of   statistical   style   analysis:   (1)   to   offer   a   quantitative   analysis   of   style,   usually   for   the   purpose   of   recognizing   patterns,   a   task   now   made   feasible   with   the   use   of   computer   technology.   In   language   texts,   the   quantitative   analysis   of   style   and   pattern   recognition   is   usually   conducted   in   the   numerical   analysis   of   the   following   variables:   word   length,   or   syllables   per   word,   sentence   length,   the   distribution   
The   first   aim,   the   quantitative   analysis   of   style,   involves   descriptive   statistics,   and   the   second   and   third   (authorship   attribution   and   chronology)   involve   both   descriptive   and   inferential   statistics.   As   its   name   implies,   descriptive   statistics   simply   describes   a   text   as   it   is,   by   measuring   and   quantifying   it   in   terms   of   its   numerical   characteristics.   The   result   is   a   detailed,   internal,   molecular   description   of   a   text’s   (or   group   of   texts’)   formal   variables.   Inferential   statistics 
3.4.1.   The   quantitative   analysis   of   style 
One   of   the   few   film   scholars   to   apply   statistical   style   analysis   to   film   is   Barry   Salt.   In   his   essay   ‘Statistical   Style   Analysis   of   Motion   Pictures’   (Salt   1974),   and   later   in   his   book   Film   Style   and   Technology   (Salt   1992),   Salt   describes   the   individual   style   of   directors   by   systematically   collecting   data   on   the   formal   parameters   of   their   films.   Salt   then   represents   the   quantity   and   frequency   of   these   formal   parameters   in   bar   graphs, 
3.4.2.   Authorship   attribution 
Authorship attribution is a long-standing, traditional subject in New Testament scholarship, the study of the Classics, and literary scholarship, as well as in legal contexts (for inferring whether the defendant wrote his or her confession, or whether it was ‘co-authored’ with the police, for example). Statistical style analysis has contributed its computerised statistical methods to these areas with controversial results.
One of the principles behind authorship attribution of written texts is that the stylometrist should not focus on a few unusual stylistic traits of a text, but on the frequency of common words an author uses – particularly minor or function words, whose use is independent of the subject matter or context. These include words such as prepositions (of, to, in) as well as synonymous function words such as kind
At   first   it   may   seem   odd   to   distinguish   writing   style   by   analyzing   an   author’s   consistent   use   of   frequent   function   words,   which   he   or   she   is   not   conscious   of   using.   But   as   A.Q.   Morton   argues,   these   words   offer   the   stylometrist   a   common   point   of   comparison   between   authors:   ‘A   test   of   authorship   is   some   habit   which   is   shared   by   all   writers   and   is   used   by   each   at   a   personal   rate,   enabling   his   work 
A   writer’s   style   can   therefore   be   measured   in   terms   of   a   constant   use   of   language   features,   or   a   combination   of   features.   Just   one   example,   on   Raymond   Chandler: 
Chandler’s style, like that of any author, consists of the conjunction of its constituent elements … . Much of the action and color in Chandler’s stories is conveyed by dialogue, which comprises, on average, 44% of all the words in a story; for every thousand words of text, there are, on average, approximately 30 verbal exchanges, which last approximately 15 words apiece. For every thousand words of text, Chandler’s
This information identifies Chandler’s style – at least from a quantitative perspective – and can be used as the norm by which to attribute an anonymous story to Chandler.
If we think of the descriptive possibilities of stylometric authorship studies for film analysis, we note that, as with mise en scène criticism, statistics can be used to make auteur criticism more rigorous – that is, detached, systematic, and explicit. The auteur critic should then focus on the frequency of the common stylistic parameters a director uses – whose use is independent of the subject matter or context – rather
The   inferential   dimension   of   authorship   attribution   has   a   more   limited   application   to   film,   but   some   films   such   as   Poltergeist   have   disputed   authorship   (was   it   directed   by   Tobe   Hooper   or   Steven   Spielberg?).   By   systematically   analyzing   the   parameters   of   the   shots   in   Poltergeist,   and   then   comparing   the   results   to   samples   from   Hooper’s   and   Spielberg’s   other   films,   it   may   be   possible   to   identify   the   film’s   authorship   (defined   in   terms   of   mise   en   
Two cautionary notes are in order. Firstly, the variables chosen to determine a director’s style need to be valid (Salt has covered this problem by collecting data on the variables under a director’s control). Secondly, the results need to be statistically significant, rather than due to chance occurrence. Many statistical tests are in fact tests for significance.
3.4.3.   Chronology 
The   third   area   of   statistical   style   analysis   is   chronology.   Here   again   the   statistics   used   can   be   either   descriptive   or   inferential.   A   description   quantifies   and   measures   the   changes   in   a   body   of   work,   usually   of   a   single   author.   The   point   here   is   that   an   author’s   work   changes   in   a   predictable   manner.   An   inferential   study   uses   these   descriptions   of   change   to   place   an   author’s   work   into   chronological   order   where   that   chronology   is 
In   film,   chronology   studies   can   be   used   descriptively   to   identify   a   change   in   style   across   a   director’s   work.   The   most   obvious   example   is   charting   the   change   of   any   shot   parameter   across   a   director’s   career,   such   as   average   shot   length,   distribution   of   shot   scales,   use   of   camera   movement,   and   so   on. 
3.5.   Statistical   Style   Analysis:   Method 
In   his   Film   Quarterly   essay   ‘Statistical   Style   Analysis   of   Motion   Pictures’   (Salt   1974),   Barry   Salt   aimed   to   identify   the   individual   style   of   a   director   by   systematically   collecting   data   on   the   formal   parameters   of   films,   particularly   those   formal   parameters   that   are   most   directly   under   the   director’s   control,   including: 
duration   of   the   shot   (including   the   calculation   of   average   shot   length,   or   ASL) 
shot   scale 
camera   movement 
angle   of   shot 
strength   of   the   cut   (measured   in   terms   of   the   spatio-temporal   displacement   from   one   shot   to   the   next). 
Salt collected data on these parameters by laboriously going through the film shot by shot. For most of his analyses, he in fact collected data on all the shots that appear in the first 30 minutes of each film, on the grounds that this constitutes a representative sample of the film. We shall employ (and test the viability of) this practice in our statistical style analysis of The English Patient in section 3.6. Salt
After analyzing a sample of films from four directors, Salt finds that ASL is a significant and defining characteristic of a director’s style. (Calculating the ASL involves dividing the duration of the film by the number of shots.) However, the distribution of shot scale is similar for the four directors he analyzes.
In a statistical style analysis of Max Ophuls’ films (Salt 1992, Chapter 22), Salt uses standard stylometric tests to analyze the distribution of stylistic parameters in each film. Firstly, he produces histograms, or bar charts, representing the number of each shot type in each film (the number of close-ups, long shots, etc.). Secondly, he takes equal lengths of film, calculates the expected number of shots and shot types in each
1. Salt recommends intervals of roughly one minute (i.e. 100ft intervals on 35mm film, which runs at 90ft per minute at 24 frames per second);
2. When calculating shot types, one can define the intervals in terms of number of shots (e.g. 50) and compare the expected number of each shot type with the actual number;
3.   Take   the   ASL   of   the   whole   film,   and   then   analyze   it   scene   by   scene   (each   scene   is   defined   in   terms   of   spatio-temporal   unity   and   in   terms   of   events).   Work   out   the   expected   no.   of   shots   and   shot   types   for   each   scene,   and   count   the   actual   no.   of   shots.   If   the   ASL   is   10   seconds,   and   the   scene   lasts   2   minutes,   the   expected   number   of   shots   for   that   scene   is   12. 
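The arithmetic in step 3 can be sketched in a few lines of Python. This is only an illustrative sketch, not Salt's own procedure: the function names `average_shot_length` and `expected_shots` are hypothetical, and the figures are the worked example just given (an ASL of 10 seconds and a scene lasting 2 minutes).

```python
def average_shot_length(duration_seconds, number_of_shots):
    """ASL: the running time divided by the number of shots."""
    if number_of_shots <= 0:
        raise ValueError("number_of_shots must be positive")
    return duration_seconds / number_of_shots


def expected_shots(scene_seconds, asl_seconds):
    """Shots a scene would contain if cutting were even throughout the film."""
    if asl_seconds <= 0:
        raise ValueError("asl_seconds must be positive")
    return scene_seconds / asl_seconds


# Worked example from the text: ASL of 10 seconds, scene of 2 minutes.
expected = expected_shots(scene_seconds=120, asl_seconds=10)
print(expected)  # 12.0
```

Setting the expected figure beside the number of shots actually counted in the scene shows whether that scene is cut faster or slower than the film's overall rate.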
In   his   analysis   of   Letter   From   an   Unknown   Woman,   Salt   notes   the   following: 
For   instance,   in   scene   1   five   shots   would   be   expected   if   the   cutting   were   even   throughout   every   part   of   the   film,   but   in   fact   there   are   only   three   shots.   Contrariwise,   in   scene   no.   5,   while   only   seven   shots   would   be   expected,   there   are   actually   fourteen.   (Salt   1992:   309) 
This   type   of   analysis   can   also   be   applied   to   the   expected   no.   of   shot   types   in   each   scene   and   the   actual   no.   of   shot   types.   Salt’s   analysis   of   Ophuls’   film   Caught   shows   how   this   information   can   be   useful   in   analyzing   a   film’s   style: 
Caught   is   the   first   Max   Ophuls   film   in   which   there   is   a   very   definite   reduction   in   the   amount   of   variation   in   Scale   of   Shot   and   cutting   rate   from   scene   to   scene,   and   this   becomes   very   apparent   if   a   breakdown   into   100ft   sections   is   made   on   a   35mm.   print.   After   the   point   in   the   film   at   which   Leonora   has   married   Smith-Ohlrig   and   been   left   alone   in   his   mansion,   we   have   for   the   
Salt   is   able   to   determine,   not   only   how   the   shot   lengths   and   scales   are   distributed   across   the   whole   film,   but   also   how   this   film   compares   to   Ophuls’   other   films   (‘Caught   is   the   first   Max   Ophuls   film   in   which   there   is   a   very   definite   reduction   in   the   amount   of   variation   in   Scale   of   Shot   and   cutting   rate   from   scene   to   scene’).   Salt   develops   this   historical   analysis   by   considering   Ophuls’   later   films,   and 
For   example,   in   La   Ronde,   with   the   scene   between   the   Young   man   and   The   Chambermaid   we   get,   after   the   first   11   shots,   long   strings   of   up   to   10   shots   each   with   the   same   camera   distance   in   every   shot.   Most   of   these   are   also   in   the   Medium   or   medium   Long   Shot   scale,   and   the   film   continues   in   the   same   manner   after   this   scene.   At   one   point   there   is   a   string   of   15   consecutive 
In   summary,   statistical   style   analysis   is   a   very   precise   and   accurate   tool   for   determining   both   the   stability   and   the   change   in   style   that   takes   place   across   a   filmmaker’s   career.   Statistical   style   analysis   focuses   the   research   on   how   films   are   put   together,   rather   than   how   they   are   perceived   or   comprehended. 
Barry   Salt   carried   out   his   statistical   analysis   by   hand,   which   limited   the   types   of   tests   he   could   perform   on   the   data   he   collected.   With   the   exponential   growth   in   computer   technology   and   software   over   the   last   decade,   statistical   style   analysis   can   now   be   carried   out   using   computer   technology   and   powerful   software   programs.   In   the   following   analysis   of   The   English   Patient,   data   was   still   collected   by   hand,   but   it   was   then   entered 
The   following   analysis   of   The   English   Patient   will   consist   of   both   the   visual   and   numerical   representation   of   data   (particularly   bar   graphs,   and   frequency   and   percent   tables).   Then   a   few   simple   statistical   tests   will   be   applied:   measure   of   the   mean   or   average   shot   length;   measure   of   the   standard   deviation   of   shot   length;   and   the   skewness   of   the   values   for   shot   length   and   shot   scale.   (The   results   will   also   be   compared   to   
These tests properly apply only to ratio data (where zero is an absolute value – zero weight, zero time, etc.). Only shot length is, strictly speaking, ratio data. In the shot scale, numbers have been assigned to the categories, which means they constitute a nominal scale (e.g., Very Long Shot is 7, but there is no reason why it couldn’t be 1). However, by using the nominal scale consistently (1 =
Another stylistic issue that can be raised (but won’t be for this exercise) involves entering the number of scenes into the SPSS program, calculating the average number of shots per scene, and comparing the resulting expected number of shots per scene with the actual number. Other useful data can be collected on: positional reference (for example, what position do close ups typically take in a film? – the
3.6   Statistical   Style   Analysis:   The   English   Patient 
Data was recorded from the following five parameters of the shot over the first 30 minutes of The English Patient: shot length, shot scale, camera movement, camera direction, and camera angle. For comparative purposes, the same data was recorded from the first 30 minutes of Jurassic Park. Barry Salt has already argued that 30 minutes is a representative sample to analyze. To test this hypothesis, we shall compare the results
The   statistical   tests   applied   in   this   section   to   the   collected   data   are   the   simplest   ones   available   on   SPSS:   calculating   the   frequency   of   variables   (that   is,   counting   them),   representing   those   frequencies   as   percentages,   calculating   the   mean,   the   standard   deviation,   and   the   skewness   of   the   results. 
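For readers without access to SPSS, the three summary measures named above can be sketched in plain Python. Several things here are assumptions rather than details given in the text: the function name `describe_shot_lengths` is hypothetical, the skewness formula is the adjusted Fisher-Pearson coefficient (which we take to be the kind of sample skewness a statistics package reports), and the list of shot lengths is invented purely for illustration and is not data from either film.

```python
from statistics import mean, stdev


def describe_shot_lengths(lengths):
    """Return the mean, sample standard deviation, and skewness of shot lengths."""
    n = len(lengths)
    if n < 3:
        raise ValueError("at least three shots are needed to compute skewness")
    m = mean(lengths)
    s = stdev(lengths)  # sample standard deviation (divides by n - 1)
    if s == 0:
        skewness = 0.0  # every shot has the same length
    else:
        # Adjusted Fisher-Pearson coefficient (an assumption, see lead-in).
        skewness = n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in lengths)
    return {"mean": m, "sd": s, "skewness": skewness}


# Hypothetical shot lengths in seconds, invented for illustration only.
sample = [1, 2, 2, 3, 3, 4, 5, 8, 21]
print(describe_shot_lengths(sample))
```

A single very long shot among many short ones, as in the invented sample, pulls the skewness well above zero, which is the pattern a positive skew describes.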
The   first   30   minutes   of   The   English   Patient   (up   to   the   moment   where   Caravaggio   introduces   himself   to   Hana,   and   they   go   into   the   kitchen   of   the   monastery)   consists   of   356   shots.   In   terms   of   shot   length,   the   main   values   are   to   be   found   in   Table   1. 
The   first   column   indicates   shot   length   values   (1   second,   2   seconds,   and   so   on);   the   second   column   the   number   of   times   this   shot   length   appears   in   the   first   30   minutes   of   The   English   Patient   (1   second   shots   appear   41   times,   2   second   shots   84   times);   and   the   third   column   indicates   the   percentage   of   shots   with   each   value   (1   second   shots   constitute   11.5   %   of   all   the   shots   in   the   sample,   while 
Table   1   only   represents   shots   of   length   1   to   10   seconds.   There   are   additional   values,   up   to   129   seconds   (the   opening   credit   sequence   shot),   but   the   frequency   of   shot   lengths   above   10   seconds   is   usually   very   small   –   one   or   two   examples.   Shots   of   length   1   to   10   seconds   constitute   92%   of   all   the   shots   in   the   sample. 
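A frequency and percent table of this kind is straightforward to build once the shot lengths have been recorded. The following is a minimal sketch, assuming the lengths are held as whole seconds in a list; the function names `frequency_table` and `share_within` are hypothetical, and the short list used to demonstrate them is invented, not taken from the film.

```python
from collections import Counter


def frequency_table(shot_lengths):
    """Return (length, count, percent) rows, sorted by shot length."""
    total = len(shot_lengths)
    if total == 0:
        return []
    counts = Counter(shot_lengths)
    return [
        (value, count, round(100 * count / total, 1))
        for value, count in sorted(counts.items())
    ]


def share_within(shot_lengths, low, high):
    """Percentage of shots whose length lies between low and high inclusive."""
    if not shot_lengths:
        raise ValueError("shot_lengths must not be empty")
    inside = sum(1 for x in shot_lengths if low <= x <= high)
    return round(100 * inside / len(shot_lengths), 1)


# Check against the figure quoted in the text: 41 one-second shots out of 356.
print(round(100 * 41 / 356, 1))  # 11.5

# Invented shot lengths (in seconds), for illustration only.
toy_sample = [1, 1, 2, 3, 12]
for row in frequency_table(toy_sample):
    print(row)
print(share_within(toy_sample, 1, 10))
```

The second function gives the kind of cumulative figure used when a range of values (for example, 1 to 10 seconds) is reported as a share of the whole sample.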
Table 2 shows that the mean (the average) value of shot length of this sample is 5.1. In other words, the average shot length (ASL) of the sample is approximately 5 seconds (there is, on average, a cut every five seconds). The standard deviation of shot length is 8, indicating a wide dispersion of values around the mean, while the skewness of values is 10.97, indicating a very strong positive skewness of
The   value   of   this   information   may   not   be   readily   apparent.   One   of   the   best   ways   to   make   sense   of   it   is   to   conduct   a   comparative   analysis.   The   first   30   minutes   of   Jurassic   Park   (up   to   the   end   of   the   scene   where   Grant,   Sattler,   Malcolm,   and   Gennaro   see   a   dinosaur   egg   hatch   in   the   lab)   consists   of   252   shots,   in   comparison   to   The   English   Patient’s   356,   a   difference   of   104   shots.   This   
We   can   make   many   other   comparisons.   Jurassic   Park’s   values   for   shot   length   can   be   found   in   Tables   3   and   4.   The   shot   lengths   in   the   range   1   to   10   seconds   only   constitute   80%   of   all   the   shots   in   the   sample,   suggesting   that   Spielberg’s   film   has   a   wider   variety   of   shot   lengths.   This   is   reflected   in   a   skewness   value   of   2.68   (the   mean   value   is   7   seconds   and   standard   deviation   is   6.69).   
We   can   explore   this   difference   in   shot   length   values   further.   In   The   English   Patient,   52%   of   the   shots   fall   in   the   range   1   to   3   seconds.   In   Jurassic   Park,   only   35%   of   the   shots   fall   within   this   range.   We   have   to   include   the   values   up   to   5   seconds   before   Jurassic   Park   reaches   the   same   percentage   (in   fact   shots   falling   in   the   range   1   to   5   seconds   constitute   54%   of   the   film’s   
With   the   above   tests   we   are   simply   scratching   the   surface   of   what   can   be   achieved   with   statistical   style   analysis.   It   is   also   possible   to   apply   the   same   tests   to   the   results   obtained   from   the   other   four   parameters   of   the   shot.   But   because   this   would   make   the   chapter   even   longer   than   it   already   is,   we   shall   instead   consider   camera   movement   and   shot   scale.   With   the   data   collected   on   camera   movement,   we   can 
The   still   camera   is   by   far   the   most   common   value   (85%   of   all   shots),   with   only   15%   of   the   shots   containing   camera   movement.   This   seems   to   confirm   John   Seale’s   claim   that   he   likes   to   keep   the   camera   still. 
In   comparison,   Jurassic   Park   contains   the   following   values   for   camera   movement: 
These   results   may   surprise   some   readers,   especially   the   high   percentage   of   still   shots   in   an   action   blockbuster.   But   the   percentages   are   significantly   different   to   The   English   Patient,   since   Jurassic   Park   has   11%   more   moving   shots   than   The   English   Patient. 
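Whether a difference of this size is significant in the statistical sense, rather than due to chance, can be checked with a chi-square test on the counts of still and moving shots. The sketch below rests on loudly stated assumptions: the counts are reconstructed by rounding from the reported percentages and sample sizes (356 and 252 shots), ‘11% more’ is read as 11 percentage points (that is, roughly 26% moving shots in Jurassic Park), and the function name `chi_square_2x2` is hypothetical. It is an illustration of the kind of test meant, not a result reported in this chapter.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts (no continuity correction)."""
    (a, b), (c, d) = table
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one shot")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if film and camera movement were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Critical value of chi-square at the 5% level with 1 degree of freedom.
CRITICAL_VALUE_5_PERCENT = 3.841

# ASSUMPTION: approximate counts rebuilt from the reported percentages.
english_patient = (303, 53)  # (still, moving): about 85% / 15% of 356 shots
jurassic_park = (186, 66)    # (still, moving): about 74% / 26% of 252 shots

statistic = chi_square_2x2((english_patient, jurassic_park))
print(statistic > CRITICAL_VALUE_5_PERCENT)
```

If the statistic exceeds the critical value, the difference between the two films' proportions of moving shots is unlikely to be due to chance at the 5% level, under the assumptions above.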
Finally, in terms of shot scale, the distribution in both films conforms to what statisticians call a ‘normal distribution’, with high values in the middle (the mean) and progressively lower values on either side (see Figure 3). The result of these normal distributions is that the standard deviation and skewness values are low. Both directors favour medium close ups (28% in Jurassic Park, and 33% in The English Patient) and
In summary, The English Patient contains a narrow range of shot lengths averaging out at 5 seconds, heavily biased towards shots of 1-3 seconds, with a very high percentage of still shots. Jurassic Park has a much wider distribution of shot lengths, which average out at 7 seconds, with a bias (but not as much as in The English Patient) towards shots below this value, and a slightly higher percentage of
One   final   task   needs   to   be   carried   out   to   check   the   viability   of   the   above   results   –   the   representative   nature   of   the   first   30   minutes   of   a   film.   Here   we   shall   simply   note   major   similarities   and   differences   between   a   statistical   style   analysis   of   the   first   30   minutes   of   Jurassic   Park,   and   an   analysis   of   the   whole   film.   (When   two   figures   are   quoted,   the   first   one   always   refers   to   the   30   minute 
The   information   that   the   SPSS   software   has   yielded   is   simply   the   raw   material   for   writing   about   the   style   of   The   English   Patient,   and   for   comparing   its   style   to   the   style   of   other   films.   The   above   analysis   only   presents   a   small   sample   of   data   and   even   fewer   tests   on   the   stylistic   patterns   to   be   found   in   the   film.   The   primary   difference   between   this   analysis   and   more   conventional   mise   en   scène   analysis   
References 
Farringdon,   Jill   (1996),   Analysing   for   Authorship:   A   Guide   to   the   Cusum   Technique   (Cardiff:   University   of   Wales   Press). 
Foster,   Don   (2001),   Author   Unknown:   On   the   Trail   of   Anonymous   (London:   Macmillan). 
Kenny,   Anthony   (1982),   The   Computation   of   Style   (Oxford:   Pergamon   Press). 
Salt, Barry (1974), ‘Statistical Style Analysis of Motion Pictures’, Film Quarterly, 28, 1: 13-22.
____   (1992),   Film   Style   and   Technology:   History   and   Analysis   (London:   Starword). 
Sigelman,   Lee,   and   William   Jacoby   (1996),   ‘The   Not-So-Simple   Art   of   Imitation:   Pastiche,   Literary   Style,   and   Raymond   Chandler’,   Computers   and   the   Humanities   30,   1:   11-28.